
Indexing HTML documents with Lucene (深未来技术)


1. Most web documents are in HTML format.
2. This example uses the following HTML document:
<html>
  <head>
    <title>
      Laptop power supplies are available in First class only
    </title>
  </head>
  <body>
    <h1>code,write,fly</h1>
  </body>
</html>
3. Using JTidy
JTidy is a Java version of Tidy, written by Andy Quick.
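import java.io.InputStream;

import org.apache.lucene.document.Field; // Field.Text is from the Lucene 1.x API
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.w3c.tidy.Tidy;

// DocumentHandler and DocumentHandlerException are not part of Lucene or JTidy;
// they are assumed to be defined elsewhere in the example.
public class JTidyHTMLHandler implements DocumentHandler {

    // Takes an InputStream representing the HTML document
    public org.apache.lucene.document.Document getDocument(InputStream is)
            throws DocumentHandlerException {
        Tidy tidy = new Tidy();
        tidy.setQuiet(true);
        tidy.setShowWarnings(false);
        // Parse the InputStream representing the HTML document into a DOM tree
        org.w3c.dom.Document root = tidy.parseDOM(is, null);
        Element rawDoc = root.getDocumentElement();

        org.apache.lucene.document.Document doc = new org.apache.lucene.document.Document();
        String title = getTitle(rawDoc); // get the title
        String body = getBody(rawDoc);   // get everything between <body> and </body>
        if ((title != null) && (!title.equals(""))) {
            doc.add(Field.Text("title", title));
        }
        if ((body != null) && (!body.equals(""))) {
            doc.add(Field.Text("body", body));
        }
        return doc;
    }

    protected String getTitle(Element rawDoc) {
        if (rawDoc == null) {
            return null;
        }

        String title = "";
        NodeList children = rawDoc.getElementsB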
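Both `getTitle` and `getBody` (called from `getDocument`) work on the DOM tree returned by `parseDOM`. The sketch below is a rough illustration of that DOM-walking step, not the original implementation. It deliberately swaps JTidy for the JDK's built-in strict parser (`javax.xml.parsers`), so it only accepts markup that is already well-formed, such as the sample page from step 2. Real-world HTML would still need a tidying parser like JTidy. The class `DomTextExtractor` and its methods are hypothetical names, and no Lucene `Document` is built here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: extracts title and body text from a DOM tree.
class DomTextExtractor {

    // The sample page from step 2; it is well-formed, so a strict XML parser accepts it.
    static final String SAMPLE =
        "<html><head><title>Laptop power supplies are available in First class only</title></head>"
        + "<body><h1>code,write,fly</h1></body></html>";

    // Parses well-formed markup with the JDK DOM parser (standing in for JTidy) and returns the root element.
    static Element parse(String html) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8)))
                          .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("markup is not well-formed: " + e.getMessage(), e);
        }
    }

    // Text of the first <title> element, "" if there is none, null for a null root.
    static String getTitle(Element rawDoc) {
        if (rawDoc == null) {
            return null;
        }
        NodeList titles = rawDoc.getElementsByTagName("title");
        return titles.getLength() > 0 ? titles.item(0).getTextContent().trim() : "";
    }

    // All text under the first <body> element, "" if there is none, null for a null root.
    static String getBody(Element rawDoc) {
        if (rawDoc == null) {
            return null;
        }
        NodeList bodies = rawDoc.getElementsByTagName("body");
        return bodies.getLength() > 0 ? collectText(bodies.item(0)).trim() : "";
    }

    // Recursively concatenates text nodes, appending a space after each child element.
    static String collectText(Node node) {
        StringBuilder sb = new StringBuilder();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                sb.append(child.getNodeValue());
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                sb.append(collectText(child)).append(' ');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Element root = parse(SAMPLE);
        System.out.println("title: " + getTitle(root));
        System.out.println("body:  " + getBody(root));
    }
}
```

For the sample page this yields the title text and `code,write,fly` as the body text. Appending a space after each child element keeps text from adjacent elements (for example `<h1>a</h1><p>b</p>`) from fusing into a single token in the body field; that separator choice is an assumption of this sketch.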

