Lucene: Indexing HTML Documents
深未来技术 (Deep Future Technology)
1. Most web documents are written in HTML.
2. This example uses the following HTML document:
<html>
<head>
<title>
Laptop power supplies are available in First class only
</title>
</head>
<body>
<h1>code,write,fly</h1>
</body>
</html>
3. Using JTidy
JTidy is a Java port of Tidy, written by Andy Quick.
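import java.io.InputStream;

import org.apache.lucene.document.Field;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.w3c.tidy.Tidy;

// DocumentHandler and DocumentHandlerException are defined outside this excerpt.
public class JTidyHTMLHandler implements DocumentHandler {

    // Takes an InputStream that represents the HTML document
    public org.apache.lucene.document.Document getDocument(InputStream is)
            throws DocumentHandlerException {
        Tidy tidy = new Tidy();
        tidy.setQuiet(true);          // suppress nonessential output
        tidy.setShowWarnings(false);  // suppress warnings about malformed markup
        // Parse the InputStream that represents the HTML document into a DOM tree
        org.w3c.dom.Document root = tidy.parseDOM(is, null);
        Element rawDoc = root.getDocumentElement();
        org.apache.lucene.document.Document doc = new org.apache.lucene.document.Document();
        String title = getTitle(rawDoc); // get the title
        String body = getBody(rawDoc);   // get everything between <body> and </body>
        // Field.Text is the Lucene 1.x factory for a tokenized, indexed, stored field
        if ((title != null) && (!title.equals(""))) {
            doc.add(Field.Text("title", title));
        }
        if ((body != null) && (!body.equals(""))) {
            doc.add(Field.Text("body", body));
        }
        return doc;
    }

    protected String getTitle(Element rawDoc) {
        if (rawDoc == null) {
            return null;
        }
        String title = "";
        NodeList children = rawDoc.getElementsB
        // ... (the excerpt is truncated here)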
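The excerpt breaks off inside getTitle, and getBody is never shown. The sketch below shows one plausible way to write both with the standard org.w3c.dom API. It rests on several assumptions: the class name DomTextExtractor and the helper parseRoot are hypothetical; to stay runnable with only the JDK, it parses with javax.xml.parsers.DocumentBuilderFactory instead of JTidy, which works for the well-formed sample page above but not for the messy real-world HTML that JTidy exists to clean up; and returning the body as plain text (rather than as elements) is a choice made here, not necessarily the original's.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DomTextExtractor {

    // Stand-in for tidy.parseDOM(is, null): the JDK's XML parser only accepts
    // well-formed markup, so malformed real-world HTML still needs JTidy.
    static Element parseRoot(InputStream is) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(is)
                .getDocumentElement();
    }

    // Hypothetical getTitle: trimmed text of the first <title> element,
    // "" when there is none, null when there is no document.
    static String getTitle(Element rawDoc) {
        if (rawDoc == null) {
            return null;
        }
        NodeList titles = rawDoc.getElementsByTagName("title");
        return titles.getLength() == 0 ? "" : titles.item(0).getTextContent().trim();
    }

    // Hypothetical getBody: all text inside <body>, whitespace runs collapsed to one space.
    static String getBody(Element rawDoc) {
        if (rawDoc == null) {
            return null;
        }
        NodeList bodies = rawDoc.getElementsByTagName("body");
        return bodies.getLength() == 0
                ? ""
                : bodies.item(0).getTextContent().trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><head><title>\n"
                + "Laptop power supplies are available in First class only\n"
                + "</title></head><body><h1>code,write,fly</h1></body></html>";
        Element root = parseRoot(
                new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8)));
        System.out.println(getTitle(root)); // Laptop power supplies are available in First class only
        System.out.println(getBody(root));  // code,write,fly
    }
}
```

Either string could then be passed to doc.add(...) exactly as getDocument does; only the parsing step would change back to JTidy for real pages.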
Related documents:
Reposted from: http://jiangzhengjun.javaeye.com/blog/480996
Events
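The DOM supports two event models at the same time, capturing events and bubbling events, and capturing happens first. Both event flows pass through every object in the DOM, starting at the document object and ending at the document object (most standards-compliant browsers extend capturing and bubbling up to the window object), so elements in the DOM receive the event twice in succession ......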
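To make the ordering concrete, here is a toy Java model, not a browser API (EventFlowModel and dispatchOrder are hypothetical names), that lists which node sees the event in which phase, given a propagation path running from the outermost object down to the target. It assumes a bubbling event and ignores listeners and stopPropagation(); it only shows why each ancestor is visited twice, once on the way down and once on the way back up.

```java
import java.util.ArrayList;
import java.util.List;

public class EventFlowModel {

    // path is ordered from the outermost object (e.g. "window" or "document")
    // to the event target; returns "phase:node" entries in dispatch order.
    static List<String> dispatchOrder(List<String> path) {
        List<String> order = new ArrayList<>();
        if (path.isEmpty()) {
            return order;
        }
        int last = path.size() - 1;
        // Capture phase: outermost ancestor down to the target's parent.
        for (int i = 0; i < last; i++) {
            order.add("capture:" + path.get(i));
        }
        // Target phase: the node the event was fired on.
        order.add("target:" + path.get(last));
        // Bubble phase: the target's parent back up to the outermost ancestor.
        for (int i = last - 1; i >= 0; i--) {
            order.add("bubble:" + path.get(i));
        }
        return order;
    }

    public static void main(String[] args) {
        List<String> path = List.of("window", "document", "html", "body", "div");
        dispatchOrder(path).forEach(System.out::println);
    }
}
```

For the path window, document, html, body, div, main prints the four capture entries from window to body, then target:div, then the four bubble entries from body back to window.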
<html>
<frameset rows="10%,*">
<frame src="http://g.cn" scrolling="no">
<frameset cols="25%,*">
<frame src="http://g.cn" scrolling="no">
<frameset rows="10%,*">
<frame src="http://g.cn" scrolling="no">
......
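The topics in this section describe how to use the ASP.NET Web server controls on the "HTML" tab of the Visual Web Developer Toolbox.
By default, HTML elements on an ASP.NET Web page are not available to the server; they are treated as opaque text that is passed through to the browser. However, by converting HTML elements to HTML server controls, you can expose them as ......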
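Author: 光脚丫思考  Time: 12/23/2009 1:51:00 PM
From the start I assumed an HTML editor must be something deep and mysterious, and that knocking one together casually would not be easy. Later I learned a little about the WebBrowser control, though only superficially: I just knew that with this control you could build something like a web browser into your own program. It never occurred to me that an HTML editor could also be built with this control ......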
The Window object
The Window object is the top-level object in the JavaScript hierarchy.
The Window object represents a browser window or a frame.
A Window object is created automatically each time a <body> or <frameset> appears.
IE: Internet Explorer, F: Firefox, O: Opera.
Window object collections
Collection    Description    IE    F    O
fr ......