[Nutch] Single-machine search on Linux with the nutch 1.0 web front end
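Nutch's crawler and its search component are effectively two separate parts: crawling can run as a MapReduce (M/R) job, but searching is not an M/R job. There are two ways to search: the first is to place the crawl data (also called the index data) on a local disk and search it there; the second is to search the crawl data in HDFS directly.
This post describes how to use the nutch-1.0 web front end to search local crawl data: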
(1) Nutch search can run independently of the Hadoop cluster. Copy the crawled data to any machine, install Tomcat on that machine, run the web front end that ships with Nutch, and configure it accordingly; search will then work.
(2) Put the directory data, produced by the command bin/nutch crawl -dir data -depth 3 -topN 5, into a local directory. (For a distributed crawl, the data lives in HDFS; use the command "bin/hadoop dfs -copyToLocal data <local directory>" to copy it to a local directory.) For example, copy the generated data directory to /home/nutch/nutchinstall/crawltest/. (To be safe, make sure the directory path contains no spaces; they may cause problems.)
Note:
The data directory is generated by the crawler and contains these subdirectories: crawldb, index, indexes, linkdb, segments
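
Before pointing the web front end at a copied crawl directory, it can help to confirm that the copy is complete. The following is a minimal sketch, not part of Nutch: check_crawl_dir is a hypothetical helper that looks for the five subdirectories listed in the note and also flags a space in the path, since spaces are reported to cause trouble.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Nutch): sanity-check a crawl directory.
# Prints one line per problem found; returns 0 only when no problem was found.
check_crawl_dir() {
    crawl_dir=$1
    rc=0
    # A space anywhere in the path is reported as a problem.
    case $crawl_dir in
        *" "*) echo "path contains a space: $crawl_dir"; rc=1 ;;
    esac
    # The crawl output is expected to contain these five subdirectories.
    for sub in crawldb index indexes linkdb segments; do
        if [ ! -d "$crawl_dir/$sub" ]; then
            echo "missing: $sub"
            rc=1
        fi
    done
    return "$rc"
}
```

With the example path from this post, it would be called as: check_crawl_dir /home/nutch/nutchinstall/crawltest/data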
(3) Install Tomcat. Make sure the installation path contains no spaces. This is important: on Windows, a space in the path caused searches to always return 0 results.
(4) Copy the web front end nutch-1.0.war from the Nutch home directory to /usr/program/apache-tomcat-6.0.18/webapps/ (the Tomcat installation directory is /usr/program/apache-tomcat-6.0.18).
(5) Open http://localhost:8080/nutch-1.0 in a browser; nutch-1.0.war will be unpacked automatically.
(6) Configure the nutch-site.xml file of the web front end. After changing it you must restart Tomcat (run /usr/program/apache-tomcat-6.0.18/bin/shutdown.sh, then startup.sh).
nutch-site.xml is located in /usr/program/apache-tomcat-6.0.18/webapps/nutch-1.0/WEB-INF/classes/.
Configure it as follows:
<property>
<name>http.agent.name</name> <!-- required; without it, search returns no results -->
<value>nutch-1.0</value>
<description>HTTP 'User-Agent' request header.</description>
</property>
<property>
<name>http.robots.agents</name>
<value>nutch-1.0,*</value>
<description>The agent strings we'll look for in robots.txt files,
comma-separated, in decreasing order of precedence. You should
put the value of http.agent.name as the first agent name, and keep the
default * at the end of the li
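
The deployment part of steps (4) to (6) can be sketched as a small shell function. This is only an illustration under stated assumptions: deploy_nutch_war is a hypothetical helper, and both paths are passed in by the caller rather than taken from any Nutch or Tomcat setting. It copies the WAR into Tomcat's webapps directory and prints where nutch-site.xml will be found once Tomcat has unpacked the archive.

```shell
#!/bin/sh
# Hypothetical helper: copy the Nutch WAR into Tomcat's webapps directory and
# print the expected location of nutch-site.xml after Tomcat unpacks the WAR.
deploy_nutch_war() {
    war=$1          # path to the WAR file, e.g. nutch-1.0.war
    tomcat_home=$2  # Tomcat installation directory
    [ -f "$war" ] || { echo "WAR not found: $war" >&2; return 1; }
    [ -d "$tomcat_home/webapps" ] || { echo "no webapps directory under: $tomcat_home" >&2; return 1; }
    cp "$war" "$tomcat_home/webapps/" || return 1
    # The unpacked application directory is named after the WAR, minus ".war".
    app=$(basename "$war" .war)
    echo "$tomcat_home/webapps/$app/WEB-INF/classes/nutch-site.xml"
}
```

After editing the printed nutch-site.xml, Tomcat still has to be restarted by hand, for example by running bin/shutdown.sh and then bin/startup.sh from the Tomcat installation directory.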