Talking Python (4): Welcome, Little Sparrow
С°×ÊǸö΢ÈíÃÔ£¬ËûµÄżÏñÊDZȶû´óÊ壬ÔÒòµ±È»ÊǵØÇòÈ˶¼ÖªµÀÀ²¡£´ó¶þµÄʱºò£¬ËûµÄ“ê¡Ñ§¼Æ»®”ÔøÒ»¶ÈµÃ³Ñ£¬ÔÒòÊÇËû¹Ò¿ÆÌ«¶à¡£µ±È»£¬´óÈýÐÂѧÆÚ¿ªÊ¼µÄʱºò£¬Ãæ¶Ô¹«ÖÚÖÊÒÉ£¬Ð¡°×Õ¾ÔÚÒÎ×ÓÉÏ£¬Ïñ¼«ÁË¡¶´óÄÚÃÜ̽ÁãÁã·¢¡·ÀïµÄÎ÷ÃÅ´µÑ©£º“ÊÀ½çÊ׸»±È²»Ò»¶¨Óжà³öÉ«£¬ÕâÖ»²»¹ýÊÇÄãÃÇÕâЩÐǶ·ÊÐÃñÒ»ÏáÇéÔ¸µÄÏë·¨°ÕÁË¡£”
ÿÌìС°×¶¼»áÔÚËÞÉá×ªÓÆ£¬ºÃÏñºÜ“¹Â¶À”£¬×ìÀïÄîÄîÓдʣº“Õâ¸öÊÀ½çÕýÔÚ·¢Éú×Å·Ì츲µØµÄ±ä»¯£¬¶øÎÒÃÇÈ´ÏñÃ«Â¿ËÆµÄÉú»î¡£”×îºó£¬Ëû×Ü»áÀ´Ò»¾ä£º“ÎÒÒª³ÉÁ¢µÚ¶þ¸ö¹È¸è£¡”
In this lesson we will learn how search engines work and also write a small web crawler.
What parts make up a search engine?
Ê×ÏȸÐлÕâÕÅͼµÄÔ×÷Õߣ¬Ö÷Òª»¹ÊÇÒª¸ÐлCountry¡£Í¨¹ýÕâÕÅͼ£¬ÎÒÃÇ¿ÉÒÔ¿´µ½£ºÊ×ÏÈ£¬ÍøÂçÖ©Öë×¥È¡ÍøÒ³£¬½«ÍøÒ³ÄÚÈݼ°Á´½Ó´æµ½Êý¾Ý¿âÖС£È»ºóÓÉË÷ÒýÄ£¿é½¨Á¢¹Ø¼ü´Êµ½ÍøÖ·µÄË÷Òý£¬¹©¼ìË÷Ä£¿é²éѯ¡£¼ìË÷Ä£¿éÊǸù¾ÝÄãÊäÈëµÄÄÚÈÝ´ÓË÷ÒýÊý¾Ý¿âÌáÈ¡Êý¾Ý¡£Ö÷Ҫģ¿é½éÉÜÈçÏ£º
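Page-fetching module: consists of the Crawler and Crawler control. The Crawler fetches pages and parses out their links, returning page and url; Crawler control controls and schedules the Crawler.
Page storage module: the Page cache, which stores the page content fetched by the Crawler.
Index module: builds an index from keywords to links and pages.
Retrieval module: breaks the query down into words suitable for lookup.
User interface: accepts user input and passes it to the retrieval module.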
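To make the division of labour concrete, here is a minimal sketch of the storage, index and retrieval modules working together. It is an illustration only, not the Sparrow code from this series: the contents of page_cache and the names tokenize, build_index and search are hypothetical, and returning only pages that contain every query word (AND semantics) is an assumption. Splitting on \w+ suits space-separated English; Chinese text would need a real word segmenter.

```python
import re
from collections import defaultdict

# Hypothetical Page cache: URL -> page text, standing in for what the Crawler stored.
page_cache = {
    "http://example.com/a": "Python makes writing a web crawler easy",
    "http://example.com/b": "A search engine needs a crawler and an index",
    "http://example.com/c": "Sparrow is a small search engine written in Python",
}


def tokenize(text):
    """Split text into lowercase words (a naive stand-in for real word segmentation)."""
    return re.findall(r"\w+", text.lower())


def build_index(cache):
    """Index module: map each keyword to the set of URLs whose page contains it."""
    index = defaultdict(set)
    for url, text in cache.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Retrieval module: split the query into words, return URLs containing all of them."""
    words = tokenize(query)
    if not words:
        return []
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)


if __name__ == "__main__":
    index = build_index(page_cache)
    # The "user interface" here is just a hard-coded query string.
    print(search(index, "python crawler"))  # → ['http://example.com/a']
```

An inverted index (keyword to URLs) is used because it lets the retrieval module answer a query by looking up each word once instead of rescanning every stored page.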
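In the coming lessons we will use the Python we have learned to build a small search engine called Sparrow, after the saying "small as the sparrow is, it has all its vital organs." Our "sparrow" will fly higher and higher as our knowledge grows; it might even turn into a phoenix. For now, of course, it hasn't taken off yet.
Let's begin our search engine journey!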
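The first module we will study is the page-fetching module (Crawler), also called a web spider (Spider).
This module is implemented by the Crawler class. On initialization the class receives a url passed in by the Crawler control module; when it finishes, it returns the page content (page) and the URLs found in that page (link). The source code is as follows: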
import urllib.request  # fetches web page content
import urllib.parse  # parses URLs
import re  # regular expressions
import queue  # queue data structures
class Crawler(object):  # web spider
    ...
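The listing above breaks off at the class header. The following is a hedged sketch of how a Crawler along the lines described (take a url, return page and link) could use the same four modules, plus a tiny queue-based Crawler control. The method names fetch, parse_links and run, the crawl function and its max_pages limit are all hypothetical; decoding every page as UTF-8 and finding links with a regular expression instead of an HTML parser are simplifying assumptions.

```python
import urllib.request  # download pages
import urllib.parse    # resolve relative links, strip #fragments
import re              # naive href extraction
import queue           # FIFO queue of URLs waiting to be crawled


class Crawler(object):
    """Hypothetical sketch: fetch one URL and return (page, links)."""

    # Naive href matcher; a real crawler would use an HTML parser.
    LINK_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)

    def __init__(self, url):
        self.url = url  # URL handed over by Crawler control

    def fetch(self, timeout=10):
        """Download the page and decode it as UTF-8, replacing undecodable bytes."""
        with urllib.request.urlopen(self.url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")

    def parse_links(self, page):
        """Return absolute http(s) links in page, in order of appearance, without duplicates."""
        links = []
        for href in self.LINK_RE.findall(page):
            absolute = urllib.parse.urljoin(self.url, href)
            absolute, _fragment = urllib.parse.urldefrag(absolute)  # drop "#..."
            if absolute.startswith(("http://", "https://")) and absolute not in links:
                links.append(absolute)
        return links

    def run(self):
        page = self.fetch()
        return page, self.parse_links(page)


def crawl(seed, max_pages=10, crawler_cls=Crawler):
    """Hypothetical Crawler control: breadth-first scheduling with a queue."""
    todo = queue.Queue()
    todo.put(seed)
    seen = {seed}
    cache = {}  # stands in for the Page cache module
    while not todo.empty() and len(cache) < max_pages:
        url = todo.get()
        try:
            page, links = crawler_cls(url).run()
        except OSError:  # URLError, HTTPError and timeouts all derive from OSError
            continue
        cache[url] = page
        for link in links:
            if link not in seen:
                seen.add(link)
                todo.put(link)
    return cache
```

For example, crawl("http://www.example.com/", max_pages=5) would return a dict of at most five fetched pages keyed by URL. Keeping fetching (Crawler) separate from scheduling (crawl) mirrors the Crawler / Crawler control split described earlier, and the crawler_cls parameter lets the scheduler be exercised without network access.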