Rewriting the Python crawler

I have an idea for rewriting the crawler, so here is a first outline of its structure.

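# Python 2 code: in Python 3 the module is named queue, and str has no decode().
from Queue import Queue
from locale import getdefaultlocale

# SaveDataBase and ThreadPool are the author's own classes; they are not shown in this post.

class Crawler(object):
   def __init__(self,url,depth,threadNum,dbfile,key):
      # queue of URLs still to be fetched
      self.urlQueue = Queue()
      # queue of HTML pages that have been read
      self.htmlQueue = Queue()
      # URLs already visited
      self.readUrls = []
      # links not yet visited
      self.links = []
      # number of threads
      self.threadNum = threadNum
      # database file name
      self.dbfile = dbfile
      # create the database storage object
      self.dataBase = SaveDataBase(self.dbfile)
      # thread pool with the specified number of threads
      self.threadPool = ThreadPool(self.threadNum)
      # seed the URL queue with the start URL
      self.urlQueue.put(url)
      # keyword, decoded with the console's default encoding
      self.key = key.decode(getdefaultlocale()[1])
      # maximum crawl depth
      self.depth = depth
      # current crawl depth
      self.currentDepth = 1
      # current running state of the program
      self.state = False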
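
The post stops at the constructor, so how urlQueue, readUrls, links, depth and currentDepth work together is not shown. One possible reading is a breadth-first crawl that processes one depth level at a time. The sketch below is written in Python 3 and is only an assumption about the intended loop: the function crawl, its get_links parameter and the sample link graph are all hypothetical, and no network access, thread pool or database is involved.

```python
from collections import deque


def crawl(start_url, max_depth, get_links):
    """Breadth-first crawl limited to max_depth levels.

    get_links is a hypothetical callable: url -> iterable of linked URLs.
    Returns the visited URLs in the order they were read.
    """
    url_queue = deque([start_url])   # plays the role of urlQueue
    read_urls = []                   # plays the role of readUrls
    seen = {start_url}               # set for fast duplicate checks
    current_depth = 1                # the constructor also starts at 1

    while url_queue and current_depth <= max_depth:
        links = []                   # new links found at this depth (links)
        while url_queue:
            url = url_queue.popleft()
            read_urls.append(url)
            for link in get_links(url):
                if link not in seen:
                    seen.add(link)
                    links.append(link)
        url_queue.extend(links)      # these become the next depth level
        current_depth += 1
    return read_urls


if __name__ == "__main__":
    # Made-up link graph standing in for real pages.
    graph = {"a": ["b", "c"], "b": ["d"], "c": ["d", "a"], "d": ["e"]}
    print(crawl("a", 2, lambda u: graph.get(u, [])))
```

With a depth of 2 only the start page and the pages it links to directly are read. Tracking seen URLs in a set rather than a list keeps the duplicate check cheap as the crawl grows.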


Original article: https://www.cnblogs.com/xiaoCon/p/3594474.html