
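A BS4 Crawler in Practice: CISP

Crawl the CISP certificate numbers and validity periods that can currently be looked up on the official website, and store them in a database.

This is effectively brute-force enumeration; Burp's grep feature can do the same job.

The Python code is below: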

# coding=utf-8
import sys

import requests
from bs4 import BeautifulSoup

# Example URL:
# http://www.itsec.gov.cn/export/sites/itsec/person/peregester/CNITSEC2012CISE01098/
counter = 1
for i in range(2000, 2017):
    for t in ['CISE', 'CISA', 'CISO', 'CISM', 'CISE-E', 'CISO-E', 'CISM-E', 'CISA-E', 'CISP-Auditor']:
        for j in range(10000):
            # Serial number: "CNITSEC" + year + certificate type + "0" + 4-digit sequence
            SNum = "CNITSEC" + str(i) + t + "0" + str(j).zfill(4)
            url = "http://www.itsec.gov.cn/export/sites/itsec/person/peregester/%s/" % SNum
            print(counter, SNum, '  Checking .........')
            try:
                res = requests.get(url)
                res.encoding = 'utf-8'
                soup = BeautifulSoup(res.text, 'html.parser')
                # The Content-Length header may be absent, so avoid a KeyError here
                clength = res.headers.get('content-length', '')

                if 200 <= res.status_code <= 210:
                    tdm = soup.select('.tdm')
                    itsecid = soup.select('.detail_title')[0].text.strip()
                    # Strip the line breaks and padding that surround the two dates
                    starttime = tdm[0].text.strip().replace("\n", "").replace("                ", "")
                    endtime = tdm[1].text.strip().replace("\n", "").replace("                ", "")
                    username = tdm[2].text.strip()
                    authlevel = tdm[3].text.strip()
                    print(clength)
                    print(itsecid)
                    print(starttime)
                    print(endtime)
                    print(username)
                    print(authlevel)
                    # Append one record per line
                    with open('cispall.txt', 'a', encoding='utf-8') as f:
                        f.write("%s%s%s%s%s  %s\n" % (itsecid, starttime, endtime, username, authlevel, clength))
                else:
                    print(SNum, 'Non-existent ########')
                counter += 1
            except Exception:
                info = sys.exc_info()
                print('except error')
                print(info[0], ":", info[1])
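The three nested loops above cover 17 years × 9 certificate types × 10,000 sequence numbers, i.e. 1,530,000 candidate serial numbers. As a minimal sketch, the enumeration could be pulled out into a generator so it can be inspected or narrowed without sending any requests. The names `CERT_TYPES`, `candidate_serials`, and `serial_url` are illustrative and do not appear in the original script; the serial number format and base URL are taken from it.

```python
from itertools import product

# Certificate types and URL pattern copied from the script above.
CERT_TYPES = ['CISE', 'CISA', 'CISO', 'CISM', 'CISE-E',
              'CISO-E', 'CISM-E', 'CISA-E', 'CISP-Auditor']
BASE_URL = "http://www.itsec.gov.cn/export/sites/itsec/person/peregester/%s/"


def candidate_serials(years=range(2000, 2017), types=CERT_TYPES, max_seq=10000):
    """Yield every candidate serial number in the same order as the nested loops."""
    for year, cert_type, seq in product(years, types, range(max_seq)):
        # "CNITSEC" + year + type + "0" + zero-padded 4-digit sequence
        yield "CNITSEC%d%s0%04d" % (year, cert_type, seq)


def serial_url(serial):
    """Build the lookup URL for one serial number."""
    return BASE_URL % serial


if __name__ == "__main__":
    # Restrict the search space to one year and one type for a quick look.
    for serial in candidate_serials(range(2012, 2013), ['CISE'], max_seq=3):
        print(serial, serial_url(serial))
```

Narrowing `years` or `types` this way also makes it easy to split a long run into smaller batches.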

Process:

The records can then be split on their field pattern and loaded into a database.
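One way to do the loading step is sketched below with the standard-library `sqlite3` module. This is only an illustration under some assumptions: it presumes each line holds the six fields separated by tabs (the script above would need to write such a delimiter), and the table name `cisp`, the column names, and the helpers `parse_line` and `load_records` are hypothetical. The sample record is made up.

```python
import sqlite3

# Column order mirrors the fields collected by the crawler.
FIELDS = ("itsecid", "starttime", "endtime", "username", "authlevel", "clength")


def parse_line(line, sep="\t"):
    """Split one line into six fields; return None if the line is malformed."""
    parts = [p.strip() for p in line.rstrip("\n").split(sep)]
    if len(parts) != len(FIELDS):
        return None
    return tuple(parts)


def load_records(conn, lines):
    """Create the table if needed, insert all well-formed lines, return the count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cisp ("
        "itsecid TEXT PRIMARY KEY, starttime TEXT, endtime TEXT, "
        "username TEXT, authlevel TEXT, clength TEXT)"
    )
    rows = [r for r in map(parse_line, lines) if r is not None]
    # INSERT OR REPLACE keeps re-runs from failing on duplicate certificate numbers.
    conn.executemany("INSERT OR REPLACE INTO cisp VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Made-up sample lines, for illustration only.
    sample = [
        "CNITSEC2012CISE01098\t2012-01-01\t2015-01-01\tExample Name\tCISE\t1234\n",
        "malformed line\n",
    ]
    print(load_records(conn, sample))
    print(conn.execute("SELECT itsecid, endtime FROM cisp").fetchall())
```

To load the real output, the lines could be read with `open('cispall.txt', encoding='utf-8')` and passed to `load_records` in place of `sample`.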


Original article: https://www.cnblogs.com/shellr00t/p/Crawler.html