Simulating a search-engine crawler to bypass some WAFs
# Simulate a search-engine spider (or a real user) via the User-Agent header
import requests
import time

requests.packages.urllib3.disable_warnings()  # silence the warnings caused by verify=False

headers = {
    'Connection': 'keep-alive',
    'Cache-Control': 'max-age=0',
    'Upgrade-Insecure-Requests': '1',
    # Simulate a real user:   Kit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36
    # Simulate a search spider: Mozilla/5.0 (compatible; Baiduspider-render/2.0; +http://www.baidu.com/search/spider.html)
    # More spider UAs: https://www.cnblogs.com/iack/p/3557371.html
    'User-Agent': 'Mozilla/5.0 (compatible; Baiduspider-render/2.0; +http://www.baidu.com/search/spider.html)',
    'Sec-Fetch-Dest': 'document',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Sec-Fetch-Site': 'none',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-User': '?1',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7',
    'Cookie': 'xxx',  # replace with the cookie of the current session
}

for paths in open('php_b.txt', encoding='utf-8'):
    url = 'http://192.168.0.103:8081/'
    paths = paths.strip()  # strip the trailing newline (the original replace(' ', '') missed it)
    urls = url + paths
    # Define a proxy here if you want to test through one or rotate a proxy pool
    proxy = {
        'http': '127.0.0.1:7777'
    }
    try:
        code = requests.get(urls, headers=headers, verify=False).status_code
        print(urls + '|' + str(code))      # log every probe
        if code == 200 or code == 403:
            print(urls + '|' + str(code))  # repeat likely hits (existing or forbidden paths)
    except Exception as err:
        print('connecting error')
    # time.sleep(3)  # needed when simulating a real user; optional for the spider UA (depends on request rate)
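The script above defines proxy but never passes it to requests.get, and the delay only appears in a comment. Below is a minimal sketch of how a per-request proxy choice and the 3-second delay (the value reported to work against the Aliyun WAF in the test notes further down) could be wired in. The proxy addresses and the fetch helper are placeholders for illustration, not tested endpoints.

import random
import time
import requests

# Hypothetical proxy pool; the addresses are placeholders taken from the script above.
proxy_pool = [
    {'http': 'http://127.0.0.1:7777'},
    # {'http': 'http://127.0.0.1:7778'},  # add more entries to rotate through
]

def fetch(url, headers, delay=3):
    # Pick a proxy per request and actually pass it via proxies=
    proxy = random.choice(proxy_pool)
    resp = requests.get(url, headers=headers, proxies=proxy, verify=False, timeout=10)
    time.sleep(delay)  # 3 s worked against the Aliyun WAF in testing; 2 s did not
    return resp.status_code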
Tested so far:
Safedog (安全狗): the spider User-Agent alone is enough to bypass it.
Aliyun WAF (阿里云): the spider User-Agent alone does not work; add a delay or a proxy pool. A 3-second delay between requests works; even 2 seconds is not enough.
BT panel (宝塔): blacklists common scanning tools (AWVS, Nmap, and similar); with the spider User-Agent, a delay or a proxy pool can bypass it. Set the delay to roughly 2 seconds.
Its rate limit bans an IP for 600 seconds after 6 malicious requests within 60 seconds. Bypass: pace the wordlist to 5 requests per 60 seconds, or add interference such as .bak to the requests; the idea is much the same as the usual file-upload filter bypasses. A pacing sketch follows below.
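A minimal pacing sketch for that rate limit, assuming the "6 malicious requests per 60 seconds" threshold from the notes above: it sends at most 5 wordlist requests per 60-second window and sleeps out the remainder. The target URL, wordlist name, and spider User-Agent are the same assumptions as in the script at the top; the threshold constants are taken from the observed ban rule, not verified defaults.

import time
import requests

requests.packages.urllib3.disable_warnings()

headers = {
    # Reuse the spider UA from the script above
    'User-Agent': 'Mozilla/5.0 (compatible; Baiduspider-render/2.0; +http://www.baidu.com/search/spider.html)',
}

MAX_PER_WINDOW = 5   # one below the observed ban threshold of 6
WINDOW_SECONDS = 60

sent_in_window = 0
window_start = time.time()

for line in open('php_b.txt', encoding='utf-8'):
    path = line.strip()
    if not path:
        continue
    elapsed = time.time() - window_start
    if elapsed >= WINDOW_SECONDS:
        # The window has rolled over; reset the counter
        window_start = time.time()
        sent_in_window = 0
    elif sent_in_window >= MAX_PER_WINDOW:
        # Budget for this window is used up; wait until it rolls over
        time.sleep(WINDOW_SECONDS - elapsed)
        window_start = time.time()
        sent_in_window = 0
    url = 'http://192.168.0.103:8081/' + path
    try:
        code = requests.get(url, headers=headers, verify=False, timeout=10).status_code
        print(url + '|' + str(code))
    except Exception:
        print('connecting error')
    sent_in_window += 1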