  • Scraping IP addresses with requests + regular expressions

# Scrape proxy IP addresses with requests + regular expressions
# re.findall: if the pattern contains capturing groups, only the groups are returned;
# with several groups, each match comes back as a tuple of them. The pattern below
# uses only non-capturing groups (?:...), so findall returns the whole matched text.
import re

import requests


def get_ip(url):
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.75 Safari/537.36 LBBROWSER'}
    response = requests.get(url, headers=headers, timeout=10)
    # An IPv4 address (four octets in 0-255 separated by dots), followed by the rest
    # of its table cell, the line break, and the next cell up to the port digits
    pattern = re.compile(r'(?:(?:[0-1]{0,1}\d{0,1}\d|2[0-4]\d|25[0-5])\.){3}(?:[0-1]{0,1}\d{0,1}\d|2[0-4]\d|25[0-5]).*\s*.*(?:\d+)')
    result = pattern.findall(response.text)
    return result


def make_iplist(iplist, result):
    # Replace the markup between the IP cell and the port cell with ':' to get "ip:port"
    for ip in result:
        ip = re.sub(r'</td>\s*.*<td>', ':', ip)
        iplist.append(ip)
    return iplist


def main(num):
    iplist = []
    # Pages are numbered from 1, so request pages 1..num inclusive
    for i in range(1, num + 1):
        url = 'http://www.xicidaili.com/nt/' + str(i)
        result = get_ip(url)
        iplist = make_iplist(iplist, result)

    for ip in iplist:
        print(ip)


if __name__ == '__main__':
    num = int(input('Enter the number of pages to scrape: '))
    main(num)
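The regular expressions the script relies on can be checked offline, without requesting the proxy site. The sketch below runs the same IP-and-port pattern and the same `</td>...<td>` substitution on a small hand-written HTML fragment. The fragment (`SAMPLE_HTML`) and the helper names `extract` and `GROUPED` are hypothetical: the fragment only assumes the table shape the script expects (an IP cell, then the port cell on the next line), not the real page markup. The last pattern also illustrates the `findall` rule from the script's opening comment: once capturing groups are present, each match is returned as a tuple of the groups.

```python
import re

# Same octet pattern as the script: 0-199 (optional leading 0/1), 200-249, 250-255
OCTET = r'(?:[0-1]{0,1}\d{0,1}\d|2[0-4]\d|25[0-5])'
IP_PORT = re.compile(r'(?:' + OCTET + r'\.){3}' + OCTET + r'.*\s*.*(?:\d+)')

# Hypothetical table rows, shaped like the proxy list the script scrapes
SAMPLE_HTML = """
<tr>
  <td>110.73.1.47</td>
  <td>8123</td>
</tr>
<tr>
  <td>222.85.39.255</td>
  <td>808</td>
</tr>
"""


def extract(html):
    """Return 'ip:port' strings using the script's findall + sub steps."""
    matches = IP_PORT.findall(html)
    return [re.sub(r'</td>\s*.*<td>', ':', m) for m in matches]


# Two capturing groups (IP, port): findall returns (ip, port) tuples, not whole matches
GROUPED = re.compile(r'<td>(' + OCTET + r'(?:\.' + OCTET + r'){3})</td>\s*<td>(\d+)</td>')

if __name__ == '__main__':
    print(extract(SAMPLE_HTML))          # ['110.73.1.47:8123', '222.85.39.255:808']
    print(GROUPED.findall(SAMPLE_HTML))  # [('110.73.1.47', '8123'), ('222.85.39.255', '808')]
```

The grouped version avoids the second `re.sub` pass entirely, at the cost of tying the pattern more tightly to the exact `<td>` layout; the script's looser `.*\s*.*` approach tolerates extra attributes or text in the cells.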
  • Original post: https://www.cnblogs.com/themost/p/6847730.html