  • [Original] Python crawler in practice: scraping proxy IPs

      You only get to know the joy of programming at the moment the code finally runs QAQ

      Target site: https://www.kuaidaili.com/free/inha/  (please contact me if this infringes on anything)

      All the proxies listed on that page are HTTP, so I did not write a check for the protocol type.
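If the list ever mixed protocols, the skipped check might look roughly like the sketch below. It assumes each table row carries the protocol in a cell marked `data-title="类型"` alongside `"IP"` and `"PORT"` cells. That attribute name, the sample rows, and the helper name `http_proxies` are assumptions made for illustration and are not taken from the site or the original script.

```python
import re

# Assumed row layout: IP, PORT and protocol type sit in sibling <td> cells.
# re.S lets ".*?" skip over line breaks between cells.
ROW = re.compile(
    r'"IP">([0-9]+(?:\.[0-9]+){3})<.*?'
    r'"PORT">(\d{1,5})<.*?'
    r'"类型">(\w+)<',
    re.S,
)

def http_proxies(html):
    """Return (ip, port) pairs whose protocol cell says HTTP (hypothetical helper)."""
    return [(ip, port) for ip, port, kind in ROW.findall(html)
            if kind.upper() == 'HTTP']

# Hand-written sample rows, not real data from the site.
sample = '''
<tr><td data-title="IP">1.2.3.4</td><td data-title="PORT">8080</td>
<td data-title="匿名度">高匿名</td><td data-title="类型">HTTP</td></tr>
<tr><td data-title="IP">5.6.7.8</td><td data-title="PORT">3128</td>
<td data-title="匿名度">高匿名</td><td data-title="类型">HTTPS</td></tr>
'''
print(http_proxies(sample))
```

Matching the three cells in one pattern keeps each IP tied to its own port and type, instead of relying on two separate lists lining up by position.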

      The code is as follows:

      #!/usr/bin/env python
      # -*- coding: utf-8 -*-
      import urllib.request
      import re
      import time

      # Browser-like User-Agent sent with every request
      headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}

      def web(url):
          # Open the Request object (not the bare URL) so the headers are actually sent
          req = urllib.request.Request(url=url, headers=headers)
          response = urllib.request.urlopen(req)
          html = response.read().decode('UTF-8', 'ignore')
          # Dotted-quad IPv4 address; the dots must be escaped
          ip = r'[0-9]+(?:\.[0-9]+){3}'
          # Port number (1 to 5 digits) in the cell marked "PORT"
          port = r'"PORT">(\d{1,5})<'
          out = re.findall(ip, html)
          out1 = re.findall(port, html)
          # Pair each IP with its port; zip stops at the shorter list,
          # so a page with fewer than 15 rows no longer raises IndexError
          for pair in zip(out, out1):
              store(pair)
          print(out, '\n', out1)

      def store(pair):
          # Append one "ip:<ip><TAB>port:<port>" line per proxy
          with open('ip.txt', 'a') as f:
              f.write('ip:' + pair[0] + '\tport:' + pair[1] + '\n')
          print('store successfully')

      # Walk through the numbered list pages, pausing between requests
      n = 1
      while n <= 3313:
          url = "https://www.kuaidaili.com/free/inha/" + str(n) + '/'
          web(url)
          time.sleep(5)
          n += 1
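The main loop walks through thousands of pages, so a single network error would stop the whole run. A minimal sketch of a more forgiving fetch step is shown below. The helper name `fetch`, its `retries`, `delay` and `opener` parameters, and the 10-second timeout are all assumptions for illustration and are not part of the original script.

```python
import time
import urllib.error
import urllib.request

def fetch(url, headers, retries=3, delay=5, opener=urllib.request.urlopen):
    """Return the decoded page, or None if every attempt fails (hypothetical helper)."""
    for attempt in range(1, retries + 1):
        try:
            req = urllib.request.Request(url=url, headers=headers)
            with opener(req, timeout=10) as response:
                return response.read().decode('UTF-8', 'ignore')
        except OSError as err:
            # URLError, HTTPError and socket timeouts all derive from OSError
            print('attempt', attempt, 'failed:', err)
            if attempt < retries:
                time.sleep(delay)  # wait before trying the same page again
    return None
```

A caller could then skip a page when `fetch` returns `None` instead of letting the exception end the crawl. The `opener` parameter exists only so the function can be exercised with a stub instead of a real network call.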
  • Original post: https://www.cnblogs.com/vhhi/p/12380560.html