  • Scraping data with Python requests

    import json
    import time

    import pymysql  # json, time and pymysql are imported but not used in the code shown here
    import requests
    from lxml import etree

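    # Headers for the POST request to the paginated list page.
    # Content-Length is not set by hand: requests computes it from the
    # encoded form body, so a fixed value could only be wrong.
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36',
        'Content-Type': 'application/x-www-form-urlencoded',
        'Pragma': 'no-cache',
        'Upgrade-Insecure-Requests': '1',
        'Host': 'www.bjda.gov.cn'
    }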
    
    # Headers for the GET requests to each detail page (xiangqing = 详情, "details")
    headers_xiangqing = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36',
        'Pragma': 'no-cache',
        'Upgrade-Insecure-Requests': '1',
        'Host': 'www.bjda.gov.cn'
    }
    
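    # Form fields for the result list: 20 rows per page
    dd = {
        'pageSize': '20'
    }

    temp = []  # collected links to detail pages

    # Request page 10 of the result list
    dd['currentPage'] = '10'
    print(dd)
    response = requests.post('http://www.bjda.gov.cn/eportal/ui?pageId=348736', headers=headers, data=dd, timeout=10)
    selector = etree.HTML(response.text)
    # Each result row (<tr class="chaxun_con">) links to a detail page; set() drops duplicate links
    item_spider = list(set(selector.xpath('//tr[@class="chaxun_con"]//a/@href')))
    temp.extend(item_spider)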
    
    # Visit each detail page. i[1:] drops the first character of the href
    # (presumably '?') before its query string is appended to the URL.
    for i in temp:
        url = 'http://www.bjda.gov.cn/eportal/ui?pageId=348738&' + i[1:]
        print(url)
        response = requests.get(url, headers=headers_xiangqing, timeout=10)
        print(response.status_code)
        selector = etree.HTML(response.text)
        # Rows of the detail table
        tr = selector.xpath('//table[@class="table_sjcx"]//tr')
        print(tr)
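The script requests a single page of the result list (`currentPage` is fixed at `'10'`), and `list(set(...))` returns the links in arbitrary order. A minimal sketch of walking several pages is below. It assumes every page accepts the same form fields and that page numbers start at 1; `collect_links` and `fetch_hrefs_from_site` are hypothetical helper names, and the one-second default pause between requests (a use for the otherwise idle `time` import) is an arbitrary choice.

```python
import time
from typing import Callable, Iterable, List

LIST_URL = "http://www.bjda.gov.cn/eportal/ui?pageId=348736"


def collect_links(fetch_hrefs: Callable[[int], List[str]],
                  pages: Iterable[int],
                  delay: float = 1.0) -> List[str]:
    """Gather detail-page hrefs from several list pages.

    Duplicates are dropped but first-seen order is kept, unlike
    list(set(...)), whose order is arbitrary.
    """
    seen = set()
    links: List[str] = []
    for n, page in enumerate(pages):
        if n and delay:
            time.sleep(delay)  # pause between requests (not before the first one)
        for href in fetch_hrefs(page):
            if href not in seen:
                seen.add(href)
                links.append(href)
    return links


def fetch_hrefs_from_site(page: int) -> List[str]:
    """Fetch one list page the way the script above does (needs network access)."""
    # Third-party imports live here so collect_links can be tested without them
    import requests
    from lxml import etree

    data = {"pageSize": "20", "currentPage": str(page)}
    # Short User-Agent for brevity; the script's full `headers` dict could be passed instead
    resp = requests.post(LIST_URL, data=data,
                         headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    resp.raise_for_status()
    return etree.HTML(resp.text).xpath('//tr[@class="chaxun_con"]//a/@href')
```

Passing the fetcher in as a function keeps the paging and de-duplication logic testable offline. Against the live site the call might look like `collect_links(fetch_hrefs_from_site, range(1, 11))`, where the page range is only a guess.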
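The detail loop ends by printing the `<tr>` elements themselves, which shows element objects rather than the text in the table. One way to turn the `table_sjcx` table into lists of cell strings is sketched below. It swaps lxml for the standard library's `html.parser` so the example runs without extra packages; `TableRowParser` and `extract_rows` are illustrative names, and the parser assumes the table contains no nested tables and closes its `<td>`/`<th>` tags explicitly.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TableRowParser(HTMLParser):
    """Collect cell texts, row by row, from tables with a given class.

    Assumes the target table has no nested tables.
    """

    def __init__(self, table_class: str) -> None:
        super().__init__()
        self.table_class = table_class
        self.rows: List[List[str]] = []
        self._in_table = False
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        # Exact class match, like the XPath test @class="table_sjcx"
        if tag == "table" and dict(attrs).get("class") == self.table_class:
            self._in_table = True
        elif not self._in_table:
            return
        elif tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if not self._in_table:
            return
        if tag in ("td", "th") and self._row is not None and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag == "table":
            self._in_table = False

    def handle_data(self, data):
        # Text inside a cell, including text of inline tags such as <b> or <a>
        if self._cell is not None:
            self._cell.append(data)


def extract_rows(html: str, table_class: str = "table_sjcx") -> List[List[str]]:
    """Return the stripped cell texts of every <tr> in the matching table."""
    parser = TableRowParser(table_class)
    parser.feed(html)
    parser.close()
    return parser.rows
```

Inside the original loop, `extract_rows(response.text)` could stand in for the XPath call. Staying with lxml, a similar result comes from `[''.join(td.itertext()).strip() for td in row.xpath('./td|./th')]` for each `row` in `tr`.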
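The script imports `json` without using it in the code shown. If the extracted rows are to be saved as JSON, note that `json.dumps` escapes non-ASCII characters by default, so Chinese text from this site would come out as `\uXXXX` sequences. A small sketch that produces JSON Lines with `ensure_ascii=False` follows; `rows_to_jsonl` is a hypothetical helper, and JSON Lines is just one possible output format.

```python
import json
from typing import Iterable, List


def rows_to_jsonl(rows: Iterable[List[str]]) -> str:
    """Serialize each row as one JSON array per line (JSON Lines)."""
    # ensure_ascii=False keeps characters such as Chinese text as-is
    # instead of turning them into \uXXXX escapes
    return "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in rows)
```

When writing the result to disk, open the file with `encoding="utf-8"` so the unescaped characters are stored correctly regardless of the platform's default encoding.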
  • Original post: https://www.cnblogs.com/ruiy/p/8872962.html