  • Scraping data with Python requests

    import requests
    from lxml import etree
    import time
    import pymysql
    import json
    # Headers for the list-page POST request.
    # Content-Length is left out: requests computes it from the form body.
    headers={
        'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36',
        'Content-Type':'application/x-www-form-urlencoded',
        'Pragma':'no-cache',
        'Upgrade-Insecure-Requests':'1',
        'Host':'www.bjda.gov.cn'
    }
    
    # Headers for the detail-page GET requests ("xiangqing" is pinyin for "details").
    headers_xiangqing={
        'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36',
        'Pragma':'no-cache',
        'Upgrade-Insecure-Requests':'1',
        'Host':'www.bjda.gov.cn'
    }
    
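    # Form fields sent with the list-page request.
    dd={
        'pageSize':'20'
    }
    
    temp=[]
    
    # Request page 10 of the listing (only this one page is fetched here).
    dd['currentPage'] = '10'
    print(dd)
    response = requests.post('http://www.bjda.gov.cn/eportal/ui?pageId=348736', headers=headers, data=dd)
    selector = etree.HTML(response.text)
    # Collect the detail links from the result rows; set() removes duplicates.
    item_spider = list(set(selector.xpath('//tr[@class="chaxun_con"]//a/@href')))
    temp.extend(item_spider)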
    
    # Fetch each detail page and print the rows of its data table.
    for i in temp:
        # i[1:] drops the first character of the href before it is appended.
        url='http://www.bjda.gov.cn/eportal/ui?pageId=348738&'+i[1:]
        print(url)
        response=requests.get(url,headers=headers_xiangqing)
        print(response.status_code)
        selector=etree.HTML(response.text)
        tr=selector.xpath('//table[@class="table_sjcx"]//tr')
        print(tr)
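
The link handling in the script above can also be pulled into two small helpers. This is only a sketch using the standard library: the names `unique_in_order`, `build_detail_url` and `DETAIL_PREFIX` are hypothetical, and it assumes each scraped href starts with `?` (which is what the `i[1:]` slice in the script suggests). The sample hrefs in the usage note are made up for illustration. Unlike `list(set(...))`, `dict.fromkeys` keeps the links in the order they appeared on the page.

```python
# Prefix of the detail page, copied from the script above.
DETAIL_PREFIX = 'http://www.bjda.gov.cn/eportal/ui?pageId=348738&'


def unique_in_order(hrefs):
    """Remove duplicate hrefs while keeping their first-seen order."""
    # dict keys are unique and preserve insertion order (Python 3.7+).
    return list(dict.fromkeys(hrefs))


def build_detail_url(href):
    """Build a detail-page URL from a relative href.

    Assumption: hrefs look like '?key=value'. The leading '?' is dropped
    only when present, so an href without it is not truncated.
    """
    query = href[1:] if href.startswith('?') else href
    return DETAIL_PREFIX + query
```

Possible usage, with invented hrefs: `[build_detail_url(h) for h in unique_in_order(['?id=2', '?id=1', '?id=2'])]` yields one URL per distinct link, in page order.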
  • Original article: https://www.cnblogs.com/ruiy/p/8872962.html