  • Python Web Scraper (Part 5)

    Source code:

    import requests
    from lxml import etree
    from my_mysql import MysqlConnect  # the author's own MySQL helper module (not shown here)


    # Connect to the local `homework` database and prepare a parameterized INSERT.
    mc = MysqlConnect('127.0.0.1', 'root', '123456', 'homework')
    sql = 'insert into lianjia(title,addr,shape,area,dire,price) values(%s,%s,%s,%s,%s,%s)'
    # Scrape three listing pages; numbering starts at 1 on the assumption that pg0 is not a distinct page.
    for page in range(1, 4):
        url = 'https://bj.lianjia.com/zufang/pg{}rp2rp1/'.format(page)
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # stop on an HTTP error instead of parsing an error page
        html = etree.HTML(response.text)
        # Each <li> under the house list holds one rental listing.
        li_list = html.xpath('//ul[@id="house-lst"]/li')
        # print(li_list)
        for li_ele in li_list:
            # Note: [0] raises IndexError if a listing lacks the expected node.
            title = li_ele.xpath('./div[2]/h2/a')[0].text
            addr = li_ele.xpath('./div[2]/div[1]/div[1]/a/span')[0].text
            shape = li_ele.xpath('./div[2]/div[1]/div[1]/span[1]/span')[0].text
            area = li_ele.xpath('./div[2]/div[1]/div[1]/span[2]')[0].text
            dire = li_ele.xpath('./div[2]/div[1]/div[1]/span[3]')[0].text
            price = li_ele.xpath('./div[2]/div[2]/div[1]/span')[0].text
            # print(title,addr,shape,area,price)
            data = (title, addr, shape, area, dire, price)
            print(data)
            mc.exec_data(sql, data)  # insert one row per listing
            # break
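
    Every field in the loop above is read with `xpath(...)[0].text`, which fails with an IndexError as soon as one listing is missing a node. Below is a minimal sketch of one way to guard against that. The helper name `first_text` and the sample markup are hypothetical, not part of the original post. To stay runnable without third-party packages, the demo parses the sample with the standard library's `xml.etree.ElementTree` as a stand-in for lxml; the helper only needs a list of elements with a `.text` attribute, which is also what lxml's `xpath()` returns for element paths.

```python
import xml.etree.ElementTree as ET


def first_text(nodes, default=None):
    """Return the stripped text of the first node, or `default` if there is none.

    `nodes` is any list of elements exposing a `.text` attribute, such as the
    list returned by lxml's `xpath()` or ElementTree's `findall()`.
    """
    if not nodes:
        return default
    text = nodes[0].text
    return text.strip() if text and text.strip() else default


# Hypothetical, simplified listing markup (not copied from the real page).
SAMPLE = """
<ul id="house-lst">
  <li>
    <div>image</div>
    <div>
      <h2><a> Sample flat near the park </a></h2>
      <div><div><a><span>Sample Garden</span></a></div></div>
    </div>
  </li>
  <li>
    <div>image</div>
    <div><h2></h2></div>
  </li>
</ul>
"""

root = ET.fromstring(SAMPLE)
rows = []
for li_ele in root.findall('./li'):
    # Same relative paths as in the scraper; a missing node yields '' instead of raising.
    title = first_text(li_ele.findall('./div[2]/h2/a'), default='')
    addr = first_text(li_ele.findall('./div[2]/div[1]/div[1]/a/span'), default='')
    rows.append((title, addr))

print(rows)
```

    In the scraper itself, the same helper could wrap each call, for example `first_text(li_ele.xpath('./div[2]/h2/a'), default='')`, so that the second, incomplete listing in the sample produces empty strings rather than stopping the run.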
  • Original article: https://www.cnblogs.com/zhxd-python/p/9501310.html