  • Python Web Scraper (Part 5)

    Source code:

    import requests
    from lxml import etree
    from my_mysql import MysqlConnect  # the author's own helper module (not shown here)


    # Connect to the local MySQL database: host, user, password, database name.
    mc = MysqlConnect('127.0.0.1', 'root', '123456', 'homework')
    sql = 'insert into lianjia(title,addr,shape,area,dire,price) values(%s,%s,%s,%s,%s,%s)'
    # Page numbers in the URL appear to start at 1, so request pg1..pg3 rather than pg0..pg2.
    for page in range(1, 4):
        url = 'https://bj.lianjia.com/zufang/pg{}rp2rp1/'.format(page)
        # A timeout keeps the script from hanging if the site does not respond.
        response = requests.get(url, timeout=10)
        html = etree.HTML(response.text)
        # Each <li> under the "house-lst" list is one rental listing.
        li_list = html.xpath('//ul[@id="house-lst"]/li')
        for li_ele in li_list:
            # Fields: title, community name, layout, floor area, orientation, price.
            title = li_ele.xpath('./div[2]/h2/a')[0].text
            addr = li_ele.xpath('./div[2]/div[1]/div[1]/a/span')[0].text
            shape = li_ele.xpath('./div[2]/div[1]/div[1]/span[1]/span')[0].text
            area = li_ele.xpath('./div[2]/div[1]/div[1]/span[2]')[0].text
            dire = li_ele.xpath('./div[2]/div[1]/div[1]/span[3]')[0].text
            price = li_ele.xpath('./div[2]/div[2]/div[1]/span')[0].text
            data = (title, addr, shape, area, dire, price)
            print(data)
            # Insert one row; the %s placeholders are filled in from the data tuple.
            mc.exec_data(sql, data)
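
    Every field in the loop above is read with `xpath(...)[0].text`, which raises an IndexError as soon as one listing is missing a node. A small helper can make that step more tolerant. This is only a sketch: `first_text` is a hypothetical name that does not appear in the original post, and the demo uses the standard library's `xml.etree.ElementTree` as a stand-in for lxml so that it runs without extra packages. It assumes only that the nodes expose a `.text` attribute, as lxml elements do.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Sequence


def first_text(nodes: Sequence, default: Optional[str] = None) -> Optional[str]:
    """Return the stripped .text of the first node, or `default` if unavailable.

    `nodes` is the list returned by an XPath query. This helper is hypothetical,
    not part of the original script.
    """
    if not nodes:
        return default          # the query matched nothing
    text = getattr(nodes[0], "text", None)
    if text is None:
        return default          # the element exists but has no text
    text = text.strip()
    return text or default      # treat whitespace-only text as missing


if __name__ == "__main__":
    # Stand-in markup; the real page structure may differ.
    li = ET.fromstring("<li><h2><a>  Sample listing  </a></h2></li>")
    print(first_text(li.findall("./h2/a")))           # title is present
    print(first_text(li.findall("./div/span"), ""))   # price node is missing
```

    In the scraper itself the call would look like `title = first_text(li_ele.xpath('./div[2]/h2/a'))`, so a listing with a missing field is stored with a None (NULL) value instead of stopping the whole run.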
  • Original article: https://www.cnblogs.com/zhxd-python/p/9501310.html