  • Scraping 5i5j (我爱我家) rental listings with BeautifulSoup

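    I had never been very comfortable with BeautifulSoup, and some friends and colleagues happened to be looking for apartments, so I wondered whether I could write a crawler to collect the listing data myself. This script is the result. I mostly wrote it while following along in a book, so there is not much to explain; here is the code.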

    # coding=utf-8
    import time

    import pymysql
    import requests
    from bs4 import BeautifulSoup

    BASE_URL = "https://sh.5i5j.com"
    # Parameterized insert: the driver escapes the values, so a quote in the
    # scraped text cannot break the statement.
    INSERT_SQL = ("insert into zufang(area,xiaoqu,community,hot,price,lxrphone) "
                  "VALUES (%s,%s,%s,%s,%s,%s)")

    db = pymysql.connect(host="127.0.0.1", port=3306, user="root", passwd="root",
                         db="woaiwojia", charset="utf8")
    cursor = db.cursor()
    for num in range(1, 81):  # list pages 1 to 80
        url = BASE_URL + "/zufang/o8r1u1n" + str(num) + "/"
        time.sleep(10)  # wait between list pages
        strhtml = requests.get(url)
        fanlist = BeautifulSoup(strhtml.text, "lxml")
        for ul in fanlist.find_all("ul", {"class": "pList"}):
            for li in ul.find_all(name="li"):
                for div in li.find_all("div", {"class": "listCon"}):
                    xiaoqu = div.h3.a.string
                    # The detail page carries the agent's name and phone number.
                    detailUrl = BASE_URL + div.h3.a.attrs['href']
                    detailhtml = requests.get(detailUrl)
                    detail = BeautifulSoup(detailhtml.text, "lxml")
                    # If several agents are listed, the last one is kept;
                    # if none is found, the field stays empty.
                    lxrphone = ""
                    for uldiv in detail.find_all("div", {"id": "housebroker"}):
                        for broker in uldiv.find_all("ul"):
                            if broker.h3 is not None and broker.label is not None:
                                lxrphone = broker.h3.get_text() + broker.label.get_text()
                    for div1 in div.find_all("div", {"class": "listX"}):
                        ps = div1.find_all("p")
                        area = ps[0].text
                        community = ps[1].text
                        hot = ps[2].text
                        price = div1.find_all("div", {"class": "jia"})[0].p.strong.string
                        try:
                            cursor.execute(INSERT_SQL, (area, xiaoqu, community, hot, price, lxrphone))
                            db.commit()
                        except pymysql.MySQLError as e:
                            db.rollback()
                            print('Insert failed:', e)
    cursor.close()
    db.close()
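
To try the insert step without a MySQL server, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. That swap is my assumption, not what the script above uses. The column names follow the zufang table from the script; the TEXT column types, the helper names `create_table` and `insert_listing`, and the sample row are all made up for illustration. The point it shows is passing scraped values as query parameters, so that a quote inside a listing title does not break the statement.

```python
import sqlite3

# Column names follow the zufang table used in the script above.
COLUMNS = ("area", "xiaoqu", "community", "hot", "price", "lxrphone")


def create_table(conn):
    """Create the table; the TEXT column types are an assumption."""
    column_defs = ", ".join(c + " TEXT" for c in COLUMNS)
    conn.execute("CREATE TABLE IF NOT EXISTS zufang (" + column_defs + ")")


def insert_listing(conn, row):
    """Insert one listing; `row` is a dict keyed by the names in COLUMNS."""
    placeholders = ", ".join("?" for _ in COLUMNS)
    sql = "INSERT INTO zufang (" + ", ".join(COLUMNS) + ") VALUES (" + placeholders + ")"
    # The values travel separately from the SQL text, so the driver does the quoting.
    conn.execute(sql, tuple(row[c] for c in COLUMNS))
    conn.commit()


# Made-up sample row (not real 5i5j data); note the single quote in xiaoqu.
sample = {
    "area": "2 rooms, 60 sqm",
    "xiaoqu": "O'Neil Garden",
    "community": "Pudong",
    "hot": "12 views",
    "price": "5200",
    "lxrphone": "Agent Wang 000-0000",
}

conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
create_table(conn)
insert_listing(conn, sample)
rows = conn.execute("SELECT xiaoqu, price FROM zufang").fetchall()
print(rows)
```

sqlite3 uses `?` as its placeholder, while pymysql uses `%s` inside the SQL string passed to `cursor.execute`; in both cases the values go in as a separate tuple rather than being formatted into the string.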
    

    If you have any questions or suggestions, feel free to leave a comment.

  • Original post: https://www.cnblogs.com/zhendiao/p/9333004.html