  • Python web scraper: crawling real estate listings and saving them locally

    
    

    import csv

    import requests
    from bs4 import BeautifulSoup

    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.79 Safari/537.36 Maxthon/5.2.6.1000'}

    # Open the CSV once, outside the page loop, so the header row is written once per run.
    # 'a' appends to an existing file; utf-8-sig adds a BOM so the Chinese text is not garbled.
    with open('test.csv', 'a', newline='', encoding='utf-8-sig') as csvfile:
        w = csv.writer(csvfile)
        w.writerow(('Title', 'Price', 'Average price', 'Area', 'Floor'))
        for i in range(1, 10):  # pages 1 through 9
            link = 'https://fz.anjuke.com/sale/p' + str(i) + '/#filtersort'
            r = requests.get(link, headers=headers)
            print("Page", i, "response status code:", r.status_code)
            soup = BeautifulSoup(r.text, 'lxml')
            house_list = soup.find_all('li', class_="list-item")
            for house in house_list:
                # The .contents indexes below depend on the page's current markup.
                name = house.find('div', class_='house-title').a.text.strip()
                price = house.find('div', class_='pro-price').contents[1].text.strip()
                price_ave = house.find('div', class_='pro-price').contents[2].text.strip()
                area = house.find('div', class_='details-item').span.text
                floor = house.find('div', class_='details-item').contents[5].text
                temp = [name, price, price_ave, area, floor]
                print(temp)
                w.writerow(temp)
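
Because the script above opens test.csv in append mode, running it a second time adds another header row in the middle of the data. One way to avoid that is to write the header only when the file is new or empty. The following is a minimal sketch under that assumption, not the original author's method; the helper `append_rows`, the English column names, and the sample rows are all hypothetical and use only the standard library.

```python
import csv
import os
import tempfile

# Hypothetical English column names, used only for this illustration.
HEADER = ('title', 'price', 'average_price', 'area', 'floor')


def append_rows(path, rows, header=HEADER):
    """Append rows to a CSV file, writing the header only when the file is new or empty."""
    need_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, 'a', newline='', encoding='utf-8-sig') as f:
        w = csv.writer(f)
        if need_header:
            w.writerow(header)
        w.writerows(rows)


# Made-up sample rows standing in for two separate runs of the scraper.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.csv')
append_rows(demo_path, [['listing A', '100', '10000', '100', 'low']])
append_rows(demo_path, [['listing B', '200', '20000', '100', 'high']])

# Read the file back to confirm the header appears exactly once.
with open(demo_path, newline='', encoding='utf-8-sig') as f:
    saved = list(csv.reader(f))
```

In the scraper itself, a helper like this could be called once per page with the rows collected from that page, which keeps the file handling separate from the parsing.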

     

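    A few points to note:

    1. with open('test.csv', 'a', newline='', encoding='utf-8-sig') as csvfile: pay attention to the encoding. utf-8-sig writes a UTF-8 byte order mark at the start of the file; without it, the Chinese text in the saved file can appear garbled, for example when the CSV is opened in Excel.

    2. The header row is inserted with a single w.writerow(...) call before any data is written, and each listing is written by passing a list of its fields to w.writerow.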
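
The two notes above can be checked without touching the network. The sketch below writes a header tuple and a few list rows to a temporary file with utf-8-sig, then inspects the raw bytes for the byte order mark. The function name `write_rows`, the temporary path, and the sample listings are assumptions made up for this illustration.

```python
import csv
import os
import tempfile


def write_rows(path, header, rows):
    """Write a header row followed by data rows to a CSV file with a UTF-8 BOM."""
    with open(path, 'w', newline='', encoding='utf-8-sig') as f:
        w = csv.writer(f)
        w.writerow(header)   # a tuple or list becomes one CSV line
        w.writerows(rows)    # each inner list becomes one CSV line


# Made-up sample data containing Chinese text, as the scraped listings would.
header = ('标题', '价格')
rows = [['示例房源', '100万'], ['另一套房源', '200万']]

path = os.path.join(tempfile.mkdtemp(), 'bom_demo.csv')
write_rows(path, header, rows)

# The file should start with the three BOM bytes EF BB BF.
with open(path, 'rb') as f:
    has_bom = f.read().startswith(b'\xef\xbb\xbf')

# Reading with utf-8-sig strips the BOM again and returns the original text.
with open(path, newline='', encoding='utf-8-sig') as f:
    read_back = list(csv.reader(f))
```

Passing `newline=''` to `open` lets the csv module control line endings itself, which avoids blank lines between rows on Windows.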

  • Original article: https://www.cnblogs.com/leon507/p/10401091.html