  • Scraping Zhihu and writing the results to a file

    import requests
    from pyquery import PyQuery as pq
    
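    url = 'https://www.zhihu.com/explore'
    # Send a browser-like User-Agent; the request may be rejected without one.
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
    }
    html = requests.get(url, headers=headers).text
    doc = pq(html)
    # Select every .feed-item under .explore-tab; .items() yields one PyQuery object per match.
    items = doc('.explore-tab .feed-item').items()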
    
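    # Open the output file once; append mode keeps results from earlier runs.
    with open('explore.txt', 'a', encoding='utf-8') as file:
        for item in items:
            question = item.find('h2').text()
            author = item.find('.author-link-line').text()
            # Re-parse the inner HTML so that only its plain text is kept.
            answer = pq(item.find('.content').html()).text()
            # One field per line, then a separator line between entries.
            file.write('\n'.join([question, author, answer]))
            file.write('\n' + '=' * 50 + '\n')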
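The write step can also be separated from the scraping so that it can be checked without a network request. The following is only a minimal sketch using the standard library: `format_record`, `append_records`, and `SEPARATOR` are hypothetical names introduced here for illustration, not part of the original post, and the output layout is assumed to match the one used above (question, author, and answer on separate lines, followed by a line of 50 `=` characters).

```python
SEPARATOR = '=' * 50  # hypothetical constant; same separator as in the script above


def format_record(question, author, answer):
    """Return one entry as text: three fields, one per line, then a separator line."""
    return '\n'.join([question, author, answer]) + '\n' + SEPARATOR + '\n'


def append_records(path, records):
    """Append each (question, author, answer) tuple to a UTF-8 text file."""
    with open(path, 'a', encoding='utf-8') as f:
        for question, author, answer in records:
            f.write(format_record(question, author, answer))
```

With helpers like these, the loop body would only need to collect `(question, author, answer)` tuples and pass them to `append_records('explore.txt', ...)` once at the end.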
  • Original article: https://www.cnblogs.com/chenxi188/p/10524053.html