  • Scraping the text of a WeChat article

    import requests
    from lxml import etree


    def body():
        url = "https://mp.weixin.qq.com/s/6XYNToX51bWX7ij5MiTWFA"
        header = {'User-Agent': 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36'}
        response = requests.get(url, headers=header)
        response.encoding = "utf-8"
        html = response.text
        obj = etree.HTML(html)
        # The class value is kept as in the original, including its trailing space.
        obj_body = obj.xpath('//div[@class="rich_media_content "]/p//text()')
        obj_title = obj.xpath('//div[@id="img-content"]/h2/text()')
        # Strip surrounding whitespace from the title text nodes.
        titles = [t.strip() for t in obj_title]
        lines = titles + obj_body
        print(lines)
        # Indent each line with two full-width spaces (U+3000) and end it with a newline.
        v = ["\u3000\u3000" + i + "\n" for i in lines]
        # Use the title string (not the whole list) in the file name.
        title = titles[0] if titles else "untitled"
        # The target directory must already exist.
        with open(r"F:\day08\人民日报微信文章\%s.text" % title, "w", encoding="utf-8") as f:
            f.writelines(v)


    body()
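
The script puts the article title into the output file name and prefixes every line with two full-width spaces. A title can contain characters that Windows rejects in file names, so a small cleanup step may help. The following is a minimal sketch using only the standard library. The helpers `safe_filename` and `format_paragraphs` are hypothetical names, not part of the original script. Unlike the original loop, `format_paragraphs` also strips each line and skips empty ones, which is an assumption about the desired output.

```python
import re

# Characters that Windows does not allow in file names, plus line breaks and tabs.
_INVALID = re.compile(r'[\\/:*?"<>|\r\n\t]')


def safe_filename(title, fallback="untitled"):
    """Hypothetical helper: turn an article title into a usable file name."""
    # Replace each forbidden character, then trim spaces and dots at both ends.
    cleaned = _INVALID.sub("_", title).strip(" .")
    return cleaned or fallback


def format_paragraphs(lines):
    """Hypothetical helper: indent each non-empty line with two U+3000 spaces."""
    return ["\u3000\u3000" + s.strip() + "\n" for s in lines if s.strip()]
```

In the script, these could be used as `"%s.text" % safe_filename(title)` for the file name and `f.writelines(format_paragraphs(lines))` for the body.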
    

      

  • Original source: https://www.cnblogs.com/heluobing/p/10829305.html