  • Scraping page content with Python

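    import urllib.request
    from bs4 import BeautifulSoup

    url = "http://www.stats.gov.cn/tjsj/tjbz/tjyqhdmhcxhfdm/2018/12/1201.html"
    # Send a browser-like User-Agent instead of urllib's default one.
    headers = ("User-Agent","Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36")
    opener = urllib.request.build_opener()
    opener.addheaders = [headers]
    data = opener.open(url).read()
    # Decode the raw bytes as GB2312 before handing them to the parser.
    content = data.decode('GB2312')
    soup = BeautifulSoup(content, 'html.parser')
    print(soup.find_all('a'))

    # Print the target and the visible text of every link on the page.
    for link in soup.find_all('a'):
        # .get() returns None instead of raising KeyError when an <a> tag has no href.
        print('url:', link.get('href'))
        # The first argument of get_text() is a separator, so 'title' is dropped here.
        print('text:', link.get_text(strip=True))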
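
The loop above prints each `href` exactly as it appears in the page. If those values are relative paths, they have to be joined with the page URL before they can be fetched. A minimal sketch of that step, using only the standard library's `urllib.parse.urljoin`; the helper name `resolve_links` and the sample `href` values are assumptions made up for illustration, not taken from the original post:

```python
from urllib.parse import urljoin


def resolve_links(base_url, hrefs):
    """Return absolute URLs for the href values found on the page at base_url."""
    resolved = []
    for href in hrefs:
        if not href:  # skip None or empty values, e.g. <a> tags without an href
            continue
        # urljoin leaves absolute URLs untouched and resolves relative ones
        # against the directory of base_url.
        resolved.append(urljoin(base_url, href))
    return resolved


base = "http://www.stats.gov.cn/tjsj/tjbz/tjyqhdmhcxhfdm/2018/12/1201.html"
# Hypothetical href values, for illustration only.
sample_hrefs = ["01/120101.html", None, "http://www.example.com/"]
for absolute in resolve_links(base, sample_hrefs):
    print(absolute)
```

With the scraper above, the input list could be built as `[link.get('href') for link in soup.find_all('a')]`.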
  • Original article: https://www.cnblogs.com/isungge/p/11598112.html