  • Basic web crawler exercises

    1. Extract the text of the h1 tag

    import requests
    from bs4 import BeautifulSoup

    url = 'http://news.gzcc.cn/html/2018/xiaoyuanxinwen_0328/9113.html'
    res = requests.get(url)
    res.encoding = 'utf-8'  # decode the response body as UTF-8
    soup = BeautifulSoup(res.text, 'html.parser')
    soup.h1.text  # text of the first <h1> on the page
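
`soup.h1` is `None` when a page has no `<h1>`, so `soup.h1.text` would raise `AttributeError` there. A minimal sketch of a guarded version follows. It runs against an inline HTML string rather than the live page; the helper name `h1_text` and the sample markup are assumptions made for illustration.

```python
from bs4 import BeautifulSoup


def h1_text(html):
    """Return the stripped text of the first <h1>, or None if there is none."""
    soup = BeautifulSoup(html, 'html.parser')
    h1 = soup.h1  # first <h1> in document order, or None
    return h1.get_text(strip=True) if h1 is not None else None


# Hypothetical sample markup, not taken from the real page.
sample = '<html><body><h1> Campus News </h1><p>body</p></body></html>'
print(h1_text(sample))
print(h1_text('<p>no heading here</p>'))
```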

    2. Extract the link of the a tag

    soup.a.attrs.get('href')  # href of the first <a>, or None if it has no href
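
The first `<a>` on a page may have no `href`, and the `href` it has may be relative. One possible variation is sketched below: it skips anchors without an `href` and resolves the result against a base URL with `urljoin`. This differs from the one-liner above, which only inspects the very first `<a>`. The helper name `first_link`, the sample markup, and the base URL are assumptions.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def first_link(html, base_url):
    """Return the absolute URL of the first <a> that has an href, or None."""
    soup = BeautifulSoup(html, 'html.parser')
    a = soup.find('a', href=True)  # skip anchors without an href attribute
    return urljoin(base_url, a['href']) if a is not None else None


# Hypothetical markup and base URL, for illustration only.
sample = '<a name="top"></a><a href="/html/xiaoyuanxinwen/">Campus news</a>'
print(first_link(sample, 'http://news.gzcc.cn/'))
```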

    3. Extract the content of every li tag

    for i in soup.select('li'):
        print(i.text)
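
Printing `i.text` directly keeps all the whitespace from the markup and also prints empty items. As a rough sketch, the texts can be collected into a list with whitespace stripped and empty entries dropped. The helper name `li_texts` and the sample markup are assumptions.

```python
from bs4 import BeautifulSoup


def li_texts(html):
    """Return the stripped text of every <li>, skipping empty ones."""
    soup = BeautifulSoup(html, 'html.parser')
    # Join the text fragments inside each <li> with a single space.
    texts = (li.get_text(' ', strip=True) for li in soup.select('li'))
    return [t for t in texts if t]


# Hypothetical sample markup, for illustration only.
sample = ('<ul><li> Home </li>'
          '<li><a href="#">News</a> <span>2018</span></li>'
          '<li></li></ul>')
for text in li_texts(sample):
    print(text)
```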

    4. Extract the title, link, publish time, and source of one news item

    soup.select('.news-list-title')[0].text             # title
    soup.select('li')[1].a.attrs['href']                # link
    soup.select('.news-list-info')[0].contents[0].text  # publish time
    soup.select('.news-list-info')[0].contents[1].text  # source
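
The four lookups above index into separate page-wide result lists, so they only describe the same news item if the indices happen to line up. One way to keep the fields together is to pick a single `<li>` and search inside it, as in the sketch below. The class names come from the selectors above, but the nesting of the sample markup, its values, and the helper name `parse_news_item` are assumptions; the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: class names follow the selectors used above, while the
# nesting and the sample values are assumed for illustration.
SAMPLE = '''
<ul>
  <li><a href="http://news.gzcc.cn/html/2018/xiaoyuanxinwen_0328/9113.html">
    <div class="news-list-title">Sample headline</div>
    <div class="news-list-info"><span>2018-03-28</span><span>News Center</span></div>
  </a></li>
</ul>
'''


def parse_news_item(li):
    """Collect title, link, time and source from one <li> into a dict."""
    title = li.select_one('.news-list-title')
    info = li.select_one('.news-list-info')
    # find_all('span') ignores whitespace-only text nodes, unlike .contents
    spans = info.find_all('span') if info is not None else []
    return {
        'title': title.get_text(strip=True) if title is not None else None,
        'link': li.a.get('href') if li.a is not None else None,
        'time': spans[0].get_text(strip=True) if len(spans) > 0 else None,
        'source': spans[1].get_text(strip=True) if len(spans) > 1 else None,
    }


soup = BeautifulSoup(SAMPLE, 'html.parser')
item = parse_news_item(soup.select_one('li'))
print(item)
```

Missing fields come back as `None` instead of raising, which makes it easier to loop over every `<li>` on a page where some items are navigation entries rather than news.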
  • Original post: https://www.cnblogs.com/guoyaowen/p/8669108.html