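  • Scraping a news list with the requests and BeautifulSoup4 libraries

    • Use requests and BeautifulSoup4 to scrape the time, title, link, and source of each item in the campus news list.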
    import requests
    from urllib.parse import urljoin
    from bs4 import BeautifulSoup

    url_main = "http://news.gzcc.cn/html/xiaoyuanxinwen/"
    res = requests.get(url_main)
    res.encoding = 'utf-8'

    soup = BeautifulSoup(res.text, 'html.parser')
    # Only <li> elements that contain a ".news-list-title" are news items
    for li in soup.select('li'):
        if len(li.select('.news-list-title')) > 0:
            # urljoin leaves absolute links unchanged and resolves relative ones
            href = urljoin(url_main, li.select('a')[0]['href'])
            title = li.select('.news-list-title')[0].text
            pub_time = li.select('span')[0].text   # first <span>: publication time
            source = li.select('span')[1].text     # second <span>: source
            # Fetch the article page and extract its body text
            li_res = requests.get(href)
            li_res.encoding = 'utf-8'
            li_soup = BeautifulSoup(li_res.text, 'html.parser')
            li_text = li_soup.select('.show-content')[0].text
            print(pub_time, title, href, source, '\n', li_text)
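Because the loop above downloads and parses in one pass, the list-parsing step can only be checked against the live site. A minimal sketch that pulls that step into a function working on an HTML string, assuming each news item has the markup the selectors above imply (an `<a href>`, an element with class `news-list-title`, and two `<span>` elements holding time and source). The function name `parse_news_list` and the sample HTML are hypothetical, made up for illustration, and are not taken from the real page.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def parse_news_list(html, base_url):
    """Hypothetical helper: return one dict per <li> that looks like a news item.

    Assumes the markup implied by the original selectors: an <a href>,
    an element with class "news-list-title", and two <span>s (time, source).
    """
    soup = BeautifulSoup(html, 'html.parser')
    items = []
    for li in soup.select('li'):
        title_tags = li.select('.news-list-title')
        links = li.select('a')
        spans = li.select('span')
        # Skip navigation <li>s and anything missing one of the expected fields
        if not title_tags or not links or len(spans) < 2:
            continue
        items.append({
            'time': spans[0].get_text(strip=True),
            'title': title_tags[0].get_text(strip=True),
            'link': urljoin(base_url, links[0].get('href', '')),
            'source': spans[1].get_text(strip=True),
        })
    return items


# Made-up markup shaped like what the selectors above expect
SAMPLE = """
<ul>
  <li><a href="/nav">Home</a></li>
  <li><a href="sample-1.html">
    <div class="news-list-title">Sample headline</div>
    <span>2017-01-01</span><span>Sample source</span>
  </a></li>
</ul>
"""

if __name__ == '__main__':
    for item in parse_news_list(SAMPLE, 'http://news.gzcc.cn/html/xiaoyuanxinwen/'):
        print(item['time'], item['title'], item['link'], item['source'])
```

Passing `res.text` from the code above should yield the same fields as the original loop, except that `get_text(strip=True)` trims surrounding whitespace and items missing a field are skipped instead of raising `IndexError`.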

    • Pick a topic that interests you and do something similar, as preparation for “scraping web data and performing text analysis”.
    import requests
    from bs4 import BeautifulSoup

    url_main = "https://www.jd.com/?cu=true&utm_source=kong&utm_medium=unionliaotian&utm_campaign=t_1000222402_&utm_term=ecac100033064339b9fad5482e8396e9&abt=3"
    res = requests.get(url_main)
    res.encoding = 'utf-8'

    soup = BeautifulSoup(res.text, 'html.parser')
    # Links in the JD home-page category menu carry the class "cate_menu_lk"
    for lk in soup.select('.cate_menu_lk'):
        print(lk.text)
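Since the goal is to prepare for text analysis, it helps to keep the scraped category names as clean data rather than only printing them. A minimal sketch, assuming the menu labels are the text of the `.cate_menu_lk` elements as in the code above: it parses an HTML string (so it can be tried offline), drops empty labels, and writes one name per line in UTF-8 for a later analysis step. The helpers `extract_menu_texts` and `save_lines` and the sample markup are hypothetical.

```python
from pathlib import Path

from bs4 import BeautifulSoup


def extract_menu_texts(html):
    """Return the stripped text of every element with class "cate_menu_lk"."""
    soup = BeautifulSoup(html, 'html.parser')
    texts = [a.get_text(strip=True) for a in soup.select('.cate_menu_lk')]
    return [t for t in texts if t]  # drop empty labels


def save_lines(lines, path):
    """Write one item per line as UTF-8 so a later analysis step can read it back."""
    Path(path).write_text(''.join(line + '\n' for line in lines), encoding='utf-8')


# Made-up markup shaped like the category menu the selector targets
SAMPLE = """
<div>
  <a class="cate_menu_lk" href="#">Appliances</a>
  <span>/</span>
  <a class="cate_menu_lk" href="#"> Phones </a>
  <a class="cate_menu_lk" href="#"></a>
</div>
"""

if __name__ == '__main__':
    print(extract_menu_texts(SAMPLE))
```

With the page fetched as above, `save_lines(extract_menu_texts(res.text), 'categories.txt')` would store the names; the file name is arbitrary.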

  • Original post: https://www.cnblogs.com/zeson/p/7604121.html