  • Crawling a news list with the requests and BeautifulSoup4 libraries

    • Use the requests and BeautifulSoup4 libraries to crawl the time, title, link, and source of each item in the campus news list.
    import requests
    from bs4 import BeautifulSoup
    
    url_main = "http://news.gzcc.cn/html/xiaoyuanxinwen/"
    res = requests.get(url_main)
    res.encoding = 'utf-8'
    
    soup = BeautifulSoup(res.text, 'html.parser')
    li = soup.select('li')
    for li_title in li:
        # Only <li> elements containing a .news-list-title are news entries.
        if len(li_title.select('.news-list-title')) > 0:
            href = li_title.select('a')[0]['href']
            title = li_title.select('.news-list-title')[0].text
            time = li_title.select('span')[0].text
            info = li_title.select('span')[1].text  # source
            # Fetch the article page and extract its body text.
            li_res = requests.get(href)
            li_res.encoding = 'utf-8'
            li_soup = BeautifulSoup(li_res.text, 'html.parser')
            li_text = li_soup.select('.show-content')[0].text
            print(time, title, href, info, '\n', li_text)
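
The loop above passes each link straight to `requests.get`, which works only when the list page emits absolute URLs. As a minimal sketch, assuming some links could be relative, the standard library's `urllib.parse.urljoin` can resolve them against the list URL first. The helper name `absolute_link` and the sample paths are hypothetical and are not taken from the site.

```python
from urllib.parse import urljoin

URL_MAIN = "http://news.gzcc.cn/html/xiaoyuanxinwen/"


def absolute_link(href, base=URL_MAIN):
    """Resolve href against base; absolute URLs pass through unchanged."""
    return urljoin(base, href)


if __name__ == "__main__":
    # Hypothetical hrefs: page-relative, root-relative, and already absolute.
    samples = (
        "2.html",
        "/html/2017/example.html",
        "http://news.gzcc.cn/html/2017/example.html",
    )
    for sample in samples:
        print(absolute_link(sample))
```

Calling `requests.get(absolute_link(href))` instead of `requests.get(href)` would then behave the same for absolute links and stop relative ones from raising an error.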

    • Pick a topic that interests you and do something similar, in preparation for "crawling web data and performing text analysis".
    import requests
    from bs4 import BeautifulSoup
    
    url_main = "https://www.jd.com/?cu=true&utm_source=kong&utm_medium=unionliaotian&utm_campaign=t_1000222402_&utm_term=ecac100033064339b9fad5482e8396e9&abt=3"
    res = requests.get(url_main)
    res.encoding = 'utf-8'
    
    soup = BeautifulSoup(res.text, 'html.parser')
    # Each .cate_menu_lk element is a link in the category menu.
    jd = soup.select('.cate_menu_lk')
    for lk in jd:
        print(lk.text)
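
As a first step toward the text analysis this exercise prepares for, the scraped strings can be counted with the standard library. This is a minimal sketch, assuming the texts have already been collected into a list. The function name `top_terms` and the sample strings are hypothetical and are not data from jd.com.

```python
from collections import Counter


def top_terms(texts, n=3):
    """Return the n most common whitespace-separated terms across texts."""
    counts = Counter()
    for text in texts:
        counts.update(text.split())
    return counts.most_common(n)


if __name__ == "__main__":
    # Made-up sample strings standing in for scraped text.
    sample = ["phones digital", "computers office", "phones accessories"]
    print(top_terms(sample, n=1))
```

Chinese text has no spaces between words, so real scraped Chinese text would need a word segmenter in place of `split()`.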

  • Original article: https://www.cnblogs.com/zeson/p/7604121.html