  • A simple Python crawler that writes its output to a text file

    import os
    import requests
    from bs4 import BeautifulSoup

    # Fetch the HTML document
    def get_html(url):
        response = requests.get(url)
        # Decode the response body as UTF-8
        response.encoding = 'utf-8'
        return response.text

    # Extract the jokes
    def get_joke(html):
        soup = BeautifulSoup(html, 'lxml')

        abc = ''
        num = 0
        # Each joke sits in a <div class="content"> element
        for link in soup.find_all("div", class_="content"):
            # for i in range(10):
            #     joke_content = soup.select('div.content')[i].get_text()
            num = num + 1
            abc += "--------" + str(num) + link.get_text()
        return abc

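    # Write the jokes to a txt file
    # ls = os.linesep

    def writeJoke(joke):
        # Keep asking until the user enters a file name that is not taken
        while True:
            filename = input('File name: ')
            if os.path.exists(filename):
                print("Error: '%s' already exists" % filename)  # file already exists
            else:
                break

        # Write the text; the with block closes the file even if write() fails
        with open(filename, 'w', encoding='utf-8') as fobj:
            fobj.write(joke)
            # fobj.writelines(['%s%s' % (x, ls) for x in all])  # this would put a line break after every character
        print('Written successfully!')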

    url_joke = "https://www.qiushibaike.com"
    html = get_html(url_joke)
    joke = get_joke(html)
    writeJoke(joke)
    # print(joke)
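
The `writeJoke` function above first checks `os.path.exists` and then opens the file in `'w'` mode, so a file created between the check and the open would still be overwritten. As a minimal sketch of one possible alternative (not the original author's method), Python's built-in `open` accepts mode `'x'`, which creates the file and raises `FileExistsError` if it is already there. The helper name `write_text_exclusive` and the demo file name `jokes.txt` are hypothetical and chosen only for illustration.

```python
import os
import tempfile


def write_text_exclusive(filename, text):
    """Write text to filename, refusing to overwrite an existing file.

    Hypothetical helper: returns True on success, False if the file exists.
    """
    try:
        # Mode 'x' creates the file and fails if it already exists,
        # so the existence check and the open happen in a single step.
        with open(filename, 'x', encoding='utf-8') as fobj:
            fobj.write(text)
    except FileExistsError:
        return False
    return True


# Usage: the second call is rejected because the file now exists.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, 'jokes.txt')  # illustrative file name
first = write_text_exclusive(demo_path, '--------1 sample joke\n')
second = write_text_exclusive(demo_path, 'this would overwrite\n')
print(first, second)
```

In the interactive loop, a `False` return value could play the same role as the `os.path.exists` check: print the error and ask for another file name.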
  • Original source: https://www.cnblogs.com/lbx6935/p/9508084.html