  • Scraping all problems from the HDU OJ

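    The HDU OJ (Hangzhou Dianzi University Online Judge) has no anti-scraping measures,

    so the pages can simply be fetched directly.

    Here is the source code (the parameters, the loop range, and the storage path can all be changed):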

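    import os
    import time

    import requests
    from bs4 import BeautifulSoup

    # Directory that receives one text file per problem; change as needed.
    # A raw string keeps the backslashes from being read as escape sequences.
    OUTPUT_DIR = r'D:\python\python_code\hdoj'

    def write_in_file(number, string):  # append a string to <number>.txt
        path = os.path.join(OUTPUT_DIR, str(number) + ".txt")
        with open(path, "a+", encoding='utf-8') as f:
            f.write(string)  # the with-block closes the file automatically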
    
    
    link = "http://acm.hdu.edu.cn/showproblem.php?pid="
    headers = {
        'user-agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1'    
    }
    for i in range(1503, 1900):
        print("acquire the request now")
        r = requests.get(link + str(i), headers=headers, timeout=10)
        print("acquire the request completed")
        soup = BeautifulSoup(r.text, "lxml")
        problem_title = soup.find("h1").text  # get the title
        write_in_file(i, "question: " + problem_title + "\n")
        problem_des = soup.find_all("div", class_="panel_content")
        the_title = soup.find_all("div", class_="panel_title")
        #print(the_title)
        print("write into file now")
        print("now write in the NO. " + str(i) + " file")
        # Pair each section heading with its content block.
        for title, content in zip(the_title, problem_des):
            write_in_file(i, title.text + ": " + content.text + "\n")
        time.sleep(1)  # sleep for one second between requests
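
    The tunable pieces of the script (problem range, URL, and storage path) could also be gathered in one place. Below is a minimal sketch, assuming a hypothetical helper `plan_jobs` that is not part of the original script; it uses only the standard library and performs no network access. Note that, unlike `range(1503, 1900)`, it treats the last problem ID as inclusive.

```python
from pathlib import Path

# Same base URL as in the script above.
BASE_URL = "http://acm.hdu.edu.cn/showproblem.php?pid="


def plan_jobs(first_pid, last_pid, out_dir):
    """Return (pid, url, output_path) for each problem ID in [first_pid, last_pid].

    Hypothetical helper for illustration only: it just computes what to
    fetch and where to store it, without downloading anything.
    """
    out_dir = Path(out_dir)
    return [
        (pid, BASE_URL + str(pid), out_dir / f"{pid}.txt")
        for pid in range(first_pid, last_pid + 1)
    ]


# Example: plan three problems and print the resulting URL/path pairs.
jobs = plan_jobs(1503, 1505, "hdoj")
for pid, url, path in jobs:
    print(pid, url, path)
```

    Each tuple can then be handed to the fetching and writing code, so changing the range or the output directory means editing a single call.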

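    Note: this scraping was done purely out of interest, with no commercial use; the post will be removed on request if it infringes anyone's rights.

    I hope this is helpful to everyone.

    That's all.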

  • Original article: https://www.cnblogs.com/lavender-pansy/p/12118004.html