  • 20191118 Sun Yuan: "Python Programming" Lab 4 Report

     

     

    Lab Report

     

     

    Course: Python Programming (Python程序设计)
    Lab: Lab 4
    Lab date: June 1, 2020
    Student ID: 20191108
    Name: Sun Yuan (孙源)
    Instructor: Wang Zhiqiang (王志强)

    Grade:

    Comments:

     


    Objectives and Requirements

    Use a Python web crawler to scrape content from web pages.
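As a minimal illustration of what scraping means here, the sketch below pulls the links that carry a given CSS class out of an HTML string. It uses the standard library's `html.parser` rather than BeautifulSoup so that it runs without extra installs. The `LinkCollector` class, the `extract_links` helper, and the sample HTML are hypothetical and exist only for this example; only the `result-right` class name is taken from the report's code.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags that carry a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # The class attribute may hold several space-separated class names
        classes = (attr_map.get("class") or "").split()
        if self.css_class in classes and attr_map.get("href"):
            self.links.append(attr_map["href"])


def extract_links(html, css_class):
    """Return the hrefs of all <a class="css_class"> tags in html."""
    parser = LinkCollector(css_class)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page fragment, for illustration only
sample_html = """
<table>
  <tr><td><a class="result-right" href="http://example.com/solution/1/">Accepted</a></td></tr>
  <tr><td><a class="result-wrong" href="http://example.com/solution/2/">Wrong Answer</a></td></tr>
  <tr><td><a class="result-right" href="http://example.com/solution/3/">Accepted</a></td></tr>
</table>
"""

print(extract_links(sample_html, "result-right"))
```

A real crawler would fetch the HTML over the network first; keeping the parsing step separate makes it possible to check the extraction logic offline.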

    Design and Implementation

    import os
    import re

    import bs4
    import requests

    # Map the CSS class of the <pre> block that holds the source code to a file extension
    code_class = {"sh_cpp": ".cpp",
                  "sh_c": ".c", "": ".py",
                  "sh_pascal": ".pas",
                  "sh_java": ".java"}
    mylog = {"redirectUrl": "http://openjudge.cn/",
             "password": "",
             "email": ""}
    session_requests = requests.session()
    own_url = "http://openjudge.cn/"


    # Crawl the accepted submissions listed on the personal home page;
    # if there is a next page, keep crawling
    def download_url(url):
        ans = session_requests.get(url)
        s = str(ans.content, encoding="utf-8")
        soup = bs4.BeautifulSoup(s, "html.parser")
        # blocks holds the links to the accepted-solution pages; walk through them
        blocks = soup.find_all("a", class_="result-right")
        for i in blocks:
            solution_url = i["href"]
            solution = session_requests.get(solution_url)
            ss = str(solution.content, encoding="utf-8")
            s_soup = bs4.BeautifulSoup(ss, "html.parser")
            # Work out which language the solution is written in
            for class_name in code_class:
                block = s_soup.find("pre", class_=class_name)
                if block is None:
                    continue
                try:
                    name = s_soup.find_all("h3")[1]
                    name = name.text[:-5]
                except IndexError:
                    # Without a title there is no file name, so skip this solution
                    print("Get name wrong!")
                    continue
                # Drop the problem number in front of the first ':'
                index = name.find(':')
                if index != -1:
                    name = name[index + 1:]
                # Replace characters that are illegal in file names and trim
                # leading and trailing spaces
                name = re.sub(r'[\\/:*?#"<>|]', " ", name).strip()
                path = "C:/tmp/" + name + code_class[class_name]
                if os.path.exists(path):
                    # A file with the same name already exists
                    print(name + " has already downloaded")
                    continue
                # No file with the same name yet
                print("downloading your correct code " + name)
                try:
                    with open(path, 'w', encoding="utf-8") as f:
                        f.write(block.text)
                except OSError as e:
                    print(name + " can't be downloaded correctly")
                    print(e)
        # next_link holds the relative path of the next page
        next_link = soup.find("a", class_="nextprev", rel="next")
        if next_link is not None:
            download_url(own_url + next_link["href"])


    def spider():
        global own_url
        mylog["email"] = input("Enter the email you use to log in to openjudge: ")
        mylog["password"] = input("Enter your password: ")
        login_url = "http://openjudge.cn/api/auth/login/"
        session_requests.post(  # send a POST request to the server to log in
            login_url,
            data=mylog,
            headers=dict(referer=login_url),
        )
        result = session_requests.get("http://openjudge.cn/")
        # Use a regular expression to find the URL of the personal home page.
        # "个人首页" ("personal home page") is the link text on the site itself,
        # so it must stay untranslated.
        pt = r'<a href="(http://[^"]*)">个人首页</a>'
        match = re.search(pt, result.text)
        if match is None:
            print("The account does not exist or the password is wrong! Please try again!")
            spider()
            return
        own_url = match.group(1)
        print("This is your home page: " + own_url)
        download_url(own_url)
        print("All of your accepted programs have been downloaded to C:/tmp!")


    spider()
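The title-cleaning steps inside `download_url` (drop the problem number before the first colon, replace characters that are not allowed in file names, trim whitespace) can be isolated into a small helper that is easy to check without network access. This is a sketch rather than the report's code: the name `safe_filename` and the sample titles are made up for illustration, and the character set includes the backslash, which Windows also forbids in file names.

```python
import re

# Characters replaced in file names: the Windows-reserved set plus '#'
ILLEGAL_CHARS = re.compile(r'[\\/:*?#"<>|]')


def safe_filename(title):
    """Turn a problem title such as '02:A+B Problem' into a usable file name."""
    # Drop the problem number in front of the first ':' if there is one
    index = title.find(":")
    if index != -1:
        title = title[index + 1:]
    # Replace illegal characters with spaces, then trim both ends
    return ILLEGAL_CHARS.sub(" ", title).strip()


print(safe_filename("02:A+B Problem"))     # A+B Problem
print(safe_filename("1001:What is 2*3?"))  # What is 2 3
```

Keeping this logic in its own function means it can be tested on its own, separately from the login and download code.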

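    Course Reflections

    At the start of the course I knew little about programming and did not review properly, so I was often lost on what the teacher covered in class. Only after working through the videos on Yunbanke (云班课) and reading tutorials online was I gradually able to keep up with the teacher's pace. I chose this course with the idea of learning a new technology. Learning Python has indeed been very rewarding, and it has also helped a great deal with learning C.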

    ## References:

    -  [Python Web Crawler Examples](https://www.jianshu.com/p/757d8981fdda)

    -  [Python Network Programming](https://www.runoob.com/python/python-socket.html)

    ## Gitee repository link:

    [Lab 4](https://gitee.com/sunyuan1118/python-test-2020)

  • Original post: https://www.cnblogs.com/1118yuan/p/13254583.html