  • Web scraper mini-example: enter a movie title to get its download links

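    Requirement: the user enters the title of a movie they like, and the program searches 电影天堂 (Movie Heaven, https://www.ygdy8.com) for that movie's download links and prints them.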

    import requests
    from bs4 import BeautifulSoup
    from urllib.request import pathname2url
    
    # Send browser-like request headers to get past the site's anti-scraping checks
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36 OPR/65.0.3467.78 (Edition Baidu)'}
    
    # Fetch the download (magnet) link from a movie's detail page
    def getMovieDownloadLink(filmlink):
        res = requests.get(filmlink, headers=headers)
        if res.status_code == 200:
    
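            # Avoiding garbled Chinese text in the response:
            # requests falls back to 'ISO-8859-1' when the response headers do not
            # declare a charset. In that case, look for the encoding declared inside
            # the returned HTML; if none is declared, use the detected encoding.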
            if res.encoding == 'ISO-8859-1':
                encodings = requests.utils.get_encodings_from_content(res.text)
                if encodings:
                    encoding = encodings[0]
                else:
                    encoding = res.apparent_encoding
            else:
                encoding = res.encoding
            encode_content = res.content.decode(encoding, 'replace').encode('utf-8', 'replace')
    
            soup = BeautifulSoup(encode_content, 'html.parser')
            Zoom = soup.select_one('#Zoom')
            fileurl = Zoom.find('table').find('a').text
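            # Print the link, then append it to a text file, one link per line
            print(fileurl)
            with open('./17-电影天堂磁力.txt', 'a', newline='', encoding='utf-8') as file:
                file.write(fileurl + '\n')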
    
        else:
            print('Request failed for movie link: {}'.format(filmlink))
    
    def main():
        dyurl = 'https://www.ygdy8.com'
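        movie = input('Enter a movie title: ')
        # movie = '沉睡魔咒'
        # The keyword is percent-encoded from its GBK bytes, not UTF-8
        # (the search endpoint appears to expect GBK).
        movie = movie.encode('gbk')
        url = 'http://s.ygdy8.com/plus/s0.php?typeid=1&keyword={0}'.format(pathname2url(movie))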
        res = requests.get(url, headers=headers)
        if res.status_code == 200:
            htmltext = res.text
            soup = BeautifulSoup(htmltext, 'html.parser')
            co_content8 = soup.find('div', class_='co_content8')
            tables = co_content8.find('ul').find_all('table')
            if len(tables) <= 0:
                print('No matching resources found; try searching on the site: {0}'.format(dyurl))
            else:
                for table in tables:
                    filmlink = dyurl + table.find('a')['href']
                    getMovieDownloadLink(filmlink)
    
        else:
            print('Search request failed!')
    
    main()
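
    The search step above percent-encodes the keyword from its GBK bytes by passing them through pathname2url, a function meant for file paths. A minimal sketch of an alternative, assuming the same search URL pattern: urllib.parse.quote can take the encoding directly. The helper name build_search_url and the SEARCH_URL constant are hypothetical and not part of the original script.

```python
from urllib.parse import quote, unquote_to_bytes

# Same URL pattern as in main(); the constant name is an assumption.
SEARCH_URL = 'http://s.ygdy8.com/plus/s0.php?typeid=1&keyword={0}'


def build_search_url(movie):
    """Hypothetical helper: return the search URL for a movie title.

    quote() encodes the string with the given codec (GBK here) and then
    percent-escapes every byte that is not URL-safe. safe='' also escapes
    '/', which quote() would otherwise leave untouched.
    """
    return SEARCH_URL.format(quote(movie, safe='', encoding='gbk'))


if __name__ == '__main__':
    url = build_search_url('沉睡魔咒')
    print(url)
    # Decode the keyword back to confirm the round trip through GBK.
    keyword = url.split('keyword=')[1]
    print(unquote_to_bytes(keyword).decode('gbk'))
```

    With a helper like this, main() could call build_search_url(movie) on the raw input string and drop the separate movie.encode('gbk') step.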
  • Original article: https://www.cnblogs.com/KeenLeung/p/12161223.html