  • Scraping Videos with a Crawler

     

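    Scraping steps

      Step 1: Get the web page that hosts the video

      Step 2: In the browser's F12 developer tools, find the link where the video actually lives

      Step 3: Request that link and read the response as binary data

      Step 4: Save it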
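
Step 2 can also be done in code once the pattern around the video link is known. The sketch below is only an illustration: it reuses the two regular expressions that appear in the Ku6 example later in this post, and the names `VIDEO_PATTERNS` and `find_video_urls` are hypothetical helpers, not part of the original code. Other sites will need different patterns.

```python
import re
from typing import List

# Patterns borrowed from the Ku6 example in this post (assumption: the page
# embeds the mp4 link in one of these two forms; other sites will differ).
VIDEO_PATTERNS = (
    r'<source src="(.*?)" type="video/mp4">',
    r'type: "video/mp4", src: "(.*?)"',
)


def find_video_urls(html: str) -> List[str]:
    """Return the links found by the first pattern that matches anything."""
    for pattern in VIDEO_PATTERNS:
        matches = re.findall(pattern, html)
        if matches:
            return matches
    return []  # no pattern matched: the caller decides how to handle it
```

Returning an empty list instead of raising leaves it to the caller to skip pages that contain no recognizable video link.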

    Code for the save step

    import requests
    response = requests.get('https://vd4.bdstatic.com/mda-jcrx64vi5vct2d2u/sc/mda-jcrx64vi5vct2d2u.mp4?auth_key=1557734214-0-0-d6a29a90222c6caf233e8a2a34c2e37a&bcevod_channel=searchbox_feed&pd=bjh&abtest=all')
    video = response.content         # the response body as raw bytes
    with open(r'D:\图片\绿色.mp4', 'wb') as fw:
        fw.write(video)           # write the bytes to the file
        fw.flush()               # flush the buffer
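
Reading `response.content` holds the whole video in memory before anything is written. For large files, a streamed download may be a better fit. The following is a minimal sketch, not the original author's method: `write_chunks` and `download_video` are hypothetical names, and the 64 KiB chunk size and 30-second timeout are arbitrary assumptions.

```python
from pathlib import Path
from typing import Iterable


def write_chunks(chunks: Iterable[bytes], target: Path) -> int:
    """Write byte chunks to `target` and return the number of bytes written."""
    target.parent.mkdir(parents=True, exist_ok=True)
    written = 0
    with open(target, "wb") as fw:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                written += fw.write(chunk)
    return written


def download_video(url: str, target: Path, timeout: float = 30.0) -> int:
    """Stream `url` to `target` without loading the whole body into memory."""
    import requests  # imported here so write_chunks works without requests

    with requests.get(url, stream=True, timeout=timeout) as response:
        response.raise_for_status()  # stop on 4xx/5xx instead of saving an error page
        return write_chunks(response.iter_content(chunk_size=64 * 1024), target)
```

A call would look like `download_video(video_url, Path("videos") / "clip.mp4")`, where `video_url` is the link found in the F12 tools.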

    Scraping every video on the Ku6 home page

    # Being a bit lazy here: the variable names are just short letters
    # https://www.ku6.com/index
    # <a class="video-image-warp" target="_blank" href="(.*?)">
    #this.src({type: "video/mp4", src: "(.*?)"})
    #src({type: "video/mp4", src: "(.*?)"})
    import re  # regular expressions for pulling links out of the HTML
    import requests  # HTTP client for fetching pages and videos
    new_list = []
    time = 0
    response = requests.get('https://www.ku6.com/index')
    data = response.text
    # print(data)
    url = re.findall('<a class="video-image-warp" target="_blank" href="(.*?)">',data)
    for a in url : #type:str
        if a.startswith('/v') or a.startswith('/d'):
            new_list.append(f'https://www.ku6.com{a}')
        elif a.startswith('ht'):
            new_list.append(f"{a.split('垃')[0]}")
    for url_1 in new_list:
        response_1 = requests.get(url_1)
        data_1 = response_1.text
        video = re.findall('<source src="(.*?)" type="video/mp4">',data_1) or re.findall('type: "video/mp4", src: "(.*?)"',data_1)
        if not video:  # neither pattern matched, so skip this page
            continue
        video_1 = video[0]
        x = video_1.split('/')[-1]
        name = f'{x}.mp4'
        video_response = requests.get(video_1)
        video_3 = video_response.content
        with open(rf'D:\图片\{name}','wb') as fw:
            fw.write(video_3)
            fw.flush()
            time += 1
            print(f'Scraped {time} videos so far')
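
The script above builds absolute links by prefix checks and names each file after the last URL segment plus `.mp4`. That keeps any query string in the file name and doubles the extension when the link already ends in `.mp4`. The standard library can handle both cases. This is a sketch under assumptions: `absolute_url` and `filename_from_url` are hypothetical helpers, and the fallback name `video.mp4` is an arbitrary choice.

```python
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlsplit

BASE_URL = "https://www.ku6.com/index"  # the page the links were scraped from


def absolute_url(href: str, base: str = BASE_URL) -> str:
    """Resolve a relative href against the page URL; absolute hrefs pass through."""
    return urljoin(base, href)


def filename_from_url(video_url: str, default: str = "video.mp4") -> str:
    """Derive a local file name from the URL path, ignoring the query string."""
    name = PurePosixPath(urlsplit(video_url).path).name
    if not name:
        return default  # the URL has no usable last segment
    return name if name.lower().endswith(".mp4") else f"{name}.mp4"
```

In the loop, `absolute_url(a)` could stand in for the `startswith` branches and `filename_from_url(video_1)` for the `split('/')` step.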
  • Original article: https://www.cnblogs.com/xpptt/p/11799221.html