  • Scraping videos with a crawler

     

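    Scraping steps

      Step 1: Open the web page that contains the video

      Step 2: Use the F12 developer tools to find the link where the video file actually lives

      Step 3: Request that link and read the response as binary data

      Step 4: Save the data to a file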

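    Code for the save step

    import requests

    # Direct link to the video file, copied from the F12 network panel
    response = requests.get('https://vd4.bdstatic.com/mda-jcrx64vi5vct2d2u/sc/mda-jcrx64vi5vct2d2u.mp4?auth_key=1557734214-0-0-d6a29a90222c6caf233e8a2a34c2e37a&bcevod_channel=searchbox_feed&pd=bjh&abtest=all')
    response.raise_for_status()      # stop here if the request failed
    video = response.content         # the response body as raw bytes
    with open(r'D:\图片\绿色.mp4', 'wb') as fw:
        fw.write(video)              # write the bytes; the with block closes the file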
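    Reading response.content loads the whole video into memory before anything is written. For larger files, a streamed download may be a better fit. The following is a minimal sketch, assuming the requests library is installed; the names save_chunks and download_video, the 1 MiB chunk size, and the 30-second timeout are illustrative choices, not part of the original code.

```python
from typing import Iterable


def save_chunks(chunks: Iterable[bytes], path: str) -> int:
    """Write byte chunks to `path` and return the number of bytes written."""
    written = 0
    with open(path, 'wb') as fw:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                fw.write(chunk)
                written += len(chunk)
    return written


def download_video(url: str, path: str, chunk_size: int = 1024 * 1024) -> int:
    """Stream `url` to `path` without holding the whole file in memory."""
    # Imported here so that save_chunks can be used without requests installed.
    import requests

    with requests.get(url, stream=True, timeout=30) as response:
        response.raise_for_status()  # fail early on 4xx/5xx responses
        return save_chunks(response.iter_content(chunk_size=chunk_size), path)
```

    Calling download_video(video_link, r'D:\图片\绿色.mp4') with a direct video link would then write the file piece by piece and return its size in bytes.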

    Scraping all videos on the Ku6 home page

    # Variable names are kept to short letters here (a bit lazy)
    # https://www.ku6.com/index
    # <a class="video-image-warp" target="_blank" href="(.*?)">
    #this.src({type: "video/mp4", src: "(.*?)"})
    #src({type: "video/mp4", src: "(.*?)"})
    import re  # regular expressions
    import requests  # HTTP requests

    new_list = []  # URLs of the individual video pages
    count = 0      # number of videos saved so far
    response = requests.get('https://www.ku6.com/index')
    data = response.text
    # print(data)
    url = re.findall('<a class="video-image-warp" target="_blank" href="(.*?)">', data)
    for a in url:  # type: str
        if a.startswith('/v') or a.startswith('/d'):
            # site-relative link: prepend the domain
            new_list.append(f'https://www.ku6.com{a}')
        elif a.startswith('ht'):
            new_list.append(f"{a.split('垃')[0]}")
    for url_1 in new_list:
        response_1 = requests.get(url_1)
        data_1 = response_1.text
        # the video address sits either in a <source> tag or in the player script
        video = re.findall('<source src="(.*?)" type="video/mp4">', data_1) or re.findall('type: "video/mp4", src: "(.*?)"', data_1)
        if not video:
            continue  # no video link found on this page
        video_1 = video[0]
        x = video_1.split('/')[-1]
        name = f'{x}.mp4'
        video_response = requests.get(video_1)
        video_3 = video_response.content
        with open(rf'D:\图片\{name}', 'wb') as fw:
            fw.write(video_3)
        count += 1
        print(f'{count} videos downloaded so far')
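    The link handling in the loop above can also be factored into small functions that are easy to check against sample HTML without touching the network. This is a rough sketch, not the original author's method: page_links, video_url, and file_name are hypothetical helper names, the regular expressions are the ones from the script, and urljoin is swapped in for the manual prefix checks (so it resolves every matched href, not only those starting with /v, /d, or ht).

```python
import re
from posixpath import basename
from urllib.parse import urljoin, urlsplit

BASE_URL = 'https://www.ku6.com/index'

# Patterns taken from the script above.
LINK_RE = re.compile(r'<a class="video-image-warp" target="_blank" href="(.*?)">')
VIDEO_RES = (
    re.compile(r'<source src="(.*?)" type="video/mp4">'),
    re.compile(r'type: "video/mp4", src: "(.*?)"'),
)


def page_links(index_html, base_url=BASE_URL):
    """Return absolute URLs of the video pages linked from the index HTML."""
    return [urljoin(base_url, href) for href in LINK_RE.findall(index_html)]


def video_url(page_html):
    """Return the first mp4 address found in a video page, or None."""
    for pattern in VIDEO_RES:
        match = pattern.search(page_html)
        if match:
            return match.group(1)
    return None


def file_name(url):
    """Derive a file name from the URL path, ignoring any query string."""
    name = basename(urlsplit(url).path) or 'video'
    return name if name.endswith('.mp4') else f'{name}.mp4'
```

    Taking the file name from the URL path alone avoids a doubled .mp4 extension and keeps query-string characters such as ? out of the name, which Windows does not allow in file names.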
  • Original article: https://www.cnblogs.com/xpptt/p/11799221.html