  • Python web scraping (downloading videos)

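    Scraping videos with a crawler

    Steps

    Step 1: Open the web page that hosts the video.

    Step 2: In the browser developer tools (F12), find the URL the video file is actually served from.

    Step 3: Request that URL and read the response body as binary data (bytes).

    Step 4: Save the bytes to a file.

    Code for the download-and-save steps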

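    import requests

    # Direct link to the video file, copied from the browser developer tools (F12)
    response =  requests.get('https://vd4.bdstatic.com/mda-jcrx64vi5vct2d2u/sc/mda-jcrx64vi5vct2d2u.mp4?auth_key=1557734214-0-0-d6a29a90222c6caf233e8a2a34c2e37a&bcevod_channel=searchbox_feed&pd=bjh&abtest=all')
    video = response.content         # the response body as raw bytes
    with open(r'D:\图片\绿色.mp4','wb') as fw:   # 'wb' opens the file for binary writing
        fw.write(video)           # write the video bytes to the file
        fw.flush()               # flush the buffer (the with block also does this on exit)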
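Reading `response.content` holds the whole video in memory before anything is written. For larger files, a streaming variant may be preferable. The following is a minimal sketch, not the method used in this post: `download_video` is a hypothetical helper, and `session` is assumed to be a `requests.Session` (or any object with a compatible `get()`).

```python
from pathlib import Path


def download_video(session, url, dest, chunk_size=1 << 20):
    """Stream `url` to `dest` in chunks and return the number of bytes written.

    `session` is assumed to be a requests.Session; `download_video` itself is
    a hypothetical helper introduced for illustration.
    """
    dest = Path(dest)
    written = 0
    # stream=True defers downloading the body until it is iterated over.
    with session.get(url, stream=True, timeout=30) as response:
        # Raise on 4xx/5xx so an error page is never saved as a video.
        response.raise_for_status()
        with open(dest, 'wb') as fw:
            for chunk in response.iter_content(chunk_size=chunk_size):
                fw.write(chunk)
                written += len(chunk)
    return written


# Usage (needs network access and the requests package):
#   import requests
#   with requests.Session() as session:
#       download_video(session, video_url, 'video.mp4')
```

Passing the session in keeps the helper easy to test and lets several downloads reuse one connection pool.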
    

    Scraping every video on the Ku6 home page

    # (Short, single-letter variable names are used here out of laziness.)
    # Index page: https://www.ku6.com/index
    # Link to each video page: <a class="video-image-warp" target="_blank" href="(.*?)">
    # Video source in a page's player script: this.src({type: "video/mp4", src: "(.*?)"})
    import re  # regular expressions
    import requests  # HTTP client
    new_list = []  # absolute URLs of the video pages
    count = 0  # number of videos saved so far
    response = requests.get('https://www.ku6.com/index')
    data = response.text
    # print(data)
    url = re.findall('<a class="video-image-warp" target="_blank" href="(.*?)">',data)
    for a in url : #type:str
        if a.startswith('/v') or a.startswith('/d'):
            new_list.append(f'https://www.ku6.com{a}')  # site-relative link: prepend the host
        elif a.startswith('ht'):
            new_list.append(a.split('垃')[0])  # already an absolute URL
    for url_1 in new_list:
        response_1 = requests.get(url_1)
        data_1 = response_1.text
        # The file URL appears either in a <source> tag or in the player script
        video = re.findall('<source src="(.*?)" type="video/mp4">',data_1) or re.findall('type: "video/mp4", src: "(.*?)"',data_1)
        if not video:
            continue  # no video URL found on this page, so skip it
        video_1 = video[0]
        x = video_1.split('/')[-1]  # last URL segment, used as the file name
        name = f'{x}.mp4'
        video_response = requests.get(video_1)
        video_3 = video_response.content
        with open(rf'D:\图片\{name}','wb') as fw:
            fw.write(video_3)
            fw.flush()
        count += 1
        print(f'{count} videos downloaded so far')
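The script above builds absolute URLs with prefix checks and names each file after the last URL segment, which can carry a query string or already end in `.mp4`. As a hedged alternative (a variation, not the author's code), the standard library's `urllib.parse` can do both jobs. `extract_page_urls` and `filename_from_url` are hypothetical helpers, and unlike the original this sketch keeps every matched link rather than filtering by prefix.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlsplit

BASE_URL = 'https://www.ku6.com/index'
# Same anchor pattern as the script above.
LINK_PATTERN = re.compile(
    r'<a class="video-image-warp" target="_blank" href="(.*?)">'
)


def extract_page_urls(html, base_url=BASE_URL):
    """Return absolute video-page URLs, de-duplicated, in document order."""
    # urljoin resolves site-relative links against the base URL and
    # returns links that are already absolute unchanged.
    absolute = (urljoin(base_url, href) for href in LINK_PATTERN.findall(html))
    return list(dict.fromkeys(absolute))


def filename_from_url(video_url, default='video.mp4'):
    """Derive a file name from the URL path, ignoring any query string."""
    name = PurePosixPath(urlsplit(video_url).path).name or default
    return name if name.lower().endswith('.mp4') else f'{name}.mp4'


# Made-up sample markup, for illustration only.
sample = '<a class="video-image-warp" target="_blank" href="/video/detail?id=1">'
print(extract_page_urls(sample))
# ['https://www.ku6.com/video/detail?id=1']
print(filename_from_url('https://example.com/a/clip.mp4?auth_key=1'))
# clip.mp4
```

Dropping the query string also keeps characters such as `?` and `&` out of the file name, which Windows paths do not accept.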
    
  • Original article: https://www.cnblogs.com/pythonywy/p/10857032.html