  • 1. My first Python crawler: scraping a video from a web page

    This is my first time publishing notes on this platform. I want to improve myself by recording what I learn every day.

    I still don't know much about web crawlers, so my understanding only scratches the surface.

    Still, after only three days of learning, I found a way to crawl videos from the web.

    I am very excited: I now know how to download something from the internet to my computer's hard disk. It is only a start, the first step, and I just have to keep moving.

    Step 1: Find a video on a web page and play it online, then press F12 to open the browser's developer tools (the "inspect element" panel).

    Under the Network tab you will see the video being loaded as a series of .ts fragments.

    Click one of the .ts files to see its request URL. That URL is the key.
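    The useful part of that URL is the running number just before ".ts". Below is a minimal sketch of turning one observed fragment URL into the full list of fragment URLs. It assumes the fragments share one prefix and are numbered from 0 without padding, which is what the URL used in Step 2 suggests. The helper names split_fragment_url and fragment_urls are hypothetical, and the example URL is the Step 2 URL with index 0.

```python
import re


def split_fragment_url(observed_url):
    """Split '.../playlist7.ts' into ('.../playlist', 7, '.ts')."""
    match = re.fullmatch(r"(.*?)(\d+)(\.ts)", observed_url)
    if match is None:
        raise ValueError("no numeric index before .ts in %r" % observed_url)
    prefix, index, suffix = match.groups()
    return prefix, int(index), suffix


def fragment_urls(observed_url, count):
    """Build the URLs of fragments 0..count-1 from one observed fragment URL."""
    prefix, _, suffix = split_fragment_url(observed_url)
    return ["%s%d%s" % (prefix, i, suffix) for i in range(count)]


if __name__ == "__main__":
    # Assumed example: the URL pattern from Step 2 with index 0.
    observed = "https://vip.holyshitdo.com/2019/5/8/c2417/playlist0.ts"
    for url in fragment_urls(observed, 3):
        print(url)
```

    If a site numbers its fragments differently (zero-padded, or starting at 1), the formatting in fragment_urls would have to change accordingly.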

    Step 2: Write the Python code, as follows:

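    import os
    from multiprocessing import Pool

    import requests

    # %d is replaced by the fragment index (0, 1, 2, ...)
    URL_TEMPLATE = "https://vip.holyshitdo.com/2019/5/8/c2417/playlist%d.ts"
    # Simulate a browser so the server treats the request like a normal page visit
    HEADERS = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
        'Referer': 'http://91.com',
        'Content-Type': 'multipart/form-data; session_language=cn_CN',
    }


    def demo(i):
        url = URL_TEMPLATE % i
        print(url)
        try:
            r = requests.get(url, headers=HEADERS, timeout=30)
            r.raise_for_status()    # treat HTTP errors (404, 403, ...) as failures
        except requests.RequestException as e:
            print('failed:', url, e)
            return
        # Save the fragment in binary mode. The zero-padded index keeps the
        # files in playback order when they are sorted by name.
        with open('./mp4/playlist{:03d}.ts'.format(i), 'wb') as f:
            f.write(r.content)


    if __name__ == '__main__':              # program entry
        os.makedirs('./mp4', exist_ok=True) # the output directory must exist
        pool = Pool(10)                     # create a pool of 10 worker processes
        for i in range(193):
            pool.apply_async(demo, (i,))    # queue one download per fragment
        pool.close()                        # no more tasks will be submitted
        pool.join()                         # wait for all downloads to finish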

    Step 3: Run the code.

    Step 4: Last but not least, merge the .ts fragments into a single MP4 file.

    Open a terminal (Windows Command Prompt) in the directory where the fragments were saved and run the command "copy /b *.ts newfile.mp4"
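    The same merge can be done in Python, which also works outside Windows and lets the fragments be ordered by their number rather than by file name. This is a minimal sketch, assuming each fragment name ends with its index followed by ".ts". The helper names fragment_index and merge_fragments are hypothetical. Like "copy /b", it only concatenates the bytes and does not re-encode the video.

```python
import os
import re
import shutil


def fragment_index(name):
    """Return the digits just before '.ts' as an int (-1 if there are none)."""
    match = re.search(r"(\d+)\.ts$", name)
    return int(match.group(1)) if match else -1


def merge_fragments(directory, output_path):
    """Concatenate all .ts files in directory, in numeric order, into output_path."""
    names = [n for n in os.listdir(directory) if n.endswith(".ts")]
    names.sort(key=fragment_index)
    with open(output_path, "wb") as out:
        for name in names:
            with open(os.path.join(directory, name), "rb") as part:
                shutil.copyfileobj(part, out)   # stream the bytes, no re-encoding
    return names
```

    For example, merge_fragments('./mp4', 'newfile.mp4') would write the merged file next to the script and return the fragment names in the order they were appended.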

    -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

    THAT IS ALL FOR NOW, TO BE CONTINUED~( ̄▽ ̄~)~

  • Original article: https://www.cnblogs.com/kamisamalz/p/11629105.html