  • My First Web Crawler, C# Edition (Perks: a Programmers-Only Ride)

    I've recently been teaching myself Python, and I came across a Zhihu thread (https://www.zhihu.com/question/20799742) in which someone crawls videos from a "perks" site...

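    So I started following along. The answer's example was written for Python 2.x, though, and I'm learning 3.x. Even after I changed the print statements, a lot of modules still wouldn't import, and as a beginner I had no idea how to fix that.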
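
    For reference, two changes account for many Python 2 examples failing under 3.x: `print` became a function, and `urllib2` was split into `urllib.request` and `urllib.error`. The post doesn't say which modules failed to import, so `urllib2` is only an assumption here, and `fetch_html`, `banner`, and their parameters are hypothetical names used for illustration.

```python
# Python 3 equivalents of two common Python 2 idioms.
# Python 2:  import urllib2
#            print "hello"
import urllib.request  # replaces Python 2's urllib2 (assumed to be a failing import)


def fetch_html(url, encoding="utf-8", timeout=10):
    """Download a page and decode it to text (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode(encoding, errors="replace")


def banner(name):
    """Build a status message; in Python 3 it is passed to print() as an argument."""
    return "Fetching {0}".format(name)


if __name__ == "__main__":
    # Python 2 would have written: print banner("page 1")
    print(banner("page 1"))
```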

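    So, in order to study (read: hop on the ride), I rewrote one piece of that code in C#. Even with some delays added, it filled more than 3 GB of disk in a short while... Fellow students, look after your health.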

    The code is below. When I first posted it, I deliberately left a few bugs in to keep non-programmers from hopping on.

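        using System;
        using System.IO;
        using System.Net;
        using System.Text;
        using System.Text.RegularExpressions;
        using System.Threading;

        class Program
        {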
            static void Main(string[] args)
            {
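                // Page URL template; {0} is replaced by the page index (the host is masked in the post).
                var baseString = "http://w*w.46ek.c*m/view/{0}.html";
                // Matches the first .mp4 link on the page; dots are escaped so they match literally.
                Regex regex = new Regex(@"http://m4\.26ts\.com/[.0-9a-zA-Z-]*\.mp4");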
                WebClient wc = new WebClient();
    
    
                uint startIndex = ReadStartIndex();
                uint loop = ReadLoopLen();
    
                for (uint i = 0; i < loop; i++)
                {
                    var subUrl = string.Format(baseString, startIndex + i);
                    WebRequest wReq = WebRequest.Create(subUrl);
    
                    try
                    {
                        // Dispose the response, stream, and reader together when the block exits.
                        using (WebResponse wResp = wReq.GetResponse())
                        using (Stream respStream = wResp.GetResponseStream())
                        using (StreamReader reader = new StreamReader(respStream, Encoding.GetEncoding("GB18030")))
                        {
                            var htmlString = reader.ReadToEnd();
    
                            Match m = regex.Match(htmlString);
                            if (m.Success)
                            {
                                DownloadFile(wc, m.Value, string.Format("{0}.mp4", startIndex + i));
                            }
                        }
                    }
                    catch (Exception exc)
                    {
                        Console.WriteLine("Error : {0}", exc.Message);
                    }
    
                    Thread.Sleep(5);
                }
                
            }
    
            private static uint ReadStartIndex()
            {
                while (true)
                {
                    Console.Write("Set start index :");
    
                    string line = Console.ReadLine();
    
                    uint index = 0;
    
                    if (UInt32.TryParse(line, out index))
                    {
                        Console.WriteLine("Start index set : " + index);
                        return index;
                    }
    
                    Thread.Sleep(500);
                }
            }
    
            private static uint ReadLoopLen()
            {
                while (true)
                {
                    Console.Write("Set loop len :");
    
                    string line = Console.ReadLine();
    
                    uint index = 0;
    
                    if (UInt32.TryParse(line, out index))
                    {
                        Console.WriteLine("Loop len set : " + index);
                        return index;
                    }
    
                    Thread.Sleep(500);
                }
            }
    
            private static void DownloadFile(WebClient wc, string url, string localname)
            {
                Console.WriteLine("Downloading file {0} to {1}", url, localname);
    
                wc.DownloadFile(url, localname);
    
                Console.WriteLine("File {0} download completed!", localname);
            }
        }
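
    The heart of the listing is the extraction step: read a page's HTML and pull out the first `.mp4` link with a regular expression. Since the post began as a port from Python, here is a minimal Python 3 sketch of that one step. The host name is copied from the listing; `VIDEO_URL`, `find_video_url`, and the sample HTML are hypothetical, and the dots are escaped so that `.` matches only a literal dot.

```python
import re

# Host copied from the listing; the pattern name is illustrative.
# Dots are escaped so they match a literal "." rather than any character.
VIDEO_URL = re.compile(r"http://m4\.26ts\.com/[.0-9a-zA-Z-]*\.mp4")


def find_video_url(html):
    """Return the first matching .mp4 URL in html, or None if there is none."""
    match = VIDEO_URL.search(html)
    return match.group(0) if match else None


if __name__ == "__main__":
    # Made-up page fragment, used only to exercise the function.
    sample = '<video src="http://m4.26ts.com/clip-01.mp4"></video>'
    print(find_video_url(sample))
    print(find_video_url("<p>no video here</p>"))
```

    Keeping the extraction in a pure function like this makes it easy to check against saved HTML before any downloading is wired up.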
  • Original article: https://www.cnblogs.com/GuoRL/p/8328329.html