    Reading a Stream with HttpWebRequest and Downloading Files with WebClient

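    Yesterday, while writing a scraper, I hit a pitfall: when reading a network stream, I used stream.Length to get the length of the stream, and the call threw an exception. The documentation explains why: some streams, such as a non-seekable network stream, cannot report the length of their data, so the value is not directly available. Anyone who works with streams regularly would avoid this mistake. In practice, a simple do-while loop is enough to read the data. The code is as follows: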

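    byte[] buffer = new byte[1024]; // allocate once and reuse across reads
    int i;
    do
    {
        // Read may return fewer bytes than requested; 0 means end of stream
        i = stream.Read(buffer, 0, buffer.Length);
        fs.Write(buffer, 0, i);
    } while (i > 0);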

    Note that the while condition must be i > 0 and cannot be written as i >= 1024. The reason is given in this passage from MSDN:

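    Read returns 0 only when there is no more data in the stream and no more is expected (such as a closed socket or end of file). An implementation is free to return fewer bytes than requested even if the end of the stream has not been reached.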
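
    To make the difference between the two loop conditions concrete, here is a minimal sketch. TrickleStream and ReadLoopDemo.ReadAll are hypothetical names invented for this illustration, not part of the original code. TrickleStream imitates a network stream: it is not seekable, so Length throws NotSupportedException, and each Read hands back at most 3 bytes no matter how many are requested.

    ```csharp
    using System;
    using System.IO;

    // Hypothetical stream for illustration: wraps a byte array, is not seekable,
    // and never returns more than 3 bytes per Read call.
    public sealed class TrickleStream : Stream
    {
        private readonly byte[] _data;
        private int _pos;

        public TrickleStream(byte[] data) { _data = data; }

        public override bool CanRead { get { return true; } }
        public override bool CanSeek { get { return false; } }
        public override bool CanWrite { get { return false; } }
        public override long Length { get { throw new NotSupportedException(); } }
        public override long Position
        {
            get { throw new NotSupportedException(); }
            set { throw new NotSupportedException(); }
        }

        public override int Read(byte[] buffer, int offset, int count)
        {
            int n = Math.Min(Math.Min(count, 3), _data.Length - _pos);
            Array.Copy(_data, _pos, buffer, offset, n);
            _pos += n;
            return n; // 0 only once all data has been handed out
        }

        public override void Flush() { }
        public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
        public override void SetLength(long value) { throw new NotSupportedException(); }
        public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
    }

    public static class ReadLoopDemo
    {
        // Copies everything from source by reading until Read returns 0.
        public static byte[] ReadAll(Stream source)
        {
            using (MemoryStream sink = new MemoryStream())
            {
                byte[] buffer = new byte[1024];
                int read;
                while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
                {
                    sink.Write(buffer, 0, read);
                }
                return sink.ToArray();
            }
        }
    }
    ```

    Calling ReadLoopDemo.ReadAll(new TrickleStream(new byte[10])) collects all 10 bytes even though every Read returns fewer than 1024. A do-while loop that continued only while i >= 1024 would stop after the first 3 bytes.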

    Below are short examples of fetching data with HttpWebRequest and with WebClient:

    HttpWebRequest

    // Requires: using System; using System.IO; using System.Net;

    /// <summary>
    /// Downloads a URL to a local file with HttpWebRequest.
    /// </summary>
    /// <param name="url">URL to fetch</param>
    /// <param name="filePath">File name to save to</param>
    /// <param name="oldurl">Referring URL, sent as the Referer header</param>
    /// <returns>true on success, false if any exception was thrown</returns>
    public static bool HttpDown(string url, string filePath, string oldurl)
    {
        try
        {
            HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);

            req.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8";
            req.Referer = oldurl;
            req.UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.154 Safari/537.36";
            req.ContentType = "application/octet-stream";

            // using blocks close the response, stream, and file even if a read fails
            using (HttpWebResponse response = (HttpWebResponse)req.GetResponse())
            using (Stream stream = response.GetResponseStream())
            using (FileStream fs = File.Create(filePath))
            {
                // response.ContentLength is -1 when the server sends no Content-Length,
                // so read until Read returns 0 instead of relying on a length
                byte[] buffer = new byte[1024];
                int i;
                do
                {
                    i = stream.Read(buffer, 0, buffer.Length);
                    fs.Write(buffer, 0, i);
                } while (i > 0);
            }

            return true;
        }
        catch (Exception)
        {
            return false;
        }
    }
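
    As an aside, on .NET Framework 4.0 and later the same read-until-zero loop is available as Stream.CopyTo, which is a different technique from the manual loop used in this post. The sketch below assumes that runtime; StreamSaver and SaveStream are hypothetical names used only for illustration.

    ```csharp
    using System.IO;

    public static class StreamSaver
    {
        // Hypothetical helper: writes every byte of source to filePath.
        public static void SaveStream(Stream source, string filePath)
        {
            using (FileStream fs = File.Create(filePath))
            {
                // CopyTo keeps reading until the source reports end of stream
                source.CopyTo(fs);
            }
        }
    }
    ```

    With this helper, the body of the using block could shrink to a single call that passes the response stream and the target path. The manual loop is still the better fit when per-chunk work such as progress reporting is needed.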

    WebClient

    // Requires: using System; using System.Net;

    public static bool Down(string url, string desc, string oldurl)
    {
        try
        {
            // WebClient is IDisposable, so release it when the download finishes
            using (WebClient wc = new WebClient())
            {
                wc.Headers.Add(HttpRequestHeader.Accept, "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
                wc.Headers.Add(HttpRequestHeader.Referer, oldurl);
                wc.Headers.Add(HttpRequestHeader.UserAgent, "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.154 Safari/537.36");
                wc.Headers.Add(HttpRequestHeader.ContentType, "application/octet-stream");

                // desc is the local path the file is saved to
                wc.DownloadFile(new Uri(url), desc);
            }

            Console.WriteLine(url);
            Console.WriteLine("    " + desc + "   yes!");
            return true;
        }
        catch (Exception)
        {
            return false;
        }
    }
  • Original article: https://www.cnblogs.com/zxtceq/p/7154396.html