  • First post: doing some simple things with VB.NET (1)

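    Most of the web-crawler articles you can find online use Python, and a few use C# (whispering: so VB.NET can do crawlers too). This post covers the first step: fetching a web page.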

    Before using the code, import the following:

    Imports System.IO, System.IO.Compression, System.Text, System.Net

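    Before writing the program, open a browser (I use Chrome), visit any web page, press F12, and look at the request headers. Copy the User-Agent for later; you can also copy Accept, Cookie, and so on (they are not needed in this post).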
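    As a minimal sketch of how copied header values could be attached to a request: the module name HeaderSketch, the function BuildRequest, and its parameters are hypothetical names for illustration, not part of the original code. User-Agent and Accept are set through their dedicated properties, while Cookie is added as a raw header.

```vbnet
Imports System
Imports System.Net

Module HeaderSketch
    ' Hypothetical helper: builds a request carrying header values
    ' copied from the browser's F12 panel. Pass Nothing to skip the cookie.
    Function BuildRequest(url As String, userAgent As String, cookie As String) As HttpWebRequest
        Dim req As HttpWebRequest = WebRequest.CreateHttp(url)
        req.UserAgent = userAgent   ' User-Agent must be set via its property
        req.Accept = "*/*"          ' so must Accept
        If Not String.IsNullOrEmpty(cookie) Then
            ' Cookie is not a restricted header, so it can be added directly.
            req.Headers.Add(HttpRequestHeader.Cookie, cookie)
        End If
        Return req
    End Function
End Module
```

    Nothing is sent until GetResponse is called on the returned request, so the headers can still be adjusted after this helper returns.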

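    Create the request with HttpWebRequest and receive the response as an HttpWebResponse. Then deal with compression (this is what Imports System.IO.Compression is for): the server may send the body gzip- or deflate-compressed, which it reports in the response's Content-Encoding.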
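    This post decompresses the body by hand, but HttpWebRequest can also do it for you. The following is a sketch of that alternative, not the method used in this post; the names AutoDecompressSketch and FetchWithAutoDecompression are made up for illustration, and the page is assumed to be UTF-8.

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Text

Module AutoDecompressSketch
    ' Hypothetical helper: lets HttpWebRequest advertise and undo
    ' gzip/deflate compression instead of wrapping the stream manually.
    Function FetchWithAutoDecompression(url As String) As String
        Dim req As HttpWebRequest = WebRequest.CreateHttp(url)
        req.AutomaticDecompression = DecompressionMethods.GZip Or DecompressionMethods.Deflate
        Using resp As WebResponse = req.GetResponse()
            ' The stream returned here is already decompressed.
            Using sr As New StreamReader(resp.GetResponseStream(), Encoding.UTF8)
                Return sr.ReadToEnd()
            End Using
        End Using
    End Function
End Module
```

    With AutomaticDecompression set, the Accept-Encoding header does not need to be added by hand. The manual approach is still worth knowing because it shows what is actually happening to the stream.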

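    Finally, once you have the real (decompressed) response stream, read it with a StreamReader and you have the page's HTML source. This works as-is for pages encoded in UTF-8. For pages encoded in GB2312 or GBK, one change is needed: pass Encoding.GetEncoding("gbk") as the second argument to the StreamReader.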
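    To make the encoding switch concrete, here is a small sketch that reads a stream with a named encoding. The names GbkSketch and ReadAllText are hypothetical, and an in-memory stream stands in for a real response body.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module GbkSketch
    ' Hypothetical helper: reads an already-decompressed stream
    ' using the encoding with the given name (e.g. "utf-8" or "gbk").
    Function ReadAllText(body As Stream, encodingName As String) As String
        ' Note: on .NET Core / .NET 5+, "gbk" is only available after
        ' registering CodePagesEncodingProvider; .NET Framework has it built in.
        Dim enc As Encoding = Encoding.GetEncoding(encodingName)
        Using sr As New StreamReader(body, enc)
            Return sr.ReadToEnd()
        End Using
    End Function

    Sub Main()
        ' Simulate a GBK-encoded response body with an in-memory stream.
        Dim bytes As Byte() = Encoding.GetEncoding("gbk").GetBytes("你好, VB.NET")
        Using ms As New MemoryStream(bytes)
            Console.WriteLine(ReadAllText(ms, "gbk"))
        End Using
    End Sub
End Module
```

    Reading the same bytes with Encoding.UTF8 would garble the Chinese characters, which is the symptom to look for when a page's encoding has been guessed wrong.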

    Here is the full code:

    Public Function GetHttpContent(url As String) As String
        Try
            Dim req As HttpWebRequest = WebRequest.CreateHttp(url)
            With req
                .UserAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
                .Accept = "*/*"
                .Method = "GET"
                .Timeout = 300000 ' milliseconds (5 minutes)
                ' Tell the server we can handle compressed bodies.
                .Headers.Add("Accept-Encoding", "gzip, deflate")
            End With
            ' Using disposes the response (and its connection) when we are done.
            Using resp As HttpWebResponse = DirectCast(req.GetResponse(), HttpWebResponse)
                Dim body As Stream = resp.GetResponseStream()
                ' Wrap the raw stream in a decompressor if the server compressed the body.
                Select Case If(resp.ContentEncoding, "").ToLowerInvariant()
                    Case "gzip"
                        body = New GZipStream(body, CompressionMode.Decompress)
                    Case "deflate"
                        body = New DeflateStream(body, CompressionMode.Decompress)
                End Select
                ' For GB2312/GBK pages, pass Encoding.GetEncoding("gbk") instead of Encoding.UTF8.
                Using sr As New StreamReader(body, Encoding.UTF8)
                    Return sr.ReadToEnd()
                End Using
            End Using
        Catch ex As Exception
            ' Any failure (bad URL, timeout, HTTP error) yields an empty string.
            Return ""
        End Try
    End Function

    (My skills are limited, so if you spot anything wrong or incomplete in the code, please point it out.)

  • Original post: https://www.cnblogs.com/woshilxcdexuesheng/p/11414764.html