  • First post: doing something simple with VB.NET (1)

    Most web-scraper articles you can find online use Python, and a few use C# (psst: so VB.NET can do it too). This post covers the first step: fetching a web page.

    Before using the code, add the following imports:

    Imports System.IO, System.IO.Compression, System.Text, System.Net

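    Before writing any code, open a browser (I use Chrome), visit any page, press F12 and look at the request headers. Copy the User-Agent string for later; you can also copy Accept, Cookie and so on (they are not needed in this post).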

    Create the request with HttpWebRequest and get the response as an HttpWebResponse. Then deal with compression (that is what Imports System.IO.Compression is for).
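
    As an aside, HttpWebRequest can also handle compression by itself through its AutomaticDecompression property. The sketch below is an alternative to decompressing by hand, not the approach this post uses; the module and function names (AutoDecompressSketch, FetchUtf8) are made up for illustration, and the page is assumed to be UTF-8.

```vbnet
Imports System.IO, System.Net, System.Text

Module AutoDecompressSketch
    ' Hypothetical variant of the fetch step: the framework decompresses for us.
    Function FetchUtf8(url As String) As String
        Dim req As HttpWebRequest = WebRequest.CreateHttp(url)
        ' Sends Accept-Encoding and transparently decodes gzip/deflate bodies.
        req.AutomaticDecompression = DecompressionMethods.GZip Or DecompressionMethods.Deflate
        Using resp As WebResponse = req.GetResponse()
            ' Assumption: the page is UTF-8 encoded.
            Using sr As New StreamReader(resp.GetResponseStream(), Encoding.UTF8)
                Return sr.ReadToEnd()
            End Using
        End Using
    End Function
End Module
```

    With this property set there is no need to inspect ContentEncoding or create a GZipStream or DeflateStream yourself.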

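    Finally you have the real (decompressed) response stream; read it with a StreamReader and you have the page's HTML source. This approach handles pages encoded in UTF-8. For pages encoded in GB2312 or GBK, one change is needed: pass Encoding.GetEncoding("gbk") as the StreamReader's second argument.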
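
    The GBK change can be sketched as a small helper. ReadAllText and its module are hypothetical names, not part of the original code. On .NET Framework the "gbk" name resolves out of the box; on .NET Core and later, the code-page encodings have to be registered first with Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).

```vbnet
Imports System.IO, System.Text

Module EncodingSketch
    ' Hypothetical helper: read a whole stream as text in the named encoding.
    ' encodingName is e.g. "utf-8" or "gbk".
    Function ReadAllText(body As Stream, encodingName As String) As String
        Using sr As New StreamReader(body, Encoding.GetEncoding(encodingName))
            Return sr.ReadToEnd()
        End Using
    End Function
End Module
```

    For a GBK page this would be called as ReadAllText(resp.GetResponseStream(), "gbk"), or with the decompression stream in place of the raw response stream.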

    Here is the code:

    Public Function GetHttpContent(url As String) As String
        Try
            Dim req As HttpWebRequest = WebRequest.CreateHttp(url)
            With req
                .UserAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
                .Accept = "*/*"
                .Method = "GET"
                .Timeout = 300000 ' milliseconds (5 minutes)
                .Headers.Add("Accept-Encoding", "gzip, deflate")
            End With
            ' Dispose the response when done so the connection is released.
            Using resp As HttpWebResponse = CType(req.GetResponse(), HttpWebResponse)
                Dim body As Stream = resp.GetResponseStream()
                ' Wrap the raw stream in a decompressor if the server compressed the body.
                Select Case If(resp.ContentEncoding, "").ToLowerInvariant()
                    Case "gzip"
                        body = New GZipStream(body, CompressionMode.Decompress)
                    Case "deflate"
                        body = New DeflateStream(body, CompressionMode.Decompress)
                End Select
                ' For GB2312/GBK pages, pass Encoding.GetEncoding("gbk") instead.
                Using sr As New StreamReader(body, Encoding.UTF8)
                    Return sr.ReadToEnd()
                End Using
            End Using
        Catch ex As Exception
            ' Any failure (bad URL, timeout, HTTP error) yields an empty string.
            Return ""
        End Try
    End Function
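
    A minimal usage sketch, assuming GetHttpContent has been pasted into the same module together with the imports listed earlier. The Demo module and the example.com URL are placeholders, not from the original post.

```vbnet
Module Demo
    ' Assumption: GetHttpContent from this post is pasted into this module,
    ' with the Imports line from earlier at the top of the file.
    Sub Main()
        ' Placeholder URL; replace it with the page you want to fetch.
        Dim html As String = GetHttpContent("https://example.com/")
        If html = "" Then
            ' The function swallows exceptions, so "" means failure or an empty body.
            Console.WriteLine("Request failed or returned an empty body.")
        Else
            Console.WriteLine("Fetched {0} characters.", html.Length)
        End If
    End Sub
End Module
```

    Because the function returns an empty string on any exception, the caller cannot tell a timeout from a bad URL; logging ex.Message in the Catch block is one way to make failures easier to diagnose.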

    (My skills are limited; if the code has shortcomings, please point them out.)

  • Original article: https://www.cnblogs.com/woshilxcdexuesheng/p/11414764.html