zoukankan      html  css  js  c++  java
  • VBA学习资料分享-6

    从网上抓取数据到EXCEL中是VBA的一个常用之处,今天分享下VBA网抓的一些套路,主要有以下几种:

    第一种:msxml2.xmlhttp/Microsoft.XMLHTTP/WinHttp.WinHttpRequest.5.1

    With CreateObject("msxml2.xmlhttp")
    'With CreateObject("Microsoft.XMLHTTP")
    'With CreateObject("WinHttp.WinHttpRequest.5.1")
    .Open "GET", "http://xinpi.cs.com.cn/new/", False
    .Send
    Do Until .ReadyState = 4
    DoEvents
    Loop
    s = .ResponseText
    End With

    第二种:internetexplorer.application

    With CreateObject("internetexplorer.application")
    .Visible = False
    .Navigate "http://xinpi.cs.com.cn/new/"
    Do Until .ReadyState = 4
    DoEvents
    Loop
    s = CStr(.Document.All.tags("html")(0).outerHTML)
    End With

    第三种:webbrowser控件

    Sub d1()
        Dim dmt As HTMLDocument
        Dim htMent As HTMLCommentElement    'As IHTMLElementCollection
        Dim i As Integer
        Set dmt = UserForm1.WebBrowser1.Document
        Application.ScreenUpdating = False
        Dim Quote As IHTMLElement
        On Error Resume Next
        For i = 0 To dmt.all.Length - 1
            Set htMent = dmt.all(i)
            Vcels1 htMent, i
        Next i
    End Sub
    Sub Vcels1(htMent As HTMLCommentElement, i)
        On Error Resume Next
        With Sheets(1)
            .Cells(i + 2, "A") = htMent.tagName
            .Cells(i + 2, "B") = TypeName(htMent)
            .Cells(i + 2, "C") = htMent.ID
            .Cells(i + 2, "D") = htMent.Name
            .Cells(i + 2, "E") = htMent.Value
            .Cells(i + 2, "F") = htMent.Text
            .Cells(i + 2, "G") = htMent.innerText
            .Cells(i + 2, "H") = htMent.outerText
            .Cells(i + 2, "I") = htMent.nameProp
            .Cells(i + 2, "J") = htMent.tagUrn
            .Cells(i + 2, "K") = htMent.sourceIndex
            .Cells(i + 2, "L") = htMent.href
            .Columns("A:L").WARPTEXT = False
        End With
    End Sub

    要先在窗体中插入webbrowser控件,还要引用Microsoft HTML Object Library,窗体部分的代码如下:

    Private Sub CommandButton1_Click()
    WebBrowser1.Navigate TextBox1.Value
    End Sub
    
    Private Sub CommandButton2_Click()
    Sheets(1).[a2:zz1000000].ClearContents
    Call d1
    Sheet1.Columns("E:H").WrapText = False  
    End Sub
    
    Private Sub UserForm_Initialize()
    TextBox1.Value = "www.baidu.com"
    Sheets(1).[a2:zz1000000].ClearContents
    End Sub

    第四种:QueryTable

    With ActiveSheet.QueryTables.Add("url;http://data.bank.hexun.com/lccp/jrxp.aspx", Range("a1"))
    .WebFormatting = xlWebFormattingNone '不包含格式'
    .WebSelectionType = xlSpecifiedTables '指定table模式'
    .WebTables = "2" '第2张table'
    .Refresh False
    End With

    拿到了网页源代码后,可以利用VBA自带的Split和Replace函数或者正则表达式将需要的部分截取出来,写进数组,或者通过javascript(CreateObject("scriptcontrol"))进行处理,这就根据各种情况进行自定义开发了。这里提供一个用ie对象网抓的稍微复杂的例子:AFP

  • 相关阅读:
    TCP协议的三次握手、四次挥手
    .NET Framework 3.5 安装
    grep命令总结
    线性回归
    K-Mean聚类算法
    Logistic回归
    朴素贝叶斯
    Decision Tree
    KNN
    GCC for Win32开发环境介绍
  • 原文地址:https://www.cnblogs.com/JTCLASSROOM/p/10893898.html
Copyright © 2011-2022 走看看