从网上抓取数据到EXCEL中是VBA的一个常用之处,今天分享下VBA网抓的一些套路,主要有以下几种:
第一种:msxml2.xmlhttp/Microsoft.XMLHTTP/WinHttp.WinHttpRequest.5.1
With CreateObject("msxml2.xmlhttp") 'With CreateObject("Microsoft.XMLHTTP") 'With CreateObject("WinHttp.WinHttpRequest.5.1") .Open "GET", "http://xinpi.cs.com.cn/new/", False .Send Do Until .ReadyState = 4 DoEvents Loop s = .ResponseText End With
第二种:internetexplorer.application
With CreateObject("internetexplorer.application") .Visible = False .Navigate "http://xinpi.cs.com.cn/new/" Do Until .ReadyState = 4 DoEvents Loop s = CStr(.Document.All.tags("html")(0).outerHTML) End With
第三种:webbrowser控件
Sub d1() Dim dmt As HTMLDocument Dim htMent As HTMLCommentElement 'As IHTMLElementCollection Dim i As Integer Set dmt = UserForm1.WebBrowser1.Document Application.ScreenUpdating = False Dim Quote As IHTMLElement On Error Resume Next For i = 0 To dmt.all.Length - 1 Set htMent = dmt.all(i) Vcels1 htMent, i Next i End Sub Sub Vcels1(htMent As HTMLCommentElement, i) On Error Resume Next With Sheets(1) .Cells(i + 2, "A") = htMent.tagName .Cells(i + 2, "B") = TypeName(htMent) .Cells(i + 2, "C") = htMent.ID .Cells(i + 2, "D") = htMent.Name .Cells(i + 2, "E") = htMent.Value .Cells(i + 2, "F") = htMent.Text .Cells(i + 2, "G") = htMent.innerText .Cells(i + 2, "H") = htMent.outerText .Cells(i + 2, "I") = htMent.nameProp .Cells(i + 2, "J") = htMent.tagUrn .Cells(i + 2, "K") = htMent.sourceIndex .Cells(i + 2, "L") = htMent.href .Columns("A:L").WARPTEXT = False End With End Sub
要先在窗体中插入webbrowser控件,还要引用Microsoft HTML Object Library,窗体部分的代码如下:
Private Sub CommandButton1_Click() WebBrowser1.Navigate TextBox1.Value End Sub Private Sub CommandButton2_Click() Sheets(1).[a2:zz1000000].ClearContents Call d1 Sheet1.Columns("E:H").WrapText = False End Sub Private Sub UserForm_Initialize() TextBox1.Value = "www.baidu.com" Sheets(1).[a2:zz1000000].ClearContents End Sub
第四种:QueryTable
With ActiveSheet.QueryTables.Add("url;http://data.bank.hexun.com/lccp/jrxp.aspx", Range("a1")) .WebFormatting = xlWebFormattingNone '不包含格式' .WebSelectionType = xlSpecifiedTables '指定table模式' .WebTables = "2" '第2张table' .Refresh False End With
拿到了网页源代码后,可以利用VBA自带的Split和Replace函数或者正则表达式将需要的部分截取出来,写进数组,或者通过javascript(CreateObject("scriptcontrol"))进行处理,这就根据各种情况进行自定义开发了。这里提供一个用ie对象网抓的稍微复杂的例子:AFP