  • Scraping a Tmall detail page with PHP curl

    The code is as follows:

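    $ch = curl_init();
    // Request the s.click.taobao.com link
    curl_setopt($ch, CURLOPT_URL, 'http://s.click.taobao.com/t?e=m%3D2%26s%3DItfnIoWePBscQipKwQzePOeEDrYVVa64LKpWJ%2Bin0XJRAdhuF14FMco7venWsqMa5x%2BIUlGKNpXfihkA92r7Zcnjyd38oaEmvvt5KfsX9OP1aKQAlGlgSeMqBIqCftrB');
    curl_setopt($ch, CURLOPT_HEADER, 1);         // include the response headers in the output
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // follow Location redirects
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the response instead of printing it
    $res = curl_exec($ch);
    curl_close($ch);
    // Capture the first Location header; \r? keeps the CR of the CRLF line ending out of the match
    preg_match('/^Location: (?P<location>.*?)\r?$/m', $res, $match);
    var_dump($match);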

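    CURLOPT_HEADER includes the response headers in the returned output.

    CURLOPT_FOLLOWLOCATION makes curl keep following redirects and fetch each new page.

    CURLOPT_RETURNTRANSFER controls whether the response is printed directly to the page or returned as a string that can be assigned to a variable.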
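
    Because the request above follows redirects with CURLOPT_HEADER enabled, $res holds one header block per hop followed by the final body, and a single preg_match only sees the first Location. A minimal sketch of collecting every hop is below. split_response and extract_locations are hypothetical helper names, the sample response is made up, and in real use the header size is assumed to come from curl_getinfo($ch, CURLINFO_HEADER_SIZE), read before the handle is closed.

    ```php
    <?php
    // Hypothetical helpers, not part of the original post.

    // Split a raw response (all header blocks + body) at the given header size.
    function split_response(string $raw, int $headerSize): array
    {
        return [substr($raw, 0, $headerSize), substr($raw, $headerSize)];
    }

    // Return every Location value found in the header text, in hop order.
    function extract_locations(string $headers): array
    {
        // m: ^ and $ match per line; i: header names are case-insensitive;
        // \r? keeps the CR of the CRLF line ending out of the capture.
        preg_match_all('/^Location:[ \t]*(.*?)\r?$/mi', $headers, $m);
        return $m[1];
    }

    // Offline demo with a made-up two-hop response.
    $head = "HTTP/1.1 302 Found\r\nLocation: http://example.com/a\r\n\r\n"
          . "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n";
    $raw = $head . "Location: this line is body text, not a header\n";
    $headerSize = strlen($head); // stands in for CURLINFO_HEADER_SIZE
    [$headers, $body] = split_response($raw, $headerSize);
    print_r(extract_locations($headers));
    ```

    Splitting first keeps a body line that happens to start with "Location:" from being mistaken for a header.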

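    When I scrape the Tmall detail page http://detail.tmall.com/item.htm?id=15670523848 with curl, the request often goes through 9 redirects before the page comes back, which is hard to believe: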

    HTTP/1.1 302 Found
    Server: Tengine
    Date: Fri, 29 Nov 2013 04:16:49 GMT
    Content-Type: text/html
    Content-Length: 260
    Connection: keep-alive
    at_bucketid: sbucket_-1
    X-Bucket-Id: -1
    Location: http://jump.taobao.com/jump?target=http%3a%2f%2fdetail.tmall.com%2fitem.htm%3fid%3d15670523848%26tbpm%3d1
    Cache-Control: 
    
    HTTP/1.1 302 Found
    Date: Fri, 29 Nov 2013 04:16:49 GMT
    Content-Type: text/html
    Content-Length: 260
    Connection: close
    Set-Cookie: _tb_token_=P0MPHCP5TfrL;domain=.taobao.com;Path=/;HttpOnly
    Set-Cookie: cookie2=ed9f56e828c5d24c4a7d656bc23631db;domain=.taobao.com;Path=/;HttpOnly
    Set-Cookie: t=5f0679cd08a7143456da77263771b8a9;domain=.taobao.com;Expires=Thu, 27-Feb-2014 04:16:49 GMT;Path=/
    P3P: CP='CURa ADMa DEVa PSAo PSDo OUR BUS UNI PUR INT DEM STA PRE COM NAV OTC NOI DSP COR'
    Location: http://pass.tmall.com/add?_tb_token_=P0MPHCP5TfrL&cookie2=ed9f56e828c5d24c4a7d656bc23631db&t=5f0679cd08a7143456da77263771b8a9&target=http%3a%2f%2fdetail.tmall.com%2fitem.htm%3fid%3d15670523848%26tbpm%3d1&pacc=emBUzSxUwN3zo8xNNtW6PQ==&opi=27.189.36.232&tmsc=1385698609459713
    
    HTTP/1.1 302 Found
    Date: Fri, 29 Nov 2013 04:16:49 GMT
    Content-Type: text/html
    Content-Length: 260
    Connection: close
    P3P: CP='CURa ADMa DEVa PSAo PSDo OUR BUS UNI PUR INT DEM STA PRE COM NAV OTC NOI DSP COR'
    Set-Cookie: _tb_token_=P0MPHCP5TfrL;domain=.tmall.com;Path=/
    Set-Cookie: cookie2=ed9f56e828c5d24c4a7d656bc23631db;domain=.tmall.com;Path=/
    Set-Cookie: t=5f0679cd08a7143456da77263771b8a9;domain=.tmall.com;Path=/
    Location: http://detail.tmall.com/item.htm?id=15670523848&tbpm=1
    
    HTTP/1.1 302 Found
    Server: Tengine
    Date: Fri, 29 Nov 2013 04:16:49 GMT
    Content-Type: text/html
    Content-Length: 260
    Connection: keep-alive
    at_bucketid: sbucket_-1
    X-Bucket-Id: -1
    Location: http://jump.taobao.com/jump?target=http%3a%2f%2fdetail.tmall.com%2fitem.htm%3fid%3d15670523848%26tbpm%3d2
    Cache-Control: 
    
    HTTP/1.1 302 Found
    Date: Fri, 29 Nov 2013 04:16:49 GMT
    Content-Type: text/html
    Content-Length: 260
    Connection: close
    Set-Cookie: _tb_token_=dmG3YeXiTDZl;domain=.taobao.com;Path=/;HttpOnly
    Set-Cookie: cookie2=d4c7645aa3ac3e1e7e160e6f172cf82c;domain=.taobao.com;Path=/;HttpOnly
    Set-Cookie: t=6d359ffa66dfa0c2cb8cd32d455718de;domain=.taobao.com;Expires=Thu, 27-Feb-2014 04:16:49 GMT;Path=/
    P3P: CP='CURa ADMa DEVa PSAo PSDo OUR BUS UNI PUR INT DEM STA PRE COM NAV OTC NOI DSP COR'
    Location: http://pass.tmall.com/add?_tb_token_=dmG3YeXiTDZl&cookie2=d4c7645aa3ac3e1e7e160e6f172cf82c&t=6d359ffa66dfa0c2cb8cd32d455718de&target=http%3a%2f%2fdetail.tmall.com%2fitem.htm%3fid%3d15670523848%26tbpm%3d2&pacc=ZBXxk-2FEQEv_91OVA1_mg==&opi=27.189.36.232&tmsc=1385698609736322
    
    HTTP/1.1 302 Found
    Date: Fri, 29 Nov 2013 04:16:49 GMT
    Content-Type: text/html
    Content-Length: 260
    Connection: close
    P3P: CP='CURa ADMa DEVa PSAo PSDo OUR BUS UNI PUR INT DEM STA PRE COM NAV OTC NOI DSP COR'
    Set-Cookie: _tb_token_=dmG3YeXiTDZl;domain=.tmall.com;Path=/
    Set-Cookie: cookie2=d4c7645aa3ac3e1e7e160e6f172cf82c;domain=.tmall.com;Path=/
    Set-Cookie: t=6d359ffa66dfa0c2cb8cd32d455718de;domain=.tmall.com;Path=/
    Location: http://detail.tmall.com/item.htm?id=15670523848&tbpm=2
    
    HTTP/1.1 302 Found
    Server: Tengine
    Date: Fri, 29 Nov 2013 04:16:49 GMT
    Content-Type: text/html
    Content-Length: 260
    Connection: keep-alive
    at_bucketid: sbucket_-1
    X-Bucket-Id: -1
    Location: http://jump.taobao.com/jump?target=http%3a%2f%2fdetail.tmall.com%2fitem.htm%3fid%3d15670523848%26tbpm%3d3
    Cache-Control: 
    
    HTTP/1.1 302 Found
    Date: Fri, 29 Nov 2013 04:16:49 GMT
    Content-Type: text/html
    Content-Length: 260
    Connection: close
    Set-Cookie: _tb_token_=kCsYFWOg1JIp;domain=.taobao.com;Path=/;HttpOnly
    Set-Cookie: cookie2=b5f8f3695f410dde27a42d6d251933ff;domain=.taobao.com;Path=/;HttpOnly
    Set-Cookie: t=2290940aff64002f214e1a12480b41dd;domain=.taobao.com;Expires=Thu, 27-Feb-2014 04:16:49 GMT;Path=/
    P3P: CP='CURa ADMa DEVa PSAo PSDo OUR BUS UNI PUR INT DEM STA PRE COM NAV OTC NOI DSP COR'
    Location: http://pass.tmall.com/add?_tb_token_=kCsYFWOg1JIp&cookie2=b5f8f3695f410dde27a42d6d251933ff&t=2290940aff64002f214e1a12480b41dd&target=http%3a%2f%2fdetail.tmall.com%2fitem.htm%3fid%3d15670523848%26tbpm%3d3&pacc=eOoBCHB4T2s-kmCmd9qMcg==&opi=27.189.36.232&tmsc=1385698609985332
    
    HTTP/1.1 302 Found
    Date: Fri, 29 Nov 2013 04:16:50 GMT
    Content-Type: text/html
    Content-Length: 260
    Connection: close
    P3P: CP='CURa ADMa DEVa PSAo PSDo OUR BUS UNI PUR INT DEM STA PRE COM NAV OTC NOI DSP COR'
    Set-Cookie: _tb_token_=kCsYFWOg1JIp;domain=.tmall.com;Path=/
    Set-Cookie: cookie2=b5f8f3695f410dde27a42d6d251933ff;domain=.tmall.com;Path=/
    Set-Cookie: t=2290940aff64002f214e1a12480b41dd;domain=.tmall.com;Path=/
    Location: http://detail.tmall.com/item.htm?id=15670523848&tbpm=3
    
    HTTP/1.1 200 OK
    Server: Tengine
    Date: Fri, 29 Nov 2013 04:16:50 GMT
    Content-Type: text/html;charset=GBK
    Transfer-Encoding: chunked
    Connection: keep-alive
    Vary: Accept-Encoding
    at_bucketid: sbucket_-1
    X-Bucket-Id: -1
    Cache-Control: max-age=1
    At_Autype: 4_63237005
    At_Cat: item_50013957
    X-Category: /cat/50011397
    At_Nick: %E7%A5%A5%E7%A5%AF%E7%A6%8F%E7%8F%A0%E5%AE%9D%E6%97%97%E8%88%B0%E5%BA%97
    At_Itemid: 15670523848
    At_Isb: 1
    At_Pgty: 2
    At_Cat: 50013957
    At_Brid: 86306948
    At_Prid: 222598572
    At_Autype: 0_63237005
    At_Auid: 15670523848
    Content-Language: zh-CN
    X-Cache: HIT TCP_MEM_HIT dirn:-2:-2
    Via: wagbridge010238184026.cm4:8888
    Age: 1330
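
    The hop count can also be read from curl itself instead of counting header blocks by hand. In the trace above, each round of Set-Cookie carries a fresh _tb_token_, cookie2, and t, which suggests the client never sent the earlier cookies back; PHP's curl only stores and returns cookies once its cookie engine is enabled. The sketch below reports the redirect count and can optionally turn the cookie engine on. fetch_counting_redirects and its parameters are hypothetical, the timeout values are arbitrary, and whether cookies shorten the chain for this site is an assumption the original post does not test.

    ```php
    <?php
    // Hypothetical helper, not part of the original post.
    // Fetches $url, following redirects, and reports how many hops curl took.
    function fetch_counting_redirects(string $url, bool $useCookies = false, int $maxRedirects = 20): array
    {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_MAXREDIRS, $maxRedirects); // give up on a redirect loop
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
        curl_setopt($ch, CURLOPT_TIMEOUT, 15);
        if ($useCookies) {
            // An empty file name turns on curl's in-memory cookie engine,
            // so Set-Cookie values are sent back on later hops.
            curl_setopt($ch, CURLOPT_COOKIEFILE, '');
        }
        $body = curl_exec($ch);
        return [
            'ok'             => $body !== false,
            'error'          => curl_error($ch),
            'redirect_count' => curl_getinfo($ch, CURLINFO_REDIRECT_COUNT),
            'final_url'      => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
            'status'         => curl_getinfo($ch, CURLINFO_HTTP_CODE),
            'body'           => $body,
        ];
    }
    ```

    Calling it once with $useCookies set to false and once with true, then comparing the two redirect_count values, would show whether the missing cookies explain the repeated hops.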

    curl really is quite powerful.

  • Original post: https://www.cnblogs.com/wangtongphp/p/3449585.html