  • Scraping the shop name, seller Wangwang ID, main image, and title of a Taobao or Tmall product

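    I. Library used: QueryList

    II. Installation

    Make sure Composer is installed. Because downloads from the default repository can be very slow, you can switch to a mirror hosted in China: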

    composer config -g repo.packagist composer https://packagist.phpcomposer.com

    Install QueryList:

    composer require jaeger/querylist

    The QueryList documentation is worth a read:

    http://www.querylist.cc/#one

    III. Requirements

    Given a Taobao or Tmall product link, scrape that product's title, main image, shop name, and seller Wangwang name.

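    IV. The current demo works for all Tmall products, and for Taobao products whose shop name appears on the right side or at the top of the page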
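The demo tells Tmall links from Taobao links by comparing a fixed-length URL prefix. A minimal sketch of the same decision based on the URL host is shown here; `detect_item_site` is a hypothetical helper name, not part of the original demo, and the two host names are taken from the sample links in this post.

```php
<?php
// Hypothetical helper (not in the original demo): classify a product URL by its host.
function detect_item_site(string $url): string
{
    // parse_url() yields null or false when no host can be extracted; the cast turns that into ''.
    $host = strtolower((string) parse_url($url, PHP_URL_HOST));

    if ($host === 'detail.tmall.com') {
        return 'tmall';
    }
    if ($host === 'item.taobao.com') {
        return 'taobao';
    }
    return 'unknown';
}
```

Comparing the parsed host avoids depending on the scheme or on an exact prefix length, which may matter if links arrive as `http://` or with different casing.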

    V. Code

    <?php
    
    include "vendor/autoload.php";
    
    use QL\QueryList;
    
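    function uni_decode($s) {  // decode shop names that some Taobao item pages embed as Unicode escapes
      preg_match_all('/&#([0-9]{2,5});/', $s, $html_uni);  // decimal HTML entities such as &#20013;
      preg_match_all('/[\\\\%]u([0-9a-f]{4})/i', $s, $js_uni);  // \uXXXX and %uXXXX escapes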
      $source = array_merge($html_uni[0], $js_uni[0]);  
      $js = array();  
      for($i=0;$i<count($js_uni[1]);$i++) {  
        $js[] = hexdec($js_uni[1][$i]);  
      }  
      $utf8 = array_merge($html_uni[1], $js);  
      $code = $s;  
      for($j=0;$j<count($utf8);$j++) {  
        $code = str_replace($source[$j], unicode2utf8($utf8[$j]), $code);  
      }  
      return $code;
    }  
       
    function unicode2utf8($c) {  
      $str="";  
      if ($c < 0x80) {  
         $str.=chr($c);  
      } else if ($c < 0x800) {  
         $str.=chr(0xc0 | $c>>6);  
         $str.=chr(0x80 | $c & 0x3f);  
      } else if ($c < 0x10000) {  
         $str.=chr(0xe0 | $c>>12);  
         $str.=chr(0x80 | $c>>6 & 0x3f);  
         $str.=chr(0x80 | $c & 0x3f);  
      } else if ($c < 0x200000) {  
         $str.=chr(0xf0 | $c>>18);  
         $str.=chr(0x80 | $c>>12 & 0x3f);  
         $str.=chr(0x80 | $c>>6 & 0x3f);  
         $str.=chr(0x80 | $c & 0x3f);  
      }  
      return $str;  
    } 
    
    function get_between($input, $start, $end) { // return the text between the two given marker strings
    
    	return substr($input, strlen($start)+strpos($input, $start),(strlen($input) - strpos($input, $end))*(-1));
    }
    
    function trimall($str) // strip spaces, tabs, and line breaks
    {
        $qian = array(" ", " ", "\t", "\n", "\r");
        $hou = array("", "", "", "", "");
        return str_replace($qian, $hou, $str);
    }
    
    $url = 'https://item.taobao.com/item.htm?spm=a230r.1.14.34.47cd6ace3iAnm0&id=564043247193&ns=1&abbucket=19#detail';
    
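    $ql = QueryList::get($url)->encoding('UTF-8','GBK'); // convert the GBK page to UTF-8 to avoid garbled text
    
    // Three cases, each scraped differently: 1) Tmall item links, 2) Taobao pages with the shop name on the right, 3) Taobao pages with the shop name at the top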
    if (substr($url, 0, 24) == 'https://detail.tmall.com') {
    
    	$rt = [
    		'img' 		=> $ql->find('#J_ImgBooth')->attr('src'),
    		'title' 	=> $ql->find(':input[name="title"]')->attr('value'),
    		'shop_name' => $ql->find('.slogo-shopname')->text()
    	];
    
    	$rt['seller_name'] = $rt['shop_name'];
    
    } else {
    
    	$rt = [
    		'img' 		  => $ql->find('#J_ImgBooth')->attr('src'),
    		'title' 	  => $ql->find('.tb-main-title')->text(),
    		'shop_name'   => $ql->find('.tb-shop-name>dl>dd>strong>a')->text(),
    		'seller_name' => $ql->find('.tb-seller-name')->text()
    	];
    
    	if (!$rt['shop_name']) {
    
    		$config = substr(trimall($ql->find('script')->eq(0)->text()), 100, 150);
    
    		$shop_name = get_between($config, "shopName:'", "',sellerId");
    
    		$rt['shop_name'] = uni_decode($shop_name);
    
    		$rt['seller_name'] = get_between($config, "sellerNick:'", "',sibUrl");
    	}
    }
    	
    var_dump($rt['shop_name']);
    
    echo '<hr />';
    ?>
    
    
    <!DOCTYPE html>
    <html lang="en">
    <head>
    	<meta charset="UTF-8">
    	<title>Taobao product scraping demo</title>
    </head>
    <body>
    	<h4>Title: <?php echo $rt['title']; ?></h4>
    	<h4>Shop: <?php echo $rt['shop_name']; ?></h4>
    	<h4>Wangwang: <?php echo $rt['seller_name']; ?></h4>
    	<h4>Image:</h4>
    	<img src="<?php echo $rt['img'] ?>" alt="">
    </body>
    </html>
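The `uni_decode` and `unicode2utf8` pair in the listing converts decimal HTML entities and `\uXXXX` / `%uXXXX` escapes to UTF-8 by hand. As a rough alternative sketch, assuming PHP 7.2 or later with the mbstring extension, the same conversion can be delegated to `mb_chr`. The function name `decode_unicode_escapes` is hypothetical and is not part of the original demo.

```php
<?php
// Hypothetical alternative to uni_decode(); assumes PHP 7.2+ with the mbstring extension.
function decode_unicode_escapes(string $s): string
{
    // Turn one code point into UTF-8, keeping the original text if the code point is invalid.
    $toUtf8 = function (int $codePoint, string $fallback): string {
        $char = mb_chr($codePoint, 'UTF-8');
        return $char === false ? $fallback : $char;
    };

    // Decimal HTML entities such as &#20013;
    $s = preg_replace_callback('/&#([0-9]{2,5});/', function (array $m) use ($toUtf8): string {
        return $toUtf8((int) $m[1], $m[0]);
    }, $s) ?? $s;

    // JavaScript-style escapes such as \u4e2d or %u4e2d
    return preg_replace_callback('/[\\\\%]u([0-9a-f]{4})/i', function (array $m) use ($toUtf8): string {
        return $toUtf8((int) hexdec($m[1]), $m[0]);
    }, $s) ?? $s;
}
```

Like the original helper, this sketch handles only four-digit escapes and does not combine surrogate pairs; text that cannot be decoded is left unchanged.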
    

    VI. Results

    1 Tmall product link

    https://detail.tmall.com/item.htm?spm=a230r.1.14.9.47cd6ace3iAnm0&id=591124740347&cm_id=140105335569ed55e27b&abbucket=19&skuId=4060807516519

    Scraped result:

     2 Taobao product link with the shop name on the right

    https://item.taobao.com/item.htm?spm=a230r.1.14.34.47cd6ace3iAnm0&id=564043247193&ns=1&abbucket=19#detail

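     Scraped result:

     3 Product link with the shop name at the top (this case takes a little more work: on this page type the seller Wangwang name and the shop name both live in inline JavaScript, and the shop name is additionally encoded as Unicode escapes)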

    https://item.taobao.com/item.htm?spm=a230r.1.14.34.47cd6ace3iAnm0&id=564043247193&ns=1&abbucket=19#detail

     Scraped result:
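For this third page type the demo cuts the shop name and seller name out of inline JavaScript with `get_between`, which assumes both marker strings are present. A sketch of a stricter variant that returns `null` when a marker is missing is shown here; `get_between_or_null` is a hypothetical name, and the sample `$config` string is made up for illustration rather than copied from a real page.

```php
<?php
// Hypothetical, stricter variant of get_between(): returns null if either marker is missing.
function get_between_or_null(string $input, string $start, string $end): ?string
{
    $from = strpos($input, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start);

    // Search for the end marker only after the start marker.
    $to = strpos($input, $end, $from);
    if ($to === false) {
        return null;
    }
    return substr($input, $from, $to - $from);
}

// Made-up sample in the shape the demo expects after trimall() has removed whitespace.
$config = "shopName:'demo-shop',sellerId:'1',sellerNick:'demo',sibUrl:''";
var_dump(get_between_or_null($config, "sellerNick:'", "',sibUrl")); // string(4) "demo"
```

Unlike the original helper, this version looks for the end marker only after the start marker, so an earlier occurrence of the end marker does not produce a wrong slice.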

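     VII. I wrote this demo because a recent project happened to need it. To scrape other data, consult the QueryList manual and adapt the code to your actual product and business requirements.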

  • Original article: https://www.cnblogs.com/qczy/p/11551466.html