zoukankan      html  css  js  c++  java
  • PHP采集淘宝商品

    项目需求:

      1.通过PHP程序更新所采集淘宝商品的价格以及是否停售

    数据表:

      

    CREATE TABLE `goods` (
    `id`  int(11) NOT NULL AUTO_INCREMENT ,
    `type`  varchar(30) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT '' ,
    `keyid`  varchar(60) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT '' ,
    `shop_id`  int(11) NULL DEFAULT 0 ,
    `cid`  smallint(6) NULL DEFAULT 0 ,
    `img_id`  int(11) NULL DEFAULT 0 ,
    `imgs`  varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL ,
    `name`  varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT '' ,
    `url`  varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT '' ,
    `taoke_url`  varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT '' ,
    `price`  decimal(10,2) NULL DEFAULT 0.00 ,
    `sellerid`  int(11) UNSIGNED NULL DEFAULT NULL ,
    `is_off_sale`  tinyint(1) UNSIGNED NULL DEFAULT 0 ,
    `delist_time`  int(11) NULL DEFAULT 0 ,
    `create_time`  int(11) NULL DEFAULT 0 ,
    `ctime`  int(11) NULL DEFAULT NULL ,
    `cache_data`  text CHARACTER SET utf8 COLLATE utf8_general_ci NULL ,
    `create_day`  int(11) NULL DEFAULT 0 ,
    `commission`  decimal(10,2) NULL DEFAULT 0.00 ,
    `comment_collect_time`  int(11) NULL DEFAULT 0 ,
    `color`  smallint(6) NULL DEFAULT 0 ,
    `content`  text CHARACTER SET utf8 COLLATE utf8_general_ci NULL ,
    `taobao_desc_url`  varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL DEFAULT '''' COMMENT '淘宝商品详细介绍地址' ,
    PRIMARY KEY (`id`),
    UNIQUE INDEX `keyid` (`keyid`) USING BTREE ,
    INDEX `shop_id` (`shop_id`) USING BTREE ,
    INDEX `delist_time` (`delist_time`) USING BTREE ,
    INDEX `cid` (`cid`) USING BTREE ,
    INDEX `create_day` (`create_day`) USING BTREE 
    )
    ENGINE=InnoDB
    DEFAULT CHARACTER SET=utf8 COLLATE=utf8_general_ci
    AUTO_INCREMENT=40614
    ROW_FORMAT=COMPACT
    ;

    PHP文件:

    <?php
    $con=mysql_connect('localhost','root',''); 
    mysql_select_db("fanwe", $con);
    $start_time=microtime(1);
    $sql="SELECT id,price,url FROM s_goods WHERE url!='' AND keyid like 'taobao%' AND is_off_sale=0 ORDER BY id LIMIT 10";//更新前10个
    $rs=mysql_query($sql);
    echo 'COUNT:'.mysql_num_rows($rs)."
    ";
    $error=array();
    $i=0;
    $h=fopen('d:/mydomain/updateprice.log','a+');
    while ($r=mysql_fetch_array($rs)){
        $s=microtime(1);
        echo $i++.':'.$r['id']."	";
        $price=getPrice($r['url']);
        $t=microtime(1)-$s;
        echo (ceil($t*1000)/1000)."	";
        if($price===false) echo 'FALSE';
        elseif(bccomp($price,$r['price'])==0) echo 'EQUAL';
        else{
            echo "UPDATE	".$r['price']."	".$price;
            mysql_query("UPDATE s_goods SET price=".$price." WHERE id=".$r['id']);
            fputcsv($h,array(date('Y-m-d H:i:s'),$r['id'],$r['price'],$price));
        }
        echo "
    ";
    }
    fclose($h);
    echo 'COUNT:'.mysql_num_rows($rs)."	TIME:".ceil(microtime(1)-$start_time);
    function getPrice($url,$time=1){
        $des_url='';
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows NT 6.2; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0');
        curl_setopt($ch, CURLOPT_REFERER,'http://www.tmall.com/');
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION,1);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);//设置输出方式, 0为自动输出返回的内容, 1为返回输出的内容,但不自动输出.
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30); //timeout on connect
        curl_setopt($ch, CURLOPT_TIMEOUT, 30); //timeout on response
        curl_setopt($ch, CURLOPT_HEADER, 1);//是否输出头信息,0为不输出,非零则输出
        curl_setopt($ch, CURLOPT_MAXREDIRS, 10 );
        curl_setopt($ch, CURLOPT_URL, $url);
        $content = curl_exec($ch);
        curl_close($ch);
        if(preg_match("/'reservePrice's*:s*'([d.]+?)',/",$content,$price)){
            $price = (float)$price[1];
        }elseif(preg_match('/price:([d.]+?),/',$content,$price)){
            $price = (float)$price[1];
        }
        if(!$price&&preg_match('/tmall/',$url)){//天猫促销价 add LiuYang 2014-02-24 15:10
            preg_match('/id=(d+)+/',$url,$temp);
            $url2="http://mdskip.taobao.com/core/initItemDetail.htm?itemId=".$temp[1];
            $ch = curl_init();
            curl_setopt( $ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1" );
            curl_setopt( $ch, CURLOPT_URL, $url2 );
            curl_setopt( $ch, CURLOPT_FOLLOWLOCATION, true );
            curl_setopt( $ch, CURLOPT_ENCODING, "" );
            curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
            curl_setopt( $ch, CURLOPT_REFERER, 'http://www.tmall.com' );
            curl_setopt( $ch, CURLOPT_AUTOREFERER, true );
            curl_setopt( $ch, CURLOPT_SSL_VERIFYPEER, false );   
            curl_setopt( $ch, CURLOPT_CONNECTTIMEOUT, 10 );
            curl_setopt( $ch, CURLOPT_TIMEOUT, 10 );
            curl_setopt( $ch, CURLOPT_MAXREDIRS, 10 );
            $price_content = curl_exec( $ch );
            $response = curl_getinfo( $ch );
            curl_close ( $ch );
            $price_content=json_decode(iconv('gbk','utf-8',preg_replace('/(d{10,}):/','"${1}":',$price_content)),true);
            $priceinfo=$price_content['defaultModel']['itemPriceResultDO']['priceInfo'];
            $price=array();
            foreach ($priceinfo as $v){
                $price[]=$v['price'];
                if(is_array($v['promotionList'])){
                    foreach ($v['promotionList'] as $v2){
                        $price[]=$v2['extraPromPrice']?$v2['extraPromPrice']:$v2['price'];
                    }
                }
                if(is_array($v['suggestivePromotionList'])){
                    foreach ($v['suggestivePromotionList'] as $v2){
                        $price[]=$v2['extraPromPrice']?$v2['extraPromPrice']:$v2['price'];
                    }
                }
            }
            $price=count($price)>0?min($price):false;
        }
        if($price) return $price;
        elseif($time==1) return getPrice($url,2);
        else return false;
    }
    ?>

    执行方式如果采用apache或nginx等服务器,会因为各个服务器的最大响应时间而受影响.如果只更新10个那可能会完成,如果是上百个肯定是不能完全更新的.

    可以采用PHP命令的等式来执行.

    D:mydomain>php updateprice.php
    COUNT:10
    0:36599 0.232   EQUAL
    1:36600 1.091   EQUAL
    2:36601 1.039   EQUAL
    3:36603 1.08    EQUAL
    4:36604 0.984   EQUAL
    5:36605 0.972   EQUAL
    6:36610 1.019   EQUAL
    7:36611 0.971   EQUAL
    8:36612 1.048   EQUAL
    9:36613 1.149   EQUAL
    COUNT:10        TIME:10

    如此既清晰又明了,而且会一直执行到程序完成,不会担心服务器没有响应.

    上面是在windows下面执行,如果输入php提示:

    'php' 不是内部或外部命令,也不是可运行的程序
    或批处理文件。

    就必须把php.exe所在的地址补全

    D:mydomain>D:AppServphp5php updatePrice.php

    或者把php.exe所在的地址加入全局变量

    此方法在Linux下同样有用只用修改对应的地址即可,在linux中php命令是可以直接用的.

    [root@liu ~]# php updatePrice.php

    此方法有一个缺点,就是执行效率问题.一个商品采集平均需要0.8秒.那10000个商品采集完需要2个半小时.

    如想继续了解,请看下篇.

  • 相关阅读:
    Leetcode Binary Tree Preorder Traversal
    Leetcode Minimum Depth of Binary Tree
    Leetcode 148. Sort List
    Leetcode 61. Rotate List
    Leetcode 86. Partition List
    Leetcode 21. Merge Two Sorted Lists
    Leetcode 143. Reorder List
    J2EE项目应用开发过程中的易错点
    JNDI初认识
    奔腾的代码
  • 原文地址:https://www.cnblogs.com/lywy510/p/3613528.html
Copyright © 2011-2022 走看看