  • Part 2: AliExpress Product Scraping Series: Product Scraping in Practice

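        The previous post, Part 1: AliExpress Product Scraping Series: Product Scraping Analysis, analyzed AliExpress product scraping: which product information to collect and how to collect it. This post puts that analysis into practice. The relevant technologies were covered in the previous post, so let's go straight to building the project.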

    I. Create the solution and write the scraping code

    1. Create a solution named "CollectorSolution" and add a new empty ASP.NET MVC project named "Collector" to it. [Screenshot: solution structure]

    2. In the "Collector" project, add a "CollectingController" controller and its view, and change the default route from Home -> Index to Collecting -> Index. [Screenshot: controller and view in the project]

    RouteConfig is changed as follows:

    using System.Web.Mvc;
    using System.Web.Routing;

    namespace Collector
    {
        public class RouteConfig
        {
            public static void RegisterRoutes(RouteCollection routes)
            {
                routes.IgnoreRoute("{resource}.axd/{*pathInfo}");

                routes.MapRoute(
                    name: "Default",
                    url: "{controller}/{action}/{id}",
                    defaults: new { controller = "Collecting", action = "Index", id = UrlParameter.Optional }
                );
            }
        }
    }

    3. Add three view models, "CollectionViewModel", "CollectedProductViewModel", and "CollectedProductImageViewModel", plus a struct named "ParseProductPatterns" that holds the regular expressions. The code for each is as follows:

    1.> CollectionViewModel

    using System.Collections.Generic;

    namespace Collector.Models
    {
        public class CollectionViewModel
        {
            public CollectionViewModel()
            {
                ProductViews = new List<CollectedProductViewModel>();
            }
            public string CollectionUrl { get; set; }
            public IEnumerable<CollectedProductViewModel> ProductViews { get; set; }
        }
    }

    2.> CollectedProductViewModel

    using System.Collections.Generic;

    namespace Collector.Models
    {
        public class CollectedProductViewModel
        {
            public CollectedProductViewModel()
            {
                ProductImages = new List<CollectedProductImageViewModel>();
            }
            public string ProductName { get; set; }
            public decimal ProductPrice { get; set; }
            public decimal ProductDiscountPrice { get; set; }
            public string ProductCurrency { get; set; }
            public string ProductColor { get; set; }
            public string ProductSize { get; set; }
            public IEnumerable<CollectedProductImageViewModel> ProductImages { get; set; }
        }
    }

    3.>CollectedProductImageViewModel

    namespace Collector.Models
    {
        public class CollectedProductImageViewModel
        {
            public string ImageUrl { get; set; }
            public int Sort { get; set; }
        }
    }

    4.>ParseProductPatterns

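    namespace Collector.Models
    {
        // Regular expressions used to pull product data out of the page HTML.
        // Each pattern uses a lookbehind and a lookahead so that only the value
        // between the two markers is returned. Verbatim strings (@"...") keep the
        // backslashes intact; a literal double quote is written as "".
        public struct ParseProductPatterns
        {
            public static string ProductNamePattern = @"(?<=<h1 class=""product-name"" itemprop=""name"">).*?(?=</h1>)";
            public static string ProductJsnPattern = @"(?<=var skuProducts=).*?(?=;\s*var skuAttrIds=)";
            public static string ProductImageJsonPattern = @"(?<=window\.runParams\.imageBigViewURL=).*?(?=;)";
            public static string ProductCurrencyPattern = @"(?<=window\.runParams\.currencyCode="").*?(?="";)";
            // {0} is replaced with the SKU attribute code via string.Format.
            public static string ProductColorPattern =
                @"(?<=<a data-role=""sku"" data-sku-id=""{0}"" id=""sku-1-{0}"" title="").*?(?="")";
            public static string ProductSizePattern =
                @"(?<=<a data-role=""sku"" data-sku-id=""{0}"" id=""sku-2-{0}"" href=""javascript:void\(0\)""\s+><span>).*?(?=</)";
        }
    }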

    These are mostly self-explanatory, so I won't go through them one by one.
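    For readers less familiar with lookarounds, the following is a minimal standalone sketch of how patterns of this shape extract a value. The class name LookaroundDemo, its helper methods, and the sample HTML fragment are hypothetical and exist only for illustration; the fragment merely imitates the markup the patterns above expect.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Hypothetical fragment shaped like the markup the patterns expect.
    public const string SampleHtml =
        "<h1 class=\"product-name\" itemprop=\"name\">Yoga Top</h1>\n" +
        "<script>window.runParams.currencyCode=\"USD\";</script>\n" +
        "<a data-role=\"sku\" data-sku-id=\"193\" id=\"sku-1-193\" title=\"Black\"></a>";

    // Returns the text between the lookbehind and lookahead markers,
    // or null when the pattern does not match.
    public static string MatchValue(string pattern, string input)
    {
        var match = Regex.Match(input, pattern,
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return match.Success ? match.Value : null;
    }

    // Fills the {0} placeholder with a SKU attribute code, then matches.
    // Regex.Escape guards against codes that contain regex metacharacters.
    public static string ColorFor(string skuId, string input)
    {
        const string colorPattern =
            @"(?<=<a data-role=""sku"" data-sku-id=""{0}"" id=""sku-1-{0}"" title="").*?(?="")";
        return MatchValue(string.Format(colorPattern, Regex.Escape(skuId)), input);
    }
}
```

    With this sample, LookaroundDemo.ColorFor("193", LookaroundDemo.SampleHtml) yields the title attribute of the matching anchor, and a code that is not on the page yields null rather than an empty string.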

    4. The view layout is very simple. [Screenshot: view layout]

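    The collection URL is the URL of an AliExpress product page; store and category URLs are not supported. The table displays the collected product information.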
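    Because only product page URLs are supported, it can help to reject other input before fetching anything. The following is a rough sketch of such a check; the class name CollectionUrlCheck and the rule itself (an aliexpress.com host, a "/product/" path segment, and an ".html" suffix) are assumptions inferred from the two sample URLs used later in this post, not something the project implements.

```csharp
using System;

public static class CollectionUrlCheck
{
    // Heuristic only: the rule below is an assumption based on the sample URLs.
    public static bool LooksLikeProductUrl(string url)
    {
        Uri uri;
        if (!Uri.TryCreate(url, UriKind.Absolute, out uri))
        {
            return false;
        }
        if (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps)
        {
            return false;
        }

        var host = uri.Host.ToLowerInvariant();
        var isAliExpress = host == "aliexpress.com" || host.EndsWith(".aliexpress.com");

        var path = uri.AbsolutePath;
        return isAliExpress
            && path.IndexOf("/product/", StringComparison.OrdinalIgnoreCase) >= 0
            && path.EndsWith(".html", StringComparison.OrdinalIgnoreCase);
    }
}
```

    A check like this could run at the top of the POST action so that the table simply stays empty for unsupported addresses.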

    5. The controller and view code are as follows:

    1.> CollectingController

    using System;
    using System.Collections.Generic;
    using System.Globalization;
    using System.Linq;
    using System.Text.RegularExpressions;
    using System.Web.Mvc;
    using Collector.Models;
    using Newtonsoft.Json.Linq;
    using RestSharp;

    namespace Collector.Controllers
    {
        public class CollectingController : Controller
        {
            // GET: Collecting
            public ActionResult Index()
            {
                return View();
            }

            // POST: Collecting (fetch the submitted URL and show the parsed variants)
            [HttpPost]
            public ActionResult Index(CollectionViewModel collectionView)
            {
                collectionView = CollectWithParse(collectionView);
                return View(collectionView);
            }

            // Helpers are private so that MVC does not expose them as routable actions.
            private CollectionViewModel CollectWithParse(CollectionViewModel collectionView)
            {
                if (collectionView == null || string.IsNullOrEmpty(collectionView.CollectionUrl))
                {
                    return collectionView;
                }
                var client = new RestClient(collectionView.CollectionUrl);
                var request = new RestRequest(Method.GET);
                var response = client.Execute(request);
                // Content is null when the request fails, so fall back to an empty page.
                var htmlContent = response.Content ?? string.Empty;
                collectionView.ProductViews = ParseProducts(htmlContent);
                return collectionView;
            }

            private IEnumerable<CollectedProductViewModel> ParseProducts(string productHtmlContent)
            {
                var productName = RegexMatchValue(ParseProductPatterns.ProductNamePattern, productHtmlContent);
                var productCurrency = RegexMatchValue(ParseProductPatterns.ProductCurrencyPattern, productHtmlContent);

                var productJson = RegexMatchValue(ParseProductPatterns.ProductJsnPattern, productHtmlContent);
                // JArray.Parse throws on an empty string, so stop early when no SKU data was found.
                if (string.IsNullOrWhiteSpace(productJson))
                {
                    return new List<CollectedProductViewModel>();
                }

                var productJsonArray = JArray.Parse(productJson);
                var products =
                    productJsonArray.Select(pja =>
                    {
                        // skuPropIds is a comma-separated list of attribute codes; the first is used as the color code, the last as the size code.
                        var colorWithSizeCode = pja["skuPropIds"].ToString().Split(',');
                        var priceJson = pja["skuVal"];
                        var skuPrice = priceJson["skuPrice"];
                        var price = skuPrice == null ? "0" : skuPrice.ToString();
                        var actSkuPrice = priceJson["actSkuPrice"];
                        var discountPrice = actSkuPrice == null ? "0" : actSkuPrice.ToString();
                        return new
                        {
                            ColorCode = colorWithSizeCode.First(),
                            SizeCode = colorWithSizeCode.Last(),
                            // Parse with the invariant culture so "12.99" reads the same under any server locale.
                            Price = Convert.ToDecimal(price, CultureInfo.InvariantCulture),
                            DiscountPrice = Convert.ToDecimal(discountPrice, CultureInfo.InvariantCulture),
                        };
                    }).ToList();

                var collectedImages = ParseProductImages(productHtmlContent);

                // One row per SKU variant; name, currency, and images are shared by all variants.
                var collectedProducts = products.Select(p => new CollectedProductViewModel
                {
                    ProductName = productName,
                    ProductPrice = p.Price,
                    ProductDiscountPrice = p.DiscountPrice,
                    ProductCurrency = productCurrency,
                    ProductColor = SetProductColorWithSize(ParseProductPatterns.ProductColorPattern, p.ColorCode, productHtmlContent),
                    ProductSize = SetProductColorWithSize(ParseProductPatterns.ProductSizePattern, p.SizeCode, productHtmlContent),
                    ProductImages = collectedImages
                }).ToList();
                return collectedProducts;
            }

            private IEnumerable<CollectedProductImageViewModel> ParseProductImages(string productHtmlContent)
            {
                var imagesJson = RegexMatchValue(ParseProductPatterns.ProductImageJsonPattern, productHtmlContent);
                if (string.IsNullOrWhiteSpace(imagesJson))
                {
                    return new List<CollectedProductImageViewModel>();
                }
                var imageJsonArray = JArray.Parse(imagesJson);

                var images = imageJsonArray.ToObject<List<string>>();
                // Materialize once so the shared list is not re-enumerated for every variant.
                return images.Select((t, i) => new CollectedProductImageViewModel
                {
                    ImageUrl = t,
                    Sort = i
                }).ToList();
            }

            // Fills the {0} placeholder in a color or size pattern with the attribute code, then matches.
            private string SetProductColorWithSize(string pattern, string colorWithSizeCode, string input)
            {
                var newPattern = string.Format(pattern, colorWithSizeCode);
                return RegexMatchValue(newPattern, input);
            }

            // Returns the matched text, or an empty string when the pattern does not match.
            private string RegexMatchValue(string pattern, string input, RegexOptions regexOptions = RegexOptions.IgnoreCase | RegexOptions.Singleline)
            {
                var regex = new Regex(pattern, regexOptions);
                var match = regex.Match(input);
                return match.Value;
            }
        }
    }
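    To make the JSON step easier to follow in isolation, here is a small standalone sketch of reading the SKU array with Json.NET (the Newtonsoft.Json package the controller already uses). The field names skuPropIds, skuVal, skuPrice, and actSkuPrice come from the controller code; the SkuPrice and SkuParser types and the sample values in the usage note are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using Newtonsoft.Json.Linq;

// Hypothetical holder for one parsed SKU variant.
public class SkuPrice
{
    public string ColorCode { get; set; }
    public string SizeCode { get; set; }
    public decimal Price { get; set; }
    public decimal DiscountPrice { get; set; }
}

public static class SkuParser
{
    // Parses the JSON array captured from "var skuProducts=...".
    // Returns an empty list when nothing was captured.
    public static List<SkuPrice> Parse(string skuProductsJson)
    {
        if (string.IsNullOrWhiteSpace(skuProductsJson))
        {
            return new List<SkuPrice>();
        }

        return JArray.Parse(skuProductsJson).Select(item =>
        {
            var codes = ((string)item["skuPropIds"] ?? string.Empty).Split(',');
            var skuVal = item["skuVal"];
            return new SkuPrice
            {
                ColorCode = codes.First(),
                SizeCode = codes.Last(),
                Price = ToDecimal(skuVal == null ? null : skuVal["skuPrice"]),
                DiscountPrice = ToDecimal(skuVal == null ? null : skuVal["actSkuPrice"])
            };
        }).ToList();
    }

    // A missing, null, or unparsable price becomes 0.
    private static decimal ToDecimal(JToken token)
    {
        decimal value;
        var text = token == null ? null : (string)token;
        return decimal.TryParse(text, NumberStyles.Number, CultureInfo.InvariantCulture, out value)
            ? value
            : 0m;
    }
}
```

    For example, calling SkuParser.Parse on [{"skuPropIds":"193,100014064","skuVal":{"skuPrice":"12.99","actSkuPrice":"9.74"}},{"skuPropIds":"175","skuVal":{"skuPrice":"15.00"}}] gives two variants. The second has a single attribute code, so its color and size codes are both "175", and its discount price is 0 because actSkuPrice is absent.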

    2.> Collecting->Index 

    @model  Collector.Models.CollectionViewModel
    <!DOCTYPE html>

    <html>
    <head>
        <meta name="viewport" content="width=device-width" />
        <title></title>
        <!-- CSS goes in the document HEAD or added to your external stylesheet -->
        <style type="text/css">
            table.gridtable {
                font-family: verdana,arial,sans-serif;
                font-size: 11px;
                color: #333333;
                border-width: 1px;
                border-color: #666666;
                border-collapse: collapse;
            }

                table.gridtable th {
                    border-width: 1px;
                    padding: 8px;
                    border-style: solid;
                    border-color: #666666;
                    background-color: #dedede;
                }

                table.gridtable td {
                    border-width: 1px;
                    padding: 8px;
                    border-style: solid;
                    border-color: #666666;
                    background-color: #ffffff;
                }
        </style>
    </head>
    <body>
        <div>
            @using (Html.BeginForm("Index", "Collecting", FormMethod.Post))
            {
                <table>
                    <tr>
                        <td>Collection URL:</td>
                        <td>
                            @Html.TextAreaFor(m => m.CollectionUrl, 4, 0, new { style = "width:1500px;" })
                        </td>

                    </tr>
                    <tr><td colspan="2" style="text-align: right;"><input type="submit" value="Start collecting" /></td></tr>
                </table>
            }
        </div>
        <div>
            <table class="gridtable">
                <thead>
                    <tr>
                        <th width="5%">No.</th>
                        <th width="5%">Image</th>
                        <th width="30%">Product name</th>

                        <th width="10%">Unit price</th>
                        <th width="10%">Reference unit price</th>
                        <th width="10%">Currency</th>
                        <th width="10%">Color</th>
                        <th width="10%">Size</th>
                    </tr>
                </thead>
                <tbody>
                    @{
                        var i = 0;
                    }
                    @* Render rows only when there is data; the closing tags below are always emitted. *@
                    @if (Model != null && Model.ProductViews != null)
                    {
                        foreach (var collectedProduct in Model.ProductViews)
                        {
                            i++;
                            // A product may have no images, so avoid dereferencing a null first image.
                            var firstImage = collectedProduct.ProductImages.FirstOrDefault();
                            <tr>
                                <td align="center">@i</td>
                                <td><img src="@(firstImage == null ? "" : firstImage.ImageUrl)" width="60" height="60" /></td>
                                <td>@collectedProduct.ProductName</td>
                                <td>@collectedProduct.ProductDiscountPrice</td>
                                <td>@collectedProduct.ProductPrice</td>
                                <td>@collectedProduct.ProductCurrency</td>
                                <td>@collectedProduct.ProductColor</td>
                                <td>@collectedProduct.ProductSize</td>
                            </tr>
                        }
                    }

                </tbody>

            </table>
        </div>
    </body>
    </html>

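    Note that this post shows only the tip of the iceberg of scraping, so I kept things simple and did not encapsulate anything strictly, on either the front end or the back end. Please keep that in mind. I also prefer to keep comments in code to a minimum; in my view, the code itself should be the comment.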

    II. Test results: deploy the MVC project to IIS on port 1005 and see how it works.

    1. Test the AliExpress product URL from the previous post:

    http://www.aliexpress.com/store/product/Yoga-Tops-Women-Women-Yoga-Shirts-Womens-Sportswear-Gym-Woman-Running-Shirt-Camisetas-Deporte-Mujer-Gym/1025110_32620359354.html?spm=a2g01.8032156.template-section-container.27.wcM8ES&sdom=3514.555719.493653.0_32620359354

    [Screenshot: scraping result for the first URL]

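    While scraping just now, I found that AliExpress no longer discounts the product at this URL from the previous post, so there is no discount price.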
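    When a product has no discount, the parsed discount price is 0, and the view shows that 0 in the unit price column. One possible way to handle this, sketched below, is to fall back to the regular price. The PriceDisplay class and EffectivePrice method are hypothetical and are not part of the project.

```csharp
using System;

public static class PriceDisplay
{
    // Returns the discount price when one exists, otherwise the regular price.
    // Assumes a discount price of 0 means "no discount", as in the controller,
    // which stores 0 when actSkuPrice is missing.
    public static decimal EffectivePrice(decimal price, decimal discountPrice)
    {
        return discountPrice > 0m ? discountPrice : price;
    }
}
```

    The view could then display PriceDisplay.EffectivePrice(collectedProduct.ProductPrice, collectedProduct.ProductDiscountPrice) in the unit price column.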

    2. Scrape another URL:

    http://www.aliexpress.com/store/product/LEVEL-4-shock-Professional-running-intensive-training-without-rims-snow-sports-bra-open-front-zipper-style/1025110_32357688343.html?spm=2114.12010108.1000013.1.uvJqBj

    [Screenshot: scraping result for the second URL]

    This product has many variants, so they do not all fit on a single page.

    Source code: https://github.com/haibozhou1011/Collector

    Summary:

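    That wraps up the AliExpress product scraping series. Overall, the techniques used for scraping are ones most of us work with regularly; the real effort lies in the up-front analysis and in working out the rules for extracting product information. Every site follows patterns of its own, and if you look carefully you will find the clues that lead to a breakthrough. If you know a better way to scrape, please share it with everyone.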

     

  • Original post: https://www.cnblogs.com/davidzhou/p/5479958.html