  • Perl HTML::LinkExtor module (2)

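    use strict;
    use warnings;
    use LWP::Simple;
    use HTML::LinkExtor;

    # Fetch the page; get() returns undef on failure.
    my $html_code = get("https://tieba.baidu.com/p/4929234512");
    die "Failed to fetch page\n" unless defined $html_code;

    # Pass the callback as a code reference (\&IMG);
    # a bare &IMG would call the sub instead of passing it.
    my $img_link = HTML::LinkExtor->new(\&IMG);
    $img_link->parse($html_code);
    $img_link->eof;

    # Collect image links
    sub IMG {
        my ($tag, %links) = @_;
        # Only handle image tags
        if ($tag eq 'img') {
            foreach my $key (keys %links) {
                print "$key -> $links{$key}\n";
            }
        }
    }

    # Output: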
    # src -> https://gss0.bdstatic.com/6LZ1dD3d1sgCo2Kml5_Y_D3/sys/portrait/item/343a66656e6768756f7069616e323031af7c
    # src -> //tb2.bdstatic.com/tb/static-pb/img/head_80.jpg
    # src -> //tb2.bdstatic.com/tb/static-pb/img/head_80.jpg
    # src -> //tb2.bdstatic.com/tb/static-pb/img/head_80.jpg
    # src -> //tb2.bdstatic.com/tb/static-pb/img/head_80.jpg
    # src -> //tb2.bdstatic.com/tb/static-pb/img/head_80.jpg
    # src -> //tb2.bdstatic.com/tb/static-pb/img/head_80.jpg
    # src -> //tb2.bdstatic.com/tb/static-pb/img/head_80.jpg
    # src -> https://ss0.bdstatic.com/9r-1bjml2gcT8tyhnq/fc-feed/0/pic/51d89e69dd318a8c2bcb07341879ac64.jpg
    # src -> https://ss0.bdstatic.com/9r-1bjml2gcT8tyhnq/fc-feed/0/pic/223a419756a2209b84f8f306d021a4a5.jpg
    # src -> //tb2.bdstatic.com/tb/static-pb/img/head_80.jpg
    # src -> //tb2.bdstatic.com/tb/static-pb/img/head_80.jpg
    # src -> //tb2.bdstatic.com/tb/static-pb/img/head_80.jpg
    # src -> https://gsp0.baidu.com/5aAHeD3nKhI2p27j8IqW0jdnxx1xbK/tb/editor/images/client/image_emoticon25.png
    # src -> https://gsp0.baidu.com/5aAHeD3nKhI2p27j8IqW0jdnxx1xbK/tb/editor/images/client/image_emoticon25.png
    # src -> //tb2.bdstatic.com/tb/static-pb/img/head_80.jpg
    # src -> //tb2.bdstatic.com/tb/static-pb/img/head_80.jpg
    # src -> //tb2.bdstatic.com/tb/static-pb/img/head_80.jpg
    # src -> //tb2.bdstatic.com/tb/static-pb/img/head_80.jpg
    # src -> //tb2.bdstatic.com/tb/static-pb/img/head_80.jpg
    # src -> //tb2.bdstatic.com/tb/static-pb/img/head_80.jpg
    # src -> //tb2.bdstatic.com/tb/static-pb/img/head_80.jpg
    # src -> https://imgsa.baidu.com/forum/pic/item/d933c895d143ad4bcf1ab5478b025aafa40f0604.jpg
    # src -> https://imgsa.baidu.com/forum/pic/item/78f0f736afc379319921ed85e2c4b74542a911d4.jpg
    # src -> https://imgsa.baidu.com/forum/pic/item/2f2eb9389b504fc23bf50aaaecdde71191ef6df3.jpg
    # src -> https://imgsa.baidu.com/forum/pic/item/d100baa1cd11728ba5c4656bc1fcc3cec2fd2c8a.jpg
    # src -> https://imgsa.baidu.com/forum/pic/item/2df5e0fe9925bc31b71993f157df8db1cb137017.jpg

    Of course, you can also add a regex to drop any link that does not start with http:// or https://.
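
    The regex filter mentioned here could look something like the following. This is only a minimal sketch, not the original author's code: the helper names `is_http_link` and `filter_http_links` are hypothetical, and the two sample URLs are copied from the output listing. It accepts both http:// and https://, since most absolute links in that output use https.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the link starts with http:// or https://
# (case-insensitive), 0 otherwise (including for undef).
sub is_http_link {
    my ($url) = @_;
    return (defined($url) && $url =~ m{^https?://}i) ? 1 : 0;
}

# Hypothetical helper: keep only the links that pass is_http_link.
sub filter_http_links {
    my @urls = @_;
    return grep { is_http_link($_) } @urls;
}

# Sample links taken from the output listing: one absolute, one protocol-relative.
my @sample = (
    'https://imgsa.baidu.com/forum/pic/item/d933c895d143ad4bcf1ab5478b025aafa40f0604.jpg',
    '//tb2.bdstatic.com/tb/static-pb/img/head_80.jpg',
);

# Only the first (absolute) link is printed.
print "$_\n" for filter_http_links(@sample);
```

    Inside the `IMG` callback the same check could guard the `print`, for example by appending `if is_http_link($links{$key})`. Note that this drops protocol-relative links such as `//tb2.bdstatic.com/...`. If those are wanted, one option is to pass the page URL as a second (base) argument to `HTML::LinkExtor->new`, which makes the module hand absolute URLs to the callback.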

  • Original article: https://www.cnblogs.com/perl6/p/6536882.html