  • Scraping with a shell script

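    #!/bin/bash
    # Scrape the HTTP proxy list from youdaili.cn and write the unique
    # ip:port entries to config/ip_port next to this script.
    # Needs bash ((( )) loop, [[ ]] test) and gawk (gensub).
    
    dir=$(dirname "$0")
    configDir="$dir/config"
    mkdir -p "$configDir"   # make sure the output directory exists
    
    ipport="$configDir/ip_port"
    
    url="http://www.youdaili.cn/Daili/http/"
    
    # Download a page and convert it from UTF-8 to GBK, as in the original pipeline.
    fetch() {
            curl -s --max-time 200 "$1" | piconv -f utf8 -t gbk
    }
    
    # Find the article flagged with the "hot" icon. Slashes inside an awk
    # regex literal must be escaped; the substr offsets depend on the page layout.
    indexs=$(fetch "$url" | awk '/http:\/\/www\.youdaili\.cn\/static\/images\/hot\.gif/{print substr($2,41,length($2)-46)}')
    
    # Read the page count from the pager text "共N页" ("N pages in total").
    # The pattern only matches if this script is saved in the same encoding (GBK).
    pages=$(fetch "${url}${indexs}.html" | awk '/共.*页/{print gensub(/.*共([^页]+).*/,"\\1",1,$0)}')
    
    for ((page=1; page<=pages; page++))
    do
            # Page 1 is <index>.html; later pages are <index>_<n>.html.
            if [[ $page -eq 1 ]]
            then
                    link="${url}${indexs}.html"
            else
                    link="${url}${indexs}_$page.html"
            fi
            # Keep only the ip:port part of each "ip:port@HTTP#..." line.
            fetch "$link" | awk '/@HTTP#.*<br \/>/{gsub(".*<p>","",$0);gsub(".*<span>","",$0);gsub("@HTTP#.*","",$0);print}'
    done | sort -u >"$ipport"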
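
The awk filter inside the loop can be tried offline without hitting the site. The following is a minimal sketch: `extract_proxies` is a hypothetical helper name, and the sample lines are made up to resemble the `ip:port@HTTP#...<br />` markup the filter expects. They are not real data from the page.

```shell
#!/bin/bash
# Hypothetical helper: the same awk filter as in the loop, reading stdin.
extract_proxies() {
    awk '/@HTTP#.*<br \/>/ {
        gsub(/.*<p>/, "")      # drop everything up to an opening <p>
        gsub(/.*<span>/, "")   # or up to an opening <span>
        gsub(/@HTTP#.*/, "")   # drop the "@HTTP#..." suffix
        print
    }'
}

# Made-up sample input (192.0.2.0/24 is a documentation-only address range).
sample='<p>192.0.2.10:8080@HTTP#example location<br />
<span>192.0.2.11:3128@HTTP#example location<br />
<div>unrelated markup</div>'

printf '%s\n' "$sample" | extract_proxies
```

Only the first two sample lines contain `@HTTP#` followed by `<br />`, so only their `ip:port` parts are printed; the third line is skipped.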
  • Original article: https://www.cnblogs.com/code-style/p/3664964.html