zoukankan      html  css  js  c++  java
  • PHP抓取网页内容,获取链接绝对路径和图片绝对路径

    抓取网页内容方法:

    $ch = @curl_init($url);
    @curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $text = @curl_exec($ch);
    @curl_close($ch);
    $text=relative_to_absolute($text,$url);

    相对路径转绝对路径方法:

    function relative_to_absolute($content, $feed_url) {
        preg_match('/(http|https|ftp):\/\//', $feed_url, $protocol);
        $server_url = preg_replace("/(http|https|ftp|news):\/\//", "", $feed_url);
        $server_url = preg_replace("/\/.*/", "", $server_url);
    
        if ($server_url == '') {
            return $content;
        }
    
        if (isset($protocol[0])) {
            $new_content = preg_replace('/href="\//', 'href="'.$protocol[0].$server_url.'/', $content);
            $new_content = preg_replace('/src="\//', 'src="'.$protocol[0].$server_url.'/', $new_content);
        } else {
            $new_content = $content;
        }
        return $new_content;
    }

    获取所有超链接方法:

    function get_links($content) {
        $pattern = '/<a(.*?)href="(.*?)"(.*?)>(.*?)<\/a>/i';
        preg_match_all($pattern, $content, $m);
        $re=array_unique($m[2]);
        $i=0;
        foreach ($re as $key => $value)
        {
            $regex = "(http|https|ftp|telnet|news)";
            if((!empty($value)||strlen($value)>0)&&preg_match($regex,$value))
                $output[$i++]=$value;
        }
        return  $output;
    }

    获取所有图片链接方法:

    function get_pic($str)
    {
        $imgs=array();
        preg_match_all("/((http|https|ftp|telnet|news):\/\/[a-z0-9\/\-_+=.~!%@?#%&;:$\\()|]+\.(jpg|gif|png|bmp|swf|rar|zip))/isU",$str,$imgs);
        return array_unique($imgs[0]);;
    }
  • 相关阅读:
    磁盘分区,fdisk,gdisk,开机自动挂载,swap分区,修复文件系统,备份文件
    进程脱离窗口运行,僵尸、孤儿进程
    top命令、kill命令
    进程状态
    rpm包、挂载、yum命令
    DRF源码分析
    forms组件源码
    Django CBV源码分析
    魔法方法
    鸭子类型
  • 原文地址:https://www.cnblogs.com/zhishan/p/3102960.html
Copyright © 2011-2022 走看看