  • PHP: fetching web page content and converting link and image paths to absolute URLs

    Fetching the page content:

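    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    $text = curl_exec($ch);                         // false on failure
    curl_close($ch);
    if ($text !== false) {
        // Rewrite root-relative href/src values against the page's own host.
        $text = relative_to_absolute($text, $url);
    }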
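
    The fetch step above can also be wrapped in a function with timeouts and redirect handling. The following is a minimal sketch, not part of the original post: the name `fetch_page` and the timeout values are assumptions chosen for illustration, and only standard cURL options are used.

    ```php
    <?php
    // Hypothetical helper (not from the original post): fetch a page body with cURL.
    // Returns null on a transport error or an HTTP status >= 400.
    function fetch_page(string $url, int $timeout = 10): ?string
    {
        $ch = curl_init($url);
        if ($ch === false) {
            return null;
        }
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,     // return the body as a string
            CURLOPT_FOLLOWLOCATION => true,     // follow HTTP redirects
            CURLOPT_MAXREDIRS      => 5,        // but not indefinitely
            CURLOPT_CONNECTTIMEOUT => 5,        // seconds allowed for connecting
            CURLOPT_TIMEOUT        => $timeout, // seconds allowed for the whole transfer
            CURLOPT_FAILONERROR    => true,     // treat HTTP >= 400 as a failure
        ]);
        $body = curl_exec($ch);
        // The handle is released when $ch goes out of scope.
        return $body === false ? null : $body;
    }
    ```

    A caller would then check for null before post-processing, for example `$text = fetch_page($url); if ($text !== null) { ... }`.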

    Converting relative paths to absolute paths:

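    function relative_to_absolute($content, $feed_url) {
        // Capture the scheme prefix of the page URL, e.g. "http://".
        preg_match('/^(http|https|ftp):\/\//i', $feed_url, $protocol);
        // Strip the scheme, then everything from the first slash on, leaving only the host.
        $server_url = preg_replace('/^(http|https|ftp|news):\/\//i', '', $feed_url);
        $server_url = preg_replace('/\/.*/', '', $server_url);

        // Without a host or a recognized scheme there is nothing to prepend.
        if ($server_url == '' || !isset($protocol[0])) {
            return $content;
        }

        // Only root-relative, double-quoted values (href="/x", src="/x") are rewritten.
        // The (?!\/) lookahead leaves protocol-relative URLs such as "//host/x" untouched.
        $base = $protocol[0] . $server_url . '/';
        $new_content = preg_replace('/href="\/(?!\/)/i', 'href="' . $base, $content);
        $new_content = preg_replace('/src="\/(?!\/)/i', 'src="' . $base, $new_content);
        return $new_content;
    }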
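
    The function above only rewrites root-relative paths. Document-relative references such as `img/a.png` or `../css/site.css` need the base URL's directory as well. The sketch below shows one way to resolve a single reference; `resolve_url` is a hypothetical helper introduced here, and it is a simplification rather than a complete RFC 3986 implementation.

    ```php
    <?php
    // Hypothetical helper: resolve the reference $rel against the page URL $base.
    function resolve_url(string $base, string $rel): string
    {
        $b = parse_url($base);
        // Leave the reference alone if it already has a scheme or the base is unusable.
        if (preg_match('/^[a-z][a-z0-9+.\-]*:/i', $rel) || !isset($b['scheme'], $b['host'])) {
            return $rel;
        }
        if (strpos($rel, '//') === 0) {
            return $b['scheme'] . ':' . $rel;           // protocol-relative reference
        }
        $origin = $b['scheme'] . '://' . $b['host'] . (isset($b['port']) ? ':' . $b['port'] : '');
        $base_path = $b['path'] ?? '/';
        if ($rel === '' || $rel[0] === '#') {
            // Same document: keep the base query, append the fragment if any.
            return $origin . $base_path . (isset($b['query']) ? '?' . $b['query'] : '') . $rel;
        }
        if ($rel[0] === '?') {
            return $origin . $base_path . $rel;         // same path, new query
        }
        // Split off the query/fragment so dot-segment handling only sees the path.
        $cut  = strcspn($rel, '?#');
        $path = substr($rel, 0, $cut);
        $tail = substr($rel, $cut);
        if ($path[0] !== '/') {
            // Document-relative: replace the last segment of the base path.
            $path = substr($base_path, 0, strrpos($base_path, '/') + 1) . $path;
        }
        // Collapse "." and ".." segments.
        $out = [];
        foreach (explode('/', $path) as $seg) {
            if ($seg === '..') {
                array_pop($out);
            } elseif ($seg !== '.') {
                $out[] = $seg;
            }
        }
        return $origin . '/' . ltrim(implode('/', $out), '/') . $tail;
    }
    ```

    For example, resolving `../css/site.css?v=2` against `http://example.com/blog/post.html` yields `http://example.com/css/site.css?v=2`.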

    Collecting all hyperlinks:

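    function get_links($content) {
        // Capture the href value of each <a ... href="...">...</a> element.
        // Only double-quoted href attributes on a single line are matched.
        $pattern = '/<a(.*?)href="(.*?)"(.*?)>(.*?)<\/a>/i';
        preg_match_all($pattern, $content, $m);
        $output = array(); // always return an array, even when nothing matches
        // Keep only absolute URLs that begin with one of these schemes.
        $regex = '/^(http|https|ftp|telnet|news):\/\//i';
        foreach (array_unique($m[2]) as $value) {
            if (preg_match($regex, $value)) {
                $output[] = $value;
            }
        }
        return $output;
    }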
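
    A regular expression like the one above misses single-quoted attributes and anchors that span several lines. As an alternative sketch, PHP's DOM extension can parse the markup first. The name `get_links_dom` is hypothetical, and the scheme list simply mirrors the one used in `get_links`.

    ```php
    <?php
    // Hypothetical alternative to get_links(): collect absolute hyperlinks via DOMDocument.
    function get_links_dom(string $html): array
    {
        if (trim($html) === '') {
            return [];                                  // loadHTML() rejects empty input
        }
        $doc = new DOMDocument();
        $previous = libxml_use_internal_errors(true);   // silence warnings on sloppy markup
        $doc->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        $links = [];
        foreach ($doc->getElementsByTagName('a') as $a) {
            $href = trim($a->getAttribute('href'));
            // Keep absolute URLs only, using the same schemes as get_links().
            if (preg_match('/^(https?|ftp|telnet|news):\/\//i', $href)) {
                $links[$href] = true;                   // array keys de-duplicate
            }
        }
        return array_keys($links);
    }
    ```

    Relative links are skipped here, so this is best run after the relative-to-absolute step.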

    Collecting all image links:

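    function get_pic($str)
    {
        $imgs = array();
        // Match absolute URLs ending in one of the listed extensions, anywhere in the text.
        // Despite the function name, the list also includes swf, rar and zip.
        // The U flag makes each match stop at the first extension it reaches.
        preg_match_all("/((http|https|ftp|telnet|news):\/\/[a-z0-9\/\-_+=.~!%@?#%&;:$\\()|]+\.(jpg|gif|png|bmp|swf|rar|zip))/isU", $str, $imgs);
        return array_unique($imgs[0]);
    }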
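
    Because `get_pic` matches by file extension, it skips images whose URLs carry a query string after the extension or have no extension at all, and it also picks up matching URLs outside `<img>` tags. A possible alternative, sketched below with the hypothetical name `get_img_srcs`, reads the `src` attribute of every `<img>` element through DOMXPath.

    ```php
    <?php
    // Hypothetical alternative to get_pic(): list the src of every <img> element.
    function get_img_srcs(string $html): array
    {
        if (trim($html) === '') {
            return [];                                  // loadHTML() rejects empty input
        }
        $doc = new DOMDocument();
        $previous = libxml_use_internal_errors(true);   // silence warnings on sloppy markup
        $doc->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        $xpath = new DOMXPath($doc);
        $srcs = [];
        foreach ($xpath->query('//img[@src]') as $img) {
            $src = trim($img->getAttribute('src'));
            if ($src !== '') {
                $srcs[] = $src;
            }
        }
        // Drop duplicates and reindex from 0.
        return array_values(array_unique($srcs));
    }
    ```

    The returned values are whatever the page contains, so relative paths still need to be made absolute separately.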
  • Original article: https://www.cnblogs.com/zhishan/p/3102960.html