zoukankan      html  css  js  c++  java
  • PHP抓取网页内容,获取链接绝对路径和图片绝对路径

    抓取网页内容方法:

    $ch = @curl_init($url);
    @curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $text = @curl_exec($ch);
    @curl_close($ch);
    $text=relative_to_absolute($text,$url);

    相对路径转绝对路径方法:

    function relative_to_absolute($content, $feed_url) {
        preg_match('/(http|https|ftp):\/\//', $feed_url, $protocol);
        $server_url = preg_replace("/(http|https|ftp|news):\/\//", "", $feed_url);
        $server_url = preg_replace("/\/.*/", "", $server_url);
    
        if ($server_url == '') {
            return $content;
        }
    
        if (isset($protocol[0])) {
            $new_content = preg_replace('/href="\//', 'href="'.$protocol[0].$server_url.'/', $content);
            $new_content = preg_replace('/src="\//', 'src="'.$protocol[0].$server_url.'/', $new_content);
        } else {
            $new_content = $content;
        }
        return $new_content;
    }

    获取所有超链接方法:

    function get_links($content) {
        $pattern = '/<a(.*?)href="(.*?)"(.*?)>(.*?)<\/a>/i';
        preg_match_all($pattern, $content, $m);
        $re=array_unique($m[2]);
        $i=0;
        foreach ($re as $key => $value)
        {
            $regex = "(http|https|ftp|telnet|news)";
            if((!empty($value)||strlen($value)>0)&&preg_match($regex,$value))
                $output[$i++]=$value;
        }
        return  $output;
    }

    获取所有图片链接方法:

    function get_pic($str)
    {
        $imgs=array();
        preg_match_all("/((http|https|ftp|telnet|news):\/\/[a-z0-9\/\-_+=.~!%@?#%&;:$\\()|]+\.(jpg|gif|png|bmp|swf|rar|zip))/isU",$str,$imgs);
        return array_unique($imgs[0]);;
    }
  • 相关阅读:
    侧滑界面的实现
    Private field 'XXX' is never assigned的解决办法
    android先加载注册页面而不是MainActivity主页面
    每日日报4
    每日日报3
    47 选择排序和插入排序
    计算机启动过程 BIOS MBR等
    ARM中MMU地址转换理解(转)
    深度学习框架 CatBoost 介绍
    预训练词嵌入
  • 原文地址:https://www.cnblogs.com/zhishan/p/3102960.html
Copyright © 2011-2022 走看看