(Repost) Python: Finding All Links on a Web Page with a Regular Expression

Reposted from: http://www.linuxany.com/archives/596.html

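import re
import urllib.request  # Python 3; in Python 2 this was urllib.urlopen

def test(html, rex):
    # Collect each distinct match once, in order of first appearance.
    alist = []
    r = re.compile(rex)
    matchs = r.findall(html)  # findall returns a list (possibly empty), never None
    for found in matchs:
        if found not in alist:
            alist.append(found)
    return alist

# Match <a href="..."> and capture the URL inside the double quotes.
# \s* is the whitespace between "a" and "href" (the backslash was lost in the repost).
rex = r'<a\s*href="(.*?)"'
page = urllib.request.urlopen('http://hi.baidu.com')
# read() returns bytes in Python 3; UTF-8 is assumed here for decoding.
html = page.read().decode('utf-8', errors='replace')
page.close()

print(test(html, rex))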
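
The same idea can be tried offline on a small HTML string, without a network request. The following is only a minimal sketch for Python 3: the sample markup, the names `LINK_RE` and `find_links`, and the looser pattern (which tolerates other attributes before href, single quotes, and upper-case tags) are assumptions for illustration and are not part of the original post.

```python
import re

# Hypothetical, looser pattern: allows other attributes before href
# and either quote style around the URL (the closing quote must match
# the opening one, via the \1 backreference).
LINK_RE = re.compile(r'<a\s+[^>]*?href\s*=\s*(["\'])(.*?)\1', re.IGNORECASE)

def find_links(html):
    """Return distinct href values in order of first appearance."""
    seen = []
    for match in LINK_RE.finditer(html):
        url = match.group(2)  # group 1 is the quote character
        if url not in seen:
            seen.append(url)
    return seen

# Made-up sample markup: double quotes, single quotes with an extra
# attribute, and an upper-case duplicate of the first link.
sample = (
    '<a href="http://example.com/a">A</a>'
    "<a class='x' href='http://example.com/b'>B</a>"
    '<A HREF="http://example.com/a">A again</A>'
)
links = find_links(sample)
print(links)
```

A regular expression remains an approximation of HTML. For irregular real-world markup, an HTML parser such as the standard library's html.parser module is generally more robust.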

Original article: https://www.cnblogs.com/youthdream/p/3527787.html