  • Extracting URL links with Python's HTMLParser module

    # -*- coding: utf-8 -*-
    # Python 2.7
    # xiaodeng
    # Extracting URL links with Python's HTMLParser module
    #http://www.cnblogs.com/mfryf/p/3691563.html
    
    
    
    from HTMLParser import HTMLParser


    class MyHTMLParser(HTMLParser):
        def __init__(self):
            HTMLParser.__init__(self)  # initialize the base class
            self.links = []  # href values collected from <a> tags

        def handle_starttag(self, tag, attrs):
            # HTMLParser lowercases tag and attribute names before calling
            # this method, so <A HREF=...> is matched as well.
            if tag == "a":
                for name, value in attrs:
                    if name == "href":
                        self.links.append(value)
    
                         
    if __name__ == "__main__":
        # A multi-line HTML string to parse
        html_code = """<a href="www.google.com"> google.com</a>
    <A Href="www.pythonclub.org"> PythonClub </a>
    <A HREF = "www.sina.com.cn"> Sina </a>
    """   
        hp = MyHTMLParser()
        hp.feed(html_code)
        hp.close()
        print hp.links  # ['www.google.com', 'www.pythonclub.org', 'www.sina.com.cn']
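
The listing above targets Python 2.7, where the class is imported from the `HTMLParser` module. As a minimal sketch, assuming Python 3 (where the same class is imported from `html.parser`), the same link extraction might look like the following. The names `LinkParser` and `extract_links` are hypothetical, chosen for illustration only, and do not appear in the original post.

```python
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser.

    Hypothetical name; a Python 3 counterpart of MyHTMLParser.
    """

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling this
        # method, so <A HREF=...> and <a href=...> are treated alike.
        if tag != "a":
            return
        for name, value in attrs:
            # A bare attribute such as <a href> arrives with value None; skip it.
            if name == "href" and value is not None:
                self.links.append(value)


def extract_links(html_text):
    """Hypothetical helper: parse html_text and return the collected links."""
    parser = LinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Same sample markup as the Python 2 listing.
html_code = """<a href="www.google.com"> google.com</a>
<A Href="www.pythonclub.org"> PythonClub </a>
<A HREF = "www.sina.com.cn"> Sina </a>
"""

print(extract_links(html_code))
```

Run on the sample markup, this should print the same three links as the Python 2.7 version. The values are returned exactly as written in the `href` attribute, so relative or scheme-less links such as `www.google.com` are not expanded into full URLs.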
  • Original article: https://www.cnblogs.com/dengyg200891/p/4983683.html