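  • Python Web Scraping

    Let's now build a real scraper.

    It scrapes the list of recommended articles, with their titles and URLs, from the front page of my cnblogs home page.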

    scrape_home_articles.py

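    import re
    from urllib.request import urlopen

    from bs4 import BeautifulSoup

    # Fetch the blog home page and parse it with Python's built-in HTML parser.
    html = urlopen("http://www.cnblogs.com/davidgu")
    bsObj = BeautifulSoup(html, "html.parser")

    # Article URLs all start with this prefix; the dots are escaped so they match literally.
    article_href = re.compile(r"^http://www\.cnblogs\.com/davidgu/p")

    # Search only inside the main container, and skip links that carry a class attribute.
    # The href filter already guarantees that every matched link has an href.
    for link in bsObj.find("div", {"id": "main_container"}).find_all("a", href=article_href):
        if "class" not in link.attrs:
            print(link.string)
            print(link.attrs["href"])
            print("--------------------------------------------------------------")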

    Output:
    [置顶]解决adb server端口被占用的问题
    http://www.cnblogs.com/davidgu/p/4515236.html
    --------------------------------------------------------------
    [置顶]解决Eclipse下不自动拷贝apk到模拟器问题( The connection to adb is down, and a sever
    http://www.cnblogs.com/davidgu/p/4390661.html
    --------------------------------------------------------------
    常用的正则表达式一览
    http://www.cnblogs.com/davidgu/p/4831357.html
    --------------------------------------------------------------
    C++ 11 - STL - 函数对象(Function Object) (上)
    http://www.cnblogs.com/davidgu/p/4829097.html
    --------------------------------------------------------------

    ...
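
    The filtering rules used by the script (look only inside the div with id main_container, keep links whose href starts with the article prefix, skip links that have a class attribute) can be tried offline. The following is only a sketch: it swaps BeautifulSoup for the standard library's html.parser so that it runs without extra packages or network access, and it assumes reasonably well-formed markup. ArticleLinkParser, extract_articles and SAMPLE_HTML are made-up names for illustration, and the sample markup is invented, not the real page.

```python
import re
from html.parser import HTMLParser

# Same URL prefix as the script above, with the dots escaped so that "."
# matches only a literal dot.
ARTICLE_HREF = re.compile(r"^http://www\.cnblogs\.com/davidgu/p")


class ArticleLinkParser(HTMLParser):
    """Collects (title, href) pairs for article links inside #main_container."""

    # Elements without a closing tag; they must not change the nesting depth.
    VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}

    def __init__(self):
        super().__init__()
        self.articles = []
        self._depth = 0     # nesting depth inside the container; 0 means outside
        self._href = None   # href of the article link currently being read
        self._text = []     # text fragments of that link

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID_TAGS:
            return
        attrs = dict(attrs)
        if self._depth == 0:
            # Outside the container: only the container div itself matters.
            if tag == "div" and attrs.get("id") == "main_container":
                self._depth = 1
            return
        self._depth += 1
        href = attrs.get("href")
        if tag == "a" and href and "class" not in attrs and ARTICLE_HREF.match(href):
            self._href = href
            self._text = []

    def handle_endtag(self, tag):
        if self._depth == 0 or tag in self.VOID_TAGS:
            return
        if tag == "a" and self._href is not None:
            self.articles.append(("".join(self._text).strip(), self._href))
            self._href = None
        self._depth -= 1

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)


def extract_articles(html_text):
    """Return the (title, href) pairs found in an HTML string."""
    parser = ArticleLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.articles


# Invented sample markup (an assumption, not the real page source).
SAMPLE_HTML = """
<div id="header"><a href="http://www.cnblogs.com/davidgu/p/0.html">outside the container</a></div>
<div id="main_container">
  <a href="http://www.cnblogs.com/davidgu/p/4831357.html">Common regular expressions at a glance</a>
  <a class="more" href="http://www.cnblogs.com/davidgu/p/4829097.html">has a class attribute</a>
  <a href="http://www.cnblogs.com/davidgu/archive/2015/09.html">archive page</a>
</div>
"""

articles = extract_articles(SAMPLE_HTML)
for title, href in articles:
    print(title)
    print(href)
```

    Of the four links in the sample, only the first one inside the container passes all three rules; the others are outside the container, carry a class attribute, or point at a URL that does not start with the article prefix.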

  • Original article: https://www.cnblogs.com/davidgu/p/4831754.html