  • Python crawler: scraping the titles, links, and publish times from my own blog's home page

    Code

    # -*- coding: utf-8 -*-
    """
    -------------------------------------------------
       File Name:     getCnblogs
       Description :
       Author :       神秘藏宝室
       date:          2017-09-21
    -------------------------------------------------
       Change Activity:
                       2017-09-21:
    -------------------------------------------------
    """
    import requests
    from bs4 import BeautifulSoup
    
    # Fetch the blog home page and parse it
    res = requests.get('http://www.cnblogs.com/Mysterious/')
    res.encoding = 'utf-8'
    
    soup = BeautifulSoup(res.text, 'html.parser')
    
    def getBlogWriteTime(url):
        """Fetch a post page and return the text of its #post-date element."""
        res = requests.get(url)
        res.encoding = 'utf-8'
        soup = BeautifulSoup(res.text, 'html.parser')
        return soup.select('#post-date')[0].text
    
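    # Get each post's title and link, then look up its publish time
    num = 1
    for pt in soup.select('.postTitle2'):
        # Python 3 print; fields are separated by a space followed by a tab
        print(num, pt.text, pt['href'], getBlogWriteTime(pt['href']), sep=' \t')
        num = num + 1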
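
The link extraction above can also be tried without any network access. The following is only a sketch under stated assumptions, not the author's method: it swaps BeautifulSoup for the standard library's `html.parser.HTMLParser`, and the names `TitleLinkParser`, `parse_post_links`, and `SAMPLE_HTML` (including the markup surrounding the link) are hypothetical. Only the `postTitle2` class name and the title/URL pair are taken from the post.

```python
from html.parser import HTMLParser


class TitleLinkParser(HTMLParser):
    """Collect (title, href) pairs from <a class="postTitle2"> elements."""

    def __init__(self):
        super().__init__()
        self.posts = []        # finished (title, href) pairs
        self._in_link = False  # True while inside a matching <a>
        self._href = None      # href of the link currently being read
        self._text = []        # text fragments of that link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get('class') or '').split()
        if tag == 'a' and 'postTitle2' in classes:
            self._in_link = True
            self._href = attrs.get('href')
            self._text = []

    def handle_data(self, data):
        if self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._in_link:
            self.posts.append((''.join(self._text).strip(), self._href))
            self._in_link = False


def parse_post_links(html):
    """Return a list of (title, href) tuples found in the given HTML string."""
    parser = TitleLinkParser()
    parser.feed(html)
    parser.close()
    return parser.posts


# Hypothetical markup; the real page layout may differ.
SAMPLE_HTML = """
<div class="postTitle">
  <a class="postTitle2" href="http://www.cnblogs.com/Mysterious/p/7507042.html">
    Python socket TCPServer Demo
  </a>
</div>
<a class="other" href="http://example.com/">not a post title</a>
"""

posts = parse_post_links(SAMPLE_HTML)
for num, (title, href) in enumerate(posts, 1):
    print(num, title, href, sep=' \t')
```

Keeping the parsing in a function that takes an HTML string means the selector logic can be checked against saved pages before any request is sent to the blog.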
    

    Output

    1 	Python爬虫:获取新浪网新闻 	http://www.cnblogs.com/Mysterious/p/7538833.html 	2017-09-18 00:10
    2 	运行jupyter notebook 出错 Error executing Jupyter command 'notebook' 	http://www.cnblogs.com/Mysterious/p/7538169.html 	2017-09-17 22:10
    3 	安装和使用jupyter 	http://www.cnblogs.com/Mysterious/p/7533607.html 	2017-09-17 00:25
    4 	windows下python调用c文件流程 	http://www.cnblogs.com/Mysterious/p/7529228.html 	2017-09-16 00:01
    5 	python Unable to find vcvarsall.bat 错误 	http://www.cnblogs.com/Mysterious/p/7529142.html 	2017-09-15 23:30
    6 	阿里云公网IP不能使用 	http://www.cnblogs.com/Mysterious/p/7523618.html 	2017-09-14 22:36
    7 	Python2 socket TCPServer 多线程并发 超时关闭 	http://www.cnblogs.com/Mysterious/p/7523559.html 	2017-09-14 22:27
    8 	Python2 socket 多线程并发 ThreadingTCPServer Demo 	http://www.cnblogs.com/Mysterious/p/7507314.html 	2017-09-11 21:50
    9 	Python2 socket 多线程并发 TCPServer Demo 	http://www.cnblogs.com/Mysterious/p/7507221.html 	2017-09-11 21:28
    10 	Python socket TCPServer Demo 	http://www.cnblogs.com/Mysterious/p/7507042.html 	2017-09-11 20:59
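
If the rows above need further processing, they can be read back into typed records. This is a minimal sketch that assumes each row has exactly four tab-separated fields and that the timestamp always follows the `YYYY-MM-DD HH:MM` pattern seen in the output; `PostRow`, `parse_row`, and `ROWS` are hypothetical names introduced for illustration.

```python
from datetime import datetime
from typing import NamedTuple


class PostRow(NamedTuple):
    num: int
    title: str
    url: str
    posted: datetime


def parse_row(row):
    """Split one tab-separated output row into typed fields."""
    # Assumes the title itself contains no tab characters.
    num, title, url, posted = (field.strip() for field in row.split('\t'))
    return PostRow(int(num), title, url,
                   datetime.strptime(posted, '%Y-%m-%d %H:%M'))


# Rows 10 and 4, copied from the output above.
ROWS = [
    "10 \tPython socket TCPServer Demo \thttp://www.cnblogs.com/Mysterious/p/7507042.html \t2017-09-11 20:59",
    "4 \twindows下python调用c文件流程 \thttp://www.cnblogs.com/Mysterious/p/7529228.html \t2017-09-16 00:01",
]

# Sort newest first by publish time.
records = sorted((parse_row(r) for r in ROWS),
                 key=lambda rec: rec.posted, reverse=True)
for rec in records:
    print(rec.num, rec.posted.isoformat(sep=' '), rec.url)
```

Parsing the timestamp into a `datetime` makes sorting and filtering by date straightforward, which plain string fields would only support by accident of the format.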
    
  • Original article: https://www.cnblogs.com/Mysterious/p/7571964.html