A Simple Python Crawler That Fetches the Latest Updates from a Certain "Dao" Site - 走看看


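Last night, out of boredom, I wanted to practice some Python, so I wrote a small crawler that fetches the latest updates from Xiaodao Entertainment Network (小刀娱乐网).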

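#!/usr/bin/env python3
# coding: utf-8

import re
import urllib.request
from urllib.parse import urljoin

# Base URL, used to turn the (relative) links on the page into absolute ones
BASE_URL = "http://www.xiaodao.la/"

# Subroutine that downloads the page source and cuts out the update list
def get():
    data = urllib.request.urlopen(BASE_URL, timeout=10).read()
    # Decode (the page is GBK-encoded) and strip noise: bold styling, all
    # whitespace, and the red highlight color, so one regex fits every row
    html = data.decode("gbk", errors="ignore").replace("font-weight:bold;", "")
    html = re.sub(r"\s+", "", html).replace("#FF0000", "#000000")
    # Return only the part between the two marker strings on the page
    return html[html.find("好卡售"):html.find("20160303184868786878.gif")]

# Fetch the page source once
html = get()
# print(html)

# Regex for one update row: link, title, link text, update time
# (written for whitespace-free HTML, hence "tdwidth=12.5%align=right...")
# reg = r'href="(.*?)"style="color:#000000;"title="(.*?)"target="_blank">'
reg = r'href="(.*?)"style="color:#000000;"title="(.*?)"target="_blank">(.*?)</a></div></td><tdwidth=12.5%align=rightnowrap=nowrapstyle="color:#F00;">(.*?)</td>'

pattern = re.compile(reg)      # compile the regex
items = pattern.findall(html)  # list of 4-tuples, one per matched row

print("Matched %d items in total" % len(items))  # number of matches
# print(items)

for i, (href, title, _text, updated) in enumerate(items, 1):
    print("Item %d:" % i)
    print("Title: %s  URL: %s  Updated: %s" % (title, urljoin(BASE_URL, href), updated))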
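The parsing half of the crawler can be tried offline, which helps when the site is slow or no longer online. Below is a minimal sketch, assuming the page markup looks like the hypothetical sample row (real site HTML may differ). It reuses the same regex and the same clean-up (drop bold styling, strip all whitespace, recolor #FF0000 to black) and resolves links with urllib.parse.urljoin. The names normalize, parse_updates and SAMPLE are made up for illustration, and the slicing between the two marker strings is skipped.

```python
import re
from urllib.parse import urljoin

# Same row pattern as the crawler; it expects HTML with all whitespace removed
PATTERN = re.compile(
    r'href="(.*?)"style="color:#000000;"title="(.*?)"target="_blank">(.*?)</a></div></td>'
    r'<tdwidth=12.5%align=rightnowrap=nowrapstyle="color:#F00;">(.*?)</td>'
)

def normalize(html):
    """Drop bold styling, strip all whitespace, recolor red titles to black."""
    html = html.replace("font-weight:bold;", "")
    html = re.sub(r"\s+", "", html)
    return html.replace("#FF0000", "#000000")

def parse_updates(html, base="http://www.xiaodao.la/"):
    """Return (title, absolute_url, update_time) tuples found in raw HTML."""
    rows = []
    for href, title, _text, updated in PATTERN.findall(normalize(html)):
        rows.append((title, urljoin(base, href), updated))
    return rows

# Hypothetical sample row shaped to fit the pattern (not real site markup)
SAMPLE = '''
<td><div><a href="/post/1.html" style="color:#FF0000; font-weight:bold;"
   title="Demo" target="_blank">Demo</a></div></td>
<td width=12.5% align=right nowrap=nowrap style="color:#F00;">03-03</td>
'''

for i, (title, url, updated) in enumerate(parse_updates(SAMPLE), 1):
    print("Item %d: %s  %s  %s" % (i, title, url, updated))
```

Because the regex only sees whitespace-free text, any change in attribute order or styling on the real page will silently produce zero matches, so printing the match count first (as the crawler does) is a useful sanity check.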
Original article: https://www.cnblogs.com/zxtceq/p/8985732.html

Copyright © 2011-2022 走看看