A simple regex-based web crawler in Python

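Today I used regular expressions to do a simple scrape of Dianping (大众点评), pulling Beijing restaurant listings: shop name, average spend per person, and address.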

import re
import urllib.request

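def getPage(url):
    """Fetch url and return the response body decoded as UTF-8."""
    # Send a browser-like User-Agent; some sites reject urllib's default one.
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) '
                             'Chrome/51.0.2704.63 Safari/537.36'}
    req = urllib.request.Request(url=url, headers=headers)
    # The with block closes the connection once the body has been read.
    with urllib.request.urlopen(req) as res:
        return res.read().decode('utf-8')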

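def parsePage(s):
    """Yield one dict per shop matched by the compiled pattern com."""
    for match in com.finditer(s):
        yield {
            "店铺名": match.group("shop_name"),      # shop name
            "人均价格": match.group("per_capita"),   # average spend per person
            "地址": match.group("address"),          # address
        }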

def main(num):
    # %% is a literal % in the format string, so %%2C ends up as the URL-encoded comma %2C.
    url = "http://www.dianping.com/beijing/ch10/p%s?aid=92020785%%2C102284990&cpt=92020785%%2C102284990" % num
    response_html = getPage(url)

    # Append one record per line; the with block closes the file when done.
    with open("eat_info", "a", encoding="utf-8") as f:
        for obj in parsePage(response_html):
            print(obj)
            f.write(str(obj) + "\n")

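# re.S lets . match newlines, so one pattern can span several lines of HTML.
# Raw strings keep \d intact; the price group must be \d+ (digits), not a literal "d".
com = re.compile(
        r'<div class="txt">.*?<h4>(?P<shop_name>.*?)</h4>'
        r'.*?<b>￥(?P<per_capita>\d+)</b>.*?<span class="addr">(?P<address>.*?)</span>', re.S)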

# Scrape listing pages 1 through 50.
for page in range(1, 51):
    main(page)
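
To see what the named groups capture without hitting the site, the same pattern can be tried on a small HTML fragment. This is only a sketch: the markup in sample_html is invented for illustration and is an assumption about the page layout, and the names pattern, sample_html and records are not from the script above.

```python
import re

# Same pattern as in the script, with \d+ matching the price digits.
pattern = re.compile(
    r'<div class="txt">.*?<h4>(?P<shop_name>.*?)</h4>'
    r'.*?<b>￥(?P<per_capita>\d+)</b>.*?<span class="addr">(?P<address>.*?)</span>',
    re.S,
)

# Hypothetical markup, made up for this example; the real page may differ.
sample_html = """
<div class="txt">
  <div class="tit"><a href="#"><h4>Shop A</h4></a></div>
  <a class="mean-price"><b>￥85</b></a>
  <span class="addr">1 Example Road</span>
</div>
<div class="txt">
  <div class="tit"><a href="#"><h4>Shop B</h4></a></div>
  <a class="mean-price"><b>￥120</b></a>
  <span class="addr">2 Example Road</span>
</div>
"""

# groupdict() returns every named group of a match as a dict.
records = [m.groupdict() for m in pattern.finditer(sample_html)]
for record in records:
    print(record)
```

One caveat with this approach: because re.S lets `.*?` cross line boundaries, a listing with no price can end up paired with the price and address of the next listing.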


Original post: https://www.cnblogs.com/zycorn/p/9444318.html