zoukankan      html  css  js  c++  java
  • Python之抓取网页元素

    import urllib.request
    
    from bs4 import BeautifulSoup
    
    url = "http://www.wal-martchina.com/walmart/store/14_hubei.htm"
    
    user_agent = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36"
    
    request = urllib.request.Request(url)
    
    request.add_header("User-Agent", user_agent)
    
    content = urllib.request.urlopen(request)
    
    soup = BeautifulSoup(content,from_encoding="gb18030")
    
    #店名
    shopname = soup.find_all('td', {"class": "xl714445"})
    #地址
    addresss = soup.find_all('td', {"class": "xl684445"})
    #联系电话
    phones = soup.find_all('td', {"class": "xl744445"})
    
    for shop in shopname:
        print("店铺名称:"+shop.text.lstrip().rstrip())
    
    print("----------------------------------------------")
    
    for address in addresss:
          print("店铺地址:"+address.text.lstrip().rstrip())
    
    sum = 0
    for phone in phones:
        if sum % 2 == 0:
            print("联系电话:" + phone.text.lstrip().rstrip())
        else:
            print("交通路线:" + phone.text.lstrip().rstrip())
            print('---------------------------------------------------')
        sum += 1
    
  • 相关阅读:
    markdown文件的基本常用编写
    寒假作业安排及注意点
    Day2
    Day1
    Python格式化
    Python 遍历字典的键值
    python 判断是否为空
    git 回退版本
    Python获取当前文件夹位置
    Python3, Python2 获取当前时间
  • 原文地址:https://www.cnblogs.com/bilaisheng/p/10211043.html
Copyright © 2011-2022 走看看