zoukankan      html  css  js  c++  java
  • Python 爬虫 学习一

    # coding: utf8
    
    import requests
    from bs4 import BeautifulSoup
    
    
    PhotoName = 1
    
    
    DATA = []
    
    
    def save_img(url,name): ##保存图片
       # print('开始保存图片...')
        img = requests.get(url)
        file_name = name+ '.jpg'
       #print('开始保存文件')
        f = open(file_name, 'ab')
        f.write(img.content)
        print(url,'is ok!')
        f.close()
    
    
     
    
    
    
    #print(res.text)
    
    
    html_doc = res.text
    soup = BeautifulSoup(html_doc, 'lxml').find_all('img')  #声明BeautifulSoup对象
    
    
    #for m in soup:
     # print(m['src']) 
    
    
    for n in range(68):
        try:
            res = requests.get('http://qq.yh31.com/zjbq/0551964_'+str(n+2)+'.html',timeout=15)
        except: #(可以自己添加错误类型)
            pass
        #print(res.text)
        soup = BeautifulSoup(res.text, 'lxml').find_all('img',alt="")#class_='c_content_overflow'
        mess ='' +str(n+2)+'页,爬取图片地址'+ str(len(soup))+'张...'
        print(mess)
        for vl in soup:
            DATA.append('http://x.yh31.com:85'+vl['src'])
            PhotoName+=1
            save_img('http://x.yh31.com:85'+vl['src'],str(PhotoName))

    #取标签内值 vl.string

    了解 requests 库   简单获取网页数据 ,然后利用bs4来获取到我们想要的数据结构。

  • 相关阅读:
    BZOJ 1055 [HAOI2008]玩具取名
    BZOJ 1054 [HAOI2008]移动玩具
    BestCoder Round #51 (div.2)
    python对拍程序
    BZOJ 1053 [HAOI2007]反素数ant
    BZOJ 1051 [HAOI2006]受欢迎的牛
    Codeforces Round #315 (Div. 2)
    今天愉快的hack小记
    BZOJ 1050 [HAOI2006]旅行comf
    COJ 2135 Day10-例1
  • 原文地址:https://www.cnblogs.com/xuandi/p/6836990.html
Copyright © 2011-2022 走看看