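Summary: I have only just started learning Python, so I only know a few simple things so far. This post sums up what I learned about getting started with web scraping in Python, using two simple scraping demos.
1. Scraping a JD.com product page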
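```python
import requests  # third-party library, required (pip install requests)

url = "https://item.jd.com/2967929.html"  # page to fetch
try:
    r = requests.get(url, timeout=10)  # give up if the server does not answer in 10 s
    print(r.status_code)  # 200 means success, 404 means not found
    r.raise_for_status()  # turn 4xx/5xx responses into an exception
    # r.encoding = 'utf-8'
    r.encoding = r.apparent_encoding  # use the encoding guessed from the page content
    print(r.text[:1000])  # first 1000 characters of the page
except requests.RequestException:
    print("Scraping failed")
```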
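In the demo above, `r.text` is already a decoded `str`. That means `r.text[:1000]` takes the first 1000 characters, not the first 1000 bytes. The raw bytes are in `r.content`. Here is a minimal standard-library sketch of the difference. It uses a hypothetical sample string in place of the real page, and the helper names `first_chars` and `first_bytes` are illustrative only.

```python
# Hypothetical sample text standing in for r.text; it is not fetched from JD.com.
text = "京东商品页面 JD product page"


def first_chars(s: str, n: int) -> str:
    """Return the first n characters (code points), which is what r.text[:n] does."""
    return s[:n]


def first_bytes(s: str, n: int, encoding: str = "utf-8") -> bytes:
    """Return the first n bytes of s after encoding it (UTF-8 by default)."""
    return s.encode(encoding)[:n]


head = first_chars(text, 6)
print(head)                        # 京东商品页面
print(len(head))                   # 6 characters
print(len(head.encode("utf-8")))   # 18 bytes: each of these CJK characters takes 3 bytes in UTF-8
print(first_bytes(text, 6).decode("utf-8"))  # 京东
```

For Chinese pages the two counts can differ by a factor of two or three. Also, a byte slice can cut a multi-byte character in half, in which case `.decode()` raises `UnicodeDecodeError`. That is one reason to slice the decoded text instead.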
2. Scraping an image from a website
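```python
import os
import requests

url = "http://a0.att.hudong.com/16/12/01300535031999137270128786964.jpg"
root = "D:/Other/Picture/"  # directory to save into
path = root + url.split('/')[-1]  # file name = last segment of the URL
try:
    if not os.path.exists(root):
        os.makedirs(root)  # create the directory, including missing parents
    if not os.path.exists(path):
        r = requests.get(url, timeout=10)
        r.raise_for_status()  # don't save an error page as an image
        with open(path, 'wb') as f:  # binary mode: an image is raw bytes
            f.write(r.content)  # the with block closes the file automatically
        print("Saved successfully")
    else:
        print("File already exists")
except (OSError, requests.RequestException):
    print("Scraping failed")
```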
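The demo above names the file with `url.split('/')[-1]`. That keeps any query string (for example `?size=large`) in the file name. Below is a minimal standard-library sketch that derives the file name from the URL path and saves without overwriting. The helper names `local_path_for` and `save_if_absent` are hypothetical, and the bytes written stand in for `r.content`.

```python
import os
import posixpath
import tempfile
from urllib.parse import urlsplit


def local_path_for(url: str, root: str) -> str:
    """Join root with the file name at the end of the URL's path.

    Unlike url.split('/')[-1], this drops any ?query or #fragment.
    """
    name = posixpath.basename(urlsplit(url).path)  # URL paths always use '/'
    if not name:
        raise ValueError(f"URL has no file name: {url}")
    return os.path.join(root, name)


def save_if_absent(path: str, data: bytes) -> bool:
    """Write data to path unless the file exists; return True if it was written."""
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)  # no error if the directory already exists
    try:
        with open(path, "xb") as f:  # "x" mode refuses to overwrite an existing file
            f.write(data)
    except FileExistsError:
        return False
    return True


if __name__ == "__main__":
    url = "http://example.com/images/photo.jpg?size=large"  # hypothetical URL
    with tempfile.TemporaryDirectory() as tmp:
        path = local_path_for(url, os.path.join(tmp, "Picture"))
        print(os.path.basename(path))         # photo.jpg
        print(save_if_absent(path, b"fake"))  # True
        print(save_if_absent(path, b"fake"))  # False
```

In the demo, `save_if_absent(path, r.content)` could replace both the `os.path.exists(path)` check and the `with open(...)` block. Opening in `"xb"` mode makes the check and the write a single step, so no other process can create the file in between.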