  • Learning Python web scraping (2): scraping Qiushibaike jokes and storing them in a MySQL database

    import pymysql
    import requests
    from bs4 import BeautifulSoup

    # Connect to the database with pymysql; utf8mb4 keeps Chinese text intact
    conn = pymysql.connect(host='127.0.0.1', unix_socket='/tmp/mysql.sock',
                           user='root', passwd='19950311', db='mysql',
                           charset='utf8mb4')
    cur = conn.cursor()
    cur.execute("USE scraping")


    # Store a joke's title and content
    def store(title, content):
        # Placeholders stay unquoted; pymysql escapes the values itself
        cur.execute("INSERT INTO pages (title, content) VALUES (%s, %s)",
                    (title, content))
        cur.connection.commit()


    class QiuShi(object):
        def __init__(self, start_url):
            self.url = start_url

        def crawing(self):
            # Fetch the page; return an empty result if the connection fails
            try:
                html = requests.get(self.url)
                return html.content
            except requests.exceptions.ConnectionError:
                return ''

        def extract(self, htmlContent):
            if len(htmlContent) > 0:
                bsobj = BeautifulSoup(htmlContent, 'lxml')
                # Each joke sits in its own div with this exact class string
                jokes = bsobj.findAll('div', {'class': 'article block untagged mb15'})
                for j in jokes:
                    text = j.find('h2').text
                    content = j.find('div', {'class': 'content'}).string
                    if text is not None and content is not None:
                        # The database encoding is UTF-8
                        store(text, content)
                        print(text, content)
                        print('-' * 78)
                    else:
                        print('')

        def main(self):
            text = self.crawing()
            self.extract(text)


    try:
        qiushi = QiuShi('http://www.qiushibaike.com/')
        qiushi.main()
    finally:
        # Close the cursor and the connection
        cur.close()
        conn.close()
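
The store() function above depends on a parameterized INSERT, where the driver escapes the values itself. As a minimal sketch of that idea, the snippet below uses the standard-library sqlite3 module as a stand-in for MySQL so it can run without a server. Note that sqlite3 uses `?` placeholders where pymysql uses `%s`, and that the layout of the `pages` table (an integer id plus two text columns) is an assumption, since the original post does not show the schema.

```python
import sqlite3


def store(cur, title, content):
    # Placeholders are left unquoted; the driver escapes the values itself,
    # so quote characters inside a joke cannot break the statement.
    cur.execute("INSERT INTO pages (title, content) VALUES (?, ?)",
                (title, content))
    cur.connection.commit()


# In-memory database standing in for the MySQL "scraping" database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema: the original post does not show how "pages" was created.
cur.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, title TEXT, content TEXT)")

store(cur, 'user "A"', "it's a joke with 'quotes'")
rows = cur.execute("SELECT title, content FROM pages").fetchall()
print(rows)

conn.close()
```

With pymysql the call has the same shape; only the placeholder style changes to `%s`, and the placeholders must not be wrapped in quotes inside the SQL string.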
  • Original article: https://www.cnblogs.com/yunwuzhan/p/5765963.html