  • ELK: A Happy Little Crawler

    1. The happy little crawler

    Before crawling, install two modules: requests and BeautifulSoup.
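    Once both are installed, requests fetches the HTML and BeautifulSoup picks it apart. The minimal sketch below shows only the parsing half. It runs against an inline snippet, so it needs no network access. The markup is an assumption modelled on the selectors the crawler uses (`div#auto-channel-lazyload-article`, `li`, `h3`, `p`, `a`, `img`), and the live page may differ. `parse_items` is a hypothetical helper. It resolves the protocol-relative `href`/`src` values with `urllib.parse.urljoin` instead of string concatenation.

```python
from urllib.parse import urljoin, urlparse

from bs4 import BeautifulSoup

# Assumed markup, modelled on the crawler's selectors (not copied from the live site).
SAMPLE_HTML = """
<div id="auto-channel-lazyload-article">
  <ul>
    <li>
      <a href="//www.autohome.com.cn/news/201905/937451.html">
        <img src="//www2.autoimg.cn/newsdfs/sample.jpg">
        <h3>Sample title</h3>
        <p>Sample summary</p>
      </a>
    </li>
    <li class="ad"></li>
  </ul>
</div>
"""

BASE_URL = "https://www.autohome.com.cn/all/1/"


def parse_items(html, base_url=BASE_URL):
    """Hypothetical helper: return one dict per <li> that has an <h3> title."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find(name="div", attrs={"id": "auto-channel-lazyload-article"})
    items = []
    for li in container.find_all(name="li"):
        h3 = li.find(name="h3")
        if h3 is None:
            continue  # skip entries without a title
        # urljoin turns "//host/path" into "https://host/path" using the base URL's scheme
        link = urljoin(base_url, li.find(name="a").get("href"))
        img = urljoin(base_url, li.find(name="img").get("src"))
        # First path segment after the host, e.g. "news"
        tag = urlparse(link).path.split("/")[1]
        items.append({
            "title": h3.get_text(strip=True),
            "summary": li.find(name="p").get_text(strip=True),
            "a": link,
            "img": img,
            "tags": tag,
        })
    return items


if __name__ == "__main__":
    for item in parse_items(SAMPLE_HTML):
        print(item)
```

    In the real crawler, `response.text` from requests would take the place of `SAMPLE_HTML`.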

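    '''
    https://www.autohome.com.cn/all/
    Crawl images and links
    and write them into the database.
    Fields: title, summary, a_url, img_url, tags...

    # https://www.autohome.com.cn/all/3/#liststart   # URL pattern to request
    # Lazy loading: content is only loaded once it is reached
    Installation:
    pip install requests
    pip install BeautifulSoup4
    pip install -i https://pypi.doubanio.com/simple/ requests   # via the Douban PyPI mirror

    Design a table structure and store the data in the database.
    '''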
    import requests
    from bs4 import BeautifulSoup
    from concurrent.futures import ThreadPoolExecutor    # thread pool, to crawl faster
    import time
    import os
    def work(k):
        """Crawl list page k and save each article entry to the database."""
        response = requests.get(url='https://www.autohome.com.cn/all/{}/#liststart'.format(k))
        response.encoding = "GBK"  # decode the page as GBK
        soup_obj = BeautifulSoup(response.text, 'html.parser')
        div_obj = soup_obj.find(name='div', attrs={"id": "auto-channel-lazyload-article"})
        li_list = div_obj.find_all(name='li')
        for i in li_list:
            no_obj = i.find(name='h3')
            if not no_obj:
                continue  # skip <li> entries that have no <h3> title
            title = no_obj.text
            summary = i.find(name='p').text
            # href/src are protocol-relative ("//..."), so prepend "https:"
            a = 'https:' + i.find(name='a').get('href')
            img = 'https:' + i.find(name='img').get('src')
            tags = a.split('/', 4)[3]  # first path segment after the host, e.g. "news"
            # print(response.url, title, tags)
            print(title, summary, a, img, tags)
            # infodata is the table (model) name defined in models.py
            info_obj = models.infodata(title=title, summary=summary, a=a, img=img, tags=tags)
            # save the record to the database
            info_obj.save()

    def spider():
        """Crawl Autohome (autohome.com.cn)."""
        t = ThreadPoolExecutor(10)  # 10 worker threads
        for k in range(1, 6839):    # list pages 1 to 6838
            t.submit(work, k)
        t.shutdown()
        # response = requests.get(url='https://www.autohome.com.cn/all/6836/#liststart')
        # print(response.headers)      # response headers
        # print(response.encoding)     # encoding
        # print(response.status_code)  # status code
        # print(response.text)         # HTML document

    if __name__ == '__main__':
        # must match the settings module used in manage.py
        os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myes007.settings")
        # then add the following two lines
        import django
        django.setup()
        # import models (only after django.setup())
        from web01 import models
        t1 = time.time()
        spider()
        print(time.time() - t1)
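    One caveat about `spider()`: `ThreadPoolExecutor.submit()` returns a `Future`, and any exception raised inside `work()` (a request timeout, a page without the expected `div`, a database error) is stored on that future. It is never printed unless someone calls `result()`. The minimal sketch below uses the executor as a context manager and reports which page numbers failed. `fetch_page` and `crawl` are hypothetical stand-ins, not part of the original code, and `fetch_page` simulates failures.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_page(k):
    """Hypothetical stand-in for work(k); fails on every 7th page to simulate errors."""
    if k % 7 == 0:
        raise ValueError("page {} has no article list".format(k))
    return k


def crawl(pages, worker=fetch_page, max_workers=10):
    """Run worker over pages; return (succeeded, failed), where failed maps page -> exception."""
    succeeded, failed = [], {}
    # Leaving the with-block calls shutdown(wait=True), like t.shutdown() in spider()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, k): k for k in pages}
        for future in as_completed(futures):
            k = futures[future]
            try:
                future.result()  # re-raises any exception from the worker
            except Exception as exc:
                failed[k] = exc
            else:
                succeeded.append(k)
    return sorted(succeeded), failed


if __name__ == "__main__":
    ok, bad = crawl(range(1, 21))
    print(len(ok), "pages ok; failed:", sorted(bad))
```

    Passing `work` as the `worker` argument and `range(1, 6839)` as `pages` would give the same crawl as `spider()`, and it would also leave a list of pages that can be retried.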

    2. Define the models.py module

    from django.db import models
    
    # Create your models here.
    # title summary  a_url img_url tags
    class infodata(models.Model):
        title=models.CharField(verbose_name="Title",max_length=200)
        summary=models.CharField(verbose_name="Summary",max_length=300)
        a=models.CharField(verbose_name="Article link",max_length=100)
        img=models.CharField(verbose_name="Image link",max_length=100)
        tags=models.CharField(verbose_name="Tags",max_length=100)
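    The `max_length` values above are not checked by `save()`. Django only checks them during model validation (`full_clean()`), which `save()` does not call. What happens to an over-length value then depends on the database: SQLite stores it as-is, while PostgreSQL rejects it with an error. The minimal stdlib sketch below shows a pre-save check. `MAX_LENGTHS` simply mirrors the model above by hand, and `too_long_fields` is a hypothetical helper, not part of Django.

```python
# Mirrors the max_length values of the infodata model (kept in sync by hand).
MAX_LENGTHS = {"title": 200, "summary": 300, "a": 100, "img": 100, "tags": 100}


def too_long_fields(record, limits=MAX_LENGTHS):
    """Hypothetical helper: names of fields whose value exceeds its max_length."""
    return [name for name, limit in limits.items() if len(record.get(name, "")) > limit]


if __name__ == "__main__":
    record = {
        "title": "Sample title",
        "summary": "Sample summary",
        "a": "https://www.autohome.com.cn/news/" + "x" * 80,
        "img": "https://www2.autoimg.cn/sample.jpg",
        "tags": "news",
    }
    print(too_long_fields(record))
```

    In `work()`, a record for which this returns a non-empty list could be logged and skipped. Alternatively, the model's `max_length` could be increased, followed by a new migration.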

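    After writing the code above, run the following commands in the Terminal window:

    python manage.py makemigrations  # record the changes made to models.py
    python manage.py migrate         # apply the recorded changes to the database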

    3. Front-end/back-end design and configuration files

  • Original post: https://www.cnblogs.com/studybrother/p/10908438.html