  • Scraping the top 10 boy and girl names and related information from https://www.parenting.com/babynames/boys/earl

    The scraping source code is as follows:

    import io
    import re
    import sys

    import pandas as pd
    import requests
    from bs4 import BeautifulSoup

    # Re-wrap stdout as gb18030 so print() does not fail on a Chinese-locale Windows console.
    sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='gb18030')
    
    lilist=[]
    
    # Collect the links to the individual boy and girl name pages.
    r = requests.get('https://www.parenting.com/baby-names/boys/earl')
    soup = BeautifulSoup(r.text, "lxml")
    links = soup.find_all('a', href=True)
    for i in links:
        if ('https://www.parenting.com/pregnancy/baby-names/baby-boy-names/' in str(i)
                or 'https://www.parenting.com/pregnancy/baby-names/girl-baby-names/' in str(i)):
            lilist.append(i.get("href"))
    lilist1 = []   # names (<h4> headings)
    results1 = []  # "Meaning" text
    results = []   # "Origin" text
    results2 = []  # "Why it’s big" text
    
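    # Visit each page once; sorted() keeps the row order reproducible between runs.
    for i in sorted(set(lilist)):
        r = requests.get(i)
        soup = BeautifulSoup(r.text, "lxml")

        # Only the elements with class "description" are searched for the three fields.
        Source = soup.find_all(attrs={'class': 'description'})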
        
        # Names are the non-empty <h4> headings of the page.
        results0 = re.findall('<h4>(.*?)</h4>', r.text)
        for c in results0:
            if c != '':
                lilist1.append(c)
        # Raw strings keep \s as a regex escape instead of an invalid Python string escape.
        pattern = re.compile(r'<p><strong>Origin:</strong>\s(.*?)</p>', re.S)
        results += re.findall(pattern, str(Source))

        pattern1 = re.compile(r'<p><strong>Meaning:</strong>\s(.*?)</p>', re.S)
        results1 += re.findall(pattern1, str(Source))

        pattern2 = re.compile(r"<p><strong>Why it’s big:</strong>\s(.*?)</p>", re.S)
        results2 += re.findall(pattern2, str(Source))
        
    
        
    print(lilist1)
    print(results1)
    print(results)
    print(results2)
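    # pd.DataFrame needs all four lists to have the same length; otherwise it raises ValueError.
    data = {
        'EnName': lilist1,
        'Meaning': results1,
        'Origin': results,
        'Description': results2
    }
    frame = pd.DataFrame(data)
    # gb18030 matches the console encoding set at the top of the script.
    frame.to_csv('wt10.csv', encoding="gb18030")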
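    The script above gathers names, origins, meanings and descriptions into four separate lists, so a single entry with a missing field would shift every later row or make the lists unequal in length. One possible way around this, shown here only as a sketch, is to build one record per name and leave missing fields empty. The helper name extract_records, the FIELDS table and the sample HTML are hypothetical; the sketch also assumes that each name's paragraphs directly follow its <h4> heading, which the original post does not confirm.

```python
import re

# Field patterns modeled on the regexes used in the script above.
FIELDS = {
    "Origin": r"<p><strong>Origin:</strong>\s*(.*?)</p>",
    "Meaning": r"<p><strong>Meaning:</strong>\s*(.*?)</p>",
    "Description": r"<p><strong>Why it[’']s big:</strong>\s*(.*?)</p>",
}


def extract_records(html):
    """Return one dict per <h4> name, using '' for any field that is missing."""
    # With a capture group, re.split yields [before, name1, body1, name2, body2, ...].
    parts = re.split(r"<h4>(.*?)</h4>", html, flags=re.S)
    records = []
    for name, body in zip(parts[1::2], parts[2::2]):
        if not name.strip():
            continue  # skip empty headings, as the original loop does
        record = {"EnName": name.strip()}
        for field, pattern in FIELDS.items():
            match = re.search(pattern, body, re.S)
            record[field] = match.group(1).strip() if match else ""
        records.append(record)
    return records


# Made-up sample markup; the second entry deliberately has no "Origin" paragraph.
sample = (
    "<h4>Alpha</h4><p><strong>Origin:</strong> ExampleOrigin</p>"
    "<p><strong>Meaning:</strong> first</p>"
    "<h4>Beta</h4><p><strong>Meaning:</strong> second</p>"
)
records = extract_records(sample)
for row in records:
    print(row)
```

    A list of dicts like this can be passed straight to pd.DataFrame(records), which keeps each name on the same row as its own fields.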
    Screenshot of the CSV file:
  • Original article: https://www.cnblogs.com/c1q2s3/p/12078047.html