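  • Simulating Youdao Dictionary translation with a crawler, plus a fun JS request I found

    08.14 self-summary

    Simulating Youdao Dictionary translation with a crawler

    1. Code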

    import requests
    from lxml import etree
    
    # headers= {
    # 'User-Agent':' Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.90 Safari/537.36',
    # 'Cookie':' DICT_UGC=be3af0da19b5c5e6aa4e17bd8d90b28a|; webDict_HdAD=%7B%22req%22%3A%22http%3A//dict.youdao.com%22%2C%22width%22%3A960%2C%22height%22%3A240%2C%22showtime%22%3A5000%2C%22fadetime%22%3A500%2C%22notShowInterval%22%3A3%2C%22notShowInDays%22%3Afalse%2C%22lastShowDate%22%3A%22Mon%20Nov%2008%202010%22%7D; ___rl__test__cookies=1565782601235; OUTFOX_SEARCH_USER_ID=131296774@139.226.172.110; OUTFOX_SEARCH_USER_ID_NCOO=1369535179.7407944; _ntes_nnid=b3ad33663a64ae962e76c71b2df46330,1565057224869; JSESSIONID=abcfltcZlc31Td7QD1pYw; search-popup-show=8-14; DICT_UGC=be3af0da19b5c5e6aa4e17bd8d90b28a|; ___rl__test__cookies=1565782014056'
    # }
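    # I expected Youdao to have some anti-scraping measures, but it turned out
    # there weren't any, so the headers above are left commented out.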
    
    
    a = input('Enter the text to translate: ')
    rp = requests.get(f'https://dict.youdao.com/w/{a}/')  # this URL was captured with Fiddler
    
    # The translation may appear under either of two XPaths
    
    data_xpath_1 = '//*[@id="phrsListTab"]/div/ul/li/text()'
    html = etree.HTML(rp.text)
    data = html.xpath(data_xpath_1)
    
    if not data:
        data_xpath_2 = '//*[@id="phrsListTab"]/div/ul/p/span[2]/a/text()'
        data = html.xpath(data_xpath_2)
    
    # Translation of a long passage of text
    if not data:
        data_xpath_3 = '//*[@id="fanyiToggle"]/div/p[2]/text()'
        data = html.xpath(data_xpath_3)
    
    # Purely to make the printed output look nicer
    for count, english in enumerate(data, start=1):
        print(f'Translation {count}: {english}')
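
The fallback logic above can also be packaged as a function that tries each XPath in turn, which makes it testable without hitting the network. This is only a sketch: `build_url`, `extract_translations`, and the `SAMPLE` fragment are names and data made up for illustration, and the sample HTML is not a real Youdao response. Percent-encoding the query with `urllib.parse.quote` is an extra precaution that the original script does not take.

```python
from urllib.parse import quote

from lxml import etree

# The three XPath expressions from the script above, tried in order.
XPATHS = [
    '//*[@id="phrsListTab"]/div/ul/li/text()',
    '//*[@id="phrsListTab"]/div/ul/p/span[2]/a/text()',
    '//*[@id="fanyiToggle"]/div/p[2]/text()',
]


def build_url(word):
    """Percent-encode the query so spaces and '/' cannot break the URL path."""
    return f"https://dict.youdao.com/w/{quote(word, safe='')}/"


def extract_translations(page_html):
    """Return the text matched by the first XPath that yields any result."""
    tree = etree.HTML(page_html)
    for xpath in XPATHS:
        results = [t.strip() for t in tree.xpath(xpath) if t.strip()]
        if results:
            return results
    return []


# Made-up HTML fragment for illustration (not a real Youdao response).
SAMPLE = (
    '<div id="phrsListTab"><div><ul>'
    '<li>n. genius</li><li>n. talent</li>'
    '</ul></div></div>'
)

for count, text in enumerate(extract_translations(SAMPLE), start=1):
    print(f"Translation {count}: {text}")
```

With a live page, the call would look like `extract_translations(requests.get(build_url(a)).text)`, assuming the page layout still matches these XPaths.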
    
    

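    I also found a fun JS request
    https://dict.youdao.com/word/wordarticle?query=<the text you are looking up>&jsonp=jQuery191018231021198201125_1565783847667&_=1565783847668
    It looks up articles related to the query.
    Many inputs will not match anything; try entering 天才 ("genius") to see a result.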
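
Because the request carries a `jsonp` parameter, the response is presumably JSON wrapped in a call to that callback name. A minimal sketch of unwrapping such a response follows; `parse_jsonp` is a hypothetical helper, and the `sample` payload (including its `query` and `articles` keys) is invented for illustration, since the real response format is not shown here.

```python
import json
import re


def parse_jsonp(text):
    """Strip a `callback(...)` wrapper and decode the JSON inside it."""
    match = re.fullmatch(r"\s*[\w$.]+\s*\((.*)\)\s*;?\s*", text, flags=re.DOTALL)
    if match is None:
        raise ValueError("not a JSONP response")
    return json.loads(match.group(1))


# Invented payload for illustration; the real field names are unknown.
sample = 'jQuery191018231021198201125_1565783847667({"query": "天才", "articles": []});'
print(parse_jsonp(sample))
```

If the endpoint returns plain JSON when the `jsonp` parameter is left out, `rp.json()` from requests would be enough, but that behaviour is not confirmed here.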

  • Original article: https://www.cnblogs.com/pythonywy/p/11354352.html