  • [Python Web Scraping] Lesson 5 (Bilibili danmaku)

    First, many thanks to the author of this article: https://www.cnblogs.com/LexMoon/p/pyspider03.html#4361286

    import requests
    import re
    av_id = '67946325'
    headers = {
        'User-Agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
        'Accept': 'text/html',
        'Cookie': 'YOUR_COOKIE_HERE',  # placeholder (the author redacted theirs); paste your own browser cookie
    }
    resp = requests.get('https://www.bilibili.com/video/av'+av_id,headers=headers)
    
    match_rule = r'cid=(.*?)&aid'
    oid = re.search(match_rule, resp.text).group(1)  # the captured cid, without the surrounding 'cid=' and '&aid'
    print('oid='+oid)
    
    xml_url = 'https://api.bilibili.com/x/v1/dm/list.so?oid='+oid
    
    resp = requests.get(xml_url,headers=headers)
    
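    # requests falls back to ISO-8859-1 for text/* responses whose headers declare no charset,
    # so look for the real encoding before decoding the body.
    if resp.encoding == 'ISO-8859-1':
        encodings = requests.utils.get_encodings_from_content(resp.text)
        if encodings:
            encoding = encodings[0]
        else:
            encoding = resp.apparent_encoding
        encode_content = resp.content.decode(encoding, 'replace')
        print(encode_content)

    # What do a crawler's headers need to contain to avoid a 404? I tried filling in all 7 fields, and that turned out to be wrong.
    # I have nearly forgotten my regular expressions...
    # The block above is the final fix for the garbled text.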
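The garbled-text fix can be tried offline, without hitting Bilibili. The sketch below is only an illustration: `decode_body` is a hypothetical helper (it is not part of requests), the sample bytes are made up, and the charset sniffing is a simplified stand-in for what `requests.utils.get_encodings_from_content` does in the code above.

```python
import re


def decode_body(content, header_encoding=None, fallback="utf-8"):
    """Decode raw response bytes, distrusting a missing or ISO-8859-1 header encoding.

    Hypothetical helper for illustration; not a requests API.
    """
    encoding = header_encoding
    if encoding is None or encoding.upper() == "ISO-8859-1":
        # Look for a charset declared near the top of the body itself,
        # for example an XML prolog or an HTML meta tag.
        m = re.search(rb'(?:encoding|charset)=["\']?([\w-]+)', content[:1024])
        encoding = m.group(1).decode("ascii") if m else fallback
    # 'replace' swaps undecodable bytes for U+FFFD instead of raising.
    return content.decode(encoding, "replace")


# Made-up sample shaped like a danmaku XML response, encoded as UTF-8.
sample = '<?xml version="1.0" encoding="UTF-8"?><i><d p="0">你好</d></i>'.encode("utf-8")

wrong = sample.decode("ISO-8859-1")        # mojibake: UTF-8 bytes read as Latin-1
right = decode_body(sample, "ISO-8859-1")  # sniffed UTF-8 from the XML prolog

print(wrong)
print(right)
```

With a real response you would pass `resp.content` and `resp.encoding`. An unknown charset name in the body would raise `LookupError`, which this sketch does not handle.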

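For the regular-expression step, here is a minimal sketch of pulling the cid out with a capture group and guarding against a page that does not match. `extract_cid` and `sample_html` are hypothetical names for illustration, and the `\d+` pattern assumes the cid is purely numeric (the original code uses the looser `(.*?)`).

```python
import re

# Assumption: the cid is a run of digits sitting between 'cid=' and '&aid'.
CID_PATTERN = re.compile(r'cid=(\d+)&aid')


def extract_cid(html):
    """Return the cid as a string, or None if the page has no 'cid=...&aid' fragment."""
    m = CID_PATTERN.search(html)
    return m.group(1) if m else None


# Made-up fragment; a real video page is far larger.
sample_html = '<script>var url = "//example.invalid/player?cid=123456789&aid=67946325";</script>'

cid = extract_cid(sample_html)
if cid is None:
    print('cid not found; the page layout may have changed')
else:
    # Same danmaku endpoint as in the code above.
    print('https://api.bilibili.com/x/v1/dm/list.so?oid=' + cid)
```

Returning `None` instead of calling `.group()` directly on the result avoids an `AttributeError` when `re.search` finds nothing.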
     

  • Original article: https://www.cnblogs.com/break03/p/11575327.html