  • Web crawler

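    1. Install the requests library. I looked online for some news that interested me. Some sites do not allow crawling, and the fetch only worked after I borrowed the instructor's method.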

    2. The code is as follows

    import requests

    def f(a):
        # Fetch the page at URL a and return its body decoded as UTF-8
        m = requests.get(a)
        m.encoding = 'utf-8'
        return m.text

    a = 'https://baike.so.com/doc/6315748-6529342.html'
    print(f(a))
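
    Since some sites refuse plain scripted requests, a slightly more defensive fetch can help. The sketch below is only one common approach and not necessarily the instructor's method mentioned in step 1. The name `fetch`, the `DEFAULT_HEADERS` value, the injectable `get` parameter, and the 10-second timeout are all assumptions made for illustration.

```python
import requests

# Assumed browser-like User-Agent; the exact string is illustrative only.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

def fetch(url, get=requests.get, timeout=10):
    """Return the page body as text; raise if the server answers with an HTTP error."""
    # strip() guards against stray whitespace around the URL
    resp = get(url.strip(), headers=DEFAULT_HEADERS, timeout=timeout)
    # Turn 4xx/5xx responses into an exception instead of printing an error page
    resp.raise_for_status()
    resp.encoding = "utf-8"
    return resp.text
```

    Calling `print(fetch(a))` with the URL from the code above would replace `print(f(a))`. Whether a given site accepts the request still depends on that site's own rules.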

    3. The result is

    <!DOCTYPE html>
    <!--STATUS OK-->
    <html mode="noscript">
    <head>
    <meta http-equiv="X-UA-Compatible" content="IE=Edge" />
    <meta charset="utf-8" />
    <meta name="referrer" content="always" />

    <meta name="description" content="互联网上的实用生活指南。在这里,您可以找到许多经过实践检验的办法来解决现实中遇到的问题,也可以将自己的经验贡献出来让更多人因之受益"/>

    <title>您的访问出错了_百度经验</title>
    <link rel="shortcut icon" href="/favicon.ico?v=20171030" type="image/x-icon" />
    <link rel="icon" sizes="any" mask href="//exp.bdstatic.com/static/common-jquery/widget/img-baidu-com/baidu_icon_85beaf5.svg">

    <script type="text/javascript">
    !function(){var n={},t={};n.context=function(n,e){var i=arguments.length;if(i>1)t[n]=e;else if(1==i){if("object"!=typeof n)return t[n];for(var o in n)n.hasOwnProperty(o)&&(t[o]=n[o])}},"F"in window||(window.F=n)}();;

    F.context('user', {"isLogin":false,"uname":null});
    </script>
    <!--[if lte IE 8]>
    <script>!function(){for(var e="abbr,article,aside,audio,canvas,datalist,details,dialog,eventsource,figure,figcaption,footer,header,hgroup,mark,menu,meter,nav,output,progress,section,time,video".split(","),t=e.length;t--;)document.createElement(e[t])}();</script>
    <![endif]-->
    <link rel="stylesheet" type="text/css" href="//exp.bdstatic.com/static/common-jquery/pkg/common_c5a8df0.css"/><link rel="stylesheet" type="text/css" href="//exp.bdstatic.com/static/common-jquery/widget/js/ui/favor/favor_ee7602f.css"/><link rel="stylesheet" type="text/css" href="//exp.bdstatic.com/static/common-jquery/pkg/special-msg_d02dba8.css"/><link rel="stylesheet" type="text/css" href="//exp.bdstatic.com/static/common-jquery/widget/js/logic/msg/go-and-see/go-and-see_d818767.css"/><link rel="stylesheet" type="text/css" href="//exp.bdstatic.com/static/common-jquery/widget/js/logic/msg/douniwan/douniwan_6ec14e6.css"/><link rel="stylesheet" type="text/css" href="//exp.bdstatic.com/static/common-jquery/widget/js/logic/msg/school2016/school2016_b4c6cb1.css"/><link rel="stylesheet" type="text/css" href="//exp.bdstatic.com/static/common-jquery/widget/js/logic/msg/badge/badge_2385538.css"/><link rel="stylesheet" type="text/css" href="//exp.bdstatic.com/static/common-jquery/widget/js/logic/msg/super-2016/super-2016_46f8488.css"/><link rel="stylesheet" type="text/css" href="//exp.bdstatic.com/static/common-jquery/widget/js/logic/msg/marathon/marathon_d00c0a0.css"/><link rel="stylesheet" type="text/css" href="//exp.bdstatic.com/static/common-jquery/widget/js/logic/msg/redbox-video/redbox-video_68a545b.css"/><link rel="stylesheet" type="text/css" href="//exp.bdstatic.com/static/common-jquery/widget/js/logic/msg/prize2017/prize2017_99af780.css"/><link rel="stylesheet" type="text/css" href="//exp.bdstatic.com/static/common-jquery/widget/js/logic/msg/freshman/freshman_6f491b2.css"/><link rel="stylesheet" type="text/css" href="//exp.bdstatic.com/static/common-jquery/widget/js/logic/msg/msg_8ee3316.css"/><link rel="stylesheet" type="text/css" href="//exp.bdstatic.com/static/common-jquery/widget/js/logic/msg/amazing-2016/amazing-2016_b458231.css"/><link rel="stylesheet" type="text/css" href="//exp.bdstatic.com/static/common-jquery/pkg/jquery-ui_ebd8777.css"/><link 
rel="stylesheet" type="text/css" href="//exp.bdstatic.com/static/common-jquery/widget/bread/bread_7d32459.css"/><link rel="stylesheet" type="text/css" href="//exp.bdstatic.com/static/common-jquery/widget/js/logic/captcha/captcha_8184ca4.css"/><link rel="stylesheet" type="text/css" href="//exp.bdstatic.com/static/common-jquery/widget/js/ui/album/album_406e459.css"/><link rel="stylesheet" type="text/css" href="//exp.bdstatic.com/static/more/pkg/more_05cd285.css"/><link rel="stylesheet" type="text/css" href="//exp.bdstatic.com/static/more/pkg/feedback_6657a5c.css"/></head>

    <script>
    if(typeof(alog) !== 'undefined') {
    alog('speed.set', 'ht', +new Date);
    }
    </script>

    <body class="layout-center">


    <div id="userbar" class="userbar">
    <a href="/">百度经验</a>&nbsp;|&nbsp;<a href="https://zhidao.baidu.com/" target="_blank">百度知道</a>&nbsp;|&nbsp;<a href="https://www.baidu.com/" target="_blank" rel="nofollow">百度首页</a>&nbsp;|&nbsp;<a href="https://passport.baidu.com/v2/?login" id="userbar-login" rel="nofollow">登录</a>&nbsp;|&nbsp;<a id="top-reg-link" target="_blank" href="https://passport.baidu.com/v2/?reg&tpl=exp&u=" rel="nofollow">注册</a>
    </div>


    <header id="header" class="container">


    <div id="search-box" class="search-box">
    <div class="inner-warp">
    <div class="s-nav">
    <a class="logo" href="/">
    <img src="//exp.bdstatic.com/static/common-jquery/widget/search-box/img/logo_83ae7e2.png" width="137" height="46" alt="百度经验" />
    </a>
    <ul class="channel">
    <li><a href="http://news.baidu.com/" log="type:3100,menu:1" rel="nofollow">新闻</a></li>
    <li><a href="https://www.baidu.com/" log="type:3100,menu:2" rel="nofollow">网页</a></li>
    <li><a href="http://tieba.baidu.com/" log="type:3100,menu:3" rel="nofollow">贴吧</a></li>
    <li><a href="http://zhidao.baidu.com/" log="type:3100,menu:4">知道</a></li>
    <li><strong>经验</strong></li>
    <li><a href="http://music.baidu.com/" log="type:3100,menu:5" rel="nofollow">音乐</a></li>
    <li><a href="http://image.baidu.com/" log="type:3100,menu:6" rel="nofollow">图片</a></li>
    <li><a href="http://video.baidu.com/" log="type:3100,menu:7" rel="nofollow">视频</a></li>
    <li><a href="http://map.baidu.com/" log="type:3100,menu:8" rel="nofollow">地图</a></li>
    <li><a href="http://baike.baidu.com/" log="type:3100,menu:9" rel="nofollow">百科</a></li>
    <li><a href="http://wenku.baidu.com/" log="type:3100,menu:10" rel="nofollow">文库</a></li>
    </ul>
    </div>
    <form action="/search" name="top-search-form" method="get" id="top-search-form" class="top-search-form">
    <div class="clearfix box">
    <span class="s-ipt-wr">
    <input x-webkit-speech x-webkit-grammar="bUIltin:search" class="hdi" id="kw" autoComplete="off" maxlength="256" tabindex="1" size="46" name="word" value=''/>
    </span>
    <span class="s-btn-wr">
    <input type="submit" class="s-btn" id="sb" value="搜索经验">
    </span><span class="s-tools">
    <a href="http://www.baidu.com/search/jingyan_help.html" target="_blank" rel="nofollow">帮助</a>
    </span><div class="clear"></div>
    </div>
    </form>
    </div>
    </div>
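
    The `<title>` in the output above reads 您的访问出错了_百度经验 ("Your access went wrong", Baidu Jingyan), which suggests the response is an error page rather than the article itself. A quick way to notice this without reading the whole dump is to pull out the title and look for an error marker. The sketch below uses only the standard library. The helper names `page_title` and `looks_like_error_page`, and the marker string, are assumptions chosen for this example.

```python
import re

def page_title(html):
    """Return the text inside the first <title> element, or None if there is none."""
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None

def looks_like_error_page(html, markers=("访问出错",)):
    """Return True if the page title contains any of the (assumed) error markers."""
    title = page_title(html) or ""
    return any(marker in title for marker in markers)

# Small stand-in for the fetched page, using the title from the output above
sample = "<html><head><title>您的访问出错了_百度经验</title></head></html>"
print(page_title(sample))
print(looks_like_error_page(sample))
```

    A regular expression is enough for a single `<title>` lookup. For extracting article text, a real HTML parser would be a better fit.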

  • Original post: https://www.cnblogs.com/qiuzy1209/p/12827201.html