  • Basic Usage of the Urllib Library in Python 3

    What is Urllib?

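    Urllib is Python's built-in HTTP request library. It consists of four modules:

    urllib.request          request module: opens URLs and sends requests

    urllib.error            exception-handling module

    urllib.parse            URL parsing module

    urllib.robotparser      robots.txt parsing module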
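    urllib.parse and urllib.robotparser can be tried without any network access. A minimal sketch: the URL is one of this blog's post URLs with a made-up ?page=2 query, and the robots.txt rules are hypothetical, passed to RobotFileParser.parse() as a list of lines instead of being downloaded with read():

```python
import urllib.parse
import urllib.robotparser

# urllib.parse: split a URL into scheme, host, path, query, ...
parts = urllib.parse.urlparse('http://www.cnblogs.com/0bug/p/8893677.html?page=2')
print(parts.scheme, parts.netloc, parts.path, parts.query)

# urllib.robotparser: feed hypothetical robots.txt rules directly
rules = [
    'User-agent: *',
    'Disallow: /admin/',
]
rp = urllib.robotparser.RobotFileParser()
rp.parse(rules)
print(rp.can_fetch('*', 'http://example.com/admin/page'))  # False: path is disallowed
print(rp.can_fetch('*', 'http://example.com/blog/'))       # True: no rule matches
```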

    Changes compared with Python 2

    Python 2's urllib2 was merged into urllib.request in Python 3 (its exception classes moved to urllib.error).

    Python 2

    import urllib2

    response = urllib2.urlopen('http://www.cnblogs.com/0bug')

    Python 3

    import urllib.request

    response = urllib.request.urlopen('http://www.cnblogs.com/0bug/')
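    In Python 2 these helpers were spread over urllib, urllib2, urlparse and robotparser; Python 3 regroups them under the urllib package. A small lookup table covering only a few commonly used names (it is not an exhaustive list, and resolve is a hypothetical helper) that also checks each Python 3 location really exists:

```python
import importlib

# Python 2 name -> (Python 3 module, attribute); common names only, not exhaustive
PY2_TO_PY3 = {
    'urllib2.urlopen': ('urllib.request', 'urlopen'),
    'urllib2.Request': ('urllib.request', 'Request'),
    'urllib2.URLError': ('urllib.error', 'URLError'),
    'urllib2.HTTPError': ('urllib.error', 'HTTPError'),
    'urllib.urlencode': ('urllib.parse', 'urlencode'),
    'urllib.quote': ('urllib.parse', 'quote'),
    'urlparse.urlparse': ('urllib.parse', 'urlparse'),
    'robotparser.RobotFileParser': ('urllib.robotparser', 'RobotFileParser'),
}


def resolve(py2_name):
    """Return the Python 3 object that replaces a Python 2 name."""
    module_name, attr = PY2_TO_PY3[py2_name]
    return getattr(importlib.import_module(module_name), attr)


for old, (module_name, attr) in PY2_TO_PY3.items():
    resolve(old)  # raises ImportError/AttributeError if a mapping were wrong
    print('%-30s -> %s.%s' % (old, module_name, attr))
```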

    urlopen()

    Without a data argument, urlopen() sends a GET request; passing data makes it a POST request.

    import urllib.request
    
    response = urllib.request.urlopen('http://www.cnblogs.com/0bug')
    html = response.read().decode('utf-8')
    print(html)
    
    Output (truncated here; the script prints the full HTML of the page):

    <!DOCTYPE html>
    <html lang="zh-cn">
    <head>
    <meta charset="utf-8"/>
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <title>0bug - 博客园</title>
    ...
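    The GET/POST rule can be checked without touching the network: when urlopen() receives a URL string it wraps it in a urllib.request.Request, and the method comes from Request.get_method(). A minimal sketch; the httpbin URL is only a placeholder, since these Request objects are never sent:

```python
import urllib.request

url = 'http://httpbin.org/anything'  # placeholder; nothing is sent

get_req = urllib.request.Request(url)
post_req = urllib.request.Request(url, data=b'hello=0bug')
head_req = urllib.request.Request(url, method='HEAD')

# No data -> GET, data -> POST; an explicit method= overrides both
print(get_req.get_method())   # GET
print(post_req.get_method())  # POST
print(head_req.get_method())  # HEAD
```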

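    Passing data sends a POST request. The data must be bytes, so the urlencoded string is converted with bytes(..., encoding='utf-8'):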

    import urllib.parse
    import urllib.request
    
    data = bytes(urllib.parse.urlencode({'hello': '0bug'}), encoding='utf-8')
    response = urllib.request.urlopen('http://httpbin.org/post', data=data)
    print(response.read())
    
    Output:

    b'{\n  "args": {}, \n  "data": "", \n  "files": {}, \n  "form": {\n    "hello": "0bug"\n  }, \n  "headers": {\n    "Accept-Encoding": "identity", \n    "Connection": "close", \n    "Content-Length": "10", \n    "Content-Type": "application/x-www-form-urlencoded", \n    "Host": "httpbin.org", \n    "User-Agent": "Python-urllib/3.6"\n  }, \n  "json": null, \n  "origin": "223.72.80.199", \n  "url": "http://httpbin.org/post"\n}\n'
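    What urlencode() and the bytes conversion produce for this form can be checked offline. A small sketch; s.encode('utf-8') is equivalent to bytes(s, encoding='utf-8'), and the second dict is a made-up example showing how special characters are escaped:

```python
import urllib.parse

form = {'hello': '0bug'}

# urlencode() builds an application/x-www-form-urlencoded str...
encoded = urllib.parse.urlencode(form)
# ...but urlopen() needs the data argument as bytes
data = encoded.encode('utf-8')

print(encoded)    # hello=0bug
print(data)       # b'hello=0bug'
print(len(data))  # 10, matching the Content-Length httpbin echoed back

# Spaces become '+' and '&' is percent-escaped
print(urllib.parse.urlencode({'q': 'a b&c'}))  # q=a+b%26c
```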

    timeout (request timeout, in seconds)

    import urllib.request
    
    response = urllib.request.urlopen('http://www.cnblogs.com/0bug', timeout=0.01)
    print(response.read())
    
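    Output:

    urllib.error.URLError: <urlopen error timed out>

    To handle the timeout instead of letting it crash the script, catch URLError and check whether its reason is socket.timeout: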
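    import socket
    import urllib.error
    import urllib.request

    try:
        response = urllib.request.urlopen('http://www.cnblogs.com/0bug', timeout=0.01)
    except urllib.error.URLError as e:
        # A timeout while connecting is wrapped in URLError; the original error is in e.reason
        if isinstance(e.reason, socket.timeout):
            print('Request timed out')
        else:
            raise  # not a timeout: let other errors propagate

    Output:

    Request timed out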
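    The handler above covers only one of two ways a timeout can surface. In CPython's urllib.request, a timeout while connecting or sending the request is wrapped in URLError, but a timeout while waiting for the server's reply is raised as socket.timeout directly (on Python 3.10+ socket.timeout is an alias of TimeoutError). A sketch that handles both, assuming a local socket that accepts connections but never answers as a stand-in for a slow server; fetch_or_none is a hypothetical helper, and a proxy-free opener is used so environment proxy settings cannot reroute the request:

```python
import socket
import urllib.error
import urllib.request

# Opener without proxies, so the demo only ever talks to localhost
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch_or_none(url, timeout):
    """Return the response body, or None if the request timed out."""
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.URLError as e:
        # Timeout while connecting/sending: wrapped in URLError
        if isinstance(e.reason, socket.timeout):
            return None
        raise
    except socket.timeout:
        # Timeout while waiting for the reply: raised unwrapped
        return None


# A listening socket that never answers: the OS queues the connection,
# but no response ever arrives, so reading it times out.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('127.0.0.1', 0))
server.listen(1)
port = server.getsockname()[1]

result = fetch_or_none('http://127.0.0.1:%d/' % port, timeout=0.5)
server.close()
print('Request timed out' if result is None else result)
```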

    Response

    1. Response type

    import urllib.request
    
    response = urllib.request.urlopen('http://www.cnblogs.com/0bug')
    print(type(response))
    
    Output:

    <class 'http.client.HTTPResponse'>

    2. Status code and response headers

    import urllib.request
    
    response = urllib.request.urlopen('http://www.cnblogs.com/0bug')
    print(response.status)
    print(response.getheaders())
    print(response.getheader('Content-Type'))
    
    Output:

    200
    [('Date', 'Fri, 20 Apr 2018 13:24:01 GMT'), ('Content-Type', 'text/html; charset=utf-8'), ('Content-Length', '19306'), ('Connection', 'close'), ('Vary', 'Accept-Encoding'), ('Cache-Control', 'private, max-age=10'), ('Expires', 'Fri, 20 Apr 2018 13:24:11 GMT'), ('Last-Modified', 'Fri, 20 Apr 2018 13:24:01 GMT'), ('X-UA-Compatible', 'IE=10'), ('X-Frame-Options', 'SAMEORIGIN')]
    text/html; charset=utf-8
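    The same response attributes can be tried without relying on cnblogs.com being reachable. A minimal sketch, assuming a throwaway local server built with http.server; the handler, its 'hello' body and the headers it sends are made up for the demo, and a proxy-free opener stands in for urlopen() so proxy environment variables cannot interfere:

```python
import http.client
import http.server
import threading
import urllib.request


class HelloHandler(http.server.BaseHTTPRequestHandler):
    """Tiny local handler that always answers 200 with a short body."""

    def do_GET(self):
        body = b'hello'
        self.send_response(200)
        self.send_header('Content-Type', 'text/html; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


# Port 0 lets the OS pick a free port
server = http.server.HTTPServer(('127.0.0.1', 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:%d/' % server.server_address[1]

opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(url, timeout=5) as response:
    is_http_response = isinstance(response, http.client.HTTPResponse)
    status = response.status
    headers = response.getheaders()
    content_type = response.getheader('Content-Type')
    body = response.read().decode('utf-8')

server.shutdown()
server.server_close()

print(is_http_response, status, content_type, body)
print(headers)
```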

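    3. Response body

    read() returns the body as bytes, so it has to be decoded to get a str; the page declares charset=utf-8, hence decode('utf-8'):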

    import urllib.request
    
    response = urllib.request.urlopen('http://www.cnblogs.com/0bug')
    html = response.read().decode('utf-8')
    print(html)
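    Hard-coding 'utf-8' works here because this page's Content-Type says charset=utf-8. A slightly more general sketch reads the charset from the header instead; charset_of and decode_body are hypothetical helpers, and falling back to UTF-8 when no charset is declared is an assumption, not something HTTP guarantees:

```python
from email.message import Message


def charset_of(content_type, default='utf-8'):
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg['Content-Type'] = content_type
    return msg.get_content_charset(default)


def decode_body(raw, content_type):
    """Decode response bytes using the charset the server declared."""
    return raw.decode(charset_of(content_type))


# With a real response this would be:
#   decode_body(response.read(), response.getheader('Content-Type', ''))
# (response.headers.get_content_charset() returns the same charset directly.)
print(charset_of('text/html; charset=utf-8'))  # utf-8
print(charset_of('text/html'))                 # utf-8 (fallback default)
print(decode_body('café'.encode('latin-1'), 'text/plain; charset=ISO-8859-1'))  # café
```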
    

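    Request

    urlopen() also accepts a urllib.request.Request object instead of a URL string. Building a Request first makes it possible to set headers, data and the HTTP method before sending: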
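    A sketch of building a Request with custom headers, form data and an explicit method. Nothing is sent, so the URL, the User-Agent string and the Referer value are placeholders chosen for illustration. Note that Request stores header names via str.capitalize(), so 'User-Agent' is looked up as 'User-agent':

```python
import urllib.parse
import urllib.request

url = 'http://httpbin.org/post'  # placeholder; only used to build the object
data = urllib.parse.urlencode({'hello': '0bug'}).encode('utf-8')

req = urllib.request.Request(
    url,
    data=data,
    headers={'User-Agent': 'Mozilla/5.0 (example)'},  # hypothetical UA string
    method='POST',
)
# Headers can also be added after construction
req.add_header('Referer', 'http://www.cnblogs.com/0bug')

print(req.get_method())              # POST
print(req.full_url)                  # http://httpbin.org/post
print(req.get_header('User-agent'))  # Mozilla/5.0 (example)
print(req.header_items())
```

    Passing req to urllib.request.urlopen() would send it exactly as configured.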

    import urllib.request
    
    request = urllib.request.Request('http://www.cnblogs.com/0bug')
    response = urllib.request.urlopen(request)
    print(response.read().decode('utf-8'))
    
    Output (truncated here; the script prints the full HTML of the same blog homepage):

    <!DOCTYPE html>
    <html lang="zh-cn">
    <head>
    <meta charset="utf-8"/>
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <title>0bug - 博客园</title>
    ...
            </div>
        
    
    <p class="date">
        <span>
            <a id="homepage1_HomePageDays_DaysList_ctl02_ImageLink" Title="Day Archive" href="//www.cnblogs.com/0bug/archive/2018/04/18.html" style="display:inline-block;">2018年4月18日</a>
        </span>
    </p>
    
    
            <div class="post">
                <div class="posthead">
                    <h2 style="padding-top: 4px; padding-bottom: 4px;">
                        <a id="homepage1_HomePageDays_DaysList_ctl02_DayList_TitleUrl_0" href="http://www.cnblogs.com/0bug/p/8877479.html">基于Token的身份验证——JWT</a></h2>
                </div>
                <div class="postbody">
                    <div class="c_b_p_desc">摘要: 初次了解JWT,很基础,高手勿喷。 基于Token的身份验证用来替代传统的cookie+session身份验证方法中的session。 JWT是啥? JWT就是一个字符串,经过加密处理与校验处理的字符串,形式为: A.B.C A由JWT头部信息header加密得到 B由JWT用到的身份验证信息jso<a href="http://www.cnblogs.com/0bug/p/8877479.html" class="c_b_p_desc_readmore">阅读全文</a></div></div>
    
                <p class="postfoot">
                    posted @ 2018-04-18 20:53 0bug 阅读(10) 评论(0)  <a href ="https://i.cnblogs.com/EditPosts.aspx?postid=8877479" rel="nofollow">编辑</a>
                </p>
            </div>
        
            <div class="post">
                <div class="posthead">
                    <h2 style="padding-top: 4px; padding-bottom: 4px;">
                        <a id="homepage1_HomePageDays_DaysList_ctl02_DayList_TitleUrl_1" href="http://www.cnblogs.com/0bug/p/8874818.html">同步(Synchronous)和异步(Asynchronous)</a></h2>
                </div>
                <div class="postbody">
                    <div class="c_b_p_desc">摘要: 同步、异步的概念 同步和异步通常用来形容一次方法调用。 同步方法调用一旦开始,调用者必须等到方法调用返回后,才能继续后续的行为。 异步方法调用更像一个消息传递,一旦开始,方法调用就会立即返回,调用者就可以继续后续的操作。而,异步方法通常会在另外一个线程中,“真实”地执行着。整个过程,不会阻碍调用者的<a href="http://www.cnblogs.com/0bug/p/8874818.html" class="c_b_p_desc_readmore">阅读全文</a></div></div>
    
                <p class="postfoot">
                    posted @ 2018-04-18 14:52 0bug 阅读(2) 评论(0)  <a href ="https://i.cnblogs.com/EditPosts.aspx?postid=8874818" rel="nofollow">编辑</a>
                </p>
            </div>
        
            <div class="post">
                <div class="posthead">
                    <h2 style="padding-top: 4px; padding-bottom: 4px;">
                        <a id="homepage1_HomePageDays_DaysList_ctl02_DayList_TitleUrl_2" href="http://www.cnblogs.com/0bug/p/8872802.html">Window 通过cmd查看端口占用、相应进程、杀死进程等的命令</a></h2>
                </div>
                <div class="postbody">
                    <div class="c_b_p_desc">摘要: 一、 查看所有进程占用的端口 在开始-运行-cmd,输入:netstat –ano可以查看所有进程 二、查看占用指定端口的程序 当你在用tomcat发布程序时,经常会遇到端口被占用的情况,我们想知道是哪个程序或进程占用了端口,可以用该命令 netstat –ano|findstr “指定端口号” 二<a href="http://www.cnblogs.com/0bug/p/8872802.html" class="c_b_p_desc_readmore">阅读全文</a></div></div>
    
                <p class="postfoot">
                    posted @ 2018-04-18 10:58 0bug 阅读(4) 评论(0)  <a href ="https://i.cnblogs.com/EditPosts.aspx?postid=8872802" rel="nofollow">编辑</a>
                </p>
            </div>
        
    
    <p class="date">
        <span>
            <a id="homepage1_HomePageDays_DaysList_ctl03_ImageLink" Title="Day Archive" href="//www.cnblogs.com/0bug/archive/2018/04/16.html" style="display:inline-block;">2018年4月16日</a>
        </span>
    </p>
    
    
            <div class="post">
                <div class="posthead">
                    <h2 style="padding-top: 4px; padding-bottom: 4px;">
                        <a id="homepage1_HomePageDays_DaysList_ctl03_DayList_TitleUrl_0" href="http://www.cnblogs.com/0bug/p/8855883.html">优秀博客收录</a></h2>
                </div>
                <div class="postbody">
                    <div class="c_b_p_desc">摘要: 最新Django2.0.1在线教育零基础到上线教程 :https://www.jianshu.com/nb/21010157 vue+django2.0.2-rest-framework生鲜超市 :https://www.jianshu.com/nb/22309475<a href="http://www.cnblogs.com/0bug/p/8855883.html" class="c_b_p_desc_readmore">阅读全文</a></div></div>
    
                <p class="postfoot">
                    posted @ 2018-04-16 14:53 0bug 阅读(4) 评论(0)  <a href ="https://i.cnblogs.com/EditPosts.aspx?postid=8855883" rel="nofollow">编辑</a>
                </p>
            </div>
        
    
    <div class="topicListFooter"><div id="nav_next_page"><a href="http://www.cnblogs.com/0bug/default.html?page=2">下一页</a></div></div>
    
    
                </div>
            </td>
        </tr>
        <tr>
            <td colspan="2" class="FooterCell">
                
    <p id="footer">
        Powered by: 
        <br />
        
        <a id="Footer1_Hyperlink3" NAME="Hyperlink1" href="http://www.cnblogs.com/"><font face="Verdana">博客园</font></a>
        <br />
        Copyright &copy; 0bug
    </p>
    
            </td>
        </tr>
    </table>
    
    <!--PageEndHtml Block Begin-->
    <!--自动生成目录-->
    <script language="javascript" type="text/javascript">
        //生成目录索引列表
        function GenerateContentList() {
            var jquery_h2_list = $('#cnblogs_post_body h2');//如果你的章节标题不是h2,只需要将这里的h2换掉即可
            if (jquery_h2_list.length > 0) {
                var content = '<a name="_labelTop"></a>';
                content += '<div id="navCategory">';
                content += '<p style="font-size:18px"><b>阅读目录</b></p>';
                content += '<ul>';
                for (var i = 0; i < jquery_h2_list.length; i++) {
                    var go_to_top = '<div style="text-align: right"><a href="#_labelTop"></a><a name="_label' + i + '"></a></div>';
                    $(jquery_h2_list[i]).before(go_to_top);
                    var li_content = '<li><a href="#_label' + i + '">' + $(jquery_h2_list[i]).text() + '</a></li>';
                    content += li_content;
                }
                content += '</ul>';
                content += '</div>';
                if ($('#cnblogs_post_body').length != 0) {
                    $($('#cnblogs_post_body')[0]).prepend(content);
                }
            }
        }
        GenerateContentList();
    </script>
    
    <script language="javascript" type="text/javascript">
    document.getElementById('footer').innerText='Life is short, you need Python';
    </script>
    <!--PageEndHtml Block End-->
    </body>
    </html>
    (Output of the example above: the page's raw HTML.)

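    The example above hard-codes decode('utf-8'). Below is a minimal sketch that assumes the server declares its encoding in the Content-Type header. It reads the charset from response.headers and falls back to UTF-8 otherwise. The helper name decode_body is hypothetical. The headers object is built by hand, so the sketch runs without network access.

```python
import http.client


def decode_body(body: bytes, headers: http.client.HTTPMessage) -> str:
    """Decode a response body using the charset from Content-Type, if any."""
    # get_content_charset() returns the lower-cased charset parameter,
    # or the given fallback when the header does not declare one
    charset = headers.get_content_charset('utf-8')
    return body.decode(charset, errors='replace')


# Offline usage: build the kind of object urlopen() exposes as response.headers
headers = http.client.HTTPMessage()
headers['Content-Type'] = 'text/html; charset=gbk'
print(decode_body('博客园'.encode('gbk'), headers))  # 博客园

# With a real request (needs network access):
# with urllib.request.urlopen('http://www.cnblogs.com/0bug') as response:
#     html = decode_body(response.read(), response.headers)
```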
    Adding request headers

    from urllib import request, parse
    
    url = 'http://httpbin.org/post'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36',
        'Host': 'httpbin.org'
    }
    dic = {'name': '0bug'}
    data = bytes(parse.urlencode(dic), encoding='utf-8')
    req = request.Request(url=url, data=data, headers=headers, method='POST')
    response = request.urlopen(req)
    print(response.read().decode('utf-8'))
    
    Output:
    {
      "args": {}, 
      "data": "", 
      "files": {}, 
      "form": {
        "name": "0bug"
      }, 
      "headers": {
        "Accept-Encoding": "identity", 
        "Connection": "close", 
        "Content-Length": "9", 
        "Content-Type": "application/x-www-form-urlencoded", 
        "Host": "httpbin.org", 
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36"
      }, 
      "json": null, 
      "origin": "223.72.80.199", 
      "url": "http://httpbin.org/post"
    }

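    When method= is not passed, Request chooses the HTTP method from data. It uses GET when data is None and POST otherwise, and an explicit method= overrides that choice. The small sketch below only builds Request objects and inspects them with get_method(), so nothing is sent. The PUT request is purely illustrative.

```python
from urllib import request, parse

url = 'http://httpbin.org/post'
data = parse.urlencode({'name': '0bug'}).encode('utf-8')  # the body must be bytes

get_req = request.Request(url)                            # no data -> GET
post_req = request.Request(url, data=data)                # data present -> POST
put_req = request.Request(url, data=data, method='PUT')   # explicit method wins

for req in (get_req, post_req, put_req):
    print(req.get_method(), req.full_url, req.data)
```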
    Headers can also be added one at a time with add_header():

    from urllib import request, parse
    
    url = 'http://httpbin.org/post'
    dic = {'name': '0bug'}
    data = bytes(parse.urlencode(dic), encoding='utf-8')
    req = request.Request(url=url, data=data, method='POST')
    req.add_header('User-Agent',
                   'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36')
    response = request.urlopen(req)
    print(response.read().decode('utf-8'))
    

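    Note that Request stores header names through str.capitalize(), so 'User-Agent' is kept as 'User-agent'. has_header() and get_header() then compare names exactly. The sketch below inspects a Request without sending it. X-Custom-Header is a hypothetical header used only for illustration.

```python
from urllib import request

req = request.Request('http://httpbin.org/post', data=b'name=0bug')
req.add_header('User-Agent', 'Mozilla/5.0')
req.add_header('X-Custom-Header', 'demo')  # hypothetical header, for illustration

# Names are normalized with str.capitalize(): 'User-agent', 'X-custom-header'
print(req.header_items())
print(req.has_header('User-agent'))  # True
print(req.has_header('User-Agent'))  # False: the lookup is not case-insensitive
print(req.get_header('User-agent'))  # Mozilla/5.0
```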
    Handler

    Proxy:

    import urllib.request
    
    # Replace the placeholders with real proxy URLs, e.g. 'http://host:port'
    proxy_handler = urllib.request.ProxyHandler({
        'http': 'HTTP proxy URL',
        'https': 'HTTPS proxy URL'
    })
    opener = urllib.request.build_opener(proxy_handler)
    response = opener.open('http://www.cnblogs.com/0bug')
    print(response.read())

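    Below is a sketch of inspecting an opener like the one above. It assumes a hypothetical proxy at 127.0.0.1:8080, and no request is sent. build_opener() drops its default ProxyHandler when one is passed in, and install_opener() would make plain urlopen() use the opener too. Passing an empty dict to ProxyHandler disables proxies, including those picked up from environment variables such as http_proxy.

```python
import urllib.request

# Hypothetical local proxy address; substitute a real one before sending requests
proxies = {
    'http': 'http://127.0.0.1:8080',
    'https': 'http://127.0.0.1:8080',
}
proxy_handler = urllib.request.ProxyHandler(proxies)
opener = urllib.request.build_opener(proxy_handler)

# The opener now holds our ProxyHandler instead of the default one
print(any(h is proxy_handler for h in opener.handlers))
print(proxy_handler.proxies)

# Optional: make urllib.request.urlopen() use this opener globally
# urllib.request.install_opener(opener)

# An empty dict turns off proxies, including ones from environment variables
direct_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
```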
    Cookie

    import http.cookiejar, urllib.request
    
    cookie = http.cookiejar.CookieJar()
    handler = urllib.request.HTTPCookieProcessor(cookie)
    opener = urllib.request.build_opener(handler)
    response = opener.open('http://www.baidu.com')
    for item in cookie:
        print(item.name + "=" + item.value)
    
    Output:
    BAIDUID=9992D9F175AFE48958F04ED9F2DC3659:FG=1
    BIDUPSID=9992D9F175AFE48958F04ED9F2DC3659
    H_PS_PSSID=26255_1437_21100_20927
    PSTM=1524234978
    BDSVRTM=0
    BD_HOME=0

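    Iterating a CookieJar yields http.cookiejar.Cookie objects. Converting the jar into a dict, or into a Cookie request-header value, therefore takes one line each. The sketch below runs offline: the cookies are created by hand with the Cookie constructor. The names come from the output above, while the www.example.com domain is hypothetical. make_cookie, jar_to_dict and jar_to_header are hypothetical helper names.

```python
import http.cookiejar


def make_cookie(name, value, domain):
    """Build a session cookie by hand (hypothetical helper)."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path='/', path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def jar_to_dict(jar):
    """Map cookie names to values (a later duplicate name overwrites an earlier one)."""
    return {c.name: c.value for c in jar}


def jar_to_header(jar):
    """Format the cookies as a 'name=value; name2=value2' Cookie header value."""
    return '; '.join(f'{c.name}={c.value}' for c in jar)


jar = http.cookiejar.CookieJar()
jar.set_cookie(make_cookie('BD_HOME', '0', 'www.example.com'))
jar.set_cookie(make_cookie('PSTM', '1524234978', 'www.example.com'))
print(jar_to_dict(jar))
print(jar_to_header(jar))
```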
    Saving cookies to a file

    import http.cookiejar, urllib.request
    
    filename = 'cookie.txt'
    cookie = http.cookiejar.MozillaCookieJar(filename)
    handler = urllib.request.HTTPCookieProcessor(cookie)
    opener = urllib.request.build_opener(handler)
    response = opener.open('http://www.baidu.com')
    cookie.save(ignore_discard=True, ignore_expires=True)
    
    Contents of cookie.txt:
    # Netscape HTTP Cookie File
    # http://curl.haxx.se/rfc/cookie_spec.html
    # This is a generated file!  Do not edit.
    
    .baidu.com    TRUE    /    FALSE    3671718969    BAIDUID    0020A8919F61B2CA58DEC1CA10FF140E:FG=1
    .baidu.com    TRUE    /    FALSE    3671718969    BIDUPSID    0020A8919F61B2CA58DEC1CA10FF140E
    .baidu.com    TRUE    /    FALSE        H_PS_PSSID    1442_21110_18560_22158
    .baidu.com    TRUE    /    FALSE    3671718969    PSTM    1524235322
    www.baidu.com    FALSE    /    FALSE        BDSVRTM    0
    www.baidu.com    FALSE    /    FALSE        BD_HOME    0

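    Below is a sketch of the full save/load round trip without network access. It assumes one hand-made session cookie, whose name, value and domain are hypothetical. ignore_discard=True is needed on both save() and load(). A cookie without an expiry time is a session cookie, and both methods skip such cookies by default.

```python
import http.cookiejar
import os
import tempfile

cookie = http.cookiejar.Cookie(
    version=0, name='token', value='abc123', port=None, port_specified=False,
    domain='.example.com', domain_specified=True, domain_initial_dot=True,
    path='/', path_specified=True, secure=False,
    expires=None, discard=True,  # no expiry: a session cookie
    comment=None, comment_url=None, rest={},
)

with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, 'cookie.txt')

    jar = http.cookiejar.MozillaCookieJar(filename)
    jar.set_cookie(cookie)
    # Without ignore_discard=True the session cookie would not be written
    jar.save(ignore_discard=True, ignore_expires=True)

    loaded = http.cookiejar.MozillaCookieJar()
    # ...and without it here, the cookie would be skipped while reading
    loaded.load(filename, ignore_discard=True, ignore_expires=True)
    restored = {c.name: c.value for c in loaded}

print(restored)  # {'token': 'abc123'}
```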
    Saving in another format (LWPCookieJar)

    import http.cookiejar, urllib.request
    
    filename = 'cookie.txt'
    cookie = http.cookiejar.LWPCookieJar(filename)
    handler = urllib.request.HTTPCookieProcessor(cookie)
    opener = urllib.request.build_opener(handler)
    response = opener.open('http://www.baidu.com')
    cookie.save(ignore_discard=True, ignore_expires=True)
    
    Contents of cookie.txt:
    #LWP-Cookies-2.0
    Set-Cookie3: BAIDUID="654FB9A36A8F5B5C257111158153D963:FG=1"; path="/"; domain=".baidu.com"; path_spec; domain_dot; expires="2086-05-08 17:58:12Z"; version=0
    Set-Cookie3: BIDUPSID=654FB9A36A8F5B5C257111158153D963; path="/"; domain=".baidu.com"; path_spec; domain_dot; expires="2086-05-08 17:58:12Z"; version=0
    Set-Cookie3: H_PS_PSSID=1463_21111; path="/"; domain=".baidu.com"; path_spec; domain_dot; discard; version=0
    Set-Cookie3: PSTM=1524235445; path="/"; domain=".baidu.com"; path_spec; domain_dot; expires="2086-05-08 17:58:12Z"; version=0
    Set-Cookie3: BDSVRTM=0; path="/"; domain="www.baidu.com"; path_spec; discard; version=0
    Set-Cookie3: BD_HOME=0; path="/"; domain="www.baidu.com"; path_spec; discard; version=0

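    LWPCookieJar can also render its cookies in memory as Set-Cookie3 lines with as_lwp_str(). That shows what save() will write below the #LWP-Cookies-2.0 header line. Below is a sketch with one hand-made cookie; its name, value and domain are hypothetical.

```python
import http.cookiejar

jar = http.cookiejar.LWPCookieJar()
jar.set_cookie(http.cookiejar.Cookie(
    version=0, name='token', value='abc123', port=None, port_specified=False,
    domain='www.example.com', domain_specified=False, domain_initial_dot=False,
    path='/', path_specified=True, secure=False,
    expires=None, discard=True, comment=None, comment_url=None, rest={},
))

# Same Set-Cookie3 lines that save() writes after the header line
text = jar.as_lwp_str(ignore_discard=True, ignore_expires=True)
print(text)
```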
    Cookies must be read back with the same jar class (file format) that saved them:

    import http.cookiejar, urllib.request
    
    cookie = http.cookiejar.LWPCookieJar()
    cookie.load('cookie.txt', ignore_discard=True, ignore_expires=True)
    handler = urllib.request.HTTPCookieProcessor(cookie)
    opener = urllib.request.build_opener(handler)
    response = opener.open('http://www.baidu.com')
    print(response.read().decode('utf-8'))
    

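    To see why the formats must match, the sketch below writes an LWP-format file from a hand-made cookie with hypothetical values, then loads it twice. LWPCookieJar reads it fine. MozillaCookieJar rejects the file's first line and raises http.cookiejar.LoadError, a subclass of OSError.

```python
import http.cookiejar
import os
import tempfile

# Hand-made session cookie; name, value and domain are hypothetical
cookie = http.cookiejar.Cookie(
    version=0, name='token', value='abc123', port=None, port_specified=False,
    domain='www.example.com', domain_specified=False, domain_initial_dot=False,
    path='/', path_specified=True, secure=False,
    expires=None, discard=True, comment=None, comment_url=None, rest={},
)

with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, 'cookie.txt')

    writer = http.cookiejar.LWPCookieJar(filename)
    writer.set_cookie(cookie)
    writer.save(ignore_discard=True, ignore_expires=True)

    # Same class that wrote the file: loads fine
    same = http.cookiejar.LWPCookieJar()
    same.load(filename, ignore_discard=True, ignore_expires=True)
    print([c.name for c in same])  # ['token']

    # Different class: the check on the file's first line fails
    mismatch_error = None
    try:
        http.cookiejar.MozillaCookieJar().load(filename, ignore_discard=True,
                                               ignore_expires=True)
    except http.cookiejar.LoadError as exc:
        mismatch_error = exc
    print(type(mismatch_error).__name__)  # LoadError
```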
    Exception handling

    from urllib import request, error
    
    try:
        response = request.urlopen('http://www.cnblogs.com/0bug/xxxx')
    except error.URLError as e:
        print(e.reason)
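    
    Output:
    Not Found
    
    HTTPError is a subclass of URLError, so catch it first; it additionally exposes the status code and the response headers:
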
    from urllib import request, error
    
    try:
        response = request.urlopen('http://www.cnblogs.com/0bug/xxxx')
    except error.HTTPError as e:
        print(e.reason, e.code, e.headers, sep='\n')
    except error.URLError as e:
        print(e.reason)
    else:
        print('Request Successfully')
    
    Output:
    Not Found
    404
    Date: Fri, 20 Apr 2018 14:57:48 GMT
    Content-Type: text/html
    Content-Length: 759
    Connection: close
    Cache-Control: private
    X-UA-Compatible: IE=10
    X-Frame-Options: SAMEORIGIN
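
    Because HTTPError subclasses URLError, the order of the except clauses matters: a URLError clause placed first would also catch every HTTPError. Below is a sketch of a hypothetical describe_error() helper that applies the same ordering. The exceptions are constructed directly instead of being raised by real requests.

```python
from urllib import error


def describe_error(exc: error.URLError) -> str:
    """Summarize a urllib error; HTTPError must be checked before URLError."""
    if isinstance(exc, error.HTTPError):
        # The server answered, but with an error status (4xx/5xx)
        return f'HTTP {exc.code}: {exc.reason}'
    # No usable HTTP response: DNS failure, refused connection, timeout, ...
    return f'URL error: {exc.reason}'


print(issubclass(error.HTTPError, error.URLError))  # True

# Built by hand; headers and body (hdrs, fp) are not needed for this sketch
not_found = error.HTTPError('http://www.cnblogs.com/0bug/xxxx', 404, 'Not Found', None, None)
print(describe_error(not_found))                        # HTTP 404: Not Found
print(describe_error(error.URLError('no host given')))  # URL error: no host given
```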

    A timeout can be detected by checking whether e.reason is a socket.timeout:
    
    import socket
    import urllib.request
    import urllib.error
    
    try:
        response = urllib.request.urlopen('http://www.cnblogs.com/0bug/xxxx', timeout=0.001)
    except urllib.error.URLError as e:
        print(type(e.reason))
        if isinstance(e.reason, socket.timeout):
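            print('Request timed out')
    
    Output:
    <class 'socket.timeout'>
    Request timed out
    
    On Python 3.10 and later, socket.timeout is an alias of the built-in TimeoutError, so the first line prints <class 'TimeoutError'> there.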

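    In the example above the timeout arrives wrapped in URLError, which is what happens while connecting. A timeout while reading the response, for example in response.read(), is raised as a bare socket.timeout instead. A helper that accepts both is therefore a little more robust. Below is a sketch with a hypothetical is_timeout() helper, tested on hand-built exceptions.

```python
import socket
import urllib.error


def is_timeout(exc: BaseException) -> bool:
    """Return True if exc represents a socket timeout, wrapped or not."""
    if isinstance(exc, socket.timeout):
        # Raised directly, e.g. by response.read() on a slow server
        return True
    # Timeouts while connecting are wrapped: URLError(reason=socket.timeout(...))
    return isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, socket.timeout)


print(is_timeout(urllib.error.URLError(socket.timeout('timed out'))))  # True
print(is_timeout(socket.timeout('timed out')))                         # True
print(is_timeout(urllib.error.URLError('Not Found')))                  # False
```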
    URL parsing

    from urllib.parse import urlparse
    
    result = urlparse('www.baidu.com/index.html;user?id=5#comment')
    print(type(result))
    print(result)
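    
    Output:
    <class 'urllib.parse.ParseResult'>
    ParseResult(scheme='', netloc='', path='www.baidu.com/index.html', params='user', query='id=5', fragment='comment')
    
    Because the string has neither a scheme nor a leading //, urlparse does not recognize www.baidu.com as the network location, so it ends up in path. The scheme argument supplies a default scheme:
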
    from urllib.parse import urlparse
    
    result = urlparse('www.baidu.com/index.html;user?id=5#comment', scheme='https')
    print(result)
    
    Output:
    ParseResult(scheme='https', netloc='', path='www.baidu.com/index.html', params='user', query='id=5', fragment='comment')
    
    The scheme argument is only a default; a scheme already present in the URL takes precedence:

    from urllib.parse import urlparse
    
    result = urlparse('http://www.baidu.com/index.html;user?id=5#comment', scheme='https')
    print(result)
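    
    Output:
    ParseResult(scheme='http', netloc='www.baidu.com', path='/index.html', params='user', query='id=5', fragment='comment')
    
    With allow_fragments=False, the #comment part is not split off and stays attached to the preceding component:
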
    from urllib.parse import urlparse
    
    result = urlparse('http://www.baidu.com/index.html;user?id=5#comment', allow_fragments=False)
    print(result)
    
    Output:
    ParseResult(scheme='http', netloc='www.baidu.com', path='/index.html', params='user', query='id=5#comment', fragment='')
    
    When the URL has no params or query, the fragment stays attached to the path:

    from urllib.parse import urlparse
    
    result = urlparse('http://www.baidu.com/index.html#comment', allow_fragments=False)
    print(result)
    
    Output:
    ParseResult(scheme='http', netloc='www.baidu.com', path='/index.html#comment', params='', query='', fragment='')

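    ParseResult is a named tuple, so its fields can be read by name or by index. _replace() plus geturl() rebuild a modified URL. The sketch also shows that prefixing a scheme-less address with // lets urlparse find the host. That is why netloc was empty in the first two examples above.

```python
from urllib.parse import urlparse

result = urlparse('http://www.baidu.com/index.html;user?id=5#comment')
print(result.netloc, result[1])  # www.baidu.com www.baidu.com

# Drop the fragment and rebuild the URL string
print(result._replace(fragment='').geturl())  # http://www.baidu.com/index.html;user?id=5

# Without a scheme, the host is only recognized after a leading '//'
fixed = urlparse('//www.baidu.com/index.html', scheme='https')
print(fixed.scheme, fixed.netloc, fixed.path)  # https www.baidu.com /index.html
```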
    urlunparse

    from urllib.parse import urlunparse
    
    data = ['http', 'www.baidu.com', 'index.html', 'user', 'id=6', 'comment']
    print(urlunparse(data))
    
    Output:
    http://www.baidu.com/index.html;user?id=6#comment

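    urlunparse() expects exactly six components, in the same order as ParseResult (scheme, netloc, path, params, query, fragment). It therefore round-trips with urlparse(). As in the output above, a path without a leading slash gets one when a netloc is present. A short sketch:

```python
from urllib.parse import urlparse, urlunparse

url = 'http://www.baidu.com/index.html;user?id=6#comment'
parts = urlparse(url)
print(urlunparse(parts) == url)  # True

# Empty strings stand for missing components
print(urlunparse(('https', 'www.cnblogs.com', '0bug', '', '', '')))  # https://www.cnblogs.com/0bug

# Anything other than six items fails while unpacking
try:
    urlunparse(('http', 'www.baidu.com', 'index.html'))
except ValueError as exc:
    print('ValueError:', exc)
```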
    urljoin

    from urllib.parse import urljoin
    
    print(urljoin('http://www.baidu.com', 'ABC.html'))
    print(urljoin('http://www.baidu.com', 'https://www.cnblogs.com/0bug'))
    print(urljoin('http://www.baidu.com/0bug', 'https://www.cnblogs.com/0bug'))
    print(urljoin('http://www.baidu.com/0bug', 'https://www.cnblogs.com/0bug?q=2'))
    print(urljoin('http://www.baidu.com/0bug?q=2', 'https://www.cnblogs.com/0bug'))
    print(urljoin('http://www.baidu.com', '?q=2#comment'))
    print(urljoin('www.baidu.com', '?q=2#comment'))
    print(urljoin('www.baidu.com#comment', '?q=2'))
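    
    Output:
    http://www.baidu.com/ABC.html
    https://www.cnblogs.com/0bug
    https://www.cnblogs.com/0bug
    https://www.cnblogs.com/0bug?q=2
    https://www.cnblogs.com/0bug
    http://www.baidu.com?q=2#comment
    www.baidu.com?q=2#comment
    www.baidu.com?q=2
    
    When the second argument is an absolute URL, it replaces the base entirely. Otherwise it is resolved against the base, and the base's fragment is never carried over (last line).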

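    The examples above mostly join against absolute URLs. For relative links, urljoin() resolves the second argument against the base path following RFC 3986 (Python 3.5 and later). A relative path replaces the last segment of the base path, so a trailing slash on the base changes the result. A sketch with hypothetical paths:

```python
from urllib.parse import urljoin

base = 'http://www.baidu.com/a/b.html'
print(urljoin(base, 'c.html'))     # http://www.baidu.com/a/c.html
print(urljoin(base, '../c.html'))  # http://www.baidu.com/c.html
print(urljoin(base, '/c.html'))    # http://www.baidu.com/c.html

# Trailing slash: 'b/' is treated as a directory and kept
print(urljoin('http://www.baidu.com/a/b', 'c.html'))   # http://www.baidu.com/a/c.html
print(urljoin('http://www.baidu.com/a/b/', 'c.html'))  # http://www.baidu.com/a/b/c.html
```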
    urlencode

    from urllib.parse import urlencode
    
    params = {
        'name': '0bug',
        'age': 25
    }
    base_url = 'http://www.baidu.com?'
    url = base_url + urlencode(params)
    print(url)
    
    Output:
    http://www.baidu.com?name=0bug&age=25
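
    Concatenating base_url + urlencode(...) assumes the base URL has no query string yet. Below is a sketch of a hypothetical add_query() helper that merges parameters into whatever query the URL already has. It is followed by parse_qs() for the reverse direction and doseq=True for list values.

```python
from urllib.parse import urlencode, urlsplit, parse_qsl, parse_qs


def add_query(url: str, params: dict) -> str:
    """Append params to url, keeping any query string it already has."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.extend(params.items())
    return parts._replace(query=urlencode(query)).geturl()


print(add_query('http://www.baidu.com', {'name': '0bug', 'age': 25}))
# http://www.baidu.com?name=0bug&age=25
print(add_query('http://www.baidu.com/s?wd=python', {'pn': 10}))
# http://www.baidu.com/s?wd=python&pn=10

# urlencode percent-encodes non-ASCII text; parse_qs reverses it
query = urlencode({'wd': 'python 爬虫'})
print(parse_qs(query))  # {'wd': ['python 爬虫']}

# doseq=True expands list values into repeated keys
print(urlencode({'tag': ['a', 'b']}, doseq=True))  # tag=a&tag=b
```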
  • Original article: https://www.cnblogs.com/0bug/p/8893677.html
Copyright © 2011-2022 走看看