  • Basic Usage of the urllib Library in Python 3

    What is urllib?

    urllib is Python's built-in HTTP request library. It is made up of four modules:

    urllib.request          sends requests

    urllib.error              exception handling

    urllib.parse             URL parsing

    urllib.robotparser    robots.txt parsing
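As a rough illustration of how the modules divide the work, the sketch below exercises three of them offline, without sending any request. The robots.txt rules and the example.com URLs are made up for the example.

```python
import urllib.error
import urllib.parse
import urllib.robotparser

# urllib.parse: split a URL into its components.
parts = urllib.parse.urlparse('http://www.cnblogs.com/0bug/')
print(parts.scheme, parts.netloc, parts.path)

# urllib.error: HTTPError is a subclass of URLError, so catching
# URLError also catches HTTP status errors.
print(issubclass(urllib.error.HTTPError, urllib.error.URLError))

# urllib.robotparser: evaluate robots.txt rules. These rules are
# hypothetical and are fed in as lines instead of being downloaded.
robots = urllib.robotparser.RobotFileParser()
robots.parse(['User-agent: *', 'Disallow: /private/'])
allowed_public = robots.can_fetch('*', 'http://example.com/public')
allowed_private = robots.can_fetch('*', 'http://example.com/private/page')
print(allowed_public, allowed_private)
```

urllib.request, the fourth module, is the one that actually sends requests.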

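    Changes from Python 2

    Python 2's urllib2 was reorganized in Python 3: its request functionality, including urlopen(), now lives in urllib.request, and its exception classes moved to urllib.error.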

    Python 2

    import urllib2

    response = urllib2.urlopen('http://www.cnblogs.com/0bug')

    Python 3

    import urllib.request

    response = urllib.request.urlopen('http://www.cnblogs.com/0bug/')
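If one script has to run under both versions, a common pattern (not part of the original notes) is to try the Python 3 import first and fall back to the Python 2 name. A minimal sketch:

```python
try:
    # Python 3: urlopen lives in urllib.request
    from urllib.request import urlopen
except ImportError:
    # Python 2 fallback: the same function was exposed by urllib2
    from urllib2 import urlopen

# Shows which module the name was actually imported from.
print(urlopen.__module__)
```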

    urlopen()

    Without the data argument, urlopen() sends a GET request; when data is supplied, it sends a POST request.

    import urllib.request
    
    response = urllib.request.urlopen('http://www.cnblogs.com/0bug')
    html = response.read().decode('utf-8')
    print(html)
    
    <!DOCTYPE html>
    <html lang="zh-cn">
    <head>
    <meta charset="utf-8"/>
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <title>0bug - 博客园</title>
    <link type="text/css" rel="stylesheet" href="/bundles/blog-common.css?v=-hy83QNg62d4qYibixJzxMJkbf1P9fTBlqv7SK5zVL01"/>
    <link id="MainCss" type="text/css" rel="stylesheet" href="/skins/KJC/bundle-KJC.css?v=SBtLze_k2f8QMx9yu0UzPZOmkUXedeg_e6WBRIadVBo1"/>
    <link type="text/css" rel="stylesheet" href="/blog/customcss/314654.css?v=SL7ok7Br9Wq1UADrprqW%2fnQ%2bFQI%3d"/>
    <link id="mobile-style" media="only screen and (max-width: 767px)" type="text/css" rel="stylesheet" href="/skins/KJC/bundle-KJC-mobile.css?v=d9LctKHRIQp9rreugMcQ1-UJuq_j1fo0GZXTXj8Bqrk1"/>
    <link title="RSS" type="application/rss+xml" rel="alternate" href="http://www.cnblogs.com/0bug/rss"/>
    <link title="RSD" type="application/rsd+xml" rel="EditURI" href="http://www.cnblogs.com/0bug/rsd.xml"/>
    <link type="application/wlwmanifest+xml" rel="wlwmanifest" href="http://www.cnblogs.com/0bug/wlwmanifest.xml"/>
    <script src="//common.cnblogs.com/scripts/jquery-2.2.0.min.js"></script>
    <script type="text/javascript">var currentBlogApp = '0bug', cb_enable_mathjax=false;var isLogined=false;</script>
    <script src="/bundles/blog-common.js?v=taItysi72HxMPeH9Xg5nAYabRul6hhgahi3tVIMIKV81" type="text/javascript"></script>
    </head>
    <body>
    <a name="top"></a>
    <!--PageBeginHtml Block Begin-->
    <!--模拟知乎的回到顶部开始-->
     
    <style>
     
    div.go-top {
        display: none;
        opacity: 0.6;
        z-index: 999999;
        position: fixed;
        bottom: 113px;
        left: 90%;
        margin-left: 40px;
        border: 1px solid #a38a54;
        width: 38px;
        height: 38px;
        background-color: #ffffff;
        border-radius: 3px;
        cursor: pointer;
    }
     
    div.go-top:hover {
        opacity: 1;
        filter: alpha(opacity=100);
    }
     
    div.go-top div.arrow {
        position: absolute;
        left: 10px;
        top: -1px;
        width: 0;
        height: 0;
        border: 9px solid transparent;
        border-bottom-color: #9aaabf;
    }
     
    div.go-top div.stick {
        position: absolute;
        left: 15px;
        top: 15px;
        width: 8px;
        height: 14px;
        display: block;
        background-color: #9aaabf;
        -webkit-border-radius: 1px;
        -moz-border-radius: 1px;
        border-radius: 1px;
    }
     
    </style>
     
    <script type="text/javascript">
    $(function() {
        $(window).scroll(function() {
            if ($(window).scrollTop() >600)
                $('div.go-top').show();
            else
                $('div.go-top').hide();
        });
        $('div.go-top').click(function() {
            $('html, body').animate({scrollTop: 0}, 600);
        });
    });
    </script>
    <body>
        <div class="go-top">
            <div class="arrow"></div>
            <div class="stick"></div>
        </div>
    </body>
     
    <!--模拟知乎的回到顶部结束-->
    
    <!--框架引入-->
    <link href="https://cdn.bootcss.com/bootstrap/3.3.7/css/bootstrap.min.css" rel="stylesheet">
    <!--PageBeginHtml Block End-->
    
    <table class="Framework" cellspacing="0" cellpadding="0" width="100%">
        <tr>
            <td colspan="3">
                
    <div id="top">
    <table cellpadding="10" cellspacing="0">
        <tr>
            <td nowrap>
                <h1><a id="Header1_HeaderTitle" class="headermaintitle" href="http://www.cnblogs.com/0bug/"></a></h1>
                
            </td>
        </tr>
    </table>
    </div>
    <div id="sub">
        <div id="sub-right"><div id="blog_stats">
    <div class="BlogStats">posts - 209, comments - 3, trackbacks - 0, articles - 44</div></div></div>
        
    &nbsp;
    <a id="blog_nav_sitehome" href="http://www.cnblogs.com/">博客园</a> :: 
    <a id="blog_nav_myhome" href="http://www.cnblogs.com/0bug/">首页</a> ::
    <a id="blog_nav_newpost" rel="nofollow" href="https://i.cnblogs.com/EditPosts.aspx?opt=1">新随笔</a> ::
    <a id="blog_nav_contact" accesskey="9" rel="nofollow" href="https://msg.cnblogs.com/send/0bug">联系</a> ::
    <a id="blog_nav_rss" href="http://www.cnblogs.com/0bug/rss">订阅</a>
    <a id="blog_nav_rss_image" class="XMLLink" href="http://www.cnblogs.com/0bug/rss"><img src="//www.cnblogs.com/images/xml.gif" alt="订阅" /></a> ::
    <a id="blog_nav_admin" rel="nofollow" href="https://i.cnblogs.com/">管理</a>
    
    </div>
            </td>
        </tr>
        <tr>
            <td class="LeftCell">
                <div id="leftmenu">
                    
                        <div id="blog-calendar" style="display:none"></div><script type="text/javascript">loadBlogDefaultCalendar();</script>
                        
    <div id=cell>
    <img src="/Skins/KJC/Images/icon-group.jpg" hspace=5 align=left vspace=2><h3>公告</h3>
    <div id=news>
        <div id="blog-news"></div><script type="text/javascript">loadBlogNews();</script>
    </div>
    </div>
    
                    
                    <div id="blog-sidecolumn"></div><script type="text/javascript">loadBlogSideColumn();</script>
                </div>
            </td>
            <td class="MainCell">
                <div id="main">
                    
    
    <p class="date">
        <span>
            <a id="homepage1_HomePageDays_ctl00_ImageLink" Title="Day Archive" href="http://www.cnblogs.com/0bug/" style="display:inline-block;">置顶随笔</a>
        </span>
    </p>
    
    
            <div class="post">
                <div class="posthead">
                    <h2 style="padding-top: 4px; padding-bottom: 4px;">
                        <a id="homepage1_HomePageDays_ctl00_DayList_TitleUrl_0" href="http://www.cnblogs.com/0bug/p/8788518.html">[置顶]Python开发工程师技术手记</a></h2>
                </div>
                <div class="postbody">
                    <div class="cnblogs-post-body" id="postlist_postbody_8788518">正文内容加载中...</div><script type="text/javascript">getBlogPostBody(8788518);</script></div>
    
                <p class="postfoot">
                    posted @ 2018-04-11 01:56 0bug 阅读(10) 评论(0)  <a href ="https://i.cnblogs.com/EditPosts.aspx?postid=8788518" rel="nofollow">编辑</a>
                </p>
            </div>
        
    
    <p class="date">
        <span>
            <a id="homepage1_HomePageDays_DaysList_ctl00_ImageLink" Title="Day Archive" href="//www.cnblogs.com/0bug/archive/2018/04/20.html" style="display:inline-block;">2018年4月20日</a>
        </span>
    </p>
    
    
            <div class="post">
                <div class="posthead">
                    <h2 style="padding-top: 4px; padding-bottom: 4px;">
                        <a id="homepage1_HomePageDays_DaysList_ctl00_DayList_TitleUrl_0" href="http://www.cnblogs.com/0bug/p/8893038.html">HTTP协议请求头信息和响应头信息</a></h2>
                </div>
                <div class="postbody">
                    <div class="c_b_p_desc">摘要: http的请求部分 基本结构 常用请头信息 Accept:text/html,image/*(告诉服务器,浏览器可以接受文本,网页图片) Accept-Charaset:ISO-8859-1 [接受字符编码:iso-8859-1] Accept-Encoding:gzip,compress[可以接受<a href="http://www.cnblogs.com/0bug/p/8893038.html" class="c_b_p_desc_readmore">阅读全文</a></div></div>
    
                <p class="postfoot">
                    posted @ 2018-04-20 19:12 0bug 阅读(2) 评论(0)  <a href ="https://i.cnblogs.com/EditPosts.aspx?postid=8893038" rel="nofollow">编辑</a>
                </p>
            </div>
        
            <div class="post">
                <div class="posthead">
                    <h2 style="padding-top: 4px; padding-bottom: 4px;">
                        <a id="homepage1_HomePageDays_DaysList_ctl00_DayList_TitleUrl_1" href="http://www.cnblogs.com/0bug/p/8892959.html">HTTP协议中GET和POST方法的区别</a></h2>
                </div>
                <div class="postbody">
                    <div class="c_b_p_desc">摘要: 最直观的区别就是GET把参数包含在URL中,POST通过request body传递参数。 GET在浏览器回退时是无害的,而POST会再次提交请求。 GET产生的URL地址可以被Bookmark,而POST不可以。 GET请求会被浏览器主动cache,而POST不会,除非手动设置。 GET请求只能进<a href="http://www.cnblogs.com/0bug/p/8892959.html" class="c_b_p_desc_readmore">阅读全文</a></div></div>
    
                <p class="postfoot">
                    posted @ 2018-04-20 18:59 0bug 阅读(2) 评论(0)  <a href ="https://i.cnblogs.com/EditPosts.aspx?postid=8892959" rel="nofollow">编辑</a>
                </p>
            </div>
        
            <div class="post">
                <div class="posthead">
                    <h2 style="padding-top: 4px; padding-bottom: 4px;">
                        <a id="homepage1_HomePageDays_DaysList_ctl00_DayList_TitleUrl_2" href="http://www.cnblogs.com/0bug/p/8892711.html">Redis环境安装</a></h2>
                </div>
                <div class="postbody">
                    <div class="c_b_p_desc">摘要: Windows下: 到https://github.com/MicrosoftArchive/redis/releases下载: 下载完成后一步一步安装就行。 然后在安装一个可视化工具:https://github.com/uglide/RedisDesktopManager Linux下安装以Ub<a href="http://www.cnblogs.com/0bug/p/8892711.html" class="c_b_p_desc_readmore">阅读全文</a></div></div>
    
                <p class="postfoot">
                    posted @ 2018-04-20 18:17 0bug 阅读(2) 评论(0)  <a href ="https://i.cnblogs.com/EditPosts.aspx?postid=8892711" rel="nofollow">编辑</a>
                </p>
            </div>
        
            <div class="post">
                <div class="posthead">
                    <h2 style="padding-top: 4px; padding-bottom: 4px;">
                        <a id="homepage1_HomePageDays_DaysList_ctl00_DayList_TitleUrl_3" href="http://www.cnblogs.com/0bug/p/8892714.html">自己动手,丰衣足食!Python3网络爬虫实战案例</a></h2>
                </div>
                <div class="postbody">
                    <div class="c_b_p_desc">摘要: 本教程是崔大大的爬虫实战教程的笔记:网易云课堂 Python3+Pip环境配置 Windows下安装Python:&#160;http://www.cnblogs.com/0bug/p/8228378.html Linux以Ubuntu为例,一般是自带的,只需配置一下默认版本:http://www.cnblo<a href="http://www.cnblogs.com/0bug/p/8892714.html" class="c_b_p_desc_readmore">阅读全文</a></div></div>
    
                <p class="postfoot">
                    posted @ 2018-04-20 18:17 0bug 阅读(2) 评论(0)  <a href ="https://i.cnblogs.com/EditPosts.aspx?postid=8892714" rel="nofollow">编辑</a>
                </p>
            </div>
        
    
    <p class="date">
        <span>
            <a id="homepage1_HomePageDays_DaysList_ctl01_ImageLink" Title="Day Archive" href="//www.cnblogs.com/0bug/archive/2018/04/19.html" style="display:inline-block;">2018年4月19日</a>
        </span>
    </p>
    
    
            <div class="post">
                <div class="posthead">
                    <h2 style="padding-top: 4px; padding-bottom: 4px;">
                        <a id="homepage1_HomePageDays_DaysList_ctl01_DayList_TitleUrl_0" href="http://www.cnblogs.com/0bug/p/8886663.html">Python Flask 构建微电影视频网站</a></h2>
                </div>
                <div class="postbody">
                    <div class="c_b_p_desc">摘要: 前言 学完本教程,你将掌握: 1.学会使用整形、浮点型、路径型、字符串型正则表达式路由转化器 2.学会使用post与get请求、上传文件、cookie获取与相应、404处理 3.学会适应模板自动转义、定义过滤器、定义全局上下文处理器、JinJa2语法、包含、继承、定义宏 4.学会使用flask-wt<a href="http://www.cnblogs.com/0bug/p/8886663.html" class="c_b_p_desc_readmore">阅读全文</a></div></div>
    
                <p class="postfoot">
                    posted @ 2018-04-19 22:31 0bug 阅读(5) 评论(0)  <a href ="https://i.cnblogs.com/EditPosts.aspx?postid=8886663" rel="nofollow">编辑</a>
                </p>
            </div>
        
    
    <p class="date">
        <span>
            <a id="homepage1_HomePageDays_DaysList_ctl02_ImageLink" Title="Day Archive" href="//www.cnblogs.com/0bug/archive/2018/04/18.html" style="display:inline-block;">2018年4月18日</a>
        </span>
    </p>
    
    
            <div class="post">
                <div class="posthead">
                    <h2 style="padding-top: 4px; padding-bottom: 4px;">
                        <a id="homepage1_HomePageDays_DaysList_ctl02_DayList_TitleUrl_0" href="http://www.cnblogs.com/0bug/p/8877479.html">基于Token的身份验证——JWT</a></h2>
                </div>
                <div class="postbody">
                    <div class="c_b_p_desc">摘要: 初次了解JWT,很基础,高手勿喷。 基于Token的身份验证用来替代传统的cookie+session身份验证方法中的session。 JWT是啥? JWT就是一个字符串,经过加密处理与校验处理的字符串,形式为: A.B.C A由JWT头部信息header加密得到 B由JWT用到的身份验证信息jso<a href="http://www.cnblogs.com/0bug/p/8877479.html" class="c_b_p_desc_readmore">阅读全文</a></div></div>
    
                <p class="postfoot">
                    posted @ 2018-04-18 20:53 0bug 阅读(10) 评论(0)  <a href ="https://i.cnblogs.com/EditPosts.aspx?postid=8877479" rel="nofollow">编辑</a>
                </p>
            </div>
        
            <div class="post">
                <div class="posthead">
                    <h2 style="padding-top: 4px; padding-bottom: 4px;">
                        <a id="homepage1_HomePageDays_DaysList_ctl02_DayList_TitleUrl_1" href="http://www.cnblogs.com/0bug/p/8874818.html">同步(Synchronous)和异步(Asynchronous)</a></h2>
                </div>
                <div class="postbody">
                    <div class="c_b_p_desc">摘要: 同步、异步的概念 同步和异步通常用来形容一次方法调用。 同步方法调用一旦开始,调用者必须等到方法调用返回后,才能继续后续的行为。 异步方法调用更像一个消息传递,一旦开始,方法调用就会立即返回,调用者就可以继续后续的操作。而,异步方法通常会在另外一个线程中,“真实”地执行着。整个过程,不会阻碍调用者的<a href="http://www.cnblogs.com/0bug/p/8874818.html" class="c_b_p_desc_readmore">阅读全文</a></div></div>
    
                <p class="postfoot">
                    posted @ 2018-04-18 14:52 0bug 阅读(2) 评论(0)  <a href ="https://i.cnblogs.com/EditPosts.aspx?postid=8874818" rel="nofollow">编辑</a>
                </p>
            </div>
        
            <div class="post">
                <div class="posthead">
                    <h2 style="padding-top: 4px; padding-bottom: 4px;">
                        <a id="homepage1_HomePageDays_DaysList_ctl02_DayList_TitleUrl_2" href="http://www.cnblogs.com/0bug/p/8872802.html">Window 通过cmd查看端口占用、相应进程、杀死进程等的命令</a></h2>
                </div>
                <div class="postbody">
                    <div class="c_b_p_desc">摘要: 一、 查看所有进程占用的端口 在开始-运行-cmd,输入:netstat –ano可以查看所有进程 二、查看占用指定端口的程序 当你在用tomcat发布程序时,经常会遇到端口被占用的情况,我们想知道是哪个程序或进程占用了端口,可以用该命令 netstat –ano|findstr “指定端口号” 二<a href="http://www.cnblogs.com/0bug/p/8872802.html" class="c_b_p_desc_readmore">阅读全文</a></div></div>
    
                <p class="postfoot">
                    posted @ 2018-04-18 10:58 0bug 阅读(4) 评论(0)  <a href ="https://i.cnblogs.com/EditPosts.aspx?postid=8872802" rel="nofollow">编辑</a>
                </p>
            </div>
        
    
    <p class="date">
        <span>
            <a id="homepage1_HomePageDays_DaysList_ctl03_ImageLink" Title="Day Archive" href="//www.cnblogs.com/0bug/archive/2018/04/16.html" style="display:inline-block;">2018年4月16日</a>
        </span>
    </p>
    
    
            <div class="post">
                <div class="posthead">
                    <h2 style="padding-top: 4px; padding-bottom: 4px;">
                        <a id="homepage1_HomePageDays_DaysList_ctl03_DayList_TitleUrl_0" href="http://www.cnblogs.com/0bug/p/8855883.html">优秀博客收录</a></h2>
                </div>
                <div class="postbody">
                    <div class="c_b_p_desc">摘要: 最新Django2.0.1在线教育零基础到上线教程 :https://www.jianshu.com/nb/21010157 vue+django2.0.2-rest-framework生鲜超市 :https://www.jianshu.com/nb/22309475<a href="http://www.cnblogs.com/0bug/p/8855883.html" class="c_b_p_desc_readmore">阅读全文</a></div></div>
    
                <p class="postfoot">
                    posted @ 2018-04-16 14:53 0bug 阅读(4) 评论(0)  <a href ="https://i.cnblogs.com/EditPosts.aspx?postid=8855883" rel="nofollow">编辑</a>
                </p>
            </div>
        
            <div class="post">
                <div class="posthead">
                    <h2 style="padding-top: 4px; padding-bottom: 4px;">
                        <a id="homepage1_HomePageDays_DaysList_ctl03_DayList_TitleUrl_1" href="http://www.cnblogs.com/0bug/p/8853596.html">查找Python项目依赖的库并生成requirements.txt</a></h2>
                </div>
                <div class="postbody">
                    <div class="c_b_p_desc">摘要: 使用pip freeze 这种方式配合virtualenv 才好使,否则把整个环境中的包都列出来了。 使用&#160;pipreqs 这个工具的好处是可以通过对项目目录的扫描,自动发现使用了那些类库,自动生成依赖清单。<a href="http://www.cnblogs.com/0bug/p/8853596.html" class="c_b_p_desc_readmore">阅读全文</a></div></div>
    
                <p class="postfoot">
                    posted @ 2018-04-16 08:24 0bug 阅读(6) 评论(0)  <a href ="https://i.cnblogs.com/EditPosts.aspx?postid=8853596" rel="nofollow">编辑</a>
                </p>
            </div>
        
    
    <div class="topicListFooter"><div id="nav_next_page"><a href="http://www.cnblogs.com/0bug/default.html?page=2">下一页</a></div></div>
    
    
                </div>
            </td>
        </tr>
        <tr>
            <td colspan="2" class="FooterCell">
                
    <p id="footer">
        Powered by: 
        <br />
        
        <a id="Footer1_Hyperlink3" NAME="Hyperlink1" href="http://www.cnblogs.com/"><font face="Verdana">博客园</font></a>
        <br />
        Copyright &copy; 0bug
    </p>
    
            </td>
        </tr>
    </table>
    
    <!--PageEndHtml Block Begin-->
    <!--自动生成目录-->
    <script language="javascript" type="text/javascript">
        //生成目录索引列表
        function GenerateContentList() {
            var jquery_h2_list = $('#cnblogs_post_body h2');//如果你的章节标题不是h2,只需要将这里的h2换掉即可
            if (jquery_h2_list.length > 0) {
                var content = '<a name="_labelTop"></a>';
                content += '<div id="navCategory">';
                content += '<p style="font-size:18px"><b>阅读目录</b></p>';
                content += '<ul>';
                for (var i = 0; i < jquery_h2_list.length; i++) {
                    var go_to_top = '<div style="text-align: right"><a href="#_labelTop"></a><a name="_label' + i + '"></a></div>';
                    $(jquery_h2_list[i]).before(go_to_top);
                    var li_content = '<li><a href="#_label' + i + '">' + $(jquery_h2_list[i]).text() + '</a></li>';
                    content += li_content;
                }
                content += '</ul>';
                content += '</div>';
                if ($('#cnblogs_post_body').length != 0) {
                    $($('#cnblogs_post_body')[0]).prepend(content);
                }
            }
        }
        GenerateContentList();
    </script>
    
    <script language="javascript" type="text/javascript">
    document.getElementById('footer').innerText='Life is short, you need Python';
    </script>
    <!--PageEndHtml Block End-->
    </body>
    </html>
    Result
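The GET/POST switch can be checked without touching the network: a urllib.request.Request object reports the method it would use through get_method(). The sketch below only builds the request objects and sends nothing.

```python
import urllib.parse
import urllib.request

url = 'http://httpbin.org/post'

# No data: the request would be sent as GET.
get_request = urllib.request.Request(url)

# With data (which must be bytes): the request would be sent as POST.
payload = urllib.parse.urlencode({'hello': '0bug'}).encode('utf-8')
post_request = urllib.request.Request(url, data=payload)

print(get_request.get_method())
print(post_request.get_method())
```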

    Sending a POST request with data

    import urllib.parse
    import urllib.request
    
    data = bytes(urllib.parse.urlencode({'hello': '0bug'}), encoding='utf-8')
    response = urllib.request.urlopen('http://httpbin.org/post', data=data)
    print(response.read())
    
    b'{\n  "args": {}, \n  "data": "", \n  "files": {}, \n  "form": {\n    "hello": "0bug"\n  }, \n  "headers": {\n    "Accept-Encoding": "identity", \n    "Connection": "close", \n    "Content-Length": "10", \n    "Content-Type": "application/x-www-form-urlencoded", \n    "Host": "httpbin.org", \n    "User-Agent": "Python-urllib/3.6"\n  }, \n  "json": null, \n  "origin": "223.72.80.199", \n  "url": "http://httpbin.org/post"\n}\n'
    Result
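The data argument has to be bytes, which is why the example wraps urlencode() in bytes(). The sketch below walks through that encoding step on its own and then parses a shortened, hand-copied version of the httpbin reply shown above; the shortened reply is an assumption made to keep the example offline.

```python
import json
import urllib.parse

form = {'hello': '0bug'}

# urlencode() returns a str in application/x-www-form-urlencoded form.
encoded = urllib.parse.urlencode(form)

# urlopen() needs bytes, so the str is encoded before sending.
body = encoded.encode('utf-8')
print(encoded, len(body))

# httpbin answers with JSON. This is a shortened copy of the reply,
# not a live response.
sample_reply = b'{"form": {"hello": "0bug"}, "url": "http://httpbin.org/post"}'
reply = json.loads(sample_reply.decode('utf-8'))
print(reply['form'])
```

The body is 10 bytes long, which matches the Content-Length of "10" that httpbin echoed back.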

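    The timeout parameter

    timeout is the number of seconds to wait for the server; if no response arrives in time, an exception is raised.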

    import urllib.request
    
    response = urllib.request.urlopen('http://www.cnblogs.com/0bug', timeout=0.01)
    print(response.read())
    
    urllib.error.URLError: <urlopen error timed out>
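    Result

    The exception can be caught and its reason checked to confirm that it was a timeout:

    import socket
    import urllib.error
    import urllib.request

    try:
        response = urllib.request.urlopen('http://www.cnblogs.com/0bug', timeout=0.01)
    except urllib.error.URLError as e:
        # e.reason holds the underlying error; socket.timeout means the request timed out
        if isinstance(e.reason, socket.timeout):
            print('Request timed out')
    
    Request timed out
    Result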
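A timeout does not always arrive wrapped in URLError: one that fires while the body is being read can surface as socket.timeout directly. The helper below, whose name is_timeout is made up for this sketch, covers both cases and is exercised with hand-built exceptions so that it runs offline.

```python
import socket
import urllib.error


def is_timeout(exc):
    """Return True if exc looks like a timeout raised by urlopen()."""
    # A timeout while reading the body can surface directly.
    if isinstance(exc, socket.timeout):
        return True
    # A timeout while connecting is usually wrapped in URLError.
    if isinstance(exc, urllib.error.URLError):
        return isinstance(exc.reason, socket.timeout)
    return False


# Hand-built exceptions standing in for real network failures.
wrapped = urllib.error.URLError(socket.timeout('timed out'))
refused = urllib.error.URLError(ConnectionRefusedError())
print(is_timeout(wrapped), is_timeout(refused))
```

In real code the helper would be called inside an except block that catches both urllib.error.URLError and socket.timeout.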

    Response

    1. Response type

    import urllib.request
    
    response = urllib.request.urlopen('http://www.cnblogs.com/0bug')
    print(type(response))
    
    <class 'http.client.HTTPResponse'>
    Result

    2. Status code and response headers

    import urllib.request
    
    response = urllib.request.urlopen('http://www.cnblogs.com/0bug')
    print(response.status)
    print(response.getheaders())
    print(response.getheader('Content-Type'))
    
    200
    [('Date', 'Fri, 20 Apr 2018 13:24:01 GMT'), ('Content-Type', 'text/html; charset=utf-8'), ('Content-Length', '19306'), ('Connection', 'close'), ('Vary', 'Accept-Encoding'), ('Cache-Control', 'private, max-age=10'), ('Expires', 'Fri, 20 Apr 2018 13:24:11 GMT'), ('Last-Modified', 'Fri, 20 Apr 2018 13:24:01 GMT'), ('X-UA-Compatible', 'IE=10'), ('X-Frame-Options', 'SAMEORIGIN')]
    text/html; charset=utf-8
    Result
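getheaders() returns a list of (name, value) tuples, so looking up several headers is easier after converting it to a dict. The sketch below uses a few pairs copied from the output above instead of a live response. Lowercasing the names is a choice made here, and a plain dict would keep only the last value of a repeated header such as Set-Cookie.

```python
# A few of the (name, value) pairs from the getheaders() output above.
headers = [
    ('Content-Type', 'text/html; charset=utf-8'),
    ('Content-Length', '19306'),
    ('Connection', 'close'),
]

# Header names are case-insensitive, so normalize them to lowercase.
header_map = {name.lower(): value for name, value in headers}

# Header values are always strings; convert where a number is needed.
content_length = int(header_map['content-length'])
print(header_map['content-type'], content_length)
```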

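    3. Response body

    The response body is a byte stream (bytes), so it has to be decoded, for example with decode('utf-8'), to get a str.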

    import urllib.request
    
    response = urllib.request.urlopen('http://www.cnblogs.com/0bug')
    html = response.read().decode('utf-8')
    print(html)
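Hard-coding 'utf-8' works for this page because its Content-Type header declares that charset. A slightly more general sketch reads the charset from the headers and falls back to UTF-8. The helper name decode_body is made up for this example, and the headers object is built by hand so the code runs offline; with a real response it would be response.headers.

```python
import http.client


def decode_body(raw, headers, default='utf-8'):
    """Decode raw bytes using the charset declared in Content-Type."""
    # get_content_charset() returns None when no charset is declared.
    charset = headers.get_content_charset() or default
    return raw.decode(charset, errors='replace')


# Hand-built stand-in for response.headers (an HTTPMessage).
headers = http.client.HTTPMessage()
headers['Content-Type'] = 'text/html; charset=gbk'

raw = '博客园'.encode('gbk')
text = decode_body(raw, headers)
print(text)
```

With a live response the call would be decode_body(response.read(), response.headers).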
    

    Request

    urlopen() also accepts a urllib.request.Request object in place of a URL string:

    import urllib.request
    
    request = urllib.request.Request('http://www.cnblogs.com/0bug')
    response = urllib.request.urlopen(request)
    print(response.read().decode('utf-8'))
    
    <!DOCTYPE html>
    <html lang="zh-cn">
    <head>
    <meta charset="utf-8"/>
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <title>0bug - 博客园</title>
    <link type="text/css" rel="stylesheet" href="/bundles/blog-common.css?v=-hy83QNg62d4qYibixJzxMJkbf1P9fTBlqv7SK5zVL01"/>
    <link id="MainCss" type="text/css" rel="stylesheet" href="/skins/KJC/bundle-KJC.css?v=SBtLze_k2f8QMx9yu0UzPZOmkUXedeg_e6WBRIadVBo1"/>
    <link type="text/css" rel="stylesheet" href="/blog/customcss/314654.css?v=SL7ok7Br9Wq1UADrprqW%2fnQ%2bFQI%3d"/>
    <link id="mobile-style" media="only screen and (max-width: 767px)" type="text/css" rel="stylesheet" href="/skins/KJC/bundle-KJC-mobile.css?v=d9LctKHRIQp9rreugMcQ1-UJuq_j1fo0GZXTXj8Bqrk1"/>
    <link title="RSS" type="application/rss+xml" rel="alternate" href="http://www.cnblogs.com/0bug/rss"/>
    <link title="RSD" type="application/rsd+xml" rel="EditURI" href="http://www.cnblogs.com/0bug/rsd.xml"/>
    <link type="application/wlwmanifest+xml" rel="wlwmanifest" href="http://www.cnblogs.com/0bug/wlwmanifest.xml"/>
    <script src="//common.cnblogs.com/scripts/jquery-2.2.0.min.js"></script>
    <script type="text/javascript">var currentBlogApp = '0bug', cb_enable_mathjax=false;var isLogined=false;</script>
    <script src="/bundles/blog-common.js?v=taItysi72HxMPeH9Xg5nAYabRul6hhgahi3tVIMIKV81" type="text/javascript"></script>
    </head>
    <body>
    <a name="top"></a>
    <!--PageBeginHtml Block Begin-->
    <!--模拟知乎的回到顶部开始-->
     
    <style>
     
    div.go-top {
        display: none;
        opacity: 0.6;
        z-index: 999999;
        position: fixed;
        bottom: 113px;
        left: 90%;
        margin-left: 40px;
        border: 1px solid #a38a54;
        width: 38px;
        height: 38px;
        background-color: #ffffff;
        border-radius: 3px;
        cursor: pointer;
    }
     
    div.go-top:hover {
        opacity: 1;
        filter: alpha(opacity=100);
    }
     
    div.go-top div.arrow {
        position: absolute;
        left: 10px;
        top: -1px;
        width: 0;
        height: 0;
        border: 9px solid transparent;
        border-bottom-color: #9aaabf;
    }
     
    div.go-top div.stick {
        position: absolute;
        left: 15px;
        top: 15px;
        width: 8px;
        height: 14px;
        display: block;
        background-color: #9aaabf;
        -webkit-border-radius: 1px;
        -moz-border-radius: 1px;
        border-radius: 1px;
    }
     
    </style>
     
    <script type="text/javascript">
    $(function() {
        $(window).scroll(function() {
            if ($(window).scrollTop() >600)
                $('div.go-top').show();
            else
                $('div.go-top').hide();
        });
        $('div.go-top').click(function() {
            $('html, body').animate({scrollTop: 0}, 600);
        });
    });
    </script>
    <body>
        <div class="go-top">
            <div class="arrow"></div>
            <div class="stick"></div>
        </div>
    </body>
     
    <!--模拟知乎的回到顶部结束-->
    
    <!--框架引入-->
    <link href="https://cdn.bootcss.com/bootstrap/3.3.7/css/bootstrap.min.css" rel="stylesheet">
    <!--PageBeginHtml Block End-->
    
    <table class="Framework" cellspacing="0" cellpadding="0" width="100%">
        <tr>
            <td colspan="3">
                
    <div id="top">
    <table cellpadding="10" cellspacing="0">
        <tr>
            <td nowrap>
                <h1><a id="Header1_HeaderTitle" class="headermaintitle" href="http://www.cnblogs.com/0bug/"></a></h1>
                
            </td>
        </tr>
    </table>
    </div>
    <div id="sub">
        <div id="sub-right"><div id="blog_stats">
    <div class="BlogStats">posts - 209, comments - 3, trackbacks - 0, articles - 44</div></div></div>
        
    &nbsp;
    <a id="blog_nav_sitehome" href="http://www.cnblogs.com/">博客园</a> :: 
    <a id="blog_nav_myhome" href="http://www.cnblogs.com/0bug/">首页</a> ::
    <a id="blog_nav_newpost" rel="nofollow" href="https://i.cnblogs.com/EditPosts.aspx?opt=1">新随笔</a> ::
    <a id="blog_nav_contact" accesskey="9" rel="nofollow" href="https://msg.cnblogs.com/send/0bug">联系</a> ::
    <a id="blog_nav_rss" href="http://www.cnblogs.com/0bug/rss">订阅</a>
    <a id="blog_nav_rss_image" class="XMLLink" href="http://www.cnblogs.com/0bug/rss"><img src="//www.cnblogs.com/images/xml.gif" alt="订阅" /></a> ::
    <a id="blog_nav_admin" rel="nofollow" href="https://i.cnblogs.com/">管理</a>
    
    </div>
            </td>
        </tr>
        <tr>
            <td class="LeftCell">
                <div id="leftmenu">
                    
                        <div id="blog-calendar" style="display:none"></div><script type="text/javascript">loadBlogDefaultCalendar();</script>
                        
    <div id=cell>
    <img src="/Skins/KJC/Images/icon-group.jpg" hspace=5 align=left vspace=2><h3>公告</h3>
    <div id=news>
        <div id="blog-news"></div><script type="text/javascript">loadBlogNews();</script>
    </div>
    </div>
    
                    
                    <div id="blog-sidecolumn"></div><script type="text/javascript">loadBlogSideColumn();</script>
                </div>
            </td>
            <td class="MainCell">
                <div id="main">
                    
    
    <p class="date">
        <span>
            <a id="homepage1_HomePageDays_ctl00_ImageLink" Title="Day Archive" href="http://www.cnblogs.com/0bug/" style="display:inline-block;">置顶随笔</a>
        </span>
    </p>
    
    
            <div class="post">
                <div class="posthead">
                    <h2 style="padding-top: 4px; padding-bottom: 4px;">
                        <a id="homepage1_HomePageDays_ctl00_DayList_TitleUrl_0" href="http://www.cnblogs.com/0bug/p/8788518.html">[置顶]Python开发工程师技术手记</a></h2>
                </div>
                <div class="postbody">
                    <div class="cnblogs-post-body" id="postlist_postbody_8788518">正文内容加载中...</div><script type="text/javascript">getBlogPostBody(8788518);</script></div>
    
                <p class="postfoot">
                    posted @ 2018-04-11 01:56 0bug 阅读(10) 评论(0)  <a href ="https://i.cnblogs.com/EditPosts.aspx?postid=8788518" rel="nofollow">编辑</a>
                </p>
            </div>
        
    
    <p class="date">
        <span>
            <a id="homepage1_HomePageDays_DaysList_ctl00_ImageLink" Title="Day Archive" href="//www.cnblogs.com/0bug/archive/2018/04/20.html" style="display:inline-block;">2018年4月20日</a>
        </span>
    </p>
    
    
            <div class="post">
                <div class="posthead">
                    <h2 style="padding-top: 4px; padding-bottom: 4px;">
                        <a id="homepage1_HomePageDays_DaysList_ctl00_DayList_TitleUrl_0" href="http://www.cnblogs.com/0bug/p/8893677.html">Urllib库基本使用</a></h2>
                </div>
                <div class="postbody">
                    <div class="c_b_p_desc">摘要: 什么是Urllib? Python内置的HTTP请求库 urllib.request 请求模块 urllib.error 异常处理模块 urllib.parse url解析模块 urllib.robotparser robots.txt解析模块 相比Python的变化 Python2中的urllib<a href="http://www.cnblogs.com/0bug/p/8893677.html" class="c_b_p_desc_readmore">阅读全文</a></div></div>
    
                <p class="postfoot">
                    posted @ 2018-04-20 21:17 0bug 阅读(2) 评论(0)  <a href ="https://i.cnblogs.com/EditPosts.aspx?postid=8893677" rel="nofollow">编辑</a>
                </p>
            </div>
        
            <div class="post">
                <div class="posthead">
                    <h2 style="padding-top: 4px; padding-bottom: 4px;">
                        <a id="homepage1_HomePageDays_DaysList_ctl00_DayList_TitleUrl_1" href="http://www.cnblogs.com/0bug/p/8893038.html">HTTP协议请求头信息和响应头信息</a></h2>
                </div>
                <div class="postbody">
                    <div class="c_b_p_desc">摘要: http的请求部分 基本结构 常用请头信息 Accept:text/html,image/*(告诉服务器,浏览器可以接受文本,网页图片) Accept-Charaset:ISO-8859-1 [接受字符编码:iso-8859-1] Accept-Encoding:gzip,compress[可以接受<a href="http://www.cnblogs.com/0bug/p/8893038.html" class="c_b_p_desc_readmore">阅读全文</a></div></div>
    
                <p class="postfoot">
                    posted @ 2018-04-20 19:12 0bug 阅读(2) 评论(0)  <a href ="https://i.cnblogs.com/EditPosts.aspx?postid=8893038" rel="nofollow">编辑</a>
                </p>
            </div>
        
            <div class="post">
                <div class="posthead">
                    <h2 style="padding-top: 4px; padding-bottom: 4px;">
                        <a id="homepage1_HomePageDays_DaysList_ctl00_DayList_TitleUrl_2" href="http://www.cnblogs.com/0bug/p/8892959.html">HTTP协议中GET和POST方法的区别</a></h2>
                </div>
                <div class="postbody">
                    <div class="c_b_p_desc">摘要: 最直观的区别就是GET把参数包含在URL中,POST通过request body传递参数。 GET在浏览器回退时是无害的,而POST会再次提交请求。 GET产生的URL地址可以被Bookmark,而POST不可以。 GET请求会被浏览器主动cache,而POST不会,除非手动设置。 GET请求只能进<a href="http://www.cnblogs.com/0bug/p/8892959.html" class="c_b_p_desc_readmore">阅读全文</a></div></div>
    
                <p class="postfoot">
                    posted @ 2018-04-20 18:59 0bug 阅读(2) 评论(0)  <a href ="https://i.cnblogs.com/EditPosts.aspx?postid=8892959" rel="nofollow">编辑</a>
                </p>
            </div>
        
            <div class="post">
                <div class="posthead">
                    <h2 style="padding-top: 4px; padding-bottom: 4px;">
                        <a id="homepage1_HomePageDays_DaysList_ctl00_DayList_TitleUrl_3" href="http://www.cnblogs.com/0bug/p/8892711.html">Redis环境安装</a></h2>
                </div>
                <div class="postbody">
                    <div class="c_b_p_desc">摘要: Windows下: 到https://github.com/MicrosoftArchive/redis/releases下载: 下载完成后一步一步安装就行。 然后在安装一个可视化工具:https://github.com/uglide/RedisDesktopManager Linux下安装以Ub<a href="http://www.cnblogs.com/0bug/p/8892711.html" class="c_b_p_desc_readmore">阅读全文</a></div></div>
    
                <p class="postfoot">
                    posted @ 2018-04-20 18:17 0bug 阅读(2) 评论(0)  <a href ="https://i.cnblogs.com/EditPosts.aspx?postid=8892711" rel="nofollow">编辑</a>
                </p>
            </div>
        
            <div class="post">
                <div class="posthead">
                    <h2 style="padding-top: 4px; padding-bottom: 4px;">
                        <a id="homepage1_HomePageDays_DaysList_ctl00_DayList_TitleUrl_4" href="http://www.cnblogs.com/0bug/p/8892714.html">自己动手,丰衣足食!Python3网络爬虫实战案例</a></h2>
                </div>
                <div class="postbody">
                    <div class="c_b_p_desc">摘要: 本教程是崔大大的爬虫实战教程的笔记:网易云课堂 Python3+Pip环境配置 Windows下安装Python:&#160;http://www.cnblogs.com/0bug/p/8228378.html Linux以Ubuntu为例,一般是自带的,只需配置一下默认版本:http://www.cnblo<a href="http://www.cnblogs.com/0bug/p/8892714.html" class="c_b_p_desc_readmore">阅读全文</a></div></div>
    
                <p class="postfoot">
                    posted @ 2018-04-20 18:17 0bug 阅读(2) 评论(0)  <a href ="https://i.cnblogs.com/EditPosts.aspx?postid=8892714" rel="nofollow">编辑</a>
                </p>
            </div>
        
    
    <p class="date">
        <span>
            <a id="homepage1_HomePageDays_DaysList_ctl01_ImageLink" Title="Day Archive" href="//www.cnblogs.com/0bug/archive/2018/04/19.html" style="display:inline-block;">2018年4月19日</a>
        </span>
    </p>
    
    
            <div class="post">
                <div class="posthead">
                    <h2 style="padding-top: 4px; padding-bottom: 4px;">
                        <a id="homepage1_HomePageDays_DaysList_ctl01_DayList_TitleUrl_0" href="http://www.cnblogs.com/0bug/p/8886663.html">Python Flask 构建微电影视频网站</a></h2>
                </div>
                <div class="postbody">
                    <div class="c_b_p_desc">摘要: 前言 学完本教程,你将掌握: 1.学会使用整形、浮点型、路径型、字符串型正则表达式路由转化器 2.学会使用post与get请求、上传文件、cookie获取与相应、404处理 3.学会适应模板自动转义、定义过滤器、定义全局上下文处理器、JinJa2语法、包含、继承、定义宏 4.学会使用flask-wt<a href="http://www.cnblogs.com/0bug/p/8886663.html" class="c_b_p_desc_readmore">阅读全文</a></div></div>
    
                <p class="postfoot">
                    posted @ 2018-04-19 22:31 0bug 阅读(6) 评论(0)  <a href ="https://i.cnblogs.com/EditPosts.aspx?postid=8886663" rel="nofollow">编辑</a>
                </p>
            </div>
        
    
    <p class="date">
        <span>
            <a id="homepage1_HomePageDays_DaysList_ctl02_ImageLink" Title="Day Archive" href="//www.cnblogs.com/0bug/archive/2018/04/18.html" style="display:inline-block;">2018年4月18日</a>
        </span>
    </p>
    
    
            <div class="post">
                <div class="posthead">
                    <h2 style="padding-top: 4px; padding-bottom: 4px;">
                        <a id="homepage1_HomePageDays_DaysList_ctl02_DayList_TitleUrl_0" href="http://www.cnblogs.com/0bug/p/8877479.html">基于Token的身份验证——JWT</a></h2>
                </div>
                <div class="postbody">
                    <div class="c_b_p_desc">摘要: 初次了解JWT,很基础,高手勿喷。 基于Token的身份验证用来替代传统的cookie+session身份验证方法中的session。 JWT是啥? JWT就是一个字符串,经过加密处理与校验处理的字符串,形式为: A.B.C A由JWT头部信息header加密得到 B由JWT用到的身份验证信息jso<a href="http://www.cnblogs.com/0bug/p/8877479.html" class="c_b_p_desc_readmore">阅读全文</a></div></div>
    
                <p class="postfoot">
                    posted @ 2018-04-18 20:53 0bug 阅读(10) 评论(0)  <a href ="https://i.cnblogs.com/EditPosts.aspx?postid=8877479" rel="nofollow">编辑</a>
                </p>
            </div>
        
            <div class="post">
                <div class="posthead">
                    <h2 style="padding-top: 4px; padding-bottom: 4px;">
                        <a id="homepage1_HomePageDays_DaysList_ctl02_DayList_TitleUrl_1" href="http://www.cnblogs.com/0bug/p/8874818.html">同步(Synchronous)和异步(Asynchronous)</a></h2>
                </div>
                <div class="postbody">
                    <div class="c_b_p_desc">摘要: 同步、异步的概念 同步和异步通常用来形容一次方法调用。 同步方法调用一旦开始,调用者必须等到方法调用返回后,才能继续后续的行为。 异步方法调用更像一个消息传递,一旦开始,方法调用就会立即返回,调用者就可以继续后续的操作。而,异步方法通常会在另外一个线程中,“真实”地执行着。整个过程,不会阻碍调用者的<a href="http://www.cnblogs.com/0bug/p/8874818.html" class="c_b_p_desc_readmore">阅读全文</a></div></div>
    
                <p class="postfoot">
                    posted @ 2018-04-18 14:52 0bug 阅读(2) 评论(0)  <a href ="https://i.cnblogs.com/EditPosts.aspx?postid=8874818" rel="nofollow">编辑</a>
                </p>
            </div>
        
            <div class="post">
                <div class="posthead">
                    <h2 style="padding-top: 4px; padding-bottom: 4px;">
                        <a id="homepage1_HomePageDays_DaysList_ctl02_DayList_TitleUrl_2" href="http://www.cnblogs.com/0bug/p/8872802.html">Window 通过cmd查看端口占用、相应进程、杀死进程等的命令</a></h2>
                </div>
                <div class="postbody">
                    <div class="c_b_p_desc">摘要: 一、 查看所有进程占用的端口 在开始-运行-cmd,输入:netstat –ano可以查看所有进程 二、查看占用指定端口的程序 当你在用tomcat发布程序时,经常会遇到端口被占用的情况,我们想知道是哪个程序或进程占用了端口,可以用该命令 netstat –ano|findstr “指定端口号” 二<a href="http://www.cnblogs.com/0bug/p/8872802.html" class="c_b_p_desc_readmore">阅读全文</a></div></div>
    
                <p class="postfoot">
                    posted @ 2018-04-18 10:58 0bug 阅读(4) 评论(0)  <a href ="https://i.cnblogs.com/EditPosts.aspx?postid=8872802" rel="nofollow">编辑</a>
                </p>
            </div>
        
    
    <p class="date">
        <span>
            <a id="homepage1_HomePageDays_DaysList_ctl03_ImageLink" Title="Day Archive" href="//www.cnblogs.com/0bug/archive/2018/04/16.html" style="display:inline-block;">2018年4月16日</a>
        </span>
    </p>
    
    
            <div class="post">
                <div class="posthead">
                    <h2 style="padding-top: 4px; padding-bottom: 4px;">
                        <a id="homepage1_HomePageDays_DaysList_ctl03_DayList_TitleUrl_0" href="http://www.cnblogs.com/0bug/p/8855883.html">优秀博客收录</a></h2>
                </div>
                <div class="postbody">
                    <div class="c_b_p_desc">摘要: 最新Django2.0.1在线教育零基础到上线教程 :https://www.jianshu.com/nb/21010157 vue+django2.0.2-rest-framework生鲜超市 :https://www.jianshu.com/nb/22309475<a href="http://www.cnblogs.com/0bug/p/8855883.html" class="c_b_p_desc_readmore">阅读全文</a></div></div>
    
                <p class="postfoot">
                    posted @ 2018-04-16 14:53 0bug 阅读(4) 评论(0)  <a href ="https://i.cnblogs.com/EditPosts.aspx?postid=8855883" rel="nofollow">编辑</a>
                </p>
            </div>
        
    
    <div class="topicListFooter"><div id="nav_next_page"><a href="http://www.cnblogs.com/0bug/default.html?page=2">下一页</a></div></div>
    
    
                </div>
            </td>
        </tr>
        <tr>
            <td colspan="2" class="FooterCell">
                
    <p id="footer">
        Powered by: 
        <br />
        
        <a id="Footer1_Hyperlink3" NAME="Hyperlink1" href="http://www.cnblogs.com/"><font face="Verdana">博客园</font></a>
        <br />
        Copyright &copy; 0bug
    </p>
    
            </td>
        </tr>
    </table>
    
    <!--PageEndHtml Block Begin-->
    <!--自动生成目录-->
    <script language="javascript" type="text/javascript">
        //生成目录索引列表
        function GenerateContentList() {
            var jquery_h2_list = $('#cnblogs_post_body h2');//如果你的章节标题不是h2,只需要将这里的h2换掉即可
            if (jquery_h2_list.length > 0) {
                var content = '<a name="_labelTop"></a>';
                content += '<div id="navCategory">';
                content += '<p style="font-size:18px"><b>阅读目录</b></p>';
                content += '<ul>';
                for (var i = 0; i < jquery_h2_list.length; i++) {
                    var go_to_top = '<div style="text-align: right"><a href="#_labelTop"></a><a name="_label' + i + '"></a></div>';
                    $(jquery_h2_list[i]).before(go_to_top);
                    var li_content = '<li><a href="#_label' + i + '">' + $(jquery_h2_list[i]).text() + '</a></li>';
                    content += li_content;
                }
                content += '</ul>';
                content += '</div>';
                if ($('#cnblogs_post_body').length != 0) {
                    $($('#cnblogs_post_body')[0]).prepend(content);
                }
            }
        }
        GenerateContentList();
    </script>
    
    <script language="javascript" type="text/javascript">
    document.getElementById('footer').innerText='Life is short, you need Python';
    </script>
    <!--PageEndHtml Block End-->
    </body>
    </html>
    Result

    Adding request headers

    from urllib import request, parse
    
    url = 'http://httpbin.org/post'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36',
        'Host': 'httpbin.org'
    }
    dic = {'name': '0bug'}
    data = bytes(parse.urlencode(dic), encoding='utf-8')
    req = request.Request(url=url, data=data, headers=headers, method='POST')
    response = request.urlopen(req)
    print(response.read().decode('utf-8'))
    
    {
      "args": {}, 
      "data": "", 
      "files": {}, 
      "form": {
        "name": "0bug"
      }, 
      "headers": {
        "Accept-Encoding": "identity", 
        "Connection": "close", 
        "Content-Length": "9", 
        "Content-Type": "application/x-www-form-urlencoded", 
        "Host": "httpbin.org", 
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36"
      }, 
      "json": null, 
      "origin": "223.72.80.199", 
      "url": "http://httpbin.org/post"
    }
    Result

    add_header

    from urllib import request, parse
    
    url = 'http://httpbin.org/post'
    dic = {'name': '0bug'}
    data = bytes(parse.urlencode(dic), encoding='utf-8')
    req = request.Request(url=url, data=data, method='POST')
    req.add_header('User-Agent',
                   'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36')
    response = request.urlopen(req)
    print(response.read().decode('utf-8'))
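
As a rough offline illustration of the two approaches above, the sketch below builds Request objects without sending them and inspects what would go on the wire. The User-Agent value 'demo-agent/0.1' is a made-up placeholder, not something from the original examples.

```python
from urllib import parse, request

url = 'http://httpbin.org/post'
data = parse.urlencode({'name': '0bug'}).encode('utf-8')

# No data and no explicit method: urllib treats the request as GET.
get_req = request.Request(url)

# Supplying data (without method=...) makes the default method POST.
post_req = request.Request(url, data=data)
# 'demo-agent/0.1' is a hypothetical User-Agent used only for illustration.
post_req.add_header('User-Agent', 'demo-agent/0.1')

print(get_req.get_method())
print(post_req.get_method())
# Request stores header names capitalized, so the key becomes 'User-agent'.
print(post_req.get_header('User-agent'))
print(post_req.data)
```

Nothing is sent until the Request is passed to urlopen(), so this is a cheap way to check the method, headers and body before making a real call.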
    

    Handler

    Proxy:

    import urllib.request
    
    proxy_handler = urllib.request.ProxyHandler({
        'http': 'HTTP proxy address',
        'https': 'HTTPS proxy address'
    })
    opener = urllib.request.build_opener(proxy_handler)
    response = opener.open('http://www.cnblogs.com/0bug')
    print(response.read())
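
A minimal sketch of the same setup with a concrete value filled in, assuming a proxy is listening at http://127.0.0.1:8080 (a hypothetical address chosen only for illustration). It builds the opener and checks its configuration without sending a request.

```python
import urllib.request

# Hypothetical local proxy address; replace it with a proxy you actually run.
proxies = {'http': 'http://127.0.0.1:8080', 'https': 'http://127.0.0.1:8080'}
proxy_handler = urllib.request.ProxyHandler(proxies)
opener = urllib.request.build_opener(proxy_handler)

# Confirm the opener carries our handler; no request is sent here.
has_proxy = any(isinstance(h, urllib.request.ProxyHandler) for h in opener.handlers)
print(has_proxy)
print(proxy_handler.proxies)

# Optional: make plain urllib.request.urlopen() use this opener from now on.
# urllib.request.install_opener(opener)
```

Calling install_opener() is only needed if the rest of the program uses urlopen() directly; otherwise opener.open() is enough.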

    Cookie

    import http.cookiejar, urllib.request
    
    cookie = http.cookiejar.CookieJar()
    handler = urllib.request.HTTPCookieProcessor(cookie)
    opener = urllib.request.build_opener(handler)
    response = opener.open('http://www.baidu.com')
    for item in cookie:
        print(item.name + "=" + item.value)
    
    BAIDUID=9992D9F175AFE48958F04ED9F2DC3659:FG=1
    BIDUPSID=9992D9F175AFE48958F04ED9F2DC3659
    H_PS_PSSID=26255_1437_21100_20927
    PSTM=1524234978
    BDSVRTM=0
    BD_HOME=0
    Result

    Saving cookies to a file

    import http.cookiejar, urllib.request
    
    filename = 'cookie.txt'
    cookie = http.cookiejar.MozillaCookieJar(filename)
    handler = urllib.request.HTTPCookieProcessor(cookie)
    opener = urllib.request.build_opener(handler)
    response = opener.open('http://www.baidu.com')
    cookie.save(ignore_discard=True, ignore_expires=True)
    
    # Netscape HTTP Cookie File
    # http://curl.haxx.se/rfc/cookie_spec.html
    # This is a generated file!  Do not edit.
    
    .baidu.com    TRUE    /    FALSE    3671718969    BAIDUID    0020A8919F61B2CA58DEC1CA10FF140E:FG=1
    .baidu.com    TRUE    /    FALSE    3671718969    BIDUPSID    0020A8919F61B2CA58DEC1CA10FF140E
    .baidu.com    TRUE    /    FALSE        H_PS_PSSID    1442_21110_18560_22158
    .baidu.com    TRUE    /    FALSE    3671718969    PSTM    1524235322
    www.baidu.com    FALSE    /    FALSE        BDSVRTM    0
    www.baidu.com    FALSE    /    FALSE        BD_HOME    0
    cookie.txt

    Saving in another format (LWP)

    import http.cookiejar, urllib.request
    
    filename = 'cookie.txt'
    cookie = http.cookiejar.LWPCookieJar(filename)
    handler = urllib.request.HTTPCookieProcessor(cookie)
    opener = urllib.request.build_opener(handler)
    response = opener.open('http://www.baidu.com')
    cookie.save(ignore_discard=True, ignore_expires=True)
    
    #LWP-Cookies-2.0
    Set-Cookie3: BAIDUID="654FB9A36A8F5B5C257111158153D963:FG=1"; path="/"; domain=".baidu.com"; path_spec; domain_dot; expires="2086-05-08 17:58:12Z"; version=0
    Set-Cookie3: BIDUPSID=654FB9A36A8F5B5C257111158153D963; path="/"; domain=".baidu.com"; path_spec; domain_dot; expires="2086-05-08 17:58:12Z"; version=0
    Set-Cookie3: H_PS_PSSID=1463_21111; path="/"; domain=".baidu.com"; path_spec; domain_dot; discard; version=0
    Set-Cookie3: PSTM=1524235445; path="/"; domain=".baidu.com"; path_spec; domain_dot; expires="2086-05-08 17:58:12Z"; version=0
    Set-Cookie3: BDSVRTM=0; path="/"; domain="www.baidu.com"; path_spec; discard; version=0
    Set-Cookie3: BD_HOME=0; path="/"; domain="www.baidu.com"; path_spec; discard; version=0
    cookie.txt

    Load cookies with the same format that was used to save them

    import http.cookiejar, urllib.request
    
    cookie = http.cookiejar.LWPCookieJar()
    cookie.load('cookie.txt', ignore_discard=True, ignore_expires=True)
    handler = urllib.request.HTTPCookieProcessor(cookie)
    opener = urllib.request.build_opener(handler)
    response = opener.open('http://www.baidu.com')
    print(response.read().decode('utf-8'))
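
The format rule above can be checked offline with a small sketch. It builds one cookie by hand (the helper make_cookie, the domain example.invalid and the temporary file path are all made up for this illustration), saves it in LWP format, then tries to read the file back with both jar classes.

```python
import http.cookiejar
import os
import tempfile


def make_cookie(name, value, domain):
    # Build a session cookie by hand so the example needs no network access.
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path='/', path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


path = os.path.join(tempfile.mkdtemp(), 'cookie.txt')

# Save in LWP format.
saved = http.cookiejar.LWPCookieJar(path)
saved.set_cookie(make_cookie('name', '0bug', 'example.invalid'))
saved.save(ignore_discard=True, ignore_expires=True)

# Loading with the same class works.
loaded = http.cookiejar.LWPCookieJar()
loaded.load(path, ignore_discard=True, ignore_expires=True)
names = [c.name for c in loaded]
print(names)

# Loading with the Mozilla class fails because the file header differs.
try:
    http.cookiejar.MozillaCookieJar().load(path)
    mismatch_error = False
except http.cookiejar.LoadError:
    mismatch_error = True
print(mismatch_error)
```

Both classes share the same save() and load() interface, so only the class name has to match between the writer and the reader.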
    

    Exception handling

    from urllib import request, error
    
    try:
        response = request.urlopen('http://www.cnblogs.com/0bug/xxxx')
    except error.URLError as e:
        print(e.reason)
    
    Not Found
    Result
    from urllib import request, error
    
    try:
        response = request.urlopen('http://www.cnblogs.com/0bug/xxxx')
    except error.HTTPError as e:
        print(e.reason, e.code, e.headers, sep='\n')
    except error.URLError as e:
        print(e.reason)
    else:
        print('Request successful')
    
    Not Found
    404
    Date: Fri, 20 Apr 2018 14:57:48 GMT
    Content-Type: text/html
    Content-Length: 759
    Connection: close
    Cache-Control: private
    X-UA-Compatible: IE=10
    X-Frame-Options: SAMEORIGIN
    Result
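
The order of the except clauses above matters because HTTPError is a subclass of URLError, so the more specific class has to be caught first. A small offline sketch of that relationship follows; the describe helper and the example.invalid URL are made up for illustration, and the exceptions are constructed by hand instead of coming from a real request.

```python
from urllib import error


def describe(exc):
    # Check the more specific class first: every HTTPError is also a URLError.
    if isinstance(exc, error.HTTPError):
        return f'HTTP error {exc.code}: {exc.reason}'
    if isinstance(exc, error.URLError):
        return f'URL error: {exc.reason}'
    return f'other error: {exc!r}'


# Hand-built exceptions standing in for what urlopen() would raise.
http_exc = error.HTTPError('http://example.invalid/missing', 404, 'Not Found', None, None)
url_exc = error.URLError('connection refused')

is_subclass = issubclass(error.HTTPError, error.URLError)
print(is_subclass)
print(describe(http_exc))
print(describe(url_exc))
```

If the two isinstance checks were swapped, the 404 case would be reported as a plain URL error and the status code would be lost.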
    import socket
    import urllib.request
    import urllib.error
    
    try:
        response = urllib.request.urlopen('http://www.cnblogs.com/0bug/xxxx', timeout=0.001)
    except urllib.error.URLError as e:
        print(type(e.reason))
        if isinstance(e.reason, socket.timeout):
            print('Request timed out')
    
    <class 'socket.timeout'>
    Request timed out
    Result

    URL parsing

    from urllib.parse import urlparse
    
    result = urlparse('www.baidu.com/index.html;user?id=5#comment')
    print(type(result))
    print(result)
    
    <class 'urllib.parse.ParseResult'>
    ParseResult(scheme='', netloc='', path='www.baidu.com/index.html', params='user', query='id=5', fragment='comment')
    Result
    from urllib.parse import urlparse
    
    result = urlparse('www.baidu.com/index.html;user?id=5#comment', scheme='https')
    print(result)
    
    ParseResult(scheme='https', netloc='', path='www.baidu.com/index.html', params='user', query='id=5', fragment='comment')
    Result
    from urllib.parse import urlparse
    
    result = urlparse('http://www.baidu.com/index.html;user?id=5#comment', scheme='https')
    print(result)
    
    ParseResult(scheme='http', netloc='www.baidu.com', path='/index.html', params='user', query='id=5', fragment='comment')
    Result
    from urllib.parse import urlparse
    
    result = urlparse('http://www.baidu.com/index.html;user?id=5#comment', allow_fragments=False)
    print(result)
    
    ParseResult(scheme='http', netloc='www.baidu.com', path='/index.html', params='user', query='id=5#comment', fragment='')
    Result
    from urllib.parse import urlparse
    
    result = urlparse('http://www.baidu.com/index.html#comment', allow_fragments=False)
    print(result)
    
    ParseResult(scheme='http', netloc='www.baidu.com', path='/index.html#comment', params='', query='', fragment='')
    Result
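
To round off the urlparse examples, here is a short sketch of how the returned ParseResult can be used. It assumes nothing beyond the standard library. It also shows one way to get the host into netloc when the input has no scheme, namely starting the string with '//'.

```python
from urllib.parse import urlparse

result = urlparse('http://www.baidu.com/index.html;user?id=5#comment')

# Fields are available by attribute name or by tuple index.
print(result.netloc)
print(result[1])
print(result.query)

# geturl() reassembles the components into a URL string.
print(result.geturl())

# Without a scheme, the host only lands in netloc if the string starts with '//'.
relative = urlparse('//www.baidu.com/index.html', scheme='https')
print(relative.scheme, relative.netloc, relative.path)
```

This explains the empty netloc in the earlier scheme-less examples: without the leading '//', the whole string is treated as a path.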

    urlunparse

    from urllib.parse import urlunparse
    
    data = ['http', 'www.baidu.com', 'index.html', 'user', 'id=6', 'comment']
    print(urlunparse(data))
    
    http://www.baidu.com/index.html;user?id=6#comment
    Result

    urljoin

    from urllib.parse import urljoin
    
    print(urljoin('http://www.baidu.com', 'ABC.html'))
    print(urljoin('http://www.baidu.com', 'https://www.cnblogs.com/0bug'))
    print(urljoin('http://www.baidu.com/0bug', 'https://www.cnblogs.com/0bug'))
    print(urljoin('http://www.baidu.com/0bug', 'https://www.cnblogs.com/0bug?q=2'))
    print(urljoin('http://www.baidu.com/0bug?q=2', 'https://www.cnblogs.com/0bug'))
    print(urljoin('http://www.baidu.com', '?q=2#comment'))
    print(urljoin('www.baidu.com', '?q=2#comment'))
    print(urljoin('www.baidu.com#comment', '?q=2'))
    
    http://www.baidu.com/ABC.html
    https://www.cnblogs.com/0bug
    https://www.cnblogs.com/0bug
    https://www.cnblogs.com/0bug?q=2
    https://www.cnblogs.com/0bug
    http://www.baidu.com?q=2#comment
    www.baidu.com?q=2#comment
    www.baidu.com?q=2
    Result

    urlencode

    from urllib.parse import urlencode
    
    params = {
        'name': '0bug',
        'age': 25
    }
    base_url = 'http://www.baidu.com?'
    url = base_url + urlencode(params)
    print(url)
    
    http://www.baidu.com?name=0bug&age=25
    Result
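
A brief sketch of what urlencode does with values that are not plain ASCII words, and how parse_qs reverses it. The sample keys 'q' and 'wd' are arbitrary names picked for this illustration.

```python
from urllib.parse import parse_qs, urlencode

query = urlencode({'name': '0bug', 'age': 25})
print(query)

# Spaces become '+', and non-ASCII text is percent-encoded as UTF-8 by default.
spaced = urlencode({'q': 'a b'})
chinese = urlencode({'wd': '中文'})
print(spaced)
print(chinese)

# parse_qs reverses the encoding; every value comes back as a list of strings.
decoded = parse_qs(query)
print(decoded)
```

Note that the integer 25 comes back as the string '25', since a query string carries no type information.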
  • Original post: https://www.cnblogs.com/0bug/p/8893677.html