  • 20.5. urllib — Open arbitrary resources by URL — Python v2.7.2 documentation


    20.5.4. urllib Restrictions

    • Currently, only the following protocols are supported: HTTP (versions 0.9 and
      1.0), FTP, and local files.

    • The caching feature of urlretrieve() is disabled until expiration-time
      headers are processed properly.

    • There should be a function to query whether a particular URL is in the cache.

    • For backward compatibility, if a URL appears to point to a local file but the
      file can’t be opened, the URL is re-interpreted using the FTP protocol. This
      can sometimes cause confusing error messages.

    • The urlopen() and urlretrieve() functions can cause arbitrarily
      long delays while waiting for a network connection to be set up. This means
      that it is difficult to build an interactive Web client using these functions
      without using threads.

    • The data returned by urlopen() or urlretrieve() is the raw data
      returned by the server. This may be binary data (such as an image), plain text
      or (for example) HTML. The HTTP protocol provides type information in the reply
      header, which can be inspected by looking at the Content-Type
      header. If the returned data is HTML, you can use the module htmllib to
      parse it.

    • The code handling the FTP protocol cannot differentiate between a file and a
      directory. This can lead to unexpected behavior when attempting to read a URL
      that points to a file that is not accessible. If the URL ends in a /, it is
      assumed to refer to a directory and will be handled accordingly. But if an
      attempt to read a file leads to a 550 error (meaning the URL cannot be found or
      is not accessible, often for permission reasons), then the path is treated as a
      directory in order to handle the case when a directory is specified by a URL but
      the trailing / has been left off. This can cause misleading results when
      you try to fetch a file whose read permissions make it inaccessible; the FTP
      code will try to read it, fail with a 550 error, and then perform a directory
      listing for the unreadable file. If fine-grained control is needed, consider
      using the ftplib module, subclassing FancyURLopener, or changing
      _urlopener to meet your needs.

    • This module does not support the use of proxies which require authentication.
      This may be implemented in the future.

    • Although the urllib module contains (undocumented) routines to parse
      and unparse URL strings, the recommended interface for URL manipulation is in
      module urlparse.
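As a rough illustration of the urlparse recommendation above, the sketch below splits a URL into its parts, reads the query parameters, and rebuilds the URL with a different path. The replacement path "/cgi-bin/other" is made up for the example, and the try/except import is an assumption added so the same snippet also runs on Python 3, where these functions live in urllib.parse.

```python
try:                                   # Python 2, the version documented here
    from urlparse import urlsplit, urlunsplit, parse_qs
except ImportError:                    # Python 3 moved these to urllib.parse
    from urllib.parse import urlsplit, urlunsplit, parse_qs

url = "http://www.musi-cal.com/cgi-bin/query?spam=1&eggs=2"

# Split the URL into scheme, netloc, path, query and fragment.
parts = urlsplit(url)

# Decode the query string into a dict mapping each name to a list of values.
query = parse_qs(parts.query)

# Rebuild the URL with a different (hypothetical) path, keeping the rest.
rebuilt = urlunsplit((parts.scheme, parts.netloc, "/cgi-bin/other",
                      parts.query, parts.fragment))
```

Here parts.netloc is 'www.musi-cal.com', query is {'spam': ['1'], 'eggs': ['2']}, and rebuilt is 'http://www.musi-cal.com/cgi-bin/other?spam=1&eggs=2'.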

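The note above about long delays suggests running the blocking call in a thread. One possible shape for that is sketched below. fetch_in_background is a hypothetical helper, not part of urllib, and the fetch argument stands in for whatever blocking function the client uses (for example, one that calls urlopen() and reads the result).

```python
import threading


def fetch_in_background(fetch, url, on_done):
    """Run fetch(url) in a worker thread, then call on_done(result, error).

    Hypothetical helper: `fetch` is any blocking callable that takes a URL.
    """
    def worker():
        try:
            result, error = fetch(url), None
        except Exception as exc:
            result, error = None, exc
        on_done(result, error)

    thread = threading.Thread(target=worker)
    thread.daemon = True               # do not keep the program alive for it
    thread.start()
    return thread


# Usage with a stand-in fetch function, so the sketch needs no network.
results = []
thread = fetch_in_background(lambda url: "body of " + url,
                             "http://www.python.org/",
                             lambda result, error: results.append((result, error)))
thread.join(5)                         # a real client would keep handling input
```

The callback runs in the worker thread, so an interactive client would normally hand the result back to its main loop (for instance through a queue) rather than touch the user interface directly.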
     
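To make the Content-Type note above concrete, here is a minimal sketch that reads the header from the object returned by urlopen(). It uses a temporary local HTML file so that no network is needed. For local files urllib guesses the type from the file name rather than receiving it from a server. content_type_of is a hypothetical helper name, and the import fallback is an assumption added so the snippet also runs on Python 3.

```python
import os
import tempfile

try:                                   # Python 2, the version documented here
    from urllib import urlopen, pathname2url
except ImportError:                    # Python 3 moved both to urllib.request
    from urllib.request import urlopen, pathname2url


def content_type_of(url):
    """Return the Content-Type header reported for url, or None if absent."""
    f = urlopen(url)
    try:
        return f.info().get("Content-Type")
    finally:
        f.close()


# Demonstrate on a throwaway local HTML file.
fd, path = tempfile.mkstemp(suffix=".html")
os.write(fd, b"<html><body>spam</body></html>")
os.close(fd)
try:
    ctype = content_type_of("file:" + pathname2url(path))
finally:
    os.remove(path)
```

With an HTTP URL the same helper returns whatever the server sent, which may also carry parameters such as a charset.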

    20.5.5. Examples

    Here is an example session that uses the GET method to retrieve a URL
    containing parameters:

    >>> import urllib
    >>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
    >>> f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query?%s" % params)
    >>> print f.read()
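The query string in the GET example is built by urlencode(). The sketch below shows what it produces without contacting the server. It passes a list of pairs instead of a dict so that the parameter order is predictable, and the import fallback is an assumption added so the snippet also runs on Python 3.

```python
try:                                   # Python 2, the version documented here
    from urllib import urlencode
except ImportError:                    # Python 3 moved it to urllib.parse
    from urllib.parse import urlencode

# A sequence of (name, value) pairs keeps the parameters in the given order.
params = urlencode([('spam', 1), ('eggs', 2), ('bacon', 0)])
url = "http://www.musi-cal.com/cgi-bin/query?%s" % params

# Characters that are special in a query string are escaped automatically.
escaped = urlencode([('q', 'spam & eggs')])
```

Here params is 'spam=1&eggs=2&bacon=0' and escaped is 'q=spam+%26+eggs'.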
    

    The following example uses the POST method instead:

    >>> import urllib
    >>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
    >>> f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query", params)
    >>> print f.read()
    

    The following example uses an explicitly specified HTTP proxy, overriding
    environment settings:

    >>> import urllib
    >>> proxies = {'http': 'http://proxy.example.com:8080/'}
    >>> opener = urllib.FancyURLopener(proxies)
    >>> f = opener.open("http://www.python.org")
    >>> f.read()
    

    The following example uses no proxies at all, overriding environment settings:

    >>> import urllib
    >>> opener = urllib.FancyURLopener({})
    >>> f = opener.open("http://www.python.org/")
    >>> f.read()
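The two proxy examples above override the environment settings. To see what those settings are, getproxies() returns the detected mapping from scheme to proxy URL. This is a small sketch only. The output depends entirely on the machine it runs on, and the import fallback is an assumption added so the snippet also runs on Python 3.

```python
try:                                   # Python 2, the version documented here
    from urllib import getproxies
except ImportError:                    # Python 3 moved it to urllib.request
    from urllib.request import getproxies

# Mapping of scheme -> proxy URL; empty when no proxy is configured.
env_proxies = getproxies()

for scheme, proxy_url in sorted(env_proxies.items()):
    print("%s -> %s" % (scheme, proxy_url))
```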
  • Original article: https://www.cnblogs.com/lexus/p/2416658.html