  • 20.5. urllib — Open arbitrary resources by URL — Python v2.7.2 documentation


    20.5.4. urllib Restrictions

    • Currently, only the following protocols are supported: HTTP (versions 0.9 and
      1.0), FTP, and local files.

    • The caching feature of urlretrieve() has been disabled until proper
      processing of expiration-time headers is implemented.

    • There should be a function to query whether a particular URL is in the cache.

    • For backward compatibility, if a URL appears to point to a local file but the
      file can’t be opened, the URL is re-interpreted using the FTP protocol. This
      can sometimes cause confusing error messages.

    • The urlopen() and urlretrieve() functions can cause arbitrarily
      long delays while waiting for a network connection to be set up. This means
      that it is difficult to build an interactive Web client using these functions
      without using threads.

    • The data returned by urlopen() or urlretrieve() is the raw data
      returned by the server. This may be binary data (such as an image), plain text
      or (for example) HTML. The HTTP protocol provides type information in the reply
      header, which can be inspected by looking at the Content-Type
      header. If the returned data is HTML, you can use the module htmllib to
      parse it.

    • The code handling the FTP protocol cannot differentiate between a file and a
      directory. This can lead to unexpected behavior when attempting to read a URL
      that points to a file that is not accessible. If the URL ends in a /, it is
      assumed to refer to a directory and will be handled accordingly. But if an
      attempt to read a file leads to a 550 error (meaning the URL cannot be found or
      is not accessible, often for permission reasons), then the path is treated as a
      directory in order to handle the case when a directory is specified by a URL but
      the trailing / has been left off. This can cause misleading results when
      you try to fetch a file whose read permissions make it inaccessible; the FTP
      code will try to read it, fail with a 550 error, and then perform a directory
      listing for the unreadable file. If fine-grained control is needed, consider
      using the ftplib module, subclassing FancyURLopener, or changing
      _urlopener to meet your needs.

    • This module does not support the use of proxies which require authentication.
      This may be implemented in the future.

    • Although the urllib module contains (undocumented) routines to parse
      and unparse URL strings, the recommended interface for URL manipulation is in
      module urlparse.
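
    As a minimal sketch of the recommendation above, the snippet below splits a
    URL into its components and rebuilds it. The try/except import is an
    assumption added so the same code also runs on Python 3, where these
    functions live in urllib.parse; the sample URL is borrowed from the examples
    section and is only illustrative.

```python
# Import from whichever module is available: urlparse on Python 2,
# urllib.parse on Python 3 (the Python 3 location is an assumption that
# goes beyond this 2.7 documentation).
try:
    from urlparse import urlparse, parse_qs, urlunparse
except ImportError:
    from urllib.parse import urlparse, parse_qs, urlunparse

url = "http://www.musi-cal.com/cgi-bin/query?spam=1&eggs=2"
parts = urlparse(url)

scheme = parts.scheme          # "http"
host = parts.netloc            # "www.musi-cal.com"
path = parts.path              # "/cgi-bin/query"
query = parse_qs(parts.query)  # {"spam": ["1"], "eggs": ["2"]}

# Reassembling the six components gives back the original string.
rebuilt = urlunparse(parts)
```

    Note that parse_qs() maps each name to a list of values, because a query
    string may repeat the same parameter.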

     
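
    The note above about inspecting the Content-Type header can be sketched as
    follows. To stay runnable without a network connection, this example opens a
    temporary local file (local files being one of the supported protocols); the
    helper name content_type and the temporary fixture are hypothetical, and the
    Python 3 import path is an assumption.

```python
import os
import tempfile

try:
    from urllib import urlopen, pathname2url          # Python 2
except ImportError:
    from urllib.request import urlopen, pathname2url  # Python 3 (assumed)


def content_type(response):
    """Return the Content-Type value of a urlopen() response, or None."""
    # info() returns the header object; get() looks names up case-insensitively.
    return response.info().get("Content-Type")


# Hypothetical local fixture so the sketch needs no network access.
fd, path = tempfile.mkstemp(suffix=".html")
with os.fdopen(fd, "w") as handle:
    handle.write("<html><body>hello</body></html>")

response = urlopen("file:" + pathname2url(path))
try:
    ctype = content_type(response)  # guessed from the .html extension
    body = response.read()          # raw data, exactly as stored
finally:
    response.close()
    os.remove(path)
```

    For local files the type is guessed from the file name rather than sent by a
    server, so treat it as a hint; for HTTP responses the same lookup returns
    whatever the server declared.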

    20.5.5. Examples

    Here is an example session that uses the GET method to retrieve a URL
    containing parameters:

    >>> import urllib
    >>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
    >>> f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query?%s" % params)
    >>> print f.read()
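
    To see what urlencode() actually produces before any request is sent, a
    small offline sketch may help. It passes a list of pairs instead of a dict so
    the parameter order is fixed; the second query string is a made-up value that
    shows escaping, and the Python 3 import path is an assumption.

```python
try:
    from urllib import urlencode        # Python 2
except ImportError:
    from urllib.parse import urlencode  # Python 3 (assumed)

# A list of pairs keeps the parameters in a fixed order; a plain dict may not
# preserve order on older Python versions.
params = urlencode([("spam", 1), ("eggs", 2), ("bacon", 0)])
url = "http://www.musi-cal.com/cgi-bin/query?%s" % params

# Values are percent-encoded, with spaces written as "+".
escaped = urlencode([("q", "green eggs & ham")])
```

    Here params is "spam=1&eggs=2&bacon=0" and escaped is
    "q=green+eggs+%26+ham".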
    

    The following example uses the POST method instead:

    >>> import urllib
    >>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
    >>> f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query", params)
    >>> print f.read()
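
    The only difference from the GET example is that the encoded parameters are
    passed as a second argument instead of being appended to the URL. That rule
    can be checked without touching the network by building Request objects and
    asking which method they would use. Note that Request comes from urllib2
    (urllib.request on Python 3), a different module from the one documented
    here, so this is an illustration of the same convention rather than of
    urllib.urlopen() itself.

```python
try:
    from urllib import urlencode          # Python 2
    from urllib2 import Request
except ImportError:
    from urllib.parse import urlencode    # Python 3 (assumed)
    from urllib.request import Request

url = "http://www.musi-cal.com/cgi-bin/query"
params = urlencode([("spam", 1), ("eggs", 2), ("bacon", 0)])

# Parameters in the URL, no request body: this would be sent as a GET.
get_request = Request(url + "?" + params)

# Parameters supplied as the request body: this would be sent as a POST.
# Encoding to bytes is required on Python 3 and harmless on Python 2.
post_request = Request(url, params.encode("ascii"))

get_method = get_request.get_method()
post_method = post_request.get_method()
```

    Nothing is transmitted here; the objects only describe the requests.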
    

    The following example uses an explicitly specified HTTP proxy, overriding
    environment settings:

    >>> import urllib
    >>> proxies = {'http': 'http://proxy.example.com:8080/'}
    >>> opener = urllib.FancyURLopener(proxies)
    >>> f = opener.open("http://www.python.org")
    >>> f.read()
    

    The following example uses no proxies at all, overriding environment settings:

    >>> import urllib
    >>> opener = urllib.FancyURLopener({})
    >>> f = opener.open("http://www.python.org/")
    >>> f.read()
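
    Both proxy examples override "environment settings". A short sketch of what
    those settings look like may be useful: getproxies() returns the
    scheme-to-proxy mapping that is used when no proxies argument is given. The
    environment value set below is hypothetical and exists only for this
    illustration, and the Python 3 import path is an assumption.

```python
import os

try:
    from urllib import getproxies          # Python 2
except ImportError:
    from urllib.request import getproxies  # Python 3 (assumed)

# Hypothetical environment setting, used only for this illustration.
os.environ["http_proxy"] = "http://proxy.example.com:8080/"

# Mapping picked up from the environment when no proxies are passed explicitly.
env_proxies = getproxies()

# An explicit mapping replaces the environment's choice for the listed schemes.
explicit_proxies = {"http": "http://other-proxy.example.com:3128/"}

# An empty mapping means "use no proxies at all".
no_proxies = {}
```

    Passing explicit_proxies or no_proxies to an opener, as in the two examples
    above, takes precedence over whatever env_proxies contains.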
  • Original article: https://www.cnblogs.com/lexus/p/2416658.html