  • Uploading files with requests

    Official documentation: https://2.python-requests.org//en/master/

    At work I needed to upload an attachment to an interface with the following parameters:

    Submit the attachment via HTTP POST in multipart/form-data format, url: http://test.com/flow/upload

    Field list:
    md5:       // md5 hash of (random value_current timestamp)
    filesize:  // file size
    file:      // file content (must include the file name)
    Return value:
    {"success":true,"uploadName":"tmp.xml","uploadPath":"uploads/201311/758e875fb7c7a508feef6b5036119b9f"}
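The md5 and filesize fields described above could be produced along these lines. This is a minimal sketch, not the interface's reference implementation: the helper name build_upload_fields is hypothetical, and it assumes that any integer is acceptable as the "random value" and that filesize means the size in bytes.

```python
import hashlib
import os
import random
import tempfile
import time


def build_upload_fields(path):
    """Return the non-file form fields for the upload interface (sketch)."""
    # Assumption: the interface only specifies the "<random>_<timestamp>"
    # shape of the string that gets hashed, so any integer will do.
    token = '%d_%d' % (random.randint(0, 999999), int(time.time()))
    return {
        'md5': hashlib.md5(token.encode('utf-8')).hexdigest(),
        'filesize': os.path.getsize(path),  # size in bytes on disk
    }


# Usage: write a small temporary file and build its fields.
with tempfile.NamedTemporaryFile(suffix='.xml', delete=False) as tmp:
    tmp.write(b'<root/>')
fields = build_upload_fields(tmp.name)
os.remove(tmp.name)
print(fields)
```

Using os.path.getsize avoids reading the whole file into memory just to measure it.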

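    Since I mostly use Python at work and the project already uses the requests library, I planned to implement this with requests. I expected it to be a trivial feature, but it ended up taking a lot of time: the official requests example only covers uploading a file and does not pass any additional parameters: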

    https://2.python-requests.org//en/master/user/quickstart/#post-a-multipart-encoded-file

    >>> url = 'https://httpbin.org/post'
    >>> files = {'file': ('report.xls', open('report.xls', 'rb'), 'application/vnd.ms-excel', {'Expires': '0'})}

    >>> r = requests.post(url, files=files)
    >>> r.text
    {
      ...
      "files": {
        "file": "<censored...binary...data>"
      },
      ...
    }

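    When additional parameters are involved, you need two arguments of requests: data and files. Pass the file to upload in files and the other parameters in data; the requests library merges the two into a single multipart body and sends it to the server.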
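One way to see this merge without contacting any server is to build a Request and prepare it, then look at the resulting headers and body. A minimal sketch; the field values and the in-memory file are made up for illustration.

```python
import io

import requests

# Hypothetical values for illustration; nothing is sent over the network.
req = requests.Request(
    'POST',
    'http://test.com/flow/upload',
    data={'md5': 'abc123', 'filesize': 5},
    files={'file': ('tmp.xml', io.BytesIO(b'<a/>\n'))},
)
prepared = req.prepare()

# The Content-Type carries the multipart boundary chosen by the library.
print(prepared.headers['Content-Type'])
# The body holds one part per data field plus one part for the file.
print(prepared.body.decode('utf-8'))
```

The printed body should contain three parts (md5, filesize and file) separated by the boundary string.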

    The final implementation looks like this:

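    import hashlib
    import os
    import time

    import requests

    # md5 of "<random value>_<current timestamp>", as the interface requires
    token = '%d_%d' % (0, int(time.time()))
    request_data = {
        'md5': hashlib.md5(token.encode('utf-8')).hexdigest(),
        'filesize': os.path.getsize(file_name),  # size in bytes
    }
    # Open in binary mode; the with-block closes the handle after the upload
    with open(file_name, 'rb') as f:
        files = {'file': (file_name, f)}
        MyLogger().getlogger().info('url:%s' % (request_url))  # project logger
        resp = requests.post(request_url, data=request_data, files=files)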
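The return value listed in the interface description can then be checked before the upload is treated as successful. A small sketch: the helper name parse_upload_response is hypothetical, and it assumes that a failed upload is reported with "success" set to false, which the interface description does not state explicitly.

```python
import json


def parse_upload_response(text):
    """Return (uploadName, uploadPath), or raise if the upload failed."""
    result = json.loads(text)
    # Assumption: failures are signalled by a false or missing "success" flag.
    if not result.get('success'):
        raise RuntimeError('upload failed: %r' % (result,))
    return result['uploadName'], result['uploadPath']


# The sample return value from the interface description.
sample = ('{"success":true,"uploadName":"tmp.xml",'
          '"uploadPath":"uploads/201311/758e875fb7c7a508feef6b5036119b9f"}')
name, path = parse_upload_response(sample)
print(name, path)
```

In real use the text would come from resp.text of the upload request.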

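    The final code may look simple, but it took me quite some effort to confirm that this approach is correct, including reading the requests source code along the way. Below is a record of that walk through the source:

    First, find the implementation of the post method, in requests/api.py: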

    def post(url, data=None, json=None, **kwargs):
        r"""Sends a POST request.

        :param url: URL for the new :class:`Request` object.
        :param data: (optional) Dictionary, list of tuples, bytes, or file-like
            object to send in the body of the :class:`Request`.
        :param json: (optional) json data to send in the body of the :class:`Request`.
        :param **kwargs: Optional arguments that ``request`` takes.
        :return: :class:`Response <Response>` object
        :rtype: requests.Response
        """

        return request('post', url, data=data, json=json, **kwargs)

    Here we can see that it calls the request method. Let's follow it; it is also in requests/api.py:

    def request(method, url, **kwargs):
        """Constructs and sends a :class:`Request <Request>`.

        :param method: method for the new :class:`Request` object: ``GET``, ``OPTIONS``, ``HEAD``, ``POST``, ``PUT``, ``PATCH``, or ``DELETE``.
        :param url: URL for the new :class:`Request` object.
        :param params: (optional) Dictionary, list of tuples or bytes to send
            in the query string for the :class:`Request`.
        :param data: (optional) Dictionary, list of tuples, bytes, or file-like
            object to send in the body of the :class:`Request`.
        :param json: (optional) A JSON serializable Python object to send in the body of the :class:`Request`.
        :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
        :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
        :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
            ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
            or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
            defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
            to add for the file.
        :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
        :param timeout: (optional) How many seconds to wait for the server to send data
            before giving up, as a float, or a :ref:`(connect timeout, read
            timeout) <timeouts>` tuple.
        :type timeout: float or tuple
        :param allow_redirects: (optional) Boolean. Enable/disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection. Defaults to ``True``.
        :type allow_redirects: bool
        :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
        :param verify: (optional) Either a boolean, in which case it controls whether we verify
                the server's TLS certificate, or a string, in which case it must be a path
                to a CA bundle to use. Defaults to ``True``.
        :param stream: (optional) if ``False``, the response content will be immediately downloaded.
        :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
        :return: :class:`Response <Response>` object
        :rtype: requests.Response

        Usage::

          >>> import requests
          >>> req = requests.request('GET', 'https://httpbin.org/get')
          <Response [200]>
        """

        # By using the 'with' statement we are sure the session is closed, thus we
        # avoid leaving sockets open which can trigger a ResourceWarning in some
        # cases, and look like a memory leak in others.
        with sessions.Session() as session:
            return session.request(method=method, url=url, **kwargs)

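    This method has a long docstring, which already shows that the files parameter is used for sending files, but it still does not say what to do when parameters and a file must be sent together. Continue into the session.request method, in requests/sessions.py: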

        def request(self, method, url,
                params=None, data=None, headers=None, cookies=None, files=None,
                auth=None, timeout=None, allow_redirects=True, proxies=None,
                hooks=None, stream=None, verify=None, cert=None, json=None):
            """Constructs a :class:`Request <Request>`, prepares it and sends it.
            Returns :class:`Response <Response>` object.

            :param method: method for the new :class:`Request` object.
            :param url: URL for the new :class:`Request` object.
            :param params: (optional) Dictionary or bytes to be sent in the query
                string for the :class:`Request`.
            :param data: (optional) Dictionary, list of tuples, bytes, or file-like
                object to send in the body of the :class:`Request`.
            :param json: (optional) json to send in the body of the
                :class:`Request`.
            :param headers: (optional) Dictionary of HTTP Headers to send with the
                :class:`Request`.
            :param cookies: (optional) Dict or CookieJar object to send with the
                :class:`Request`.
            :param files: (optional) Dictionary of ``'filename': file-like-objects``
                for multipart encoding upload.
            :param auth: (optional) Auth tuple or callable to enable
                Basic/Digest/Custom HTTP Auth.
            :param timeout: (optional) How long to wait for the server to send
                data before giving up, as a float, or a :ref:`(connect timeout,
                read timeout) <timeouts>` tuple.
            :type timeout: float or tuple
            :param allow_redirects: (optional) Set to True by default.
            :type allow_redirects: bool
            :param proxies: (optional) Dictionary mapping protocol or protocol and
                hostname to the URL of the proxy.
            :param stream: (optional) whether to immediately download the response
                content. Defaults to ``False``.
            :param verify: (optional) Either a boolean, in which case it controls whether we verify
                the server's TLS certificate, or a string, in which case it must be a path
                to a CA bundle to use. Defaults to ``True``.
            :param cert: (optional) if String, path to ssl client cert file (.pem).
                If Tuple, ('cert', 'key') pair.
            :rtype: requests.Response
            """
            # Create the Request.
            req = Request(
                method=method.upper(),
                url=url,
                headers=headers,
                files=files,
                data=data or {},
                json=json,
                params=params or {},
                auth=auth,
                cookies=cookies,
                hooks=hooks,
            )
            prep = self.prepare_request(req)

            proxies = proxies or {}

            settings = self.merge_environment_settings(
                prep.url, proxies, stream, verify, cert
            )

            # Send the request.
            send_kwargs = {
                'timeout': timeout,
                'allow_redirects': allow_redirects,
            }
            send_kwargs.update(settings)
            resp = self.send(prep, **send_kwargs)

            return resp
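The same create, prepare, send sequence can be exercised offline by mounting a stand-in transport adapter on a session. This is only a sketch for experimentation: EchoAdapter is a hypothetical class written for this example, not part of requests, and it returns an empty 200 response instead of talking to a server.

```python
import io

import requests
from requests.adapters import BaseAdapter


class EchoAdapter(BaseAdapter):
    """Hypothetical stand-in transport: records the request, returns 200."""

    def __init__(self):
        super().__init__()
        self.last_request = None

    def send(self, request, **kwargs):
        # 'request' is the PreparedRequest built by Session.request.
        self.last_request = request
        response = requests.Response()
        response.status_code = 200
        response.request = request
        response.url = request.url
        return response

    def close(self):
        pass


adapter = EchoAdapter()
with requests.Session() as session:
    session.mount('http://', adapter)  # replace the real HTTP transport
    resp = session.request(
        method='post',
        url='http://test.com/flow/upload',
        data={'md5': 'abc123'},
        files={'file': ('tmp.xml', io.BytesIO(b'<a/>'))},
    )

print(resp.status_code, adapter.last_request.method)
```

By the time the adapter's send is called, the body has already been encoded, which is why the rest of the walk-through concentrates on the prepare step.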

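    Skimming this method: it first prepares the request, and its last step is a call to send, which presumably sends the request. So we need to follow the prepare_request method, in requests/sessions.py: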

        def prepare_request(self, request):
            """Constructs a :class:`PreparedRequest <PreparedRequest>` for
            transmission and returns it. The :class:`PreparedRequest` has settings
            merged from the :class:`Request <Request>` instance and those of the
            :class:`Session`.

            :param request: :class:`Request` instance to prepare with this
                session's settings.
            :rtype: requests.PreparedRequest
            """
            cookies = request.cookies or {}

            # Bootstrap CookieJar.
            if not isinstance(cookies, cookielib.CookieJar):
                cookies = cookiejar_from_dict(cookies)

            # Merge with session cookies
            merged_cookies = merge_cookies(
                merge_cookies(RequestsCookieJar(), self.cookies), cookies)

            # Set environment's basic authentication if not explicitly set.
            auth = request.auth
            if self.trust_env and not auth and not self.auth:
                auth = get_netrc_auth(request.url)

            p = PreparedRequest()
            p.prepare(
                method=request.method.upper(),
                url=request.url,
                files=request.files,
                data=request.data,
                json=request.json,
                headers=merge_setting(request.headers, self.headers, dict_class=CaseInsensitiveDict),
                params=merge_setting(request.params, self.params),
                auth=merge_setting(auth, self.auth),
                cookies=merged_cookies,
                hooks=merge_hooks(request.hooks, self.hooks),
            )
            return p
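The merging of request-level and session-level settings in prepare_request can be observed directly, again without sending anything. A brief sketch; both header names are invented for the example.

```python
import requests

session = requests.Session()
session.headers['X-Session-Header'] = 'from-session'  # hypothetical header

req = requests.Request(
    'POST',
    'http://test.com/flow/upload',
    headers={'X-Request-Header': 'from-request'},     # hypothetical header
    data={'md5': 'abc123'},
)
prepared = session.prepare_request(req)

# Headers from both sources end up on the prepared request.
print(prepared.headers['X-Session-Header'])
print(prepared.headers['X-Request-Header'])
# With data only (no files), the body is a urlencoded form.
print(prepared.headers['Content-Type'], prepared.body)
```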

    prepare_request creates a PreparedRequest object and calls its prepare method. Follow the prepare method, in requests/models.py:

        def prepare(self,
                method=None, url=None, headers=None, files=None, data=None,
                params=None, auth=None, cookies=None, hooks=None, json=None):
            """Prepares the entire request with the given parameters."""

            self.prepare_method(method)
            self.prepare_url(url, params)
            self.prepare_headers(headers)
            self.prepare_cookies(cookies)
            self.prepare_body(data, files, json)
            self.prepare_auth(auth, url)

            # Note that prepare_auth must be last to enable authentication schemes
            # such as OAuth to work on a fully prepared request.

            # This MUST go after prepare_auth. Authenticators could add a hook
            self.prepare_hooks(hooks)

    prepare calls a number of prepare_xx methods. We only care about the one that handles data, files and json, so follow prepare_body, in requests/models.py:

     1 def prepare_body(self, data, files, json=None):
     2         """Prepares the given HTTP body data."""
     3 
     4         # Check if file, fo, generator, iterator.
     5         # If not, run through normal process.
     6 
     7         # Nottin' on you.
     8         body = None
     9         content_type = None
    10 
    11         if not data and json is not None:
    12             # urllib3 requires a bytes-like body. Python 2's json.dumps
    13             # provides this natively, but Python 3 gives a Unicode string.
    14             content_type = 'application/json'
    15             body = complexjson.dumps(json)
    16             if not isinstance(body, bytes):
    17                 body = body.encode('utf-8')
    18 
    19         is_stream = all([
    20             hasattr(data, '__iter__'),
    21             not isinstance(data, (basestring, list, tuple, Mapping))
    22         ])
    23 
    24         try:
    25             length = super_len(data)
    26         except (TypeError, AttributeError, UnsupportedOperation):
    27             length = None
    28 
    29         if is_stream:
    30             body = data
    31 
    32             if getattr(body, 'tell', None) is not None:
    33                 # Record the current file position before reading.
    34                 # This will allow us to rewind a file in the event
    35                 # of a redirect.
    36                 try:
    37                     self._body_position = body.tell()
    38                 except (IOError, OSError):
    39                     # This differentiates from None, allowing us to catch
    40                     # a failed `tell()` later when trying to rewind the body
    41                     self._body_position = object()
    42 
    43             if files:
    44                 raise NotImplementedError('Streamed bodies and files are mutually exclusive.')
    45 
    46             if length:
    47                 self.headers['Content-Length'] = builtin_str(length)
    48             else:
    49                 self.headers['Transfer-Encoding'] = 'chunked'
    50         else:
    51             # Multi-part file uploads.
    52             if files:
    53                 (body, content_type) = self._encode_files(files, data)
    54             else:
    55                 if data:
    56                     body = self._encode_params(data)
    57                     if isinstance(data, basestring) or hasattr(data, 'read'):
    58                         content_type = None
    59                     else:
    60                         content_type = 'application/x-www-form-urlencoded'
    61 
    62             self.prepare_content_length(body)
    63 
    64             # Add content-type if it wasn't explicitly provided.
    65             if content_type and ('content-type' not in self.headers):
    66                 self.headers['Content-Type'] = content_type
    67 
    68         self.body = body
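The branches of prepare_body can be probed from the outside by preparing requests with different argument combinations and reading the resulting Content-Type. A minimal sketch: content_type_for is a hypothetical helper, the URL is never contacted, and the exact behaviour may differ between requests versions.

```python
import io

import requests

URL = 'http://test.com/flow/upload'  # never contacted


def content_type_for(**kwargs):
    """Prepare a POST with the given arguments and return its Content-Type."""
    prepared = requests.Request('POST', URL, **kwargs).prepare()
    return prepared.headers.get('Content-Type')


json_type = content_type_for(json={'a': 1})
form_type = content_type_for(data={'a': 1})
multipart_type = content_type_for(
    data={'a': 1}, files={'file': ('tmp.xml', io.BytesIO(b'x'))})

# A file-like object passed as data is treated as a stream, and a stream
# cannot be combined with files.
try:
    content_type_for(data=io.BytesIO(b'raw'),
                     files={'file': ('tmp.xml', io.BytesIO(b'x'))})
    stream_error = None
except NotImplementedError as exc:
    stream_error = exc

print(json_type, form_type, multipart_type, stream_error)
```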

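    This function is fairly long. The part to focus on is line 52, the if files: check, whose body (line 53) calls the _encode_files method. Let's follow that method: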

        def _encode_files(files, data):
            """Build the body for a multipart/form-data request.

            Will successfully encode files when passed as a dict or a list of
            tuples. Order is retained if data is a list of tuples but arbitrary
            if parameters are supplied as a dict.
            The tuples may be 2-tuples (filename, fileobj), 3-tuples (filename, fileobj, contentype)
            or 4-tuples (filename, fileobj, contentype, custom_headers).
            """
            if (not files):
                raise ValueError("Files must be provided.")
            elif isinstance(data, basestring):
                raise ValueError("Data must not be a string.")

            new_fields = []
            fields = to_key_val_list(data or {})
            files = to_key_val_list(files or {})

            for field, val in fields:
                if isinstance(val, basestring) or not hasattr(val, '__iter__'):
                    val = [val]
                for v in val:
                    if v is not None:
                        # Don't call str() on bytestrings: in Py3 it all goes wrong.
                        if not isinstance(v, bytes):
                            v = str(v)

                        new_fields.append(
                            (field.decode('utf-8') if isinstance(field, bytes) else field,
                             v.encode('utf-8') if isinstance(v, str) else v))

            for (k, v) in files:
                # support for explicit filename
                ft = None
                fh = None
                if isinstance(v, (tuple, list)):
                    if len(v) == 2:
                        fn, fp = v
                    elif len(v) == 3:
                        fn, fp, ft = v
                    else:
                        fn, fp, ft, fh = v
                else:
                    fn = guess_filename(v) or k
                    fp = v

                if isinstance(fp, (str, bytes, bytearray)):
                    fdata = fp
                elif hasattr(fp, 'read'):
                    fdata = fp.read()
                elif fp is None:
                    continue
                else:
                    fdata = fp

                rf = RequestField(name=k, data=fdata, filename=fn, headers=fh)
                rf.make_multipart(content_type=ft)
                new_fields.append(rf)

            body, content_type = encode_multipart_formdata(new_fields)

            return body, content_type

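    OK, that is the end of the trail. After reading this code carefully, the roles of the data and files arguments of requests.post become clear: requests merges the two here and uses the result as the body of the POST.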
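The core of _encode_files can be imitated with the two urllib3 helpers it relies on, RequestField and encode_multipart_formdata. The sketch below is a simplified imitation, not the library's code path: the field values are made up, and a fixed boundary is passed only so the output is easy to read.

```python
from urllib3.fields import RequestField
from urllib3.filepost import encode_multipart_formdata

# Plain form fields become (name, bytes) pairs, as in the first loop.
new_fields = [('md5', b'abc123'), ('filesize', b'5')]

# The file becomes a RequestField with a filename, as in the second loop.
rf = RequestField(name='file', data=b'<a/>\n', filename='tmp.xml')
rf.make_multipart(content_type='application/xml')
new_fields.append(rf)

# Assumption for readability: a fixed boundary instead of a random one.
body, content_type = encode_multipart_formdata(
    new_fields, boundary='demo-boundary')

print(content_type)
print(body.decode('utf-8'))
```

Both kinds of entries end up in the same new_fields list, which is why the data fields and the file arrive at the server in one multipart body.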

  • Original article: https://www.cnblogs.com/lit10050528/p/11285600.html