  • Solving encoding problems in Python CGI programming once and for all

    Answering this for late-comers because I don't think that the posted answers get to the root of the problem, which is the lack of locale environment variables in a CGI context. I'm using Python 3.2.

    1. open() opens file objects in text (string) or binary (bytes) mode for reading and/or writing. In text mode, the encoding used to encode strings written to the file and to decode bytes read from it can be specified in the call. If it isn't, it is determined by locale.getpreferredencoding(), which on Linux takes the encoding from your locale environment settings; that is normally UTF-8 (from e.g. LANG=en_US.UTF-8):

      >>> f = open('foo', 'w')         # open file for writing in text mode
      >>> f.encoding
      'UTF-8'                          # encoding is from the environment
      >>> f.write('€')                 # write a Unicode string
      1
      >>> f.close()
      >>> exit()
      user@host:~$ hd foo
      00000000  e2 82 ac      |...|    # data is UTF-8 encoded
    2. sys.stdout is in fact a file opened for writing in text mode, with an encoding based on locale.getpreferredencoding(). You can write strings to it just fine, and they are encoded to bytes according to sys.stdout's encoding. print() writes to sys.stdout by default; print() itself has no encoding, rather it is the file it writes to that has one:

      >>> import sys
      >>> sys.stdout.encoding
      'UTF-8'                          # encoding is from the environment
      >>> exit()
      user@host:~$ python3 -c 'print("€")' > foo
      user@host:~$ hd foo
      00000000  e2 82 ac 0a   |....|   # data is UTF-8 encoded; \n is from print()

      You cannot write bytes to sys.stdout; use sys.stdout.buffer.write() for that. If you try to write bytes with sys.stdout.write(), it raises a TypeError, and if you try with print(), print() simply turns the bytes object into its string representation, so an escape sequence like \xff is written as the four characters \, x, f, f:

      user@host:~$ python3 -c 'print(b"\xe2\xf82\xac")' > foo
      user@host:~$ hd foo
      00000000  62 27 5c 78 65 32 5c 78  66 38 32 5c 78 61 63 27  |b'\xe2\xf82\xac'|
      00000010  0a                                                |.|
    3. In a CGI script you need to write to sys.stdout, and you can use print() to do it. But a CGI script process in Apache has no locale environment settings, because they are not part of the CGI specification. The sys.stdout encoding therefore defaults to ANSI_X3.4-1968, in other words ASCII. If you try to print() a string that contains non-ASCII characters to sys.stdout, you get "UnicodeEncodeError: 'ascii' codec can't encode character...: ordinal not in range(128)".

    4. A simple solution is to pass the Apache process's LANG environment variable through to the CGI script with Apache's mod_env PassEnv directive in the server or virtual host configuration: PassEnv LANG. On Debian/Ubuntu, make sure that in /etc/apache2/envvars you have uncommented the line ". /etc/default/locale", so that Apache runs with the system default locale and not the C (POSIX) locale (which also uses ASCII encoding). The following CGI script should then run without errors in Python 3.2:

      #!/usr/bin/env python3
      import sys
      print('Content-Type: text/html; charset=utf-8')
      print()
      print('<html><body><pre>' + sys.stdout.encoding + '</pre>h€lló wörld</body></html>')
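
The behaviour described in points 2 and 3 can be reproduced without Apache. The following is a minimal sketch, assuming an in-memory io.BytesIO stands in for the byte stream underneath sys.stdout; the helper name encode_via_text_stream is hypothetical and not part of the original answer. It shows that the same string comes out as UTF-8 bytes when the text stream's encoding is UTF-8, and raises UnicodeEncodeError when it is ASCII.

```python
import io


def encode_via_text_stream(text, encoding):
    """Write text through a text-mode wrapper and return the bytes
    that reach the underlying binary buffer (hypothetical helper)."""
    raw = io.BytesIO()  # stands in for sys.stdout.buffer
    # newline="\n" disables newline translation so the bytes are predictable
    stream = io.TextIOWrapper(raw, encoding=encoding, newline="\n")
    stream.write(text)
    stream.flush()
    return raw.getvalue()


# With a UTF-8 text stream, the euro sign is encoded to three bytes.
utf8_bytes = encode_via_text_stream("€\n", "utf-8")
print(utf8_bytes)

# With an ASCII text stream (what a CGI process without locale
# settings ends up with), the same write fails.
ascii_error = None
try:
    encode_via_text_stream("€\n", "ascii")
except UnicodeEncodeError as exc:
    ascii_error = exc
    print(type(exc).__name__, "-", exc.reason)
```

The string is identical in both calls; only the encoding of the stream it is written to differs, which is why fixing the environment (point 4) is enough.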

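If the Apache configuration cannot be changed, the sys.stdout.buffer route mentioned in point 2 is another option: encode the response explicitly and write bytes, so the locale no longer matters. This is a sketch under that assumption rather than something taken from the original answer; build_response is a hypothetical helper.

```python
#!/usr/bin/env python3
import sys


def build_response(body, charset="utf-8"):
    """Return the full CGI response (header, blank line, body) as bytes.
    Hypothetical helper; the header itself only contains ASCII characters."""
    header = "Content-Type: text/html; charset={}\n\n".format(charset)
    return header.encode("ascii") + body.encode(charset)


if __name__ == "__main__":
    page = "<html><body>h€lló wörld</body></html>\n"
    # Bytes go straight to the binary layer, bypassing sys.stdout's encoding.
    sys.stdout.buffer.write(build_response(page))
    sys.stdout.buffer.flush()
```

Because the text layer of sys.stdout is never used, its ASCII default under CGI has no effect on the output.
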
    https://stackoverflow.com/questions/9322410/set-encoding-in-python-3-cgi-scripts

  • Original article: https://www.cnblogs.com/peter1994/p/7655315.html