  • CherryProxy a filtering HTTP proxy extensible in Python

    CherryProxy - a filtering HTTP proxy extensible in Python | Decalage

    CherryProxy is a simple HTTP proxy written in Python 2.x, based on the CherryPy WSGI server and httplib, extensible for content analysis and filtering.

    It has not been designed for operational use and the current version lacks some HTTP features (such as HTTPS support), so some websites will not display properly. However, it should be very useful for testing / demo / prototyping / educational purposes.

    Why a new proxy

    There are already quite a few HTTP proxies developed in Python, as shown by the excellent list maintained by xhaus. I needed a simple proxy with filtering features, so I looked at several of them. They were either too simple and lacking features, or too complex to extend, so I decided to develop a different one.

    I chose the CherryPy WSGI server because it provides a robust, thread-pooled HTTP server with good HTTP 1.1 support.

    News:

    • 2011-11-22: moved CherryProxy to its own bitbucket project
    • 2011-11-15 v0.12: added parent proxy support

    Download:

    Get the zip archive from here, or use Mercurial to get the latest source code from here.

    Install

    On Windows, double-click on install.bat. On other systems, run "python setup.py install" from a shell.

    License:

    Open-source, BSD-style

    Usage as a tool (simple proxy):

    1) Run CherryProxy.py [options]

    Options:
      -h, --help            show this help message and exit
      -p PORT, --port=PORT  port for HTTP proxy, 8070 by default
      -a ADDRESS, --address=ADDRESS
                            IP address of interface for HTTP proxy (0.0.0.0 for
                            all, default=localhost)
      -f PROXY, --forward=PROXY
                            Forward requests to parent proxy, specified as
                            hostname[:port] or IP address[:port]
      -v, --verbose

    2) Set up your browser to use localhost:8070 as proxy
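
    Step 2 is not limited to browsers: any HTTP client that accepts a proxy setting can be pointed at CherryProxy. The following is a minimal sketch, assuming a Python 3 client and a proxy listening on the default localhost:8070. The helper names make_proxy_opener and fetch_via_proxy are illustrative and not part of CherryProxy.

```python
import urllib.request

PROXY_URL = "http://localhost:8070"  # CherryProxy default address and port


def make_proxy_opener(proxy_url=PROXY_URL):
    """Build a urllib opener that sends plain-HTTP requests through the proxy."""
    # Only 'http' is mapped, since this version of the proxy lacks HTTPS support.
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_via_proxy(url, proxy_url=PROXY_URL, timeout=10):
    """Fetch url through the proxy and return (status code, body bytes)."""
    opener = make_proxy_opener(proxy_url)
    response = opener.open(url, timeout=timeout)
    try:
        return response.getcode(), response.read()
    finally:
        response.close()
```

    With the proxy running, fetch_via_proxy("http://example.com/") should go through it, and with -v the request should show up in the proxy's output.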

    Usage in a Python Application:

    - import cherryproxy
    - create a subclass of cherryproxy.CherryProxy
    - implement methods filter_request and/or filter_response to enable filtering as
      needed.
    - see provided examples

    Filtering API:

    CherryProxy: class implementing a filtering HTTP proxy

    To use it, create a class inheriting from CherryProxy and implement the
    methods filter_request and filter_response as desired.
    Then call the start method to start the proxy.

    Note: the logging module needs to be initialized before creating a
    CherryProxy object.

    See the example scripts for more information.
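
    The steps above (initialize logging, subclass, call start) can be sketched as follows. This is only an illustration, not one of the provided example scripts. The names RequestLogMixin, LoggingProxy and the "proxydemo" logger are made up here. The hook is written as a mixin so it can be exercised without CherryProxy installed; putting the same method directly in a CherryProxy subclass works the same way.

```python
import logging


class RequestLogMixin(object):
    """Logs every request; meant to be combined with cherryproxy.CherryProxy."""

    def filter_request(self):
        # self.req is provided by CherryProxy when the hook is called.
        logging.getLogger("proxydemo").info(
            "%s %s", self.req.method, self.req.full_url)


def main():
    # Imported lazily: cherryproxy targets Python 2.x and may not be installed
    # where the mixin above is merely read or tested.
    import cherryproxy

    # The logging module must be initialized before the proxy object is created.
    logging.basicConfig(level=logging.INFO)

    class LoggingProxy(RequestLogMixin, cherryproxy.CherryProxy):
        pass

    proxy = LoggingProxy(address="localhost", port=8070)
    proxy.start()
```

    Calling main() starts the proxy on localhost:8070 and logs one line per request.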

    __init__(self, address='localhost', port=8070, server_name='CherryProxy/0.12', debug=False, log_level=20, options=None, parent_proxy=None)
    CherryProxy constructor

    address: IP address of interface to listen to, or 0.0.0.0 for all
             (localhost by default)
    port: TCP port for the proxy (8070 by default)
    server_name: server name used in HTTP responses
    debug: enable debugging messages if set to True
    log_level: logging level (use constants from logging module; the default
               20 is logging.INFO)
    options: None or optparse.OptionParser object to provide additional options
    parent_proxy: parent proxy, either IP address or hostname, with optional
        port (example: 'myproxy.local:8080')
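
    As a sketch of how these parameters fit together, the settings below use only keyword names from the signature above. The values, the PROXY_SETTINGS name and the run_proxy helper are illustrative assumptions.

```python
import logging

# Keyword arguments matching the documented constructor signature.
PROXY_SETTINGS = {
    "address": "0.0.0.0",                  # all interfaces; trusted networks only
    "port": 8070,                          # the documented default port
    "log_level": logging.DEBUG,            # numeric 10; the default 20 is logging.INFO
    "parent_proxy": "myproxy.local:8080",  # forward requests to an upstream proxy
}


def run_proxy(settings=PROXY_SETTINGS):
    """Create a plain CherryProxy with the given settings and start it."""
    import cherryproxy  # Python 2.x package, imported lazily

    # Logging has to be initialized before the proxy object is created.
    logging.basicConfig(level=settings["log_level"])
    cherryproxy.CherryProxy(**settings).start()
```
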

    filter_request(self)
    Method to be overridden:

    Called to analyse/filter/modify the request received from the client,
    after reading the full request with its body if there is one,
    before it is sent to the server.

    This method may call set_response() if the request needs to be blocked
    before being sent to the server.

    The following attributes can be read and MODIFIED:
        self.req.data: data sent with the request (POST or PUT)
        (and also all listed in filter_request_headers)
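
    For illustration, a body filter built on filter_request might look like the sketch below. The keyword list, find_blocked_word and BodyKeywordFilterMixin are hypothetical names, and the body is assumed to be a text string (a Python 2 str). The mixin is meant to be combined with cherryproxy.CherryProxy.

```python
BLOCKED_WORDS = ("secret", "confidential")  # hypothetical keywords


def find_blocked_word(data, words=BLOCKED_WORDS):
    """Return the first blocked word found in data (case-insensitive), or None."""
    if not data:
        return None
    lowered = data.lower()
    for word in words:
        if word in lowered:
            return word
    return None


class BodyKeywordFilterMixin(object):
    """Blocks POST/PUT requests whose body contains a blocked keyword."""

    def filter_request(self):
        # Only requests that carry a body are inspected.
        if self.req.method not in ("POST", "PUT"):
            return
        word = find_blocked_word(self.req.data)
        if word is not None:
            # The request is answered with 403 instead of reaching the server.
            self.set_response_forbidden(reason="Blocked keyword: %s" % word)
```
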

    filter_request_headers(self)
    Method to be overridden:

    Called to analyse/filter/modify the request received from the client,
    before reading the full request with its body if there is one,
    before it is sent to the server.

    This method may call set_response() if the request needs to be blocked
    before being sent to the server.

    The following attributes can be read and MODIFIED:
        self.req.headers: dictionary of HTTP headers, with lowercase names
        self.req.method: HTTP method, e.g. 'GET', 'POST', etc
        self.req.scheme: protocol from URL, e.g. 'http' or 'https'
        self.req.netloc: IP address or hostname of server, with optional
                         port, for example 'www.google.com' or '1.2.3.4:8000'
        self.req.path: path in URL, for example '/folder/index.html'
        self.req.query: query string, found after question mark in URL

    The following attributes can be READ only:
        self.req.environ: dictionary of request attributes following WSGI
                          format (PEP 333)
        self.req.url: partial URL containing 'path?query'
        self.req.full_url: full URL containing 'scheme://netloc/path?query'
        self.req.length: length of request data in bytes, 0 if none
        self.req.content_type: content-type, for example 'text/html'
        self.req.charset: charset, for example 'UTF-8'
        self.req.url_filename: filename extracted from URL path
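
    A header-level filter can use these attributes to decide before the request body is read. The sketch below blocks a hypothetical list of hosts and strips the Referer header from everything else. BLOCKED_HOSTS, host_of and HostBlocklistMixin are illustrative names, and the mixin is meant to be combined with cherryproxy.CherryProxy.

```python
BLOCKED_HOSTS = frozenset(["ads.example.com", "tracker.example.net"])  # hypothetical


def host_of(netloc):
    """Drop an optional ':port' suffix and lowercase the host name.

    IPv6 literals are not handled in this sketch.
    """
    return netloc.split(":", 1)[0].lower()


class HostBlocklistMixin(object):
    """Blocks listed hosts and removes the Referer header from other requests."""

    def filter_request_headers(self):
        if host_of(self.req.netloc) in BLOCKED_HOSTS:
            # Answer with 403 before the body is read or the server is contacted.
            self.set_response_forbidden(reason="Host blocked by proxy policy")
            return
        # Header names are stored in lowercase in self.req.headers.
        self.req.headers.pop("referer", None)
```
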

    filter_response(self)
    Method to be overridden:

    Called to analyse/filter/modify the response received from the server,
    after reading the full response with its body if there is one,
    before it is sent back to the client.

    This method may call set_response() if the response needs to be blocked
    (e.g. replaced by a simple response) before being sent to the client.
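
    The text above does not list the response attributes, so the following sketch rests on an assumption: that the response body is available as self.resp.data, mirroring self.req.data. Check the provided example scripts before relying on it. The names looks_like_windows_exe and ExeBlockMixin are hypothetical.

```python
def looks_like_windows_exe(data):
    """Return True if data starts with the 'MZ' magic bytes of a Windows executable."""
    if not data:
        return False
    # Accept both text and byte strings so the check works on Python 2 and 3.
    return data[:2] in ("MZ", b"MZ")


class ExeBlockMixin(object):
    """Replaces responses that look like Windows executables with a 403."""

    def filter_response(self):
        # ASSUMPTION: the response object is self.resp and its body is .data;
        # neither name is confirmed by the documentation above.
        resp = getattr(self, "resp", None)
        body = getattr(resp, "data", None)
        if looks_like_windows_exe(body):
            self.set_response_forbidden(reason="Executable download blocked")
```
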

    filter_response_headers(self)
    Method to be overridden:

    Called to analyse/filter/modify the response received from the server,
    before reading the full response with its body if there is one,
    before it is sent back to the client.

    This method may call set_response() if the response needs to be blocked
    (e.g. replaced by a simple response) before being sent to the client.

    set_response(self, status, reason=None, data=None, content_type='text/plain')
    Set an HTTP response to be sent to the client instead of the one from
    the server.

    - status: int, HTTP status code (see RFC 2616)
    - reason: str, optional text for the response line, standard text by default
    - data: str, optional body for the response, default="status reason"
    - content_type: str, content-type corresponding to data
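
    set_response() is not limited to error pages. As a sketch, the filter below lets the proxy answer one path by itself, so the request is never forwarded. STATUS_PATH and LocalStatusPageMixin are hypothetical names, and the mixin is meant to be combined with cherryproxy.CherryProxy.

```python
STATUS_PATH = "/proxy-status"  # hypothetical path answered by the proxy itself


class LocalStatusPageMixin(object):
    """Serves a small HTML page for STATUS_PATH instead of contacting the server."""

    def filter_request_headers(self):
        if self.req.path == STATUS_PATH:
            self.set_response(
                200,
                reason="OK",
                data="<html><body>Proxy is running</body></html>",
                content_type="text/html",
            )
```
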

    set_response_forbidden(self, status=403, reason='Forbidden', data=None, content_type='text/plain')
    Set an HTTP 403 Forbidden response to be sent to the client instead of
    the one from the server.

    - status: int, HTTP status code (see RFC 2616)
    - reason: str, optional text for the response line, standard text by default
    - data: str, optional body for the response, default="status reason"
    - content_type: str, content-type corresponding to data

    start(self)
    Start the proxy server

    stop(self)
    Stop the proxy server
  • Original article: https://www.cnblogs.com/lexus/p/2476018.html