  • protobuf, protocol-buffers, serializing data, gobs, pickling, string, XML, cPickle (implemented in C) up to 1000 times faster than pickle, protobuf2 vs. protobuf3 differences, data types



    大帆船, Sogou Testing (搜狗测试), 2020-05-22





    syntax = "proto3";

    message Student {
      string name = 1;
      int32 age = 2;
      // true: male, false: female
      bool sex = 3;
    }
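    To see what a generated Student class actually puts on the wire, here is a hand-written sketch of the proto3 binary encoding for this message using only the Python standard library. The helper names (encode_varint, tag, encode_student) are hypothetical and not part of the protobuf API; real code would call SerializeToString() on the protoc-generated class. The sketch shows two proto3 rules: every field is prefixed by a key computed as (field_number << 3) | wire_type, and fields that hold their default value are not written at all.

```python
# Hand-rolled sketch of the proto3 wire format for the Student message.
# Illustration only: real code uses the protoc-generated class.

VARINT, LEN = 0, 2  # the two wire types this message needs


def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, high bit set on all but the last byte."""
    value &= 0xFFFFFFFFFFFFFFFF  # negative int32/int64 values are sign-extended to 64 bits
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def tag(field_number: int, wire_type: int) -> bytes:
    # The key that precedes every field on the wire
    return encode_varint((field_number << 3) | wire_type)


def encode_student(name: str, age: int, sex: bool) -> bytes:
    out = bytearray()
    if name:  # proto3 skips fields that hold their default value
        raw = name.encode("utf-8")
        out += tag(1, LEN) + encode_varint(len(raw)) + raw
    if age:
        out += tag(2, VARINT) + encode_varint(age)
    if sex:  # bool is encoded as varint 1
        out += tag(3, VARINT) + encode_varint(1)
    return bytes(out)


print(encode_student("Tom", 18, True).hex())
```

    For Student(name="Tom", age=18, sex=true) this gives the 9 bytes 0a 03 54 6f 6d 10 12 18 01. The field names never appear on the wire, only the field numbers.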










    message Person {
      int32 id = 1;
      string name = 2;
      string email = 3;
    }


    message Point {
      int32 latitude = 1;
      int32 longitude = 2;
    }

    message Feature {
      string name = 1;
      Point location = 2;
    }
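    Nesting Point inside Feature costs little on the wire. The embedded message is written as a length-delimited field whose payload is simply Point's own encoding. The following standard-library sketch uses hypothetical helper names and arbitrarily chosen sample coordinates. It also exposes a cost of int32: a negative value is sign-extended to 64 bits and always takes 10 bytes, which is why proto3 also offers sint32 (ZigZag encoding) for fields that are often negative.

```python
# Sketch only: hand-encodes Feature/Point to show how an embedded message is laid out.


def encode_varint(value: int) -> bytes:
    value &= 0xFFFFFFFFFFFFFFFF  # negative int32 values are sign-extended to 64 bits
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def varint_field(number: int, value: int) -> bytes:
    return encode_varint((number << 3) | 0) + encode_varint(value)


def bytes_field(number: int, payload: bytes) -> bytes:
    return encode_varint((number << 3) | 2) + encode_varint(len(payload)) + payload


def encode_point(latitude: int, longitude: int) -> bytes:
    # (proto3's skipping of zero-valued fields is left out for brevity)
    return varint_field(1, latitude) + varint_field(2, longitude)


def encode_feature(name: str, point_bytes: bytes) -> bytes:
    # Field 2 (location) is length-delimited; its payload is the Point encoding itself
    return bytes_field(1, name.encode("utf-8")) + bytes_field(2, point_bytes)


point = encode_point(409146138, -746188906)
feature = encode_feature("A", point)
print(len(point), len(feature))
```

    Here Point takes 17 bytes, and 11 of them go to the negative longitude (1 key byte plus a 10-byte varint).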






    import "google/protobuf/timestamp.proto";
    import "google/protobuf/duration.proto";

    message Account {
      string account_id = 1;
      google.protobuf.Timestamp update_at = 2;
      google.protobuf.Duration time_limit = 3;
    }
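    google.protobuf.Timestamp and google.protobuf.Duration are themselves ordinary messages with two fields, int64 seconds and int32 nanos. The sketch below computes those pairs from Python datetime/timedelta values using only the standard library. The function names are hypothetical; in real code the generated classes provide helpers such as Timestamp.FromDatetime() and Duration.FromTimedelta(). It encodes the normalization rules of the two types: a Timestamp's nanos are never negative, while a Duration's seconds and nanos share the same sign. Because datetime only has microsecond resolution, nanos here is always a multiple of 1000.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MICROSECOND = timedelta(microseconds=1)


def to_timestamp_fields(dt: datetime) -> tuple:
    """(seconds, nanos) as stored in google.protobuf.Timestamp; dt must be timezone-aware."""
    total_us = (dt - EPOCH) // ONE_MICROSECOND
    # Floor division keeps nanos in [0, 999_999_999], even for instants before 1970
    seconds, micros = divmod(total_us, 1_000_000)
    return seconds, micros * 1000


def to_duration_fields(td: timedelta) -> tuple:
    """(seconds, nanos) as stored in google.protobuf.Duration; both carry the same sign."""
    total_us = td // ONE_MICROSECOND
    sign = -1 if total_us < 0 else 1
    seconds, micros = divmod(abs(total_us), 1_000_000)
    return sign * seconds, sign * micros * 1000


update_at = to_timestamp_fields(datetime(2020, 5, 22, tzinfo=timezone.utc))
time_limit = to_duration_fields(timedelta(minutes=1, milliseconds=500))
print(update_at, time_limit)
```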






    import "google/protobuf/any.proto";

    message ImageData {
      string index = 1;
      bytes image = 2;
    }

    message Data {
      string appid = 1;
      bytes payload = 2;
      string extra = 3;
    }

    message Request {
      google.protobuf.Any body = 1;
    }


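    import msg_pb2  # protoc-generated module for the messages above

    with open("1.wav", "rb") as f:
        data = msg_pb2.Data(appid="no.1", payload=f.read(), extra="no use")
    req = msg_pb2.Request()
    req.body.Pack(data)  # store the Data message inside the Any field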
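    google.protobuf.Any is also a plain message, with string type_url = 1 and bytes value = 2. Conceptually, Pack() stores the packed message's serialized bytes together with a URL that ends in its full type name, and Unpack() checks that name before parsing. The sketch below models this with a dict. The full name "msg.Data" is an assumption (use the package declared in your own .proto), and the payload bytes are the encoding of Data(appid="no.1"). This sketch returns None on a type mismatch, whereas the generated Any.Unpack() returns False.

```python
# Conceptual model of google.protobuf.Any (type_url = 1, value = 2) as a plain dict.
TYPE_URL_PREFIX = "type.googleapis.com/"


def pack_any(full_name: str, serialized: bytes) -> dict:
    # Like Any.Pack(): record *what* was packed alongside the raw bytes
    return {"type_url": TYPE_URL_PREFIX + full_name, "value": serialized}


def unpack_any(any_msg: dict, expected_full_name: str):
    # Like Any.Unpack(): only hand back the bytes if the type name matches
    type_name = any_msg["type_url"].split("/")[-1]
    if type_name != expected_full_name:
        return None
    return any_msg["value"]


data_bytes = b"\x0a\x04no.1"             # Data(appid="no.1") on the wire
body = pack_any("msg.Data", data_bytes)  # "msg.Data": assumed package + message name
print(body["type_url"])
print(unpack_any(body, "msg.Data") == data_bytes, unpack_any(body, "msg.ImageData"))
```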




    enum PersonType {
      PERSONTYPE_UNSPECIFIED = 0;
      INDIVIDUAL = 1;
      LEGAL = 2;
      AUTHORIZE = 3;
    }

    message Person {
      string real_name = 1;
      PersonType person_type = 2;
    }
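    In proto3 the first enum value must be 0, and an unset enum field reads back as that value. Fields that hold their default are also not written to the wire at all. That is why the enum above starts with an explicit PERSONTYPE_UNSPECIFIED, which keeps "not set" distinct from a real type. Below is a standard-library sketch in which an IntEnum stands in for the generated enum and encode_person() is a hypothetical helper.

```python
from enum import IntEnum


class PersonType(IntEnum):
    # Mirrors the .proto enum; 0 is what an unset field reads back as
    PERSONTYPE_UNSPECIFIED = 0
    INDIVIDUAL = 1
    LEGAL = 2
    AUTHORIZE = 3


def encode_varint(value: int) -> bytes:
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_person(real_name: str = "", person_type: PersonType = PersonType.PERSONTYPE_UNSPECIFIED) -> bytes:
    out = bytearray()
    if real_name:
        raw = real_name.encode("utf-8")
        out += b"\x0a" + encode_varint(len(raw)) + raw     # field 1, wire type 2
    if person_type != PersonType.PERSONTYPE_UNSPECIFIED:
        out += b"\x10" + encode_varint(int(person_type))  # field 2, wire type 0: enums are varints
    return bytes(out)


print(len(encode_person()), encode_person("Li", PersonType.LEGAL).hex())
```

    An all-default Person therefore serializes to zero bytes.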






    message App {
      string appid = 1;
      map<string, string> extra_informations = 2;
    }


    extra_informations = {"name": "app1", "expired": "no"}
    app = App(appid="1234567", extra_informations=extra_informations)
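    A map field is syntactic sugar. On the wire, map<string, string> extra_informations = 2 is encoded the same way as a repeated field of small entry messages with key = 1 and value = 2. The sketch below (hypothetical helper names, standard library only) encodes the dictionary from the example in that form. Protobuf does not guarantee any ordering of map entries on the wire; this sketch simply follows dict order.

```python
def encode_varint(value: int) -> bytes:
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def length_delimited(number: int, payload: bytes) -> bytes:
    return encode_varint((number << 3) | 2) + encode_varint(len(payload)) + payload


def encode_string_map(number: int, mapping: dict) -> bytes:
    out = bytearray()
    for key, value in mapping.items():
        # Each entry is a tiny message: key = 1, value = 2 ...
        entry = length_delimited(1, key.encode("utf-8")) + length_delimited(2, value.encode("utf-8"))
        # ... written as one element of a repeated field with the map's field number
        out += length_delimited(number, entry)
    return bytes(out)


extra_informations = {"name": "app1", "expired": "no"}
print(encode_string_map(2, extra_informations).hex())
```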




    message Audience {
      string name = 1;
      string tier = 2;
    }

    message Account {
      string account_id = 1;
      repeated Audience audience = 2;
    }









    AttributeError: Assignment not allowed to repeated field "name" in protocol message object.


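    account = msg_pb2.Account()
    audience = [{"name": "ASR", "tier": "stand"}, {"name": "TTS", "tier": "free"}, {"name": "MT", "tier": "stand"}]
    for audience1 in audience:
        a = account.audience.add()  # append a new, empty Audience element, then fill it in
        a.name = audience1['name']
        a.tier = audience1['tier']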
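    On the wire, each element appended with add() becomes its own length-delimited record carrying the same field number. Repeated scalar numeric fields, by contrast, are packed by default in proto3 into a single record. Below is a standard-library sketch of both layouts. The helper names are hypothetical, and the packed example uses an invented field number 4 that is not part of the Account message.

```python
def encode_varint(value: int) -> bytes:
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def length_delimited(number: int, payload: bytes) -> bytes:
    return encode_varint((number << 3) | 2) + encode_varint(len(payload)) + payload


def encode_audience(name: str, tier: str) -> bytes:
    return length_delimited(1, name.encode("utf-8")) + length_delimited(2, tier.encode("utf-8"))


def encode_account(account_id: str, audiences: list) -> bytes:
    out = bytearray()
    if account_id:
        out += length_delimited(1, account_id.encode("utf-8"))
    for item in audiences:
        # Repeated message field: one record per element, all tagged with field 2
        out += length_delimited(2, encode_audience(item["name"], item["tier"]))
    return bytes(out)


def encode_packed_varints(number: int, values: list) -> bytes:
    # Packed repeated scalars (proto3 default): one key, one length, then the varints back to back
    return length_delimited(number, b"".join(encode_varint(v) for v in values))


account = encode_account("1", [{"name": "ASR", "tier": "stand"}, {"name": "MT", "tier": "free"}])
print(len(account), encode_packed_varints(4, [1, 2, 3]).hex())
```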


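        In the interfaces we actually test, a message's structure can sometimes be very complex. Some speech-recognition service interfaces, for example, define many different messages and repeated fields in their protocol, which complicates writing test client code and constructing and parsing test cases. Passing parameters on the command line, as we described earlier, clearly cannot meet this need, and assembling messages by hand is also very inconvenient. After some research we found that two functions in the protobuf library's json_format module, Parse and MessageToJson, solve this problem well: they convert between protobuf messages and JSON. Because JSON can be handled in many flexible ways, we can write test cases as JSON and convert them directly into messages with Parse. After receiving a response, we can convert the message back into JSON with MessageToJson. To us testers, both the data we send and the data we receive then look like JSON, which makes preparing test data and checking results much easier.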


    from google.protobuf import json_format

    json_obj = '{"a1": 1, "a2": 2}'
    request = json_format.Parse(json_obj, MessageName())  # MessageName: placeholder for your generated message class
    json_result = json_format.MessageToJson(request)
    print(json_result)
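    Two details of the proto3 JSON mapping matter when writing test cases by hand. First, MessageToJson() emits field names in lowerCamelCase by default (pass preserving_proto_field_name=True to keep the .proto names), while Parse() accepts either form. Second, bytes fields appear in JSON as base64 strings. The sketch below shows both using the standard library. to_json_name() and build_case() are hypothetical helpers, and the resulting string would then be passed to json_format.Parse() together with an instance of your generated Data class.

```python
import base64
import json


def to_json_name(field_name: str) -> str:
    """Default proto3 JSON name: drop underscores and capitalize the letter that follows."""
    head, *rest = field_name.split("_")
    return head + "".join(part[:1].upper() + part[1:] for part in rest)


def build_case(appid: str, payload: bytes, extra: str) -> str:
    """Hypothetical helper: a JSON test case for the Data message defined earlier."""
    case = {
        "appid": appid,
        # bytes fields are represented as base64 strings in proto3 JSON
        "payload": base64.b64encode(payload).decode("ascii"),
        "extra": extra,
    }
    return json.dumps(case)


print(to_json_name("account_id"), to_json_name("extra_informations"))
print(build_case("no.1", b"\x00\x01", "no use"))
# The string would then go to: json_format.Parse(case_json, msg_pb2.Data())
```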




    Protocol Buffer Basics: Go  |  Protocol Buffers  |  Google Developers


    1. python --- protocol-buffers --- golang

    Protocol buffers are a language-neutral, platform-neutral extensible mechanism for serializing structured data.

    Protocol Buffers - Google's data interchange format

    Protocol Buffer Basics: Go  |  Protocol Buffers  |  Google Developers

    Protocol Buffer Basics: Python  |  Protocol Buffers  |  Google Developers


    How do you serialize and retrieve structured data like this? There are a few ways to solve this problem:

    • Use gobs to serialize Go data structures. This is a good solution in a Go-specific environment, but it doesn't work well if you need to share data with applications written for other platforms.  
    • You can invent an ad-hoc way to encode the data items into a single string – such as encoding 4 ints as "12:3:-23:67". This is a simple and flexible approach, although it does require writing one-off encoding and parsing code, and the parsing imposes a small run-time cost. This works best for encoding very simple data.
    • Serialize the data to XML. This approach can be very attractive since XML is (sort of) human readable and there are binding libraries for lots of languages. This can be a good choice if you want to share data with other applications/projects. However, XML is notoriously space intensive, and encoding/decoding it can impose a huge performance penalty on applications. Also, navigating an XML DOM tree is considerably more complicated than navigating simple fields in a class normally would be.
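    The ad-hoc string approach from the second bullet fits in a few lines of Python, which is both its appeal and its limit. The format lives only in this pair of one-off functions (names hypothetical), and nothing checks the field count or the types.

```python
def encode_ints(values) -> str:
    # Ad-hoc format: decimal integers joined with ":"
    return ":".join(str(v) for v in values)


def decode_ints(text: str) -> list:
    # One-off parser: no schema, no versioning, no type checks beyond int()
    return [int(part) for part in text.split(":")]


encoded = encode_ints([12, 3, -23, 67])
print(encoded, decode_ints(encoded))  # prints: 12:3:-23:67 [12, 3, -23, 67]
```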

    Protocol buffers are the flexible, efficient, automated solution to solve exactly this problem. With protocol buffers, you write a .proto description of the data structure you wish to store. From that, the protocol buffer compiler creates a class that implements automatic encoding and parsing of the protocol buffer data with an efficient binary format. The generated class provides getters and setters for the fields that make up a protocol buffer and takes care of the details of reading and writing the protocol buffer as a unit. Importantly, the protocol buffer format supports the idea of extending the format over time in such a way that the code can still read data encoded with the old format.

    1/3. gobs -- golang; pickling -- python


    Package gob manages streams of gobs - binary values exchanged between an Encoder (transmitter) and a Decoder (receiver). A typical use is transporting arguments and results of remote procedure calls (RPCs) such as those provided by package "net/rpc".

    The implementation compiles a custom codec for each data type in the stream and is most efficient when a single Encoder is used to transmit a stream of values, amortizing the cost of compilation.


    Use Python pickling. This is the default approach since it's built into the language, but it doesn't deal well with schema evolution, and also doesn't work very well if you need to share data with applications written in C++ or Java. 

    11.1. pickle — Python object serialization — Python 2.7.16 documentation

    The pickle module implements a fundamental, but powerful algorithm for serializing and de-serializing a Python object structure. “Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream is converted back into an object hierarchy. Pickling (and unpickling) is alternatively known as “serialization”, “marshalling,” [1] or “flattening”, however, to avoid confusion, the terms used here are “pickling” and “unpickling”.
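    Here is a minimal pickling round trip with the standard library. The output is a Python-specific byte stream. The pickle documentation also warns to unpickle only data you trust, since loading a pickle can execute arbitrary code.

```python
import pickle

record = {"name": "Tom", "age": 18, "tags": ["ASR", "TTS"]}

blob = pickle.dumps(record, protocol=pickle.HIGHEST_PROTOCOL)  # object hierarchy -> byte stream
restored = pickle.loads(blob)                                   # byte stream -> object hierarchy

print(type(blob).__name__, restored == record)  # prints: bytes True
```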

    This documentation describes both the pickle module and the cPickle module.


    The cPickle module supports serialization and de-serialization of Python objects, providing an interface and functionality nearly identical to the pickle module. There are several differences, the most important being performance and subclassability.

    First, cPickle can be up to 1000 times faster than pickle because the former is implemented in C. Second, in the cPickle module the callables Pickler() and Unpickler() are functions, not classes. This means that you cannot use them to derive custom pickling and unpickling subclasses. Most applications have no need for this functionality and should benefit from the greatly improved performance of the cPickle module.

    The pickle data stream produced by pickle and cPickle are identical, so it is possible to use pickle and cPickle interchangeably with existing pickles. [10]
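    In Python 3 there is no separate cPickle module, because pickle automatically uses its C accelerator (_pickle) when it is available. The interchangeability described above can still be checked by pitting the C implementation against the pure-Python one, which CPython keeps as the private functions pickle._dumps and pickle._loads. These are undocumented internals, used here only for the demonstration.

```python
import pickle

data = {"id": 1, "scores": [1.5, 2.5], "ok": True}

c_blob = pickle.dumps(data)    # C accelerator (_pickle) when available
py_blob = pickle._dumps(data)  # private pure-Python implementation

# Each implementation reads the other's output
print(pickle._loads(c_blob) == data, pickle.loads(py_blob) == data)
```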

    There are additional minor differences in API between cPickle and pickle, however for most applications, they are interchangeable. More documentation is provided in the pickle module documentation, which includes a list of the documented differences.







    Developer Guide  |  Protocol Buffers  |  Google Developers


    Developer Guide

    Welcome to the developer documentation for protocol buffers – a language-neutral, platform-neutral, extensible way of serializing structured data for use in communications protocols, data storage, and more.

    This documentation is aimed at Java, C++, or Python developers who want to use protocol buffers in their applications. This overview introduces protocol buffers and tells you what you need to do to get started – you can then go on to follow the tutorials or delve deeper into protocol buffer encoding. API reference documentation is also provided for all three languages, as well as language and style guides for writing .proto files.

    What are protocol buffers?

    Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages. You can even update your data structure without breaking deployed programs that are compiled against the "old" format.

    How do they work?

    You specify how you want the information you're serializing to be structured by defining protocol buffer message types in .proto files. Each protocol buffer message is a small logical record of information, containing a series of name-value pairs. Here's a very basic example of a .proto file that defines a message containing information about a person:

    message Person {
      required string name = 1;
      required int32 id = 2;
      optional string email = 3;

      enum PhoneType {
        MOBILE = 0;
        HOME = 1;
        WORK = 2;
      }

      message PhoneNumber {
        required string number = 1;
        optional PhoneType type = 2 [default = HOME];
      }

      repeated PhoneNumber phone = 4;
    }

    As you can see, the message format is simple – each message type has one or more uniquely numbered fields, and each field has a name and a value type, where value types can be numbers (integer or floating-point), booleans, strings, raw bytes, or even (as in the example above) other protocol buffer message types, allowing you to structure your data hierarchically. You can specify optional fields, required fields, and repeated fields. You can find more information about writing .proto files in the Protocol Buffer Language Guide.

    Once you've defined your messages, you run the protocol buffer compiler for your application's language on your .proto file to generate data access classes. These provide simple accessors for each field (like name() and set_name()) as well as methods to serialize/parse the whole structure to/from raw bytes – so, for instance, if your chosen language is C++, running the compiler on the above example will generate a class called Person. You can then use this class in your application to populate, serialize, and retrieve Person protocol buffer messages. You might then write some code like this:

    Person person;
    person.set_name("John Doe");
    person.set_id(1234);
    person.set_email("jdoe@example.com");
    fstream output("myfile", ios::out | ios::binary);
    person.SerializeToOstream(&output);

    Then, later on, you could read your message back in:

    fstream input("myfile", ios::in | ios::binary);
    Person person;
    person.ParseFromIstream(&input);
    cout << "Name: " << person.name() << endl;
    cout << "E-mail: " << person.email() << endl;

    You can add new fields to your message formats without breaking backwards-compatibility; old binaries simply ignore the new field when parsing. So if you have a communications protocol that uses protocol buffers as its data format, you can extend your protocol without having to worry about breaking existing code.
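    Old binaries can ignore new fields because every field on the wire carries its wire type, so a reader knows how many bytes to skip even for a field number it has never seen. Below is a standard-library sketch of such a tolerant reader. The function names are hypothetical, and groups (wire types 3 and 4) are not handled.

```python
def decode_varint(buf: bytes, pos: int):
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode(buf: bytes, known_fields: set) -> dict:
    """Returns {field_number: raw value} for known fields and silently skips the rest."""
    fields, pos = {}, 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:    # varint
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:  # length-delimited: the length says how far to skip
            length, pos = decode_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 1:  # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 5:  # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if number in known_fields:
            fields[number] = value
    return fields


# Written by a newer program: name = 1, id = 2, plus a field 5 the old reader doesn't know
new_message = b"\x0a\x08John Doe\x10\xd2\x09\x2a\x03new"
print(decode(new_message, known_fields={1, 2}))  # prints: {1: b'John Doe', 2: 1234}
```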

    You'll find a complete reference for using generated protocol buffer code in the API Reference section, and you can find out more about how protocol buffer messages are encoded in Protocol Buffer Encoding.

    Why not just use XML?

    Protocol buffers have many advantages over XML for serializing structured data. Protocol buffers:

    • are simpler
    • are 3 to 10 times smaller
    • are 20 to 100 times faster
    • are less ambiguous
    • generate data access classes that are easier to use programmatically

    For example, let's say you want to model a person with a name and an email. In XML, you need to do:

      <person>
        <name>John Doe</name>
        <email>jdoe@example.com</email>
      </person>

    while the corresponding protocol buffer message (in protocol buffer text format) is:

    # Textual representation of a protocol buffer.
    # This is *not* the binary format used on the wire.
    person {
      name: "John Doe"
      email: "jdoe@example.com"
    }

    When this message is encoded to the protocol buffer binary format (the text format above is just a convenient human-readable representation for debugging and editing), it would probably be 28 bytes long and take around 100-200 nanoseconds to parse. The XML version is at least 69 bytes if you remove whitespace, and would take around 5,000-10,000 nanoseconds to parse.
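    The 28-byte and 69-byte figures can be reproduced with a short standard-library sketch. It hand-encodes the message, assuming the field numbers from the Person definition above (name = 1, email = 3), and measures the whitespace-free XML.

```python
def encode_varint(value: int) -> bytes:
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def length_delimited(number: int, payload: bytes) -> bytes:
    # key byte + length varint + raw bytes
    return encode_varint((number << 3) | 2) + encode_varint(len(payload)) + payload


person_pb = length_delimited(1, b"John Doe") + length_delimited(3, b"jdoe@example.com")
person_xml = b"<person><name>John Doe</name><email>jdoe@example.com</email></person>"
print(len(person_pb), len(person_xml))  # prints: 28 69
```

    Each string costs only 2 bytes of overhead in the binary form (1 key byte and 1 length byte), while the XML repeats every element name in both its opening and closing tag.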

    Also, manipulating a protocol buffer is much easier:

      cout << "Name: " << person.name() << endl;
      cout << "E-mail: " << person.email() << endl;

    Whereas with XML you would have to do something like:

      cout << "Name: "
           << person.getElementsByTagName("name")->item(0)->innerText()
           << endl;
      cout << "E-mail: "
           << person.getElementsByTagName("email")->item(0)->innerText()
           << endl;

    However, protocol buffers are not always a better solution than XML – for instance, protocol buffers would not be a good way to model a text-based document with markup (e.g. HTML), since you cannot easily interleave structure with text. In addition, XML is human-readable and human-editable; protocol buffers, at least in their native format, are not. XML is also – to some extent – self-describing. A protocol buffer is only meaningful if you have the message definition (the .proto file).


    binary format <--- the text format


    Protocol Buffers source code is hosted on GitHub: https://github.com/protocolbuffers/protobuf.

    Our old Google Code repository is: https://code.google.com/p/protobuf/. We moved to GitHub on Aug 26, 2014 and no future changes will be made on the Google Code site. For latest code updates/issues, please visit our GitHub site.

    Compiling Your Protocol Buffers


    protoc -I=$SRC_DIR --python_out=$DST_DIR $SRC_DIR/addressbook.proto



  • Original post: https://www.cnblogs.com/rsapaper/p/10853151.html