zoukankan      html  css  js  c++  java
  • data persistence of Python

    data persistence

    https://docs.python.org/3.7/library/persistence.html

    支持python内存中的数据以持久化的形式存储在磁盘中。

    同时支持从磁盘中将数据恢复到内存中。

    The modules described in this chapter support storing Python data in a persistent form on disk.

    The pickle and marshal modules can turn many Python data types into a stream of bytes and then recreate the objects from the bytes.

    The various DBM-related modules support a family of hash-based file formats that store a mapping of strings to other strings.

    序列化工具 -- pickle

    https://docs.python.org/3.7/library/pickle.html

    将内存中的python对象转换成二进制的字节码,

    或者将二进制的字节码恢复为内存中的对象。

    The pickle module implements binary protocols for serializing and de-serializing a Python object structure.

    “Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream (from a binary file or bytes-like object) is converted back into an object hierarchy.

    Pickling (and unpickling) is alternatively known as “serialization”, “marshalling,” 1 or “flattening”; however, to avoid confusion, the terms used here are “pickling” and “unpickling”.

    保存

    import pickle
    
    # An arbitrary collection of objects supported by pickle.
    data = {
        'a': [1, 2.0, 3, 4+6j],
        'b': ("character string", b"byte string"),
        'c': {None, True, False}
    }
    
    with open('data.pickle', 'wb') as f:
        # Pickle the 'data' dictionary using the highest protocol available.
        pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)

    恢复

    import pickle
    
    with open('data.pickle', 'rb') as f:
        # The protocol version used is detected automatically, so we do not
        # have to specify it.
        data = pickle.load(f)

    应用层工具 --- shelve

    https://docs.python.org/3.7/library/shelve.html

    pickle主要是提供序列化和反序列化方法,

    至于二进制字节码需要保存到磁盘中,还需要开发者自己编码解决。

    shelve提供直接的接口,应用只需要关注数据的变更。

    于dbm不同,其可以存储任意类型的python对象, 底层使用pickle进行序列化。

    A “shelf” is a persistent, dictionary-like object.

    The difference with “dbm” databases is that the values (not the keys!) in a shelf can be essentially arbitrary Python objects — anything that the pickle module can handle.

    This includes most class instances, recursive data types, and objects containing lots of shared sub-objects. The keys are ordinary strings.

    import shelve
    
    d = shelve.open(filename)  # open -- file may get suffix added by low-level
                               # library
    
    d[key] = data              # store data at key (overwrites old data if
                               # using an existing key)
    data = d[key]              # retrieve a COPY of data at key (raise KeyError
                               # if no such key)
    del d[key]                 # delete data stored at key (raises KeyError
                               # if no such key)
    
    flag = key in d            # true if the key exists
    klist = list(d.keys())     # a list of all existing keys (slow!)
    
    # as d was opened WITHOUT writeback=True, beware:
    d['xx'] = [0, 1, 2]        # this works as expected, but...
    d['xx'].append(3)          # *this doesn't!* -- d['xx'] is STILL [0, 1, 2]!
    
    # having opened d without writeback=True, you need to code carefully:
    temp = d['xx']             # extracts the copy
    temp.append(5)             # mutates the copy
    d['xx'] = temp             # stores the copy right back, to persist it
    
    # or, d=shelve.open(filename,writeback=True) would let you just code
    # d['xx'].append(5) and have it work as expected, BUT it would also
    # consume more memory and make the d.close() operation slower.
    
    d.close()                  # close it

    底层存储工具 -- dbm

    https://docs.python.org/3.7/library/dbm.html

    dbm is a generic interface to variants of the DBM database — dbm.gnu or dbm.ndbm. If none of these modules is installed, the slow-but-simple implementation in module dbm.dumb will be used. There is a third party interface to the Oracle Berkeley DB.

    https://en.wikipedia.org/wiki/DBM_(computing)

    键值对数据库, 早期的NoSQL数据库, 具有查询速度快的优点。

    因为其使用key的hash值作为索引。

    同时也导致了更新速度慢的缺点。

    In computing, a DBM is a library and file format providing fast, single-keyed access to data. A key-value database from the original Unix, dbm is an early example of a NoSQL system.[1][2][3]

    The original dbm library and file format was a simple database engine, originally written by Ken Thompson and released by AT&T in 1979. The name is a three letter acronym for DataBase Manager, and can also refer to the family of database engines with APIs and features derived from the original dbm.

    https://pymotw.com/3/dbm/index.html

    大概关系如下

    shelve --> pickle --> dbm --> dbm database

    dbm is a front-end for DBM-style databases that use simple string values as keys to access records containing strings. It uses whichdb() to identify databases, then opens them with the appropriate module. It is used as a back-end for shelve, which stores objects in a DBM database using pickle.

    import dbm
    
    # Open database, creating it if necessary.
    with dbm.open('cache', 'c') as db:
    
        # Record some values
        db[b'hello'] = b'there'
        db['www.python.org'] = 'Python Website'
        db['www.cnn.com'] = 'Cable News Network'
    
        # Note that the keys are considered bytes now.
        assert db[b'www.python.org'] == b'Python Website'
        # Notice how the value is now in bytes.
        assert db['www.cnn.com'] == b'Cable News Network'
    
        # Often-used methods of the dict interface work too.
        print(db.get('python.org', b'not present'))
    
        # Storing a non-string key or value will raise an exception (most
        # likely a TypeError).
        db['www.yahoo.com'] = 4
    
    # db is automatically closed when leaving the with statement.

    关系数据库工具 -- sqlite3

    https://docs.python.org/3.7/library/sqlite3.html

    轻量级磁盘数据库,不需要独立的server。

    可以使用sql语言。

    SQLite is a C library that provides a lightweight disk-based database that doesn’t require a separate server process and allows accessing the database using a nonstandard variant of the SQL query language. Some applications can use SQLite for internal data storage. It’s also possible to prototype an application using SQLite and then port the code to a larger database such as PostgreSQL or Oracle.

    The sqlite3 module was written by Gerhard Häring. It provides a SQL interface compliant with the DB-API 2.0 specification described by PEP 249.

    import sqlite3
    
    persons = [
        ("Hugo", "Boss"),
        ("Calvin", "Klein")
        ]
    
    con = sqlite3.connect(":memory:")
    
    # Create the table
    con.execute("create table person(firstname, lastname)")
    
    # Fill the table
    con.executemany("insert into person(firstname, lastname) values (?, ?)", persons)
    
    # Print the table contents
    for row in con.execute("select firstname, lastname from person"):
        print(row)
    
    print("I just deleted", con.execute("delete from person").rowcount, "rows")
    
    # close is not a shortcut method and it's not called automatically,
    # so the connection object should be closed manually
    con.close()

    pyc专用代码编译缓存工具 --- marshal

    其生成的字节码具有机器架构相关性。

    专门用于pyc文件缓存。

    不能用作rpc交换数据, pickle的字节码是可以进行机器交换数据的。

    This module contains functions that can read and write Python values in a binary format. The format is specific to Python, but independent of machine architecture issues (e.g., you can write a Python value to a file on a PC, transport the file to a Sun, and read it back there). Details of the format are undocumented on purpose; it may change between Python versions (although it rarely does). 1

    This is not a general “persistence” module. For general persistence and transfer of Python objects through RPC calls, see the modules pickle and shelve. The marshal module exists mainly to support reading and writing the “pseudo-compiled” code for Python modules of .pyc files. Therefore, the Python maintainers reserve the right to modify the marshal format in backward incompatible ways should the need arise. If you’re serializing and de-serializing Python objects, use the pickle module instead – the performance is comparable, version independence is guaranteed, and pickle supports a substantially wider range of objects than marshal.

  • 相关阅读:
    MVC总结
    Python在Linux | Windows中输出带颜色的文字的方法
    flushdns
    linux配置java环境变量(详细)
    ELK
    sed 时间段
    如何让root用户能直接进行ssh登录?
    rsync有两种常用的认证方式,另外一种则是ssh。
    windows rsync server
    awk
  • 原文地址:https://www.cnblogs.com/lightsong/p/14006164.html
Copyright © 2011-2022 走看看