  • ArangoDB Data Import

    1. The arangoimp method

    Parameter reference

    Global configuration

    • --backslash-escape

    use backslash as the escape character for quotes, used for csv (default: false)

    • --batch-size

    size for individual data batches (in bytes) (default: 16777216)

    • --collection

    collection name (default: "")

    • --configuration

    the configuration file or 'none' (default: "")

    • --convert

    convert the strings 'null', 'false', 'true' and strings containing numbers into non-string types (csv and tsv only) (default: true)

    • --create-collection

    create collection if it does not yet exist (default: false)

    • --create-collection-type

    type of collection if collection is created (edge or document). possible values: "document", "edge" (default: "document")

    • --file

    file name ("-" for STDIN) (default: "")

    • --from-collection-prefix

    _from collection name prefix (will be prepended to all values in '_from') (default: "")

    • --ignore-missing

    ignore missing columns in csv input (default: false)

    • --on-duplicate

    action to perform when a unique key constraint violation occurs. possible values: "error", "ignore", "replace", "update" (default: "error")

    • --overwrite

    overwrite collection if it exists (WARNING: this will remove any data from the collection) (default: false)

    • --progress

    show progress (default: true)

    • --quote

    quote character(s), used for csv (default: """)

    • --remove-attribute <string...>

    remove an attribute before inserting an attribute into a collection (for csv and tsv only) (default: )

    • --separator

    field separator, used for csv and tsv (default: "")

    • --skip-lines

    number of lines to skip for formats (csv and tsv only) (default: 0)

    • --threads

    Number of parallel import threads. Most useful for the rocksdb engine (default: 2)

    • --to-collection-prefix

    _to collection name prefix (will be prepended to all values in '_to') (default: "")

    • --translate <string...>

    translate an attribute name (use as --translate "from=to", for csv and tsv only) (default: )

    • --type

    type of import file. possible values: "auto", "csv", "json", "jsonl", "tsv" (default: "json")

    • --version

    reports the version and exits (default: false)
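The options above can be combined programmatically. The following is a minimal sketch, not part of arangoimp itself: `build_arangoimp_args` is a hypothetical helper, and the file name, collection name, and attribute names are made up for illustration. It shows how the repeatable options `--translate` and `--remove-attribute` expand into one flag per value.

```python
import shlex


def build_arangoimp_args(file, collection, file_type="csv", translate=None,
                         remove_attributes=None, **options):
    """Assemble an arangoimp argument list (hypothetical helper)."""
    args = ["arangoimp", "--file", file, "--type", file_type,
            "--collection", collection]
    # --translate may be given several times: one "from=to" pair each.
    for old, new in (translate or {}).items():
        args += ["--translate", f"{old}={new}"]
    # --remove-attribute may be given several times as well.
    for name in remove_attributes or []:
        args += ["--remove-attribute", name]
    # Remaining keyword options, e.g. on_duplicate="ignore"
    # becomes "--on-duplicate ignore"; booleans become true/false.
    for key, value in options.items():
        flag = "--" + key.replace("_", "-")
        if isinstance(value, bool):
            value = "true" if value else "false"
        args += [flag, str(value)]
    return args


# Illustrative values only: students.csv, "id", and "internal_note"
# are assumptions, not names from the original article.
args = build_arangoimp_args(
    "students.csv", "students",
    translate={"id": "_key"},
    remove_attributes=["internal_note"],
    create_collection=True,
    on_duplicate="ignore",
)
print(shlex.join(args))
```

To actually run the import, the list could be passed to `subprocess.run(args, check=True)` on a machine where arangoimp is installed.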

    Section 'log' (Configure the logging)

    • --log.color

    use colors for TTY logging (default: true)

    • --log.level <string...>

    the global or topic-specific log level (default: "info")

    • --log.output <string...>

    log destination(s) (default: )

    • --log.role

    log server role (default: false)

    • --log.use-local-time

    use local timezone instead of UTC (default: false)

    • --log.use-microtime

    use microtime instead (default: false)

    Section 'server' (Configure a connection to the server)

    • --server.authentication

    require authentication credentials when connecting (does not affect the server-side authentication settings) (default: true)

    • --server.connection-timeout

    connection timeout in seconds (default: 5)

    • --server.database

    database name to use when connecting (default: "_system")

    • --server.endpoint

    endpoint to connect to, use 'none' to start without a server (default: "http+tcp://127.0.0.1:8529")

    • --server.password

    password to use when connecting. If not specified and authentication is required, the user will be prompted for a password (default: "")

    • --server.request-timeout

    request timeout in seconds (default: 1200)

    • --server.username

    username to use when connecting (default: "root")

    Section 'ssl' (Configure SSL communication)

    • --ssl.protocol

    ssl protocol (1 = SSLv2, 2 = SSLv2 or SSLv3 (negotiated), 3 = SSLv3, 4 = TLSv1, 5 = TLSv1.2). possible values: 1, 2, 3, 4, 5 (default: 5)

    Section 'temp' (Configure temporary files)

    • --temp.path

    path for temporary files (default: "")

    Usage examples

    • Importing data into a vertex (document) collection
    arangoimp --server.endpoint tcp://127.0.0.1:8529 --server.username root --server.password *** --server.database _system --file test.csv --type csv --create-collection true --create-collection-type document --overwrite true --collection "test"
    
    • Importing data into an edge collection (each row needs '_from' and '_to' values)
    arangoimp --server.endpoint tcp://127.0.0.1:8529 --server.username root --server.password *** --server.database _system --file test.csv --type csv --create-collection true --create-collection-type edge --overwrite true --collection "test"
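To make the difference between the two kinds of input files concrete, here is a small sketch that generates one CSV for a vertex collection and one for an edge collection. The collection names `persons` and `knows`, the file name `knows.csv`, and the sample rows are assumptions for illustration. The edge rows carry bare keys in '_from' and '_to', so the sketch relies on `--from-collection-prefix` and `--to-collection-prefix` to prepend the collection name.

```python
import csv
import io


def to_csv(fieldnames, rows):
    """Serialize dict rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Vertex documents: "_key" becomes the document key.
persons = to_csv(["_key", "name"], [
    {"_key": "alice", "name": "Alice"},
    {"_key": "bob", "name": "Bob"},
])

# Edge documents: every row needs "_from" and "_to". The values are bare
# keys here, so the import is expected to prepend the collection name
# given by the two prefix options below.
knows = to_csv(["_from", "_to", "since"], [
    {"_from": "alice", "_to": "bob", "since": 2018},
])

# Argument list for the edge import (hypothetical names throughout).
edge_import_args = [
    "arangoimp", "--file", "knows.csv", "--type", "csv",
    "--collection", "knows",
    "--create-collection", "true", "--create-collection-type", "edge",
    "--from-collection-prefix", "persons",
    "--to-collection-prefix", "persons",
]

print(persons)
print(knows)
```

Writing `persons` and `knows` to disk and running the two imports in that order would load the vertices first and the edges that refer to them second.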
    

    The Python method

    Inserting a single document

    from arango import ArangoClient
    
    # Initialize the ArangoDB client.
    client = ArangoClient()
    
    # Connect to "test" database as root user.
    db = client.db('test', username='root', password='passwd')
    
    # Get the API wrapper for "students" collection.
    students = db.collection('students')
    
    # Create some test documents to play around with.
    lola = {'_key': 'lola', 'GPA': 3.5, 'first': 'Lola', 'last': 'Martin'}
    
    # Insert a new document. This returns the document metadata.
    metadata = students.insert(lola)
    

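    Bulk data import

    Each insert call makes a separate request to the database, so inserting a large data set one document at a time wastes network resources. In that case, use transactions.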

    from arango import ArangoClient
    
    # Initialize the ArangoDB client.
    client = ArangoClient()
    
    # Connect to "test" database as root user.
    db = client.db('test', username='root', password='passwd')
    
    # Get the API wrapper for "students" collection.
    students = db.collection('students')
    
    # Begin a transaction via context manager. This returns an instance of
    # TransactionDatabase, a database-level API wrapper tailored specifically
    # for executing transactions. The transaction is automatically committed
    # when exiting the context. The TransactionDatabase wrapper cannot be
    # reused after commit and may be discarded after.
    with db.begin_transaction() as txn_db:
    
        # Child wrappers are also tailored for transactions.
        txn_col = txn_db.collection('students')
    
        # API execution context is always set to "transaction".
        assert txn_db.context == 'transaction'
        assert txn_col.context == 'transaction'
    
        # TransactionJob objects are returned instead of results.
        job1 = txn_col.insert({'_key': 'Abby'})
        job2 = txn_col.insert({'_key': 'John'})
        job3 = txn_col.insert({'_key': 'Mary'})
    
    # Upon exiting context, transaction is automatically committed.
    assert 'Abby' in students
    assert 'John' in students
    assert 'Mary' in students
    
    # Retrieve the status of each transaction job.
    for job in txn_db.queued_jobs():
        # Status is set to either "pending" (transaction is not committed yet
        # and result is not available) or "done" (transaction is committed and
        # result is available).
        assert job.status() in {'pending', 'done'}
    
    # Retrieve the job results.
    metadata = job1.result()
    assert metadata['_id'] == 'students/Abby'
    
    metadata = job2.result()
    assert metadata['_id'] == 'students/John'
    
    metadata = job3.result()
    assert metadata['_id'] == 'students/Mary'
    
    # Transactions can be initiated without using a context manager.
    # If return_result parameter is set to False, no jobs are returned.
    txn_db = db.begin_transaction(return_result=False)
    txn_db.collection('students').insert({'_key': 'Jake'})
    txn_db.collection('students').insert({'_key': 'Jill'})
    
    # The commit must be called explicitly.
    txn_db.commit()
    assert 'Jake' in students
    assert 'Jill' in students
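As an alternative to transactions for reducing round trips, python-arango collections also provide an `import_bulk` method that sends many documents in one request. The sketch below is an assumption-laden illustration rather than the approach used above: `batches` and `import_in_batches` are hypothetical helpers, and the batch size of 1000 is an arbitrary choice.

```python
from itertools import islice


def batches(iterable, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def import_in_batches(collection, documents, batch_size=1000):
    """Send documents in fixed-size batches; return the number of requests."""
    requests = 0
    for batch in batches(documents, batch_size):
        # One request per batch instead of one per document.
        # on_duplicate="ignore" skips documents whose key already exists.
        collection.import_bulk(batch, on_duplicate="ignore")
        requests += 1
    return requests


# Usage against a real server (needs python-arango and a running ArangoDB):
#
#   from arango import ArangoClient
#   db = ArangoClient().db('test', username='root', password='passwd')
#   students = db.collection('students')
#   import_in_batches(students, ({'_key': str(i)} for i in range(10000)))
```

Because the helper accepts any iterable, the documents can come from a generator, so a large input does not have to be held in memory all at once.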
    

    References

    ArangoDB Documentation v3.3

    python-arango documentation

    Reposting is welcome; please credit the source: https://www.cnblogs.com/minglex/p/9705481.html