  • Python Series, Getting Started: HDFS

    Introduction

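    HDFS (Hadoop Distributed File System) is Hadoop's distributed file system. It is highly fault-tolerant and
    well suited to deployment on inexpensive machines. Python offers two ways to talk to it: hdfscli (REST API
    calls) and pyhdfs (RPC calls). This section covers hdfscli.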
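
    Since hdfscli works through REST calls, it can help to see roughly what such a request URL looks like.
    The sketch below assumes the standard WebHDFS layout (`/webhdfs/v1<path>?op=...`); `webhdfs_url` is a
    hypothetical helper written for illustration and is not part of the hdfs package, which builds these
    URLs internally.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(base_url, hdfs_path, op, **params):
    """Build a WebHDFS-style URL: <base>/webhdfs/v1<path>?op=<OP>&<params>.

    Hypothetical helper for illustration only.
    """
    query = urlencode({"op": op, **params})
    return f"{base_url.rstrip('/')}/webhdfs/v1{quote(hdfs_path)}?{query}"


# List the contents of /test as user "hdfs" (host and user match the examples in this post).
print(webhdfs_url("http://localhost:50070", "/test", "LISTSTATUS", **{"user.name": "hdfs"}))
```

    With a running cluster, opening such a URL in a browser or with curl should return the same JSON that
    the client library parses.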

    Code examples

    1. Installation
      pip install hdfs
      
    2. Import the required modules
      from hdfs import Client, InsecureClient
      
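    3. Create a client
      """
      Two kinds of client are used here: Client and InsecureClient.
      Client: cannot specify the file owner.
      InsecureClient: can specify the file owner through `user` (default None,
            which falls back to the current local user).
      """
      # NameNode web address (50070 is the Hadoop 2.x default port)
      hdfs_root_path = 'http://localhost:50070'
      # Pick one of the two; the second assignment replaces the first.
      fs = Client(hdfs_root_path)
      fs = InsecureClient(hdfs_root_path, user='hdfs')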
      
    4. Create a directory
      """
      Set the permission of the new directory to 777 (an octal string);
      the default is None.
      """
      fs.makedirs('/test', permission='777')
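
      The permission value is read as octal digits, so it is easy to confuse with a decimal integer.
      A minimal sketch of turning a Python mode into that digit string follows; `octal_permission` is a
      hypothetical helper, not part of the hdfs package.

```python
import stat


def octal_permission(mode):
    """Return the permission bits of an integer mode as an octal digit string."""
    return format(stat.S_IMODE(mode), "o")


print(octal_permission(0o777))  # 777
print(octal_permission(0o644))  # 644
print(octal_permission(777))    # 1411 (decimal 777 is not the same as octal 777)
```

      The result can then be passed along, for example `fs.makedirs('/test', permission=octal_permission(0o755))`.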
      
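    5. Write a file
      """
      Whether to create or append depends on whether the file already exists.
      strict: if `False`, return `None` rather than raise an exception if
            the path doesn't exist.
      """
      hdfs_file_path = '/test/test.txt'
      data = 'hello hdfs\n'  # sample payload, for illustration only
      content = fs.content(hdfs_file_path, strict=False)
      if content is None:
          # File is missing: create it and set its permission.
          fs.write(hdfs_file_path, data=data, permission='777', encoding='utf-8')
      else:
          # File exists: append to it.
          fs.write(hdfs_file_path, data=data, append=True, encoding='utf-8')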
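
      The create-or-append branch above can be wrapped in a small function so it is not repeated at every
      call site. This is only a sketch: `write_or_append` is a hypothetical name, and `client` is assumed
      to be an hdfs client object exposing the same content() and write() calls used in this step.

```python
def write_or_append(client, hdfs_path, data, permission=None):
    """Create hdfs_path with data, or append to it if it already exists.

    `client` is assumed to offer content(path, strict=False) and
    write(path, data=..., ...), as the hdfs clients in this post do.
    """
    if client.content(hdfs_path, strict=False) is None:
        # Path is missing: create the file, optionally with a permission.
        client.write(hdfs_path, data=data, permission=permission)
        return "created"
    # Path exists: append only, without touching file properties.
    client.write(hdfs_path, data=data, append=True)
    return "appended"
```

      Usage would look like `write_or_append(fs, '/test/test.txt', 'one more line\n', permission='777')`.
      Note that the check and the write are two separate requests, so another writer could create the
      file in between.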
      
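    6. Upload a file
      """
      overwrite defaults to False. Unless it is set to True, uploading a file
      that already exists in HDFS raises a "file already exists" error.
      """
      hdfs_path = '/test/test.txt'   # destination in HDFS (illustrative)
      local_path = 'test.txt'        # local file to upload (illustrative)
      fs.upload(hdfs_path, local_path, overwrite=True)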
      
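    7. Summary
      I have not yet found a dedicated method for checking whether a file exists, so the examples here use
      fs.content() as a stand-in. If you know a better way, please share it with me.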
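
      One possible answer to the question above: the client's status() method also accepts strict=False
      and then returns None for a missing path, which allows an existence check without computing a content
      summary. The sketch below assumes that behavior; `path_exists` and `is_directory` are hypothetical
      helper names, not functions from the hdfs package.

```python
def path_exists(client, hdfs_path):
    """Return True if hdfs_path exists.

    Assumes client.status(path, strict=False) returns None for a missing path.
    """
    return client.status(hdfs_path, strict=False) is not None


def is_directory(client, hdfs_path):
    """Return True if hdfs_path exists and its status reports type DIRECTORY."""
    info = client.status(hdfs_path, strict=False)
    return info is not None and info.get("type") == "DIRECTORY"
```

      With the client from step 3 this would be called as `path_exists(fs, '/test/test.txt')`.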
  • Original article: https://www.cnblogs.com/dzqk/p/8328510.html