  • Hadoop Source Code Reading: HDFS, Day 2

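    Yesterday I got as far as AbstractFileSystem and learned that applications access files through the FileContext class. Today I will read the source code of that class, starting with its very long class-level Javadoc comment.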

    /**
     * The FileContext class provides an interface to the application writer for
     * using the Hadoop file system.
     * It provides a set of methods for the usual operation: create, open,
     * list, etc
     *
     * <p>
     * <b> *** Path Names *** </b>
     * <p>
     *
     * The Hadoop file system supports a URI name space and URI names.
     * It offers a forest of file systems that can be referenced using fully
     * qualified URIs.
     * Two common Hadoop file systems implementations are
     * <ul>
     * <li> the local file system: file:///path
     * <li> the hdfs file system hdfs://nnAddress:nnPort/path
     * </ul>
     *
     * While URI names are very flexible, it requires knowing the name or address
     * of the server. For convenience one often wants to access the default system
     * in one's environment without knowing its name/address. This has an
     * additional benefit that it allows one to change one's default fs
     *  (e.g. admin moves application from cluster1 to cluster2).
     * <p>
     *
     * To facilitate this, Hadoop supports a notion of a default file system.
     * The user can set his default file system, although this is
     * typically set up for you in your environment via your default config.
     * A default file system implies a default scheme and authority; slash-relative
     * names (such as /for/bar) are resolved relative to that default FS.
     * Similarly a user can also have working-directory-relative names (i.e. names
     * not starting with a slash). While the working directory is generally in the
     * same default FS, the wd can be in a different FS.
     * <p>
     *  Hence Hadoop path names can be one of:
     *  <ul>
     *  <li> fully qualified URI: scheme://authority/path
     *  <li> slash relative names: /path relative to the default file system
     *  <li> wd-relative names: path  relative to the working dir
     *  </ul>
     *  Relative paths with scheme (scheme:foo/bar) are illegal.
     *
     *  <p>
     *  <b>****The Role of the FileContext and configuration defaults****</b>
     *  <p>
     *  The FileContext provides file namespace context for resolving file names;
     *  it also contains the umask for permissions, In that sense it is like the
     *  per-process file-related state in Unix system.
     *  These two properties
     *  <ul>
     *  <li> default file system i.e your slash)
     *  <li> umask
     *  </ul>
     *  in general, are obtained from the default configuration file
     *  in your environment,  (@see {@link Configuration}).
     *
     *  No other configuration parameters are obtained from the default config as
     *  far as the file context layer is concerned. All file system instances
     *  (i.e. deployments of file systems) have default properties; we call these
     *  server side (SS) defaults. Operation like create allow one to select many
     *  properties: either pass them in as explicit parameters or use
     *  the SS properties.
     *  <p>
     *  The file system related SS defaults are
     *  <ul>
     *  <li> the home directory (default is "/user/userName")
     *  <li> the initial wd (only for local fs)
     *  <li> replication factor
     *  <li> block size
     *  <li> buffer size
     *  <li> encryptDataTransfer
     *  <li> checksum option. (checksumType and  bytesPerChecksum)
     *  </ul>
     *
     * <p>
     * <b> *** Usage Model for the FileContext class *** </b>
     * <p>
     * Example 1: use the default config read from the $HADOOP_CONFIG/core.xml.
     *   Unspecified values come from core-defaults.xml in the release jar.
     *  <ul>
     *  <li> myFContext = FileContext.getFileContext(); // uses the default config
     *                                                // which has your default FS
     *  <li>  myFContext.create(path, ...);
     *  <li>  myFContext.setWorkingDir(path)
     *  <li>  myFContext.open (path, ...);
     *  </ul>
     * Example 2: Get a FileContext with a specific URI as the default FS
     *  <ul>
     *  <li> myFContext = FileContext.getFileContext(URI)
     *  <li> myFContext.create(path, ...);
     *   ...
     * </ul>
     * Example 3: FileContext with local file system as the default
     *  <ul>
     *  <li> myFContext = FileContext.getLocalFSFileContext()
     *  <li> myFContext.create(path, ...);
     *  <li> ...
     *  </ul>
     * Example 4: Use a specific config, ignoring $HADOOP_CONFIG
     *  Generally you should not need use a config unless you are doing
     *   <ul>
     *   <li> configX = someConfigSomeOnePassedToYou.
     *   <li> myFContext = getFileContext(configX); // configX is not changed,
     *                                              // is passed down
     *   <li> myFContext.create(path, ...);
     *   <li>...
     *  </ul>
     *
     */

    @InterfaceAudience.Public
    @InterfaceStability.Evolving /*Evolving for a release,to be changed to Stable */
    public class FileContext {

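    The FileContext class gives application writers an interface to the Hadoop file system and provides the usual operations: create, open, list, and so on.

    Two common Hadoop file system implementations are:

    1. the local file system: file:///path
    2. the HDFS file system: hdfs://nnAddress:nnPort/path

    URI names are very flexible, but they require knowing the name or address of the server. For convenience, one often wants to access the default file system of the environment without knowing its name or address. An additional benefit is that the default FS can be changed (for example, when an administrator moves an application from cluster1 to cluster2).

    To support this, Hadoop has the notion of a default file system. Users can set their own default file system, although it is typically set up by the default configuration of the environment.

    A default file system implies a default scheme and authority; slash-relative names (such as /for/bar) are resolved relative to that default FS.

    Similarly, a user can have working-directory-relative names (names that do not start with a slash). The working directory is generally in the default FS, but it can be in a different FS.

    Hence a Hadoop path name can be one of the following:

    fully qualified URI           scheme://authority/path

    slash-relative name           /path, relative to the default file system

    wd-relative name              path, relative to the working directory

    Relative paths with a scheme (scheme:foo/bar) are illegal.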
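To make the three kinds of path names concrete, here is a minimal sketch that classifies a name and resolves it against a default FS. It uses only java.net.URI and is not Hadoop code: the class PathNameKinds, its methods, the address hdfs://nn1:8020 and the directory /user/alice are all hypothetical, and the sketch assumes the working directory lives in the default FS (which the Javadoc says is usual but not required).

```java
import java.net.URI;

/** Illustrative model of the three kinds of Hadoop path names; not Hadoop code. */
final class PathNameKinds {

    /** Classifies a name as fully qualified, slash-relative or wd-relative. */
    static String classify(String name) {
        URI uri = URI.create(name);
        if (uri.getScheme() != null) {
            // "scheme:foo/bar" parses as an opaque URI: a relative path with a
            // scheme, which the FileContext Javadoc declares illegal.
            if (uri.isOpaque()) {
                throw new IllegalArgumentException("relative path with scheme: " + name);
            }
            return "fully qualified";
        }
        return name.startsWith("/") ? "slash-relative" : "wd-relative";
    }

    /** Resolves a name against a default FS URI and a working directory in that FS. */
    static URI qualify(String name, URI defaultFs, String workingDir) {
        switch (classify(name)) {
            case "fully qualified":
                return URI.create(name);
            case "slash-relative":
                // The default FS supplies the scheme and authority.
                return defaultFs.resolve(name);
            default:
                // Assumption of this sketch: the working directory is in the default FS.
                return defaultFs.resolve(workingDir + "/" + name);
        }
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://nn1:8020/"); // hypothetical NameNode address
        System.out.println(qualify("file:///tmp/a.txt", defaultFs, "/user/alice"));
        System.out.println(qualify("/data/a.txt", defaultFs, "/user/alice"));
        System.out.println(qualify("logs/a.txt", defaultFs, "/user/alice"));
    }
}
```

With the hypothetical values above, the slash-relative name picks up the scheme and authority of the default FS, and the wd-relative name is additionally prefixed with the working directory.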

      private FileContext(final AbstractFileSystem defFs,
        final FsPermission theUmask, final Configuration aConf) {
        defaultFS = defFs;
        umask = FsPermission.getUMask(aConf);
        conf = aConf;
        try {
          ugi = UserGroupInformation.getCurrentUser();
        } catch (IOException e) {
          LOG.error("Exception in getCurrentUser: ",e);
          throw new RuntimeException("Failed to get the current user " +
                  "while creating a FileContext", e);
        }
        /*
         * Init the wd.
         * WorkingDir is implemented at the FileContext layer
         * NOT at the AbstractFileSystem layer.
         * If the DefaultFS, such as localFilesystem has a notion of
         *  builtin WD, we use that as the initial WD.
         *  Otherwise the WD is initialized to the home directory.
         */
        workingDir = defaultFS.getInitialWorkingDirectory();
        if (workingDir == null) {
          workingDir = defaultFS.getHomeDirectory();
        }
        resolveSymlinks = conf.getBoolean(
            CommonConfigurationKeys.FS_CLIENT_RESOLVE_REMOTE_SYMLINKS_KEY,
            CommonConfigurationKeys.FS_CLIENT_RESOLVE_REMOTE_SYMLINKS_DEFAULT);
        util = new Util(); // for the inner class
      }

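    The FileContext constructor takes three parameters:

    1. defFs: the default FS of the FileContext
    2. theUmask: apparently unused, perhaps for historical reasons; the umask field is initialized from FsPermission.getUMask(aConf) instead
    3. aConf: the configuration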
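The two decisions the constructor makes (where the umask comes from, and how the working directory is chosen) can be modelled in a few lines. This is a toy sketch, not Hadoop code: MiniFileContext is a hypothetical class, a plain Map stands in for Configuration, and it assumes the umask is stored as an octal string under the key fs.permissions.umask-mode with a default of 022, whereas the real FsPermission.getUMask does more than this sketch.

```java
import java.util.Map;

/** Toy model of the FileContext constructor; the real class uses Hadoop types. */
final class MiniFileContext {
    final short umask;
    final String workingDir;

    MiniFileContext(String fsInitialWd, String fsHomeDir,
                    short ignoredUmask, Map<String, String> conf) {
        // As in the constructor above, the umask argument is never consulted;
        // the value is read from the configuration (octal strings only here).
        this.umask = Short.parseShort(
            conf.getOrDefault("fs.permissions.umask-mode", "022"), 8);
        // Use the file system's built-in working directory if it has one,
        // otherwise fall back to the home directory.
        this.workingDir = (fsInitialWd != null) ? fsInitialWd : fsHomeDir;
    }
}
```

Passing null as fsInitialWd models a file system without a built-in working directory, in which case the working directory becomes the home directory.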

    Next, the common methods it mentions, starting with create. Its long Javadoc comment comes first:

      /**
       * Create or overwrite file on indicated path and returns an output stream for
       * writing into the file.
       *
       * @param f the file name to open
       * @param createFlag gives the semantics of create; see {@link CreateFlag}
       * @param opts file creation options; see {@link Options.CreateOpts}.
       *          <ul>
       *          <li>Progress - to report progress on the operation - default null
       *          <li>Permission - umask is applied against permisssion: default is
       *          FsPermissions:getDefault()
       *
       *          <li>CreateParent - create missing parent path; default is to not
       *          to create parents
       *          <li>The defaults for the following are SS defaults of the file
       *          server implementing the target path. Not all parameters make sense
       *          for all kinds of file system - eg. localFS ignores Blocksize,
       *          replication, checksum
       *          <ul>
       *          <li>BufferSize - buffersize used in FSDataOutputStream
       *          <li>Blocksize - block size for file blocks
       *          <li>ReplicationFactor - replication for blocks
       *          <li>ChecksumParam - Checksum parameters. server default is used
       *          if not specified.
       *          </ul>
       *          </ul>
       *
       * @return {@link FSDataOutputStream} for created file
       *
       * @throws AccessControlException If access is denied
       * @throws FileAlreadyExistsException If file <code>f</code> already exists
       * @throws FileNotFoundException If parent of <code>f</code> does not exist
       *           and <code>createParent</code> is false
       * @throws ParentNotDirectoryException If parent of <code>f</code> is not a
       *           directory.
       * @throws UnsupportedFileSystemException If file system for <code>f</code> is
       *           not supported
       * @throws IOException If an I/O error occurred
       *
       * Exceptions applicable to file systems accessed over RPC:
       * @throws RpcClientException If an exception occurred in the RPC client
       * @throws RpcServerException If an exception occurred in the RPC server
       * @throws UnexpectedServerException If server implementation throws
       *           undeclared exception to RPC server
       *
       * RuntimeExceptions:
       * @throws InvalidPathException If path <code>f</code> is not valid
       */
      public FSDataOutputStream create(final Path f,
          final EnumSet<CreateFlag> createFlag, Options.CreateOpts... opts)
          throws AccessControlException, FileAlreadyExistsException,
          FileNotFoundException, ParentNotDirectoryException,
          UnsupportedFileSystemException, IOException {
        Path absF = fixRelativePart(f);

        // If one of the options is a permission, extract it & apply umask
        // If not, add a default Perms and apply umask;
        // AbstractFileSystem#create

        CreateOpts.Perms permOpt = CreateOpts.getOpt(CreateOpts.Perms.class, opts);
        FsPermission permission = (permOpt != null) ? permOpt.getValue() :
                                          FILE_DEFAULT_PERM;
        permission = permission.applyUMask(umask);

        final CreateOpts[] updatedOpts =
                          CreateOpts.setOpt(CreateOpts.perms(permission), opts);
        return new FSLinkResolver<FSDataOutputStream>() {
          @Override
          public FSDataOutputStream next(final AbstractFileSystem fs, final Path p)
            throws IOException {
            return fs.create(p, createFlag, updatedOpts);
          }
        }.resolve(this, absF);
      }

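    The create method creates or overwrites a file at the given path and returns an output stream for writing to it. Before delegating to the file system, it takes the permission from the options (or FILE_DEFAULT_PERM if none was given) and applies the umask to it.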
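Applying a umask means clearing every permission bit that is set in the mask. The sketch below shows that arithmetic on plain integers; UmaskDemo and its methods are hypothetical names, it only models the nine rwx bits, and it is an illustration of the idea rather than the code of FsPermission.applyUMask.

```java
/** Illustrates umask arithmetic on octal permission bits; not Hadoop code. */
final class UmaskDemo {

    /** Clears every permission bit that is set in the umask (rwx bits only). */
    static int applyUmask(int permission, int umask) {
        return permission & ~umask & 0777;
    }

    /** Formats a mode as three octal digits, e.g. 644. */
    static String toOctal(int mode) {
        return String.format("%03o", mode);
    }

    public static void main(String[] args) {
        // A default file permission of 0666 with the common umask 022.
        System.out.println(toOctal(applyUmask(0666, 022))); // prints 644
        // A stricter umask removes all group and other access.
        System.out.println(toOctal(applyUmask(0666, 077))); // prints 600
    }
}
```

So a file created with no explicit permission option and a umask of 022 would end up readable by everyone but writable only by its owner.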

    The FSLinkResolver created in the final return statement handles the case where the path contains a symbolic link:

      /**
       * Generic helper function overridden on instantiation to perform a
       * specific operation on the given file system using the given path
       * which may result in an UnresolvedLinkException.
       * @param fs AbstractFileSystem to perform the operation on.
       * @param p Path given the file system.
       * @return Generic type determined by the specific implementation.
       * @throws UnresolvedLinkException If symbolic link <code>path</code> could
       *           not be resolved
       * @throws IOException an I/O error occurred
       */
      abstract public T next(final AbstractFileSystem fs, final Path p)
          throws IOException, UnresolvedLinkException;

      /**
       * Performs the operation specified by the next function, calling it
       * repeatedly until all symlinks in the given path are resolved.
       * @param fc FileContext used to access file systems.
       * @param path The path to resolve symlinks on.
       * @return Generic type determined by the implementation of next.
       * @throws IOException
       */
      public T resolve(final FileContext fc, final Path path) throws IOException {
        int count = 0;
        T in = null;
        Path p = path;
        // NB: More than one AbstractFileSystem can match a scheme, eg
        // "file" resolves to LocalFs but could have come by RawLocalFs.
        AbstractFileSystem fs = fc.getFSofPath(p);

        // Loop until all symlinks are resolved or the limit is reached
        for (boolean isLink = true; isLink;) {
          try {
            in = next(fs, p);
            isLink = false;
          } catch (UnresolvedLinkException e) {
            if (!fc.resolveSymlinks) {
              throw new IOException("Path " + path + " contains a symlink"
                  + " and symlink resolution is disabled ("
                  + CommonConfigurationKeys.FS_CLIENT_RESOLVE_REMOTE_SYMLINKS_KEY + ").", e);
            }
            if (!FileSystem.areSymlinksEnabled()) {
              throw new IOException("Symlink resolution is disabled in"
                  + " this version of Hadoop.");
            }
            if (count++ > FsConstants.MAX_PATH_LINKS) {
              throw new IOException("Possible cyclic loop while " +
                                    "following symbolic link " + path);
            }
            // Resolve the first unresolved path component
            p = qualifySymlinkTarget(fs.getUri(), p, fs.getLinkTarget(p));
            fs = fc.getFSofPath(p);
          }
        }
        return in;
      }

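    next is a generic helper function that is overridden on instantiation to perform a specific operation on the given file system with the given path; it may throw UnresolvedLinkException.

    resolve performs the operation by calling next repeatedly until all symbolic links in the path have been resolved, and gives up with an IOException once more than FsConstants.MAX_PATH_LINKS links have been followed.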
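The retry pattern of next and resolve can be tried out without a cluster. The following is a toy sketch under stated assumptions, not Hadoop code: LinkResolverSketch, UnresolvedLink and Operation are hypothetical names, a Map from link path to target stands in for the file system, whole paths rather than path components are treated as links, the limit of 32 is an assumed value for this sketch, and the configuration checks of the real resolve are omitted.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the retry loop in FSLinkResolver.resolve; not Hadoop code. */
final class LinkResolverSketch {
    static final int MAX_PATH_LINKS = 32; // limit assumed for this sketch

    /** Signals that the path given to the operation is a symlink. */
    static final class UnresolvedLink extends Exception {
        UnresolvedLink(String path) { super(path); }
    }

    /** Plays the role of next(): one attempt of the operation on a path. */
    interface Operation<T> {
        T next(String path) throws UnresolvedLink;
    }

    /** Retries op, following one link per failed attempt, until it succeeds. */
    static <T> T resolve(Map<String, String> links, String path, Operation<T> op) {
        int count = 0;
        String p = path;
        while (true) {
            try {
                return op.next(p);
            } catch (UnresolvedLink e) {
                // Same guard as the original loop: give up on a probable cycle.
                if (count++ > MAX_PATH_LINKS) {
                    throw new IllegalStateException(
                        "Possible cyclic loop while following symbolic link " + path);
                }
                // Stands in for getLinkTarget plus qualifySymlinkTarget.
                p = links.get(p);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> links = new HashMap<>();
        links.put("/current", "/releases/latest");
        links.put("/releases/latest", "/releases/v2");
        // The operation refuses to run on a symlink, as a file system would.
        Operation<String> open = p -> {
            if (links.containsKey(p)) throw new UnresolvedLink(p);
            return "opened " + p;
        };
        System.out.println(resolve(links, "/current", open)); // prints opened /releases/v2
    }
}
```

The caller only supplies the operation; the loop owns the bookkeeping of following links, which is why create can be written as a small anonymous subclass that overrides next.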

  • Original article: https://www.cnblogs.com/nashiyue/p/5331225.html