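Yesterday I looked at AbstractFileSystem and learned that applications access files through the FileContext class. Today I'll read that class's source code, starting with its very long class comment.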
    /**
     * The FileContext class provides an interface to the application writer for
     * using the Hadoop file system.
     * It provides a set of methods for the usual operation: create, open,
     * list, etc
     *
     * <p>
     * <b> *** Path Names *** </b>
     * <p>
     *
     * The Hadoop file system supports a URI name space and URI names.
     * It offers a forest of file systems that can be referenced using fully
     * qualified URIs.
     * Two common Hadoop file systems implementations are
     * <ul>
     * <li> the local file system: file:///path
     * <li> the hdfs file system hdfs://nnAddress:nnPort/path
     * </ul>
     *
     * While URI names are very flexible, it requires knowing the name or address
     * of the server. For convenience one often wants to access the default system
     * in one's environment without knowing its name/address. This has an
     * additional benefit that it allows one to change one's default fs
     * (e.g. admin moves application from cluster1 to cluster2).
     * <p>
     *
     * To facilitate this, Hadoop supports a notion of a default file system.
     * The user can set his default file system, although this is
     * typically set up for you in your environment via your default config.
     * A default file system implies a default scheme and authority; slash-relative
     * names (such as /for/bar) are resolved relative to that default FS.
     * Similarly a user can also have working-directory-relative names (i.e. names
     * not starting with a slash). While the working directory is generally in the
     * same default FS, the wd can be in a different FS.
     * <p>
     * Hence Hadoop path names can be one of:
     * <ul>
     * <li> fully qualified URI: scheme://authority/path
     * <li> slash relative names: /path relative to the default file system
     * <li> wd-relative names: path relative to the working dir
     * </ul>
     * Relative paths with scheme (scheme:foo/bar) are illegal.
     *
     * <p>
     * <b>****The Role of the FileContext and configuration defaults****</b>
     * <p>
     * The FileContext provides file namespace context for resolving file names;
     * it also contains the umask for permissions, In that sense it is like the
     * per-process file-related state in Unix system.
     * These two properties
     * <ul>
     * <li> default file system i.e your slash)
     * <li> umask
     * </ul>
     * in general, are obtained from the default configuration file
     * in your environment, (@see {@link Configuration}).
     *
     * No other configuration parameters are obtained from the default config as
     * far as the file context layer is concerned. All file system instances
     * (i.e. deployments of file systems) have default properties; we call these
     * server side (SS) defaults. Operation like create allow one to select many
     * properties: either pass them in as explicit parameters or use
     * the SS properties.
     * <p>
     * The file system related SS defaults are
     * <ul>
     * <li> the home directory (default is "/user/userName")
     * <li> the initial wd (only for local fs)
     * <li> replication factor
     * <li> block size
     * <li> buffer size
     * <li> encryptDataTransfer
     * <li> checksum option. (checksumType and bytesPerChecksum)
     * </ul>
     *
     * <p>
     * <b> *** Usage Model for the FileContext class *** </b>
     * <p>
     * Example 1: use the default config read from the $HADOOP_CONFIG/core.xml.
     * Unspecified values come from core-defaults.xml in the release jar.
     * <ul>
     * <li> myFContext = FileContext.getFileContext(); // uses the default config
     * // which has your default FS
     * <li> myFContext.create(path, ...);
     * <li> myFContext.setWorkingDir(path)
     * <li> myFContext.open (path, ...);
     * </ul>
     * Example 2: Get a FileContext with a specific URI as the default FS
     * <ul>
     * <li> myFContext = FileContext.getFileContext(URI)
     * <li> myFContext.create(path, ...);
     * ...
     * </ul>
     * Example 3: FileContext with local file system as the default
     * <ul>
     * <li> myFContext = FileContext.getLocalFSFileContext()
     * <li> myFContext.create(path, ...);
     * <li> ...
     * </ul>
     * Example 4: Use a specific config, ignoring $HADOOP_CONFIG
     * Generally you should not need use a config unless you are doing
     * <ul>
     * <li> configX = someConfigSomeOnePassedToYou.
     * <li> myFContext = getFileContext(configX); // configX is not changed,
     * // is passed down
     * <li> myFContext.create(path, ...);
     * <li>...
     * </ul>
     *
     */

    @InterfaceAudience.Public
    @InterfaceStability.Evolving /*Evolving for a release,to be changed to Stable */
    public class FileContext {
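The FileContext class gives application writers an interface to the Hadoop file system, with methods for the usual operations: create, open, list, and so on.
Two common Hadoop file system implementations are:
- the local file system: file:///path
- the HDFS file system: hdfs://nnAddress:nnPort/path
URI names are very flexible, but they require knowing the name or address of the server. For convenience, one often wants to access the default file system of the environment without knowing its name or address. This has the added benefit that the default FS can be changed (for example, when an admin moves an application from cluster1 to cluster2).
To support this, Hadoop has the notion of a default file system. Users can set their own default file system, although it is typically set up for them by the environment's default config.
A default file system implies a default scheme and authority; slash-relative names (such as /for/bar) are resolved relative to that default FS.
Similarly, a user can use working-directory-relative names (names that do not start with a slash). The working directory is generally in the default FS, but it can be in a different FS.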
Hence a Hadoop path name can be one of:
- a fully qualified URI: scheme://authority/path
- a slash-relative name: /path, relative to the default file system
- a wd-relative name: path, relative to the working directory
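The three kinds of names above can be sketched with plain `java.net.URI`. This is only an illustration of the rules quoted from the class comment, not how FileContext itself is implemented; the class `PathNameSketch`, its methods, and the example host `nn:8020` are hypothetical names introduced here.

```java
import java.net.URI;

/** Illustrative only: models the three kinds of Hadoop path names with java.net.URI. */
final class PathNameSketch {
    enum Kind { FULLY_QUALIFIED, SLASH_RELATIVE, WD_RELATIVE }

    /** Classifies a path name; rejects relative paths that carry a scheme. */
    static Kind classify(String name) {
        URI uri = URI.create(name);
        if (uri.getScheme() != null) {
            // "scheme:foo/bar" parses as an opaque URI, which the class comment calls illegal.
            if (uri.isOpaque()) {
                throw new IllegalArgumentException(
                    "Relative path with scheme is illegal: " + name);
            }
            return Kind.FULLY_QUALIFIED;
        }
        return uri.getPath().startsWith("/") ? Kind.SLASH_RELATIVE : Kind.WD_RELATIVE;
    }

    /**
     * Turns any legal name into a fully qualified URI.
     * Assumes workingDir ends with "/" so that relative names are appended to it.
     */
    static URI qualify(URI defaultFs, URI workingDir, String name) {
        switch (classify(name)) {
            case FULLY_QUALIFIED:
                return URI.create(name);
            case SLASH_RELATIVE:
                return defaultFs.resolve(name);   // relative to the default FS
            default:
                return workingDir.resolve(name);  // relative to the working directory
        }
    }
}
```

For example, with a default FS of `hdfs://nn:8020/` and a working directory of `hdfs://nn:8020/user/alice/`, `qualify` maps `/for/bar` to `hdfs://nn:8020/for/bar` and `data/in.txt` to `hdfs://nn:8020/user/alice/data/in.txt`.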
    private FileContext(final AbstractFileSystem defFs,
        final FsPermission theUmask, final Configuration aConf) {
      defaultFS = defFs;
      umask = FsPermission.getUMask(aConf);
      conf = aConf;
      try {
        ugi = UserGroupInformation.getCurrentUser();
      } catch (IOException e) {
        LOG.error("Exception in getCurrentUser: ",e);
        throw new RuntimeException("Failed to get the current user " +
            "while creating a FileContext", e);
      }
      /*
       * Init the wd.
       * WorkingDir is implemented at the FileContext layer
       * NOT at the AbstractFileSystem layer.
       * If the DefaultFS, such as localFilesystem has a notion of
       * builtin WD, we use that as the initial WD.
       * Otherwise the WD is initialized to the home directory.
       */
      workingDir = defaultFS.getInitialWorkingDirectory();
      if (workingDir == null) {
        workingDir = defaultFS.getHomeDirectory();
      }
      resolveSymlinks = conf.getBoolean(
          CommonConfigurationKeys.FS_CLIENT_RESOLVE_REMOTE_SYMLINKS_KEY,
          CommonConfigurationKeys.FS_CLIENT_RESOLVE_REMOTE_SYMLINKS_DEFAULT);
      util = new Util(); // for the inner class
    }
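The FileContext constructor takes three parameters:
- defFs: the default FS of this FileContext
- theUmask: does not appear to be used in this version (a leftover from earlier code, perhaps?); the umask field is initialized from FsPermission.getUMask(aConf) instead
- aConf: the configuration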
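Next, let's look at the common methods the comment mentions, starting with create. It comes with a long comment of its own, collapsed by default: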
    /**
     * Create or overwrite file on indicated path and returns an output stream for
     * writing into the file.
     *
     * @param f the file name to open
     * @param createFlag gives the semantics of create; see {@link CreateFlag}
     * @param opts file creation options; see {@link Options.CreateOpts}.
     *          <ul>
     *          <li>Progress - to report progress on the operation - default null
     *          <li>Permission - umask is applied against permisssion: default is
     *          FsPermissions:getDefault()
     *
     *          <li>CreateParent - create missing parent path; default is to not
     *          to create parents
     *          <li>The defaults for the following are SS defaults of the file
     *          server implementing the target path. Not all parameters make sense
     *          for all kinds of file system - eg. localFS ignores Blocksize,
     *          replication, checksum
     *          <ul>
     *          <li>BufferSize - buffersize used in FSDataOutputStream
     *          <li>Blocksize - block size for file blocks
     *          <li>ReplicationFactor - replication for blocks
     *          <li>ChecksumParam - Checksum parameters. server default is used
     *          if not specified.
     *          </ul>
     *          </ul>
     *
     * @return {@link FSDataOutputStream} for created file
     *
     * @throws AccessControlException If access is denied
     * @throws FileAlreadyExistsException If file <code>f</code> already exists
     * @throws FileNotFoundException If parent of <code>f</code> does not exist
     *           and <code>createParent</code> is false
     * @throws ParentNotDirectoryException If parent of <code>f</code> is not a
     *           directory.
     * @throws UnsupportedFileSystemException If file system for <code>f</code> is
     *           not supported
     * @throws IOException If an I/O error occurred
     *
     * Exceptions applicable to file systems accessed over RPC:
     * @throws RpcClientException If an exception occurred in the RPC client
     * @throws RpcServerException If an exception occurred in the RPC server
     * @throws UnexpectedServerException If server implementation throws
     *           undeclared exception to RPC server
     *
     * RuntimeExceptions:
     * @throws InvalidPathException If path <code>f</code> is not valid
     */
    public FSDataOutputStream create(final Path f,
        final EnumSet<CreateFlag> createFlag, Options.CreateOpts... opts)
        throws AccessControlException, FileAlreadyExistsException,
        FileNotFoundException, ParentNotDirectoryException,
        UnsupportedFileSystemException, IOException {
      Path absF = fixRelativePart(f);

      // If one of the options is a permission, extract it & apply umask
      // If not, add a default Perms and apply umask;
      // AbstractFileSystem#create

      CreateOpts.Perms permOpt = CreateOpts.getOpt(CreateOpts.Perms.class, opts);
      FsPermission permission = (permOpt != null) ? permOpt.getValue() :
          FILE_DEFAULT_PERM;
      permission = permission.applyUMask(umask);

      final CreateOpts[] updatedOpts =
          CreateOpts.setOpt(CreateOpts.perms(permission), opts);
      return new FSLinkResolver<FSDataOutputStream>() {
        @Override
        public FSDataOutputStream next(final AbstractFileSystem fs, final Path p)
            throws IOException {
          return fs.create(p, createFlag, updatedOpts);
        }
      }.resolve(this, absF);
    }
The create method creates or overwrites a file at the given path and returns an output stream for writing to it.
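Before delegating to the file system, create picks a permission (the one passed in the options, or FILE_DEFAULT_PERM) and applies the umask to it. A minimal sketch of that masking step follows. It models permissions as plain octal ints rather than FsPermission, and the values 0666 and 0022 are assumptions chosen for illustration, not values read from this source.

```java
/** Illustrative only: permission bits modeled as plain ints instead of FsPermission. */
final class UmaskSketch {
    /** Clears every permission bit that is set in the umask. */
    static int applyUMask(int permission, int umask) {
        return permission & ~umask & 0777;
    }

    public static void main(String[] args) {
        int requested = 0666; // assumed default file permission (rw-rw-rw-)
        int umask = 0022;     // assumed umask (removes write for group and others)
        // Prints 644, i.e. rw-r--r--
        System.out.println(Integer.toOctalString(applyUMask(requested, umask)));
    }
}
```

The umask never adds permissions; it only removes bits from whatever the caller asked for.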
The FSLinkResolver instantiated in the final return statement handles the case where the path contains a symbolic link.
    /**
     * Generic helper function overridden on instantiation to perform a
     * specific operation on the given file system using the given path
     * which may result in an UnresolvedLinkException.
     * @param fs AbstractFileSystem to perform the operation on.
     * @param p Path given the file system.
     * @return Generic type determined by the specific implementation.
     * @throws UnresolvedLinkException If symbolic link <code>path</code> could
     *           not be resolved
     * @throws IOException an I/O error occurred
     */
    abstract public T next(final AbstractFileSystem fs, final Path p)
        throws IOException, UnresolvedLinkException;

    /**
     * Performs the operation specified by the next function, calling it
     * repeatedly until all symlinks in the given path are resolved.
     * @param fc FileContext used to access file systems.
     * @param path The path to resolve symlinks on.
     * @return Generic type determined by the implementation of next.
     * @throws IOException
     */
    public T resolve(final FileContext fc, final Path path) throws IOException {
      int count = 0;
      T in = null;
      Path p = path;
      // NB: More than one AbstractFileSystem can match a scheme, eg
      // "file" resolves to LocalFs but could have come by RawLocalFs.
      AbstractFileSystem fs = fc.getFSofPath(p);

      // Loop until all symlinks are resolved or the limit is reached
      for (boolean isLink = true; isLink;) {
        try {
          in = next(fs, p);
          isLink = false;
        } catch (UnresolvedLinkException e) {
          if (!fc.resolveSymlinks) {
            throw new IOException("Path " + path + " contains a symlink"
                + " and symlink resolution is disabled ("
                + CommonConfigurationKeys.FS_CLIENT_RESOLVE_REMOTE_SYMLINKS_KEY + ").", e);
          }
          if (!FileSystem.areSymlinksEnabled()) {
            throw new IOException("Symlink resolution is disabled in"
                + " this version of Hadoop.");
          }
          if (count++ > FsConstants.MAX_PATH_LINKS) {
            throw new IOException("Possible cyclic loop while " +
                "following symbolic link " + path);
          }
          // Resolve the first unresolved path component
          p = qualifySymlinkTarget(fs.getUri(), p, fs.getLinkTarget(p));
          fs = fc.getFSofPath(p);
        }
      }
      return in;
    }
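next is a generic helper function, overridden when an FSLinkResolver is instantiated, that performs a specific operation on the given file system using the given path; it may throw an UnresolvedLinkException.
resolve performs the operation specified by next, calling it repeatedly until all symlinks in the path are resolved. It gives up with an IOException if symlink resolution is disabled or if more than FsConstants.MAX_PATH_LINKS links have been followed, which points to a possible cycle.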
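The retry pattern behind next and resolve can be sketched without Hadoop at all. The version below is a simplified in-memory model and not the real FSLinkResolver: paths are strings, symlinks live in a Map, exceptions are unchecked for brevity, and every name in it (`LinkResolverSketch`, `UnresolvedLink`, `open`, the limit of 32) is an assumption made for illustration.

```java
import java.util.Map;

/** Illustrative only: an in-memory model of the retry loop, not Hadoop's FSLinkResolver. */
abstract class LinkResolverSketch<T> {
    /** Assumed limit on link hops; stands in for FsConstants.MAX_PATH_LINKS. */
    static final int MAX_PATH_LINKS = 32;

    /** Hypothetical unchecked stand-in for UnresolvedLinkException. */
    static final class UnresolvedLink extends RuntimeException {
        UnresolvedLink(String path) {
            super("unresolved link: " + path);
        }
    }

    /** The operation to attempt; throws UnresolvedLink if path is a symlink. */
    abstract T next(String path);

    /** Calls next repeatedly, following one link per failure, until it succeeds. */
    T resolve(Map<String, String> links, String path) {
        int count = 0;
        String p = path;
        while (true) {
            try {
                return next(p);
            } catch (UnresolvedLink e) {
                if (count++ > MAX_PATH_LINKS) {
                    throw new IllegalStateException(
                        "Possible cyclic loop while following symbolic link " + path);
                }
                String target = links.get(p);
                if (target == null) {
                    throw e; // nothing to follow, so surface the original failure
                }
                p = target; // follow the link and retry the same operation
            }
        }
    }

    /** Hypothetical "open" operation that fails on symlinks, wrapped in the resolver. */
    static String open(Map<String, String> links, String path) {
        return new LinkResolverSketch<String>() {
            @Override
            String next(String p) {
                if (links.containsKey(p)) {
                    throw new UnresolvedLink(p);
                }
                return "opened " + p;
            }
        }.resolve(links, path);
    }
}
```

With links `/a -> /b` and `/b -> /c`, `open(links, "/a")` retries twice and ends up operating on `/c`; with a cycle such as `/x -> /y -> /x`, the hop counter eventually trips and the call fails instead of looping forever. This mirrors the design in create, where the operation itself is written once in next and the symlink handling stays in resolve.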