The PathFilter Interface in HDFS

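Processing a batch of files in a single operation is a common requirement. For example, a MapReduce job for log processing may need to analyze a month's worth of log files spread across a large number of directories. Rather than enumerating every file and directory to specify the input, it is convenient to use wildcards in a single expression to match multiple files. Hadoop provides two FileSystem methods for globbing:

    public FileStatus[] globStatus(Path pathPattern) throws IOException
    public FileStatus[] globStatus(Path pathPattern, PathFilter filter) throws IOException

The globStatus() methods return an array of FileStatus objects for all paths that match the pattern, sorted by path. Hadoop supports the same set of wildcards as Unix bash.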
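
To get a feel for the wildcard syntax without a cluster, the sketch below runs the JDK's own glob matcher (java.nio.file.PathMatcher) over a few hypothetical file names. This is only an analogy: the JDK matcher is not what globStatus() uses internally, and the two syntaxes differ in details (the JDK, for instance, negates a character class with [!...]). The class and method names are made up for illustration.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class GlobSyntaxSketch {
    // Returns the names (in input order) that match the given glob pattern.
    // Uses the JDK glob matcher as a stand-in; this is NOT Hadoop's globbing code.
    static List<String> match(String glob, String... names) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> matched = new ArrayList<>();
        for (String name : names) {
            if (matcher.matches(Paths.get(name))) {
                matched.add(name);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        // Hypothetical file names in a single directory
        String[] names = {"a.txt", "b.txt", "c.aaa", "c.txt", "cc.aaa"};
        System.out.println(match("*.txt", names));   // [a.txt, b.txt, c.txt]
        System.out.println(match("c?.aaa", names));  // [cc.aaa]  (? stands for exactly one character)
        System.out.println(match("{a,b}.*", names)); // [a.txt, b.txt]
    }
}
```

With Hadoop itself, the same kind of pattern is passed as the last components of the Path given to globStatus().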

The second method takes a PathFilter as an extra argument, which can restrict the matches further. PathFilter is an interface with a single method, accept(Path path).

The following example demonstrates what PathFilter does:

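RegexExcludePathFilter.java: this class implements the PathFilter interface and overrides accept().

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.PathFilter;

    // Rejects every path whose full string form matches the given regex
    class RegexExcludePathFilter implements PathFilter {
        private final String regex;

        public RegexExcludePathFilter(String regex) {
            this.regex = regex;
        }

        @Override
        public boolean accept(Path path) {
            // matches() must match the whole path string, not just a part of it
            return !path.toString().matches(regex);
        }
    }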
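
One detail of accept() is easy to miss: String.matches() succeeds only when the regex matches the entire string. The sketch below mirrors the accept() logic on plain strings, so it runs without Hadoop, and assumes the filter sees the full path string as it appears in the listing output. The class name and the notes_txt path are hypothetical.

```java
class RegexMatchSketch {
    // Same logic as RegexExcludePathFilter.accept(), applied to a plain string:
    // keep the path unless the regex matches all of it.
    static boolean accept(String path, String regex) {
        return !path.matches(regex);
    }

    public static void main(String[] args) {
        String txt = "hdfs://master:9000/user/hadoop/test/a.txt";
        String odd = "hdfs://master:9000/user/hadoop/test/notes_txt"; // hypothetical name, no dot before "txt"

        System.out.println(accept(txt, "txt"));       // true: "txt" alone never matches the whole string
        System.out.println(accept(txt, ".*txt"));     // false: excluded
        System.out.println(accept(odd, ".*txt"));     // false: also excluded, since it merely ends in "txt"
        System.out.println(accept(odd, ".*\\.txt"));  // true: the stricter pattern requires a literal ".txt"
    }
}
```

So a pattern such as .*txt excludes anything ending in the letters txt; if only the .txt extension is meant, .*\.txt is the tighter choice.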
The following method prints the paths that match the glob pattern:
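
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    // Using glob patterns
    public static void list() throws IOException {
        Configuration conf = new Configuration();
        // Assumes the default file system in the loaded configuration is hdfs://master:9000
        FileSystem fs = FileSystem.get(conf);
        // The PathFilter drops paths that fail its test; here it drops paths ending in "txt"
        FileStatus[] status = fs.globStatus(new Path("hdfs://master:9000/user/hadoop/test/*"), new RegexExcludePathFilter(".*txt"));
        //FileStatus[] status = fs.globStatus(new Path("hdfs://master:9000/user/hadoop/test/*"));
        // Convert the FileStatus array to an array of Path objects
        Path[] listedPaths = FileUtil.stat2Paths(status);
        for (Path p : listedPaths) {
            System.out.println(p);
        }
    }

If the two-argument globStatus() call (the one with the filter) is commented out and the one-argument call is uncommented, the output is:
hdfs://master:9000/user/hadoop/test/a.txt
hdfs://master:9000/user/hadoop/test/b.txt
hdfs://master:9000/user/hadoop/test/c.aaa
hdfs://master:9000/user/hadoop/test/c.txt
hdfs://master:9000/user/hadoop/test/cc.aaa

With the one-argument call commented out and the two-argument call active, the output is: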

hdfs://master:9000/user/hadoop/test/c.aaa
hdfs://master:9000/user/hadoop/test/cc.aaa

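As this shows, a PathFilter applies a further restriction after the glob pattern has matched. In this example, paths that match the filter's regex are removed, which follows directly from return !path.toString().matches(regex); in accept(): every path the regex matches is rejected. Changing that line to return path.toString().matches(regex); does the opposite: paths that match regex are kept and all others are removed.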
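
The two variants can be compared side by side without a cluster. The sketch below is a plain-Java model, not Hadoop code: it applies the same matches() test to the five path strings from the listing above, and a hypothetical keepMatches flag switches between the exclude and include behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class IncludeExcludeSketch {
    // keepMatches == false models "return !path.toString().matches(regex)" (exclude).
    // keepMatches == true  models "return path.toString().matches(regex)"  (include).
    static List<String> filter(List<String> paths, String regex, boolean keepMatches) {
        List<String> kept = new ArrayList<>();
        for (String p : paths) {
            if (p.matches(regex) == keepMatches) {
                kept.add(p);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String dir = "hdfs://master:9000/user/hadoop/test/";
        List<String> paths = Arrays.asList(
                dir + "a.txt", dir + "b.txt", dir + "c.aaa", dir + "c.txt", dir + "cc.aaa");

        // Exclude variant: prints the c.aaa and cc.aaa paths
        filter(paths, ".*txt", false).forEach(System.out::println);
        // Include variant: prints the a.txt, b.txt and c.txt paths
        filter(paths, ".*txt", true).forEach(System.out::println);
    }
}
```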

Original article: https://www.cnblogs.com/liuling/p/2013-6-18-02.html