  • Building a web service to download files from HDFS

    Requirement

    To fetch files from HDFS quickly and conveniently, standing up a simple web service that serves downloads works well: the web server keeps no temporary files and only relays the stream, so it is very efficient.
    The stack is Spring MVC plus the HDFS API.

    Key code

    import java.net.URLEncoder;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import org.springframework.stereotype.Controller;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RequestMethod;
    import org.springframework.web.bind.annotation.ResponseBody;

    @Controller
    @RequestMapping("/file")
    public class FileDownloadController {

        private static final String BASE_DIR = "/user/app/dump/";

        @RequestMapping(value = "/download/{filename}", method = RequestMethod.GET)
        @ResponseBody
        public void fileDownload(@PathVariable("filename") String fileName,
                                 HttpServletRequest request, HttpServletResponse response) {
            try {
                response.setContentType("application/octet-stream; charset=utf-8");
                // Encode the file name so non-ASCII names survive the Content-Disposition header
                response.addHeader("Content-Disposition",
                        "attachment; filename=" + URLEncoder.encode(fileName + ".csv", "UTF-8"));
                String path = BASE_DIR + fileName;
                // Pipe the HDFS file straight into the servlet response; nothing touches local disk
                HdfsUtils.copyFileAsStream(path, response.getOutputStream());
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
    
    • The files available for download all live under /user/app/dump/
    • Download URL: http://ip:port/file/download/xxxfile (a minimal client sketch follows below)
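
    For a quick sanity check, here is a minimal standalone Java client that hits the endpoint and saves the stream to a local file. The host, port, and file names are placeholders, not values from the original setup:

    import java.io.FileOutputStream;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.URL;

    public class DownloadTest {
        public static void main(String[] args) throws Exception {
            // Placeholder address; substitute the real host, port, and file name
            URL url = new URL("http://localhost:8080/file/download/xxxfile");
            try (InputStream in = url.openStream();
                 OutputStream out = new FileOutputStream("xxxfile.csv")) {
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
            }
        }
    }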

    HdfsUtils.copyFileAsStream implementation

    import java.io.IOException;
    import java.io.OutputStream;
    import java.net.URI;
    import java.net.URL;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class HdfsUtils {
        private static FileSystem hdfs = null;

        static {
            // Register the hdfs:// URL handler, then connect to the cluster as user "app"
            URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
            Configuration conf = new Configuration();
            try {
                hdfs = FileSystem.get(URI.create("hdfs://xxxxxxx"), conf, "app");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        public static void copyFileAsStream(String fpath, OutputStream out) throws IOException {
            FSDataInputStream fsInput = hdfs.open(new Path(fpath));
            // Copy in 4 KB chunks; 'false' leaves the output stream open for the caller to manage
            IOUtils.copyBytes(fsInput, out, 4096, false);
            fsInput.close();
            out.flush();
        }
    }
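
    A minimal usage sketch for copyFileAsStream outside the web tier, dumping an HDFS file to a local file (both paths are placeholders):

    import java.io.FileOutputStream;
    import java.io.OutputStream;

    public class HdfsUtilsDemo {
        public static void main(String[] args) throws Exception {
            // Placeholder paths; any OutputStream works as the sink
            try (OutputStream out = new FileOutputStream("/tmp/xxxfile.csv")) {
                HdfsUtils.copyFileAsStream("/user/app/dump/xxxfile", out);
            }
        }
    }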
    

    Pretty simple, right? The HDFS file stream never lands on the web server; it is copied straight through to the browser's OutputStream.

    Further improving performance: compression

    Modify the web-side code to zip the stream. At the default compression level the ratio on this data is about 1:5, which greatly reduces the amount of data sent over the network (a note on tuning the level follows the code).

    import java.io.BufferedOutputStream;
    import java.net.URLEncoder;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipOutputStream;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import org.springframework.stereotype.Controller;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RequestMethod;
    import org.springframework.web.bind.annotation.ResponseBody;

    @Controller
    @RequestMapping("/file")
    public class FileDownloadController {

        private static final String BASE_DIR = "/user/app/dump/";

        @RequestMapping(value = "/download/zip/{filename}", method = RequestMethod.GET)
        @ResponseBody
        public void hdfsDownload2(@PathVariable("filename") String fileName,
                                  HttpServletRequest request, HttpServletResponse response) {
            try {
                response.setContentType("application/octet-stream; charset=utf-8");
                response.setHeader("Content-Disposition",
                        "attachment; filename=" + URLEncoder.encode(fileName + ".zip", "UTF-8"));

                // Wrap the response stream so the CSV is zipped on the fly, entry by entry
                ZipOutputStream zipOut = new ZipOutputStream(new BufferedOutputStream(response.getOutputStream()));
                zipOut.putNextEntry(new ZipEntry(fileName + ".csv"));
                HdfsUtils.copyFileAsStream(BASE_DIR + fileName, zipOut);
                zipOut.closeEntry();
                zipOut.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
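
    One knob worth knowing here: ZipOutputStream's deflate level trades CPU for compression ratio. Below is a small helper sketch; the BEST_SPEED choice is an assumption for illustration, not part of the original post:

    import java.io.IOException;
    import java.io.OutputStream;
    import java.util.zip.Deflater;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipOutputStream;

    public class ZipTuning {
        // Wraps any OutputStream in a zip stream tuned for speed rather than ratio
        public static ZipOutputStream fastZip(OutputStream raw, String entryName) throws IOException {
            ZipOutputStream zipOut = new ZipOutputStream(raw);
            zipOut.setLevel(Deflater.BEST_SPEED); // assumption: favor CPU over ratio; the default favors ratio
            zipOut.putNextEntry(new ZipEntry(entryName));
            return zipOut;
        }
    }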
    

    Main jar versions used

    <properties>
        <spring.version>4.2.5.RELEASE</spring.version>
        <hadoop.version>2.7.0</hadoop.version>
    </properties>
    
    <dependencies>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-web</artifactId>
            <version>${spring.version}</version>
        </dependency>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-webmvc</artifactId>
            <version>${spring.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
    </dependencies>
    