  • Building a web service to download files from HDFS

    Requirement

    To fetch files from HDFS quickly and conveniently, standing up a simple web service that serves downloads works well: the web server keeps no temporary files and only relays the stream, so it is very efficient.
    The stack is Spring MVC plus the HDFS API.

    Key code

    import java.net.URLEncoder;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import org.springframework.stereotype.Controller;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RequestMethod;
    import org.springframework.web.bind.annotation.ResponseBody;

    @Controller
    @RequestMapping("/file")
    public class FileDownloadController {

        private static final String BASE_DIR = "/user/app/dump/";

        @RequestMapping(value = "/download/{filename}", method = RequestMethod.GET)
        @ResponseBody
        public void fileDownload(@PathVariable("filename") String fileName,
                                 HttpServletRequest request, HttpServletResponse response) {
            try {
                response.setContentType("application/octet-stream; charset=utf-8");
                // Encode the file name so non-ASCII names survive the Content-Disposition header
                response.addHeader("Content-Disposition",
                        "attachment; filename=" + URLEncoder.encode(fileName + ".csv", "UTF-8"));
                String path = BASE_DIR + fileName;
                // Pipe the HDFS file straight into the servlet response; nothing touches local disk
                HdfsUtils.copyFileAsStream(path, response.getOutputStream());
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
    
    • The files available for download all live under /user/app/dump/
    • Download URL: http://ip:port/file/download/xxxfile (a minimal client sketch follows below)
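
    For a quick sanity check, here is a minimal standalone Java client that hits the endpoint and saves the stream to a local file. The host, port, and file names are placeholders, not values from the original setup:

    import java.io.FileOutputStream;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.URL;

    public class DownloadTest {
        public static void main(String[] args) throws Exception {
            // Placeholder address; substitute the real host, port, and file name
            URL url = new URL("http://localhost:8080/file/download/xxxfile");
            try (InputStream in = url.openStream();
                 OutputStream out = new FileOutputStream("xxxfile.csv")) {
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
            }
        }
    }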

    HdfsUtils.copyFileAsStream implementation

    import java.io.IOException;
    import java.io.OutputStream;
    import java.net.URI;
    import java.net.URL;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class HdfsUtils {
        private static FileSystem hdfs = null;

        static {
            // Register the hdfs:// URL handler, then connect to the cluster as user "app"
            URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
            Configuration conf = new Configuration();
            try {
                hdfs = FileSystem.get(URI.create("hdfs://xxxxxxx"), conf, "app");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        public static void copyFileAsStream(String fpath, OutputStream out) throws IOException {
            FSDataInputStream fsInput = hdfs.open(new Path(fpath));
            // Copy in 4 KB chunks; 'false' leaves the output stream open for the caller to manage
            IOUtils.copyBytes(fsInput, out, 4096, false);
            fsInput.close();
            out.flush();
        }
    }
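
    A minimal usage sketch for copyFileAsStream outside the web tier, dumping an HDFS file to a local file (both paths are placeholders):

    import java.io.FileOutputStream;
    import java.io.OutputStream;

    public class HdfsUtilsDemo {
        public static void main(String[] args) throws Exception {
            // Placeholder paths; any OutputStream works as the sink
            try (OutputStream out = new FileOutputStream("/tmp/xxxfile.csv")) {
                HdfsUtils.copyFileAsStream("/user/app/dump/xxxfile", out);
            }
        }
    }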
    

    Pretty simple, right? The HDFS file stream never lands on the web server; it is copied straight through to the browser's OutputStream.

    Further improving performance: compression

    Modify the web-side code to zip the stream. At the default compression level the ratio on this data is about 1:5, which greatly reduces the amount of data sent over the network (a note on tuning the level follows the code).

    import java.io.BufferedOutputStream;
    import java.net.URLEncoder;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipOutputStream;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import org.springframework.stereotype.Controller;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RequestMethod;
    import org.springframework.web.bind.annotation.ResponseBody;

    @Controller
    @RequestMapping("/file")
    public class FileDownloadController {

        private static final String BASE_DIR = "/user/app/dump/";

        @RequestMapping(value = "/download/zip/{filename}", method = RequestMethod.GET)
        @ResponseBody
        public void hdfsDownload2(@PathVariable("filename") String fileName,
                                  HttpServletRequest request, HttpServletResponse response) {
            try {
                response.setContentType("application/octet-stream; charset=utf-8");
                response.setHeader("Content-Disposition",
                        "attachment; filename=" + URLEncoder.encode(fileName + ".zip", "UTF-8"));

                // Wrap the response stream so the CSV is zipped on the fly, entry by entry
                ZipOutputStream zipOut = new ZipOutputStream(new BufferedOutputStream(response.getOutputStream()));
                zipOut.putNextEntry(new ZipEntry(fileName + ".csv"));
                HdfsUtils.copyFileAsStream(BASE_DIR + fileName, zipOut);
                zipOut.closeEntry();
                zipOut.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
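
    One knob worth knowing here: ZipOutputStream's deflate level trades CPU for compression ratio. Below is a small helper sketch; the BEST_SPEED choice is an assumption for illustration, not part of the original post:

    import java.io.IOException;
    import java.io.OutputStream;
    import java.util.zip.Deflater;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipOutputStream;

    public class ZipTuning {
        // Wraps any OutputStream in a zip stream tuned for speed rather than ratio
        public static ZipOutputStream fastZip(OutputStream raw, String entryName) throws IOException {
            ZipOutputStream zipOut = new ZipOutputStream(raw);
            zipOut.setLevel(Deflater.BEST_SPEED); // assumption: favor CPU over ratio; the default favors ratio
            zipOut.putNextEntry(new ZipEntry(entryName));
            return zipOut;
        }
    }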
    

    Main jar versions used

    <properties>
        <spring.version>4.2.5.RELEASE</spring.version>
        <hadoop.version>2.7.0</hadoop.version>
    </properties>
    
    <dependencies>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-web</artifactId>
            <version>${spring.version}</version>
        </dependency>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-webmvc</artifactId>
            <version>${spring.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
    </dependencies>
    