  • Java – Reading a Large File Efficiently (repost)

    Original article: http://www.baeldung.com/java-read-lines-large-file

    1. Overview

    This tutorial will show how to read all the lines from a large file in Java in an efficient manner.

    This article is part of the “Java – Back to Basic” tutorial here on Baeldung.

    2. Reading In Memory

    The standard way of reading the lines of a file is in memory. Both Guava and Apache Commons IO provide a quick way to do just that:

    // Guava (com.google.common.io.Files)
    Files.readLines(new File(path), Charsets.UTF_8);

    // Apache Commons IO (org.apache.commons.io.FileUtils)
    FileUtils.readLines(new File(path));

    The problem with this approach is that all the lines of the file are kept in memory, which will quickly lead to an OutOfMemoryError if the file is large enough.

    For example, reading a ~1 Gb file:

    @Test
    public void givenUsingGuava_whenIteratingAFile_thenWorks() throws IOException {
        String path = ...
        Files.readLines(new File(path), Charsets.UTF_8);
    }

    This starts off with only a small amount of memory being consumed (~12 Mb consumed):

    [main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 128 Mb
    [main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 116 Mb

    However, after the full file has been processed, we end up with (~2 Gb consumed):

    [main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 2666 Mb
    [main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 490 Mb

    This means that about 2.1 Gb of memory is consumed by the process (2666 Mb total minus 490 Mb free). The reason is simple: the lines of the file are all being stored in memory now.
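    The "consumed" figures quoted alongside the logs are simply total heap minus free heap. The original test harness is not shown here, so the following is only a minimal sketch of how such numbers could be produced with java.lang.Runtime; the class name MemoryReport and its methods are hypothetical, not taken from the article's project.

```java
// Hypothetical helper (not from the original project): reports JVM heap usage
// in the same "Total Memory / Free Memory" terms as the logs above.
final class MemoryReport {
    private static final long MB = 1024L * 1024L;

    /** Heap in use, in megabytes: total heap minus free heap. */
    static long usedMb(long totalBytes, long freeBytes) {
        return (totalBytes - freeBytes) / MB;
    }

    /** Formats a snapshot of the running JVM's heap. */
    static String snapshot() {
        Runtime rt = Runtime.getRuntime();
        long total = rt.totalMemory(); // heap currently reserved by the JVM
        long free = rt.freeMemory();   // unused part of that reserved heap
        return String.format("Total Memory: %d Mb, Free Memory: %d Mb, Used: %d Mb",
                total / MB, free / MB, usedMb(total, free));
    }

    public static void main(String[] args) {
        System.out.println(snapshot());
        // The figures logged after reading the whole file: 2666 Mb total, 490 Mb free.
        System.out.println(usedMb(2666 * MB, 490 * MB)); // prints 2176
    }
}
```

    Note that totalMemory() grows as the JVM expands the heap, which is why the total in the logs rises from 128 Mb to 2666 Mb over the run.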

    It should be obvious by this point that keeping the contents of the file in memory will quickly exhaust the available memory, regardless of how much that actually is.

    What’s more, we usually don’t need all of the lines in the file in memory at once. Instead, we just need to be able to iterate through each one, do some processing and throw it away. So this is exactly what we’re going to do: iterate through the lines without holding them in memory.

    3. Streaming Through the File

    Let’s now look at a solution: we’re going to use a java.util.Scanner to run through the contents of the file and retrieve lines serially, one by one:

    FileInputStream inputStream = null;
    Scanner sc = null;
    try {
        inputStream = new FileInputStream(path);
        sc = new Scanner(inputStream, "UTF-8");
        while (sc.hasNextLine()) {
            String line = sc.nextLine();
            // System.out.println(line);
        }
        // note that Scanner suppresses exceptions
        if (sc.ioException() != null) {
            throw sc.ioException();
        }
    } finally {
        if (inputStream != null) {
            inputStream.close();
        }
        if (sc != null) {
            sc.close();
        }
    }

    This solution will iterate through all the lines in the file, allowing each line to be processed without keeping references to them and, as a result, without keeping them in memory (~150 Mb consumed):

    [main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 763 Mb
    [main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 605 Mb
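    On Java 7 and later, the same Scanner loop can be written with try-with-resources, which closes the scanner and the stream automatically. The following is a sketch under that assumption rather than the article's own code; the class ScannerLineProcessor and the method processLines are hypothetical names, and the per-line action is passed in as a callback.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Scanner;
import java.util.function.Consumer;

// Hypothetical wrapper around the Scanner loop shown above.
final class ScannerLineProcessor {

    /** Streams the file line by line, hands each line to the action, and returns the line count. */
    static long processLines(String path, Consumer<String> action) throws IOException {
        // Resources are closed in reverse order: the Scanner first, then the stream.
        try (InputStream in = new FileInputStream(path);
             Scanner sc = new Scanner(in, "UTF-8")) {
            long count = 0;
            while (sc.hasNextLine()) {
                action.accept(sc.nextLine()); // the line is not retained after this call
                count++;
            }
            // Scanner suppresses I/O errors, so surface them explicitly.
            if (sc.ioException() != null) {
                throw sc.ioException();
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        long lines = processLines(args[0], line -> { /* process the line here */ });
        System.out.println(lines + " lines processed");
    }
}
```

    Only one line is referenced at a time, so memory use stays roughly flat regardless of file size, provided the callback does not store the lines itself.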

    4. Streaming with Apache Commons IO

    The same can be achieved with the Commons IO library, by using the custom LineIterator it provides:

    LineIterator it = FileUtils.lineIterator(theFile, "UTF-8");
    try {
        while (it.hasNext()) {
            String line = it.nextLine();
            // do something with line
        }
    } finally {
        LineIterator.closeQuietly(it);
    }

    Since the file is never fully loaded into memory, this also results in fairly conservative memory consumption (~190 Mb consumed, going by the figures below):

    [main] INFO  o.b.java.CoreJavaIoIntegrationTest - Total Memory: 752 Mb
    [main] INFO  o.b.java.CoreJavaIoIntegrationTest - Free Memory: 564 Mb
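    Where adding a library is not an option, a comparable lazy iteration is available in the JDK itself from Java 8 onwards. This is a swapped-in alternative, not something the original article covers: the sketch below uses java.nio.file.Files.lines (not to be confused with Guava's Files class used earlier), and the class LazyLineStats and the method longestLine are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical example: computes a statistic over a file without loading it whole.
final class LazyLineStats {

    /** Returns the length of the longest line, reading the file lazily. */
    static int longestLine(String path) throws IOException {
        // Files.lines reads lazily; the stream must be closed to release the file handle.
        try (Stream<String> lines = Files.lines(Paths.get(path), StandardCharsets.UTF_8)) {
            return lines.mapToInt(String::length).max().orElse(0);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(longestLine(args[0]));
    }
}
```

    As with LineIterator, each line becomes eligible for garbage collection as soon as it has been processed.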

    5. Conclusion

    This quick article shows how to process the lines of a large file iteratively, without exhausting the available memory, which proves quite useful when working with such large files.

    The implementation of all these examples and code snippets can be found in my GitHub project. This is an Eclipse-based project, so it should be easy to import and run as it is.

  • Source: https://www.cnblogs.com/davidwang456/p/4766726.html