  • How to remove duplicate lines in a large text file?

    How would you remove duplicate lines from a file that is much too large to fit in memory? The duplicate lines are not necessarily adjacent; assume the file is 10 times bigger than RAM.

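    A simple solution is to store each line of input.txt in a HashSet. Because a set rejects duplicate values, adding a line also tells you whether it has been seen before: write the line to output.txt only if it was not already in the set. Note that the set keeps every distinct line in memory, so this works only when the unique lines fit in RAM, even if the file as a whole does not.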

    Java:

    // Removes duplicate lines from input.txt and
    // saves the result to output.txt.
    // Note: every distinct line is kept in memory.

    import java.io.*;
    import java.util.HashSet;

    public class FileOperation
    {
        public static void main(String[] args) throws IOException
        {
            // try-with-resources closes (and flushes) both files,
            // even if an exception is thrown while reading or writing
            try (BufferedReader br = new BufferedReader(new FileReader("input.txt"));
                 PrintWriter pw = new PrintWriter("output.txt"))
            {
                // set of lines seen so far
                HashSet<String> hs = new HashSet<String>();

                // loop over each line of input.txt
                String line;
                while ((line = br.readLine()) != null)
                {
                    // add() returns true only if the line was
                    // not already in the set, so each distinct
                    // line is written once, at its first occurrence
                    if (hs.add(line))
                        pw.println(line);
                }
            }

            System.out.println("File operation performed successfully");
        }
    }
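
If even the distinct lines do not fit in memory, one common workaround is to partition the file by hash first. The following is only a minimal sketch of that idea, not part of the original article: the class name `PartitionedDedup`, the method `dedupe`, the temporary file names, and the bucket count are all hypothetical choices. It assumes that the distinct lines of each bucket fit in memory and that the order of lines in the output does not matter (lines come out grouped by bucket, not in their original order).

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, for illustration only.
public class PartitionedDedup {

    // Splits `input` into `buckets` temporary files keyed on each line's
    // hash code, then removes duplicates from one bucket at a time.
    static void dedupe(Path input, Path output, int buckets) throws IOException {
        Path tmpDir = Files.createTempDirectory("dedup");
        Path[] parts = new Path[buckets];
        BufferedWriter[] writers = new BufferedWriter[buckets];
        try {
            for (int i = 0; i < buckets; i++) {
                parts[i] = tmpDir.resolve("part-" + i + ".txt");
                writers[i] = Files.newBufferedWriter(parts[i], StandardCharsets.UTF_8);
            }
            // Pass 1: identical lines have identical hash codes,
            // so all copies of a line land in the same bucket file.
            try (BufferedReader br = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
                String line;
                while ((line = br.readLine()) != null) {
                    int b = Math.floorMod(line.hashCode(), buckets);
                    writers[b].write(line);
                    writers[b].newLine();
                }
            }
        } finally {
            for (BufferedWriter w : writers) {
                if (w != null) w.close();
            }
        }

        // Pass 2: each bucket is deduplicated with its own HashSet,
        // so only one bucket's distinct lines are in memory at a time.
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (Path part : parts) {
                Set<String> seen = new HashSet<>();
                try (BufferedReader br = Files.newBufferedReader(part, StandardCharsets.UTF_8)) {
                    String line;
                    while ((line = br.readLine()) != null) {
                        if (seen.add(line)) {
                            out.write(line);
                            out.newLine();
                        }
                    }
                }
                Files.delete(part);
            }
        }
        Files.delete(tmpDir);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: java PartitionedDedup <input> <output>");
            return;
        }
        // 64 buckets is an arbitrary assumption; pick a count large
        // enough that a single bucket's distinct lines fit in RAM.
        dedupe(Paths.get(args[0]), Paths.get(args[1]), 64);
    }
}
```

The trade-off is extra disk I/O: the input is read once and written once into the bucket files, then each bucket is read again. If the original line order must be preserved, a different approach (for example, tagging each line with its line number and sorting externally) would be needed.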
    

      

  • Original article: https://www.cnblogs.com/lightwindy/p/9650718.html