  • C# Large File Reading and Querying with Memory-Mapped Files

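    I recently needed to query log files quickly; the files are larger than 4 GB.

    The requirements were:

    1. Read a specified line from a file of roughly 4 GB while keeping the program's memory usage under 500 MB.

    2. Querying content within the first 1 GB should take about 20 s.

    At first this did not seem hard. After a day of research, I found that it calls for memory-mapped files.

    I consulted the relevant documentation: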

    https://msdn.microsoft.com/zh-cn/library/dd997372(v=vs.110).aspx?cs-save-lang=1&cs-lang=csharp#code-snippet-1

    This turned out to be fairly complex, especially the character handling.
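One likely source of that difficulty is that views are cut at arbitrary byte offsets, so a multi-byte character can straddle two views. Below is a minimal sketch, assuming the file is UTF-8 (the post does not state the encoding), of carrying decoder state across chunks with `System.Text.Decoder`. The names `ChunkDecoding` and `DecodeInChunks` are hypothetical and used only for illustration.

```csharp
using System;
using System.Text;

static class ChunkDecoding
{
    // Decodes UTF-8 bytes that arrive in fixed-size chunks.
    // A Decoder (unlike Encoding.GetString) remembers incomplete trailing
    // bytes between calls, so a character split across two chunks is
    // still decoded correctly.
    public static string DecodeInChunks(byte[] data, int chunkSize)
    {
        Decoder decoder = Encoding.UTF8.GetDecoder();
        var result = new StringBuilder();
        // Upper bound on the chars one chunk can produce.
        char[] chars = new char[Encoding.UTF8.GetMaxCharCount(chunkSize)];

        for (int offset = 0; offset < data.Length; offset += chunkSize)
        {
            int count = Math.Min(chunkSize, data.Length - offset);
            bool isLastChunk = offset + count >= data.Length;
            // flush = true only on the final chunk, so pending bytes are
            // kept for the next call everywhere else.
            int written = decoder.GetChars(data, offset, count, chars, 0, isLastChunk);
            result.Append(chars, 0, written);
        }
        return result.ToString();
    }
}
```

For example, `ChunkDecoding.DecodeInChunks(Encoding.UTF8.GetBytes("日志line\r\n"), 2)` returns the original string even though a chunk size of 2 splits each three-byte character.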

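    I wrote a demo of my own, hoping to meet these requirements.

    Unfortunately, in testing, querying about 1 GB of content took around 100 s.

    The program is as follows: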

    using System;
    using System.IO;
    using System.IO.MemoryMappedFiles;
    using System.Text;
    
    namespace ConsoleDemo
    {
        class Program
        {
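            // Path to the test file (directory separators assumed; adjust to your own file).
            private const string TXT_FILE_PATH = @"E:\开源学习\超大文本文件读取\File\a.txt";
            // Placeholder substituted for line breaks while scanning. '\0' is an assumption:
            // any character that cannot occur in the log text works.
            private const string SPLIT_VARCHAR = "\0";
            private const char SPLIT_CHAR = '\0';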
            private static long FILE_SIZE = 0;
            static void Main(string[] args)
            {
                //long ttargetRowNum = 39999999;
                long ttargetRowNum = 10000000;
                DateTime beginTime = DateTime.Now;
                string line = CreateMemoryMapFile(ttargetRowNum);
                double totalSeconds = DateTime.Now.Subtract(beginTime).TotalSeconds;
                Console.WriteLine(line);
                Console.WriteLine(string.Format("Looked up line {0}; elapsed: {1}s", ttargetRowNum, totalSeconds));
                Console.ReadLine();
            }
    
            /// <summary>
            /// Creates a memory-mapped file and returns the target line.
            /// </summary>
            private static string CreateMemoryMapFile(long ttargetRowNum)
            {
                string line = string.Empty;
                using (FileStream fs = new FileStream(TXT_FILE_PATH, FileMode.Open, FileAccess.ReadWrite))
                {
                    long targetRowNum = ttargetRowNum + 1; // target line
                    long curRowNum = 1; // current line
                    FILE_SIZE = fs.Length;
                    using (MemoryMappedFile mmf = MemoryMappedFile.CreateFromFile(fs, "test", fs.Length, MemoryMappedFileAccess.ReadWrite, null, HandleInheritability.None, false))
                    {
                        long offset = 0;
                        //int limit = 250;
                        int limit = 200;
                        try
                        {
                            StringBuilder sbDefineRowLine = new StringBuilder();
                            do
                            {
                                long remaining = fs.Length - offset;
                                using (MemoryMappedViewStream mmStream = mmf.CreateViewStream(offset, remaining > limit ? limit : remaining))
                                //using (MemoryMappedViewStream mmStream = mmf.CreateViewStream(offset, remaining))
                                {
                                    offset += limit;
                                    using (StreamReader sr = new StreamReader(mmStream))
                                    {
                                        //string ss = sr.ReadToEnd().ToString().Replace("\r\n", "囧").Replace(Environment.NewLine, "囧");
                                        string ss = sr.ReadToEnd().ToString().Replace("\r\n", SPLIT_VARCHAR).Replace(Environment.NewLine, SPLIT_VARCHAR);
                                        if (curRowNum <= targetRowNum)
                                        {
                                            if (curRowNum < targetRowNum)
                                            {
                                                string s = sbDefineRowLine.ToString();
                                                int pos = s.LastIndexOf(SPLIT_CHAR);
                                                if (pos > 0)
                                                    sbDefineRowLine.Remove(0, pos);
    
                                            }
                                            else
                                            {
                                                line = sbDefineRowLine.ToString();
                                                return line;
                                            }
                                            if (ss.Contains(SPLIT_VARCHAR))
                                            {
                                                curRowNum += GetNewLineNumsOfStr(ss);
                                                sbDefineRowLine.Append(ss);
                                            }
                                            else
                                            {
                                                sbDefineRowLine.Append(ss);
                                            }
                                        }
                                    }
                                }
                            } while (offset < fs.Length);
                        }
                        catch (Exception e)
                        {
                            Console.WriteLine(e.Message);
                        }
                        return line;
                    }
                }
            }
    
            private static long GetNewLineNumsOfStr(string s)
            {
                string[] _lst = s.Split(SPLIT_CHAR);
                return _lst.Length - 1;
            }
        }
    }

    Test screenshot: (image not included)

    Suggestions for a better approach are welcome.
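One direction that might be worth trying is sketched here; it has not been measured against the 4 GB file. The demo creates a new 200-byte view and a new `StreamReader` on every iteration and converts every chunk to a string, which is probably where much of the time goes. The sketch instead maps larger windows, counts `\n` bytes directly, and decodes only the bytes of the requested line. It assumes UTF-8 (or another ASCII-compatible encoding where byte 0x0A always means a line feed). `LineFinder`, `ReadLine`, the 64 MB window and the 1 MB buffer are illustrative choices, not values from the post.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Text;

static class LineFinder
{
    // Returns the 1-based line `targetLine`, or null if the file has fewer lines.
    public static string ReadLine(string path, long targetLine)
    {
        const long WindowSize = 64L * 1024 * 1024; // bytes mapped per view (assumed value)
        long fileLength = new FileInfo(path).Length;
        if (fileLength == 0 || targetLine < 1) return null; // empty files cannot be mapped

        long currentLine = 1;
        var lineBytes = new MemoryStream();  // raw bytes of the target line only
        byte[] buffer = new byte[1 << 20];   // 1 MB read buffer (assumed value)

        using (var mmf = MemoryMappedFile.CreateFromFile(
            path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read))
        {
            for (long offset = 0; offset < fileLength; offset += WindowSize)
            {
                // An explicit size keeps the view from extending past the file's end.
                long size = Math.Min(WindowSize, fileLength - offset);
                using (var view = mmf.CreateViewStream(offset, size, MemoryMappedFileAccess.Read))
                {
                    int read;
                    while ((read = view.Read(buffer, 0, buffer.Length)) > 0)
                    {
                        for (int i = 0; i < read; i++)
                        {
                            byte b = buffer[i];
                            if (b == (byte)'\n')
                            {
                                if (currentLine == targetLine) return Decode(lineBytes);
                                currentLine++;
                            }
                            else if (currentLine == targetLine)
                            {
                                lineBytes.WriteByte(b);
                            }
                        }
                    }
                }
            }
        }
        // The last line may have no trailing newline.
        return currentLine == targetLine && lineBytes.Length > 0 ? Decode(lineBytes) : null;
    }

    // Decodes the collected bytes and drops the '\r' of a CRLF line ending.
    private static string Decode(MemoryStream bytes)
    {
        return Encoding.UTF8.GetString(bytes.ToArray()).TrimEnd('\r');
    }
}
```

A call such as `LineFinder.ReadLine(path, 10000000)` would replace `CreateMemoryMapFile(ttargetRowNum)` in the demo. Because only one window is mapped at a time and only the target line is buffered, managed memory use should stay small, but whether this meets the 20 s goal would have to be tested.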

    References:

    https://msdn.microsoft.com/zh-cn/library/dd997372(v=vs.110).aspx?cs-save-lang=1&cs-lang=csharp#code-snippet-1

    http://blog.csdn.net/onejune2013/article/details/7577152

  • Original post: https://www.cnblogs.com/lucky_hu/p/5345423.html