  • Using memory-mapped files (MMF) to reduce memory usage when exporting large data sets (Linux edition)

    Preface

        This post follows on from my previous one, https://www.cnblogs.com/y-yp/p/12191258.html, and covers how to use MMF on Linux.

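        PS: After debugging the case locally I wanted to run it on the server, only for it to throw "PlatformNotSupportedException". Looking into it, I found that MMF is used differently on Windows and Linux: the "mapName" parameter is a Windows-only feature, and on Linux it can only be passed as "null". Hence today's post.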

    Implementation

         Since "mapName" cannot be used, after some testing I settled on the MemoryMappedFile.CreateFromFile overload that takes a FileStream.
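To make the chosen overload concrete, here is a minimal sketch of a file-backed map created without a map name. The `MmfFactory` class and `OpenFileBacked` method are hypothetical names used only for illustration, and `HandleInheritability.None` is an assumption of this sketch (the implementation later in this post passes `Inheritable`).

```csharp
using System.IO;
using System.IO.MemoryMappedFiles;

public static class MmfFactory
{
    // Hypothetical helper: maps a file without a map name, so the same call
    // can be used on Linux as well as on Windows.
    public static MemoryMappedFile OpenFileBacked(string path, long capacity)
    {
        var fs = new FileStream(path, FileMode.OpenOrCreate, FileAccess.ReadWrite);
        try
        {
            return MemoryMappedFile.CreateFromFile(
                fs,
                null,                               // mapName: must be null on Linux
                capacity,                           // the file is grown to this size if it is smaller
                MemoryMappedFileAccess.ReadWrite,
                HandleInheritability.None,
                false);                             // leaveOpen: the map owns and closes the stream
        }
        catch
        {
            // The map never took ownership, so close the stream ourselves.
            fs.Dispose();
            throw;
        }
    }
}
```

Because `leaveOpen` is `false` here, disposing the returned map also closes the underlying stream, so the caller has a single object to dispose.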

         I won't go into the details again; if anything is unclear, see my previous post https://www.cnblogs.com/y-yp/p/12191258.html. The implementation follows.

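         First define a "row info record", which is used when reading the data back. Only one record is generated per row, so even with a large amount of data these records take up little memory.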

            public class RowInfo
            {
                /// <summary>
                /// Size of the row data, in bytes
                /// </summary>
                public long Capacity { get; set; }
    
                /// <summary>
                /// Number of cells in the row
                /// </summary>
                public int CellQuantity { get; set; }
            }
    

      Next, write the data to the MMF file and collect the "row info records"

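                // Requires: System, System.Collections.Generic, System.IO, System.IO.MemoryMappedFiles, System.Linq, System.Text
                // Prepare the data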
                var data = new List<string[]>();
                for (var i = 0; i < 100; i++)
                {
                    var rowData = new string[100];
                    for (var j = 0; j < 100; j++)
                    {
                        rowData[j] = $"{i}-{j}";
                    }
                    data.Add(rowData);
                }
    
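                // Compute the MMF file size: the UTF-8 byte count of each cell plus a 4-byte int length prefix per cell
                var mmfCapacity = data.Sum(row => row.Sum(cell => (long)Encoding.UTF8.GetByteCount(cell) + 4));
                // Path.Combine uses the platform's directory separator ('/' on Linux)
                var path = Path.Combine(Environment.CurrentDirectory, "test.txt");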
                using var writerFs = new FileStream(path, FileMode.Create, FileAccess.ReadWrite);
                using var writerMMF = MemoryMappedFile.CreateFromFile(writerFs, null, mmfCapacity, MemoryMappedFileAccess.ReadWrite, HandleInheritability.Inheritable, true);
    
                // Collect the row info records
                var rowInfos = new List<RowInfo>();
                var totalWriterOffset = 0L;
                foreach (var rowData in data)
                {
                    var rowBuffers = rowData.Select(x => Encoding.UTF8.GetBytes(x)).ToList();
                    // Total size of this row: cell bytes plus a 4-byte length prefix per cell
                    var capacity = rowBuffers.Sum(x => x.Length + 4);
                    // Create an accessor over this row's byte range, starting at the current offset
                    using var accessor = writerMMF.CreateViewAccessor(totalWriterOffset, capacity);
                    // Running offset of the current cell within the row
                    var offset = 0L;
                    foreach (var cellBuffer in rowBuffers)
                    {
                        if (cellBuffer.Length > 0)
                        {
                            var dataSize = cellBuffer.Length;
                            accessor.Write(offset, dataSize);
                            accessor.WriteArray(offset + 4, cellBuffer, 0, dataSize);
                            offset += 4 + dataSize;
                        }
                        else
                        {
                            accessor.Write(offset, 0);
                            offset += 4;
                        }
                    }
    
                    // Record the row info
                    var rowInfo = new RowInfo()
                    {
                        Capacity = capacity,
                        CellQuantity = rowBuffers.Count
                    };
                    rowInfos.Add(rowInfo);
                    // Advance the total offset by the length of this row
                    totalWriterOffset += capacity;
                }
    
                return rowInfos;
    

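      Restore the data from the "row info records". At this point the rows read back can be written to your own Excel or CSV file, which I won't go into here.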
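As a rough illustration of the CSV case, the sketch below writes rows of cells to any `TextWriter`. The `CsvSketch` class and its methods are hypothetical names, and the quoting rules (quote a cell that contains a comma, a double quote, or a line break, and double any embedded quotes) are an assumption in the style of RFC 4180, not something taken from the original export code.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class CsvSketch
{
    // Quote a cell only when it contains a character that would break the row.
    public static string EscapeCell(string cell)
    {
        cell = cell ?? string.Empty;
        var needsQuotes = cell.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        return needsQuotes ? "\"" + cell.Replace("\"", "\"\"") + "\"" : cell;
    }

    // Write one CSV line per row; rows are consumed lazily, one at a time.
    public static void WriteRows(TextWriter writer, IEnumerable<string[]> rows)
    {
        foreach (var row in rows)
        {
            writer.Write(string.Join(",", row.Select(cell => EscapeCell(cell))));
            writer.Write("\r\n");
        }
    }
}
```

Since `WriteRows` takes an `IEnumerable<string[]>`, each restored row can be handed over as soon as it is read from the MMF, rather than first collecting every row in a list, which keeps memory usage low.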

                var result = new List<string[]>();
    
                var path = Path.Combine(Environment.CurrentDirectory, "test.txt");
                // Total MMF file size, summed from the row info records
                var mmfCapacity = rowInfos.Sum(x => x.Capacity);
                var totalReaderOffset = 0L;
                using var readerFs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite);
                using var readerMMF = MemoryMappedFile.CreateFromFile(readerFs, null, mmfCapacity, MemoryMappedFileAccess.ReadWrite, HandleInheritability.Inheritable, true);
                foreach(var rowInfo in rowInfos)
                {
                    var rowData = new string[rowInfo.CellQuantity];
                    // View over this row only, starting at the current offset
                    using var accessor = readerMMF.CreateViewAccessor(totalReaderOffset, rowInfo.Capacity);
                    var position = 0;
                    for (int cellIndex = 0; cellIndex < rowInfo.CellQuantity; cellIndex++)
                    {
                        // Each cell is a 4-byte length followed by that many UTF-8 bytes
                        var cellSize = accessor.ReadInt32(position);
                        var buffer = new byte[cellSize];
                        accessor.ReadArray(position + 4, buffer, 0, cellSize);
                        rowData[cellIndex] = Encoding.UTF8.GetString(buffer);
                        position += 4 + cellSize;
                    }
                    result.Add(rowData);
                    // Advance to the next row; without this every row would be read from offset 0
                    totalReaderOffset += rowInfo.Capacity;
                }
    
                // readerFs and readerMMF are still open here; deleting an open file works on Linux but fails on Windows
                if (File.Exists(path))
                {
                    File.Delete(path);
                }
    

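      If you are considering memory-mapped files, test the performance locally first. On an SSD the performance is quite good: in my overall runs it was no more than one or two times slower than writing directly to memory (bear in mind that high memory usage severely degrades performance and can even lead to an OOM), and there is still plenty of room for optimization.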
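A local comparison could be set up along the following lines. This is only a sketch: `MmfTiming` and its methods are hypothetical names, the payload and repeat count are whatever resembles your real export, and a single `Stopwatch` run is a coarse measurement rather than a proper benchmark.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.IO.MemoryMappedFiles;

public static class MmfTiming
{
    // Time a single run of the given action.
    public static TimeSpan Measure(Action action)
    {
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.Elapsed;
    }

    // Write the payload `repeat` times through a file-backed MMF.
    public static void WriteThroughMmf(string path, byte[] payload, int repeat)
    {
        long capacity = (long)payload.Length * repeat;
        using var fs = new FileStream(path, FileMode.Create, FileAccess.ReadWrite);
        using var mmf = MemoryMappedFile.CreateFromFile(fs, null, capacity,
            MemoryMappedFileAccess.ReadWrite, HandleInheritability.None, true);
        using var accessor = mmf.CreateViewAccessor(0, capacity);
        for (var i = 0; i < repeat; i++)
        {
            accessor.WriteArray((long)i * payload.Length, payload, 0, payload.Length);
        }
    }

    // Baseline: write the same bytes into a growing in-memory buffer.
    public static void WriteToMemory(byte[] payload, int repeat)
    {
        using var ms = new MemoryStream();
        for (var i = 0; i < repeat; i++)
        {
            ms.Write(payload, 0, payload.Length);
        }
    }
}
```

Calling `MmfTiming.Measure(() => MmfTiming.WriteThroughMmf(path, payload, repeat))` and `MmfTiming.Measure(() => MmfTiming.WriteToMemory(payload, repeat))` with the same arguments gives two durations to compare. Running each several times and on the target server's disk gives a more trustworthy picture than one run on a development machine.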

           This post only sketches the idea, and there are many more details to consider. Questions and discussion are welcome.

  • Original article: https://www.cnblogs.com/y-yp/p/12372963.html