  • A solution for reading very large Excel files in PHP

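    Scenario and pain points

    Background

    An old classmate got in touch with me today. His company currently handles all of its logistics business in Excel. With one workbook per month and a large volume of data, a single file holds close to a million rows and is nearly 100 MB, so even opening and searching it is very slow.

    This brings to mind a common scenario: importing data from an Excel file that may be very large. Common approaches such as PHPExcel load the whole workbook into memory and therefore often run into execution-time and memory limits.

    Below, we read a large Excel file with SpreadsheetReader, a library that processes the file as a stream.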
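
    The point of stream processing is that rows are read and handed over one at a time, so memory use does not grow with the file size. The snippet below is a minimal sketch of that principle using a PHP generator over a plain CSV file. CSV is only a stand-in chosen so the example runs on its own, and streamRows() is a hypothetical helper, not part of SpreadsheetReader. (An .xlsx file is a ZIP archive of XML parts, so a real XLSX streaming reader also has to deal with that container.)

```php
<?php
// Hypothetical helper (not a SpreadsheetReader API): yield one CSV row at a
// time, so only the current row is held in memory however large the file is.
function streamRows(string $path): Generator
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        // Explicit separator/enclosure/escape arguments avoid version-dependent defaults.
        while (($row = fgetcsv($handle, null, ',', '"', '')) !== false) {
            yield $row;
        }
    } finally {
        fclose($handle); // runs when the loop ends or the generator is destroyed
    }
}

// Build a throwaway sample file with 100,000 rows (illustrative data only).
$path = tempnam(sys_get_temp_dir(), 'rows');
$out = fopen($path, 'wb');
for ($i = 0; $i < 100000; $i++) {
    fputcsv($out, [$i, "name$i", $i * 2], ',', '"', '');
}
fclose($out);

$before = memory_get_usage();
$count = 0;
$sum = 0;
foreach (streamRows($path) as $row) {
    $count++;
    $sum += (int) $row[2]; // aggregate without keeping earlier rows
}
$growth = memory_get_usage() - $before;
unlink($path);

echo "rows: $count, sum of column 3: $sum, memory growth: $growth bytes" . PHP_EOL;
```

    Any per-row work (validation, aggregation, a database write) goes inside the foreach; collecting every row into an array first would bring the memory problem back.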

    Implementation

    Notes

    The key points are explained in the code comments.

    Code

    
    <?php
    /**
     * Created by PhpStorm.
     * User: qkl
     * Date: 2018/7/11
     * Time: 15:14
     */
    
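    set_time_limit(0);   // Remove the execution time limit (0 = never time out)
    //ini_set('memory_limit','200M');    // Temporarily raise the memory limit if needed
    
    // Format a byte count as a human-readable string, e.g. 2048 -> "2 kb"
    function convert($size)
    {
        $unit = array('b', 'kb', 'mb', 'gb', 'tb', 'pb');
        if ($size <= 0) {
            return '0 b'; // log(0) is -INF, so handle zero explicitly
        }
        // Pick the largest unit that keeps the value >= 1, capped at 'pb'
        $i = min((int) floor(log($size, 1024)), count($unit) - 1);
        return round($size / pow(1024, $i), 2) . ' ' . $unit[$i];
    }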
    
    require '../vendor/autoload.php';
    
    $start = memory_get_usage();
    echo convert($start) . PHP_EOL;
    //$inputFileName = './11111111.xlsx';
    $inputFileName = './example1.xlsx';
    
    // If you need to parse XLS files, include php-excel-reader
    
    $startTime = microtime(true);
    
    $Reader = new SpreadsheetReader($inputFileName);
    
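    // Get the list of all worksheets in the file
    $sheets = $Reader->Sheets();
    if (!$sheets) {
        die("No worksheets found");
    }
    
    // Switch to the worksheet to process (index 0 = first sheet)
    $Reader->ChangeSheet(0);
    
    // Dump the row the reader is currently positioned on in this worksheet
    var_dump($Reader->current());
    
    // The reader class implements Iterator, so rows can be processed with foreach.
    // Note: on a very large file this loop is slow, but it does not cause memory problems.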
    //$i = 0;
    //foreach ($Reader as $Row)
    //{
    //    if ($i>=3) {
    //        break;
    //    }
    //
    //    echo $i . PHP_EOL;
    //    print_r($Row);
    //
    //    $i++;
    //}
    
    $endTime = microtime(true);
    $memoryUse = memory_get_usage();
    
    echo "Memory usage: " . convert($memoryUse) . "; time: " . ($endTime - $startTime) . PHP_EOL;
    
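
    Since the reader implements Iterator, its rows can be fed to any code that accepts an iterable. For an actual import, rows are usually written in batches rather than one at a time. The sketch below assumes only an iterable of rows: chunkRows() and fakeRows() are hypothetical helpers (fakeRows() stands in for a SpreadsheetReader instance so the snippet runs without the library or a workbook), and LimitIterator is PHP's built-in SPL class, shown as an alternative to the manual $i counter in the commented-out loop.

```php
<?php
// Hypothetical helper: group any iterable of rows (for example a
// SpreadsheetReader instance) into arrays of at most $size rows.
function chunkRows(iterable $rows, int $size): Generator
{
    if ($size < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1');
    }
    $batch = [];
    foreach ($rows as $row) {
        $batch[] = $row;
        if (count($batch) === $size) {
            yield $batch; // a full batch is ready
            $batch = [];
        }
    }
    if ($batch !== []) {
        yield $batch; // trailing partial batch
    }
}

// Hypothetical stand-in for the reader: yields $n fake rows lazily.
function fakeRows(int $n): Generator
{
    for ($i = 1; $i <= $n; $i++) {
        yield [$i, "row $i"];
    }
}

// Same effect as the commented-out "stop after 3 rows" loop, using SPL's LimitIterator.
$firstRows = iterator_to_array(new LimitIterator(fakeRows(10), 0, 3), false);
print_r($firstRows);

// Batch the rows; a real import might issue one multi-row INSERT per batch here.
$batchSizes = [];
foreach (chunkRows(fakeRows(2500), 1000) as $batch) {
    $batchSizes[] = count($batch);
}
echo implode(',', $batchSizes) . PHP_EOL; // 1000,1000,500
```

    Because both helpers are generators, only the current batch is held in memory, so batching keeps the low memory profile of the streaming reader. The batch size of 1000 is an arbitrary example value.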

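    Results

    Test notes

    The example1.xlsx file read above is about 100 MB, and reading through all of it is very slow, so the test only reads the current row of the default worksheet. The cell values are masked because the data is sensitive.

    Logged output (memory usage before and after, plus the dumped row)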

    
    147.77 kb
    array (size=50)
      0 => string 'xxxxxxxxxxxxxx' (length=25)
      1 => string 'xxxxxxxxxxxxxx' (length=15)
      2 => string 'xxxxxxxxxxxxxx' (length=18)
      3 => string 'xxxxxxxxxxxxxx' (length=12)
      4 => string 'xxxxxxxxxxxxxx' (length=12)
      5 => string 'xxxxxxxxxxxxxx' (length=12)
      6 => string 'xxxxxxxxxxxxxx' (length=24)
      7 => string 'xxxxxxxxxxxxxx' (length=12)
      8 => string 'xxxxxxxxxxxxxx' (length=27)
      9 => string 'xxxxxxxxxxxxxx' (length=12)
      10 => string 'xxxxxxxxxxxxxx' (length=15)
      11 => string 'xxxxxxxxxxxxxx' (length=28)
      12 => string 'xxxxxxxxxxxxxx' (length=9)
      13 => string 'xxxxxxxxxxxxxx' (length=12)
      14 => string 'xxxxxxxxxxxxxx' (length=9)
      15 => string 'xxxxxxxxxxxxxx' (length=6)
      16 => string 'xxxxxxxxxxxxxx' (length=9)
      17 => string 'xxxxxxxxxxxxxx' (length=3)
      18 => string 'xxxxxxxxxxxxxx' (length=6)
      19 => string 'xxxxxxxxxxxxxx' (length=3)
      20 => string 'xxxxxxxxxxxxxx' (length=15)
      21 => string 'xxxxxxxxxxxxxx' (length=15)
      22 => string 'xxxxxxxxxxxxxx' (length=19)
      23 => string 'xxxxxxxxxxxxxx' (length=13)
      24 => string 'xxxxxxxxxxxxxx' (length=19)
      25 => string 'xxxxxxxxxxxxxx' (length=12)
      26 => string 'xxxxxxxxxxxxxx' (length=12)
      27 => string 'xxxxxxxxxxxxxx' (length=12)
      28 => string 'xxxxxxxxxxxxxx' (length=6)
      29 => string 'xxxxxxxxxxxxxx' (length=12)
      30 => string 'xxxxxxxxxxxxxx' (length=6)
      31 => string 'xxxxxxxxxxxxxx' (length=15)
      32 => string 'xxxxxxxxxxxxxx' (length=24)
      33 => string 'xxxxxxxxxxxxxx' (length=18)
      34 => string 'xxxxxxxxxxxxxx' (length=18)
      35 => string 'xxxxxxxxxxxxxx' (length=24)
      36 => string 'xxxxxxxxxxxxxx' (length=12)
      37 => string 'xxxxxxxxxxxxxx' (length=18)
      38 => string 'xxxxxxxxxxxxxx' (length=21)
      39 => string 'xxxxxxxxxxxxxx' (length=9)
      40 => string 'xxxxxxxxxxxxxx' (length=9)
      41 => string 'xxxxxxxxxxxxxx' (length=18)
      42 => string 'xxxxxxxxxxxxxx' (length=21)
      43 => string 'xxxxxxxxxxxxxx' (length=15)
      44 => string 'xxxxxxxxxxxxxx' (length=12)
      45 => string 'xxxxxxxxxxxxxx' (length=6)
      46 => string 'xxxxxxxxxxxxxx' (length=12)
      47 => string 'xxxxxxxxxxxxxx' (length=22)
      48 => string 'xxxxxxxxxxxxxx' (length=22)
      49 => string '' (length=0)
    
    Memory usage: 207.55 kb; time: 9.5835480690002
    
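
    The script reports memory_get_usage() at the end, which is only the memory still allocated at that moment; anything allocated and freed during parsing does not show up there. memory_get_peak_usage() records the high-water mark, which is closer to what matters when sizing memory_limit. A minimal sketch with synthetic integers (numbers() is a hypothetical helper) contrasting a streamed pass with a fully materialised array:

```php
<?php
// Hypothetical helper: yield 1..$n lazily, one value at a time.
function numbers(int $n): Generator
{
    for ($i = 1; $i <= $n; $i++) {
        yield $i;
    }
}

$n = 200000;
$basePeak = memory_get_peak_usage();

// 1) Streamed pass first, because the recorded peak can only go up.
$streamSum = 0;
foreach (numbers($n) as $value) {
    $streamSum += $value;
}
$streamPeakGrowth = memory_get_peak_usage() - $basePeak;

// 2) Materialised: all values are held in one array at the same time.
$all = range(1, $n);
$arraySum = array_sum($all);
unset($all); // current usage drops again here ...
$arrayPeakGrowth = memory_get_peak_usage() - $basePeak; // ... but the peak keeps the high-water mark

echo "streamed peak growth: $streamPeakGrowth bytes" . PHP_EOL;
echo "array peak growth: $arrayPeakGrowth bytes" . PHP_EOL;
```

    Logging both figures in a test like the one above would show whether the reader ever held much more than the final value suggests.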

    Original article: https://segmentfault.com/a/1190000015601758

  • Original post: https://www.cnblogs.com/lalalagq/p/9980048.html