  • PHP: extract n characters from a UTF-8 file, starting at a given character position

    ucut.php:

    #!/usr/bin/php
    <?php
    define('INPUT_FILE', 't.txt');
    define('OUTPUT_FILE', 'a.txt');
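    // $pos: index of the first character to extract (0-based); $len: number of characters.
    // A missing argument is treated as 0 instead of raising an undefined-offset notice.
    $pos = max(intval(isset($argv[1]) ? $argv[1] : 0), 0);
    $len = max(intval(isset($argv[2]) ? $argv[2] : 0), 0);
    $file_size = filesize(INPUT_FILE);
    // A file never holds more characters than bytes, so nothing can be extracted here.
    if($pos >= $file_size) exit;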
    $fp = fopen(INPUT_FILE, 'rb');
    $point = 0; // index of the current character (counted in characters, not bytes)
    $string = '';
    while(ftell($fp) < $file_size) {
        if($point >= $pos + $len) break;
        $byte = fread($fp, 1);
        // numeric value of the lead byte; ord() avoids the PHP >= 5.4
        // requirement of the unpack('C', $byte)[1] form
        $char = ord($byte);
        if($char <= 0x7f) {
            //single byte
            if($point >= $pos) $string .= $byte;
            $point += 1;
            continue;
        } elseif($char >= 0xc0 && $char <= 0xdf) {
            //double bytes
            if($point >= $pos) {
                $string .= $byte.fread($fp, 1);
            } else {
                fseek($fp, 1, SEEK_CUR);
            }
            $point += 1;
            continue;
        } elseif($char >= 0xe0 && $char <= 0xef) {
            //three bytes
            if($point >= $pos) {
                $string .= $byte.fread($fp, 2);
            } else {
                fseek($fp, 2, SEEK_CUR);
            }
            $point += 1;
            continue;
        } elseif($char >= 0xf0 && $char <= 0xf7) {
            //four bytes
            if($point >= $pos) {
                $string .= $byte.fread($fp, 3);
            } else {
                fseek($fp, 3, SEEK_CUR);
            }
            $point += 1;
            continue;
        }
    }
    fclose($fp);
    file_put_contents(OUTPUT_FILE, $string);
    ?>
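
The lead-byte ranges tested in the loop above can be condensed into a small helper. The sketch below is not part of ucut.php: `utf8_seq_len` and `utf8_cut` are hypothetical names, it works on a string in memory instead of a file handle, and it assumes PHP 7 or later (for the type declarations) and a source file saved as UTF-8. Like the script, it skips stray continuation bytes without counting them as characters.

```php
<?php
// Hypothetical helper: byte length of a UTF-8 sequence, derived from its
// lead byte. Returns 0 for a byte that cannot start a sequence.
function utf8_seq_len(int $lead): int
{
    if ($lead <= 0x7f) return 1;                    // 0xxxxxxx
    if ($lead >= 0xc0 && $lead <= 0xdf) return 2;   // 110xxxxx
    if ($lead >= 0xe0 && $lead <= 0xef) return 3;   // 1110xxxx
    if ($lead >= 0xf0 && $lead <= 0xf7) return 4;   // 11110xxx
    return 0; // continuation byte (0x80-0xbf) or invalid lead byte
}

// Hypothetical in-memory counterpart of the file loop: returns $len
// characters of $s starting at character index $pos (0-based).
function utf8_cut(string $s, int $pos, int $len): string
{
    $out = '';
    $n = strlen($s);   // length in bytes
    $index = 0;        // index of the current character
    for ($i = 0; $i < $n && $index < $pos + $len; ) {
        $size = utf8_seq_len(ord($s[$i]));
        if ($size === 0) {
            $i++;      // stray byte: skip it without counting a character
            continue;
        }
        if ($index >= $pos) {
            $out .= substr($s, $i, $size);
        }
        $i += $size;
        $index++;
    }
    return $out;
}

echo bin2hex(utf8_cut("dei小五5维在fe测试修字d集合啊", 7, 2)), "\n"; // e59ca866
```

Characters 7 and 8 of the test string are 在 (bytes e5 9c a8) and f (byte 66), which is where the hex value comes from.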

    Contents of the test file t.txt:

    dei小五5维在fe测试修字d集合啊

    Test command:

    ./ucut.php 7 2

    Command to inspect the result:

    hexdump -C t.txt && hexdump -C a.txt
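
Besides reading the hexdump by eye, the expected output of the test command can be computed independently. The following is only a sketch of one way to do that: it splits the sample text into characters with PCRE's `/u` modifier (assuming PCRE was built with UTF-8 support, as in standard PHP builds) and compares the result with a.txt if that file exists in the current directory. The variable names are illustrative.

```php
<?php
// Sample text copied from t.txt; this source file must be saved as UTF-8.
$text = "dei小五5维在fe测试修字d集合啊";

// Split into an array of characters; the /u modifier makes PCRE treat the
// subject as UTF-8 instead of single bytes.
$chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);

// Same arguments as "./ucut.php 7 2": start at character 7, take 2 characters.
$expected = implode('', array_slice($chars, 7, 2));
echo $expected, ' ', bin2hex($expected), "\n";

// Compare with the script's output, if it has been run here.
if (is_file('a.txt')) {
    var_dump(file_get_contents('a.txt') === $expected);
}
```

If the script behaves as intended, a.txt holds the same four bytes (e5 9c a8 66) that this check prints.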

  • Original post: https://www.cnblogs.com/unsea/p/2795273.html