  • Determining whether a file is binary

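    At work I process STL files, and the file I receive is sometimes binary and sometimes ASCII, so I wrote a function that detects the type first and then picks the appropriate way to open the file.

    Without further ado, here is the code: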

    #include <cstdio>   // FILE, fopen, fread, fclose
    #include <cstddef>  // size_t

    enum FileTypeEnum
      { 
        FileTypeUnknown,
        FileTypeBinary,
        FileTypeText
      };
    
    FileTypeEnum
    DetectFileType(const char *filename,
                                unsigned long length,
                                double percent_bin)
    {
      if (!filename || percent_bin < 0)
        {
        return FileTypeUnknown;
        }
    
      FILE *fp = fopen(filename, "rb");
      if (!fp)
        {
        return FileTypeUnknown;
        }
    
      // Allocate buffer and read bytes
    
      unsigned char *buffer = new unsigned char [length];
      size_t read_length = fread(buffer, 1, length, fp);
      fclose(fp);
      if (read_length == 0)
        {
        delete [] buffer;  // do not leak the buffer when nothing could be read
        return FileTypeUnknown;
        }
    
      // Loop over contents and count
    
      size_t text_count = 0;
    
      const unsigned char *ptr = buffer;
      const unsigned char *buffer_end = buffer + read_length;
    
      while (ptr != buffer_end)
        {
        // Text bytes: printable ASCII [0x20, 0x7E] plus newline, carriage return, tab
        if ((*ptr >= 0x20 && *ptr <= 0x7E) ||
            *ptr == '\n' ||
            *ptr == '\r' ||
            *ptr == '\t')
          {
          text_count++;
          }
        ptr++;
        }
    
      delete [] buffer;
    
      double current_percent_bin =
        (static_cast<double>(read_length - text_count) /
         static_cast<double>(read_length));
    
      if (current_percent_bin >= percent_bin)
        {
        return FileTypeBinary;
        }
    
      return FileTypeText;
    }

    Example call:

    DetectFileType(filename, 256, 0.05);
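
    To tie the result back to the original goal (choosing how to open an STL file), here is a minimal sketch that maps the detected type to a stream open mode. The helper name OpenModeFor is hypothetical and not part of the original code, and falling back to binary mode for FileTypeUnknown is an assumption, not something the original post specifies.

```cpp
#include <ios>

// Repeated from the listing above so that this sketch compiles on its own;
// drop this line when combining it with the original code.
enum FileTypeEnum { FileTypeUnknown, FileTypeBinary, FileTypeText };

// Hypothetical helper: pick the std::ifstream open mode for a detected type.
inline std::ios_base::openmode OpenModeFor(FileTypeEnum type)
{
  switch (type)
  {
    case FileTypeText:
      // ASCII file: text mode is enough.
      return std::ios_base::in;
    case FileTypeBinary:
    case FileTypeUnknown:
    default:
      // Assumption: when unsure, binary mode is the safer choice because
      // it performs no newline translation.
      return std::ios_base::in | std::ios_base::binary;
  }
}
```

    A caller could then write something like std::ifstream stl(filename, OpenModeFor(DetectFileType(filename, 256, 0.05))); and branch to the matching STL parser.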

    The idea behind the algorithm is simple:

    Up to 'length' bytes are read from the file. If at least the fraction 'percent_bin' of those bytes are non-textual (0.05 means 5%), the file is considered binary, otherwise textual. Textual elements are bytes in the ASCII [0x20, 0x7E] range, plus '\n', '\r' and '\t'.

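    In other words, a block of bytes is read from the file and the non-text bytes in it are counted; if their share of the block reaches percent_bin, the file is treated as binary.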
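
    As a worked example of the threshold, take the call DetectFileType(filename, 256, 0.05) on a file of at least 256 bytes: 256 × 0.05 = 12.8, so 13 or more non-text bytes make the sample binary (13/256 ≈ 0.051), while 12 do not (12/256 ≈ 0.047). The helper below only illustrates this arithmetic; its name is hypothetical, and at exact boundaries floating-point rounding may differ slightly from the comparison in the listing.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper: smallest number of non-text bytes for which
// non_text / read_length >= percent_bin holds.
inline std::size_t MinNonTextBytesForBinary(std::size_t read_length,
                                            double percent_bin)
{
  return static_cast<std::size_t>(
    std::ceil(static_cast<double>(read_length) * percent_bin));
}
```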

    Text characters here are '\n', '\r', '\t' and the bytes whose ASCII value lies in the range [0x20, 0x7E].

    The file never has to be read into memory in full; only the first 'length' bytes are sampled.
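
    The same sampling idea can also be sketched with std::ifstream and std::vector, so that the buffer is released automatically on every return path. This is an alternative sketch, not the original author's code: the names FileKind, IsTextByte, ClassifySample and DetectFileKind are hypothetical, and the default values simply mirror the example call.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical names; not part of the original listing.
enum class FileKind { Unknown, Binary, Text };

// True for printable ASCII [0x20, 0x7E] plus '\n', '\r' and '\t'.
inline bool IsTextByte(unsigned char c)
{
  return (c >= 0x20 && c <= 0x7E) || c == '\n' || c == '\r' || c == '\t';
}

// Classify an in-memory sample: Binary when the fraction of non-text
// bytes is at least binary_fraction, Text otherwise.
inline FileKind ClassifySample(const std::vector<unsigned char>& sample,
                               double binary_fraction)
{
  if (sample.empty() || binary_fraction < 0)
  {
    return FileKind::Unknown;
  }
  std::size_t non_text = 0;
  for (unsigned char c : sample)
  {
    if (!IsTextByte(c))
    {
      ++non_text;
    }
  }
  const double fraction =
    static_cast<double>(non_text) / static_cast<double>(sample.size());
  return fraction >= binary_fraction ? FileKind::Binary : FileKind::Text;
}

// Read at most max_bytes from the start of the file and classify them.
// The rest of the file is never loaded.
inline FileKind DetectFileKind(const std::string& path,
                               std::size_t max_bytes = 256,
                               double binary_fraction = 0.05)
{
  std::ifstream in(path, std::ios::in | std::ios::binary);
  if (!in || max_bytes == 0)
  {
    return FileKind::Unknown;
  }
  std::vector<unsigned char> sample(max_bytes);
  in.read(reinterpret_cast<char*>(sample.data()),
          static_cast<std::streamsize>(sample.size()));
  // gcount() reports how many bytes were actually read, even at EOF.
  sample.resize(static_cast<std::size_t>(in.gcount()));
  return ClassifySample(sample, binary_fraction);
}
```

    Splitting the classification from the file I/O keeps the counting logic testable on plain byte vectors.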

  • Original article: https://www.cnblogs.com/brother-louie/p/13976570.html