  • How to remove bad path characters in Python?

    Unfortunately, the set of acceptable characters varies by OS and by filesystem.

    • Windows:

      • Use almost any character in the current code page for a name, including Unicode characters and characters in the extended character set (128–255), except for the following:
        • The following reserved characters are not allowed:
          < > : " / \ | ? *
        • Characters whose integer representations are in the range from zero through 31 are not allowed.
        • Any other character that the target file system does not allow.

      The list of accepted characters can vary depending on the OS and locale of the machine that first formatted the filesystem.

      .NET has GetInvalidFileNameChars and GetInvalidPathChars, but I don't know how to call those from Python.
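    The Windows rules listed above can be approximated in plain Python without calling .NET. The following is a minimal sketch, not an equivalent of GetInvalidFileNameChars: the names WINDOWS_RESERVED_CHARS, find_invalid_windows_chars and replace_invalid_windows_chars are made up for illustration, and the set includes the backslash on the assumption that it must be rejected because Windows treats it as a path separator. It does not cover filesystem-specific restrictions.

```python
# Reserved punctuation from the Windows naming rules, plus the backslash
# (assumed here because Windows uses it as a path separator).
WINDOWS_RESERVED_CHARS = frozenset('<>:"/\\|?*')


def _is_invalid(ch):
    # Control characters (code points 0 through 31) are rejected as well.
    return ch in WINDOWS_RESERVED_CHARS or ord(ch) < 32


def find_invalid_windows_chars(name):
    """Return the characters in `name` that the rules above reject."""
    return [ch for ch in name if _is_invalid(ch)]


def replace_invalid_windows_chars(name, replacement='_'):
    """Return `name` with every rejected character replaced."""
    return ''.join(replacement if _is_invalid(ch) else ch for ch in name)


print(find_invalid_windows_chars('a<b>.txt'))
print(replace_invalid_windows_chars('re:port?.txt'))
```

    A check like this only catches the documented character rules; the target filesystem may still refuse a name that passes it.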

    • Mac OS: NUL is always excluded, "/" is excluded from POSIX layer, ":" excluded from Apple APIs
      • HFS+: any sequence of non-excluded characters that is representable by UTF-16 in the Unicode 2.0 spec
      • HFS: any sequence of non-excluded characters representable in MacRoman (default) or other encodings, depending on the machine that created the filesystem
      • UFS: same as HFS+
    • Linux:
      • native (UNIX-like) filesystems: any byte sequence excluding NUL and "/"
      • FAT, NTFS, other non-native filesystems: varies

    Your best bet is probably either to be overly conservative on all platforms, or to just try creating the file and handle any errors.

    import re
    re.sub(r'[^\w\-_. ]', '_', filename)
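    The other option mentioned above, trying to create the file and handling the error, could look roughly like this. It is a sketch under stated assumptions: can_create is a hypothetical helper name, the probe file is deleted again after the check, and a name that already exists in the directory is reported as not creatable.

```python
import os


def can_create(directory, filename):
    """Return True if a new file called `filename` can be created in `directory`."""
    # Reject anything containing a path separator so the probe stays inside
    # `directory`.
    if os.path.basename(filename) != filename:
        return False
    path = os.path.join(directory, filename)
    try:
        # Mode 'x' refuses to overwrite, so existing files are never touched.
        with open(path, 'x'):
            pass
    except (OSError, ValueError):
        # OSError: the OS or filesystem rejected the name (or it already exists).
        # ValueError: the name contains an embedded NUL character.
        return False
    os.remove(path)  # clean up the probe file
    return True
```

    This asks the filesystem that will actually store the file, so it also catches restrictions that no fixed character list can know about.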

    Reference: https://stackoverflow.com/questions/1033424/how-to-remove-bad-path-characters-in-python

    Turn a string into a valid filename?

    import unicodedata
    import re
    
    def slugify(value, allow_unicode=False):
        """
        Taken from https://github.com/django/django/blob/master/django/utils/text.py
        Convert to ASCII if 'allow_unicode' is False. Convert spaces or repeated
        dashes to single dashes. Remove characters that aren't alphanumerics,
        underscores, or hyphens. Convert to lowercase. Also strip leading and
        trailing whitespace, dashes, and underscores.
        """
        value = str(value)
        if allow_unicode:
            value = unicodedata.normalize('NFKC', value)
        else:
            value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode('ascii')
        value = re.sub(r'[^\w\s-]', '', value.lower())
        return re.sub(r'[-\s]+', '-', value).strip('-_')

    https://stackoverflow.com/questions/295135/turn-a-string-into-a-valid-filename
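    To see what the two normalization branches in slugify do on their own, here is a small sketch. The helper name to_ascii is made up for illustration; it isolates the NFKD-then-ASCII step used when allow_unicode is False.

```python
import unicodedata


def to_ascii(text):
    """Decompose accented characters (NFKD), then drop anything non-ASCII."""
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')


print(to_ascii('Crème Brûlée'))               # Creme Brulee
print(to_ascii('日本語.txt'))                  # .txt
print(unicodedata.normalize('NFKC', 'ﬁle'))   # file (the "fi" ligature is expanded)
```

    Text with no ASCII decomposition is dropped entirely in the ASCII branch, which is why a name written in a non-Latin script can collapse to an empty string unless allow_unicode is True.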

  • Original article: https://www.cnblogs.com/profesor/p/14631647.html