The difference between str.casefold and str.lower
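Python 3.3 introduced the str.casefold method. Its effect is very similar to str.lower: both can turn a string into lowercase. So how do they differ, and when should each one be used?
In short: use casefold when handling Unicode text whose case distinctions need to be ignored.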
str.casefold
Official documentation:
Casefolding is similar to lowercasing but more aggressive because it is intended to remove all case distinctions in a string. For example, the German lowercase letter 'ß' is equivalent to "ss". Since it is already lowercase, lower() would do nothing to 'ß'; casefold() converts it to "ss".
The casefolding algorithm is described in section 3.13 of the Unicode Standard.
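lower() is not limited to ASCII 'A-Z': it lowercases every cased Unicode character. What it cannot do is remove case distinctions from characters that are already lowercase but have a different caseless form in some languages. The example in the documentation is the German 'ß': it is already lowercase, so lower() leaves it unchanged, while its case-folded form is 'ss':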
s = 'ß'
s.lower() # 'ß'
s.casefold() # 'ss'
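The practical consequence shows up when two strings are compared without regard to case. Below is a minimal sketch; the helper name `equal_ignoring_case` is hypothetical and only serves the illustration, and it does not perform any Unicode normalization.

```python
def equal_ignoring_case(a: str, b: str) -> bool:
    """Compare two strings caselessly using str.casefold (hypothetical helper)."""
    return a.casefold() == b.casefold()


word_mixed = "Straße"
word_upper = "STRASSE"

# lower() keeps 'ß', so the two spellings still differ.
same_with_lower = word_mixed.lower() == word_upper.lower()

# casefold() maps 'ß' to 'ss', so the two spellings match.
same_with_casefold = equal_ignoring_case(word_mixed, word_upper)

print(same_with_lower, same_with_casefold)
```

Comparing the case-folded forms of both sides is the important part: folding only one side would not make the strings match.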
str.lower
Official documentation:
Return a copy of the string with all the cased characters converted to lowercase.
The lowercasing algorithm used is described in section 3.13 of the Unicode Standard.
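As a quick check of what "all the cased characters" covers, the sketch below applies lower() to a few non-ASCII strings. The sample strings and variable names are chosen only for illustration.

```python
# lower() follows the Unicode lowercasing rules, not just ASCII 'A-Z'.
german = "ÄÖÜ".lower()       # German umlauts
greek = "ΑΒΓ".lower()        # Greek capital letters
russian = "ПРИВЕТ".lower()   # Cyrillic capital letters

# Characters without case, such as Chinese, are returned unchanged.
chinese = "中文".lower()

print(german, greek, russian, chinese)
```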
References
https://docs.python.org/3/library/stdtypes.html#str.casefold
https://segmentfault.com/q/1010000004586740/a-1020000004586838
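Summary
For Chinese and English text, lower() is still fine. When handling other languages that have letter case, especially when comparing strings without regard to case, use casefold().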