
Python: matching Chinese and English text

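Text processing often requires matching Chinese names or English words. In Python this is easy as long as the text is handled as Unicode (for example, UTF-8 input decoded into unicode strings).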
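A minimal sketch of what "handled as Unicode" means in Python 3, assuming the input arrives as UTF-8 bytes (the `raw` value here is hypothetical sample data). The character class `[\u4e00-\u9fa5]` works on code points, so the bytes are decoded to `str` first; applying a `str` pattern to undecoded bytes raises `TypeError`.

```python
import re

# Hypothetical input: raw bytes as they might arrive from a UTF-8 file or socket.
raw = '中国china'.encode('utf-8')

# Decode to str (Unicode) before matching; the class below is defined
# in terms of code points, not UTF-8 byte values.
text = raw.decode('utf-8')

chinese = re.findall(r'[\u4e00-\u9fa5]+', text)
print(chinese)  # ['中国']

# Using a str pattern on the undecoded bytes is an error in Python 3.
try:
    re.findall(r'[\u4e00-\u9fa5]+', raw)
except TypeError as exc:
    print('TypeError:', exc)
```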

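Unicode range of Chinese characters: [\u4e00-\u9fa5] (the original CJK Unified Ideographs set; the block itself now extends to \u9fff)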

Range of English letters: [a-zA-Z]
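To make the two ranges concrete, here is a small Python 3 sketch that tests single characters by code point. The helper names `is_chinese` and `is_english` are hypothetical, introduced only for illustration; comparing one-character strings in Python compares their code points.

```python
def is_chinese(ch: str) -> bool:
    """Hypothetical helper: True if the single character ch is in U+4E00..U+9FA5."""
    return '\u4e00' <= ch <= '\u9fa5'


def is_english(ch: str) -> bool:
    """Hypothetical helper: True if the single character ch is an ASCII letter."""
    return ('a' <= ch <= 'z') or ('A' <= ch <= 'Z')


# Show each character's code point and which range (if any) it falls in.
for ch in '中a美Z1，':
    print(ch, hex(ord(ch)), is_chinese(ch), is_english(ch))
```

Digits and full-width punctuation such as `，` fall outside both ranges, which is why the regular expressions below skip them.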

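With these ranges, matching runs of consecutive Chinese characters or English letters is straightforward. For example (Python 2 interactive session):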

>>> import re
>>> strings = u'中国china美国American'
>>> print strings
中国china美国American
>>> ch_pat = re.compile(ur'[\u4e00-\u9fa5]+')
>>> en_pat = re.compile('[a-zA-Z]+')
>>> ch_words = ch_pat.findall(strings)
>>> en_words = en_pat.findall(strings)
>>> print ch_words
[u'\u4e2d\u56fd', u'\u7f8e\u56fd']
>>> print en_words
[u'china', u'American']
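The session above is Python 2: `print` is a statement and the `ur''` prefix is a syntax error in Python 3. A minimal Python 3 sketch of the same matching, assuming nothing beyond the standard `re` module, uses a plain raw string because `str` is already Unicode. The final alternation pattern is an extra illustration, not part of the original post: it keeps Chinese and English runs in their original order.

```python
import re

strings = '中国china美国American'

# In Python 3, str is Unicode and re understands \uXXXX escapes in the pattern.
ch_pat = re.compile(r'[\u4e00-\u9fa5]+')
en_pat = re.compile(r'[a-zA-Z]+')

ch_words = ch_pat.findall(strings)
en_words = en_pat.findall(strings)
print(ch_words)  # ['中国', '美国']
print(en_words)  # ['china', 'American']

# Alternation returns both kinds of runs in the order they appear.
mixed = re.findall(r'[\u4e00-\u9fa5]+|[a-zA-Z]+', strings)
print(mixed)  # ['中国', 'china', '美国', 'American']
```

Note that Python 3 prints the Chinese words directly, whereas Python 2 shows the `repr` escapes (`u'\u4e2d\u56fd'`) when printing a list.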


Original article: https://www.cnblogs.com/chybot/p/4665389.html