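When checking text we often need to find the numbers in it, but the text may contain Chinese number strings such as “两亿三千万”, which must be converted to Arabic numerals before a program can work with them. This article presents a method for converting Chinese or English number words to Arabic numerals.
Chinese and English are simply the two languages I know; from a program-design point of view the method applies to many languages, as long as the database is configured accordingly.
Chinese to Arabic numerals: applies to languages that do not need a delimiter between words, such as Chinese, Japanese, and Korean (just configure the corresponding database).
English to Arabic numerals: applies to languages that separate words with spaces, such as English, French, and German (just configure the corresponding database).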
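The two families differ mainly in how a number string is cut into tokens before the table lookup. A minimal sketch of that difference follows; the helper name `tokenize` and its `delimited` flag are assumptions made for illustration and do not come from the original code.

```python
from typing import List


def tokenize(text: str, delimited: bool) -> List[str]:
    """Cut a number string into lookup tokens.

    delimited=True:  words are separated by whitespace (English-style).
    delimited=False: every non-space character is its own token (Chinese-style).
    """
    if delimited:
        # Lower-case so that "Two" and "two" hit the same table entry.
        return text.lower().split()
    return [ch for ch in text if not ch.isspace()]
```

For example, `tokenize("Two hundred and thirty", True)` yields four word tokens, while `tokenize("三千二百万", False)` yields five single-character tokens.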
1. Design the language lookup tables
t_SYS_Num

| fid | ftext | fvalue | ftype |
|---|---|---|---|
| 1 | 零 | 0 | 1 |
| 2 | 一 | 1 | 1 |
| 3 | 二 | 2 | 1 |
| 4 | 三 | 3 | 1 |
| 5 | 四 | 4 | 1 |
| 6 | 五 | 5 | 1 |
| 7 | 六 | 6 | 1 |
| 8 | 七 | 7 | 1 |
| 9 | 八 | 8 | 1 |
| 10 | 九 | 9 | 1 |
| 11 | 壹 | 1 | 1 |
| 12 | 贰 | 2 | 1 |
| 13 | 叁 | 3 | 1 |
| 14 | 肆 | 4 | 1 |
| 15 | 伍 | 5 | 1 |
| 16 | 陆 | 6 | 1 |
| 17 | 柒 | 7 | 1 |
| 18 | 捌 | 8 | 1 |
| 19 | 玖 | 9 | 1 |
| 22 | one | 1 | 0 |
| 23 | two | 2 | 0 |
| 24 | three | 3 | 0 |
| 25 | four | 4 | 0 |
| 26 | five | 5 | 0 |
| 27 | six | 6 | 0 |
| 28 | seven | 7 | 0 |
| 29 | eight | 8 | 0 |
| 30 | nine | 9 | 0 |
| 31 | ten | 10 | 0 |
| 32 | eleven | 11 | 0 |
| 33 | twelve | 12 | 0 |
| 34 | thirteen | 13 | 0 |
| 35 | fourteen | 14 | 0 |
| 36 | fifteen | 15 | 0 |
| 37 | sixteen | 16 | 0 |
| 38 | seventeen | 17 | 0 |
| 39 | eighteen | 18 | 0 |
| 40 | nineteen | 19 | 0 |
| 41 | twenty | 20 | 0 |
| 42 | twenty-one | 21 | 0 |
| 43 | twenty-two | 22 | 0 |
| 44 | twenty-three | 23 | 0 |
| 45 | twenty-four | 24 | 0 |
| 46 | twenty-five | 25 | 0 |
| 47 | twenty-six | 26 | 0 |
| 48 | twenty-seven | 27 | 0 |
| 49 | twenty-eight | 28 | 0 |
| 50 | twenty-nine | 29 | 0 |
| 51 | thirty | 30 | 0 |
| 52 | thirty-one | 31 | 0 |
| 53 | thirty-two | 32 | 0 |
| 54 | thirty-three | 33 | 0 |
| 55 | thirty-four | 34 | 0 |
| 56 | thirty-five | 35 | 0 |
| 57 | thirty-six | 36 | 0 |
| 58 | thirty-seven | 37 | 0 |
| 59 | thirty-eight | 38 | 0 |
| 60 | thirty-nine | 39 | 0 |
| 61 | forty | 40 | 0 |
| 62 | forty-one | 41 | 0 |
| 63 | forty-two | 42 | 0 |
| 64 | forty-three | 43 | 0 |
| 65 | forty-four | 44 | 0 |
| 66 | forty-five | 45 | 0 |
| 67 | forty-six | 46 | 0 |
| 68 | forty-seven | 47 | 0 |
| 69 | forty-eight | 48 | 0 |
| 70 | forty-nine | 49 | 0 |
| 71 | fifty | 50 | 0 |
| 72 | fifty-one | 51 | 0 |
| 73 | fifty-two | 52 | 0 |
| 74 | fifty-three | 53 | 0 |
| 75 | fifty-four | 54 | 0 |
| 76 | fifty-five | 55 | 0 |
| 77 | fifty-six | 56 | 0 |
| 78 | fifty-seven | 57 | 0 |
| 79 | fifty-eight | 58 | 0 |
| 80 | fifty-nine | 59 | 0 |
| 81 | sixty | 60 | 0 |
| 82 | sixty-one | 61 | 0 |
| 83 | sixty-two | 62 | 0 |
| 84 | sixty-three | 63 | 0 |
| 85 | sixty-four | 64 | 0 |
| 86 | sixty-five | 65 | 0 |
| 87 | sixty-six | 66 | 0 |
| 88 | sixty-seven | 67 | 0 |
| 89 | sixty-eight | 68 | 0 |
| 90 | sixty-nine | 69 | 0 |
| 91 | seventy | 70 | 0 |
| 92 | seventy-one | 71 | 0 |
| 93 | seventy-two | 72 | 0 |
| 94 | seventy-three | 73 | 0 |
| 95 | seventy-four | 74 | 0 |
| 96 | seventy-five | 75 | 0 |
| 97 | seventy-six | 76 | 0 |
| 98 | seventy-seven | 77 | 0 |
| 99 | seventy-eight | 78 | 0 |
| 100 | seventy-nine | 79 | 0 |
| 101 | eighty | 80 | 0 |
| 102 | eighty-one | 81 | 0 |
| 103 | eighty-two | 82 | 0 |
| 104 | eighty-three | 83 | 0 |
| 105 | eighty-four | 84 | 0 |
| 106 | eighty-five | 85 | 0 |
| 107 | eighty-six | 86 | 0 |
| 108 | eighty-seven | 87 | 0 |
| 109 | eighty-eight | 88 | 0 |
| 110 | eighty-nine | 89 | 0 |
| 111 | ninety | 90 | 0 |
| 112 | ninety-one | 91 | 0 |
| 113 | ninety-two | 92 | 0 |
| 114 | ninety-three | 93 | 0 |
| 115 | ninety-four | 94 | 0 |
| 116 | ninety-five | 95 | 0 |
| 117 | ninety-six | 96 | 0 |
| 118 | ninety-seven | 97 | 0 |
| 119 | ninety-eight | 98 | 0 |
| 120 | ninety-nine | 99 | 0 |
| 121 | 十 | 10 | 1 |
| 122 | 拾 | 10 | 1 |
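fId: primary key (sequence number)
fText: the number word in the given language
fValue: the numeric value it stands for
fType: language category (in the rows above, 1 is Chinese and 0 is English)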
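For the lookup itself, the rows can be held in memory as a mapping from ftext to fvalue, one mapping per language category. The sketch below assumes the rows have already been read from the database; `NumRow`, `SAMPLE_ROWS`, and `build_lookup` are hypothetical names, and only a handful of rows are copied from the table for brevity.

```python
from typing import Dict, Iterable, NamedTuple


class NumRow(NamedTuple):
    fid: int
    ftext: str
    fvalue: int
    ftype: int  # 1 = Chinese, 0 = English in the sample data


# A few rows copied from t_SYS_Num; a real system would load all of them.
SAMPLE_ROWS = [
    NumRow(4, "三", 3, 1),
    NumRow(13, "叁", 3, 1),
    NumRow(24, "three", 3, 0),
    NumRow(42, "twenty-one", 21, 0),
    NumRow(121, "十", 10, 1),
]


def build_lookup(rows: Iterable[NumRow], ftype: int) -> Dict[str, int]:
    """Map ftext -> fvalue for a single language category."""
    return {row.ftext: row.fvalue for row in rows if row.ftype == ftype}
```

With this shape, the ordinary and the formal (banker's) Chinese digits such as 三 and 叁 simply become two keys with the same value.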
T_SYS_Unit

| fid | funit | fvalue | flevel | fisspace | ftype |
|---|---|---|---|---|---|
| 1 | 十 | 10 | 1 | No | 1 |
| 2 | 百 | 100 | 2 | No | 1 |
| 3 | 千 | 1000 | 3 | No | 1 |
| 4 | 万 | 10000 | 4 | Yes | 1 |
| 5 | 亿 | 100000000 | 5 | Yes | 1 |
| 6 | 拾 | 10 | 1 | No | 1 |
| 7 | 佰 | 100 | 2 | No | 1 |
| 8 | 仟 | 1000 | 3 | No | 1 |
| 9 | 萬 | 10000 | 4 | Yes | 1 |
| 10 | 億 | 100000000 | 5 | Yes | 1 |
| 11 | hundred | 100 | 1 | No | 0 |
| 12 | thousand | 1000 | 2 | No | 0 |
| 13 | million | 1000000 | 3 | Yes | 0 |
| 14 | billion | 1000000000 | 4 | Yes | 0 |
| 15 | and | 1 | 0 | No | 0 |
| 16 | hundreds | 100 | 1 | No | 0 |
| 17 | thousands | 1000 | 2 | No | 0 |
| 18 | millions | 1000000 | 3 | Yes | 0 |
| 19 | billions | 1000000000 | 4 | Yes | 0 |
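fId: primary key (sequence number)
fUnit: the unit word
fValue: the multiplier the unit stands for
fLevel: rank of the unit, used to order units from smallest to largest
fIsspace: whether the unit is used as a separator when splitting a number string
fType: language category (in the rows above, 1 is Chinese and 0 is English)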
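The splitting step described later needs the separator units (fisspace = Yes) ordered from the highest flevel down. A minimal sketch of that ordering follows, assuming the Chinese rows are already in memory; `UnitRow`, `CHINESE_UNITS`, and `separator_levels` are hypothetical names, and the rows are a subset of the table above.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class UnitRow(NamedTuple):
    fid: int
    funit: str
    fvalue: int
    flevel: int
    fisspace: bool  # True corresponds to "Yes" in the table
    ftype: int


# Subset of T_SYS_Unit (ftype = 1); the formal variants 拾/佰/仟 are omitted.
CHINESE_UNITS = [
    UnitRow(1, "十", 10, 1, False, 1),
    UnitRow(2, "百", 100, 2, False, 1),
    UnitRow(3, "千", 1000, 3, False, 1),
    UnitRow(4, "万", 10000, 4, True, 1),
    UnitRow(5, "亿", 100000000, 5, True, 1),
    UnitRow(9, "萬", 10000, 4, True, 1),
    UnitRow(10, "億", 100000000, 5, True, 1),
]


def separator_levels(rows: Iterable[UnitRow]) -> List[Tuple[int, List[str]]]:
    """Group separator units by flevel, highest level first."""
    levels: Dict[int, List[str]] = {}
    for row in rows:
        if row.fisspace:
            levels.setdefault(row.flevel, []).append(row.funit)
    return sorted(levels.items(), reverse=True)
```

Grouping by flevel keeps the simplified and traditional spellings of one unit (亿 and 億, 万 and 萬) together, so both are tried at the same splitting stage.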
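2. Find the number strings in the source text
a) Search the text for the longest string made up entirely of entries from t_SYS_Num (ftext) and t_SYS_Unit (funit)
3. Split the number string recursively
a) First split on the unit with the highest fLevel
i. In Chinese, for example, split on “億” or “亿”
ii. “三千二百万零二十一亿零五千” is split into “三千二百万零二十一” and “五千” (the leading 零 of the remainder adds nothing to the value)
iii. Then split on the unit one level down, and continue until every unit whose fisspace is Yes has been handled
4. Convert each number string below 万 (10,000) into a number
5. Multiply that number by the unit it was split off from
6. Add up all the separated parts to get the final value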
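Step 2 can be sketched for undelimited text as a scan for maximal runs of known characters. This is an illustration under assumptions, not the original implementation: `find_number_strings` is a hypothetical name, and `vocabulary` is assumed to be the set of single-character ftext and funit values for one language category.

```python
from typing import List, Set


def find_number_strings(text: str, vocabulary: Set[str]) -> List[str]:
    """Return every maximal run of consecutive characters found in vocabulary."""
    runs: List[str] = []
    current: List[str] = []
    for ch in text:
        if ch in vocabulary:
            current.append(ch)
        else:
            # A non-number character ends the current run, if any.
            if current:
                runs.append("".join(current))
            current = []
    if current:
        runs.append("".join(current))
    return runs
```

Because a run is only closed by a character outside the vocabulary, each result is the longest possible match at its position. A character such as 一 inside an ordinary word would also be reported, so real text probably needs extra filtering.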
Code implementation:
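The original listing is not available here, so what follows is only a minimal sketch of steps 3 to 6 for Chinese. It hard-codes a few simplified characters instead of reading t_SYS_Num and T_SYS_Unit from the database, and the names `DIGITS`, `SMALL_UNITS`, `SEPARATORS`, `small_to_int`, and `to_int` are all assumptions made for illustration.

```python
DIGITS = {"零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}   # fisspace = No
SEPARATORS = [("亿", 100000000), ("万", 10000)]    # fisspace = Yes, highest flevel first


def small_to_int(s: str) -> int:
    """Step 4: convert a string below 万, e.g. '三千二百' -> 3200."""
    total, digit = 0, 0
    for ch in s:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A unit with no digit in front (the 十 in '十五') counts as 1 * unit.
            total += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        else:
            raise ValueError("unexpected character: " + ch)
    return total + digit


def to_int(s: str, level: int = 0) -> int:
    """Steps 3, 5 and 6: split on separator units, multiply, then add."""
    if level == len(SEPARATORS):
        return small_to_int(s)
    unit, factor = SEPARATORS[level]
    head, found, tail = s.partition(unit)
    if not found:
        # This unit does not occur; try the next lower separator.
        return to_int(s, level + 1)
    # The part before the unit is scaled by it; the part after is added as is.
    return to_int(head, level + 1) * factor + to_int(tail, level + 1)
```

For the example string from step 3, `to_int("三千二百万零二十一亿零五千")` first splits on 亿, then splits the left part on 万, and sums the scaled pieces. Traditional and formal variants (億, 萬, 拾, 壹, and so on) would be added to the three mappings in the same way.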