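I have recently been teaching my son phonics, and we played a word game together: use simple enumeration to find two-letter words that are suitable for a young child to learn. Searching by hand inevitably misses some words, so here is a simple script that uses PyEnchant to do the job.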
01 - foo.py
    #!/usr/bin/python3
    """
    A simple script to check whether a string is an English word

    1. download PyEnchant from https://pypi.org/project/pyenchant/
    2. save pyenchant-2.0.0.tar.gz to /tmp
    3. tar zxf pyenchant-2.0.0.tar.gz
    4. export PYTHONPATH=/tmp/pyenchant-2.0.0:$PYTHONPATH
    5. ./foo.py <string>
    """

    import sys
    import enchant


    def is_english_word(word):
        d_en = enchant.Dict("en_US")
        return d_en.check(word)


    def get_alphabet():
        # Build the list ['a', 'b', ..., 'z'].
        l_alph = []
        for i in range(26):
            l_alph.append(chr(ord('a') + i))
        return l_alph


    def main(argc, argv):
        if argc != 2:
            sys.stderr.write("Usage: %s <char>\n" % argv[0])
            return 1

        char_in = argv[1]

        # Pass 1: the prefix as typed plus one lowercase letter.
        l_word1 = []
        l_alph = get_alphabet()
        for char in l_alph:
            word = char_in + char
            if is_english_word(word):
                l_word1.append(word)
        print(l_word1)

        # Pass 2: the same candidates in upper case, skipping any
        # whose lowercase form was already found in pass 1.
        l_word2 = []
        for char in l_alph:
            word = char_in + char
            word = word.upper()
            if is_english_word(word):
                if word.lower() in l_word1:
                    continue
                l_word2.append(word)
        print(l_word2)
        return 0

    if __name__ == '__main__':
        sys.exit(main(len(sys.argv), sys.argv))
It is very simple. The core code is just:
    def is_english_word(word):
        d_en = enchant.Dict("en_US")
        return d_en.check(word)
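The enumeration wrapped around that check can also be written more compactly. The following is only a minimal sketch, not the script above: `extend_prefix` and its `is_word` parameter are hypothetical names introduced here, and the demo uses a tiny made-up word set in place of a real dictionary so that it runs without PyEnchant. Unlike foo.py, it lowercases the prefix before the first pass.

```python
import string


def extend_prefix(prefix, is_word):
    """Return (lowercase_hits, uppercase_hits) for prefix + one letter.

    is_word is any callable that takes a string and returns True or False.
    """
    lower_hits = []
    upper_hits = []
    for letter in string.ascii_lowercase:
        candidate = (prefix + letter).lower()
        if is_word(candidate):
            lower_hits.append(candidate)
        elif is_word(candidate.upper()):
            # Reported only when the lowercase form was rejected,
            # mirroring the "continue" in foo.py.
            upper_hits.append(candidate.upper())
    return lower_hits, upper_hits


if __name__ == "__main__":
    # Hypothetical stand-in dictionary, for illustration only.
    demo_words = {"be", "by", "BC", "BE"}
    print(extend_prefix("b", demo_words.__contains__))
```

With PyEnchant installed, passing `enchant.Dict("en_US").check` as `is_word` should give results close to those of foo.py, and it creates the dictionary object once instead of once per candidate.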
02 - Testing foo.py
    kaiba$ ./foo.py 'a'
    ['ab', 'ac', 'ad', 'ah', 'am', 'an', 'as', 'at', 'av', 'aw', 'ax']
    ['AA', 'AF', 'AG', 'AI', 'AK', 'AL', 'AP', 'AR', 'AU', 'AZ']
    kaiba$ ./foo.py 'b'
    ['be', 'bf', 'bi', 'bk', 'bl', 'bu', 'bx', 'by']
    ['BA', 'BB', 'BC', 'BM', 'BO', 'BP', 'BR', 'BS']
    kaiba$ ./foo.py 'be'
    ['bed', 'bee', 'beg', 'bet', 'bey']
    ['BEN']
    kaiba$ ./foo.py 't'
    ['ta', 'ti', 'tn', 'to', 'tr', 'ts']
    ['TB', 'TC', 'TD', 'TE', 'TH', 'TL', 'TM', 'TU', 'TV', 'TX', 'TY']
    kaiba$ ./foo.py 'tea'
    ['teak', 'teal', 'team', 'tear', 'teas', 'teat']
    []
Appendix - foo.sh (egrep /usr/share/dict/words directly)
    #!/bin/bash

    function is_english_word
    {
        typeset word=${1?"*** str, e.g. a"}
        # Whole-line match against the system word list.
        egrep "^$word$" /usr/share/dict/words > /dev/null 2>&1
        return $?
    }

    (( $# != 1 )) && echo "Usage: $0 <str prefix>" >&2 && exit 1
    str_prefix=$1

    lwords=""
    uwords=""
    for c in {a..z}; do
        # typeset -l / -u force the value to lower / upper case (bash 4+).
        typeset -l lword=$str_prefix$c
        typeset -u uword=$lword
        is_english_word "$lword" && lwords+="$lword "
        is_english_word "$uword" && uwords+="$uword "
    done

    # Re-echo unquoted to drop the trailing space.
    lwords=$(echo $lwords)
    uwords=$(echo $uwords)
    rc=1
    [[ -n $lwords ]] && echo $lwords && rc=0
    [[ -n $uwords ]] && echo $uwords && rc=0
    exit $rc
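For comparison, the same word-list lookup can be sketched in Python by loading the file into a set once, rather than running egrep for every candidate as foo.sh does. This is an illustration under assumptions, not a drop-in replacement: `load_words` and `lookup` are hypothetical names, the demo writes a tiny made-up word list to a temporary file, and the set lookup is an exact string match, so a prefix containing regular-expression characters is treated literally.

```python
import os
import string
import tempfile


def load_words(path):
    """Read a one-word-per-line file into a set for exact-match lookups."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return {line.strip() for line in fh if line.strip()}


def lookup(prefix, words):
    """Return (lowercase_hits, uppercase_hits) for prefix + one letter.

    Like foo.sh, the two lists are independent: an upper-case hit is
    reported even when the lowercase form was also found.
    """
    lower = [(prefix + c).lower() for c in string.ascii_lowercase]
    lower_hits = [w for w in lower if w in words]
    upper_hits = [w.upper() for w in lower if w.upper() in words]
    return lower_hits, upper_hits


if __name__ == "__main__":
    # Demo on a temporary, made-up word list. On a system that has it,
    # load_words("/usr/share/dict/words") could be used instead.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write("to\nTV\ntr\nTO\n")
    try:
        lower_hits, upper_hits = lookup("t", load_words(tmp.name))
        print(" ".join(lower_hits))
        print(" ".join(upper_hits))
    finally:
        os.remove(tmp.name)
```

Reading the list once keeps a full a-to-z sweep to a single pass over the file, at the cost of holding every word in memory.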
- Running foo.sh
    $ for c in {a..z}; do ./foo.sh $c; echo; done
    aa ab ac ad ae af ag ah ai ak al am an ap aq ar as at av aw ax ay az
    AA AB AC AD AE AF AG AH AI AJ AK AL AM AN AO AP AQ AR AS AT AU AV AW AY AZ

    ba bb bd be bf bg bi bk bl bm bn bo bp br bs bt bu bv bx by bz
    BA BB BC BD BE BF BG BH BI BL BM BN BO BP BR BS BT BU BV BW BX

    ca cb cc cd ce cf cg ch ck cl cm co cp cq cr cs ct cu cv cy
    CA CB CC CD CE CF CG CH CI CJ CL CM CN CO CP CQ CR CS CT CU CV CW CY CZ

    da db dc dd de dg di dj dk dl dm dn do dp dr ds dt du dx dy dz
    DA DB DC DD DE DF DG DH DI DJ DK DM DN DO DP DQ DR DS DT DU DV DW DX DZ

    ea ec ed ee ef eg eh el em en eo ep eq er es et eu ew ex ey
    EA EC ED EE EF EG EI EL EM EO EP EQ ER ES ET EV EW

    fa fb fc fe ff fg fi fl fm fn fo fp fr fs ft fu fv fw fy fz
    FA FB FC FD FE FF FI FL FM FO FP FR FS FT FV FW FX FY

    ga gd ge gi gl gm gn go gp gr gs gt gu gv
    GA GB GC GD GE GG GH GI GM GN GO GP GQ GR GS GT GU GW

    ha hb hd he hf hg hi hl hm ho hp hq hr hs ht hv hw hy
    HA HB HC HD HE HF HG HH HI HJ HK HL HM HO HP HQ HR HS HT HU HV HW HZ

    ia ib ic id ie if ii ik il im in io iq ir is it iv iw ix
    IA IB IC ID IE IF IG IL IM IN IO IP IQ IR IS IT IU IV IW IX

    ja jg jo jr js jt
    JA JC JD JI JJ JO JP JV

    ka kb kc kg ki kl km kn ko kr kt kv kw ky
    KB KC KD KE KG KI KN KO KP KR KS KT KV KW KY

    la lb lc ld le lf lg lh li ll lm ln lo lp lr ls lt lu lv lx ly
    LA LB LC LD LE LF LG LH LI LJ LL LM LO LP LR LS LT LU LV LW LZ

    ma mb mc md me mf mg mh mi mk ml mm mn mo mp mr ms mt mu mv mw my
    MA MB MC MD ME MF MG MH MI MJ ML MM MN MO MP MR MS MT MU MV MW MX MY

    na nb nd ne ng ni nj nl nm no np nr ns nt nu nv ny
    NA NB NC ND NE NF NG NH NI NJ NL NM NP NQ NS NT NU NV NW NY NZ

    ob oc od oe of og oh ok ol om on op or os ot ow ox oy oz
    OA OB OC OD OE OF OG OH OK OL OM ON OO OP OR OS OT OU OV OW

    pa pc pd pe pf pg ph pi pk pl pm po pp pq pr ps pt pu
    PA PB PC PD PE PF PG PH PI PK PL PM PN PO PP PQ PR PS PT PU PV PW PX PY

    qe qh ql qm qn qp qr qs qt qu qv qy
    QA QB QC QD QE QF QM QN QP QR QS QV

    ra rc rd re rf rg rh rm rn ro rs rt
    RA RB RC RD RE RF RH RI RJ RL RM RN RO RP RQ RR RS RT RU RV RW RX

    sa sb sc sd se sf sg sh si sk sl sm sn so sp sq sr ss st su sv sw
    SA SB SC SD SE SF SG SI SJ SL SM SN SO SP SR SS ST SU SV SW SX SY

    ta tb tc te tg th ti tk tm tn to tp tr ts tu tv tx
    TA TB TC TD TE TG TH TI TL TM TN TO TP TR TS TT TU TV TW TX

    uc ug uh ui um un up ur us ut ux
    UA UB UC UG UH UI UK UL UN UP UR US UT UU UV UW

    va vb vc vd vg vi vl vo vp vr vs vt vv
    VA VB VC VD VE VF VG VI VJ VL VM VN VO VP VR VS VT VU VV VW

    wa wb wc wd we wf wg wh wi wk wl wm wo wr ws wt wy
    WA WB WC WD WF WG WH WI WL WM WO WP WR WS WU WV WW WY

    xc xd xi xr xs xu xw xx
    XA XB XD XL XN XO XP XQ XT

    ya yd ye yi ym yn yo yr ys yt
    YA YB YP YT YU YV YY

    za zn zo zs
    ZA ZB ZD ZG ZI ZK ZT ZZ