[Python Study Notes-007] Checking English Words with PyEnchant

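    Recently I have been teaching my son phonics, and we played a word game together: using simple enumeration to find two-letter words that are suitable for a young child to learn. Searching by hand is bound to miss a few, so here is a simple script that uses PyEnchant to do it.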
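The enumeration itself is small: there are only 26 × 26 = 676 combinations of two lowercase letters. A minimal sketch that generates every candidate with the standard library (the name `two_letter_candidates` is hypothetical, not part of the original script):

```python
import itertools
import string


def two_letter_candidates():
    """Return every two-letter lowercase combination, 'aa' through 'zz'."""
    # product() yields pairs in lexicographic order: ('a', 'a'), ('a', 'b'), ...
    return ["".join(pair) for pair in itertools.product(string.ascii_lowercase, repeat=2)]


candidates = two_letter_candidates()
print(len(candidates), candidates[:3], candidates[-1])  # 676 ['aa', 'ab', 'ac'] zz
```

Each candidate still has to be checked against a dictionary; that lookup is the part PyEnchant provides.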

    01 - foo.py

    #!/usr/bin/python3
    """
        A simple script to check whether a string is an English word

        1. download PyEnchant from https://pypi.org/project/pyenchant/
        2. save pyenchant-2.0.0.tar.gz to /tmp
        3. tar zxf pyenchant-2.0.0.tar.gz
        4. export PYTHONPATH=/tmp/pyenchant-2.0.0:$PYTHONPATH
        5. ./foo.py <string>
    """

    import sys
    import enchant


    def is_english_word(word):
        # Look the word up in the en_US dictionary (the check is case-sensitive)
        d_en = enchant.Dict("en_US")
        return d_en.check(word)


    def get_alphabet():
        # Build ['a', 'b', ..., 'z']
        l_alph = []
        for i in range(26):
            l_alph.append(chr(ord('a') + i))
        return l_alph


    def main(argc, argv):
        if argc != 2:
            sys.stderr.write("Usage: %s <string>\n" % argv[0])
            return 1

        char_in = argv[1]

        # Pass 1: the input string followed by each letter, as typed
        l_word1 = []
        l_alph = get_alphabet()
        for char in l_alph:
            word = char_in + char
            if is_english_word(word):
                l_word1.append(word)
        print(l_word1)

        # Pass 2: the same candidates in uppercase (mostly abbreviations),
        # skipping any whose lowercase form was already found in pass 1
        l_word2 = []
        for char in l_alph:
            word = char_in + char
            word = word.upper()
            if is_english_word(word):
                if word.lower() in l_word1:
                    continue
                l_word2.append(word)
        print(l_word2)
        return 0

    if __name__ == '__main__':
        sys.exit(main(len(sys.argv), sys.argv))
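One possible refactoring, sketched under the assumption that you want to reuse the enumeration with different word lists: pass the checking function in as a parameter. With PyEnchant you could pass `enchant.Dict("en_US").check`, which also creates the dictionary once rather than on every call as `is_english_word()` does. The names `find_words` and `check` are hypothetical; the two-pass logic mirrors `main()` above.

```python
import string
from typing import Callable, List, Tuple


def find_words(prefix: str, check: Callable[[str], bool]) -> Tuple[List[str], List[str]]:
    """Return (pass-1 hits, uppercase-only hits) for prefix + one letter."""
    # Pass 1: prefix + each lowercase letter, as typed
    lower = [prefix + c for c in string.ascii_lowercase if check(prefix + c)]
    # Pass 2: uppercase forms, skipping any whose lowercase spelling
    # was already accepted in pass 1
    upper = [
        w for w in ((prefix + c).upper() for c in string.ascii_lowercase)
        if check(w) and w.lower() not in lower
    ]
    return lower, upper


# Stand-in dictionary for illustration only; this is not PyEnchant data
demo = {"ab", "ad", "AA", "AB"}
print(find_words("a", demo.__contains__))  # (['ab', 'ad'], ['AA'])
```

With PyEnchant installed, the equivalent of `./foo.py a` would be `find_words("a", enchant.Dict("en_US").check)`.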

    It is very simple; the core code is just:

    def is_english_word(word):
        d_en = enchant.Dict("en_US")
        return d_en.check(word)

    02 - Testing foo.py

    kaiba$ ./foo.py 'a'
    ['ab', 'ac', 'ad', 'ah', 'am', 'an', 'as', 'at', 'av', 'aw', 'ax']
    ['AA', 'AF', 'AG', 'AI', 'AK', 'AL', 'AP', 'AR', 'AU', 'AZ']
    kaiba$ ./foo.py 'b'
    ['be', 'bf', 'bi', 'bk', 'bl', 'bu', 'bx', 'by']
    ['BA', 'BB', 'BC', 'BM', 'BO', 'BP', 'BR', 'BS']
    kaiba$ ./foo.py 'be'
    ['bed', 'bee', 'beg', 'bet', 'bey']
    ['BEN']
    kaiba$ ./foo.py 't'
    ['ta', 'ti', 'tn', 'to', 'tr', 'ts']
    ['TB', 'TC', 'TD', 'TE', 'TH', 'TL', 'TM', 'TU', 'TV', 'TX', 'TY']
    kaiba$ ./foo.py 'tea'
    ['teak', 'teal', 'team', 'tear', 'teas', 'teat']
    []

    Appendix - foo.sh (searching /usr/share/dict/words directly with grep)

    #!/bin/bash

    function is_english_word
    {
        typeset word=${1?"*** str, e.g. a"}
        # -q: no output, -x: match the whole line, -F: fixed string (no regex)
        grep -qxF -- "$word" /usr/share/dict/words 2>/dev/null
    }

    (( $# != 1 )) && echo "Usage: $0 <str prefix>" >&2 && exit 1
    str_prefix=$1

    lwords=""
    uwords=""
    for c in {a..z}; do
        typeset -l lword=$str_prefix$c   # -l: value is converted to lowercase
        typeset -u uword=$lword          # -u: value is converted to uppercase
        is_english_word "$lword" && lwords+="$lword "
        is_english_word "$uword" && uwords+="$uword "
    done

    lwords=$(echo $lwords)   # unquoted echo drops the trailing space
    uwords=$(echo $uwords)
    rc=1
    [[ -n $lwords ]] && echo $lwords && rc=0
    [[ -n $uwords ]] && echo $uwords && rc=0
    exit $rc
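foo.sh starts a separate grep process for every candidate, 52 per prefix. A sketch of the same lookup in Python, assuming a one-word-per-line file such as /usr/share/dict/words: read the file into a set once, after which each check is a fast membership test. `parse_words`, `load_words` and `lookup` are hypothetical names.

```python
import string
from typing import Iterable, List, Set, Tuple


def parse_words(lines: Iterable[str]) -> Set[str]:
    """Turn one-word-per-line text into a set, preserving case."""
    return {line.strip() for line in lines if line.strip()}


def load_words(path: str = "/usr/share/dict/words") -> Set[str]:
    """Read the word list file once."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return parse_words(f)


def lookup(prefix: str, words: Set[str]) -> Tuple[List[str], List[str]]:
    """Mirror foo.sh: lowercase and uppercase forms of prefix + each letter."""
    lwords = [(prefix + c).lower() for c in string.ascii_lowercase
              if (prefix + c).lower() in words]
    uwords = [(prefix + c).upper() for c in string.ascii_lowercase
              if (prefix + c).upper() in words]
    return lwords, uwords
```

On the same word file, `lookup("a", load_words())` should give the two lists that `./foo.sh a` prints. Like foo.sh (and unlike foo.py), it does not drop uppercase forms whose lowercase spelling was already found.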
    Running foo.sh for every letter from a to z:
    $ for c in {a..z}; do ./foo.sh $c; echo; done
    aa ab ac ad ae af ag ah ai ak al am an ap aq ar as at av aw ax ay az
    AA AB AC AD AE AF AG AH AI AJ AK AL AM AN AO AP AQ AR AS AT AU AV AW AY AZ
    
    ba bb bd be bf bg bi bk bl bm bn bo bp br bs bt bu bv bx by bz
    BA BB BC BD BE BF BG BH BI BL BM BN BO BP BR BS BT BU BV BW BX
    
    ca cb cc cd ce cf cg ch ck cl cm co cp cq cr cs ct cu cv cy
    CA CB CC CD CE CF CG CH CI CJ CL CM CN CO CP CQ CR CS CT CU CV CW CY CZ
    
    da db dc dd de dg di dj dk dl dm dn do dp dr ds dt du dx dy dz
    DA DB DC DD DE DF DG DH DI DJ DK DM DN DO DP DQ DR DS DT DU DV DW DX DZ
    
    ea ec ed ee ef eg eh el em en eo ep eq er es et eu ew ex ey
    EA EC ED EE EF EG EI EL EM EO EP EQ ER ES ET EV EW
    
    fa fb fc fe ff fg fi fl fm fn fo fp fr fs ft fu fv fw fy fz
    FA FB FC FD FE FF FI FL FM FO FP FR FS FT FV FW FX FY
    
    ga gd ge gi gl gm gn go gp gr gs gt gu gv
    GA GB GC GD GE GG GH GI GM GN GO GP GQ GR GS GT GU GW
    
    ha hb hd he hf hg hi hl hm ho hp hq hr hs ht hv hw hy
    HA HB HC HD HE HF HG HH HI HJ HK HL HM HO HP HQ HR HS HT HU HV HW HZ
    
    ia ib ic id ie if ii ik il im in io iq ir is it iv iw ix
    IA IB IC ID IE IF IG IL IM IN IO IP IQ IR IS IT IU IV IW IX
    
    ja jg jo jr js jt
    JA JC JD JI JJ JO JP JV
    
    ka kb kc kg ki kl km kn ko kr kt kv kw ky
    KB KC KD KE KG KI KN KO KP KR KS KT KV KW KY
    
    la lb lc ld le lf lg lh li ll lm ln lo lp lr ls lt lu lv lx ly
    LA LB LC LD LE LF LG LH LI LJ LL LM LO LP LR LS LT LU LV LW LZ
    
    ma mb mc md me mf mg mh mi mk ml mm mn mo mp mr ms mt mu mv mw my
    MA MB MC MD ME MF MG MH MI MJ ML MM MN MO MP MR MS MT MU MV MW MX MY
    
    na nb nd ne ng ni nj nl nm no np nr ns nt nu nv ny
    NA NB NC ND NE NF NG NH NI NJ NL NM NP NQ NS NT NU NV NW NY NZ
    
    ob oc od oe of og oh ok ol om on op or os ot ow ox oy oz
    OA OB OC OD OE OF OG OH OK OL OM ON OO OP OR OS OT OU OV OW
    
    pa pc pd pe pf pg ph pi pk pl pm po pp pq pr ps pt pu
    PA PB PC PD PE PF PG PH PI PK PL PM PN PO PP PQ PR PS PT PU PV PW PX PY
    
    qe qh ql qm qn qp qr qs qt qu qv qy
    QA QB QC QD QE QF QM QN QP QR QS QV
    
    ra rc rd re rf rg rh rm rn ro rs rt
    RA RB RC RD RE RF RH RI RJ RL RM RN RO RP RQ RR RS RT RU RV RW RX
    
    sa sb sc sd se sf sg sh si sk sl sm sn so sp sq sr ss st su sv sw
    SA SB SC SD SE SF SG SI SJ SL SM SN SO SP SR SS ST SU SV SW SX SY
    
    ta tb tc te tg th ti tk tm tn to tp tr ts tu tv tx
    TA TB TC TD TE TG TH TI TL TM TN TO TP TR TS TT TU TV TW TX
    
    uc ug uh ui um un up ur us ut ux
    UA UB UC UG UH UI UK UL UN UP UR US UT UU UV UW
    
    va vb vc vd vg vi vl vo vp vr vs vt vv
    VA VB VC VD VE VF VG VI VJ VL VM VN VO VP VR VS VT VU VV VW
    
    wa wb wc wd we wf wg wh wi wk wl wm wo wr ws wt wy
    WA WB WC WD WF WG WH WI WL WM WO WP WR WS WU WV WW WY
    
    xc xd xi xr xs xu xw xx
    XA XB XD XL XN XO XP XQ XT
    
    ya yd ye yi ym yn yo yr ys yt
    YA YB YP YT YU YV YY
    
    za zn zo zs
    ZA ZB ZD ZG ZI ZK ZT ZZ
    Original article: https://www.cnblogs.com/idorax/p/12003057.html