1. Original problem
2. Perl script
print "================ Method 1=====================\n";
open IN, '<', 'anna-karenina.txt' or die "Cannot open anna-karenina.txt: $!";
while (<IN>) {
    chomp;
    $line = $_;
    $line =~ s/[.,?!;:'"(){}\[\]]/ /g;   # replace periods, commas, etc. with spaces
    #print("$line\n");
    @words = split(/\s+/, $line);
    foreach $word (@words) {
        next if $word eq '';             # skip the empty field produced when a line starts with whitespace
        $counts{lc($word)}++;            # count each lowercased word in the hash
    }
}
foreach $word (sort keys %counts) {
    print "$word,$counts{$word}\n";      # print each word and its count
}
close IN;

print "================ Method 2=====================\n";
open IN, '<', 'anna-karenina.txt' or die "Cannot open anna-karenina.txt: $!";
while (my $line = <IN>) {
    #map { $words{lc($_)}++ } $line =~ /(\w+)/g;   # equivalent to the loop below
    #print($line =~ /(\w+)/g);
    foreach ($line =~ /(\w+)/g) {        # match every word in the line
        #print("$_\n");
        $words{lc($_)}++;
    }
}
for (sort keys(%words)) {
    print "$_: $words{$_}\n";
}
close IN;
3. Results
1) Test text
All happy families resemble one another; every unhappy family is unhappy in its own way. All was confusion in the house of Oblonskys. happy? happy: [happy] {happy} "happy" 'happy'
2) Output
================ Method 1=====================
all,2
another,1
confusion,1
every,1
families,1
family,1
happy,7
house,1
in,2
is,1
its,1
oblonskys,1
of,1
one,1
own,1
resemble,1
the,1
unhappy,2
was,1
way,1
================ Method 2=====================
all: 2
another: 1
confusion: 1
every: 1
families: 1
family: 1
happy: 7
house: 1
in: 2
is: 1
its: 1
oblonskys: 1
of: 1
one: 1
own: 1
resemble: 1
the: 1
unhappy: 2
was: 1
way: 1
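4. Key points
1) To replace any one of several characters, list them in a bracketed character class. Most punctuation is literal inside the brackets; only the square brackets themselves need escaping here:
$line =~ s/[.,?!;:'"(){}\[\]]/ /g;   # replace periods, commas, etc. with spaces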
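2) Lowercase each word with lc and count occurrences with a hash
$counts{lc($word)}++;   # count each lowercased word in the hash
3) % refers to the hash as a whole, keys %hash returns its keys, and sort orders them
sort keys %counts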
4) Method 2 uses $line =~ /(\w+)/g in list context, which returns every word in the line directly as a list
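
The points above can be tried on a short string without the input file. The following is only a minimal sketch: the helper names count_by_split and count_by_match and the sample string are made up for illustration and do not appear in the original script. It uses split ' ' rather than split /\s+/, because that form drops the empty leading field when the text starts with whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper in the style of Method 1:
# turn punctuation into spaces, then split on whitespace.
sub count_by_split {
    my ($text) = @_;
    my %counts;
    (my $clean = $text) =~ s/[.,?!;:'"(){}\[\]]/ /g;
    # split ' ' splits on runs of whitespace and ignores leading whitespace
    $counts{ lc $_ }++ for split ' ', $clean;
    return \%counts;
}

# Hypothetical helper in the style of Method 2:
# a /g match in list context returns every captured word.
sub count_by_match {
    my ($text) = @_;
    my %counts;
    $counts{ lc $_ }++ for $text =~ /(\w+)/g;
    return \%counts;
}

# Made-up sample text, not the test text from section 3.
my $text = q{All happy families; [happy] "happy" 'Happy'};

my $by_split = count_by_split($text);
my $by_match = count_by_match($text);

# Print the counts from the first helper in sorted key order.
print "$_: $by_split->{$_}\n" for sort keys %$by_split;
```

For this sample string both helpers give the same counts (all: 1, families: 1, happy: 4). They can differ on other input: \w also matches digits and underscores, and Method 1 only removes the punctuation listed in the character class, so a character such as a hyphen stays inside the word there but separates words in Method 2.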