  • Perl: counting how many times each word appears in a text (NVIDIA 2019 written test)

    1. The original problem

    2. Perl script

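    use strict;
    use warnings;

    my $file = 'anna-karenina.txt';

    print "================ Method 1=====================\n";
    open(my $in, '<', $file) or die "Cannot open $file: $!";
    my %counts;
    while (my $line = <$in>) {
            chomp $line;
            $line =~ s/[.,?!;:'"(){}\[\]]/ /g;   # turn periods, commas and other punctuation into spaces
            #print("$line\n");
            my @words = split ' ', $line;        # split on runs of whitespace, ignoring leading whitespace
            foreach my $word (@words) {
                    $counts{lc $word}++;          # store each lowercased word in the hash and count it
            }
    }
    close $in;

    foreach my $word (sort keys %counts) {
            print "$word,$counts{$word}\n";       # print each word and how many times it occurs
    }


    print "================ Method 2=====================\n";
    open($in, '<', $file) or die "Cannot open $file: $!";
    my %words;
    while (my $line = <$in>)
    {
            #map { $words{lc $_}++ } $line =~ /(\w+)/g;   # equivalent to the loop below

            #print($line =~ /(\w+)/g);
            foreach ($line =~ /(\w+)/g) {         # match every word in the line
                    #print("$_\n");
                    $words{lc $_}++;
            }
    }
    close $in;

    for (sort keys %words)
    {
        print "$_: $words{$_}\n";
    }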
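To check the counting logic without `anna-karenina.txt`, you can wrap the Method 2 loop in a subroutine that takes a string. This is a minimal sketch. The subroutine name `count_words` is hypothetical and does not come from the original post. The sample string is the test text from section 3.

```perl
use strict;
use warnings;

# Hypothetical helper: count lowercased \w+ words in a string (same logic as Method 2).
sub count_words {
    my ($text) = @_;
    my %counts;
    $counts{lc $_}++ for $text =~ /(\w+)/g;   # in list context, //g returns every match
    return \%counts;
}

my $sample = "All happy families resemble one another; every unhappy family is unhappy in its own way.\n"
           . "All was confusion in the house of Oblonskys. happy? happy: [happy] {happy} \"happy\" 'happy'\n";

my $counts = count_words($sample);
print "$_: $counts->{$_}\n" for sort keys %$counts;
```

On this text it prints the same 20 lines as Method 2, including `happy: 7`.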

    3. Results

    1) Test text

    All happy families resemble one another; every unhappy family is unhappy in its own way.
    All was confusion in the house of Oblonskys. happy? happy: [happy] {happy} "happy" 'happy'

    2) Output

    ================ Method 1=====================
    all,2
    another,1
    confusion,1
    every,1
    families,1
    family,1
    happy,7
    house,1
    in,2
    is,1
    its,1
    oblonskys,1
    of,1
    one,1
    own,1
    resemble,1
    the,1
    unhappy,2
    was,1
    way,1
    ================ Method 2=====================
    all: 2
    another: 1
    confusion: 1
    every: 1
    families: 1
    family: 1
    happy: 7
    house: 1
    in: 2
    is: 1
    its: 1
    oblonskys: 1
    of: 1
    one: 1
    own: 1
    resemble: 1
    the: 1
    unhappy: 2
    was: 1
    way: 1

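    4. Key points

    1) To replace any of several characters at once, put them in square brackets (a character class):

      $line =~ s/[.,?!;:'"(){}\[\]]/ /g; # turn periods, commas and other punctuation into spaces

    2) Lowercase each word with lc and count it with a hash:

      $counts{lc($word)}++;  # store each word in the hash and count it

    3) Use % to refer to the whole hash, keys % to get its keys, and sort to order them:

      sort keys %counts

    4) Method 2 uses  $line =~ /(\w+)/g  which, in list context, returns every match, so the words in the line become a list directly.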
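You can try the four points above in isolation on a single line. This is a minimal sketch: the sample string and variable names are made up for illustration. It also adds one variant that is not in the original post, sorting by count (descending) with ties broken alphabetically.

```perl
use strict;
use warnings;

my $line = q|Zen? zen: [zen] {sad} "Sad" 'happy'|;

# 1) A character class replaces any one of several punctuation marks with a space.
(my $cleaned = $line) =~ s/[.,?!;:'"(){}\[\]]/ /g;

# 2) Lowercase each word with lc and count it in a hash.
my %counts;
$counts{lc $_}++ for split ' ', $cleaned;

# 3) keys %counts lists the hash keys; sort orders them (string order by default).
my @alphabetical = sort keys %counts;

# Variant (not in the original post): most frequent first, ties broken alphabetically.
my @by_count = sort { $counts{$b} <=> $counts{$a} or $a cmp $b } keys %counts;

# 4) In list context, m//g returns every captured match, giving the words directly.
my @words = $line =~ /(\w+)/g;

print "cleaned: $cleaned\n";
print "$_,$counts{$_}\n" for @alphabetical;
print "by count: @by_count\n";
print "words: @words\n";
```

Here `sort keys` gives `happy sad zen`, while the count-based sort gives `zen sad happy`.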

  • Original post: https://www.cnblogs.com/wt-seu/p/12368915.html