  • Perl: Counting how often each word occurs in a text (NVIDIA 2019 written test)

    1. Original problem

    2. Perl script

    print "================ Method 1=====================\n";
    open IN, '<', 'anna-karenina.txt' or die "Cannot open anna-karenina.txt: $!";
    while (<IN>) {
            chomp;
            $line = $_;
            $line =~ s/[.,?!;:'"(){}\[\]]/ /g;  # replace periods, commas, etc. with spaces
            #print("$line\n");
            @words = split(' ', $line);  # split on whitespace; ' ' also skips leading blanks
            foreach $word (@words) {
                    $counts{lc($word)}++;  # store each lowercased word in the hash
            }
    }


    foreach $word (sort keys %counts) {
            print "$word,$counts{$word}\n";  # print each word and its count
    }
    close IN;
    
    
    print "================ Method 2=====================\n";
    open IN, '<', 'anna-karenina.txt' or die "Cannot open anna-karenina.txt: $!";
    while (my $line = <IN>)
    {
            #map { $words{lc($_)}++ } $line =~ /(\w+)/g;   # equivalent to the loop below

            #print($line =~ /(\w+)/g);
            foreach ($line =~ /(\w+)/g) {   # match every word on the line
                    #print("$_\n");
                    $words{lc($_)}++;
            }
    }
    close IN;
    for (sort keys(%words))
    {
        print "$_: $words{$_}\n";
    }

    3. Results

    1) Test text

    All happy families resemble one another; every unhappy family is unhappy in its own way.
    All was confusion in the house of Oblonskys. happy? happy: [happy] {happy} "happy" 'happy'

    2) Output

    ================ Method 1=====================
    all,2
    another,1
    confusion,1
    every,1
    families,1
    family,1
    happy,7
    house,1
    in,2
    is,1
    its,1
    oblonskys,1
    of,1
    one,1
    own,1
    resemble,1
    the,1
    unhappy,2
    was,1
    way,1
    ================ Method 2=====================
    all: 2
    another: 1
    confusion: 1
    every: 1
    families: 1
    family: 1
    happy: 7
    house: 1
    in: 2
    is: 1
    its: 1
    oblonskys: 1
    of: 1
    one: 1
    own: 1
    resemble: 1
    the: 1
    unhappy: 2
    was: 1
    way: 1
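
    The two methods agree on this test text, but they do not tokenize identically in general: Method 1 splits only on whitespace and the punctuation listed in its character class, while the \w+ pattern of Method 2 breaks on every non-word character, including hyphens. A minimal sketch of the difference, assuming two hypothetical helper subs (words_by_split, words_by_match) and a made-up sample sentence, none of which appear in the original script:

```perl
use strict;
use warnings;

# Method 1 style: replace the listed punctuation with spaces, then split on whitespace.
sub words_by_split {
    my ($line) = @_;
    $line =~ s/[.,?!;:'"(){}\[\]]/ /g;
    return map { lc $_ } split ' ', $line;
}

# Method 2 style: collect every run of word characters (\w+).
sub words_by_match {
    my ($line) = @_;
    return map { lc $_ } $line =~ /(\w+)/g;
}

my $sample = 'A well-known state-of-the-art example.';   # hypothetical input

print join('|', words_by_split($sample)), "\n";   # prints: a|well-known|state-of-the-art|example
print join('|', words_by_match($sample)), "\n";   # prints: a|well|known|state|of|the|art|example
```

    Which behavior is preferable depends on how the problem defines a "word"; if hyphenated words should be split in Method 1 as well, the hyphen can be added to the character class.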

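    4. Key points

    1) To replace any one of several characters, list them in a bracketed character class (a literal [ or ] inside the class must be escaped):

      $line =~ s/[.,?!;:'"(){}\[\]]/ /g; # replace periods, commas, etc. with spaces

    2) Lowercase each word with lc and count it in a hash

      $counts{lc($word)}++;  # store each lowercased word in the hash

    3) Refer to the whole hash with %, get its keys with keys %, and order them with sort

      sort keys %counts

    4) Method 2 uses  $line =~ /(\w+)/g  in list context to turn the words of a line directly into a list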
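
    The points above can be combined in a short sketch that counts words in a string and lists them by frequency instead of alphabetically. Sorting by count is an addition for illustration, not part of the original script; the sub name count_words and the variable names are hypothetical.

```perl
use strict;
use warnings;

# Count the words in a string; returns a hash reference of lowercased word => count.
sub count_words {
    my ($text) = @_;
    my %counts;
    # In list context, a /g match with one capture group returns every captured word.
    $counts{ lc $_ }++ for $text =~ /(\w+)/g;
    return \%counts;
}

my $text   = "All happy families resemble one another; every unhappy family is unhappy in its own way.";
my $counts = count_words($text);

# Sort by descending count, breaking ties alphabetically.
my @ranked = sort { $counts->{$b} <=> $counts->{$a} or $a cmp $b } keys %$counts;

print "$_: $counts->{$_}\n" for @ranked[0 .. 2];
# prints:
# unhappy: 2
# all: 1
# another: 1
```

    Enabling strict and warnings, as this sketch does, forces every variable to be declared with my, which helps catch misspelled hash or variable names in larger scripts.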

  • Original post: https://www.cnblogs.com/wt-seu/p/12368915.html