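  • Perl Written Test Question 2 -- Counting Word Frequency

    A Perl written-test question from Nvidia, 2019.

    Count how often each word occurs in a file and print the counts in sorted order.

    The text is as follows: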

    "ALL happy families resemble one another; every unhappy family is unhappy in its own way.
    All was confusion in the house of Oblonskys. The wife had dicscovered that her husband was having an intrigue with a French governess who had been
    in their employ, and the declared that the could not live in the same house with him. This condition of things had lasted now three days, and was causing
    deep discomfort, not only to..."
    happy? happy: [happy] {happy} "happy" 'happy'

    Code

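    #!/usr/bin/perl
    use strict;
    use warnings;
    
    open(my $in, '<', 'word_frequance.txt') or die "cannot open word_frequance.txt: $!";
    
    my %dict;
    while (<$in>) {
        chomp;
        tr/a-zA-Z/ /cs;   # replace each run of non-letter characters with a single space
        s/^\s+//;         # drop leading whitespace
        s/\s+$//;         # drop trailing whitespace
        my @words = split(/\s+/, $_);
        foreach my $w (@words) {
            $dict{lc($w)}++;     # lowercase the key before counting
        }
    }
    close($in);
    foreach my $word (sort keys %dict) {
        print "$word,$dict{$word}\n";  # print each word and how many times it occurs
    }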
    
    

    Result

    a,1
    all,2
    an,1
    and,2
    another,1
    been,1
    causing,1
    condition,1
    confusion,1
    could,1
    days,1
    declared,1
    deep,1
    dicscovered,1
    discomfort,1
    employ,1
    every,1
    families,1
    family,1
    french,1
    governess,1
    had,3
    happy,7
    having,1
    her,1
    him,1
    house,2
    husband,1
    in,4
    intrigue,1
    is,1
    its,1
    lasted,1
    live,1
    not,2
    now,1
    oblonskys,1
    of,2
    one,1
    only,1
    own,1
    resemble,1
    same,1
    that,2
    the,5
    their,1
    things,1
    this,1
    three,1
    to,1
    unhappy,2
    was,3
    way,1
    who,1
    wife,1
    with,2
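
    The result above is ordered alphabetically by word, because the script sorts the hash keys. If the question is read as asking for frequency order instead, the sort could be changed along the following lines. This is only a sketch: the %dict contents here are a hand-picked subset of the counts above, and breaking ties alphabetically is an assumption, not part of the original question.

```perl
use strict;
use warnings;

# Hypothetical subset of the counts shown in the result above.
my %dict = (the => 5, happy => 7, in => 4, had => 3, was => 3);

# Highest count first; words with equal counts fall back to alphabetical order.
my @by_freq = sort { $dict{$b} <=> $dict{$a} or $a cmp $b } keys %dict;

print "$_,$dict{$_}\n" for @by_freq;
```

    The comparison block uses sort's built-in $a and $b variables, which is one reason not to use $a as an ordinary loop variable elsewhere in the script.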
    

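    Notes

    split producing an empty element

    When splitting a string on / / or /\s+/, you will often find an unexplained extra empty element in the result.

    This happens when the string begins with whitespace: the pattern matches at the very start of the string, so split returns the empty string in front of that first match as a leading field.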
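
    A minimal sketch of the behavior described above; the sample string is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical input line with leading and trailing whitespace.
my $line = "  happy families  ";

# /\s+/ matches at position 0, so the empty string before that
# match becomes the first field. Trailing empty fields are dropped
# by default, so only the leading one survives.
my @fields = split /\s+/, $line;

print scalar(@fields), "\n";                      # number of fields, including the empty one
print join('|', map { "[$_]" } @fields), "\n";    # brackets make the empty field visible
```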

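    There are a few ways to split on whitespace without getting that element:

    1. Use split ' ', or simply call split in its default form with no arguments.
      • split ' ' is a special case of split that emulates awk's default behavior: all leading whitespace is removed from the string before splitting starts, and the rest is then split as if by /\s+/.
    2. Remove the leading whitespace first, then use split(/\s+/,$_);
      • $_ =~ s/^\s+//;   ## drop leading whitespace
        $_ =~ s/\s+$//;   ## drop trailing whitespace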
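
    The two options can be compared side by side. The sketch below uses a made-up sample string and hypothetical variable names; all three forms should yield the same two words.

```perl
use strict;
use warnings;

# Hypothetical input line with leading and trailing whitespace.
my $line = "  happy families  ";

# Option 1a: the awk-style special case skips leading whitespace.
my @awk = split ' ', $line;

# Option 1b: split with no arguments splits $_ in the same awk-style way.
my @default;
{
    local $_ = $line;
    @default = split;
}

# Option 2: trim a copy of the line, then split on /\s+/.
(my $trimmed = $line) =~ s/^\s+//;
$trimmed =~ s/\s+$//;
my @manual = split /\s+/, $trimmed;

print join(',', @awk), "\n";
print join(',', @default), "\n";
print join(',', @manual), "\n";
```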
  • Original article: https://www.cnblogs.com/lyc-seu/p/12375351.html