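  • An STL-Based Dictionary Builder: An Attempt at Simulating a Search Engine Algorithm

    This exercise comes from the UVA problem Searching the Web: https://vjudge.net/problem/UVA-1597

    Following the problem statement, I build a dictionary from the words of articles supplied in a fixed input format, so that later lookups are fast.

    To build the dictionary, I nest STL's map and set into one data structure, keyed by word, that records where each word of the articles occurs: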

    #include<map>
    #include<iostream>
    #include<set>
    #include<algorithm>
    #include<string>
    #include<cctype>
    #include<cstdio>
    #include<sstream>
    using namespace std;

    // One occurrence of a word: article index and global line index.
    struct newpair
    {
        int article;
        int line;
        // Line indices run continuously across all articles, so comparing
        // by line alone still gives every occurrence a distinct position.
        bool operator<(const newpair& b) const
        {
            return this->line < b.line;
        }
    };
    typedef map<string,set<newpair> > BIGMAP;
    typedef set<newpair>::const_iterator SET_pair_ITER;
    typedef map<string,set<newpair> >::iterator BIGMAP_iter;

    BIGMAP maper;      // word -> set of occurrences
    string psd[1600];  // input lines after lowercasing and blanking non-letters
    int maxline;

    // Print every word followed by all of its occurrences.
    int checkmaper()
    {
        BIGMAP_iter it;
        for(it=maper.begin();it!=maper.end();++it)
        {
            cout<<(it->first);//string-type
            const set<newpair>& cyc=it->second;//set<newpair>-type, not copied
            for(SET_pair_ITER iter=cyc.begin();iter!=cyc.end();++iter)
            {
                cout<<"  article "<<iter->article<<" line "<<iter->line<<endl;
            }
        }
        return 0;
    }

    // Record that word aim occurs in article articlenum at line linenum.
    void buildmaper(string aim,int articlenum,int linenum)
    {
        newpair m;
        m.article=articlenum;
        m.line=linenum;
        maper[aim].insert(m);
    }

    // Read n articles, each terminated by a line of asterisks, and index
    // every word. Returns the total number of lines read.
    int readin()
    {
        int n;
        cin>>n;
        string rest;
        getline(cin,rest);//discard the remainder of the line that holds n
        int cur=0;
        for(int i=0;i<n&&cur<1600;cur++)
        {
            if(!getline(cin,psd[cur])) break;//input ended early
            if(psd[cur].find("***")!=string::npos){i++;continue;}//the next article
            for(string::iterator it=psd[cur].begin();it!=psd[cur].end();++it)
            {
                unsigned char ch=*it;//never pass a negative char to <cctype>
                if(isalpha(ch)) *it=tolower(ch);
                else *it=' ';
            }
            stringstream ss(psd[cur]);
            string chr;
            while(ss>>chr) buildmaper(chr,i,cur);
        }
        return cur;
    }

    int main()
    {
        freopen("input.txt","r",stdin);
        freopen("ans.txt","w",stdout);
        maxline=readin();
        checkmaper();
        return 0;
    }
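
    Once the index exists, answering a query for a single word comes down to one map lookup. The sketch below is not part of the original program: Posting, Index, addWord and linesContaining are hypothetical names, and it only assumes the same word-to-set-of-occurrences layout as the code above.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the post's newpair: one occurrence of a word.
struct Posting {
    int article;
    int line;
    bool operator<(const Posting& other) const {
        // Order by article first, then by line, so that two distinct
        // occurrences never compare as equivalent.
        if (article != other.article) return article < other.article;
        return line < other.line;
    }
};

typedef std::map<std::string, std::set<Posting> > Index;

// Record that `word` occurs in `article` at `line`.
void addWord(Index& index, const std::string& word, int article, int line) {
    Posting p;
    p.article = article;
    p.line = line;
    index[word].insert(p);
}

// Return the line numbers at which `word` occurs, in the set's order
// (article first, then line). find() is used instead of operator[] so
// that a miss does not create an empty entry in the index.
std::vector<int> linesContaining(const Index& index, const std::string& word) {
    std::vector<int> lines;
    Index::const_iterator it = index.find(word);
    if (it == index.end()) return lines;
    for (std::set<Posting>::const_iterator p = it->second.begin();
         p != it->second.end(); ++p) {
        lines.push_back(p->line);
    }
    return lines;
}
```

    Calling linesContaining(index, "books") after indexing the sample input would walk only the set stored under that key, rather than rescanning the text.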

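    The code above draws on a fair amount of C++ knowledge plus a few lower-level details, listed here:

    1. Common stringstream operations

    2. The basic STL containers map and set

    3. Operator overloading inside a struct

    4. Working with iterators

    5. The basic principle behind implementing map and set with a red-black tree

    For details on each of these, see my other blog posts and the code above.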
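
    As a small illustration of items 1 and 4 in the list, the sketch below isolates the normalize-then-split step. The function name tokenize is hypothetical; the approach (turn every non-letter into a space, lowercase the rest, then pull out words with >>) mirrors what readin does.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Split a raw line into lowercase words, treating every non-letter
// character as a separator.
std::vector<std::string> tokenize(std::string line) {
    for (std::string::iterator it = line.begin(); it != line.end(); ++it) {
        // Cast to unsigned char first: passing a negative char value to
        // isalpha or tolower is undefined behavior.
        unsigned char ch = static_cast<unsigned char>(*it);
        *it = std::isalpha(ch) ? static_cast<char>(std::tolower(ch)) : ' ';
    }
    std::istringstream in(line);
    std::vector<std::string> words;
    std::string word;
    while (in >> word) words.push_back(word);  // >> skips runs of spaces
    return words;
}
```

    Under this rule the abbreviation "i.e." splits into the two one-letter words i and e, which is why an entry for e shows up in the sample output.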

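    The one place in the code above where a bug is likely to creep in is the use of set: because set keeps its elements sorted, the newpair struct must overload operator<. The set also relies on this comparison to detect duplicates, treating two elements as the same when neither compares less than the other, so the comparison has to distinguish every entry that should be kept.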
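
    To make the pitfall concrete, here is a small sketch, separate from the program above, that contrasts two comparison functions. Entry, ByLineOnly, ByArticleThenLine and countAfterInsert are hypothetical names introduced only for illustration. It inserts the occurrences (article 0, line 3) and (article 1, line 3) and reports how many the set keeps.

```cpp
#include <cstddef>
#include <set>

struct Entry {
    int article;
    int line;
};

// Looks at the line field only: entries with equal line are equivalent.
struct ByLineOnly {
    bool operator()(const Entry& a, const Entry& b) const {
        return a.line < b.line;
    }
};

// Looks at article first, then line: only identical entries are equivalent.
struct ByArticleThenLine {
    bool operator()(const Entry& a, const Entry& b) const {
        if (a.article != b.article) return a.article < b.article;
        return a.line < b.line;
    }
};

// Insert (0, 3) and (1, 3) into a set ordered by Compare and return the
// number of elements the set ends up holding.
template <typename Compare>
std::size_t countAfterInsert() {
    std::set<Entry, Compare> s;
    Entry a = {0, 3};
    Entry b = {1, 3};
    s.insert(a);
    s.insert(b);
    return s.size();
}
```

    With ByLineOnly the second insert is rejected as a duplicate and one element remains; with ByArticleThenLine both are kept. In the program above the line index runs continuously across articles, so no two occurrences in different articles share a line number and comparing by line alone is sufficient there.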

    A sample run follows.

    The input is:

    4
    one   repeat  repeat  repeat
    A manufacturer, importer, or seller of
    digital media devices may not (1) sell,
    or offer for sale, in interstate commerce,
    or (2) cause to be transported in, or in a
    manner affecting, interstate commerce,
    a digital media device unless the device
    includes and utilizes standard security
    technologies that adhere to the security
    system standards.
    **********
    one two   repeat  repeat  repeat   repeat
    Of course, Lisa did not necessarily
    intend to read his books. She might
    want the computer only to write her
    midterm. But Dan knew she came from
    a middle-class family and could hardly
    afford the tuition, let alone her reading
    fees. Books might be the only way she
    could graduate
    **********
    one two three   repeat   repeat  repeat  repeat   repeat
    Research in analysis (i.e., the evaluation
    of the strengths and weaknesses of
    computer system) is essential to the
    development of effective security, both
    for works protected by copyright law
    and for information in general. Such
    research can progress only through the
    open publication and exchange of
    complete scientific results
    **********
    one two three   four   repeat  repeat   repeat  repeat  repeat   repeat
    I am very very very happy!
    What about you?
    **********

    The output is:

    a  article 0 line 1
      article 0 line 4
      article 0 line 6
      article 1 line 16
    about  article 3 line 34
    adhere  article 0 line 8
    affecting  article 0 line 5
    afford  article 1 line 17
    alone  article 1 line 17
    am  article 3 line 33
    analysis  article 2 line 22
    and  article 0 line 7
      article 1 line 16
      article 2 line 23
      article 2 line 27
      article 2 line 29
    be  article 0 line 4
      article 1 line 18
    books  article 1 line 13
      article 1 line 18
    both  article 2 line 25
    but  article 1 line 15
    by  article 2 line 26
    came  article 1 line 15
    can  article 2 line 28
    cause  article 0 line 4
    class  article 1 line 16
    commerce  article 0 line 3
      article 0 line 5
    complete  article 2 line 30
    computer  article 1 line 14
      article 2 line 24
    copyright  article 2 line 26
    could  article 1 line 16
      article 1 line 19
    course  article 1 line 12
    dan  article 1 line 15
    development  article 2 line 25
    device  article 0 line 6
    devices  article 0 line 2
    did  article 1 line 12
    digital  article 0 line 2
      article 0 line 6
    e  article 2 line 22
    effective  article 2 line 25
    essential  article 2 line 24
    evaluation  article 2 line 22
    exchange  article 2 line 29
    family  article 1 line 16
    fees  article 1 line 18
    for  article 0 line 3
      article 2 line 26
      article 2 line 27
    four  article 3 line 32
    from  article 1 line 15
    general  article 2 line 27
    graduate  article 1 line 19
    happy  article 3 line 33
    hardly  article 1 line 16
    her  article 1 line 14
      article 1 line 17

    (The rest of the output is omitted.)
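
    The alphabetical order of the listing above follows from std::map keeping its keys sorted: plain iteration visits the words in ascending order no matter in which order they were inserted. A minimal sketch, using the hypothetical helper sortedKeys:

```cpp
#include <map>
#include <string>
#include <vector>

// Collect the keys of a map in iteration order, which for std::map with
// the default comparison is ascending key order.
std::vector<std::string> sortedKeys(const std::map<std::string, int>& m) {
    std::vector<std::string> keys;
    for (std::map<std::string, int>::const_iterator it = m.begin();
         it != m.end(); ++it) {
        keys.push_back(it->first);
    }
    return keys;
}
```

    Inserting "books", "a" and "about" in that order still yields a, about, books, matching the way those words appear in the listing.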

    OK

  • Original article: https://www.cnblogs.com/savennist/p/12230612.html