Problem description:
Design a search autocomplete system for a search engine. Users may input a sentence (at least one word, ending with the special character '#'). For each character they type except '#', you need to return the top 3 historical hot sentences whose prefix matches the part of the sentence already typed. Here are the specific rules:
- The hot degree of a sentence is defined as the number of times a user has typed exactly the same sentence before.
- The returned top 3 hot sentences should be sorted by hot degree (the first is the hottest one). If several sentences have the same hot degree, use ASCII-code order (the smaller one appears first).
- If fewer than 3 hot sentences exist, return as many as you can.
- When the input is the special character '#', the sentence ends, and in this case you need to return an empty list.
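The ranking rules above can be sketched as a plain sort. This is a minimal illustration that assumes the candidates have already been filtered by prefix; `rankTop3` is a hypothetical helper name, not part of the required interface.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: ranks (sentence, hot degree) candidates by the rules above
// and keeps at most the first three.
std::vector<std::string> rankTop3(std::vector<std::pair<std::string, int>> candidates) {
    std::sort(candidates.begin(), candidates.end(),
              [](const std::pair<std::string, int>& a, const std::pair<std::string, int>& b) {
                  if (a.second != b.second) return a.second > b.second;  // hotter first
                  return a.first < b.first;  // same hot degree: ASCII order, smaller first
              });
    std::vector<std::string> result;
    for (const auto& c : candidates) {
        if (result.size() == 3) break;  // return at most 3 sentences
        result.push_back(c.first);
    }
    return result;
}
```

Using the historical data from the example, `rankTop3({{"i love you", 5}, {"island", 3}, {"ironman", 2}, {"i love leetcode", 2}})` keeps "i love leetcode" ahead of "ironman" because ' ' sorts before 'r'.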
Your job is to implement the following functions:
The constructor function:
AutocompleteSystem(String[] sentences, int[] times):
This is the constructor. The input is historical data: sentences is a string array of previously typed sentences, and times[i] is the number of times sentences[i] has been typed. Your system should record this historical data.
Now, the user wants to input a new sentence. The following function provides the next character the user types:
List<String> input(char c):
The input c is the next character typed by the user. It will only be a lower-case letter ('a' to 'z'), a blank space (' ') or the special character '#'. Each sentence the user finishes with '#' should also be recorded in your system. The output is the top 3 historical hot sentences whose prefix matches the part of the sentence already typed.
Example:
Operation: AutocompleteSystem(["i love you", "island","ironman", "i love leetcode"], [5,3,2,2])
The system has already tracked the following sentences and their corresponding times:
- "i love you": 5 times
- "island": 3 times
- "ironman": 2 times
- "i love leetcode": 2 times
Now, the user begins another search:
Operation: input('i')
Output: ["i love you", "island","i love leetcode"]
Explanation:
There are four sentences that have the prefix "i". Among them, "ironman" and "i love leetcode" have the same hot degree. Since ' ' has ASCII code 32 and 'r' has ASCII code 114, "i love leetcode" should come before "ironman". Also, we only need to output the top 3 hot sentences, so "ironman" is ignored.
Operation: input(' ')
Output: ["i love you","i love leetcode"]
Explanation:
There are only two sentences that have the prefix "i ".
Operation: input('a')
Output: []
Explanation:
There are no sentences that have the prefix "i a".
Operation: input('#')
Output: []
Explanation:
The user finished the input, so the sentence "i a" should be saved as a historical sentence in the system, and the following input will be counted as a new search.
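The example above can be replayed against a brute-force reference that scans every stored sentence on each keystroke. This is a sketch for cross-checking the trie solution, not the intended approach; the class name `BruteForceAutocomplete` and its members are hypothetical, and it assumes C++11 or later.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical brute-force reference: no trie, just a linear scan per keystroke.
class BruteForceAutocomplete {
public:
    BruteForceAutocomplete(const std::vector<std::string>& sentences,
                           const std::vector<int>& times) {
        for (int i = 0; i < static_cast<int>(sentences.size()); ++i) {
            hot_[sentences[i]] += times[i];
        }
    }

    std::vector<std::string> input(char c) {
        if (c == '#') {
            ++hot_[typed_];   // record the finished sentence
            typed_.clear();   // the next keystroke starts a new search
            return {};
        }
        typed_.push_back(c);
        // (-hot degree, sentence): an ascending sort puts the hottest first,
        // and equal degrees fall back to ASCII order of the sentence.
        std::vector<std::pair<int, std::string>> matches;
        for (const auto& entry : hot_) {
            if (entry.first.compare(0, typed_.size(), typed_) == 0) {  // prefix match
                matches.push_back({-entry.second, entry.first});
            }
        }
        std::sort(matches.begin(), matches.end());
        std::vector<std::string> result;
        for (const auto& m : matches) {
            if (result.size() == 3) break;
            result.push_back(m.second);
        }
        return result;
    }

private:
    std::map<std::string, int> hot_;  // sentence -> hot degree
    std::string typed_;               // characters typed since the last '#'
};
```

With at most 100 sentences of at most 100 characters each (per the notes), this linear scan is cheap, which makes it a convenient oracle for testing a trie-based implementation.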
Note:
- The input sentence will always start with a letter and end with '#', and only one blank space will exist between two words.
- The number of complete sentences to be searched won't exceed 100. The length of each sentence, including those in the historical data, won't exceed 100.
- Please use double quotes instead of single quotes when you write test cases, even for a character input.
- Please remember to RESET your class variables declared in class AutocompleteSystem, as static/class variables are persisted across multiple test cases.
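Approach:
This problem can be solved with a trie.
Since we must return the 3 most frequently searched sentences, we can store every matching sentence together with its count as a pair in a priority_queue.
First build the trie, which mainly consists of the TrieNode structure and an insert method.
Once TrieNode is defined, the system is essentially one large trie, so we need a root node for it.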
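A minimal sketch of the node layout and the insert step, under illustrative names (`Node`, `trieInsert`, `trieWalk` and `trieDestroy` are hypothetical and differ from the final code). Re-inserting a sentence adds to its count, which is what happens when a user finishes a sentence that is already stored.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Each node that ends a sentence stores the full sentence and its hot degree.
struct Node {
    std::string str;                        // non-empty only where a sentence ends
    int cnt = 0;                            // hot degree of str
    std::unordered_map<char, Node*> child;  // one outgoing edge per character
};

// Inserts s, adding `times` to its hot degree (re-inserting accumulates).
void trieInsert(Node* root, const std::string& s, int times) {
    Node* cur = root;
    for (char c : s) {
        Node*& next = cur->child[c];  // operator[] default-inserts nullptr if absent
        if (next == nullptr) next = new Node();
        cur = next;
    }
    cur->str = s;
    cur->cnt += times;
}

// Follows prefix from the root; returns nullptr if no stored sentence has it.
Node* trieWalk(Node* root, const std::string& prefix) {
    Node* cur = root;
    for (char c : prefix) {
        auto it = cur->child.find(c);
        if (it == cur->child.end()) return nullptr;
        cur = it->second;
    }
    return cur;
}

// Frees every node in the subtree rooted at n.
void trieDestroy(Node* n) {
    for (auto& kv : n->child) trieDestroy(kv.second);
    delete n;
}
```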
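Since the user types one character at a time, we keep a private pointer curNode that tracks our current node in the trie.
curNode is initialized to root, and every time a sentence is finished, i.e. when the input character is '#', we reset it to root.
We also need a string stn that holds the sentence typed so far in the current search.
Note that the priority_queue stores pair<string, int>, so we need to give it a custom comparator: a higher count comes first, and among equal counts the ASCII-smaller string comes first.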
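Why the custom comparator matters can be checked directly: `std::pair`'s built-in `operator<` compares the string first, so a default `std::priority_queue<pair<string, int>>` would surface the lexicographically largest sentence instead of the hottest one. The sketch below reuses the comparator logic from the solution; the `drain` helper is hypothetical.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Same ordering as the solution's comparator. std::priority_queue keeps the
// "largest" element on top, so returning true means p1 is popped after p2.
struct cmp {
    bool operator()(const std::pair<std::string, int>& p1,
                    const std::pair<std::string, int>& p2) const {
        return p1.second < p2.second ||                        // lower count: later
               (p1.second == p2.second && p1.first > p2.first);  // tie: larger string later
    }
};

// Hypothetical helper: pops every element of a copy of q, returning sentences in pop order.
template <typename Queue>
std::vector<std::string> drain(Queue q) {
    std::vector<std::string> out;
    while (!q.empty()) {
        out.push_back(q.top().first);
        q.pop();
    }
    return out;
}
```

With the example data, the custom queue pops "i love you", "island", "i love leetcode", "ironman", while a default-ordered queue pops "island" first.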
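For each input character, we first check whether it is the terminator '#'. If it is, we insert the current sentence into the trie with a count of 1, reset the related variables (stn and curNode), and return an empty list.
Otherwise, we append c to stn and check whether curNode has a child for c. If it does not, we set curNode to NULL and return an empty list; because curNode stays NULL, every further character before '#' also returns an empty list.
If it does, we move curNode to that child and run a DFS from curNode.
During the DFS, we first check whether the current node ends a complete sentence; if so, we push the sentence and its count into the priority_queue, and then we recurse into each of its children.
After the DFS, we pop the top three entries. There may be fewer than three candidates, so the while loop must also check whether q is empty.
Finally, we pop all remaining elements so that q is empty for the next input.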
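As a variation (not what the solution below does), the top three can be selected with a bounded heap that never holds more than k entries, which also removes the need to clear leftover elements. In this sketch the names `Candidate`, `ranksBefore` and `topK` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <queue>
#include <string>
#include <utility>
#include <vector>

using Candidate = std::pair<std::string, int>;  // (sentence, hot degree)

// True when a ranks ahead of b: hotter first, then ASCII order.
bool ranksBefore(const Candidate& a, const Candidate& b) {
    return a.second > b.second || (a.second == b.second && a.first < b.first);
}

// Keeps at most k candidates in a local heap whose top is the weakest one.
std::vector<std::string> topK(const std::vector<Candidate>& candidates, int k) {
    std::priority_queue<Candidate, std::vector<Candidate>, decltype(&ranksBefore)>
        heap(&ranksBefore);
    for (const Candidate& c : candidates) {
        heap.push(c);
        if (static_cast<int>(heap.size()) > k) heap.pop();  // evict the weakest
    }
    std::vector<std::string> result;
    while (!heap.empty()) {  // pops weakest first...
        result.push_back(heap.top().first);
        heap.pop();
    }
    std::reverse(result.begin(), result.end());  // ...so reverse to get the best first
    return result;
}
```

Because the weakest candidate sits on top, each push beyond k evicts it in O(log k), so selecting from m candidates costs O(m log k) rather than the O(m log m) of pushing and later popping every candidate.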
Code:
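```cpp
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>
using namespace std;

class TrieNode {
public:
    string str;  // the full sentence if one ends at this node, otherwise ""
    int cnt;     // hot degree of str
    unordered_map<char, TrieNode*> child;
    TrieNode() : str(""), cnt(0) {}
};

// top() of the priority_queue is the hottest sentence; ties go to the ASCII-smaller one.
struct cmp {
    bool operator()(const pair<string, int>& p1, const pair<string, int>& p2) const {
        return p1.second < p2.second || (p1.second == p2.second && p1.first > p2.first);
    }
};

class AutocompleteSystem {
public:
    AutocompleteSystem(vector<string> sentences, vector<int> times) {
        root = new TrieNode();
        for (int i = 0; i < (int)sentences.size(); i++) {
            insert(sentences[i], times[i]);
        }
        curNode = root;
        stn = "";
    }

    ~AutocompleteSystem() { destroy(root); }  // free every trie node
    AutocompleteSystem(const AutocompleteSystem&) = delete;  // owns raw pointers
    AutocompleteSystem& operator=(const AutocompleteSystem&) = delete;

    vector<string> input(char c) {
        if (c == '#') {
            insert(stn, 1);  // record the finished sentence
            stn.clear();
            curNode = root;  // the next character starts a new search
            return {};
        }
        stn.push_back(c);
        if (curNode && curNode->child.count(c)) {
            curNode = curNode->child[c];
        } else {
            curNode = nullptr;  // no stored sentence has this prefix
            return {};
        }
        dfs(curNode);  // collect every sentence below curNode into q
        vector<string> ret;
        int n = 3;
        while (n > 0 && !q.empty()) {  // there may be fewer than 3 candidates
            ret.push_back(q.top().first);
            q.pop();
            n--;
        }
        while (!q.empty()) q.pop();  // leave q empty for the next call
        return ret;
    }

    void dfs(TrieNode* n) {
        if (n->str != "") {
            q.push({n->str, n->cnt});
        }
        for (auto& p : n->child) {
            dfs(p.second);
        }
    }

    void insert(const string& s, int cnt) {
        TrieNode* cur = root;
        for (char c : s) {
            if (cur->child.count(c) == 0) {
                cur->child[c] = new TrieNode();
            }
            cur = cur->child[c];
        }
        cur->str = s;
        cur->cnt += cnt;  // re-inserting a sentence accumulates its count
    }

private:
    void destroy(TrieNode* n) {
        for (auto& p : n->child) destroy(p.second);
        delete n;
    }

    TrieNode *root, *curNode;
    string stn;
    priority_queue<pair<string, int>, vector<pair<string, int>>, cmp> q;
};

/**
 * Your AutocompleteSystem object will be instantiated and called as such:
 * AutocompleteSystem* obj = new AutocompleteSystem(sentences, times);
 * vector<string> param_1 = obj->input(c);
 */
```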