zoukankan      html  css  js  c++  java
  • [LeetCode#208]Implement Trie (Prefix Tree)

    Problem:

    Implement a trie with insertsearch, and startsWith methods.

    Note:
    You may assume that all inputs are consist of lowercase letters a-z.

    Analysis:

    This data structure is super useful!!!
    It could greatly improve the efficiency in solving some problem!!!
    
    Big picture:
    Each TrieNode has an item and a group of children. For this problem, children's size is 26 ('a' - 'z'). 
    If children[i] != null, it indicates character: 'a'+i just appear at this position. <The position means the level of this node, the same as the character's index in a String>
    Reference:
    https://en.wikipedia.org/wiki/Trie
    
    We construct a tree based on the inserted word. The root node could be used to represent the first character in the String.
    Note: when we search a character, we could directly use children[c-'a'] to locate the information for it, which brings great convenience.  
    Key: except the final word stored in item, we actually record no character, we just map chidlren[c-'a'] with character c. The way we test if a character exist at this index(level) is to check if children[c-'a'] == null. 
    
    1. insert a word. 
    public void insert(String word) {
        TrieNode node = root;
        for (char c : word.toCharArray()) {
            if (node.children[c-'a'] == null)
                node.children[c-'a'] = new TrieNode();
            node = node.children[c-'a'];
            }
        node.item = word;
    }
    
    Key part: 
    1.1 we start from the first index
    TrieNode node = root; <which maps to the first character in the word>
    1.2 
    Iff the character at the position has already appeared for other words, we directly use this child for future extension. Other wise, as the first such character(at the index), we make a new TrieNode.
    if (node.children[c-'a'] == null)
        node.children[c-'a'] = new TrieNode();
    Important: before begining the next loop, we have already point to the TrieNode(at that index). Thus you must update it
    node = node.children[c-'a'];
    Once we reached the right TrieNode, we must indicate we have word at here.
    node.item = word;
    
    
    
    2. search a word.
    public boolean search(String word) {
        TrieNode node = root;
        for (char c : word.toCharArray()) {
            if (node.children[c-'a'] == null)   
                return false;
            node = node.children[c-'a'];
        }
        return node.item.equals(word);
    }
    We follow the idea of inserting a word. Once we could not find a character(at certain index) for a certain TrieNode, we return false.
    Note: All indexed characters appear rightly in the Trie does not necessarily mean the serached the word has appeared in the Trie.
    Case:
    iff 'abcd' in the trie, all characters of 'abc' also in the tree.
    To avoid this situation, we must compare the word we have recorded at the Trie Node.
    
    
    
    3. serach prefix.
    This is a very important characteristic of TrieNode.
    It is just part of searching a word in the Trie.
    public boolean startsWith(String prefix) {
            TrieNode node = root;
            for (char c : prefix.toCharArray()) {
                if (node.children[c-'a'] == null)
                    return false;
                node = node.children[c-'a'];
            }
            return true;
    }
    
    Efficiency:
    For a Trie, 
    the cost of searching a word or preix is O(length(word or prefix)). 
    the cost to insert a word is O(length(word)).
    
    However, it could be very space efficiency.
    All word share the same prefix, would share the same Trie Node for those prefix. 
    
    The amazing part of this data structure is prefix search, which could greatly save enery in DFS search. 

    Solution:

    class TrieNode {
        TrieNode[] children;
        String item; 
        // Initialize your data structure here.
        public TrieNode() {
            children = new TrieNode[26];
            item = "";
        }
    }
    
    public class Trie {
        private TrieNode root;
    
        public Trie() {
            root = new TrieNode();
        }
    
        // Inserts a word into the trie.
        public void insert(String word) {
            TrieNode node = root;
            for (char c : word.toCharArray()) {
                if (node.children[c-'a'] == null)
                    node.children[c-'a'] = new TrieNode();
                node = node.children[c-'a'];
            }
            node.item = word;
        }
    
        // Returns if the word is in the trie.
        public boolean search(String word) {
            TrieNode node = root;
            for (char c : word.toCharArray()) {
                if (node.children[c-'a'] == null)   
                    return false;
                node = node.children[c-'a'];
            }
            return node.item.equals(word);
        }
    
        // Returns if there is any word in the trie
        // that starts with the given prefix.
        public boolean startsWith(String prefix) {
            TrieNode node = root;
            for (char c : prefix.toCharArray()) {
                if (node.children[c-'a'] == null)
                    return false;
                node = node.children[c-'a'];
            }
            return true;
        }
    }

     Fatal errors and Improvement:

    To come up with the above solution, I have made a fatal error, which could be quite common if you have poor programming skill and habits. 
    A fatal: Change the state of visited board, and make a middle return before recovering the state into right mode.
    --------------------------------------------
            visited[i][j] = true;
            cur_str = cur_str + board[i][j];
            if (!prefix_tree.startsWith(cur_str))
                return; //directly return!!!
            ...
            visited[i][j] = false;
    --------------------------------------------
    Fix method is easy, but still ugly
    if (!prefix_tree.startsWith(cur_str)) {
        visited[i][j] = false;
        return;
    }
    
    The reason for such mistake is the bad programming organization. We should avoid any return in the middle a change state, that means we should sequeeze "the change states" as minimum as possible. 
    ----------------------------------------------------------------------------------------
            if (i < 0 || j < 0 || i >= board.length || j >= board[0].length || visited[i][j])
                return;
            cur_str = cur_str + board[i][j];
            if (!prefix_tree.startsWith(cur_str))
                return;
            if (prefix_tree.search(cur_str))
                ret.add(cur_str);
            visited[i][j] = true;
            searchWord(board, i-1, j, cur_str, visited, prefix_tree, ret);
            searchWord(board, i+1, j, cur_str, visited, prefix_tree, ret);
            searchWord(board, i, j-1, cur_str, visited, prefix_tree, ret);
            searchWord(board, i, j+1, cur_str, visited, prefix_tree, ret);
            visited[i][j] = false;
    ----------------------------------------------------------------------------------------
    Why not change the state, when we need to fork? (when the children branch need to know it!)
    
    What's more, the solution is still no elegant enough!
    If we don't need duplicates as return, we should try to use a hash set initally, then add all its elements into ArrayList.
    The idea from:
            Set<String> hs = new HashSet<>();
            hs.addAll(ret);
            ret.clear();
            ret.addAll(hs);
            return ret;
    In our code:
    Set<String> ret = new HashSet<String> ();
    ...
    if (prefix_tree.search(cur_str))
        ret.add(cur_str);
    ...
    return new ArrayList<String> (ret);

    Solution 2.2

    public class Solution {
        public List<String> findWords(char[][] board, String[] words) {
            if (board == null || words == null)
                throw new IllegalArgumentException("Plese check the arguments passed into the funtion");
            Set<String> ret = new HashSet<String> ();
            Trie prefix_tree = new Trie();
            if (board.length == 0)
                return new ArrayList<String> (ret);
            boolean[][] visited = new boolean[board.length][board[0].length];
            for (int i = 0; i < words.length; i++)
                prefix_tree.insert(words[i]);
            for (int i = 0; i < board.length; i++) {
                for (int j = 0; j < board[0].length; j++) {
                    searchWord(board, i, j, "", visited, prefix_tree, ret);
                }
            }
            return new ArrayList<String> (ret);
        }
        
        
        private void searchWord(char[][] board, int i, int j, String cur_str, boolean[][] visited, Trie prefix_tree, Set<String> ret) {
            if (i < 0 || j < 0 || i >= board.length || j >= board[0].length || visited[i][j])
                return;
            cur_str = cur_str + board[i][j];
            if (!prefix_tree.startsWith(cur_str))
                return;
            if (prefix_tree.search(cur_str))
                ret.add(cur_str);
            visited[i][j] = true;
            searchWord(board, i-1, j, cur_str, visited, prefix_tree, ret);
            searchWord(board, i+1, j, cur_str, visited, prefix_tree, ret);
            searchWord(board, i, j-1, cur_str, visited, prefix_tree, ret);
            searchWord(board, i, j+1, cur_str, visited, prefix_tree, ret);
            visited[i][j] = false;
        }
    }
    
    //Trie Node
    class TrieNode{
        public TrieNode[] children = new TrieNode[26];
        public String item = "";
    }
    
    //Trie
    class Trie{
        public TrieNode root = new TrieNode();
     
        public void insert(String word){
            TrieNode node = root;
            for(char c: word.toCharArray()){
                if(node.children[c-'a']==null){
                    node.children[c-'a']= new TrieNode();
                }
                node = node.children[c-'a'];
            }
            node.item = word;
        }
     
        public boolean search(String word){
            TrieNode node = root;
            for(char c: word.toCharArray()){
                if(node.children[c-'a']==null)
                    return false;
                node = node.children[c-'a'];
            }
            if(node.item.equals(word)){
                return true;
            }else{
                return false;
            }
        }
     
        public boolean startsWith(String prefix){
            TrieNode node = root;
            for(char c: prefix.toCharArray()){
                if(node.children[c-'a']==null)
                    return false;
                node = node.children[c-'a'];
            }
            return true;
        }
    }
  • 相关阅读:
    获取txt文件的内容
    MAT(Memory Analyzer Tool)下载和安装
    转换流:指定编码 读取文本文件InputStreamReader和写出文本文件OutputStreamWriter 字符集
    切入点表达式
    并发时事务引发的问题和事务的隔离级别
    多线程之join方法
    多线程之守护线程
    约束:主键、非空、唯一、外键
    数据库设计
    PLSQL过期解决办法
  • 原文地址:https://www.cnblogs.com/airwindow/p/4768259.html
Copyright © 2011-2022 走看看