Multi String Search

    refer to: https://www.algoexpert.io/questions/Multi%20String%20Search


    Input: a big string and an array of small strings, where every small string is shorter than the big string.

    Output: an array of booleans, where the value at index i indicates whether the i-th small string is contained in the big string.
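
    Python's built-in `in` operator already performs substring search on `str`. That makes a one-line version of the whole task handy as a test oracle for the approaches below. This is a minimal sketch: the function name `multiStringSearchReference` and the sample input are hypothetical and not part of the original problem.

```python
def multiStringSearchReference(bigString, smallStrings):
    # `small in bigString` is a substring test for str, so this returns
    # one boolean per small string, in the same order as the input array.
    return [small in bigString for small in smallStrings]


big = "this is a big string"
smalls = ["this", "yo", "is", "a", "bigger", "string", "kappa"]
print(multiStringSearchReference(big, smalls))
# [True, False, True, True, False, True, False]
```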

    Analysis

    Approach 1

    Naive approach: iterate through all of the small strings, and for every small string, scan the whole big string. Time complexity: O(mnl), where m is the number of small strings, n is the length of the big string, and l is the length of the longest small string. Space complexity: O(m) for the output array.

    Code

    def multiStringSearch(bigString, smallStrings):
        return [isInBigString(bigString, smallString) for smallString in smallStrings]
    
    def isInBigString(bigString, smallString):
        for i in range(len(bigString)):
            if i + len(smallString) > len(bigString):
                break
            if isInBigStringHelper(bigString, smallString, i):
                return True
        return False
    
    def isInBigStringHelper(bigString, smallString, startIdx):
        # compare smallString with bigString starting at startIdx, checking both ends and moving toward the middle
        leftBigIdx = startIdx
        rightBigIdx = startIdx + len(smallString) - 1
        leftSmallIdx = 0
        rightSmallIdx = len(smallString) - 1
        while leftBigIdx <= rightBigIdx:
            if bigString[leftBigIdx] != smallString[leftSmallIdx] or bigString[rightBigIdx] != smallString[rightSmallIdx]:
                return False
            leftBigIdx += 1
            rightBigIdx -= 1
            leftSmallIdx += 1
            rightSmallIdx -= 1
        return True
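
    The same brute force can be written more compactly by comparing slices instead of walking two pairs of indices. This is a sketch with hypothetical function names. It keeps the O(mnl) time bound, because each slice comparison copies and compares up to l characters.

```python
def multiStringSearchSlicing(bigString, smallStrings):
    # Same O(m * n * l) brute force as Approach 1; the character-by-character
    # comparison is delegated to slice equality.
    return [isInBigStringSlicing(bigString, small) for small in smallStrings]


def isInBigStringSlicing(bigString, smallString):
    window = len(smallString)
    # Only try start positions where the whole small string still fits.
    for i in range(len(bigString) - window + 1):
        # The slice copies `window` characters, so each check costs O(l).
        if bigString[i:i + window] == smallString:
            return True
    return False
```

    The loop bound `len(bigString) - window + 1` already excludes start positions where the small string would run past the end. No explicit `break` is needed, unlike in `isInBigString`.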

    Approach 2

    Build a suffix trie (suffix tree) containing all of the big string's suffixes. Building this trie takes O(n^2) time and O(n^2) space, where n is the length of the big string, because the n suffixes contain n(n+1)/2 characters in total.

    Checking whether a string is contained in the suffix trie takes O(len(string)) time, because every substring of the big string is a prefix of one of its suffixes. Checking one small string therefore takes O(l), where l is the length of the longest small string, and checking the whole array takes O(ml), where m is the number of small strings. In total, the time complexity is O(n^2 + ml): O(n^2) for building the suffix trie and O(ml) for the checks. The space complexity is O(n^2 + m): O(n^2) for the suffix trie and O(m) for the output array.
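
    The O(n^2) bound comes from the suffixes themselves. A string of length n has n suffixes with n(n+1)/2 characters in total. When no two suffixes share a prefix (for example, when all characters are distinct), every one of those characters becomes its own trie node. The sketch below uses hypothetical helper names. It builds the same nested-dict structure as `ModifiedSuffixTrie` and counts its nodes.

```python
def buildSuffixTrie(string):
    # Nested dicts: each key is a character, each value is the child node.
    root = {}
    for i in range(len(string)):
        node = root
        for j in range(i, len(string)):
            node = node.setdefault(string[j], {})
    return root


def countNodes(node):
    # Count every node below `node` (the root itself is not counted).
    return sum(1 + countNodes(child) for child in node.values())


def trieContains(root, string):
    # A substring of the big string is a prefix of some suffix,
    # so walking down from the root is enough.
    node = root
    for letter in string:
        if letter not in node:
            return False
        node = node[letter]
    return True


trie = buildSuffixTrie("abcde")
print(countNodes(trie))           # 15, which is 5 * 6 // 2
print(trieContains(trie, "bcd"))  # True: prefix of the suffix "bcde"
print(trieContains(trie, "ace"))  # False: not a contiguous substring
```

    Repeated characters let suffixes share nodes (`"aaaa"` yields only 4 nodes), so n(n+1)/2 is the worst case rather than the typical one.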

    Code

    def multiStringSearch(bigString, smallStrings):
        modifiedSuffixTrie = ModifiedSuffixTrie(bigString) # construct the suffix trie
        return [modifiedSuffixTrie.contains(string) for string in smallStrings] # check whether each small string is in the suffix trie
    
    class ModifiedSuffixTrie:
        def __init__(self, string):
            self.root = {}
            self.populateModifiedSuffixTrieFrom(string)   # insert every suffix of the string

        def populateModifiedSuffixTrieFrom(self, string): # insert the suffix starting at each index of the string
            for i in range(len(string)):
                self.insertSubstringStartingAt(i, string)

        def insertSubstringStartingAt(self, i, string):   # insert the suffix string[i:] into the trie
            node = self.root
            for j in range(i, len(string)): # iterate over every letter from index i
                letter = string[j]  # the current letter
                if letter not in node:
                    node[letter] = {} # create an empty dict as a new child node
                node = node[letter] # update the node, traverse down the tree

        def contains(self, string): # check whether a string is contained in the suffix trie
            node = self.root
            for letter in string:
                if letter not in node:
                    return False
                node = node[letter] # update the node, traverse down the tree
            return True

    Approach 3

    Build a trie containing all of the small strings (a regular prefix trie, not a suffix trie). Trie construction takes O(ml) time, where m is the number of small strings and l is the length of the longest small string, and O(ml) space to store the trie.

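    Search: O(nl) time, where n is the length of the big string and l is the length of the longest small string. From each of the n start positions in the big string, we walk at most l levels down the trie.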

    In total: time complexity O(ml + nl); space complexity O(ml + m), which simplifies to O(ml).

    Code

    def multiStringSearch(bigString, smallStrings):
        trie = Trie()
        for string in smallStrings:
            trie.insert(string)
        containedStrings = {}
        for i in range(len(bigString)):
            findSmallStringsIn(bigString, i, trie, containedStrings)
        return [string in containedStrings for string in smallStrings]
    
    def findSmallStringsIn(string, startIdx, trie, containedStrings):
        currentNode = trie.root
        for i in range(startIdx, len(string)):
            currentChar = string[i]
            if currentChar not in currentNode:
                break
            currentNode = currentNode[currentChar]
            if trie.endSymbol in currentNode:
                containedStrings[currentNode[trie.endSymbol]] = True
    
    class Trie:
        def __init__(self):
            self.root = {}
            self.endSymbol = "*"
            
        def insert(self, string):
            current = self.root
            for i in range(len(string)):
                if string[i] not in current:
                    current[string[i]] = {}
                current = current[string[i]]
            current[self.endSymbol] = string
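
    The `Trie` above marks the end of a word with the character `"*"`. A big string that itself contains `"*"` can therefore step into that entry, which holds a string rather than a dict, and `findSmallStringsIn` may then raise a `TypeError`. Below is a sketch of a variant, assuming that edge case matters. It uses a private sentinel object as the end key and a set for the matches. The names `_END` and `multiStringSearchSentinel` are hypothetical.

```python
_END = object()  # hypothetical end-of-word marker: a unique object, never equal to a character


def multiStringSearchSentinel(bigString, smallStrings):
    # Build a trie of the small strings: O(m * l) time and space.
    root = {}
    for string in smallStrings:
        node = root
        for letter in string:
            node = node.setdefault(letter, {})
        node[_END] = string

    found = set()
    if _END in root:  # an empty small string matches trivially
        found.add(root[_END])
    # Walk the trie from every start index of the big string: O(n * l).
    for start in range(len(bigString)):
        node = root
        for i in range(start, len(bigString)):
            letter = bigString[i]
            if letter not in node:
                break
            node = node[letter]
            if _END in node:
                found.add(node[_END])
    return [string in found for string in smallStrings]
```

    Indexing with `range(start, len(bigString))` instead of slicing `bigString[start:]` avoids copying the remainder of the big string at every start position. That preserves the O(nl) search bound.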

    Comparing

    Approach 1: O(mnl)

    Approach 2: O(n^2 + ml)

    Approach 3: O(nl + ml)

    To choose between approaches 2 and 3, compare l and n, where n is the length of the big string and l is the length of the longest small string.

    The problem statement guarantees that every small string is shorter than the big string, so l < n and therefore nl < n^2. Approach 3, at O(nl + ml), is therefore faster than approach 2, at O(n^2 + ml).
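
    Here is a quick worked example with made-up sizes (n = 1000, m = 100, l = 10). It plugs them into the leading terms of each complexity. These are operation-count estimates from the Big-O expressions with constants ignored, not measured timings. The function name is hypothetical.

```python
def estimatedSteps(n, m, l):
    # Leading terms of each approach's time complexity (constants ignored).
    return {
        "approach 1": m * n * l,
        "approach 2": n * n + m * l,
        "approach 3": n * l + m * l,
    }


print(estimatedSteps(n=1000, m=100, l=10))
# {'approach 1': 1000000, 'approach 2': 1001000, 'approach 3': 11000}
```

    With these sizes, the n^2 construction term makes approach 2 no better than approach 1. Approach 3 is roughly 90 times smaller than both.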

Original post: https://www.cnblogs.com/LilyLiya/p/14826995.html