
Word Break

Given a string s and a dictionary of words dict, determine if s can be segmented into a space-separated sequence of one or more dictionary words.

For example, given
s = "leetcode",
dict = ["leet", "code"].

Return true because "leetcode" can be segmented as "leet code".

Reference: http://chuansongme.com/n/200032

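Analysis of the word segmentation problem

Original problem

Given a string and a dictionary, determine whether the string can be split into words from the dictionary. For example, with the dictionary {hello, world} and the string hellohelloworld, the string can be split into hello, hello, world, all of which are dictionary words.

Analysis

Calling this the "word segmentation problem" is a little broad. I mention it because segmentation is a very basic problem in natural language processing, search engines, and related fields. Many fairly mature solutions exist, yet the problem is still worth exploring further. Let us start with this simple version and see how to handle it. The most direct idea is recursion: consider each prefix of the string and check whether it is in the dictionary. If it is, recursively process the remaining substring; if it is not, move on to the next prefix. Sample code: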

#include<iostream>
#include<unordered_set>
#include<string>
using namespace std;

class Solution {
public:
    bool wordBreak(string s, unordered_set<string> &dict) {
        int len=s.length();
        if(len==0)
            return true;
        int i;
        for(i=1;i<=len;i++)
        {
            if(dict.count(s.substr(0,i))>0&&wordBreak(s.substr(i,len-i),dict))
                return true;
        }
        return false;
    }
};

int main()
{
    unordered_set<string> dict={"aaaa","aaa"};
    Solution s;
    string ss="aaaaaa";
    cout<<s.wordBreak(ss,dict)<<endl;
}

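In the code above, every case has to build a new string with substr, and the same suffixes are examined again and again, so the program runs for a long time. Submitted to an online judge, it simply times out. So how can it be improved?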
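To make the cost of the naive recursion concrete, here is a minimal sketch that counts how many calls it makes. The names `wordBreakCounted` and `countNaiveCalls` are hypothetical and not part of the original post; the logic is meant to mirror the recursive solution above, with a counter added.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical instrumented copy of the naive recursion: same prefix-by-prefix
// search, plus a counter that is incremented on every call.
bool wordBreakCounted(const std::string &s,
                      const std::unordered_set<std::string> &dict,
                      std::size_t &calls) {
    ++calls;
    if (s.empty()) return true;  // nothing left to segment
    for (std::size_t i = 1; i <= s.size(); ++i) {
        // Recurse on the remainder only when the prefix is a dictionary word.
        if (dict.count(s.substr(0, i)) > 0 &&
            wordBreakCounted(s.substr(i), dict, calls))
            return true;
    }
    return false;
}

// Returns the number of calls the naive recursion makes for s.
std::size_t countNaiveCalls(const std::string &s,
                            const std::unordered_set<std::string> &dict) {
    std::size_t calls = 0;
    wordBreakCounted(s, dict, calls);
    return calls;
}
```

With the dictionary {"a", "aa"} and a run of a's followed by a single b, no segmentation exists, so every branch is explored. The count for a string of length L is 1 plus the counts for lengths L-1 and L-2, which means it grows exponentially with the length of the string.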

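This problem is handled much like the one in the previous post: look for repeated subproblems among the recursive subproblems. They are easy to spot, as the figure from GeeksforGeeks illustrates.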
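One way to reuse the repeated subproblems is to keep the recursion and cache the answer for each start position. The sketch below is an illustration under that assumption, not code from the original post; `canBreakFrom` and `wordBreakMemo` are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: true if the suffix s[start..] can be segmented.
// memo[start] is -1 (not computed yet), 0 (cannot) or 1 (can).
static bool canBreakFrom(const std::string &s, std::size_t start,
                         const std::unordered_set<std::string> &dict,
                         std::vector<int> &memo) {
    if (start == s.size()) return true;              // empty suffix
    if (memo[start] != -1) return memo[start] == 1;  // reuse an earlier answer
    for (std::size_t end = start + 1; end <= s.size(); ++end) {
        if (dict.count(s.substr(start, end - start)) > 0 &&
            canBreakFrom(s, end, dict, memo)) {
            memo[start] = 1;
            return true;
        }
    }
    memo[start] = 0;  // every prefix starting here failed
    return false;
}

// Top-down (memoized) variant of the recursive solution.
bool wordBreakMemo(const std::string &s,
                   const std::unordered_set<std::string> &dict) {
    std::vector<int> memo(s.size(), -1);
    return canBreakFrom(s, 0, dict, memo);
}
```

Each start position is solved at most once, so the number of recursive calls no longer blows up when many different splits lead to the same suffix.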

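So dynamic programming gives a substantial speedup. As before, each state depends on all of the states before it, so the solution is a double loop with O(n^2) time complexity, counting each substring lookup in the dictionary as a single step. Sample code: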

#include<iostream>
#include<unordered_set>
#include<string>
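#include<vector>
using namespace std;

class Solution
{
public:
    bool wordBreak(string s, unordered_set<string> &dict)
    {
        int len=s.length();
        if(len==0)
            return true;
        // dp[i] is true when the prefix s[0..i) can be segmented.
        // A vector replaces the variable-length array, which is not standard C++.
        vector<bool> dp(len+1,false);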
        int i,j;
        for(i=1; i<=len; i++)
        {
            if(dp[i]==false&&dict.count(s.substr(0,i))>0)
                dp[i]=true;
            if(i==len&&dp[i]==true)
                return true;
            if(dp[i]==true)
            {
                for(j=i+1; j<=len; j++)
                {
                    if(dp[j]==false&&dict.count(s.substr(i,j-i))>0)
                        dp[j]=true;
                    if(j==len&&dp[j]==true)
                        return true;
                }
            }
        }
        return false;
    }
};

int main()
{
    unordered_set<string> dict= {"leet","code"};
    Solution s;
    string ss = "leetcode";
    cout<<s.wordBreak(ss,dict)<<endl;
}

Method 3:

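// Second version, based on the answer on the official LeetCode site
#include<string>
#include<unordered_set>
#include<vector>
using namespace std;

class Solution {
public:
    bool wordBreak(string s, unordered_set<string> &dict) {
        int len = s.length();
        // wordB[i] is true when the prefix s[0..i) can be segmented
        vector<bool> wordB(len + 1, false);
        wordB[0] = true; // the empty prefix needs no words
        for (int i = 1; i <= len; i++) {
            for (int j = i - 1; j >= 0; j--) {
                if (wordB[j] && dict.find(s.substr(j, i - j)) != dict.end()) {
                    wordB[i] = true;
                    break; // one valid split is enough for the prefix of length i, so leave the inner loop
                }
            }
        }
        return wordB[len];
    }
};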
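To see what the bottom-up table holds, the sketch below returns the whole table rather than only its last entry. `wordBreakTable` is a hypothetical helper written for illustration; it is intended to follow the same recurrence as the bottom-up version above.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: table[i] is true when the prefix s[0..i) can be
// segmented into dictionary words.
std::vector<bool> wordBreakTable(const std::string &s,
                                 const std::unordered_set<std::string> &dict) {
    std::vector<bool> table(s.size() + 1, false);
    table[0] = true;  // the empty prefix is trivially segmentable
    for (std::size_t i = 1; i <= s.size(); ++i) {
        for (std::size_t j = i; j-- > 0;) {  // j runs from i-1 down to 0
            // s[0..j) is segmentable and s[j..i) is a word, so s[0..i) is too.
            if (table[j] && dict.count(s.substr(j, i - j)) > 0) {
                table[i] = true;
                break;
            }
        }
    }
    return table;
}
```

For s = "leetcode" and the dictionary {"leet", "code"}, the true entries are at indices 0, 4 and 8: the empty prefix, "leet", and "leetcode". The answer to the problem is the last entry.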

Original article: https://www.cnblogs.com/wuchanming/p/4133695.html