  • CareerCup: 17.14 minimize unrecognized characters

    Oh, no! You have just completed a lengthy document when you have an unfortunate
    Find/Replace mishap. You have accidentally removed all spaces, punctuation, and
    capitalization in the document. A sentence like "I reset the computer. It still
    didn't boot!" would become "iresetthecomputeritstilldidntboot". You figure that you
    can add back in the punctuation and capitalization later, once you get the individual
    words properly separated. Most of the words will be in a dictionary, but some strings,
    like proper names, will not.
    Given a dictionary (a list of words), design an algorithm to find the optimal way of
    "unconcatenating" a sequence of words. In this case, "optimal" is defined to be the
    parsing which minimizes the number of unrecognized sequences of characters.
    For example, the string "jesslookedjustliketimherbrother" would be optimally parsed
    as "JESS looked just like TIM her brother". This parsing has seven unrecognized
    characters, which we have capitalized for clarity.

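    This is problem 14 of CareerCup Chapter 17. I did not study the CareerCup solution closely, but the problem feels very similar to Word Break and Palindrome Partitioning II: there is a dictionary to check substrings against, and a one-dimensional DP does the job. Use int[] res = new int[len+1], where res[i] is the minimum number of unrecognized chars in the first i chars of the string; res[0] = 0, and res[len] is the answer.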

    With the DP state defined, the next step is the transition. For each i, try every split point j with 0 <= j < i:

    int unrecogNum = dict.contains(s.substring(j, i)) ? 0 : i-j; // is the substring at indices j..i-1 in the dictionary? If not, all i-j of its chars are unrecognized
    res[i] = Math.min(res[i], res[j]+unrecogNum);
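
The same transition, written out as a recurrence. The name cost(j, i) is introduced here only for the penalty term (it is unrecogNum in the code), and s[j..i) denotes the substring at indices j through i-1:

```latex
\[
\mathrm{res}[0] = 0, \qquad
\mathrm{res}[i] = \min_{0 \le j < i} \bigl( \mathrm{res}[j] + \mathrm{cost}(j, i) \bigr), \qquad
\mathrm{cost}(j, i) =
\begin{cases}
0 & \text{if } s[j..i) \in \mathrm{dict},\\
i - j & \text{otherwise.}
\end{cases}
\]
```

Since cost(0, i) is at most i, every res[i] is bounded by i, so the table never keeps its initial "infinity" value once an entry has been computed.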

    I tested this and every case I tried passed, though I am not sure whether some corner case fails:

    package fib;

    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.Set;

    public class unconcatenating {
        // Returns the minimum number of unrecognized chars over all ways to split s.
        public int optway(String s, Set<String> dict) {
            if (s == null || s.length() == 0) return 0;
            int len = s.length();
            // With no dictionary at all, every char is unrecognized.
            if (dict == null || dict.isEmpty()) return len;
            int[] res = new int[len + 1]; // res[i]: minimum # of unrecognized chars in the first i chars
            Arrays.fill(res, Integer.MAX_VALUE);
            res[0] = 0;
            for (int i = 1; i <= len; i++) {
                for (int j = 0; j < i; j++) {
                    // res[j] was finalized in an earlier iteration (res[j] <= j), so the sum cannot overflow
                    String str = s.substring(j, i);
                    int unrecogNum = dict.contains(str) ? 0 : i - j;
                    res[i] = Math.min(res[i], res[j] + unrecogNum);
                }
            }
            return res[len];
        }

        public static void main(String[] args) {
            unconcatenating example = new unconcatenating();
            Set<String> dict = new HashSet<String>();
            dict.add("reset");
            dict.add("the");
            dict.add("computer");
            dict.add("it");
            dict.add("still");
            dict.add("didnt");
            dict.add("boot");
            // "john" and "damn" are not in the dictionary: 4 + 4 unrecognized chars
            int result = example.optway("johnresetthecomputeritdamnstilldidntboot", dict);
            System.out.print("opt # of unrecognized chars is ");
            System.out.println(result);
        }
    }

    The output is: opt # of unrecognized chars is 8
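
The problem statement asks for the parsing itself, while the code above only returns the count. A minimal sketch of one way to recover a parse, assuming the same DP plus a back-pointer array: the class UnconcatParse, the method parse, and the array prev are names made up for this illustration, not part of the original solution. Unrecognized pieces are upper-cased, following the convention of the problem statement.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class UnconcatParse {
    // Returns one optimal parse of s; pieces not found in dict are upper-cased.
    static String parse(String s, Set<String> dict) {
        int n = s.length();
        int[] res = new int[n + 1];  // res[i]: min # of unrecognized chars in s[0, i)
        int[] prev = new int[n + 1]; // prev[i]: start index of the last piece in that parse
        for (int i = 1; i <= n; i++) {
            res[i] = Integer.MAX_VALUE;
            for (int j = 0; j < i; j++) {
                int cost = dict.contains(s.substring(j, i)) ? 0 : i - j;
                // Strict '<' with ascending j keeps the longest last piece on ties,
                // so a run of unrecognized chars stays together as one piece.
                if (res[j] + cost < res[i]) {
                    res[i] = res[j] + cost;
                    prev[i] = j;
                }
            }
        }
        // Walk the back-pointers from the end; push() prepends, restoring left-to-right order.
        Deque<String> pieces = new ArrayDeque<>();
        for (int i = n; i > 0; i = prev[i]) {
            String piece = s.substring(prev[i], i);
            pieces.push(dict.contains(piece) ? piece : piece.toUpperCase(Locale.ROOT));
        }
        return String.join(" ", pieces);
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList(
                "reset", "the", "computer", "it", "still", "didnt", "boot"));
        // prints: JOHN reset the computer it DAMN still didnt boot
        System.out.println(parse("johnresetthecomputeritdamnstilldidntboot", dict));
    }
}
```

The two upper-cased pieces, JOHN and DAMN, account for the 8 unrecognized chars reported above. When several parses tie, this sketch returns only one of them.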

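On the corner-case worry: one way to gain some confidence is to compare the DP with an exhaustive search on short inputs. The sketch below is only an illustration under that assumption; UnconcatCheck, dp, brute, and the small test dictionary are all hypothetical names and data chosen here, and agreement on a handful of strings is evidence, not a proof.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class UnconcatCheck {
    // Bottom-up DP, same transition as in the post.
    static int dp(String s, Set<String> dict) {
        int n = s.length();
        int[] res = new int[n + 1];
        for (int i = 1; i <= n; i++) {
            res[i] = Integer.MAX_VALUE;
            for (int j = 0; j < i; j++) {
                int cost = dict.contains(s.substring(j, i)) ? 0 : i - j;
                res[i] = Math.min(res[i], res[j] + cost);
            }
        }
        return res[n];
    }

    // Exhaustive reference: try every possible first piece, recurse on the rest.
    // Exponential in s.length(), so only suitable for short strings.
    static int brute(String s, Set<String> dict) {
        if (s.isEmpty()) return 0;
        int best = Integer.MAX_VALUE;
        for (int cut = 1; cut <= s.length(); cut++) {
            int cost = dict.contains(s.substring(0, cut)) ? 0 : cut;
            best = Math.min(best, cost + brute(s.substring(cut), dict));
        }
        return best;
    }

    public static void main(String[] args) {
        // Overlapping words ("a", "ab", "abc") force the DP to choose between splits.
        Set<String> dict = new HashSet<>(Arrays.asList(
                "a", "ab", "abc", "cd", "the", "there", "rein"));
        String[] cases = {"", "a", "x", "abcd", "xabcx", "therein", "aaaa", "abcabcd"};
        for (String s : cases) {
            int fast = dp(s, dict);
            int slow = brute(s, dict);
            System.out.println("\"" + s + "\": dp=" + fast + ", brute=" + slow);
            if (fast != slow) throw new AssertionError("mismatch on \"" + s + "\"");
        }
    }
}
```

Inputs worth trying include the empty string, a string with no dictionary words at all, and strings where a greedy longest-match would be wrong, such as "abcd" with this dictionary ("abc" + "d" costs 1, while "ab" + "cd" costs 0).
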
  • Original article: https://www.cnblogs.com/EdwardLiu/p/4356967.html