zoukankan      html  css  js  c++  java
  • Regular expression: Implementations and running times

    原文:维基百科链接

    There are at least three different algorithms that decide if and how a given regular expression matches a string.

    至少有三种算法可以用来确定给定的正则表达式是否匹配一个字符串。

    (1) 自动机

    The oldest and fastest rely on a result in formal language theory that allows every nondeterministic(不确定) finite automaton (NFA) to be transformed into a deterministic finite automaton (DFA). The DFA can be constructed explicitly and then run on the resulting input string one symbol at a time. Constructing the DFA for a regular expression of size m has the time and memory cost of O(2m), but it can be run on a string of size n in time O(n).

    (2) 模拟自动机

    An alternative approach is to simulate the NFA directly, essentially building each DFA state on demand and then discarding it at the next step. This keeps the DFA implicit and avoids the exponential construction cost, but running cost rises to O(m2n). The explicit approach is called the DFA algorithm and the implicit approach the NFA algorithm. Adding caching to the NFA algorithm is often called the "lazy DFA" algorithm, or just the DFA algorithm without making a distinction. These algorithms are fast, but using them for recalling grouped subexpressions, lazy quantification, and similar features is tricky.[12][13]

    (3) 回溯

    The third algorithm is to match the pattern against the input string by backtracking. This algorithm is commonly called NFA, but this terminology can be confusing. Its running time can be exponential, which simple implementations exhibit when matching against expressions like (a|aa)*b that contain both alternation and unbounded quantification and force the algorithm to consider an exponentially increasing number of sub-cases. This behavior can cause a security problem called Regular expression Denial of Service.

    Although backtracking implementations only give an exponential guarantee in the worst case, they provide much greater flexibility and expressive power. For example, any implementation which allows the use of backreferences, or implements the various extensions introduced by Perl, must include some kind of backtracking. Some implementations try to provide the best of both algorithms by first running a fast DFA algorithm, and revert to a potentially slower backtracking algorithm only when a backreference is encountered during the match.

    其他资料:

    Implementing Regular Expressions

  • 相关阅读:
    使用 C# .NET 在 ASP.NET 应用程序中实现基于窗体的身份验证
    高性能 Windows Socket 组件 HPSocket
    Linux下的C编程实战
    Scrum实践
    hadoop之NameNode,DataNode,Secondary NameNode
    代码抽象层次
    分布式统计的思考以及实现
    GCC起步
    学习 easyui 之一:easyloader 分析与使用
    从Prism中学习设计模式之MVVM 模式简述MVVM
  • 原文地址:https://www.cnblogs.com/wangshide/p/2608188.html
Copyright © 2011-2022 走看看