From the jsoup study series; original article: https://github.com/code4craft/jsoup-learning/blob/master/blogs/jsoup4.md
State machines
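Jsoup uses state machines in both its lexical analysis and its syntax analysis (parsing). A state machine can be thought of as a special kind of program model. Regular expressions, which we deal with all the time, are a familiar example, since they can be implemented as state machines.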
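A state machine consists of two parts: states and transitions. Depending on whether each transition is uniquely determined, state machines are divided into DFAs (deterministic finite automata) and NFAs (nondeterministic finite automata). Take the simplest possible regular expression, "a[b]*" (an "a" followed by any number of "b"s), as an example. Mapped onto a DFA, using the state names from the code below, it looks roughly like this:
Init --a--> AfterA; Init --any other character--> Init
AfterA --b--> AfterB; AfterB --b--> AfterB
AfterA or AfterB --any other character--> Accept (report the match, then return to Init without consuming that character)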
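To make these transitions concrete, here is a minimal sketch that runs the DFA over an entire string and answers whether the string matches "a[b]*". It is a recognizer, not the scanner developed later in the article, so it needs no Accept action: it assumes an added trap state `DEAD` for impossible input and treats AfterA and AfterB as the accepting states. The class name `AbDfa` is hypothetical, and the answers are cross-checked against `java.util.regex.Pattern.matches("ab*", input)`.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: the DFA for "a[b]*" (same language as "ab*"),
// used to decide whether a whole string matches.
class AbDfa {
    // DEAD is the implicit trap state that DFA diagrams usually leave out.
    enum State { INIT, AFTER_A, AFTER_B, DEAD }

    // Transition function: (current state, input char) -> next state.
    static State next(State s, char ch) {
        switch (s) {
            case INIT:
                return ch == 'a' ? State.AFTER_A : State.DEAD;
            case AFTER_A:
            case AFTER_B:
                return ch == 'b' ? State.AFTER_B : State.DEAD;
            default:
                return State.DEAD;
        }
    }

    // Feed every character through the DFA, then check the final state.
    static boolean matches(String input) {
        State s = State.INIT;
        for (int i = 0; i < input.length(); i++) {
            s = next(s, input.charAt(i));
        }
        // "a" alone ends in AFTER_A; "a" followed by b's ends in AFTER_B.
        return s == State.AFTER_A || s == State.AFTER_B;
    }

    public static void main(String[] args) {
        for (String in : new String[] {"a", "ab", "abbb", "b", "aba", ""}) {
            // Cross-check against java.util.regex for the equivalent pattern.
            System.out.println(in + " -> " + matches(in)
                    + " (regex: " + Pattern.matches("ab*", in) + ")");
        }
    }
}
```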
A state machine is itself a programming model. If we try to implement one in code, the most direct approach looks roughly like this:
public void process(StringReader reader) throws StringReader.EOFException {
    char ch;
    switch (state) {
        case Init:
            ch = reader.read();
            if (ch == 'a') {
                state = State.AfterA;
                accum.append(ch);
            }
            break;
        case AfterA:
            ...
            break;
        case AfterB:
            ...
            break;
        case Accept:
            ...
            break;
    }
}
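For reference, the elided cases can be completed along the lines of the full state-pattern program given later in the article. The sketch below is a reconstruction under stated assumptions, not the article's code: it scans a plain `String` by index instead of the article's `StringReader`, collects matches into a list instead of printing them, and, unlike the article's driver, also reports a match that is still pending when the input runs out. `SwitchABStateMachine` and `findAll` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the switch-based machine for "a[b]*".
// Assumptions: input scanned by index (no StringReader), matches collected in a list.
class SwitchABStateMachine {
    enum State { Init, AfterA, AfterB, Accept }

    private State state = State.Init;
    private final StringBuilder accum = new StringBuilder();
    private final List<String> found = new ArrayList<>();
    private final String text;
    private int index;

    SwitchABStateMachine(String text) { this.text = text; }

    // Runs one step of the machine; returns false once the input is exhausted.
    private boolean step() {
        char ch;
        switch (state) {
            case Init:
                if (index >= text.length()) return false;
                ch = text.charAt(index++);
                if (ch == 'a') {          // an 'a' starts a new match
                    state = State.AfterA;
                    accum.append(ch);
                }
                break;
            case AfterA:
            case AfterB:                  // both states behave the same way
                if (index >= text.length()) return false;
                ch = text.charAt(index++);
                if (ch == 'b') {
                    accum.append(ch);
                    state = State.AfterB;
                } else {
                    state = State.Accept;
                }
                break;
            case Accept:
                found.add(accum.toString());
                accum.setLength(0);
                state = State.Init;
                index--;                  // "unread" the char that ended the match
                break;
        }
        return true;
    }

    // Scans the whole text and returns every match in order.
    static List<String> findAll(String text) {
        SwitchABStateMachine m = new SwitchABStateMachine(text);
        while (m.step()) { }
        // Assumption: a match still pending at end of input is reported too.
        if (m.accum.length() > 0) m.found.add(m.accum.toString());
        return m.found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("abbbababbbaa")); // [abbb, ab, abbb, a, a]
    }
}
```

All of the per-state logic sits inside one `switch`, which is manageable for four states but grows harder to follow as states and cases multiply.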
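Writing it this way is fine for a simple state machine, but it becomes unwieldy as the machine grows more complex. There is also a standard approach: first build a state transition table, then drive the state machine from that table. The drawback of this approach is that it can only express pure state transitions; there is no place at the code level to act on the input and output.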
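A minimal sketch of the table-driven approach for "a[b]*", assuming states and input classes are encoded as small integers and adding a trap state `DEAD`; all names here are illustrative. The entire machine is the `TRANSITIONS` array, and the driver loop does nothing but look up the next state. That is exactly the limitation described above: collecting the matched text or reporting a match would have to be bolted onto the driver, outside the table.

```java
// Hypothetical table-driven sketch for "a[b]*" (whole-string matching).
class TableABStateMachine {
    // State numbers; DEAD is an added trap state.
    static final int INIT = 0, AFTER_A = 1, AFTER_B = 2, DEAD = 3;

    // Input classes: column 0 = 'a', column 1 = 'b', column 2 = any other char.
    static int classOf(char ch) {
        return ch == 'a' ? 0 : ch == 'b' ? 1 : 2;
    }

    // TRANSITIONS[state][inputClass] = next state. The table is the whole machine.
    static final int[][] TRANSITIONS = {
        /* INIT    */ {AFTER_A, DEAD,    DEAD},
        /* AFTER_A */ {DEAD,    AFTER_B, DEAD},
        /* AFTER_B */ {DEAD,    AFTER_B, DEAD},
        /* DEAD    */ {DEAD,    DEAD,    DEAD},
    };

    // Which states accept, indexed by state number.
    static final boolean[] ACCEPTING = {false, true, true, false};

    // Generic driver: it only performs lookups, so per-transition work
    // (such as appending to a buffer) has no place in the table itself.
    static boolean matches(String input) {
        int state = INIT;
        for (int i = 0; i < input.length(); i++) {
            state = TRANSITIONS[state][classOf(input.charAt(i))];
        }
        return ACCEPTING[state];
    }

    public static void main(String[] args) {
        System.out.println(matches("abbb")); // true
        System.out.println(matches("abab")); // false
    }
}
```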
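Jsoup instead implements its state machines with the State pattern, which was genuinely eye-opening when I first saw it. The State pattern is one of the classic design patterns: it binds each state together with the behavior that belongs to it. That makes it a natural fit for implementing the processing that happens on each transition of a state machine.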
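For comparison, in its textbook form the State pattern gives each state its own class behind a common interface, and a context object simply delegates to whichever state is current. The sketch below applies that form to "a[b]*". `AbState`, `AbContext`, `InitState` and `AfterAState` are hypothetical names; the sketch merges AfterA and AfterB into one class because they behave identically, and instead of an `unread()` it lets a fresh Init state re-handle the character that ended a match.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the classic interface-based State pattern.
interface AbState {
    // Handle one input character and return the state to use next.
    AbState handle(AbContext ctx, char ch);
}

// The context holds shared data and delegates each character to the current state.
class AbContext {
    final StringBuilder accum = new StringBuilder();
    final List<String> found = new ArrayList<>();
    private AbState state = new InitState();

    void feed(char ch) { state = state.handle(this, ch); }

    // Records the current match (if any) and clears the buffer.
    void emit() {
        if (accum.length() > 0) found.add(accum.toString());
        accum.setLength(0);
    }

    static List<String> findAll(String text) {
        AbContext ctx = new AbContext();
        for (char ch : text.toCharArray()) ctx.feed(ch);
        ctx.emit(); // assumption: flush a match still pending at end of input
        return ctx.found;
    }
}

// Waits for an 'a' to start a match; ignores everything else.
class InitState implements AbState {
    public AbState handle(AbContext ctx, char ch) {
        if (ch == 'a') {
            ctx.accum.append(ch);
            return new AfterAState();
        }
        return this;
    }
}

// Covers both AfterA and AfterB: 'b' extends the match, anything else ends it.
class AfterAState implements AbState {
    public AbState handle(AbContext ctx, char ch) {
        if (ch == 'b') {
            ctx.accum.append(ch);
            return this;
        }
        ctx.emit();
        // Re-process the terminating char from Init (plays the role of unread()).
        return new InitState().handle(ctx, ch);
    }
}
```

Calling `AbContext.findAll("abbbababbbaa")` yields `[abbb, ab, abbb, a, a]`. The enum-based form that follows packs the same one-behavior-per-state idea into enum constants, so no separate class per state is needed.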
Here is a State-pattern implementation of the "a[b]*" example. It follows the same approach as Jsoup, using an enum to implement the State pattern:
public class StateModelABStateMachine implements ABStateMachine {
    State state;
    StringBuilder accum;

    enum State {
        Init {
            @Override
            public void process(StateModelABStateMachine stateModelABStateMachine, StringReader reader) throws StringReader.EOFException {
                char ch = reader.read();
                if (ch == 'a') {
                    stateModelABStateMachine.state = AfterA;
                    stateModelABStateMachine.accum.append(ch);
                }
            }
        },
        Accept {
            ...
        },
        AfterA {
            ...
        },
        AfterB {
            ...
        };

        public void process(StateModelABStateMachine stateModelABStateMachine, StringReader reader) throws StringReader.EOFException {
        }
    }

    public void process(StringReader reader) throws StringReader.EOFException {
        state.process(this, reader);
    }
}
The complete program is as follows:
StateModelABStateMachine.java:
package us.codecraft.learning.automata;

/**
 * @author code4crafter@gmail.com
 */
public class StateModelABStateMachine implements ABStateMachine {

    State state = State.Init;

    StringBuilder accum = new StringBuilder();

    enum State {

        // Init: skip characters until an 'a' starts a new match.
        Init {
            @Override
            public void process(StateModelABStateMachine stateModelABStateMachine, StringReader reader) throws StringReader.EOFException {
                char ch = reader.read();
                if (ch == 'a') {
                    stateModelABStateMachine.state = AfterA;
                    stateModelABStateMachine.accum.append(ch);
                }
            }
        },

        // Accept: report the match, reset, and push back the character that ended it.
        Accept {
            @Override
            public void process(StateModelABStateMachine stateModelABStateMachine, StringReader reader) throws StringReader.EOFException {
                System.out.println("find " + stateModelABStateMachine.accum.toString());
                stateModelABStateMachine.accum = new StringBuilder();
                stateModelABStateMachine.state = Init;
                reader.unread();
            }
        },

        // AfterA: an 'a' has been read; a 'b' continues the match, anything else ends it.
        AfterA {
            @Override
            public void process(StateModelABStateMachine stateModelABStateMachine, StringReader reader) throws StringReader.EOFException {
                char ch = reader.read();
                if (ch == 'b') {
                    stateModelABStateMachine.accum.append(ch);
                    stateModelABStateMachine.state = AfterB;
                } else {
                    stateModelABStateMachine.state = Accept;
                }
            }
        },

        // AfterB: keep consuming 'b's; anything else ends the match.
        AfterB {
            @Override
            public void process(StateModelABStateMachine stateModelABStateMachine, StringReader reader) throws StringReader.EOFException {
                char ch = reader.read();
                if (ch == 'b') {
                    stateModelABStateMachine.accum.append(ch);
                    stateModelABStateMachine.state = AfterB;
                } else {
                    stateModelABStateMachine.state = Accept;
                }
            }
        };

        // Default no-op; every constant above overrides it.
        public void process(StateModelABStateMachine stateModelABStateMachine, StringReader reader) throws StringReader.EOFException {
        }
    }

    @Override
    public void process(StringReader reader) throws StringReader.EOFException {
        // Delegate to whichever state is current.
        state.process(this, reader);
    }

    public static void main(String[] args) {
        ABStateMachine abStateMachine = new StateModelABStateMachine();
        String text = "abbbababbbaa";
        StringReader reader = new StringReader(text);
        try {
            while (true) {
                abStateMachine.process(reader);
            }
        } catch (StringReader.EOFException e) {
            // End of input reached: stop the machine.
        }
    }
}
ABStateMachine.java:
package us.codecraft.learning.automata;

/**
 * @author code4crafter@gmail.com
 */
public interface ABStateMachine {

    void process(StringReader reader) throws StringReader.EOFException;
}
StringReader.java:
package us.codecraft.learning.automata;

/**
 * @author code4crafter@gmail.com
 */
public class StringReader {

    // Thrown by read() once every character has been consumed.
    static class EOFException extends Exception {}

    private String string;

    private int index;

    public StringReader(String string) {
        this.string = string;
    }

    // Returns the next character, including the last one, then signals end of input.
    public char read() throws EOFException {
        if (index < string.length()) {
            return string.charAt(index++);
        } else {
            throw new EOFException();
        }
    }

    // Steps back one character so it will be read again (never below 0).
    public void unread() {
        index--;
        if (index < 0) {
            index = 0;
        }
    }
}