zoukankan      html  css  js  c++  java
  • A Go library implementing an FST (finite state transducer)——mark下

    https://github.com/couchbaselabs/vellum

    Building an FST

    To build an FST, create a new builder using the New() method. This method takes an io.Writer as an argument. As the FST is being built, data will be streamed to the writer as soon as possible. With this builder you MUST insert keys in lexicographic order. Inserting keys out of order will result in an error. After inserting the last key into the builder, you MUST call Close() on the builder. This will flush all remaining data to the underlying writer.

    In memory:

      var buf bytes.Buffer
      builder, err := vellum.New(&buf, nil)
      if err != nil {
        log.Fatal(err)
      }

    To disk:

      f, err := os.Create("/tmp/vellum.fst")
      if err != nil {
        log.Fatal(err)
      }
      builder, err := vellum.New(f, nil)
      if err != nil {
        log.Fatal(err)
      }

    MUST insert keys in lexicographic order:

    err = builder.Insert([]byte("cat"), 1)
    if err != nil {
      log.Fatal(err)
    }
    
    err = builder.Insert([]byte("dog"), 2)
    if err != nil {
      log.Fatal(err)
    }
    
    err = builder.Insert([]byte("fish"), 3)
    if err != nil {
      log.Fatal(err)
    }
    
    err = builder.Close()
    if err != nil {
      log.Fatal(err)
    }

    Using an FST

    After closing the builder, the data can be used to instantiate an FST. If the data was written to disk, you can use the Open()method to mmap the file. If the data is already in memory, or you wish to load/mmap the data yourself, you can instantiate the FST with the Load() method.

    Load in memory:

      fst, err := vellum.Load(buf.Bytes())
      if err != nil {
        log.Fatal(err)
      }

    Open from disk:

      fst, err := vellum.Open("/tmp/vellum.fst")
      if err != nil {
        log.Fatal(err)
      }

    Get key/value:

      val, exists, err = fst.Get([]byte("dog"))
      if err != nil {
        log.Fatal(err)
      }
      if exists {
        fmt.Printf("contains dog with val: %d
    ", val)
      } else {
        fmt.Printf("does not contain dog")
      }

    Iterate key/values:

      itr, err := fst.Iterator(startKeyInclusive, endKeyExclusive)
      for err == nil {
        key, val := itr.Current()
        fmt.Printf("contains key: %s val: %d", key, val)
        err = itr.Next()
      }
      if err != nil {
        log.Fatal(err)
      }

    How does the FST get built?

    A full example of the implementation is beyond the scope of this README, but let's consider a small example where we want to insert 3 key/value pairs.

    First we insert "are" with the value 4.

    step1

    Next, we insert "ate" with the value 2.

    step2

    Notice how the values associated with the transitions were adjusted so that by summing them while traversing we still get the expected value.

    At this point, we see that state 5 looks like state 3, and state 4 looks like state 2. But, we cannot yet combine them because future inserts could change this.

    Now, we insert "see" with value 3. Once it has been added, we now know that states 5 and 4 can longer change. Since they are identical to 3 and 2, we replace them.

    step3

    Again, we see that states 7 and 8 appear to be identical to 2 and 3.

    Having inserted our last key, we call Close() on the builder.

    step4

    Now, states 7 and 8 can safely be replaced with 2 and 3.

    For additional information, see the references at the bottom of this document.

  • 相关阅读:
    由@Convert注解引出的jackson对枚举的反序列化规则
    List.contains()与自动拆箱
    Utf-8+Bom编码导致的读取数据部分异常问题
    ResouceUtils.getFile()取不到Jar中资源文件源码小结
    Java自动装箱中的缓存原理
    Javaconfig形式配置Dubbo多注册中心
    logback多环境配置
    Spring @Scheduled @Async联合实现调度任务(2017.11.28更新)
    Nginx的Access日志记录的时机
    Mysql索引引起的死锁
  • 原文地址:https://www.cnblogs.com/bonelee/p/6806129.html
Copyright © 2011-2022 走看看