zoukankan      html  css  js  c++  java
  • Fast Token Replacement in C#

    http://www.codeproject.com/Articles/298519/Fast-Token-Replacement-in-Csharp

    Fast Token Replacement in C#

    Introduction

    FastReplacer is good for executing many Replace operations on a large string when performance is important.

    The main idea is to avoid modifying existing text or allocating new memory every time a string is replaced.

    We have designed FastReplacer to help us on a project where we had to generate a large text with a large number of append and replace operations. The first version of the application took 20 seconds to generate the text usingStringBuilder. The second improved version that used the String class took 10 seconds. Then we implementedFastReplacer and the duration dropped to 0.1 seconds.

    Using the code

    Use of FastReplacer should come intuitively to developers that have used StringBuilder before.

    Add classes FastReplacer and FastReplacerSnippet to your C# console application and copy the following code in Program.cs:

    using System;
    using Omega.Alpha.Common;
     
    namespace ConsoleApplication1
    {
        class Program
        {
            static void Main(string[] args)
            {
                // Tokens will be delimited with { and }.
                FastReplacer fr = new FastReplacer("{", "}");
                fr.Append("{OPTION} OR {ANOTHER}.");
                fr.Replace("{ANOTHER}", "NOT {OPTION}");
                // The text is now "{OPTION} OR NOT {OPTION}."
                fr.Replace("{OPTION}", "TO BE");
                // The text is now "TO BE OR NOT TO BE."
                Console.WriteLine(fr.ToString());
            }
        }
    }

    Note that only properly formatted tokens can be replaced using FastReplacer. This constraint is set to ensure good performance since tokens can be extracted from text in one pass.

    Case-insensitive

    For a case-insensitive replace operation, set the additional constructor parameter caseSensitive to false (default istrue).

    FastReplacer fr = new FastReplacer("{", "}", false); // Case-insensitive
    fr.Append("{Token}, {token} and {TOKEN}.");
    fr.Replace("{tOkEn}", "x"); // Text is "x, x and x."
    Console.WriteLine(fr.ToString());

    What is in the package

    The algorithm is implemented in two C# classes on .NET Framework 4.0:

    • FastReplacer – Class for generating strings with a fast Replace function.
    • FastReplacerSnippet – Internal class that is used from FastReplacer. No need to use it directly.

    The attached solution contains three projects for Visual Studio 2010:

    • Omega.Alpha.Common – A class library with the FastReplacer class and the FastReplacerSnippet class.
    • FastReplacerDemo – Demonstration console application. Contains samples and performance tests used in this article.
    • FastReplacerTest – Unit tests for FastReplacer. All cool features can be seen in these tests.

    Performance

    Speed

    Using String or StringBuilder to replace tokens in large text takes more time because every time a Replacefunction is called, a new string is generated.

    These tests were performed with FastReplacerDemo.exe (attached project) on a computer with Windows Experience Index: Processor 5.5, Memory 5.5.

    ReplaceCount TextLength FastReplacer String StringBuilder
    100 907 0.0008 sec 0.0006 sec 0.0014 sec
    300 2707 0.0023 sec 0.0044 sec 0.0192 sec
    1000 10008 0.0081 sec 0.0536 sec 1.2130 sec
    3000 30008 0.0246 sec 0.4515 sec 43.5499 sec
    10000 110009 0.0894 sec 5.9623 sec 1677.5883 sec
    30000 330009 0.3649 sec 60.9739 sec Skipped
    100000 1200010 1.5461 sec 652.8718 sec Skipped

    Memory usage

    Memory usage while working with FastReplacer is usually 3 times the memory size of the resulting text. This includes:

    1. Original strings sent to FastReplacer as an argument for AppendReplaceInsertBefore, andInsertAfter functions.
    2. Temporary memory used while generating final text in the FastReplacer.ToString function.
    3. The final generated string.

    The algorithm

    Using the conventional string.Replace function or StringBuilder.Replace function to generate large text takes O(n*m) time, where n is the number of replace operations that is executed and m is the text length, because a new string is generated every time the function is executed.

    This chapter explains that FastReplacer will take O(n*log(n) + m) time for the same task.

    Tokens

    Tokens are recognized in text every time a new text is added (by the AppendReplaceInsertBefore, andInsertAfter functions). Positions of tokens are kept in a Dictionary for fast retrieval. That way searching for a text to be replaced takes O(n*log(n) + m) time instead of O(n*m). That is good, but the following part of the algorithm has more impact.

    Snippets

    When the Replace function is called to replace a token with a new string, FastReplacer does not modify existing text. Instead, it keeps track of the position in the original text where the token should be cut out and the new string inserted. Next time the Replace function is called, it will do the same on the original text and in the new strings that were previously inserted. That creates a directed acyclic graph where every node (called FastReplacerSnippet) represents a string with information where it should be inserted in its parent text, and with an array of child nodes that should be inserted in that string.

    Every replace operation takes O(log n) time to find the matching token (covered in the previous chapter) and O(1) to insert a new node in the data structure.

    Generating the final string takes O(m) time because there is only one pass through the data structure to recursively collect all the parts that need to be concatenated.

    Sample 1

    For example, in the string “My {pet} likes {food}.”, if token “{pet}” is replaced with “tortoise”, the following data structure will be created:

    FastTokenReplacement/My_pet_likes_food.png

    The final text will be composed by concatenating the text parts “My ”, “tortoise”, and “ likes {food}.”.

    Sample 2

    A more complex example is the text “{OPTION} OR {ANOTHER}”. If the token “{ANOTHER}” is replaced with “NOT {OPTION}”, then the token “{OPTION}” replaced with “TO BE”, we will get the following structure:

    FastTokenReplacement/To_be_or_not_to_be.png

    Constraints

    When snippets of text are inserted, tokens are searched in every snippet separately. Tokens in every snippet must be properly formatted. A token cannot start in one snippet then end in another.

    For example, you cannot insert a text that contains only the beginning of a token (e.g., “Hello {”) then append a text with the end of the token (e.g., “USERNAME}.”). Each of these function calls would fail because the token in each text is not properly formatted.

    To ensure maximal consistency, FastReplacer will throw an exception if the inserted text contains an improperly formatted token.

  • 相关阅读:
    FormsAuthentication 简单使用
    2.0 泛型
    解决Eclipse java build path中Web App Libraries无法自动找到WEBINF的lib目录
    WinHex 15.8 r4 注册信息
    用ClassPathXmlApplicationContext读取Spring配置文件的路径问题
    Installing NFS on CentOS 6.2
    使用XStream对Java对象进行序列化和反序列化
    Eclipse代码注释模板code template
    Timestamp和String的相互转换 Java
    visual studio vs2010 2012 C/C++ 编译找不到mspdb100.dll文件的解决方法
  • 原文地址:https://www.cnblogs.com/Leo_wl/p/3228428.html
Copyright © 2011-2022 走看看