  • Learning Huffman Coding

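    What is Huffman coding?

    Huffman coding is an entropy coding algorithm used for lossless data compression. It was invented in 1952 by the American computer scientist David Albert Huffman.

    Huffman coding encodes source symbols (such as a letter) with a variable-length code table. The table is derived from an estimate of how likely each source symbol is to occur: symbols that occur often get shorter codes, and symbols that occur rarely get longer ones. This lowers the average (expected) length of the encoded string, which is how the data is compressed without loss.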

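    What is Huffman coding used for?

    It is used to compress and decompress data.

    In a legacy double-byte encoding such as GBK, an English letter or a digit takes 1 byte and a Chinese character takes 2 bytes. In the Unicode encodings:

    • in UTF-8, a Chinese character typically takes 3 bytes and an English letter takes 1 byte;
    • in UTF-16, most Chinese characters take 2 bytes (rare ones outside the Basic Multilingual Plane take 4) and an English letter takes 2 bytes;
    • in UTF-32, every character takes 4 bytes.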
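
    These byte counts can be checked directly. The following is a minimal Kotlin sketch, assuming a JVM target. The helper name encodedSize is made up for illustration, and the big-endian charsets (UTF_16BE, UTF_32BE) are used so that no byte-order mark is counted. For the character '中' it reports 3 bytes in UTF-8, 2 bytes in UTF-16 and 4 bytes in UTF-32.

    ```kotlin
    import java.nio.charset.Charset

    // Hypothetical helper: number of bytes a string occupies in the given charset.
    fun encodedSize(text: String, charset: Charset): Int =
        text.toByteArray(charset).size

    fun main() {
        // The BE variants write no byte-order mark, so only the character itself is counted.
        val charsets = listOf(Charsets.UTF_8, Charsets.UTF_16BE, Charsets.UTF_32BE)
        for (cs in charsets) {
            val latin = encodedSize("A", cs)
            val chinese = encodedSize("中", cs)
            println("${cs.name()}: 'A' = $latin bytes, '中' = $chinese bytes")
        }
    }
    ```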

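    A quick refresher on bytes:

    A byte is a unit of data size. One byte equals 8 bits, and a bit is a single binary digit, 0 or 1.

    The space any piece of data occupies can be measured in bytes. In Java, for example:

    • a char takes 2 bytes,
    • a short takes 2 bytes,
    • an int takes 4 bytes,
    • a float takes 4 bytes,
    • a long or a double takes 8 bytes.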
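
    The same sizes can be read from constants in the Kotlin standard library, since on the JVM Kotlin's Char, Short, Int, Float, Long and Double map to the Java primitives listed above. This is a small sketch, assuming Kotlin 1.4 or later; the function name primitiveSizes is made up for illustration.

    ```kotlin
    // Hypothetical helper: size in bytes of each primitive type, taken from stdlib constants.
    fun primitiveSizes(): Map<String, Int> = linkedMapOf(
        "Char" to Char.SIZE_BYTES,
        "Short" to Short.SIZE_BYTES,
        "Int" to Int.SIZE_BYTES,
        "Float" to Float.SIZE_BYTES,
        "Long" to Long.SIZE_BYTES,
        "Double" to Double.SIZE_BYTES
    )

    fun main() {
        // One byte is 8 bits, so the bit width is the byte size times 8.
        for ((type, bytes) in primitiveSizes()) {
            println("$type: $bytes bytes = ${bytes * 8} bits")
        }
    }
    ```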

    Implementation

    In the code tree, following a left branch appends a 0 to the code and following a right branch appends a 1.

    Goal: build the code tree by repeatedly merging the lowest-frequency nodes,

    • step1: take the 2 nodes (initially single chars) with the lowest frequency
    • step2: make a tree with these 2 nodes as leaves; the value of its root node is the sum of the 2 nodes' frequencies
    • step3: put the new root back among the remaining nodes and repeat from step1 until only one tree is left
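
    The merge order these steps produce can be traced on the frequencies alone, before any tree nodes are involved. The sketch below is only an illustration: the function name traceMerges is made up, and it uses the frequencies 5, 9, 12, 13, 16, 45 from the example further down.

    ```kotlin
    import java.util.PriorityQueue

    // Hypothetical helper: records each merge as (lowest, second lowest, sum).
    fun traceMerges(frequencies: List<Int>): List<Triple<Int, Int, Int>> {
        val queue = PriorityQueue<Int>(frequencies) // min-heap: poll() returns the smallest value
        val merges = mutableListOf<Triple<Int, Int, Int>>()
        while (queue.size > 1) {
            val x = queue.poll()          // step1: the two lowest frequencies
            val y = queue.poll()
            merges.add(Triple(x, y, x + y))
            queue.add(x + y)              // step2 and step3: the sum goes back into the pool
        }
        return merges
    }

    fun main() {
        for ((x, y, sum) in traceMerges(listOf(5, 9, 12, 13, 16, 45))) {
            println("$x + $y -> $sum")
        }
        // prints:
        // 5 + 9 -> 14
        // 12 + 13 -> 25
        // 14 + 16 -> 30
        // 25 + 30 -> 55
        // 45 + 55 -> 100
    }
    ```

    The last sum, 100, is the frequency stored in the root of the finished tree.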

    Let us understand the algorithm with an example.

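    package _Algorithm.HuffmanCode

    import java.util.PriorityQueue

    // HuffmanNode is not shown in the original post; this minimal definition
    // is an assumption based on how its fields are used below.
    class HuffmanNode {
        var data: Int = 0              // frequency (sum of the children for internal nodes)
        var c: Char = '-'              // character for leaves, '-' for internal nodes
        var left: HuffmanNode? = null
        var right: HuffmanNode? = null
    }

    class HuffmanCoding {
        // Recursive function that prints the Huffman code of every leaf by traversing the tree.
        private fun printCode(root: HuffmanNode?, s: String) {
            if (root == null) return
            // A leaf holds a character; s is the path from the root to this leaf.
            if (root.left == null && root.right == null) {
                println("${root.c}:$s")
                return
            }

            // If we go left, append "0" to the code.
            // If we go right, append "1" to the code.
            printCode(root.left, s + "0")
            printCode(root.right, s + "1")
        }

        fun test() {
            val n = 6
            val charArray = charArrayOf('a', 'b', 'c', 'd', 'e', 'f')
            val charfreq = intArrayOf(5, 9, 12, 13, 16, 45)
            // Min-heap ordered by frequency.
            val priorityQueue = PriorityQueue<HuffmanNode>(n, MyComparator())

            // Create one leaf node per character.
            for (i in 0 until n) {
                val node = HuffmanNode()
                node.c = charArray[i]
                node.data = charfreq[i]
                priorityQueue.add(node)
            }

            // Root of the tree; set each time two nodes are merged.
            var root: HuffmanNode? = null

            while (priorityQueue.size > 1) {
                // Extract the node with the lowest frequency.
                val x = priorityQueue.poll()
                // Extract the node with the second-lowest frequency.
                val y = priorityQueue.poll()

                // The new node f holds the sum of the two frequencies
                // and takes the two extracted nodes as its children.
                val f = HuffmanNode()
                f.data = x.data + y.data
                f.c = '-'
                f.left = x
                f.right = y
                // The most recently merged node is the root so far.
                root = f
                priorityQueue.add(f)
            }
            printCode(root, "")
        }
    }

    // Orders nodes by ascending frequency.
    class MyComparator : Comparator<HuffmanNode> {
        override fun compare(o1: HuffmanNode, o2: HuffmanNode): Int {
            return o1.data.compareTo(o2.data)
        }
    }

    fun main() {
        HuffmanCoding().test()
    }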

    Printed output

    f:0
    c:100
    d:101
    a:1100
    b:1101
    e:111
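
    Because no code in this table is a prefix of another, a bit string can be decoded unambiguously by reading it from left to right. The sketch below illustrates compression and decompression with the table printed above. The names huffmanCodes, encode and decode are made up for this example, and the bits are kept in a String for readability rather than packed into bytes.

    ```kotlin
    // Code table copied from the printed output above.
    val huffmanCodes: Map<Char, String> = mapOf(
        'f' to "0", 'c' to "100", 'd' to "101",
        'a' to "1100", 'b' to "1101", 'e' to "111"
    )

    // Replace every character by its code; fails if a character is not in the table.
    fun encode(text: String, table: Map<Char, String>): String =
        text.map { table.getValue(it) }.joinToString("")

    // Read bits one at a time and emit a character as soon as the bits read so far form a code.
    fun decode(bits: String, table: Map<Char, String>): String {
        val byCode = table.entries.associate { (ch, code) -> code to ch }
        val out = StringBuilder()
        var current = ""
        for (bit in bits) {
            current += bit
            val ch = byCode[current]
            if (ch != null) {
                out.append(ch)
                current = ""
            }
        }
        require(current.isEmpty()) { "trailing bits do not form a complete code" }
        return out.toString()
    }

    fun main() {
        val bits = encode("face", huffmanCodes)
        println(bits)                       // prints 01100100111
        println(decode(bits, huffmanCodes)) // prints face
    }
    ```

    Here "face" needs 11 bits instead of the 32 bits it takes at one byte per character.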

    Compression result

    Given the data:
     val charArray = charArrayOf('a', 'b', 'c', 'd', 'e', 'f')
     val charfreq = intArrayOf(5, 9, 12, 13, 16, 45)

    we get the following result.
    Number of bits without Huffman coding:
    Total number of characters = sum of frequencies = 100;
    1 byte = 8 bits, so at 1 byte per character the total number of bits = 100*8 = 800;

    With Huffman coding the codes are:
    f:0 //code length is 1
    c:100 //code length is 3
    d:101 //code length is 3
    a:1100 //code length is 4
    b:1101 //code length is 4
    e:111 //code length is 3
    so the total number of bits =
    freq(f) * code_length(f) + freq(c) * code_length(c) + freq(d) * code_length(d) + freq(a) * code_length(a) +
    freq(b) * code_length(b) + freq(e) * code_length(e) =
    45*1 + 12*3 + 13*3 + 5*4 + 9*4 + 16*3 = 224

    Bits saved: 800-224 = 576.
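
    The arithmetic above can be reproduced in a few lines. This is a sketch only: the function name totalBits is made up, and the frequencies and codes are the ones from this example.

    ```kotlin
    // Hypothetical helper: sum of frequency * code length over all characters.
    fun totalBits(freq: Map<Char, Int>, codes: Map<Char, String>): Int =
        freq.entries.sumOf { (ch, count) -> count * codes.getValue(ch).length }

    fun main() {
        val freq = mapOf('a' to 5, 'b' to 9, 'c' to 12, 'd' to 13, 'e' to 16, 'f' to 45)
        val codes = mapOf(
            'f' to "0", 'c' to "100", 'd' to "101",
            'a' to "1100", 'b' to "1101", 'e' to "111"
        )
        val plain = freq.values.sum() * 8        // 1 byte = 8 bits per character
        val huffman = totalBits(freq, codes)
        println("without Huffman: $plain bits")  // prints without Huffman: 800 bits
        println("with Huffman: $huffman bits")   // prints with Huffman: 224 bits
        println("saved: ${plain - huffman} bits") // prints saved: 576 bits
    }
    ```

    Dividing 224 bits by the 100 characters gives an average code length of 2.24 bits per character.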
     
  • Original article: https://www.cnblogs.com/johnnyzhao/p/12821052.html