zoukankan      html  css  js  c++  java
  • huffman编码——原理与实现


    哈夫曼算法原理


    Wikipedia上面说的非常清楚了,这里我就不再赘述,直接贴过来了。


    1952年, David A. Huffman提出了一个不同的算法,这个算法能够为不论什么的可能性提供出一个理想的树。香农-范诺编码(Shanno-Fano)是从树的根节点到叶子节点所进行的的编码,哈夫曼编码算法却是从相反的方向,暨从叶子节点到根节点的方向编码的。

    1. 为每一个符号建立一个叶子节点,并加上其对应的发生频率
    2. 当有一个以上的节点存在时,进行下列循环:
      1. 把这些节点作为带权值的二叉树的根节点,左右子树为空
      2. 选择两棵根结点权值最小的树作为左右子树构造一棵新的二叉树,且至新的二叉树的根结点的权值为其左右子树上根结点的权值之和。
      3. 把权值最小的两个根节点移除
      4. 将新的二叉树添�队列中.
    3. 最后剩下的节点暨为根节点,此时二叉树已经完毕。

    演示样例


    Huffman Algorithm


    符号 A B C D E
    计数 15 7 6 6 5
    概率 0.38461538 0.17948718 0.15384615 0.15384615 0.12820513

    在这样的情况下,D,E的最低频率和分配分别为0和1,分组结合概率的0.28205128。如今最低的一双是B和C,所以他们就分配0和1组合结合概率的0.33333333在一起。这使得BC和DE所以0和1的前面加上他们的代码和它们结合的概率最低。然后离开仅仅是一个和BCDE,当中有前缀分别为0和1,然后结合。这使我们与一个单一的节点,我们的算法是完整的。


    可得A代码的代码长度是1比特,其余字符是3比特。

    字符 A B C D E
    代码 0 100 101 110 111

        Entropy:



    Pseudo-code

    
     1:  begin
     2:     count frequencies of single characters (source units)
     3:     output(frequencies using Fibonacci Codes of degree 2)
     4:     sort them to non-decreasing sequence
     5:     create a leaf node (character, frequency c, left son = NULL, right son = NULL) 
     6:  	   of the tree for each character and put nodes into queue F
     7:     while (|F|>=2) do 
     8:      begin
     9:        pop the first two nodes (u1, u2) with the lowest 
    10:  	      frequencies from sorted queue
    11:        create a node evaluated with sum of the chosen units, 
    12:  	      successors are chosen units (eps, c(u1)+c(u2), u1, u2)
    13:        insert new node into queue
    14:      end
    15:     node evaluate with way from root to leaf node (left son 1, right son 0)
    16:     create output from coded intput characters
    17:  end




    哈夫曼算法实现



    实现的时候我们用vector<bool>记录每一个char的编码;用map<char,vector<bool>>表示整个字典;
    就得到了以下的代码(以下有两个,一对一错):

    先放出来这个错的,以示警戒

    /************************************************************************/
    /*	File Name: Huffman.cpp
    *		@Function: Lossless Compression
    		@Author: Sophia Zhang
    		@Create Time: 2012-9-26 10:40
    		@Last Modify: 2012-9-26 11:10
    */
    /************************************************************************/
    
    #include"iostream"
    #include "queue"
    #include "map"
    #include "string"
    #include "iterator"
    #include "vector"
    #include "algorithm"
    using namespace std;
    
    #define NChar 8	//suppose use at most 8 bits to describe all symbols
    #define Nsymbols 1<<NChar	//can describe 256 symbols totally (include a-z, A-Z)
    typedef vector<bool> Huff_code;//8 bit code of one char
    map<char,Huff_code> Huff_Dic;	//huffman coding dictionary
    
    class HTree
    {
    public :
    	HTree* left;
    	HTree* right;
    	char ch;
    	int weight;
    
    	HTree(){left = right = NULL; weight=0;}
    	HTree(HTree* l,HTree* r,int w,char c){left = l;	right = r;	weight=w;	ch=c;}
    	~HTree(){delete left; delete right;}
    	int Getweight(){return weight?weight:left->weight+right->weight;}
    	bool Isleaf(){return !left && !right; }
    	bool operator < (const HTree tr) const
    	{
    		return tr.weight < weight;
    	}
    };
    
    HTree* BuildTree(int *frequency)
    {
    	priority_queue<HTree*> QTree;
    
    	//1st level add characters
    	for (int i=0;i<Nsymbols;i++)
    	{
    		if(frequency[i])
    			QTree.push(new HTree(NULL,NULL,frequency[i],(char)i));			
    	}
    
    	//build
    	while (QTree.size()>1)
    	{
    		HTree* lc  = QTree.top();
    		QTree.pop();
    		HTree* rc = QTree.top();
    		QTree.pop();
    
    		HTree* parent = new HTree(lc,rc,parent->Getweight(),(char)256);
    		QTree.push(parent);
    	}
    	//return tree root
    	return QTree.top();
    }
    
    void Huffman_Coding(HTree* root, Huff_code& curcode)
    {
    	if(root->Isleaf())
    	{
    		Huff_Dic[root->ch] = curcode;
    		return;
    	}
    	Huff_code& lcode = curcode;
    	Huff_code& rcode = curcode;
    	lcode.push_back(false);
    	rcode.push_back(true);
    
    	Huffman_Coding(root->left,lcode);
    	Huffman_Coding(root->right,rcode);
    }
    
    int main()
    {
    	int freq[Nsymbols] = {0};
    	char *str = "this is the string need to be compressed";
    
    	//statistic character frequency
    	while (*str!='')
    		freq[*str++]++;
    
    	//build tree
    	HTree* r = BuildTree(freq);
    	Huff_code nullcode;
    	nullcode.clear();
    	Huffman_Coding(r,nullcode);
    
    	for(map<char,Huff_code>::iterator it = Huff_Dic.begin(); it != Huff_Dic.end(); it++)
    	{
    		cout<<(*it).first<<'	';
    		Huff_code vec_code = (*it).second;
    		for (vector<bool>::iterator vit = vec_code.begin(); vit!=vec_code.end();vit++)
    		{
    			cout<<(*vit)<<endl;
    		}
    	}
    }


    上面这段代码,我执行出来不正确。在调试的时候发现了一个问题,就是QTree优先队列中的排序出了问题,说来也是,上面的代码中,我重载小于号是对HTree object做的;而实际上我建树时用的是指针,那么优先级队列中元素为指针时该怎么办呢?

    那我们将friend bool operator >(Node node1,Node node2)改动为friend bool operator >(Node* node1,Node* node2),也就是传递的是Node的指针行不行呢?


    答案是不能够,由于依据c++primer中重载操作符中讲的“程序猿仅仅能为类类型或枚举类型的操作数定义重载操作符,在把操作符声明为类的成员时,至少有一个类或枚举类型的參数依照值或者引用的方式传递”,也就是说friend bool operator >(Node* node1,Node* node2)形參中都是指针类型的是不能够的。我们仅仅能再建一个类,用当中的重载()操作符作为优先队列的比較函数。




    就得到了以下正确的代码:


    /************************************************************************/
    /*	File Name: Huffman.cpp
    *		@Function: Lossless Compression
    		@Author: Sophia Zhang
    		@Create Time: 2012-9-26 10:40
    		@Last Modify: 2012-9-26 12:10
    */
    /************************************************************************/
    
    #include"iostream"
    #include "queue"
    #include "map"
    #include "string"
    #include "iterator"
    #include "vector"
    #include "algorithm"
    using namespace std;
    
    #define NChar 8	//suppose use 8 bits to describe all symbols
    #define Nsymbols 1<<NChar	//can describe 256 symbols totally (include a-z, A-Z)
    typedef vector<bool> Huff_code;//8 bit code of one char
    map<char,Huff_code> Huff_Dic;	//huffman coding dictionary
    
    /************************************************************************/
    /* Tree Class elements:
    *2 child trees
    *character and frequency of current node
    */
    /************************************************************************/
    class HTree
    {
    public :
    	HTree* left;
    	HTree* right;
    	char ch;
    	int weight;
    
    	HTree(){left = right = NULL; weight=0;ch ='';}
    	HTree(HTree* l,HTree* r,int w,char c){left = l;	right = r;	weight=w;	ch=c;}
    	~HTree(){delete left; delete right;}
    	bool Isleaf(){return !left && !right; }
    };
    
    /************************************************************************/
    /* prepare for pointer sorting*/
    /*because we cannot use overloading in class HTree directly*/
    /************************************************************************/
    class Compare_tree
    {
    public:
    	bool operator () (HTree* t1, HTree* t2)
    	{
    		return t1->weight> t2->weight;
    	}
    };
    
    /************************************************************************/
    /* use priority queue to build huffman tree*/
    /************************************************************************/
    HTree* BuildTree(int *frequency)
    {
    	priority_queue<HTree*,vector<HTree*>,Compare_tree> QTree;
    
    	//1st level add characters
    	for (int i=0;i<Nsymbols;i++)
    	{
    		if(frequency[i])
    			QTree.push(new HTree(NULL,NULL,frequency[i],(char)i));			
    	}
    
    	//build
    	while (QTree.size()>1)
    	{
    		HTree* lc  = QTree.top();
    		QTree.pop();
    		HTree* rc = QTree.top();
    		QTree.pop();
    
    		HTree* parent = new HTree(lc,rc,lc->weight+rc->weight,(char)256);
    		QTree.push(parent);
    	}
    	//return tree root
    	return QTree.top();
    }
    
    /************************************************************************/
    /* Give Huffman Coding to the Huffman Tree*/
    /************************************************************************/
    void Huffman_Coding(HTree* root, Huff_code& curcode)
    {
    	if(root->Isleaf())
    	{
    		Huff_Dic[root->ch] = curcode;
    		return;
    	}
    	Huff_code lcode = curcode;
    	Huff_code rcode = curcode;
    	lcode.push_back(false);
    	rcode.push_back(true);
    
    	Huffman_Coding(root->left,lcode);
    	Huffman_Coding(root->right,rcode);
    }
    
    int main()
    {
    	int freq[Nsymbols] = {0};
    	char *str = "this is the string need to be compressed";
    
    	//statistic character frequency
    	while (*str!='')
    		freq[*str++]++;
    
    	//build tree
    	HTree* r = BuildTree(freq);
    	Huff_code nullcode;
    	nullcode.clear();
    	Huffman_Coding(r,nullcode);
    
    	for(map<char,Huff_code>::iterator it = Huff_Dic.begin(); it != Huff_Dic.end(); it++)
    	{
    		cout<<(*it).first<<'	';
    		std::copy(it->second.begin(),it->second.end(),std::ostream_iterator<bool>(cout));
    		cout<<endl;
    	}
    }








    Reference:






    关于Compression很多其它的学习资料将继续更新,敬请关注本博客和新浪微博Sophia_qing







  • 相关阅读:
    MFC Windows 程序设计>WinMain 简单Windows程序 命令行编译
    AT3949 [AGC022D] Shopping 题解
    CF643D Bearish Fanpages 题解
    CF643C Levels and Regions 题解
    CF241E Flights 题解
    CF671C Ultimate Weirdness of an Array 题解
    CF1592F Alice and Recoloring 题解
    GYM 102452E 题解
    CF494C Helping People 题解
    P5556 圣剑护符
  • 原文地址:https://www.cnblogs.com/yxwkf/p/4005039.html
Copyright © 2011-2022 走看看