zoukankan      html  css  js  c++  java
  • huffman编码——原理与实现


    哈夫曼算法原理


    Wikipedia上面说的非常清楚了,这里我就不再赘述,直接贴过来了。


    1952年, David A. Huffman提出了一个不同的算法,这个算法能够为不论什么的可能性提供出一个理想的树。香农-范诺编码(Shanno-Fano)是从树的根节点到叶子节点所进行的的编码,哈夫曼编码算法却是从相反的方向,暨从叶子节点到根节点的方向编码的。

    1. 为每一个符号建立一个叶子节点,并加上其对应的发生频率
    2. 当有一个以上的节点存在时,进行下列循环:
      1. 把这些节点作为带权值的二叉树的根节点,左右子树为空
      2. 选择两棵根结点权值最小的树作为左右子树构造一棵新的二叉树,且至新的二叉树的根结点的权值为其左右子树上根结点的权值之和。
      3. 把权值最小的两个根节点移除
      4. 将新的二叉树添�队列中.
    3. 最后剩下的节点暨为根节点,此时二叉树已经完毕。

    演示样例


    Huffman Algorithm


    符号 A B C D E
    计数 15 7 6 6 5
    概率 0.38461538 0.17948718 0.15384615 0.15384615 0.12820513

    在这样的情况下,D,E的最低频率和分配分别为0和1,分组结合概率的0.28205128。如今最低的一双是B和C,所以他们就分配0和1组合结合概率的0.33333333在一起。这使得BC和DE所以0和1的前面加上他们的代码和它们结合的概率最低。然后离开仅仅是一个和BCDE,当中有前缀分别为0和1,然后结合。这使我们与一个单一的节点,我们的算法是完整的。


    可得A代码的代码长度是1比特,其余字符是3比特。

    字符 A B C D E
    代码 0 100 101 110 111

        Entropy:



    Pseudo-code

    
     1:  begin
     2:     count frequencies of single characters (source units)
     3:     output(frequencies using Fibonacci Codes of degree 2)
     4:     sort them to non-decreasing sequence
     5:     create a leaf node (character, frequency c, left son = NULL, right son = NULL) 
     6:  	   of the tree for each character and put nodes into queue F
     7:     while (|F|>=2) do 
     8:      begin
     9:        pop the first two nodes (u1, u2) with the lowest 
    10:  	      frequencies from sorted queue
    11:        create a node evaluated with sum of the chosen units, 
    12:  	      successors are chosen units (eps, c(u1)+c(u2), u1, u2)
    13:        insert new node into queue
    14:      end
    15:     node evaluate with way from root to leaf node (left son 1, right son 0)
    16:     create output from coded intput characters
    17:  end




    哈夫曼算法实现



    实现的时候我们用vector<bool>记录每一个char的编码;用map<char,vector<bool>>表示整个字典;
    就得到了以下的代码(以下有两个,一对一错):

    先放出来这个错的,以示警戒

    /************************************************************************/
    /*	File Name: Huffman.cpp
    *		@Function: Lossless Compression
    		@Author: Sophia Zhang
    		@Create Time: 2012-9-26 10:40
    		@Last Modify: 2012-9-26 11:10
    */
    /************************************************************************/
    
    #include"iostream"
    #include "queue"
    #include "map"
    #include "string"
    #include "iterator"
    #include "vector"
    #include "algorithm"
    using namespace std;
    
    #define NChar 8	//suppose use at most 8 bits to describe all symbols
    #define Nsymbols 1<<NChar	//can describe 256 symbols totally (include a-z, A-Z)
    typedef vector<bool> Huff_code;//8 bit code of one char
    map<char,Huff_code> Huff_Dic;	//huffman coding dictionary
    
    class HTree
    {
    public :
    	HTree* left;
    	HTree* right;
    	char ch;
    	int weight;
    
    	HTree(){left = right = NULL; weight=0;}
    	HTree(HTree* l,HTree* r,int w,char c){left = l;	right = r;	weight=w;	ch=c;}
    	~HTree(){delete left; delete right;}
    	int Getweight(){return weight?weight:left->weight+right->weight;}
    	bool Isleaf(){return !left && !right; }
    	bool operator < (const HTree tr) const
    	{
    		return tr.weight < weight;
    	}
    };
    
    HTree* BuildTree(int *frequency)
    {
    	priority_queue<HTree*> QTree;
    
    	//1st level add characters
    	for (int i=0;i<Nsymbols;i++)
    	{
    		if(frequency[i])
    			QTree.push(new HTree(NULL,NULL,frequency[i],(char)i));			
    	}
    
    	//build
    	while (QTree.size()>1)
    	{
    		HTree* lc  = QTree.top();
    		QTree.pop();
    		HTree* rc = QTree.top();
    		QTree.pop();
    
    		HTree* parent = new HTree(lc,rc,parent->Getweight(),(char)256);
    		QTree.push(parent);
    	}
    	//return tree root
    	return QTree.top();
    }
    
    void Huffman_Coding(HTree* root, Huff_code& curcode)
    {
    	if(root->Isleaf())
    	{
    		Huff_Dic[root->ch] = curcode;
    		return;
    	}
    	Huff_code& lcode = curcode;
    	Huff_code& rcode = curcode;
    	lcode.push_back(false);
    	rcode.push_back(true);
    
    	Huffman_Coding(root->left,lcode);
    	Huffman_Coding(root->right,rcode);
    }
    
    int main()
    {
    	int freq[Nsymbols] = {0};
    	char *str = "this is the string need to be compressed";
    
    	//statistic character frequency
    	while (*str!='')
    		freq[*str++]++;
    
    	//build tree
    	HTree* r = BuildTree(freq);
    	Huff_code nullcode;
    	nullcode.clear();
    	Huffman_Coding(r,nullcode);
    
    	for(map<char,Huff_code>::iterator it = Huff_Dic.begin(); it != Huff_Dic.end(); it++)
    	{
    		cout<<(*it).first<<'	';
    		Huff_code vec_code = (*it).second;
    		for (vector<bool>::iterator vit = vec_code.begin(); vit!=vec_code.end();vit++)
    		{
    			cout<<(*vit)<<endl;
    		}
    	}
    }


    上面这段代码,我执行出来不正确。在调试的时候发现了一个问题,就是QTree优先队列中的排序出了问题,说来也是,上面的代码中,我重载小于号是对HTree object做的;而实际上我建树时用的是指针,那么优先级队列中元素为指针时该怎么办呢?

    那我们将friend bool operator >(Node node1,Node node2)改动为friend bool operator >(Node* node1,Node* node2),也就是传递的是Node的指针行不行呢?


    答案是不能够,由于依据c++primer中重载操作符中讲的“程序猿仅仅能为类类型或枚举类型的操作数定义重载操作符,在把操作符声明为类的成员时,至少有一个类或枚举类型的參数依照值或者引用的方式传递”,也就是说friend bool operator >(Node* node1,Node* node2)形參中都是指针类型的是不能够的。我们仅仅能再建一个类,用当中的重载()操作符作为优先队列的比較函数。




    就得到了以下正确的代码:


    /************************************************************************/
    /*	File Name: Huffman.cpp
    *		@Function: Lossless Compression
    		@Author: Sophia Zhang
    		@Create Time: 2012-9-26 10:40
    		@Last Modify: 2012-9-26 12:10
    */
    /************************************************************************/
    
    #include"iostream"
    #include "queue"
    #include "map"
    #include "string"
    #include "iterator"
    #include "vector"
    #include "algorithm"
    using namespace std;
    
    #define NChar 8	//suppose use 8 bits to describe all symbols
    #define Nsymbols 1<<NChar	//can describe 256 symbols totally (include a-z, A-Z)
    typedef vector<bool> Huff_code;//8 bit code of one char
    map<char,Huff_code> Huff_Dic;	//huffman coding dictionary
    
    /************************************************************************/
    /* Tree Class elements:
    *2 child trees
    *character and frequency of current node
    */
    /************************************************************************/
    class HTree
    {
    public :
    	HTree* left;
    	HTree* right;
    	char ch;
    	int weight;
    
    	HTree(){left = right = NULL; weight=0;ch ='';}
    	HTree(HTree* l,HTree* r,int w,char c){left = l;	right = r;	weight=w;	ch=c;}
    	~HTree(){delete left; delete right;}
    	bool Isleaf(){return !left && !right; }
    };
    
    /************************************************************************/
    /* prepare for pointer sorting*/
    /*because we cannot use overloading in class HTree directly*/
    /************************************************************************/
    class Compare_tree
    {
    public:
    	bool operator () (HTree* t1, HTree* t2)
    	{
    		return t1->weight> t2->weight;
    	}
    };
    
    /************************************************************************/
    /* use priority queue to build huffman tree*/
    /************************************************************************/
    HTree* BuildTree(int *frequency)
    {
    	priority_queue<HTree*,vector<HTree*>,Compare_tree> QTree;
    
    	//1st level add characters
    	for (int i=0;i<Nsymbols;i++)
    	{
    		if(frequency[i])
    			QTree.push(new HTree(NULL,NULL,frequency[i],(char)i));			
    	}
    
    	//build
    	while (QTree.size()>1)
    	{
    		HTree* lc  = QTree.top();
    		QTree.pop();
    		HTree* rc = QTree.top();
    		QTree.pop();
    
    		HTree* parent = new HTree(lc,rc,lc->weight+rc->weight,(char)256);
    		QTree.push(parent);
    	}
    	//return tree root
    	return QTree.top();
    }
    
    /************************************************************************/
    /* Give Huffman Coding to the Huffman Tree*/
    /************************************************************************/
    void Huffman_Coding(HTree* root, Huff_code& curcode)
    {
    	if(root->Isleaf())
    	{
    		Huff_Dic[root->ch] = curcode;
    		return;
    	}
    	Huff_code lcode = curcode;
    	Huff_code rcode = curcode;
    	lcode.push_back(false);
    	rcode.push_back(true);
    
    	Huffman_Coding(root->left,lcode);
    	Huffman_Coding(root->right,rcode);
    }
    
    int main()
    {
    	int freq[Nsymbols] = {0};
    	char *str = "this is the string need to be compressed";
    
    	//statistic character frequency
    	while (*str!='')
    		freq[*str++]++;
    
    	//build tree
    	HTree* r = BuildTree(freq);
    	Huff_code nullcode;
    	nullcode.clear();
    	Huffman_Coding(r,nullcode);
    
    	for(map<char,Huff_code>::iterator it = Huff_Dic.begin(); it != Huff_Dic.end(); it++)
    	{
    		cout<<(*it).first<<'	';
    		std::copy(it->second.begin(),it->second.end(),std::ostream_iterator<bool>(cout));
    		cout<<endl;
    	}
    }








    Reference:






    关于Compression很多其它的学习资料将继续更新,敬请关注本博客和新浪微博Sophia_qing







  • 相关阅读:
    基于Visual C++2013拆解世界五百强面试题--题9-找出所有的排列方式
    基于Visual C++2013拆解世界五百强面试题--题8-数组的排序和查找
    基于Visual C++2013拆解世界五百强面试题--题7-链表的各种操作
    宣布在 Azure 镜像库中正式推出 Windows Server 2012 R2 并降低 Windows Azure 的实例定价
    基于Visual C++2013拆解世界五百强面试题--题6-double类型逆序
    基于Visual C++2013拆解世界五百强面试题--题5-自己实现strstr
    Paging
    Swapping
    Partitioning
    Stationary point
  • 原文地址:https://www.cnblogs.com/hrhguanli/p/3931213.html
Copyright © 2011-2022 走看看