2015-09-21
1. Exercises 5 and 6 on page 100 of the reference book Introduction to Data Compression, 4th edition (《数据压缩导论(第四版)》)
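5. Given the probability model in Table 4-9, find the real-valued tag for the sequence a1a1a3a2a3a1.
Table 4-9 Probability model for Exercises 5 and 6
Letter | Probability |
a1 | 0.2 |
a2 | 0.3 |
a3 | 0.5 |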
Solution: From the probability model, use the mapping
a1 <=> 1, a2 <=> 2, a3 <=> 3
The cdf is therefore: Fx(1) = 0.2,
Fx(2) = 0.5,
Fx(3) = 1.0
Fx(k) = 0 for k <= 0, and Fx(k) = 1 for k > 3
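The cdf values above are just running sums of the probabilities in Table 4-9. A minimal sketch of that step, assuming exact fractions are used to avoid rounding noise (the names `probs` and `cdf` are illustrative and not from the book):

```python
from fractions import Fraction
from itertools import accumulate

# Probabilities from Table 4-9 for a1, a2, a3 (symbols 1, 2, 3).
probs = [Fraction(2, 10), Fraction(3, 10), Fraction(5, 10)]

# cdf[k] holds Fx(k); Fx(0) = 0 is prepended so that Fx(k-1) exists for k = 1.
cdf = [Fraction(0)] + list(accumulate(probs))

print([float(v) for v in cdf])  # [0.0, 0.2, 0.5, 1.0]
```

Keeping Fx(0) in the list means the lower and upper limits for symbol k can be read as `cdf[k - 1]` and `cdf[k]`.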
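We can determine the lower and upper limits of the interval containing the tag with the update formulas
l(n) = l(n-1) + (u(n-1) - l(n-1)) * Fx(x_n - 1)
u(n) = l(n-1) + (u(n-1) - l(n-1)) * Fx(x_n)
where x_n is the n-th symbol of the sequence. Initialize u(0) to 1 and l(0) to 0. The first element of the sequence is a1.
Upper limit: u(0) = 1, lower limit: l(0) = 0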
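l(1) = l(0) + (u(0) - l(0)) * Fx(1-1) = 0 + (1 - 0) * 0 = 0
u(1) = l(0) + (u(0) - l(0)) * Fx(1) = 0 + (1 - 0) * 0.2 = 0.2
Upper limit: u(1) = 0.2, lower limit: l(1) = 0
The second element is a1:
l(2) = l(1) + (u(1) - l(1)) * Fx(1-1) = 0 + (0.2 - 0) * 0 = 0
u(2) = l(1) + (u(1) - l(1)) * Fx(1) = 0 + (0.2 - 0) * 0.2 = 0.04
Upper limit: u(2) = 0.04, lower limit: l(2) = 0
The third element is a3:
l(3) = l(2) + (u(2) - l(2)) * Fx(3-1) = 0 + (0.04 - 0) * 0.5 = 0.02
u(3) = l(2) + (u(2) - l(2)) * Fx(3) = 0 + (0.04 - 0) * 1.0 = 0.04
Upper limit: u(3) = 0.04, lower limit: l(3) = 0.02
The fourth element is a2:
l(4) = l(3) + (u(3) - l(3)) * Fx(2-1) = 0.02 + (0.04 - 0.02) * 0.2 = 0.024
u(4) = l(3) + (u(3) - l(3)) * Fx(2) = 0.02 + (0.04 - 0.02) * 0.5 = 0.03
Upper limit: u(4) = 0.03, lower limit: l(4) = 0.024
The fifth element is a3:
l(5) = l(4) + (u(4) - l(4)) * Fx(3-1) = 0.024 + (0.03 - 0.024) * 0.5 = 0.027
u(5) = l(4) + (u(4) - l(4)) * Fx(3) = 0.024 + (0.03 - 0.024) * 1.0 = 0.03
Upper limit: u(5) = 0.03, lower limit: l(5) = 0.027
The sixth element is a1:
l(6) = l(5) + (u(5) - l(5)) * Fx(1-1) = 0.027 + (0.03 - 0.027) * 0 = 0.027
u(6) = l(5) + (u(5) - l(5)) * Fx(1) = 0.027 + (0.03 - 0.027) * 0.2 = 0.0276
Upper limit: u(6) = 0.0276, lower limit: l(6) = 0.027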
Therefore, taking the midpoint of the final interval, the real-valued tag for a1a1a3a2a3a1 is:
Tx(113231) = (u(6) + l(6)) / 2
= (0.0276 + 0.027) / 2
= 0.0273
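The six update steps and the midpoint tag can be checked mechanically. The following is a minimal sketch, assuming symbols are passed as the indices 1 to 3 and using exact fractions; `CDF` and `encode_interval` are illustrative names, not part of the book's text.

```python
from fractions import Fraction

# CDF[k] = Fx(k) for k = 0..3, from Table 4-9.
CDF = [Fraction(0), Fraction(1, 5), Fraction(1, 2), Fraction(1)]

def encode_interval(symbols):
    """Return (lower, upper) of the tag interval for a list of symbol indices."""
    lower, upper = Fraction(0), Fraction(1)
    for x in symbols:
        width = upper - lower
        # Both new limits are computed from the previous lower limit.
        lower, upper = lower + width * CDF[x - 1], lower + width * CDF[x]
    return lower, upper

lower, upper = encode_interval([1, 1, 3, 2, 3, 1])
tag = (lower + upper) / 2  # midpoint of the final interval
print(float(lower), float(upper), float(tag))  # 0.027 0.0276 0.0273
```

Any value inside the final interval would identify the sequence; the midpoint is used here only because the worked solution above uses it.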
6. For the probability model in Table 4-9, decode a sequence of length 10 whose tag is 0.63215699.
Table 4-9
Letter | Probability |
a1 | 0.2 |
a2 | 0.3 |
a3 | 0.5 |
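Solution: The tag is decoded by repeatedly subdividing the current interval among a1, a2, and a3. First,
from the probability model, use the mapping
a1 <=> 1, a2 <=> 2, a3 <=> 3
The cdf is therefore: Fx(1) = 0.2,
Fx(2) = 0.5,
Fx(3) = 1.0
Fx(k) = 0 for k <= 0, and Fx(k) = 1 for k > 3
So the probability model gives the following initial intervals:
The interval for a1 is [0, 0.2)
The interval for a2 is [0.2, 0.5)
The interval for a3 is [0.5, 1.0)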
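The tag 0.63215699 lies in the a3 interval [0.5, 1.0), so the first symbol is a3. Subdivide this interval, whose width is 1.0 - 0.5 = 0.5:
0.5 * 0.2 = 0.1, 0.5 * 0.3 = 0.15, 0.5 * 0.5 = 0.25
The new subinterval boundaries are:
0.1 + 0.5 = 0.6
0.6 + 0.15 = 0.75
0.75 + 0.25 = 1.0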
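Each subdivision in this solution follows the same pattern: scale the three probabilities by the width of the current interval and accumulate them from its lower limit. A small sketch of that single step, assuming exact fractions (the helper name `subdivide` is hypothetical):

```python
from fractions import Fraction

# Probabilities of a1, a2, a3 from Table 4-9.
PROBS = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]

def subdivide(lower, upper):
    """Return [lower, b1, b2, upper], the boundaries of the a1, a2, a3 subintervals."""
    width = upper - lower
    bounds = [lower]
    for p in PROBS:
        bounds.append(bounds[-1] + width * p)
    return bounds

bounds = subdivide(Fraction(1, 2), Fraction(1))
print([float(b) for b in bounds])  # [0.5, 0.6, 0.75, 1.0]
```

The symbol whose pair of adjacent boundaries brackets the tag is the next decoded symbol.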
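The tag lies in [0.6, 0.75), so the second symbol is a2. Subdivide this interval, whose width is 0.75 - 0.6 = 0.15:
0.15 * 0.2 = 0.03, 0.15 * 0.3 = 0.045, 0.15 * 0.5 = 0.075
The new subinterval boundaries are:
0.03 + 0.6 = 0.63
0.63 + 0.045 = 0.675
0.675 + 0.075 = 0.75
The tag lies in [0.63, 0.675), so the third symbol is a2. Subdivide this interval, whose width is 0.675 - 0.63 = 0.045:
0.045 * 0.2 = 0.009, 0.045 * 0.3 = 0.0135, 0.045 * 0.5 = 0.0225
The new subinterval boundaries are:
0.009 + 0.63 = 0.639
0.639 + 0.0135 = 0.6525
0.6525 + 0.0225 = 0.675
The tag lies in [0.63, 0.639), so the fourth symbol is a1. Subdivide this interval, whose width is 0.639 - 0.63 = 0.009:
0.009 * 0.2 = 0.0018, 0.009 * 0.3 = 0.0027, 0.009 * 0.5 = 0.0045
0.0018 + 0.63 = 0.6318
0.6318 + 0.0027 = 0.6345
0.6345 + 0.0045 = 0.639
The tag lies in [0.6318, 0.6345), so the fifth symbol is a2. Subdivide this interval, whose width is 0.6345 - 0.6318 = 0.0027:
0.0027 * 0.2 = 0.00054, 0.0027 * 0.3 = 0.00081, 0.0027 * 0.5 = 0.00135
0.00054 + 0.6318 = 0.63234
0.63234 + 0.00081 = 0.63315
0.63315 + 0.00135 = 0.6345
The tag lies in [0.6318, 0.63234), so the sixth symbol is a1. Subdivide this interval, whose width is 0.63234 - 0.6318 = 0.00054:
0.00054 * 0.2 = 0.000108, 0.00054 * 0.3 = 0.000162, 0.00054 * 0.5 = 0.00027
0.000108 + 0.6318 = 0.631908
0.631908 + 0.000162 = 0.63207
0.63207 + 0.00027 = 0.63234
The tag lies in [0.63207, 0.63234), so the seventh symbol is a3. Subdivide this interval, whose width is 0.63234 - 0.63207 = 0.00027:
0.00027 * 0.2 = 0.000054, 0.00027 * 0.3 = 0.000081, 0.00027 * 0.5 = 0.000135
0.000054 + 0.63207 = 0.632124
0.632124 + 0.000081 = 0.632205
0.632205 + 0.000135 = 0.63234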
The tag lies in [0.632124, 0.632205), so the eighth symbol is a2. Subdivide this interval, whose width is 0.632205 - 0.632124 = 0.000081:
0.000081 * 0.2 = 0.0000162, 0.000081 * 0.3 = 0.0000243, 0.000081 * 0.5 = 0.0000405
0.0000162 + 0.632124 = 0.6321402
0.6321402 + 0.0000243 = 0.6321645
0.6321645 + 0.0000405 = 0.632205
The tag lies in [0.6321402, 0.6321645), so the ninth symbol is a2. Subdivide this interval, whose width is 0.6321645 - 0.6321402 = 0.0000243:
0.0000243 * 0.2 = 0.00000486, 0.0000243 * 0.3 = 0.00000729, 0.0000243 * 0.5 = 0.00001215
0.00000486 + 0.6321402 = 0.63214506
0.63214506 + 0.00000729 = 0.63215235
0.63215235 + 0.00001215 = 0.6321645
The tag 0.63215699 lies in [0.63215235, 0.6321645), so the tenth symbol is a3. All ten symbols have now been decoded, so no further subdivision is needed.
In summary, the length-10 sequence with tag 0.63215699 decodes to
a3a2a2a1a2a1a3a2a2a3
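The whole decoding procedure can be checked with a short program. This is a minimal sketch under the same model, assuming the tag is parsed as an exact fraction so that no floating-point error builds up over ten steps; `CDF` and `decode` are illustrative names. Instead of listing every subinterval boundary, it rescales the tag to [0, 1) and looks up which cdf band it falls in, which is equivalent to the subdivision used above.

```python
from fractions import Fraction

# CDF[k] = Fx(k) for k = 0..3, from Table 4-9.
CDF = [Fraction(0), Fraction(1, 5), Fraction(1, 2), Fraction(1)]

def decode(tag, length):
    """Decode `length` symbol indices (1..3) from an arithmetic coding tag."""
    lower, upper = Fraction(0), Fraction(1)
    symbols = []
    for _ in range(length):
        width = upper - lower
        # Rescale the tag into [0, 1) relative to the current interval.
        t = (tag - lower) / width
        # Pick the symbol k with Fx(k-1) <= t < Fx(k).
        k = next(k for k in range(1, len(CDF)) if CDF[k - 1] <= t < CDF[k])
        symbols.append(k)
        lower, upper = lower + width * CDF[k - 1], lower + width * CDF[k]
    return symbols

seq = decode(Fraction("0.63215699"), 10)
print("".join(f"a{k}" for k in seq))  # a3a2a2a1a2a1a3a2a2a3
```

The sequence length has to be supplied separately, as in the exercise, because the tag alone does not say where to stop.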