Submitted by: anegogo
Date Submitted: 03/27/2012 08:35 PM
1. Arithmetic Coding
a  : [0,    0.25)
i  : [0.25, 0.4)
n  : [0.4,  0.5)
s  : [0.5,  0.95)
EOS: [0.95, 1)
(0.0011 0000 1111 0110 0011)_2 = 0.191256523
Decoding steps (at each step, find the interval [l, h) containing the current value v, output that symbol, then rescale v ← (v − l)/(h − l)):
1) 1st symbol: a
0≤0.191256523<0.25,
→0≤0.765026093<1
2) 2nd symbol: s
0.5≤0.765026093<0.95,
→0≤0.588946872<1
3) 3rd symbol: s
0.5≤0.588946872<0.95,
→0≤0.197659716<1
4) 4th symbol: a
0≤0.197659716<0.25,
→0≤0.790638865<1
5) 5th symbol: s
0.5≤0.790638865<0.95,
→0≤0.645864144<1
6) 6th symbol: s
0.5≤0.645864144<0.95,
→0≤0.324142542<1
7) 7th symbol: i
0.25≤0.324142542<0.4,
→0≤0.494283614<1
8) 8th symbol: n
0.4≤0.494283614<0.5,
→0≤0.94283614<1
9) 9th symbol: s
0.5≤0.94283614<0.95,
→0≤0.984080312<1
10) 10th symbol: EOS
0.95≤0.984080312<1
So the decoding result is: assassins
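The decoding steps above can be sketched in Python. The interval table and the 20-bit codeword are taken from the exercise; the function and variable names are my own.

```python
# Symbol -> [low, high) intervals from the exercise.
INTERVALS = {
    "a":   (0.00, 0.25),
    "i":   (0.25, 0.40),
    "n":   (0.40, 0.50),
    "s":   (0.50, 0.95),
    "EOS": (0.95, 1.00),
}

def decode(value):
    """Repeatedly pick the symbol whose interval contains `value`,
    then rescale `value` back into [0, 1); stop at EOS."""
    out = []
    while True:
        for sym, (low, high) in INTERVALS.items():
            if low <= value < high:
                if sym == "EOS":
                    return "".join(out)
                out.append(sym)
                value = (value - low) / (high - low)  # rescale to [0, 1)
                break

# (0.0011 0000 1111 0110 0011)_2 = 200547 / 2**20 = 0.191256523...
codeword = int("00110000111101100011", 2) / 2**20
print(decode(codeword))  # → assassins
```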
2. Bigrams
a)
The 10 most frequent characters are:
# | Character | Occurrences |
1 | _(SPACE) | 27967 |
2 | E | 15084 |
3 | T | 11631 |
4 | O | 9245 |
5 | A | 9083 |
6 | N | 7871 |
7 | I | 7803 |
8 | h | 7581 |
9 | s | 6980 |
10 | r | 6400 |
b)
In the Alice.txt file, the statistics show that there are 84 distinct characters, and the total number of characters is 163809. So the entropy is:
H_S = -\sum_{r=1}^{n} p_r \log_2 p_r = -\sum_{r=1}^{84} \frac{c_r}{163809} \log_2 \frac{c_r}{163809} = 4.59022,
where c_r is the number of occurrences of the r-th character.
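The first-order entropy above can be computed from a count table; a minimal sketch (the helper name is my own, and the exact preprocessing of Alice.txt, e.g. case handling, is not specified in the exercise, so a rerun may differ slightly from 4.59022):

```python
from collections import Counter
from math import log2

def entropy(counts):
    """First-order entropy H = -sum_r p_r * log2(p_r) from symbol counts."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

# Toy sanity check: two equiprobable symbols carry exactly 1 bit each.
print(entropy(Counter("aabb")))  # → 1.0

# For the exercise: counts = Counter(open("Alice.txt").read()), then
# len(counts) is the alphabet size and entropy(counts) is H_S.
```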
c)
The 10 most frequent bigrams are:
Note: “_” stands for space.
# | Bigram | Occurrences |
1 | e_ | 4826 |
2 | _t | 4095 |
3 | he | 3918 |
4 | th | 3509 |
5 | _a | 2886 |
6 | d_ | 2804 |
7 | t_ | 2803 |
8 | _s | 2289 |
9 | in | 2227 |
10 | er | 2040 |
d)
Here the text is treated as a sequence of two-character (bigram) symbols, so the number of symbols is 81905, and the entropy computed the same way over bigram counts is:
H_S = 8.15679
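The bigram entropy can be computed the same way over pair counts. A sketch under one assumption: the symbol count 81905 ≈ 163809 / 2 suggests the text is split into non-overlapping pairs, which is what this helper (my own name) does.

```python
from collections import Counter
from math import log2

def bigram_entropy(text):
    """Entropy of the text viewed as non-overlapping 2-character symbols
    (assumption inferred from the exercise's symbol count of 81905)."""
    pairs = [text[i:i + 2] for i in range(0, len(text) - 1, 2)]
    counts = Counter(pairs)
    total = len(pairs)
    return -sum(c / total * log2(c / total) for c in counts.values())

# Toy check: "aabb" splits into ["aa", "bb"], two equiprobable symbols.
print(bigram_entropy("aabb"))  # → 1.0
```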
e)
I think d is better. Although b has a smaller entropy than d, d has far fewer symbols, so its total compressed size is smaller. To be specific:
H(b) × #symbols(b) = 4.59022 × 163809 = 751,919.3 bits > H(d) × #symbols(d) = 8.15679 × 81905 = 668,081.9 bits.
So d is the better choice.
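The comparison of total compressed sizes can be checked directly from the figures above:

```python
# Entropy (bits/symbol) times number of symbols gives the expected
# total size in bits for each coding scheme.
total_b = 4.59022 * 163809   # per-character coding
total_d = 8.15679 * 81905    # per-bigram coding
print(round(total_b, 1), round(total_d, 1))
assert total_d < total_b     # bigram coding wins despite higher entropy
```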