Multimedia Arithmatic Coding and Bigram

Submitted by: Submitted by

Views: 372

Words: 586

Pages: 3

Category: Science and Technology

Date Submitted: 03/27/2012 08:35 PM

Report This Essay

1. Arithmetic Coding

a: [0, 0.25)

i : [0.25, 0.4)

n: [0.4, 0.5)

s : [0.5, 0.95)

EOS:[0.95, 1)

(0.0011 0000 1111 0110 0011)10 = 0.191256523

Decoding steps:

1) 1st symbol: a

0≤0.191256523<0.25,

→0≤0.765026093<1

2) 2nd symbol: s

0.5≤0.765026093<0.95,

→0≤0.588946872<1

3) 3rd symbol: s

0.5≤0.588946872<0.95,

→0≤0.197659716<1

4) 4th symbol: a

0≤0.197659716<0.25,

→0≤0.790638865<1

5) 5th symbol: s

0.5≤0.790638865<0.95,

→0≤0.645864144<1

6) 6th symbol: s

0.5≤0.645864144<0.95,

→0≤0.324142542<1

7) 7th symbol: i

0.25≤0.324142542<0.4,

→0≤0.494283614<1

8) 8th symbol: n

0.4≤0.494283614<0.5,

→0≤0.94283614<1

9) 9th symbol: s

0.5≤0.94283614<0.95,

→0≤0.984080312<1

10) 10th symbol: EOS

So the decoding result is: assassins

2. Bigrams

a)

The 10 most frequent characters are:

# | Character | No. of Occur |

1 | _(SPACE) | 27967 |

2 | E | 15084 |

3 | T | 11631 |

4 | O | 9245 |

5 | A | 9083 |

6 | N | 7871 |

7 | I | 7803 |

8 | h | 7581 |

9 | s | 6980 |

10 | r | 6400 |

b)

In the Alice.txt file, the statistic result shows that there are 84 different kinds of character, and the totally number of characters is 163809. So the entropy is:

Hs=-r=1nprlog2pr=-r=185ar no.of occur163809log2ar no.of occur163809=4.59022

c)

The 10 most frequent bigrams are:

Note: “_” stands for space.

# | bigram | No. of Occur |

1 | e_ | 4826 |

2 | _t | 4095 |

3 | he | 3918 |

4 | th | 3509 |

5 | _a | 2886 |

6 | d_ | 2804 |

7 | t_ | 2803 |

8 | _s | 2289 |

9 | in | 2227 |

10 | er | 2040 |

d)

Here, the number of symbols is 81905.

Hs=8.15679

e)

I think d is better, although b has smaller entropy than d, but after compression, d’s number of symbols is much smaller than b. To be specific:

H(b)×# symbols(b) = 4.59022×163809 = 751,919.3 > H(d)×# symbols(d) = 8.15679×81905 = 668,081.9.

So d is...