Solutions Chapter 2

Date Submitted: 06/04/2014 07:00 AM

Solutions to Case Studies and Exercises

Chapter 2 Solutions

Case Study 1: Optimizing Cache Performance via Advanced Techniques

2.1 a. Each element is 8B. A 64B cache line holds 8 elements, and each column access fetches a new line for the non-ideal matrix, so we need a minimum of 8 × 8 (64 elements) for each matrix. Hence, the minimum cache size is 128 × 8B = 1KB.

b. The blocked version only has to fetch each input and output element once. The unblocked version will have one cache miss for every 64B/8B = 8 row elements. Each column requires 64B × 256 of storage, or 16KB. Thus, column elements will be replaced in the cache before they can be used again. Hence the unblocked version will have 9 misses (1 row and 8 columns) for every 2 in the blocked version.

c. for (i = 0; i < 256; i = i + B) { for (j = 0; j < 256; j = j + B) { for (m = 0; m ...

... 3.2 × 0.5 = 1.6ns
4-way: (1 - 0.0033) × 2 + 0.0033 × 13 = 2.036 cycles => 2.036 × 0.83 = 1.69ns
8-way: (1 - 0.0009) × 3 + 0.0009 × 13 = 3 cycles => 3 × 0.79 = 2.37ns
Direct-mapped cache is the best.

2.9 a. The average memory access time of the current (4-way 64KB) cache is 1.69ns. 64KB direct-mapped cache access time = 0.86ns @ 0.5ns cycle time = 2 cycles. The way-predicted cache has a cycle time and access time similar to the direct-mapped cache and a miss rate similar to the 4-way cache. The AMAT of the way-predicted cache has three components: miss, hit with the way prediction correct, and hit with the way prediction wrong:
0.0033 × 20 + (0.80 × 2 + (1 - 0.80) × 3) × (1 - 0.0033) = 2.26 cycles = 1.13ns

b. The cycle time of the 64KB 4-way cache is 0.83ns, while the 64KB direct-mapped cache can be accessed in 0.5ns. This gives a ratio of 0.83/0.5 = 1.66, or 66% faster cache access.

c. With a 1-cycle way misprediction penalty, the AMAT is 1.13ns (as per part a), but with a 15-cycle misprediction penalty, the AMAT becomes:
0.0033 × 20 + (0.80 × 2 + (1 - 0.80) × 15) × (1 - 0.0033) = 4.65 cycles or 2.3ns.

d. The serial access is 2.4ns/1.59ns = 1.509 or 51%...