
VLIW: Assume the MIPS VLIW architecture in the slides that uses an instruction p

VLIW: Assume the MIPS VLIW architecture in the slides, which uses an instruction packet of 2 instructions. One instruction is an ALU op or a branch; the other is a load or a store. Assume a single-cycle VLIW processor: every instruction takes 1 clock cycle, and all memory accesses are cache hits, so LW and SW also take 1 clock cycle. Assume no forwarding (the result of an instruction is not available until the next clock cycle).

    b = 4;
    for (i = 1; i < 21; i++) {
        a[i] = a[i] + a[i-1] * b;
    }

$s0 is set to the address of a[0]. Assume $s0 + 4 is the address of a[1] initially.

          addi $t0, $0, 4        # set b
          addi $t1, $0, 1        # set i
          addi $t5, $0, 20       # set stopping condition for loop
    LOOP: lw   $t2, -4($s0)      # get a[i-1]
          lw   $t3, 0($s0)       # get a[i]
          mul  $t4, $t2, $t0     # do the mult
          add  $t4, $t4, $t3     # do the add
          sw   $t4, 0($s0)       # store the result
          addi $s0, $s0, 4       # increment a[] address
          addi $t1, $t1, 1       # increment i
          bne  $t1, $t5, LOOP

1) Assemble the instructions into the fewest number of packets possible. How many instruction packets are required for the entire program?

2) Unroll the loop once (2 loop bodies in a single iteration). Assemble the instructions into as few packets as possible. How many instruction packets are required for the entire program? (Remember that you can reorder instructions, as long as the output is the same.)

3) Use a 1-bit branch predictor initialized to predict branch not taken. If the branch follows the patterns below, how many mispredictions are there? (T = taken, N = not taken)
   series 1: T T N N T T T N N
   series 2: T N T N T N T T T N N N

4) Repeat question 3 using a 2-bit predictor initialized to 'predict strongly taken'. Use the state diagram in the slides (when it changes prediction, it changes into the 'strongly' prediction state).
   series 1: T T N N T T T N N
   series 2: T N T N T N T T T N N N

5) Assume the following mix of instructions:

   thread 1:
     1 – takes 2 clock cycles
     2 – no restrictions
     3 – uses same functional unit as 2
     4 – takes 2 clock cycles
     5 – depends on result generated by 4
     6 – uses same functional unit as 4
     7 – no restrictions
     8 – no restrictions

   thread 2:
     A – no restrictions
     B – uses same functional unit as A
     C – takes 2 clock cycles
     D – no restrictions
     E – no restrictions

Show a schedule (like the last slide in 3b) in which a minimal number of clock cycles is needed to execute the 2 threads on each of the processors below. Assume 2 instructions are fetched per clock cycle, and that as many instructions as the processor has functional units can be issued per clock cycle:

   1) A coarse-grained multithreaded processor with 1 core and 2 functional units.

   2) A fine-grained multithreaded processor with 1 core containing 2 functional units that issues instructions from threads in a round-robin fashion (cycle 1 issues from thread 1, cycle 2 issues from thread 2, cycle 3 issues from thread 1, etc.). Note that instructions from multiple threads may be in the functional units in a given clock cycle, but only instructions from a single thread are ISSUED each clock cycle.

   3) A simultaneous multithreaded (SMT, hyperthreaded) processor with 1 core and 4 functional units. Instructions from multiple threads may be issued in a single clock cycle.
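
For questions 3 and 4, the misprediction counts can be cross-checked with a short simulation. The Python sketch below is illustrative only and is not part of the assignment. The function names, the state labels (ST, WT, WN, SN) and the transition table are assumptions: the table is one reading of the note that a change of prediction moves straight into the 'strongly' state, and the state diagram in the slides remains the authority if it differs.

```python
# Illustrative sketch: counts mispredictions for 1-bit and 2-bit predictors.
# All names here are hypothetical; the 2-bit transition table is an ASSUMED
# reading of "when it changes prediction it changes into the 'strongly' state".

SERIES_1 = "T T N N T T T N N"
SERIES_2 = "T N T N T N T T T N N N"


def count_mispredictions_1bit(outcomes, initial_taken=False):
    """1-bit predictor: always predicts whatever the branch did last time."""
    predict_taken = initial_taken
    misses = 0
    for outcome in outcomes.split():
        taken = outcome == "T"
        if taken != predict_taken:
            misses += 1
        predict_taken = taken  # remember the most recent outcome
    return misses


# Assumed 2-bit state machine.
# ST = strongly taken, WT = weakly taken,
# SN = strongly not taken, WN = weakly not taken.
NEXT_STATE = {
    ("ST", True): "ST", ("ST", False): "WT",
    ("WT", True): "ST", ("WT", False): "SN",  # prediction flips: go to strong
    ("SN", False): "SN", ("SN", True): "WN",
    ("WN", False): "SN", ("WN", True): "ST",  # prediction flips: go to strong
}


def count_mispredictions_2bit(outcomes, initial_state="ST"):
    """2-bit predictor driven by the assumed NEXT_STATE table."""
    state = initial_state
    misses = 0
    for outcome in outcomes.split():
        taken = outcome == "T"
        predicted_taken = state in ("ST", "WT")
        if taken != predicted_taken:
            misses += 1
        state = NEXT_STATE[(state, taken)]
    return misses


if __name__ == "__main__":
    for name, series in (("series 1", SERIES_1), ("series 2", SERIES_2)):
        print(name,
              "1-bit:", count_mispredictions_1bit(series),
              "2-bit:", count_mispredictions_2bit(series))
```

Keeping the transitions in a table rather than in branching logic makes it easy to swap in the slides' exact diagram: only the entries of NEXT_STATE would need to change.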
