




Politechnika Śląska (Silesian University of Technology) as a Centre of Modern Education Based on Research and Innovation

POWR.03.05.00-IP.08-00-PZ1/17

Project co-financed by the European Union from the European Social Fund

#### **Microprocessor and Embedded Systems**

Faculty of Automatic Control, Electronics and Computer Science, Informatics, Bachelor Degree

#### Lecture 9

### Microprocessor operation acceleration

Bartłomiej Zieliński, PhD, DSc

#### µp acceleration

Program:

- Acceleration/optimisation?
- Pipelining
- Superscalar processing
- Branch prediction
- Cache memory

- Acceleration/optimisation?
  - Clock frequency up
    - Frequency limit in silicon structure
  - Forms of parallel execution
    - Pipelining
    - Superscalar
    - Full parallel
  - RISC/CISC/FISC

- Typical RISC properties
  - Constant instruction format
    - Easier decoding
  - Memory accessed only by load/store (read/write) operations
  - Many registers (32+)
    - No dedicated accumulator
    - Argument passing through registers (not stack)
  - Small number of commands
  - Fast command execution (1 clk/cmd)
  - Simple decoding/control unit
  - (Harvard architecture)

- Pipeline execution
  - Command split into phases
    - E.g., Fetch/Decode/Address/Exec/Write
    - Optimum≈8 phases?
  - Several commands processed concurrently
    - Each in a different processing phase
  - Problems
    - Command time in different stages varies  $\rightarrow$  queues
    - Exec interruption  $\rightarrow$  pipe emptied  $\rightarrow$  big delay
    - Command & data mutual dependencies
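The gain from pipelining, and the cost of emptying the pipe, can be estimated with a small calculation. The sketch below is illustrative only: the function names are hypothetical, the pipeline is assumed ideal (one command leaves per clock once the pipe is full), and each flush is assumed to cost a complete refill.

```python
def sequential_cycles(n_commands: int, n_phases: int) -> int:
    """Clocks needed when every command passes all phases before the next starts."""
    return n_commands * n_phases


def pipelined_cycles(n_commands: int, n_phases: int, flushes: int = 0) -> int:
    """Clocks needed by an ideal pipeline.

    The first command needs n_phases clocks; every further command completes
    one clock later. Assumption: each flush (e.g. an interrupted execution)
    costs n_phases - 1 extra clocks to refill the pipe.
    """
    if n_commands == 0:
        return 0
    return n_phases + (n_commands - 1) + flushes * (n_phases - 1)


if __name__ == "__main__":
    n, k = 100, 5  # 100 commands, phases: Fetch/Decode/Address/Exec/Write
    print(sequential_cycles(n, k))        # 500
    print(pipelined_cycles(n, k))         # 104
    print(pipelined_cycles(n, k, 10))     # 144
```

With these assumed numbers, ten flushes add 40 clocks to a 104-clock run, which shows why interruptions of execution matter more as the pipe gets longer.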

- Superscalar execution
  - More than 1 execution pipe
    - Many more than 1 is not effective because of dependencies
    - Pipes may have different capabilities
  - Command sequence split into pipes
    - Depends on differences between pipes
    - Synchronous/asynchronous pipes
      - Sync: pipe1 waits $\rightarrow$ pipe2 waits too, even without a reason of its own
      - Async: pipe1 waits $\rightarrow$ pipe2 goes on
        - Cmd2 ends before cmd1
          - Out-of-order execution

- Operand dependencies
  - Read after read
    - E.g., B=B+C || A=C
      - $\rightarrow$ Dual Pipe Access
  - Read after Write
    - E.g., A=A+B || C=A
    - E.g., A=A+B || [M]=A
      - $\rightarrow$ Result forwarding, operand forwarding
  - Write after Read
    - E.g., B=A || A=A+C
      - $\rightarrow$ Register renaming
  - Write after Write
    - E.g., A=[M] || A=A+B
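The four dependency kinds above can be detected mechanically by comparing the operand sets of two commands issued as a pair. The following is a minimal sketch, not a model of any real issue logic: `Command` and `dependencies` are hypothetical names, and each command is reduced to the set of locations it reads and the set it writes.

```python
from typing import FrozenSet, NamedTuple, Set


class Command(NamedTuple):
    reads: FrozenSet[str]   # registers / memory cells read
    writes: FrozenSet[str]  # registers / memory cells written


def dependencies(first: Command, second: Command) -> Set[str]:
    """Return the dependency kinds of `second` on `first` (first precedes second)."""
    kinds = set()
    if first.reads & second.reads:
        kinds.add("RAR")  # read after read
    if first.writes & second.reads:
        kinds.add("RAW")  # read after write
    if first.reads & second.writes:
        kinds.add("WAR")  # write after read
    if first.writes & second.writes:
        kinds.add("WAW")  # write after write
    return kinds


if __name__ == "__main__":
    # B=B+C || A=C
    print(sorted(dependencies(Command(frozenset("BC"), frozenset("B")),
                              Command(frozenset("C"), frozenset("A")))))   # ['RAR']
    # A=A+B || C=A
    print(sorted(dependencies(Command(frozenset("AB"), frozenset("A")),
                              Command(frozenset("A"), frozenset("C")))))   # ['RAR', 'RAW']
    # B=A || A=A+C
    print(sorted(dependencies(Command(frozenset("A"), frozenset("B")),
                              Command(frozenset("AC"), frozenset("A")))))  # ['RAR', 'WAR']
    # A=[M] || A=A+B
    print(sorted(dependencies(Command(frozenset("M"), frozenset("A")),
                              Command(frozenset("AB"), frozenset("A")))))  # ['RAW', 'WAW']
```

Note that a pair can carry more than one dependency at once; the last example is both write after write and read after write on `A`.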

- Branch prediction
  - 3 kinds of code execution disturbances:
    - Interrupts
    - Unconditional jumps
    - Conditional jumps
  - Conditional jumps
    - Which branch should enter the pipeline?
  - Longer pipe  $\rightarrow$  bigger problem
    - Decision in the middle/end of pipe
    - Emptying the pipe costs up to a few tens of clock cycles

- Branch prediction
  - Possible solutions
    - Branch prediction
      - $P_{success} < 1$
    - Multipath execution
      - Hardware replication
      - Speculative execution (results not committed) until it is known which path is correct
      - Good example: Intel Itanium
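Because the prediction succeeds with probability below 1, every conditional jump carries an expected flush cost. A rough estimate can be sketched as follows; the function name and all numeric values are assumptions chosen for illustration, not measurements of any processor.

```python
def effective_cpi(base_cpi: float,
                  branch_fraction: float,
                  p_success: float,
                  flush_penalty: float) -> float:
    """Average clocks per command including misprediction flushes.

    base_cpi        -- clocks per command with a perfectly filled pipe
    branch_fraction -- share of commands that are conditional jumps (0..1)
    p_success       -- probability that the prediction is right (0..1)
    flush_penalty   -- clocks lost when the pipe must be emptied
    """
    return base_cpi + branch_fraction * (1.0 - p_success) * flush_penalty


if __name__ == "__main__":
    # Hypothetical workload: 20 % jumps, 20-clock flush penalty.
    for p in (0.5, 0.9, 0.99):
        print(p, round(effective_cpi(1.0, 0.2, p, 20.0), 3))
```

Under these assumed values, raising the prediction success rate from 0.9 to 0.99 lowers the average from about 1.4 to about 1.04 clocks per command.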

- Branch prediction
  - Branch target buffer



- Branch prediction
  - Branch prediction methods
    - Static
      - Hint bit in the command, set by the compiler
        - Based on analysis of possible code execution paths
        - What if the compiler makes a mistake?
      - By jump direction, i.e. the sign of the jump offset (Intel's solution: Pentium III)
        - Negative (jump back): end of loop $\rightarrow$ predict jump
        - Positive (jump forward): error handling $\rightarrow$ predict no jump
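The direction-based static rule can be written in a few lines. This is a sketch of the rule as stated above, not of any specific processor's logic; the function names and addresses are hypothetical.

```python
def predict_jump_static(branch_address: int, target_address: int) -> bool:
    """Static rule: a backward jump is predicted taken, a forward jump not taken."""
    return target_address < branch_address


def loop_prediction_accuracy(iterations: int) -> float:
    """Accuracy of the rule on the backward jump that closes a loop.

    The jump is taken on every iteration except the last one, so the rule
    is wrong exactly once per complete loop execution.
    """
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    return (iterations - 1) / iterations


if __name__ == "__main__":
    print(predict_jump_static(0x1040, 0x1000))  # True  (jump back: loop end)
    print(predict_jump_static(0x1040, 0x1080))  # False (jump forward: error handling)
    print(loop_prediction_accuracy(100))        # 0.99
```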

- Branch prediction
  - Branch prediction methods
    - Dynamic
      - History bits set according to program flow
        - 1 bit: too little ("checkerboard")
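Why one history bit is too little can be seen by counting mispredictions on short outcome sequences. The sketch below compares a 1-bit predictor with a 2-bit saturating counter. The 2-bit scheme is brought in here as a common textbook alternative and is an assumption, not something stated in the slides, as is reading "checkerboard" as a strictly alternating jump pattern.

```python
from typing import Iterable


def mispredictions_1bit(outcomes: Iterable[bool], last: bool = True) -> int:
    """1 history bit: predict the same outcome as the previous execution."""
    misses = 0
    for taken in outcomes:
        if last != taken:
            misses += 1
        last = taken  # the bit always follows the latest outcome
    return misses


def mispredictions_2bit(outcomes: Iterable[bool], counter: int = 2) -> int:
    """2-bit saturating counter (0..3): predict taken when counter >= 2."""
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


if __name__ == "__main__":
    loop = [True, True, True, False] * 3   # 4-iteration loop run 3 times
    alternating = [True, False] * 6        # assumed "checkerboard" pattern
    print(mispredictions_1bit(loop), mispredictions_2bit(loop))                # 5 3
    print(mispredictions_1bit(alternating), mispredictions_2bit(alternating))  # 11 6
```

With a single bit, one loop exit causes two mispredictions (the exit itself and the next loop entry), and an alternating pattern is mispredicted almost every time.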




- Optimisation
  - For a given  $\mu p$  type
  - Synchronous pipes:
    - Manual instruction placement for better pairing

- Cache memory
  - Placement in a  $\mu p$  system



- Cache memory
  - What is better?
    - Small capacity, high speed
      - Compact code
    - Large capacity, low speed
      - Distributed code
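The trade-off between a small fast cache and a large slow one can be compared through the average memory access time. The sketch below uses the usual hit-time plus miss-rate times miss-penalty estimate; the function name and every number are hypothetical and serve only to show how the answer depends on the code being run.

```python
def average_access_time(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average access time in clocks: every access pays hit_time, misses pay extra."""
    return hit_time + miss_rate * miss_penalty


if __name__ == "__main__":
    penalty = 50.0  # assumed clocks to reach RAM on a miss

    # Compact code: even the small cache rarely misses.
    small_compact = average_access_time(hit_time=1.0, miss_rate=0.01, miss_penalty=penalty)
    large_compact = average_access_time(hit_time=3.0, miss_rate=0.01, miss_penalty=penalty)

    # Distributed code: the small cache misses much more often.
    small_spread = average_access_time(hit_time=1.0, miss_rate=0.10, miss_penalty=penalty)
    large_spread = average_access_time(hit_time=3.0, miss_rate=0.02, miss_penalty=penalty)

    print("compact code:    ", small_compact, large_compact)
    print("distributed code:", small_spread, large_spread)
```

With these assumed values the small fast cache wins for compact code, while the large slow cache wins for distributed code.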


- Cache memory
  - Cache organisation



    - Fully associative





- Cache memory
  - Policies
    - Write-through
    - Write-back
  - How to ensure cache & RAM consistency in a multiprocessor system?
  - MESI protocol
    - Modified
    - Exclusive
    - Shared
    - Invalid
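How the four MESI states keep caches consistent can be illustrated by tracking one cache line shared by several processors. This is a simplified sketch following the commonly described form of the protocol: the function names and event labels are hypothetical, bus signalling and write-back of modified data are not modelled, and only the state changes are shown.

```python
from typing import List, Tuple


def next_state(state: str, event: str, others_have_copy: bool = False) -> str:
    """State of one cache's copy of a line after an event.

    Events: 'local_read', 'local_write' (this processor),
            'remote_read', 'remote_write' (another processor, seen on the bus).
    """
    if event == "local_read":
        if state == "I":  # miss: load as Shared or Exclusive
            return "S" if others_have_copy else "E"
        return state      # hit: state unchanged
    if event == "local_write":
        return "M"        # the writer holds the only valid, modified copy
    if event == "remote_read":
        return "I" if state == "I" else "S"  # M/E/S degrade to Shared
    if event == "remote_write":
        return "I"        # another cache now owns the line
    raise ValueError(f"unknown event: {event}")


def simulate(ops: List[Tuple[int, str]], n_caches: int = 2) -> List[List[str]]:
    """Apply (cpu, 'read' | 'write') operations; return the states after each one."""
    states = ["I"] * n_caches
    history = []
    for cpu, op in ops:
        others = any(s != "I" for i, s in enumerate(states) if i != cpu)
        states = [
            next_state(s, "local_" + op, others) if i == cpu
            else next_state(s, "remote_" + op)
            for i, s in enumerate(states)
        ]
        history.append(list(states))
    return history


if __name__ == "__main__":
    trace = simulate([(0, "read"), (1, "read"), (0, "write"), (1, "read")])
    print(trace)  # [['E', 'I'], ['S', 'S'], ['M', 'I'], ['S', 'S']]
```

In the trace, the write by processor 0 invalidates the copy held by processor 1, so the later read by processor 1 cannot return stale data.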