RISC-V Merged CmpBr

A microarchitectural case against the RISC-V merged compare-branch instruction

...

Before we begin, we take note of several design constraints:

  1. We are targeting the SKY130 process. This means the SRAM macros generated by OpenRAM are uncharacterized (as of this writing). Since the address flops are integrated within the SRAM macros, we must also capture the data signals in flops.

  2. The processor is a scalar, single-issue, in-order pipeline with no branch prediction. This keeps the scope manageable and fits our area constraints. There are no plans to change this.

  3. Most of the code will be executed from cacheable external memory. Because of (1), the cache pipeline consists of 2 stages (see the sketch after this list).
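
To make constraint (1) concrete, below is a behavioral sketch, in Python, of an SRAM macro whose address input is flopped internally. The class and signal names are ours, purely for illustration, and are not OpenRAM's interface. The point is that the read data only settles during the cycle after the address is presented, so it has to be captured in a flop of our own before anything downstream can use it; that extra capture is what spreads one cache access across two stages.

```python
class RegisteredInputSram:
    """Behavioral sketch of an SRAM macro with an internal address flop.
    Illustrative only; this is not the OpenRAM interface."""

    def __init__(self, depth: int):
        self.mem = [0] * depth
        self.addr_q = 0  # the address flop inside the macro

    def clock(self, addr: int) -> None:
        """Rising edge: the macro captures the presented address."""
        self.addr_q = addr

    def rdata(self) -> int:
        """'Combinational' read of the internally flopped address; only
        meaningful during the cycle after clock() captured the address."""
        return self.mem[self.addr_q]


sram = RegisteredInputSram(depth=16)
sram.mem[5] = 0xDEADBEEF

sram.clock(addr=5)       # cycle N: address is captured inside the macro
rdata_q = sram.rdata()   # cycle N+1: data settles and is flopped externally
assert rdata_q == 0xDEADBEEF
```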

1. Cache pipeline general idea

The 2 cache pipeline stages, CA and CD, are as follows:

  1. The address/data, generated combinationally, is latched by the 1st buffer. The read data is then generated "combinationally" by the SRAM during the CA stage.

  2. The SRAM data is latched by the 2nd buffer. During the CD stage, the data passes through the cache circuitry that determines hit/miss. A cache miss activates an FSM off the data path (a sketch of this check follows the list).

  3. The final data is latched by the 3rd buffer.
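
The hit/miss determination in the CD stage can be sketched roughly as below. The cache geometry (direct-mapped, 4 offset bits, 6 index bits) and every name here are assumptions made purely for illustration; the post does not specify the actual organization.

```python
from dataclasses import dataclass

# Assumed geometry, for illustration only: direct-mapped, 16-byte lines,
# 64 sets.
OFFSET_BITS = 4
INDEX_BITS = 6

@dataclass
class TagEntry:
    valid: bool = False
    tag: int = 0

tag_ram = [TagEntry() for _ in range(1 << INDEX_BITS)]

def cd_stage(addr: int, sram_rdata_q: int):
    """Takes the request address and the SRAM data captured by the 2nd
    buffer, and decides hit/miss. On a hit the data is forwarded toward
    the 3rd buffer; on a miss the refill FSM (off the data path) would
    be kicked instead."""
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    entry = tag_ram[index]
    hit = entry.valid and entry.tag == tag
    return hit, (sram_rdata_q if hit else None)
```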

Each of these "buffers" is really a 2-deep FIFO, with ready/valid signals and an upstream stall signal (combined with the downstream ready/valid signals).
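
A behavioral sketch of one such buffer is shown below, assuming conventional ready/valid semantics (a transfer happens in any cycle where both valid and ready are high). The interface names are illustrative and not taken from the actual RTL.

```python
from collections import deque

class TwoDeepFifo:
    """Behavioral sketch of a 2-deep ready/valid buffer (a skid buffer).
    Interface names are illustrative."""

    def __init__(self):
        self.slots = deque()

    def in_ready(self) -> bool:
        """Upstream-facing ready: can we accept an item this cycle?"""
        return len(self.slots) < 2

    def upstream_stall(self) -> bool:
        """Stall propagated upstream, i.e. the inverse of in_ready."""
        return not self.in_ready()

    def out_valid(self) -> bool:
        """Downstream-facing valid: do we have an item to present?"""
        return len(self.slots) > 0

    def out_data(self):
        return self.slots[0] if self.slots else None

    def clock(self, in_valid: bool, in_data, out_ready: bool) -> None:
        """One clock edge; handshakes are judged against the pre-edge state."""
        push = in_valid and self.in_ready()
        pop = self.out_valid() and out_ready
        if pop:
            self.slots.popleft()
        if push:
            self.slots.append(in_data)
```

The usual motivation for the second entry is that it gives the buffer somewhere to put the word already in flight when the downstream side stalls, so in_ready (and with it the upstream stall) can be driven from a register instead of a long combinational path.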

2. Instruction fetch unit

The core execution pipeline has 3 stages, S1, S2 and S3, to execute arithmetic instructions. Prepending the (traditionally single-stage) cache access stages CA1 and CD1,

...
