# wlp4cc

As this project contains course assignment material, due to the University of Waterloo Policy 71 academic freedom rules, the source code will be available **by request**. Please send me an email with your GitHub account ID.

**Source code:** [**Repository URL**](https://github.com/TheOneKevin/wlp4cc)

## Overview

The Waterloo Programming Language v4 ([WLP4](https://student.cs.uwaterloo.ca/~cs241/wlp4/WLP4.html)) is developed for the CS241 course. The language is a subset of C (and so can be compiled with minimum modifications with an off-the-shelf C compiler).

{% hint style="success" %}
IR and code optimization was **not taught during the course** and all material below was created on my own volition. This is separate from assignment material.
{% endhint %}

The program input will undergo a series of steps:

1. Tokenized
2. Parsed into a concrete syntax tree (CST)
3. CST is streamed into our wlp4cc program
   1. The CST is walked and an IR graph is emitted
   2. The IR is represented internally by a graph but can be printed linearly for debugging purposes.
4. Optimizations are performed (see below)
   1. Initialize control flow graph (CFG)
   2. Convert to SSA form
   3. Perform constant folding + copy propagation
   4. Perform dead code elimination (DCE)
   5. Remote PHI nodes
5. IR is lowered into machine code (MC) and placed through the backend

The following snippet of the compiler is shown below. This is not critical in solving the assignment (so **hopefully** not a Policy 71 violation) but details the extra optimization steps implemented:

```cpp
// 2. Generate codegen procs and begin optimizations
Codegen::Context cg_ctx(ctx);
for (const auto &p : cg_ctx.procs) {
    // 2.1. Initialize CFG
    p.second->compute_predecessors();
    log_ir("1-preopt.ir", &ctx);
    log_ir("1-preopt.dot", &ctx, true);

    // 2.2. Convert to SSA form
    p.second->compute_dominators();
    p.second->compute_post_dominators();
    p.second->insert_phi();
    p.second->rename_vars();
    log_ir("2-" + p.first + "-ssa.ir", p.second);
    log_ir("2-" + p.first + "-ssa.dot", p.second, true);

    // 2.3. Perform constant folding and copy propagation
    p.second->recompute_all();
    p.second->propagate_const_mov();
    log_ir("3-" + p.first + "-prop.ir", p.second, p.second->liveness_annotator);
    log_ir("3-" + p.first + "-prop.dot", p.second, true);

    // 2.4. Perform DCE
    p.second->dce();
    log_ir("4-" + p.first + "-dce.ir", p.second, p.second->liveness_annotator);
    log_ir("4-" + p.first + "-dce.dot", p.second, true);

    // 2.5. Lower PHI and analyze live intervals
    p.second->lower_phi();
    p.second->recompute_all();
    p.second->analyze_liveness();
    log_ir("5-" + p.first + "-nophi.ir", p.second, p.second->pc_annotator);
    log_ir("5-" + p.first + "-nophi.dot", p.second, true);

    // 2.6. Allocate registers
    p.second->allocate_registers();
    // 2.7. Emit
#ifdef DEBUG
    {
        std::ofstream f;
        f.open(logging_dir + "/out." + p.first + ".s");
        p.second->emit_machinecode(f);
        f.close();
    }
#else
    p.second->emit_machinecode(std::cout);
#endif
}
```

## Example Program

Consider the following input program:

```c
int wain(int a, int b) {
    int c = 0;
    while(a > b) {
        c = c + 1;
    }
    c = 2 + 3;
    return a + b + c;
}
```

We will present the IR at each optimization step, in graphical form.

<details>

<summary>1-preopt</summary>

This is the pre-optimization IR, the IR emitted from the CST. You can see **.L0** forms line 2, **.L1** and **.L2** form the loop and body, while **.L3** forms lines 6 to 7.

<img src="/files/2VSdRNhNlE3hWHgd0oeL" alt="" data-size="original">

</details>

<details>

<summary>2-wain-ssa</summary>

PHI nodes are inserted while converting IR into SSA form.

<img src="/files/EZ90gjz67ZVFfPlwJyPO" alt="" data-size="original">

</details>

<details>

<summary>3-wain-prop</summary>

Constants and copy propagation happens. Most notably, `c=2+3; return a+b+c;` is simplified into `return a+b+5;` in block **.L3**

<img src="/files/UV2Jx3wGAXB9QIKu22to" alt="" data-size="original">

</details>

<details>

<summary>4-wain-dce</summary>

Dead code elimination. Since values assigned to `c` is not used outside the loop, the entire loop is optimized away and blocks **.L1** and **.L2** are replaced with NOPs.

<img src="/files/v19U3bhoCufsy8FxU5sR" alt="" data-size="original">

</details>

<details>

<summary>5-wain-nophi</summary>

PHI nodes are eliminated, returning IR into non SSA form allowing for MC lowering.

<img src="/files/CKuKbmV7RE5wHAYwowjM" alt="" data-size="original">

</details>

The final IR emitted looks like:

```
wain(int, int) { 
.L0:
    NOP                            // 0
.L1:
    NOP                            // 1
    BRANCH LE -1 -2 .L3            // 2
.L2:
    NOP                            // 3
    NOP                            // 4
    NOP                            // 5
    BRANCH NONE 0 0 .L1            // 6
.L3:
    NOP                            // 7
    NOP                            // 8
    5 = imm 5                      // 9
    NOP                            // 10
    6 = ADD -1 -2                  // 11
    7 = ADD 6 5                    // 12
    RET 7                          // 13
}
```

Some notes about the IR:

1. Syntax is similar to LLVM IR
2. Each operation is `OP x y` where `x` and `y` are registers
3. Negative register numbers indicate parameter values
4. Labels are prefixed with `.`


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://theonekevin.gitbook.io/notes/master/wlp4cc.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
