```{mermaid}
%%| echo: false
%%{init: {"flowchart": {"htmlLabels": false}} }%%
graph LR;
  A[Front end] --IR--> B[Middle end];
  B --IR--> C[Back end];
  A --IR--> C;
```
A compiler writer builds bridges between people and machines, and every day this task becomes more challenging.
A compiler
Most homework
Since most programming languages have a JSON library, you can do the homework in any language you like (see the sketch below).
We are going to critically read current research papers.
Everybody does a compiler-related project.
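To make the "any language" point above concrete: every homework tool can share the same shape, a JSON filter. Here is a minimal sketch in Python of a do-nothing tool that reads a program on stdin and writes it back on stdout; a real pass would edit `prog` in between. Any language with a JSON library works the same way.

```python
import json
import sys

# The smallest possible "pass": parse the JSON program from stdin
# and print it back out unchanged. Real passes modify `prog` in the middle.
prog = json.load(sys.stdin)
json.dump(prog, sys.stdout, indent=2)
```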
Write a paragraph introducing yourself in a reply to the Canvas introductions discussion topic. Add a picture if you can. Mention a compilers topic you’d like to learn about someday, either in this class or beyond.
Pick a paper from the weekly schedule whose discussion you will lead. Claim the paper by opening a pull request (at the class GitHub) against the weekly.qmd file, filling in your name on the LEADER: line. (I encourage teams of two to sign up for the same paper.)
Once everyone has signed up and I can see which papers are covered, I’ll finalize the dates and times.
Add a text file containing "done" to Canvas assignment 0 to indicate that you have posted your introduction and claimed a paper.
For this assignment, after you write your introduction in Canvas, you just need to submit a response to the Canvas assignment to indicate that you are done.
For other assignments you should:
My plan is that grades, personal details, and the like stay in Canvas, and everything else becomes public on the GitHub website.
Originally, a compiler was a person doing calculations.
In 1952, Grace Hopper wrote an operational link-loader, which she called a compiler. She later said, “Nobody believed that,” and that she “had a running compiler and nobody would touch it. They told me computers could only do arithmetic.”
Grace Hopper
In 1957, John Backus and his IBM team delivered the first commercial compiler, for FORTRAN (14 people worked on it for about 4 years).
At the time, two thirds of the cost and 90% of the time of solving a problem went into coding.
FORTRAN was provided for the IBM 1401 computer by an innovative 63-phase compiler that ran entirely in its core memory of only 8000 (six-bit) characters.
In these early years, the vendor development model dominated: vendors sometimes built compilers in-house but often used small startup compiler companies.
Compilers stabilized on a classic three-stage structure built around an IR (intermediate representation). The IR is machine independent.
This course focuses on stage 2, the middle end.
A goal of this course is to explain how to transform a program automatically, while preserving its semantics, in such a way that the new program is more efficient according to a well-defined metric.
There are many ways to compare the performance of programs: wall-clock time, instructions executed, code size, memory traffic, or energy consumed, for example.
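As a toy instance of such a transformation, here is a sketch of constant folding over a made-up instruction list (the encoding is invented for illustration, not a real compiler IR format): it preserves the program’s meaning while replacing run-time arithmetic with precomputed constants, which also sets up later dead-code elimination.

```python
# Constant folding: rewrite an add whose operands are known constants
# into a single const instruction. The instruction encoding is invented.
def fold_constants(instrs):
    env = {}   # variables currently known to hold constant values
    out = []
    for ins in instrs:
        if ins["op"] == "const":
            env[ins["dest"]] = ins["value"]
            out.append(ins)
        elif ins["op"] == "add" and all(a in env for a in ins["args"]):
            value = sum(env[a] for a in ins["args"])
            env[ins["dest"]] = value
            out.append({"op": "const", "dest": ins["dest"], "value": value})
        else:
            env.pop(ins.get("dest"), None)  # dest is no longer a known constant
            out.append(ins)
    return out

prog = [
    {"op": "const", "dest": "a", "value": 2},
    {"op": "const", "dest": "b", "value": 3},
    {"op": "add", "dest": "c", "args": ["a", "b"]},
    {"op": "print", "args": ["c"]},
]
print(fold_constants(prog))  # the add becomes {'op': 'const', ..., 'value': 5}
```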
In 1987, GCC was released. It formalized the IR and was more or less open source. Within the stages, compiler writers could use any data structures they liked, but at the edges they had to use the single IR. Adding an optimization or reordering optimizations was quite hard.
Vendors could reuse one front end and one middle end and only needed to write a new back end. The free front end and middle end ended almost all the compiler startups.
In GCC, the IR is somewhat C-based: for instance, there are pointers, but there is no simple way to express garbage collection without hacks.
In about 2006, LLVM (originally “Low Level Virtual Machine”) appeared. It changed the model to look like a library.
The core of LLVM is the intermediate representation (IR), a low-level programming language similar to assembly. The IR is a strongly typed, RISC-like instruction set that abstracts away most details of the target.
```{mermaid}
%%| echo: false
%%{init: {"flowchart": {"htmlLabels": false}} }%%
graph LR;
  A[Front end] --IR--> B0[OPT0] --IR--> B1[OPT1] --IR--> B2[OPT2] --IR--> BN[OPTn] --IR--> C{Back end};
  A --IR--> C;
```
Optimizations became passes. A user could mix and match, running some optimizations but not others when compiling a specific program. It became easy to add a pass, which enabled lots of academic research and lots of experiments.
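Under the library model, a pass is just a function from IR to IR, so mixing and matching passes is ordinary function composition. A minimal sketch in Python (the pass names and the empty IR value are hypothetical placeholders):

```python
# A pass maps IR to IR; a pipeline is just an ordered list of passes.
# The pass bodies here are placeholders; the point is the composition model.
def const_fold(ir):
    return ir  # would fold constant expressions

def dce(ir):
    return ir  # would delete dead (unused) instructions

def run_pipeline(ir, passes):
    for p in passes:
        ir = p(ir)
    return ir

# Mix and match: reorder, repeat, or drop passes per program.
prog = {"functions": []}  # stand-in for a real IR module
prog = run_pipeline(prog, [const_fold, dce, const_fold])
```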
In this course we are going to use an IR called BRIL, which is a much-simplified version of LLVM IR, and we are going to string passes together using Unix pipes.
```{mermaid}
%%| echo: false
%%{init: {"flowchart": {"htmlLabels": false}} }%%
graph LR;
  A[Text version of BRIL] --> B0[BRIL in JSON] --> B1["new pass"] --> B2[BRIL interpreter];
```
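Here is a sketch of one such pass in Python: it deletes `nop` instructions and passes everything else through. It assumes the standard BRIL tooling (`bril2json` to produce JSON, `brili` to interpret it) and BRIL’s JSON layout, a top-level `functions` list whose entries each carry an `instrs` list.

```python
import json
import sys

# A pass is a JSON filter: BRIL program in on stdin, BRIL program out on stdout.
prog = json.load(sys.stdin)

for func in prog["functions"]:
    # Keep labels (which have no "op" key) and every instruction that is not a nop.
    func["instrs"] = [i for i in func["instrs"] if i.get("op") != "nop"]

json.dump(prog, sys.stdout)
```

Strung together with pipes, this runs as `bril2json < program.bril | python3 remove_nops.py | brili` (where `remove_nops.py` is just this sketch saved to a file).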
Compilers are massive and expensive to build.
| Compiler | Year Started | Developers | Lines of Code | Est. Cost (USD) |
|---|---|---|---|---|
| GCC 9.2.0 | 1988 | 617 | 5,591,759 | 425,747,279 |
| LLVM 8.0.1 | 2001 | 1,210 | 6,877,489 | 529,894,190 |
| OpenJDK 14+10 | 2007 | 883 | 7,955,827 | 616,517,789 |
| v8 7.8.112 | 2008 | 736 | 3,043,793 | 225,195,832 |
| Rust 1.37.0 | 2010 | 2,737 | 852,877 | 59,109,425 |
| Swift | 2010 | 857 | 665,238 | 45,535,689 |
| Intel Graphics 1.0.10 | 2018 | 149 | 694,688 | 46,934,626 |
Some observations:
Some changes since the early 2000s:
At the scale of data centers, every single percent of performance matters! Just look at Google’s (and others’) publicly available numbers on data center expenditures. We are talking about billions of dollars; a single percent improvement can mean millions of dollars in additional program features or improved utilization.
Proebsting’s Law: compiler advances double computing power every 18 years, while hardware advances double computing horsepower every 18 months. How would you prove this?
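One way to start is to turn both doubling periods into annual rates and compare. A quick back-of-the-envelope check:

```python
# Annual improvement implied by each doubling period.
compiler = 2 ** (1 / 18) - 1    # doubling every 18 years
hardware = 2 ** (1 / 1.5) - 1   # doubling every 18 months
print(f"compilers: {compiler:.1%} per year")  # about 3.9%
print(f"hardware:  {hardware:.1%} per year")  # about 58.7%
```

Actually testing the 18-year figure is harder: one classic approach is to benchmark the same programs with optimization turned on and off across compiler generations.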
Arch Robison of KAI (Kuck & Associates) put it this way in a talk:

> Compile-time program optimizations are similar to poetry: more are written than actually published in commercial compilers. Hard economic reality is that many interesting optimizations have too narrow an audience to justify their cost in a general-purpose compiler and custom compilers are too expensive to write.
Remove performance penalty for:
Over time, hardware has become more of a challenge for compilers; for example, cache behavior is not predictable at compile time, so compilers have to guess.
And hardware can omit features when the compiler can deal with them instead; the classic example is interlocks: MIPS left out pipeline interlocks and relied on the compiler to schedule around hazards.