At the end of the course, you’ll do a language implementation research project. This is an open-ended and open-source project that can be on any topic that you can construe as being about compiler hacking. The final product is an experience report on the course blog where you rigorously evaluate the success of your implementation.
You can work individually or in groups of 2–3 people.
Proposal
Your proposal should answer these three questions:
What will you do?
How will you do it?
How will you empirically measure success?
I will give you feedback on how to approach your project.
Implementation
The main phase, of course, is implementing the thing you said you would implement. I recommend you keep a “lab notebook” to log your thoughts, attempts, and frustrations—this will come in handy for the report you’ll write about the project.
Evaluation
A major part of your project is an empirical evaluation. To design your evaluation strategy, you will need to consider at least these things:
Where will you get the input code you’ll use in your evaluation?
How will you check the correctness of your implementation? If you’ve implemented an optimization, for example, “correctness” means that the transformed programs behave the same way as the original programs.
How will you measure the benefit (in performance, energy, complexity, etc.) of your implementation?
How will you present the data you collect from your empirical evaluation?
Other questions may be relevant depending on the project you choose. Consider the SIGPLAN empirical evaluation guidelines when you design your methodology.
Experience Report
For the main project deadline, you will write up the project’s outcomes in the form of a post on the course blog. Your writeup should answer these questions:
What was the goal?
What did you do? (Include both the design and the implementation.)
What were the hardest parts to get right?
Were you successful? (Report rigorously on your empirical evaluation.)
To submit your report, open a pull request in the course’s GitHub repository to add your post to the blog.
LLVM IR has more complex types, writes variables as %n, assumes a stack, and must be in SSA form.
using copilot
what does the llvm alloca do
In LLVM, the alloca instruction is used to allocate memory on the stack. It stands for “allocate” and is similar to the alloca function in C. The memory allocated by alloca is automatically freed when the function returns, making it suitable for allocating temporary storage within a function.
alloca
Explanation of alloca Instruction
Syntax: alloca <type>[, <type> <num_elements>][, align <alignment>]
<type>: The type of the elements to be allocated.
<num_elements>: The number of elements to allocate (optional, defaults to 1).
<alignment>: The alignment of the allocated memory (optional).
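Concretely, the two common forms look like this (a small illustrative fragment, not taken from the notes' test program):

```llvm
%single = alloca i32, align 4          ; one i32, 4-byte aligned
%array  = alloca i32, i32 10, align 4  ; room for ten i32s
```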
example
Explanation:
%3 = alloca i32, align 4:
Allocates memory for a single i32 (32-bit integer) on the stack.
The allocated memory is aligned to a 4-byte boundary.
The result is a pointer to the allocated memory, stored in the variable %3.
example 2
%4 = alloca i32, align 4:
Similar to the first instruction, this allocates memory for another i32 on the stack with 4-byte alignment.
The result is stored in the variable %4.
%5 = alloca i8**, align 8:
Allocates memory for a single i8** (pointer to a pointer to an 8-bit integer) on the stack.
The allocated memory is aligned to an 8-byte boundary.
The result is stored in the variable %5.
allocate usage
Usage:
Temporary Storage: alloca is typically used for temporary storage within a function, such as local variables.
Automatic Deallocation: The memory allocated by alloca is automatically deallocated when the function returns, so there is no need to manually free it.
Example in Context:
In the provided LLVM IR code, the alloca instructions are used to allocate space for local variables %3, %4, and %5 on the stack. These variables are then used to store values and perform operations within the main function.
allocate
In this function:
%3 is allocated for an i32 and initialized to 0.
%4 is allocated for an i32 and initialized with the value of the first argument %0.
%5 is allocated for an i8** and initialized with the value of the second argument %1.
The value stored in %4 is loaded into %6 and returned.
This demonstrates how alloca is used to manage local variables within an LLVM function.
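Putting the pieces together, the function being described plausibly looks like this in full (a reconstruction from the explanation above and the listing later in these notes; the register numbering assumes two parameters and an implicit entry block, so the first instruction is %3):

```llvm
define i32 @main(i32 %0, i8** %1) {
  %3 = alloca i32, align 4    ; compiler-inserted slot, initialized to 0
  %4 = alloca i32, align 4    ; slot for the first argument (%0)
  %5 = alloca i8**, align 8   ; slot for the second argument (%1)
  store i32 0, i32* %3, align 4
  store i32 %0, i32* %4, align 4
  store i8** %1, i8*** %5, align 8
  %6 = load i32, i32* %4, align 4
  ret i32 %6
}
```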
llvm classes
what is an llvm value
In LLVM, a Value is a fundamental class that represents a typed constant or variable. It is the base class for all values computed by a program that may be used as operands to other values. This includes constants, instructions, function arguments, and more.
value continued
Key Points about llvm::Value:
Base Class: llvm::Value is the base class for many other classes in LLVM, such as llvm::Instruction, llvm::Constant, llvm::Argument, and llvm::GlobalVariable.
Typed: Every Value has a type, represented by the llvm::Type class. This type information is crucial for type checking and code generation.
Use-Def Chain: Value objects maintain a list of uses, which are the places where the value is used. This is part of the use-def (use-definition) chain, which is important for optimizations and transformations.
value continued
Common Subclasses of llvm::Value:
llvm::Instruction: Represents an individual instruction in the LLVM IR.
llvm::Constant: Represents a constant value, such as an integer or floating-point constant.
llvm::Argument: Represents an argument to a function.
llvm::GlobalVariable: Represents a global variable.
llvm classes (LLVM is C++ but does not use the standard library)
LLVM code does not use char* or std::string; it has its own string type called StringRef.
There is no std::cout or std::cerr; instead, LLVM provides outs() and errs().
Instructions are a kind of Value. Since everything is in SSA form, an instruction's operands are pointers to the Values (often other Instructions) that define them.
plugins
An LLVM plugin is a shared library that can add additional functionality to the LLVM infrastructure. Plugins can be used to add new passes, analyses, targets, and more.
Plugins are dynamically loaded into LLVM. Once loaded, a plugin can register new command-line options, passes, etc., that are then available for use in that invocation of the tool.
The advantage for us is that using a plugin means you never have to build LLVM from source.
pass starter
There is a cs6120 package that makes setting up the build process for plugins simpler
rtlib - an easier way to insert code without needing IRBuilder
using llvm branches
To clone a specific branch from a GitHub repository, use the git clone command with the -b option followed by the branch name and the repository URL. Here is the syntax:
git clone -b <branch-name> <repository-url>
how do i find all the binary operators in a llvm plugin
To find all the binary operators in an LLVM plugin, iterate over all instructions in each function and use isa<BinaryOperator> to check whether an instruction is a binary operator. If it is, you can cast it to a BinaryOperator using dyn_cast<BinaryOperator> and process it as needed.
Here is an example of how you can modify your LLVM plugin to find and print all binary operators:
plugin
PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM) {
  for (auto &F : M) {
    errs() << "I saw a function called " << F.getName() << "!\n";
    for (auto &B : F) {
      for (auto &I : B) {
        if (isa<BinaryOperator>(&I)) {
          errs() << "instr:" << I << "\n";
        }
        // dyn_cast returns nullptr when I is not a BinaryOperator,
        // which is why the output below prints from_op:0x0 for most
        // instructions.
        auto *op = dyn_cast<BinaryOperator>(&I);
        errs() << "from_op:" << op << "\n";
      }
    }
  }
  return PreservedAnalyses::all();
}
output
I saw a function called main!
I saw a function called main!
from_op:0x0
from_op:0x0
from_op:0x0
from_op:0x0
from_op:0x0
from_op:0x0
from_op:0x0
instr: %7 = add nsw i32 %6, 5
from_op:0xb58610
from_op:0x0
getting analysis info
PreservedAnalyses run(Module &M, ModuleAnalysisManager &MAM) {
  // Get the FunctionAnalysisManager.
  FunctionAnalysisManager &FAM =
      MAM.getResult<FunctionAnalysisManagerModuleProxy>(M).getManager();
  for (Function &F : M) {
    // Skip external functions.
    if (F.isDeclaration()) continue;
    // Get the DominatorTree for the function.
    DominatorTree &DT = FAM.getResult<DominatorTreeAnalysis>(F);
    // Print the dominator tree.
    errs() << "Dominator Tree for function: " << F.getName() << "\n";
    DT.print(errs());
  }
  // This pass only reads the IR.
  return PreservedAnalyses::all();
}
irbuilder
PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM) {
  for (auto &F : M) {
    errs() << "I saw a function called " << F.getName() << "!\n";
    for (auto &B : F) {
      errs() << B << "\n";
      for (auto &I : B) {
        if (auto *op = dyn_cast<BinaryOperator>(&I)) {
          errs() << "old inst " << *op << "\n";
          IRBuilder<> builder(op);  // inserts new code *before* `op`
          Value *left = op->getOperand(0);   // first operand
          Value *right = op->getOperand(1);  // second operand
          Value *mul = builder.CreateMul(left, right);
          errs() << "new inst:" << *mul << "\n";
          errs() << B << "\n";
          // Replace uses of op with mul.
          for (auto &U : op->uses()) {
            int num = U.getOperandNo();  // which argument
            User *user = U.getUser();    // the instruction with the use
            errs() << " user:" << *user << " ";
            user->setOperand(num, mul);
            errs() << *user << "\n";
          }
        }
      }
    }
  }
  return PreservedAnalyses::none();
}
output
I saw a function called main!
%3 = alloca i32, align 4
%4 = alloca i32, align 4
%5 = alloca i8**, align 8
store i32 0, i32* %3, align 4
store i32 %0, i32* %4, align 4
store i8** %1, i8*** %5, align 8
%6 = load i32, i32* %4, align 4
%7 = add nsw i32 %6, 5
ret i32 %7
output 2
old inst %7 = add nsw i32 %6, 5
new inst: %7 = mul i32 %6, 5
%3 = alloca i32, align 4
%4 = alloca i32, align 4
%5 = alloca i8**, align 8
store i32 0, i32* %3, align 4
store i32 %0, i32* %4, align 4
store i8** %1, i8*** %5, align 8
%6 = load i32, i32* %4, align 4
%7 = mul i32 %6, 5
%8 = add nsw i32 %6, 5
ret i32 %8
user: ret i32 %8 ret i32 %7
more complex transforms - for instrumentation
write the instrumentation code in C instead of generating it with IRBuilder
insert calls to functions and link them in
Using IRBuilder is a mess, so I'm going to show a trick that makes it much simpler.
chat gpt
how do i write an instrumentation function in c for llvm, use a plugin pass to insert a call to the instrumentation routine
Write the Instrumentation Function in C: Create a C function that you want to call from your LLVM pass.
Create an LLVM Pass: Write an LLVM pass that inserts a call to the instrumentation function.
Build and Load the Pass: Compile the pass and load it using the opt tool or integrate it into your build system.
instrumentation routine
#include <stdio.h>

void instrument_function(const char* bb_name) {
  printf("Instrumentation function called for basic block: %s\n", bb_name);
}
pass
PreservedAnalyses run(Module &M, ModuleAnalysisManager &MAM) {
  LLVMContext &Ctx = M.getContext();
  IRBuilder<> Builder(Ctx);
  // Declare the instrumentation function as an external function.
  FunctionType *FuncType =
      FunctionType::get(Type::getVoidTy(Ctx), Type::getInt8PtrTy(Ctx), false);
  FunctionCallee InstrumentFunc =
      M.getOrInsertFunction("instrument_function", FuncType);
  for (Function &F : M) {
    // Insert the call at the beginning of each basic block.
    for (BasicBlock &BB : F) {
      Builder.SetInsertPoint(&BB, BB.begin());
      // Create a global string for the basic block name.
      Value *BBName = Builder.CreateGlobalStringPtr(BB.getName());
      // Create the call to the instrumentation function.
      Builder.CreateCall(InstrumentFunc, BBName);
    }
  }
  return PreservedAnalyses::none();
}
starter code
rm -r llvm-pass-skeleton
git clone -b rtlib https://github.com/sampsyo/llvm-pass-skeleton.git
cd llvm-pass-skeleton
mkdir -p build
cd build
cmake ..
make
Adrian's skeleton
struct SkeletonPass : public PassInfoMixin<SkeletonPass> {
  PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM) {
    for (auto &F : M.functions()) {
      // Get the function to call from our runtime library.
      LLVMContext &Ctx = F.getContext();
      std::vector<Type*> paramTypes = {Type::getInt32Ty(Ctx)};
      Type *retType = Type::getVoidTy(Ctx);
      FunctionType *logFuncType = FunctionType::get(retType, paramTypes, false);
      FunctionCallee logFunc =
          F.getParent()->getOrInsertFunction("logop", logFuncType);
      for (auto &B : F) {
        for (auto &I : B) {
          if (auto *op = dyn_cast<BinaryOperator>(&I)) {
            // Insert *after* `op`.
            IRBuilder<> builder(op);
            builder.SetInsertPoint(&B, ++builder.GetInsertPoint());
            // Insert a call to our function.
            Value* args[] = {op};
            builder.CreateCall(logFunc, args);
            // Return right away: this instruments only the first binary
            // operator, so we never iterate over a list we're modifying.
            return PreservedAnalyses::none();
          }
        }
      }
    }
    return PreservedAnalyses::all();
  }
};
cd llvm-pass-skeleton
cc -c rtlib.c                                                    # generates rtlib.o
clang -fpass-plugin=build/skeleton/SkeletonPass.so -c test.cpp   # generates test.o
cc test.o rtlib.o                                                # links them together
./a.out                                                          # runs it
Homework
Follow the LLVM tutorial blog post far enough to implement a pass that changes program execution.
This is intentionally open-ended. You can be as ambitious or as unambitious as you want. An example of an unambitious but acceptable task would be to print out a message every time the program uses floating-point division.
An example of an ambitious task would be to implement an optimization on LLVM IR and make sure it speeds things up in actual wall-clock time execution.
Find a real-ish C/C++ program somewhere and run your pass on it to observe the results.