intro to llvm

difference between bril and llvm

links

llvm page

Adrian's tutorial

llvm doc

Google search or GitHub Copilot is very useful for this.

# As a first step, I'm going to show how to install clang and cmake.

# Step 1: remove any old copies, then reinstall.
# sudo needs a password, so:
#   -S  (sudo) read the password from standard input (here, the file ~/pw)
#   -y  (apt) always answer yes to prompts
#   -qq (apt) the very quiet option
!sudo -S apt purge -y -qq clang cmake <  ~/pw
!sudo -S apt install -y -qq clang cmake < ~/pw
[sudo] password for norm: The following packages were automatically installed and are no longer required:
  cmake-data dh-elpa-helper emacsen-common libarchive13 libjsoncpp25 librhash0
Use 'sudo apt autoremove' to remove them.
The following packages will be REMOVED:
  clang* cmake*
0 upgraded, 0 newly installed, 2 to remove and 48 not upgraded.
After this operation, 21.3 MB disk space will be freed.

(Reading database ... 40226 files and directories currently installed.)
Removing clang (1:14.0-55~exp2) ...
Removing cmake (3.22.1-1ubuntu1.22.04.2) ...
Processing triggers for man-db (2.10.2-1) ...

[sudo] password for norm: Suggested packages:
  cmake-doc ninja-build cmake-format
The following NEW packages will be installed:
  clang cmake
0 upgraded, 2 newly installed, 0 to remove and 48 not upgraded.
Need to get 0 B/5014 kB of archives.
After this operation, 21.3 MB of additional disk space will be used.

Selecting previously unselected package clang.
(Reading database ... 40203 files and directories currently installed.)
Preparing to unpack .../clang_1%3a14.0-55~exp2_amd64.deb ...
Unpacking clang (1:14.0-55~exp2) ...
Selecting previously unselected package cmake.
Preparing to unpack .../cmake_3.22.1-1ubuntu1.22.04.2_amd64.deb ...
Unpacking cmake (3.22.1-1ubuntu1.22.04.2) ...
Setting up clang (1:14.0-55~exp2) ...
Setting up cmake (3.22.1-1ubuntu1.22.04.2) ...
Processing triggers for man-db (2.10.2-1) ...

Let's take a look at LLVM IR.

%%writefile temp.c
int main(int argc, char** argv){
    return argc;
}
Overwriting temp.c
# call clang and dump the IR
#   -emit-llvm  emit LLVM IR instead of machine code
#   -S          write it as text, not as binary bitcode
#   -o -        send the output to stdout
!clang -emit-llvm -S -o - temp.c
; ModuleID = 'temp.c'
source_filename = "temp.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

; Function Attrs: noinline nounwind optnone uwtable
define dso_local i32 @main(i32 noundef %0, i8** noundef %1) #0 {
  %3 = alloca i32, align 4
  %4 = alloca i32, align 4
  %5 = alloca i8**, align 8
  store i32 0, i32* %3, align 4
  store i32 %0, i32* %4, align 4
  store i8** %1, i8*** %5, align 8
  %6 = load i32, i32* %4, align 4
  ret i32 %6
}

attributes #0 = { noinline nounwind optnone uwtable "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }

!llvm.module.flags = !{!0, !1, !2, !3, !4}
!llvm.ident = !{!5}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 7, !"PIC Level", i32 2}
!2 = !{i32 7, !"PIE Level", i32 2}
!3 = !{i32 7, !"uwtable", i32 1}
!4 = !{i32 7, !"frame-pointer", i32 2}
!5 = !{!"Ubuntu clang version 14.0.0-1ubuntu1.1"}

An LLVM plugin is a shared library that can add additional functionality to the LLVM infrastructure. Plugins can be used to add new passes, analyses, targets, and more.

Plugins are dynamically loaded into LLVM. Once loaded, a plugin can register new command-line options, passes, etc., that are then available for use in that invocation of the tool.

The CS 6120 llvm-pass-skeleton repository (used below) makes setting up the build process for plugins simple.

LLVM IR has two on-disk forms: .bc files are binary bitcode, and .ll files are the text version, which looks like assembly.

LLVM is written in C++, but much of its code does not look like standard C++, because LLVM supplies its own replacements for common library facilities.

  1. LLVM APIs usually pass strings as a StringRef rather than a char* or std::string.
  2. Instead of std::cout and std::cerr, there are outs() and errs().
  3. lots of built-in data structures
  4. complex class hierarchy
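
To make the first two points concrete, here is a minimal sketch in standard C++ only, so it compiles without LLVM. It uses std::string_view as a stand-in for StringRef (both are non-owning pointer-plus-length views) and a plain std::ostream as a stand-in for the stream returned by outs() or errs(). The helper names `baseName` and `report` are made up for illustration and are not LLVM APIs.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string_view>

// Stand-in for a function that takes an llvm::StringRef: the view does not
// own or copy the characters, it only points into storage owned elsewhere.
std::string_view baseName(std::string_view path) {
    std::size_t slash = path.rfind('/');
    return slash == std::string_view::npos ? path : path.substr(slash + 1);
}

// Stand-in for writing to outs() or errs(): the caller chooses the stream.
void report(std::ostream &os, std::string_view name) {
    os << "In a function called " << name << "!\n";
}
```

Because the view is non-owning, the usual caveat applies to both types: the underlying characters must outlive the view.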

flowchart TD;
Value --> Argument ;
Value --> other["..."];
Value --> User;
User --> Constant
User--> Operator
User--> Instruction
Constant --> ConstantExpr
Constant--> ConstantData
Operator--> ConcreteOperator
Instruction--> UnaryInstruction
ConstantData --> ConstantInt
ConstantData --> UndefValue
Instruction --> BinaryOperator
Instruction--> CallBase

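Instructions are a kind of Value. Because the IR is in SSA form, an instruction defines at most one value, so in memory an operand is simply a pointer to the Value that produces it, which is often another Instruction. So if I is an instruction:

outs() << *(I.getOperand(0));   // prints the first operand of I, e.g. the instruction that defines it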

Given a Value* V, what kind of thing is V?

  1. isa<Argument>(V) is true if V is an Argument
  2. cast<Argument>(V) casts to Argument*; assertion failure if V is not an Argument
  3. dyn_cast<Argument>(V) casts to Argument*; returns nullptr if V is not an Argument
static bool isLoopInvariant(const Value *V, const Loop *L) {
    // Constants, arguments, and globals are defined outside every loop.
    if (isa<Constant>(V) || isa<Argument>(V) || isa<GlobalValue>(V)) {
        return true;
    }
    // Otherwise it must be an instruction: invariant if its block is outside L.
    return !L->contains(cast<Instruction>(V)->getParent());
    // …
}
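
How the three templates relate is easier to see in a toy model. The sketch below is not LLVM's implementation; it is a small stand-alone hierarchy, assuming a `Kind` tag on the base class and a static `classof` on each subclass, which is the general pattern LLVM-style casts rely on. All type and function definitions here are illustrative.

```cpp
#include <cassert>

// Toy base class: the Kind tag records the dynamic type of each object.
struct Value {
    enum Kind { ArgumentKind, ConstantKind, InstructionKind };
    explicit Value(Kind k) : kind(k) {}
    Kind kind;
};

struct Argument : Value {
    Argument() : Value(ArgumentKind) {}
    static bool classof(const Value *v) { return v->kind == ArgumentKind; }
};

struct Instruction : Value {
    Instruction() : Value(InstructionKind) {}
    static bool classof(const Value *v) { return v->kind == InstructionKind; }
};

// isa<T>(v): a pure query, never changes the pointer.
template <typename To> bool isa(const Value *v) { return To::classof(v); }

// cast<T>(v): the caller promises the type; a wrong promise trips the assert.
template <typename To> const To *cast(const Value *v) {
    assert(isa<To>(v) && "cast<> to an incompatible type");
    return static_cast<const To *>(v);
}

// dyn_cast<T>(v): query and cast in one step; nullptr on a mismatch.
template <typename To> const To *dyn_cast(const Value *v) {
    return isa<To>(v) ? static_cast<const To *>(v) : nullptr;
}
```

The usual idiom is `if (auto *A = dyn_cast<Argument>(V)) { ... }`, which tests and narrows the type in one line; `cast` is reserved for places where the type is already known, as in the last line of isLoopInvariant.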

Navigating LLVM IR: IR containers

  1. Module - doubly linked list of Functions
  2. Function - doubly linked list of Basic Blocks
  3. Basic Block - doubly linked list of Instructions
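
The nesting above can be sketched with std::list standing in for LLVM's intrusive lists. This is a toy model, not the LLVM classes: the struct names mirror the real ones only to show the shape of the traversal, and `countInstructions` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

// Toy containers: each level owns a doubly linked list of the next level.
struct Instruction { std::string text; };
struct BasicBlock { std::list<Instruction> instructions; };
struct Function { std::string name; std::list<BasicBlock> blocks; };
struct Module { std::list<Function> functions; };

// Walk Module -> Function -> BasicBlock -> Instruction with three nested
// range-for loops and count how many instructions were visited.
std::size_t countInstructions(const Module &m) {
    std::size_t n = 0;
    for (const Function &f : m.functions)
        for (const BasicBlock &b : f.blocks)
            for (const Instruction &i : b.instructions) {
                (void)i;  // a real pass would inspect or print i here
                ++n;
            }
    return n;
}
```

A pass written against the real classes uses the same three-level loop shape, for example `for (auto &F : M) for (auto &B : F) for (auto &I : B)`.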

%5 = add i32 %4, 2

This instruction adds two 32-bit integers: the value in register %4 and the constant 2. The result goes into register %5.
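
As a rough model of what this instruction computes, the sketch below mirrors `add i32` in C++. It assumes the instruction carries no nsw or nuw flags, as in the line above, in which case the result wraps modulo 2^32; `addI32` is a made-up name for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Models `%r = add i32 %a, %b` without nsw/nuw flags: the sum wraps modulo
// 2^32. Fixed-width unsigned arithmetic gives exactly that behaviour.
std::uint32_t addI32(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::uint32_t>(a + b);
}
```

Note that the IR type i32 says nothing about signedness; the same add serves both signed and unsigned source-level integers.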

blog post: Why would a grad student care about llvm

%%bash 
rm -r llvm-pass-skeleton/
git clone   https://github.com/sampsyo/llvm-pass-skeleton.git
cd llvm-pass-skeleton/
mkdir -p build 
cd build 
cmake ..
make


# look at  llvm-pass-skeleton/skeleton/Skeleton.cpp
Cloning into 'llvm-pass-skeleton'...

The function returns PreservedAnalyses::all() to indicate that it didn’t modify M. Later, when we actually transform the program, we’ll need to return something like PreservedAnalyses::none().

The ModuleAnalysisManager is responsible for managing the analysis results for Module passes.

When a pass requests an analysis, the ModuleAnalysisManager checks if the analysis result is already available. If it is, the ModuleAnalysisManager returns the cached result. If it’s not, the ModuleAnalysisManager runs the analysis pass, caches the result, and then returns it.

This allows LLVM to avoid recomputing analysis results unnecessarily, which can significantly improve the performance of the compiler.
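
The check-cache-then-run behaviour described above can be modelled in a few lines. This is a toy, not the ModuleAnalysisManager API: the class `ToyAnalysisManager`, its string keys, and its integer results are all assumptions made for the sketch.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Toy analysis manager: results are cached per IR unit, keyed by its name.
class ToyAnalysisManager {
public:
    explicit ToyAnalysisManager(std::function<int(const std::string &)> analysis)
        : analysis_(std::move(analysis)) {}

    // Return the cached result; run the analysis only on a cache miss.
    int getResult(const std::string &unit) {
        auto it = cache_.find(unit);
        if (it == cache_.end()) {
            ++runs_;
            it = cache_.emplace(unit, analysis_(unit)).first;
        }
        return it->second;
    }

    // Roughly what happens after a pass reports that it preserved nothing:
    // every cached result is dropped and must be recomputed on demand.
    void invalidateAll() { cache_.clear(); }

    // Number of times the underlying analysis actually ran.
    int runs() const { return runs_; }

private:
    std::function<int(const std::string &)> analysis_;
    std::map<std::string, int> cache_;
    int runs_ = 0;
};
```

In this model, asking for the same result twice runs the analysis once, and asking again after `invalidateAll()` runs it a second time. That is the reason a transforming pass must be honest about what it preserved.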

Here’s an example of how you might use it:

PreservedAnalyses MyPass::run(Module &M, ModuleAnalysisManager &MAM) {
    // Request an analysis result.
    const auto &Result = MAM.getResult<SomeAnalysis>(M);

    // Use the analysis result.
    // ...

    return PreservedAnalyses::all();
}

Here is a second example, which gets the dominator tree of each function:

    PreservedAnalyses run(Module &M, ModuleAnalysisManager &MAM) {
        // Get the FunctionAnalysisManager.
        FunctionAnalysisManager &FAM = MAM.getResult<FunctionAnalysisManagerModuleProxy>(M).getManager();

        for (Function &F : M) {
            // Skip external functions.
            if (F.isDeclaration()) continue;

            // Request the dominator tree of the function.
            const DominatorTree &DT = FAM.getResult<DominatorTreeAnalysis>(F);

            // Use the dominator tree.
            // ...
        }

        return PreservedAnalyses::all();
    }

Now let's look at the containers.

%%bash
rm -r llvm-pass-skeleton/
git clone  -b containers  https://github.com/sampsyo/llvm-pass-skeleton.git
cd llvm-pass-skeleton/
mkdir -p build 
cd build 
cmake ..
make
Cloning into 'llvm-pass-skeleton'...
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test HAVE_FFI_CALL
-- Performing Test HAVE_FFI_CALL - Success
-- Found FFI: /usr/lib/x86_64-linux-gnu/libffi.so  
-- Performing Test Terminfo_LINKABLE
-- Performing Test Terminfo_LINKABLE - Success
-- Found Terminfo: /usr/lib/x86_64-linux-gnu/libtinfo.so  
-- Found ZLIB: /usr/lib/x86_64-linux-gnu/libz.so (found version "1.2.11") 
-- Found LibXml2: /usr/lib/x86_64-linux-gnu/libxml2.so (found version "2.9.13") 
-- Linker detection: GNU ld
-- Registering SkeletonPass as a pass plugin (static build: OFF)
-- Configuring done
-- Generating done
-- Build files have been written to: /home/norm/llvm/llvm-pass-skeleton/build
[ 50%] Building CXX object skeleton/CMakeFiles/SkeletonPass.dir/Skeleton.cpp.o
[100%] Linking CXX shared module SkeletonPass.so
[100%] Built target SkeletonPass
# run the plugin 
# 
!clang -fpass-plugin=`echo llvm-pass-skeleton/build/skeleton/SkeletonPass.*` temp.c
In a function called main!
Function body:
; Function Attrs: noinline nounwind optnone uwtable
define dso_local i32 @main(i32 noundef %0, i8** noundef %1) #0 {
  %3 = alloca i32, align 4
  %4 = alloca i32, align 4
  %5 = alloca i8**, align 8
  store i32 0, i32* %3, align 4
  store i32 %0, i32* %4, align 4
  store i8** %1, i8*** %5, align 8
  %6 = load i32, i32* %4, align 4
  ret i32 %6
}
Basic block:

  %3 = alloca i32, align 4
  %4 = alloca i32, align 4
  %5 = alloca i8**, align 8
  store i32 0, i32* %3, align 4
  store i32 %0, i32* %4, align 4
  store i8** %1, i8*** %5, align 8
  %6 = load i32, i32* %4, align 4
  ret i32 %6
Instruction: 
  %3 = alloca i32, align 4
Instruction: 
  %4 = alloca i32, align 4
Instruction: 
  %5 = alloca i8**, align 8
Instruction: 
  store i32 0, i32* %3, align 4
Instruction: 
  store i32 %0, i32* %4, align 4
Instruction: 
  store i8** %1, i8*** %5, align 8
Instruction: 
  %6 = load i32, i32* %4, align 4
Instruction: 
  ret i32 %6
I saw a function called main!
%%writefile temp1.c
int main(int argc, char** argv){
    if (argc >2 )
        return argc;
    return 0;
}
Overwriting temp1.c
!clang -fpass-plugin=`echo llvm-pass-skeleton/build/skeleton/SkeletonPass.*` temp1.c
In a function called main!
Function body:
; Function Attrs: noinline nounwind optnone uwtable
define dso_local i32 @main(i32 noundef %0, i8** noundef %1) #0 {
  %3 = alloca i32, align 4
  %4 = alloca i32, align 4
  %5 = alloca i8**, align 8
  store i32 0, i32* %3, align 4
  store i32 %0, i32* %4, align 4
  store i8** %1, i8*** %5, align 8
  %6 = load i32, i32* %4, align 4
  %7 = icmp sgt i32 %6, 2
  br i1 %7, label %8, label %10

8:                                                ; preds = %2
  %9 = load i32, i32* %4, align 4
  store i32 %9, i32* %3, align 4
  br label %11

10:                                               ; preds = %2
  store i32 0, i32* %3, align 4
  br label %11

11:                                               ; preds = %10, %8
  %12 = load i32, i32* %3, align 4
  ret i32 %12
}
Basic block:

  %3 = alloca i32, align 4
  %4 = alloca i32, align 4
  %5 = alloca i8**, align 8
  store i32 0, i32* %3, align 4
  store i32 %0, i32* %4, align 4
  store i8** %1, i8*** %5, align 8
  %6 = load i32, i32* %4, align 4
  %7 = icmp sgt i32 %6, 2
  br i1 %7, label %8, label %10
Instruction: 
  %3 = alloca i32, align 4
Instruction: 
  %4 = alloca i32, align 4
Instruction: 
  %5 = alloca i8**, align 8
Instruction: 
  store i32 0, i32* %3, align 4
Instruction: 
  store i32 %0, i32* %4, align 4
Instruction: 
  store i8** %1, i8*** %5, align 8
Instruction: 
  %6 = load i32, i32* %4, align 4
Instruction: 
  %7 = icmp sgt i32 %6, 2
Instruction: 
  br i1 %7, label %8, label %10
Basic block:

8:                                                ; preds = %2
  %9 = load i32, i32* %4, align 4
  store i32 %9, i32* %3, align 4
  br label %11
Instruction: 
  %9 = load i32, i32* %4, align 4
Instruction: 
  store i32 %9, i32* %3, align 4
Instruction: 
  br label %11
Basic block:

10:                                               ; preds = %2
  store i32 0, i32* %3, align 4
  br label %11
Instruction: 
  store i32 0, i32* %3, align 4
Instruction: 
  br label %11
Basic block:

11:                                               ; preds = %10, %8
  %12 = load i32, i32* %3, align 4
  ret i32 %12
Instruction: 
  %12 = load i32, i32* %3, align 4
Instruction: 
  ret i32 %12
I saw a function called main!

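Building anything complex with IRBuilder is a mess, so I’m going to show a trick that makes it much simpler: write the logic in C as a small runtime library, and have the pass insert only a call to it.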

%%bash
rm -r llvm-pass-skeleton/
git clone  -b rtlib  https://github.com/sampsyo/llvm-pass-skeleton.git
cd llvm-pass-skeleton/
mkdir -p build 
cd build 
cmake ..
make
Cloning into 'llvm-pass-skeleton'...
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test HAVE_FFI_CALL
-- Performing Test HAVE_FFI_CALL - Success
-- Found FFI: /usr/lib/x86_64-linux-gnu/libffi.so  
-- Performing Test Terminfo_LINKABLE
-- Performing Test Terminfo_LINKABLE - Success
-- Found Terminfo: /usr/lib/x86_64-linux-gnu/libtinfo.so  
-- Found ZLIB: /usr/lib/x86_64-linux-gnu/libz.so (found version "1.2.11") 
-- Found LibXml2: /usr/lib/x86_64-linux-gnu/libxml2.so (found version "2.9.13") 
-- Linker detection: GNU ld
-- Registering SkeletonPass as a pass plugin (static build: OFF)
-- Configuring done
-- Generating done
-- Build files have been written to: /home/norm/llvm/llvm-pass-skeleton/build
[ 50%] Building CXX object skeleton/CMakeFiles/SkeletonPass.dir/Skeleton.cpp.o
[100%] Linking CXX shared module SkeletonPass.so
[100%] Built target SkeletonPass
%%bash 
cat ~/llvm/llvm-pass-skeleton/skeleton/Skeleton.cpp
echo done
#include "llvm/Pass.h"
#include "llvm/Passes/PassBuilder.h"
#include "llvm/Passes/PassPlugin.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"
using namespace llvm;

namespace {

struct SkeletonPass : public PassInfoMixin<SkeletonPass> {
    PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM) {
        for (auto &F : M.functions()) {

            // Get the function to call from our runtime library.
            LLVMContext &Ctx = F.getContext();
            std::vector<Type*> paramTypes = {Type::getInt32Ty(Ctx)};
            Type *retType = Type::getVoidTy(Ctx);
            FunctionType *logFuncType = FunctionType::get(retType, paramTypes, false);
            FunctionCallee logFunc =
                F.getParent()->getOrInsertFunction("logop", logFuncType);

            for (auto &B : F) {
                for (auto &I : B) {
                    if (auto *op = dyn_cast<BinaryOperator>(&I)) {
                        // Insert *after* `op`.
                        IRBuilder<> builder(op);
                        builder.SetInsertPoint(&B, ++builder.GetInsertPoint());

                        // Insert a call to our function.
                        Value* args[] = {op};
                        builder.CreateCall(logFunc, args);

                        return PreservedAnalyses::none();
                    }
                }
            }

        }
        return PreservedAnalyses::all();
    }
};

}

extern "C" LLVM_ATTRIBUTE_WEAK ::llvm::PassPluginLibraryInfo
llvmGetPassPluginInfo() {
    return {
        .APIVersion = LLVM_PLUGIN_API_VERSION,
        .PluginName = "Skeleton pass",
        .PluginVersion = "v0.1",
        .RegisterPassBuilderCallbacks = [](PassBuilder &PB) {
            PB.registerPipelineStartEPCallback(
                [](ModulePassManager &MPM, OptimizationLevel Level) {
                    MPM.addPass(SkeletonPass());
                });
        }
    };
}
done
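
The "insert after op" step in the pass is the subtle part: an insertion point names the position before which new code goes, so the pass advances it by one to land after the operator. The toy below shows the same idea with std::list, whose insert also places the new element before the given position. It is only a model; `Inst` and `instrumentFirstBinaryOp` are hypothetical names, and the strings stand in for real instructions.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Toy instruction: some text plus a flag saying whether it is a binary op.
struct Inst {
    std::string text;
    bool isBinaryOp;
};

// Insert a "call logop" entry right after the first binary operator.
// Returns true if the block changed, mirroring the early return in the pass.
bool instrumentFirstBinaryOp(std::list<Inst> &block) {
    for (auto it = block.begin(); it != block.end(); ++it) {
        if (it->isBinaryOp) {
            // insert() places the new element *before* the given position,
            // so step one past `it` to end up after the operator.
            block.insert(std::next(it), Inst{"call logop", false});
            return true;
        }
    }
    return false;
}
```

The call has to come after the operator because it takes the operator's result as its argument, and a value must be defined before it is used.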
%%bash 
cat /home/norm/llvm/llvm-pass-skeleton/rtlib.c
echo
#include <stdio.h>
void logop(int i) {
    printf("computed: %i\n", i);
}
%%writefile llvm-pass-skeleton/test_r.cpp
#include <stdio.h>
int main (int argc, char** argv) {
    printf("%d %d", argc, (argc + 2) * (argc +3));
}
Overwriting llvm-pass-skeleton/test_r.cpp
%%bash 
cd llvm-pass-skeleton/
cc -c rtlib.c
clang  -fpass-plugin=build/skeleton/SkeletonPass.so -c test_r.cpp
cc test_r.o rtlib.o
./a.out 1 2 3 4
echo 
computed: 7
5 56
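
Only one "computed:" line appears even though the expression has three binary operators, because the pass returns as soon as it has instrumented the first one. The sketch below is a hand-written approximation of the instrumented program, assuming the unoptimized build evaluates `argc + 2` first (which is consistent with the logged value 7 when argc is 5). Here `logop` records its argument instead of printing, and `logged` and `instrumentedProduct` are names invented for the sketch.

```cpp
#include <cassert>
#include <vector>

// Values passed to the stand-in for logop, in call order.
std::vector<int> logged;
void logop(int i) { logged.push_back(i); }

// Approximation of (argc + 2) * (argc + 3) after the pass has run.
int instrumentedProduct(int argc) {
    int a = argc + 2;  // first binary operator in the function
    logop(a);          // the single call inserted by the pass
    int b = argc + 3;  // not instrumented: the pass had already returned
    return a * b;      // not instrumented either
}
```

With `./a.out 1 2 3 4`, argc is 5, so the logged value is 7 and the product is 56, matching the output above. Removing the early return from the pass would log all three operators.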