Paper Presentation - Yashaswini

Author

Yashaswini Makaram

PROGRAML: Graph-based Deep Learning for Program Optimization and Analysis

Introduction

This paper addresses the following issues:
1. Hand-tuning heuristics for every change in software or hardware is time-consuming and never-ending.
2. Machine learning approaches don't capture the structure of programs and are unable to reason about program behaviour:
- Unnecessary emphasis on naming conventions
- Using compilation to remove noise also omits information
- Related statements that are far apart in a sequential representation fall victim to vanishing gradients and catastrophic forgetting

Background

Previous methods of representing IR code for input to ML algorithms:
1. AST-code2vec
- Uses AST paths to embed programs
- Highly effective at software engineering tasks such as algorithm classification, where the code was written by humans
- Puts more weight on names than on code structure
2. Neural Code Comprehension
- Encoder uses Contextual Flow Graphs (XFG) built from LLVM-IR statements to create inputs for neural networks
- Because it combines data flow graphs (DFGs) and control flow graphs (CFGs), the XFG representation omits important information such as the order of instruction operands
3. Control and Data Flow graphs
- Uses only instruction opcodes to compute latent representations
- Omits data types, the presence of variables and constants, and the ordering of operands
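To see why dropping operand order matters, consider a non-commutative instruction such as subtraction. The sketch below is a toy illustration only: the `Instr` tuple and both helper functions are hypothetical and are not taken from any of the tools listed above. It compares an encoding that ignores operand positions with one that records them.

```python
from collections import namedtuple

# Hypothetical toy instruction: an opcode plus an ordered tuple of operand names.
Instr = namedtuple("Instr", ["opcode", "operands"])

def order_free_encoding(instrs):
    """Record which values feed each opcode, but not in which position."""
    return {(i.opcode, frozenset(i.operands)) for i in instrs}

def position_aware_encoding(instrs):
    """Record each operand together with its position in the instruction."""
    return {(i.opcode, pos, name)
            for i in instrs
            for pos, name in enumerate(i.operands)}

a_minus_b = [Instr("sub", ("a", "b"))]
b_minus_a = [Instr("sub", ("b", "a"))]

# Without positions the two different programs look identical.
same_without_order = order_free_encoding(a_minus_b) == order_free_encoding(b_minus_a)
# With positions they can be told apart.
same_with_order = position_aware_encoding(a_minus_b) == position_aware_encoding(b_minus_a)

print(same_without_order, same_with_order)
```

Under the order-free encoding, `a - b` and `b - a` collapse into the same input, so no downstream model could tell them apart.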

Main Contribution

ProGraML offers a new graph-based representation of IR that combines
- control flow
- data flow
- call flow
- input encoding

to bypass the issues mentioned earlier.
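The idea of keeping several kinds of flow in one graph can be pictured with a toy example. The following is a minimal Python sketch and not the ProGraML library's API: the `ProgramGraph` class, its methods, the node kinds, and the example program (a caller invoking a `square` function) are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field

# The three flow kinds named in the text; everything else here is hypothetical.
FLOWS = ("control", "data", "call")

@dataclass
class ProgramGraph:
    nodes: list = field(default_factory=list)  # (kind, text) pairs
    edges: list = field(default_factory=list)  # (flow, src, dst, position) tuples

    def add_node(self, kind, text):
        """Append a node and return its index."""
        self.nodes.append((kind, text))
        return len(self.nodes) - 1

    def add_edge(self, flow, src, dst, position=0):
        """Add a typed edge; position records operand order for data edges."""
        if flow not in FLOWS:
            raise ValueError(f"unknown flow kind: {flow}")
        self.edges.append((flow, src, dst, position))

    def edges_of(self, flow):
        """Return all edges of one flow kind."""
        return [e for e in self.edges if e[0] == flow]

g = ProgramGraph()
call = g.add_node("instruction", "call square")
mul = g.add_node("instruction", "mul")
ret = g.add_node("instruction", "ret")
x = g.add_node("variable", "x")

g.add_edge("control", mul, ret)          # mul executes before ret
g.add_edge("data", x, mul, position=0)   # x is the first operand of mul
g.add_edge("data", x, mul, position=1)   # x is also the second operand
g.add_edge("call", call, mul)            # call site to the callee's first instruction
g.add_edge("call", ret, call)            # callee's return back to the call site

for flow in FLOWS:
    print(flow, len(g.edges_of(flow)))
```

Because every edge carries its flow kind and a position, a model reading this graph can distinguish the relations between statements without depending on identifier names or on how far apart the statements sit in the source text.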

They tested against 3 main problem types:
- Traditional compiler analysis
- Heterogeneous device mapping
- Algorithm classification

The caveat is that this representation is not meant for these purposes and should not be used as a substitute for current methods, but it needs to be able to handle these tasks in order to be a valid solution.

Merits

ProGraML aims to create a toolbox for the eventual application of machine learning in optimizing compilers.
It solves issues that other state-of-the-art representations have not addressed.
It may aid other endeavours in the future.

Shortcomings

It does not replace any of the current tools and cannot stand alone in this task.
It cannot currently be put to use, as it is only a proof of concept.
While it may be useful in the future, we cannot say for sure what changes may happen over the years.

Class Discussion

We discussed why the paper put so much emphasis on telling us that the ProGraML representation is not meant to replace current methods, and reasoned about its nature as a toolbox for the future.
We also discussed its use in machine learning and whether it could be used with large language models or in other use cases.

Conclusion

Overall, the paper is well written and the authors present their findings well.
They have good data to back up their conclusions.
They do not gloss over the shortcomings of their work.
However, the usefulness of the representation has not been fully showcased. More work may be needed before a final version of this toolbox is useful to users.