【译】LLVM 的设计
文章目录
LLVM 是开源的编译工具链项目,用 C++ 编写,包含一系列模块化的编译器组件和工具链,用于开发编译器前端和后端。
内容来源:https://aosabook.org/en/llvm.html
原文标题:LLVM
作者:Chris Lattner
译者:流左沙
注:翻译较原文会有一定的精简、重排和添删,主要是为了提取重要内容以达到更好的理解,请以原文为准。
Classical Compiler Design
Implications of this Design
The most important win of this classical design comes when a compiler decides to support multiple source languages or target architectures. If the compiler uses a common code representation in its optimizer, then a front end can be written for any language that can compile to it, and a back end can be written for any target that can compile from it
传统的编译器实现方式在面对支持不同的语言或架构时败下阵来。如果编译器在优化器中使用一种通用的代码表示,那么编译器前端可以用任何代码编写,后端可以写出为任何架构。
With this design, porting the compiler to support a new source language (e.g., Algol or BASIC) requires implementing a new front end, but the existing optimizer and back end can be reused.
支持一门新的编程语言只需要实现新的前端即可,但是已存在的优化器和后端都可以复用。
LLVM IR
With the historical background and context out of the way, let’s dive into LLVM: The most important aspect of its design is the LLVM Intermediate Representation (IR), which is the form it uses to represent code in the compiler.
最重要的部分是 LLVM IR,编译器用它来作代码表示。
LLVM IR is a low-level RISC-like virtual instruction set. Like a real RISC instruction set, it supports linear sequences of simple instructions like add, subtract, compare, and branch. These instructions are in three address form, which means that they take some number of inputs and produce a result in a different register. LLVM IR supports labels and generally looks like a weird form of assembly language.
LLVM IR 是低层级的虚拟指令集。像真实的 RISC 指令集,它支持简单的线性指令,如加、减、比较、分支跳转。这些指令以三地址码组织,表示从一系列的输入中输出不同的结果到寄存器中。LLVM IR 支持标签以及看起来像是汇编语言。
Unlike most RISC instruction sets, LLVM is strongly typed with a simple type system (e.g., i32
is a 32-bit integer, i32**
is a pointer to pointer to 32-bit integer) and some details of the machine are abstracted away. For example, the calling convention is abstracted through call
and ret
instructions and explicit arguments. Another significant difference from machine code is that the LLVM IR doesn’t use a fixed set of named registers, it uses an infinite set of temporaries named with a % character.
不像 RISC 指令集,LLVM 使用简单的类型系统以及一些机器相关的细节被抽象化了。另一个重大区别是 LLVM IR 的机器码不使用命名的寄存器,它使用无限的临时字符名字。
Target Description Files
The “mix and match” approach allows target authors to choose what makes sense for their architecture and permits a large amount of code reuse across different targets. This brings up another challenge: each shared component needs to be able to reason about target specific properties in a generic way. For example, a shared register allocator needs to know the register file of each target and the constraints that exist between instructions and their register operands. LLVM’s solution to this is for each target to provide a target description in a declarative domain-specific language (a set of .td
files) processed by the tblgen tool.
混合匹配的方式可以让我们选择架构,以及让大量的代码可以在不同的目标复用。但这带来了另一大难题:每个共享模块需要有适当的方式去合理地处理目标特定属性。比如一个共享的寄存器分配器需要去知晓每个目标的寄存器文件,以及存在于不同指令之间和他们的寄存器操作数的限制。LLVM 的解决方法是对于每个目标提供一个目标描述的 dsl 文件(由 tblgen 工具去处理)。
The different subsystems supported by the .td
files allow target authors to build up the different pieces of their target. For example, the x86 back end defines a register class that holds all of its 32-bit registers named “GR32” (in the .td
files, target specific definitions are all caps) like this:
.td 文件支持不同子系统的构建目标,比如,x86 后端定义了一个寄存器类来保存 32 位寄存器
|
|
This definition says that registers in this class can hold 32-bit integer values (“i32”), prefer to be 32-bit aligned, have the specified 16 registers (which are defined elsewhere in the .td
files) and have some more information to specify preferred allocation order and other things. Given this definition, specific instructions can refer to this, using it as an operand.
这个定义描述的是寄存器存有 32 位的值,且 32 位对齐,有指定的 16 个寄存器和更多的一些信息。有了这个定义,指定的指令指向它,用它作为操作数。
For example, the “complement a 32-bit register” instruction is defined as:
比如,一个 32 位寄存器指令如下
|
|
This definition says that NOT32r is an instruction (it uses the I
tblgen class), specifies encoding information (0xF7, MRM2r
), specifies that it defines an “output” 32-bit register $dst
and has a 32-bit register “input” named $src
(the GR32
register class defined above defines which registers are valid for the operand), specifies the assembly syntax for the instruction (using the {}
syntax to handle both AT&T and Intel syntax), specifies the effect of the instruction and provides the pattern that it should match on the last line. The “let” constraint on the first line tells the register allocator that the input and output register must be allocated to the same physical register.
这个定义描述的是 NOT32r 是一条指令(使用 I 这个 tblgen 的类),指定了编码信息(0xF7, MRM2r),指明它定义了一个 32 位的输出寄存器 $dst 和有一个 32 位寄存器的输入 $src (上面定义 GR32 寄存器类可以使用了),同时指明了汇编指令的语法(使用 {} 语法去处理不同平台的语法),指明了指令的作用以及最后一行提供匹配模式。第一行的 let 语法限制让寄存器分配器要把输入和输出都分配到同一个物理寄存器上。
This definition is a very dense description of the instruction, and the common LLVM code can do a lot with information derived from it (by the tblgen
tool). This one definition is enough for instruction selection to form this instruction by pattern matching on the input IR code for the compiler. It also tells the register allocator how to process it, is enough to encode and decode the instruction to machine code bytes, and is enough to parse and print the instruction in a textual form. These capabilities allow the x86 target to support generating a stand-alone x86 assembler (which is a drop-in replacement for the “gas” GNU assembler) and disassemblers from the target description as well as handle encoding the instruction for the JIT.
这个定义是对于指令的详细描述,通用的 LLVM 代码可以从这个描述中做很多事情。这些定义对于指令选择去匹配构造输入 IR 代码是足够的。同时它也能告诉寄存器分配器如何去处理它,让分配器足以去编码和解码指令到机器代码字节,以及足以以文本格式去解析和输出指令。这个能力让 x86 的目标可以支持 x86 汇编的生成,以及反汇编器也是如此,还有对于 JIT 的指令编码。
While we aim to get as much target information as possible into the .td
files in a nice declarative form, we still don’t have everything. Instead, we require target authors to write some C++ code for various support routines and to implement any target specific passes they might need (like X86FloatingPoint.cpp
, which handles the x87 floating point stack). As LLVM continues to grow new targets, it becomes more and more important to increase the amount of the target that can be expressed in the .td
file, and we continue to increase the expressiveness of the .td
files to handle this. A great benefit is that it gets easier and easier write targets in LLVM as time goes on.
当我们想要把更多目标信息放到 .td 文件里,我们需要作者去写 c++ 代码去支持常规的操作,以及去实现特定平台的 pass 来处理。当 LLVM 继续添加新的目标,对于能在 .td 文件添加大量的描述变得越来越重要,之后我们会持续去增加 .td 文件的描述性去处理。一个好处就是更容易去编写 LLVM 的目标。
Choosing When and Where Each Phase Runs
As mentioned earlier, LLVM IR can be efficiently (de)serialized to/from a binary format known as LLVM bitcode. Since LLVM IR is self-contained, and serialization is a lossless process, we can do part of compilation, save our progress to disk, then continue work at some point in the future. This feature provides a number of interesting capabilities including support for link-time and install-time optimization, both of which delay code generation from “compile time”.
像前面提到的,LLVM IR 会高效地去序列化或反序列化二进制格式,也就是 LLVM bitcode。由于 LLVM IR 是自举的,以及序列化是轻量的操作,我们可以进行部分的编译,保存进度到磁盘,以后再恢复执行。这个特性提供了大量有趣的功能,包括支持链接时和安装时的优化,这两个都会在编译时推迟代码的生成。
Link-Time Optimization (LTO) addresses the problem where the compiler traditionally only sees one translation unit (e.g., a .c
file with all its headers) at a time and therefore cannot do optimizations (like inlining) across file boundaries. LLVM compilers like Clang support this with the -flto
or -O4
command line option. This option instructs the compiler to emit LLVM bitcode to the .o
file instead of writing out a native object file, and delays code generation to link time.
LTO 指出了问题:编译器一次只能看到一个编译单元,因此没法跨文件范围做优化(如内联)。LLVM 的编译器比如 clang 支持 -flto 或者 -O4 命令。这个命令让编译器输出 bitcode 格式的 .o 文件而不是本地的目标文件,推迟代码生成到链接时。
Details differ depending on which operating system you’re on, but the important bit is that the linker detects that it has LLVM bitcode in the .o
files instead of native object files. When it sees this, it reads all the bitcode files into memory, links them together, then runs the LLVM optimizer over the aggregate. Since the optimizer can now see across a much larger portion of the code, it can inline, propagate constants, do more aggressive dead code elimination, and more across file boundaries. While many modern compilers support LTO, most of them (e.g., GCC, Open64, the Intel compiler, etc.) do so by having an expensive and slow serialization process. In LLVM, LTO falls out naturally from the design of the system, and works across different source languages (unlike many other compilers) because the IR is truly source language neutral.
具体细节会因操作系统而异,但链接器重要的一点是检查 .o 文件是 bitcode 格式还是本地目标格式。当看到是 bitcode 格式,它会把所有的 bitcode 文件读进内存,把他们链接到一起,然后在集成过程中执行 LLVM 的优化器。由于优化器可以看到更多的代码范围,它就能做到内联、常量传播、更激进的无用代码删除的优化。当现代编译器支持 LTO 时,大部分会有较高的代价和缓慢的序列化过程,在 LLVM 中 LTO 就自然得多,因为有 IR 这个中间格式。
Install-time optimization is the idea of delaying code generation even later than link time, all the way to install time, as shown in Figure 11.7. Install time is a very interesting time (in cases when software is shipped in a box, downloaded, uploaded to a mobile device, etc.), because this is when you find out the specifics of the device you’re targeting. In the x86 family for example, there are broad variety of chips and characteristics. By delaying instruction choice, scheduling, and other aspects of code generation, you can pick the best answers for the specific hardware an application ends up running on.
安装时优化是推迟代码生成甚至在链接之后。安装时优化是有趣的时间段(这个阶段软件被打包、下载、上传到移动设备),因为你可以知道你的设备的特定目标。在 x86 中,有大量的芯片和特性。通过推迟指令的选择,调整和代码生成的其他部分,你可以找到特定平台的更好的优化方式。
Unit Testing the Optimizer
Compilers are very complicated, and quality is important, therefore testing is critical. For example, after fixing a bug that caused a crash in an optimizer, a regression test should be added to make sure it doesn’t happen again. The traditional approach to testing this is to write a .c
file (for example) that is run through the compiler, and to have a test harness that verifies that the compiler doesn’t crash. This is the approach used by the GCC test suite, for example.
编译器非常复杂,而且性能非常重要,因为测试是很关键的。比如修复了某个优化器导致的崩溃 bug,那么就应该执行回归测试保证不会再犯。传统的方式是写一个 .c 文件让编译器来执行,GCC 就是用的这种方式。
The problem with this approach is that the compiler consists of many different subsystems and even many different passes in the optimizer, all of which have the opportunity to change what the input code looks like by the time it gets to the previously buggy code in question. If something changes in the front end or an earlier optimizer, a test case can easily fail to test what it is supposed to be testing.
这种方式的问题是编译器包含有各种各样的子系统以及各种不同的优化器的 pass,所有这些都有机会去修改输入的代码,如果一旦发生了修改,那么测试用例很可能会报错。
By using the textual form of LLVM IR with the modular optimizer, the LLVM test suite has highly focused regression tests that can load LLVM IR from disk, run it through exactly one optimization pass, and verify the expected behavior. Beyond crashing, a more complicated behavioral test wants to verify that an optimization is actually performed. Here is a simple test case that checks to see that the constant propagation pass is working with add instructions:
通过使用 LLVM IR 的文本格式,LLVM 的测试套件可以更高效地处理回归测试,把 IR 从磁盘中加载,执行特定的优化 pass,然后验证其行为。这种方法还可以看到实际的优化。
|
|
The RUN
line specifies the command to execute: in this case, the opt
and FileCheck
command line tools. The opt
program is a simple wrapper around the LLVM pass manager, which links in all the standard passes (and can dynamically load plugins containing other passes) and exposes them through to the command line. The FileCheck
tool verifies that its standard input matches a series of CHECK
directives. In this case, this simple test is verifying that the constprop
pass is folding the add
of 4 and 5 into 9.
RUN 指令表示执行,这里使用 opt 和 FileCheck 工具。opt 是 LLVM pass 管理的包装器,链接有所有的 pass。FileCheck 验证输出是否匹配。在上面的案例当中,是验证 constprop pass 常量传播会把 4 加 5 输出为 9。
Automatic Test Case Reduction with BugPoint
When a bug is found in a compiler or other client of the LLVM libraries, the first step to fixing it is to get a test case that reproduces the problem. Once you have a test case, it is best to minimize it to the smallest example that reproduces the problem, and also narrow it down to the part of LLVM where the problem happens, such as the optimization pass at fault. While you eventually learn how to do this, the process is tedious, manual, and particularly painful for cases where the compiler generates incorrect code but does not crash.
当发现编译器的 bug,第一步是去用测试用例来重现它。一旦有了测试用例,就更好地减少 bug 的测试用例了,以及可以缩小问题发现的范围,比如某个优化的 pass 导致的问题。当你学会了怎么去操作,这个测试用例就是冗余的、有问题的。
The LLVM BugPoint tool7 uses the IR serialization and modular design of LLVM to automate this process. For example, given an input .ll
or .bc
file along with a list of optimization passes that causes an optimizer crash, BugPoint reduces the input to a small test case and determines which optimizer is at fault. It then outputs the reduced test case and the opt
command used to reproduce the failure. It finds this by using techniques similar to “delta debugging” to reduce the input and the optimizer pass list. Because it knows the structure of LLVM IR, BugPoint does not waste time generating invalid IR to input to the optimizer, unlike the standard “delta” command line tool.
LLVM BugPoint 工具使用 IR 序列化以及模块化的设计来自动操作。比如,给定输入 .ll 或者 .bc 文件,在一系列的优化 pass 之后会导致崩溃,BugPoint 减少输入为小的测试用例以及分析哪个优化操作导致崩溃。然后它输出减少后的测试用例以及使用 opt 来重现问题。它通过增量调试的方式去减少输入和优化的 pass 列表。因为它知晓 LLVM IR 的结构,BugPoint 不会浪费时间生成无效的 IR。
In the more complex case of a miscompilation, you can specify the input, code generator information, the command line to pass to the executable, and a reference output. BugPoint will first determine if the problem is due to an optimizer or a code generator, and will then repeatedly partition the test case into two pieces: one that is sent into the “known good” component and one that is sent into the “known buggy” component. By iteratively moving more and more code out of the partition that is sent into the known buggy code generator, it reduces the test case.
在更复杂的混合编译中,你可以指定输入、代码生成的信息、指定的命令行工具去执行、输出的引用。BugPoint 会首先分析这个问题是优化器引起的还是代码生成引起的,然后会重复地划分测试用例为两部分:一部分是已知的好的部分,另一部分是已知的有问题的部分。通过重复地移除越来越多的代码块,就减少了测试用例的量。
BugPoint is a very simple tool and has saved countless hours of test case reduction throughout the life of LLVM. No other open source compiler has a similarly powerful tool, because it relies on a well-defined intermediate representation. That said, BugPoint isn’t perfect, and would benefit from a rewrite. It dates back to 2002, and is typically only improved when someone has a really tricky bug to track down that the existing tool doesn’t handle well. It has grown over time, accreting new features (such as JIT debugging) without a consistent design or owner.
BugPoint 是一个简单的工具,为我们省下了大量的测试时间。没有其他开源编译器有如此强大的工具,因为它依赖于良好的中间表示。但是,BugPoint 并非完美,它对于重写有好处。
reference
文章作者 calssion
上次更新 2021-06-14