[Translation] An Analysis of Build Determinism
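This article analyzes the determinism of compilation and builds. It is fairly theoretical and gives a clear classification of the levels of build determinism.
Source: https://blog.llvm.org/2019/11/deterministic-builds-with-clang-and-lld.html
Original title: Deterministic builds with clang and lld
Author: Nico Weber
Translator: 流左沙
Note: compared with the original, this translation condenses, reorders, adds, and removes some material in order to draw out the key points and make them easier to understand. Treat the original as authoritative.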
Definition
A build is called deterministic or reproducible if running it twice produces exactly the same build outputs.
There are several degrees of build determinism, each more useful but also harder to achieve than the one before:
Basic determinism
Doing a full build of the same source code in the same directory on the same machine produces exactly the same output every time, in the sense that a content hash of the final build artifacts and of all intermediate files does not change.
Once you have this, if all your builders are configured the same way (OS version, toolchain, build path, checkout path, …), they can share build artifacts, for example by using distcc.
This also allows local caching of test suite results keyed by a hash of the test binary and the test input files.
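One way to check the "content hash does not change" criterion is to hash every file in the build output and compare the result across two full builds. The following is a minimal sketch in Python, not something taken from the original post: the helper names `tree_digest` and `same_outputs` are hypothetical, and it assumes the output of the first build was copied aside before the second build ran.

```python
import hashlib
import tempfile
from pathlib import Path


def tree_digest(root: Path) -> str:
    """Return one SHA-256 digest covering every file under root."""
    digest = hashlib.sha256()
    # Sort the paths so the digest does not depend on directory listing order.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        digest.update(rel.encode("utf-8") + b"\0")
        digest.update(hashlib.sha256(path.read_bytes()).digest())
    return digest.hexdigest()


def same_outputs(first: Path, second: Path) -> bool:
    """True if two build output trees are bit-for-bit identical."""
    return tree_digest(first) == tree_digest(second)


if __name__ == "__main__":
    # Two fake output trees stand in for the results of two full builds.
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        for out in (Path(a), Path(b)):
            (out / "obj").mkdir()
            (out / "obj" / "main.o").write_bytes(b"\x7fELF fake object")
        print(same_outputs(Path(a), Path(b)))
```

Hashing intermediate files as well as final artifacts matters here, because the definition covers both.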
Incremental basic determinism
Like basic determinism, but the output binaries also don't change in partial rebuilds. In build systems that track file modification times to decide when to rebuild, this means for example that updating the modification time on a C++ source file (without making any actual changes) and rebuilding will produce the same output as a full build.
This allows having build bots that don't do full builds each time, while still allowing caching of compile artifacts and test results.
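To see why incremental determinism keeps caches usable, consider a test result cache keyed purely by file contents. The sketch below is an illustration under assumptions, not the original author's tooling: `cache_key` is a hypothetical helper, and the "rebuild" is simulated by changing only the modification time of a fake test binary.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def cache_key(test_binary: Path, input_files: list[Path]) -> str:
    """Key a cached test result by file contents, never by timestamps."""
    digest = hashlib.sha256()
    for path in [test_binary, *sorted(input_files)]:
        digest.update(hashlib.sha256(path.read_bytes()).digest())
    return digest.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        binary = Path(tmp) / "unit_tests"
        data = Path(tmp) / "input.txt"
        binary.write_bytes(b"fake test binary")
        data.write_text("fixture\n")

        before = cache_key(binary, [data])
        # Simulate a rebuild that rewrote identical bytes: only the mtime moves.
        os.utime(binary, (0, 0))
        after = cache_key(binary, [data])
        print(before == after)
```

If a partial rebuild produced different bytes for the same sources, the key would change and every cached result would be lost, which is exactly what this level of determinism rules out.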
Local determinism
Like incremental basic determinism, but builds are also independent of the name of the build directory. Builds of the same source code on the same machine produce exactly the same output every time, independent of the location of the source checkout directory or the build directory.
This allows machines to have several build directories at different locations but still share compile and test caches.
Universal determinism
Like local determinism, but builds are also independent of the machine the build runs on. Everybody who checks out the project at a given revision into any directory and builds it following the build instructions ends up with exactly the same bits in the build output.
Since the exact local OS and locally installed packages no longer matter, this allows devs to share compile and test caches with bots, without having to use difficult-to-set-up containers.
It also allows easy verification of builds done by others to make sure output binaries haven't been tampered with.
Getting to basic determinism
Basic determinism needs tools (compiler, linker, etc.) that are deterministic. Tools internally must not output things in hash table order, multi-threaded programs must not write output in the order threads finish, and so on. All of LLVM's tools have deterministic outputs when run with the right flags, but not necessarily by default.
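The two pitfalls named above, hash table order and thread completion order, can be illustrated in a few lines of Python. This is only a sketch of the principle and says nothing about how LLVM's tools are implemented; `emit_symbols` and `compile_all` are hypothetical names, and the "compile" step is a stand-in that merely renames a file.

```python
from concurrent.futures import ThreadPoolExecutor


def emit_symbols(symbols: set[str]) -> list[str]:
    """Emit in sorted order instead of the set's internal hash order."""
    return sorted(symbols)


def compile_all(sources: list[str]) -> list[str]:
    """Run jobs in parallel but collect results in input order."""
    def compile_one(src: str) -> str:
        return src.replace(".cc", ".o")  # stand-in for real work

    with ThreadPoolExecutor(max_workers=4) as pool:
        # Executor.map yields results in the order of `sources`,
        # not in the order in which worker threads happen to finish.
        return list(pool.map(compile_one, sources))


if __name__ == "__main__":
    print(emit_symbols({"main", "init", "helper"}))
    print(compile_all(["b.cc", "a.cc", "c.cc"]))
```

In both cases the work can still be done in any order internally; only the order in which results are written out has to be fixed.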
The C standard defines the predefined macros __TIME__ and __DATE__ that expand to the time and date at which a source file is compiled. Several compilers, including clang, also define the non-standard __TIMESTAMP__. These macros are inherently nondeterministic. You should not use them, and you can pass -Wdate-time to make the compiler emit a warning when they are used.
If they are used in third-party code you don't control, you can use -Wno-builtin-macro-redefined -D__DATE__= -D__TIME__= -D__TIMESTAMP__= to make them expand to nothing.
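A build script might choose between these two sets of flags depending on who owns the code. The sketch below assumes a hypothetical helper `date_macro_flags`; the flags themselves are the ones quoted above, while the `clang` command name and the file name are placeholders.

```python
def date_macro_flags(own_code: bool) -> list[str]:
    """Return clang flags for the date/time macros, as named in the post."""
    if own_code:
        # Warn whenever __DATE__, __TIME__ or __TIMESTAMP__ is used.
        return ["-Wdate-time"]
    # Third-party code: redefine the macros so they expand to nothing.
    return [
        "-Wno-builtin-macro-redefined",
        "-D__DATE__=",
        "-D__TIME__=",
        "-D__TIMESTAMP__=",
    ]


if __name__ == "__main__":
    # "clang" and the file name are placeholders for illustration.
    print(["clang", "-c", "vendor/lib.c", *date_macro_flags(own_code=False)])
```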
Getting to incremental determinism
Having deterministic incremental builds mostly requires having correct incremental builds, meaning that if a file is changed and the build reruns, everything that uses this file needs to be rebuilt.
This is very build system dependent, so this post can't say much about it.
In general, every build step needs to correctly declare all the inputs it depends on.
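The effect of a missing input declaration can be shown with a toy model of a build step. This sketch is not modeled on any particular build system: `Step` and `step_key` are hypothetical, file contents live in a dictionary instead of on disk, and a step is considered stale whenever the hash of its declared inputs changes.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """A hypothetical build step that lists every input it reads."""
    output: str
    inputs: tuple[str, ...]


def step_key(step: Step, contents: dict[str, bytes]) -> str:
    """Hash the declared inputs; the step reruns when this key changes."""
    digest = hashlib.sha256()
    for name in sorted(step.inputs):
        digest.update(name.encode("utf-8") + b"\0")
        digest.update(hashlib.sha256(contents[name]).digest())
    return digest.hexdigest()


if __name__ == "__main__":
    files = {"main.cc": b'#include "util.h"\n', "util.h": b"int f();\n"}
    complete = Step("main.o", ("main.cc", "util.h"))
    incomplete = Step("main.o", ("main.cc",))  # forgot the header

    old_complete = step_key(complete, files)
    old_incomplete = step_key(incomplete, files)

    files["util.h"] = b"long f();\n"  # edit the header

    # With the header declared, the key changes and main.o is rebuilt.
    print(step_key(complete, files) != old_complete)
    # Without it, the key is unchanged and a stale main.o is kept.
    print(step_key(incomplete, files) != old_incomplete)
```

The step with the undeclared header is the failure mode to avoid: the incremental build silently differs from a full build.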
Getting to local determinism
Making build outputs independent of the names of the checkout or build directory means that build outputs must not contain absolute paths, or relative paths that contain the name of either directory.
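A simple way to test for this is to search each build output for the directory names that must not appear in it. The sketch below is an assumption-laden illustration: `leaked_paths` is a hypothetical helper, the object file is a fake byte string, and a plain substring search will miss paths that a tool stores in compressed or otherwise encoded form.

```python
import tempfile
from pathlib import Path


def leaked_paths(artifact: Path, forbidden: list[str]) -> list[str]:
    """Return the forbidden directory strings that occur in the artifact."""
    data = artifact.read_bytes()
    return [d for d in forbidden if d.encode("utf-8") in data]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        obj = Path(tmp) / "main.o"
        # Fake object file that embeds an absolute source path.
        obj.write_bytes(b"\x00DW_AT_name\x00/home/alice/src/main.cc\x00")
        print(leaked_paths(obj, ["/home/alice/src", "/home/alice/src/out"]))
```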
A possible way to arrange for that is to put all build directories into the checkout directory. For example, if your code is at path/to/src, then you could have "out" in your .gitignore and build directories at path/to/src/out/debug, path/to/src/out/release, and so on. The relative path from each build artifact to the source is then "../../" followed by the path of the source file in the source directory, which is identical for each build directory.
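The claim that the relative path is identical for every build directory is easy to check. A minimal sketch, assuming the directory layout from the example above plus a made-up source file lib/foo.cc; `source_as_seen_from` is a hypothetical helper around the standard library's `posixpath.relpath`:

```python
import posixpath


def source_as_seen_from(build_dir: str, source_file: str) -> str:
    """Relative path from a build directory to a source file."""
    return posixpath.relpath(source_file, start=build_dir)


if __name__ == "__main__":
    src = "path/to/src/lib/foo.cc"
    for build_dir in ("path/to/src/out/debug", "path/to/src/out/release"):
        print(source_as_seen_from(build_dir, src))
```

Both build directories see the file as ../../lib/foo.cc, and the same holds if the whole checkout is moved elsewhere, since only the part below the checkout root enters the result.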
To make your build locally deterministic, pass relative paths to your .cc files to clang.
By default, clang will internally use absolute paths to refer to compiler-internal headers. Pass -no-canonical-prefixes to make clang use relative paths for these internal files.
Passing relative paths to clang makes clang expand __FILE__ to a relative path, but paths in debug information are still absolute by default. Pass -fdebug-compilation-dir . to make paths in debug information relative to the build directory. (Before LLVM 9, this was an internal clang flag that had to be passed as -Xclang -fdebug-compilation-dir -Xclang .) When using clang's integrated assembler (the default), -Wa,-fdebug-compilation-dir,. will do the same for object files created from assembly input. (For ml.exe / ml64.exe, see the script linked from the "Getting to basic determinism" section of the original post.)
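Put together, a compile command for a locally deterministic build might be assembled as follows. This is a sketch under assumptions rather than a recommended command line: `compile_command` is a hypothetical helper, the `clang` command name and file paths are placeholders, and only the determinism flags named in this section are included next to the usual -c and -o.

```python
def compile_command(source: str, output: str, llvm_major: int = 9) -> list[str]:
    """Assemble a clang invocation using the flags named in this section.

    `source` and `output` should be relative to the build directory.
    """
    if llvm_major >= 9:
        debug_dir = ["-fdebug-compilation-dir", "."]
    else:
        # Before LLVM 9 the flag is internal and goes through -Xclang.
        debug_dir = ["-Xclang", "-fdebug-compilation-dir", "-Xclang", "."]
    return [
        "clang",
        "-no-canonical-prefixes",  # relative paths for compiler-internal headers
        *debug_dir,                # debug info paths relative to the build dir
        "-c", source,
        "-o", output,
    ]


if __name__ == "__main__":
    print(" ".join(compile_command("../../lib/foo.cc", "obj/foo.o")))
```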
Using this means that debuggers won't automatically find the source code belonging to your binary. At the moment, there's no way to tell debuggers to resolve relative paths relative to the location of the binary (the original post links a DWARF proposal and a gdb patch). The original post explains, at the end of this section, how to configure common debuggers to work correctly.
Getting to universal determinism
By now, your build output is deterministic as long as everyone uses the same compiler and linker binaries, and as long as everyone uses the same version of the SDK and system libraries.
Making your build independent of that requires making sure that everyone automatically uses the same compiler, linker, and SDK.
This might seem like a lot of work, but in addition to build determinism this work also gives you cross builds (where you can, for example, build the Linux version of your product on a Windows host).
Reference
https://blog.llvm.org/2019/11/deterministic-builds-with-clang-and-lld.html
Article author: calssion
Last updated: 2021-06-12