CLI tests are now ported over to the new std.Build API and thus work
properly with concurrency.
* add `std.Build.addCheckFile` for creating a
`std.Build.CheckFileStep`.
* add `std.Build.makeTempPath`. This function is intended to be called
in the `configure` phase only. It returns an absolute directory path,
which is potentially going to be a source of API breakage in the
future, so keep that in mind when using this function.
* add `std.Build.CheckFileStep.setName`.
* `std.Build.CheckFileStep`: better error message when reading the
input file fails.
* `std.Build.RunStep`: add a `has_side_effects` flag for when you need
to override the autodetection.
* `std.Build.RunStep`: add the ability to obtain a FileSource for the
directory that contains the written files.
* `std.Build.WriteFileStep`: add a way to write bytes to an arbitrary
path - absolute or relative to the package root. Be careful with this
because it updates source files. This should not be used as part of
the normal build process, but as a utility occasionally run by a
developer with intent to modify source files and then commit those
changes to version control. A file added this way is not available
with `getFileSource`.
* RunStep: ability to set stdin
* RunStep: ability to capture stdout and stderr as a FileSource
* RunStep: add setName method
* RunStep: hash the stdio checks
The problem is that one may execute too many subprocesses concurrently
that, together, exceed an RSS value that causes the OOM killer to kill
something problematic such as the window manager. Or worse, nothing, and
the system freezes.
This is a real world problem. For example when building LLVM a simple
`ninja install` will bring your system to its knees if you don't know
that you should add `-DLLVM_PARALLEL_LINK_JOBS=1`.
In particular: compiling the zig std lib tests takes about 2G each,
which at 16x at once (8 cores + hyperthreading) is using all 32GB of my
RAM, causing the OOM killer to kill my window manager
The idea here is that you can annotate steps that might use a high
amount of system resources with an upper bound. So for example I could
mark the std lib tests as having an upper bound peak RSS of 3 GiB.
Then the build system will do 2 things:
1. ulimit the child process, so that it will fail if it would exceed
that memory limit.
2. Notice how much system RAM is available and avoid running too many
concurrent jobs at once that would total more than that.
This implements (1) not with an operating system enforced limit, but by
checking the maxrss after a child process exits.
However it does implement (2) correctly.
The available memory used by the build system defaults to the total
system memory, regardless of whether it is used by other processes at
the time of spawning the build runner. This value can be overridden with
the new --maxrss flag to `zig build`. This mechanism will ensure that
the sum total of upper bound RSS memory of concurrent tasks will not
exceed this value.
This system makes it so that project maintainers can annotate
problematic subprocesses, avoiding bug reports from users, who can
blissfully execute `zig build` without worrying about the project's
internals.
Nobody's computer crashes, and the build system uses as much parallelism
as possible without risking OOM. Users do not need to unnecessarily
resort to -j1 when the build system can figure this out for them.
* remove std.Build.updateFile. I noticed some people use it from
build.zig (declare phase) when it is intended only for use in the
make phase.
- This also was incorrectly reporting errors with std.log.
* std.Build.InstallArtifactStep
- report better errors on failure
- report whether the step was cached or not
* std.Build.InstallDirStep: report better error on failure
* std.Build.InstallFileStep: report better error on failure
* Eliminate all uses of `std.debug.print` in make() functions, instead
properly using the step failure reporting mechanism.
* Introduce the concept of skipped build steps. These do not cause the
build to fail, and they do allow their dependants to run.
* RunStep gains a new flag, `skip_foreign_checks` which causes the
RunStep to be skipped if stdio mode is `check` and the binary cannot
be executed due to it being a foreign executable.
- RunStep is improved to automatically use known interpreters to
execute binaries if possible (integrating with flags such as
-fqemu and -fwasmtime). It only does this after attempting a native
execution and receiving a "exec file format" error.
- Update RunStep to use an ArrayList for the checks rather than this
ad-hoc reallocation/copying mechanism.
- `expectStdOutEqual` now also implicitly adds an exit_code==0 check
if there is not already an expected termination. This matches
previously expected behavior from older API and can be overridden by
directly setting the checks array.
* Add `dest_sub_path` to `InstallArtifactStep` which allows choosing an
arbitrary subdirectory relative to the prefix, as well as overriding
the basename.
- Delete the custom InstallWithRename step that I found deep in the
test/ directory.
* WriteFileStep will now update its step display name after the first
file is added.
* Add missing stdout checks to various standalone test case build
scripts.
Rework std.Build.Step to have an `owner: *Build` field. This
simplified the implementation of installation steps, as well as provided
some much-needed common API for the new parallelized build system.
--verbose is now defined very concretely: it prints to stderr just
before spawning a child process.
Child process execution is updated to conform to the new
parallel-friendly make() function semantics.
DRY up the failWithCacheError handling code. It now integrates properly
with the step graph instead of incorrectly dumping to stderr and calling
process exit.
In the main CLI, fix `zig fmt` crash when there are no errors and stdin
is used.
Deleted steps:
* EmulatableRunStep - this entire thing can be removed in favor of a
flag added to std.Build.RunStep called `skip_foreign_checks`.
* LogStep - this doesn't really fit with a multi-threaded build runner
and is effectively superseded by the new build summary output.
build runner:
* add -fsummary and -fno-summary to override the default behavior,
which is to print a summary if any of the build steps fail.
* print the dep prefix when emitting error messages for steps.
std.Build.FmtStep:
* This step now supports exclude paths as well as a check flag.
* The check flag decides between two modes, modify mode, and check
mode. These can be used to update source files in place, or to fail
the build, respectively.
Zig's own build.zig:
* The `test-fmt` step will do all the `zig fmt` checking that we expect
to be done. Since the `test` step depends on this one, we can simply
remove the explicit call to `zig fmt` in the CI.
* The new `fmt` step will actually perform `zig fmt` and update source
files in place.
std.Build.RunStep:
* expose max_stdio_size is a field (previously an unchangeable
hard-coded value).
* rework the API. Instead of configuring each stream independently,
there is a `stdio` field where you can choose between
`infer_from_args`, `inherit`, or `check`. These determine whether the
RunStep is considered to have side-effects or not. The previous
field, `condition` is gone.
* when stdio mode is set to `check` there is a slice of any number of
checks to make, which include things like exit code, stderr matching,
or stdout matching.
* remove the ill-defined `print` field.
* when adding an output arg, it takes the opportunity to give itself a
better name.
* The flag `skip_foreign_checks` is added. If this is true, a RunStep
which is configured to check the output of the executed binary will
not fail the build if the binary cannot be executed due to being for
a foreign binary to the host system which is running the build graph.
Command-line arguments such as -fqemu and -fwasmtime may affect
whether a binary is detected as foreign, as well as system
configuration such as Rosetta (macOS) and binfmt_misc (Linux).
- This makes EmulatableRunStep no longer needed.
* Fix the child process handling to properly integrate with the new
bulid API and to avoid deadlocks in stdout/stderr streams by polling
if necessary.
std.Build.RemoveDirStep now uses the open build_root directory handle
instead of an absolute path.
The compiler now provides a server protocol for an interactive session
with another process. The build runner uses this protocol to communicate
compilation errors semantically from zig compiler subprocesses to the
build runner.
The protocol is exposed via stdin/stdout, or on a network socket,
depending on whether the CLI flag `--listen=-` or e.g.
`--listen=127.0.0.1:1337` is used.
Additionally:
* add the zig version string to the build runner cache prefix
* remove --prominent-compile-errors CLI flag because it no longer does
anything. Compilation errors are now unconditionally displayed at the
bottom of the build summary output when using the terminal-based
build runner.
* Remove the color field from std.Build. The build steps are no longer
supposed to interact with stderr directly. Instead they communicate
semantically back to the build runner, which has its own logic about
TTY configuration.
* Use the cleanExit() pattern in the build runner.
* Build steps can now use error.MakeFailed when they have already
properly reported an error, or they can fail with any other error
code in which case the build runner will create a simple message
based on this error code.
Introduces std.zig.ErrorBundle which is a trivially serializeable set
of compilation errors. This is in the standard library so that both
the compiler and the build runner can use it. The idea is they will
use it to communicate compilation errors over a binary protocol.
The binary encoding of ErrorBundle is a bit problematic - I got a little
too aggressive with compaction. I need to change it in a follow-up
commit to use some indirection in the error message list, otherwise
iteration is too unergonomic. In fact it's so problematic right now that
the logic getAllErrorsAlloc() actually fails to produce a viable
ErrorBundle because it puts SourceLocation data in between the root
level ErrorMessage data.
This commit has a simplification - redundant logic for rendering AST
errors to stderr has been removed in favor of moving the logic for
lowering AST errors into AstGen. So even if we get parse errors, the
errors will get lowered into ZIR before being reported. I believe this
will be useful when working on --autofix. Either way, some redundant
brittle logic was happily deleted.
In Compilation, updateSubCompilation() is improved to properly perform
error reporting when a sub-compilation object fails. It no longer dumps
directly to stderr; instead it populates an ErrorBundle object, which
gets added to the parent one during getAllErrorsAlloc().
In package fetching code, instead of dumping directly to stderr, it now
populates an ErrorBundle object, and gets properly reported at the CLI
layer of abstraction.
- improve fn prototypes of process_vm_writev
- make the memory writable in the ELF file
- force the linker to always append the function
- write updates with process_vm_writev
With this commit, the build runner now communicates progress towards
completion of the step graph to the terminal, while also printing the
stderr of child processes as soon as possible, without clobbering each
other, and without clobbering the CLI progress output.
API users can take advantage of these to freely write to the terminal
which has an ongoing progress display, similar to what Ninja does when
compiling C/C++ objects and a warning or error message is printed.
* Step.init() now takes an options struct
* Step.init() now captures a small stack trace and stores it in the
Step so that it can be accessed when printing user-friendly debugging
information, including the lines of code that created the step in
question.
Instead of dumping directly to stderr. This prevents processes running
simultaneously from racing their stderr against each other.
For now it only reports at the end, but an improvement would be to
report as soon as a failed step occurs.
After sorting the step stack so that dependencies can be popped before
their dependants are popped, there is still a situation left to handle
correctly:
Example:
A depends on:
B
C
D depends on:
E
F
They will be ordered like this:
A B C D E F
If there are 6+ cores, then all of them will be evaluated at once,
incorrectly evaluating A and D before their dependencies.
Starting evaluation of F and then E is correct, but waiting until they
are done is not correct because it should start working on B and C as
well.
This commit solves the problem by computing dependants in the dependency
loop checking logic, and then having workers queue up their dependants
when they finish their own work.