mirrors/zig - "Borealis" Git by INX: Hosted by INX "Xenon".

mirror of https://codeberg.org/ziglang/zig.git synced 2025-12-09 23:29:03 +00:00

Author	SHA1	Message	Date
Kendall Condon	93775de45f	rework fuzz testing to be smith based -- On the standard library side: The `input: []const u8` parameter of functions passed to `testing.fuzz` has changed to `smith: testing.Smith`. `Smith` is used to generate values from libfuzzer or input bytes generated by libfuzzer. `Smith` contains the following base methods: `value` as a generic method for generating any type * `eos` for generating end-of-stream markers. Provides the additional guarantee `true` will eventually by provided. * `bytes` for filling a byte array. * `slice` for filling part of a buffer and providing the length. `Smith.Weight` is used for giving value ranges a higher probability of being selected. By default, every value has a weight of zero (i.e. they will not be selected). Weights can only apply to values that fit within a u64. The above functions have corresponding ones that accept weights. Additionally, the following functions are provided: * `baselineWeights` which provides a set of weights containing every possible value of a type. * `eosSimpleWeighted` for unique weights for `true` and `false` * `valueRangeAtMost` and `valueRangeLessThan` for weighing only a range of values. -- On the libfuzzer and abi side: --- Uids These are u32s which are used to classify requested values. This solves the problem of a mutation causing a new value to be requested and shifting all future values; for example: 1. An initial input contains the values 1, 2, 3 which are interpreted as a, b, and c respectively by the test. 2. The 1 is mutated to a 4 which causes the test to request an extra value interpreted as d. The input is now 4, 2, 3, 5 (new value) which the test corresponds to a, d, b, c; however, b and c no longer correspond to their original values. Uids contain a hash component and type component. The hash component is currently determined in `Smith` by taking a hash of the calling `@returnAddress()` or via an argument in the corresponding `WithHash` functions. The type component is used extensively in libfuzzer with its hashmaps. --- Mutations At the start of a cycle (a run), a random number of values to mutate is selected with less being exponentially more likely. The indexes of the values are selected from a selected uid with a logarithmic bias to uids with more values. Mutations may change a single values, several consecutive values in a uid, or several consecutive values in the uid-independent order they were requested. They may generate random values, mutate from previous ones, or copy from other values in the same uid from the same input or spliced from another. For integers, mutations from previous ones currently only generates random values. For bytes, mutations from previous mix new random data and previous bytes with a set number of mutations. --- Passive Minimization A different approach has been taken for minimizing inputs: instead of trying a fixed set of mutations when a fresh input is found, the input is instead simply added to the corpus and removed when it is no longer valuable. The quality of an input is measured based off how many unique pcs it hit and how many values it needed from the fuzzer. It is tracked which inputs hold the best qualities for each pc for hitting the minimum and maximum unique pcs while needing the least values. Once all an input's qualities have been superseded for the pcs it hit, it is removed from the corpus. -- Comparison to byte-based smith A byte-based smith would be much more inefficient and complex than this solution. It would be unable to solve the shifting problem that Uids do. It is unable to provide values from the fuzzer past end-of-stream. Even with feedback, it would be unable to act on dynamic weights which have proven essential with the updated tests (e.g. to constrain values to a range). -- Test updates All the standard library tests have been updated to use the new smith interface. For `Deque`, an ad hoc allocator was written to improve performance and remove reliance on heap allocation. `TokenSmith` has been added to aid in testing Ast and help inform decisions on the smith interface.	2025-11-23 14:58:22 -05:00
Kendall Condon	0dbbb93dbe	fix fuzzing speed with prior runs	2025-11-23 12:20:59 -05:00
Matthew Lugg	010dcd6a9b	fuzzer: account for runtime address slide This is relevant to PIEs, which are notably enabled by default on macOS. The build system needs to only see virtual addresses, that is, those which do not have the slide applied; but the fuzzer itself naturally sees relocated addresses (i.e. with the slide applied). We just need to subtract the slide when we communicate addresses to the build system.	2025-11-20 10:42:20 +00:00
mlugg	7e7d7875b9	std.Build: implement unit test timeouts For now, there is a flag to `zig build` called `--test-timeout-ms` which accepts a value in milliseconds. If the execution time of any individual unit test exceeds that number of milliseconds, the test is terminated and marked as timed out. In the future, we may want to increase the granularity of this feature by allowing timeouts to be specified per-step or even per-test. However, a global option is actually very useful. In particular, it can be used in CI scripts to ensure that no individual unit test exceeds some reasonable limit (e.g. 60 seconds) without having to assign limits to every individual test step in the build script. Also, individual unit test durations are now shown in the time report web interface -- this was fairly trivial to add since we're timing tests (to check for timeouts) anyway. This commit makes progress on #19821, but does not close it, because that proposal includes a more sophisticated mechanism for setting timeouts. Co-Authored-By: David Rubin <david@vortan.dev>	2025-10-18 09:28:39 +01:00
Loris Cro	9bb0b43ea3	implement review suggestions	2025-09-25 18:20:19 +02:00
Loris Cro	0feacc2b81	fuzzing: implement limited fuzzing Adds the limit option to `--fuzz=[limit]`. the limit expresses a number of iterations that each fuzz test will perform at maximum before exiting. The limit argument supports also 'K', 'M', and 'G' suffixeds (e.g. '10K'). Does not imply `--web-ui` (like unlimited fuzzing does) and prints a fuzzing report at the end. Closes #22900 but does not implement the time based limit, as after internal discussions we concluded to be problematic to both implement and use correctly.	2025-09-24 12:46:48 +02:00
Kendall Condon	e66b269333	greatly improve capabilities of the fuzzer This PR significantly improves the capabilities of the fuzzer. The changes made to the fuzzer to accomplish this feat mostly include tracking memory reads from .rodata to determine fresh inputs, new mutations (especially the ones that insert const values from .rodata reads and __sanitizer_conv_const_cmp), and minimizing found inputs. Additionally, the runs per second has greatly been increased due to generating smaller inputs and avoiding clearing the 8-bit pc counters. An additional feature added is that the length of the input file is now stored and the old input file is rerun upon start. Other changes made to the fuzzer include more logical initialization, using one shared file `in` for inputs, creating corpus files with proper sizes, and using hexadecimal-numbered corpus files for simplicity. Furthermore, I added several new fuzz tests to gauge the fuzzer's efficiency. I also tried to add a test for zstandard decompression, which it crashed within 60,000 runs (less than a second.) Bug fixes include: * Fixed a race conditions when multiple fuzzer processes needed to use the same coverage file. * Web interface stats now update even when unique runs is not changing. * Fixed tokenizer.testPropertiesUpheld to allow stray carriage returns since they are valid whitespace.	2025-09-18 18:56:10 -04:00
mlugg	dcc3e6e1dd	build system: replace fuzzing UI with build UI, add time report This commit replaces the "fuzzer" UI, previously accessed with the `--fuzz` and `--port` flags, with a more interesting web UI which allows more interactions with the Zig build system. Most notably, it allows accessing the data emitted by a new "time report" system, which allows users to see which parts of Zig programs take the longest to compile. The option to expose the web UI is `--webui`. By default, it will listen on `[::1]` on a random port, but any IPv6 or IPv4 address can be specified with e.g. `--webui=[::1]:8000` or `--webui=127.0.0.1:8000`. The options `--fuzz` and `--time-report` both imply `--webui` if not given. Currently, `--webui` is incompatible with `--watch`; specifying both will cause `zig build` to exit with a fatal error. When the web UI is enabled, the build runner spawns the web server as soon as the configure phase completes. The frontend code consists of one HTML file, one JavaScript file, two CSS files, and a few Zig source files which are built into a WASM blob on-demand -- this is all very similar to the old fuzzer UI. Also inherited from the fuzzer UI is that the build system communicates with web clients over a WebSocket connection. When the build finishes, if `--webui` was passed (i.e. if the web server is running), the build runner does not terminate; it continues running to serve web requests, allowing interactive control of the build system. In the web interface is an overall "status" indicating whether a build is currently running, and also a list of all steps in this build. There are visual indicators (colors and spinners) for in-progress, succeeded, and failed steps. There is a "Rebuild" button which will cause the build system to reset the state of every step (note that this does not affect caching) and evaluate the step graph again. If `--time-report` is passed to `zig build`, a new section of the interface becomes visible, which associates every build step with a "time report". For most steps, this is just a simple "time taken" value. However, for `Compile` steps, the compiler communicates with the build system to provide it with much more interesting information: time taken for various pipeline phases, with a per-declaration and per-file breakdown, sorted by slowest declarations/files first. This feature is still in its early stages: the data can be a little tricky to understand, and there is no way to, for instance, sort by different properties, or filter to certain files. However, it has already given us some interesting statistics, and can be useful for spotting, for instance, particularly complex and slow compile-time logic. Additionally, if a compilation uses LLVM, its time report includes the "LLVM pass timing" information, which was previously accessible with the (now removed) `-ftime-report` compiler flag. To make time reports more useful, ZIR and compilation caches are ignored by the Zig compiler when they are enabled -- in other words, `Compile` steps always run, even if their result should be cached. This means that the flag can be used to analyze a project's compile time without having to repeatedly clear cache directory, for instance. However, when using `-fincremental`, updates other than the first will only show you the statistics for what changed on that particular update. Notably, this gives us a fairly nice way to see exactly which declarations were re-analyzed by an incremental update. If `--fuzz` is passed to `zig build`, another section of the web interface becomes visible, this time exposing the fuzzer. This is quite similar to the fuzzer UI this commit replaces, with only a few cosmetic tweaks. The interface is closer than before to supporting multiple fuzz steps at a time (in line with the overall strategy for this build UI, the goal will be for all of the fuzz steps to be accessible in the same interface), but still doesn't actually support it. The fuzzer UI looks quite different under the hood: as a result, various bugs are fixed, although other bugs remain. For instance, viewing the source code of any file other than the root of the main module is completely broken (as on master) due to some bogus file-to-module assignment logic in the fuzzer UI. Implementation notes: * The `lib/build-web/` directory holds the client side of the web UI. * The general server logic is in `std.Build.WebServer`. * Fuzzing-specific logic is in `std.Build.Fuzz`. * `std.Build.abi` is the new home of `std.Build.Fuzz.abi`, since it now relates to the build system web UI in general. * The build runner now has an actual general-purpose allocator, because thanks to `--watch` and `--webui`, the process can be arbitrarily long-lived. The gpa is `std.heap.DebugAllocator`, but the arena remains backed by `std.heap.page_allocator` for efficiency. I fixed several crashes caused by conflation of `gpa` and `arena` in the build runner and `std.Build`, but there may still be some I have missed. * The I/O logic in `std.Build.WebServer` is pretty gnarly; there are a lot of threads involved. I anticipate this situation improving significantly once the `std.Io` interface (with concurrency support) is introduced.	2025-08-01 23:48:21 +01:00

8 commits