added adapter to AnyWriter and GenericWriter to help bridge the gap
between old and new API
make std.testing.expectFmt work at compile-time
std.fmt no longer has a dependency on std.unicode. Formatted printing
was never properly unicode-aware. Now it no longer pretends to be.
Breakage/deprecations:
* std.fs.File.reader -> std.fs.File.deprecatedReader
* std.fs.File.writer -> std.fs.File.deprecatedWriter
* std.io.GenericReader -> std.io.Reader
* std.io.GenericWriter -> std.io.Writer
* std.io.AnyReader -> std.io.Reader
* std.io.AnyWriter -> std.io.Writer
* std.fmt.format -> std.fmt.deprecatedFormat
* std.fmt.fmtSliceEscapeLower -> std.ascii.hexEscape
* std.fmt.fmtSliceEscapeUpper -> std.ascii.hexEscape
* std.fmt.fmtSliceHexLower -> {x}
* std.fmt.fmtSliceHexUpper -> {X}
* std.fmt.fmtIntSizeDec -> {B}
* std.fmt.fmtIntSizeBin -> {Bi}
* std.fmt.fmtDuration -> {D}
* std.fmt.fmtDurationSigned -> {D}
* {} -> {f} when there is a format method
* format method signature
- anytype -> *std.io.Writer
- inferred error set -> error{WriteFailed}
- options -> (deleted)
* std.fmt.Formatted
- now takes context type explicitly
- no fmt string
Both sliceTo and indexOfScalarPos use SIMD when available to speed up the search. On my x86_64 machine, this leads to getenvW being around 2-3x faster overall.
Additionally, any future improvements to sliceTo/indexOfScalarPos will benefit getenvW.
This code previously added 4 NUL code units, but that was likely due to a misinterpretation of this part of the CreateProcess documentation:
> A Unicode environment block is terminated by four zero bytes: two for the last string, two more to terminate the block.
(four zero *bytes* means *two* zero code units)
Additionally, the second zero code unit is only actually needed when the environment is empty due to a quirk of the CreateProcess implementation. In the case of a non-empty environment, there always ends up being two trailing NUL code units since one will come after the last environment variable in the block.
* fix merge conflicts
* rename the declarations
* reword documentation
* extract FixedBufferAllocator to separate file
* take advantage of locals
* remove the assertion about max alignment in Allocator API, leaving it
Allocator implementation defined
* fix non-inline function call in start logic
The GeneralPurposeAllocator implementation is totally broken because it
uses global state but I didn't address that in this commit.
heap.zig: define new default page sizes
heap.zig: add min/max_page_size and their options
lib/std/c: add miscellaneous declarations
heap.zig: add pageSize() and its options
switch to new page sizes, especially in GPA/stdlib
mem.zig: remove page_size
It is now composed of these main sections:
* Declarations that are shared among all operating systems.
* Declarations that have the same name, but different type signatures
depending on the operating system. Often multiple operating systems
share the same type signatures however.
* Declarations that are specific to a single operating system.
- These are imported one per line so you can see where they come from,
protected by a comptime block to prevent accessing the wrong one.
Closes#19352 by changing the convention to making types `void` and
functions `{}`, so that it becomes possible to update `@hasDecl` sites
to use `@TypeOf(f) != void` or `T != void`. Happily, this ended up
removing some duplicate logic and update some bitrotted feature
detection checks.
A handful of types have been modified to gain namespacing and type
safety. This is a breaking change.
Oh, and the last usage of `usingnamespace` site is eliminated.
Now that we use the PEB to get the precise length of the command line string, there's no need for a multi-item pointer/sliceTo call. This provides a minor speedup:
Benchmark 1 (153 runs): benchargv-before.exe
measurement mean ± σ min … max outliers delta
wall_time 32.7ms ± 429us 32.1ms … 36.9ms 1 ( 1%) 0%
peak_rss 6.49MB ± 5.62KB 6.46MB … 6.49MB 14 ( 9%) 0%
Benchmark 2 (157 runs): benchargv-after.exe
measurement mean ± σ min … max outliers delta
wall_time 31.9ms ± 236us 31.4ms … 32.7ms 4 ( 3%) ⚡- 2.4% ± 0.2%
peak_rss 6.49MB ± 4.77KB 6.46MB … 6.49MB 14 ( 9%) + 0.0% ± 0.0%
Previously, to ensure args were encoded as well-formed WTF-8 (i.e. no encoded surrogate pairs), the code unit would be encoded and then the last 6 emitted bytes would be checked to see if they were a surrogate pair, and this was done for any emitted code unit (although this was not necessary, it should have only been done when emitting a low surrogate).
After this commit, we still want to ensure well-formed WTF-8, but, to do so, the last emitted code point is stored, meaning we can just directly check that the last code unit is a high surrogate and the current code unit is a low surrogate to determine if we have a surrogate pair.
This provides some performance benefit over and above a "use the same strategy as before but only check when we're emitting a low surrogate" implementation:
Benchmark 1 (111 runs): benchargv-master.exe
measurement mean ± σ min … max outliers delta
wall_time 45.2ms ± 532us 44.5ms … 49.4ms 2 ( 2%) 0%
peak_rss 6.49MB ± 3.94KB 6.46MB … 6.49MB 10 ( 9%) 0%
Benchmark 2 (154 runs): benchargv-storelast.exe
measurement mean ± σ min … max outliers delta
wall_time 32.6ms ± 293us 32.2ms … 34.2ms 8 ( 5%) ⚡- 27.8% ± 0.2%
peak_rss 6.49MB ± 5.15KB 6.46MB … 6.49MB 15 (10%) - 0.0% ± 0.0%
Benchmark 3 (131 runs): benchargv-onlylow.exe
measurement mean ± σ min … max outliers delta
wall_time 38.4ms ± 257us 37.9ms … 39.6ms 5 ( 4%) ⚡- 15.1% ± 0.2%
peak_rss 6.49MB ± 5.70KB 6.46MB … 6.49MB 9 ( 7%) - 0.0% ± 0.0%
Before this commit, the WTF-16 command line string would be converted to WTF-8 in `init`, and then a second buffer of the WTF-8 size + 1 would be allocated to store the parsed arguments. The converted WTF-8 command line would then be parsed and the relevant bytes would be copied into the argument buffer before being returned.
After this commit, only the WTF-8 size of the WTF-16 string is calculated (without conversion) which is then used to allocate the buffer for the parsed arguments. Parsing is then done on the WTF-16 slice directly, with the arguments being converted to WTF-8 on-the-fly.
This has a few (minor) benefits:
- Cuts the amount of memory allocated by ArgIteratorWindows in half (or better)
- Makes the total amount of memory allocated by ArgIteratorWindows predictable, since, before, the upfront `wtf16LeToWtf8Alloc` call could end up allocating more-memory-than-necessary temporarily due to its internal use of an ArrayList. Now, the amount of memory allocated is always exactly `calcWtf8Len(cmd_line) + 1`.
Deprecated aliases that are now compile errors:
- `std.fs.MAX_PATH_BYTES` (renamed to `std.fs.max_path_bytes`)
- `std.mem.tokenize` (split into `tokenizeAny`, `tokenizeSequence`, `tokenizeScalar`)
- `std.mem.split` (split into `splitSequence`, `splitAny`, `splitScalar`)
- `std.mem.splitBackwards` (split into `splitBackwardsSequence`, `splitBackwardsAny`, `splitBackwardsScalar`)
- `std.unicode`
+ `utf16leToUtf8Alloc`, `utf16leToUtf8AllocZ`, `utf16leToUtf8`, `fmtUtf16le` (all renamed to have capitalized `Le`)
+ `utf8ToUtf16LeWithNull` (renamed to `utf8ToUtf16LeAllocZ`)
- `std.zig.CrossTarget` (moved to `std.Target.Query`)
Deprecated `lib/std/std.zig` decls were deleted instead of made a `@compileError` because the `refAllDecls` in the test block would trigger the `@compileError`. The deleted top-level `std` namespaces are:
- `std.rand` (renamed to `std.Random`)
- `std.TailQueue` (renamed to `std.DoublyLinkedList`)
- `std.ChildProcess` (renamed/moved to `std.process.Child`)
This is not exhaustive. Deprecated aliases that I didn't touch:
+ `std.io.*`
+ `std.Build.*`
+ `std.builtin.Mode`
+ `std.zig.c_translation.CIntLiteralRadix`
+ anything in `src/`
This makes it so that any other threads which are writing to stderr have
a chance to finish before the process terminates. It also clears the
terminal in case any progress has been written to stderr, while still
accomplishing the goal of not waiting until the update thread exits.
On Windows, the command line arguments of a program are a single WTF-16 encoded string and it's up to the program to split it into an array of strings. In C/C++, the entry point of the C runtime takes care of splitting the command line and passing argc/argv to the main function.
https://github.com/ziglang/zig/pull/18309 updated ArgIteratorWindows to match the behavior of CommandLineToArgvW, but it turns out that CommandLineToArgvW's behavior does not match the behavior of the C runtime post-2008. In 2008, the C runtime argv splitting changed how it handles consecutive double quotes within a quoted argument (it's now considered an escaped quote, e.g. `"foo""bar"` post-2008 would get parsed into `foo"bar`), and the rules around argv[0] were also changed.
This commit makes ArgIteratorWindows match the behavior of the post-2008 C runtime, and adds a standalone test that verifies the behavior matches both the MSVC and MinGW argv splitting exactly in all cases (it checks that randomly generated command line strings get split the same way).
The motivation here is roughly the same as when the same change was made in Rust (https://github.com/rust-lang/rust/pull/87580), that is (paraphrased):
- Consistent behavior between Zig and modern C/C++ programs
- Allows users to escape double quotes in a way that can be more straightforward
Additionally, the suggested mitigation for BatBadBut (https://flatt.tech/research/posts/batbadbut-you-cant-securely-execute-commands-on-windows/) relies on the post-2008 argv splitting behavior for roundtripping of the arguments given to `cmd.exe`. Note: it's not necessary for the suggested mitigation to work, but it is necessary for the suggested escaping to be parsed back into the intended argv by ArgIteratorWindows after being run through a `.bat` file.
Windows paths now use WTF-16 <-> WTF-8 conversion everywhere, which is lossless. Previously, conversion of ill-formed UTF-16 paths would either fail or invoke illegal behavior.
WASI paths must be valid UTF-8, and the relevant function calls have been updated to handle the possibility of failure due to paths not being encoded/encodable as valid UTF-8.
Closes#18694Closes#1774Closes#2565
This adds `ArgIteratorWindows`, which faithfully replicates the quoting and escaping behavior observed in `CommandLineToArgvW` and should make Zig applications play better with processes that abuse these quirks.
Finish the work started in 4c4fb839972f66f55aa44fc0aca5f80b0608c731.
Now the compiler compiles again.
Wire up dependency tree fetching code in the CLI for `zig build`.
Everything is hooked up except for `createDependenciesModule` is not yet
implemented.
- Adds `illumos` to the `Target.Os.Tag` enum. A new function,
`isSolarish` has been added that returns true if the tag is either
Solaris or Illumos. This matches the naming convention found in Rust's
`libc` crate[1].
- Add the tag wherever `.solaris` is being checked against.
- Check for the C pre-processor macro `__illumos__` in CMake to set the
proper target tuple. Illumos distros patch their compilers to have
this in the "built-in" set (verified with `echo | cc -dM -E -`).
Alternatively you could check the output of `uname -o`.
Right now, both Solaris and Illumos import from `c/solaris.zig`. In the
future it may be worth putting the shared ABI bits in a base file, and
mixing that in with specific `c/solaris.zig`/`c/illumos.zig` files.
[1]: 6e02a329a2/src/unix/solarish
Most of this migration was performed automatically with `zig fmt`. There
were a few exceptions which I had to manually fix:
* `@alignCast` and `@addrSpaceCast` cannot be automatically rewritten
* `@truncate`'s fixup is incorrect for vectors
* Test cases are not formatted, and their error locations change
- fix getdents return type usize → c_int
- special-case process.zig to use sysctl instead of sysctlbyname
- use struct/field pattern for sysctl HW_* constants