68d2f68ed introduced special handling for StructInit fields
containing multiline strings to prevent inserting whitespace after =.
However, this logic didn't handle cases without a trailing comma,
which resulted in unwanted trailing whitespace.
* threaded K12: separate context computation from thread spawning
Compute all contexts and store them in a pre-allocated array,
then spawn threads using the pre-computed contexts.
This ensures each context is fully materialized in memory with the
correct values before any thread tries to access it.
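
The two-phase structure described above can be sketched as follows. This is an illustration in Python, not the actual Zig implementation: `hash_chunk`, `hash_chunks_threaded`, the dictionary-based context, and the use of SHA3-256 as the per-chunk hash are all assumptions made for this sketch.

```python
import hashlib
import threading


def hash_chunk(ctx, results):
    # Each worker only reads its own, already complete context.
    results[ctx["index"]] = hashlib.sha3_256(ctx["data"]).digest()


def hash_chunks_threaded(message, n_threads):
    # Phase 1: compute every context and store it in a pre-allocated list.
    # No thread exists yet, so nothing can observe a half-built context.
    step = max(1, -(-len(message) // n_threads))  # ceiling division
    contexts = [None] * n_threads
    for i in range(n_threads):
        contexts[i] = {"index": i, "data": message[i * step:(i + 1) * step]}

    # Phase 2: spawn the threads, each one bound to a pre-computed context.
    results = [None] * n_threads
    threads = [
        threading.Thread(target=hash_chunk, args=(contexts[i], results))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Keeping context construction out of the spawn loop means the loop body does nothing but hand an existing, fully initialized value to a new thread.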
* kt128: unroll the permutation rounds only twice
This appears to deliver the best performance thanks to improved cache
utilization, and it’s consistent with what we already do for SHA3.
KT128 and KT256 are fast, secure cryptographic hash functions based on Keccak (SHA-3).
They can be seen as the modern version of SHA-3, and an evolution of SHAKE, with better performance.
After the SHA-3 competition, the Keccak team proposed these variants in 2016, and the constructions underwent 8 years of public scrutiny before being standardized in October 2025 as RFC 9861.
They use a tree-hashing mode on top of TurboSHAKE, providing both high security and excellent performance, especially on large inputs.
They support arbitrary-length output and optional customization strings.
Hashing of very large inputs can be done using multiple threads, for high throughput.
KT128 provides 128-bit security strength, equivalent to AES-128, SHAKE128, SHA-256 or BLAKE3, which is sufficient for virtually all applications.
KT256 provides 256-bit security strength, equivalent to SHA-512.
For small inputs, TurboSHAKE128 and TurboSHAKE256 (which KT128 and KT256 are based on) can be used instead as they have less overhead.
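
As a rough illustration of what "arbitrary-length output" means in practice, the sketch below uses SHAKE128 from Python's `hashlib` as a stand-in extendable-output function. It is not KT128, and the helper name `xof` is hypothetical. It only shows the general XOF behaviour: the caller picks the output length, and a shorter output is a prefix of a longer one for the same input.

```python
import hashlib


def xof(data: bytes, out_len: int) -> bytes:
    # SHAKE128 is used here only as a stand-in XOF; it is not KT128.
    return hashlib.shake_128(data).digest(out_len)


short_out = xof(b"example input", 16)  # 16 bytes of output
long_out = xof(b"example input", 64)   # 64 bytes from the same input

# Both outputs come from the same stream, so the short one is a
# prefix of the long one.
same_prefix = long_out[:16] == short_out
```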
Also remove the example implementation from the file doc comment; it's
better to just link to `defaultLog` as an example, since this avoids
writing the example implementation twice and prevents the example from
bitrotting.
`std.Io.tty.Config.detect` may be an expensive check (e.g. involving
syscalls), and doing it every time we need to print isn't really
necessary; under normal usage, we can compute the value once and cache
it for the whole program's execution. Since anyone outputting to stderr
may reasonably want this information (in fact they are very likely to),
it makes sense to cache it and return it from `lockStderrWriter`. Call
sites that do not need it will experience no significant overhead, and
can just ignore the TTY config with a `const w, _` destructure.