mainly this addresses the following use case:
1. Someone creates a template with build.zig.zon, id field included
(note that zig init does not create this problem since it generates
fresh id every time it runs).
2. User A uses the template, changing package name to "example" but not
id field.
3. User B uses the same template, changing package name also to
"example", also not changing the id field.
Here, both packages have unintentional conflicting logical ids.
By making the field a combination of name checksum + random id, this
accident is avoided. "nonce" is an OK name for this.
Also relaxes errors on remote packages when using `zig fetch`.
Introduces the `id` field to `build.zig.zon`.
Together with name, this represents a globally unique package
identifier. This field should be initialized with a 16-bit random number
when the package is first created, and then *never change*. This allows
Zig to unambiguously detect when one package is an updated version of
another.
When forking a Zig project, this id should be regenerated with a new
random number if the upstream project is still maintained. Otherwise,
the fork is *hostile*, attempting to take control over the original
project's identity.
`0x0000` is invalid because it obviously means a random number wasn't
used.
`0xffff` is reserved to represent "naked" packages.
Tracking issue #14288
Additionally:
* Fix bad path in error messages regarding build.zig.zon file.
* Manifest validates that `name` and `version` field of build.zig.zon
are maximum 32 bytes.
* Introduce error for root package to not switch to enum literal for
name.
* Introduce error for root package to omit `id`.
* Update init template to generate `id`
* Update init template to populate `minimum_zig_version`.
* New package hash format changes:
- name and version limited to 32 bytes via error rather than truncation
- truncate sha256 to 192 bits rather than 40 bits
- include the package id
This means that, given only the package hashes for a complete dependency
tree, it is possible to perform version selection and know the final
size on disk, without doing any fetching whatsoever. This prevents
wasted bandwidth since package versions not selected do not need to be
fetched.
I pointed a fuzzer at the tokenizer and it crashed immediately. Upon
inspection, I was dissatisfied with the implementation. This commit
removes several mechanisms:
* Removes the "invalid byte" compile error note.
* Dramatically simplifies tokenizer recovery by making recovery always
occur at newlines, and never otherwise.
* Removes UTF-8 validation.
* Moves some character validation logic to `std.zig.parseCharLiteral`.
Removing UTF-8 validation is a regression of #663, however, the existing
implementation was already buggy. When adding this functionality back,
it must be fuzz-tested while checking the property that it matches an
independent Unicode validation implementation on the same file. While
we're at it, fuzzing should check the other properties of that proposal,
such as no ASCII control characters existing inside the source code.
Other changes included in this commit:
* Deprecate `std.unicode.utf8Decode` and its WTF-8 counterpart. This
function has an awkward API that is too easy to misuse.
* Make `utf8Decode2` and friends use arrays as parameters, eliminating a
runtime assertion in favor of using the type system.
After this commit, the crash found by fuzzing, which was
"\x07\xd5\x80\xc3=o\xda|a\xfc{\x9a\xec\x91\xdf\x0f\\\x1a^\xbe;\x8c\xbf\xee\xea"
no longer causes a crash. However, I did not feel the need to add this
test case because the simplified logic eradicates most crashes of this
nature.
The surrogate code points U+D800 to U+DFFF are valid code points but are not Unicode scalar values. This commit makes the error message more accurately reflect what is actually allowed in `\u` escape sequences.
From https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf:
> D71 High-surrogate code point: A Unicode code point in the range U+D800 to U+DBFF.
> D73 Low-surrogate code point: A Unicode code point in the range U+DC00 to U+DFFF.
>
> 3.9 Unicode Encoding Forms
> D76 Unicode scalar value: Any Unicode code point except high-surrogate and low-surrogate code points.
Related: #20270
Build manifest files support `lazy: true` for dependency sections.
This causes the auto-generated dependencies.zig to have 2 more
possibilities:
1. It communicates whether a dependency is lazy or not.
2. The dependency might be acknowledged, but missing due to being lazy
and not fetched.
Lazy dependencies are not fetched by default, but if they are already
fetched then they are provided to the build script.
The build runner reports the set of missing lazy dependenices that are
required to the parent process via stdout and indicates the situation
with exit code 3.
std.Build now has a `lazyDependency` function. I'll let the doc comments
speak for themselves:
When this function is called, it means that the current build does, in
fact, require this dependency. If the dependency is already fetched, it
proceeds in the same manner as `dependency`. However if the dependency
was not fetched, then when the build script is finished running, the
build will not proceed to the make phase. Instead, the parent process
will additionally fetch all the lazy dependencies that were actually
required by running the build script, rebuild the build script, and then
run it again.
In other words, if this function returns `null` it means that the only
purpose of completing the configure phase is to find out all the other
lazy dependencies that are also required.
It is allowed to use this function for non-lazy dependencies, in which
case it will never return `null`. This allows toggling laziness via
build.zig.zon without changing build.zig logic.
The CLI for `zig build` detects this situation, but the logic for then
redoing the build process with these extra dependencies fetched is not
yet implemented.