This reverts commit 0c99ba1eab, reversing
changes made to 5f92b070bf.
This caused a CI failure when it landed in master branch due to a
128-bit `@byteSwap` in std.mem.
This accomplishes two things:
* Works around #8442 by putting stage1-specific logic in to disable all
the std.json tests.
* Slightly reduces installation size of zig since std lib files ending
in "test.zig" are excluded from being installed.
We already have a LICENSE file that covers the Zig Standard Library. We
no longer need to remind everyone that the license is MIT in every single
file.
Previously this was introduced to clarify the situation for a fork of
Zig that made Zig's LICENSE file harder to find, and replaced it with
their own license that required annual payments to their company.
However that fork now appears to be dead. So there is no need to
reinforce the copyright notice in every single file.
* Switch json testing 'roundTrip()' to use FixedBufferStream, improve error handling, remove comptime from param
* Add 'try' to calls to roundTrip() that can now return an error
* Remove comptime from params in json testing, replace expect(false) with letting error propagate
* Add 'try' to calls to ok() that can now return an error
Co-authored-by: Lewis Gaul <legaul@cisco.com>
Fixes#2379
= Overlong (non-shortest) sequences
UTF-8's unique encoding scheme allows for some Unicode codepoints
to be represented in multiple ways. For any of these characters,
the spec forbids all but the shortest form. These disallowed longer
sequences are called "overlong". As an interesting side effect of
this rule, the bytes C0 and C1 never appear in valid UTF-8.
= Codepoint range
UTF-8 disallows representation of codepoints beyond U+10FFFF,
which is the highest character which can be encoded in UTF-16.
Because a 4-byte sequence is capable of resulting in such characters,
they must be explicitly rejected. This rule also has an interesting
side effect, which is that bytes F5 to FF never appear.
= References
Detecting an overlong version of a codepoint could get gnarly, but
luckily The Unicode Consortium did the hard work by creating this
handy table of valid byte sequences:
https://unicode.org/versions/corrigendum1.html
I thought this mapped nicely to the parser's state machine, so I
rearranged the relevant states to make use of it.