Everybody gets what they want!
* AT_RANDOM is completely ignored.
* On Linux, MADV_WIPEONFORK is used to provide fork safety.
* On pthread systems, `pthread_atfork` is used to provide fork safety.
* For systems that do not have the capability to provide fork safety,
the implementation falls back to calling getrandom() every time.
* If madvise is unavailable or returns an error, or pthread_atfork
fails for whatever reason, it falls back to calling getrandom() every
time.
* Applications may choose to opt-out of fork safety.
* Applications may choose to opt-in to unconditionally calling
getrandom() for every call to std.crypto.random.fillFn.
* Added `std.meta.globalOption`.
* Added `std.os.madvise` and related bits.
* Bumped up the size of the main thread TLS buffer. See the comment
there for justification.
* Simpler hot path in TLS initialization.
* get rid of the pointless fences
* make seed_len 16 instead of 32, which is accurate since it was
already padding the rest anyway; now we do 1 pad instead of 2.
* secureZero to clear the AT_RANDOM auxval
* add a flag root source files can use to disable the start code. This
is in case people want to opt out of the initialization when they
don't depend on it.
std.crypto.random
* cross platform, even freestanding
* can't fail. on initialization for some systems requires calling
os.getrandom(), in which case there are rare but theoretically
possible errors. The code panics in these cases, however the
application may choose to override the default seed function and then
handle the failure another way.
* thread-safe
* supports the full Random interface
* cryptographically secure
* no syscall required to initialize on Linux (AT_RANDOM)
* calls arc4random on systems that support it
`std.crypto.randomBytes` is removed in favor of `std.crypto.random.bytes`.
I moved some of the Random implementations into their own files in the
interest of organization.
stage2 no longer requires passing a RNG; instead it uses this API.
Closes#6704
Transforming scalars to non-adjacent form shrinks the number of
precomputations down to 8, while still processing 4 bits at a time.
However, real-world benchmarks show that the transform is only
really useful with large precomputation tables and for batch
signature verification. So, do it for batch verification only.
https://github.com/cfrg/draft-irtf-cfrg-hash-to-curve
This is quite an important feature to have since many other standards
being worked on depend on this operation.
Brings a couple useful arithmetic operations on field elements by the way.
This PR also adds comments to the functions we expose in 25519/field
so that they can appear in the generated documentation.
We currently have ciphers optimized for performance, for
compatibility, for size and for specific CPUs.
However we lack a class of ciphers that is becoming increasingly
important, as Zig is being used for embedded systems, but also as
hardware-level side channels keep being found on (Intel) CPUs.
Here is ISAPv2, a construction specifically designed for resilience
against leakage and fault attacks.
ISAPv2 is obviously not optimized for performance, but can be an
option for highly sensitive data, when the runtime environment cannot
be trusted.
This is a trivial implementation that just does a or[xor] loop.
However, this pattern is used by virtually all crypto libraries and
in practice, even without assembly barriers, LLVM never turns it into
code with conditional jumps, even if one of the parameters is constant.
This has been verified to still be the case with LLVM 11.0.0.
As documented in the comment right above the finalization function,
Gimli can be used as a XOF, i.e. the output doesn't have a fixed
length.
So, allow it to be used that way, just like BLAKE3.
With the simple rule that whenever we have or will have 2 similar
functions, they should be in their own namespace.
Some of these new namespaces currently contain a single function.
This is to prepare for reduced-round versions that are likely to
be added later.
We read and write bytes directly from the state, but in the init
function, we potentially endian-swap them.
Initialize bytes in native format since we will be reading them
in native format as well later.
Also use the public interface in the "permute" test rather than an
internal interface. The state itself is not meant to be accessed directly,
even in tests.
BLAKE2 includes the expected output length in the initial state.
This length is actually distinct from the actual output length
used at finalization.
BLAKE2b-256/128 is thus not the same as BLAKE2b-128.
This behavior can be a little bit surprising, and has been "fixed"
in BLAKE3.
In order to support this, we may want to provide an option to set the
length used for domain separation.
In Zig, there is another reason to allow this: we assume that the
output length is defined at comptime.
But BLAKE2 doesn't have a fixed output length. For an output length that
is not known at comptime, we can't take the full block size and
truncate it due to the reason above.
What we can do now is set that length as an option to get the correct
initial state, and truncate the output if necessary.
Leverage result location semantics for X25519 like we do everywhere
else in 25519/*
Also add the edwards25519->curve25519 map by the way since many
applications seem to use this to share the same key pair for encryption
and signature.
Intel keeps changing the latency & throughput of the aes* and clmul
instructions every time they release a new model.
Adjust `optimal_parallel_blocks` accordingly, keeping 8 as a safe
default for unknown data.
Gives a ~40% speedup on x86_64.
However, the generic code remains faster on aarch64.
This is still processing only one block at a time for now.
I'm pretty confident that processing more blocks per round
will eventually give a substantial performance improvement on
all platforms with vector units.
The bcrypt function intentionally requires quite a lot of CPU cycles
to complete.
In addition to that, not having its full state constantly in the
CPU L1 cache causes a massive performance drop.
These properties slow down brute-force attacks against low-entropy
inputs (typically passwords), and GPU-based attacks get little
to no advantages over CPUs.
The NaCl constructions are available in pretty much all programming
languages, making them a solid choice for applications that require
interoperability.
Go includes them in the standard library, JavaScript has the popular
tweetnacl.js module, and reimplementations and ports of TweetNaCl
have been made everywhere.
Zig has almost everything that NaCl has at this point, the main
missing component being the Salsa20 cipher, on top on which NaCl's
secretboxes, boxes, and sealedboxes can be implemented.
So, here they are!
And clean the X25519 API up a little bit by the way.
- use `PascalCase` for all types. So, AES256GCM is now Aes256Gcm.
- consistently use `_length` instead of mixing `_size` and `_length` for the
constants we expose
- Use `minimum_key_length` when it represents an actual minimum length.
Otherwise, use `key_length`.
- Require output buffers (for ciphertexts, macs, hashes) to be of the right
size, not at least of that size in some functions, and the exact size elsewhere.
- Use a `_bits` suffix instead of `_length` when a size is represented as a
number of bits to avoid confusion.
- Functions returning a constant-sized slice are now defined as a slice instead
of a pointer + a runtime assertion. This is the case for most hash functions.
- Use `camelCase` for all functions instead of `snake_case`.
No functional changes, but these are breaking API changes.