The inverse MixColumns operation is already used internally for
AES decryption, but it wasn’t exposed in the public API because
it didn’t seem necessary at the time.
Since then, several new AES-based block ciphers and permutations
(such as Vistrutah and Areion) have been developed, and they require
this operation to be implementable in Zig.
Since then, new interesting AES-based block ciphers and permutations
(Vistrutah, Areion, etc). have been invented, and require that
operation to be implementable in Zig.
* std.crypto.aes: introduce AES block vectors
Modern Intel CPUs with the VAES extension can handle more than a
single AES block per instruction.
So can some ARM and RISC-V CPUs. Software implementations with
bitslicing can also greatly benefit from this.
Implement low-level operations on AES block vectors, and the
parallel AEGIS variants on top of them.
AMD Zen4:
aegis-128x4: 73225 MiB/s
aegis-128x2: 51571 MiB/s
aegis-128l: 25806 MiB/s
aegis-256x4: 46742 MiB/s
aegis-256x2: 30227 MiB/s
aegis-256: 8436 MiB/s
aes128-gcm: 5926 MiB/s
aes256-gcm: 5085 MiB/s
AES-GCM, and anything based on AES-CTR are also going to benefit
from this later.
* Make AEGIS-MAC twice a fast
Apple M1/M2 have an EOR3 instruction that can XOR 2 operands with
another one, and LLVM knows how to take advantage of it.
However, two EOR can't be automatically combined into an EOR3 if
one of them is in an assembly block.
That simple change speeds up ciphers doing an AES round immediately
followed by a XOR operation on Apple Silicon.
Before:
aegis-128l mac: 12534 MiB/s
aegis-256 mac: 6722 MiB/s
aegis-128l: 10634 MiB/s
aegis-256: 6133 MiB/s
aes128-gcm: 3890 MiB/s
aes256-gcm: 3122 MiB/s
aes128-ocb: 2832 MiB/s
aes256-ocb: 2057 MiB/s
After:
aegis-128l mac: 15667 MiB/s
aegis-256 mac: 8240 MiB/s
aegis-128l: 12656 MiB/s
aegis-256: 7214 MiB/s
aes128-gcm: 3976 MiB/s
aes256-gcm: 3202 MiB/s
aes128-ocb: 2835 MiB/s
aes256-ocb: 2118 MiB/s
* crypto.core.aes: process 6 block in parallel instead of 8 on aarch64
At least on Apple Silicon, this is slightly faster than 8 blocks.
* AES: add parallel blocks for tigerlake, rocketlake, alderlake, zen3
We already have a LICENSE file that covers the Zig Standard Library. We
no longer need to remind everyone that the license is MIT in every single
file.
Previously this was introduced to clarify the situation for a fork of
Zig that made Zig's LICENSE file harder to find, and replaced it with
their own license that required annual payments to their company.
However that fork now appears to be dead. So there is no need to
reinforce the copyright notice in every single file.
- use `PascalCase` for all types. So, AES256GCM is now Aes256Gcm.
- consistently use `_length` instead of mixing `_size` and `_length` for the
constants we expose
- Use `minimum_key_length` when it represents an actual minimum length.
Otherwise, use `key_length`.
- Require output buffers (for ciphertexts, macs, hashes) to be of the right
size, not at least of that size in some functions, and the exact size elsewhere.
- Use a `_bits` suffix instead of `_length` when a size is represented as a
number of bits to avoid confusion.
- Functions returning a constant-sized slice are now defined as a slice instead
of a pointer + a runtime assertion. This is the case for most hash functions.
- Use `camelCase` for all functions instead of `snake_case`.
No functional changes, but these are breaking API changes.