While it is already mentioned on the `items` attributes of the structs, it is
interesting to comment in every method potentially invalidating pointers to items
that they may do so.
- the meaning of packed structs changed in zig 0.10. adjust accordingly.
Use "extern struct" for the cases that directly map to C structs.
- Add new type info kinds, like enum64 and DeclTag
- change the Type enum to use the canonical names from libbpf.
This is more predictable when comparing with external BPF
documentation (than invented synonyms that need to be guessed)
* crypto.core.aes: process 6 block in parallel instead of 8 on aarch64
At least on Apple Silicon, this is slightly faster than 8 blocks.
* AES: add parallel blocks for tigerlake, rocketlake, alderlake, zen3
...instead of hard-coding it to 20.
- This is consistent with the ChaCha implementation
- NaCl and libsodium, that this API is designed to interop with,
also support 8 and 12 round variants. The 12 round variant, in
particular, provides the same security level as the 20 round variant,
but is obviously faster.
- scrypt currently uses its own non optimized version of Salsa, just
because it use 8 rounds instead of 20. This will help remove code
duplication.
No behavior nor public API changes. The Salsa20 and XSalsa20 still
represent the 20-round variant.
If the noise parameter was null, we didn't use any noise at all.
We unconditionally generated random noise (`noise2`) but didn't use it.
Spotted by @cryptocode, thanks!
Instead of making the memory alignment functions more complicated, I
added more API documentation for their existing semantics.
closes#12118closes#12135
* std.os.uefi: integer backed structs, add tests to catch regressions
device_path_protocol now uses extern structs with align(1) fields because
the transition to integer backed packed struct broke alignment
added comptime asserts that device_path_protocol structs do not violate
alignment and size specifications
Make the test use the minimum length and set MAX_NAME_BYTES to the maximum so that:
- the test will work on any host platform
- *and* the MAX_NAME_BYTES will be able to hold the max file name component on any host platform
Each u16 within a file name component can be encoded as up to 3 UTF-8 bytes, so we need to use MAX_NAME_BYTES to account for all possible UTF-8 encoded names.
Fixes#8268
Comptime code can't execute assembly code, so we need some way to
force comptime code to use the generic path. This should be replaced
with whatever is implemented for #868, when that day comes.
I am seeing that the result for the hash is incorrect in stage1 and
crashes stage2, so presumably this never worked correctly. I will follow
up on that soon.
This gets us most of the way back to the performance I had when
I was using the LLVM intrinsics:
- Intel Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz:
190.67 MB/s (w/o intrinsics) -> 1285.08 MB/s
- AMD EPYC 7763 (VM) @ 2.45 GHz:
240.09 MB/s (w/o intrinsics) -> 1360.78 MB/s
- Apple M1:
216.96 MB/s (w/o intrinsics) -> 2133.69 MB/s
Minor changes to this source can swing performance from 400 MB/s to
1400 MB/s or... 20 MB/s, depending on how it interacts with the
optimizer. I have a sneaking suspicion that despite LLVM inheriting
GCC's extremely strict inline assembly semantics, its passes are
rather skittish around inline assembly (and almost certainly, its
instruction cost models can assume nothing)
There's probably plenty of room to optimize these further in the
future, but for the moment this gives ~3x improvement on Intel
x86-64 processors, ~5x on AMD, and ~10x on M1 Macs.
These extensions are very new - Most processors prior to 2020 do
not support them.
AVX-512 is a slightly older alternative that we could use on Intel
for a much bigger performance bump, but it's been fused off on
Intel's latest hybrid architectures and it relies on computing
independent SHA hashes in parallel. In contrast, these SHA intrinsics
provide the usual single-threaded, single-stream interface, and should
continue working on new processors.
AArch64 also has SHA-512 intrinsics that we could take advantage
of in the future