Clang fails to compile the CBE translation of this code ("non-ASM
statement in naked function"). Similar to the implementations of
`restore_rt` on x86 and ARM, when the CBE is in use, this commit employs
alternative inline assembly that avoids using non-immediate input
operands.
Fixes#25209.
On PowerPC, some registers are both inputs to syscalls and clobbered by
them. An example is r0, which initially contains the syscall number, but
may be overwritten during execution of the syscall.
musl and glibc use a `+` (read-write) constraint to indicate this, which
isn't supported in Zig. The current implementation of PowerPC syscalls
in the Zig standard library instead lists these registers as both inputs
and clobbers, but this results in the C backend generating code that is
invalid for at least some C compilers, like GCC, which doesn't support
the specifying the same register as both an input and a clobber.
This PR changes the PowerPC syscall functions to list such registers as
inputs and outputs rather than inputs and clobbers. Thanks to jacobly0
who pointed out that it's possible to have multiple outputs; I had
gotten the wrong idea from the documentation.
The generic syscall table has different names for syscalls that take a
timespec64 on 32-bit targets, in that it adds the `_time64` suffix.
Similarly, the `_time32` suffix has been removed.
I'm not sure if the existing logic for determining the proper timespec
struct to use was subtly broken, but it should be a good chance to
finish #4726 - we only have 12 years after all...
As for the changes since 6.11..6.16:
6.11:
- x86_64 gets `uretprobe`, a syscall to speed up returning BPF probes.
- Hexagon gets `clone3`, but don't be fooled: it just returns ENOSYS.
6.13:
- The `*xattr` family of syscalls have been enhanced with new `*xattrat`
versions, similar to the other file-based `at` calls.
6.15:
- Atomically create a detached mount tree and set mount options on it.
Finally, this commit also adds the syscall numbers for OpenRISC and maps
it to the `or1k` cpu.
The `atime()`, etc wrappers here expect to create a `std.linux.timespec`
(defined in `linux.zig` to have `isize` fields), so the u32 causes errors:
error: expected type 'isize', found 'u32'
.nsec = self.atim_nsec,
Make the nsec fields signed for consistency with all the other structs,
with and with `std.linux.timespec`.
Also looks like the comment on `__pad1` was copied from `__pad0`, but it
only applies to `__pad0`.
LLVM always assumes these are on. Zig backends do not observe them.
If Zig backends want to start using them, they can be introduced, one
arch at a time, with proper documentation.
Macos uses the BSD definition of msghdr
All linux architectures share a single msghdr definition. Many
architectures had manually inserted padding fields that were endian
specific and some had fields with different integers. This unifies all
architectures to use a single correct msghdr definition.
musl and glibc both specify r0 as an output register because its value
may be overwritten by system calls. As with the updates for 64-bit
PowerPC in the previous commit, this commit brings Zig's syscall
functions for 32-bit PowerPC in line with musl and glibc by adding r0 to
the list of clobbers. (Listing r0 as both an input and a clobber is as
close as we can get to musl, which declares it as a "+r" read-write
output, since Zig doesn't support multiple outputs or the "+"
specifier.)
On powerpc64le Linux, the registers used for passing syscall parameters
(r4-r8, as well as r0 for the syscall number) are volatile, or
caller-saved. However, Zig's syscall wrappers for this architecture do
not include all such registers in the list of clobbers, leading the
compiler to assume these registers will maintain their values after the
syscall completes.
In practice, this resulted in a segfault when allocating memory with
`std.heap.SmpAllocator`, which calls `std.os.linux.sched_getaffinity`.
The third parameter to `sched_getaffinity` is a pointer to a `cpu_set_t`
and is stored in register r5. After the syscall, the code attempts to
access data in the `cpu_set_t`, but because the compiler doesn't realize
the value of r5 may have changed, it uses r5 as the memory address, which
in practice resulted in a memory access at address 0x8.
This commit adds all volatile registers to the list of clobbers.
* `futex2_waitv` always takes a 64-bit timespec. Perhaps the
`kernel_timespec` should be renamed `timespec64`? Its used in iouring,
too.
* Add `packed struct` for futex v2 flags and parameters.
* Add very basic "tests" for the futex v2 syscalls (just to ensure the
code compiles).
* Update the stale or broken comments. (I could also just delete these
they're not really documenting Zig-specific behavior.)
Given that the futex2 APIs are not used by Zig's library (they're a bit
too new), and the fact that these are very specialized syscalls, and they
currently provide no benefit over the existing v1 API, I wonder if instead
of fixing these up, we should just replace them with a stub that says 'use
a 3rd party library'.
* Use `packed struct` for flags arguments. So, instead of
`linux.FUTEX.WAIT` use `.{ .cmd = .WAIT, .private = true }`
* rename `futex_wait` and `futex_wake` which didn't actually specify
wait/wake, as `futex_3arg` and `futex_4arg` (as its the number
of parameters that is different, the `op` is whatever is specified.
* expose the full six-arg flavor of the syscall (for some of the advanced
ops), and add packed structs for their arguments.
* Use a `packed union` to support the 4th parameter which is sometimes a
`timespec` pointer, and sometimes a `u32`.
* Add tests that make sure the structure layout is correct and that the
basic argument passing is working (no actual futexes are contended).
This code applies to ~any POSIX OS where we don't link libc. For example, it'll
be useful for FreeBSD and NetBSD.
As part of this, move std.os.linux.pie to std.pie since there's really nothing
Linux-specific about what that file is doing.
All the existing code that manipulates `ucontext_t` expects there to be a
glibc-compatible sigmask (1024-bit). The `ucontext_t` struct need to be
cleaned up so the glibc-dependent format is only used when linking
glibc/musl library, but that is a more involved change.
In practice, no Zig code looks at the sigset field contents, so it just
needs to be the right size.
By returning an initialized sigset (instead of taking the set as an output
parameter), these functions can be used to directly initialize the `mask`
parameter of a `Sigaction` instance.
The kernel ABI sigset_t is smaller than the glibc one. Define the
right-sized sigset_t and fixup the sigaction() wrapper to leverage it.
The Sigaction wrapper here is not an ABI, so relax it (drop the "extern"
and the "restorer" fields), the existing `k_sigaction` is the ABI
sigaction struct.
Linux defines `sigset_t` with a c_ulong, so it can be 32-bit or 64-bit,
depending on the platform. This can make a difference on big-endian
systems.
Patch up `ucontext_t` so that this change doesn't impact its layout.
AFAICT, its currently the glibc layout.
The code was using u32 and usize interchangably, which doesn't work on
64-bit systems. This:
`pub const sigset_t = [1024 / 32]u32;`
is not consistent with this:
`const shift = @as(u5, @intCast(s & (usize_bits - 1)));`
However, normal signal numbers are less than 31, so the bad math doesn't matter much. Also, despite support for 1024 signals in the set, only setting signals between 1 and NSIG (which is mostly 65, but sometimes 128) is defined. The existing tests only exercised signal numbers in the first 31 bits so they didn't trip over this:
The C library `sigaddset` will return `EINVAL` if given an out of bounds signal number. I made the Zig code just silently ignore any out of bounds signal numbers.
Moved all the `sigset` related declarations next to each in the source, too.
The `filled_sigset` seems non-standard to me. I think it is meant to be used like `empty_sigset`, but it only contains 31 set signals, which seems wrong (should be 64 or 128, aka `NSIG`). It's also unused. The oddly named but similar `all_mask` is used (by posix.zig) but sets all 1024 bits (which I understood to be undefined behavior but seems to work just fine). For comparison the musl `sigfillset` fills in 65 bits or 128 bits.
This PR consistently maps .ACCES into AccessDenied and .PERM into
PermissionDenied. AccessDenied is returned if the file mode bit
(user/group/other rwx bits) disallow access (errno was `EACCES`).
PermissionDenied is returned if something else denies access (errno was
`EPERM`) (immutable bit, SELinux, capabilities, etc). This somewhat
subtle distinction is a POSIX thing.
Most of the change is updating std.posix Error Sets to contain both
errors, and then propagating the pair up through caller Error Sets.
Fixes#16782
[Incremental provided buffer
consumption](https://github.com/axboe/liburing/wiki/What's-new-with-io_uring-in-6.11-and-6.12#incremental-provided-buffer-consumption)
support is added in kernel 6.12.
IoUring.BufferGroup will now use incremental consumption whenever
kernel supports it.
Before, provided buffers are wholly consumed when picked. Each cqe
points to the different buffer. With this, cqe points to the part of the
buffer. Multiple cqe's can reuse same buffer.
Appropriate sizing of buffers becomes less important.
There are slight changes in BufferGroup interface (it now needs to track
current receive point for each buffer). Init requires allocator
instead of buffers slice, it will allocate buffers slice and head
pointers slice. Get and put now requires cqe becasue there we have
information will the buffer be reused.
ring.cmd_sock is generic socket operation. Two most common uses are
setsockopt and getsockopt. This provides same interface as posix
versions of this methods.
libring has also [sqe_set_flags](https://man7.org/linux/man-pages/man3/io_uring_sqe_set_flags.3.html)
method. Adding that in our io_uring_sqe. Adding sqe.link_next method for setting most common flag.
* fix merge conflicts
* rename the declarations
* reword documentation
* extract FixedBufferAllocator to separate file
* take advantage of locals
* remove the assertion about max alignment in Allocator API, leaving it
Allocator implementation defined
* fix non-inline function call in start logic
The GeneralPurposeAllocator implementation is totally broken because it
uses global state but I didn't address that in this commit.
heap.zig: define new default page sizes
heap.zig: add min/max_page_size and their options
lib/std/c: add miscellaneous declarations
heap.zig: add pageSize() and its options
switch to new page sizes, especially in GPA/stdlib
mem.zig: remove page_size