`send_zc` tries to avoid making intermediate copies of data. Zerocopy execution
is not guaranteed and may fall back to copying.
The flags field of the first struct io_uring_cqe may likely contain
IORING_CQE_F_MORE , which means that there will be a second completion event /
notification for the request, with the user_data field set to the same value.
The user must not modify the data buffer until the notification is posted. The
first cqe follows the usual rules and so its res field will contain the number
of bytes sent or a negative error code. The notification's res field will be set
to zero and the flags field will contain IORING_CQE_F_NOTIF. The two step model
is needed because the kernel may hold on to buffers for a long time, e.g.
waiting for a TCP ACK, and having a separate cqe for request completions allows
userspace to push more data without extra delays. Note, notifications are only
responsible for controlling the lifetime of the buffers, and as such don't mean
anything about whether the data has atually been sent out or received by the
other end. Even errored requests may generate a notification, and the user must
check for IORING_CQE_F_MORE rather than relying on the result.
Available since kernel 6.0.
References:
https://man7.org/linux/man-pages/man3/io_uring_prep_send_zc.3.htmlhttps://man7.org/linux/man-pages/man2/io_uring_enter.2.html
There is no grantee that `copy_cqes` will return exactly wait_nr number of cqes.
If there are ready cqes it can return > 0 but < wait_nr number of cqes.
Server networking application typically accept multiple connections. Multishot
accept simplifies handling these situations. Applications submits once and
receives CQE whenever a new connection request comes in.
Multishot is active until it is canceled or experience error. While active, and
further notification are expected CQE completion will have IORING_CQE_F_MORE set
in the flags. If this flag isn't set, the application must re-arm this request
by submitting a new one.
Reference: [io_uring and networking in 2023](https://github.com/axboe/liburing/wiki/io_uring-and-networking-in-2023#multi-shot)
This reverts commit 0c99ba1eab, reversing
changes made to 5f92b070bf.
This caused a CI failure when it landed in master branch due to a
128-bit `@byteSwap` in std.mem.
Most of this migration was performed automatically with `zig fmt`. There
were a few exceptions which I had to manually fix:
* `@alignCast` and `@addrSpaceCast` cannot be automatically rewritten
* `@truncate`'s fixup is incorrect for vectors
* Test cases are not formatted, and their error locations change
* io_uring: fix the timeout_remove test
The test does a IORING_OP_TIMEOUT followed with a IORING_OP_TIMEOUT_REMOVE
and assumed we would get the CQEs in the same order.
Linux v5.18 changed how this works and we now get them in the reverse order.
The documentation doesn't explicitly say which CQE we should get first
so just make the test work with both cases.
* io_uring: fix the remove_buffers test
The original test was buggy but accidentally worked with kernels < 5.18
The test assumed that IORING_OP_REMOVE_BUFFERS removed from the start of
but in fact the documentation doesn't specify which buffer is removed,
only that a certain number of buffers are removed.
Starting with the kernel 5.18 the check for the `used_buffer_id` fails.
Turns out that previous kernels removed buffers in such a way that the
remaining buffer for this read would always be 0, however this isn't
true anymore.
Instead of checking a specific value just check that the `used_buffer_id`
corresponds to a valid ID.
alongside the typical msghdr struct, Zig has added a msghdr_const
type that can be used with sendmsg which allows const data to
be provided. I believe that data pointed to by the iov and control
fields in msghdr are also left unmodified, in which case they can
be marked const as well.
readv() is essentially identical to read() except for the buffer type,
this simplifies the API for the caller at the cost of not clearly mapping to the liburing C API.
Reads can be done in two ways with io_uring:
* using a simple buffer
* using a automatic buffer selection which requires the user to have
provided a number of buffers before
ReadBuffer let's the caller choose where the data should be read.
* os/linux/io_uring: add recvmsg and sendmsg
* Use std.os.iovec and std.os.iovec_const
* Remove msg_ prefix in msghdr and msghdr_const in arm64 etc
* Strip msg_ prefix in msghdr and msghdr_const for linux arm-eabi
* Copy msghdr and msghdr_const from i386 to mips
* Add sockaddr to lib/std/os/linux/mips.zig
* Copy msghdr and msghdr_const from x86_64 to riscv64
copy_cqes() is not guaranteed to return as many CQEs as provided in the
`wait_nr` argument, meaning the assert in `copy_cqe` can trigger.
Instead, loop until we do get at least one CQE returned.
This mimics the behaviour of liburing's _io_uring_get_cqe.
For renameat, unlinkat, mkdirat, symlinkat and linkat the error code
differs between kernel 5.4 which returns EBADF and kernel 5.10 which returns EINVAL.
Fixes#10466