<feed xmlns='http://www.w3.org/2005/Atom'>
<title>lwn.git/io_uring, branch v6.7-rc2</title>
<subtitle>Linux kernel documentation tree maintained by Jonathan Corbet</subtitle>
<id>http://mirrors.hust.edu.cn/git/lwn.git/atom?h=v6.7-rc2</id>
<link rel='self' href='http://mirrors.hust.edu.cn/git/lwn.git/atom?h=v6.7-rc2'/>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/'/>
<updated>2023-11-15T13:35:46+00:00</updated>
<entry>
<title>io_uring/fdinfo: remove need for sqpoll lock for thread/pid retrieval</title>
<updated>2023-11-15T13:35:46+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2023-11-14T16:55:50+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=a0d45c3f596be53c1bd8822a1984532d14fdcea9'/>
<id>urn:sha1:a0d45c3f596be53c1bd8822a1984532d14fdcea9</id>
<content type='text'>
A previous commit added a trylock for getting the SQPOLL thread info via
fdinfo, but this introduced a regression where we often fail to get it if
the thread is busy. For that case, we end up not printing the current CPU
and PID info.

Rather than rely on this lock, just print the pid we already stored in
the io_sq_data struct, and ensure we update the current CPU every time
we've slept or potentially rescheduled. The latter won't potentially be
100% accurate, but that wasn't the case before either as the task can
get migrated at any time unless it has been pinned at creation time.

We retain keeping the io_sq_data dereference inside the ctx-&gt;uring_lock,
as it has always been, as destruction of the thread and data happen below
that. We could make this RCU safe, but there's little point in doing that.

With this, we always print the last valid information we had, rather than
have spurious outputs with missing information.

Fixes: 7644b1a1c9a7 ("io_uring/fdinfo: lock SQ thread while retrieving thread cpu/pid")
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring: do not clamp read length for multishot read</title>
<updated>2023-11-06T20:41:58+00:00</updated>
<author>
<name>Dylan Yudaken</name>
<email>dyudaken@gmail.com</email>
</author>
<published>2023-11-06T20:39:09+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=e53759298a7d7e98c3e5c2440d395d19cea7d6bf'/>
<id>urn:sha1:e53759298a7d7e98c3e5c2440d395d19cea7d6bf</id>
<content type='text'>
When doing a multishot read, the code path reuses the old read
paths. However this breaks an assumption built into those paths,
namely that struct io_rw::len is available for reuse by __io_import_iovec.

For multishot this results in len being set for the first receive
call, and then subsequent calls are clamped to that buffer length
incorrectly.

Instead keep len as zero after recycling buffers, to reuse the full
buffer size of the next selected buffer.

Fixes: fc68fcda0491 ("io_uring/rw: add support for IORING_OP_READ_MULTISHOT")
Signed-off-by: Dylan Yudaken &lt;dyudaken@gmail.com&gt;
Link: https://lore.kernel.org/r/20231106203909.197089-4-dyudaken@gmail.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring: do not allow multishot read to set addr or len</title>
<updated>2023-11-06T20:41:58+00:00</updated>
<author>
<name>Dylan Yudaken</name>
<email>dyudaken@gmail.com</email>
</author>
<published>2023-11-06T20:39:08+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=49fbe99486786661994a55ced855c31d966bbdf0'/>
<id>urn:sha1:49fbe99486786661994a55ced855c31d966bbdf0</id>
<content type='text'>
For addr: this field is not used, since buffer select is forced.
But by forcing it to be zero it leaves open future uses of the field.

len is actually usable, you could imagine that you want to receive
multishot up to a certain length.
However right now this is not how it is implemented, and it seems
safer to force this to be zero.

Fixes: fc68fcda0491 ("io_uring/rw: add support for IORING_OP_READ_MULTISHOT")
Signed-off-by: Dylan Yudaken &lt;dyudaken@gmail.com&gt;
Link: https://lore.kernel.org/r/20231106203909.197089-3-dyudaken@gmail.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring: indicate if io_kbuf_recycle did recycle anything</title>
<updated>2023-11-06T20:41:58+00:00</updated>
<author>
<name>Dylan Yudaken</name>
<email>dyudaken@gmail.com</email>
</author>
<published>2023-11-06T20:39:07+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=89d528ba2f8281de61163c6b62e598b64d832175'/>
<id>urn:sha1:89d528ba2f8281de61163c6b62e598b64d832175</id>
<content type='text'>
It can be useful to know if io_kbuf_recycle did actually recycle the
buffer on the request, or if it left the request alone.

Signed-off-by: Dylan Yudaken &lt;dyudaken@gmail.com&gt;
Link: https://lore.kernel.org/r/20231106203909.197089-2-dyudaken@gmail.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/rw: add separate prep handler for fixed read/write</title>
<updated>2023-11-06T14:43:16+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2023-11-06T14:43:16+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=f688944cfb810986c626cb13d95bc666e5c8a36c'/>
<id>urn:sha1:f688944cfb810986c626cb13d95bc666e5c8a36c</id>
<content type='text'>
Rather than sprinkle opcode checks in the generic read/write prep handler,
have a separate prep handler for the vectored readv/writev operation.

Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/rw: add separate prep handler for readv/writev</title>
<updated>2023-11-06T14:41:17+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2023-11-06T14:41:17+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=0e984ec88da9747549227900e5215c5e6a1b65ae'/>
<id>urn:sha1:0e984ec88da9747549227900e5215c5e6a1b65ae</id>
<content type='text'>
Rather than sprinkle opcode checks in the generic read/write prep handler,
have a separate prep handler for the vectored readv/writev operation.

Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/net: ensure socket is marked connected on connect retry</title>
<updated>2023-11-03T19:25:50+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2023-11-03T16:35:40+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=f8f9ab2d98116e79d220f1d089df7464ad4e026d'/>
<id>urn:sha1:f8f9ab2d98116e79d220f1d089df7464ad4e026d</id>
<content type='text'>
io_uring does non-blocking connection attempts, which can yield some
unexpected results if a connect request is re-attempted by an an
application. This is equivalent to the following sync syscall sequence:

sock = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, IPPROTO_TCP);
connect(sock, &amp;addr, sizeof(addr);

ret == -1 and errno == EINPROGRESS expected here. Now poll for POLLOUT
on sock, and when that returns, we expect the socket to be connected.
But if we follow that procedure with:

connect(sock, &amp;addr, sizeof(addr));

you'd expect ret == -1 and errno == EISCONN here, but you actually get
ret == 0. If we attempt the connection one more time, then we get EISCON
as expected.

io_uring used to do this, but turns out that bluetooth fails with EBADFD
if you attempt to re-connect. Also looks like EISCONN _could_ occur with
this sequence.

Retain the -&gt;in_progress logic, but work-around a potential EISCONN or
EBADFD error and only in those cases look at the sock_error(). This
should work in general and avoid the odd sequence of a repeated connect
request returning success when the socket is already connected.

This is all a side effect of the socket state being in a CONNECTING
state when we get EINPROGRESS, and only a re-connect or other related
operation will turn that into CONNECTED.

Cc: stable@vger.kernel.org
Fixes: 3fb1bd688172 ("io_uring/net: handle -EINPROGRESS correct for IORING_OP_CONNECT")
Link: https://github.com/axboe/liburing/issues/980
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>io_uring/rw: don't attempt to allocate async data if opcode doesn't need it</title>
<updated>2023-11-03T15:31:21+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2023-11-03T15:26:13+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=0df96fb71a395b4fc9c80180306420c743f395a8'/>
<id>urn:sha1:0df96fb71a395b4fc9c80180306420c743f395a8</id>
<content type='text'>
The new read multishot method doesn't need to allocate async data ever,
as it doesn't do vectored IO and it must only be used with provided
buffers. While it doesn't have -&gt;prep_async() set, it also sets
-&gt;async_size to 0, which is different from any other read/write type we
otherwise support.

If it's used on a file type that isn't pollable, we do try and allocate
this async data, and then try and use that data. But since we passed in
a size of 0 for the data, we get a NULL back on data allocation. We then
proceed to dereference that to copy state, and that obviously won't end
well.

Add a check in io_setup_async_rw() for this condition, and avoid copying
state. Also add a check for whether or not buffer selection is specified
in prep while at it.

Fixes: fc68fcda0491 ("io_uring/rw: add support for IORING_OP_READ_MULTISHOT")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218101
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>Merge tag 'io_uring-futex-2023-10-30' of git://git.kernel.dk/linux</title>
<updated>2023-11-01T21:25:08+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-11-01T21:25:08+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=4de520f1fcefd4ebb7dddcf28bde1b330c2f6b5d'/>
<id>urn:sha1:4de520f1fcefd4ebb7dddcf28bde1b330c2f6b5d</id>
<content type='text'>
Pull io_uring futex support from Jens Axboe:
 "This adds support for using futexes through io_uring - first futex
  wake and wait, and then the vectored variant of waiting, futex waitv.

  For both wait/wake/waitv, we support the bitset variant, as the
  'normal' variants can be easily implemented on top of that.

  PI and requeue are not supported through io_uring, just the above
  mentioned parts. This may change in the future, but in the spirit of
  keeping this small (and based on what people have been asking for),
  this is what we currently have.

  Wake support is pretty straight forward, most of the thought has gone
  into the wait side to avoid needing to offload wait operations to a
  blocking context. Instead, we rely on the usual callbacks to retry and
  post a completion event, when appropriate.

  As far as I can recall, the first request for futex support with
  io_uring came from Andres Freund, working on postgres. His aio rework
  of postgres was one of the early adopters of io_uring, and futex
  support was a natural extension for that. This is relevant from both a
  usability point of view, as well as for effiency and performance. In
  Andres's words, for the former:

     Futex wait support in io_uring makes it a lot easier to avoid
     deadlocks in concurrent programs that have their own buffer pool:
     Obviously pages in the application buffer pool have to be locked
     during IO. If the initiator of IO A needs to wait for a held lock
     B, the holder of lock B might wait for the IO A to complete. The
     ability to wait for a lock and IO completions at the same time
     provides an efficient way to avoid such deadlocks

  and in terms of effiency, even without unlocking the full potential
  yet, Andres says:

     Futex wake support in io_uring is useful because it allows for more
     efficient directed wakeups. For some "locks" postgres has queues
     implemented in userspace, with wakeup logic that cannot easily be
     implemented with FUTEX_WAKE_BITSET on a single "futex word"
     (imagine waiting for journal flushes to have completed up to a
     certain point).

     Thus a "lock release" sometimes need to wake up many processes in a
     row. A quick-and-dirty conversion to doing these wakeups via
     io_uring lead to a 3% throughput increase, with 12% fewer context
     switches, albeit in a fairly extreme workload"

* tag 'io_uring-futex-2023-10-30' of git://git.kernel.dk/linux:
  io_uring: add support for vectored futex waits
  futex: make the vectored futex operations available
  futex: make futex_parse_waitv() available as a helper
  futex: add wake_data to struct futex_q
  io_uring: add support for futex wake and wait
  futex: abstract out a __futex_wake_mark() helper
  futex: factor out the futex wake handling
  futex: move FUTEX2_VALID_MASK to futex.h
</content>
</entry>
<entry>
<title>Merge tag 'for-6.7/io_uring-sockopt-2023-10-30' of git://git.kernel.dk/linux</title>
<updated>2023-11-01T21:16:34+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-11-01T21:16:34+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=f5277ad1e9768dbd05b1ae8dcdba690215d8c5b7'/>
<id>urn:sha1:f5277ad1e9768dbd05b1ae8dcdba690215d8c5b7</id>
<content type='text'>
Pull io_uring {get,set}sockopt support from Jens Axboe:
 "This adds support for using getsockopt and setsockopt via io_uring.

  The main use cases for this is to enable use of direct descriptors,
  rather than first instantiating a normal file descriptor, doing the
  option tweaking needed, then turning it into a direct descriptor. With
  this support, we can avoid needing a regular file descriptor
  completely.

  The net and bpf bits have been signed off on their side"

* tag 'for-6.7/io_uring-sockopt-2023-10-30' of git://git.kernel.dk/linux:
  selftests/bpf/sockopt: Add io_uring support
  io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT
  io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT
  io_uring/cmd: return -EOPNOTSUPP if net is disabled
  selftests/net: Extract uring helpers to be reusable
  tools headers: Grab copy of io_uring.h
  io_uring/cmd: Pass compat mode in issue_flags
  net/socket: Break down __sys_getsockopt
  net/socket: Break down __sys_setsockopt
  bpf: Add sockptr support for setsockopt
  bpf: Add sockptr support for getsockopt
</content>
</entry>
</feed>
