<feed xmlns='http://www.w3.org/2005/Atom'>
<title>lwn.git/drivers/nvme, branch docs-4.19</title>
<subtitle>Linux kernel documentation tree maintained by Jonathan Corbet</subtitle>
<id>http://mirrors.hust.edu.cn/git/lwn.git/atom?h=docs-4.19</id>
<link rel='self' href='http://mirrors.hust.edu.cn/git/lwn.git/atom?h=docs-4.19'/>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/'/>
<updated>2018-06-28T14:29:54+00:00</updated>
<entry>
<title>nvme-rdma: fix possible double free of controller async event buffer</title>
<updated>2018-06-28T14:29:54+00:00</updated>
<author>
<name>Sagi Grimberg</name>
<email>sagi@grimberg.me</email>
</author>
<published>2018-06-25T17:58:17+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=682630f00a219a1b0696abe9c0967e660068187b'/>
<id>urn:sha1:682630f00a219a1b0696abe9c0967e660068187b</id>
<content type='text'>
If reconnect/reset failed where the controller async event buffer
was freed, we might end up freeing it again as we call
nvme_rdma_destroy_admin_queue again in the remove path. Given that
the sequence is guaranteed to be serialized by .ctrl_stop, we simply
set ctrl-&gt;async_event_sqe.data to NULL after freeing it and skip the
free on later visits.
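
The guard described above can be illustrated with a small Python model.
This is only a sketch, not the driver code: the Controller class, its
fields and destroy_admin_queue are hypothetical stand-ins, and no locking
is shown because the message states that the callers are already
serialized.

```python
class Controller:
    """Hypothetical stand-in for the controller; not the driver's structure."""

    def __init__(self):
        self.async_event_data = bytearray(64)  # buffer "allocated" at setup
        self.free_count = 0                    # how many times it was freed


def destroy_admin_queue(ctrl):
    """Free the async event buffer at most once."""
    if ctrl.async_event_data is not None:
        ctrl.free_count += 1           # stands in for the actual free
        ctrl.async_event_data = None   # mark as freed so later visits skip it


ctrl = Controller()
destroy_admin_queue(ctrl)  # teardown after a failed reconnect/reset
destroy_admin_queue(ctrl)  # teardown again in the remove path
print(ctrl.free_count)     # prints 1
```

The second call is a no-op because the pointer was cleared by the first.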

Reported-by: Max Gurtovoy &lt;maxg@mellanox.com&gt;
Tested-by: Max Gurtovoy &lt;maxg@mellanox.com&gt;
Signed-off-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>nvme-pci: limit max IO size and segments to avoid high order allocations</title>
<updated>2018-06-21T16:59:46+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2018-06-21T15:49:37+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=943e942e6266f22babee5efeb00f8f672fbff5bd'/>
<id>urn:sha1:943e942e6266f22babee5efeb00f8f672fbff5bd</id>
<content type='text'>
nvme requires an sg table allocation for each request. If the request
is large, then the allocation can become quite large. For instance,
with our default software settings of 1280KB IO size, we'll need
10248 bytes of sg table. That turns into a 2nd order allocation,
which we can't always guarantee. If we fail the allocation, blk-mq
will retry it later. But there's no guarantee that we'll EVER be
able to allocate that much contiguous memory.
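
The figures above can be reproduced with a short calculation. This is a
sketch under assumptions that the message does not state: 4 KiB pages, a
32-byte scatterlist entry, 8-byte pointers, and a sizing formula (one sg
entry per page plus one pointer per page of PRP entries) that is only
modeled on how such a helper might work. The function names are
hypothetical.

```python
# Assumed constants (not stated in the commit message).
PAGE_SIZE = 4096      # bytes per page
SG_ENTRY_SIZE = 32    # bytes per scatterlist entry
POINTER_SIZE = 8      # bytes per pointer / PRP entry


def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def iod_alloc_size(io_bytes):
    """Estimate the per-request allocation for an IO of io_bytes."""
    segments = ceil_div(io_bytes, PAGE_SIZE)            # one sg entry per page
    prps = ceil_div(io_bytes + PAGE_SIZE, PAGE_SIZE)    # worst-case PRP entries
    prp_pages = ceil_div(POINTER_SIZE * prps, PAGE_SIZE - POINTER_SIZE)
    return segments * SG_ENTRY_SIZE + prp_pages * POINTER_SIZE


def page_order(size):
    """Smallest order such that 2**order pages can hold size bytes."""
    pages = ceil_div(size, PAGE_SIZE)
    return (pages - 1).bit_length()


size = iod_alloc_size(1280 * 1024)
print(size, page_order(size))  # prints 10248 2
```

Under these assumptions a 1280KB IO needs 10248 bytes, which spans three
pages and therefore rounds up to a four-page, order-2 allocation.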

Limit the IO size such that we never need more than a single page
of memory. That's a lot faster and more reliable. Then back that
allocation with a mempool, so that we know the allocation will
always succeed at some point.

Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Acked-by: Keith Busch &lt;keith.busch@intel.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>nvme-pci: move nvme_kill_queues to nvme_remove_dead_ctrl</title>
<updated>2018-06-21T14:59:42+00:00</updated>
<author>
<name>Jianchao Wang</name>
<email>jianchao.w.wang@oracle.com</email>
</author>
<published>2018-06-20T05:42:22+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=9f9cafc14016f23f982d3ce18f9057923bd3037a'/>
<id>urn:sha1:9f9cafc14016f23f982d3ce18f9057923bd3037a</id>
<content type='text'>
There is a race between nvme_remove and nvme_reset_work that can
lead to an io hang.

nvme_remove                    nvme_reset_work
                               -&gt; nvme_remove_dead_ctrl
                                 -&gt; nvme_dev_disable
                                   -&gt; quiesce request_queue
                                 -&gt; queue remove_work
-&gt; cancel_work_sync reset_work
-&gt; nvme_remove_namespaces
  -&gt; splice ctrl-&gt;namespaces
                               nvme_remove_dead_ctrl_work
                               -&gt; nvme_kill_queues
  -&gt; nvme_ns_remove               do nothing
    -&gt; blk_cleanup_queue
      -&gt; blk_freeze_queue

In the end, the request_queue is still quiesced when blk_freeze_queue
waits for the freeze, so io hangs. To fix it, move the nvme_kill_queues
call from nvme_remove_dead_ctrl_work to nvme_remove_dead_ctrl.

Suggested-by: Keith Busch &lt;keith.busch@linux.intel.com&gt;
Signed-off-by: Jianchao Wang &lt;jianchao.w.wang@oracle.com&gt;
Reviewed-by: Keith Busch &lt;keith.busch@intel.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>nvme-fc: release io queues to allow fast fail</title>
<updated>2018-06-21T07:31:28+00:00</updated>
<author>
<name>James Smart</name>
<email>jsmart2021@gmail.com</email>
</author>
<published>2018-06-20T14:44:12+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=02d62a8bc48e92171c46540722e2d52ce77d87af'/>
<id>urn:sha1:02d62a8bc48e92171c46540722e2d52ce77d87af</id>
<content type='text'>
Rather than leaving io queues quiesced after tearing down an association,
restart them. This allows ios to be replayed, with fastfail ios terminating
and non-fastfail getting into loops of retry.

This follows rdma's lead.

Signed-off-by: James Smart &lt;james.smart@broadcom.com&gt;
Reviewed-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>nvmet: reset keep alive timer in controller enable</title>
<updated>2018-06-20T12:20:51+00:00</updated>
<author>
<name>Max Gurtovoy</name>
<email>maxg@mellanox.com</email>
</author>
<published>2018-06-19T12:45:33+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=d68a90e148f5a82aa67654c5012071e31c0e4baa'/>
<id>urn:sha1:d68a90e148f5a82aa67654c5012071e31c0e4baa</id>
<content type='text'>
Controllers that are not yet enabled should not really enforce keep alive
timeouts, but we still want to track a timeout and clean up in case a host
died before it enabled the controller. Hence, simply reset the keep
alive timer when the controller is enabled.

Suggested-by: Max Gurtovoy &lt;maxg@mellanox.com&gt;
Signed-off-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>nvme-rdma: don't override opts-&gt;queue_size</title>
<updated>2018-06-20T12:20:51+00:00</updated>
<author>
<name>Sagi Grimberg</name>
<email>sagi@grimberg.me</email>
</author>
<published>2018-06-19T12:34:13+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=5e77d61cbc7e766778037127dab69e6410a8fc48'/>
<id>urn:sha1:5e77d61cbc7e766778037127dab69e6410a8fc48</id>
<content type='text'>
That is a user argument, and theoretically controller limits can change
over time (over reconnects/resets). Instead, use the sqsize controller
attribute to check queue depth boundaries and use it for the tagset
allocation.

Signed-off-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>nvme-rdma: Fix command completion race at error recovery</title>
<updated>2018-06-20T12:20:51+00:00</updated>
<author>
<name>Israel Rukshin</name>
<email>israelr@mellanox.com</email>
</author>
<published>2018-06-19T12:34:11+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=c947657b15379505a9bba36a02005882b66abe57'/>
<id>urn:sha1:c947657b15379505a9bba36a02005882b66abe57</id>
<content type='text'>
The race is between completing the request at error recovery work and
rdma completions.  If we cancel the request before getting the good
rdma completion we get a NULL deref of the request MR at
nvme_rdma_process_nvme_rsp().

When canceling a request we return its mr to the mr pool (setting mr to
NULL) and also unmap its data. Canceling requests while the rdma
queues are active is not safe, because we can still get good rdma
completions that use the mr pointer, which may be NULL. Completing
the request too soon may also lead to DMA to/from user buffers that
have already been unmapped.

The commit fixes the race by draining the QP before starting the abort
commands mechanism.

Signed-off-by: Israel Rukshin &lt;israelr@mellanox.com&gt;
Reviewed-by: Max Gurtovoy &lt;maxg@mellanox.com&gt;
Signed-off-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>nvme-rdma: fix possible free of a non-allocated async event buffer</title>
<updated>2018-06-20T12:20:28+00:00</updated>
<author>
<name>Sagi Grimberg</name>
<email>sagi@grimberg.me</email>
</author>
<published>2018-06-19T12:34:10+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=94e42213cc1ae41c57819539c0130f8dfc69d718'/>
<id>urn:sha1:94e42213cc1ae41c57819539c0130f8dfc69d718</id>
<content type='text'>
If nvme_rdma_configure_admin_queue fails before we have allocated
the async event buffer, we will falsely free it because
nvme_rdma_free_queue is freeing it. Fix it by allocating the buffer right
after nvme_rdma_alloc_queue and freeing it right before nvme_rdma_free_queue
to maintain an orderly reverse cleanup sequence.

Reported-by: Israel Rukshin &lt;israelr@mellanox.com&gt;
Signed-off-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Reviewed-by: Max Gurtovoy &lt;maxg@mellanox.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>nvme-rdma: fix possible double free condition when failing to create a controller</title>
<updated>2018-06-20T12:20:10+00:00</updated>
<author>
<name>Sagi Grimberg</name>
<email>sagi@grimberg.me</email>
</author>
<published>2018-06-19T12:34:09+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=3d0641015bf73aaa1cb54c936674959e7805070f'/>
<id>urn:sha1:3d0641015bf73aaa1cb54c936674959e7805070f</id>
<content type='text'>
Failures after nvme_init_ctrl will defer resource cleanups to .free_ctrl
when the reference is released, hence we should not free the controller
queues for these failures.

Fix that by moving the controller queues allocation before controller
initialization, freeing the queues for failures that happen before
initialization and skipping the free for failures after initialization.

Signed-off-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>Merge branch 'nvme-4.18' of git://git.infradead.org/nvme into for-linus</title>
<updated>2018-06-15T14:11:05+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2018-06-15T14:11:05+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=95c7c09f4cc8ac3cfbcf4382ff3f7ecfd97e8ed6'/>
<id>urn:sha1:95c7c09f4cc8ac3cfbcf4382ff3f7ecfd97e8ed6</id>
<content type='text'>
Pull NVMe fixes from Christoph:

"Fix various little regressions introduced in this merge window, plus
 a rework of the fibre channel connect and reconnect path to share the
 code instead of having separate sets of bugs. Last but not least a
 trivial trace point addition from Hannes."

* 'nvme-4.18' of git://git.infradead.org/nvme:
  nvme-fabrics: fix and refine state checks in __nvmf_check_ready
  nvme-fabrics: handle the admin-only case properly in nvmf_check_ready
  nvme-fabrics: refactor queue ready check
  blk-mq: remove blk_mq_tagset_iter
  nvme: remove nvme_reinit_tagset
  nvme-fc: fix nulling of queue data on reconnect
  nvme-fc: remove reinit_request routine
  nvme-fc: change controllers first connect to use reconnect path
  nvme: don't rely on the changed namespace list log
  nvmet: free smart-log buffer after use
  nvme-rdma: fix error flow during mapping request data
  nvme: add bio remapping tracepoint
  nvme: fix NULL pointer dereference in nvme_init_subsystem
</content>
</entry>
</feed>
