<feed xmlns='http://www.w3.org/2005/Atom'>
<title>lwn.git/fs/nfsd, branch v4.4.8</title>
<subtitle>Linux kernel documentation tree maintained by Jonathan Corbet</subtitle>
<id>http://mirrors.hust.edu.cn/git/lwn.git/atom?h=v4.4.8</id>
<link rel='self' href='http://mirrors.hust.edu.cn/git/lwn.git/atom?h=v4.4.8'/>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/'/>
<updated>2016-04-12T16:09:03+00:00</updated>
<entry>
<title>nfsd: fix deadlock secinfo+readdir compound</title>
<updated>2016-04-12T16:09:03+00:00</updated>
<author>
<name>J. Bruce Fields</name>
<email>bfields@redhat.com</email>
</author>
<published>2016-03-03T00:36:21+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=56fb92d684cc51baf3a421a3ac19af441a52f413'/>
<id>urn:sha1:56fb92d684cc51baf3a421a3ac19af441a52f413</id>
<content type='text'>
commit 2f6fc056e899bd0144a08da5cacaecbe8997cd74 upstream.

nfsd_lookup_dentry exits with the parent filehandle locked.  fh_put also
unlocks if necessary (nfsd filehandle locking is probably too lenient),
so it gets unlocked eventually, but if the following op in the compound
needs to lock it again, we can deadlock.

A fuzzer ran into this; normal clients don't send a secinfo followed by
a readdir in the same compound.

Signed-off-by: J. Bruce Fields &lt;bfields@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>nfsd4: fix bad bounds checking</title>
<updated>2016-04-12T16:09:03+00:00</updated>
<author>
<name>J. Bruce Fields</name>
<email>bfields@redhat.com</email>
</author>
<published>2016-03-01T01:21:21+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=9ef1ecc409b1e1113bf5e4b6bdd47137e5f9cebb'/>
<id>urn:sha1:9ef1ecc409b1e1113bf5e4b6bdd47137e5f9cebb</id>
<content type='text'>
commit 4aed9c46afb80164401143aa0fdcfe3798baa9d5 upstream.

A number of spots in the xdr decoding follow a pattern like

	n = be32_to_cpup(p++);
	READ_BUF(n + 4);

where n is a u32.  The only bounds checking is done in READ_BUF itself,
but since it's checking (n + 4), it won't catch cases where n is very
large, (u32)(-4) or higher.  I'm not sure exactly what the consequences
are, but we've seen crashes soon after.

Instead, just break these up into two READ_BUF()s.

Signed-off-by: J. Bruce Fields &lt;bfields@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>nfsd: don't hold ls_mutex across a layout recall</title>
<updated>2015-12-16T16:49:58+00:00</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@poochiereds.net</email>
</author>
<published>2015-11-29T13:46:14+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=be20aa00c67102aaa54599518c086a2338b19f4c'/>
<id>urn:sha1:be20aa00c67102aaa54599518c086a2338b19f4c</id>
<content type='text'>
We do need to serialize layout stateid morphing operations, but we
currently hold the ls_mutex across a layout recall which is pretty
ugly. It's also unnecessary -- once we've bumped the seqid and
copied it, we don't need to serialize the rest of the CB_LAYOUTRECALL
vs. anything else. Just drop the mutex once the copy is done.

This was causing a "workqueue leaked lock or atomic" warning and an
occasional deadlock.

There's more work to be done here but this fixes the immediate
regression.

Fixes: cc8a55320b5f "nfsd: serialize layout stateid morphing operations"
Cc: stable@vger.kernel.org
Reported-by: Kinglong Mee &lt;kinglongmee@gmail.com&gt;
Signed-off-by: Jeff Layton &lt;jeff.layton@primarydata.com&gt;
Signed-off-by: J. Bruce Fields &lt;bfields@redhat.com&gt;
</content>
</entry>
<entry>
<title>nfsd: fix race with open / open upgrade stateids</title>
<updated>2015-11-10T14:29:45+00:00</updated>
<author>
<name>Andrew Elble</name>
<email>aweits@rit.edu</email>
</author>
<published>2015-11-06T01:42:43+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=7fc0564e3a8d16df096f48c9c6425ba84d945c6e'/>
<id>urn:sha1:7fc0564e3a8d16df096f48c9c6425ba84d945c6e</id>
<content type='text'>
We observed multiple open stateids on the server for files that
seemingly should have been closed.

nfsd4_process_open2() tests for the existence of a preexisting
stateid. If one is not found, the locks are dropped and a new
one is created. The problem is that init_open_stateid(), which
is also responsible for hashing the newly initialized stateid,
doesn't check to see if another open has raced in and created
a matching stateid. This fix is to enable init_open_stateid() to
return the matching stateid and have nfsd4_process_open2()
swap to that stateid and switch to the open upgrade path.
In testing this patch, coverage to the newly created
path indicates that the race was indeed happening.

Signed-off-by: Andrew Elble &lt;aweits@rit.edu&gt;
Reviewed-by: Jeff Layton &lt;jlayton@poochiereds.net&gt;
Signed-off-by: J. Bruce Fields &lt;bfields@redhat.com&gt;
</content>
</entry>
<entry>
<title>nfsd: eliminate sending duplicate and repeated delegations</title>
<updated>2015-11-10T14:29:44+00:00</updated>
<author>
<name>Andrew Elble</name>
<email>aweits@rit.edu</email>
</author>
<published>2015-10-15T16:07:28+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=34ed9872e745fa56f10e9bef2cf3d2336c6c8816'/>
<id>urn:sha1:34ed9872e745fa56f10e9bef2cf3d2336c6c8816</id>
<content type='text'>
We've observed the nfsd server in a state where there are
multiple delegations on the same nfs4_file for the same client.
The nfs client does attempt to DELEGRETURN these when they are presented to
it - but apparently under some (unknown) circumstances the client does not
manage to return all of them. This leads to the eventual
attempt to CB_RECALL more than one delegation with the same nfs
filehandle to the same client. The first recall will succeed, but the
next recall will fail with NFS4ERR_BADHANDLE. This leads to the server
having delegations on cl_revoked that the client has no way to FREE
or DELEGRETURN, with resulting inability to recover. The state manager
on the server will continually assert SEQ4_STATUS_RECALLABLE_STATE_REVOKED,
and the state manager on the client will be looping unable to satisfy
the server.

List discussion also reports a race between OPEN and DELEGRETURN that
will be avoided by only sending the delegation once to the
client. This is also logically in accordance with RFC5561 9.1.1 and 10.2.

So, let's:

1.) Not hand out duplicate delegations.
2.) Only send them to the client once.

RFC 5561:

9.1.1:
"Delegations and layouts, on the other hand, are not associated with a
specific owner but are associated with the client as a whole
(identified by a client ID)."

10.2:
"...the stateid for a delegation is associated with a client ID and may be
used on behalf of all the open-owners for the given client.  A
delegation is made to the client as a whole and not to any specific
process or thread of control within it."

Reported-by: Eric Meddaugh &lt;etmsys@rit.edu&gt;
Cc: Trond Myklebust &lt;trond.myklebust@primarydata.com&gt;
Cc: Olga Kornievskaia &lt;aglo@umich.edu&gt;
Signed-off-by: Andrew Elble &lt;aweits@rit.edu&gt;
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields &lt;bfields@redhat.com&gt;
</content>
</entry>
<entry>
<title>nfsd: remove recurring workqueue job to clean DRC</title>
<updated>2015-11-10T14:25:51+00:00</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@poochiereds.net</email>
</author>
<published>2015-11-04T16:02:29+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=3e80dbcda7f3e1e349a779d7a14c0e08677c39fa'/>
<id>urn:sha1:3e80dbcda7f3e1e349a779d7a14c0e08677c39fa</id>
<content type='text'>
We have a shrinker, we clean out the cache when nfsd is shut down, and
prune the chains on each request. A recurring workqueue job seems like
unnecessary overhead. Just remove it.

Signed-off-by: Jeff Layton &lt;jeff.layton@primarydata.com&gt;
Signed-off-by: J. Bruce Fields &lt;bfields@redhat.com&gt;
</content>
</entry>
<entry>
<title>nfsd: ensure that seqid morphing operations are atomic wrt to copies</title>
<updated>2015-10-23T19:57:33+00:00</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@poochiereds.net</email>
</author>
<published>2015-10-01T13:05:50+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=9767feb2c64b29775f1ea683130b44f95f67d169'/>
<id>urn:sha1:9767feb2c64b29775f1ea683130b44f95f67d169</id>
<content type='text'>
Bruce points out that the increment of the seqid in stateids is not
serialized in any way, so it's possible for racing calls to bump it
twice and end up sending the same stateid. While we don't have any
reports of this problem it _is_ theoretically possible, and could lead
to spurious state recovery by the client.

In the current code, update_stateid is always followed by a memcpy of
that stateid, so we can combine the two operations. For better
atomicity, we add a spinlock to the nfs4_stid and hold that when bumping
the seqid and copying the stateid.

Signed-off-by: Jeff Layton &lt;jeff.layton@primarydata.com&gt;
Signed-off-by: J. Bruce Fields &lt;bfields@redhat.com&gt;
</content>
</entry>
<entry>
<title>nfsd: serialize layout stateid morphing operations</title>
<updated>2015-10-23T19:57:32+00:00</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@poochiereds.net</email>
</author>
<published>2015-09-17T11:58:24+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=cc8a55320b5f1196bee5bd14e4bb2ebd3b983317'/>
<id>urn:sha1:cc8a55320b5f1196bee5bd14e4bb2ebd3b983317</id>
<content type='text'>
In order to allow the client to make a sane determination of what
happened with racing LAYOUTGET/LAYOUTRETURN/CB_LAYOUTRECALL calls, we
must ensure that the seqids return accurately represent the order of
operations. The simplest way to do that is to ensure that operations on
a single stateid are serialized.

This patch adds a mutex to the layout stateid, and locks it when
checking the layout stateid's seqid. The mutex is held over the entire
operation and released after the seqid is bumped.

Note that in the case of CB_LAYOUTRECALL we must move the increment of
the seqid and setting into a new cb "prepare" operation. The lease
infrastructure will call the lm_break callback with a spinlock held, so
and we can't take the mutex in that codepath.

Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jeff Layton &lt;jeff.layton@primarydata.com&gt;
Signed-off-by: J. Bruce Fields &lt;bfields@redhat.com&gt;
</content>
</entry>
<entry>
<title>nfsd: improve client_has_state to check for unused openowners</title>
<updated>2015-10-23T19:57:31+00:00</updated>
<author>
<name>J. Bruce Fields</name>
<email>bfields@redhat.com</email>
</author>
<published>2015-10-15T20:11:01+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=4eaea13425078272895ec37814c6878d78b8db9f'/>
<id>urn:sha1:4eaea13425078272895ec37814c6878d78b8db9f</id>
<content type='text'>
At least in the v4.0 case openowners can hang around for a while after
last close, but they shouldn't really block (for example), a new mount
with a different principal.

Signed-off-by: J. Bruce Fields &lt;bfields@redhat.com&gt;
</content>
</entry>
<entry>
<title>nfsd: fix clid_inuse on mount with security change</title>
<updated>2015-10-23T19:57:30+00:00</updated>
<author>
<name>J. Bruce Fields</name>
<email>bfields@redhat.com</email>
</author>
<published>2015-10-15T19:33:23+00:00</published>
<link rel='alternate' type='text/html' href='http://mirrors.hust.edu.cn/git/lwn.git/commit/?id=2b63482185e6054cc11ca6d6c073f90160c161fd'/>
<id>urn:sha1:2b63482185e6054cc11ca6d6c073f90160c161fd</id>
<content type='text'>
In bakeathon testing Solaris client was getting CLID_INUSE error when
doing a krb5 mount soon after an auth_sys mount, or vice versa.

That's not really necessary since in this case the old client doesn't
have any state any more:

	http://tools.ietf.org/html/rfc7530#page-103

	"when the server gets a SETCLIENTID for a client ID that
	currently has no state, or it has state but the lease has
	expired, rather than returning NFS4ERR_CLID_INUSE, the server
	MUST allow the SETCLIENTID and confirm the new client ID if
	followed by the appropriate SETCLIENTID_CONFIRM."

This doesn't fix the problem completely since our client_has_state()
check counts openowners left around to handle close replays, which we
should probably just remove in this case.

Signed-off-by: J. Bruce Fields &lt;bfields@redhat.com&gt;
</content>
</entry>
</feed>
