diff options
author | David Howells <dhowells@redhat.com> | 2020-04-24 15:10:00 +0100 |
---|---|---|
committer | David Howells <dhowells@redhat.com> | 2020-05-31 15:19:51 +0100 |
commit | f6cbb368bcb0bc4fa7c11554d5293658bb4b26a2 (patch) | |
tree | 49b9ad0705135c3e985ce6560b793ab038a2b111 /fs/afs/volume.c | |
parent | 977e5f8ed0ab2786755f8d2a96b78a3c7320f7c4 (diff) | |
download | lwn-f6cbb368bcb0bc4fa7c11554d5293658bb4b26a2.tar.gz lwn-f6cbb368bcb0bc4fa7c11554d5293658bb4b26a2.zip |
afs: Actively poll fileservers to maintain NAT or firewall openings
When an AFS client accesses a file, it receives a limited-duration callback
promise that the server will notify it if another client changes a file.
This callback duration can be a few hours in length.
If a client mounts a volume and then an application prevents it from being
unmounted, say by chdir'ing into it, but then does nothing for some time,
the rxrpc_peer record will expire and rxrpc-level keepalive will cease.
If there is NAT or a firewall between the client and the server, the route
back for the server may close after a comparatively short duration, meaning
that attempts by the server to notify the client may then bounce.
The client, however, may (so far as it knows) still have a valid unexpired
promise and will then rely on its cached data and will not see changes made
on the server by a third party until it incidentally rechecks the status or
the promise needs renewal.
To deal with this, the client needs to regularly probe the server. This
has two effects: firstly, it keeps a route open back for the server, and
secondly, it causes the server to disgorge any notifications that got
queued up because they couldn't be sent.
Fix this by adding a mechanism to emit regular probes.
Two levels of probing are made available: Under normal circumstances the
'slow' queue will be used for a fileserver - this just probes the preferred
address once every 5 mins or so; however, if server fails to respond to any
probes, the server will shift to the 'fast' queue from which all its
interfaces will be probed every 30s. When it finally responds, the record
will switch back to the slow queue.
Further notes:
(1) Probing is now no longer driven from the fileserver rotation
algorithm.
(2) Probes are dispatched to all interfaces on a fileserver when that an
afs_server object is set up to record it.
(3) The afs_server object is removed from the probe queues when we start
to probe it. afs_is_probing_server() returns true if it's not listed
- ie. it's undergoing probing.
(4) The afs_server object is added back on to the probe queue when the
final outstanding probe completes, but the probed_at time is set when
we're about to launch a probe so that it's not dependent on the probe
duration.
(5) The timer and the work item added for this must be handed a count on
net->servers_outstanding, which they hand on or release. This makes
sure that network namespace cleanup waits for them.
Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
Reported-by: Dave Botsch <botsch@cnf.cornell.edu>
Signed-off-by: David Howells <dhowells@redhat.com>
Diffstat (limited to 'fs/afs/volume.c')
-rw-r--r-- | fs/afs/volume.c | 24 |
1 files changed, 13 insertions, 11 deletions
diff --git a/fs/afs/volume.c b/fs/afs/volume.c index 4310336b9bb8..249000195f8a 100644 --- a/fs/afs/volume.c +++ b/fs/afs/volume.c @@ -266,7 +266,6 @@ static int afs_update_volume_status(struct afs_volume *volume, struct key *key) } volume->update_at = ktime_get_real_seconds() + afs_volume_record_life; - clear_bit(AFS_VOLUME_NEEDS_UPDATE, &volume->flags); write_unlock(&volume->servers_lock); ret = 0; @@ -283,23 +282,25 @@ error: */ int afs_check_volume_status(struct afs_volume *volume, struct afs_fs_cursor *fc) { - time64_t now = ktime_get_real_seconds(); int ret, retries = 0; _enter(""); - if (volume->update_at <= now) - set_bit(AFS_VOLUME_NEEDS_UPDATE, &volume->flags); - retry: - if (!test_bit(AFS_VOLUME_NEEDS_UPDATE, &volume->flags) && - !test_bit(AFS_VOLUME_WAIT, &volume->flags)) { - _leave(" = 0"); - return 0; - } - + if (test_bit(AFS_VOLUME_WAIT, &volume->flags)) + goto wait; + if (volume->update_at <= ktime_get_real_seconds() || + test_bit(AFS_VOLUME_NEEDS_UPDATE, &volume->flags)) + goto update; + _leave(" = 0"); + return 0; + +update: if (!test_and_set_bit_lock(AFS_VOLUME_UPDATING, &volume->flags)) { + clear_bit(AFS_VOLUME_NEEDS_UPDATE, &volume->flags); ret = afs_update_volume_status(volume, fc->key); + if (ret < 0) + set_bit(AFS_VOLUME_NEEDS_UPDATE, &volume->flags); clear_bit_unlock(AFS_VOLUME_WAIT, &volume->flags); clear_bit_unlock(AFS_VOLUME_UPDATING, &volume->flags); wake_up_bit(&volume->flags, AFS_VOLUME_WAIT); @@ -307,6 +308,7 @@ retry: return ret; } +wait: if (!test_bit(AFS_VOLUME_WAIT, &volume->flags)) { _leave(" = 0 [no wait]"); return 0; |