| field | value | date |
|---|---|---|
| author | Ilya Dryomov <idryomov@gmail.com> | 2015-02-17 19:37:15 +0300 |
| committer | Greg Kroah-Hartman <gregkh@linuxfoundation.org> | 2015-03-06 14:40:54 -0800 |
| commit | 6af167fbe6c42fda5203b8095b92669dd0a687d4 | |
| tree | 88af72ffdbe62a32a35cfdbdb92ecdca8d7ded48 /kernel/sched_clock.c | |
| parent | 54ff4c89a5445fa8f313a338c1cf5478317df154 | |
libceph: fix double __remove_osd() problem
commit 7eb71e0351fbb1b242ae70abb7bb17107fe2f792 upstream.
It turns out it's possible to get __remove_osd() called twice on the
same OSD. That doesn't sit well with rb_erase(): depending on the
shape of the tree, we can get a NULL dereference, a soft lockup, or
a random crash at some point in the future as we end up touching freed
memory. One scenario that I was able to reproduce is as follows:

                <osd3 is idle, on the osd lru list>
    <con reset - osd3>
    con_fault_finish()
      osd_reset()
                                  <osdmap - osd3 down>
                                  ceph_osdc_handle_map()
                                    <takes map_sem>
                                    kick_requests()
                                      <takes request_mutex>
                                      reset_changed_osds()
                                        __reset_osd()
                                          __remove_osd()
                                      <releases request_mutex>
                                    <releases map_sem>
        <takes map_sem>
        <takes request_mutex>
        __kick_osd_requests()
          __reset_osd()
            __remove_osd() <-- !!!

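To make the ordering concrete, here is a single-threaded C replay of the trace. It is a userspace model, not libceph code: `struct model_osd`, `model_remove_osd()`, `handle_map_path()`, `osd_reset_path()` and `replay_trace()` are hypothetical names, locking is elided because the trace already fixes the order, and the model assumes (as the trace suggests) that the reset path already holds its pointer to osd3 when it blocks on map_sem. Instead of corrupting a tree, the unguarded removal only counts how often it runs on an OSD that is already gone.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of one OSD; not libceph's struct ceph_osd. */
struct model_osd {
    bool in_tree;      /* stands in for "linked into the client's OSD rbtree" */
    int stale_removes; /* removals that found the OSD already gone */
};

/* Unguarded removal, modeled on __remove_osd() before the fix: it assumes
 * the OSD is still in the tree. A real second rb_erase() would corrupt
 * the tree; the model only counts the stale call. */
static void model_remove_osd(struct model_osd *osd)
{
    if (!osd->in_tree)
        osd->stale_removes++;
    osd->in_tree = false;
}

/* Path B: the new osdmap marks osd3 down, so
 * kick_requests() -> reset_changed_osds() -> __reset_osd() removes it. */
static void handle_map_path(struct model_osd *osd)
{
    model_remove_osd(osd);
}

/* Path A: osd_reset() -> __kick_osd_requests() -> __reset_osd().
 * Assumed: the OSD pointer was obtained before map_sem/request_mutex
 * were taken, so nothing rechecks that the OSD is still in the tree. */
static void osd_reset_path(struct model_osd *osd)
{
    model_remove_osd(osd);
}

/* Replays the trace in order; returns how many removals hit an OSD
 * that was already removed (1 reproduces the double removal). */
static int replay_trace(void)
{
    struct model_osd osd3 = { .in_tree = true, .stale_removes = 0 };
    struct model_osd *from_con = &osd3; /* con_fault_finish() -> osd_reset() */

    handle_map_path(&osd3);   /* runs to completion while path A waits */
    osd_reset_path(from_con); /* path A resumes with the same pointer */

    assert(!osd3.in_tree);
    return osd3.stale_removes;
}
```

`replay_trace()` returns 1: the second removal reaches an OSD that the osdmap path has already taken out of the tree, which is the point where the real code would call rb_erase() a second time. Holding map_sem and request_mutex serializes the two paths but does not help, because path A decided which OSD to reset before it took them.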
A case can be made that osd refcounting is imperfect and reworking it
would be a proper resolution, but for now Sage and I decided to fix
this by adding a safeguard around __remove_osd().
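A sketch of what such a guard could look like, assuming the goal is to make removal idempotent: unlink the OSD, mark its node as detached, and have every later caller check that mark before unlinking or dropping a reference. This is a userspace model, not the upstream patch. A circular doubly linked list stands in for the rbtree (the kernel's rbtree API offers RB_CLEAR_NODE() and RB_EMPTY_NODE() for the same mark-and-check idea), and `struct model_osd`, `remove_osd_guarded()`, `model_put_osd()` and the `node_*` helpers are hypothetical local names.

```c
#include <assert.h>
#include <stdbool.h>

/* Circular doubly linked node; a stand-in for the OSD rbtree node. */
struct node {
    struct node *prev, *next;
};

/* A node that points to itself is detached (cf. RB_EMPTY_NODE()). */
static void node_init(struct node *n)
{
    n->prev = n;
    n->next = n;
}

static bool node_detached(const struct node *n)
{
    return n->next == n;
}

/* Insert n right after head. */
static void node_link(struct node *head, struct node *n)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Unlink n and re-mark it detached (cf. rb_erase() + RB_CLEAR_NODE()). */
static void node_unlink_init(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    node_init(n);
}

/* Hypothetical model of an OSD session; not libceph's struct ceph_osd. */
struct model_osd {
    struct node link; /* membership in the client's OSD collection */
    int refcount;     /* the collection owns one reference */
};

static void model_put_osd(struct model_osd *osd)
{
    assert(osd->refcount > 0); /* a second put would underflow */
    osd->refcount--;
}

/* Hypothetical guarded removal: the first caller unlinks the OSD and
 * drops the collection's reference; later callers see the detached mark
 * and do nothing. Returns true if this call performed the removal.
 * Callers must be serialized: the check and the unlink are not atomic. */
static bool remove_osd_guarded(struct model_osd *osd)
{
    if (node_detached(&osd->link))
        return false;
    node_unlink_init(&osd->link);
    model_put_osd(osd);
    return true;
}
```

Calling `remove_osd_guarded()` twice on the same OSD, as the two paths in the trace do, unlinks it and drops the reference once; the second call returns false. In the trace both removals happen under request_mutex, which is the kind of serialization this check-then-unlink guard relies on.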
Fixes: http://tracker.ceph.com/issues/8087
Cc: Sage Weil <sage@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Diffstat (limited to 'kernel/sched_clock.c')
0 files changed, 0 insertions, 0 deletions