Age | Commit message (Collapse) | Author |
|
Found by visual inspection, this wasn't caught by my xfstest, since it's
effect is ignoring positive dentries in the cache the fallback just goes
to the disk. it was introduced in the last iteration of the
case-insensitive patch.
d_compare should return 0 when the entries match, so make sure we are
correctly comparing the entire string if the encoding feature is set and
we are on a case-INsensitive directory.
Fixes: b886ee3e778e ("ext4: Support case-insensitive file name lookups")
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
It is possible that unlinked inode enters ext4_setattr() (e.g. if
somebody calls ftruncate(2) on unlinked but still open file). In such
case we should not delete the inode from the orphan list if truncate
fails. Note that this is mostly a theoretical concern as filesystem is
corrupted if we reach this path anyway but let's be consistent in our
orphan handling.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
|
|
We didn't wait for outstanding direct IO during truncate in nojournal
mode (as we skip orphan handling in that case). This can lead to fs
corruption or stale data exposure if truncate ends up freeing blocks
and these get reallocated before direct IO finishes. Fix the condition
determining whether the wait is necessary.
CC: stable@vger.kernel.org
Fixes: 1c9114f9c0f1 ("ext4: serialize unlocked dio reads with truncate")
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Since the journal inode is already checked when we added it to the
block validity's system zone, if we check it again, we'll just trigger
a failure.
This was causing failures like this:
[ 53.897001] EXT4-fs error (device sda): ext4_find_extent:909: inode
#8: comm jbd2/sda-8: pblk 121667583 bad header/extent: invalid extent entries - magic f30a, entries 8, max 340(340), depth 0(0)
[ 53.931430] jbd2_journal_bmap: journal block not found at offset 49 on sda-8
[ 53.938480] Aborting journal on device sda-8.
... but only if the system was under enough memory pressure that
logical->physical mapping for the journal inode gets pushed out of the
extent cache. (This is why it wasn't noticed earlier.)
Fixes: 345c0dbf3a30 ("ext4: protect journal inode's blocks using block_validity")
Reported-by: Dan Rue <dan.rue@linaro.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
|
|
Handling of aborted journal is a special code path different from
standard ext4_error() one and it can call panic() as well. Commit
1dc1097ff60e ("ext4: avoid panic during forced reboot") forgot to update
this path so fix that omission.
Fixes: 1dc1097ff60e ("ext4: avoid panic during forced reboot")
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org # 5.1
|
|
Commit 345c0dbf3a30 ("ext4: protect journal inode's blocks using
block_validity") failed to add an exception for the journal inode in
ext4_check_blockref(), which is the function used by ext4_get_branch()
for indirect blocks. This caused attempts to read from the ext3-style
journals to fail with:
[ 848.968550] EXT4-fs error (device sdb7): ext4_get_branch:171: inode #8: block 30343695: comm jbd2/sdb7-8: invalid block
Fix this by adding the missing exception check.
Fixes: 345c0dbf3a30 ("ext4: protect journal inode's blocks using block_validity")
Reported-by: Arthur Marsh <arthur.marsh@internode.on.net>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Gabriel Krisman Bertazi <krisman@collabora.com>
|
|
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Gabriel Krisman Bertazi <krisman@collabora.com>
|
|
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
There are two cases where u32 variables n and err are being checked
for less than zero error values, the checks is always false because
the variables are not signed. Fix this by making the variables ints.
Addresses-Coverity: ("Unsigned compared against 0")
Fixes: 345c0dbf3a30 ("ext4: protect journal inode's blocks using block_validity")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The buffer_head (frames[0].bh) and it's corresping page can be
potentially free'd once brelse() is done inside the for loop
but before the for loop exits in dx_release(). It can be free'd
in another context, when the page cache is flushed via
drop_caches_sysctl_handler(). This results into below data abort
when accessing info->indirect_levels in dx_release().
Unable to handle kernel paging request at virtual address ffffffc17ac3e01e
Call trace:
dx_release+0x70/0x90
ext4_htree_fill_tree+0x2d4/0x300
ext4_readdir+0x244/0x6f8
iterate_dir+0xbc/0x160
SyS_getdents64+0x94/0x174
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Cc: stable@kernel.org
|
|
Unaligned AIO must be serialized because the zeroing of partial blocks
of unaligned AIO can result in data corruption in case it's overlapping
another in flight IO.
Currently we wait for all unwritten extents before we submit unaligned
AIO which protects data in case of unaligned AIO is following overlapping
IO. However if a unaligned AIO is followed by overlapping aligned AIO we
can still end up corrupting data.
To fix this, we must make sure that the unaligned AIO is the only IO in
flight by waiting for unwritten extents conversion not just before the
IO submission, but right after it as well.
This problem can be reproduced by xfstest generic/538
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
|
|
When failing from creating cache jbd2_inode_cache, we will destroy the
previously created cache jbd2_handle_cache twice. This patch fixes
this by moving each cache initialization/destruction to its own
separate, individual function.
Signed-off-by: Chengguang Xu <cgxu519@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
|
|
This commit zeroes out the unused memory region in the buffer_head
corresponding to the extent metablock after writing the extent header
and the corresponding extent node entries.
This is done to prevent random uninitialized data from getting into
the filesystem when the extent block is synced.
This fixes CVE-2019-11833.
Signed-off-by: Sriram Rajagopalan <sriramr@arista.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
|
|
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Instead of removing EXT4_MOUNT_JOURNAL_CHECKSUM from s_def_mount_opt as
I assume was intended, all other options were blown away leading to
_ext4_show_options() output being incorrect.
Fixes: 1e381f60dad9 ("ext4: do not allow journal_opts for fs w/o journal")
Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org
|
|
scripts/mkutf8data is used only when regenerating utf8data.h,
which never happens in the normal kernel build. However, it is
irrespectively built if CONFIG_UNICODE is enabled.
Moreover, there is no good reason for it to reside in the scripts/
directory since it is only used in fs/unicode/.
Hence, move it from scripts/ to fs/unicode/.
In some cases, we bypass build artifacts in the normal build. The
conventional way to do so is to surround the code with ifdef REGENERATE_*.
For example,
- 7373f4f83c71 ("kbuild: add implicit rules for parser generation")
- 6aaf49b495b4 ("crypto: arm,arm64 - Fix random regeneration of S_shipped")
I rewrote the rule in a more kbuild'ish style.
In the normal build, utf8data.h is just shipped from the check-in file.
$ make
[ snip ]
SHIPPED fs/unicode/utf8data.h
CC fs/unicode/utf8-norm.o
CC fs/unicode/utf8-core.o
CC fs/unicode/utf8-selftest.o
AR fs/unicode/built-in.a
If you want to generate utf8data.h based on UCD, put *.txt files into
fs/unicode/, then pass REGENERATE_UTF8DATA=1 from the command line.
The mkutf8data tool will be automatically compiled to generate the
utf8data.h from the *.txt files.
$ make REGENERATE_UTF8DATA=1
[ snip ]
HOSTCC fs/unicode/mkutf8data
GEN fs/unicode/utf8data.h
CC fs/unicode/utf8-norm.o
CC fs/unicode/utf8-core.o
CC fs/unicode/utf8-selftest.o
AR fs/unicode/built-in.a
I renamed the check-in utf8data.h to utf8data.h_shipped so that this
will work for the out-of-tree build.
You can update it based on the latest UCD like this:
$ make REGENERATE_UTF8DATA=1 fs/unicode/
$ cp fs/unicode/utf8data.h fs/unicode/utf8data.h_shipped
Also, I added entries to .gitignore and dontdiff.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
This patch implements the actual support for case-insensitive file name
lookups in ext4, based on the feature bit and the encoding stored in the
superblock.
A filesystem that has the casefold feature set is able to configure
directories with the +F (EXT4_CASEFOLD_FL) attribute, enabling lookups
to succeed in that directory in a case-insensitive fashion, i.e: match
a directory entry even if the name used by userspace is not a byte per
byte match with the disk name, but is an equivalent case-insensitive
version of the Unicode string. This operation is called a
case-insensitive file name lookup.
The feature is configured as an inode attribute applied to directories
and inherited by its children. This attribute can only be enabled on
empty directories for filesystems that support the encoding feature,
thus preventing collision of file names that only differ by case.
* dcache handling:
For a +F directory, Ext4 only stores the first equivalent name dentry
used in the dcache. This is done to prevent unintentional duplication of
dentries in the dcache, while also allowing the VFS code to quickly find
the right entry in the cache despite which equivalent string was used in
a previous lookup, without having to resort to ->lookup().
d_hash() of casefolded directories is implemented as the hash of the
casefolded string, such that we always have a well-known bucket for all
the equivalencies of the same string. d_compare() uses the
utf8_strncasecmp() infrastructure, which handles the comparison of
equivalent, same case, names as well.
For now, negative lookups are not inserted in the dcache, since they
would need to be invalidated anyway, because we can't trust missing file
dentries. This is bad for performance but requires some leveraging of
the vfs layer to fix. We can live without that for now, and so does
everyone else.
* on-disk data:
Despite using a specific version of the name as the internal
representation within the dcache, the name stored and fetched from the
disk is a byte-per-byte match with what the user requested, making this
implementation 'name-preserving'. i.e. no actual information is lost
when writing to storage.
DX is supported by modifying the hashes used in +F directories to make
them case/encoding-aware. The new disk hashes are calculated as the
hash of the full casefolded string, instead of the string directly.
This allows us to efficiently search for file names in the htree without
requiring the user to provide an exact name.
* Dealing with invalid sequences:
By default, when a invalid UTF-8 sequence is identified, ext4 will treat
it as an opaque byte sequence, ignoring the encoding and reverting to
the old behavior for that unique file. This means that case-insensitive
file name lookup will not work only for that file. An optional bit can
be set in the superblock telling the filesystem code and userspace tools
to enforce the encoding. When that optional bit is set, any attempt to
create a file name using an invalid UTF-8 sequence will fail and return
an error to userspace.
* Normalization algorithm:
The UTF-8 algorithms used to compare strings in ext4 is implemented
lives in fs/unicode, and is based on a previous version developed by
SGI. It implements the Canonical decomposition (NFD) algorithm
described by the Unicode specification 12.1, or higher, combined with
the elimination of ignorable code points (NFDi) and full
case-folding (CF) as documented in fs/unicode/utf8_norm.c.
NFD seems to be the best normalization method for EXT4 because:
- It has a lower cost than NFC/NFKC (which requires
decomposing to NFD as an intermediary step)
- It doesn't eliminate important semantic meaning like
compatibility decompositions.
Although:
- This implementation is not completely linguistic accurate, because
different languages have conflicting rules, which would require the
specialization of the filesystem to a given locale, which brings all
sorts of problems for removable media and for users who use more than
one language.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Support for encoding is considered an incompatible feature, since it has
potential to create collisions of file names in existing filesystems.
If the feature flag is not enabled, the entire filesystem will operate
on opaque byte sequences, respecting the original behavior.
The s_encoding field stores a magic number indicating the encoding
format and version used globally by file and directory names in the
filesystem. The s_encoding_flags defines policies for using the charset
encoding, like how to handle invalid sequences. The magic number is
mapped to the exact charset table, but the mapping is specific to ext4.
Since we don't have any commitment to support old encodings, the only
encoding I am supporting right now is utf8-12.1.0.
The current implementation prevents the user from enabling encoding and
per-directory encryption on the same filesystem at the same time. The
incompatibility between these features lies in how we do efficient
directory searches when we cannot be sure the encryption of the user
provided fname will match the actual hash stored in the disk without
decrypting every directory entry, because of normalization cases. My
quickest solution is to simply block the concurrent use of these
features for now, and enable it later, once we have a better solution.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Regenerate utf8data.h based on the latest UCD files and run tests
against the latest version.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
This implements a in-kernel sanity test module for the utf8
normalization core. At probe time, it will run basic sequences through
the utf8n core, to identify problems will equivalent sequences and
normalization/casefold code. This is supposed to be useful for
regression testing when adding support for a new version of utf8 to
linux.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
This patch integrates the utf8n patches with some higher level API to
perform UTF-8 string comparison, normalization and casefolding
operations. Implemented is a variation of NFD, and casefold is
performed by doing full casefold on top of NFD. These algorithms are
based on the core implemented by Olaf Weber from SGI.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Remove the Hangul decompositions from the utf8data trie, and do
algorithmic decomposition to calculate them on the fly. To store the
decomposition the caller of utf8lookup()/utf8nlookup() must provide a
12-byte buffer, which is used to synthesize a leaf with the
decomposition. This significantly reduces the size of the utf8data[]
array.
Changes made by Gabriel:
Rebase to mainline
Fix checkpatch errors
Extract robustness fixes and merge back to original mkutf8data.c patch
Regenerate utf8data.h
Signed-off-by: Olaf Weber <olaf@sgi.com>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Supporting functions for UTF-8 normalization are in utf8norm.c with the
header utf8norm.h. Two normalization forms are supported: nfdi and
nfdicf.
nfdi:
- Apply unicode normalization form NFD.
- Remove any Default_Ignorable_Code_Point.
nfdicf:
- Apply unicode normalization form NFD.
- Remove any Default_Ignorable_Code_Point.
- Apply a full casefold (C + F).
For the purposes of the code, a string is valid UTF-8 if:
- The values encoded are 0x1..0x10FFFF.
- The surrogate codepoints 0xD800..0xDFFFF are not encoded.
- The shortest possible encoding is used for all values.
The supporting functions work on null-terminated strings (utf8 prefix)
and on length-limited strings (utf8n prefix).
From the original SGI patch and for conformity with coding standards,
the utf8data_t typedef was dropped, since it was just masking the struct
keyword. On other occasions, namely utf8leaf_t and utf8trie_t, I
decided to keep it, since they are simple pointers to memory buffers,
and using uchars here wouldn't provide any more meaningful information.
From the original submission, we also converted from the compatibility
form to canonical.
Changes made by Gabriel:
Rebase to Mainline
Fix up checkpatch.pl warnings
Drop typedefs
move out of libxfs
Convert from NFKD to NFD
Signed-off-by: Olaf Weber <olaf@sgi.com>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The decomposition and casefolding of UTF-8 characters are described in a
prefix tree in utf8data.h, which is a generate from the Unicode
Character Database (UCD), published by the Unicode Consortium, and
should not be edited by hand. The structures in utf8data.h are meant to
be used for lookup operations by the unicode subsystem, when decoding a
utf-8 string.
mkutf8data.c is the source for a program that generates utf8data.h. It
was written by Olaf Weber from SGI and originally proposed to be merged
into Linux in 2014. The original proposal performed the compatibility
decomposition, NFKD, but the current version was modified by me to do
canonical decomposition, NFD, as suggested by the community. The
changes from the original submission are:
* Rebase to mainline.
* Fix out-of-tree-build.
* Update makefile to build 11.0.0 ucd files.
* drop references to xfs.
* Convert NFKD to NFD.
* Merge back robustness fixes from original patch. Requested by
Dave Chinner.
The original submission is archived at:
<https://linux-xfs.oss.sgi.narkive.com/Xx10wjVY/rfc-unicode-utf-8-support-for-xfs>
The utf8data.h file can be regenerated using the instructions in
fs/unicode/README.utf8data.
- Notes on the update from 8.0.0 to 11.0:
The structure of the ucd files and special cases have not experienced
any changes between versions 8.0.0 and 11.0.0. 8.0.0 saw the addition
of Cherokee LC characters, which is an interesting case for
case-folding. The update is accompanied by new tests on the test_ucd
module to catch specific cases. No changes to mkutf8data script were
required for the updates.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
It is never possible, that number of block groups decreases,
since only online grow is supported.
But after a growing occured, we have to zero inode tables
for just created new block groups.
Fixes: 19c5246d2516 ("ext4: add new online resize interface")
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org
|
|
Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
|
|
When remounting with debug_want_extra_isize, we were not performing the
same checks that we do during a normal mount. That allowed us to set a
value for s_want_extra_isize that reached outside the s_inode_size.
Fixes: e2b911c53584 ("ext4: clean up feature test macros with predicate functions")
Reported-by: syzbot+f584efa0ac7213c226b7@syzkaller.appspotmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Barret Rhoden <brho@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
|
|
The reference to iloc.bh has been dropped in ext4_mark_iloc_dirty.
However, the reference is dropped again if error occurs during
ext4_handle_dirty_metadata, which may result in use-after-free bugs.
Fixes: fb265c9cb49e("ext4: add ext4_sb_bread() to disambiguate ENOMEM cases")
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org
|
|
In other places in fs/ext4/xattr.c, if e_value_inum is non-zero, the
code ignores the value in e_value_offs. The e_value_offs *should* be
zero, but we shouldn't depend upon it, since it might not be true in a
corrupted/fuzzed file system.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202897
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202877
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
|
|
Add the blocks which belong to the journal inode to block_validity's
system zone so attempts to deallocate or overwrite the journal due a
corrupted file system where the journal blocks are also claimed by
another inode.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202879
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
|
|
BUG_ON(1) leads to bogus warnings from clang when
CONFIG_PROFILE_ANNOTATED_BRANCHES is set:
fs/ext4/inode.c:544:4: error: variable 'retval' is used uninitialized whenever 'if' condition is false
[-Werror,-Wsometimes-uninitialized]
BUG_ON(1);
^~~~~~~~~
include/asm-generic/bug.h:61:36: note: expanded from macro 'BUG_ON'
^~~~~~~~~~~~~~~~~~~
include/linux/compiler.h:48:23: note: expanded from macro 'unlikely'
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
fs/ext4/inode.c:591:6: note: uninitialized use occurs here
if (retval > 0 && map->m_flags & EXT4_MAP_MAPPED) {
^~~~~~
fs/ext4/inode.c:544:4: note: remove the 'if' if its condition is always true
BUG_ON(1);
^
include/asm-generic/bug.h:61:32: note: expanded from macro 'BUG_ON'
^
fs/ext4/inode.c:502:12: note: initialize the variable 'retval' to silence this warning
Change it to BUG() so clang can see that this code path can never
continue.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Jan Kara <jack@suse.cz>
|
|
In ext4_mpage_readpages(), if the parameter pages is not NULL, another
parameter page is NULL. At the first time prefetchw(&page->flags)
works on NULL. From second time, prefetchw(&page->flags) always works on
the last consumed page. This might do little improvment for handling
current page. So prefetchw() should be called while the page pointer
has just been updated.
Signed-off-by: Liu Xiang <liu.xiang6@zte.com.cn>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
We hit a BUG at fs/buffer.c:3057 if we detached the nbd device
before unmounting ext4 filesystem.
The typical chain of events leading to the BUG:
jbd2_write_superblock
submit_bh
submit_bh_wbc
BUG_ON(!buffer_mapped(bh));
The block device is removed and all the pages are invalidated. JBD2
was trying to write journal superblock to the block device which is
no longer present.
Fix this by checking the journal superblock's buffer head prior to
submitting.
Reported-by: Eric Ren <renzhen@linux.alibaba.com>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org
|
|
The comment above NEXT_ORPHAN() was meant for ext4_encrypted_inode(),
which was moved by commit a7550b30ab70 ("ext4 crypto: migrate into vfs's
crypto engine") but the comment was accidentally left in place. Since
ext4_encrypted_inode() has now been removed, just remove the comment.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
|
|
The sanity check in mb_find_extent() only checked that returned extent
does not extend past blocksize * 8, however it should not extend past
EXT4_CLUSTERS_PER_GROUP(sb). This can happen when clusters_per_group <
blocksize * 8 and the tail of the bitmap is not properly filled by 1s
which happened e.g. when ancient kernels have grown the filesystem.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
|
|
At the beginning, nblocks has been assigned. There is no need
to repeat the assignment in the while loop, and remove it.
Signed-off-by: Liu Song <liu.song11@zte.com.cn>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
|
|
Merge misc fixes from Andrew Morton:
"22 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (22 commits)
fs/proc/proc_sysctl.c: fix NULL pointer dereference in put_links
fs: fs_parser: fix printk format warning
checkpatch: add %pt as a valid vsprintf extension
mm/migrate.c: add missing flush_dcache_page for non-mapped page migrate
drivers/block/zram/zram_drv.c: fix idle/writeback string compare
mm/page_isolation.c: fix a wrong flag in set_migratetype_isolate()
mm/memory_hotplug.c: fix notification in offline error path
ptrace: take into account saved_sigmask in PTRACE{GET,SET}SIGMASK
fs/proc/kcore.c: make kcore_modules static
include/linux/list.h: fix list_is_first() kernel-doc
mm/debug.c: fix __dump_page when mapping->host is not set
mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified
include/linux/hugetlb.h: convert to use vm_fault_t
iommu/io-pgtable-arm-v7s: request DMA32 memory, and improve debugging
mm: add support for kmem caches in DMA32 zone
ocfs2: fix inode bh swapping mixup in ocfs2_reflink_inodes_lock
mm/hotplug: fix offline undo_isolate_page_range()
fs/open.c: allow opening only regular files during execve()
mailmap: add Changbin Du
mm/debug.c: add a cast to u64 for atomic64_read()
...
|
|
Pull block fixes from Jens Axboe:
"Small set of fixes that should go into this series. This contains:
- compat signal mask fix for io_uring (Arnd)
- EAGAIN corner case for direct vs buffered writes for io_uring
(Roman)
- NVMe pull request from Christoph with various little fixes
- sbitmap ws_active fix, which caused a perf regression for shared
tags (me)
- sbitmap bit ordering fix (Ming)
- libata on-stack DMA fix (Raymond)"
* tag 'for-linus-20190329' of git://git.kernel.dk/linux-block:
nvmet: fix error flow during ns enable
nvmet: fix building bvec from sg list
nvme-multipath: relax ANA state check
nvme-tcp: fix an endianess miss-annotation
libata: fix using DMA buffers on stack
io_uring: offload write to async worker in case of -EAGAIN
sbitmap: order READ/WRITE freed instance and setting clear bit
blk-mq: fix sbitmap ws_active for shared tags
io_uring: fix big-endian compat signal mask handling
blk-mq: update comment for blk_mq_hctx_has_pending()
blk-mq: use blk_mq_put_driver_tag() to put tag
|
|
Pull ceph fixes from Ilya Dryomov:
"A patch to avoid choking on multipage bvecs in the messenger and a
small use-after-free fix"
* tag 'ceph-for-5.1-rc3' of git://github.com/ceph/ceph-client:
ceph: fix use-after-free on symlink traversal
libceph: fix breakage caused by multipage bvecs
|
|
Pull xfs fixes from Darrick Wong:
"Here are a few fixes for some corruption bugs and uninitialized
variable problems. The few patches here have gone through a few days
worth of fstest runs with no new problems observed.
Changes since last update:
- Fix a bunch of static checker complaints about uninitialized
variables and insufficient range checks.
- Avoid a crash when incore extent map data are corrupt.
- Disallow FITRIM when we haven't recovered the log and know the
metadata are stale.
- Fix a data corruption when doing unaligned overlapping dio writes"
* tag 'xfs-5.1-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: serialize unaligned dio writes against all other dio writes
xfs: prohibit fstrim in norecovery mode
xfs: always init bma in xfs_bmapi_write
xfs: fix btree scrub checking with regards to root-in-inode
xfs: dabtree scrub needs to range-check level
xfs: don't trip over uninitialized buffer on extent read of corrupted inode
|
|
Syzkaller reports:
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN PTI
CPU: 1 PID: 5373 Comm: syz-executor.0 Not tainted 5.0.0-rc8+ #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
RIP: 0010:put_links+0x101/0x440 fs/proc/proc_sysctl.c:1599
Code: 00 0f 85 3a 03 00 00 48 8b 43 38 48 89 44 24 20 48 83 c0 38 48 89 c2 48 89 44 24 28 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 fe 02 00 00 48 8b 74 24 20 48 c7 c7 60 2a 9d 91
RSP: 0018:ffff8881d828f238 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: ffff8881e01b1140 RCX: ffffffff8ee98267
RDX: 0000000000000007 RSI: ffffc90001479000 RDI: ffff8881e01b1178
RBP: dffffc0000000000 R08: ffffed103ee27259 R09: ffffed103ee27259
R10: 0000000000000001 R11: ffffed103ee27258 R12: fffffffffffffff4
R13: 0000000000000006 R14: ffff8881f59838c0 R15: dffffc0000000000
FS: 00007f072254f700(0000) GS:ffff8881f7100000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fff8b286668 CR3: 00000001f0542002 CR4: 00000000007606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
drop_sysctl_table+0x152/0x9f0 fs/proc/proc_sysctl.c:1629
get_subdir fs/proc/proc_sysctl.c:1022 [inline]
__register_sysctl_table+0xd65/0x1090 fs/proc/proc_sysctl.c:1335
br_netfilter_init+0xbc/0x1000 [br_netfilter]
do_one_initcall+0xfa/0x5ca init/main.c:887
do_init_module+0x204/0x5f6 kernel/module.c:3460
load_module+0x66b2/0x8570 kernel/module.c:3808
__do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x462e99
Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f072254ec58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99
RDX: 0000000000000000 RSI: 0000000020000280 RDI: 0000000000000003
RBP: 00007f072254ec70 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f072254f6bc
R13: 00000000004bcefa R14: 00000000006f6fb0 R15: 0000000000000004
Modules linked in: br_netfilter(+) dvb_usb_dibusb_mc_common dib3000mc dibx000_common dvb_usb_dibusb_common dvb_usb_dw2102 dvb_usb classmate_laptop palmas_regulator cn videobuf2_v4l2 v4l2_common snd_soc_bd28623 mptbase snd_usb_usx2y snd_usbmidi_lib snd_rawmidi wmi libnvdimm lockd sunrpc grace rc_kworld_pc150u rc_core rtc_da9063 sha1_ssse3 i2c_cros_ec_tunnel adxl34x_spi adxl34x nfnetlink lib80211 i5500_temp dvb_as102 dvb_core videobuf2_common videodev media videobuf2_vmalloc videobuf2_memops udc_core lnbp22 leds_lp3952 hid_roccat_ryos s1d13xxxfb mtd vport_geneve openvswitch nf_conncount nf_nat_ipv6 nsh geneve udp_tunnel ip6_udp_tunnel snd_soc_mt6351 sis_agp phylink snd_soc_adau1761_spi snd_soc_adau1761 snd_soc_adau17x1 snd_soc_core snd_pcm_dmaengine ac97_bus snd_compress snd_soc_adau_utils snd_soc_sigmadsp_regmap snd_soc_sigmadsp raid_class hid_roccat_konepure hid_roccat_common hid_roccat c2port_duramar2150 core mdio_bcm_unimac iptable_security iptable_raw iptable_mangle
iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel hsr veth netdevsim devlink vxcan batman_adv cfg80211 rfkill chnl_net caif nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel joydev mousedev ide_pci_generic piix aesni_intel aes_x86_64 ide_core crypto_simd atkbd cryptd glue_helper serio_raw ata_generic pata_acpi i2c_piix4 floppy sch_fq_codel ip_tables x_tables ipv6 [last unloaded: lm73]
Dumping ftrace buffer:
(ftrace buffer empty)
---[ end trace 770020de38961fd0 ]---
A new dir entry can be created in get_subdir and its 'header->parent' is
set to NULL. Only after insert_header success, it will be set to 'dir',
otherwise 'header->parent' is set to NULL and drop_sysctl_table is called.
However in err handling path of get_subdir, drop_sysctl_table also be
called on 'new->header' regardless its value of parent pointer. Then
put_links is called, which triggers NULL-ptr deref when access member of
header->parent.
In fact we have multiple error paths which call drop_sysctl_table() there,
upon failure on insert_links() we also call drop_sysctl_table().And even
in the successful case on __register_sysctl_table() we still always call
drop_sysctl_table().This patch fix it.
Link: http://lkml.kernel.org/r/20190314085527.13244-1-yuehaibing@huawei.com
Fixes: 0e47c99d7fe25 ("sysctl: Replace root_list with links between sysctl_table_sets")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: <stable@vger.kernel.org> [3.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Fix printk format warning (seen on i386 builds) by using ptrdiff format
specifier (%t):
fs/fs_parser.c:413:6: warning: format `%lu' expects argument of type `long unsigned int', but argument 3 has type `int' [-Wformat=]
Link: http://lkml.kernel.org/r/19432668-ffd3-fbb2-af4f-1c8e48f6cc81@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Fix sparse warning:
fs/proc/kcore.c:591:19: warning:
symbol 'kcore_modules' was not declared. Should it be static?
Link: http://lkml.kernel.org/r/20190320135417.13272-1-yuehaibing@huawei.com
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Mukesh Ojha <mojha@codeaurora.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: James Morse <james.morse@arm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
ocfs2_reflink_inodes_lock() can swap the inode1/inode2 variables so that
we always grab cluster locks in order of increasing inode number.
Unfortunately, we forget to swap the inode record buffer head pointers
when we've done this, which leads to incorrect bookkeepping when we're
trying to make the two inodes have the same refcount tree.
This has the effect of causing filesystem shutdowns if you're trying to
reflink data from inode 100 into inode 97, where inode 100 already has a
refcount tree attached and inode 97 doesn't. The reflink code decides
to copy the refcount tree pointer from 100 to 97, but uses inode 97's
inode record to open the tree root (which it doesn't have) and blows up.
This issue causes filesystem shutdowns and metadata corruption!
Link: http://lkml.kernel.org/r/20190312214910.GK20533@magnolia
Fixes: 29ac8e856cb369 ("ocfs2: implement the VFS clone_range, copy_range, and dedupe_range features")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
syzbot is hitting lockdep warning [1] due to trying to open a fifo
during an execve() operation. But we don't need to open non regular
files during an execve() operation, for all files which we will need are
the executable file itself and the interpreter programs like /bin/sh and
ld-linux.so.2 .
Since the manpage for execve(2) says that execve() returns EACCES when
the file or a script interpreter is not a regular file, and the manpage
for uselib(2) says that uselib() can return EACCES, and we use
FMODE_EXEC when opening for execve()/uselib(), we can bail out if a non
regular file is requested with FMODE_EXEC set.
Since this deadlock followed by khungtaskd warnings is trivially
reproducible by a local unprivileged user, and syzbot's frequent crash
due to this deadlock defers finding other bugs, let's workaround this
deadlock until we get a chance to find a better solution.
[1] https://syzkaller.appspot.com/bug?id=b5095bfec44ec84213bac54742a82483aad578ce
Link: http://lkml.kernel.org/r/1552044017-7890-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
Reported-by: syzbot <syzbot+e93a80c1bb7c5c56e522461c149f8bf55eab1b2b@syzkaller.appspotmail.com>
Fixes: 8924feff66f35fe2 ("splice: lift pipe_lock out of splice_to_pipe()")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: <stable@vger.kernel.org> [4.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The marshalling of AFS.StoreData, AFS.StoreData64 and YFS.StoreData64 calls
generated by ->setattr() ops for the purpose of expanding a file is
incorrect due to older documentation incorrectly describing the way the RPC
'FileLength' parameter is meant to work.
The older documentation says that this is the length the file is meant to
end up at the end of the operation; however, it was never implemented this
way in any of the servers, but rather the file is truncated down to this
before the write operation is effected, and never expanded to it (and,
indeed, it was renamed to 'TruncPos' in 2014).
Fix this by setting the position parameter to the new file length and doing
a zero-lengh write there.
The bug causes Xwayland to SIGBUS due to unexpected non-expansion of a file
it then mmaps. This can be tested by giving the following test program a
filename in an AFS directory:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
int main(int argc, char *argv[])
{
char *p;
int fd;
if (argc != 2) {
fprintf(stderr,
"Format: test-trunc-mmap <file>\n");
exit(2);
}
fd = open(argv[1], O_RDWR | O_CREAT | O_TRUNC);
if (fd < 0) {
perror(argv[1]);
exit(1);
}
if (ftruncate(fd, 0x140008) == -1) {
perror("ftruncate");
exit(1);
}
p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
MAP_SHARED, fd, 0);
if (p == MAP_FAILED) {
perror("mmap");
exit(1);
}
p[0] = 'a';
if (munmap(p, 4096) < 0) {
perror("munmap");
exit(1);
}
if (close(fd) < 0) {
perror("close");
exit(1);
}
exit(0);
}
Fixes: 31143d5d515e ("AFS: implement basic file write support")
Reported-by: Jonathan Billings <jsbillin@umich.edu>
Tested-by: Jonathan Billings <jsbillin@umich.edu>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
free the symlink body after the same RCU delay we have for freeing the
struct inode itself, so that traversal during RCU pathwalk wouldn't step
into freed memory.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:
Stable fixes:
- Fix nfs4_lock_state refcounting in nfs4_alloc_{lock,unlock}data()
- fix mount/umount race in nlmclnt.
- NFSv4.1 don't free interrupted slot on open
Bugfixes:
- Don't let RPC_SOFTCONN tasks time out if the transport is connected
- Fix a typo in nfs_init_timeout_values()
- Fix layoutstats handling during read failovers
- fix uninitialized variable warning"
* tag 'nfs-for-5.1-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
SUNRPC: fix uninitialized variable warning
pNFS/flexfiles: Fix layoutstats handling during read failovers
NFS: Fix a typo in nfs_init_timeout_values()
SUNRPC: Don't let RPC_SOFTCONN tasks time out if the transport is connected
NFSv4.1 don't free interrupted slot on open
NFS: fix mount/umount race in nlmclnt.
NFS: Fix nfs4_lock_state refcounting in nfs4_alloc_{lock,unlock}data()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fsync fixes: i_size for truncate vs fsync, dio vs buffered during
snapshotting, remove complicated but incomplete assertion
- removed excessive warnigs, misreported device stats updates
- fix raid56 page mapping for 32bit arch
- fixes reported by static analyzer
* tag 'for-5.1-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
Btrfs: fix assertion failure on fsync with NO_HOLES enabled
btrfs: Avoid possible qgroup_rsv_size overflow in btrfs_calculate_inode_block_rsv_size
btrfs: Fix bound checking in qgroup_trace_new_subtree_blocks
btrfs: raid56: properly unmap parity page in finish_parity_scrub()
btrfs: don't report readahead errors and don't update statistics
Btrfs: fix file corruption after snapshotting due to mix of buffered/DIO writes
btrfs: remove WARN_ON in log_dir_items
Btrfs: fix incorrect file size after shrinking truncate and fsync
|