lwn.git - Linux kernel documentation tree maintained by Jonathan Corbet

Age	Commit message (Collapse)	Author
2015-06-30	Merge branch 'for-linus-4.2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs updates from Chris Mason: "Outside of our usual batch of fixes, this integrates the subvolume quota updates that Qu Wenruo from Fujitsu has been working on for a few releases now. He gets an extra gold star for making btrfs smaller this time, and fixing a number of quota corners in the process. Dave Sterba tested and integrated Anand Jain's sysfs improvements. Outside of exporting a symbol (ack'd by Greg) these are all internal to btrfs and it's mostly cleanups and fixes. Anand also attached some of our sysfs objects to our internal device management structs instead of an object off the super block. It will make device management easier overall and it's a better fit for how the sysfs files are used. None of the existing sysfs files are moved around. Thanks for all the fixes everyone" * 'for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (87 commits) btrfs: delayed-ref: double free in btrfs_add_delayed_tree_ref() Btrfs: Check if kobject is initialized before put lib: export symbol kobject_move() Btrfs: sysfs: add support to show replacing target in the sysfs Btrfs: free the stale device Btrfs: use received_uuid of parent during send Btrfs: fix use-after-free in btrfs_replay_log btrfs: wait for delayed iputs on no space btrfs: qgroup: Make snapshot accounting work with new extent-oriented qgroup. btrfs: qgroup: Add the ability to skip given qgroup for old/new_roots. btrfs: ulist: Add ulist_del() function. btrfs: qgroup: Cleanup the old ref_node-oriented mechanism. btrfs: qgroup: Switch self test to extent-oriented qgroup mechanism. btrfs: qgroup: Switch to new extent-oriented qgroup mechanism. btrfs: qgroup: Switch rescan to new mechanism. btrfs: qgroup: Add new qgroup calculation function btrfs_qgroup_account_extents(). btrfs: backref: Add special time_seq == (u64)-1 case for btrfs_find_all_roots(). btrfs: qgroup: Add new function to record old_roots. btrfs: qgroup: Record possible quota-related extent for qgroup. btrfs: qgroup: Add function qgroup_update_counters(). ...
2015-06-25	lib/kobject.c: use strreplace()	Rasmus Villemoes
	There's probably not many slashes in the name, but starting over when we see one feels wrong. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-19	lib: export symbol kobject_move()	Anand Jain
	drivers/cpufreq/cpufreq.c is already using this function. And now btrfs needs it as well. Export symbol kobject_move(). Signed-off-by: Anand Jain <anand.jain@oracle.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David Sterba <dsterba@suse.cz>
2015-03-25	kobject: WARN as tip when call kobject_get() to a kobject not initialized	Ethan Zhao
	call kobject_get() to kojbect that is not initalized or released will only leave following like call trace to us: -----------[ cut here ]------------ [ 54.545816] WARNING: CPU: 0 PID: 213 at include/linux/kref.h:47 kobject_get+0x41/0x50() [ 54.642595] Modules linked in: i2c_i801(+) mfd_core shpchp(+) acpi_cpufreq(+) edac_core ioatdma(+) xfs libcrc32c ast syscopyarea ixgbe sysfillrect sysimgblt sr_mod sd_mod drm_kms_helper igb mdio cdrom e1000e ahci dca ttm libahci uas drm i2c_algo_bit ptp megaraid_sas libata usb_storage i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod [ 55.007264] CPU: 0 PID: 213 Comm: kworker/0:2 Not tainted 3.18.5 [ 55.099970] Hardware name: Oracle Corporation SUN FIRE X4170 M2 SERVER /ASSY,MOTHERBOARD,X4170, BIOS 08120104 05/08/2012 [ 55.239736] Workqueue: kacpi_notify acpi_os_execute_deferred [ 55.308598] 0000000000000000 00000000bd730b61 ffff88046742baf8 ffffffff816b7edb [ 55.398305] 0000000000000000 0000000000000000 ffff88046742bb38 ffffffff81078ae1 [ 55.488040] ffff88046742bbd8 ffff8806706b3000 0000000000000292 0000000000000000 [ 55.577776] Call Trace: [ 55.608228] [<ffffffff816b7edb>] dump_stack+0x46/0x58 [ 55.670895] [<ffffffff81078ae1>] warn_slowpath_common+0x81/0xa0 [ 55.743952] [<ffffffff81078bfa>] warn_slowpath_null+0x1a/0x20 [ 55.814929] [<ffffffff8130d0d1>] kobject_get+0x41/0x50 [ 55.878654] [<ffffffff8153e955>] cpufreq_cpu_get+0x75/0xc0 [ 55.946528] [<ffffffff8153f37e>] cpufreq_update_policy+0x2e/0x1f0 The above issue was casued by a race condition, if there is a WARN in kobject_get() of the kobject is not initialized, that would save us much time to debug it. Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-07	kobject: fix NULL pointer derefernce in kobj_child_ns_ops	Pankaj Dubey
	We will hit NULL pointer dereference if we call platform_device_register_simple or platform_device_add at very early stage. I have observed following crash when called platform_device_add from "init_irq" hook of machine_desc. This patch fixes this issue and let system handle this case gracefully instead of kernel panic. [0.000000] Unable to handle kernel NULL pointer dereference at virtual address 0000000c [0.000000] pgd = c0004000 [0.000000] [0000000c] *pgd=00000000 [0.000000] Internal error: Oops: 5 [#1] PREEMPT ARM [0.000000] Modules linked in: [0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G W 3.17.0-rc6-00198-ga1603f1-dirty #319 [0.000000] task: c05b23f0 ti: c05a8000 task.ti: c05a8000 [0.000000] PC is at kobject_namespace+0x18/0x58 [0.000000] LR is at kobject_add_internal+0x90/0x2ec [snip] [0.000000] [<c01b1df0>] (kobject_namespace) from [<c01b2338>] (kobject_add_internal+0x90/0x2ec) [0.000000] [<c01b2338>] (kobject_add_internal) from [<c01b2728>] (kobject_add+0x4c/0x98) [0.000000] [<c01b2728>] (kobject_add) from [<c0226274>] (device_add+0xe8/0x51c) [0.000000] [<c0226274>] (device_add) from [<c0229c70>] (platform_device_add+0xb4/0x214) [0.000000] [<c0229c70>] (platform_device_add) from [<c022a338>] (platform_device_register_full+0xb8/0xdc) [0.000000] [<c022a338>] (platform_device_register_full) from [<c0570214>] (exynos_init_irq+0x90/0x9c) [0.000000] [<c0570214>] (exynos_init_irq) from [<c056c18c>] (init_IRQ+0x2c/0x78) [0.000000] [<c056c18c>] (init_IRQ) from [<c0569a54>] (start_kernel+0x22c/0x378) [0.000000] [<c0569a54>] (start_kernel) from [<40008070>] (0x40008070) [0.000000] Code: e590000c e3500000 0a00000e e5903014 (e593300c) Signed-off-by: Pankaj Dubey <pankaj.dubey@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-07	sysfs, kobject: add sysfs wrapper for kernfs_enable_ns()	Tejun Heo
	Currently, kobject is invoking kernfs_enable_ns() directly. This is fine now as sysfs and kernfs are enabled and disabled together. If sysfs is disabled, kernfs_enable_ns() is switched to dummy implementation too and everything is fine; however, kernfs will soon have its own config option CONFIG_KERNFS and !SYSFS && KERNFS will be possible, which can make kobject call into non-dummy kernfs_enable_ns() with NULL kernfs_node pointers leading to an oops. Introduce sysfs_enable_ns() which is a wrapper around kernfs_enable_ns() so that it can be made a noop depending only on CONFIG_SYSFS regardless of the planned CONFIG_KERNFS. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-30	Merge branch 'for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs updates from Chris Mason: "This is a pretty big pull, and most of these changes have been floating in btrfs-next for a long time. Filipe's properties work is a cool building block for inheriting attributes like compression down on a per inode basis. Jeff Mahoney kicked in code to export filesystem info into sysfs. Otherwise, lots of performance improvements, cleanups and bug fixes. Looks like there are still a few other small pending incrementals, but I wanted to get the bulk of this in first" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (149 commits) Btrfs: fix spin_unlock in check_ref_cleanup Btrfs: setup inode location during btrfs_init_inode_locked Btrfs: don't use ram_bytes for uncompressed inline items Btrfs: fix btrfs_search_slot_for_read backwards iteration Btrfs: do not export ulist functions Btrfs: rework ulist with list+rb_tree Btrfs: fix memory leaks on walking backrefs failure Btrfs: fix send file hole detection leading to data corruption Btrfs: add a reschedule point in btrfs_find_all_roots() Btrfs: make send's file extent item search more efficient Btrfs: fix to catch all errors when resolving indirect ref Btrfs: fix protection between walking backrefs and root deletion btrfs: fix warning while merging two adjacent extents Btrfs: fix infinite path build loops in incremental send btrfs: undo sysfs when open_ctree() fails Btrfs: fix snprintf usage by send's gen_unique_name btrfs: fix defrag 32-bit integer overflow btrfs: sysfs: list the NO_HOLES feature btrfs: sysfs: don't show reserved incompat feature btrfs: call permission checks earlier in ioctls and return EPERM ...
2014-01-28	kobject: export kobj_sysfs_ops	Jeff Mahoney
	struct kobj_attribute implements the baseline attribute functionality that can be used all over the place. We should export the ops associated with it. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-01-08	kobject: Fix source code comment spelling	Bart Van Assche
	Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-04	Revert "kobject: introduce kobj_completion"	Greg Kroah-Hartman
	This reverts commit eee031649707db3c9920d9498f8d03819b74fc23. Jeff writes: I have no objections to reverting it. There were concerns from Al Viro that it'd be tough to get right by callers and I had assumed it got dropped after that. I had planned on using it in my btrfs sysfs exports patchset but came up with a better way. Cc: Jeff Mahoney <jeffm@suse.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-11	kernfs: s/sysfs_dirent/kernfs_node/ and rename its friends accordingly	Tejun Heo
	kernfs has just been separated out from sysfs and we're already in full conflict mode. Nothing can make the situation any worse. Let's take the chance to name things properly. This patch performs the following renames. * s/sysfs_elem_dir/kernfs_elem_dir/ * s/sysfs_elem_symlink/kernfs_elem_symlink/ * s/sysfs_elem_attr/kernfs_elem_file/ * s/sysfs_dirent/kernfs_node/ * s/sd/kn/ in kernfs proper * s/parent_sd/parent/ * s/target_sd/target/ * s/dir_sd/parent/ * s/to_sysfs_dirent()/rb_to_kn()/ * misc renames of local vars when they conflict with the above Because md, mic and gpio dig into sysfs details, this patch ends up modifying them. All are sysfs_dirent renames and trivial. While we can avoid these by introducing a dummy wrapping struct sysfs_dirent around kernfs_node, given the limited usage outside kernfs and sysfs proper, I don't think such workaround is called for. This patch is strictly rename only and doesn't introduce any functional difference. - mic / gpio renames were missing. Spotted by kbuild test robot. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Neil Brown <neilb@suse.de> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Ashutosh Dixit <ashutosh.dixit@intel.com> Cc: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-08	kobject: fix memory leak in kobject_set_name_vargs	Maurizio Lombardi
	If the call to kvasprintf fails then the old name of the object will be leaked, this patch fixes the bug by restoring the old name before returning ENOMEM. Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-07	kobject: remove kset from sysfs immediately in kset_unregister()	Bjorn Helgaas
	There's no "unlink from sysfs" interface for ksets, so I think callers of kset_unregister() expect the kset to be removed from sysfs immediately, without waiting for the last reference to be released. This patch makes the sysfs removal happen immediately, so the caller may create a new kset with the same name as soon as kset_unregister() returns. Without this, every caller has to call "kobject_del(&kset->kobj)" first unless it knows it will never create a new kset with the same name. This sometimes shows up on module unload and reload, where the reload fails because it tries to create a kobject with the same name as one from the original load that still exists. CONFIG_DEBUG_KOBJECT_RELEASE=y makes this problem easier to hit. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-07	kobject: delay kobject release for random time	Bjorn Helgaas
	When CONFIG_DEBUG_KOBJECT_RELEASE=y, delay kobject release functions for a random time between 1 and 8 seconds, which effectively changes the order in which they're called. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29	sysfs, kernfs: introduce kernfs_create_dir[_ns]()	Tejun Heo
	Introduce kernfs interface to manipulate a directory which takes and returns sysfs_dirents. create_dir() is renamed to kernfs_create_dir_ns() and its argumantes and return value are updated. create_dir() usages are replaced with kernfs_create_dir_ns() and sysfs_create_subdir() usages are replaced with kernfs_create_dir(). Dup warnings are handled explicitly by sysfs users of the kernfs interface. sysfs_enable_ns() is renamed to kernfs_enable_ns(). This patch doesn't introduce any behavior changes. v2: Dummy implementation for !CONFIG_SYSFS updated to return -ENOSYS. v3: kernfs_enable_ns() added. v4: Refreshed on top of "sysfs: drop kobj_ns_type handling, take #2" so that this patch removes sysfs_enable_ns(). Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-27	sysfs: drop kobj_ns_type handling, take #2	Tejun Heo
	The way namespace tags are implemented in sysfs is more complicated than necessary. As each tag is a pointer value and required to be non-NULL under a namespace enabled parent, there's no need to record separately what type each tag is. If multiple namespace types are needed, which currently aren't, we can simply compare the tag to a set of allowed tags in the superblock assuming that the tags, being pointers, won't have the same value across multiple types. This patch rips out kobj_ns_type handling from sysfs. sysfs now has an enable switch to turn on namespace under a node. If enabled, all children are required to have non-NULL namespace tags and filtered against the super_block's tag. kobject namespace determination is now performed in lib/kobject.c::create_dir() making sysfs_read_ns_type() unnecessary. The sanity checks are also moved. create_dir() is restructured to ease such addition. This removes most kobject namespace knowledge from sysfs proper which will enable proper separation and layering of sysfs. This is the second try. The first one was cb26a311578e ("sysfs: drop kobj_ns_type handling") which tried to automatically enable namespace if there are children with non-NULL namespace tags; however, it was broken for symlinks as they should inherit the target's tag iff namespace is enabled in the parent. This led to namespace filtering enabled incorrectly for wireless net class devices through phy80211 symlinks and thus network configuration failure. a1212d278c05 ("Revert "sysfs: drop kobj_ns_type handling"") reverted the commit. This shouldn't introduce any behavior changes, for real. v2: Dummy implementation of sysfs_enable_ns() for !CONFIG_SYSFS was missing and caused build failure. Reported by kbuild test robot. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Kay Sievers <kay@vrfy.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-07	Revert "sysfs: drop kobj_ns_type handling"	Linus Torvalds
	This reverts commit cb26a311578e67769e92a39a0a63476533cb7e12. It mysteriously causes NetworkManager to not find the wireless device for me. As far as I can tell, Tejun meant for this commit to not make any semantic changes, but there clearly are some. So revert it, taking into account some of the calling convention changes that happened in this area in subsequent commits. Cc: Tejun Heo <tj@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-11	kobject: show debug info on delayed kobject release	Fengguang Wu
	Useful for locating buggy drivers on kernel oops. It may add dozens of new lines to boot dmesg. DEBUG_KOBJECT_RELEASE is hopefully only enabled in debug kernels (like maybe the Fedora rawhide one, or at developers), so being a bit more verbose is likely ok. Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-03	kobject: grab an extra reference on kobject->sd to allow duplicate deletes	Tejun Heo
	sysfs currently has a rather weird behavior regarding removals. A directory removal would delete all files directly under it but wouldn't recurse into subdirectories, which, while a bit inconsistent, seems to make sense at the first glance as each directory is supposedly associated with a kobject and each kobject can take care of the directory deletion; however, this doesn't really hold as we have groups which can be directories without a kobject associated with it and require explicit deletions. We're in the process of separating out sysfs from kboject / driver core and want a consistent behavior. A removal should delete either only the specified node or everything under it. I think it is helpful to support recursive atomic removal and later patches will implement it. Such change means that a sysfs_dirent associated with kobject may be deleted before the kobject itself is removed if one of its ancestor gets removed before it. As sysfs_remove_dir() puts the base ref, we may end up with dangling pointer on descendants. This can be solved by holding an extra reference on the sd from kobject. Acquire an extra reference on the associated sysfs_dirent on directory creation and put it after removal. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-29	Merge 3.12-rc3 into driver-core-next	Greg Kroah-Hartman
	We want the driver core and sysfs fixes in here to make merges and development easier. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-27	sysfs: Allow mounting without CONFIG_NET	Eric W. Biederman
	In kobj_ns_current_may_mount the default should be to allow the mount. The test is only for a single kobj_ns_type at a time, and unless there is a reason to prevent it the mounting sysfs should be allowed. Subsystems that are not registered can't have are not involved so can't have a reason to prevent mounting sysfs. This is a bug-fix to commit 7dc5dbc879bd ("sysfs: Restrict mounting sysfs") that came in via the userns tree during the 3.12 merge window. Reported-and-tested-by: James Hogan <james.hogan@imgtec.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-26	kobject: introduce kobj_completion	Jeff Mahoney
	A common way to handle kobject lifetimes in embedded in objects with different lifetime rules is to pair the kobject with a struct completion. This introduces a kobj_completion structure that can be used in place of the pairing, along with several convenience functions for initialization, release, and put-and-wait. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-26	sysfs: drop kobj_ns_type handling	Tejun Heo
	The way namespace tags are implemented in sysfs is more complicated than necessary. As each tag is a pointer value and required to be non-NULL under a namespace enabled parent, there's no need to record separately what type each tag is or where namespace is enabled. If multiple namespace types are needed, which currently aren't, we can simply compare the tag to a set of allowed tags in the superblock assuming that the tags, being pointers, won't have the same value across multiple types. Also, whether to filter by namespace tag or not can be trivially determined by whether the node has any tagged children or not. This patch rips out kobj_ns_type handling from sysfs. sysfs no longer cares whether specific type of namespace is enabled or not. If a sysfs_dirent has a non-NULL tag, the parent is marked as needing namespace filtering and the value is tested against the allowed set of tags for the superblock (currently only one but increasing this number isn't difficult) and the sysfs_dirent is ignored if it doesn't match. This removes most kobject namespace knowledge from sysfs proper which will enable proper separation and layering of sysfs. The namespace sanity checks in fs/sysfs/dir.c are replaced by the new sanity check in kobject_namespace(). As this is the only place ktype->namespace() is called for sysfs, this doesn't weaken the sanity check significantly. I omitted converting the sanity check in sysfs_do_create_link_sd(). While the check can be shifted to upper layer, mistakes there are well contained and should be easily visible anyway. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Kay Sievers <kay@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-26	sysfs: remove ktype->namespace() invocations in directory code	Tejun Heo
	For some unrecognizable reason, namespace information is communicated to sysfs through ktype->namespace() callback when there's nothing which needs the use of a callback. The whole sequence of operations is completely synchronous and sysfs operations simply end up calling back into the layer which just invoked it in order to find out the namespace information, which is completely backwards, obfuscates what's going on and unnecessarily tangles two separate layers. This patch doesn't remove ktype->namespace() but shifts its handling to kobject layer. We probably want to get rid of the callback in the long term. This patch adds an explicit param to sysfs_{create\|rename\|move}_dir() and renames them to sysfs_{create\|rename\|move}_dir_ns(), respectively. ktype->namespace() invocations are moved to the calling sites of the above functions. A new helper kboject_namespace() is introduced which directly tests kobj_ns_type_operations->type which should give the same result as testing sysfs_fs_type(parent_sd) and returns @kobj's namespace tag as necessary. kobject_namespace() is extern as it will be used from another file in the following patches. This patch should be an equivalent conversion without any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Kay Sievers <kay@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-07	Merge branch 'for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace changes from Eric Biederman: "This is an assorted mishmash of small cleanups, enhancements and bug fixes. The major theme is user namespace mount restrictions. nsown_capable is killed as it encourages not thinking about details that need to be considered. A very hard to hit pid namespace exiting bug was finally tracked and fixed. A couple of cleanups to the basic namespace infrastructure. Finally there is an enhancement that makes per user namespace capabilities usable as capabilities, and an enhancement that allows the per userns root to nice other processes in the user namespace" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: userns: Kill nsown_capable it makes the wrong thing easy capabilities: allow nice if we are privileged pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD userns: Allow PR_CAPBSET_DROP in a user namespace. namespaces: Simplify copy_namespaces so it is clear what is going on. pidns: Fix hang in zap_pid_ns_processes by sending a potentially extra wakeup sysfs: Restrict mounting sysfs userns: Better restrictions on when proc and sysfs can be mounted vfs: Don't copy mount bind mounts of /proc/<pid>/ns/mnt between namespaces kernel/nsproxy.c: Improving a snippet of code. proc: Restrict mounting the proc filesystem vfs: Lock in place mounts from more privileged users
2013-08-28	sysfs: Restrict mounting sysfs	Eric W. Biederman
	Don't allow mounting sysfs unless the caller has CAP_SYS_ADMIN rights over the net namespace. The principle here is if you create or have capabilities over it you can mount it, otherwise you get to live with what other people have mounted. Instead of testing this with a straight forward ns_capable call, perform this check the long and torturous way with kobject helpers, this keeps direct knowledge of namespaces out of sysfs, and preserves the existing sysfs abstractions. Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-07-25	kobject: delayed kobject release: help find buggy drivers	Russell King
	Implement debugging for kobject release functions. kobjects are reference counted, so the drop of the last reference to them is not predictable. However, the common case is for the last reference to be the kobject's removal from a subsystem, which results in the release function being immediately called. This can hide subtle bugs, which can occur when another thread holds a reference to the kobject at the same time that a kobject is removed. This results in the release method being delayed. In order to make these kinds of problems more visible, the following patch implements a delayed release; this has the effect that the release function will be out of order with respect to the removal of the kobject in the same manner that it would be if a reference was being held. This provides us with an easy way to allow driver writers to debug their drivers and fix otherwise hidden problems. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-07	kobject: sanitize argument for format string	Kees Cook
	Unlike kobject_set_name(), the kset_create_and_add() interface does not provide a way to use format strings, so make sure that the interface cannot be abused accidentally. It looks like all current callers use static strings, so there's no existing flaw. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-07	kref: minor cleanup	Anatol Pomozov
	- make warning smp-safe - result of atomic _unless_zero functions should be checked by caller to avoid use-after-free error - trivial whitespace fix. Link: https://lkml.org/lkml/2013/4/12/391 Tested: compile x86, boot machine and run xfstests Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com> [ Removed line-break, changed to use WARN_ON_ONCE() - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-13	kobject: fix kset_find_obj() race with concurrent last kobject_put()	Linus Torvalds
	Anatol Pomozov identified a race condition that hits module unloading and re-loading. To quote Anatol: "This is a race codition that exists between kset_find_obj() and kobject_put(). kset_find_obj() might return kobject that has refcount equal to 0 if this kobject is freeing by kobject_put() in other thread. Here is timeline for the crash in case if kset_find_obj() searches for an object tht nobody holds and other thread is doing kobject_put() on the same kobject: THREAD A (calls kset_find_obj()) THREAD B (calls kobject_put()) splin_lock() atomic_dec_return(kobj->kref), counter gets zero here ... starts kobject cleanup .... spin_lock() // WAIT thread A in kobj_kset_leave() iterate over kset->list atomic_inc(kobj->kref) (counter becomes 1) spin_unlock() spin_lock() // taken // it does not know that thread A increased counter so it remove obj from list spin_unlock() vfree(module) // frees module object with containing kobj // kobj points to freed memory area!! kobject_put(kobj) // OOPS!!!! The race above happens because module.c tries to use kset_find_obj() when somebody unloads module. The module.c code was introduced in commit 6494a93d55fa" Anatol supplied a patch specific for module.c that worked around the problem by simply not using kset_find_obj() at all, but rather than make a local band-aid, this just fixes kset_find_obj() to be thread-safe using the proper model of refusing the get a new reference if the refcount has already dropped to zero. See examples of this proper refcount handling not only in the kref documentation, but in various other equivalent uses of this pattern by grepping for atomic_inc_not_zero(). [ Side note: the module race does indicate that module loading and unloading is not properly serialized wrt sysfs information using the module mutex. That may require further thought, but this is the correct fix at the kobject layer regardless. ] Reported-analyzed-and-tested-by: Anatol Pomozov <anatol.pomozov@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-07	kobject: fix the uncorrect comment	Zhi Yong Wu
	Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-05-02	Merge 3.4-rc5 into driver-core-next	Greg Kroah-Hartman
	This was done to resolve a merge issue with the init/main.c file. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-23	lib/kobject.c : Remove redundant check in create_dir	yan
	create_dir is a static function used only in kobject_add_internal. There's no need to do check here, for kobject_add_internal will reject kobject with invalid name. Signed-off-by: Yan Hong <clouds.yan@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-10	kobject: provide more diagnostic info for kobject_add_internal() failures	Dan Williams
	1/ convert open-coded KERN_ERR+dump_stack() to WARN(), so that automated tools pick up this warning. 2/ include the 'child' and 'parent' kobject names. This information was useful for tracking down the case where scsi invoked device_del() on a parent object and subsequently invoked device_add() on a child. Now the warning looks like: kobject_add_internal failed for target8:0:16 (error: -2 parent: end_device-8:0:24) Pid: 2942, comm: scsi_scan_8 Not tainted 3.3.0-rc7-isci+ #2 Call Trace: [<ffffffff8125e551>] kobject_add_internal+0x1c1/0x1f3 [<ffffffff81075149>] ? trace_hardirqs_on+0xd/0xf [<ffffffff8125e659>] kobject_add_varg+0x41/0x50 [<ffffffff8125e723>] kobject_add+0x64/0x66 [<ffffffff8131124b>] device_add+0x12d/0x63a [<ffffffff8125e0ef>] ? kobject_put+0x4c/0x50 [<ffffffff8132f370>] scsi_sysfs_add_sdev+0x4e/0x28a [<ffffffff8132dce3>] do_scan_async+0x9c/0x145 Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: James Bottomley <JBottomley@parallels.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-07	lib: reduce the use of module.h wherever possible	Paul Gortmaker
	For files only using THIS_MODULE and/or EXPORT_SYMBOL, map them onto including export.h -- or if the file isn't even using those, then just delete the include. Fix up any implicit include dependencies that were being masked by module.h along the way. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-12-21	kobject: remove kset_find_obj_hinted()	Kay Sievers
	Now that there are no in-kernel users of this function, remove it as it is no longer needed. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-12	Delay struct net freeing while there's a sysfs instance refering to it	Al Viro
	* new refcount in struct net, controlling actual freeing of the memory * new method in kobj_ns_type_operations (->drop_ns()) * ->current_ns() semantics change - it's supposed to be followed by corresponding ->drop_ns(). For struct net in case of CONFIG_NET_NS it bumps the new refcount; net_drop_ns() decrements it and calls net_free() if the last reference has been dropped. Method renamed to ->grab_current_ns(). * old net_free() callers call net_drop_ns() instead. * sysfs_exit_ns() is gone, along with a large part of callchain leading to it; now that the references stored in ->ns[...] stay valid we do not need to hunt them down and replace them with NULL. That fixes problems in sysfs_lookup() and sysfs_readdir(), along with getting rid of sb->s_instances abuse. Note that struct net shutdown logics has not changed - net_cleanup() is called exactly when it used to be called. The only thing postponed by having a sysfs instance refering to that struct net is actual freeing of memory occupied by struct net. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-22	kobject: Introduce kset_find_obj_hinted.	Robin Holt
	One call chain getting to kset_find_obj is: link_mem_sections() find_mem_section() kset_find_obj() This is done during boot. The memory sections were added in a linearly increasing order and link_mem_sections tends to utilize them in that same linear order. Introduce a kset_find_obj_hinted which is passed the result of the previous kset_find_obj which it uses for a quick "is the next object our desired object" check before falling back to the old behavior. Signed-off-by: Robin Holt <holt@sgi.com> To: Robert P. J. Day <rpjday@crashcourse.ca> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-21	sysfs: Comment sysfs directory tagging logic	Serge E. Hallyn
	Add some in-line comments to explain the new infrastructure, which was introduced to support sysfs directory tagging with namespaces. I think an overall description someplace might be good too, but it didn't really seem to fit into Documentation/filesystems/sysfs.txt, which appears more geared toward users, rather than maintainers, of sysfs. (Tejun, please let me know if I can make anything clearer or failed altogether to comment something that should be commented.) Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-21	sysfs: Implement sysfs tagged directory support.	Eric W. Biederman
	The problem. When implementing a network namespace I need to be able to have multiple network devices with the same name. Currently this is a problem for /sys/class/net/, /sys/devices/virtual/net/, and potentially a few other directories of the form /sys/ ... /net/. What this patch does is to add an additional tag field to the sysfs dirent structure. For directories that should show different contents depending on the context such as /sys/class/net/, and /sys/devices/virtual/net/ this tag field is used to specify the context in which those directories should be visible. Effectively this is the same as creating multiple distinct directories with the same name but internally to sysfs the result is nicer. I am calling the concept of a single directory that looks like multiple directories all at the same path in the filesystem tagged directories. For the networking namespace the set of directories whose contents I need to filter with tags can depend on the presence or absence of hotplug hardware or which modules are currently loaded. Which means I need a simple race free way to setup those directories as tagged. To achieve a reace free design all tagged directories are created and managed by sysfs itself. Users of this interface: - define a type in the sysfs_tag_type enumeration. - call sysfs_register_ns_types with the type and it's operations - sysfs_exit_ns when an individual tag is no longer valid - Implement mount_ns() which returns the ns of the calling process so we can attach it to a sysfs superblock. - Implement ktype.namespace() which returns the ns of a syfs kobject. Everything else is left up to sysfs and the driver layer. For the network namespace mount_ns and namespace() are essentially one line functions, and look to remain that. Tags are currently represented a const void pointers as that is both generic, prevides enough information for equality comparisons, and is trivial to create for current users, as it is just the existing namespace pointer. The work needed in sysfs is more extensive. At each directory or symlink creating I need to check if the directory it is being created in is a tagged directory and if so generate the appropriate tag to place on the sysfs_dirent. Likewise at each symlink or directory removal I need to check if the sysfs directory it is being removed from is a tagged directory and if so figure out which tag goes along with the name I am deleting. Currently only directories which hold kobjects, and symlinks are supported. There is not enough information in the current file attribute interfaces to give us anything to discriminate on which makes it useless, and there are no potential users which makes it an uninteresting problem to solve. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-21	kobj: Add basic infrastructure for dealing with namespaces.	Eric W. Biederman
	Move complete knowledge of namespaces into the kobject layer so we can use that information when reporting kobjects to userspace. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-07	Driver core: Constify struct sysfs_ops in struct kobj_type	Emese Revfy
	Constify struct sysfs_ops. This is part of the ops structure constification effort started by Arjan van de Ven et al. Benefits of this constification: * prevents modification of data that is shared (referenced) by many other structure instances at runtime * detects/prevents accidental (but not intentional) modification attempts on archs that enforce read-only kernel data at runtime * potentially better optimized code as the compiler can assume that the const data cannot be changed * the compiler/linker move const data into .rodata and therefore exclude them from false sharing Signed-off-by: Emese Revfy <re.emese@gmail.com> Acked-by: David Teigland <teigland@redhat.com> Acked-by: Matt Domsch <Matt_Domsch@dell.com> Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Acked-by: Hans J. Koch <hjk@linutronix.de> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Jens Axboe <jens.axboe@oracle.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-07	kobject: Constify struct kset_uevent_ops	Emese Revfy
	Constify struct kset_uevent_ops. This is part of the ops structure constification effort started by Arjan van de Ven et al. Benefits of this constification: * prevents modification of data that is shared (referenced) by many other structure instances at runtime * detects/prevents accidental (but not intentional) modification attempts on archs that enforce read-only kernel data at runtime * potentially better optimized code as the compiler can assume that the const data cannot be changed * the compiler/linker move const data into .rodata and therefore exclude them from false sharing Signed-off-by: Emese Revfy <re.emese@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-15	kobject: make kset_create check kobject_set_name return value	Dave Young
	kset_create should check the kobject_set_name return value. Add the return value checking code. Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-04-20	driver: dont update dev_name via device_add path	Kay Sievers
	notice one system /proc/iomem some entries missed the name for pci_devices it turns that dev->dev.kobj name is changed after device_add. for pci code: via acpi_pci_root_driver.ops.add (aka acpi_pci_root_add) ==> pci_acpi_scan_root is used to scan pci bus/device, and at the same time we read the resource for pci_dev in the pci_read_bases, we have res->name = pci_name(pci_dev); pci_name is calling dev_name. later via acpi_pci_root_driver.ops.start (aka acpi_pci_root_start) ==> pci_bus_add_device to add all pci_dev in kobj tree. pci_bus_add_device will call device_add. actually in device_add /* first, register with generic layer. */ error = kobject_add(&dev->kobj, dev->kobj.parent, "%s", dev_name(dev)); if (error) goto Error; will get one new name for that kobj, old name is freed. [Impact: fix corrupted names in /proc/iomem ] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-24	driver core: get rid of struct device's bus_id string array	Kay Sievers
	Now that all users of bus_id is gone, we can remove it from struct device. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-10-16	kobject: Cleanup kobject_rename and !CONFIG_SYSFS	Eric W. Biederman
	It finally dawned on me what the clean fix to sysfs_rename_dir calling kobject_set_name is. Move the work into kobject_rename where it belongs. The callers serialize us anyway so this is safe. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-10-16	kobject: Fix kobject_rename and !CONFIG_SYSFS	Eric W. Biederman
	When looking at kobject_rename I found two bugs with that exist when sysfs support is disabled in the kernel. kobject_rename does not change the name on the kobject when sysfs support is not compiled in. kobject_rename without locking attempts to check the validity of a rename operation, which the kobject layer simply does not have the infrastructure to do. This patch documents the previously unstated requirement of kobject_rename that is the responsibility of the caller to provide mutual exclusion and to be certain that the new_name for the kobject is valid. This patch modifies sysfs_rename_dir in !CONFIG_SYSFS case to call kobject_set_name to actually change the kobject_name. This patch removes the bogus and misleading check in kobject_rename that attempts to see if a rename is valid. The check is bogus because we do not have the proper locking. The check is misleading because it looks like we can and do perform checking at the kobject level that we don't. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-08-21	kobject: Replace ALL occurrences of '/' with '!' instead of only the first one.	Ingo Oeser
	A recent patch from Kay Sievers <kay.sievers@vrfy.org> replaced the first occurrence of '/' with '!' as needed for block devices. Now do some cheap defensive coding and replace all of them to avoid future issues in this area. Signed-off-by: Ingo Oeser <ioe-lkml@rameria.de> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-25	Example use of WARN()	Arjan van de Ven
	Now that WARN() exists, we can fold some of the printk's into it. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>