summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2016-12-08Revert "kthread: Pin the stack via try_get_task_stack()/put_task_stack() in ↵Oleg Nesterov
to_live_kthread() function" This reverts commit 23196f2e5f5d810578a772785807dcdc2b9fdce9. Now that struct kthread is kmalloc'ed and not longer on the task stack there is no need anymore to pin the stack. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Chunming Zhou <David1.Zhou@amd.com> Cc: Roman Pen <roman.penyaev@profitbricks.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Tejun Heo <tj@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20161129175100.GA5333@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-08kthread: Make struct kthread kmalloc'edOleg Nesterov
commit 23196f2e5f5d "kthread: Pin the stack via try_get_task_stack() / put_task_stack() in to_live_kthread() function" is a workaround for the fragile design of struct kthread being allocated on the task stack. struct kthread in its current form should be removed, but this needs cleanups outside of kthread.c. As a first step move struct kthread away from the task stack by making it kmalloc'ed. This allows to access kthread.exited without the magic of trying to pin task stack and the try logic in to_live_kthread(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Chunming Zhou <David1.Zhou@amd.com> Cc: Roman Pen <roman.penyaev@profitbricks.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Tejun Heo <tj@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20161129175057.GA5330@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-24sched: Extend scheduler's asym packingTim Chen
We generalize the scheduler's asym packing to provide an ordering of the cpu beyond just the cpu number. This allows the use of the ASYM_PACKING scheduler machinery to move loads to preferred CPU in a sched domain. The preference is defined with the cpu priority given by arch_asym_cpu_priority(cpu). We also record the most preferred cpu in a sched group when we build the cpu's capacity for fast lookup of preferred cpu during load balancing. Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: linux-pm@vger.kernel.org Cc: jolsa@redhat.com Cc: rjw@rjwysocki.net Cc: linux-acpi@vger.kernel.org Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: bp@suse.de Link: http://lkml.kernel.org/r/0e73ae12737dfaafa46c07066cc7c5d3f1675e46.1479844244.git.tim.c.chen@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-23sched/fair: Clean up the tunable parameter definitionsIngo Molnar
No change in functionality: - align the default values vertically to make them easier to scan - standardize the 'default:' lines - fix minor whitespace typos Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-23sched/dl: Fix comment in pick_next_task_dl()T.Zhou
Fix cut & paste oversight: s/pull_rt_task/pull_dl_task Signed-off-by: T.Zhou <t1zhou@163.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: juri.lelli@gmail.com Link: http://lkml.kernel.org/r/20161123004832.GA2983@geo Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-23Merge branch 'linus' into sched/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-22sched/autogroup: Do not use autogroup->tg in zombie threadsOleg Nesterov
Exactly because for_each_thread() in autogroup_move_group() can't see it and update its ->sched_task_group before _put() and possibly free(). So the exiting task needs another sched_move_task() before exit_notify() and we need to re-introduce the PF_EXITING (or similar) check removed by the previous change for another reason. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: hartsjc@redhat.com Cc: vbendel@redhat.com Cc: vlovejoy@redhat.com Link: http://lkml.kernel.org/r/20161114184612.GA15968@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-22sched/autogroup: Fix autogroup_move_group() to never skip sched_move_task()Oleg Nesterov
The PF_EXITING check in task_wants_autogroup() is no longer needed. Remove it, but see the next patch. However the comment is correct in that autogroup_move_group() must always change task_group() for every thread so the sysctl_ check is very wrong; we can race with cgroups and even sys_setsid() is not safe because a task running with task_group() == ag->tg must participate in refcounting: int main(void) { int sctl = open("/proc/sys/kernel/sched_autogroup_enabled", O_WRONLY); assert(sctl > 0); if (fork()) { wait(NULL); // destroy the child's ag/tg pause(); } assert(pwrite(sctl, "1\n", 2, 0) == 2); assert(setsid() > 0); if (fork()) pause(); kill(getppid(), SIGKILL); sleep(1); // The child has gone, the grandchild runs with kref == 1 assert(pwrite(sctl, "0\n", 2, 0) == 2); assert(setsid() > 0); // runs with the freed ag/tg for (;;) sleep(1); return 0; } crashes the kernel. It doesn't really need sleep(1), it doesn't matter if autogroup_move_group() actually frees the task_group or this happens later. Reported-by: Vern Lovejoy <vlovejoy@redhat.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: hartsjc@redhat.com Cc: vbendel@redhat.com Link: http://lkml.kernel.org/r/20161114184609.GA15965@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcLinus Torvalds
Pull sparc fixes from David Miller: 1) With modern networking cards we can run out of 32-bit DMA space, so support 64-bit DMA addressing when possible on sparc64. From Dave Tushar. 2) Some signal frame validation checks are inverted on sparc32, fix from Andreas Larsson. 3) Lockdep tables can get too large in some circumstances on sparc64, add a way to adjust the size a bit. From Babu Moger. 4) Fix NUMA node probing on some sun4v systems, from Thomas Tai. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc: drop duplicate header scatterlist.h lockdep: Limit static allocations if PROVE_LOCKING_SMALL is defined config: Adding the new config parameter CONFIG_PROVE_LOCKING_SMALL for sparc sunbmac: Fix compiler warning sunqe: Fix compiler warnings sparc64: Enable 64-bit DMA sparc64: Enable sun4v dma ops to use IOMMU v2 APIs sparc64: Bind PCIe devices to use IOMMU v2 service sparc64: Initialize iommu_map_table and iommu_pool sparc64: Add ATU (new IOMMU) support sparc64: Add FORCE_MAX_ZONEORDER and default to 13 sparc64: fix compile warning section mismatch in find_node() sparc32: Fix inverted invalid_frame_pointer checks on sigreturns sparc64: Fix find_node warning if numa node cannot be found
2016-11-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Clear congestion control state when changing algorithms on an existing socket, from Florian Westphal. 2) Fix register bit values in altr_tse_pcs portion of stmmac driver, from Jia Jie Ho. 3) Fix PTP handling in stammc driver for GMAC4, from Giuseppe CAVALLARO. 4) Fix udplite multicast delivery handling, it ignores the udp_table parameter passed into the lookups, from Pablo Neira Ayuso. 5) Synchronize the space estimated by rtnl_vfinfo_size and the space actually used by rtnl_fill_vfinfo. From Sabrina Dubroca. 6) Fix memory leak in fib_info when splitting nodes, from Alexander Duyck. 7) If a driver does a napi_hash_del() explicitily and not via netif_napi_del(), it must perform RCU synchronization as needed. Fix this in virtio-net and bnxt drivers, from Eric Dumazet. 8) Likewise, it is not necessary to invoke napi_hash_del() is we are also doing neif_napi_del() in the same code path. Remove such calls from be2net and cxgb4 drivers, also from Eric Dumazet. 9) Don't allocate an ID in peernet2id_alloc() if the netns is dead, from WANG Cong. 10) Fix OF node and device struct leaks in of_mdio, from Johan Hovold. 11) We cannot cache routes in ip6_tunnel when using inherited traffic classes, from Paolo Abeni. 12) Fix several crashes and leaks in cpsw driver, from Johan Hovold. 13) Splice operations cannot use freezable blocking calls in AF_UNIX, from WANG Cong. 14) Link dump filtering by master device and kind support added an error in loop index updates during the dump if we actually do filter, fix from Zhang Shengju. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (59 commits) tcp: zero ca_priv area when switching cc algorithms net: l2tp: Treat NET_XMIT_CN as success in l2tp_eth_dev_xmit ethernet: stmmac: make DWMAC_STM32 depend on it's associated SoC tipc: eliminate obsolete socket locking policy description rtnl: fix the loop index update error in rtnl_dump_ifinfo() l2tp: fix racy SOCK_ZAPPED flag check in l2tp_ip{,6}_bind() net: macb: add check for dma mapping error in start_xmit() rtnetlink: fix FDB size computation netns: fix get_net_ns_by_fd(int pid) typo af_unix: conditionally use freezable blocking calls in read net: ethernet: ti: cpsw: fix fixed-link phy probe deferral net: ethernet: ti: cpsw: add missing sanity check net: ethernet: ti: cpsw: fix secondary-emac probe error path net: ethernet: ti: cpsw: fix of_node and phydev leaks net: ethernet: ti: cpsw: fix deferred probe net: ethernet: ti: cpsw: fix mdio device reference leak net: ethernet: ti: cpsw: fix bad register access in probe error path net: sky2: Fix shutdown crash cfg80211: limit scan results cache size net sched filters: pass netlink message flags in event notification ...
2016-11-18lockdep: Limit static allocations if PROVE_LOCKING_SMALL is definedBabu Moger
Reduce the size of data structure for lockdep entries by half if PROVE_LOCKING_SMALL if defined. This is used only for sparc. Signed-off-by: Babu Moger <babu.moger@oracle.com> Acked-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16bpf: fix range arithmetic for bpf map accessJosef Bacik
I made some invalid assumptions with BPF_AND and BPF_MOD that could result in invalid accesses to bpf map entries. Fix this up by doing a few things 1) Kill BPF_MOD support. This doesn't actually get used by the compiler in real life and just adds extra complexity. 2) Fix the logic for BPF_AND, don't allow AND of negative numbers and set the minimum value to 0 for positive AND's. 3) Don't do operations on the ranges if they are set to the limits, as they are by definition undefined, and allowing arithmetic operations on those values could make them appear valid when they really aren't. This fixes the testcase provided by Jann as well as a few other theoretical problems. Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16sched/fair: Fix task group initializationVincent Guittot
The moves of tasks are now propagated down to root and the utilization of cfs_rq reflects reality so it doesn't need to be estimated at init. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten.Rasmussen@arm.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: kernellwp@gmail.com Cc: pjt@google.com Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1478598827-32372-7-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16sched/fair: Propagate asynchrous detachVincent Guittot
A task can be asynchronously detached from cfs_rq when migrating between CPUs. The load of the migrated task is then removed from source cfs_rq during its next update. We use this event to set propagation flag. During the load balance, we take advantage of the update of blocked load to propagate any pending changes. The propagation relies on patch: "sched: Fix hierarchical order in rq->leaf_cfs_rq_list" ... which orders children and parents, to ensure that it's done in one pass. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten.Rasmussen@arm.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: kernellwp@gmail.com Cc: pjt@google.com Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1478598827-32372-6-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16sched/fair: Propagate load during synchronous attach/detachVincent Guittot
When a task moves from/to a cfs_rq, we set a flag which is then used to propagate the change at parent level (sched_entity and cfs_rq) during next update. If the cfs_rq is throttled, the flag will stay pending until the cfs_rq is unthrottled. For propagating the utilization, we copy the utilization of group cfs_rq to the sched_entity. For propagating the load, we have to take into account the load of the whole task group in order to evaluate the load of the sched_entity. Similarly to what was done before the rewrite of PELT, we add a correction factor in case the task group's load is greater than its share so it will contribute the same load of a task of equal weight. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten.Rasmussen@arm.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: kernellwp@gmail.com Cc: pjt@google.com Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1478598827-32372-5-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16sched/fair: Factorize PELT updateVincent Guittot
Every time we modify load/utilization of sched_entity, we start to sync it with its cfs_rq. This update is done in different ways: - when attaching/detaching a sched_entity, we update cfs_rq and then we sync the entity with the cfs_rq. - when enqueueing/dequeuing the sched_entity, we update both sched_entity and cfs_rq metrics to now. Use update_load_avg() everytime we have to update and sync cfs_rq and sched_entity before changing the state of a sched_enity. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten.Rasmussen@arm.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: kernellwp@gmail.com Cc: pjt@google.com Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1478598827-32372-4-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16sched/fair: Fix hierarchical order in rq->leaf_cfs_rq_listVincent Guittot
Fix the insertion of cfs_rq in rq->leaf_cfs_rq_list to ensure that a child will always be called before its parent. The hierarchical order in shares update list has been introduced by commit: 67e86250f8ea ("sched: Introduce hierarchal order on shares update list") With the current implementation a child can be still put after its parent. Lets take the example of: root \ b /\ c d* | e* with root -> b -> c already enqueued but not d -> e so the leaf_cfs_rq_list looks like: head -> c -> b -> root -> tail The branch d -> e will be added the first time that they are enqueued, starting with e then d. When e is added, its parents is not already on the list so e is put at the tail : head -> c -> b -> root -> e -> tail Then, d is added at the head because its parent is already on the list: head -> d -> c -> b -> root -> e -> tail e is not placed at the right position and will be called the last whereas it should be called at the beginning. Because it follows the bottom-up enqueue sequence, we are sure that we will finished to add either a cfs_rq without parent or a cfs_rq with a parent that is already on the list. We can use this event to detect when we have finished to add a new branch. For the others, whose parents are not already added, we have to ensure that they will be added after their children that have just been inserted the steps before, and after any potential parents that are already in the list. The easiest way is to put the cfs_rq just after the last inserted one and to keep track of it untl the branch is fully added. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten.Rasmussen@arm.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: kernellwp@gmail.com Cc: pjt@google.com Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1478598827-32372-3-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16sched/fair: Factorize attach/detach entityVincent Guittot
Factorize post_init_entity_util_avg() and part of attach_task_cfs_rq() in one function attach_entity_cfs_rq(). Create symmetric detach_entity_cfs_rq() function. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten.Rasmussen@arm.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bsegall@google.com Cc: kernellwp@gmail.com Cc: pjt@google.com Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1478598827-32372-2-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16sched/fair: Fix incorrect comment for capacity_marginMorten Rasmussen
The comment for capacity_margin introduced in: 3273163c6775 ("sched/fair: Let asymmetric CPU configurations balance at wake-up") ... got its usage the wrong way round - fix it. Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dietmar.eggemann@arm.com Cc: freedom.tan@mediatek.com Cc: keita.kobayashi.ym@renesas.com Cc: mgalbraith@suse.de Cc: sgurrappadi@nvidia.com Cc: vincent.guittot@linaro.org Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1476452472-24740-7-git-send-email-morten.rasmussen@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16sched/fair: Avoid pulling tasks from non-overloaded higher capacity groupsMorten Rasmussen
For asymmetric CPU capacity systems it is counter-productive for throughput if low capacity CPUs are pulling tasks from non-overloaded CPUs with higher capacity. The assumption is that higher CPU capacity is preferred over running alone in a group with lower CPU capacity. This patch rejects higher CPU capacity groups with one or less task per CPU as potential busiest group which could otherwise lead to a series of failing load-balancing attempts leading to a force-migration. Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dietmar.eggemann@arm.com Cc: freedom.tan@mediatek.com Cc: keita.kobayashi.ym@renesas.com Cc: mgalbraith@suse.de Cc: sgurrappadi@nvidia.com Cc: vincent.guittot@linaro.org Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1476452472-24740-5-git-send-email-morten.rasmussen@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16sched/fair: Add per-CPU min capacity to sched_group_capacityMorten Rasmussen
struct sched_group_capacity currently represents the compute capacity sum of all CPUs in the sched_group. Unless it is divided by the group_weight to get the average capacity per CPU, it hides differences in CPU capacity for mixed capacity systems (e.g. high RT/IRQ utilization or ARM big.LITTLE). But even the average may not be sufficient if the group covers CPUs of different capacities. Instead, by extending struct sched_group_capacity to indicate min per-CPU capacity in the group a suitable group for a given task utilization can more easily be found such that CPUs with reduced capacity can be avoided for tasks with high utilization (not implemented by this patch). Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dietmar.eggemann@arm.com Cc: freedom.tan@mediatek.com Cc: keita.kobayashi.ym@renesas.com Cc: mgalbraith@suse.de Cc: sgurrappadi@nvidia.com Cc: vincent.guittot@linaro.org Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1476452472-24740-4-git-send-email-morten.rasmussen@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16sched/fair: Consider spare capacity in find_idlest_group()Morten Rasmussen
In low-utilization scenarios comparing relative loads in find_idlest_group() doesn't always lead to the most optimum choice. Systems with groups containing different numbers of cpus and/or cpus of different compute capacity are significantly better off when considering spare capacity rather than relative load in those scenarios. In addition to existing load based search an alternative spare capacity based candidate sched_group is found and selected instead if sufficient spare capacity exists. If not, existing behaviour is preserved. Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dietmar.eggemann@arm.com Cc: freedom.tan@mediatek.com Cc: keita.kobayashi.ym@renesas.com Cc: mgalbraith@suse.de Cc: sgurrappadi@nvidia.com Cc: vincent.guittot@linaro.org Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1476452472-24740-3-git-send-email-morten.rasmussen@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16sched/fair: Compute task/cpu utilization at wake-up correctlyMorten Rasmussen
At task wake-up load-tracking isn't updated until the task is enqueued. The task's own view of its utilization contribution may therefore not be aligned with its contribution to the cfs_rq load-tracking which may have been updated in the meantime. Basically, the task's own utilization hasn't yet accounted for the sleep decay, while the cfs_rq may have (partially). Estimating the cfs_rq utilization in case the task is migrated at wake-up as task_rq(p)->cfs.avg.util_avg - p->se.avg.util_avg is therefore incorrect as the two load-tracking signals aren't time synchronized (different last update). To solve this problem, this patch synchronizes the task utilization with its previous rq before the task utilization is used in the wake-up path. Currently the update/synchronization is done _after_ the task has been placed by select_task_rq_fair(). The synchronization is done without having to take the rq lock using the existing mechanism used in remove_entity_load_avg(). Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dietmar.eggemann@arm.com Cc: freedom.tan@mediatek.com Cc: keita.kobayashi.ym@renesas.com Cc: mgalbraith@suse.de Cc: sgurrappadi@nvidia.com Cc: vincent.guittot@linaro.org Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1476452472-24740-2-git-send-email-morten.rasmussen@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16sched/cpuacct: Avoid %lld seq_printf warningMartin Schwidefsky
For s390 kernel builds I keep getting this warning: kernel/sched/cpuacct.c: In function 'cpuacct_stats_show': kernel/sched/cpuacct.c:298:25: warning: format '%lld' expects argument of type 'long long int', but argument 4 has type 'clock_t {aka long int}' [-Wformat=] seq_printf(sf, "%s %lld\n", Silence the warning by adding an explicit cast. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161111142749.6545-1-schwidefsky@de.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16Merge branch 'linus' into sched/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-15Merge tag 'trace-v4.9-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Alexei discovered a race condition in modules failing to load that can cause a ftrace check to trigger and disable ftrace. This is because of the way modules are registered to ftrace. Their functions are loaded in the ftrace function tables but set to "disabled" since they are still in the process of being loaded by the module. After the module is finished, it calls back into the ftrace infrastructure to enable it. Looking deeper into the locations that access all the functions in the table, I found more locations that should ignore the disabled ones" * tag 'trace-v4.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace: Add more checks for FTRACE_FL_DISABLED in processing ip records ftrace: Ignore FTRACE_FL_DISABLED while walking dyn_ftrace records
2016-11-15sched/cputime: Simplify task_cputime()Stanislaw Gruszka
Now since fetch_task_cputime() has no other users than task_cputime(), its code could be used directly in task_cputime(). Moreover since only 2 task_cputime() calls of 17 use a NULL argument, we can add dummy variables to those calls and remove NULL checks from task_cputimes(). Also remove NULL checks from task_cputimes_scaled(). Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Neuling <mikey@neuling.org> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1479175612-14718-5-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-15sched/cputime, powerpc, s390: Make scaled cputime arch specificStanislaw Gruszka
Only s390 and powerpc have hardware facilities allowing to measure cputimes scaled by frequency. On all other architectures utimescaled/stimescaled are equal to utime/stime (however they are accounted separately). Remove {u,s}timescaled accounting on all architectures except powerpc and s390, where those values are explicitly accounted in the proper places. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Neuling <mikey@neuling.org> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161031162143.GB12646@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-15sched/cputime, powerpc: Remove cputime_to_scaled()Stanislaw Gruszka
Currently cputime_to_scaled() just return it's argument on all implementations, we don't need to call this function. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Reviewed-by: Paul Mackerras <paulus@ozlabs.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Neuling <mikey@neuling.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1479175612-14718-3-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix off by one wrt. indexing when dumping /proc/net/route entries, from Alexander Duyck. 2) Fix lockdep splats in iwlwifi, from Johannes Berg. 3) Cure panic when inserting certain netfilter rules when NFT_SET_HASH is disabled, from Liping Zhang. 4) Memory leak when nft_expr_clone() fails, also from Liping Zhang. 5) Disable UFO when path will apply IPSEC tranformations, from Jakub Sitnicki. 6) Don't bogusly double cwnd in dctcp module, from Florian Westphal. 7) skb_checksum_help() should never actually use the value "0" for the resulting checksum, that has a special meaning, use CSUM_MANGLED_0 instead. From Eric Dumazet. 8) Per-tx/rx queue statistic strings are wrong in qed driver, fix from Yuval MIntz. 9) Fix SCTP reference counting of associations and transports in sctp_diag. From Xin Long. 10) When we hit ip6tunnel_xmit() we could have come from an ipv4 path in a previous layer or similar, so explicitly clear the ipv6 control block in the skb. From Eli Cooper. 11) Fix bogus sleeping inside of inet_wait_for_connect(), from WANG Cong. 12) Correct deivce ID of T6 adapter in cxgb4 driver, from Hariprasad Shenai. 13) Fix potential access past the end of the skb page frag array in tcp_sendmsg(). From Eric Dumazet. 14) 'skb' can legitimately be NULL in inet{,6}_exact_dif_match(). Fix from David Ahern. 15) Don't return an error in tcp_sendmsg() if we wronte any bytes successfully, from Eric Dumazet. 16) Extraneous unlocks in netlink_diag_dump(), we removed the locking but forgot to purge these unlock calls. From Eric Dumazet. 17) Fix memory leak in error path of __genl_register_family(). We leak the attrbuf, from WANG Cong. 18) cgroupstats netlink policy table is mis-sized, from WANG Cong. 19) Several XDP bug fixes in mlx5, from Saeed Mahameed. 20) Fix several device refcount leaks in network drivers, from Johan Hovold. 21) icmp6_send() should use skb dst device not skb->dev to determine L3 routing domain. From David Ahern. 22) ip_vs_genl_family sets maxattr incorrectly, from WANG Cong. 23) We leak new macvlan port in some cases of maclan_common_netlink() errors. Fix from Gao Feng. 24) Similar to the icmp6_send() fix, icmp_route_lookup() should determine L3 routing domain using skb_dst(skb)->dev not skb->dev. Also from David Ahern. 25) Several fixes for route offloading and FIB notification handling in mlxsw driver, from Jiri Pirko. 26) Properly cap __skb_flow_dissect()'s return value, from Eric Dumazet. 27) Fix long standing regression in ipv4 redirect handling, wrt. validating the new neighbour's reachability. From Stephen Suryaputra Lin. 28) If sk_filter() trims the packet excessively, handle it reasonably in tcp input instead of exploding. From Eric Dumazet. 29) Fix handling of napi hash state when copying channels in sfc driver, from Bert Kenward. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (121 commits) mlxsw: spectrum_router: Flush FIB tables during fini net: stmmac: Fix lack of link transition for fixed PHYs sctp: change sk state only when it has assocs in sctp_shutdown bnx2: Wait for in-flight DMA to complete at probe stage Revert "bnx2: Reset device during driver initialization" ps3_gelic: fix spelling mistake in debug message net: ethernet: ixp4xx_eth: fix spelling mistake in debug message ibmvnic: Fix size of debugfs name buffer ibmvnic: Unmap ibmvnic_statistics structure sfc: clear napi_hash state when copying channels mlxsw: spectrum_router: Correctly dump neighbour activity mlxsw: spectrum: Fix refcount bug on span entries bnxt_en: Fix VF virtual link state. bnxt_en: Fix ring arithmetic in bnxt_setup_tc(). Revert "include/uapi/linux/atm_zatm.h: include linux/time.h" tcp: take care of truncations done by sk_filter() ipv4: use new_gw for redirect neigh lookup r8152: Fix error path in open function net: bpqether.h: remove if_ether.h guard net: __skb_flow_dissect() must cap its return value ...
2016-11-14ftrace: Add more checks for FTRACE_FL_DISABLED in processing ip recordsSteven Rostedt (Red Hat)
When a module is first loaded and its function ip records are added to the ftrace list of functions to modify, they are set to DISABLED, as their text is still in a read only state. When the module is fully loaded, and can be updated, the flag is cleared, and if their's any functions that should be tracing them, it is updated at that moment. But there's several locations that do record accounting and should ignore records that are marked as disabled, or they can cause issues. Alexei already fixed one location, but others need to be addressed. Cc: stable@vger.kernel.org Fixes: b7ffffbb46f2 "ftrace: Add infrastructure for delayed enabling of module functions" Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-14ftrace: Ignore FTRACE_FL_DISABLED while walking dyn_ftrace recordsAlexei Starovoitov
ftrace_shutdown() checks for sanity of ftrace records and if dyn_ftrace->flags is not zero, it will warn. It can happen that 'flags' are set to FTRACE_FL_DISABLED at this point, since some module was loaded, but before ftrace_module_enable() cleared the flags for this module. In other words the module.c is doing: ftrace_module_init(mod); // calls ftrace_update_code() that sets flags=FTRACE_FL_DISABLED ... // here ftrace_shutdown() is called that warns, since err = prepare_coming_module(mod); // didn't have a chance to clear FTRACE_FL_DISABLED Fix it by ignoring disabled records. It's similar to what __ftrace_hash_rec_update() is already doing. Link: http://lkml.kernel.org/r/1478560460-3818619-1-git-send-email-ast@fb.com Cc: stable@vger.kernel.org Fixes: b7ffffbb46f2 "ftrace: Add infrastructure for delayed enabling of module functions" Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-14Revert "printk: make reading the kernel log flush pending lines"Linus Torvalds
This reverts commit bfd8d3f23b51018388be0411ccbc2d56277fe294. It turns out that this flushes things much too aggressiverly, and causes lines to break up when the system logger races with new continuation lines being printed. There's a pending patch to make printk() flushing much more straightforward, but it's too invasive for 4.9, so in the meantime let's just not make the system message logging flush continuation lines. They'll be flushed by the final newline anyway. Suggested-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-11-14Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Ingo Molnar: "This fixes a genirq regression that resulted in the Intel/Broxton pinctrl/GPIO driver (and possibly others) spewing warnings" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Use irq type from irqdata instead of irqdesc
2016-11-11Merge tag 'pm-4.9-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix two bugs in error code paths in the PM core (system-wide suspend of devices), a device reference leak in the boot-time suspend test code and a cpupower utility regression from the 4.7 cycle. Specifics: - Prevent the PM core from attempting to suspend parent devices if any of their children, whose suspend callbacks were invoked asynchronously, have failed to suspend during the "late" and "noirq" phases of system-wide suspend of devices (Brian Norris). - Prevent the boot-time system suspend test code from leaking a reference to the RTC device used by it (Johan Hovold). - Fix cpupower to use the return value of one of its library functions correctly and restore the correct behavior of it when used for setting cpufreq tunables broken during the 4.7 development cycle (Laura Abbott)" * tag 'pm-4.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM / sleep: don't suspend parent when async child suspend_{noirq, late} fails PM / sleep: fix device reference leak in test_suspend cpupower: Correct return type of cpu_power_is_cpu_online() in cpufreq-set
2016-11-11Merge branches 'pm-tools-fixes' and 'pm-sleep-fixes'Rafael J. Wysocki
* pm-tools-fixes: cpupower: Correct return type of cpu_power_is_cpu_online() in cpufreq-set * pm-sleep-fixes: PM / sleep: don't suspend parent when async child suspend_{noirq, late} fails PM / sleep: fix device reference leak in test_suspend
2016-11-11Revert "console: don't prefer first registered if DT specifies stdout-path"Hans de Goede
This reverts commit 05fd007e4629 ("console: don't prefer first registered if DT specifies stdout-path"). The reverted commit changes existing behavior on which many ARM boards rely. Many ARM small-board-computers, like e.g. the Raspberry Pi have both a video output and a serial console. Depending on whether the user is using the device as a more regular computer; or as a headless device we need to have the console on either one or the other. Many users rely on the kernel behavior of the console being present on both outputs, before the reverted commit the console setup with no console= kernel arguments on an ARM board which sets stdout-path in dt would look like this: [root@localhost ~]# cat /proc/consoles ttyS0 -W- (EC p a) 4:64 tty0 -WU (E p ) 4:1 Where as after the reverted commit, it looks like this: [root@localhost ~]# cat /proc/consoles ttyS0 -W- (EC p a) 4:64 This commit reverts commit 05fd007e4629 ("console: don't prefer first registered if DT specifies stdout-path") restoring the original behavior. Fixes: 05fd007e4629 ("console: don't prefer first registered if DT specifies stdout-path") Link: http://lkml.kernel.org/r/20161104121135.4780-2-hdegoede@redhat.com Signed-off-by: Hans de Goede <hdegoede@redhat.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Rob Herring <robh+dt@kernel.org> Cc: Frank Rowand <frowand.list@gmail.com> Cc: Thorsten Leemhuis <regressions@leemhuis.info> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-11-11sched/deadline: Fix typo in a commentDaniel Bristot de Oliveira
In the comment: /* * The task might have changed its scheduling policy to something * different than SCHED_DEADLINE (through switched_fromd_dl()). */ s/fromd/from/ Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luca Abeni <luca.abeni@unitn.it> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/5408b3b3f9ee197a7b7f10fb834341100a4f2c88.1478599881.git.bristot@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-11Merge branch 'linus' into sched/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-08genirq: Use irq type from irqdata instead of irqdescThomas Gleixner
The type flags in the irq descriptor are there for historical reasons and only updated via irq_modify_status() or irq_set_type(). Both functions also update the type flags in irqdata. __setup_irq() is the only left over user of the type flags in the irq descriptor. If __setup_irq() is called with empty irq type flags, then the type flags are retrieved from irqdata. If an interrupt is shared, then the type flags are compared with the type flags stored in the irq descriptor. On x86 the ioapic does not have a irq_set_type() callback because the type is defined in the BIOS tables and cannot be changed. The type is stored in irqdata at setup time without updating the type data in the irq descriptor. As a result the comparison described above fails. There is no point in updating the irq descriptor flags because the only relevant storage is irqdata. Use the type flags from irqdata for both retrieval and comparison in __setup_irq() instead. Aside of that the print out in case of non matching type flags has the old and new type flags arguments flipped. Fix that as well. For correctness sake the flags stored in the irq descriptor should be removed, but this is beyond the scope of this bugfix and will be done in a later patch. Fixes: 4b357daed698 ("genirq: Look-up trigger type if not specified by caller") Reported-and-tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Jon Hunter <jonathanh@nvidia.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1611072020360.3501@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-07bpf: fix map not being uncharged during map creation failureDaniel Borkmann
In map_create(), we first find and create the map, then once that suceeded, we charge it to the user's RLIMIT_MEMLOCK, and then fetch a new anon fd through anon_inode_getfd(). The problem is, once the latter fails f.e. due to RLIMIT_NOFILE limit, then we only destruct the map via map->ops->map_free(), but without uncharging the previously locked memory first. That means that the user_struct allocation is leaked as well as the accounted RLIMIT_MEMLOCK memory not released. Make the label names in the fix consistent with bpf_prog_load(). Fixes: aaac3ba95e4c ("bpf: charge user for creation of BPF maps and programs") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-07bpf: fix htab map destruction when extra reserve is in useDaniel Borkmann
Commit a6ed3ea65d98 ("bpf: restore behavior of bpf_map_update_elem") added an extra per-cpu reserve to the hash table map to restore old behaviour from pre prealloc times. When non-prealloc is in use for a map, then problem is that once a hash table extra element has been linked into the hash-table, and the hash table is destroyed due to refcount dropping to zero, then htab_map_free() -> delete_all_elements() will walk the whole hash table and drop all elements via htab_elem_free(). The problem is that the element from the extra reserve is first fed to the wrong backend allocator and eventually freed twice. Fixes: a6ed3ea65d98 ("bpf: restore behavior of bpf_map_update_elem") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-05Merge branches 'sched-urgent-for-linus' and 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull stack vmap fixups from Thomas Gleixner: "Two small patches related to sched_show_task(): - make sure to hold a reference on the task stack while accessing it - remove the thread_saved_pc printout .. and add a sanity check into release_task_stack() to catch problems with task stack references" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/core: Remove pointless printout in sched_show_task() sched/core: Fix oops in sched_show_task() * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: fork: Add task stack refcounting sanity check and prevent premature task stack freeing
2016-11-03taskstats: fix the length of cgroupstats_cmd_get_policyWANG Cong
cgroupstats_cmd_get_policy is [CGROUPSTATS_CMD_ATTR_MAX+1], taskstats_cmd_get_policy[TASKSTATS_CMD_ATTR_MAX+1], but their family.maxattr is TASKSTATS_CMD_ATTR_MAX. CGROUPSTATS_CMD_ATTR_MAX is less than TASKSTATS_CMD_ATTR_MAX, so we could end up accessing out-of-bound. Change cgroupstats_cmd_get_policy to TASKSTATS_CMD_ATTR_MAX+1, this is safe because the rest are initialized to 0's. Reported-by: Andrey Konovalov <andreyknvl@google.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-03sched/core: Remove pointless printout in sched_show_task()Linus Torvalds
In sched_show_task() we print out a useless hex number, not even a symbol, and there's a big question mark whether this even makes sense anyway, I suspect we should just remove it all. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bp@alien8.de Cc: brgerst@gmail.com Cc: jann@thejh.net Cc: keescook@chromium.org Cc: linux-api@vger.kernel.org Cc: tycho.andersen@canonical.com Link: http://lkml.kernel.org/r/CA+55aFzphURPFzAvU4z6Moy7ZmimcwPuUdYU8bj9z0J+S8X1rw@mail.gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-03sched/core: Fix oops in sched_show_task()Tetsuo Handa
When CONFIG_THREAD_INFO_IN_TASK=y, it is possible that an exited thread remains in the task list after its stack pointer was already set to NULL. Therefore, thread_saved_pc() and stack_not_used() in sched_show_task() will trigger NULL pointer dereference if an attempt to dump such thread's traces (e.g. SysRq-t, khungtaskd) is made. Since show_stack() in sched_show_task() calls try_get_task_stack() and sched_show_task() is called from interrupt context, calling try_get_task_stack() from sched_show_task() will be safe as well. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Andy Lutomirski <luto@kernel.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bp@alien8.de Cc: brgerst@gmail.com Cc: jann@thejh.net Cc: keescook@chromium.org Cc: linux-api@vger.kernel.org Cc: tycho.andersen@canonical.com Link: http://lkml.kernel.org/r/201611021950.FEJ34368.HFFJOOMLtQOVSF@I-love.SAKURA.ne.jp Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-02PM / sleep: fix device reference leak in test_suspendJohan Hovold
Make sure to drop the reference taken by class_find_device() after opening the RTC device. Fixes: 77437fd4e61f (pm: boot time suspend selftest) Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-01fork: Add task stack refcounting sanity check and prevent premature task ↵Andy Lutomirski
stack freeing If something goes wrong with task stack refcounting and a stack refcount hits zero too early, warn and leak it rather than potentially freeing it early (and silently). Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/f29119c783a9680a4b4656e751b6123917ace94b.1477926663.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-28Merge tag 'pm-4.9-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix two intel_pstate issues related to the way it works when the scaling_governor sysfs attribute is set to "performance" and fix up messages in the system suspend core code. Specifics: - Fix a missing KERN_CONT in a system suspend message by converting the affected code to using pr_info() and pr_cont() instead of the "raw" printk() (Jon Hunter). - Make intel_pstate set the CPU P-state from its .set_policy() callback when the scaling_governor sysfs attribute is set to "performance" so that it interacts with NOHZ_FULL more predictably which was the case before 4.7 (Rafael Wysocki). - Make intel_pstate always request the maximum allowed P-state when the scaling_governor sysfs attribute is set to "performance" to prevent it from effectively ingoring that setting is some situations (Rafael Wysocki)" * tag 'pm-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: intel_pstate: Always set max P-state in performance mode PM / suspend: Fix missing KERN_CONT for suspend message cpufreq: intel_pstate: Set P-state upfront in performance mode
2016-10-28Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Misc kernel fixes: a virtualization environment related fix, an uncore PMU driver removal handling fix, a PowerPC fix and new events for Knights Landing" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Honour the CPUID for number of fixed counters in hypervisors perf/powerpc: Don't call perf_event_disable() from atomic context perf/core: Protect PMU device removal with a 'pmu_bus_running' check, to fix CONFIG_DEBUG_TEST_DRIVER_REMOVE=y kernel panic perf/x86/intel/cstate: Add C-state residency events for Knights Landing