summaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/ABI/testing/sysfs-devices-system-cpu4
-rw-r--r--Documentation/ABI/testing/sysfs-power29
-rw-r--r--Documentation/cpu-freq/core.txt29
-rw-r--r--Documentation/cpu-freq/cpu-drivers.txt19
-rw-r--r--Documentation/cpu-freq/index.txt4
-rw-r--r--Documentation/kernel-parameters.txt24
-rw-r--r--Documentation/power/devices.txt34
-rw-r--r--Documentation/power/opp.txt40
-rw-r--r--Documentation/power/runtime_pm.txt37
-rw-r--r--Documentation/power/states.txt87
-rw-r--r--Documentation/power/swsusp.txt5
11 files changed, 217 insertions, 95 deletions
diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index d5a0d33c571f..acb9bfc89b48 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -128,7 +128,7 @@ Description: Discover cpuidle policy and mechanism
What: /sys/devices/system/cpu/cpu#/cpufreq/*
Date: pre-git history
-Contact: cpufreq@vger.kernel.org
+Contact: linux-pm@vger.kernel.org
Description: Discover and change clock speed of CPUs
Clock scaling allows you to change the clock speed of the
@@ -146,7 +146,7 @@ Description: Discover and change clock speed of CPUs
What: /sys/devices/system/cpu/cpu#/cpufreq/freqdomain_cpus
Date: June 2013
-Contact: cpufreq@vger.kernel.org
+Contact: linux-pm@vger.kernel.org
Description: Discover CPUs in the same CPU frequency coordination domain
freqdomain_cpus is the list of CPUs (online+offline) that share
diff --git a/Documentation/ABI/testing/sysfs-power b/Documentation/ABI/testing/sysfs-power
index 64c9276e9421..f4551816329e 100644
--- a/Documentation/ABI/testing/sysfs-power
+++ b/Documentation/ABI/testing/sysfs-power
@@ -7,19 +7,30 @@ Description:
subsystem.
What: /sys/power/state
-Date: August 2006
+Date: May 2014
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
Description:
- The /sys/power/state file controls the system power state.
- Reading from this file returns what states are supported,
- which is hard-coded to 'freeze' (Low-Power Idle), 'standby'
- (Power-On Suspend), 'mem' (Suspend-to-RAM), and 'disk'
- (Suspend-to-Disk).
+ The /sys/power/state file controls system sleep states.
+ Reading from this file returns the available sleep state
+ labels, which may be "mem", "standby", "freeze" and "disk"
+ (hibernation). The meanings of the first three labels depend on
+ the relative_sleep_states command line argument as follows:
+ 1) relative_sleep_states = 1
+ "mem", "standby", "freeze" represent non-hibernation sleep
+ states from the deepest ("mem", always present) to the
+ shallowest ("freeze"). "standby" and "freeze" may or may
+ not be present depending on the capabilities of the
+ platform. "freeze" can only be present if "standby" is
+ present.
+ 2) relative_sleep_states = 0 (default)
+ "mem" - "suspend-to-RAM", present if supported.
+ "standby" - "power-on suspend", present if supported.
+ "freeze" - "suspend-to-idle", always present.
Writing to this file one of these strings causes the system to
- transition into that state. Please see the file
- Documentation/power/states.txt for a description of each of
- these states.
+ transition into the corresponding state, if available. See
+ Documentation/power/states.txt for a description of what
+ "suspend-to-RAM", "power-on suspend" and "suspend-to-idle" mean.
What: /sys/power/disk
Date: September 2006
diff --git a/Documentation/cpu-freq/core.txt b/Documentation/cpu-freq/core.txt
index 0060d76b445f..70933eadc308 100644
--- a/Documentation/cpu-freq/core.txt
+++ b/Documentation/cpu-freq/core.txt
@@ -20,6 +20,7 @@ Contents:
---------
1. CPUFreq core and interfaces
2. CPUFreq notifiers
+3. CPUFreq Table Generation with Operating Performance Point (OPP)
1. General Information
=======================
@@ -92,3 +93,31 @@ values:
cpu - number of the affected CPU
old - old frequency
new - new frequency
+
+3. CPUFreq Table Generation with Operating Performance Point (OPP)
+==================================================================
+For details about OPP, see Documentation/power/opp.txt
+
+dev_pm_opp_init_cpufreq_table - cpufreq framework typically is initialized with
+ cpufreq_frequency_table_cpuinfo which is provided with the list of
+ frequencies that are available for operation. This function provides
+ a ready to use conversion routine to translate the OPP layer's internal
+ information about the available frequencies into a format readily
+ providable to cpufreq.
+
+ WARNING: Do not use this function in interrupt context.
+
+ Example:
+ soc_pm_init()
+ {
+ /* Do things */
+ r = dev_pm_opp_init_cpufreq_table(dev, &freq_table);
+ if (!r)
+ cpufreq_frequency_table_cpuinfo(policy, freq_table);
+ /* Do other things */
+ }
+
+ NOTE: This function is available only if CONFIG_CPU_FREQ is enabled in
+ addition to CONFIG_PM_OPP.
+
+dev_pm_opp_free_cpufreq_table - Free up the table allocated by dev_pm_opp_init_cpufreq_table
diff --git a/Documentation/cpu-freq/cpu-drivers.txt b/Documentation/cpu-freq/cpu-drivers.txt
index 48da5fdcb9f1..b045fe54986a 100644
--- a/Documentation/cpu-freq/cpu-drivers.txt
+++ b/Documentation/cpu-freq/cpu-drivers.txt
@@ -228,3 +228,22 @@ is the corresponding frequency table helper for the ->target
stage. Just pass the values to this function, and the unsigned int
index returns the number of the frequency table entry which contains
the frequency the CPU shall be set to.
+
+The following macros can be used as iterators over cpufreq_frequency_table:
+
+cpufreq_for_each_entry(pos, table) - iterates over all entries of frequency
+table.
+
+cpufreq-for_each_valid_entry(pos, table) - iterates over all entries,
+excluding CPUFREQ_ENTRY_INVALID frequencies.
+Use arguments "pos" - a cpufreq_frequency_table * as a loop cursor and
+"table" - the cpufreq_frequency_table * you want to iterate over.
+
+For example:
+
+ struct cpufreq_frequency_table *pos, *driver_freq_table;
+
+ cpufreq_for_each_entry(pos, driver_freq_table) {
+ /* Do something with pos */
+ pos->frequency = ...
+ }
diff --git a/Documentation/cpu-freq/index.txt b/Documentation/cpu-freq/index.txt
index 3d0b915035b9..dc024ab4054f 100644
--- a/Documentation/cpu-freq/index.txt
+++ b/Documentation/cpu-freq/index.txt
@@ -35,8 +35,8 @@ Mailing List
------------
There is a CPU frequency changing CVS commit and general list where
you can report bugs, problems or submit patches. To post a message,
-send an email to cpufreq@vger.kernel.org, to subscribe go to
-http://vger.kernel.org/vger-lists.html#cpufreq and follow the
+send an email to linux-pm@vger.kernel.org, to subscribe go to
+http://vger.kernel.org/vger-lists.html#linux-pm and follow the
instructions there.
Links
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 4ddcbf949699..af55e13ace8f 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -214,6 +214,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
unusable. The "log_buf_len" parameter may be useful
if you need to capture more output.
+ acpi_force_table_verification [HW,ACPI]
+ Enable table checksum verification during early stage.
+ By default, this is disabled due to x86 early mapping
+ size limitation.
+
acpi_irq_balance [HW,ACPI]
ACPI will balance active IRQs
default in APIC mode
@@ -237,7 +242,15 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
This feature is enabled by default.
This option allows to turn off the feature.
- acpi_no_auto_ssdt [HW,ACPI] Disable automatic loading of SSDT
+ acpi_no_static_ssdt [HW,ACPI]
+ Disable installation of static SSDTs at early boot time
+ By default, SSDTs contained in the RSDT/XSDT will be
+ installed automatically and they will appear under
+ /sys/firmware/acpi/tables.
+ This option turns off this feature.
+ Note that specifying this option does not affect
+ dynamic table installation which will install SSDT
+ tables to /sys/firmware/acpi/tables/dynamic.
acpica_no_return_repair [HW, ACPI]
Disable AML predefined validation mechanism
@@ -2898,6 +2911,13 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
[KNL, SMP] Set scheduler's default relax_domain_level.
See Documentation/cgroups/cpusets.txt.
+ relative_sleep_states=
+ [SUSPEND] Use sleep state labeling where the deepest
+ state available other than hibernation is always "mem".
+ Format: { "0" | "1" }
+ 0 -- Traditional sleep state labels.
+ 1 -- Relative sleep state labels.
+
reserve= [KNL,BUGS] Force the kernel to ignore some iomem area
reservetop= [X86-32]
@@ -3470,7 +3490,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
the allocated input device; If set to 0, video driver
will only send out the event without touching backlight
brightness level.
- default: 1
+ default: 0
virtio_mmio.device=
[VMMIO] Memory mapped virtio (platform) device.
diff --git a/Documentation/power/devices.txt b/Documentation/power/devices.txt
index 47d46dff70f7..d172bce0fd49 100644
--- a/Documentation/power/devices.txt
+++ b/Documentation/power/devices.txt
@@ -2,6 +2,7 @@ Device Power Management
Copyright (c) 2010-2011 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
Copyright (c) 2010 Alan Stern <stern@rowland.harvard.edu>
+Copyright (c) 2014 Intel Corp., Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Most of the code in Linux is device drivers, so most of the Linux power
@@ -326,6 +327,20 @@ the phases are:
driver in some way for the upcoming system power transition, but it
should not put the device into a low-power state.
+ For devices supporting runtime power management, the return value of the
+ prepare callback can be used to indicate to the PM core that it may
+ safely leave the device in runtime suspend (if runtime-suspended
+ already), provided that all of the device's descendants are also left in
+ runtime suspend. Namely, if the prepare callback returns a positive
+ number and that happens for all of the descendants of the device too,
+ and all of them (including the device itself) are runtime-suspended, the
+ PM core will skip the suspend, suspend_late and suspend_noirq suspend
+ phases as well as the resume_noirq, resume_early and resume phases of
+ the following system resume for all of these devices. In that case,
+ the complete callback will be called directly after the prepare callback
+ and is entirely responsible for bringing the device back to the
+ functional state as appropriate.
+
2. The suspend methods should quiesce the device to stop it from performing
I/O. They also may save the device registers and put it into the
appropriate low-power state, depending on the bus type the device is on,
@@ -400,12 +415,23 @@ When resuming from freeze, standby or memory sleep, the phases are:
the resume callbacks occur; it's not necessary to wait until the
complete phase.
+ Moreover, if the preceding prepare callback returned a positive number,
+ the device may have been left in runtime suspend throughout the whole
+ system suspend and resume (the suspend, suspend_late, suspend_noirq
+ phases of system suspend and the resume_noirq, resume_early, resume
+ phases of system resume may have been skipped for it). In that case,
+ the complete callback is entirely responsible for bringing the device
+ back to the functional state after system suspend if necessary. [For
+ example, it may need to queue up a runtime resume request for the device
+ for this purpose.] To check if that is the case, the complete callback
+ can consult the device's power.direct_complete flag. Namely, if that
+ flag is set when the complete callback is being run, it has been called
+ directly after the preceding prepare and special action may be required
+ to make the device work correctly afterward.
+
At the end of these phases, drivers should be as functional as they were before
suspending: I/O can be performed using DMA and IRQs, and the relevant clocks are
-gated on. Even if the device was in a low-power state before the system sleep
-because of runtime power management, afterwards it should be back in its
-full-power state. There are multiple reasons why it's best to do this; they are
-discussed in more detail in Documentation/power/runtime_pm.txt.
+gated on.
However, the details here may again be platform-specific. For example,
some systems support multiple "run" states, and the mode in effect at
diff --git a/Documentation/power/opp.txt b/Documentation/power/opp.txt
index b8a907dc0169..a9adad828cdc 100644
--- a/Documentation/power/opp.txt
+++ b/Documentation/power/opp.txt
@@ -10,8 +10,7 @@ Contents
3. OPP Search Functions
4. OPP Availability Control Functions
5. OPP Data Retrieval Functions
-6. Cpufreq Table Generation
-7. Data Structures
+6. Data Structures
1. Introduction
===============
@@ -72,7 +71,6 @@ operations until that OPP could be re-enabled if possible.
OPP library facilitates this concept in it's implementation. The following
operational functions operate only on available opps:
opp_find_freq_{ceil, floor}, dev_pm_opp_get_voltage, dev_pm_opp_get_freq, dev_pm_opp_get_opp_count
-and dev_pm_opp_init_cpufreq_table
dev_pm_opp_find_freq_exact is meant to be used to find the opp pointer which can then
be used for dev_pm_opp_enable/disable functions to make an opp available as required.
@@ -96,10 +94,9 @@ using RCU read locks. The opp_find_freq_{exact,ceil,floor},
opp_get_{voltage, freq, opp_count} fall into this category.
opp_{add,enable,disable} are updaters which use mutex and implement it's own
-RCU locking mechanisms. dev_pm_opp_init_cpufreq_table acts as an updater and uses
-mutex to implment RCU updater strategy. These functions should *NOT* be called
-under RCU locks and other contexts that prevent blocking functions in RCU or
-mutex operations from working.
+RCU locking mechanisms. These functions should *NOT* be called under RCU locks
+and other contexts that prevent blocking functions in RCU or mutex operations
+from working.
2. Initial OPP List Registration
================================
@@ -311,34 +308,7 @@ dev_pm_opp_get_opp_count - Retrieve the number of available opps for a device
/* Do other things */
}
-6. Cpufreq Table Generation
-===========================
-dev_pm_opp_init_cpufreq_table - cpufreq framework typically is initialized with
- cpufreq_frequency_table_cpuinfo which is provided with the list of
- frequencies that are available for operation. This function provides
- a ready to use conversion routine to translate the OPP layer's internal
- information about the available frequencies into a format readily
- providable to cpufreq.
-
- WARNING: Do not use this function in interrupt context.
-
- Example:
- soc_pm_init()
- {
- /* Do things */
- r = dev_pm_opp_init_cpufreq_table(dev, &freq_table);
- if (!r)
- cpufreq_frequency_table_cpuinfo(policy, freq_table);
- /* Do other things */
- }
-
- NOTE: This function is available only if CONFIG_CPU_FREQ is enabled in
- addition to CONFIG_PM as power management feature is required to
- dynamically scale voltage and frequency in a system.
-
-dev_pm_opp_free_cpufreq_table - Free up the table allocated by dev_pm_opp_init_cpufreq_table
-
-7. Data Structures
+6. Data Structures
==================
Typically an SoC contains multiple voltage domains which are variable. Each
domain is represented by a device pointer. The relationship to OPP can be
diff --git a/Documentation/power/runtime_pm.txt b/Documentation/power/runtime_pm.txt
index 5f96daf8566a..f32ce5419573 100644
--- a/Documentation/power/runtime_pm.txt
+++ b/Documentation/power/runtime_pm.txt
@@ -2,6 +2,7 @@ Runtime Power Management Framework for I/O Devices
(C) 2009-2011 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
(C) 2010 Alan Stern <stern@rowland.harvard.edu>
+(C) 2014 Intel Corp., Rafael J. Wysocki <rafael.j.wysocki@intel.com>
1. Introduction
@@ -444,6 +445,10 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
bool pm_runtime_status_suspended(struct device *dev);
- return true if the device's runtime PM status is 'suspended'
+ bool pm_runtime_suspended_if_enabled(struct device *dev);
+ - return true if the device's runtime PM status is 'suspended' and its
+ 'power.disable_depth' field is equal to 1
+
void pm_runtime_allow(struct device *dev);
- set the power.runtime_auto flag for the device and decrease its usage
counter (used by the /sys/devices/.../power/control interface to
@@ -644,19 +649,33 @@ place (in particular, if the system is not waking up from hibernation), it may
be more efficient to leave the devices that had been suspended before the system
suspend began in the suspended state.
+To this end, the PM core provides a mechanism allowing some coordination between
+different levels of device hierarchy. Namely, if a system suspend .prepare()
+callback returns a positive number for a device, that indicates to the PM core
+that the device appears to be runtime-suspended and its state is fine, so it
+may be left in runtime suspend provided that all of its descendants are also
+left in runtime suspend. If that happens, the PM core will not execute any
+system suspend and resume callbacks for all of those devices, except for the
+complete callback, which is then entirely responsible for handling the device
+as appropriate. This only applies to system suspend transitions that are not
+related to hibernation (see Documentation/power/devices.txt for more
+information).
+
The PM core does its best to reduce the probability of race conditions between
the runtime PM and system suspend/resume (and hibernation) callbacks by carrying
out the following operations:
- * During system suspend it calls pm_runtime_get_noresume() and
- pm_runtime_barrier() for every device right before executing the
- subsystem-level .suspend() callback for it. In addition to that it calls
- __pm_runtime_disable() with 'false' as the second argument for every device
- right before executing the subsystem-level .suspend_late() callback for it.
-
- * During system resume it calls pm_runtime_enable() and pm_runtime_put()
- for every device right after executing the subsystem-level .resume_early()
- callback and right after executing the subsystem-level .resume() callback
+ * During system suspend pm_runtime_get_noresume() is called for every device
+ right before executing the subsystem-level .prepare() callback for it and
+ pm_runtime_barrier() is called for every device right before executing the
+ subsystem-level .suspend() callback for it. In addition to that the PM core
+ calls __pm_runtime_disable() with 'false' as the second argument for every
+ device right before executing the subsystem-level .suspend_late() callback
+ for it.
+
+ * During system resume pm_runtime_enable() and pm_runtime_put() are called for
+ every device right after executing the subsystem-level .resume_early()
+ callback and right after executing the subsystem-level .complete() callback
for it, respectively.
7. Generic subsystem callbacks
diff --git a/Documentation/power/states.txt b/Documentation/power/states.txt
index 442d43df9b25..50f3ef9177c1 100644
--- a/Documentation/power/states.txt
+++ b/Documentation/power/states.txt
@@ -1,62 +1,87 @@
+System Power Management Sleep States
-System Power Management States
+(C) 2014 Intel Corp., Rafael J. Wysocki <rafael.j.wysocki@intel.com>
+The kernel supports up to four system sleep states generically, although three
+of them depend on the platform support code to implement the low-level details
+for each state.
-The kernel supports four power management states generically, though
-one is generic and the other three are dependent on platform support
-code to implement the low-level details for each state.
-This file describes each state, what they are
-commonly called, what ACPI state they map to, and what string to write
-to /sys/power/state to enter that state
+The states are represented by strings that can be read or written to the
+/sys/power/state file. Those strings may be "mem", "standby", "freeze" and
+"disk", where the last one always represents hibernation (Suspend-To-Disk) and
+the meaning of the remaining ones depends on the relative_sleep_states command
+line argument.
-state: Freeze / Low-Power Idle
+For relative_sleep_states=1, the strings "mem", "standby" and "freeze" label the
+available non-hibernation sleep states from the deepest to the shallowest,
+respectively. In that case, "mem" is always present in /sys/power/state,
+because there is at least one non-hibernation sleep state in every system. If
+the given system supports two non-hibernation sleep states, "standby" is present
+in /sys/power/state in addition to "mem". If the system supports three
+non-hibernation sleep states, "freeze" will be present in /sys/power/state in
+addition to "mem" and "standby".
+
+For relative_sleep_states=0, which is the default, the following descriptions
+apply.
+
+state: Suspend-To-Idle
ACPI state: S0
-String: "freeze"
+Label: "freeze"
-This state is a generic, pure software, light-weight, low-power state.
-It allows more energy to be saved relative to idle by freezing user
+This state is a generic, pure software, light-weight, system sleep state.
+It allows more energy to be saved relative to runtime idle by freezing user
space and putting all I/O devices into low-power states (possibly
lower-power than available at run time), such that the processors can
spend more time in their idle states.
-This state can be used for platforms without Standby/Suspend-to-RAM
+
+This state can be used for platforms without Power-On Suspend/Suspend-to-RAM
support, or it can be used in addition to Suspend-to-RAM (memory sleep)
-to provide reduced resume latency.
+to provide reduced resume latency. It is always supported.
State: Standby / Power-On Suspend
ACPI State: S1
-String: "standby"
+Label: "standby"
-This state offers minimal, though real, power savings, while providing
-a very low-latency transition back to a working system. No operating
-state is lost (the CPU retains power), so the system easily starts up
+This state, if supported, offers moderate, though real, power savings, while
+providing a relatively low-latency transition back to a working system. No
+operating state is lost (the CPU retains power), so the system easily starts up
again where it left off.
-We try to put devices in a low-power state equivalent to D1, which
-also offers low power savings, but low resume latency. Not all devices
-support D1, and those that don't are left on.
+In addition to freezing user space and putting all I/O devices into low-power
+states, which is done for Suspend-To-Idle too, nonboot CPUs are taken offline
+and all low-level system functions are suspended during transitions into this
+state. For this reason, it should allow more energy to be saved relative to
+Suspend-To-Idle, but the resume latency will generally be greater than for that
+state.
State: Suspend-to-RAM
ACPI State: S3
-String: "mem"
+Label: "mem"
-This state offers significant power savings as everything in the
-system is put into a low-power state, except for memory, which is
-placed in self-refresh mode to retain its contents.
+This state, if supported, offers significant power savings as everything in the
+system is put into a low-power state, except for memory, which should be placed
+into the self-refresh mode to retain its contents. All of the steps carried out
+when entering Power-On Suspend are also carried out during transitions to STR.
+Additional operations may take place depending on the platform capabilities. In
+particular, on ACPI systems the kernel passes control to the BIOS (platform
+firmware) as the last step during STR transitions and that usually results in
+powering down some more low-level components that aren't directly controlled by
+the kernel.
-System and device state is saved and kept in memory. All devices are
-suspended and put into D3. In many cases, all peripheral buses lose
-power when entering STR, so devices must be able to handle the
-transition back to the On state.
+System and device state is saved and kept in memory. All devices are suspended
+and put into low-power states. In many cases, all peripheral buses lose power
+when entering STR, so devices must be able to handle the transition back to the
+"on" state.
-For at least ACPI, STR requires some minimal boot-strapping code to
-resume the system from STR. This may be true on other platforms.
+For at least ACPI, STR requires some minimal boot-strapping code to resume the
+system from it. This may be the case on other platforms too.
State: Suspend-to-disk
ACPI State: S4
-String: "disk"
+Label: "disk"
This state offers the greatest power savings, and can be used even in
the absence of low-level platform support for power management. This
diff --git a/Documentation/power/swsusp.txt b/Documentation/power/swsusp.txt
index 079160e22bcc..f732a8321e8a 100644
--- a/Documentation/power/swsusp.txt
+++ b/Documentation/power/swsusp.txt
@@ -220,7 +220,10 @@ Q: After resuming, system is paging heavily, leading to very bad interactivity.
A: Try running
-cat `cat /proc/[0-9]*/maps | grep / | sed 's:.* /:/:' | sort -u` > /dev/null
+cat /proc/[0-9]*/maps | grep / | sed 's:.* /:/:' | sort -u | while read file
+do
+ test -f "$file" && cat "$file" > /dev/null
+done
after resume. swapoff -a; swapon -a may also be useful.