diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2012-07-30 09:53:50 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2012-07-30 09:53:50 -0700 |
commit | 8da8533dfb0929c5ea5d9fdf60ea6d3ffa02127d (patch) | |
tree | 1f9fe13e150dae31cf48ac3a88d5003040c2ec98 /Documentation | |
parent | f50f118c4974f7c2208a54f96452165ffb880471 (diff) | |
parent | c2078e4c9120e7b38b1a02cd9fc6dd4f792110bf (diff) | |
download | lwn-8da8533dfb0929c5ea5d9fdf60ea6d3ffa02127d.tar.gz lwn-8da8533dfb0929c5ea5d9fdf60ea6d3ffa02127d.zip |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac
Pull EDAC patches from Mauro Carvalho Chehab:
- the second part of the EDAC rework:
- Add the sysfs nodes that exports the real memory layout, instead
of the fake one (needed to properly represent Intel memory
controllers since 2002)
- convert EDAC MC to use "struct device" instead of creating the
sysfs nodes via the kobj API
- adds a tracepoint to represent memory errors
- some cleanup patches
- some fixes at i5000, i5400 and EDAC core
- a new EDAC driver for Caldera.
* git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: (33 commits)
edac i5000, i5400: fix pointer math in i5000_get_mc_regs()
edac: allow specifying the error count with fake_inject
edac: add support for Calxeda highbank L2 cache ecc
edac: add support for Calxeda highbank memory controller
edac: create top-level debugfs directory
sb_edac: properly handle error count
i7core_edac: properly handle error count
edac: edac_mc_handle_error(): add an error_count parameter
edac: remove arch-specific parameter for the error handler
amd64_edac: Don't pass driver name as an error parameter
edac_mc: check for allocation failure in edac_mc_alloc()
edac: Increase version to 3.0.0
edac_mc: Cleanup per-dimm_info debug messages
edac: Convert debugfX to edac_dbg(X,
edac: Use more normal debugging macro style
edac: Don't add __func__ or __FILE__ for debugf[0-9] msgs
Edac: Add ABI Documentation for the new device nodes
edac: move documentation ABI to ABI/testing/sysfs-devices-edac
i7core_edac: change the mem allocation scheme to make Documentation/kobject.txt happy
edac: change the mem allocation scheme to make Documentation/kobject.txt happy
...
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/ABI/testing/sysfs-devices-edac | 140 | ||||
-rw-r--r-- | Documentation/devicetree/bindings/arm/calxeda/l2ecc.txt | 15 | ||||
-rw-r--r-- | Documentation/devicetree/bindings/arm/calxeda/mem-ctrlr.txt | 14 | ||||
-rw-r--r-- | Documentation/edac.txt | 112 |
4 files changed, 177 insertions, 104 deletions
diff --git a/Documentation/ABI/testing/sysfs-devices-edac b/Documentation/ABI/testing/sysfs-devices-edac new file mode 100644 index 000000000000..30ee78aaed75 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-devices-edac @@ -0,0 +1,140 @@ +What: /sys/devices/system/edac/mc/mc*/reset_counters +Date: January 2006 +Contact: linux-edac@vger.kernel.org +Description: This write-only control file will zero all the statistical + counters for UE and CE errors on the given memory controller. + Zeroing the counters will also reset the timer indicating how + long since the last counter were reset. This is useful for + computing errors/time. Since the counters are always reset + at driver initialization time, no module/kernel parameter + is available. + +What: /sys/devices/system/edac/mc/mc*/seconds_since_reset +Date: January 2006 +Contact: linux-edac@vger.kernel.org +Description: This attribute file displays how many seconds have elapsed + since the last counter reset. This can be used with the error + counters to measure error rates. + +What: /sys/devices/system/edac/mc/mc*/mc_name +Date: January 2006 +Contact: linux-edac@vger.kernel.org +Description: This attribute file displays the type of memory controller + that is being utilized. + +What: /sys/devices/system/edac/mc/mc*/size_mb +Date: January 2006 +Contact: linux-edac@vger.kernel.org +Description: This attribute file displays, in count of megabytes, of memory + that this memory controller manages. + +What: /sys/devices/system/edac/mc/mc*/ue_count +Date: January 2006 +Contact: linux-edac@vger.kernel.org +Description: This attribute file displays the total count of uncorrectable + errors that have occurred on this memory controller. If + panic_on_ue is set, this counter will not have a chance to + increment, since EDAC will panic the system + +What: /sys/devices/system/edac/mc/mc*/ue_noinfo_count +Date: January 2006 +Contact: linux-edac@vger.kernel.org +Description: This attribute file displays the number of UEs that have + occurred on this memory controller with no information as to + which DIMM slot is having errors. + +What: /sys/devices/system/edac/mc/mc*/ce_count +Date: January 2006 +Contact: linux-edac@vger.kernel.org +Description: This attribute file displays the total count of correctable + errors that have occurred on this memory controller. This + count is very important to examine. CEs provide early + indications that a DIMM is beginning to fail. This count + field should be monitored for non-zero values and report + such information to the system administrator. + +What: /sys/devices/system/edac/mc/mc*/ce_noinfo_count +Date: January 2006 +Contact: linux-edac@vger.kernel.org +Description: This attribute file displays the number of CEs that + have occurred on this memory controller wherewith no + information as to which DIMM slot is having errors. Memory is + handicapped, but operational, yet no information is available + to indicate which slot the failing memory is in. This count + field should be also be monitored for non-zero values. + +What: /sys/devices/system/edac/mc/mc*/sdram_scrub_rate +Date: February 2007 +Contact: linux-edac@vger.kernel.org +Description: Read/Write attribute file that controls memory scrubbing. + The scrubbing rate used by the memory controller is set by + writing a minimum bandwidth in bytes/sec to the attribute file. + The rate will be translated to an internal value that gives at + least the specified rate. + Reading the file will return the actual scrubbing rate employed. + If configuration fails or memory scrubbing is not implemented, + the value of the attribute file will be -1. + +What: /sys/devices/system/edac/mc/mc*/max_location +Date: April 2012 +Contact: Mauro Carvalho Chehab <mchehab@redhat.com> + linux-edac@vger.kernel.org +Description: This attribute file displays the information about the last + available memory slot in this memory controller. It is used by + userspace tools in order to display the memory filling layout. + +What: /sys/devices/system/edac/mc/mc*/(dimm|rank)*/size +Date: April 2012 +Contact: Mauro Carvalho Chehab <mchehab@redhat.com> + linux-edac@vger.kernel.org +Description: This attribute file will display the size of dimm or rank. + For dimm*/size, this is the size, in MB of the DIMM memory + stick. For rank*/size, this is the size, in MB for one rank + of the DIMM memory stick. On single rank memories (1R), this + is also the total size of the dimm. On dual rank (2R) memories, + this is half the size of the total DIMM memories. + +What: /sys/devices/system/edac/mc/mc*/(dimm|rank)*/dimm_dev_type +Date: April 2012 +Contact: Mauro Carvalho Chehab <mchehab@redhat.com> + linux-edac@vger.kernel.org +Description: This attribute file will display what type of DRAM device is + being utilized on this DIMM (x1, x2, x4, x8, ...). + +What: /sys/devices/system/edac/mc/mc*/(dimm|rank)*/dimm_edac_mode +Date: April 2012 +Contact: Mauro Carvalho Chehab <mchehab@redhat.com> + linux-edac@vger.kernel.org +Description: This attribute file will display what type of Error detection + and correction is being utilized. For example: S4ECD4ED would + mean a Chipkill with x4 DRAM. + +What: /sys/devices/system/edac/mc/mc*/(dimm|rank)*/dimm_label +Date: April 2012 +Contact: Mauro Carvalho Chehab <mchehab@redhat.com> + linux-edac@vger.kernel.org +Description: This control file allows this DIMM to have a label assigned + to it. With this label in the module, when errors occur + the output can provide the DIMM label in the system log. + This becomes vital for panic events to isolate the + cause of the UE event. + DIMM Labels must be assigned after booting, with information + that correctly identifies the physical slot with its + silk screen label. This information is currently very + motherboard specific and determination of this information + must occur in userland at this time. + +What: /sys/devices/system/edac/mc/mc*/(dimm|rank)*/dimm_location +Date: April 2012 +Contact: Mauro Carvalho Chehab <mchehab@redhat.com> + linux-edac@vger.kernel.org +Description: This attribute file will display the location (csrow/channel, + branch/channel/slot or channel/slot) of the dimm or rank. + +What: /sys/devices/system/edac/mc/mc*/(dimm|rank)*/dimm_mem_type +Date: April 2012 +Contact: Mauro Carvalho Chehab <mchehab@redhat.com> + linux-edac@vger.kernel.org +Description: This attribute file will display what type of memory is + currently on this csrow. Normally, either buffered or + unbuffered memory (for example, Unbuffered-DDR3). diff --git a/Documentation/devicetree/bindings/arm/calxeda/l2ecc.txt b/Documentation/devicetree/bindings/arm/calxeda/l2ecc.txt new file mode 100644 index 000000000000..94e642a33db0 --- /dev/null +++ b/Documentation/devicetree/bindings/arm/calxeda/l2ecc.txt @@ -0,0 +1,15 @@ +Calxeda Highbank L2 cache ECC + +Properties: +- compatible : Should be "calxeda,hb-sregs-l2-ecc" +- reg : Address and size for ECC error interrupt clear registers. +- interrupts : Should be single bit error interrupt, then double bit error + interrupt. + +Example: + + sregs@fff3c200 { + compatible = "calxeda,hb-sregs-l2-ecc"; + reg = <0xfff3c200 0x100>; + interrupts = <0 71 4 0 72 4>; + }; diff --git a/Documentation/devicetree/bindings/arm/calxeda/mem-ctrlr.txt b/Documentation/devicetree/bindings/arm/calxeda/mem-ctrlr.txt new file mode 100644 index 000000000000..f770ac0893d4 --- /dev/null +++ b/Documentation/devicetree/bindings/arm/calxeda/mem-ctrlr.txt @@ -0,0 +1,14 @@ +Calxeda DDR memory controller + +Properties: +- compatible : Should be "calxeda,hb-ddr-ctrl" +- reg : Address and size for DDR controller registers. +- interrupts : Interrupt for DDR controller. + +Example: + + memory-controller@fff00000 { + compatible = "calxeda,hb-ddr-ctrl"; + reg = <0xfff00000 0x1000>; + interrupts = <0 91 4>; + }; diff --git a/Documentation/edac.txt b/Documentation/edac.txt index 03df2b020332..56c7e936430f 100644 --- a/Documentation/edac.txt +++ b/Documentation/edac.txt @@ -232,116 +232,20 @@ EDAC control and attribute files. In 'mcX' directories are EDAC control and attribute files for -this 'X' instance of the memory controllers: - - -Counter reset control file: - - 'reset_counters' - - This write-only control file will zero all the statistical counters - for UE and CE errors. Zeroing the counters will also reset the timer - indicating how long since the last counter zero. This is useful - for computing errors/time. Since the counters are always reset at - driver initialization time, no module/kernel parameter is available. - - RUN TIME: echo "anything" >/sys/devices/system/edac/mc/mc0/counter_reset - - This resets the counters on memory controller 0 - - -Seconds since last counter reset control file: - - 'seconds_since_reset' - - This attribute file displays how many seconds have elapsed since the - last counter reset. This can be used with the error counters to - measure error rates. - - - -Memory Controller name attribute file: - - 'mc_name' - - This attribute file displays the type of memory controller - that is being utilized. - - -Total memory managed by this memory controller attribute file: - - 'size_mb' - - This attribute file displays, in count of megabytes, of memory - that this instance of memory controller manages. - - -Total Uncorrectable Errors count attribute file: - - 'ue_count' - - This attribute file displays the total count of uncorrectable - errors that have occurred on this memory controller. If panic_on_ue - is set this counter will not have a chance to increment, - since EDAC will panic the system. - - -Total UE count that had no information attribute fileY: - - 'ue_noinfo_count' - - This attribute file displays the number of UEs that have occurred - with no information as to which DIMM slot is having errors. - - -Total Correctable Errors count attribute file: - - 'ce_count' - - This attribute file displays the total count of correctable - errors that have occurred on this memory controller. This - count is very important to examine. CEs provide early - indications that a DIMM is beginning to fail. This count - field should be monitored for non-zero values and report - such information to the system administrator. - - -Total Correctable Errors count attribute file: - - 'ce_noinfo_count' - - This attribute file displays the number of CEs that - have occurred wherewith no information as to which DIMM slot - is having errors. Memory is handicapped, but operational, - yet no information is available to indicate which slot - the failing memory is in. This count field should be also - be monitored for non-zero values. - -Device Symlink: - - 'device' - - Symlink to the memory controller device. - -Sdram memory scrubbing rate: - - 'sdram_scrub_rate' - - Read/Write attribute file that controls memory scrubbing. The scrubbing - rate is set by writing a minimum bandwidth in bytes/sec to the attribute - file. The rate will be translated to an internal value that gives at - least the specified rate. - - Reading the file will return the actual scrubbing rate employed. - - If configuration fails or memory scrubbing is not implemented, accessing - that attribute will fail. +this 'X' instance of the memory controllers. +For a description of the sysfs API, please see: + Documentation/ABI/testing/sysfs/devices-edac ============================================================================ 'csrowX' DIRECTORIES +When CONFIG_EDAC_LEGACY_SYSFS is enabled, the sysfs will contain the +csrowX directories. As this API doesn't work properly for Rambus, FB-DIMMs +and modern Intel Memory Controllers, this is being deprecated in favor +of dimmX directories. + In the 'csrowX' directories are EDAC control and attribute files for this 'X' instance of csrow: |