summaryrefslogtreecommitdiff
path: root/drivers/thermal
AgeCommit message (Collapse)Author
2024-07-18thermal: core: Allow thermal zones to tell the core to ignore themRafael J. Wysocki
The iwlwifi wireless driver registers a thermal zone that is only needed when the network interface handled by it is up and it wants that thermal zone to be effectively ignored by the core otherwise. Before commit a8a261774466 ("thermal: core: Call monitor_thermal_zone() if zone temperature is invalid") that could be achieved by returning an error code from the thermal zone's .get_temp() callback because the core did not really handle errors returned by it almost at all. However, commit a8a261774466 made the core attempt to recover from the situation in which the temperature of a thermal zone cannot be determined due to errors returned by its .get_temp() and is always invalid from the core's perspective. That was done because there are thermal zones in which .get_temp() returns errors to start with due to some difficulties related to the initialization ordering, but then it will start to produce valid temperature values at one point. Unfortunately, the simple approach taken by commit a8a261774466, which is to poll the thermal zone periodically until its .get_temp() callback starts to return valid temperature values, is at odds with the special thermal zone in iwlwifi in which .get_temp() may always return an error because its network interface may always be down. If that happens, every attempt to invoke the thermal zone's .get_temp() callback resulting in an error causes the thermal core to print a dev_warn() message to the kernel log which is super-noisy. To address this problem, make the core handle the case in which .get_temp() returns 0, but the temperature value returned by it is not actually valid, in a special way. Namely, make the core completely ignore the invalid temperature value coming from .get_temp() in that case, which requires folding in update_temperature() into its caller and a few related changes. On the iwlwifi side, modify iwl_mvm_tzone_get_temp() to return 0 and put THERMAL_TEMP_INVALID into the temperature return memory location instead of returning an error when the firmware is not running or it is not of the right type. Also, to clearly separate the handling of invalid temperature values from the thermal zone initialization, introduce a special THERMAL_TEMP_INIT value specifically for the latter purpose. Fixes: a8a261774466 ("thermal: core: Call monitor_thermal_zone() if zone temperature is invalid") Closes: https://lore.kernel.org/linux-pm/20240715044527.GA1544@sol.localdomain/ Reported-by: Eric Biggers <ebiggers@kernel.org> Reported-by: Stefan Lippers-Hollmann <s.l-h@gmx.de> Link: https://bugzilla.kernel.org/show_bug.cgi?id=201761 Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Tested-by: Stefan Lippers-Hollmann <s.l-h@gmx.de> Cc: 6.10+ <stable@vger.kernel.org> # 6.10+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/4950004.31r3eYUQgx@rjwysocki.net [ rjw: Rebased on top of the current mainline ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-07-15Merge branch 'thermal-intel'Rafael J. Wysocki
Merge updates of Intel thermal drivers for 6.11-rc1: - Switch Intel thermal drivers to new Intel CPU model defines (Tony Luck). - Clean up the int3400 and int3403 drivers (Erick Archer and David Alan Gilbert). - Improve intel_pch_thermal kernel log messages printed during suspend to idle (Zhang Rui). - Make the intel_tcc_cooling driver use a model-specific bitmask for TCC offset (Ricardo Neri). - Add DLVR and MSI interrupt support for the Lunar Lake platform to the int340x thermal driver (Srinivas Pandruvada). - Enable workload type hints (WLT) support and power floor interrupt support for the Lunar Lake platform in int340x ((Srinivas Pandruvada). - Make the HFI thermal driver use package scope for HFI instances as per the Intel SDM (Zhang Rui). * thermal-intel: thermal: intel: hfi: Give HFI instances package scope thermal: intel: int340x: Enable WLT and power floor support for Lunar Lake thermal: intel: int340x: Support MSI interrupt for Lunar Lake thermal: intel: int340x: Remove unnecessary calls to free irq thermal: intel: int340x: Add DLVR support for Lunar Lake thermal: intel: int340x: Capability to map user space to firmware values thermal: intel: int340x: Cleanup of DLVR sysfs on driver remove thermal: intel: intel_tcc_cooling: Use a model-specific bitmask for TCC offset thermal: intel: intel_tcc: Add model checks for temperature registers thermal: intel: intel_pch: Improve cooling log thermal: int3403: remove unused struct 'int3403_performance_state' thermal: int3400: Use sizeof(*pointer) instead of sizeof(type) thermal: intel: intel_soc_dts_thermal: Switch to new Intel CPU model defines thermal: intel: intel_tcc_cooling: Switch to new Intel CPU model defines
2024-07-15Merge branch 'thermal-core'Rafael J. Wysocki
Merge updates related to the thermal core for 6.11-rc1: - Redesign the .set_trip_temp() thermal zone callback to take a trip pointer instead of a trip ID and update its users (Rafael Wysocki). - Avoid using invalid combinations of polling_delay and passive_delay thermal zone parameters (Rafael Wysocki). - Update a cooling device registration function to take a const argument (Krzysztof Kozlowski). - Make the uniphier thermal driver use thermal_zone_for_each_trip() for walking trip points (Rafael Wysocki). * thermal-core: thermal: core: Add sanity checks for polling_delay and passive_delay thermal: trip: Fold __thermal_zone_get_trip() into its caller thermal: trip: Pass trip pointer to .set_trip_temp() thermal zone callback thermal: imx: Drop critical trip check from imx_set_trip_temp() thermal: trip: Add conversion macros for thermal trip priv field thermal: helpers: Introduce thermal_trip_is_bound_to_cdev() thermal: core: Change passive_delay and polling_delay data type thermal: core: constify 'type' in devm_thermal_of_cooling_device_register() thermal: uniphier: Use thermal_zone_for_each_trip() for walking trip points
2024-07-15thermal/drivers/sti: Cleanup code related to stih416Raphael Gallais-Pou
"st,stih416-mpe-thermal" compatible seems to appear nowhere in the device-tree nor in the documentation. Remove compatible and related code. Signed-off-by: Raphael Gallais-Pou <rgallaispou@gmail.com> Link: https://lore.kernel.org/r/20240708161840.102004-1-rgallaispou@gmail.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15thermal/drivers/generic-adc: Simplify with dev_err_probe()Krzysztof Kozlowski
Error handling in probe() can be a bit simpler with dev_err_probe(). Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-12-241644e2b6e0@linaro.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15thermal/drivers/generic-adc: Simplify probe() with local dev variableKrzysztof Kozlowski
Simplify the probe() function by using local 'dev' instead of &pdev->dev. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-11-241644e2b6e0@linaro.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15thermal/drivers/qcom-tsens: Simplify with dev_err_probe()Krzysztof Kozlowski
Error handling in probe() can be a bit simpler with dev_err_probe(). Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-10-241644e2b6e0@linaro.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15thermal/drivers/qcom-spmi-adc-tm5: Simplify with dev_err_probe()Krzysztof Kozlowski
Error handling in probe() can be a bit simpler with dev_err_probe(). Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-9-241644e2b6e0@linaro.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15thermal/drivers/imx: Simplify with dev_err_probe()Krzysztof Kozlowski
Error handling in probe() can be a bit simpler with dev_err_probe(). Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-8-241644e2b6e0@linaro.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15thermal/drivers/imx: Simplify probe() with local dev variableKrzysztof Kozlowski
Simplify the probe() function by using local 'dev' instead of &pdev->dev. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-7-241644e2b6e0@linaro.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15thermal/drivers/hisi: Simplify with dev_err_probe()Krzysztof Kozlowski
Error handling in probe() can be a bit simpler with dev_err_probe(). Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-6-241644e2b6e0@linaro.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15thermal/drivers/exynos: Simplify with dev_err_probe()Krzysztof Kozlowski
Error handling in probe() can be a bit simpler with dev_err_probe(). Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-5-241644e2b6e0@linaro.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15thermal/drivers/exynos: Simplify probe() with local dev variableKrzysztof Kozlowski
Simplify the probe() function by using local 'dev' instead of &pdev->dev. While touching devm_kzalloc(), use preferred sizeof(*) syntax. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com> Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-4-241644e2b6e0@linaro.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15thermal/drivers/broadcom: Simplify with dev_err_probe()Krzysztof Kozlowski
Error handling in probe() can be a bit simpler with dev_err_probe(). Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-3-241644e2b6e0@linaro.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15thermal/drivers/broadcom: Simplify probe() with local dev variableKrzysztof Kozlowski
Simplify the probe() function by using local 'dev' instead of &pdev->dev. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-2-241644e2b6e0@linaro.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15thermal/drivers/broadcom: Fix race between removal and clock disableKrzysztof Kozlowski
During the probe, driver enables clocks necessary to access registers (in get_temp()) and then registers thermal zone with managed-resources (devm) interface. Removal of device is not done in reversed order, because: 1. Clock will be disabled in driver remove() callback - thermal zone is still registered and accessible to users, 2. devm interface will unregister thermal zone. This leaves short window between (1) and (2) for accessing the get_temp() callback with disabled clock. Fix this by enabling clock also via devm-interface, so entire cleanup path will be in proper, reversed order. Fixes: 8454c8c09c77 ("thermal/drivers/bcm2835: Remove buggy call to thermal_of_zone_unregister") Cc: stable@vger.kernel.org Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240709-thermal-probe-v1-1-241644e2b6e0@linaro.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15thermal/drivers/mediatek/lvts_thermal: Provide default calibration dataChen-Yu Tsai
On some pre-production hardware, the SoCs do not contain calibration data for the thermal sensors. The downstream drivers provide default values that sort of work, instead of having the thermal sensors not work at all. Port the default values to the upstream driver. These values are from the ChromeOS kernels, which sadly do not cover the MT7988. Signed-off-by: Chen-Yu Tsai <wenst@chromium.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20240620092306.2352606-1-wenst@chromium.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15dt-bindings: thermal: mediatek: Fix thermal zone definitions for MT8188Julien Panis
Fix thermal zone names for consistency with the other SoCs: - GPU0 must be used as the first GPU item. - SOCx deal with audio DSP, video, and infra subsystems. The naming must be fixed "atomically" so compilation does not break. As a result, the change is made in the dt-bindings and in the LVTS driver within a single commit, despite the checkpatch warning. The definitions can be safely modified here because they are used only in the LVTS driver, which is modified accordingly, and have not yet been included in a released kernel. Fixes: 78c88534e5e1 ("dt-bindings: thermal: mediatek: Add LVTS thermal controller definition for MT8188") Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Julien Panis <jpanis@baylibre.com> Link: https://lore.kernel.org/r/20240603-mtk-thermal-mt818x-dtsi-v7-2-8c8e3c7a3643@baylibre.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15dt-bindings: thermal: mediatek: Fix thermal zone definition for MT8186Julien Panis
Fix a thermal zone name for consistency with the other SoCs: MFG contains GPU, the latter is more specific and must be used here. The naming must be fixed "atomically" so compilation does not break. As a result, the change is made in the dt-bindings and in the LVTS driver within a single commit, despite the checkpatch warning. The definition can be safely modified here because it is used only in the LVTS driver, which is modified accordingly, and has not yet been included in a released kernel. Fixes: a2ca202350f9 ("dt-bindings: thermal: mediatek: Add LVTS thermal controller definition for MT8186") Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Julien Panis <jpanis@baylibre.com> Link: https://lore.kernel.org/r/20240603-mtk-thermal-mt818x-dtsi-v7-1-8c8e3c7a3643@baylibre.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15thermal/drivers/k3_j72xx_bandgap: Implement suspend/resume supportThéo Lebrun
This add suspend-to-ram support. The derived_table is kept-as is, so the resume is only about pm_runtime_* calls and restoring the same registers as the probe. Extract the hardware initialization procedure to a function called at both probe-time & resume-time. The probe-time loop is split in two to ensure doing the hardware initialization before registering thermal zones. That ensures our callbacks cannot be called while in bad state. The 100ms delay in the hardware initialization sequence was removed. It was initially added to be sure the thresholds are programmed before enabling the interrupt, but in fact it's not needed (tested on J7200 platform). Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com> Acked-by: Keerthy <j-keerthy@ti.com> Signed-off-by: Thomas Richard <thomas.richard@bootlin.com> Link: https://lore.kernel.org/r/20240425153238.498750-1-thomas.richard@bootlin.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-07-15thermal/drivers/renesas/rcar: Add dependency on OFNiklas Söderlund
The R-Car thermal driver depends on OF, describe this. Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20240506154011.344324-3-niklas.soderlund+renesas@ragnatech.se
2024-07-15thermal/drivers/renesas: Group all renesas thermal drivers togetherNiklas Söderlund
Move all Renesas thermal drivers to a vendor specific directory. All drivers are moved verbatim apart from the updated include path for thermal_hwmon.h. Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20240506154011.344324-2-niklas.soderlund+renesas@ragnatech.se
2024-07-12thermal: core: Add sanity checks for polling_delay and passive_delayRafael J. Wysocki
If polling_delay is nonzero and passive_delay is greater than polling_delay, the thermal zone temperature will be updated less often when tz->passive is nonzero, which is not as expected. Make the thermal zone registration fail with -EINVAL in that case as this is a clear thermal zone configuration mistake. If polling_delay is nonzero and passive_delay is 0, which is regarded as a valid thermal zone configuration, the thermal zone will use polling except when tz->passive is nonzero. However, the expected behavior in that case is to continue temperature polling with the same delay value regardless of tz->passive, so set passive_delay to the polling_delay value then. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://patch.msgid.link/5802156.DvuYhMxLoT@rjwysocki.net
2024-07-12thermal: trip: Fold __thermal_zone_get_trip() into its callerRafael J. Wysocki
Because __thermal_zone_get_trip() is only called by thermal_zone_get_trip() now, fold the former into the latter. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/22339769.EfDdHjke4D@rjwysocki.net
2024-07-12thermal: trip: Pass trip pointer to .set_trip_temp() thermal zone callbackRafael J. Wysocki
Out of several drivers implementing the .set_trip_temp() thermal zone operation, three don't actually use the trip ID argument passed to it, two call __thermal_zone_get_trip() to get a struct thermal_trip corresponding to the given trip ID, and the other use the trip ID as an index into their own data structures with the assumption that it will always match the ordering of entries in the trips table passed to the core during thermal zone registration, which is fragile and not really guaranteed. Even though the trip IDs used by the core are in fact their indices in the trips table passed to it by the thermal zone creator, that is purely a matter of convenience and should not be relied on for correctness. For this reason, modify trip_point_temp_store() to pass a (const) trip pointer to .set_trip_temp() and adjust the drivers implementing it accordingly. This helps to simplify the drivers invoking __thermal_zone_get_trip() from their .set_trip_temp() callback functions because they will not need to do it now and the other drivers can store their internal trip indices in the priv field in struct thermal_trip and their .set_trip_temp() callback functions can get those indices from there. The intel_quark_dts thermal driver can instead use the trip type to determine the requisite trip index. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/8392906.T7Z3S40VBb@rjwysocki.net [ rjw: Add missing colon and 2 empty code lines ] [ rjw: Add missing change in imx_thermal.c and adjust the changelog ] [ rjw: Drop an unused local variable ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-07-10thermal: imx: Drop critical trip check from imx_set_trip_temp()Rafael J. Wysocki
Because the IMX thermal driver does not flag its critical trip as writable, imx_set_trip_temp() will never be invoked for it and so the critical trip check can be dropped from there. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/2272035.iZASKD2KPV@rjwysocki.net
2024-07-10Merge back thermal control material for 6.11.Rafael J. Wysocki
2024-07-09thermal: intel: hfi: Give HFI instances package scopeZhang Rui
The Intel Software Developer's Manual defines the scope of HFI (registers and memory buffer) as a package. Use package scope(*) in the software representation of an HFI instance. Using die scope in HFI instances has the effect of creating multiple conflicting instances for the same package: each instance allocates its own memory buffer and configures the same package-level registers. Specifically, only one of the allocated memory buffers can be set in the MSR_IA32_HW_FEEDBACK_PTR register. CPUs get incorrect HFI data from the table. The problem does not affect current HFI-capable platforms because they all have single-die processors. (*) We used die scope for HFI instances because there had been processors with packages enumerated as dies. None of those systems supported HFI, though. If such a system emerged, it would need to be quirked. Co-developed-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Reviewed-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Link: https://patch.msgid.link/20240703055445.125362-1-rui.zhang@intel.com [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-07-09thermal: trip: Add conversion macros for thermal trip priv fieldRafael J. Wysocki
Some drivers will need to store integers in the priv field of struct thermal_trip, so add conversion macros for doing this in a consistent way and switch over the int340x_thermal driver that already does it and uses custom conversion functions to using the new macros. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/3297884.aeNJFYEL58@rjwysocki.net
2024-07-09thermal: helpers: Introduce thermal_trip_is_bound_to_cdev()Rafael J. Wysocki
Introduce a new helper function thermal_trip_is_bound_to_cdev() for checking whether or not a given trip point has been bound to a given cooling device. The primary user of it will be the Tegra thermal driver. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/13545762.uLZWGnKmhe@rjwysocki.net
2024-07-09thermal: core: Change passive_delay and polling_delay data typeRafael J. Wysocki
It is better to use unsigned int as the data type for the passive_delay and polling_delay arguments of thermal_zone_device_register_with_trips() because they are implicitly cast to unsigned int anyway in thermal_set_delay_jiffies() and if they happen to be negative at that point, the resulting behavior may not be as desired. Update the thermal_zone_device_register_with_trips() definition accordingly. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://patch.msgid.link/5803791.DvuYhMxLoT@rjwysocki.net Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-07-08thermal: core: Fix list sorting in __thermal_zone_device_update()Rafael J. Wysocki
The order in which lists are sorted in __thermal_zone_device_update() is reverse with respect to what it should be due to a mistake in thermal_trip_notify_cmp(). Fix it and observe that it is not necessary to sort the lists in different orders. They can both be sorted in ascending order if way_down_list is walked in reverse order which allows the code to be slightly more straightforward (and less prone to silly mistakes). Fixes: 7454f2c42cce ("thermal: core: Sort trip point crossing notifications by temperature") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/12481676.O9o76ZdvQC@rjwysocki.net
2024-07-04thermal: core: Call monitor_thermal_zone() if zone temperature is invalidRafael J. Wysocki
Commit 202aa0d4bb53 ("thermal: core: Do not call handle_thermal_trip() if zone temperature is invalid") caused __thermal_zone_device_update() to return early if the current thermal zone temperature was invalid. This was done to avoid running handle_thermal_trip() and governor callbacks in that case which led to confusion. However, it went too far because monitor_thermal_zone() still needs to be called even when the zone temperature is invalid to ensure that it will be updated eventually in case thermal polling is enabled and the driver has no other means to notify the core of zone temperature changes (for example, it does not register an interrupt handler or ACPI notifier). Also if the .set_trips() zone callback is expected to set up monitoring interrupts for a thermal zone, it has to be provided with valid boundaries and that can only happen if the zone temperature is known. Accordingly, to ensure that __thermal_zone_device_update() will run again after a failing zone temperature check, make it call monitor_thermal_zone() regardless of whether or not the zone temperature is valid and make the latter schedule a thermal zone temperature update if the zone temperature is invalid even if polling is not enabled for the thermal zone. Fixes: 202aa0d4bb53 ("thermal: core: Do not call handle_thermal_trip() if zone temperature is invalid") Reported-by: Daniel Lezcano <daniel.lezcano@linaro.org> Tested-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/2764814.mvXUDI8C0e@rjwysocki.net [ rjw: Changed THERMAL_RECHECK_DELAY_MS to 250 ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-07-04thermal: core: constify 'type' in devm_thermal_of_cooling_device_register()Krzysztof Kozlowski
The 'type' string passed to thermal_of_cooling_device_register() is a 'const char *', so do the same in the devm interface. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/20240703083141.96013-1-krzysztof.kozlowski@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-07-04thermal: gov_power_allocator: Return early in manage if trip_max is NULLNícolas F. R. A. Prado
Commit da781936e7c3 ("thermal: gov_power_allocator: Allow binding without trip points") allowed the governor to bind even when trip_max is NULL. This allows a NULL pointer dereference to happen in the manage callback. Add an early return to prevent it, since the governor is expected to not do anything in this case. Fixes: da781936e7c3 ("thermal: gov_power_allocator: Allow binding without trip points") Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Link: https://patch.msgid.link/20240702-power-allocator-null-trip-max-v1-1-47a60dc55414@collabora.com Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-07-04thermal: uniphier: Use thermal_zone_for_each_trip() for walking trip pointsRafael J. Wysocki
It is generally inefficient to iterate over trip indices and call thermal_zone_get_trip() every time to get the struct thermal_trip corresponding to the given trip index, so modify the uniphier thermal driver to use thermal_zone_for_each_trip() for walking trips. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com> Link: https://patch.msgid.link/2148114.bB369e8A3T@kreacher [ rjw: Add missing return statement, remove unused local variable ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-27Merge back thermal control material for v6.11.Rafael J. Wysocki
2024-06-25Merge branch 'thermal-core'Rafael J. Wysocki
Merge thermal core changes for v6.11: - Fix and clean up several minor shortcomings in thermal debug (Rafael Wysocki). - Rename __thermal_zone_set_trips() to thermal_zone_set_trips() and make it use trip thresholds (Rafael Wysocki). - Use READ_ONCE() for lockless access to trip temperature and hysteresis (Rafael Wysocki). - Drop unnecessary cooling device target state checks from the Bang-Bang thermal governor (Rafael Wysocki). - Avoid invoking thermal governor .trip_crossed() callback for critical and hot trip points (Rafael Wysocki). * thermal-core: thermal: core: Avoid calling .trip_crossed() for critical and hot trips thermal: gov_bang_bang: Drop unnecessary cooling device target state checks thermal: trip: Use READ_ONCE() for lockless access to trip properties thermal: trip: Make thermal_zone_set_trips() use trip thresholds thermal: trip: Rename __thermal_zone_set_trips() to thermal_zone_set_trips() thermal: trip: Use common set of trip type names thermal/debugfs: Move some statements from under thermal_dbg->lock thermal/debugfs: Compute maximum temperature for mitigation episode as a whole thermal/debugfs: Adjust check for trips without statistics in tze_seq_show() thermal/debugfs: Fix up units in "mitigations" files thermal/debugfs: Print mitigation timestamp value in milliseconds thermal/debugfs: Do not extend mitigation episodes beyond system resume thermal/debugfs: Use helper to update trip point overstepping duration
2024-06-25thermal: gov_step_wise: Go straight to instance->lower when mitigation is overRafael J. Wysocki
Commit b6846826982b ("thermal: gov_step_wise: Restore passive polling management") attempted to fix a Step-Wise thermal governor issue introduced by commit 042a3d80f118 ("thermal: core: Move passive polling management to the core"), which caused the governor to leave cooling devices in high states, by partially reverting that commit. However, this turns out to be insufficient on some systems due to interactions between the governor code restored by commit b6846826982b and the passive polling management in the thermal core. For this reason, revert commit b6846826982b and make the governor set the target cooling device state to the "lower" one as soon as the zone temperature falls below the threshold of the trip point corresponding to the given thermal instance, which means that thermal mitigation is not necessary any more. Before this change the "lower" cooling device state would be reached in steps through the passive polling mechanism which was questionable for three reasons: (1) cooling device were kept in high states when that was not necessary (and it could adversely impact performance), (2) it only worked for thermal zones with nonzero passive_delay_jiffies value, and (3) passive polling belongs to the core and should not be hijacked by governors for their internal purposes. Fixes: b6846826982b ("thermal: gov_step_wise: Restore passive polling management") Closes: https://lore.kernel.org/linux-pm/6759ce9f-281d-4fcd-bb4c-b784a1cc5f6e@oldschoolsolutions.biz Reported-by: Jens Glathe <jens.glathe@oldschoolsolutions.biz> Tested-by: Jens Glathe <jens.glathe@oldschoolsolutions.biz> Link: https://patch.msgid.link/12464461.O9o76ZdvQC@rjwysocki.net Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Steev Klimaszewski <steev@kali.org> Tested-by: Johan Hovold <johan+linaro@kernel.org>
2024-06-21Merge tag 'thermal-6.10-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control fixes from Rafael Wysocki: "These fix the Mediatek lvts_thermal driver, the Intel int340x driver, and the thermal core (two issues related to system suspend). Specifics: - Remove the filtered mode for mt8188 from lvts_thermal as it is not supported on this platform and fail the lvts_thermal initialization when the golden temperature is zero as that means the efuse data is not correctly set (Julien Panis) - Update the processor_thermal part of the Intel int340x driver to support shared interrupts as the processor thermal device interrupt may in fact be shared with PCI devices (Srinivas Pandruvada) - Synchronize the suspend-prepare and post-suspend actions of the thermal PM notifier to avoid a destructive race condition and change the priority of that notifier to the minimum to avoid interference between the work items spawned by it and the other PM notifiers during system resume (Rafael Wysocki)" * tag 'thermal-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal: int340x: processor_thermal: Support shared interrupts thermal: core: Change PM notifier priority to the minimum thermal: core: Synchronize suspend-prepare and post-suspend actions thermal/drivers/mediatek/lvts_thermal: Return error in case of invalid efuse data thermal/drivers/mediatek/lvts_thermal: Remove filtered mode for mt8188
2024-06-21thermal: intel: int340x: Enable WLT and power floor support for Lunar LakeSrinivas Pandruvada
Add feature flgas for WLT and power floor for Lunar Lake. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://patch.msgid.link/20240619172109.497639-4-srinivas.pandruvada@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-21thermal: intel: int340x: Support MSI interrupt for Lunar LakeSrinivas Pandruvada
The legacy PCI interrupt is no longer supported for processor thermal device on Lunar Lake. The support is via MSI. Add feature PROC_THERMAL_FEATURE_MSI_SUPPORT to support MSI feature per generation. Define this feature for Lunar Lake processors. There are 4 MSI sources: 0 - Package thermal 1 - DDR Thermal 2 - Power floor interrupt 3 - Workload type hint On interrupt, check the source and call the corresponding handler. Here don't need to call proc_thermal_check_wt_intr() and proc_thermal_check_power_floor_intr() to check if the interrupt is for those sources as there is a dedicated MSI interrupt. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://patch.msgid.link/20240619172109.497639-3-srinivas.pandruvada@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-21thermal: intel: int340x: Remove unnecessary calls to free irqSrinivas Pandruvada
Remove calls to devm_free_irq() and pci_free_irq_vectors(). They will be called on driver release anyway. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://patch.msgid.link/20240619172109.497639-2-srinivas.pandruvada@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-21thermal: intel: int340x: Add DLVR support for Lunar LakeSrinivas Pandruvada
Add support for DLVR (Digital Linear Voltage Regulator) for Lunar Lake. There are no new sysfs attributes or difference in operation compared to prior generations. MMIO offset and bit positions are changed compared to Meteor Lake processors. Also for two attributes dlvr_frequency_mhz and dlvr_frequency_select, the value presented or accepted by the firmware is not raw frequency value but an index. For example: RFI_FREQ_SELECT and RFI_FREQ : 0 DLVR freq point 2227.2 MHz : 1 DLVR freq point 2140 MHz Hence create a mapping table for Lunar Lake to map user space values to the firmware accepted values. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://patch.msgid.link/20240619124600.491168-4-srinivas.pandruvada@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-21thermal: intel: int340x: Capability to map user space to firmware valuesSrinivas Pandruvada
To ensure compatibility between user inputs and firmware requirements, a conversion mechanism is necessary for certain attributes. For instance, on some platforms, the DLVR frequency must be translated into a predefined index before being communicated to the firmware. On Lunar Lake platform: RFI_FREQ_SELECT and RFI_FREQ: Index 0 corresponds to a DLVR frequency of 2227.2 MHz Index 1 corresponds to a DLVR frequency of 2140 MHz Introduce a feature that enables the conversion of values between user space inputs and firmware-accepted formats. This feature would also facilitate the reverse process, converting firmware values back into user friendly display values. To support this functionality, a model-specific mapping table will be utilized. When available, this table will provide the necessary translations between user space values and firmware values, ensuring seamless communication and accurate settings. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://patch.msgid.link/20240619124600.491168-3-srinivas.pandruvada@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-21thermal: intel: int340x: Cleanup of DLVR sysfs on driver removeSrinivas Pandruvada
When only DLVR enabled without DVFS, during driver remove, proc_thermal_rfim_remove() is not called. Hence the DLVR sysfs is not deleted. On Lunar Lake DLVR is enabled without DVFS, hence this issue can be reproduced. Check also PROC_THERMAL_FEATURE_DLVR to call proc_thermal_rfim_remove(). Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://patch.msgid.link/20240619124600.491168-2-srinivas.pandruvada@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-21thermal: intel: intel_tcc_cooling: Use a model-specific bitmask for TCC offsetRicardo Neri
The TCC offset field in the register MSR_TEMPERATURE_TARGET is not architectural. The TCC library provides a model-specific bitmask. Use it to determine the maximum TCC offset. Suggested-by: Zhang Rui <rui.zhang@intel.com> Reviewed-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Link: https://patch.msgid.link/20240614211606.5896-3-ricardo.neri-calderon@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-21thermal: intel: intel_tcc: Add model checks for temperature registersRicardo Neri
The register MSR_TEMPERATURE_TARGET is not architectural. Its fields may be defined differently for each processor model. TCC_OFFSET is an example of such case. Despite being specified as architectural, the registers IA32_[PACKAGE]_ THERM_STATUS have become model-specific: in recent processors, the digital temperature readout uses bits [23:16] whereas the Intel Software Developer's manual specifies bits [22:16]. Create an array of processor models and their bitmasks for TCC_OFFSET and the digital temperature readout fields. Do not include recent processors. Instead, use the bitmasks of these recent processors as default. Use these model-specific bitmasks when reading TCC_OFFSET or the temperature sensors. Initialize a model-specific data structure during subsys_initcall() to have it ready when thermal drivers are loaded. Expose the new interface intel_tcc_get_offset_mask(). The intel_tcc_cooling driver will use it. Reviewed-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Link: https://patch.msgid.link/20240614211606.5896-2-ricardo.neri-calderon@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-21Merge back thermal control changes related to Intel platforms for v6.11Rafael J. Wysocki
2024-06-21Merge back thermal control material for v6.11.Rafael J. Wysocki