summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2024-11-27Merge tag 'riscv-for-linus-6.13-mw1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux into HEAD RISC-V Paches for the 6.13 Merge Window, Part 1 * Support for pointer masking in userspace, * Support for probing vector misaligned access performance. * Support for qspinlock on systems with Zacas and Zabha.
2024-11-27i2c: of-prober: Add GPIO support to simple helpersChen-Yu Tsai
Add GPIO support to the simple helpers for the I2C OF component prober. Components that the prober intends to probe likely require their regulator supplies be enabled, and GPIOs be toggled to enable them or bring them out of reset before they will respond to probe attempts. Regulator supplies were handled in the previous patch. The assumption is that the same class of components to be probed are always connected in the same fashion with the same regulator supply and GPIO. The names may vary due to binding differences, but the physical layout does not change. This supports at most one GPIO pin. The user must specify the GPIO name, the polarity, and the amount of time to wait after the GPIO is toggled. Devices with more than one GPIO pin likely require specific power sequencing beyond what generic code can easily support. Signed-off-by: Chen-Yu Tsai <wenst@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Tested-by: Andrey Skvortsov <andrej.skvortzov@gmail.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2024-11-27i2c: of-prober: Add simple helpers for regulator supportChen-Yu Tsai
Add helpers to do regulator management for the I2C OF component prober. Components that the prober intends to probe likely require their regulator supplies be enabled, and GPIOs be toggled to enable them or bring them out of reset before they will respond to probe attempts. GPIOs will be handled in the next patch. The assumption is that the same class of components to be probed are always connected in the same fashion with the same regulator supply and GPIO. The names may vary due to binding differences, but the physical layout does not change. This set of helpers supports at most one regulator supply. The user must specify the node from which the supply is retrieved. The supply name and the amount of time to wait after the supply is enabled are also given by the user. Signed-off-by: Chen-Yu Tsai <wenst@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2024-11-27i2c: Introduce OF component probe functionChen-Yu Tsai
Some devices are designed and manufactured with some components having multiple drop-in replacement options. These components are often connected to the mainboard via ribbon cables, having the same signals and pin assignments across all options. These may include the display panel and touchscreen on laptops and tablets, and the trackpad on laptops. Sometimes which component option is used in a particular device can be detected by some firmware provided identifier, other times that information is not available, and the kernel has to try to probe each device. This change attempts to make the "probe each device" case cleaner. The current approach is to have all options added and enabled in the device tree. The kernel would then bind each device and run each driver's probe function. This works, but has been broken before due to the introduction of asynchronous probing, causing multiple instances requesting "shared" resources, such as pinmuxes, GPIO pins, interrupt lines, at the same time, with only one instance succeeding. Work arounds for these include moving the pinmux to the parent I2C controller, using GPIO hogs or pinmux settings to keep the GPIO pins in some fixed configuration, and requesting the interrupt line very late. Such configurations can be seen on the MT8183 Krane Chromebook tablets, and the Qualcomm sc8280xp-based Lenovo Thinkpad 13S. Instead of this delicate dance between drivers and device tree quirks, this change introduces a simple I2C component probe function. For a given class of devices on the same I2C bus, it will go through all of them, doing a simple I2C read transfer and see which one of them responds. It will then enable the device that responds. This requires some minor modifications in the existing device tree. The status for all the device nodes for the component options must be set to "fail-needs-probe". This makes it clear that some mechanism is needed to enable one of them, and also prevents the prober and device drivers running at the same time. Signed-off-by: Chen-Yu Tsai <wenst@chromium.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2024-11-27of: base: Add for_each_child_of_node_with_prefix()Chen-Yu Tsai
There are cases where drivers would go through child device nodes and operate on only the ones whose node name starts with a given prefix. Provide a helper for these users. This will mainly be used in a subsequent patch that implements a hardware component prober for I2C busses. Signed-off-by: Chen-Yu Tsai <wenst@chromium.org> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2024-11-27of: dynamic: Add of_changeset_update_prop_stringChen-Yu Tsai
Add a helper function to add string property updates to an OF changeset. This is similar to of_changeset_add_prop_string(), but instead of adding the property (and failing if it exists), it will update the property. This shall be used later in the DT hardware prober. Signed-off-by: Chen-Yu Tsai <wenst@chromium.org> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2024-11-26Merge tag 'i3c/for-6.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux Pull i3c updates from Alexandre Belloni: "Core: - avoid possible deadlock on probe - ensured preferred address is used on hot-join Drivers: - dw: add AMD I3C controller support - mipi-i3c-hci: fix SETDASA, DMA interrupts fixes - svc: many fixes for IBI and hotjoin" * tag 'i3c/for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux: i3c: Use i3cdev->desc->info instead of calling i3c_device_get_info() to avoid deadlock i3c: mipi-i3c-hci: Support SETDASA CCC i3c: dw: Add quirk to address OD/PP timing issue on AMD platform i3c: dw: Add support for AMDI0015 ACPI ID i3c: master: svc: Modify enabled_events bit 7:0 to act as IBI enable counter i3c: Document I3C_ADDR_SLOT_EXT_STATUS_MASK i3c: master: svc: Fix pm_runtime_set_suspended() with runtime pm enabled i3c: mipi-i3c-hci: Handle interrupts according to current specifications i3c: mipi-i3c-hci: Mask ring interrupts before ring stop request i3c: master: Fix miss free init_dyn_addr at i3c_master_put_i3c_addrs() i3c: master: Remove i3c_dev_disable_ibi_locked(olddev) on device hotjoin i3c: master: svc: fix possible assignment of the same address to two devices i3c: master: svc: wait for Manual ACK/NACK Done before next step i3c: master: svc: use spin_lock_irqsave at svc_i3c_master_ibi_work() i3c: master: svc: need check IBIWON for dynamic address assignment i3c: master: svc: manually emit NACK/ACK for hotjoin i3c: master: svc: use repeat start when IBI WIN happens i3c: master: Fix dynamic address leak when 'assigned-address' is present i3c: master: Extend address status bit to 4 and add I3C_ADDR_SLOT_EXT_DESIRED i3c: master: Replace hard code 2 with macro I3C_ADDR_SLOT_STATUS_BITS
2024-11-26Merge tag 'pci-v6.13-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull PCI updates from Bjorn Helgaas: "Enumeration: - Make pci_stop_dev() and pci_destroy_dev() safe so concurrent callers can't stop a device multiple times, even as we migrate from the global pci_rescan_remove_lock to finer-grained locking (Keith Busch) - Improve pci_walk_bus() implementation by making it recursive and moving locking up to avoid need for a 'locked' parameter (Keith Busch) - Unexport pci_walk_bus_locked(), which is only used internally by the PCI core (Keith Busch) - Detect some Thunderbolt chips that are built-in and hence 'trustworthy' by a heuristic since the 'ExternalFacingPort' and 'usb4-host-interface' ACPI properties are not quite enough (Esther Shimanovich) Resource management: - Use PCI bus addresses (not CPU addresses) in 'ranges' properties when building dynamic DT nodes so systems where PCI and CPU addresses differ work correctly (Andrea della Porta) - Tidy resource sizing and assignment with helpers to reduce redundancy (Ilpo Järvinen) - Improve pdev_sort_resources() 'bogus alignment' warning to be more specific (Ilpo Järvinen) Driver binding: - Convert driver .remove_new() callbacks to .remove() again to finish the conversion from returning 'int' to being 'void' (Sergio Paracuellos) - Export pcim_request_all_regions(), a managed interface to request all BARs (Philipp Stanner) - Replace pcim_iomap_regions_request_all() with pcim_request_all_regions(), and pcim_iomap_table()[n] with pcim_iomap(n), in the following drivers: ahci, crypto qat, crypto octeontx2, intel_th, iwlwifi, ntb idt, serial rp2, ALSA korg1212 (Philipp Stanner) - Remove the now unused pcim_iomap_regions_request_all() (Philipp Stanner) - Export pcim_iounmap_region(), a managed interface to unmap and release a PCI BAR (Philipp Stanner) - Replace pcim_iomap_regions(mask) with pcim_iomap_region(n), and pcim_iounmap_regions(mask) with pcim_iounmap_region(n), in the following drivers: fpga dfl-pci, block mtip32xx, gpio-merrifield, cavium (Philipp Stanner) Error handling: - Add sysfs 'reset_subordinate' to reset the entire hierarchy below a bridge; previously Secondary Bus Reset could only be used when there was a single device below a bridge (Keith Busch) - Warn if we reset a running device where the driver didn't register pci_error_handlers notification callbacks (Keith Busch) ASPM: - Disable ASPM L1 before touching L1 PM Substates to follow the spec closer and avoid a CPU load timeout on some platforms (Ajay Agarwal) - Set devices below Intel VMD to D0 before enabling ASPM L1 Substates as required per spec for all L1 Substates changes (Jian-Hong Pan) Power management: - Enable starfive controller runtime PM before probing host bridge (Mayank Rana) - Enable runtime power management for host bridges (Krishna chaitanya chundru) Power control: - Use of_platform_device_create() instead of of_platform_populate() to create pwrctl platform devices so we can control it based on the child nodes (Manivannan Sadhasivam) - Create pwrctrl platform devices only if there's a relevant power supply property (Manivannan Sadhasivam) - Add device link from the pwrctl supplier to the PCI dev to ensure pwrctl drivers are probed before the PCI dev driver; this avoids a race where pwrctl could change device power state while the PCI driver was active (Manivannan Sadhasivam) - Find pwrctl device for removal with of_find_device_by_node() instead of searching all children of the parent (Manivannan Sadhasivam) - Rename 'pwrctl' to 'pwrctrl' to match new bandwidth controller ('bwctrl') and hotplug files (Bjorn Helgaas) Bandwidth control: - Add read/modify/write locking for Link Control 2, which is used to manage Link speed (Ilpo Järvinen) - Extract Link Bandwidth Management Status check into pcie_lbms_seen(), where it can be shared between the bandwidth controller and quirks that use it to help retrain failed links (Ilpo Järvinen) - Re-add Link Bandwidth notification support with updates to address the reasons it was previously reverted (Alexandru Gagniuc, Ilpo Järvinen) - Add pcie_set_target_speed() and related functionality so drivers can manage PCIe Link speed based on thermal or other constraints (Ilpo Järvinen) - Add a thermal cooling driver to throttle PCIe Links via the existing thermal management framework (Ilpo Järvinen) - Add a userspace selftest for the PCIe bandwidth controller (Ilpo Järvinen) PCI device hotplug: - Add hotplug controller driver for Marvell OCTEON multi-function device where function 0 has a management console interface to enable/disable and provision various personalities for the other functions (Shijith Thotton) - Retain a reference to the pci_bus for the lifetime of a pci_slot to avoid a use-after-free when the thunderbolt driver resets USB4 host routers on boot, causing hotplug remove/add of downstream docks or other devices (Lukas Wunner) - Remove unused cpcihp struct cpci_hp_controller_ops.hardware_test (Guilherme Giacomo Simoes) - Remove unused cpqphp struct ctrl_dbg.ctrl (Christophe JAILLET) - Use pci_bus_read_dev_vendor_id() instead of hand-coded presence detection in cpqphp (Ilpo Järvinen) - Simplify cpqphp enumeration, which is already simple-minded and doesn't handle devices below hot-added bridges (Ilpo Järvinen) Virtualization: - Add ACS quirk for Wangxun FF5xxx NICs, which don't advertise an ACS capability but do isolate functions as though PCI_ACS_RR and PCI_ACS_CR were set, so the functions can be in independent IOMMU groups (Mengyuan Lou) TLP Processing Hints (TPH): - Add and document TLP Processing Hints (TPH) support so drivers can enable and disable TPH and the kernel can save/restore TPH configuration (Wei Huang) - Add TPH Steering Tag support so drivers can retrieve Steering Tag values associated with specific CPUs via an ACPI _DSM to improve performance by directing DMA writes closer to their consumers (Wei Huang) Data Object Exchange (DOE): - Wait up to 1 second for DOE Busy bit to clear before writing a request to the mailbox to avoid failures if the mailbox is still busy from a previous transfer (Gregory Price) Endpoint framework: - Skip attempts to allocate from endpoint controller memory window if the requested size is larger than the window (Damien Le Moal) - Add and document pci_epc_mem_map() and pci_epc_mem_unmap() to handle controller-specific size and alignment constraints, and add test cases to the endpoint test driver (Damien Le Moal) - Implement dwc pci_epc_ops.align_addr() so pci_epc_mem_map() can observe DWC-specific alignment requirements (Damien Le Moal) - Synchronously cancel command handler work in endpoint test before cleaning up DMA and BARs (Damien Le Moal) - Respect endpoint page size in dw_pcie_ep_align_addr() (Niklas Cassel) - Use dw_pcie_ep_align_addr() in dw_pcie_ep_raise_msi_irq() and dw_pcie_ep_raise_msix_irq() instead of open coding the equivalent (Niklas Cassel) - Avoid NULL dereference if Modem Host Interface Endpoint lacks 'mmio' DT property (Zhongqiu Han) - Release PCI domain ID of Endpoint controller parent (not controller itself) and before unregistering the controller, to avoid use-after-free (Zijun Hu) - Clear secondary (not primary) EPC in pci_epc_remove_epf() when removing the secondary controller associated with an NTB (Zijun Hu) Cadence PCIe controller driver: - Lower severity of 'phy-names' message (Bartosz Wawrzyniak) Freescale i.MX6 PCIe controller driver: - Fix suspend/resume support on i.MX6QDL, which has a hardware erratum that prevents use of L2 (Stefan Eichenberger) Intel VMD host bridge driver: - Add 0xb60b and 0xb06f Device IDs for client SKUs (Nirmal Patel) MediaTek PCIe Gen3 controller driver: - Update mediatek-gen3 DT binding to require the exact number of clocks for each SoC (Fei Shao) - Add support for DT 'max-link-speed' and 'num-lanes' properties to restrict the link speed and width (AngeloGioacchino Del Regno) Microchip PolarFlare PCIe controller driver: - Add DT and driver support for using either of the two PolarFire Root Ports (Conor Dooley) NVIDIA Tegra194 PCIe controller driver: - Move endpoint controller cleanups that depend on refclk from the host to the notifier that tells us the host has deasserted PERST#, when refclk should be valid (Manivannan Sadhasivam) Qualcomm PCIe controller driver: - Add qcom SAR2130P DT binding with an additional clock (Dmitry Baryshkov) - Enable MSI interrupts if 'global' IRQ is supported, since a previous commit unintentionally masked them (Manivannan Sadhasivam) - Move endpoint controller cleanups that depend on refclk from the host to the notifier that tells us the host has deasserted PERST#, when refclk should be valid (Manivannan Sadhasivam) - Add DT binding and driver support for IPQ9574, with Synopsys IP v5.80a and Qcom IP 1.27.0 (devi priya) - Move the OPP "operating-points-v2" table from the qcom,pcie-sm8450.yaml DT binding to qcom,pcie-common.yaml, where it can be used by other Qcom platforms (Qiang Yu) - Add 'global' SPI interrupt for events like link-up, link-down to qcom,pcie-x1e80100 DT binding so we can start enumeration when the link comes up (Qiang Yu) - Disable ASPM L0s for qcom,pcie-x1e80100 since the PHY is not tuned to support this (Qiang Yu) - Add ops_1_21_0 for SC8280X family SoC, which doesn't use the 'iommu-map' DT property and doesn't need BDF-to-SID translation (Qiang Yu) Rockchip PCIe controller driver: - Define ROCKCHIP_PCIE_AT_SIZE_ALIGN to replace magic 256 endpoint .align value (Damien Le Moal) - When unmapping an endpoint window, compute the region index instead of searching for it, and verify that the address was mapped (Damien Le Moal) - When mapping an endpoint window, verify that the address hasn't been mapped already (Damien Le Moal) - Implement pci_epc_ops.align_addr() for rockchip-ep (Damien Le Moal) - Fix MSI IRQ data mapping to observe the alignment constraint, which fixes intermittent page faults in memcpy_toio() and memcpy_fromio() (Damien Le Moal) - Rename rockchip_pcie_parse_ep_dt() to rockchip_pcie_ep_get_resources() for consistency with similar DT interfaces (Damien Le Moal) - Skip the unnecessary link train in rockchip_pcie_ep_probe() and do it only in the endpoint start operation (Damien Le Moal) - Implement pci_epc_ops.stop_link() to disable link training and controller configuration (Damien Le Moal) - Attempt link training at 5 GT/s when both partners support it (Damien Le Moal) - Add a handler for PERST# signal so we can detect host-initiated resets and start link training after PERST# is deasserted (Damien Le Moal) Synopsys DesignWare PCIe controller driver: - Clear outbound address on unmap so dw_pcie_find_index() won't match an ATU index that was already unmapped (Damien Le Moal) - Use of_property_present() instead of of_property_read_bool() when testing for presence of non-boolean DT properties (Rob Herring) - Advertise 1MB size if endpoint supports Resizable BARs, which was inadvertently lost in v6.11 (Niklas Cassel) TI J721E PCIe driver: - Add PCIe support for J722S SoC (Siddharth Vadapalli) - Delay PCIE_T_PVPERL_MS (100 ms), not just PCIE_T_PERST_CLK_US (100 us), before deasserting PERST# to ensure power and refclk are stable (Siddharth Vadapalli) TI Keystone PCIe controller driver: - Set the 'ti,keystone-pcie' mode so v3.65a devices work in Root Complex mode (Kishon Vijay Abraham I) - Try to avoid unrecoverable SError for attempts to issue config transactions when the link is down; this is racy but the best we can do (Kishon Vijay Abraham I) Miscellaneous: - Reorganize kerneldoc parameter names to match order in function signature (Julia Lawall) - Fix sysfs reset_method_store() memory leak (Todd Kjos) - Simplify pci_create_slot() (Ilpo Järvinen) - Fix incorrect printf format specifiers in pcitest (Luo Yifan)" * tag 'pci-v6.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (127 commits) PCI: rockchip-ep: Handle PERST# signal in EP mode PCI: rockchip-ep: Improve link training PCI: rockship-ep: Implement the pci_epc_ops::stop_link() operation PCI: rockchip-ep: Refactor endpoint link training enable PCI: rockchip-ep: Refactor rockchip_pcie_ep_probe() MSI-X hiding PCI: rockchip-ep: Refactor rockchip_pcie_ep_probe() memory allocations PCI: rockchip-ep: Rename rockchip_pcie_parse_ep_dt() PCI: rockchip-ep: Fix MSI IRQ data mapping PCI: rockchip-ep: Implement the pci_epc_ops::align_addr() operation PCI: rockchip-ep: Improve rockchip_pcie_ep_map_addr() PCI: rockchip-ep: Improve rockchip_pcie_ep_unmap_addr() PCI: rockchip-ep: Use a macro to define EP controller .align feature PCI: rockchip-ep: Fix address translation unit programming PCI/pwrctrl: Rename pwrctrl functions and structures PCI/pwrctrl: Rename pwrctl files to pwrctrl PCI/pwrctl: Remove pwrctl device without iterating over all children of pwrctl parent PCI/pwrctl: Ensure that pwrctl drivers are probed before PCI client drivers PCI/pwrctl: Create pwrctl device only if at least one power supply is present PCI/pwrctl: Use of_platform_device_create() to create pwrctl devices tools: PCI: Fix incorrect printf format specifiers ...
2024-11-27Rename .data.once to .data..once to fix resetting WARN*_ONCEMasahiro Yamada
Commit b1fca27d384e ("kernel debug: support resetting WARN*_ONCE") added support for clearing the state of once warnings. However, it is not functional when CONFIG_LD_DEAD_CODE_DATA_ELIMINATION or CONFIG_LTO_CLANG is enabled, because .data.once matches the .data.[0-9a-zA-Z_]* pattern in the DATA_MAIN macro. Commit cb87481ee89d ("kbuild: linker script do not match C names unless LD_DEAD_CODE_DATA_ELIMINATION is configured") was introduced to suppress the issue for the default CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=n case, providing a minimal fix for stable backporting. We were aware this did not address the issue for CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y. The plan was to apply correct fixes and then revert cb87481ee89d. [1] Seven years have passed since then, yet the #ifdef workaround remains in place. Meanwhile, commit b1fca27d384e introduced the .data.once section, and commit dc5723b02e52 ("kbuild: add support for Clang LTO") extended the #ifdef. Using a ".." separator in the section name fixes the issue for CONFIG_LD_DEAD_CODE_DATA_ELIMINATION and CONFIG_LTO_CLANG. [1]: https://lore.kernel.org/linux-kbuild/CAK7LNASck6BfdLnESxXUeECYL26yUDm0cwRZuM4gmaWUkxjL5g@mail.gmail.com/ Fixes: b1fca27d384e ("kernel debug: support resetting WARN*_ONCE") Fixes: dc5723b02e52 ("kbuild: add support for Clang LTO") Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-11-27Rename .data.unlikely to .data..unlikelyMasahiro Yamada
Commit 7ccaba5314ca ("consolidate WARN_...ONCE() static variables") was intended to collect all .data.unlikely sections into one chunk. However, this has not worked when CONFIG_LD_DEAD_CODE_DATA_ELIMINATION or CONFIG_LTO_CLANG is enabled, because .data.unlikely matches the .data.[0-9a-zA-Z_]* pattern in the DATA_MAIN macro. Commit cb87481ee89d ("kbuild: linker script do not match C names unless LD_DEAD_CODE_DATA_ELIMINATION is configured") was introduced to suppress the issue for the default CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=n case, providing a minimal fix for stable backporting. We were aware this did not address the issue for CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y. The plan was to apply correct fixes and then revert cb87481ee89d. [1] Seven years have passed since then, yet the #ifdef workaround remains in place. Using a ".." separator in the section name fixes the issue for CONFIG_LD_DEAD_CODE_DATA_ELIMINATION and CONFIG_LTO_CLANG. [1]: https://lore.kernel.org/linux-kbuild/CAK7LNASck6BfdLnESxXUeECYL26yUDm0cwRZuM4gmaWUkxjL5g@mail.gmail.com/ Fixes: cb87481ee89d ("kbuild: linker script do not match C names unless LD_DEAD_CODE_DATA_ELIMINATION is configured") Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-11-27kbuild: Add Propeller configuration for kernel buildRong Xu
Add the build support for using Clang's Propeller optimizer. Like AutoFDO, Propeller uses hardware sampling to gather information about the frequency of execution of different code paths within a binary. This information is then used to guide the compiler's optimization decisions, resulting in a more efficient binary. The support requires a Clang compiler LLVM 19 or later, and the create_llvm_prof tool (https://github.com/google/autofdo/releases/tag/v0.30.1). This commit is limited to x86 platforms that support PMU features like LBR on Intel machines and AMD Zen3 BRS. Here is an example workflow for building an AutoFDO+Propeller optimized kernel: 1) Build the kernel on the host machine, with AutoFDO and Propeller build config CONFIG_AUTOFDO_CLANG=y CONFIG_PROPELLER_CLANG=y then $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo_profile> “<autofdo_profile>” is the profile collected when doing a non-Propeller AutoFDO build. This step builds a kernel that has the same optimization level as AutoFDO, plus a metadata section that records basic block information. This kernel image runs as fast as an AutoFDO optimized kernel. 2) Install the kernel on test/production machines. 3) Run the load tests. The '-c' option in perf specifies the sample event period. We suggest using a suitable prime number, like 500009, for this purpose. For Intel platforms: $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> \ -o <perf_file> -- <loadtest> For AMD platforms: The supported system are: Zen3 with BRS, or Zen4 with amd_lbr_v2 # To see if Zen3 support LBR: $ cat proc/cpuinfo | grep " brs" # To see if Zen4 support LBR: $ cat proc/cpuinfo | grep amd_lbr_v2 # If the result is yes, then collect the profile using: $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a \ -N -b -c <count> -o <perf_file> -- <loadtest> 4) (Optional) Download the raw perf file to the host machine. 5) Generate Propeller profile: $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> \ --format=propeller --propeller_output_module_name \ --out=<propeller_profile_prefix>_cc_profile.txt \ --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt “create_llvm_prof” is the profile conversion tool, and a prebuilt binary for linux can be found on https://github.com/google/autofdo/releases/tag/v0.30.1 (can also build from source). "<propeller_profile_prefix>" can be something like "/home/user/dir/any_string". This command generates a pair of Propeller profiles: "<propeller_profile_prefix>_cc_profile.txt" and "<propeller_profile_prefix>_ld_profile.txt". 6) Rebuild the kernel using the AutoFDO and Propeller profile files. CONFIG_AUTOFDO_CLANG=y CONFIG_PROPELLER_CLANG=y and $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo_profile> \ CLANG_PROPELLER_PROFILE_PREFIX=<propeller_profile_prefix> Co-developed-by: Han Shen <shenhan@google.com> Signed-off-by: Han Shen <shenhan@google.com> Signed-off-by: Rong Xu <xur@google.com> Suggested-by: Sriraman Tallam <tmsriram@google.com> Suggested-by: Krzysztof Pszeniczny <kpszeniczny@google.com> Suggested-by: Nick Desaulniers <ndesaulniers@google.com> Suggested-by: Stephane Eranian <eranian@google.com> Tested-by: Yonghong Song <yonghong.song@linux.dev> Tested-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-11-27AutoFDO: Enable machine function split optimization for AutoFDORong Xu
Enable the machine function split optimization for AutoFDO in Clang. Machine function split (MFS) is a pass in the Clang compiler that splits a function into hot and cold parts. The linker groups all cold blocks across functions together. This decreases hot code fragmentation and improves iCache and iTLB utilization. MFS requires a profile so this is enabled only for the AutoFDO builds. Co-developed-by: Han Shen <shenhan@google.com> Signed-off-by: Han Shen <shenhan@google.com> Signed-off-by: Rong Xu <xur@google.com> Suggested-by: Sriraman Tallam <tmsriram@google.com> Suggested-by: Krzysztof Pszeniczny <kpszeniczny@google.com> Tested-by: Yonghong Song <yonghong.song@linux.dev> Tested-by: Yabin Cui <yabinc@google.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-11-27AutoFDO: Enable -ffunction-sections for the AutoFDO buildRong Xu
Enable -ffunction-sections by default for the AutoFDO build. With -ffunction-sections, the compiler places each function in its own section named .text.function_name instead of placing all functions in the .text section. In the AutoFDO build, this allows the linker to utilize profile information to reorganize functions for improved utilization of iCache and iTLB. Co-developed-by: Han Shen <shenhan@google.com> Signed-off-by: Han Shen <shenhan@google.com> Signed-off-by: Rong Xu <xur@google.com> Suggested-by: Sriraman Tallam <tmsriram@google.com> Tested-by: Yonghong Song <yonghong.song@linux.dev> Tested-by: Yabin Cui <yabinc@google.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-11-27vmlinux.lds.h: Add markers for text_unlikely and text_hot sectionsRong Xu
Add markers like __hot_text_start, __hot_text_end, __unlikely_text_start, and __unlikely_text_end which will be included in System.map. These markers indicate how the compiler groups functions, providing valuable information to developers about the layout and optimization of the code. Co-developed-by: Han Shen <shenhan@google.com> Signed-off-by: Han Shen <shenhan@google.com> Signed-off-by: Rong Xu <xur@google.com> Suggested-by: Sriraman Tallam <tmsriram@google.com> Tested-by: Yonghong Song <yonghong.song@linux.dev> Tested-by: Yabin Cui <yabinc@google.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-11-27vmlinux.lds.h: Adjust symbol ordering in text output sectionRong Xu
When the -ffunction-sections compiler option is enabled, each function is placed in a separate section named .text.function_name rather than putting all functions in a single .text section. However, using -function-sections can cause problems with the linker script. The comments included in include/asm-generic/vmlinux.lds.h note these issues.: “TEXT_MAIN here will match .text.fixup and .text.unlikely if dead code elimination is enabled, so these sections should be converted to use ".." first.” It is unclear whether there is a straightforward method for converting a suffix to "..". This patch modifies the order of subsections within the text output section. Specifically, it changes current order: .text.hot, .text, .text_unlikely, .text.unknown, .text.asan to the new order: .text.asan, .text.unknown, .text_unlikely, .text.hot, .text Here is the rationale behind the new layout: The majority of the code resides in three sections: .text.hot, .text, and .text.unlikely, with .text.unknown containing a negligible amount. .text.asan is only generated in ASAN builds. The primary goal is to group code segments based on their execution frequency (hotness). First, we want to place .text.hot adjacent to .text. Since we cannot put .text.hot after .text (Due to constraints with -ffunction-sections, placing .text.hot after .text is problematic), we need to put .text.hot before .text. Then it comes to .text.unlikely, we cannot put it after .text (same -ffunction-sections issue) . Therefore, we position .text.unlikely before .text.hot. .text.unknown and .tex.asan follow the same logic. This revised ordering effectively reverses the original arrangement (for .text.unlikely, .text.unknown, and .tex.asan), maintaining a similar level of affinity between sections. It also places .text.hot section at the beginning of a page to better utilize the TLB entry. Note that the limitation arises because the linker script employs glob patterns instead of regular expressions for string matching. While there is a method to maintain the current order using complex patterns, this significantly complicates the pattern and increases the likelihood of errors. This patch also changes vmlinux.lds.S for the sparc64 architecture to accommodate specific symbol placement requirements. Co-developed-by: Han Shen <shenhan@google.com> Signed-off-by: Han Shen <shenhan@google.com> Signed-off-by: Rong Xu <xur@google.com> Suggested-by: Sriraman Tallam <tmsriram@google.com> Suggested-by: Krzysztof Pszeniczny <kpszeniczny@google.com> Tested-by: Yonghong Song <yonghong.song@linux.dev> Tested-by: Yabin Cui <yabinc@google.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-11-26Merge tag 'vfs-6.13.exportfs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs exportfs updates from Christian Brauner: "This contains work to bring NFS connectable file handles to userspace servers. The name_to_handle_at() system call is extended to encode connectable file handles. Such file handles can be resolved to an open file with a connected path. So far userspace NFS servers couldn't make use of this functionality even though the kernel does already support it. This is achieved by introducing a new flag for name_to_handle_at(). Similarly, the open_by_handle_at() system call is tought to understand connectable file handles explicitly created via name_to_handle_at()" * tag 'vfs-6.13.exportfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: open_by_handle_at() support for decoding "explicit connectable" file handles fs: name_to_handle_at() support for "explicit connectable" file handles fs: prepare for "explicit connectable" file handles
2024-11-26Merge tag 'nfsd-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds
Pull nfsd updates from Chuck Lever: "Jeff Layton contributed a scalability improvement to NFSD's NFSv4 backchannel session implementation. This improvement is intended to increase the rate at which NFSD can safely recall NFSv4 delegations from clients, to avoid the need to revoke them. Revoking requires a slow state recovery process. A wide variety of bug fixes and other incremental improvements make up the bulk of commits in this series. As always I am grateful to the NFSD contributors, reviewers, testers, and bug reporters who participated during this cycle" * tag 'nfsd-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (72 commits) nfsd: allow for up to 32 callback session slots nfs_common: must not hold RCU while calling nfsd_file_put_local nfsd: get rid of include ../internal.h nfsd: fix nfs4_openowner leak when concurrent nfsd4_open occur NFSD: Add nfsd4_copy time-to-live NFSD: Add a laundromat reaper for async copy state NFSD: Block DESTROY_CLIENTID only when there are ongoing async COPY operations NFSD: Handle an NFS4ERR_DELAY response to CB_OFFLOAD NFSD: Free async copy information in nfsd4_cb_offload_release() NFSD: Fix nfsd4_shutdown_copy() NFSD: Add a tracepoint to record canceled async COPY operations nfsd: make nfsd4_session->se_flags a bool nfsd: remove nfsd4_session->se_bchannel nfsd: make use of warning provided by refcount_t nfsd: Don't fail OP_SETCLIENTID when there are too many clients. svcrdma: fix miss destroy percpu_counter in svc_rdma_proc_init() xdrgen: Remove program_stat_to_errno() call sites xdrgen: Update the files included in client-side source code xdrgen: Remove check for "nfs_ok" in C templates xdrgen: Remove tracepoint call site ...
2024-11-26Merge tag 'f2fs-for-6.13-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "This series introduces a device aliasing feature where user can carve out partitions but reclaim the space back by deleting aliased file in root dir. In addition to that, there're numerous minor bug fixes in zoned device support, checkpoint=disable, extent cache management, fiemap, and lazytime mount option. The full list of noticeable changes can be found below. Enhancements: - introduce device aliasing file - add stats in debugfs to show multiple devices - add a sysfs node to limit max read extent count per-inode - modify f2fs_is_checkpoint_ready logic to allow more data to be written with the CP disable - decrease spare area for pinned files for zoned devices Fixes: - Revert "f2fs: remove unreachable lazytime mount option parsing" - adjust unusable cap before checkpoint=disable mode - fix to drop all discards after creating snapshot on lvm device - fix to shrink read extent node in batches - fix changing cursegs if recovery fails on zoned device - fix to adjust appropriate length for fiemap - fix fiemap failure issue when page size is 16KB - fix to avoid forcing direct write to use buffered IO on inline_data inode - fix to map blocks correctly for direct write - fix to account dirty data in __get_secs_required() - fix null-ptr-deref in f2fs_submit_page_bio() - fix inconsistent update of i_blocks in release_compress_blocks and reserve_compress_blocks" * tag 'f2fs-for-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (40 commits) f2fs: fix to drop all discards after creating snapshot on lvm device f2fs: add a sysfs node to limit max read extent count per-inode f2fs: fix to shrink read extent node in batches f2fs: print message if fscorrupted was found in f2fs_new_node_page() f2fs: clear SBI_POR_DOING before initing inmem curseg f2fs: fix changing cursegs if recovery fails on zoned device f2fs: adjust unusable cap before checkpoint=disable mode f2fs: fix to requery extent which cross boundary of inquiry f2fs: fix to adjust appropriate length for fiemap f2fs: clean up w/ F2FS_{BLK_TO_BYTES,BTYES_TO_BLK} f2fs: fix to do cast in F2FS_{BLK_TO_BYTES, BTYES_TO_BLK} to avoid overflow f2fs: replace deprecated strcpy with strscpy Revert "f2fs: remove unreachable lazytime mount option parsing" f2fs: fix to avoid forcing direct write to use buffered IO on inline_data inode f2fs: fix to map blocks correctly for direct write f2fs: fix race in concurrent f2fs_stop_gc_thread f2fs: fix fiemap failure issue when page size is 16KB f2fs: remove redundant atomic file check in defragment f2fs: fix to convert log type to segment data type correctly f2fs: clean up the unused variable additional_reserved_segments ...
2024-11-26Merge tag 'fuse-update-6.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: - Add page -> folio conversions (Joanne Koong, Josef Bacik) - Allow max size of fuse requests to be configurable with a sysctl (Joanne Koong) - Allow FOPEN_DIRECT_IO to take advantage of async code path (yangyun) - Fix large kernel reads (like a module load) in virtio_fs (Hou Tao) - Fix attribute inconsistency in case readdirplus (and plain lookup in corner cases) is racing with inode eviction (Zhang Tianci) - Fix a WARN_ON triggered by virtio_fs (Asahi Lina) * tag 'fuse-update-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (30 commits) virtiofs: dax: remove ->writepages() callback fuse: check attributes staleness on fuse_iget() fuse: remove pages for requests and exclusively use folios fuse: convert direct io to use folios mm/writeback: add folio_mark_dirty_lock() fuse: convert writebacks to use folios fuse: convert retrieves to use folios fuse: convert ioctls to use folios fuse: convert writes (non-writeback) to use folios fuse: convert reads to use folios fuse: convert readdir to use folios fuse: convert readlink to use folios fuse: convert cuse to use folios fuse: add support in virtio for requests using folios fuse: support folios in struct fuse_args_pages and fuse_copy_pages() fuse: convert fuse_notify_store to use folios fuse: convert fuse_retrieve to use folios fuse: use the folio based vmstat helpers fuse: convert fuse_writepage_need_send to take a folio fuse: convert fuse_do_readpage to use folios ...
2024-11-26ALSA: hda/tas2781: Add speaker id check for ASUS projectsBaojun Xu
Add speaker id check by gpio in ACPI for ASUS projects. In other vendors, speaker id was checked by BIOS, and was applied in last bit of subsys id, so we can load corresponding firmware binary file for its speaker by subsys id. But in ASUS project, the firmware binary name will be appended an extra number to tell the speakers from different vendors. And this single digit come from gpio level of speaker id in BIOS. Signed-off-by: Baojun Xu <baojun.xu@ti.com> Link: https://patch.msgid.link/20241123073718.475-1-baojun.xu@ti.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-11-25Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull SCSI updates from James Bottomley: "Updates to the usual drivers (ufs, lpfc, hisi_sas, st). Amazingly enough, no core changes with the biggest set of driver changes being ufs (which conflicted with it's own fixes a bit, hence the merges) and the rest being minor fixes and updates" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (97 commits) scsi: st: New session only when Unit Attention for new tape scsi: st: Add MTIOCGET and MTLOAD to ioctls allowed after device reset scsi: st: Don't modify unknown block number in MTIOCGET scsi: ufs: core: Restore SM8650 support scsi: sun3: Mark driver struct with __refdata to prevent section mismatch scsi: sg: Enable runtime power management scsi: qedi: Fix a possible memory leak in qedi_alloc_and_init_sb() scsi: qedf: Fix a possible memory leak in qedf_alloc_and_init_sb() scsi: fusion: Remove unused variable 'rc' scsi: bfa: Fix use-after-free in bfad_im_module_exit() scsi: esas2r: Remove unused esas2r_build_cli_req() scsi: target: Fix incorrect function name in pscsi_create_type_disk() scsi: ufs: Replace deprecated PCI functions scsi: Switch back to struct platform_driver::remove() scsi: pm8001: Increase request sg length to support 4MiB requests scsi: pm8001: Initialize devices in pm8001_alloc_dev() scsi: pm8001: Use module param to set pcs event log severity scsi: ufs: ufs-mediatek: Configure individual LU queue flags scsi: MAINTAINERS: Update UFS Exynos entry scsi: lpfc: Copyright updates for 14.4.0.6 patches ...
2024-11-25iommu: remove stale declaration left over by a merge conflictLinus Torvalds
The merge commit ae3325f752ef ("Merge branches 'arm/smmu', 'mediatek', 's390', 'ti/omap', 'riscv' and 'core' into next") left a stale declaration of 'iommu_present()' even though the 'core' branch that was merged had removed the function (and the declaration). Remove it for real. Reported-by: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Joerg Roedel <jroedel@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-11-25Merge tag 'libnvdimm-for-6.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull nvdimm and DAX updates from Ira Weiny: "Most represent minor cleanups and code removals. One patch fixes potential NULL pointer arithmetic which was benign because the offset of the member was 0. Nevertheless it should be cleaned up. - typo fixes - clarify logic to remove potential NULL pointer math - remove dead code" * tag 'libnvdimm-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: dax: Remove an unused field in struct dax_operations dax: delete a stale directory pmem nvdimm: rectify the illogical code within nd_dax_probe() nvdimm: Correct some typos in comments
2024-11-25Merge tag 'mailbox-v6.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox Pull mailbox updates from Jassi Brar: "Common: - switch back from remove_new() to remove() callback imx: - fix format specifier zynqmp: - setup IPI for each child node thead: - Add th1520 driver and bindings qcom: - add SM8750 and SAR2130p compatibles - fix expected clocks for callbacks - use IRQF_NO_SUSPEND for cpucp mtk-cmdq: - switch to __pm_runtime_put_autosuspend() - fix alloc size of clocks mpfs: - fix reg properties ti-msgmgr: - don't use of_match_ptr helper - enable COMPILE_TEST build pcc: - consider the PCC_ACK_FLAG arm_mhuv2: - fix non-fatal improper reuse of variable" * tag 'mailbox-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox: mailbox: pcc: Check before sending MCTP PCC response ACK mailbox: Switch back to struct platform_driver::remove() mailbox: imx: Modify the incorrect format specifier mailbox: arm_mhuv2: clean up loop in get_irq_chan_comb() mailbox: zynqmp: setup IPI for each valid child node dt-bindings: mailbox: Add thead,th1520-mailbox bindings mailbox: Introduce support for T-head TH1520 Mailbox driver mailbox: mtk-cmdq: fix wrong use of sizeof in cmdq_get_clocks() dt-bindings: mailbox: qcom-ipcc: Add SM8750 dt-bindings: mailbox: qcom,apcs-kpss-global: correct expected clocks for fallbacks dt-bindings: mailbox: qcom-ipcc: Add SAR2130P compatible mailbox: ti-msgmgr: Allow building under COMPILE_TEST mailbox: ti-msgmgr: Remove use of of_match_ptr() helper mailbox: qcom-cpucp: Mark the irq with IRQF_NO_SUSPEND flag mailbox: mtk-cmdq-mailbox: Switch to __pm_runtime_put_autosuspend() mailbox: mpfs: support new, syscon based, devicetree configuration dt-bindings: mailbox: mpfs: fix reg properties
2024-11-25Merge tag 'pinctrl-v6.13-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control updates from Linus Walleij: "No core changes this time. New drivers: - Xlinix Versal pin control driver - Ocelot LAN969x pin control driver - T-Head TH1520 RISC-V SoC pin control driver - Qualcomm SM8750, IPQ5424, QCS8300, SAR2130P and QCS615 SoC pin control drivers - Qualcomm SM8750 LPASS (low power audio subsystem) pin control driver - Qualcomm PM8937 mixsig IC pin control support, GPIO and MPP (multi-purpose-pin) - Samsung Exynos8895 and Exynos9810 SoC pin control driver - SpacemiT K1 SoC pin control driver - Airhoa EN7581 IC pin control driver Improvements: - The Renesas subdriver now supports schmitt-trigger and open drain pin configurations if the hardware supports it - Support GPIOF and GPIOG banks in the Aspeed G6 SoC - Support the DSW community in the Intel Elkhartlake SoC" * tag 'pinctrl-v6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (105 commits) pinctrl: airoha: Use unsigned long for bit search pinctrl: k210: Undef K210_PC_DEFAULT pinctrl: qcom: spmi: fix debugfs drive strength pinctrl: qcom: Add sm8750 pinctrl driver dt-bindings: pinctrl: qcom: Add sm8750 pinctrl pinctrl: cy8c95x0: remove unneeded goto labels pinctrl: cy8c95x0: embed iterator to the for-loop pinctrl: cy8c95x0: Use temporary variable for struct device pinctrl: cy8c95x0: use flexible sleeping in reset function pinctrl: cy8c95x0: switch to using devm_regulator_get_enable() pinctrl: cy8c95x0: Use 2-argument strscpy() dt-bindings: pinctrl: sx150xq: allow gpio line naming pinctrl: single: add marvell,pxa1908-padconf compatible dt-bindings: pinctrl: pinctrl-single: add marvell,pxa1908-padconf compatible dt-bindings: pinctrl: correct typo of description for cv1800 pinctrl: qcom: spmi-mpp: Add PM8937 compatible dt-bindings: pinctrl: qcom,pmic-mpp: Document PM8937 compatible pinctrl: qcom-pmic-gpio: add support for PM8937 dt-bindings: pinctrl: qcom,pmic-gpio: add PM8937 pinctrl: Use of_property_present() for non-boolean properties ...
2024-11-25tracing: Use guard() rather than scoped_guard()Mathieu Desnoyers
Using scoped_guard() in the implementation of trace_##name() adds an unnecessary level of indentation. Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Michael Jeanson <mjeanson@efficios.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Yonghong Song <yhs@fb.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> Cc: bpf@vger.kernel.org Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/20241125142514.2897143-1-mathieu.desnoyers@efficios.com Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-11-25Merge tag 'slab-for-6.13-v2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab updates from Vlastimil Babka: - Add new slab_strict_numa boot parameter to enforce per-object memory policies on top of slab folio policies, for systems where saving cost of remote accesses is more important than minimizing slab allocation overhead (Christoph Lameter) - Fix for freeptr_offset alignment check being too strict for m68k (Geert Uytterhoeven) - krealloc() fixes for not violating __GFP_ZERO guarantees on krealloc() when slub_debug (redzone and object tracking) is enabled (Feng Tang) - Fix a memory leak in case sysfs registration fails for a slab cache, and also no longer fail to create the cache in that case (Hyeonggon Yoo) - Fix handling of detected consistency problems (due to buggy slab user) with slub_debug enabled, so that it does not cause further list corruption bugs (yuan.gao) - Code cleanup and kerneldocs polishing (Zhen Lei, Vlastimil Babka) * tag 'slab-for-6.13-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: slab: Fix too strict alignment check in create_cache() mm/slab: Allow cache creation to proceed even if sysfs registration fails mm/slub: Avoid list corruption when removing a slab from the full list mm/slub, kunit: Add testcase for krealloc redzone and zeroing mm/slub: Improve redzone check and zeroing for krealloc() mm/slub: Consider kfence case for get_orig_size() SLUB: Add support for per object memory policies mm, slab: add kerneldocs for common SLAB_ flags mm/slab: remove duplicate check in create_cache() mm/slub: Move krealloc() and related code to slub.c mm/kasan: Don't store metadata inside kmalloc object when slub_debug_orig_size is on
2024-11-25Merge tag 'mm-nonmm-stable-2024-11-24-02-05' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - The series "resource: A couple of cleanups" from Andy Shevchenko performs some cleanups in the resource management code - The series "Improve the copy of task comm" from Yafang Shao addresses possible race-induced overflows in the management of task_struct.comm[] - The series "Remove unnecessary header includes from {tools/}lib/list_sort.c" from Kuan-Wei Chiu adds some cleanups and a small fix to the list_sort library code and to its selftest - The series "Enhance min heap API with non-inline functions and optimizations" also from Kuan-Wei Chiu optimizes and cleans up the min_heap library code - The series "nilfs2: Finish folio conversion" from Ryusuke Konishi finishes off nilfs2's folioification - The series "add detect count for hung tasks" from Lance Yang adds more userspace visibility into the hung-task detector's activity - Apart from that, singelton patches in many places - please see the individual changelogs for details * tag 'mm-nonmm-stable-2024-11-24-02-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (71 commits) gdb: lx-symbols: do not error out on monolithic build kernel/reboot: replace sprintf() with sysfs_emit() lib: util_macros_kunit: add kunit test for util_macros.h util_macros.h: fix/rework find_closest() macros Improve consistency of '#error' directive messages ocfs2: fix uninitialized value in ocfs2_file_read_iter() hung_task: add docs for hung_task_detect_count hung_task: add detect count for hung tasks dma-buf: use atomic64_inc_return() in dma_buf_getfile() fs/proc/kcore.c: fix coccinelle reported ERROR instances resource: avoid unnecessary resource tree walking in __region_intersects() ocfs2: remove unused errmsg function and table ocfs2: cluster: fix a typo lib/scatterlist: use sg_phys() helper checkpatch: always parse orig_commit in fixes tag nilfs2: convert metadata aops from writepage to writepages nilfs2: convert nilfs_recovery_copy_block() to take a folio nilfs2: convert nilfs_page_count_clean_buffers() to take a folio nilfs2: remove nilfs_writepage nilfs2: convert checkpoint file to be folio-based ...
2024-11-25Merge tag 'trace-rust-v6.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull rust trace event support from Steven Rostedt: "Allow Rust code to have trace events Trace events is a popular way to debug what is happening inside the kernel or just to find out what is happening. Rust code is being added to the Linux kernel but it currently does not support the tracing infrastructure. Add support of trace events inside Rust code" * tag 'trace-rust-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: rust: jump_label: skip formatting generated file jump_label: rust: pass a mut ptr to `static_key_count` samples: rust: fix `rust_print` build making it a combined module rust: add arch_static_branch jump_label: adjust inline asm to be consistent rust: samples: add tracepoint to Rust sample rust: add tracepoint support rust: add static_branch_unlikely for static_key_false
2024-11-25Merge tag 'hardening-v6.13-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening updates from Kees Cook: - Disable __counted_by in Clang < 19.1.3 (Jan Hendrik Farr) - string_helpers: Silence output truncation warning (Bartosz Golaszewski) - compiler.h: Avoid needing BUILD_BUG_ON_ZERO() (Philipp Reisner) - MAINTAINERS: Add kernel hardening keywords __counted_by{_le|_be} (Thorsten Blum) * tag 'hardening-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: Compiler Attributes: disable __counted_by for clang < 19.1.3 compiler.h: Fix undefined BUILD_BUG_ON_ZERO() lib: string_helpers: silence snprintf() output truncation warning MAINTAINERS: Add kernel hardening keywords __counted_by{_le|_be}
2024-11-25Merge branch 'pci/endpoint'Bjorn Helgaas
- Add pci_epc_function_is_valid() to avoid repeating common validation checks (Damien Le Moal) - Skip attempts to allocate from endpoint controller memory window if the requested size is larger than the window (Damien Le Moal) - Add and document pci_epc_mem_map() and pci_epc_mem_unmap() to handle controller-specific size and alignment constraints, and add test cases to the endpoint test driver (Damien Le Moal) - Implement dwc pci_epc_ops.align_addr() so pci_epc_mem_map() can observe DWC-specific alignment requirements (Damien Le Moal) - Synchronously cancel command handler work in endpoint test before cleaning up DMA and BARs (Damien Le Moal) - Respect endpoint page size in dw_pcie_ep_align_addr() (Niklas Cassel) - Use dw_pcie_ep_align_addr() in dw_pcie_ep_raise_msi_irq() and dw_pcie_ep_raise_msix_irq() instead of open coding the equivalent (Niklas Cassel) - Remove superfluous 'return' from pci_epf_test_clean_dma_chan() (Wang Jiang) - Avoid NULL dereference if Modem Host Interface Endpoint lacks 'mmio' DT property (Zhongqiu Han) - Release PCI domain ID of Endpoint controller parent (not controller itself) and before unregistering the controller, to avoid use-after-free (Zijun Hu) - Clear secondary (not primary) EPC in pci_epc_remove_epf() when removing the secondary controller associated with an NTB (Zijun Hu) - Fix pci_epc_map map_size kerneldoc (Rick Wertenbroek) * pci/endpoint: PCI: endpoint: Fix pci_epc_map map_size kerneldoc string PCI: endpoint: Clear secondary (not primary) EPC in pci_epc_remove_epf() PCI: endpoint: Fix PCI domain ID release in pci_epc_destroy() PCI: endpoint: epf-mhi: Avoid NULL dereference if DT lacks 'mmio' PCI: endpoint: Remove surplus return statement from pci_epf_test_clean_dma_chan() PCI: dwc: ep: Use align addr function for dw_pcie_ep_raise_{msi,msix}_irq() PCI: endpoint: test: Synchronously cancel command handler work PCI: dwc: endpoint: Implement the pci_epc_ops::align_addr() operation PCI: endpoint: test: Use pci_epc_mem_map/unmap() PCI: endpoint: Update documentation PCI: endpoint: Introduce pci_epc_mem_map()/unmap() PCI: endpoint: Improve pci_epc_mem_alloc_addr() PCI: endpoint: Introduce pci_epc_function_is_valid()
2024-11-25Merge branch 'pci/tph'Bjorn Helgaas
- Add and document TLP Processing Hints (TPH) support so drivers can enable and disable TPH and the kernel can save/restore TPH configuration (Wei Huang) - Add TPH Steering Tag support so drivers can retrieve Steering Tag values associated with specific CPUs via an ACPI _DSM to direct DMA writes closer to their consumers (Wei Huang) * pci/tph: PCI/TPH: Add TPH documentation PCI/TPH: Add Steering Tag support PCI: Add TLP Processing Hints (TPH) support
2024-11-25Merge branch 'pci/thunderbolt'Bjorn Helgaas
- Detect some Thunderbolt chips that are built-in and hence 'trustworthy' by a heuristic since the 'ExternalFacingPort' and 'usb4-host-interface' ACPI properties are not quite enough (Esther Shimanovich) * pci/thunderbolt: PCI: Detect and trust built-in Thunderbolt chips
2024-11-25Merge branch 'pci/resource'Bjorn Helgaas
- Add resource_set_size() to set resource size when start has already been set (Ilpo Järvinen) - Add resource_set_range() helper to set both resource start and size (Ilpo Järvinen) - Use IS_ALIGNED() and resource_size() in quirk_s3_64M() instead of open-coding them (Ilpo Järvinen) - Add ALIGN_DOWN_IF_NONZERO() to avoid code duplication when distributing resources across devices (Ilpo Järvinen) - Improve pdev_sort_resources() warning message to be more specific (Ilpo Järvinen) * pci/resource: PCI: Improve pdev_sort_resources() warning message PCI: Add ALIGN_DOWN_IF_NONZERO() helper PCI: Use align and resource helpers, and SZ_* in quirk_s3_64M() PCI: Use resource_set_{range,size}() helpers resource: Add resource set range and size helpers
2024-11-25Merge branch 'pci/pwrctl'Bjorn Helgaas
- Use of_platform_device_create() instead of of_platform_populate() to create pwrctl platform devices so we can control it based on the child nodes (Manivannan Sadhasivam) - Create pwrctrl platform devices only if there's a relevant power supply property (Manivannan Sadhasivam) - Add device link from the pwrctl supplier to the PCI dev to ensure pwrctl drivers are probed before the PCI dev driver; this avoids a race where pwrctl could change device power state while the PCI driver was active (Manivannan Sadhasivam) - Find pwrctl device for removal with of_find_device_by_node() instead of searching all children of the parent (Manivannan Sadhasivam) - Rename 'pwrctl' to 'pwrctrl' to use the same 'ctrl' suffix as 'bwctrl' and other PCI files to reduce confusion (Bjorn Helgaas) * pci/pwrctl: PCI/pwrctrl: Rename pwrctrl functions and structures PCI/pwrctrl: Rename pwrctl files to pwrctrl PCI/pwrctl: Remove pwrctl device without iterating over all children of pwrctl parent PCI/pwrctl: Ensure that pwrctl drivers are probed before PCI client drivers PCI/pwrctl: Create pwrctl device only if at least one power supply is present PCI/pwrctl: Use of_platform_device_create() to create pwrctl devices # Conflicts: # drivers/pci/bus.c # drivers/pci/remove.c
2024-11-25Merge branch 'pci/locking'Bjorn Helgaas
- Make pci_stop_dev() and pci_destroy_dev() concurrent safe (Keith Busch) - Move __pci_walk_bus() mutex up into the caller, which avoids the need for a parameter to control locking (Keith Busch) - Simplify __pci_walk_bus() by making it recursive (Keith Busch) - Unexport pci_walk_bus_locked(), which is only used internally by the PCI core (Keith Busch) * pci/locking: PCI: Unexport pci_walk_bus_locked() PCI: Convert __pci_walk_bus() to be recursive PCI: Move __pci_walk_bus() mutex to where we need it PCI: Make pci_destroy_dev() concurrent safe PCI: Make pci_stop_dev() concurrent safe
2024-11-25Merge branch 'pci/enumeration'Bjorn Helgaas
- Simplify pci_read_bridge_bases() logic (Ilpo Järvinen) * pci/enumeration: PCI: Simplify pci_read_bridge_bases() logic PCI: Move struct pci_bus_resource into bus.c PCI: Remove unused PCI_SUBTRACTIVE_DECODE
2024-11-25Merge branch 'pci/devm'Bjorn Helgaas
- Export pcim_request_all_regions(), a managed interface to request all BARs (Philipp Stanner) - Replace pcim_iomap_regions_request_all() with pcim_request_all_regions(), and pcim_iomap_table()[n] with pcim_iomap(n), in the following drivers: ahci, crypto qat, crypto octeontx2, intel_th, iwlwifi, ntb idt, serial rp2, ALSA korg1212 (Philipp Stanner) - Remove the now unused pcim_iomap_regions_request_all() (Philipp Stanner) - Export pcim_iounmap_region(), a managed interface to unmap and release a PCI BAR (Philipp Stanner) - Replace pcim_iomap_regions(mask) with pcim_iomap_region(n), and pcim_iounmap_regions(mask) with pcim_iounmap_region(n), in the following drivers: fpga dfl-pci, block mtip32xx, gpio-merrifield, cavium (Philipp Stanner) * pci/devm: ethernet: cavium: Replace deprecated PCI functions gpio: Replace deprecated PCI functions fpga/dfl-pci.c: Replace deprecated PCI functions PCI: Deprecate pcim_iounmap_regions() PCI: Make pcim_iounmap_region() a public function PCI: Remove pcim_iomap_regions_request_all() ALSA: korg1212: Replace deprecated PCI functions serial: rp2: Replace deprecated PCI functions ntb: idt: Replace deprecated PCI functions wifi: iwlwifi: replace deprecated PCI functions intel_th: pci: Replace deprecated PCI functions crypto: marvell - replace deprecated PCI functions crypto: qat - replace deprecated PCI functions ata: ahci: Replace deprecated PCI functions PCI: Make pcim_request_all_regions() a public function
2024-11-25Merge tag 'input-for-v6.13-rc0' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input updates from Dmitry Torokhov: - support for NT36672A touchscreen added to novatek-nvt-ts driver - a change to ads7846 driver to prevent XPT2046 from locking up - a change switching platform input dirves back to using remove() method (from remove_new()) - updates to a number of input drivers to use the new cleanup facilities (__free(...), guard(), and scoped-guard()) which ensure that the resources and locks are released properly and automatically - other assorted driver cleanups and fixes. * tag 'input-for-v6.13-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (109 commits) Input: mpr121 - use devm_regulator_get_enable_read_voltage() Input: sun4i-lradc-keys - don't include 'pm_wakeup.h' directly Input: spear-keyboard - don't include 'pm_wakeup.h' directly Input: cypress-sf - constify struct i2c_device_id Input: ads7846 - increase xfer array size in 'struct ser_req' Input: fix the input_event struct documentation Input: i8042 - fix typo dublicate to duplicate Input: ads7846 - add dummy command register clearing cycle Input: cs40l50 - fix wrong usage of INIT_WORK() Input: introduce notion of passive observers for input handlers Input: maple_keyb - use guard notation when acquiring mutex Input: locomokbd - use guard notation when acquiring spinlock Input: hilkbd - use guard notation when acquiring spinlock Input: synaptics-rmi4 - switch to using cleanup functions in F34 Input: synaptics - fix a typo dt-bindings: input: rotary-encoder: Fix "rotary-encoder,rollover" type Input: omap-keypad - use guard notation when acquiring mutex Input: imagis - fix warning regarding 'imagis_3038_data' being unused Input: userio - remove unneeded semicolon Input: sparcspkr - use cleanup facility for device_node ...
2024-11-24Merge branch 'next' into for-linusDmitry Torokhov
Prepare input updates for 6.13 merge window.
2024-11-24mailbox: pcc: Check before sending MCTP PCC response ACKAdam Young
Type 4 PCC channels have an option to send back a response to the platform when they are done processing the request. The flag to indicate whether or not to respond is inside the message body, and thus is not available to the pcc mailbox. If the flag is not set, still set command completion bit after processing message. In order to read the flag, this patch maps the shared buffer to virtual memory. To avoid duplication of mapping the shared buffer is then made available to be used by the driver that uses the mailbox. Signed-off-by: Adam Young <admiyo@os.amperecomputing.com> Cc: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
2024-11-23Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "The biggest change here is eliminating the awful idea that KVM had of essentially guessing which pfns are refcounted pages. The reason to do so was that KVM needs to map both non-refcounted pages (for example BARs of VFIO devices) and VM_PFNMAP/VM_MIXMEDMAP VMAs that contain refcounted pages. However, the result was security issues in the past, and more recently the inability to map VM_IO and VM_PFNMAP memory that _is_ backed by struct page but is not refcounted. In particular this broke virtio-gpu blob resources (which directly map host graphics buffers into the guest as "vram" for the virtio-gpu device) with the amdgpu driver, because amdgpu allocates non-compound higher order pages and the tail pages could not be mapped into KVM. This requires adjusting all uses of struct page in the per-architecture code, to always work on the pfn whenever possible. The large series that did this, from David Stevens and Sean Christopherson, also cleaned up substantially the set of functions that provided arch code with the pfn for a host virtual addresses. The previous maze of twisty little passages, all different, is replaced by five functions (__gfn_to_page, __kvm_faultin_pfn, the non-__ versions of these two, and kvm_prefetch_pages) saving almost 200 lines of code. ARM: - Support for stage-1 permission indirection (FEAT_S1PIE) and permission overlays (FEAT_S1POE), including nested virt + the emulated page table walker - Introduce PSCI SYSTEM_OFF2 support to KVM + client driver. This call was introduced in PSCIv1.3 as a mechanism to request hibernation, similar to the S4 state in ACPI - Explicitly trap + hide FEAT_MPAM (QoS controls) from KVM guests. As part of it, introduce trivial initialization of the host's MPAM context so KVM can use the corresponding traps - PMU support under nested virtualization, honoring the guest hypervisor's trap configuration and event filtering when running a nested guest - Fixes to vgic ITS serialization where stale device/interrupt table entries are not zeroed when the mapping is invalidated by the VM - Avoid emulated MMIO completion if userspace has requested synchronous external abort injection - Various fixes and cleanups affecting pKVM, vCPU initialization, and selftests LoongArch: - Add iocsr and mmio bus simulation in kernel. - Add in-kernel interrupt controller emulation. - Add support for virtualization extensions to the eiointc irqchip. PPC: - Drop lingering and utterly obsolete references to PPC970 KVM, which was removed 10 years ago. - Fix incorrect documentation references to non-existing ioctls RISC-V: - Accelerate KVM RISC-V when running as a guest - Perf support to collect KVM guest statistics from host side s390: - New selftests: more ucontrol selftests and CPU model sanity checks - Support for the gen17 CPU model - List registers supported by KVM_GET/SET_ONE_REG in the documentation x86: - Cleanup KVM's handling of Accessed and Dirty bits to dedup code, improve documentation, harden against unexpected changes. Even if the hardware A/D tracking is disabled, it is possible to use the hardware-defined A/D bits to track if a PFN is Accessed and/or Dirty, and that removes a lot of special cases. - Elide TLB flushes when aging secondary PTEs, as has been done in x86's primary MMU for over 10 years. - Recover huge pages in-place in the TDP MMU when dirty page logging is toggled off, instead of zapping them and waiting until the page is re-accessed to create a huge mapping. This reduces vCPU jitter. - Batch TLB flushes when dirty page logging is toggled off. This reduces the time it takes to disable dirty logging by ~3x. - Remove the shrinker that was (poorly) attempting to reclaim shadow page tables in low-memory situations. - Clean up and optimize KVM's handling of writes to MSR_IA32_APICBASE. - Advertise CPUIDs for new instructions in Clearwater Forest - Quirk KVM's misguided behavior of initialized certain feature MSRs to their maximum supported feature set, which can result in KVM creating invalid vCPU state. E.g. initializing PERF_CAPABILITIES to a non-zero value results in the vCPU having invalid state if userspace hides PDCM from the guest, which in turn can lead to save/restore failures. - Fix KVM's handling of non-canonical checks for vCPUs that support LA57 to better follow the "architecture", in quotes because the actual behavior is poorly documented. E.g. most MSR writes and descriptor table loads ignore CR4.LA57 and operate purely on whether the CPU supports LA57. - Bypass the register cache when querying CPL from kvm_sched_out(), as filling the cache from IRQ context is generally unsafe; harden the cache accessors to try to prevent similar issues from occuring in the future. The issue that triggered this change was already fixed in 6.12, but was still kinda latent. - Advertise AMD_IBPB_RET to userspace, and fix a related bug where KVM over-advertises SPEC_CTRL when trying to support cross-vendor VMs. - Minor cleanups - Switch hugepage recovery thread to use vhost_task. These kthreads can consume significant amounts of CPU time on behalf of a VM or in response to how the VM behaves (for example how it accesses its memory); therefore KVM tried to place the thread in the VM's cgroups and charge the CPU time consumed by that work to the VM's container. However the kthreads did not process SIGSTOP/SIGCONT, and therefore cgroups which had KVM instances inside could not complete freezing. Fix this by replacing the kthread with a PF_USER_WORKER thread, via the vhost_task abstraction. Another 100+ lines removed, with generally better behavior too like having these threads properly parented in the process tree. - Revert a workaround for an old CPU erratum (Nehalem/Westmere) that didn't really work; there was really nothing to work around anyway: the broken patch was meant to fix nested virtualization, but the PERF_GLOBAL_CTRL MSR is virtualized and therefore unaffected by the erratum. - Fix 6.12 regression where CONFIG_KVM will be built as a module even if asked to be builtin, as long as neither KVM_INTEL nor KVM_AMD is 'y'. x86 selftests: - x86 selftests can now use AVX. Documentation: - Use rST internal links - Reorganize the introduction to the API document Generic: - Protect vcpu->pid accesses outside of vcpu->mutex with a rwlock instead of RCU, so that running a vCPU on a different task doesn't encounter long due to having to wait for all CPUs become quiescent. In general both reads and writes are rare, but userspace that supports confidential computing is introducing the use of "helper" vCPUs that may jump from one host processor to another. Those will be very happy to trigger a synchronize_rcu(), and the effect on performance is quite the disaster" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (298 commits) KVM: x86: Break CONFIG_KVM_X86's direct dependency on KVM_INTEL || KVM_AMD KVM: x86: add back X86_LOCAL_APIC dependency Revert "KVM: VMX: Move LOAD_IA32_PERF_GLOBAL_CTRL errata handling out of setup_vmcs_config()" KVM: x86: switch hugepage recovery thread to vhost_task KVM: x86: expose MSR_PLATFORM_INFO as a feature MSR x86: KVM: Advertise CPUIDs for new instructions in Clearwater Forest Documentation: KVM: fix malformed table irqchip/loongson-eiointc: Add virt extension support LoongArch: KVM: Add irqfd support LoongArch: KVM: Add PCHPIC user mode read and write functions LoongArch: KVM: Add PCHPIC read and write functions LoongArch: KVM: Add PCHPIC device support LoongArch: KVM: Add EIOINTC user mode read and write functions LoongArch: KVM: Add EIOINTC read and write functions LoongArch: KVM: Add EIOINTC device support LoongArch: KVM: Add IPI user mode read and write function LoongArch: KVM: Add IPI read and write function LoongArch: KVM: Add IPI device support LoongArch: KVM: Add iocsr and mmio bus simulation in kernel KVM: arm64: Pass on SVE mapping failures ...
2024-11-23tracing: Remove cond argument from __DECLARE_TRACE_SYSCALLMathieu Desnoyers
Syscall tracepoints do not require a "cond" argument, because they are meant to be used only for sys_enter and sys_exit instrumentation, which don't require condition evaluation. Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Michael Jeanson <mjeanson@efficios.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Yonghong Song <yhs@fb.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> Cc: bpf@vger.kernel.org Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Jordan Rife <jrife@google.com> Cc: linux-trace-kernel@vger.kernel.org Link: https://lore.kernel.org/20241123153031.2884933-6-mathieu.desnoyers@efficios.com Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-11-23tracing: Remove conditional locking from __DO_TRACE()Mathieu Desnoyers
Remove conditional locking by moving the __DO_TRACE() code into trace_##name(). When the faultable syscall tracepoints were implemented, __DO_TRACE() had a rcuidle argument which selected between SRCU and preempt disable. Therefore, the RCU tasks trace protection for faultable syscall tracepoints was introduced using the same pattern. At that point, it did not appear obvious that this feedback from Linus [1] applied here as well, because the __DO_TRACE() modification was extending a pre-existing pattern. Shortly before pulling the faultable syscall tracepoints modifications, Steven removed the rcuidle argument and SRCU protection scheme entirely from tracepoint.h: commit 48bcda684823 ("tracing: Remove definition of trace_*_rcuidle()") This required a rebase of the faultable syscall tracepoints series, which missed a perfect opportunity to integrate the prior recommendation from Linus. In response to the pull request, Linus pointed out [2] that he was not pleased by the implementation, expecting this to be fixed in a follow up patch series. Move __DO_TRACE() code into trace_##name() within each of __DECLARE_TRACE() and __DECLARE_TRACE_SYSCALL(). Use a scoped guard to guard the preempt disable notrace and RCU tasks trace critical sections. Link: https://lore.kernel.org/all/CAHk-=wggDLDeTKbhb5hh--x=-DQd69v41137M72m6NOTmbD-cw@mail.gmail.com/ [1] Link: https://lore.kernel.org/lkml/CAHk-=witPrLcu22dZ93VCyRQonS7+-dFYhQbna=KBa-TAhayMw@mail.gmail.com/ [2] Fixes: a363d27cdbc2 ("tracing: Allow system call tracepoints to handle page faults") Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Michael Jeanson <mjeanson@efficios.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Yonghong Song <yhs@fb.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> Cc: bpf@vger.kernel.org Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Jordan Rife <jrife@google.com> Cc: linux-trace-kernel@vger.kernel.org Link: https://lore.kernel.org/20241123153031.2884933-5-mathieu.desnoyers@efficios.com Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-11-23rcupdate_trace: Define rcu_tasks_trace lock guardMathieu Desnoyers
Define a rcu_tasks_trace lock guard for use by the syscall enter/exit tracepoints. Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Michael Jeanson <mjeanson@efficios.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Yonghong Song <yhs@fb.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> Cc: bpf@vger.kernel.org Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Jordan Rife <jrife@google.com> Cc: linux-trace-kernel@vger.kernel.org Link: https://lore.kernel.org/20241123153031.2884933-4-mathieu.desnoyers@efficios.com Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-11-23tracing: Remove __idx variable from __DO_TRACEMathieu Desnoyers
Since the removal of SRCU-protected tracepoints, the __idx variable in __DO_TRACE is unused. Remove this variable. Fixes: 48bcda684823 ("tracing: Remove definition of trace_*_rcuidle()") Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Michael Jeanson <mjeanson@efficios.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Yonghong Song <yhs@fb.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> Cc: bpf@vger.kernel.org Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/20241123153031.2884933-3-mathieu.desnoyers@efficios.com Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-11-23tracing: Move it_func[0] comment to the relevant contextMathieu Desnoyers
When introducing __DO_TRACE_CALL(), the iteration over it_func moved from __DO_TRACE() to __tracepoint_iter_##_name(), but the comment relevant for this iterator was left in its original location. Move the comment to the relevant context. Fixes: d25e37d89dd2 ("tracepoint: Optimize using static_call()") Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Michael Jeanson <mjeanson@efficios.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Yonghong Song <yhs@fb.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> Cc: bpf@vger.kernel.org Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/20241123153031.2884933-2-mathieu.desnoyers@efficios.com Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-11-23Merge tag 'mm-stable-2024-11-18-19-27' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - The series "zram: optimal post-processing target selection" from Sergey Senozhatsky improves zram's post-processing selection algorithm. This leads to improved memory savings. - Wei Yang has gone to town on the mapletree code, contributing several series which clean up the implementation: - "refine mas_mab_cp()" - "Reduce the space to be cleared for maple_big_node" - "maple_tree: simplify mas_push_node()" - "Following cleanup after introduce mas_wr_store_type()" - "refine storing null" - The series "selftests/mm: hugetlb_fault_after_madv improvements" from David Hildenbrand fixes this selftest for s390. - The series "introduce pte_offset_map_{ro|rw}_nolock()" from Qi Zheng implements some rationaizations and cleanups in the page mapping code. - The series "mm: optimize shadow entries removal" from Shakeel Butt optimizes the file truncation code by speeding up the handling of shadow entries. - The series "Remove PageKsm()" from Matthew Wilcox completes the migration of this flag over to being a folio-based flag. - The series "Unify hugetlb into arch_get_unmapped_area functions" from Oscar Salvador implements a bunch of consolidations and cleanups in the hugetlb code. - The series "Do not shatter hugezeropage on wp-fault" from Dev Jain takes away the wp-fault time practice of turning a huge zero page into small pages. Instead we replace the whole thing with a THP. More consistent cleaner and potentiall saves a large number of pagefaults. - The series "percpu: Add a test case and fix for clang" from Andy Shevchenko enhances and fixes the kernel's built in percpu test code. - The series "mm/mremap: Remove extra vma tree walk" from Liam Howlett optimizes mremap() by avoiding doing things which we didn't need to do. - The series "Improve the tmpfs large folio read performance" from Baolin Wang teaches tmpfs to copy data into userspace at the folio size rather than as individual pages. A 20% speedup was observed. - The series "mm/damon/vaddr: Fix issue in damon_va_evenly_split_region()" fro Zheng Yejian fixes DAMON splitting. - The series "memcg-v1: fully deprecate charge moving" from Shakeel Butt removes the long-deprecated memcgv2 charge moving feature. - The series "fix error handling in mmap_region() and refactor" from Lorenzo Stoakes cleanup up some of the mmap() error handling and addresses some potential performance issues. - The series "x86/module: use large ROX pages for text allocations" from Mike Rapoport teaches x86 to use large pages for read-only-execute module text. - The series "page allocation tag compression" from Suren Baghdasaryan is followon maintenance work for the new page allocation profiling feature. - The series "page->index removals in mm" from Matthew Wilcox remove most references to page->index in mm/. A slow march towards shrinking struct page. - The series "damon/{self,kunit}tests: minor fixups for DAMON debugfs interface tests" from Andrew Paniakin performs maintenance work for DAMON's self testing code. - The series "mm: zswap swap-out of large folios" from Kanchana Sridhar improves zswap's batching of compression and decompression. It is a step along the way towards using Intel IAA hardware acceleration for this zswap operation. - The series "kasan: migrate the last module test to kunit" from Sabyrzhan Tasbolatov completes the migration of the KASAN built-in tests over to the KUnit framework. - The series "implement lightweight guard pages" from Lorenzo Stoakes permits userapace to place fault-generating guard pages within a single VMA, rather than requiring that multiple VMAs be created for this. Improved efficiencies for userspace memory allocators are expected. - The series "memcg: tracepoint for flushing stats" from JP Kobryn uses tracepoints to provide increased visibility into memcg stats flushing activity. - The series "zram: IDLE flag handling fixes" from Sergey Senozhatsky fixes a zram buglet which potentially affected performance. - The series "mm: add more kernel parameters to control mTHP" from Maíra Canal enhances our ability to control/configuremultisize THP from the kernel boot command line. - The series "kasan: few improvements on kunit tests" from Sabyrzhan Tasbolatov has a couple of fixups for the KASAN KUnit tests. - The series "mm/list_lru: Split list_lru lock into per-cgroup scope" from Kairui Song optimizes list_lru memory utilization when lockdep is enabled. * tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (215 commits) cma: enforce non-zero pageblock_order during cma_init_reserved_mem() mm/kfence: add a new kunit test test_use_after_free_read_nofault() zram: fix NULL pointer in comp_algorithm_show() memcg/hugetlb: add hugeTLB counters to memcg vmstat: call fold_vm_zone_numa_events() before show per zone NUMA event mm: mmap_lock: check trace_mmap_lock_$type_enabled() instead of regcount zram: ZRAM_DEF_COMP should depend on ZRAM MAINTAINERS/MEMORY MANAGEMENT: add document files for mm Docs/mm/damon: recommend academic papers to read and/or cite mm: define general function pXd_init() kmemleak: iommu/iova: fix transient kmemleak false positive mm/list_lru: simplify the list_lru walk callback function mm/list_lru: split the lock to per-cgroup scope mm/list_lru: simplify reparenting and initial allocation mm/list_lru: code clean up for reparenting mm/list_lru: don't export list_lru_add mm/list_lru: don't pass unnecessary key parameters kasan: add kunit tests for kmalloc_track_caller, kmalloc_node_track_caller kasan: change kasan_atomics kunit test as KUNIT_CASE_SLOW kasan: use EXPORT_SYMBOL_IF_KUNIT to export symbols ...
2024-11-22Merge tag 'ovl-update-6.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs Pull overlayfs updates from Amir Goldstein: - Fix a syzbot reported NULL pointer deref with bfs lower layers - Fix a copy up failure of large file from lower fuse fs - Followup cleanup of backing_file API from Miklos - Introduction and use of revert/override_creds_light() helpers, that were suggested by Christian as a mitigation to cache line bouncing and false sharing of fields in overlayfs creator_cred long lived struct cred copy. - Store up to two backing file references (upper and lower) in an ovl_file container instead of storing a single backing file in file->private_data. This is used to avoid the practice of opening a short lived backing file for the duration of some file operations and to avoid the specialized use of FDPUT_FPUT in such occasions, that was getting in the way of Al's fd_file() conversions. * tag 'ovl-update-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs: ovl: Filter invalid inodes with missing lookup function ovl: convert ovl_real_fdget() callers to ovl_real_file() ovl: convert ovl_real_fdget_path() callers to ovl_real_file_path() ovl: store upper real file in ovl_file struct ovl: allocate a container struct ovl_file for ovl private context ovl: do not open non-data lower file for fsync ovl: Optimize override/revert creds ovl: pass an explicit reference of creators creds to callers ovl: use wrapper ovl_revert_creds() fs/backing-file: Convert to revert/override_creds_light() cred: Add a light version of override/revert_creds() backing-file: clean up the API ovl: properly handle large files in ovl_security_fileattr
2024-11-22Merge tag 'sysctl-6.13-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl Pull sysctl updates from Joel Granados: "sysctl ctl_table constification: - Constifying ctl_table structs prevents the modification of proc_handler function pointers. All ctl_table struct arguments are const qualified in the sysctl API in such a way that the ctl_table arrays being defined elsewhere and passed through sysctl can be constified one-by-one. We kick the constification off by qualifying user_table in kernel/ucount.c and expect all the ctl_tables to be constified in the coming releases. Misc fixes: - Adjust comments in two places to better reflect the code - Remove superfluous dput calls - Remove Luis from sysctl maintainership - Replace comments about holding a lock with calls to lockdep_assert_held" * tag 'sysctl-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: sysctl: Reduce dput(child) calls in proc_sys_fill_cache() sysctl: Reorganize kerneldoc parameter names ucounts: constify sysctl table user_table sysctl: update comments to new registration APIs MAINTAINERS: remove me from sysctl sysctl: Convert locking comments to lockdep assertions const_structs.checkpatch: add ctl_table sysctl: make internal ctl_tables const sysctl: allow registration of const struct ctl_table sysctl: move internal interfaces to const struct ctl_table bpf: Constify ctl_table argument of filter function