Diffstat (limited to 'Documentation')
32 files changed, 972 insertions, 415 deletions
diff --git a/Documentation/ABI/obsolete/sysfs-class-rfkill b/Documentation/ABI/obsolete/sysfs-class-rfkill deleted file mode 100644 index ff60ad9eca4c..000000000000 --- a/Documentation/ABI/obsolete/sysfs-class-rfkill +++ /dev/null @@ -1,29 +0,0 @@ -rfkill - radio frequency (RF) connector kill switch support - -For details to this subsystem look at Documentation/rfkill.txt. - -What: /sys/class/rfkill/rfkill[0-9]+/state -Date: 09-Jul-2007 -KernelVersion v2.6.22 -Contact: linux-wireless@vger.kernel.org -Description: Current state of the transmitter. - This file is deprecated and scheduled to be removed in 2014, - because its not possible to express the 'soft and hard block' - state of the rfkill driver. -Values: A numeric value. - 0: RFKILL_STATE_SOFT_BLOCKED - transmitter is turned off by software - 1: RFKILL_STATE_UNBLOCKED - transmitter is (potentially) active - 2: RFKILL_STATE_HARD_BLOCKED - transmitter is forced off by something outside of - the driver's control. - -What: /sys/class/rfkill/rfkill[0-9]+/claim -Date: 09-Jul-2007 -KernelVersion v2.6.22 -Contact: linux-wireless@vger.kernel.org -Description: This file is deprecated because there no longer is a way to - claim just control over a single rfkill instance. - This file is scheduled to be removed in 2012. -Values: 0: Kernel handles events diff --git a/Documentation/ABI/removed/sysfs-class-rfkill b/Documentation/ABI/removed/sysfs-class-rfkill new file mode 100644 index 000000000000..3ce6231f20b2 --- /dev/null +++ b/Documentation/ABI/removed/sysfs-class-rfkill @@ -0,0 +1,13 @@ +rfkill - radio frequency (RF) connector kill switch support + +For details to this subsystem look at Documentation/rfkill.txt. + +What: /sys/class/rfkill/rfkill[0-9]+/claim +Date: 09-Jul-2007 +KernelVersion v2.6.22 +Contact: linux-wireless@vger.kernel.org +Description: This file was deprecated because there no longer was a way to + claim just control over a single rfkill instance. 
+ This file was scheduled to be removed in 2012, and was removed + in 2016. +Values: 0: Kernel handles events diff --git a/Documentation/ABI/stable/sysfs-class-rfkill b/Documentation/ABI/stable/sysfs-class-rfkill index 097f522c33bb..e1ba4a104753 100644 --- a/Documentation/ABI/stable/sysfs-class-rfkill +++ b/Documentation/ABI/stable/sysfs-class-rfkill @@ -2,9 +2,8 @@ rfkill - radio frequency (RF) connector kill switch support For details to this subsystem look at Documentation/rfkill.txt. -For the deprecated /sys/class/rfkill/*/state and -/sys/class/rfkill/*/claim knobs of this interface look in -Documentation/ABI/obsolete/sysfs-class-rfkill. +For the deprecated /sys/class/rfkill/*/claim knobs of this interface look in +Documentation/ABI/removed/sysfs-class-rfkill. What: /sys/class/rfkill Date: 09-Jul-2007 @@ -42,6 +41,28 @@ Values: A numeric value. 1: true +What: /sys/class/rfkill/rfkill[0-9]+/state +Date: 09-Jul-2007 +KernelVersion v2.6.22 +Contact: linux-wireless@vger.kernel.org +Description: Current state of the transmitter. + This file was scheduled to be removed in 2014, but due to its + large number of users it will be sticking around for a bit + longer. Despite it being marked as stable, the newer "hard" and + "soft" interfaces should be preferred, since it is not possible + to express the 'soft and hard block' state of the rfkill driver + through this interface. There will likely be another attempt to + remove it in the future. +Values: A numeric value. + 0: RFKILL_STATE_SOFT_BLOCKED + transmitter is turned off by software + 1: RFKILL_STATE_UNBLOCKED + transmitter is (potentially) active + 2: RFKILL_STATE_HARD_BLOCKED + transmitter is forced off by something outside of + the driver's control. 
+ + What: /sys/class/rfkill/rfkill[0-9]+/hard Date: 12-March-2010 KernelVersion v2.6.34 diff --git a/Documentation/ABI/testing/sysfs-class-net-batman-adv b/Documentation/ABI/testing/sysfs-class-net-batman-adv index 7f34a95bb963..518f6a1dbc0c 100644 --- a/Documentation/ABI/testing/sysfs-class-net-batman-adv +++ b/Documentation/ABI/testing/sysfs-class-net-batman-adv @@ -1,4 +1,20 @@ +What: /sys/class/net/<iface>/batman-adv/throughput_override +Date: Feb 2014 +Contact: Antonio Quartulli <antonio@meshcoding.com> +description: + Defines the throughput value to be used by B.A.T.M.A.N. V + when estimating the link throughput using this interface. + If the value is set to 0 then batman-adv will try to + estimate the throughput by itself. + +What: /sys/class/net/<iface>/batman-adv/elp_interval +Date: Feb 2014 +Contact: Linus Lüssing <linus.luessing@web.de> +Description: + Defines the interval in milliseconds in which batman + sends its probing packets for link quality measurements. + What: /sys/class/net/<iface>/batman-adv/iface_status Date: May 2010 Contact: Marek Lindner <mareklindner@neomailbox.ch> @@ -12,4 +28,3 @@ Description: The /sys/class/net/<iface>/batman-adv/mesh_iface file displays the batman mesh interface this <iface> currently is associated with. - diff --git a/Documentation/devicetree/bindings/net/arc_emac.txt b/Documentation/devicetree/bindings/net/arc_emac.txt index a1d71eb43b20..c73a0e9c625e 100644 --- a/Documentation/devicetree/bindings/net/arc_emac.txt +++ b/Documentation/devicetree/bindings/net/arc_emac.txt @@ -7,6 +7,13 @@ Required properties: - max-speed: see ethernet.txt file in the same directory. - phy: see ethernet.txt file in the same directory. +Optional properties: +- phy-reset-gpios : Should specify the gpio for phy reset +- phy-reset-duration : Reset duration in milliseconds. Should present + only if property "phy-reset-gpios" is available. Missing the property + will have the duration be 1 millisecond. 
Numbers greater than 1000 are + invalid and 1 millisecond will be used instead. + Clock handling: The clock frequency is needed to calculate and set polling period of EMAC. It must be provided by one of: diff --git a/Documentation/devicetree/bindings/net/can/ifi_canfd.txt b/Documentation/devicetree/bindings/net/can/ifi_canfd.txt new file mode 100644 index 000000000000..20ea5c70ab82 --- /dev/null +++ b/Documentation/devicetree/bindings/net/can/ifi_canfd.txt @@ -0,0 +1,15 @@ +IFI CANFD controller +-------------------- + +Required properties: + - compatible: Should be "ifi,canfd-1.0" + - reg: Should contain CAN controller registers location and length + - interrupts: Should contain IRQ line for the CAN controller + +Example: + + canfd0: canfd@ff220000 { + compatible = "ifi,canfd-1.0"; + reg = <0xff220000 0x00001000>; + interrupts = <0 43 0>; + }; diff --git a/Documentation/devicetree/bindings/net/can/rcar_can.txt b/Documentation/devicetree/bindings/net/can/rcar_can.txt index 002d8440bf66..8d40ab27bc8c 100644 --- a/Documentation/devicetree/bindings/net/can/rcar_can.txt +++ b/Documentation/devicetree/bindings/net/can/rcar_can.txt @@ -6,6 +6,17 @@ Required properties: "renesas,can-r8a7779" if CAN controller is a part of R8A7779 SoC. "renesas,can-r8a7790" if CAN controller is a part of R8A7790 SoC. "renesas,can-r8a7791" if CAN controller is a part of R8A7791 SoC. + "renesas,can-r8a7792" if CAN controller is a part of R8A7792 SoC. + "renesas,can-r8a7793" if CAN controller is a part of R8A7793 SoC. + "renesas,can-r8a7794" if CAN controller is a part of R8A7794 SoC. + "renesas,can-r8a7795" if CAN controller is a part of R8A7795 SoC. + "renesas,rcar-gen1-can" for a generic R-Car Gen1 compatible device. + "renesas,rcar-gen2-can" for a generic R-Car Gen2 compatible device. + "renesas,rcar-gen3-can" for a generic R-Car Gen3 compatible device. 
+ When compatible with the generic version, nodes must list the + SoC-specific version corresponding to the platform first + followed by the generic version. + - reg: physical base address and size of the R-Car CAN register map. - interrupts: interrupt specifier for the sole interrupt. - clocks: phandles and clock specifiers for 3 CAN clock inputs. @@ -13,6 +24,15 @@ Required properties: - pinctrl-0: pin control group to be used for this controller. - pinctrl-names: must be "default". +Required properties for "renesas,can-r8a7795" compatible: +In R8A7795 SoC, "clkp2" can be CANFD clock. This is a div6 clock and can be +used by both CAN and CAN FD controller at the same time. It needs to be scaled +to maximum frequency if any of these controllers use it. This is done using +the below properties. + +- assigned-clocks: phandle of clkp2(CANFD) clock. +- assigned-clock-rates: maximum frequency of this clock. + Optional properties: - renesas,can-clock-select: R-Car CAN Clock Source Select. Valid values are: <0x0> (default) : Peripheral clock (clkp1) @@ -25,7 +45,7 @@ Example SoC common .dtsi file: can0: can@e6e80000 { - compatible = "renesas,can-r8a7791"; + compatible = "renesas,can-r8a7791", "renesas,rcar-gen2-can"; reg = <0 0xe6e80000 0 0x1000>; interrupts = <0 186 IRQ_TYPE_LEVEL_HIGH>; clocks = <&mstp9_clks R8A7791_CLK_RCAN0>, diff --git a/Documentation/devicetree/bindings/net/can/sja1000.txt b/Documentation/devicetree/bindings/net/can/sja1000.txt index b4a6d53fb01a..ac3160eca96a 100644 --- a/Documentation/devicetree/bindings/net/can/sja1000.txt +++ b/Documentation/devicetree/bindings/net/can/sja1000.txt @@ -2,7 +2,7 @@ Memory mapped SJA1000 CAN controller from NXP (formerly Philips) Required properties: -- compatible : should be "nxp,sja1000". +- compatible : should be one of "nxp,sja1000", "technologic,sja1000". - reg : should specify the chip select, address offset and size required to map the registers of the SJA1000. The size is usually 0x80. 
@@ -14,6 +14,7 @@ Optional properties: - reg-io-width : Specify the size (in bytes) of the IO accesses that should be performed on the device. Valid value is 1, 2 or 4. + This property is ignored for technologic version. Default to 1 (8 bits). - nxp,external-clock-frequency : Frequency of the external oscillator diff --git a/Documentation/devicetree/bindings/net/cavium-mdio.txt b/Documentation/devicetree/bindings/net/cavium-mdio.txt index 04cb7491d232..020df08b8a30 100644 --- a/Documentation/devicetree/bindings/net/cavium-mdio.txt +++ b/Documentation/devicetree/bindings/net/cavium-mdio.txt @@ -1,9 +1,12 @@ * System Management Interface (SMI) / MDIO Properties: -- compatible: "cavium,octeon-3860-mdio" +- compatible: One of: - Compatibility with all cn3XXX, cn5XXX and cn6XXX SOCs. + "cavium,octeon-3860-mdio": Compatibility with all cn3XXX, cn5XXX + and cn6XXX SOCs. + + "cavium,thunder-8890-mdio": Compatibility with all cn8XXX SOCs. - reg: The base address of the MDIO bus controller register bank. @@ -25,3 +28,57 @@ Example: reg = <0>; }; }; + + +* System Management Interface (SMI) / MDIO Nexus + + Several mdio buses may be gathered as children of a single PCI + device, this PCI device is the nexus of the buses. + +Properties: + +- compatible: "cavium,thunder-8890-mdio-nexus"; + +- reg: The PCI device and function numbers of the nexus device. + +- #address-cells: Must be <2>. + +- #size-cells: Must be <2>. + +- ranges: As needed for mapping of the MDIO bus device registers. + +- assigned-addresses: As needed for mapping of the MDIO bus device registers. 
+ +Example: + + mdio-nexus@1,3 { + compatible = "cavium,thunder-8890-mdio-nexus"; + #address-cells = <2>; + #size-cells = <2>; + reg = <0x0b00 0 0 0 0>; /* DEVFN = 0x0b (1:3) */ + assigned-addresses = <0x03000000 0x87e0 0x05000000 0x0 0x800000>; + ranges = <0x87e0 0x05000000 0x03000000 0x87e0 0x05000000 0x0 0x800000>; + + mdio0@87e0,05003800 { + compatible = "cavium,thunder-8890-mdio"; + #address-cells = <1>; + #size-cells = <0>; + reg = <0x87e0 0x05003800 0x0 0x30>; + + ethernet-phy@0 { + ... + reg = <0>; + }; + }; + mdio0@87e0,05003880 { + compatible = "cavium,thunder-8890-mdio"; + #address-cells = <1>; + #size-cells = <0>; + reg = <0x87e0 0x05003880 0x0 0x30>; + + ethernet-phy@0 { + ... + reg = <0>; + }; + }; + }; diff --git a/Documentation/devicetree/bindings/net/emac_rockchip.txt b/Documentation/devicetree/bindings/net/emac_rockchip.txt index 8dc1c79fef7f..05bd7dafce17 100644 --- a/Documentation/devicetree/bindings/net/emac_rockchip.txt +++ b/Documentation/devicetree/bindings/net/emac_rockchip.txt @@ -1,8 +1,10 @@ -* ARC EMAC 10/100 Ethernet platform driver for Rockchip Rk3066/RK3188 SoCs +* ARC EMAC 10/100 Ethernet platform driver for Rockchip RK3036/RK3066/RK3188 SoCs Required properties: -- compatible: Should be "rockchip,rk3066-emac" or "rockchip,rk3188-emac" - according to the target SoC. 
+- compatible: should be "rockchip,<name>-emac" + "rockchip,rk3036-emac": found on RK3036 SoCs + "rockchip,rk3066-emac": found on RK3066 SoCs + "rockchip,rk3188-emac": found on RK3188 SoCs - reg: Address and length of the register set for the device - interrupts: Should contain the EMAC interrupts - rockchip,grf: phandle to the syscon grf used to control speed and mode diff --git a/Documentation/devicetree/bindings/net/fsl-fec.txt b/Documentation/devicetree/bindings/net/fsl-fec.txt index a9eb611bee68..b037a9d78d93 100644 --- a/Documentation/devicetree/bindings/net/fsl-fec.txt +++ b/Documentation/devicetree/bindings/net/fsl-fec.txt @@ -12,6 +12,9 @@ Optional properties: only if property "phy-reset-gpios" is available. Missing the property will have the duration be 1 millisecond. Numbers greater than 1000 are invalid and 1 millisecond will be used instead. +- phy-reset-active-high : If present then the reset sequence using the GPIO + specified in the "phy-reset-gpios" property is reversed (H=reset state, + L=operation state). - phy-supply : regulator that powers the Ethernet PHY. - phy-handle : phandle to the PHY device connected to this device. - fixed-link : Assume a fixed link. See fixed-link.txt in the same directory. diff --git a/Documentation/devicetree/bindings/net/macb.txt b/Documentation/devicetree/bindings/net/macb.txt index d2e243b1ec0e..b5a42df4c928 100644 --- a/Documentation/devicetree/bindings/net/macb.txt +++ b/Documentation/devicetree/bindings/net/macb.txt @@ -25,6 +25,8 @@ Required properties: Optional properties for PHY child node: - reset-gpios : Should specify the gpio for phy reset +- magic-packet : If present, indicates that the hardware supports waking + up via magic packet. 
Examples: diff --git a/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt b/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt index d0cb8693963b..73be8970815e 100644 --- a/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt +++ b/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt @@ -18,15 +18,30 @@ Optional properties: "core" for core clock and "bus" for the optional bus clock. +Optional properties (valid only for Armada XP/38x): + +- buffer-manager: a phandle to a buffer manager node. Please refer to + Documentation/devicetree/bindings/net/marvell-neta-bm.txt +- bm,pool-long: ID of a pool, that will accept all packets of a size + higher than 'short' pool's threshold (if set) and up to MTU value. + Obligatory, when the port is supposed to use hardware + buffer management. +- bm,pool-short: ID of a pool, that will be used for accepting + packets of a size lower than given threshold. If not set, the port + will use a single 'long' pool for all packets, as defined above. + Example: -ethernet@d0070000 { +ethernet@70000 { compatible = "marvell,armada-370-neta"; - reg = <0xd0070000 0x2500>; + reg = <0x70000 0x2500>; interrupts = <8>; clocks = <&gate_clk 4>; tx-csum-limit = <9800> status = "okay"; phy = <&phy0>; phy-mode = "rgmii-id"; + buffer-manager = <&bm>; + bm,pool-long = <0>; + bm,pool-short = <1>; }; diff --git a/Documentation/devicetree/bindings/net/marvell-neta-bm.txt b/Documentation/devicetree/bindings/net/marvell-neta-bm.txt new file mode 100644 index 000000000000..c1b1d7c3bde1 --- /dev/null +++ b/Documentation/devicetree/bindings/net/marvell-neta-bm.txt @@ -0,0 +1,49 @@ +* Marvell Armada 380/XP Buffer Manager driver (BM) + +Required properties: + +- compatible: should be "marvell,armada-380-neta-bm". +- reg: address and length of the register set for the device. +- clocks: a pointer to the reference clock for this device. +- internal-mem: a phandle to BM internal SRAM definition. 
+ +Optional properties (port): + +- pool<0 : 3>,capacity: size of external buffer pointers' ring maintained + in DRAM. Can be set for each pool (id 0 : 3) separately. The value has + to be chosen between 128 and 16352 and it also has to be aligned to 32. + Otherwise the driver would adjust a given number or choose default if + not set. +- pool<0 : 3>,pkt-size: maximum size of a packet accepted by a given buffer + pointers' pool (id 0 : 3). It will be taken into consideration only when pool + type is 'short'. For 'long' ones it would be overridden by port's MTU. + If not set a driver will choose a default value. + +In order to see how to hook the BM to a given ethernet port, please +refer to Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt. + +Example: + +- main node: + +bm: bm@c8000 { + compatible = "marvell,armada-380-neta-bm"; + reg = <0xc8000 0xac>; + clocks = <&gateclk 13>; + internal-mem = <&bm_bppi>; + status = "okay"; + pool2,capacity = <4096>; + pool1,pkt-size = <512>; +}; + +- internal SRAM node: + +bm_bppi: bm-bppi { + compatible = "mmio-sram"; + reg = <MBUS_ID(0x0c, 0x04) 0 0x100000>; + ranges = <0 MBUS_ID(0x0c, 0x04) 0 0x100000>; + #address-cells = <1>; + #size-cells = <1>; + clocks = <&gateclk 13>; + status = "okay"; +}; diff --git a/Documentation/devicetree/bindings/net/mediatek-net.txt b/Documentation/devicetree/bindings/net/mediatek-net.txt new file mode 100644 index 000000000000..5ca79290eabf --- /dev/null +++ b/Documentation/devicetree/bindings/net/mediatek-net.txt @@ -0,0 +1,77 @@ +MediaTek Frame Engine Ethernet controller +========================================= + +The frame engine ethernet controller can be found on MediaTek SoCs. These SoCs +have dual GMAC each represented by a child node.. 
+ +* Ethernet controller node + +Required properties: +- compatible: Should be "mediatek,mt7623-eth" +- reg: Address and length of the register set for the device +- interrupts: Should contain the frame engines interrupt +- clocks: the clock used by the core +- clock-names: the names of the clock listed in the clocks property. These are + "ethif", "esw", "gp2", "gp1" +- power-domains: phandle to the power domain that the ethernet is part of +- resets: Should contain a phandle to the ethsys reset signal +- reset-names: Should contain the reset signal name "eth" +- mediatek,ethsys: phandle to the syscon node that handles the port setup +- mediatek,pctl: phandle to the syscon node that handles the ports slew rate + and driver current + +Optional properties: +- interrupt-parent: Should be the phandle for the interrupt controller + that services interrupts for this device + + +* Ethernet MAC node + +Required properties: +- compatible: Should be "mediatek,eth-mac" +- reg: The number of the MAC +- phy-handle: see ethernet.txt file in the same directory. 
+ +Example: + +eth: ethernet@1b100000 { + compatible = "mediatek,mt7623-eth"; + reg = <0 0x1b100000 0 0x20000>; + clocks = <&topckgen CLK_TOP_ETHIF_SEL>, + <&ethsys CLK_ETHSYS_ESW>, + <&ethsys CLK_ETHSYS_GP2>, + <&ethsys CLK_ETHSYS_GP1>; + clock-names = "ethif", "esw", "gp2", "gp1"; + interrupts = <GIC_SPI 200 IRQ_TYPE_LEVEL_LOW>; + power-domains = <&scpsys MT2701_POWER_DOMAIN_ETH>; + resets = <&ethsys MT2701_ETHSYS_ETH_RST>; + reset-names = "eth"; + mediatek,ethsys = <&ethsys>; + mediatek,pctl = <&syscfg_pctl_a>; + #address-cells = <1>; + #size-cells = <0>; + + gmac1: mac@0 { + compatible = "mediatek,eth-mac"; + reg = <0>; + phy-handle = <&phy0>; + }; + + gmac2: mac@1 { + compatible = "mediatek,eth-mac"; + reg = <1>; + phy-handle = <&phy1>; + }; + + mdio-bus { + phy0: ethernet-phy@0 { + reg = <0>; + phy-mode = "rgmii"; + }; + + phy1: ethernet-phy@1 { + reg = <1>; + phy-mode = "rgmii"; + }; + }; +}; diff --git a/Documentation/devicetree/bindings/net/micrel-ks8995.txt b/Documentation/devicetree/bindings/net/micrel-ks8995.txt new file mode 100644 index 000000000000..281bc2498d12 --- /dev/null +++ b/Documentation/devicetree/bindings/net/micrel-ks8995.txt @@ -0,0 +1,20 @@ +Micrel KS8995 SPI controlled Ethernet Switch families + +Required properties (according to spi-bus.txt): +- compatible: either "micrel,ks8995", "micrel,ksz8864" or "micrel,ksz8795" + +Optional properties: +- reset-gpios : phandle of gpio that will be used to reset chip during probe + +Example: + +spi-master { + ... + switch@0 { + compatible = "micrel,ksz8795"; + + reg = <0>; + spi-max-frequency = <50000000>; + reset-gpios = <&gpio0 46 GPIO_ACTIVE_LOW>; + }; +}; diff --git a/Documentation/devicetree/bindings/net/stmmac.txt b/Documentation/devicetree/bindings/net/stmmac.txt index e862a922bd3f..6605d19601c2 100644 --- a/Documentation/devicetree/bindings/net/stmmac.txt +++ b/Documentation/devicetree/bindings/net/stmmac.txt @@ -17,7 +17,25 @@ Required properties: The 1st cell is reset pre-delay in micro seconds. 
The 2nd cell is reset pulse in micro seconds. The 3rd cell is reset post-delay in micro seconds. + +Optional properties: +- resets: Should contain a phandle to the STMMAC reset signal, if any +- reset-names: Should contain the reset signal name "stmmaceth", if a + reset phandle is given +- max-frame-size: See ethernet.txt file in the same directory +- clocks: If present, the first clock should be the GMAC main clock and + the second clock should be peripheral's register interface clock. Further + clocks may be specified in derived bindings. +- clock-names: One name for each entry in the clocks property, the + first one should be "stmmaceth" and the second one should be "pclk". +- clk_ptp_ref: this is the PTP reference clock; in case of the PTP is + available this clock is used for programming the Timestamp Addend Register. + If not passed then the system clock will be used and this is fine on some + platforms. +- tx-fifo-depth: See ethernet.txt file in the same directory +- rx-fifo-depth: See ethernet.txt file in the same directory - snps,pbl Programmable Burst Length +- snps,aal Address-Aligned Beats - snps,fixed-burst Program the DMA to use the fixed burst mode - snps,mixed-burst Program the DMA to use the mixed burst mode - snps,force_thresh_dma_mode Force DMA to use the threshold mode for @@ -29,27 +47,28 @@ Required properties: supported by this device instance - snps,perfect-filter-entries: Number of perfect filter entries supported by this device instance - -Optional properties: -- resets: Should contain a phandle to the STMMAC reset signal, if any -- reset-names: Should contain the reset signal name "stmmaceth", if a - reset phandle is given -- max-frame-size: See ethernet.txt file in the same directory -- clocks: If present, the first clock should be the GMAC main clock - The optional second clock should be peripheral's register interface clock. - The third optional clock should be the ptp reference clock. 
- Further clocks may be specified in derived bindings. -- clock-names: One name for each entry in the clocks property. - The first one should be "stmmaceth". - The optional second one should be "pclk". - The optional third one should be "clk_ptp_ref". -- snps,burst_len: The AXI burst lenth value of the AXI BUS MODE register. -- tx-fifo-depth: See ethernet.txt file in the same directory -- rx-fifo-depth: See ethernet.txt file in the same directory +- AXI BUS Mode parameters: below the list of all the parameters to program the + AXI register inside the DMA module: + - snps,lpi_en: enable Low Power Interface + - snps,xit_frm: unlock on WoL + - snps,wr_osr_lmt: max write outstanding req. limit + - snps,rd_osr_lmt: max read outstanding req. limit + - snps,kbbe: do not cross 1KiB boundary. + - snps,axi_all: align address + - snps,blen: this is a vector of supported burst length. + - snps,fb: fixed-burst + - snps,mb: mixed-burst + - snps,rb: rebuild INCRx Burst - mdio: with compatible = "snps,dwmac-mdio", create and register mdio bus. Examples: + stmmac_axi_setup: stmmac-axi-config { + snps,wr_osr_lmt = <0xf>; + snps,rd_osr_lmt = <0xf>; + snps,blen = <256 128 64 32 0 0 0>; + }; + gmac0: ethernet@e0800000 { compatible = "st,spear600-gmac"; reg = <0xe0800000 0x8000>; @@ -65,6 +84,7 @@ Examples: tx-fifo-depth = <16384>; clocks = <&clock>; clock-names = "stmmaceth"; + snps,axi-config = <&stmmac_axi_setup>; mdio0 { #address-cells = <1>; #size-cells = <0>; diff --git a/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.txt b/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.txt index edefc26c6204..96aae6b4f736 100644 --- a/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.txt +++ b/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.txt @@ -1,17 +1,46 @@ * Qualcomm Atheros ath10k wireless devices -For ath10k devices the calibration data can be provided through Device -Tree. The node is a child node of the PCI controller. 
- Required properties: --compatible : Should be "qcom,ath10k" +- compatible: Should be one of the following: + * "qcom,ath10k" + * "qcom,ipq4019-wifi" + +PCI based devices uses compatible string "qcom,ath10k" and takes only +calibration data via "qcom,ath10k-calibration-data". Rest of the properties +are not applicable for PCI based devices. + +AHB based devices (i.e. ipq4019) uses compatible string "qcom,ipq4019-wifi" +and also uses most of the properties defined in this doc. Optional properties: +- reg: Address and length of the register set for the device. +- resets: Must contain an entry for each entry in reset-names. + See ../reset/reseti.txt for details. +- reset-names: Must include the list of following reset names, + "wifi_cpu_init" + "wifi_radio_srif" + "wifi_radio_warm" + "wifi_radio_cold" + "wifi_core_warm" + "wifi_core_cold" +- clocks: List of clock specifiers, must contain an entry for each required + entry in clock-names. +- clock-names: Should contain the clock names "wifi_wcss_cmd", "wifi_wcss_ref", + "wifi_wcss_rtc". +- interrupts: List of interrupt lines. Must contain an entry + for each entry in the interrupt-names property. +- interrupt-names: Must include the entries for MSI interrupt + names ("msi0" to "msi15") and legacy interrupt + name ("legacy"), +- qcom,msi_addr: MSI interrupt address. +- qcom,msi_base: Base value to add before writing MSI data into + MSI address register. - qcom,ath10k-calibration-data : calibration data as an array, the length can vary between hw versions +Example (to supply the calibration data alone): -Example: +In this example, the node is defined as child node of the PCI controller. 
pci { pcie@0 { @@ -28,3 +57,53 @@ pci { }; }; }; + +Example (to supply ipq4019 SoC wifi block details): + +wifi0: wifi@a000000 { + compatible = "qcom,ipq4019-wifi"; + reg = <0xa000000 0x200000>; + resets = <&gcc WIFI0_CPU_INIT_RESET>, + <&gcc WIFI0_RADIO_SRIF_RESET>, + <&gcc WIFI0_RADIO_WARM_RESET>, + <&gcc WIFI0_RADIO_COLD_RESET>, + <&gcc WIFI0_CORE_WARM_RESET>, + <&gcc WIFI0_CORE_COLD_RESET>; + reset-names = "wifi_cpu_init", + "wifi_radio_srif", + "wifi_radio_warm", + "wifi_radio_cold", + "wifi_core_warm", + "wifi_core_cold"; + clocks = <&gcc GCC_WCSS2G_CLK>, + <&gcc GCC_WCSS2G_REF_CLK>, + <&gcc GCC_WCSS2G_RTC_CLK>; + clock-names = "wifi_wcss_cmd", + "wifi_wcss_ref", + "wifi_wcss_rtc"; + interrupts = <0 0x20 0x1>, + <0 0x21 0x1>, + <0 0x22 0x1>, + <0 0x23 0x1>, + <0 0x24 0x1>, + <0 0x25 0x1>, + <0 0x26 0x1>, + <0 0x27 0x1>, + <0 0x28 0x1>, + <0 0x29 0x1>, + <0 0x2a 0x1>, + <0 0x2b 0x1>, + <0 0x2c 0x1>, + <0 0x2d 0x1>, + <0 0x2e 0x1>, + <0 0x2f 0x1>, + <0 0xa8 0x0>; + interrupt-names = "msi0", "msi1", "msi2", "msi3", + "msi4", "msi5", "msi6", "msi7", + "msi8", "msi9", "msi10", "msi11", + "msi12", "msi13", "msi14", "msi15", + "legacy"; + qcom,msi_addr = <0x0b006040>; + qcom,msi_base = <0x40>; + qcom,ath10k-calibration-data = [ 01 02 03 ... ]; +}; diff --git a/Documentation/devicetree/bindings/net/wireless/ti,wlcore,spi.txt b/Documentation/devicetree/bindings/net/wireless/ti,wlcore,spi.txt new file mode 100644 index 000000000000..9180724e182c --- /dev/null +++ b/Documentation/devicetree/bindings/net/wireless/ti,wlcore,spi.txt @@ -0,0 +1,36 @@ +* Texas Instruments wl1271 wireless lan controller + +The wl1271 chip can be connected via SPI or via SDIO. This +document describes the binding for the SPI connected chip. 
+ +Required properties: +- compatible : Should be "ti,wl1271" +- reg : Chip select address of device +- spi-max-frequency : Maximum SPI clocking speed of device in Hz +- ref-clock-frequency : Reference clock frequency +- interrupt-parent, interrupts : + Should contain parameters for 1 interrupt line. + Interrupt parameters: parent, line number, type. +- vwlan-supply : Point the node of the regulator that powers/enable the wl1271 chip + +Optional properties: +- clock-xtal : boolean, clock is generated from XTAL + +- Please consult Documentation/devicetree/bindings/spi/spi-bus.txt + for optional SPI connection related properties, + +Examples: + +&spi1 { + wl1271@1 { + compatible = "ti,wl1271"; + + reg = <1>; + spi-max-frequency = <48000000>; + clock-xtal; + ref-clock-frequency = <38400000>; + interrupt-parent = <&gpio3>; + interrupts = <8 IRQ_TYPE_LEVEL_HIGH>; + vwlan-supply = <&vwlan_fixed>; + }; +}; diff --git a/Documentation/devicetree/bindings/sram/sram.txt b/Documentation/devicetree/bindings/sram/sram.txt index 42ee9438b771..227e3a341af1 100644 --- a/Documentation/devicetree/bindings/sram/sram.txt +++ b/Documentation/devicetree/bindings/sram/sram.txt @@ -25,6 +25,11 @@ Required properties in the sram node: - ranges : standard definition, should translate from local addresses within the sram to bus addresses +Optional properties in the sram node: + +- no-memory-wc : the flag indicating, that SRAM memory region has not to + be remapped as write combining. WC is used by default. 
+ Required properties in the area nodes: - reg : iomem address range, relative to the SRAM range diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt b/Documentation/devicetree/bindings/vendor-prefixes.txt index ee66defcdd8b..156731cc649c 100644 --- a/Documentation/devicetree/bindings/vendor-prefixes.txt +++ b/Documentation/devicetree/bindings/vendor-prefixes.txt @@ -112,6 +112,7 @@ hp Hewlett Packard i2se I2SE GmbH ibm International Business Machines (IBM) idt Integrated Device Technologies, Inc. +ifi Ingenieurburo Fur Ic-Technologie (I/F/I) iom Iomega Corporation img Imagination Technologies Ltd. ingenic Ingenic Semiconductor diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX index df27a1a50776..415154a487d0 100644 --- a/Documentation/networking/00-INDEX +++ b/Documentation/networking/00-INDEX @@ -44,6 +44,8 @@ can.txt - documentation on CAN protocol family. cdc_mbim.txt - 3G/LTE USB modem (Mobile Broadband Interface Model) +checksum-offloads.txt + - Explanation of checksum offloads; LCO, RCO cops.txt - info on the COPS LocalTalk Linux driver cs89x0.txt diff --git a/Documentation/networking/batman-adv.txt b/Documentation/networking/batman-adv.txt index ff23b755f5e4..1b5e7a7f2185 100644 --- a/Documentation/networking/batman-adv.txt +++ b/Documentation/networking/batman-adv.txt @@ -187,7 +187,7 @@ interfaces to the kernel module settings. For more information, please see the manpage (man batctl). 
-batctl is available on http://www.open-mesh.org/ +batctl is available on https://www.open-mesh.org/ CONTACT diff --git a/Documentation/networking/checksum-offloads.txt b/Documentation/networking/checksum-offloads.txt new file mode 100644 index 000000000000..de2a327766a7 --- /dev/null +++ b/Documentation/networking/checksum-offloads.txt @@ -0,0 +1,119 @@ +Checksum Offloads in the Linux Networking Stack + + +Introduction +============ + +This document describes a set of techniques in the Linux networking stack + to take advantage of checksum offload capabilities of various NICs. + +The following technologies are described: + * TX Checksum Offload + * LCO: Local Checksum Offload + * RCO: Remote Checksum Offload + +Things that should be documented here but aren't yet: + * RX Checksum Offload + * CHECKSUM_UNNECESSARY conversion + + +TX Checksum Offload +=================== + +The interface for offloading a transmit checksum to a device is explained + in detail in comments near the top of include/linux/skbuff.h. +In brief, it allows to request the device fill in a single ones-complement + checksum defined by the sk_buff fields skb->csum_start and + skb->csum_offset. The device should compute the 16-bit ones-complement + checksum (i.e. the 'IP-style' checksum) from csum_start to the end of the + packet, and fill in the result at (csum_start + csum_offset). +Because csum_offset cannot be negative, this ensures that the previous + value of the checksum field is included in the checksum computation, thus + it can be used to supply any needed corrections to the checksum (such as + the sum of the pseudo-header for UDP or TCP). +This interface only allows a single checksum to be offloaded. Where + encapsulation is used, the packet may have multiple checksum fields in + different header layers, and the rest will have to be handled by another + mechanism such as LCO or RCO. +No offloading of the IP header checksum is performed; it is always done in + software. 
This is OK because when we build the IP header, we obviously + have it in cache, so summing it isn't expensive. It's also rather short. +The requirements for GSO are more complicated, because when segmenting an + encapsulated packet both the inner and outer checksums may need to be + edited or recomputed for each resulting segment. See the skbuff.h comment + (section 'E') for more details. + +A driver declares its offload capabilities in netdev->hw_features; see + Documentation/networking/netdev-features for more. Note that a device + which only advertises NETIF_F_IP[V6]_CSUM must still obey the csum_start + and csum_offset given in the SKB; if it tries to deduce these itself in + hardware (as some NICs do) the driver should check that the values in the + SKB match those which the hardware will deduce, and if not, fall back to + checksumming in software instead (with skb_checksum_help or one of the + skb_csum_off_chk* functions as mentioned in include/linux/skbuff.h). This + is a pain, but that's what you get when hardware tries to be clever. + +The stack should, for the most part, assume that checksum offload is + supported by the underlying device. The only place that should check is + validate_xmit_skb(), and the functions it calls directly or indirectly. + That function compares the offload features requested by the SKB (which + may include other offloads besides TX Checksum Offload) and, if they are + not supported or enabled on the device (determined by netdev->features), + performs the corresponding offload in software. In the case of TX + Checksum Offload, that means calling skb_checksum_help(skb). + + +LCO: Local Checksum Offload +=========================== + +LCO is a technique for efficiently computing the outer checksum of an + encapsulated datagram when the inner checksum is due to be offloaded. 
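The arithmetic that LCO relies on can be checked numerically. The following is a minimal userspace sketch, not kernel code: oc_sum(), device_fill_csum(), lco_outer_csum() and lco_demo() are hypothetical names, the packet bytes are made up, and the 'outer header' is just eight stand-in bytes rather than a real UDP or GRE header. It assumes the seed placed in the inner checksum field is not 0xffff (the second representation of zero in ones-complement arithmetic). Worked through, the sum from csum_start to the end of the packet after the device fills in the checksum comes out as the bitwise complement of the seed, so the outer checksum can be predicted from the outer bytes plus the complement of the seed alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones-complement sum. */
uint16_t oc_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Ones-complement sum (not complemented) of len bytes, read as
 * big-endian 16-bit words; an odd trailing byte is zero-padded. */
uint16_t oc_sum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;
	return oc_fold(sum);
}

uint16_t read_be16(const uint8_t *p)
{
	return (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
}

/* Stand-in for the device: checksum from csum_start to the end of the
 * packet (seed included) and store the complement in the field. */
void device_fill_csum(uint8_t *pkt, size_t len, size_t csum_start,
		      size_t csum_offset)
{
	uint16_t csum;

	assert((csum_offset & 1) == 0 && csum_start + csum_offset + 2 <= len);
	csum = (uint16_t)~oc_sum(pkt + csum_start, len - csum_start);
	pkt[csum_start + csum_offset] = (uint8_t)(csum >> 8);
	pkt[csum_start + csum_offset + 1] = (uint8_t)(csum & 0xff);
}

/* LCO-style outer checksum: sum only the bytes before csum_start, then
 * add the complement of the seed in the inner checksum field. */
uint16_t lco_outer_csum(const uint8_t *pkt, size_t outer_start,
			size_t csum_start, size_t csum_offset)
{
	uint32_t sum;

	assert(((csum_start - outer_start) & 1) == 0);
	sum = oc_sum(pkt + outer_start, csum_start - outer_start);
	sum += (uint16_t)~read_be16(pkt + csum_start + csum_offset);
	return (uint16_t)~oc_fold(sum);
}

/* Returns 1 if both properties hold for the given seed (seed != 0xffff). */
int lco_demo(uint16_t seed)
{
	uint8_t pkt[8 + 8 + 5] = {
		0x45, 0x00, 0xbe, 0xef, 0x00, 0x2a, 0x00, 0x00, /* stand-in outer header */
		0x30, 0x39, 0x00, 0x50, 0x00, 0x0d, 0x00, 0x00, /* inner header, csum at +6 */
		'h', 'e', 'l', 'l', 'o'                         /* payload, odd length */
	};
	uint16_t predicted, inner_sum, actual;

	pkt[14] = (uint8_t)(seed >> 8);
	pkt[15] = (uint8_t)(seed & 0xff);

	predicted = lco_outer_csum(pkt, 0, 8, 6);      /* payload not touched */
	device_fill_csum(pkt, sizeof(pkt), 8, 6);      /* inner checksum filled in */
	inner_sum = oc_sum(pkt + 8, sizeof(pkt) - 8);
	actual = (uint16_t)~oc_sum(pkt, sizeof(pkt));  /* checksum over everything */

	return inner_sum == (uint16_t)~seed && predicted == actual;
}
```

The outer value is computed before the inner checksum exists and still matches the checksum taken over the finished packet, which is the point of the technique.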
+The ones-complement sum of a correctly checksummed TCP or UDP packet is + equal to the complement of the sum of the pseudo header, because everything + else gets 'cancelled out' by the checksum field. This is because the sum was + complemented before being written to the checksum field. +More generally, this holds in any case where the 'IP-style' ones complement + checksum is used, and thus any checksum that TX Checksum Offload supports. +That is, if we have set up TX Checksum Offload with a start/offset pair, we + know that _after the device has filled in that checksum_, the ones + complement sum from csum_start to the end of the packet will be equal to + _the complement of whatever value we put in the checksum field beforehand_. + This allows us to compute the outer checksum without looking at the payload: + we simply stop summing when we get to csum_start, then add the complement of + the 16-bit word at (csum_start + csum_offset). +Then, when the true inner checksum is filled in (either by hardware or by + skb_checksum_help()), the outer checksum will become correct by virtue of + the arithmetic. + +LCO is performed by the stack when constructing an outer UDP header for an + encapsulation such as VXLAN or GENEVE, in udp_set_csum(). Similarly for + the IPv6 equivalents, in udp6_set_csum(). +It is also performed when constructing an IPv4 GRE header, in + net/ipv4/ip_gre.c:build_header(). It is *not* currently performed when + constructing an IPv6 GRE header; the GRE checksum is computed over the + whole packet in net/ipv6/ip6_gre.c:ip6gre_xmit2(), but it should be + possible to use LCO here as IPv6 GRE still uses an IP-style checksum. +All of the LCO implementations use a helper function lco_csum(), in + include/linux/skbuff.h. + +LCO can safely be used for nested encapsulations; in this case, the outer + encapsulation layer will sum over both its own header and the 'middle' + header. 
This does mean that the 'middle' header will get summed multiple + times, but there doesn't seem to be a way to avoid that without incurring + bigger costs (e.g. in SKB bloat). + + +RCO: Remote Checksum Offload +============================ + +RCO is a technique for eliding the inner checksum of an encapsulated + datagram, allowing the outer checksum to be offloaded. It does, however, + involve a change to the encapsulation protocols, which the receiver must + also support. For this reason, it is disabled by default. +RCO is detailed in the following Internet-Drafts: +https://tools.ietf.org/html/draft-herbert-remotecsumoffload-00 +https://tools.ietf.org/html/draft-herbert-vxlan-rco-00 +In Linux, RCO is implemented individually in each encapsulation protocol, + and most tunnel types have flags controlling its use. For instance, VXLAN + has the flag VXLAN_F_REMCSUM_TX (per struct vxlan_rdst) to indicate that + RCO should be used when transmitting to a given remote destination. diff --git a/Documentation/networking/dsa/dsa.txt b/Documentation/networking/dsa/dsa.txt index aa9c1f9313cd..3b196c304b73 100644 --- a/Documentation/networking/dsa/dsa.txt +++ b/Documentation/networking/dsa/dsa.txt @@ -521,20 +521,17 @@ See Documentation/hwmon/sysfs-interface for details. Bridge layer ------------ -- port_join_bridge: bridge layer function invoked when a given switch port is +- port_bridge_join: bridge layer function invoked when a given switch port is added to a bridge, this function should be doing the necessary at the switch level to permit the joining port from being added to the relevant logical - domain for it to ingress/egress traffic with other members of the bridge. DSA - does nothing but calculate a bitmask of switch ports currently members of the - specified bridge being requested the join + domain for it to ingress/egress traffic with other members of the bridge. 
-- port_leave_bridge: bridge layer function invoked when a given switch port is +- port_bridge_leave: bridge layer function invoked when a given switch port is removed from a bridge, this function should be doing the necessary at the switch level to deny the leaving port from ingress/egress traffic from the remaining bridge members. When the port leaves the bridge, it should be aged out at the switch hardware for the switch to (re) learn MAC addresses behind - this port. DSA calculates the bitmask of ports still members of the bridge - being left + this port. - port_stp_update: bridge layer function invoked when a given switch port STP state is computed by the bridge layer and should be propagated to switch @@ -545,20 +542,15 @@ Bridge layer Bridge VLAN filtering --------------------- -- port_pvid_get: bridge layer function invoked when a Port-based VLAN ID is - queried for the given switch port - -- port_pvid_set: bridge layer function invoked when a Port-based VLAN ID needs - to be configured on the given switch port - - port_vlan_add: bridge layer function invoked when a VLAN is configured (tagged or untagged) for the given switch port - port_vlan_del: bridge layer function invoked when a VLAN is removed from the given switch port -- vlan_getnext: bridge layer function invoked to query the next configured VLAN - in the switch, i.e. returns the bitmaps of members and untagged ports +- port_vlan_dump: bridge layer function invoked with a switchdev callback + function that the driver has to call for each VLAN the given port is a member + of. A switchdev object is used to carry the VID and bridge flags. 
- port_fdb_add: bridge layer function invoked when the bridge wants to install a Forwarding Database entry, the switch hardware should be programmed with the diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 73b36d7c7b0d..d5df40c75aa4 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -1216,6 +1216,19 @@ promote_secondaries - BOOLEAN promote a corresponding secondary IP address instead of removing all the corresponding secondary IP addresses. +drop_unicast_in_l2_multicast - BOOLEAN + Drop any unicast IP packets that are received in link-layer + multicast (or broadcast) frames. + This behavior (for multicast) is actually a SHOULD in RFC + 1122, but is disabled by default for compatibility reasons. + Default: off (0) + +drop_gratuitous_arp - BOOLEAN + Drop all gratuitous ARP frames, for example if there's a known + good ARP proxy on the network and such frames need not be used + (or in the case of 802.11, must not be used to prevent attacks.) + Default: off (0) + tag - INTEGER Allows you to write a number, which can be used as required. @@ -1550,6 +1563,15 @@ temp_prefered_lft - INTEGER Preferred lifetime (in seconds) for temporary addresses. Default: 86400 (1 day) +keep_addr_on_down - INTEGER + Keep all IPv6 addresses on an interface down event. If set static + global addresses with no expiration time are not flushed. + >0 : enabled + 0 : system default + <0 : disabled + + Default: 0 (addresses are removed) + max_desync_factor - INTEGER Maximum value for DESYNC_FACTOR, which is a random value that ensures that clients don't synchronize with each @@ -1661,6 +1683,19 @@ stable_secret - IPv6 address By default the stable secret is unset. +drop_unicast_in_l2_multicast - BOOLEAN + Drop any unicast IPv6 packets that are received in link-layer + multicast (or broadcast) frames. + + By default this is turned off. 
+ +drop_unsolicited_na - BOOLEAN + Drop all unsolicited neighbor advertisements, for example if there's + a known good NA proxy on the network and such frames need not be used + (or in the case of 802.11, must not be used to prevent attacks.) + + By default this is turned off. + icmp/*: ratelimit - INTEGER Limit the maximal rates for sending ICMPv6 packets. diff --git a/Documentation/networking/kcm.txt b/Documentation/networking/kcm.txt new file mode 100644 index 000000000000..3476ede5bc2c --- /dev/null +++ b/Documentation/networking/kcm.txt @@ -0,0 +1,285 @@ +Kernel Connection Multiplexor +----------------------------- + +Kernel Connection Multiplexor (KCM) is a mechanism that provides a message based +interface over TCP for generic application protocols. With KCM an application +can efficiently send and receive application protocol messages over TCP using +datagram sockets. + +KCM implements an NxM multiplexor in the kernel as diagrammed below: + ++------------+ +------------+ +------------+ +------------+ +| KCM socket | | KCM socket | | KCM socket | | KCM socket | ++------------+ +------------+ +------------+ +------------+ + | | | | + +-----------+ | | +----------+ + | | | | + +----------------------------------+ + | Multiplexor | + +----------------------------------+ + | | | | | + +---------+ | | | ------------+ + | | | | | ++----------+ +----------+ +----------+ +----------+ +----------+ +| Psock | | Psock | | Psock | | Psock | | Psock | ++----------+ +----------+ +----------+ +----------+ +----------+ + | | | | | ++----------+ +----------+ +----------+ +----------+ +----------+ +| TCP sock | | TCP sock | | TCP sock | | TCP sock | | TCP sock | ++----------+ +----------+ +----------+ +----------+ +----------+ + +KCM sockets +----------- + +The KCM sockets provide the user interface to the multiplexor. 
All the KCM sockets +bound to a multiplexor are considered to have equivalent function, and I/O +operations in different sockets may be done in parallel without the need for +synchronization between threads in userspace. + +Multiplexor +----------- + +The multiplexor provides the message steering. In the transmit path, messages +written on a KCM socket are sent atomically on an appropriate TCP socket. +Similarly, in the receive path, messages are constructed on each TCP socket +(Psock) and complete messages are steered to a KCM socket. + +TCP sockets & Psocks +-------------------- + +TCP sockets may be bound to a KCM multiplexor. A Psock structure is allocated +for each bound TCP socket; this structure holds the state for constructing +messages on receive as well as other connection specific information for KCM. + +Connected mode semantics +------------------------ + +Each multiplexor assumes that all attached TCP connections are to the same +destination and can use the different connections for load balancing when +transmitting. The normal send and recv calls (including sendmmsg and recvmmsg) +can be used to send and receive messages from the KCM socket. + +Socket types +------------ + +KCM supports SOCK_DGRAM and SOCK_SEQPACKET socket types. + +Message delineation +------------------- + +Messages are sent over a TCP stream with some application protocol message +format that typically includes a header which frames the messages. The length +of a received message can be deduced from the application protocol header +(often just a simple length field). + +A TCP stream must be parsed to determine message boundaries. Berkeley Packet +Filter (BPF) is used for this. When attaching a TCP socket to a multiplexor a +BPF program must be specified. The program is called at the start of receiving +a new message and is given an skbuff that contains the bytes received so far. +It parses the message header and returns the length of the message. 
Given this +information, KCM will construct the message of the stated length and deliver it +to a KCM socket. + +TCP socket management +--------------------- + +When a TCP socket is attached to a KCM multiplexor, data ready (POLLIN) and +write space available (POLLOUT) events are handled by the multiplexor. If there +is a state change (disconnection) or other error on a TCP socket, an error is +posted on the TCP socket so that a POLLERR event happens and KCM discontinues +using the socket. When the application gets the error notification for a +TCP socket, it should unattach the socket from KCM and then handle the error +condition (the typical response is to close the socket and create a new +connection if necessary). + +KCM limits the maximum receive message size to be the size of the receive +socket buffer on the attached TCP socket (the socket buffer size can be set by +SO_RCVBUF). If the length of a new message reported by the BPF program is +greater than this limit a corresponding error (EMSGSIZE) is posted on the TCP +socket. The BPF program may also enforce a maximum message size and report an +error when it is exceeded. + +A timeout may be set for assembling messages on a receive socket. The timeout +value is taken from the receive timeout of the attached TCP socket (this is set +by SO_RCVTIMEO). If the timer expires before assembly is complete an error +(ETIMEDOUT) is posted on the socket. + +User interface +============== + +Creating a multiplexor +---------------------- + +A new multiplexor and initial KCM socket is created by a socket call: + + socket(AF_KCM, type, protocol) + + - type is either SOCK_DGRAM or SOCK_SEQPACKET + - protocol is KCMPROTO_CONNECTED + +Cloning KCM sockets +------------------- + +After the first KCM socket is created using the socket call as described +above, additional sockets for the multiplexor can be created by cloning +a KCM socket. 
This is accomplished by an ioctl on a KCM socket: + + /* From linux/kcm.h */ + struct kcm_clone { + int fd; + }; + + struct kcm_clone info; + + memset(&info, 0, sizeof(info)); + + err = ioctl(kcmfd, SIOCKCMCLONE, &info); + + if (!err) + newkcmfd = info.fd; + +Attach transport sockets +------------------------ + +Attaching of transport sockets to a multiplexor is performed by calling an +ioctl on a KCM socket for the multiplexor. e.g.: + + /* From linux/kcm.h */ + struct kcm_attach { + int fd; + int bpf_fd; + }; + + struct kcm_attach info; + + memset(&info, 0, sizeof(info)); + + info.fd = tcpfd; + info.bpf_fd = bpf_prog_fd; + + ioctl(kcmfd, SIOCKCMATTACH, &info); + +The kcm_attach structure contains: + fd: file descriptor for TCP socket being attached + bpf_fd: file descriptor for compiled BPF program downloaded + +Unattach transport sockets +-------------------------- + +Unattaching a transport socket from a multiplexor is straightforward. An +"unattach" ioctl is done with the kcm_unattach structure as the argument: + + /* From linux/kcm.h */ + struct kcm_unattach { + int fd; + }; + + struct kcm_unattach info; + + memset(&info, 0, sizeof(info)); + + info.fd = cfd; + + ioctl(fd, SIOCKCMUNATTACH, &info); + +Disabling receive on KCM socket +------------------------------- + +A setsockopt is used to disable or enable receiving on a KCM socket. +When receive is disabled, any pending messages in the socket's +receive buffer are moved to other sockets. This feature is useful +if an application thread knows that it will be doing a lot of +work on a request and won't be able to service new messages for a +while. Example use: + + int val = 1; + + setsockopt(kcmfd, SOL_KCM, KCM_RECV_DISABLE, &val, sizeof(val)); + +BPF programs for message delineation +------------------------------------ + +BPF programs can be compiled using the BPF LLVM backend. 
For example, +the BPF program for parsing Thrift is: + + #include "bpf.h" /* for __sk_buff */ + #include "bpf_helpers.h" /* for load_word intrinsic */ + + SEC("socket_kcm") + int bpf_prog1(struct __sk_buff *skb) + { + return load_word(skb, 0) + 4; + } + + char _license[] SEC("license") = "GPL"; + +Use in applications +=================== + +KCM accelerates application layer protocols. Specifically, it allows +applications to use a message based interface for sending and receiving +messages. The kernel provides necessary assurances that messages are sent +and received atomically. This relieves much of the burden applications have +in mapping a message based protocol onto the TCP stream. KCM also makes +application layer messages a unit of work in the kernel for the purposes of +steering and scheduling, which in turn allows a simpler networking model in +multithreaded applications. + +Configurations +-------------- + +In an Nx1 configuration, KCM logically provides multiple socket handles +to the same TCP connection. This allows parallelism between I/O +operations on the TCP socket (for instance copyin and copyout of data is +parallelized). In an application, a KCM socket can be opened for each +processing thread and inserted into the epoll (similar to how SO_REUSEPORT +is used to allow multiple listener sockets on the same port). + +In an MxN configuration, multiple connections are established to the +same destination. These are used for simple load balancing. + +Message batching +---------------- + +The primary purpose of KCM is load balancing between KCM sockets and hence +threads in a nominal use case. Perfect load balancing, that is steering +each received message to a different KCM socket or steering each sent +message to a different TCP socket, can negatively impact performance +since this doesn't allow for affinities to be established. Balancing +based on groups, or batches of messages, can be beneficial for performance. 
+ +On transmit, there are three ways an application can batch (pipeline) +messages on a KCM socket. + 1) Send multiple messages in a single sendmmsg. + 2) Send a group of messages each with a sendmsg call, where all messages + except the last have MSG_BATCH in the flags of sendmsg call. + 3) Create a "super message" composed of multiple messages and send this + with a single sendmsg. + +On receive, the KCM module attempts to queue messages received on the +same KCM socket during each TCP ready callback. The targeted KCM socket +changes at each receive ready callback on the KCM socket. The application +does not need to configure this. + +Error handling +-------------- + +An application should include a thread to monitor errors raised on +the TCP connection. Normally, this will be done by placing each +TCP socket attached to a KCM multiplexor in an epoll set for the POLLERR +event. If an error occurs on an attached TCP socket, KCM sets an EPIPE +on the socket thus waking up the application thread. When the application +sees the error (which may just be a disconnect) it should unattach the +socket from KCM and then close it. It is assumed that once an error is +posted on the TCP socket the data stream is unrecoverable (i.e. an error +may have occurred in the middle of receiving a message). + +TCP connection monitoring +------------------------- + +In KCM there is no means to correlate a message to the TCP socket that +was used to send or receive the message (except in the case there is +only one attached TCP socket). However, the application does retain +an open file descriptor to the socket so it will be able to get statistics +from the socket which can be used in detecting issues (such as high +retransmissions on the socket). 
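The message delineation rule used by the example BPF program in kcm.txt (read a 32-bit word at offset 0 and add 4) can be sketched in plain userspace C. This is only an illustration of the framing arithmetic, not the KCM implementation: kcm_style_msg_len() and count_complete_msgs() are hypothetical names, and the sketch assumes a 4-byte big-endian length prefix that does not count itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: total length (prefix + body) of the message that
 * starts at buf, or 0 if fewer than 4 bytes are available so far.
 * Assumes a 4-byte big-endian length prefix that excludes itself. */
size_t kcm_style_msg_len(const uint8_t *buf, size_t avail)
{
	uint32_t body;

	if (avail < 4)
		return 0;
	body = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
	       ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
	return (size_t)body + 4;
}

/* Walk a byte stream and count the messages that are fully present.
 * A trailing partial message is left for a later call. */
size_t count_complete_msgs(const uint8_t *stream, size_t len)
{
	size_t off = 0, n = 0;

	assert(stream != NULL || len == 0);
	while (off < len) {
		size_t mlen = kcm_style_msg_len(stream + off, len - off);

		if (mlen == 0 || mlen > len - off)
			break; /* header or body not complete yet */
		off += mlen;
		n++;
	}
	return n;
}
```

For a stream holding a 2-byte message, a 1-byte message and the first byte of a 5-byte message, count_complete_msgs() reports two complete messages; the partial third one would be assembled once more bytes arrive, which mirrors how KCM waits for the stated length before delivering a message.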
diff --git a/Documentation/networking/mac80211-injection.txt b/Documentation/networking/mac80211-injection.txt index 3a930072b161..ec8f934c2eb2 100644 --- a/Documentation/networking/mac80211-injection.txt +++ b/Documentation/networking/mac80211-injection.txt @@ -28,6 +28,23 @@ radiotap headers and used to control injection: IEEE80211_RADIOTAP_F_TX_NOACK: frame should be sent without waiting for an ACK even if it is a unicast frame + * IEEE80211_RADIOTAP_RATE + + legacy rate for the transmission (only for devices without own rate control) + + * IEEE80211_RADIOTAP_MCS + + HT rate for the transmission (only for devices without own rate control). + Also some flags are parsed + + IEEE80211_TX_RC_SHORT_GI: use short guard interval + IEEE80211_TX_RC_40_MHZ_WIDTH: send in HT40 mode + + * IEEE80211_RADIOTAP_DATA_RETRIES + + number of retries when either IEEE80211_RADIOTAP_RATE or + IEEE80211_RADIOTAP_MCS was used + The injection code can also skip all other currently defined radiotap fields facilitating replay of captured radiotap headers directly. diff --git a/Documentation/networking/netlink_mmap.txt b/Documentation/networking/netlink_mmap.txt deleted file mode 100644 index 54f10478e8e3..000000000000 --- a/Documentation/networking/netlink_mmap.txt +++ /dev/null @@ -1,332 +0,0 @@ -This file documents how to use memory mapped I/O with netlink. - -Author: Patrick McHardy <kaber@trash.net> - -Overview --------- - -Memory mapped netlink I/O can be used to increase throughput and decrease -overhead of unicast receive and transmit operations. Some netlink subsystems -require high throughput, these are mainly the netfilter subsystems -nfnetlink_queue and nfnetlink_log, but it can also help speed up large -dump operations of f.i. the routing database. - -Memory mapped netlink I/O used two circular ring buffers for RX and TX which -are mapped into the processes address space. 
- -The RX ring is used by the kernel to directly construct netlink messages into -user-space memory without copying them as done with regular socket I/O, -additionally as long as the ring contains messages no recvmsg() or poll() -syscalls have to be issued by user-space to get more message. - -The TX ring is used to process messages directly from user-space memory, the -kernel processes all messages contained in the ring using a single sendmsg() -call. - -Usage overview --------------- - -In order to use memory mapped netlink I/O, user-space needs three main changes: - -- ring setup -- conversion of the RX path to get messages from the ring instead of recvmsg() -- conversion of the TX path to construct messages into the ring - -Ring setup is done using setsockopt() to provide the ring parameters to the -kernel, then a call to mmap() to map the ring into the processes address space: - -- setsockopt(fd, SOL_NETLINK, NETLINK_RX_RING, &params, sizeof(params)); -- setsockopt(fd, SOL_NETLINK, NETLINK_TX_RING, &params, sizeof(params)); -- ring = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0) - -Usage of either ring is optional, but even if only the RX ring is used the -mapping still needs to be writable in order to update the frame status after -processing. - -Conversion of the reception path involves calling poll() on the file -descriptor, once the socket is readable the frames from the ring are -processed in order until no more messages are available, as indicated by -a status word in the frame header. - -On kernel side, in order to make use of memory mapped I/O on receive, the -originating netlink subsystem needs to support memory mapped I/O, otherwise -it will use an allocated socket buffer as usual and the contents will be - copied to the ring on transmission, nullifying most of the performance gains. -Dumps of kernel databases automatically support memory mapped I/O. 
- -Conversion of the transmit path involves changing message construction to -use memory from the TX ring instead of (usually) a buffer declared on the -stack and setting up the frame header appropriately. Optionally poll() can -be used to wait for free frames in the TX ring. - -Structured and definitions for using memory mapped I/O are contained in -<linux/netlink.h>. - -RX and TX rings ----------------- - -Each ring contains a number of continuous memory blocks, containing frames of -fixed size dependent on the parameters used for ring setup. - -Ring: [ block 0 ] - [ frame 0 ] - [ frame 1 ] - [ block 1 ] - [ frame 2 ] - [ frame 3 ] - ... - [ block n ] - [ frame 2 * n ] - [ frame 2 * n + 1 ] - -The blocks are only visible to the kernel, from the point of view of user-space -the ring just contains the frames in a continuous memory zone. - -The ring parameters used for setting up the ring are defined as follows: - -struct nl_mmap_req { - unsigned int nm_block_size; - unsigned int nm_block_nr; - unsigned int nm_frame_size; - unsigned int nm_frame_nr; -}; - -Frames are grouped into blocks, where each block is a continuous region of memory -and holds nm_block_size / nm_frame_size frames. The total number of frames in -the ring is nm_frame_nr. The following invariants hold: - -- frames_per_block = nm_block_size / nm_frame_size - -- nm_frame_nr = frames_per_block * nm_block_nr - -Some parameters are constrained, specifically: - -- nm_block_size must be a multiple of the architectures memory page size. - The getpagesize() function can be used to get the page size. - -- nm_frame_size must be equal or larger to NL_MMAP_HDRLEN, IOW a frame must be - able to hold at least the frame header - -- nm_frame_size must be smaller or equal to nm_block_size - -- nm_frame_size must be a multiple of NL_MMAP_MSG_ALIGNMENT - -- nm_frame_nr must equal the actual number of frames as specified above. 
- -When the kernel can't allocate physically continuous memory for a ring block, -it will fall back to use physically discontinuous memory. This might affect -performance negatively, in order to avoid this the nm_frame_size parameter -should be chosen to be as small as possible for the required frame size and -the number of blocks should be increased instead. - -Ring frames ------------- - -Each frames contain a frame header, consisting of a synchronization word and some -meta-data, and the message itself. - -Frame: [ header message ] - -The frame header is defined as follows: - -struct nl_mmap_hdr { - unsigned int nm_status; - unsigned int nm_len; - __u32 nm_group; - /* credentials */ - __u32 nm_pid; - __u32 nm_uid; - __u32 nm_gid; -}; - -- nm_status is used for synchronizing processing between the kernel and user- - space and specifies ownership of the frame as well as the operation to perform - -- nm_len contains the length of the message contained in the data area - -- nm_group specified the destination multicast group of message - -- nm_pid, nm_uid and nm_gid contain the netlink pid, UID and GID of the sending - process. These values correspond to the data available using SOCK_PASSCRED in - the SCM_CREDENTIALS cmsg. - -The possible values in the status word are: - -- NL_MMAP_STATUS_UNUSED: - RX ring: frame belongs to the kernel and contains no message - for user-space. Approriate action is to invoke poll() - to wait for new messages. - - TX ring: frame belongs to user-space and can be used for - message construction. - -- NL_MMAP_STATUS_RESERVED: - RX ring only: frame is currently used by the kernel for message - construction and contains no valid message yet. - Appropriate action is to invoke poll() to wait for - new messages. - -- NL_MMAP_STATUS_VALID: - RX ring: frame contains a valid message. 
Approriate action is - to process the message and release the frame back to - the kernel by setting the status to - NL_MMAP_STATUS_UNUSED or queue the frame by setting the - status to NL_MMAP_STATUS_SKIP. - - TX ring: the frame contains a valid message from user-space to - be processed by the kernel. After completing processing - the kernel will release the frame back to user-space by - setting the status to NL_MMAP_STATUS_UNUSED. - -- NL_MMAP_STATUS_COPY: - RX ring only: a message is ready to be processed but could not be - stored in the ring, either because it exceeded the - frame size or because the originating subsystem does - not support memory mapped I/O. Appropriate action is - to invoke recvmsg() to receive the message and release - the frame back to the kernel by setting the status to - NL_MMAP_STATUS_UNUSED. - -- NL_MMAP_STATUS_SKIP: - RX ring only: user-space queued the message for later processing, but - processed some messages following it in the ring. The - kernel should skip this frame when looking for unused - frames. - -The data area of a frame begins at a offset of NL_MMAP_HDRLEN relative to the -frame header. - -TX limitations --------------- - -As of Jan 2015 the message is always copied from the ring frame to an -allocated buffer due to unresolved security concerns. -See commit 4682a0358639b29cf ("netlink: Always copy on mmap TX."). - -Example -------- - -Ring setup: - - unsigned int block_size = 16 * getpagesize(); - struct nl_mmap_req req = { - .nm_block_size = block_size, - .nm_block_nr = 64, - .nm_frame_size = 16384, - .nm_frame_nr = 64 * block_size / 16384, - }; - unsigned int ring_size; - void *rx_ring, *tx_ring; - - /* Configure ring parameters */ - if (setsockopt(fd, SOL_NETLINK, NETLINK_RX_RING, &req, sizeof(req)) < 0) - exit(1); - if (setsockopt(fd, SOL_NETLINK, NETLINK_TX_RING, &req, sizeof(req)) < 0) - exit(1) - - /* Calculate size of each individual ring */ - ring_size = req.nm_block_nr * req.nm_block_size; - - /* Map RX/TX rings. 
The TX ring is located after the RX ring */ - rx_ring = mmap(NULL, 2 * ring_size, PROT_READ | PROT_WRITE, - MAP_SHARED, fd, 0); - if ((long)rx_ring == -1L) - exit(1); - tx_ring = rx_ring + ring_size: - -Message reception: - -This example assumes some ring parameters of the ring setup are available. - - unsigned int frame_offset = 0; - struct nl_mmap_hdr *hdr; - struct nlmsghdr *nlh; - unsigned char buf[16384]; - ssize_t len; - - while (1) { - struct pollfd pfds[1]; - - pfds[0].fd = fd; - pfds[0].events = POLLIN | POLLERR; - pfds[0].revents = 0; - - if (poll(pfds, 1, -1) < 0 && errno != -EINTR) - exit(1); - - /* Check for errors. Error handling omitted */ - if (pfds[0].revents & POLLERR) - <handle error> - - /* If no new messages, poll again */ - if (!(pfds[0].revents & POLLIN)) - continue; - - /* Process all frames */ - while (1) { - /* Get next frame header */ - hdr = rx_ring + frame_offset; - - if (hdr->nm_status == NL_MMAP_STATUS_VALID) { - /* Regular memory mapped frame */ - nlh = (void *)hdr + NL_MMAP_HDRLEN; - len = hdr->nm_len; - - /* Release empty message immediately. May happen - * on error during message construction. - */ - if (len == 0) - goto release; - } else if (hdr->nm_status == NL_MMAP_STATUS_COPY) { - /* Frame queued to socket receive queue */ - len = recv(fd, buf, sizeof(buf), MSG_DONTWAIT); - if (len <= 0) - break; - nlh = buf; - } else - /* No more messages to process, continue polling */ - break; - - process_msg(nlh); -release: - /* Release frame back to the kernel */ - hdr->nm_status = NL_MMAP_STATUS_UNUSED; - - /* Advance frame offset to next frame */ - frame_offset = (frame_offset + frame_size) % ring_size; - } - } - -Message transmission: - -This example assumes some ring parameters of the ring setup are available. -A single message is constructed and transmitted, to send multiple messages -at once they would be constructed in consecutive frames before a final call -to sendto(). 
- - unsigned int frame_offset = 0; - struct nl_mmap_hdr *hdr; - struct nlmsghdr *nlh; - struct sockaddr_nl addr = { - .nl_family = AF_NETLINK, - }; - - hdr = tx_ring + frame_offset; - if (hdr->nm_status != NL_MMAP_STATUS_UNUSED) - /* No frame available. Use poll() to avoid. */ - exit(1); - - nlh = (void *)hdr + NL_MMAP_HDRLEN; - - /* Build message */ - build_message(nlh); - - /* Fill frame header: length and status need to be set */ - hdr->nm_len = nlh->nlmsg_len; - hdr->nm_status = NL_MMAP_STATUS_VALID; - - if (sendto(fd, NULL, 0, 0, &addr, sizeof(addr)) < 0) - exit(1); - - /* Advance frame offset to next frame */ - frame_offset = (frame_offset + frame_size) % ring_size; diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt index e839e7efc835..7ab9404a8412 100644 --- a/Documentation/networking/phy.txt +++ b/Documentation/networking/phy.txt @@ -267,13 +267,23 @@ Writing a PHY driver config_intr: Enable or disable interrupts remove: Does any driver take-down ts_info: Queries about the HW timestamping status + match_phy_device: used for Clause 45 capable PHYs to match devices + in package and ensure they are compatible hwtstamp: Set the PHY HW timestamping configuration rxtstamp: Requests a receive timestamp at the PHY level for a 'skb' txtsamp: Requests a transmit timestamp at the PHY level for a 'skb' set_wol: Enable Wake-on-LAN at the PHY level get_wol: Get the Wake-on-LAN status at the PHY level + link_change_notify: called to inform the core is about to change the + link state, can be used to work around bogus PHY between state changes read_mmd_indirect: Read PHY MMD indirect register write_mmd_indirect: Write PHY MMD indirect register + module_info: Get the size and type of an EEPROM contained in an plug-in + module + module_eeprom: Get EEPROM information of a plug-in module + get_sset_count: Get number of strings sets that get_strings will count + get_strings: Get strings from requested objects (statistics) + get_stats: Get the 
extended statistics from the PHY device Of these, only config_aneg and read_status are required to be assigned by the driver code. The rest are optional. Also, it is diff --git a/Documentation/networking/rds.txt b/Documentation/networking/rds.txt index e1a3d59bbe0f..9d219d856d46 100644 --- a/Documentation/networking/rds.txt +++ b/Documentation/networking/rds.txt @@ -19,9 +19,7 @@ to N*N if you use a connection-oriented socket transport like TCP. RDS is not Infiniband-specific; it was designed to support different transports. The current implementation used to support RDS over TCP as well -as IB. Work is in progress to support RDS over iWARP, and using DCE to -guarantee no dropped packets on Ethernet, it may be possible to use RDS over -UDP in the future. +as IB. The high-level semantics of RDS from the application's point of view are diff --git a/Documentation/rfkill.txt b/Documentation/rfkill.txt index 2ee6ef9a6554..1f0c27049340 100644 --- a/Documentation/rfkill.txt +++ b/Documentation/rfkill.txt @@ -83,6 +83,8 @@ rfkill drivers that control devices that can be hard-blocked unless they also assign the poll_hw_block() callback (then the rfkill core will poll the device). Don't do this unless you cannot get the event in any other way. +RFKill provides per-switch LED triggers, which can be used to drive LEDs +according to the switch state (LED_FULL when blocked, LED_OFF otherwise). 5. Userspace support |