summaryrefslogtreecommitdiff
path: root/Documentation/kernel-hacking
diff options
context:
space:
mode:
authorMauro Carvalho Chehab <mchehab@s-opensource.com>2017-05-11 09:41:52 -0300
committerMauro Carvalho Chehab <mchehab@s-opensource.com>2017-05-16 08:00:57 -0300
commitdca1e58e3f1c82a840abaafb9328f84ae69a9926 (patch)
treecd501e64af77a09459019627d00fd584a7515dcc /Documentation/kernel-hacking
parentc4fcd7cabb83fcac1e83e17845b1fede5401aa25 (diff)
downloadlwn-dca1e58e3f1c82a840abaafb9328f84ae69a9926.tar.gz
lwn-dca1e58e3f1c82a840abaafb9328f84ae69a9926.zip
kernel-hacking: update document
This document is fairly updated. Yet, some stuff moved to other kernel headers. So, update to point to the right places. While here, adjust some minor ReST markups. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Diffstat (limited to 'Documentation/kernel-hacking')
-rw-r--r--Documentation/kernel-hacking/hacking.rst179
1 files changed, 98 insertions, 81 deletions
diff --git a/Documentation/kernel-hacking/hacking.rst b/Documentation/kernel-hacking/hacking.rst
index 75597e627791..1a456b60a7cf 100644
--- a/Documentation/kernel-hacking/hacking.rst
+++ b/Documentation/kernel-hacking/hacking.rst
@@ -56,7 +56,7 @@ interrupts. You can sleep, by calling :c:func:`schedule()`.
In user context, the ``current`` pointer (indicating the task we are
currently executing) is valid, and :c:func:`in_interrupt()`
-(``include/linux/interrupt.h``) is false.
+(``include/linux/preempt.h``) is false.
.. warning::
@@ -114,12 +114,12 @@ time, although different tasklets can run simultaneously.
Kuznetsov had at the time.
You can tell you are in a softirq (or tasklet) using the
-:c:func:`in_softirq()` macro (``include/linux/interrupt.h``).
+:c:func:`in_softirq()` macro (``include/linux/preempt.h``).
.. warning::
- Beware that this will return a false positive if a bh lock (see
- below) is held.
+ Beware that this will return a false positive if a
+ :ref:`botton half lock <local_bh_disable>` is held.
Some Basic Rules
================
@@ -154,9 +154,7 @@ The Linux kernel is portable
ioctls: Not writing a new system call
=====================================
-A system call generally looks like this
-
-::
+A system call generally looks like this::
asmlinkage long sys_mycall(int arg)
{
@@ -175,7 +173,9 @@ If all your routine does is read or write some parameter, consider
implementing a :c:func:`sysfs()` interface instead.
Inside the ioctl you're in user context to a process. When a error
-occurs you return a negated errno (see ``include/linux/errno.h``),
+occurs you return a negated errno (see
+``include/uapi/asm-generic/errno-base.h``,
+``include/uapi/asm-generic/errno.h`` and ``include/linux/errno.h``),
otherwise you return 0.
After you slept you should check if a signal occurred: the Unix/Linux
@@ -195,9 +195,7 @@ some data structure.
If you're doing longer computations: first think userspace. If you
**really** want to do it in kernel you should regularly check if you need
to give up the CPU (remember there is cooperative multitasking per CPU).
-Idiom:
-
-::
+Idiom::
cond_resched(); /* Will sleep */
@@ -231,26 +229,24 @@ Really.
Common Routines
===============
-:c:func:`printk()` ``include/linux/kernel.h``
----------------------------------------------
+:c:func:`printk()`
+------------------
+
+Defined in ``include/linux/printk.h``
:c:func:`printk()` feeds kernel messages to the console, dmesg, and
the syslog daemon. It is useful for debugging and reporting errors, and
can be used inside interrupt context, but use with caution: a machine
which has its console flooded with printk messages is unusable. It uses
a format string mostly compatible with ANSI C printf, and C string
-concatenation to give it a first "priority" argument:
-
-::
+concatenation to give it a first "priority" argument::
printk(KERN_INFO "i = %u\n", i);
-See ``include/linux/kernel.h``; for other ``KERN_`` values; these are
+See ``include/linux/kern_levels.h``; for other ``KERN_`` values; these are
interpreted by syslog as the level. Special case: for printing an IP
-address use
-
-::
+address use::
__be32 ipaddress;
printk(KERN_INFO "my ip: %pI4\n", &ipaddress);
@@ -270,8 +266,10 @@ overruns. Make sure that will be enough.
on top of its printf function: "Printf should not be used for
chit-chat". You should follow that advice.
-:c:func:`copy_[to/from]_user()` / :c:func:`get_user()` / :c:func:`put_user()` ``include/linux/uaccess.h``
----------------------------------------------------------------------------------------------------------
+:c:func:`copy_to_user()` / :c:func:`copy_from_user()` / :c:func:`get_user()` / :c:func:`put_user()`
+---------------------------------------------------------------------------------------------------
+
+Defined in ``include/linux/uaccess.h`` / ``asm/uaccess.h``
**[SLEEPS]**
@@ -297,8 +295,10 @@ The functions may sleep implicitly. This should never be called outside
user context (it makes no sense), with interrupts disabled, or a
spinlock held.
-:c:func:`kmalloc()`/:c:func:`kfree()` ``include/linux/slab.h``
---------------------------------------------------------------
+:c:func:`kmalloc()`/:c:func:`kfree()`
+-------------------------------------
+
+Defined in ``include/linux/slab.h``
**[MAY SLEEP: SEE BELOW]**
@@ -324,9 +324,9 @@ message, then maybe you called a sleeping allocation function from
interrupt context without ``GFP_ATOMIC``. You should really fix that.
Run, don't walk.
-If you are allocating at least ``PAGE_SIZE`` (``include/asm/page.h``)
-bytes, consider using :c:func:`__get_free_pages()`
-(``include/linux/mm.h``). It takes an order argument (0 for page sized,
+If you are allocating at least ``PAGE_SIZE`` (``asm/page.h`` or
+``asm/page_types.h``) bytes, consider using :c:func:`__get_free_pages()`
+(``include/linux/gfp.h``). It takes an order argument (0 for page sized,
1 for double page, 2 for four pages etc.) and the same memory priority
flag word as above.
@@ -344,24 +344,30 @@ routine.
Before inventing your own cache of often-used objects consider using a
slab cache in ``include/linux/slab.h``
-:c:func:`current()` ``include/asm/current.h``
----------------------------------------------
+:c:func:`current()`
+-------------------
+
+Defined in ``include/asm/current.h``
This global variable (really a macro) contains a pointer to the current
task structure, so is only valid in user context. For example, when a
process makes a system call, this will point to the task structure of
the calling process. It is **not NULL** in interrupt context.
-:c:func:`mdelay()`/:c:func:`udelay()` ``include/asm/delay.h`` ``include/linux/delay.h``
----------------------------------------------------------------------------------------
+:c:func:`mdelay()`/:c:func:`udelay()`
+-------------------------------------
+
+Defined in ``include/asm/delay.h`` / ``include/linux/delay.h``
The :c:func:`udelay()` and :c:func:`ndelay()` functions can be
used for small pauses. Do not use large values with them as you risk
overflow - the helper function :c:func:`mdelay()` is useful here, or
consider :c:func:`msleep()`.
-:c:func:`cpu_to_be32()`/:c:func:`be32_to_cpu()`/:c:func:`cpu_to_le32()`/:c:func:`le32_to_cpu()` ``include/asm/byteorder.h``
----------------------------------------------------------------------------------------------------------------------------
+:c:func:`cpu_to_be32()`/:c:func:`be32_to_cpu()`/:c:func:`cpu_to_le32()`/:c:func:`le32_to_cpu()`
+-----------------------------------------------------------------------------------------------
+
+Defined in ``include/asm/byteorder.h``
The :c:func:`cpu_to_be32()` family (where the "32" can be replaced
by 64 or 16, and the "be" can be replaced by "le") are the general way
@@ -375,8 +381,10 @@ to the given type, and return the converted value. The other variation
is the "in-situ" family, such as :c:func:`cpu_to_be32s()`, which
convert value referred to by the pointer, and return void.
-:c:func:`local_irq_save()`/:c:func:`local_irq_restore()` ``include/linux/irqflags.h``
--------------------------------------------------------------------------------------
+:c:func:`local_irq_save()`/:c:func:`local_irq_restore()`
+--------------------------------------------------------
+
+Defined in ``include/linux/irqflags.h``
These routines disable hard interrupts on the local CPU, and restore
them. They are reentrant; saving the previous state in their one
@@ -384,16 +392,23 @@ them. They are reentrant; saving the previous state in their one
enabled, you can simply use :c:func:`local_irq_disable()` and
:c:func:`local_irq_enable()`.
-:c:func:`local_bh_disable()`/:c:func:`local_bh_enable()` ``include/linux/interrupt.h``
---------------------------------------------------------------------------------------
+.. _local_bh_disable:
+
+:c:func:`local_bh_disable()`/:c:func:`local_bh_enable()`
+--------------------------------------------------------
+
+Defined in ``include/linux/bottom_half.h``
+
These routines disable soft interrupts on the local CPU, and restore
them. They are reentrant; if soft interrupts were disabled before, they
will still be disabled after this pair of functions has been called.
They prevent softirqs and tasklets from running on the current CPU.
-:c:func:`smp_processor_id()`() ``include/asm/smp.h``
-----------------------------------------------------
+:c:func:`smp_processor_id()`
+----------------------------
+
+Defined in ``include/linux/smp.h``
:c:func:`get_cpu()` disables preemption (so you won't suddenly get
moved to another CPU) and returns the current processor number, between
@@ -405,8 +420,10 @@ If you know you cannot be preempted by another task (ie. you are in
interrupt context, or have preemption disabled) you can use
smp_processor_id().
-``__init``/``__exit``/``__initdata`` ``include/linux/init.h``
--------------------------------------------------------------
+``__init``/``__exit``/``__initdata``
+------------------------------------
+
+Defined in ``include/linux/init.h``
After boot, the kernel frees up a special section; functions marked with
``__init`` and data structures marked with ``__initdata`` are dropped
@@ -415,10 +432,13 @@ initialization. ``__exit`` is used to declare a function which is only
required on exit: the function will be dropped if this file is not
compiled as a module. See the header file for use. Note that it makes no
sense for a function marked with ``__init`` to be exported to modules
-with :c:func:`EXPORT_SYMBOL()` - this will break.
+with :c:func:`EXPORT_SYMBOL()` or :c:func:`EXPORT_SYMBOL_GPL()`- this
+will break.
+
+:c:func:`__initcall()`/:c:func:`module_init()`
+----------------------------------------------
-:c:func:`__initcall()`/:c:func:`module_init()` ``include/linux/init.h``
------------------------------------------------------------------------
+Defined in ``include/linux/init.h`` / ``include/linux/module.h``
Many parts of the kernel are well served as a module
(dynamically-loadable parts of the kernel). Using the
@@ -438,8 +458,11 @@ to fail (unfortunately, this has no effect if the module is compiled
into the kernel). This function is called in user context with
interrupts enabled, so it can sleep.
-:c:func:`module_exit()` ``include/linux/init.h``
-------------------------------------------------
+:c:func:`module_exit()`
+-----------------------
+
+
+Defined in ``include/linux/module.h``
This macro defines the function to be called at module removal time (or
never, in the case of the file compiled into the kernel). It will only
@@ -450,8 +473,10 @@ it returns.
Note that this macro is optional: if it is not present, your module will
not be removable (except for 'rmmod -f').
-:c:func:`try_module_get()`/:c:func:`module_put()` ``include/linux/module.h``
-----------------------------------------------------------------------------
+:c:func:`try_module_get()`/:c:func:`module_put()`
+-------------------------------------------------
+
+Defined in ``include/linux/module.h``
These manipulate the module usage count, to protect against removal (a
module also can't be removed if another module uses one of its exported
@@ -472,8 +497,8 @@ Wait Queues ``include/linux/wait.h``
A wait queue is used to wait for someone to wake you up when a certain
condition is true. They must be used carefully to ensure there is no
-race condition. You declare a ``wait_queue_head_t``, and then processes
-which want to wait for that condition declare a ``wait_queue_t``
+race condition. You declare a :c:type:`wait_queue_head_t`, and then processes
+which want to wait for that condition declare a :c:type:`wait_queue_t`
referring to themselves, and place that in the queue.
Declaring
@@ -490,15 +515,15 @@ Queuing
Placing yourself in the waitqueue is fairly complex, because you must
put yourself in the queue before checking the condition. There is a
macro to do this: :c:func:`wait_event_interruptible()`
-``include/linux/wait.h`` The first argument is the wait queue head, and
+(``include/linux/wait.h``) The first argument is the wait queue head, and
the second is an expression which is evaluated; the macro returns 0 when
-this expression is true, or -ERESTARTSYS if a signal is received. The
+this expression is true, or ``-ERESTARTSYS`` if a signal is received. The
:c:func:`wait_event()` version ignores signals.
Waking Up Queued Tasks
----------------------
-Call :c:func:`wake_up()` ``include/linux/wait.h``;, which will wake
+Call :c:func:`wake_up()` (``include/linux/wait.h``);, which will wake
up every process in the queue. The exception is if one has
``TASK_EXCLUSIVE`` set, in which case the remainder of the queue will
not be woken. There are other variants of this basic function available
@@ -508,9 +533,9 @@ Atomic Operations
=================
Certain operations are guaranteed atomic on all platforms. The first
-class of operations work on ``atomic_t`` ``include/asm/atomic.h``; this
-contains a signed integer (at least 32 bits long), and you must use
-these functions to manipulate or read atomic_t variables.
+class of operations work on :c:type:`atomic_t` (``include/asm/atomic.h``);
+this contains a signed integer (at least 32 bits long), and you must use
+these functions to manipulate or read :c:type:`atomic_t` variables.
:c:func:`atomic_read()` and :c:func:`atomic_set()` get and set
the counter, :c:func:`atomic_add()`, :c:func:`atomic_sub()`,
:c:func:`atomic_inc()`, :c:func:`atomic_dec()`, and
@@ -534,7 +559,7 @@ true if the bit was previously set; these are particularly useful for
atomically setting flags.
It is possible to call these operations with bit indices greater than
-BITS_PER_LONG. The resulting behavior is strange on big-endian
+``BITS_PER_LONG``. The resulting behavior is strange on big-endian
platforms though so it is a good idea not to do this.
Symbols
@@ -546,14 +571,18 @@ be used anywhere in the kernel). However, for modules, a special
exported symbol table is kept which limits the entry points to the
kernel proper. Modules can also export symbols.
-:c:func:`EXPORT_SYMBOL()` ``include/linux/export.h``
-----------------------------------------------------
+:c:func:`EXPORT_SYMBOL()`
+-------------------------
+
+Defined in ``include/linux/export.h``
This is the classic method of exporting a symbol: dynamically loaded
modules will be able to use the symbol as normal.
-:c:func:`EXPORT_SYMBOL_GPL()` ``include/linux/export.h``
---------------------------------------------------------
+:c:func:`EXPORT_SYMBOL_GPL()`
+-----------------------------
+
+Defined in ``include/linux/export.h``
Similar to :c:func:`EXPORT_SYMBOL()` except that the symbols
exported by :c:func:`EXPORT_SYMBOL_GPL()` can only be seen by
@@ -579,11 +608,11 @@ Return Conventions
------------------
For code called in user context, it's very common to defy C convention,
-and return 0 for success, and a negative error number (eg. -EFAULT) for
+and return 0 for success, and a negative error number (eg. ``-EFAULT``) for
failure. This can be unintuitive at first, but it's fairly widespread in
the kernel.
-Using :c:func:`ERR_PTR()` ``include/linux/err.h``; to encode a
+Using :c:func:`ERR_PTR()` (``include/linux/err.h``) to encode a
negative error number into a pointer, and :c:func:`IS_ERR()` and
:c:func:`PTR_ERR()` to get it back out again: avoids a separate
pointer parameter for the error number. Icky, but in a good way.
@@ -603,9 +632,7 @@ Initializing structure members
------------------------------
The preferred method of initializing structures is to use designated
-initialisers, as defined by ISO C99, eg:
-
-::
+initialisers, as defined by ISO C99, eg::
static struct block_device_operations opt_fops = {
.open = opt_open,
@@ -716,18 +743,14 @@ Kernel Cantrips
Some favorites from browsing the source. Feel free to add to this list.
-``arch/x86/include/asm/delay.h:``
-
-::
+``arch/x86/include/asm/delay.h``::
#define ndelay(n) (__builtin_constant_p(n) ? \
((n) > 20000 ? __bad_ndelay() : __const_udelay((n) * 5ul)) : \
__ndelay(n))
-``include/linux/fs.h``:
-
-::
+``include/linux/fs.h``::
/*
* Kernel pointers have redundant information, so we can use a
@@ -741,9 +764,7 @@ Some favorites from browsing the source. Feel free to add to this list.
#define PTR_ERR(ptr) ((long)(ptr))
#define IS_ERR(ptr) ((unsigned long)(ptr) > (unsigned long)(-1000))
-``arch/x86/include/asm/uaccess_32.h:``
-
-::
+``arch/x86/include/asm/uaccess_32.h:``::
#define copy_to_user(to,from,n) \
(__builtin_constant_p(n) ? \
@@ -751,9 +772,7 @@ Some favorites from browsing the source. Feel free to add to this list.
__generic_copy_to_user((to),(from),(n)))
-``arch/sparc/kernel/head.S:``
-
-::
+``arch/sparc/kernel/head.S:``::
/*
* Sun people can't spell worth damn. "compatability" indeed.
@@ -772,9 +791,7 @@ Some favorites from browsing the source. Feel free to add to this list.
.asciz "compatible"
-``arch/sparc/lib/checksum.S:``
-
-::
+``arch/sparc/lib/checksum.S:``::
/* Sun, you just can't beat me, you just can't. Stop trying,
* give up. I'm serious, I am going to kick the living shit