Merge branch 'mauro' into docs-mw

Mauro says: This patch series change how kdoc parser handles macro replacements. Instead of heavily relying on regular expressions that can sometimes be very complex, it uses a C lexical tokenizer. This ensures that BEGIN/END blocks on functions and structs are properly handled, even when nested. Checking before/after the patch series, for both man pages and rst only had: - whitespace differences; - struct_group macros now are shown as inner anonimous structs as it should be. Also, I didn't notice any relevant change on the documentation build time. With that regards, right now, every time a CMatch replacement rule takes in place, it does: for each transform: - tokenizes the source code; - handle CMatch; - convert tokens back to a string. A possible optimization would be to do, instead: - tokenizes source code; - for each transform handle CMatch; - convert tokens back to a string. For now, I opted not do do it, because: - too much changes on a single row; - docs build time is taking ~3:30 minutes, which is about the same time it ws taken before the changes; - there is a very dirty hack inside function_xforms: (KernRe(r"_noprof"), ""). This is meant to change function prototypes instead of function arguments. So, if ok for you, I would prefer to merge this one first. We can later optimize kdoc_parser to avoid multiple token <-> string conversions. - One important aspect of this series is that it introduces unittests for kernel-doc. I used it a lot during the development of this series, to ensure that the changes I was doing were producing the expected results. Tests are on two separate files that can be executed directly. Alternatively, there is a run.py script that runs all of them (and any other python script named tools/unittests/test_*.py"): $ tools/unittests/run.py test_cmatch: TestSearch: test_search_acquires_multiple: OK test_search_acquires_nested_paren: OK test_search_acquires_simple: OK test_search_must_hold: OK test_search_must_hold_shared: OK test_search_no_false_positive: OK test_search_no_function: OK test_search_no_macro_remains: OK TestSubMultipleMacros: test_acquires_multiple: OK test_acquires_nested_paren: OK test_acquires_simple: OK test_mixed_macros: OK test_must_hold: OK test_must_hold_shared: OK test_no_false_positive: OK test_no_function: OK test_no_macro_remains: OK TestSubSimple: test_rise_early_greedy: OK test_rise_multiple_greedy: OK test_strip_multiple_acquires: OK test_sub_count_parameter: OK test_sub_mixed_placeholders: OK test_sub_multiple_placeholders: OK test_sub_no_placeholder: OK test_sub_single_placeholder: OK test_sub_with_capture: OK test_sub_zero_placeholder: OK TestSubWithLocalXforms: test_functions_with_acquires_and_releases: OK test_raw_struct_group: OK test_raw_struct_group_tagged: OK test_struct_group: OK test_struct_group_attr: OK test_struct_group_tagged_with_private: OK test_struct_kcov: OK test_vars_stackdepot: OK test_tokenizer: TestPublicPrivate: test_balanced_inner_private: OK test_balanced_non_greddy_private: OK test_balanced_private: OK test_no private: OK test_unbalanced_inner_private: OK test_unbalanced_private: OK test_unbalanced_struct_group_tagged_with_private: OK test_unbalanced_two_struct_group_tagged_first_with_private: OK test_unbalanced_without_end_of_line: OK TestTokenizer: test_basic_tokens: OK test_depth_counters: OK test_mismatch_error: OK Ran 47 tests
author: Jonathan Corbet <corbet@lwn.net> 2026-03-22 15:06:59 -0600
committer: Jonathan Corbet <corbet@lwn.net> 2026-03-22 15:06:59 -0600
commit: 781171bec0650c00c642564afcb5cce57abda5bf (patch)
tree: 31e8cf6d41b40bef9f432ebbe4a8180f32af9fbf /tools
parent: 6108c809f4fd9dbb1a138ba4326d645cc3113a8d (diff)
parent: 7538df7a2d7d26428803cf8053476169a6d28659 (diff)
download: lwn-781171bec0650c00c642564afcb5cce57abda5bf.tar.gz
lwn-781171bec0650c00c642564afcb5cce57abda5bf.zip
8 files changed, 2437 insertions, 342 deletions
diff --git a/tools/lib/python/kdoc/c_lex.py b/tools/lib/python/kdoc/c_lex.py
new file mode 100644
index 000000000000..b6d58bd470a9
--- /dev/null
+++ b/tools/lib/python/kdoc/c_lex.py
@@ -0,0 +1,655 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+# Copyright(c) 2025: Mauro Carvalho Chehab <mchehab@kernel.org>.
+
+"""
+Regular expression ancillary classes.
+
+Those help caching regular expressions and do matching for kernel-doc.
+
+Please notice that the code here may rise exceptions to indicate bad
+usage inside kdoc to indicate problems at the replace pattern.
+
+Other errors are logged via log instance.
+"""
+
+import logging
+import re
+
+from copy import copy
+
+from .kdoc_re import KernRe
+
+log = logging.getLogger(__name__)
+
+def tokenizer_set_log(logger, prefix = ""):
+    """
+    Replace the module‑level logger with a LoggerAdapter that
+    prepends *prefix* to every message.
+    """
+    global log
+
+    class PrefixAdapter(logging.LoggerAdapter):
+        """
+        Ancillary class to set prefix on all message logs.
+        """
+        def process(self, msg, kwargs):
+            return f"{prefix}{msg}", kwargs
+
+    # Wrap the provided logger in our adapter
+    log = PrefixAdapter(logger, {"prefix": prefix})
+
+class CToken():
+    """
+    Data class to define a C token.
+    """
+
+    # Tokens that can be used by the parser. Works like an C enum.
+
+    COMMENT = 0     #: A standard C or C99 comment, including delimiter.
+    STRING = 1      #: A string, including quotation marks.
+    CHAR = 2        #: A character, including apostophes.
+    NUMBER = 3      #: A number.
+    PUNC = 4        #: A puntuation mark: / ``,`` / ``.``.
+    BEGIN = 5       #: A begin character: ``{`` / ``[`` / ``(``.
+    END = 6         #: A end character: ``}`` / ``]`` / ``)``.
+    CPP = 7         #: A preprocessor macro.
+    HASH = 8        #: The hash character - useful to handle other macros.
+    OP = 9          #: A C operator (add, subtract, ...).
+    STRUCT = 10     #: A ``struct`` keyword.
+    UNION = 11      #: An ``union`` keyword.
+    ENUM = 12       #: A ``struct`` keyword.
+    TYPEDEF = 13    #: A ``typedef`` keyword.
+    NAME = 14       #: A name. Can be an ID or a type.
+    SPACE = 15      #: Any space characters, including new lines
+    ENDSTMT = 16    #: End of an statement (``;``).
+
+    BACKREF = 17    #: Not a valid C sequence, but used at sub regex patterns.
+
+    MISMATCH = 255  #: an error indicator: should never happen in practice.
+
+    # Dict to convert from an enum interger into a string.
+    _name_by_val = {v: k for k, v in dict(vars()).items() if isinstance(v, int)}
+
+    # Dict to convert from string to an enum-like integer value.
+    _name_to_val = {k: v for v, k in _name_by_val.items()}
+
+    @staticmethod
+    def to_name(val):
+        """Convert from an integer value from CToken enum into a string"""
+
+        return CToken._name_by_val.get(val, f"UNKNOWN({val})")
+
+    @staticmethod
+    def from_name(name):
+        """Convert a string into a CToken enum value"""
+        if name in CToken._name_to_val:
+            return CToken._name_to_val[name]
+
+        return CToken.MISMATCH
+
+
+    def __init__(self, kind, value=None, pos=0,
+                 brace_level=0, paren_level=0, bracket_level=0):
+        self.kind = kind
+        self.value = value
+        self.pos = pos
+        self.level = (bracket_level, paren_level, brace_level)
+
+    def __repr__(self):
+        name = self.to_name(self.kind)
+        if isinstance(self.value, str):
+            value = '"' + self.value + '"'
+        else:
+            value = self.value
+
+        return f"CToken(CToken.{name}, {value}, {self.pos}, {self.level})"
+
+#: Regexes to parse C code, transforming it into tokens.
+RE_SCANNER_LIST = [
+    #
+    # Note that \s\S is different than .*, as it also catches \n
+    #
+    (CToken.COMMENT, r"//[^\n]*|/\*[\s\S]*?\*/"),
+
+    (CToken.STRING,  r'"(?:\\.|[^"\\])*"'),
+    (CToken.CHAR,    r"'(?:\\.|[^'\\])'"),
+
+    (CToken.NUMBER,  r"0[xX][\da-fA-F]+[uUlL]*|0[0-7]+[uUlL]*|"
+                     r"\d+(?:\.\d*)?(?:[eE][+-]?\d+)?[fFlL]*"),
+
+    (CToken.ENDSTMT, r"(?:\s+;|;)"),
+
+    (CToken.PUNC,    r"[,\.]"),
+
+    (CToken.BEGIN,   r"[\[\(\{]"),
+
+    (CToken.END,     r"[\]\)\}]"),
+
+    (CToken.CPP,     r"#\s*(?:define|include|ifdef|ifndef|if|else|elif|endif|undef|pragma)\b"),
+
+    (CToken.HASH,    r"#"),
+
+    (CToken.OP,      r"\+\+|\-\-|\->|==|\!=|<=|>=|&&|\|\||<<|>>|\+=|\-=|\*=|/=|%="
+                     r"|&=|\|=|\^=|[=\+\-\*/%<>&\|\^~!\?\:]"),
+
+    (CToken.STRUCT,  r"\bstruct\b"),
+    (CToken.UNION,   r"\bunion\b"),
+    (CToken.ENUM,    r"\benum\b"),
+    (CToken.TYPEDEF, r"\btypedef\b"),
+
+    (CToken.NAME,    r"[A-Za-z_]\w*"),
+
+    (CToken.SPACE,   r"\s+"),
+
+    (CToken.BACKREF, r"\\\d+"),
+
+    (CToken.MISMATCH,r"."),
+]
+
+def fill_re_scanner(token_list):
+    """Ancillary routine to convert RE_SCANNER_LIST into a finditer regex"""
+    re_tokens = []
+
+    for kind, pattern in token_list:
+        name = CToken.to_name(kind)
+        re_tokens.append(f"(?P<{name}>{pattern})")
+
+    return KernRe("|".join(re_tokens), re.MULTILINE | re.DOTALL)
+
+#: Handle C continuation lines.
+RE_CONT = KernRe(r"\\\n")
+
+RE_COMMENT_START = KernRe(r'/\*\s*')
+
+#: tokenizer regex. Will be filled at the first CTokenizer usage.
+RE_SCANNER = fill_re_scanner(RE_SCANNER_LIST)
+
+
+class CTokenizer():
+    """
+    Scan C statements and definitions and produce tokens.
+
+    When converted to string, it drops comments and handle public/private
+    values, respecting depth.
+    """
+
+    # This class is inspired and follows the basic concepts of:
+    #   https://docs.python.org/3/library/re.html#writing-a-tokenizer
+
+    def __init__(self, source=None, log=None):
+        """
+        Create a regular expression to handle RE_SCANNER_LIST.
+
+        While I generally don't like using regex group naming via:
+            (?P<name>...)
+
+        in this particular case, it makes sense, as we can pick the name
+        when matching a code via RE_SCANNER.
+        """
+
+        self.tokens = []
+
+        if not source:
+            return
+
+        if isinstance(source, list):
+            self.tokens = source
+            return
+
+        #
+        # While we could just use _tokenize directly via interator,
+        # As we'll need to use the tokenizer several times inside kernel-doc
+        # to handle macro transforms, cache the results on a list, as
+        # re-using it is cheaper than having to parse everytime.
+        #
+        for tok in self._tokenize(source):
+            self.tokens.append(tok)
+
+    def _tokenize(self, source):
+        """
+        Iterator that parses ``source``, splitting it into tokens, as defined
+        at ``self.RE_SCANNER_LIST``.
+
+        The interactor returns a CToken class object.
+        """
+
+        # Handle continuation lines. Note that kdoc_parser already has a
+        # logic to do that. Still, let's keep it for completeness, as we might
+        # end re-using this tokenizer outsize kernel-doc some day - or we may
+        # eventually remove from there as a future cleanup.
+        source = RE_CONT.sub("", source)
+
+        brace_level = 0
+        paren_level = 0
+        bracket_level = 0
+
+        for match in RE_SCANNER.finditer(source):
+            kind = CToken.from_name(match.lastgroup)
+            pos = match.start()
+            value = match.group()
+
+            if kind == CToken.MISMATCH:
+                log.error(f"Unexpected token '{value}' on pos {pos}:\n\t'{source}'")
+            elif kind == CToken.BEGIN:
+                if value == '(':
+                    paren_level += 1
+                elif value == '[':
+                    bracket_level += 1
+                else:  # value == '{'
+                    brace_level += 1
+
+            elif kind == CToken.END:
+                if value == ')' and paren_level > 0:
+                    paren_level -= 1
+                elif value == ']' and bracket_level > 0:
+                    bracket_level -= 1
+                elif brace_level > 0:    # value == '}'
+                    brace_level -= 1
+
+            yield CToken(kind, value, pos,
+                         brace_level, paren_level, bracket_level)
+
+    def __str__(self):
+        out=""
+        show_stack = [True]
+
+        for i, tok in enumerate(self.tokens):
+            if tok.kind == CToken.BEGIN:
+                show_stack.append(show_stack[-1])
+
+            elif tok.kind == CToken.END:
+                prev = show_stack[-1]
+                if len(show_stack) > 1:
+                    show_stack.pop()
+
+                if not prev and show_stack[-1]:
+                    #
+                    # Try to preserve indent
+                    #
+                    out += "\t" * (len(show_stack) - 1)
+
+                    out += str(tok.value)
+                    continue
+
+            elif tok.kind == CToken.COMMENT:
+                comment = RE_COMMENT_START.sub("", tok.value)
+
+                if comment.startswith("private:"):
+                    show_stack[-1] = False
+                    show = False
+                elif comment.startswith("public:"):
+                    show_stack[-1] = True
+
+                continue
+
+            if not show_stack[-1]:
+                continue
+
+            if i < len(self.tokens) - 1:
+                next_tok = self.tokens[i + 1]
+
+                # Do some cleanups before ";"
+
+                if tok.kind == CToken.SPACE and next_tok.kind == CToken.ENDSTMT:
+                    continue
+
+                if tok.kind == CToken.ENDSTMT and next_tok.kind == tok.kind:
+                    continue
+
+            out += str(tok.value)
+
+        return out
+
+
+class CTokenArgs:
+    """
+    Ancillary class to help using backrefs from sub matches.
+
+    If the highest backref contain a "+" at the last element,
+    the logic will be greedy, picking all other delims.
+
+    This is needed to parse struct_group macros with end with ``MEMBERS...``.
+    """
+    def __init__(self, sub_str):
+        self.sub_groups = set()
+        self.max_group = -1
+        self.greedy = None
+
+        for m in KernRe(r'\\(\d+)([+]?)').finditer(sub_str):
+            group = int(m.group(1))
+            if m.group(2) == "+":
+                if self.greedy and self.greedy != group:
+                    raise ValueError("There are multiple greedy patterns!")
+                self.greedy = group
+
+            self.sub_groups.add(group)
+            self.max_group = max(self.max_group, group)
+
+        if self.greedy:
+            if self.greedy != self.max_group:
+                raise ValueError("Greedy pattern is not the last one!")
+
+            sub_str = KernRe(r'(\\\d+)[+]').sub(r"\1", sub_str)
+
+        self.sub_str = sub_str
+        self.sub_tokeninzer = CTokenizer(sub_str)
+
+    def groups(self, new_tokenizer):
+        """
+        Create replacement arguments for backrefs like:
+
+        ``\0``, ``\1``, ``\2``, ...``\n``
+
+        It also accepts a ``+`` character to the highest backref. When used,
+        it means in practice to ignore delimins after it, being greedy.
+
+        The logic is smart enough to only go up to the maximum required
+        argument, even if there are more.
+
+        If there is a backref for an argument above the limit, it will
+        raise an exception. Please notice that, on C, square brackets
+        don't have any separator on it. Trying to use ``\1``..``\n`` for
+        brackets also raise an exception.
+        """
+
+        level = (0, 0, 0)
+
+        if self.max_group < 0:
+            return level, []
+
+        tokens = new_tokenizer.tokens
+
+        #
+        # Fill \0 with the full token contents
+        #
+        groups_list = [ [] ]
+
+        if 0 in self.sub_groups:
+            inner_level = 0
+
+            for i in range(0, len(tokens)):
+                tok = tokens[i]
+
+                if tok.kind == CToken.BEGIN:
+                    inner_level += 1
+
+                    #
+                    # Discard first begin
+                    #
+                    if not groups_list[0]:
+                        continue
+                elif tok.kind == CToken.END:
+                    inner_level -= 1
+                    if inner_level < 0:
+                        break
+
+                if inner_level:
+                    groups_list[0].append(tok)
+
+        if not self.max_group:
+            return level, groups_list
+
+        delim = None
+
+        #
+        # Ignore everything before BEGIN. The value of begin gives the
+        # delimiter to be used for the matches
+        #
+        for i in range(0, len(tokens)):
+            tok = tokens[i]
+            if tok.kind == CToken.BEGIN:
+                if tok.value == "{":
+                    delim = ";"
+                elif tok.value == "(":
+                    delim = ","
+                else:
+                    self.log.error(fr"Can't handle \1..\n on {sub_str}")
+
+                level = tok.level
+                break
+
+        pos = 1
+        groups_list.append([])
+
+        inner_level = 0
+        for i in range(i + 1, len(tokens)):
+            tok = tokens[i]
+
+            if tok.kind == CToken.BEGIN:
+                inner_level += 1
+            if tok.kind == CToken.END:
+                inner_level -= 1
+                if inner_level < 0:
+                    break
+
+            if tok.kind in [CToken.PUNC, CToken.ENDSTMT] and delim == tok.value:
+                pos += 1
+                if self.greedy and pos > self.max_group:
+                    pos -= 1
+                else:
+                    groups_list.append([])
+
+                    if pos > self.max_group:
+                        break
+
+                    continue
+
+            groups_list[pos].append(tok)
+
+        if pos < self.max_group:
+            log.error(fr"{self.sub_str} groups are up to {pos} instead of {self.max_group}")
+
+        return level, groups_list
+
+    def tokens(self, new_tokenizer):
+        level, groups = self.groups(new_tokenizer)
+
+        new = CTokenizer()
+
+        for tok in self.sub_tokeninzer.tokens:
+            if tok.kind == CToken.BACKREF:
+                group = int(tok.value[1:])
+
+                for group_tok in groups[group]:
+                    new_tok = copy(group_tok)
+
+                    new_level = [0, 0, 0]
+
+                    for i in range(0, len(level)):
+                        new_level[i] = new_tok.level[i] + level[i]
+
+                    new_tok.level = tuple(new_level)
+
+                    new.tokens += [ new_tok ]
+            else:
+                new.tokens += [ tok ]
+
+        return new.tokens
+
+
+class CMatch:
+    """
+    Finding nested delimiters is hard with regular expressions. It is
+    even harder on Python with its normal re module, as there are several
+    advanced regular expressions that are missing.
+
+    This is the case of this pattern::
+
+            '\\bSTRUCT_GROUP(\\(((?:(?>[^)(]+)|(?1))*)\\))[^;]*;'
+
+    which is used to properly match open/close parentheses of the
+    string search STRUCT_GROUP(),
+
+    Add a class that counts pairs of delimiters, using it to match and
+    replace nested expressions.
+
+    The original approach was suggested by:
+
+        https://stackoverflow.com/questions/5454322/python-how-to-match-nested-parentheses-with-regex
+
+    Although I re-implemented it to make it more generic and match 3 types
+    of delimiters. The logic checks if delimiters are paired. If not, it
+    will ignore the search string.
+    """
+
+
+    def __init__(self, regex, delim="("):
+        self.regex = KernRe("^" + regex + r"\b")
+        self.start_delim = delim
+
+    def _search(self, tokenizer):
+        """
+        Finds paired blocks for a regex that ends with a delimiter.
+
+        The suggestion of using finditer to match pairs came from:
+        https://stackoverflow.com/questions/5454322/python-how-to-match-nested-parentheses-with-regex
+        but I ended using a different implementation to align all three types
+        of delimiters and seek for an initial regular expression.
+
+        The algorithm seeks for open/close paired delimiters and places them
+        into a stack, yielding a start/stop position of each match when the
+        stack is zeroed.
+
+        The algorithm should work fine for properly paired lines, but will
+        silently ignore end delimiters that precede a start delimiter.
+        This should be OK for kernel-doc parser, as unaligned delimiters
+        would cause compilation errors. So, we don't need to raise exceptions
+        to cover such issues.
+        """
+
+        start = None
+        started = False
+
+        import sys
+
+        stack = []
+
+        for i, tok in enumerate(tokenizer.tokens):
+            if start is None:
+                if tok.kind == CToken.NAME and self.regex.match(tok.value):
+                    start = i
+                    stack.append((start, tok.level))
+                    started = False
+
+                continue
+
+            if not started:
+                if tok.kind == CToken.SPACE:
+                    continue
+
+                if tok.kind == CToken.BEGIN and tok.value == self.start_delim:
+                    started = True
+                    continue
+
+                # Name only token without BEGIN/END
+                if i > start:
+                    i -= 1
+                yield start, i
+                start = None
+
+            if tok.kind == CToken.END and tok.level == stack[-1][1]:
+                start, level = stack.pop()
+
+                yield start, i
+                start = None
+
+        #
+        # If an END zeroing levels is not there, return remaining stuff
+        # This is meant to solve cases where the caller logic might be
+        # picking an incomplete block.
+        #
+        if start and stack:
+            if started:
+                s = str(tokenizer)
+                log.warning(f"can't find a final end at {s}")
+
+            yield start, len(tokenizer.tokens)
+
+    def search(self, source):
+        """
+        This is similar to re.search:
+
+        It matches a regex that it is followed by a delimiter,
+        returning occurrences only if all delimiters are paired.
+        """
+
+        if isinstance(source, CTokenizer):
+            tokenizer = source
+            is_token = True
+        else:
+            tokenizer = CTokenizer(source)
+            is_token = False
+
+        for start, end in self._search(tokenizer):
+            new_tokenizer = CTokenizer(tokenizer.tokens[start:end + 1])
+
+            if is_token:
+                yield new_tokenizer
+            else:
+                yield str(new_tokenizer)
+
+    def sub(self, sub_str, source, count=0):
+        """
+        This is similar to re.sub:
+
+        It matches a regex that it is followed by a delimiter,
+        replacing occurrences only if all delimiters are paired.
+
+        if the sub argument contains::
+
+            r'\0'
+
+        it will work just like re: it places there the matched paired data
+        with the delimiter stripped.
+
+        If count is different than zero, it will replace at most count
+        items.
+        """
+        if isinstance(source, CTokenizer):
+            is_token = True
+            tokenizer = source
+        else:
+            is_token = False
+            tokenizer = CTokenizer(source)
+
+        # Detect if sub_str contains sub arguments
+
+        args_match = CTokenArgs(sub_str)
+
+        new_tokenizer = CTokenizer()
+        pos = 0
+        n = 0
+
+        #
+        # NOTE: the code below doesn't consider overlays at sub.
+        # We may need to add some extra unit tests to check if those
+        # would cause problems. When replacing by "", this should not
+        # be a problem, but other transformations could be problematic
+        #
+        for start, end in self._search(tokenizer):
+            new_tokenizer.tokens += tokenizer.tokens[pos:start]
+
+            new = CTokenizer(tokenizer.tokens[start:end + 1])
+
+            new_tokenizer.tokens += args_match.tokens(new)
+
+            pos = end + 1
+
+            n += 1
+            if count and n >= count:
+                break
+
+        new_tokenizer.tokens += tokenizer.tokens[pos:]
+
+        if not is_token:
+            return str(new_tokenizer)
+
+        return new_tokenizer
+
+    def __repr__(self):
+        """
+        Returns a displayable version of the class init.
+        """
+
+        return f'CMatch("{self.regex.regex.pattern}")'
diff --git a/tools/lib/python/kdoc/kdoc_parser.py b/tools/lib/python/kdoc/kdoc_parser.py
index edf70ba139a5..f6c4ee3b18c9 100644
--- a/tools/lib/python/kdoc/kdoc_parser.py
+++ b/tools/lib/python/kdoc/kdoc_parser.py
@@ -13,7 +13,8 @@ import sys
 import re
 from pprint import pformat
 
-from kdoc.kdoc_re import NestedMatch, KernRe
+from kdoc.c_lex import CTokenizer, tokenizer_set_log
+from kdoc.kdoc_re import KernRe
 from kdoc.kdoc_item import KdocItem
 
 #
@@ -84,15 +85,9 @@ def trim_private_members(text):
     """
     Remove ``struct``/``enum`` members that have been marked "private".
     """
-    # First look for a "public:" block that ends a private region, then
-    # handle the "private until the end" case.
-    #
-    text = KernRe(r'/\*\s*private:.*?/\*\s*public:.*?\*/', flags=re.S).sub('', text)
-    text = KernRe(r'/\*\s*private:.*', flags=re.S).sub('', text)
-    #
-    # We needed the comments to do the above, but now we can take them out.
-    #
-    return KernRe(r'\s*/\*.*?\*/\s*', flags=re.S).sub('', text).strip()
+
+    tokens = CTokenizer(text)
+    return str(tokens)
 
 class state:
     """
@@ -258,6 +253,8 @@ class KernelDoc:
         self.config = config
         self.xforms = xforms
 
+        tokenizer_set_log(self.config.log, f"{self.fname}: CMatch: ")
+
         # Initial state for the state machines
         self.state = state.NORMAL
 
@@ -726,6 +723,7 @@ class KernelDoc:
         #
         # Do the basic parse to get the pieces of the declaration.
         #
+        proto = trim_private_members(proto)
         struct_parts = self.split_struct_proto(proto)
         if not struct_parts:
             self.emit_msg(ln, f"{proto} error: Cannot parse struct or union!")
@@ -739,7 +737,6 @@ class KernelDoc:
         #
         # Go through the list of members applying all of our transformations.
         #
-        members = trim_private_members(members)
         members = self.xforms.apply("struct", members)
 
         #
@@ -766,6 +763,7 @@ class KernelDoc:
         # Strip preprocessor directives.  Note that this depends on the
         # trailing semicolon we added in process_proto_type().
         #
+        proto = trim_private_members(proto)
         proto = KernRe(r'#\s*((define|ifdef|if)\s+|endif)[^;]*;', flags=re.S).sub('', proto)
         #
         # Parse out the name and members of the enum.  Typedef form first.
@@ -773,7 +771,7 @@ class KernelDoc:
         r = KernRe(r'typedef\s+enum\s*\{(.*)\}\s*(\w*)\s*;')
         if r.search(proto):
             declaration_name = r.group(2)
-            members = trim_private_members(r.group(1))
+            members = r.group(1)
         #
         # Failing that, look for a straight enum
         #
@@ -781,7 +779,7 @@ class KernelDoc:
             r = KernRe(r'enum\s+(\w*)\s*\{(.*)\}')
             if r.match(proto):
                 declaration_name = r.group(1)
-                members = trim_private_members(r.group(2))
+                members = r.group(2)
         #
         # OK, this isn't going to work.
         #
@@ -810,9 +808,10 @@ class KernelDoc:
         member_set = set()
         members = KernRe(r'\([^;)]*\)').sub('', members)
         for arg in members.split(','):
-            if not arg:
-                continue
             arg = KernRe(r'^\s*(\w+).*').sub(r'\1', arg)
+            if not arg.strip():
+                continue
+
             self.entry.parameterlist.append(arg)
             if arg not in self.entry.parameterdescs:
                 self.entry.parameterdescs[arg] = self.undescribed
@@ -1355,6 +1354,12 @@ class KernelDoc:
         elif doc_content.search(line):
             self.emit_msg(ln, f"Incorrect use of kernel-doc format: {line}")
             self.state = state.PROTO
+
+            #
+            # Don't let it add partial comments at the code, as breaks the
+            # logic meant to remove comments from prototypes.
+            #
+            self.process_proto_type(ln, "/**\n" + line)
         # else ... ??
 
     def process_inline_text(self, ln, line):
diff --git a/tools/lib/python/kdoc/kdoc_re.py b/tools/lib/python/kdoc/kdoc_re.py
index 085b89a4547c..6f3ae28859ea 100644
--- a/tools/lib/python/kdoc/kdoc_re.py
+++ b/tools/lib/python/kdoc/kdoc_re.py
@@ -140,204 +140,3 @@ class KernRe:
         """
 
         return self.last_match.groups()
-
-#: Nested delimited pairs (brackets and parenthesis)
-DELIMITER_PAIRS = {
-    '{': '}',
-    '(': ')',
-    '[': ']',
-}
-
-#: compiled delimiters
-RE_DELIM = KernRe(r'[\{\}\[\]\(\)]')
-
-
-class NestedMatch:
-    """
-    Finding nested delimiters is hard with regular expressions. It is
-    even harder on Python with its normal re module, as there are several
-    advanced regular expressions that are missing.
-
-    This is the case of this pattern::
-
-            '\\bSTRUCT_GROUP(\\(((?:(?>[^)(]+)|(?1))*)\\))[^;]*;'
-
-    which is used to properly match open/close parentheses of the
-    string search STRUCT_GROUP(),
-
-    Add a class that counts pairs of delimiters, using it to match and
-    replace nested expressions.
-
-    The original approach was suggested by:
-
-        https://stackoverflow.com/questions/5454322/python-how-to-match-nested-parentheses-with-regex
-
-    Although I re-implemented it to make it more generic and match 3 types
-    of delimiters. The logic checks if delimiters are paired. If not, it
-    will ignore the search string.
-    """
-
-    # TODO: make NestedMatch handle multiple match groups
-    #
-    # Right now, regular expressions to match it are defined only up to
-    #       the start delimiter, e.g.:
-    #
-    #       \bSTRUCT_GROUP\(
-    #
-    # is similar to: STRUCT_GROUP\((.*)\)
-    # except that the content inside the match group is delimiter-aligned.
-    #
-    # The content inside parentheses is converted into a single replace
-    # group (e.g. r`\0').
-    #
-    # It would be nice to change such definition to support multiple
-    # match groups, allowing a regex equivalent to:
-    #
-    #   FOO\((.*), (.*), (.*)\)
-    #
-    # it is probably easier to define it not as a regular expression, but
-    # with some lexical definition like:
-    #
-    #   FOO(arg1, arg2, arg3)
-
-    def __init__(self, regex):
-        self.regex = KernRe(regex)
-
-    def _search(self, line):
-        """
-        Finds paired blocks for a regex that ends with a delimiter.
-
-        The suggestion of using finditer to match pairs came from:
-        https://stackoverflow.com/questions/5454322/python-how-to-match-nested-parentheses-with-regex
-        but I ended using a different implementation to align all three types
-        of delimiters and seek for an initial regular expression.
-
-        The algorithm seeks for open/close paired delimiters and places them
-        into a stack, yielding a start/stop position of each match when the
-        stack is zeroed.
-
-        The algorithm should work fine for properly paired lines, but will
-        silently ignore end delimiters that precede a start delimiter.
-        This should be OK for kernel-doc parser, as unaligned delimiters
-        would cause compilation errors. So, we don't need to raise exceptions
-        to cover such issues.
-        """
-
-        stack = []
-
-        for match_re in self.regex.finditer(line):
-            start = match_re.start()
-            offset = match_re.end()
-            string_char = None
-            escape = False
-
-            d = line[offset - 1]
-            if d not in DELIMITER_PAIRS:
-                continue
-
-            end = DELIMITER_PAIRS[d]
-            stack.append(end)
-
-            for match in RE_DELIM.finditer(line[offset:]):
-                pos = match.start() + offset
-
-                d = line[pos]
-
-                if escape:
-                    escape = False
-                    continue
-
-                if string_char:
-                    if d == '\\':
-                        escape = True
-                    elif d == string_char:
-                        string_char = None
-
-                    continue
-
-                if d in ('"', "'"):
-                    string_char = d
-                    continue
-
-                if d in DELIMITER_PAIRS:
-                    end = DELIMITER_PAIRS[d]
-
-                    stack.append(end)
-                    continue
-
-                # Does the end delimiter match what is expected?
-                if stack and d == stack[-1]:
-                    stack.pop()
-
-                    if not stack:
-                        yield start, offset, pos + 1
-                        break
-
-    def search(self, line):
-        """
-        This is similar to re.search:
-
-        It matches a regex that it is followed by a delimiter,
-        returning occurrences only if all delimiters are paired.
-        """
-
-        for t in self._search(line):
-
-            yield line[t[0]:t[2]]
-
-    def sub(self, sub, line, count=0):
-        """
-        This is similar to re.sub:
-
-        It matches a regex that it is followed by a delimiter,
-        replacing occurrences only if all delimiters are paired.
-
-        if the sub argument contains::
-
-            r'\0'
-
-        it will work just like re: it places there the matched paired data
-        with the delimiter stripped.
-
-        If count is different than zero, it will replace at most count
-        items.
-        """
-        out = ""
-
-        cur_pos = 0
-        n = 0
-
-        for start, end, pos in self._search(line):
-            out += line[cur_pos:start]
-
-            # Value, ignoring start/end delimiters
-            value = line[end:pos - 1]
-
-            # replaces \0 at the sub string, if \0 is used there
-            new_sub = sub
-            new_sub = new_sub.replace(r'\0', value)
-
-            out += new_sub
-
-            # Drop end ';' if any
-            if pos < len(line) and line[pos] == ';':
-                pos += 1
-
-            cur_pos = pos
-            n += 1
-
-            if count and count >= n:
-                break
-
-        # Append the remaining string
-        l = len(line)
-        out += line[cur_pos:l]
-
-        return out
-
-    def __repr__(self):
-        """
-        Returns a displayable version of the class init.
-        """
-
-        return f'NestedMatch("{self.regex.regex.pattern}")'
diff --git a/tools/lib/python/kdoc/xforms_lists.py b/tools/lib/python/kdoc/xforms_lists.py
index c07cbe1e6349..f6ea9efb11ae 100644
--- a/tools/lib/python/kdoc/xforms_lists.py
+++ b/tools/lib/python/kdoc/xforms_lists.py
@@ -4,9 +4,11 @@
 
 import re
 
-from kdoc.kdoc_re import KernRe, NestedMatch
+from kdoc.kdoc_re import KernRe
+from kdoc.c_lex import CMatch, CTokenizer
+
+struct_args_pattern = r"([^,)]+)"
 
-struct_args_pattern = r'([^,)]+)'
 
 class CTransforms:
     """
@@ -15,137 +17,106 @@ class CTransforms:
     into something we can parse and generate kdoc for.
     """
 
+    #
+    # NOTE:
+    #      Due to performance reasons, place CMatch rules before KernRe,
+    #      as this avoids running the C parser every time.
+    #
+
     #: Transforms for structs and unions.
     struct_xforms = [
-        # Strip attributes
-        (KernRe(r"__attribute__\s*\(\([a-z0-9,_\*\s\(\)]*\)\)", flags=re.I | re.S, cache=False), ' '),
-        (KernRe(r'\s*__aligned\s*\([^;]*\)', re.S), ' '),
-        (KernRe(r'\s*__counted_by\s*\([^;]*\)', re.S), ' '),
-        (KernRe(r'\s*__counted_by_(le|be)\s*\([^;]*\)', re.S), ' '),
-        (KernRe(r'\s*__guarded_by\s*\([^\)]*\)', re.S), ' '),
-        (KernRe(r'\s*__pt_guarded_by\s*\([^\)]*\)', re.S), ' '),
-        (KernRe(r'\s*__packed\s*', re.S), ' '),
-        (KernRe(r'\s*CRYPTO_MINALIGN_ATTR', re.S), ' '),
-        (KernRe(r'\s*__private', re.S), ' '),
-        (KernRe(r'\s*__rcu', re.S), ' '),
-        (KernRe(r'\s*____cacheline_aligned_in_smp', re.S), ' '),
-        (KernRe(r'\s*____cacheline_aligned', re.S), ' '),
-        (KernRe(r'\s*__cacheline_group_(begin|end)\([^\)]+\);'), ''),
-        #
-        # Unwrap struct_group macros based on this definition:
-        # __struct_group(TAG, NAME, ATTRS, MEMBERS...)
-        # which has variants like: struct_group(NAME, MEMBERS...)
-        # Only MEMBERS arguments require documentation.
-        #
-        # Parsing them happens on two steps:
-        #
-        # 1. drop struct group arguments that aren't at MEMBERS,
-        #    storing them as STRUCT_GROUP(MEMBERS)
-        #
-        # 2. remove STRUCT_GROUP() ancillary macro.
-        #
-        # The original logic used to remove STRUCT_GROUP() using an
-        # advanced regex:
-        #
-        #   \bSTRUCT_GROUP(\(((?:(?>[^)(]+)|(?1))*)\))[^;]*;
-        #
-        # with two patterns that are incompatible with
-        # Python re module, as it has:
-        #
-        #   - a recursive pattern: (?1)
-        #   - an atomic grouping: (?>...)
-        #
-        # I tried a simpler version: but it didn't work either:
-        #   \bSTRUCT_GROUP\(([^\)]+)\)[^;]*;
-        #
-        # As it doesn't properly match the end parenthesis on some cases.
-        #
-        # So, a better solution was crafted: there's now a NestedMatch
-        # class that ensures that delimiters after a search are properly
-        # matched. So, the implementation to drop STRUCT_GROUP() will be
-        # handled in separate.
-        #
-        (KernRe(r'\bstruct_group\s*\(([^,]*,)', re.S), r'STRUCT_GROUP('),
-        (KernRe(r'\bstruct_group_attr\s*\(([^,]*,){2}', re.S), r'STRUCT_GROUP('),
-        (KernRe(r'\bstruct_group_tagged\s*\(([^,]*),([^,]*),', re.S), r'struct \1 \2; STRUCT_GROUP('),
-        (KernRe(r'\b__struct_group\s*\(([^,]*,){3}', re.S), r'STRUCT_GROUP('),
-        #
-        # Replace macros
-        #
-        # TODO: use NestedMatch for FOO($1, $2, ...) matches
+        (CMatch("__attribute__"), ""),
+        (CMatch("__aligned"), ""),
+        (CMatch("__counted_by"), ""),
+        (CMatch("__counted_by_(le|be)"), ""),
+        (CMatch("__guarded_by"), ""),
+        (CMatch("__pt_guarded_by"), ""),
+        (CMatch("__packed"), ""),
+        (CMatch("CRYPTO_MINALIGN_ATTR"), ""),
+        (CMatch("__private"), ""),
+        (CMatch("__rcu"), ""),
+        (CMatch("____cacheline_aligned_in_smp"), ""),
+        (CMatch("____cacheline_aligned"), ""),
+        (CMatch("__cacheline_group_(?:begin|end)"), ""),
+        (CMatch("__ETHTOOL_DECLARE_LINK_MODE_MASK"), r"DECLARE_BITMAP(\1, __ETHTOOL_LINK_MODE_MASK_NBITS)"),
+        (CMatch("DECLARE_PHY_INTERFACE_MASK",),r"DECLARE_BITMAP(\1, PHY_INTERFACE_MODE_MAX)"),
+        (CMatch("DECLARE_BITMAP"), r"unsigned long \1[BITS_TO_LONGS(\2)]"),
+        (CMatch("DECLARE_HASHTABLE"), r"unsigned long \1[1 << ((\2) - 1)]"),
+        (CMatch("DECLARE_KFIFO"), r"\2 *\1"),
+        (CMatch("DECLARE_KFIFO_PTR"), r"\2 *\1"),
+        (CMatch("(?:__)?DECLARE_FLEX_ARRAY"), r"\1 \2[]"),
+        (CMatch("DEFINE_DMA_UNMAP_ADDR"), r"dma_addr_t \1"),
+        (CMatch("DEFINE_DMA_UNMAP_LEN"), r"__u32 \1"),
+        (CMatch("VIRTIO_DECLARE_FEATURES"), r"union { u64 \1; u64 \1_array[VIRTIO_FEATURES_U64S]; }"),
+        (CMatch("__cond_acquires"), ""),
+        (CMatch("__cond_releases"), ""),
+        (CMatch("__acquires"), ""),
+        (CMatch("__releases"), ""),
+        (CMatch("__must_hold"), ""),
+        (CMatch("__must_not_hold"), ""),
+        (CMatch("__must_hold_shared"), ""),
+        (CMatch("__cond_acquires_shared"), ""),
+        (CMatch("__acquires_shared"), ""),
+        (CMatch("__releases_shared"), ""),
+        (CMatch("__attribute__"), ""),
+
         #
-        # it is better to also move those to the NestedMatch logic,
-        # to ensure that parentheses will be properly matched.
+        # Macro __struct_group() creates an union with an anonymous
+        # and a non-anonymous struct, depending on the parameters. We only
+        # need one of those at kernel-doc, as we won't be documenting the same
+        # members twice.
         #
-        (KernRe(r'__ETHTOOL_DECLARE_LINK_MODE_MASK\s*\(([^\)]+)\)', re.S),
-        r'DECLARE_BITMAP(\1, __ETHTOOL_LINK_MODE_MASK_NBITS)'),
-        (KernRe(r'DECLARE_PHY_INTERFACE_MASK\s*\(([^\)]+)\)', re.S),
-        r'DECLARE_BITMAP(\1, PHY_INTERFACE_MODE_MAX)'),
-        (KernRe(r'DECLARE_BITMAP\s*\(' + struct_args_pattern + r',\s*' + struct_args_pattern + r'\)',
-                re.S), r'unsigned long \1[BITS_TO_LONGS(\2)]'),
-        (KernRe(r'DECLARE_HASHTABLE\s*\(' + struct_args_pattern + r',\s*' + struct_args_pattern + r'\)',
-                re.S), r'unsigned long \1[1 << ((\2) - 1)]'),
-        (KernRe(r'DECLARE_KFIFO\s*\(' + struct_args_pattern + r',\s*' + struct_args_pattern +
-                r',\s*' + struct_args_pattern + r'\)', re.S), r'\2 *\1'),
-        (KernRe(r'DECLARE_KFIFO_PTR\s*\(' + struct_args_pattern + r',\s*' +
-                struct_args_pattern + r'\)', re.S), r'\2 *\1'),
-        (KernRe(r'(?:__)?DECLARE_FLEX_ARRAY\s*\(' + struct_args_pattern + r',\s*' +
-                struct_args_pattern + r'\)', re.S), r'\1 \2[]'),
-        (KernRe(r'DEFINE_DMA_UNMAP_ADDR\s*\(' + struct_args_pattern + r'\)', re.S), r'dma_addr_t \1'),
-        (KernRe(r'DEFINE_DMA_UNMAP_LEN\s*\(' + struct_args_pattern + r'\)', re.S), r'__u32 \1'),
-        (KernRe(r'VIRTIO_DECLARE_FEATURES\(([\w_]+)\)'), r'union { u64 \1; u64 \1_array[VIRTIO_FEATURES_U64S]; }'),
-
-        (NestedMatch(r"__cond_acquires\s*\("), ""),
-        (NestedMatch(r"__cond_releases\s*\("), ""),
-        (NestedMatch(r"__acquires\s*\("), ""),
-        (NestedMatch(r"__releases\s*\("), ""),
-        (NestedMatch(r"__must_hold\s*\("), ""),
-        (NestedMatch(r"__must_not_hold\s*\("), ""),
-        (NestedMatch(r"__must_hold_shared\s*\("), ""),
-        (NestedMatch(r"__cond_acquires_shared\s*\("), ""),
-        (NestedMatch(r"__acquires_shared\s*\("), ""),
-        (NestedMatch(r"__releases_shared\s*\("), ""),
-        (NestedMatch(r'\bSTRUCT_GROUP\('), r'\0'),
+        (CMatch("struct_group"), r"struct { \2+ };"),
+        (CMatch("struct_group_attr"), r"struct { \3+ };"),
+        (CMatch("struct_group_tagged"), r"struct { \3+ };"),
+        (CMatch("__struct_group"), r"struct { \4+ };"),
     ]
 
     #: Transforms for function prototypes.
     function_xforms = [
-        (KernRe(r"^static +"), ""),
-        (KernRe(r"^extern +"), ""),
-        (KernRe(r"^asmlinkage +"), ""),
-        (KernRe(r"^inline +"), ""),
-        (KernRe(r"^__inline__ +"), ""),
-        (KernRe(r"^__inline +"), ""),
-        (KernRe(r"^__always_inline +"), ""),
-        (KernRe(r"^noinline +"), ""),
-        (KernRe(r"^__FORTIFY_INLINE +"), ""),
-        (KernRe(r"__init +"), ""),
-        (KernRe(r"__init_or_module +"), ""),
-        (KernRe(r"__exit +"), ""),
-        (KernRe(r"__deprecated +"), ""),
-        (KernRe(r"__flatten +"), ""),
-        (KernRe(r"__meminit +"), ""),
-        (KernRe(r"__must_check +"), ""),
-        (KernRe(r"__weak +"), ""),
-        (KernRe(r"__sched +"), ""),
-        (KernRe(r"_noprof"), ""),
-        (KernRe(r"__always_unused *"), ""),
-        (KernRe(r"__printf\s*\(\s*\d*\s*,\s*\d*\s*\) +"), ""),
-        (KernRe(r"__(?:re)?alloc_size\s*\(\s*\d+\s*(?:,\s*\d+\s*)?\) +"), ""),
-        (KernRe(r"__diagnose_as\s*\(\s*\S+\s*(?:,\s*\d+\s*)*\) +"), ""),
-        (KernRe(r"DECL_BUCKET_PARAMS\s*\(\s*(\S+)\s*,\s*(\S+)\s*\)"), r"\1, \2"),
-        (KernRe(r"__no_context_analysis\s*"), ""),
-        (KernRe(r"__attribute_const__ +"), ""),
-        (KernRe(r"__attribute__\s*\(\((?:[\w\s]+(?:\([^)]*\))?\s*,?)+\)\)\s+"), ""),
+        (CMatch("static"), ""),
+        (CMatch("extern"), ""),
+        (CMatch("asmlinkage"), ""),
+        (CMatch("inline"), ""),
+        (CMatch("__inline__"), ""),
+        (CMatch("__inline"), ""),
+        (CMatch("__always_inline"), ""),
+        (CMatch("noinline"), ""),
+        (CMatch("__FORTIFY_INLINE"), ""),
+        (CMatch("__init"), ""),
+        (CMatch("__init_or_module"), ""),
+        (CMatch("__exit"), ""),
+        (CMatch("__deprecated"), ""),
+        (CMatch("__flatten"), ""),
+        (CMatch("__meminit"), ""),
+        (CMatch("__must_check"), ""),
+        (CMatch("__weak"), ""),
+        (CMatch("__sched"), ""),
+        (CMatch("__always_unused"), ""),
+        (CMatch("__printf"), ""),
+        (CMatch("__(?:re)?alloc_size"), ""),
+        (CMatch("__diagnose_as"), ""),
+        (CMatch("DECL_BUCKET_PARAMS"), r"\1, \2"),
+        (CMatch("__no_context_analysis"), ""),
+        (CMatch("__attribute_const__"), ""),
+        (CMatch("__attribute__"), ""),
+
+        #
+        # HACK: this is similar to process_export() hack. It is meant to
+        # drop _noproof from function name. See for instance:
+        # ahash_request_alloc kernel-doc declaration at include/crypto/hash.h.
+        #
+        (KernRe("_noprof"), ""),
     ]
 
     #: Transforms for variable prototypes.
     var_xforms = [
-        (KernRe(r"__read_mostly"), ""),
-        (KernRe(r"__ro_after_init"), ""),
-        (KernRe(r'\s*__guarded_by\s*\([^\)]*\)', re.S), ""),
-        (KernRe(r'\s*__pt_guarded_by\s*\([^\)]*\)', re.S), ""),
-        (KernRe(r"LIST_HEAD\(([\w_]+)\)"), r"struct list_head \1"),
+        (CMatch("__read_mostly"), ""),
+        (CMatch("__ro_after_init"), ""),
+        (CMatch("__guarded_by"), ""),
+        (CMatch("__pt_guarded_by"), ""),
+        (CMatch("LIST_HEAD"), r"struct list_head \1"),
+
         (KernRe(r"(?://.*)$"), ""),
         (KernRe(r"(?:/\*.*\*/)"), ""),
         (KernRe(r";$"), ""),
@@ -158,13 +129,25 @@ class CTransforms:
         "var": var_xforms,
     }
 
-    def apply(self, xforms_type, text):
+    def apply(self, xforms_type, source):
         """
-        Apply a set of transforms to a block of text.
+        Apply a set of transforms to a block of source.
+
+        As tokenizer is used here, this function also remove comments
+        at the end.
         """
         if xforms_type not in self.xforms:
-            return text
+            return source
+
+        if isinstance(source, str):
+            source = CTokenizer(source)
 
         for search, subst in self.xforms[xforms_type]:
-            text = search.sub(subst, text)
-        return text
+            #
+            # KernRe only accept strings.
+            #
+            if isinstance(search, KernRe):
+                source = str(source)
+
+            source = search.sub(subst, source)
+        return str(source)
diff --git a/tools/lib/python/unittest_helper.py b/tools/lib/python/unittest_helper.py
new file mode 100755
index 000000000000..55d444cd73d4
--- /dev/null
+++ b/tools/lib/python/unittest_helper.py
@@ -0,0 +1,353 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+# Copyright(c) 2025-2026: Mauro Carvalho Chehab <mchehab@kernel.org>.
+#
+# pylint: disable=C0103,R0912,R0914,E1101
+
+"""
+Provides helper functions and classes execute python unit tests.
+
+Those help functions provide a nice colored output summary of each
+executed test and, when a test fails, it shows the different in diff
+format when running in verbose mode, like::
+
+    $ tools/unittests/nested_match.py -v
+    ...
+    Traceback (most recent call last):
+    File "/new_devel/docs/tools/unittests/nested_match.py", line 69, in test_count_limit
+        self.assertEqual(replaced, "bar(a); bar(b); foo(c)")
+        ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+    AssertionError: 'bar(a) foo(b); foo(c)' != 'bar(a); bar(b); foo(c)'
+    - bar(a) foo(b); foo(c)
+    ?       ^^^^
+    + bar(a); bar(b); foo(c)
+    ?       ^^^^^
+    ...
+
+It also allows filtering what tests will be executed via ``-k`` parameter.
+
+Typical usage is to do::
+
+    from unittest_helper import run_unittest
+    ...
+
+    if __name__ == "__main__":
+        run_unittest(__file__)
+
+If passing arguments is needed, on a more complex scenario, it can be
+used like on this example::
+
+    from unittest_helper import TestUnits, run_unittest
+    ...
+    env = {'sudo': ""}
+    ...
+    if __name__ == "__main__":
+        runner = TestUnits()
+        base_parser = runner.parse_args()
+        base_parser.add_argument('--sudo', action='store_true',
+                                help='Enable tests requiring sudo privileges')
+
+        args = base_parser.parse_args()
+
+        # Update module-level flag
+        if args.sudo:
+            env['sudo'] = "1"
+
+        # Run tests with customized arguments
+        runner.run(__file__, parser=base_parser, args=args, env=env)
+"""
+
+import argparse
+import atexit
+import os
+import re
+import unittest
+import sys
+
+from unittest.mock import patch
+
+
+class Summary(unittest.TestResult):
+    """
+    Overrides ``unittest.TestResult`` class to provide a nice colored
+    summary. When in verbose mode, displays actual/expected difference in
+    unified diff format.
+    """
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+
+        #: Dictionary to store organized test results.
+        self.test_results = {}
+
+        #: max length of the test names.
+        self.max_name_length = 0
+
+    def startTest(self, test):
+        super().startTest(test)
+        test_id = test.id()
+        parts = test_id.split(".")
+
+        # Extract module, class, and method names
+        if len(parts) >= 3:
+            module_name = parts[-3]
+        else:
+            module_name = ""
+        if len(parts) >= 2:
+            class_name = parts[-2]
+        else:
+            class_name = ""
+
+        method_name = parts[-1]
+
+        # Build the hierarchical structure
+        if module_name not in self.test_results:
+            self.test_results[module_name] = {}
+
+        if class_name not in self.test_results[module_name]:
+            self.test_results[module_name][class_name] = []
+
+        # Track maximum test name length for alignment
+        display_name = f"{method_name}:"
+
+        self.max_name_length = max(len(display_name), self.max_name_length)
+
+    def _record_test(self, test, status):
+        test_id = test.id()
+        parts = test_id.split(".")
+        if len(parts) >= 3:
+            module_name = parts[-3]
+        else:
+            module_name = ""
+        if len(parts) >= 2:
+            class_name = parts[-2]
+        else:
+            class_name = ""
+        method_name = parts[-1]
+        self.test_results[module_name][class_name].append((method_name, status))
+
+    def addSuccess(self, test):
+        super().addSuccess(test)
+        self._record_test(test, "OK")
+
+    def addFailure(self, test, err):
+        super().addFailure(test, err)
+        self._record_test(test, "FAIL")
+
+    def addError(self, test, err):
+        super().addError(test, err)
+        self._record_test(test, "ERROR")
+
+    def addSkip(self, test, reason):
+        super().addSkip(test, reason)
+        self._record_test(test, f"SKIP ({reason})")
+
+    def printResults(self):
+        """
+        Print results using colors if tty.
+        """
+        # Check for ANSI color support
+        use_color = sys.stdout.isatty()
+        COLORS = {
+            "OK":            "\033[32m",   # Green
+            "FAIL":          "\033[31m",   # Red
+            "SKIP":          "\033[1;33m", # Yellow
+            "PARTIAL":       "\033[33m",   # Orange
+            "EXPECTED_FAIL": "\033[36m",   # Cyan
+            "reset":         "\033[0m",    # Reset to default terminal color
+        }
+        if not use_color:
+            for c in COLORS:
+                COLORS[c] = ""
+
+        # Calculate maximum test name length
+        if not self.test_results:
+            return
+        try:
+            lengths = []
+            for module in self.test_results.values():
+                for tests in module.values():
+                    for test_name, _ in tests:
+                        lengths.append(len(test_name) + 1)  # +1 for colon
+            max_length = max(lengths) + 2  # Additional padding
+        except ValueError:
+            sys.exit("Test list is empty")
+
+        # Print results
+        for module_name, classes in self.test_results.items():
+            print(f"{module_name}:")
+            for class_name, tests in classes.items():
+                print(f"    {class_name}:")
+                for test_name, status in tests:
+                    # Get base status without reason for SKIP
+                    if status.startswith("SKIP"):
+                        status_code = status.split()[0]
+                    else:
+                        status_code = status
+                    color = COLORS.get(status_code, "")
+                    print(
+                        f"        {test_name + ':':<{max_length}}{color}{status}{COLORS['reset']}"
+                    )
+            print()
+
+        # Print summary
+        print(f"\nRan {self.testsRun} tests", end="")
+        if hasattr(self, "timeTaken"):
+            print(f" in {self.timeTaken:.3f}s", end="")
+        print()
+
+        if not self.wasSuccessful():
+            print(f"\n{COLORS['FAIL']}FAILED (", end="")
+            failures = getattr(self, "failures", [])
+            errors = getattr(self, "errors", [])
+            if failures:
+                print(f"failures={len(failures)}", end="")
+            if errors:
+                if failures:
+                    print(", ", end="")
+                print(f"errors={len(errors)}", end="")
+            print(f"){COLORS['reset']}")
+
+
+def flatten_suite(suite):
+    """Flatten test suite hierarchy."""
+    tests = []
+    for item in suite:
+        if isinstance(item, unittest.TestSuite):
+            tests.extend(flatten_suite(item))
+        else:
+            tests.append(item)
+    return tests
+
+
+class TestUnits:
+    """
+    Helper class to set verbosity level.
+
+    This class discover test files, import its unittest classes and
+    executes the test on it.
+    """
+    def parse_args(self):
+        """Returns a parser for command line arguments."""
+        parser = argparse.ArgumentParser(description="Test runner with regex filtering")
+        parser.add_argument("-v", "--verbose", action="count", default=1)
+        parser.add_argument("-f", "--failfast", action="store_true")
+        parser.add_argument("-k", "--keyword",
+                            help="Regex pattern to filter test methods")
+        return parser
+
+    def run(self, caller_file=None, pattern=None,
+            suite=None, parser=None, args=None, env=None):
+        """
+        Execute all tests from the unity test file.
+
+        It contains several optional parameters:
+
+        ``caller_file``:
+            -  name of the file that contains test.
+
+               typical usage is to place __file__ at the caller test, e.g.::
+
+                    if __name__ == "__main__":
+                        TestUnits().run(__file__)
+
+        ``pattern``:
+            - optional pattern to match multiple file names. Defaults
+              to basename of ``caller_file``.
+
+        ``suite``:
+            - an unittest suite initialized by the caller using
+              ``unittest.TestLoader().discover()``.
+
+        ``parser``:
+            - an argparse parser. If not defined, this helper will create
+              one.
+
+        ``args``:
+            - an ``argparse.Namespace`` data filled by the caller.
+
+        ``env``:
+            - environment variables that will be passed to the test suite
+
+        At least ``caller_file`` or ``suite`` must be used, otherwise a
+        ``TypeError`` will be raised.
+        """
+        if not args:
+            if not parser:
+                parser = self.parse_args()
+            args = parser.parse_args()
+
+        if not caller_file and not suite:
+            raise TypeError("Either caller_file or suite is needed at TestUnits")
+
+        verbose = args.verbose
+
+        if not env:
+            env = os.environ.copy()
+
+        env["VERBOSE"] = f"{verbose}"
+
+        patcher = patch.dict(os.environ, env)
+        patcher.start()
+        # ensure it gets stopped after
+        atexit.register(patcher.stop)
+
+
+        if verbose >= 2:
+            unittest.TextTestRunner(verbosity=verbose).run = lambda suite: suite
+
+        # Load ONLY tests from the calling file
+        if not suite:
+            if not pattern:
+                pattern = caller_file
+
+            loader = unittest.TestLoader()
+            suite = loader.discover(start_dir=os.path.dirname(caller_file),
+                                    pattern=os.path.basename(caller_file))
+
+        # Flatten the suite for environment injection
+        tests_to_inject = flatten_suite(suite)
+
+        # Filter tests by method name if -k specified
+        if args.keyword:
+            try:
+                pattern = re.compile(args.keyword)
+                filtered_suite = unittest.TestSuite()
+                for test in tests_to_inject:  # Use the pre-flattened list
+                    method_name = test.id().split(".")[-1]
+                    if pattern.search(method_name):
+                        filtered_suite.addTest(test)
+                suite = filtered_suite
+            except re.error as e:
+                sys.stderr.write(f"Invalid regex pattern: {e}\n")
+                sys.exit(1)
+        else:
+            # Maintain original suite structure if no keyword filtering
+            suite = unittest.TestSuite(tests_to_inject)
+
+        if verbose >= 2:
+            resultclass = None
+        else:
+            resultclass = Summary
+
+        runner = unittest.TextTestRunner(verbosity=args.verbose,
+                                            resultclass=resultclass,
+                                            failfast=args.failfast)
+        result = runner.run(suite)
+        if resultclass:
+            result.printResults()
+
+        sys.exit(not result.wasSuccessful())
+
+
+def run_unittest(fname):
+    """
+    Basic usage of TestUnits class.
+
+    Use it when there's no need to pass any extra argument to the tests
+    with. The recommended way is to place this at the end of each
+    unittest module::
+
+        if __name__ == "__main__":
+            run_unittest(__file__)
+    """
+    TestUnits().run(fname)
diff --git a/tools/unittests/run.py b/tools/unittests/run.py
new file mode 100755
index 000000000000..8c19036d43a1
--- /dev/null
+++ b/tools/unittests/run.py
@@ -0,0 +1,17 @@
+#!/bin/env python3
+import os
+import unittest
+import sys
+
+TOOLS_DIR=os.path.join(os.path.dirname(os.path.realpath(__file__)), "..")
+sys.path.insert(0, TOOLS_DIR)
+
+from lib.python.unittest_helper import TestUnits
+
+if __name__ == "__main__":
+    loader = unittest.TestLoader()
+
+    suite = loader.discover(start_dir=os.path.join(TOOLS_DIR, "unittests"),
+                            pattern="test*.py")
+
+    TestUnits().run("", suite=suite)
diff --git a/tools/unittests/test_cmatch.py b/tools/unittests/test_cmatch.py
new file mode 100755
index 000000000000..7b996f83784d
--- /dev/null
+++ b/tools/unittests/test_cmatch.py
@@ -0,0 +1,821 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+# Copyright(c) 2026: Mauro Carvalho Chehab <mchehab@kernel.org>.
+#
+# pylint: disable=C0413,R0904
+
+
+"""
+Unit tests for kernel-doc CMatch.
+"""
+
+import os
+import re
+import sys
+import unittest
+
+
+# Import Python modules
+
+SRC_DIR = os.path.dirname(os.path.realpath(__file__))
+sys.path.insert(0, os.path.join(SRC_DIR, "../lib/python"))
+
+from kdoc.c_lex import CMatch
+from kdoc.kdoc_re import KernRe
+from unittest_helper import run_unittest
+
+#
+# Override unittest.TestCase to better compare diffs ignoring whitespaces
+#
+class TestCaseDiff(unittest.TestCase):
+    """
+    Disable maximum limit on diffs and add a method to better
+    handle diffs with whitespace differences.
+    """
+
+    @classmethod
+    def setUpClass(cls):
+        """Ensure that there won't be limit for diffs"""
+        cls.maxDiff = None
+
+
+#
+# Tests doing with different macros
+#
+
+class TestSearch(TestCaseDiff):
+    """
+    Test search mechanism
+    """
+
+    def test_search_acquires_simple(self):
+        line = "__acquires(ctx) foo();"
+        result = ", ".join(CMatch("__acquires").search(line))
+        self.assertEqual(result, "__acquires(ctx)")
+
+    def test_search_acquires_multiple(self):
+        line = "__acquires(ctx) __acquires(other) bar();"
+        result = ", ".join(CMatch("__acquires").search(line))
+        self.assertEqual(result, "__acquires(ctx), __acquires(other)")
+
+    def test_search_acquires_nested_paren(self):
+        line = "__acquires((ctx1, ctx2)) baz();"
+        result = ", ".join(CMatch("__acquires").search(line))
+        self.assertEqual(result, "__acquires((ctx1, ctx2))")
+
+    def test_search_must_hold(self):
+        line = "__must_hold(&lock) do_something();"
+        result = ", ".join(CMatch("__must_hold").search(line))
+        self.assertEqual(result, "__must_hold(&lock)")
+
+    def test_search_must_hold_shared(self):
+        line = "__must_hold_shared(RCU) other();"
+        result = ", ".join(CMatch("__must_hold_shared").search(line))
+        self.assertEqual(result, "__must_hold_shared(RCU)")
+
+    def test_search_no_false_positive(self):
+        line = "call__acquires(foo);  // should stay intact"
+        result = ", ".join(CMatch(r"__acquires").search(line))
+        self.assertEqual(result, "")
+
+    def test_search_no_macro_remains(self):
+        line = "do_something_else();"
+        result = ", ".join(CMatch("__acquires").search(line))
+        self.assertEqual(result, "")
+
+    def test_search_no_function(self):
+        line = "something"
+        result = ", ".join(CMatch(line).search(line))
+        self.assertEqual(result, "")
+
+#
+# Override unittest.TestCase to better compare diffs ignoring whitespaces
+#
+class TestCaseDiff(unittest.TestCase):
+    """
+    Disable maximum limit on diffs and add a method to better
+    handle diffs with whitespace differences.
+    """
+
+    @classmethod
+    def setUpClass(cls):
+        """Ensure that there won't be limit for diffs"""
+        cls.maxDiff = None
+
+    def assertLogicallyEqual(self, a, b):
+        """
+        Compare two results ignoring multiple whitespace differences.
+
+        This is useful to check more complex matches picked from examples.
+        On a plus side, we also don't need to use dedent.
+        Please notice that line breaks still need to match. We might
+        remove it at the regex, but this way, checking the diff is easier.
+        """
+        a = re.sub(r"[\t ]+", " ", a.strip())
+        b = re.sub(r"[\t ]+", " ", b.strip())
+
+        a = re.sub(r"\s+\n", "\n", a)
+        b = re.sub(r"\s+\n", "\n", b)
+
+        a = re.sub(" ;", ";", a)
+        b = re.sub(" ;", ";", b)
+
+        self.assertEqual(a, b)
+
+#
+# Tests doing with different macros
+#
+
+class TestSubMultipleMacros(TestCaseDiff):
+    """
+    Tests doing with different macros.
+
+    Here, we won't use assertLogicallyEqual. Instead, we'll check if each
+    of the expected patterns are present at the answer.
+    """
+
+    def test_acquires_simple(self):
+        """Simple replacement test with __acquires"""
+        line = "__acquires(ctx) foo();"
+        result = CMatch(r"__acquires").sub("REPLACED", line)
+
+        self.assertEqual("REPLACED foo();", result)
+
+    def test_acquires_multiple(self):
+        """Multiple __acquires"""
+        line = "__acquires(ctx) __acquires(other) bar();"
+        result = CMatch(r"__acquires").sub("REPLACED", line)
+
+        self.assertEqual("REPLACED REPLACED bar();", result)
+
+    def test_acquires_nested_paren(self):
+        """__acquires with nested pattern"""
+        line = "__acquires((ctx1, ctx2)) baz();"
+        result = CMatch(r"__acquires").sub("REPLACED", line)
+
+        self.assertEqual("REPLACED baz();", result)
+
+    def test_must_hold(self):
+        """__must_hold with a pointer"""
+        line = "__must_hold(&lock) do_something();"
+        result = CMatch(r"__must_hold").sub("REPLACED", line)
+
+        self.assertNotIn("__must_hold(", result)
+        self.assertIn("do_something();", result)
+
+    def test_must_hold_shared(self):
+        """__must_hold with an upercase defined value"""
+        line = "__must_hold_shared(RCU) other();"
+        result = CMatch(r"__must_hold_shared").sub("REPLACED", line)
+
+        self.assertNotIn("__must_hold_shared(", result)
+        self.assertIn("other();", result)
+
+    def test_no_false_positive(self):
+        """
+        Ensure that unrelated text containing similar patterns is preserved
+        """
+        line = "call__acquires(foo);  // should stay intact"
+        result = CMatch(r"\b__acquires").sub("REPLACED", line)
+
+        self.assertLogicallyEqual(result, "call__acquires(foo);")
+
+    def test_mixed_macros(self):
+        """Add a mix of macros"""
+        line = "__acquires(ctx) __releases(ctx) __must_hold(&lock) foo();"
+
+        result = CMatch(r"__acquires").sub("REPLACED", line)
+        result = CMatch(r"__releases").sub("REPLACED", result)
+        result = CMatch(r"__must_hold").sub("REPLACED", result)
+
+        self.assertNotIn("__acquires(", result)
+        self.assertNotIn("__releases(", result)
+        self.assertNotIn("__must_hold(", result)
+
+        self.assertIn("foo();", result)
+
+    def test_no_macro_remains(self):
+        """Ensures that unmatched macros are untouched"""
+        line = "do_something_else();"
+        result = CMatch(r"__acquires").sub("REPLACED", line)
+
+        self.assertEqual(result, line)
+
+    def test_no_function(self):
+        """Ensures that no functions will remain untouched"""
+        line = "something"
+        result = CMatch(line).sub("REPLACED", line)
+
+        self.assertEqual(result, line)
+
+#
+# Check if the diff is logically equivalent. To simplify, the tests here
+# use a single macro name for all replacements.
+#
+
+class TestSubSimple(TestCaseDiff):
+    """
+    Test argument replacements.
+
+    Here, the function name can be anything. So, we picked __attribute__(),
+    to mimic a macro found at the Kernel, but none of the replacements her
+    has any relationship with the Kernel usage.
+    """
+
+    MACRO = "__attribute__"
+
+    @classmethod
+    def setUpClass(cls):
+        """Define a CMatch to be used for all tests"""
+        cls.matcher = CMatch(cls.MACRO)
+
+    def test_sub_with_capture(self):
+        """Test all arguments replacement with a single arg"""
+        line = f"{self.MACRO}(&ctx)\nfoo();"
+
+        result = self.matcher.sub(r"ACQUIRED(\0)", line)
+
+        self.assertLogicallyEqual("ACQUIRED(&ctx)\nfoo();", result)
+
+    def test_sub_zero_placeholder(self):
+        """Test all arguments replacement with a multiple args"""
+        line = f"{self.MACRO}(arg1, arg2)\nbar();"
+
+        result = self.matcher.sub(r"REPLACED(\0)", line)
+
+        self.assertLogicallyEqual("REPLACED(arg1, arg2)\nbar();", result)
+
+    def test_sub_single_placeholder(self):
+        """Single replacement rule for \1"""
+        line = f"{self.MACRO}(ctx, boo)\nfoo();"
+        result = self.matcher.sub(r"ACQUIRED(\1)", line)
+
+        self.assertLogicallyEqual("ACQUIRED(ctx)\nfoo();", result)
+
+    def test_sub_multiple_placeholders(self):
+        """Replacement rule for both \1 and \2"""
+        line = f"{self.MACRO}(arg1, arg2)\nbar();"
+        result = self.matcher.sub(r"REPLACE(\1, \2)", line)
+
+        self.assertLogicallyEqual("REPLACE(arg1, arg2)\nbar();", result)
+
+    def test_sub_mixed_placeholders(self):
+        """Replacement rule for \0, \1 and additional text"""
+        line = f"{self.MACRO}(foo, bar)\nbaz();"
+        result = self.matcher.sub(r"ALL(\0) FIRST(\1)", line)
+
+        self.assertLogicallyEqual("ALL(foo, bar) FIRST(foo)\nbaz();", result)
+
+    def test_sub_no_placeholder(self):
+        """Replacement without placeholders"""
+        line = f"{self.MACRO}(arg)\nfoo();"
+        result = self.matcher.sub(r"NO_BACKREFS()", line)
+
+        self.assertLogicallyEqual("NO_BACKREFS()\nfoo();", result)
+
+    def test_sub_count_parameter(self):
+        """Verify that the algorithm stops after the requested count"""
+        line = f"{self.MACRO}(a1) x();\n{self.MACRO}(a2) y();"
+        result = self.matcher.sub(r"ONLY_FIRST(\1) ", line, count=1)
+
+        self.assertLogicallyEqual(f"ONLY_FIRST(a1) x();\n{self.MACRO}(a2) y();",
+                                  result)
+
+    def test_strip_multiple_acquires(self):
+        """Check if spaces between removed delimiters will be dropped"""
+        line = f"int {self.MACRO}(1)  {self.MACRO}(2 )   {self.MACRO}(3) foo;"
+        result = self.matcher.sub("", line)
+
+        self.assertLogicallyEqual(result, "int foo;")
+
+    def test_rise_early_greedy(self):
+        line = f"{self.MACRO}(a, b, c, d);"
+        sub = r"\1, \2+, \3"
+
+        with self.assertRaises(ValueError):
+            result = self.matcher.sub(sub, line)
+
+    def test_rise_multiple_greedy(self):
+        line = f"{self.MACRO}(a, b, c, d);"
+        sub = r"\1, \2+, \3+"
+
+        with self.assertRaises(ValueError):
+            result = self.matcher.sub(sub, line)
+
+#
+# Test replacements with slashrefs
+#
+
+
+class TestSubWithLocalXforms(TestCaseDiff):
+    """
+    Test diferent usecase patterns found at the Kernel.
+
+    Here, replacements using both CMatch and KernRe can be tested,
+    as it will import the actual replacement rules used by kernel-doc.
+    """
+
+    struct_xforms = [
+        (CMatch("__attribute__"), ' '),
+        (CMatch('__aligned'), ' '),
+        (CMatch('__counted_by'), ' '),
+        (CMatch('__counted_by_(le|be)'), ' '),
+        (CMatch('__guarded_by'), ' '),
+        (CMatch('__pt_guarded_by'), ' '),
+
+        (CMatch('__cacheline_group_(begin|end)'), ''),
+
+        (CMatch('struct_group'), r'\2'),
+        (CMatch('struct_group_attr'), r'\3'),
+        (CMatch('struct_group_tagged'), r'struct \1 { \3+ } \2;'),
+        (CMatch('__struct_group'), r'\4'),
+
+        (CMatch('__ETHTOOL_DECLARE_LINK_MODE_MASK'), r'DECLARE_BITMAP(\1, __ETHTOOL_LINK_MODE_MASK_NBITS)'),
+        (CMatch('DECLARE_PHY_INTERFACE_MASK',), r'DECLARE_BITMAP(\1, PHY_INTERFACE_MODE_MAX)'),
+        (CMatch('DECLARE_BITMAP'), r'unsigned long \1[BITS_TO_LONGS(\2)]'),
+
+        (CMatch('DECLARE_HASHTABLE'), r'unsigned long \1[1 << ((\2) - 1)]'),
+        (CMatch('DECLARE_KFIFO'), r'\2 *\1'),
+        (CMatch('DECLARE_KFIFO_PTR'), r'\2 *\1'),
+        (CMatch('(?:__)?DECLARE_FLEX_ARRAY'), r'\1 \2[]'),
+        (CMatch('DEFINE_DMA_UNMAP_ADDR'), r'dma_addr_t \1'),
+        (CMatch('DEFINE_DMA_UNMAP_LEN'), r'__u32 \1'),
+        (CMatch('VIRTIO_DECLARE_FEATURES'), r'union { u64 \1; u64 \1_array[VIRTIO_FEATURES_U64S]; }'),
+    ]
+
+    function_xforms = [
+        (CMatch('__printf'), ""),
+        (CMatch('__(?:re)?alloc_size'), ""),
+        (CMatch("__diagnose_as"), ""),
+        (CMatch("DECL_BUCKET_PARAMS"), r"\1, \2"),
+
+        (CMatch("__cond_acquires"), ""),
+        (CMatch("__cond_releases"), ""),
+        (CMatch("__acquires"), ""),
+        (CMatch("__releases"), ""),
+        (CMatch("__must_hold"), ""),
+        (CMatch("__must_not_hold"), ""),
+        (CMatch("__must_hold_shared"), ""),
+        (CMatch("__cond_acquires_shared"), ""),
+        (CMatch("__acquires_shared"), ""),
+        (CMatch("__releases_shared"), ""),
+        (CMatch("__attribute__"), ""),
+    ]
+
+    var_xforms = [
+        (CMatch('__guarded_by'), ""),
+        (CMatch('__pt_guarded_by'), ""),
+        (CMatch("LIST_HEAD"), r"struct list_head \1"),
+    ]
+
+    #: Transforms main dictionary used at apply_transforms().
+    xforms = {
+        "struct": struct_xforms,
+        "func": function_xforms,
+        "var": var_xforms,
+    }
+
+    @classmethod
+    def apply_transforms(cls, xform_type, text):
+        """
+        Mimic the behavior of kdoc_parser.apply_transforms() method.
+
+        For each element of STRUCT_XFORMS, apply apply_transforms.
+
+        There are two parameters:
+
+        - ``xform_type``
+            Can be ``func``, ``struct`` or ``var``;
+        - ``text``
+            The text where the sub patterns from CTransforms will be applied.
+        """
+        for search, subst in cls.xforms.get(xform_type):
+            text = search.sub(subst, text)
+
+        return text.strip()
+
+        cls.matcher = CMatch(r"struct_group[\w\_]*")
+
+    def test_struct_group(self):
+        """
+        Test struct_group using a pattern from
+        drivers/net/ethernet/asix/ax88796c_main.h.
+        """
+        line = """
+            struct tx_pkt_info {
+                    struct_group(tx_overhead,
+                            struct tx_sop_header sop;
+                            struct tx_segment_header seg;
+                    );
+                    struct tx_eop_header eop;
+                    u16 pkt_len;
+                    u16 seq_num;
+            };
+        """
+        expected = """
+            struct tx_pkt_info {
+                    struct tx_sop_header sop;
+                    struct tx_segment_header seg;
+                    struct tx_eop_header eop;
+                    u16 pkt_len;
+                    u16 seq_num;
+            };
+        """
+
+        result = self.apply_transforms("struct", line)
+        self.assertLogicallyEqual(result, expected)
+
+    def test_struct_group_attr(self):
+        """
+        Test two struct_group_attr using patterns from fs/smb/client/cifspdu.h.
+        """
+        line = """
+            typedef struct smb_com_open_rsp {
+                struct smb_hdr hdr;     /* wct = 34 BB */
+                __u8 AndXCommand;
+                __u8 AndXReserved;
+                __le16 AndXOffset;
+                __u8 OplockLevel;
+                __u16 Fid;
+                __le32 CreateAction;
+                struct_group_attr(common_attributes,,
+                    __le64 CreationTime;
+                    __le64 LastAccessTime;
+                    __le64 LastWriteTime;
+                    __le64 ChangeTime;
+                    __le32 FileAttributes;
+                );
+                __le64 AllocationSize;
+                __le64 EndOfFile;
+                __le16 FileType;
+                __le16 DeviceState;
+                __u8 DirectoryFlag;
+                __u16 ByteCount;        /* bct = 0 */
+            } OPEN_RSP;
+            typedef struct {
+                struct_group_attr(common_attributes,,
+                    __le64 CreationTime;
+                    __le64 LastAccessTime;
+                    __le64 LastWriteTime;
+                    __le64 ChangeTime;
+                    __le32 Attributes;
+                );
+                __u32 Pad1;
+                __le64 AllocationSize;
+                __le64 EndOfFile;
+                __le32 NumberOfLinks;
+                __u8 DeletePending;
+                __u8 Directory;
+                __u16 Pad2;
+                __le32 EASize;
+                __le32 FileNameLength;
+                union {
+                    char __pad;
+                    DECLARE_FLEX_ARRAY(char, FileName);
+                };
+            } FILE_ALL_INFO;       /* level 0x107 QPathInfo */
+        """
+        expected = """
+            typedef struct smb_com_open_rsp {
+                struct smb_hdr hdr;
+                __u8 AndXCommand;
+                __u8 AndXReserved;
+                __le16 AndXOffset;
+                __u8 OplockLevel;
+                __u16 Fid;
+                __le32 CreateAction;
+                __le64 CreationTime;
+                __le64 LastAccessTime;
+                __le64 LastWriteTime;
+                __le64 ChangeTime;
+                __le32 FileAttributes;
+                __le64 AllocationSize;
+                __le64 EndOfFile;
+                __le16 FileType;
+                __le16 DeviceState;
+                __u8 DirectoryFlag;
+                __u16 ByteCount;
+            } OPEN_RSP;
+        typedef struct {
+            __le64 CreationTime;
+            __le64 LastAccessTime;
+            __le64 LastWriteTime;
+            __le64 ChangeTime;
+            __le32 Attributes;
+            __u32 Pad1;
+            __le64 AllocationSize;
+            __le64 EndOfFile;
+            __le32 NumberOfLinks;
+            __u8 DeletePending;
+            __u8 Directory;
+            __u16 Pad2;
+            __le32 EASize;
+            __le32 FileNameLength;
+            union {
+                char __pad;
+                char FileName[];
+            };
+        } FILE_ALL_INFO;
+        """
+
+        result = self.apply_transforms("struct", line)
+        self.assertLogicallyEqual(result, expected)
+
+    def test_raw_struct_group(self):
+        """
+        Test a __struct_group pattern from include/uapi/cxl/features.h.
+        """
+        line = """
+            struct cxl_mbox_get_sup_feats_out {
+                __struct_group(cxl_mbox_get_sup_feats_out_hdr, hdr, /* empty */,
+                    __le16 num_entries;
+                    __le16 supported_feats;
+                    __u8 reserved[4];
+                );
+                struct cxl_feat_entry ents[] __counted_by_le(num_entries);
+            } __attribute__ ((__packed__));
+        """
+        expected = """
+            struct cxl_mbox_get_sup_feats_out {
+                __le16 num_entries;
+                __le16 supported_feats;
+                __u8 reserved[4];
+                struct cxl_feat_entry ents[];
+            };
+        """
+
+        result = self.apply_transforms("struct", line)
+        self.assertLogicallyEqual(result, expected)
+
+    def test_raw_struct_group_tagged(self):
+        r"""
+        Test cxl_regs with struct_group_tagged patterns from drivers/cxl/cxl.h.
+
+        NOTE:
+
+            This one has actually a violation from what kernel-doc would
+            expect: Kernel-doc regex expects only 3 members, but this is
+            actually defined as::
+
+                #define struct_group_tagged(TAG, NAME, MEMBERS...)
+
+            The replace expression there is::
+
+                struct \1 { \3 } \2;
+
+            but it should be really something like::
+
+                struct \1 { \3 \4 \5 \6 \7 \8 ... } \2;
+
+            a later fix would be needed to address it.
+
+        """
+        line = """
+            struct cxl_regs {
+                struct_group_tagged(cxl_component_regs, component,
+                    void __iomem *hdm_decoder;
+                    void __iomem *ras;
+                );
+
+
+                /* This is actually a violation: too much commas */
+                struct_group_tagged(cxl_device_regs, device_regs,
+                    void __iomem *status, *mbox, *memdev;
+                );
+
+                struct_group_tagged(cxl_pmu_regs, pmu_regs,
+                    void __iomem *pmu;
+                );
+
+                struct_group_tagged(cxl_rch_regs, rch_regs,
+                    void __iomem *dport_aer;
+                );
+
+                struct_group_tagged(cxl_rcd_regs, rcd_regs,
+                    void __iomem *rcd_pcie_cap;
+                );
+            };
+        """
+        expected = """
+        struct cxl_regs {
+            struct cxl_component_regs {
+                void __iomem *hdm_decoder;
+                void __iomem *ras;
+            } component;
+
+            struct cxl_device_regs {
+                void __iomem *status, *mbox, *memdev;
+            } device_regs;
+
+            struct cxl_pmu_regs {
+                void __iomem *pmu;
+            } pmu_regs;
+
+            struct cxl_rch_regs {
+                void __iomem *dport_aer;
+            } rch_regs;
+
+            struct cxl_rcd_regs {
+                void __iomem *rcd_pcie_cap;
+            } rcd_regs;
+        };
+        """
+
+        result = self.apply_transforms("struct", line)
+        self.assertLogicallyEqual(result, expected)
+
+    def test_struct_group_tagged_with_private(self):
+        """
+        Replace struct_group_tagged with private, using the same regex
+        for the replacement as what happens in xforms_lists.py.
+
+        As the private removal happens outside NestedGroup class, we manually
+        dropped the remaining part of the struct, to simulate what happens
+        at kdoc_parser.
+
+        Taken from include/net/page_pool/types.h
+        """
+        line = """
+            struct page_pool_params {
+                struct_group_tagged(page_pool_params_slow, slow,
+                                    struct net_device *netdev;
+                                    unsigned int queue_idx;
+                                    unsigned int    flags;
+                                    /* private: only under "slow" struct */
+                                    unsigned int ignored;
+                );
+                /* Struct below shall not be ignored */
+                struct_group_tagged(page_pool_params_fast, fast,
+                                    unsigned int    order;
+                                    unsigned int    pool_size;
+                                    int             nid;
+                                    struct device   *dev;
+                                    struct napi_struct *napi;
+                                    enum dma_data_direction dma_dir;
+                                    unsigned int    max_len;
+                                    unsigned int    offset;
+                );
+            };
+        """
+        expected = """
+            struct page_pool_params {
+                struct page_pool_params_slow {
+                    struct net_device *netdev;
+                    unsigned int queue_idx;
+                    unsigned int    flags;
+                } slow;
+                struct page_pool_params_fast {
+                    unsigned int order;
+                    unsigned int    pool_size;
+                    int             nid;
+                    struct device   *dev;
+                    struct napi_struct *napi;
+                    enum dma_data_direction dma_dir;
+                    unsigned int    max_len;
+                    unsigned int    offset;
+                } fast;
+            };
+        """
+
+        result = self.apply_transforms("struct", line)
+        self.assertLogicallyEqual(result, expected)
+
+    def test_struct_kcov(self):
+        """
+        """
+        line = """
+            struct kcov {
+                refcount_t              refcount;
+                spinlock_t              lock;
+                enum kcov_mode          mode __guarded_by(&lock);
+                unsigned int            size __guarded_by(&lock);
+                void                    *area __guarded_by(&lock);
+                struct task_struct      *t __guarded_by(&lock);
+                bool                    remote;
+                unsigned int            remote_size;
+                int                     sequence;
+            };
+        """
+        expected = """
+        """
+
+        result = self.apply_transforms("struct", line)
+        self.assertLogicallyEqual(result, expected)
+
+
+    def test_struct_kcov(self):
+        """
+        Test a struct from kernel/kcov.c.
+        """
+        line = """
+            struct kcov {
+                refcount_t              refcount;
+                spinlock_t              lock;
+                enum kcov_mode          mode __guarded_by(&lock);
+                unsigned int            size __guarded_by(&lock);
+                void                    *area __guarded_by(&lock);
+                struct task_struct      *t __guarded_by(&lock);
+                bool                    remote;
+                unsigned int            remote_size;
+                int                     sequence;
+            };
+        """
+        expected = """
+            struct kcov {
+                refcount_t              refcount;
+                spinlock_t              lock;
+                enum kcov_mode          mode;
+                unsigned int            size;
+                void                    *area;
+                struct task_struct      *t;
+                bool                    remote;
+                unsigned int            remote_size;
+                int                     sequence;
+            };
+        """
+
+        result = self.apply_transforms("struct", line)
+        self.assertLogicallyEqual(result, expected)
+
+    def test_vars_stackdepot(self):
+        """
+        Test guarded_by on vars from lib/stackdepot.c.
+        """
+        line = """
+            size_t pool_offset __guarded_by(&pool_lock) = DEPOT_POOL_SIZE;
+            __guarded_by(&pool_lock) LIST_HEAD(free_stacks);
+            void **stack_pools __pt_guarded_by(&pool_lock);
+        """
+        expected = """
+            size_t pool_offset = DEPOT_POOL_SIZE;
+            struct list_head free_stacks;
+            void **stack_pools;
+        """
+
+        result = self.apply_transforms("var", line)
+        self.assertLogicallyEqual(result, expected)
+
+    def test_functions_with_acquires_and_releases(self):
+        """
+        Test guarded_by on vars from lib/stackdepot.c.
+        """
+        line = """
+            bool prepare_report_consumer(unsigned long *flags,
+                                         const struct access_info *ai,
+                                         struct other_info *other_info) \
+                                        __cond_acquires(true, &report_lock);
+
+            int tcp_sigpool_start(unsigned int id, struct tcp_sigpool *c) \
+                                  __cond_acquires(0, RCU_BH);
+
+            bool undo_report_consumer(unsigned long *flags,
+                                      const struct access_info *ai,
+                                      struct other_info *other_info) \
+                                     __cond_releases(true, &report_lock);
+
+            void debugfs_enter_cancellation(struct file *file,
+                                            struct debugfs_cancellation *c) \
+                                           __acquires(cancellation);
+
+            void debugfs_leave_cancellation(struct file *file,
+                                            struct debugfs_cancellation *c) \
+                                           __releases(cancellation);
+
+            acpi_cpu_flags acpi_os_acquire_lock(acpi_spinlock lockp) \
+                                               __acquires(lockp);
+
+            void acpi_os_release_lock(acpi_spinlock lockp,
+                                      acpi_cpu_flags not_used) \
+                                     __releases(lockp)
+        """
+        expected = """
+            bool prepare_report_consumer(unsigned long *flags,
+                                         const struct access_info *ai,
+                                         struct other_info *other_info);
+
+            int tcp_sigpool_start(unsigned int id, struct tcp_sigpool *c);
+
+            bool undo_report_consumer(unsigned long *flags,
+                                      const struct access_info *ai,
+                                      struct other_info *other_info);
+
+            void debugfs_enter_cancellation(struct file *file,
+                                            struct debugfs_cancellation *c);
+
+            void debugfs_leave_cancellation(struct file *file,
+                                            struct debugfs_cancellation *c);
+
+            acpi_cpu_flags acpi_os_acquire_lock(acpi_spinlock lockp);
+
+            void acpi_os_release_lock(acpi_spinlock lockp,
+                                      acpi_cpu_flags not_used)
+        """
+
+        result = self.apply_transforms("func", line)
+        self.assertLogicallyEqual(result, expected)
+
+#
+# Run all tests
+#
+if __name__ == "__main__":
+    run_unittest(__file__)
diff --git a/tools/unittests/test_tokenizer.py b/tools/unittests/test_tokenizer.py
new file mode 100755
index 000000000000..5634b4a7283e
--- /dev/null
+++ b/tools/unittests/test_tokenizer.py
@@ -0,0 +1,462 @@
+#!/usr/bin/env python3
+
+"""
+Unit tests for struct/union member extractor class.
+"""
+
+
+import os
+import re
+import unittest
+import sys
+
+from unittest.mock import MagicMock
+
+SRC_DIR = os.path.dirname(os.path.realpath(__file__))
+sys.path.insert(0, os.path.join(SRC_DIR, "../lib/python"))
+
+from kdoc.c_lex import CToken, CTokenizer
+from unittest_helper import run_unittest
+
+#
+# List of tests.
+#
+# The code will dynamically generate one test for each key on this dictionary.
+#
+def tokens_to_list(tokens):
+    tuples = []
+
+    for tok in tokens:
+        if tok.kind == CToken.SPACE:
+            continue
+
+        tuples += [(tok.kind, tok.value, tok.level)]
+
+    return tuples
+
+
+def make_tokenizer_test(name, data):
+    """
+    Create a test named ``name`` using parameters given by ``data`` dict.
+    """
+
+    def test(self):
+        """In-lined lambda-like function to run the test"""
+
+        #
+        # Check if logger is working
+        #
+        if "log_level" in data:
+            with self.assertLogs('kdoc.c_lex', level='ERROR') as cm:
+                tokenizer = CTokenizer(data["source"])
+
+            return
+
+        #
+        # Check if tokenizer is producing expected results
+        #
+        tokens = CTokenizer(data["source"]).tokens
+
+        result = tokens_to_list(tokens)
+        expected = tokens_to_list(data["expected"])
+
+        self.assertEqual(result, expected, msg=f"{name}")
+
+    return test
+
+#: Tokenizer tests.
+TESTS_TOKENIZER = {
+    "__run__": make_tokenizer_test,
+
+    "basic_tokens": {
+        "source": """
+            int a; // comment
+            float b = 1.23;
+        """,
+        "expected": [
+            CToken(CToken.NAME, "int"),
+            CToken(CToken.NAME, "a"),
+            CToken(CToken.ENDSTMT, ";"),
+            CToken(CToken.COMMENT, "// comment"),
+            CToken(CToken.NAME, "float"),
+            CToken(CToken.NAME, "b"),
+            CToken(CToken.OP, "="),
+            CToken(CToken.NUMBER, "1.23"),
+            CToken(CToken.ENDSTMT, ";"),
+        ],
+    },
+
+    "depth_counters": {
+        "source": """
+            struct X {
+                int arr[10];
+                func(a[0], (b + c));
+            }
+        """,
+        "expected": [
+            CToken(CToken.STRUCT, "struct"),
+            CToken(CToken.NAME, "X"),
+            CToken(CToken.BEGIN, "{", brace_level=1),
+
+            CToken(CToken.NAME, "int", brace_level=1),
+            CToken(CToken.NAME, "arr", brace_level=1),
+            CToken(CToken.BEGIN, "[", brace_level=1, bracket_level=1),
+            CToken(CToken.NUMBER, "10", brace_level=1, bracket_level=1),
+            CToken(CToken.END, "]", brace_level=1),
+            CToken(CToken.ENDSTMT, ";", brace_level=1),
+            CToken(CToken.NAME, "func", brace_level=1),
+            CToken(CToken.BEGIN, "(", brace_level=1, paren_level=1),
+            CToken(CToken.NAME, "a", brace_level=1, paren_level=1),
+            CToken(CToken.BEGIN, "[", brace_level=1, paren_level=1, bracket_level=1),
+            CToken(CToken.NUMBER, "0", brace_level=1, paren_level=1, bracket_level=1),
+            CToken(CToken.END, "]", brace_level=1, paren_level=1),
+            CToken(CToken.PUNC, ",", brace_level=1, paren_level=1),
+            CToken(CToken.BEGIN, "(", brace_level=1, paren_level=2),
+            CToken(CToken.NAME, "b", brace_level=1, paren_level=2),
+            CToken(CToken.OP, "+", brace_level=1, paren_level=2),
+            CToken(CToken.NAME, "c", brace_level=1, paren_level=2),
+            CToken(CToken.END, ")", brace_level=1, paren_level=1),
+            CToken(CToken.END, ")", brace_level=1),
+            CToken(CToken.ENDSTMT, ";", brace_level=1),
+            CToken(CToken.END, "}"),
+        ],
+    },
+
+    "mismatch_error": {
+        "source": "int a$ = 5;",          # $ is illegal
+        "log_level": "ERROR",
+    },
+}
+
+def make_private_test(name, data):
+    """
+    Create a test named ``name`` using parameters given by ``data`` dict.
+    """
+
+    def test(self):
+        """In-lined lambda-like function to run the test"""
+        tokens = CTokenizer(data["source"])
+        result = str(tokens)
+
+        #
+        # Avoid whitespace false positives
+        #
+        result = re.sub(r"\s++", " ", result).strip()
+        expected = re.sub(r"\s++", " ", data["trimmed"]).strip()
+
+        msg = f"failed when parsing this source:\n{data['source']}"
+        self.assertEqual(result, expected, msg=msg)
+
+    return test
+
+#: Tests to check if CTokenizer is handling properly public/private comments.
+TESTS_PRIVATE = {
+    #
+    # Simplest case: no private. Ensure that trimming won't affect struct
+    #
+    "__run__": make_private_test,
+    "no private": {
+        "source": """
+            struct foo {
+                int a;
+                int b;
+                int c;
+            };
+        """,
+        "trimmed": """
+            struct foo {
+                int a;
+                int b;
+                int c;
+            };
+        """,
+    },
+
+    #
+    # Play "by the books" by always having a public in place
+    #
+
+    "balanced_private": {
+        "source": """
+            struct foo {
+                int a;
+                /* private: */
+                int b;
+                /* public: */
+                int c;
+            };
+        """,
+        "trimmed": """
+            struct foo {
+                int a;
+                int c;
+            };
+        """,
+    },
+
+    "balanced_non_greddy_private": {
+        "source": """
+            struct foo {
+                int a;
+                /* private: */
+                int b;
+                /* public: */
+                int c;
+                /* private: */
+                int d;
+                /* public: */
+                int e;
+
+            };
+        """,
+        "trimmed": """
+            struct foo {
+                int a;
+                int c;
+                int e;
+            };
+        """,
+    },
+
+    "balanced_inner_private": {
+        "source": """
+            struct foo {
+                struct {
+                    int a;
+                    /* private: ignore below */
+                    int b;
+                /* public: but this should not be ignored */
+                };
+                int b;
+            };
+        """,
+        "trimmed": """
+            struct foo {
+                struct {
+                    int a;
+                };
+                int b;
+            };
+        """,
+    },
+
+    #
+    # Test what happens if there's no public after private place
+    #
+
+    "unbalanced_private": {
+        "source": """
+            struct foo {
+                int a;
+                /* private: */
+                int b;
+                int c;
+            };
+        """,
+        "trimmed": """
+            struct foo {
+                int a;
+            };
+        """,
+    },
+
+    "unbalanced_inner_private": {
+        "source": """
+            struct foo {
+                struct {
+                    int a;
+                    /* private: ignore below */
+                    int b;
+                /* but this should not be ignored */
+                };
+                int b;
+            };
+        """,
+        "trimmed": """
+            struct foo {
+                struct {
+                    int a;
+                };
+                int b;
+            };
+        """,
+    },
+
+    "unbalanced_struct_group_tagged_with_private": {
+        "source": """
+            struct page_pool_params {
+                struct_group_tagged(page_pool_params_fast, fast,
+                        unsigned int    order;
+                        unsigned int    pool_size;
+                        int             nid;
+                        struct device   *dev;
+                        struct napi_struct *napi;
+                        enum dma_data_direction dma_dir;
+                        unsigned int    max_len;
+                        unsigned int    offset;
+                };
+                struct_group_tagged(page_pool_params_slow, slow,
+                        struct net_device *netdev;
+                        unsigned int queue_idx;
+                        unsigned int    flags;
+                        /* private: used by test code only */
+                        void (*init_callback)(netmem_ref netmem, void *arg);
+                        void *init_arg;
+                };
+            };
+        """,
+        "trimmed": """
+            struct page_pool_params {
+                struct_group_tagged(page_pool_params_fast, fast,
+                        unsigned int    order;
+                        unsigned int    pool_size;
+                        int             nid;
+                        struct device   *dev;
+                        struct napi_struct *napi;
+                        enum dma_data_direction dma_dir;
+                        unsigned int    max_len;
+                        unsigned int    offset;
+                };
+                struct_group_tagged(page_pool_params_slow, slow,
+                        struct net_device *netdev;
+                        unsigned int queue_idx;
+                        unsigned int    flags;
+                };
+            };
+        """,
+    },
+
+    "unbalanced_two_struct_group_tagged_first_with_private": {
+        "source": """
+            struct page_pool_params {
+                struct_group_tagged(page_pool_params_slow, slow,
+                        struct net_device *netdev;
+                        unsigned int queue_idx;
+                        unsigned int    flags;
+                        /* private: used by test code only */
+                        void (*init_callback)(netmem_ref netmem, void *arg);
+                        void *init_arg;
+                };
+                struct_group_tagged(page_pool_params_fast, fast,
+                        unsigned int    order;
+                        unsigned int    pool_size;
+                        int             nid;
+                        struct device   *dev;
+                        struct napi_struct *napi;
+                        enum dma_data_direction dma_dir;
+                        unsigned int    max_len;
+                        unsigned int    offset;
+                };
+            };
+        """,
+        "trimmed": """
+            struct page_pool_params {
+                struct_group_tagged(page_pool_params_slow, slow,
+                        struct net_device *netdev;
+                        unsigned int queue_idx;
+                        unsigned int    flags;
+                };
+                struct_group_tagged(page_pool_params_fast, fast,
+                        unsigned int    order;
+                        unsigned int    pool_size;
+                        int             nid;
+                        struct device   *dev;
+                        struct napi_struct *napi;
+                        enum dma_data_direction dma_dir;
+                        unsigned int    max_len;
+                        unsigned int    offset;
+                };
+            };
+        """,
+    },
+    "unbalanced_without_end_of_line": {
+        "source": """ \
+            struct page_pool_params { \
+                struct_group_tagged(page_pool_params_slow, slow, \
+                        struct net_device *netdev; \
+                        unsigned int queue_idx; \
+                        unsigned int    flags;
+                        /* private: used by test code only */
+                        void (*init_callback)(netmem_ref netmem, void *arg); \
+                        void *init_arg; \
+                }; \
+                struct_group_tagged(page_pool_params_fast, fast, \
+                        unsigned int    order; \
+                        unsigned int    pool_size; \
+                        int             nid; \
+                        struct device   *dev; \
+                        struct napi_struct *napi; \
+                        enum dma_data_direction dma_dir; \
+                        unsigned int    max_len; \
+                        unsigned int    offset; \
+                }; \
+            };
+        """,
+        "trimmed": """
+            struct page_pool_params {
+                struct_group_tagged(page_pool_params_slow, slow,
+                        struct net_device *netdev;
+                        unsigned int queue_idx;
+                        unsigned int    flags;
+                };
+                struct_group_tagged(page_pool_params_fast, fast,
+                        unsigned int    order;
+                        unsigned int    pool_size;
+                        int             nid;
+                        struct device   *dev;
+                        struct napi_struct *napi;
+                        enum dma_data_direction dma_dir;
+                        unsigned int    max_len;
+                        unsigned int    offset;
+                };
+            };
+        """,
+    },
+}
+
+#: Dict containing all test groups fror CTokenizer
+TESTS = {
+    "TestPublicPrivate": TESTS_PRIVATE,
+    "TestTokenizer": TESTS_TOKENIZER,
+}
+
+def setUp(self):
+    self.maxDiff = None
+
+def build_test_class(group_name, table):
+    """
+    Dynamically creates a class instance using type() as a generator
+    for a new class derivated from unittest.TestCase.
+
+    We're opting to do it inside a function to avoid the risk of
+    changing the globals() dictionary.
+    """
+
+    class_dict = {
+        "setUp": setUp
+    }
+
+    run = table["__run__"]
+
+    for test_name, data in table.items():
+        if test_name == "__run__":
+            continue
+
+        class_dict[f"test_{test_name}"] = run(test_name, data)
+
+    cls = type(group_name, (unittest.TestCase,), class_dict)
+
+    return cls.__name__, cls
+
+#
+# Create classes and add them to the global dictionary
+#
+for group, table in TESTS.items():
+    t = build_test_class(group, table)
+    globals()[t[0]] = t[1]
+
+#
+# main
+#
+if __name__ == "__main__":
+    run_unittest(__file__)
author	Jonathan Corbet <corbet@lwn.net>	2026-03-22 15:06:59 -0600
committer	Jonathan Corbet <corbet@lwn.net>	2026-03-22 15:06:59 -0600
commit	781171bec0650c00c642564afcb5cce57abda5bf (patch)
tree	31e8cf6d41b40bef9f432ebbe4a8180f32af9fbf /tools
parent	6108c809f4fd9dbb1a138ba4326d645cc3113a8d (diff)
parent	7538df7a2d7d26428803cf8053476169a6d28659 (diff)
download	lwn-781171bec0650c00c642564afcb5cce57abda5bf.tar.gz lwn-781171bec0650c00c642564afcb5cce57abda5bf.zip