Ruby 3.3.5p100 (2024-09-03 revision ef084cc8f4958c1b6e4ead99136631bef6d8ddba)
pm_strpbrk.h File Reference

A custom strpbrk implementation.

#include "prism/defines.h"
#include "prism/parser.h"
#include <stddef.h>
#include <string.h>

Functions

const uint8_t * pm_strpbrk (const pm_parser_t *parser, const uint8_t *source, const uint8_t *charset, ptrdiff_t length)
 Here we have rolled our own version of strpbrk.

Detailed Description

A custom strpbrk implementation.

Definition in file pm_strpbrk.h.

Function Documentation

pm_strpbrk()

const uint8_t * pm_strpbrk (const pm_parser_t *parser,
                            const uint8_t *source,
                            const uint8_t *charset,
                            ptrdiff_t length)

Here we have rolled our own version of strpbrk.

The standard library strpbrk has undefined behavior when the source string is not null-terminated. We want to support strings that are not null-terminated because pm_parse does not require its input to be null-terminated. This is desirable because it means the extension can call pm_parse directly with the result of a call to mmap.

The standard library strpbrk also does not support passing a maximum length to search. We want to support this for the reason mentioned above, but we also don't want the search to stop on null bytes. Ruby allows null bytes within strings, comments, regular expressions, etc., so we need to be able to skip past them.
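As a rough illustration of the two requirements above (a length bound and tolerance for null bytes), a minimal sketch follows. The name bounded_strpbrk is hypothetical, and skipping null bytes inside the helper is an assumption of this sketch, not a description of how pm_strpbrk itself is written.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper for illustration only; not the prism implementation.
 * Scans at most `length` bytes of `source` for any byte that appears in the
 * NUL-terminated `charset`. `source` does not need to be NUL-terminated. */
static const uint8_t *
bounded_strpbrk(const uint8_t *source, const char *charset, ptrdiff_t length) {
    if (length <= 0) return NULL;

    for (ptrdiff_t index = 0; index < length; index++) {
        uint8_t byte = source[index];

        /* strchr(charset, 0) would match the charset's own terminator, so
         * NUL bytes in the source are skipped explicitly (an assumption of
         * this sketch). */
        if (byte != '\0' && strchr(charset, byte) != NULL) {
            return source + index;
        }
    }

    /* No byte from the charset was found within the first `length` bytes. */
    return NULL;
}
```

Given the five bytes 'a', 0, 'b', '\\', 'c' and the charset "\\", a call with length 5 returns a pointer to the fourth byte, while a call with length 3 returns NULL because the search never reads past the bound.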

Finally, we want to support encodings wherein the charset could contain characters that are trailing bytes of multi-byte characters. For example, in Shift-JIS, the backslash character can be a trailing byte. In that case we need to take a slower path and iterate one multi-byte character at a time.
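To make the slower path concrete, here is a hedged sketch that walks a Shift-JIS buffer one character at a time. The names sjis_char_width and sjis_strpbrk are hypothetical, the lead-byte ranges are a simplified view of Shift-JIS (0x81-0x9F and 0xE0-0xFC) with no trail-byte validation, and the real function works through the parser's encoding rather than hard-coding one.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified Shift-JIS width function: a lead byte in
 * 0x81-0x9F or 0xE0-0xFC starts a two-byte character, anything else is
 * treated as a single byte. Trail bytes are not validated. */
static size_t
sjis_char_width(const uint8_t *bytes, ptrdiff_t remaining) {
    uint8_t lead = bytes[0];
    int is_lead = (lead >= 0x81 && lead <= 0x9F) || (lead >= 0xE0 && lead <= 0xFC);
    return (is_lead && remaining >= 2) ? 2 : 1;
}

/* Hypothetical helper for illustration only; not the prism implementation.
 * Advances one character at a time so that a trailing byte that happens to
 * equal a charset byte (such as 0x5C) is never reported as a match. */
static const uint8_t *
sjis_strpbrk(const uint8_t *source, const char *charset, ptrdiff_t length) {
    ptrdiff_t index = 0;

    while (index < length) {
        size_t width = sjis_char_width(source + index, length - index);

        /* Only single-byte characters are compared against the charset. */
        if (width == 1) {
            uint8_t byte = source[index];
            if (byte != '\0' && strchr(charset, byte) != NULL) {
                return source + index;
            }
        }

        index += (ptrdiff_t) width;
    }

    return NULL;
}
```

For the bytes 0x95 0x5C 'a' 0x5C, where 0x95 0x5C is a single two-byte character, searching for "\\" returns a pointer to the final byte rather than to the 0x5C at offset 1. A plain byte-wise scan would stop at offset 1 and split the character.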

Parameters
parser    The parser.
source    The source to search.
charset   The charset to search for.
length    The maximum number of bytes to search.
Returns
A pointer to the first character in the source string that is in the charset, or NULL if no such character exists.

Definition at line 64 of file pm_strpbrk.c.