path: root/tools/include/nolibc/arch-x86_64.h
Age | Commit message | Author
2023-10-12 | tools/nolibc: x86-64: Use `rep stosb` for `memset()` | Ammar Faizi

Simplify memset() on the x86-64 arch.

The x86-64 arch has a 'rep stosb' instruction, which can perform memset() using only a single instruction, given:

    %al  = value (just like the second argument of memset())
    %rdi = destination
    %rcx = length

Before this patch:
```
00000000000010c9 <memset>:
    10c9: 48 89 f8       mov    %rdi,%rax
    10cc: 48 85 d2       test   %rdx,%rdx
    10cf: 74 0e          je     10df <memset+0x16>
    10d1: 31 c9          xor    %ecx,%ecx
    10d3: 40 88 34 08    mov    %sil,(%rax,%rcx,1)
    10d7: 48 ff c1       inc    %rcx
    10da: 48 39 ca       cmp    %rcx,%rdx
    10dd: 75 f4          jne    10d3 <memset+0xa>
    10df: c3             ret
```

After this patch:
```
0000000000001511 <memset>:
    1511: 96             xchg   %eax,%esi
    1512: 48 89 d1       mov    %rdx,%rcx
    1515: 57             push   %rdi
    1516: f3 aa          rep stos %al,%es:(%rdi)
    1518: 58             pop    %rax
    1519: c3             ret
```

v2:
  - Use pushq %rdi / popq %rax (Alviro).
  - Use xchg %eax, %esi (Willy).

Link: https://lore.kernel.org/lkml/ZO9e6h2jjVIMpBJP@1wt.eu
Suggested-by: Alviro Iskandar Setiawan <alviro.iskandar@gnuweeb.org>
Suggested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Reviewed-by: Alviro Iskandar Setiawan <alviro.iskandar@gnuweeb.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
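For readers who want to try the register assignment described above, here is a minimal sketch using GCC-style extended inline assembly instead of the hand-written assembly shown in the disassembly. The name `sketch_memset` is made up for illustration and is not part of nolibc. The sketch assumes the direction flag is clear on entry, as the x86-64 System V ABI requires at function boundaries, and falls back to a plain byte loop on other architectures.

```c
#include <stddef.h>

/* Hypothetical helper, not the nolibc implementation. */
static void *sketch_memset(void *dst, int c, size_t len)
{
	void *ret = dst;	/* memset() returns the original destination */

#if defined(__x86_64__)
	/*
	 * rep stosb stores %al at (%rdi) and increments %rdi, repeating
	 * %rcx times. "+D" binds dst to %rdi, "+c" binds len to %rcx and
	 * "a" places c in %eax, whose low byte is %al.
	 */
	__asm__ volatile (
		"rep stosb"
		: "+D"(dst), "+c"(len)
		: "a"(c)
		: "memory"
	);
#else
	/* Portable fallback: the byte loop the patch replaces. */
	unsigned char *p = dst;

	while (len--)
		*p++ = (unsigned char)c;
#endif
	return ret;
}
```

Saving `dst` in a local before the asm statement plays the same role as the `push %rdi` / `pop %rax` pair in the patched disassembly: `rep stosb` advances `%rdi`, so the original pointer has to be kept somewhere for the return value.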
2023-10-12 | tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()` | Ammar Faizi

Simplify memcpy() and memmove() on the x86-64 arch.

The x86-64 arch has a 'rep movsb' instruction, which can perform memcpy() using only a single instruction, given:

    %rdi = destination
    %rsi = source
    %rcx = length

Additionally, it can also handle the overlapping case by setting DF=1 (backward copy), which can be used as the memmove() implementation.

Before this patch:
```
00000000000010ab <memmove>:
    10ab: 48 89 f8               mov    %rdi,%rax
    10ae: 31 c9                  xor    %ecx,%ecx
    10b0: 48 39 f7               cmp    %rsi,%rdi
    10b3: 48 83 d1 ff            adc    $0xffffffffffffffff,%rcx
    10b7: 48 85 d2               test   %rdx,%rdx
    10ba: 74 25                  je     10e1 <memmove+0x36>
    10bc: 48 83 c9 01            or     $0x1,%rcx
    10c0: 48 39 f0               cmp    %rsi,%rax
    10c3: 48 c7 c7 ff ff ff ff   mov    $0xffffffffffffffff,%rdi
    10ca: 48 0f 43 fa            cmovae %rdx,%rdi
    10ce: 48 01 cf               add    %rcx,%rdi
    10d1: 44 8a 04 3e            mov    (%rsi,%rdi,1),%r8b
    10d5: 44 88 04 38            mov    %r8b,(%rax,%rdi,1)
    10d9: 48 01 cf               add    %rcx,%rdi
    10dc: 48 ff ca               dec    %rdx
    10df: 75 f0                  jne    10d1 <memmove+0x26>
    10e1: c3                     ret

00000000000010e2 <memcpy>:
    10e2: 48 89 f8               mov    %rdi,%rax
    10e5: 48 85 d2               test   %rdx,%rdx
    10e8: 74 12                  je     10fc <memcpy+0x1a>
    10ea: 31 c9                  xor    %ecx,%ecx
    10ec: 40 8a 3c 0e            mov    (%rsi,%rcx,1),%dil
    10f0: 40 88 3c 08            mov    %dil,(%rax,%rcx,1)
    10f4: 48 ff c1               inc    %rcx
    10f7: 48 39 ca               cmp    %rcx,%rdx
    10fa: 75 f0                  jne    10ec <memcpy+0xa>
    10fc: c3                     ret
```

After this patch:
```
// memmove is an alias for memcpy
000000000040133b <memcpy>:
    40133b: 48 89 d1             mov    %rdx,%rcx
    40133e: 48 89 f8             mov    %rdi,%rax
    401341: 48 89 fa             mov    %rdi,%rdx
    401344: 48 29 f2             sub    %rsi,%rdx
    401347: 48 39 ca             cmp    %rcx,%rdx
    40134a: 72 03                jb     40134f <memcpy+0x14>
    40134c: f3 a4                rep movsb %ds:(%rsi),%es:(%rdi)
    40134e: c3                   ret
    40134f: 48 8d 7c 0f ff       lea    -0x1(%rdi,%rcx,1),%rdi
    401354: 48 8d 74 0e ff       lea    -0x1(%rsi,%rcx,1),%rsi
    401359: fd                   std
    40135a: f3 a4                rep movsb %ds:(%rsi),%es:(%rdi)
    40135c: fc                   cld
    40135d: c3                   ret
```

v3:
  - Make memmove as an alias for memcpy (Willy).
  - Make the forward copy the likely case (Alviro).

v2:
  - Fix the broken memmove implementation (David).

Link: https://lore.kernel.org/lkml/20230902062237.GA23141@1wt.eu
Link: https://lore.kernel.org/lkml/5a821292d96a4dbc84c96ccdc6b5b666@AcuMS.aculab.com
Suggested-by: David Laight <David.Laight@aculab.com>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
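The direction check in the patched disassembly (`sub %rsi,%rdx; cmp %rcx,%rdx; jb`) can be sketched in C as a single unsigned comparison: the forward copy is safe whenever `dst - src`, taken as an unsigned value, is at least `len`. The following is an illustrative sketch only; `sketch_memmove` is a made-up name, the inline-asm form is an assumption rather than the way nolibc writes it, and non-x86-64 builds fall back to byte loops.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the nolibc implementation. */
static void *sketch_memmove(void *dst, const void *src, size_t len)
{
	void *ret = dst;
	/* Wraps to a huge value when dst < src, which selects the forward copy. */
	int forward = (uintptr_t)dst - (uintptr_t)src >= len;

#if defined(__x86_64__)
	if (forward) {
		/* DF=0: copy from the lowest address upwards. */
		__asm__ volatile (
			"rep movsb"
			: "+D"(dst), "+S"(src), "+c"(len)
			:
			: "memory"
		);
	} else {
		/* dst lies inside [src, src + len): start at the last byte. */
		dst = (char *)dst + len - 1;
		src = (const char *)src + len - 1;
		__asm__ volatile (
			"std\n\t"	/* DF=1: copy backwards */
			"rep movsb\n\t"
			"cld"		/* restore DF=0 as the ABI expects */
			: "+D"(dst), "+S"(src), "+c"(len)
			:
			: "memory", "cc"
		);
	}
#else
	unsigned char *d = dst;
	const unsigned char *s = src;

	if (forward) {
		while (len--)
			*d++ = *s++;
	} else {
		while (len--)
			d[len] = s[len];
	}
#endif
	return ret;
}
```

Because the backward branch is only taken when `len` is strictly greater than the pointer difference, `len` is at least 1 there, so the `len - 1` adjustment cannot underflow.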
2023-08-23 | tools/nolibc: x86_64: shrink _start with _start_c | Zhangjin Wu

Move most of the _start operations to _start_c(), including the stackprotector initialization.

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 | tools/nolibc: remove the old sys_stat support | Zhangjin Wu

The statx manpage [1] shows that it has been supported since Linux 4.11 and glibc 2.28. The Linux support can be checked for all of the architectures with this command:

    $ git grep -r statx v4.11 arch/ include/uapi/asm-generic/unistd.h \
      | grep -E "aarch64|arm|mips|s390|x86|:include/uapi"

Besides riscv and loongarch, all of the nolibc supported architectures have added sys_statx from Linux v4.11. riscv was mainlined in v4.15 and loongarch in v5.19; both of them use the generic unistd.h, so they have had sys_statx since their first mainline versions.

The current oldest stable branch is v4.14, so keeping only sys_statx still preserves compatibility with all of the supported stable branches. So, let's remove the old arch related and dependent sys_stat support completely.

This is friendly to the future new architecture porting.

[1]: https://man7.org/linux/man-pages/man2/statx.2.html

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 | tools/nolibc: fix up startup failures for -O0 under gcc < 11.1.0 | Zhangjin Wu

As gcc doc [1] shows:

    Most optimizations are completely disabled at -O0 or if an -O level is not set on the command line, even if individual optimization flags are specified.

Test result [2] shows that gcc >= 11.1.0 deviates from the above description, but before gcc 11.1.0, "-O0" still forcibly uses the frame pointer in the _start function even if the individual optimize("omit-frame-pointer") flag is specified.

The frame pointer related operations change the stack pointer (e.g. on x86_64, an extra "push %rbp" is inserted at the beginning of _start) and make it differ from the one we expected, which breaks the whole startup function.

To fix up this issue, as suggested by Thomas, the individual "Os" and "omit-frame-pointer" optimize flags are used together on the _start function to disable the frame pointer completely even if -O0 is set on the command line.

[1]: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
[2]: https://lore.kernel.org/lkml/20230714094723.140603-1-falcon@tinylab.org/

Suggested-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/lkml/34b21ba5-7b59-4b3b-9ed6-ef9a3a5e06f7@t-8ch.de/
Fixes: 7f8548589661 ("tools/nolibc: make compiler and assembler agree on the section around _start")
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 | tools/nolibc: arch-*.h: add missing space after ',' | Zhangjin Wu

Fix up such errors reported by scripts/checkpatch.pl:

    ERROR: space required after that ',' (ctx:VxV)
    #148: FILE: tools/include/nolibc/arch-aarch64.h:148:
    +void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) __no_stack_protector _start(void)
    ^

    ERROR: space required after that ',' (ctx:VxV)
    #148: FILE: tools/include/nolibc/arch-aarch64.h:148:
    +void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) __no_stack_protector _start(void)
    ^

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 | toolc/nolibc: arch-*.h: clean up whitespaces after __asm__ | Zhangjin Wu
replace "__asm__ volatile" with "__asm__ volatile" and insert necessary whitespace before "\" to make sure the lines are aligned. $ sed -i -e 's/__asm__ volatile ( /__asm__ volatile ( /g' tools/include/nolibc/*.h Note, arch-s390.h uses post-tab instead of post-whitespaces, must avoid insert whitespace just before the tabs: $ sed -i -e 's/__asm__ volatile (\t/__asm__ volatile (\t/g' tools/include/nolibc/arch-*.h Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 | tools/nolibc: arch-*.h: fix up code indent errors | Zhangjin Wu

More than 8 whitespaces of the code indent are replaced with "tab + whitespaces" to fix up such errors reported by scripts/checkpatch.pl:

    ERROR: code indent should use tabs where possible
    #64: FILE: tools/include/nolibc/arch-mips.h:64:
    +^I \$

    ERROR: code indent should use tabs where possible
    #72: FILE: tools/include/nolibc/arch-mips.h:72:
    +^I "t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9" \$

This command is used:

    $ sed -i -e '/^\t* /{s/ /\t/g}' tools/include/nolibc/arch-*.h

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-06-09 | tools/nolibc: fix segfaults on compilers without attribute no_stack_protector | Thomas Weißschuh

Not all compilers, notably GCC < 10, have support for __attribute__((no_stack_protector)). Fall back to a mechanism that also works there.

Tested with GCC 9.5.0 from kernel.org crosstools.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-06-09 | tools/nolibc: add autodetection for stackprotector support | Thomas Weißschuh

The stackprotector support in nolibc should be enabled iff it is also enabled in the compiler. Use the preprocessor defines added by gcc and clang if stackprotector support is enabled to automatically do so in nolibc. This completely removes the need for any user-visible API.

To avoid inlining the lengthy preprocessor check into every user, introduce a new header compiler.h that abstracts the logic away.

As the define NOLIBC_STACKPROTECTOR is now not user-relevant anymore, prefix it with an underscore.

Suggested-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/lkml/20230520133237.GA27501@1wt.eu/
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
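As a rough illustration of the autodetection idea, the sketch below keys off the `__SSP__`, `__SSP_STRONG__`, `__SSP_ALL__` and `__SSP_EXPLICIT__` macros that GCC predefines when one of the `-fstack-protector*` options is active (clang defines the first three). The names `SKETCH_STACKPROTECTOR` and `sketch_stackprotector_enabled` are invented for this example; the exact contents of nolibc's compiler.h are not shown in this log.

```c
/*
 * Illustrative only: SKETCH_STACKPROTECTOR is a made-up name.
 * The __SSP*__ macros are predefined by the compiler when stack
 * protection is enabled on the command line.
 */
#if defined(__SSP__) || defined(__SSP_STRONG__) || \
    defined(__SSP_ALL__) || defined(__SSP_EXPLICIT__)
#define SKETCH_STACKPROTECTOR 1
#else
#define SKETCH_STACKPROTECTOR 0
#endif

/* Lets ordinary C code ask whether the canary setup would be needed. */
static int sketch_stackprotector_enabled(void)
{
	return SKETCH_STACKPROTECTOR;
}
```

Centralizing the check in one header means startup code can test a single macro instead of repeating the four-way condition.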
2023-06-09 | tools/nolibc: x86_64: disable stack protector for _start | Thomas Weißschuh

This was forgotten in the original submission.

It is unknown why it worked for x86_64 on some compiler without this attribute.

Reported-by: Willy Tarreau <w@1wt.eu>
Closes: https://lore.kernel.org/lkml/20230520133237.GA27501@1wt.eu/
Fixes: 0d8c461adbc4 ("tools/nolibc: x86_64: add stackprotector support")
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-06-09 | tools/nolibc: use C89 comment syntax | Thomas Weißschuh

Most of nolibc is already using C89 comments.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-03-27 | tools/nolibc: x86_64: add stackprotector support | Thomas Weißschuh

Enable the new stackprotector support for x86_64.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-10 | tools/nolibc: add auxiliary vector retrieval for x86_64 | Willy Tarreau

In the _start block we now iterate over envp to find the auxiliary vector after the NULL. The pointer is saved into an _auxv variable that is marked as weak so that it's accessible from multiple units.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
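The walk over envp described above can be sketched in plain C, assuming the usual Linux process stack layout where the environment pointers end with a NULL entry and the auxiliary vector follows as (type, value) pairs terminated by a type of 0 (AT_NULL). Both function names are hypothetical, and the actual commit performs this step in the _start assembly rather than in C.

```c
#include <stddef.h>

/* Hypothetical helper: skip the environment, return what follows it. */
static const unsigned long *sketch_find_auxv(char **envp)
{
	while (*envp)		/* advance to the terminating NULL */
		envp++;
	/* The auxiliary vector starts one slot past that NULL. */
	return (const unsigned long *)(envp + 1);
}

/* Hypothetical lookup over (type, value) pairs; 0 means "not found". */
static unsigned long sketch_getauxval(const unsigned long *auxv,
				      unsigned long type)
{
	for (; auxv[0] != 0; auxv += 2)	/* type 0 is AT_NULL */
		if (auxv[0] == type)
			return auxv[1];
	return 0;
}
```

On Linux, looking up type 6 (AT_PAGESZ) in the vector returned by `sketch_find_auxv()` would yield the system page size.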
2023-01-10 | tools/nolibc: export environ as a weak symbol on x86_64 | Willy Tarreau

The environ is retrieved from the _start code and is easy to store at this moment. Let's declare the variable weak and store the value into it. By not being static it will be visible to all units. By being weak, if some programs already declared it, they will continue to be able to use it. This was tested both with environ inherited from _start and extracted from envp.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
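A small sketch of the weak, non-static variable pattern follows. It uses the invented name `sketch_environ` rather than `environ` so that it does not interfere with a hosted C library, and the two helper functions are likewise hypothetical stand-ins for what the startup code does.

```c
#include <stddef.h>

/*
 * Not static, so every translation unit sees the same object.
 * Weak, so a program that defines the symbol itself does not
 * cause a duplicate-definition link error.
 */
char **sketch_environ __attribute__((weak));

/* Stand-in for the startup code that receives envp from the kernel. */
static void sketch_startup(char **envp)
{
	sketch_environ = envp;
}

/* Any later code can read the saved pointer. */
static const char *sketch_first_env(void)
{
	if (sketch_environ && sketch_environ[0])
		return sketch_environ[0];
	return NULL;
}
```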
2023-01-10 | tools/nolibc: remove local definitions of O_* flags for open/fcntl | Willy Tarreau

The historic nolibc code did not include asm/fcntl.h and had to define the various O_RDWR etc macros in each arch-specific file (since such values differ between certain archs). This was found at least once to induce bugs due to wrong definitions. Let's get rid of all of them and include asm/nolibc.h from sys.h instead. This was verified to work properly on all supported architectures.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-10 | tools/nolibc: make compiler and assembler agree on the section around _start | Willy Tarreau

The out-of-block asm() statement carrying _start does not allow the compiler to know what section the assembly code is being emitted to, and there's no easy way to push/pop the current section and restore it. It sometimes causes issues depending on the include files ordering and compiler optimizations. For example if a variable is declared immediately before the asm() block and another one after, the compiler assumes that the current section is still .bss and doesn't re-emit it, making the second variable appear inside the .text section instead. Forcing .bss at the end of the _start block doesn't work either because at certain optimizations the compiler may reorder blocks and will make some real code appear just after this block.

A significant number of solutions were attempted, but many of them were still sensitive to section reordering. In the end, the best way to make sure the compiler and assembler agree on the current section is to place this code inside a function. Here the function is directly called _start and configured not to emit a frame-pointer, hence to have no prologue. If some future architectures would still emit some prologue, another working approach consists in naming the function differently and placing the _start label inside the asm statement. But the current solution is simpler.

It was tested with nolibc-test at -O,-O0,-O2,-O3,-Os for arm,arm64,i386, mips,riscv,s390 and x86_64.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-20 | tools/nolibc: Remove .global _start from the entry point code | Ammar Faizi

Building with clang yields the following error:
```
<inline asm>:3:1: error: _start changed binding to STB_GLOBAL
.global _start
^
1 error generated.
```
Make sure to specify only one of `.global _start` and `.weak _start`. Remove `.global _start`.

Cc: llvm@lists.linux.dev
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-20 | tools/nolibc: Replace `asm` with `__asm__` | Ammar Faizi

Replace `asm` with `__asm__` to support compilation with -std flag. Using `asm` with -std flag makes GCC think `asm()` is a function call instead of an inline assembly.

GCC doc says:

    For the C language, the `asm` keyword is a GNU extension. When writing C code that can be compiled with `-ansi` and the `-std` options that select C dialects without GNU extensions, use `__asm__` instead of `asm`.

Link: https://gcc.gnu.org/onlinedocs/gcc/Basic-Asm.html
Reported-by: Alviro Iskandar Setiawan <alviro.iskandar@gnuweeb.org>
Acked-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
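To illustrate the spelling difference, the hypothetical function below uses `__asm__` with an empty template, so it emits no instructions and does not depend on the target architecture. It should build with GCC or clang under strict dialects such as `-std=c99`, where the plain `asm` spelling is not available as a keyword. The name `sketch_opaque` is made up for this example.

```c
/*
 * Hypothetical helper: returns its argument unchanged, but hides the
 * value from the optimizer by routing it through an empty asm statement.
 */
static int sketch_opaque(int value)
{
	/*
	 * "+r" tells the compiler the asm may read and modify value in a
	 * register, so it cannot assume the result is a known constant.
	 */
	__asm__ volatile ("" : "+r"(value));
	return value;
}
```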
2022-04-20 | tools/nolibc: x86-64: Update System V ABI document link | Ammar Faizi

The old link no longer works, update it.

Acked-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-20 | tools/nolibc/arch: mark the _start symbol as weak | Willy Tarreau

By doing so we can link together multiple C files that have been compiled with nolibc and which each have a _start symbol.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-04-20 | tools/nolibc/arch: split arch-specific code into individual files | Willy Tarreau

In order to ease maintenance, this splits the arch-specific code into one file per architecture. A common file "arch.h" is used to include the right file among arch-* based on the detected architecture. Projects which are already split per architecture could simply rename these files to $arch/arch.h and get rid of the common arch.h. For this reason, include guards were placed into each arch-specific file.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>