This feature depends on ARCH_USE_SEPARATED_SECTION.
Different memory areas have different access speeds and cache
capabilities, so the arch can allocate sections to them based on
section names to optimize performance.
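As a rough illustration of name-based placement, the sketch below maps a section name to a memory class by prefix. The enum, the function names, and the `.text.fast` / `.data.fast` prefixes are hypothetical and are not taken from the NuttX sources; a real port would use whatever names its linker script and allocator agree on.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical memory classes an architecture might distinguish. */

enum example_mem_class
{
  EXAMPLE_MEM_DEFAULT = 0,  /* Ordinary RAM */
  EXAMPLE_MEM_FAST_TEXT,    /* Fast, executable memory */
  EXAMPLE_MEM_FAST_DATA     /* Fast data memory */
};

/* Return nonzero if 'name' starts with 'prefix'. */

int example_has_prefix(const char *name, const char *prefix)
{
  return strncmp(name, prefix, strlen(prefix)) == 0;
}

/* Pick a memory class from the section name.
 * The prefixes checked here are assumptions made for this sketch.
 */

enum example_mem_class example_classify_section(const char *name)
{
  if (example_has_prefix(name, ".text.fast"))
    {
      return EXAMPLE_MEM_FAST_TEXT;
    }

  if (example_has_prefix(name, ".data.fast"))
    {
      return EXAMPLE_MEM_FAST_DATA;
    }

  return EXAMPLE_MEM_DEFAULT;
}
```

An allocator hook could then switch on the returned class to choose the heap that backs the section.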
test:
sim:elf
sim:sotest
Signed-off-by: dongjiuzhu1 <dongjiuzhu1@xiaomi.com>
We need to record the parent's integer register context upon exception
entry to a separate non-volatile area. Why?
Because xcp.regs can move due to a context switch within the fork() system
call, be it either via interrupt or a synchronization point.
Fix this by adding a "sregs" area where the saved user context is placed.
The critical section within fork() is also unnecessary.
There was an error in the fork() routine when system calls are in use:
the child context is saved on the child's user stack, which is incorrect;
the context must be saved on the kernel stack instead.
The result is a full system crash if (when) the child executes on a
different CPU which does not have the same MMU mappings active.
This commit fixes the regression from https://github.com/apache/nuttx/pull/13561
In order to determine whether a context switch has occurred,
we can use g_running_task to store the current regs.
This allows us to compare the current register state with the previously
stored state to identify if a context switch has taken place.
Signed-off-by: hujun5 <hujun5@xiaomi.com>
When the toolchain does not support atomics, the version implemented by NuttX (a low-performance version) is used. This scenario is consistent with the original design, so we can ignore it.
See the bug report here:
https://bugs.llvm.org/show_bug.cgi?id=43603
Error: inode/fs_inodeaddref.c:50:7: error: large atomic operation may incur significant performance penalty; the access size (4 bytes) exceeds the max lock-free size (0 bytes) [-Werror,-Watomic-alignment]
50 | atomic_fetch_add(&inode->i_crefs, 1);
| ^
/tools/clang-arm-none-eabi/lib/clang/17/include/stdatomic.h:152:43: note: expanded from macro 'atomic_fetch_add'
152 | #define atomic_fetch_add(object, operand) __c11_atomic_fetch_add(object, operand, __ATOMIC_SEQ_CST)
| ^
1 error generated.
make[1]: *** [Makefile:83: fs_inodeaddref.o] Error 1
Error: inode/fs_inodefind.c:74:7: error: large atomic operation may incur significant performance penalty; the access size (4 bytes) exceeds the max lock-free size (0 bytes) [-Werror,-Watomic-alignment]
74 | atomic_fetch_add(&node->i_crefs, 1);
Signed-off-by: chenrun1 <chenrun1@xiaomi.com>
Summary:
1. Changed i_crefs from int16_t to atomic_int
2. Changed the i_crefs add, delete, read, and initialize interfaces to atomic operations
The purpose of this change is to avoid a deadlock in cross-core scenarios, where core A blocks core B's request for a write operation to core A while core A is requesting a read operation to core B.
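A minimal sketch of an atomic reference count using C11 `<stdatomic.h>` is shown below. The structure and the `example_*` helpers are hypothetical stand-ins and do not reproduce the actual NuttX inode interfaces; only the field name `i_crefs` and the `atomic_int` type come from the summary above.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for an inode; only the reference count is shown. */

struct example_inode
{
  atomic_int i_crefs;
};

/* Initialize the count to a caller-chosen value. */

void example_inode_init(struct example_inode *node, int initial)
{
  atomic_init(&node->i_crefs, initial);
}

/* Take one reference without holding any lock. */

void example_inode_addref(struct example_inode *node)
{
  atomic_fetch_add(&node->i_crefs, 1);
}

/* Drop one reference; returns the count remaining after the release. */

int example_inode_release(struct example_inode *node)
{
  return atomic_fetch_sub(&node->i_crefs, 1) - 1;
}

/* Read the current count. */

int example_inode_refs(struct example_inode *node)
{
  return atomic_load(&node->i_crefs);
}
```

Because each operation is a single atomic read-modify-write, the count can be updated without taking a lock that another core might be holding.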
Signed-off-by: chenrun1 <chenrun1@xiaomi.com>
The aforementioned functions can/will fail if the C compiler decides
to use the stack for the incoming entrypt/etc. parameters.
Fix this issue by converting the jump to user part into pure assembly,
ensuring the stack is NOT used for the parameters.
The original code assumed that the number of translation levels is 3,
but this is incorrect. There are 4 levels, and the number of levels
actually in use is set dynamically from the number of VA bits in use.
The VMSAv8-64 translation system has 4 page table levels in total, ranging
from 0-3. The address environment code assumes only 3 levels, from 1-3, but
this is wrong; the number of levels _utilized_ depends on the configured
VA size CONFIG_ARM64_VA_BITS. With <= 39 bits, 3 levels are enough, while
if the VA range is larger, the 4th translation table level is brought into
use dynamically by shifting the base translation table level.
From arm64_mmu.c, where va_bits is the number of VA bits used in address
translations:
(va_bits <= 21) - base level 3
(22 <= va_bits <= 30) - base level 2
(31 <= va_bits <= 39) - base level 1
(40 <= va_bits <= 48) - base level 0
The base level is what is configured as the page directory root. This also
affects the performance of address translations i.e. if the VA range is
smaller, address translations are also faster as the page table walk is
shorter.
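The mapping above can be sketched as a small helper. The function names are hypothetical and the thresholds are copied from the table quoted from arm64_mmu.c; this is an illustration, not the code from that file.

```c
#include <assert.h>

/* Base (root) translation table level for a given VA size, following
 * the ranges quoted above. Returns -1 for sizes outside 1..48 bits.
 */

int example_base_level(unsigned int va_bits)
{
  if (va_bits == 0 || va_bits > 48)
    {
      return -1;
    }

  if (va_bits <= 21)
    {
      return 3;
    }

  if (va_bits <= 30)
    {
      return 2;
    }

  if (va_bits <= 39)
    {
      return 1;
    }

  return 0;
}

/* Number of levels walked: from the base level through level 3. */

int example_levels_in_use(unsigned int va_bits)
{
  int base = example_base_level(va_bits);

  return base < 0 ? -1 : 4 - base;
}
```

For example, a 39-bit VA space starts at base level 1 and walks 3 levels, while a 48-bit VA space starts at level 0 and walks all 4.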
This defconfig is an example of the recorded stack and it became
faulty recently after the implementation of the `up_current_regs`
functions. The `noinstrument_function` directive must be used for
preventing it from being looped when instrumentation is enabled.
Also, this commit places `sched/instrument/stack_record.c` in IRAM.
with other functionalities removed.
reason:
By doing this we can reduce context switch time.
When we exit from an interrupt handler, we directly use tcb->xcp.regs.
before
   text    data     bss     dec     hex filename
 138805     337   24256  163398   27e46 nuttx
after
   text    data     bss     dec     hex filename
 138499     337   24240  163076   27d04 nuttx
size change -322
Signed-off-by: hujun5 <hujun5@xiaomi.com>
1. Similar to ASan, support single-byte out-of-bounds detection
2. Fix the script so that it supports big-endian targets
Signed-off-by: wangmingrong1 <wangmingrong1@xiaomi.com>
1. Tested on QEMU: the two sockets were basically the same, and their performance was not affected. The size of the generated bin file was also the same.
2. Extract global detection into a separate file, so that both types of KASan support global variable out-of-bounds detection.
Signed-off-by: wangmingrong1 <wangmingrong1@xiaomi.com>
This simple improvement is designed to speed up compilation and reduce download errors on GitHub and locally.
Added a folder nxtmpdir for storing third-party packages:
nuttxworkspace
|
|- nuttx
|- apps
|- nxtmpdir
tools/Unix.mk:
added export NXTMPDIR := $(WSDIR)/nxtmpdir
tools/configure.sh:
added option -S creates the nxtmpdir folder for third-party packages.
tools/Config.mk:
added macro
CLONE - Git clone repository.
CHECK_COMMITSHA - Check if the branch contains the commit SHA-1.
tools/testbuild.sh:
added option -S
For now, the following package is stored in this folder:
ESP_HAL_3RDPARTY_URL = https://github.com/espressif/esp-hal-3rdparty.git
ARCH
arch/xtensa/src/esp32/Make.defs
arch/xtensa/src/esp32s2/Make.defs
arch/xtensa/src/esp32s3/Make.defs
arch/risc-v/src/common/espressif/Make.defs
arch/risc-v/src/esp32c3-legacy/Make.defs
Other packages can also be added (possibly from apps as well).
reason:
using percpu storage for g_current_regs or leveraging interrupt status
registers to determine if code is running within an interrupt context can enhance performance.
Signed-off-by: hujun5 <hujun5@xiaomi.com>
detail: Add g_ prefix to can_dlc_to_len and len_to_can_dlc to
follow NuttX coding style conventions for global symbols,
improving code readability and maintainability.
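As an illustration of the naming convention, the sketch below declares a DLC-to-length lookup table as a global with the g_ prefix. The array form and its contents (the standard CAN FD DLC encoding) are assumptions made for this example, and `example_len_to_dlc()` is a hypothetical helper, not the NuttX symbol.

```c
#include <assert.h>
#include <stdint.h>

/* Global symbol, so it carries the g_ prefix. The contents follow the
 * standard CAN FD DLC encoding; the array form is an assumption.
 */

const uint8_t g_can_dlc_to_len[16] =
{
  0, 1, 2, 3, 4, 5, 6, 7, 8, 12, 16, 20, 24, 32, 48, 64
};

/* Hypothetical helper: smallest DLC whose payload length fits 'len'.
 * Lengths above 64 saturate at DLC 15.
 */

uint8_t example_len_to_dlc(uint8_t len)
{
  uint8_t dlc;

  for (dlc = 0; dlc < 15; dlc++)
    {
      if (g_can_dlc_to_len[dlc] >= len)
        {
          break;
        }
    }

  return dlc;
}
```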
Signed-off-by: zhaohaiyang1 <zhaohaiyang1@xiaomi.com>
The number of RISC-V exceptions is 16 (0 ~ 15) for machine ISA version 1.12
or earlier and 20 (0 ~ 19) from ISA version 1.13 onward, and it may change
again in the future.
Use a dedicated option to control the number of exceptions so that earlier
chips with customized exception numbers (e.g. 16 ~ 19 in use) can define
the exception reason strings correctly.
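One way to picture a configurable exception count is a reason-string table sized by a build-time macro, with a bounds-checked lookup. `EXAMPLE_RISCV_NEXCEPTIONS` is a hypothetical stand-in for the dedicated option (its real name is not given here) and is assumed to be at least 16; the strings for causes 0 ~ 15 follow the RISC-V privileged specification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the dedicated configuration option.
 * This sketch assumes a value of at least 16.
 */

#ifndef EXAMPLE_RISCV_NEXCEPTIONS
#  define EXAMPLE_RISCV_NEXCEPTIONS 16
#endif

/* Entries beyond the 16 initializers stay NULL when the count is larger. */

static const char *const g_example_reasons[EXAMPLE_RISCV_NEXCEPTIONS] =
{
  "Instruction address misaligned",
  "Instruction access fault",
  "Illegal instruction",
  "Breakpoint",
  "Load address misaligned",
  "Load access fault",
  "Store/AMO address misaligned",
  "Store/AMO access fault",
  "Environment call from U-mode",
  "Environment call from S-mode",
  "Reserved",
  "Environment call from M-mode",
  "Instruction page fault",
  "Load page fault",
  "Reserved",
  "Store/AMO page fault"
};

/* Look up the reason string, falling back for unknown causes. */

const char *example_exception_reason(unsigned long cause)
{
  if (cause >= EXAMPLE_RISCV_NEXCEPTIONS ||
      g_example_reasons[cause] == NULL)
    {
      return "Unknown";
    }

  return g_example_reasons[cause];
}
```

A chip that uses causes 16 ~ 19 would raise the count and append its own strings, while the lookup itself stays unchanged.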
Signed-off-by: Huang Qi <huangqi3@xiaomi.com>
Fix an issue of driver open failure caused by the following commit
that changes the initial value of inode reference.
43d0d95f81 fs/inode: using inode reference to indicate unlink and simply code
After enabling both CONFIG_BUILD_KERNEL and CONFIG_ARCH_VMA_MAPPING:
arch.h:141:18: error: 'ARCH_SHM_MAXPAGES' undeclared here
Signed-off-by: wanggang26 <wanggang26@xiaomi.com>
The GOTPCRELX relocation is available only with CONFIG_ARCH_ADDRENV=y.
When CONFIG_ARCH_ADDRENV is not set, CONFIG_ARCH_TEXT_VBASE is not specified,
so we can't relocate.
Signed-off-by: p-szafonimateusz <p-szafonimateusz@xiaomi.com>
In SMP mode on the qemu/goldfish platform, cpu0 uses up_cpu_start()
to start the other CPUs.
But with the previous patch (mentioned ahead), arm_gic_initialize() waits
for the other CPUs to start, which results in a deadlock.
Resolve:
Move the wait logic to where the SGI is used.
Signed-off-by: ligd <liguiding1@xiaomi.com>
After moving the SGI IRQ to group1, the other CPUs can't respond to the
SGI request from cpu0 while their GIC is not initialized.
So let cpu0 wait until the GIC initialization of all other CPUs is done.
Signed-off-by: Bowen Wang <wangbowen6@xiaomi.com>
Revert "Parallelize depend file generation"
This reverts commit d5b6ec450f.
Parallel generation of depend .ddc files does not significantly speed up compilation,
and the intermediate .ddc files can cause problems if compilation is interrupted unexpectedly.
Signed-off-by: xuxin19 <xuxin19@xiaomi.com>
Change the extra library from a file to an imported target;
this avoids differences in the handling of static libraries
between different versions of CMake and different platforms.
Once unified as a target, extra libraries can be
handled the same way as other compiled libraries.
Signed-off-by: xuxin19 <xuxin19@xiaomi.com>
prepare 16550 UART driver to support PCI:
- [breaking change] change argument of uart_ioctl() from `struct file *filep` to `FAR struct u16550_s *priv`
Also fix moxart_16550.c build related to this change
- [breaking change] change argument of uart_getreg() and uart_putreg() from `uart_addrwidth_t base` to `FAR struct u16550_s *priv`
Also fix arch/x86/src/qemu/qemu_serial.c and arch/x86_64/src/intel64/intel64_serial.c related to this change
- [breaking change] change argument of uart_dmachan() from `uart_addrwidth_t base` to `FAR struct u16550_s *priv`
- move `struct u16550_s` to public header
- generalize UART_XXX_OFFSET so we can use it with any register increment
- make u16550_bind(), u16550_interrupt(), u16550_interrupt() public
- remove arch/or1k/src/common/or1k_uart.c and use common 16550 MMIO interface
- change irq type in `struct u16550_s` from uint8_t to int to match MSI API
Signed-off-by: p-szafonimateusz <p-szafonimateusz@xiaomi.com>
Some PCI drivers require OS interfaces that can't be executed in the INIT context.
In that case we have to postpone probing of the PCI drivers and call it, for example,
in the board initialization logic.
Signed-off-by: p-szafonimateusz <p-szafonimateusz@xiaomi.com>