for the citimon stats:
thread 0: thread 1:
enter_critical (t0)
up_switch_context
note suspend thread0 (t1)
thread running
IRQ happen, in ISR:
post thread0
up_switch_context
note resume thread0 (t2)
ISR continue f1
ISR continue f2
...
ISR continue fn
leave_critical (t3)
You will see, the thread 0, critical_section time is:
(t1 - t0) + (t3 - t2)
BUT, this result contains f1 f2 .. fn time spent, it is wrong
to tell user thead0 hold the critical lots of time but actually
not belong to it.
Resolve:
change the nxsched_suspend/resume_scheduler to real hanppends
Signed-off-by: ligd <liguiding1@xiaomi.com>
Register: smp
Register: nsh
Register: sh
Register: getprime
Register: ostest
Espressif HAL for 3rd Party Platforms: b4c723a119344b4b71d69819019d55637fb570fd
common/xtensa_cpupause.c: In function 'xtensa_pause_handler':
common/xtensa_cpupause.c:240:3: warning: implicit declaration of function 'xtensa_savestate'; did you mean 'xtensa_setps'? [-Wimplicit-function-declaration]
240 | xtensa_savestate(tcb->xcp.regs);
| ^~~~~~~~~~~~~~~~
| xtensa_setps
common/xtensa_cpupause.c:243:3: warning: implicit declaration of function 'xtensa_restorestate'; did you mean 'xtensa_context_restore'? [-Wimplicit-function-declaration]
243 | xtensa_restorestate(tcb->xcp.regs);
| ^~~~~~~~~~~~~~~~~~~
| xtensa_context_restore
Signed-off-by: hujun5 <hujun5@xiaomi.com>
/vela/nuttx/drivers/pci/pci_ecam.c:432:(.text.pci_ecam_get_irq+0x16): undefined reference to `up_get_legacy_irq'
Signed-off-by: Yongrong Wang <wangyongrong@xiaomi.com>
reason:
Currently, if we need to schedule a task to another CPU, we have to completely halt the other CPU,
manipulate the scheduling linked list, and then resume the operation of that CPU. This process is both time-consuming and unnecessary.
During this process, both the current CPU and the target CPU are inevitably subjected to busyloop.
The improved strategy is to simply send a cross-core interrupt to the target CPU.
The current CPU continues to run while the target CPU responds to the interrupt, eliminating the certainty of a busyloop occurring.
Signed-off-by: hujun5 <hujun5@xiaomi.com>
reason:
by doing this we can reduce context switch time,
When we exit from an interrupt handler, we directly use tcb->xcp.regs
before
size nuttx
text data bss dec hex filename
225920 409 30925 257254 3ece6 nuttx
after
text data bss dec hex filename
225604 409 30925 256938 3ebaa nuttx
szie change -316
Signed-off-by: hujun5 <hujun5@xiaomi.com>
The feature depends on ARCH_USE_SEPARATED_SECTION
the different memory area has different access speed and cache
capability, so the arch can custom allocate them based on
section names to achieve performance optimization
test:
sim:elf
sim:sotest
Signed-off-by: dongjiuzhu1 <dongjiuzhu1@xiaomi.com>
We need to record the parent's integer register context upon exception
entry to a separate non-volatile area. Why?
Because xcp.regs can move due to a context switch within the fork() system
call, be it either via interrupt or a synchronization point.
Fix this by adding a "sregs" area where the saved user context is placed.
The critical section within fork() is also unnecessary.
There was an error in the fork() routine when system calls are in use:
the child context is saved on the child's user stack, which is incorrect,
the context must be saved on the kernel stack instead.
The result is a full system crash if (when) the child executes on a
different CPU which does not have the same MMU mappings active.
This commit fixes the regression from https://github.com/apache/nuttx/pull/13561
In order to determine whether a context switch has occurred,
we can use g_running_task to store the current regs.
This allows us to compare the current register state with the previously
stored state to identify if a context switch has taken place.
Signed-off-by: hujun5 <hujun5@xiaomi.com>
When the toolchain does not support atomic, it will use the version implemented by NuttX (low performance version). This scenario is consistent with the original design, so we can ignore it.
see bug here:
https://bugs.llvm.org/show_bug.cgi?id=43603
Error: inode/fs_inodeaddref.c:50:7: error: large atomic operation may incur significant performance penalty; the access size (4 bytes) exceeds the max lock-free size (0 bytes) [-Werror,-Watomic-alignment]
50 | atomic_fetch_add(&inode->i_crefs, 1);
| ^
/tools/clang-arm-none-eabi/lib/clang/17/include/stdatomic.h:152:43: note: expanded from macro 'atomic_fetch_add'
152 | #define atomic_fetch_add(object, operand) __c11_atomic_fetch_add(object, operand, __ATOMIC_SEQ_CST)
| ^
1 error generated.
make[1]: *** [Makefile:83: fs_inodeaddref.o] Error 1
Error: inode/fs_inodefind.c:74:7: error: large atomic operation may incur significant performance penalty; the access size (4 bytes) exceeds the max lock-free size (0 bytes) [-Werror,-Watomic-alignment]
74 | atomic_fetch_add(&node->i_crefs, 1);
Signed-off-by: chenrun1 <chenrun1@xiaomi.com>
Summary:
1.Modified the i_crefs from int16_t to atomic_int
2.Modified the i_crefs add, delete, read, and initialize interfaces to atomic operations
The purpose of this change is to avoid deadlock in cross-core scenarios, where A Core blocks B Core’s request for a write operation to A Core when A Core requests a read operation to B Core.
Signed-off-by: chenrun1 <chenrun1@xiaomi.com>
The aforementioned functions can/will fail if the C compiler decides
to use the stack for the incoming entrypt/etc. parameters.
Fix this issue by converting the jump to user part into pure assembly,
ensuring the stack is NOT used for the parameters.
The original code made the incorrect assumption that the amount of
translation levels is 3, but this is incorrect. The amount of levels is 4
and the amount of levels that are utilized / in use is set dynamically
from the amount of VA bits in use.
The VMSAv8-64 translation system has 4 page table levels in total, ranging
from 0-3. The address environment code assumes only 3 levels, from 1-3 but
this is wrong; the amount of levels _utilized_ depends on the configured
VA size CONFIG_ARM64_VA_BITS. With <= 39 bits 3 levels is enough, while
if the va range is larger, the 4th translation table level is taken into
use dynamically by shifting the base translation table level.
From arm64_mmu.c, where va_bits is the amount of va bits used in address
translations:
(va_bits <= 21) - base level 3
(22 <= va_bits <= 30) - base level 2
(31 <= va_bits <= 39) - base level 1
(40 <= va_bits <= 48) - base level 0
The base level is what is configured as the page directory root. This also
affects the performance of address translations i.e. if the VA range is
smaller, address translations are also faster as the page table walk is
shorter.
This defconfig is an example of the recorded stack and it became
faulty recently after the implementation of the `up_current_regs`
functions. The `noinstrument_function` directive must be used for
preventing it from being looped when instrumentation is enabled.
Also, this commit places `sched/instrument/stack_record.c` in IRAM.
with other functionalities removed.
reason:
by doing this we can reduce context switch time,
When we exit from an interrupt handler, we directly use tcb->xcp.regs
before
text data bss dec hex filename
138805 337 24256 163398 27e46 nuttx
after
text data bss dec hex filename
138499 337 24240 163076 27d04 nuttx
szie change -322
Signed-off-by: hujun5 <hujun5@xiaomi.com>
1. Similar to asan, supports single byte out of bounds detection
2. Fix the script to address the issue of not supporting the big end
Signed-off-by: wangmingrong1 <wangmingrong1@xiaomi.com>
1. Tested on QEMU, the two sockets were basically the same, and their performance was not affected. The size of the generated bin file was also the same
2. Extract global detection as a separate file, both types of Kasan support global variable out of bounds detection simultaneously
Signed-off-by: wangmingrong1 <wangmingrong1@xiaomi.com>
The simple improvement is designed to speed up compilation and reduce download errors on github and local.
Added a folder nxtmpdir for storing third-party packages
nuttxworkspace
|
|- nuttx
|- apps
|- nxtmpdir
tools/Unix.mk:
added export NXTMPDIR := $(WSDIR)/nxtmpdir
tools/configure.sh:
added option -S creates the nxtmpdir folder for third-party packages.
tools/Config.mk:
added macro
CLONE - Git clone repository.
CHECK_COMMITSHA - Check if the branch contains the commit SHA-1.
tools/testbuild.sh:
added option -S
For now I added in the folder this package
ESP_HAL_3RDPARTY_URL = https://github.com/espressif/esp-hal-3rdparty.git
ARCH
arch/xtensa/src/esp32/Make.defs
arch/xtensa/src/esp32s2/Make.defs
arch/xtensa/src/esp32s3/Make.defs
arch/risc-v/src/common/espressif/Make.defs
arch/risc-v/src/esp32c3-legacy/Make.defs
but you can also add other packages (maybe also of apps)
resson:
using percpu storage for g_current_regs or leveraging interrupt status
registers to determine if code is running within an interrupt context can enhance performance.
Signed-off-by: hujun5 <hujun5@xiaomi.com>
detail: Add g_ prefix to can_dlc_to_len and len_to_can_dlc to
follow NuttX coding style conventions for global symbols,
improving code readability and maintainability.
Signed-off-by: zhaohaiyang1 <zhaohaiyang1@xiaomi.com>