move initial RSP for AP cores below regs area.
otherwise IDLE thread for AP cores can be corrupted
XCP region now match regs allocation in up_initial_state()
Signed-off-by: p-szafonimateusz <p-szafonimateusz@xiaomi.com>
As the handling of sp_el0 was moved from the context switch routine
to exception entry/exit, we must set sp_el0 explicitly when the user
process is first started.
Summary
The original implement for exception handler is very simple and
haven't framework for breakpoint/watchpoint routine or brk instruction.
I refine the fatal handler and add framework for debug handler to
register or unregister. this is a prepare for watchpoint/breakpoint
implement
Signed-off-by: qinwei1 <qinwei1@xiaomi.com>
reason:
by doing this we can reduce context switch time,
When we exit from an interrupt handler, we directly use tcb->xcp.regs
before
text data bss dec hex filename
178368 876 130604 309848 4ba58 nuttx
after
text data bss dec hex filename
178120 876 130212 309208 4b7d8 nuttx
szie change -248
Signed-off-by: hujun5 <hujun5@xiaomi.com>
for the citimon stats:
thread 0: thread 1:
enter_critical (t0)
up_switch_context
note suspend thread0 (t1)
thread running
IRQ happen, in ISR:
post thread0
up_switch_context
note resume thread0 (t2)
ISR continue f1
ISR continue f2
...
ISR continue fn
leave_critical (t3)
You will see, the thread 0, critical_section time is:
(t1 - t0) + (t3 - t2)
BUT, this result contains f1 f2 .. fn time spent, it is wrong
to tell user thead0 hold the critical lots of time but actually
not belong to it.
Resolve:
change the nxsched_suspend/resume_scheduler to real hanppends
Signed-off-by: ligd <liguiding1@xiaomi.com>
Register: smp
Register: nsh
Register: sh
Register: getprime
Register: ostest
Espressif HAL for 3rd Party Platforms: b4c723a119344b4b71d69819019d55637fb570fd
common/xtensa_cpupause.c: In function 'xtensa_pause_handler':
common/xtensa_cpupause.c:240:3: warning: implicit declaration of function 'xtensa_savestate'; did you mean 'xtensa_setps'? [-Wimplicit-function-declaration]
240 | xtensa_savestate(tcb->xcp.regs);
| ^~~~~~~~~~~~~~~~
| xtensa_setps
common/xtensa_cpupause.c:243:3: warning: implicit declaration of function 'xtensa_restorestate'; did you mean 'xtensa_context_restore'? [-Wimplicit-function-declaration]
243 | xtensa_restorestate(tcb->xcp.regs);
| ^~~~~~~~~~~~~~~~~~~
| xtensa_context_restore
Signed-off-by: hujun5 <hujun5@xiaomi.com>
/vela/nuttx/drivers/pci/pci_ecam.c:432:(.text.pci_ecam_get_irq+0x16): undefined reference to `up_get_legacy_irq'
Signed-off-by: Yongrong Wang <wangyongrong@xiaomi.com>
reason:
Currently, if we need to schedule a task to another CPU, we have to completely halt the other CPU,
manipulate the scheduling linked list, and then resume the operation of that CPU. This process is both time-consuming and unnecessary.
During this process, both the current CPU and the target CPU are inevitably subjected to busyloop.
The improved strategy is to simply send a cross-core interrupt to the target CPU.
The current CPU continues to run while the target CPU responds to the interrupt, eliminating the certainty of a busyloop occurring.
Signed-off-by: hujun5 <hujun5@xiaomi.com>
reason:
by doing this we can reduce context switch time,
When we exit from an interrupt handler, we directly use tcb->xcp.regs
before
size nuttx
text data bss dec hex filename
225920 409 30925 257254 3ece6 nuttx
after
text data bss dec hex filename
225604 409 30925 256938 3ebaa nuttx
szie change -316
Signed-off-by: hujun5 <hujun5@xiaomi.com>
The feature depends on ARCH_USE_SEPARATED_SECTION
the different memory area has different access speed and cache
capability, so the arch can custom allocate them based on
section names to achieve performance optimization
test:
sim:elf
sim:sotest
Signed-off-by: dongjiuzhu1 <dongjiuzhu1@xiaomi.com>
We need to record the parent's integer register context upon exception
entry to a separate non-volatile area. Why?
Because xcp.regs can move due to a context switch within the fork() system
call, be it either via interrupt or a synchronization point.
Fix this by adding a "sregs" area where the saved user context is placed.
The critical section within fork() is also unnecessary.
There was an error in the fork() routine when system calls are in use:
the child context is saved on the child's user stack, which is incorrect,
the context must be saved on the kernel stack instead.
The result is a full system crash if (when) the child executes on a
different CPU which does not have the same MMU mappings active.
This commit fixes the regression from https://github.com/apache/nuttx/pull/13561
In order to determine whether a context switch has occurred,
we can use g_running_task to store the current regs.
This allows us to compare the current register state with the previously
stored state to identify if a context switch has taken place.
Signed-off-by: hujun5 <hujun5@xiaomi.com>
When the toolchain does not support atomic, it will use the version implemented by NuttX (low performance version). This scenario is consistent with the original design, so we can ignore it.
see bug here:
https://bugs.llvm.org/show_bug.cgi?id=43603
Error: inode/fs_inodeaddref.c:50:7: error: large atomic operation may incur significant performance penalty; the access size (4 bytes) exceeds the max lock-free size (0 bytes) [-Werror,-Watomic-alignment]
50 | atomic_fetch_add(&inode->i_crefs, 1);
| ^
/tools/clang-arm-none-eabi/lib/clang/17/include/stdatomic.h:152:43: note: expanded from macro 'atomic_fetch_add'
152 | #define atomic_fetch_add(object, operand) __c11_atomic_fetch_add(object, operand, __ATOMIC_SEQ_CST)
| ^
1 error generated.
make[1]: *** [Makefile:83: fs_inodeaddref.o] Error 1
Error: inode/fs_inodefind.c:74:7: error: large atomic operation may incur significant performance penalty; the access size (4 bytes) exceeds the max lock-free size (0 bytes) [-Werror,-Watomic-alignment]
74 | atomic_fetch_add(&node->i_crefs, 1);
Signed-off-by: chenrun1 <chenrun1@xiaomi.com>
Summary:
1.Modified the i_crefs from int16_t to atomic_int
2.Modified the i_crefs add, delete, read, and initialize interfaces to atomic operations
The purpose of this change is to avoid deadlock in cross-core scenarios, where A Core blocks B Core’s request for a write operation to A Core when A Core requests a read operation to B Core.
Signed-off-by: chenrun1 <chenrun1@xiaomi.com>
The aforementioned functions can/will fail if the C compiler decides
to use the stack for the incoming entrypt/etc. parameters.
Fix this issue by converting the jump to user part into pure assembly,
ensuring the stack is NOT used for the parameters.
The original code made the incorrect assumption that the amount of
translation levels is 3, but this is incorrect. The amount of levels is 4
and the amount of levels that are utilized / in use is set dynamically
from the amount of VA bits in use.
The VMSAv8-64 translation system has 4 page table levels in total, ranging
from 0-3. The address environment code assumes only 3 levels, from 1-3 but
this is wrong; the amount of levels _utilized_ depends on the configured
VA size CONFIG_ARM64_VA_BITS. With <= 39 bits 3 levels is enough, while
if the va range is larger, the 4th translation table level is taken into
use dynamically by shifting the base translation table level.
From arm64_mmu.c, where va_bits is the amount of va bits used in address
translations:
(va_bits <= 21) - base level 3
(22 <= va_bits <= 30) - base level 2
(31 <= va_bits <= 39) - base level 1
(40 <= va_bits <= 48) - base level 0
The base level is what is configured as the page directory root. This also
affects the performance of address translations i.e. if the VA range is
smaller, address translations are also faster as the page table walk is
shorter.
This defconfig is an example of the recorded stack and it became
faulty recently after the implementation of the `up_current_regs`
functions. The `noinstrument_function` directive must be used for
preventing it from being looped when instrumentation is enabled.
Also, this commit places `sched/instrument/stack_record.c` in IRAM.
with other functionalities removed.
reason:
by doing this we can reduce context switch time,
When we exit from an interrupt handler, we directly use tcb->xcp.regs
before
text data bss dec hex filename
138805 337 24256 163398 27e46 nuttx
after
text data bss dec hex filename
138499 337 24240 163076 27d04 nuttx
szie change -322
Signed-off-by: hujun5 <hujun5@xiaomi.com>
1. Similar to asan, supports single byte out of bounds detection
2. Fix the script to address the issue of not supporting the big end
Signed-off-by: wangmingrong1 <wangmingrong1@xiaomi.com>
1. Tested on QEMU, the two sockets were basically the same, and their performance was not affected. The size of the generated bin file was also the same
2. Extract global detection as a separate file, both types of Kasan support global variable out of bounds detection simultaneously
Signed-off-by: wangmingrong1 <wangmingrong1@xiaomi.com>