reason:
Currently, if we need to schedule a task to another CPU, we have to completely halt the other CPU,
manipulate the scheduling linked list, and then resume the operation of that CPU. This process is both time-consuming and unnecessary.
During this process, both the current CPU and the target CPU are inevitably subjected to busyloop.
The improved strategy is to simply send a cross-core interrupt to the target CPU.
The current CPU continues to run while the target CPU responds to the interrupt, eliminating the certainty of a busyloop occurring.
Signed-off-by: hujun5 <hujun5@xiaomi.com>
reason:
by doing this we can reduce context switch time,
When we exit from an interrupt handler, we directly use tcb->xcp.regs
before
size nuttx
text data bss dec hex filename
225920 409 30925 257254 3ece6 nuttx
after
text data bss dec hex filename
225604 409 30925 256938 3ebaa nuttx
szie change -316
Signed-off-by: hujun5 <hujun5@xiaomi.com>
The feature depends on ARCH_USE_SEPARATED_SECTION
the different memory area has different access speed and cache
capability, so the arch can custom allocate them based on
section names to achieve performance optimization
test:
sim:elf
sim:sotest
Signed-off-by: dongjiuzhu1 <dongjiuzhu1@xiaomi.com>
We need to record the parent's integer register context upon exception
entry to a separate non-volatile area. Why?
Because xcp.regs can move due to a context switch within the fork() system
call, be it either via interrupt or a synchronization point.
Fix this by adding a "sregs" area where the saved user context is placed.
The critical section within fork() is also unnecessary.
There was an error in the fork() routine when system calls are in use:
the child context is saved on the child's user stack, which is incorrect,
the context must be saved on the kernel stack instead.
The result is a full system crash if (when) the child executes on a
different CPU which does not have the same MMU mappings active.
This commit fixes the regression from https://github.com/apache/nuttx/pull/13561
In order to determine whether a context switch has occurred,
we can use g_running_task to store the current regs.
This allows us to compare the current register state with the previously
stored state to identify if a context switch has taken place.
Signed-off-by: hujun5 <hujun5@xiaomi.com>
When the toolchain does not support atomic, it will use the version implemented by NuttX (low performance version). This scenario is consistent with the original design, so we can ignore it.
see bug here:
https://bugs.llvm.org/show_bug.cgi?id=43603
Error: inode/fs_inodeaddref.c:50:7: error: large atomic operation may incur significant performance penalty; the access size (4 bytes) exceeds the max lock-free size (0 bytes) [-Werror,-Watomic-alignment]
50 | atomic_fetch_add(&inode->i_crefs, 1);
| ^
/tools/clang-arm-none-eabi/lib/clang/17/include/stdatomic.h:152:43: note: expanded from macro 'atomic_fetch_add'
152 | #define atomic_fetch_add(object, operand) __c11_atomic_fetch_add(object, operand, __ATOMIC_SEQ_CST)
| ^
1 error generated.
make[1]: *** [Makefile:83: fs_inodeaddref.o] Error 1
Error: inode/fs_inodefind.c:74:7: error: large atomic operation may incur significant performance penalty; the access size (4 bytes) exceeds the max lock-free size (0 bytes) [-Werror,-Watomic-alignment]
74 | atomic_fetch_add(&node->i_crefs, 1);
Signed-off-by: chenrun1 <chenrun1@xiaomi.com>
Summary:
1.Modified the i_crefs from int16_t to atomic_int
2.Modified the i_crefs add, delete, read, and initialize interfaces to atomic operations
The purpose of this change is to avoid deadlock in cross-core scenarios, where A Core blocks B Core’s request for a write operation to A Core when A Core requests a read operation to B Core.
Signed-off-by: chenrun1 <chenrun1@xiaomi.com>
The aforementioned functions can/will fail if the C compiler decides
to use the stack for the incoming entrypt/etc. parameters.
Fix this issue by converting the jump to user part into pure assembly,
ensuring the stack is NOT used for the parameters.
The original code made the incorrect assumption that the amount of
translation levels is 3, but this is incorrect. The amount of levels is 4
and the amount of levels that are utilized / in use is set dynamically
from the amount of VA bits in use.
The VMSAv8-64 translation system has 4 page table levels in total, ranging
from 0-3. The address environment code assumes only 3 levels, from 1-3 but
this is wrong; the amount of levels _utilized_ depends on the configured
VA size CONFIG_ARM64_VA_BITS. With <= 39 bits 3 levels is enough, while
if the va range is larger, the 4th translation table level is taken into
use dynamically by shifting the base translation table level.
From arm64_mmu.c, where va_bits is the amount of va bits used in address
translations:
(va_bits <= 21) - base level 3
(22 <= va_bits <= 30) - base level 2
(31 <= va_bits <= 39) - base level 1
(40 <= va_bits <= 48) - base level 0
The base level is what is configured as the page directory root. This also
affects the performance of address translations i.e. if the VA range is
smaller, address translations are also faster as the page table walk is
shorter.
This defconfig is an example of the recorded stack and it became
faulty recently after the implementation of the `up_current_regs`
functions. The `noinstrument_function` directive must be used for
preventing it from being looped when instrumentation is enabled.
Also, this commit places `sched/instrument/stack_record.c` in IRAM.
with other functionalities removed.
reason:
by doing this we can reduce context switch time,
When we exit from an interrupt handler, we directly use tcb->xcp.regs
before
text data bss dec hex filename
138805 337 24256 163398 27e46 nuttx
after
text data bss dec hex filename
138499 337 24240 163076 27d04 nuttx
szie change -322
Signed-off-by: hujun5 <hujun5@xiaomi.com>
1. Similar to asan, supports single byte out of bounds detection
2. Fix the script to address the issue of not supporting the big end
Signed-off-by: wangmingrong1 <wangmingrong1@xiaomi.com>
1. Tested on QEMU, the two sockets were basically the same, and their performance was not affected. The size of the generated bin file was also the same
2. Extract global detection as a separate file, both types of Kasan support global variable out of bounds detection simultaneously
Signed-off-by: wangmingrong1 <wangmingrong1@xiaomi.com>
The simple improvement is designed to speed up compilation and reduce download errors on github and local.
Added a folder nxtmpdir for storing third-party packages
nuttxworkspace
|
|- nuttx
|- apps
|- nxtmpdir
tools/Unix.mk:
added export NXTMPDIR := $(WSDIR)/nxtmpdir
tools/configure.sh:
added option -S creates the nxtmpdir folder for third-party packages.
tools/Config.mk:
added macro
CLONE - Git clone repository.
CHECK_COMMITSHA - Check if the branch contains the commit SHA-1.
tools/testbuild.sh:
added option -S
For now I added in the folder this package
ESP_HAL_3RDPARTY_URL = https://github.com/espressif/esp-hal-3rdparty.git
ARCH
arch/xtensa/src/esp32/Make.defs
arch/xtensa/src/esp32s2/Make.defs
arch/xtensa/src/esp32s3/Make.defs
arch/risc-v/src/common/espressif/Make.defs
arch/risc-v/src/esp32c3-legacy/Make.defs
but you can also add other packages (maybe also of apps)
resson:
using percpu storage for g_current_regs or leveraging interrupt status
registers to determine if code is running within an interrupt context can enhance performance.
Signed-off-by: hujun5 <hujun5@xiaomi.com>
detail: Add g_ prefix to can_dlc_to_len and len_to_can_dlc to
follow NuttX coding style conventions for global symbols,
improving code readability and maintainability.
Signed-off-by: zhaohaiyang1 <zhaohaiyang1@xiaomi.com>
The number of exception for risc-v is 16 (0 ~ 15)
for the machine ISA version 1.12 or earlier, the number of exception is 20
(0 ~ 19) from the ISA version 1.13. And maybe changed in the future.
Using a dedicated option to control the exception number to allow the earlier
version chip with customized exception number (e.g. 16 ~ 19 used) to define
the exception reason string correctly.
Signed-off-by: Huang Qi <huangqi3@xiaomi.com>
Fix an issue of driver open failure caused by the following commit
that changes the initial value of inode reference.
43d0d95f81 fs/inode: using inode reference to indicate unlink and simply code
after enable both CONFIG_BUILD_KERNEL and CONFIG_ARCH_VMA_MAPPING
arch.h:141:18: error: 'ARCH_SHM_MAXPAGES' undeclared here
Signed-off-by: wanggang26 <wanggang26@xiaomi.com>
GOTPCRELX reloc available only for CONFIG_ARCH_ADDRENV=y
when CONFIG_ARCH_ADDRENV is not set, CONFIG_ARCH_TEXT_VBASE is not specified
so we can't relocate
Signed-off-by: p-szafonimateusz <p-szafonimateusz@xiaomi.com>
In SMP mode, qemu/goldfish platform, cpu0 use up_cpu_start()
to start others cpus.
But in previous patch(mathion ahead), arm_gic_initialize() will
wait others cpus start, so deadlocked!
Resolve:
Move the wait logic when use using sgi
Signed-off-by: ligd <liguiding1@xiaomi.com>
After move the SGI irq to group1, other cpu can't response the
sgi request from cpu0 when its gic not initialized.
So let cpu0 wait until all other cpus gic initialize done.
Signed-off-by: Bowen Wang <wangbowen6@xiaomi.com>
Revert "Parallelize depend file generation"
This reverts commit d5b6ec450f.
parallel depend ddc does not significantly speed up compilation,
intermediately generated .ddc files can cause problems if compilation is interrupted unexpectedly
Signed-off-by: xuxin19 <xuxin19@xiaomi.com>
change the extra library from a file to an import target;
this will avoid differences in the handling of static libraries
between different versions of cmake and different platforms.
after unifying as a target, extra libraries can be
handled as the same as other compiled libraries
Signed-off-by: xuxin19 <xuxin19@xiaomi.com>
prepare 16550 UART driver to support PCI:
- [breaking change] change argument of uart_ioctl() from `struct file *filep` to `FAR struct u16550_s *priv`
Also fix moxart_16550.c build related to this change
- [breaking change] change argument of uart_getreg() and uart_putreg from `uart_addrwidth_t base` to `FAR struct u16550_s *priv`
Also fix arch/x86/src/qemu/qemu_serial.c and arch/x86_64/src/intel64/intel64_serial.c related to this change
- [breaking change] change argument of uart_dmachan() from `uart_addrwidth_t base` to `FAR struct u16550_s *priv`
- move `struct u16550_s` to public header
- generalize UART_XXX_OFFSET so we can use it with any register increment
- make u16550_bind(), u16550_interrupt(), u16550_interrupt() public
- remove arch/or1k/src/common/or1k_uart.c and use common 16550 MIMO interfacve
- change irq type in `struct u16550_s` from uint8_t to int to match MSI API
Signed-off-by: p-szafonimateusz <p-szafonimateusz@xiaomi.com>
Some of PCI drivers require OS interfaces that can't be executed in the INIT context.
In that case we have to postpone PCI drivers probing and call it for example
in board initialization logic.
Signed-off-by: p-szafonimateusz <p-szafonimateusz@xiaomi.com>
When in a multi-core structure, as the intermediate core,
remote is both the master and slave;When the remote exception or
restart occurs, it needs to notify the slave and reestablish the connection
Signed-off-by: yintao <yintao@xiaomi.com>
add sim_rpmsg_virtio.c to verify the new rpmsg virtio wrapper layer,
new the rpmsg virtio can be used in sim platfrom
Signed-off-by: Yongrong Wang <wangyongrong@xiaomi.com>
This commit fixed the issue where the hardware timer wraps around and causes the system to halt.
Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>
reason:
1 On different architectures, we can utilize more optimized strategies
to implement up_current_regs/up_set_current_regs.
eg. use interrupt registersor percpu registers.
code size
before
text data bss dec hex filename
262848 49985 63893 376726 5bf96 nuttx
after
text data bss dec hex filename
262844 49985 63893 376722 5bf92 nuttx
size change -4
Configuring NuttX and compile:
$ ./tools/configure.sh -l qemu-armv8a:nsh_smp
$ make
Running with qemu
$ qemu-system-aarch64 -cpu cortex-a53 -smp 4 -nographic \
-machine virt,virtualization=on,gic-version=3 \
-net none -chardev stdio,id=con,mux=on -serial chardev:con \
-mon chardev=con,mode=readline -kernel ./nuttx
Signed-off-by: hujun5 <hujun5@xiaomi.com>
Intel CET (Control-flow Enforcement Technology) is a hardware enhancement aimed at mitigating the Retpoline vulnerability, but it may impact CPU branch prediction performance. This commit added ARCH_INTEL64_DISABLE_CET, which can disable CET completely with compilation option `-fcf-protection=none`.
Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>
It was discovered that attempting to load x86-64 format ELF files with a multiboot1 header using the qemu `-kernel` command would result in an error, as multiboot1 only allows x86-32 format ELF files. To address this limitation, we have developed a simple x86_32 bootloader. This bootloader is designed to copy the `nuttx.bin` file to the designated memory address (`0x100000`) and then transfer control to NuttX by executing a jump instruction (`jmp 0x100000`).
Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>
These are changes to make HPET work with ACRN hypervisor:
- FSB interrupt delivery (which works like PCI MSI)
- 32-bit mode support
Signed-off-by: p-szafonimateusz <p-szafonimateusz@xiaomi.com>
TX clock or ref clock can be driven either from outside (PHY / oscilator) or by the ENET block.
Typical connection with RMII PHY is that the PHY drives the refclk.
Signed-off-by: Jukka Laitinen <jukkax@ssrc.tii.ae>
New configuration IMX9_HAVE_ATF_FIRMWARE introduced,
it is default on and it selects ARM64_HAVE_PSCI, when compiling
bootloader or when using bootloader that does not have atf
this shall be disabled
Signed-off-by: Jouni Ukkonen <jouni.ukkonen@unikie.com>
Using the HLT instruction in VM usually traps into the Hypervisor and releases CPU control. This will result in real-time performance degradation. Using the NOP or MWAIT instruction for an IDLE loop can reduce energy consumption while not trapping into the Hypervisor.
Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>
Clean up the interrupt-driven logic in the driver; handle error cases properly,
remove dead code and simplify logic.
Signed-off-by: Jukka Laitinen <jukkax@ssrc.tii.ae>
Change "DMACH_HANDLE *handle" into "DMACH_HANDLE handle". The DMACH_HANDLE is already
defined as "void *".
Signed-off-by: Jukka Laitinen <jukkax@ssrc.tii.ae>
currently esp32 protected mode requires a patched bootloader.
it's a bit cumbersome to build the bootloader for that purpose.
this commit attempts to remove the need of the patched bootloader
by applying the changes by ourselves using esp hal.
Using the ts/tick conversion functions provided in clock.h
Do this caused we want speed up the time calculation, so change:
clock_time2ticks, clock_ticks2time, clock_timespec_add,
clock_timespec_compare, clock_timespec_subtract... to MACRO
Signed-off-by: ligd <liguiding1@xiaomi.com>
Enforcing the default 48-bit VA for everyone also implies a 4 page table
translation system. However, if less than 40 bits are needed, a full
translation table level can be dropped, making the translations faster.
Thus, make this into a configurable option, instead of enforcing the same
address widht for everyone.
Attempt to service all interrupts pending in the PLIC's claim register. Ideally, this is more efficient than switching context for each interrupt received.
Provides two implementations:
- CSR_CYCLE: Cores which implement hardware performance monitoring.
- CSR_TIME: Uses the machine time registers.
Using the up_perf_xx bindings directory is more efficient than performing a nanosecond conversion on every gettime event.
There is a tiny possibility that when a process is started a trap is
taken which causes a context switch. This moves the kernel stack
unexpectedly and the task start logic no longer works.
Fix this by recording the initial context location, and use that to
trampoline into the user process with interrupts disabled. This ensures
the context stays intact AND the kernel stack is fully unwound before
the user process starts.