reason:
To remove the "sync pause" and decouple the critical section from the dependency on enabling interrupts,
after that we need to further implement "schedlock + spinlock".
changelist
1 Modify the implementation of critical sections to no longer involve enabling interrupts or handling synchronous pause events.
2 GIC_SMP_CPUCALL attach to pause handler to remove arch interface up_cpu_paused_restore up_cpu_paused_save
3 Completely remove up_cpu_pause, up_cpu_resume, up_cpu_paused, and up_cpu_pausereq
4 change up_cpu_pause_async to up_send_cpu_sgi
Signed-off-by: hujun5 <hujun5@xiaomi.com>
As the handling of sp_el0 was moved from the context switch routine
to exception entry/exit, we must set sp_el0 explicitly when the user
process is first started.
Summary
The original implement for exception handler is very simple and
haven't framework for breakpoint/watchpoint routine or brk instruction.
I refine the fatal handler and add framework for debug handler to
register or unregister. this is a prepare for watchpoint/breakpoint
implement
Signed-off-by: qinwei1 <qinwei1@xiaomi.com>
for the citimon stats:
thread 0: thread 1:
enter_critical (t0)
up_switch_context
note suspend thread0 (t1)
thread running
IRQ happen, in ISR:
post thread0
up_switch_context
note resume thread0 (t2)
ISR continue f1
ISR continue f2
...
ISR continue fn
leave_critical (t3)
You will see, the thread 0, critical_section time is:
(t1 - t0) + (t3 - t2)
BUT, this result contains f1 f2 .. fn time spent, it is wrong
to tell user thead0 hold the critical lots of time but actually
not belong to it.
Resolve:
change the nxsched_suspend/resume_scheduler to real hanppends
Signed-off-by: ligd <liguiding1@xiaomi.com>
reason:
Currently, if we need to schedule a task to another CPU, we have to completely halt the other CPU,
manipulate the scheduling linked list, and then resume the operation of that CPU. This process is both time-consuming and unnecessary.
During this process, both the current CPU and the target CPU are inevitably subjected to busyloop.
The improved strategy is to simply send a cross-core interrupt to the target CPU.
The current CPU continues to run while the target CPU responds to the interrupt, eliminating the certainty of a busyloop occurring.
Signed-off-by: hujun5 <hujun5@xiaomi.com>
The aforementioned functions can/will fail if the C compiler decides
to use the stack for the incoming entrypt/etc. parameters.
Fix this issue by converting the jump to user part into pure assembly,
ensuring the stack is NOT used for the parameters.
The original code made the incorrect assumption that the amount of
translation levels is 3, but this is incorrect. The amount of levels is 4
and the amount of levels that are utilized / in use is set dynamically
from the amount of VA bits in use.
The VMSAv8-64 translation system has 4 page table levels in total, ranging
from 0-3. The address environment code assumes only 3 levels, from 1-3 but
this is wrong; the amount of levels _utilized_ depends on the configured
VA size CONFIG_ARM64_VA_BITS. With <= 39 bits 3 levels is enough, while
if the va range is larger, the 4th translation table level is taken into
use dynamically by shifting the base translation table level.
From arm64_mmu.c, where va_bits is the amount of va bits used in address
translations:
(va_bits <= 21) - base level 3
(22 <= va_bits <= 30) - base level 2
(31 <= va_bits <= 39) - base level 1
(40 <= va_bits <= 48) - base level 0
The base level is what is configured as the page directory root. This also
affects the performance of address translations i.e. if the VA range is
smaller, address translations are also faster as the page table walk is
shorter.
1. Similar to asan, supports single byte out of bounds detection
2. Fix the script to address the issue of not supporting the big end
Signed-off-by: wangmingrong1 <wangmingrong1@xiaomi.com>
1. Tested on QEMU, the two sockets were basically the same, and their performance was not affected. The size of the generated bin file was also the same
2. Extract global detection as a separate file, both types of Kasan support global variable out of bounds detection simultaneously
Signed-off-by: wangmingrong1 <wangmingrong1@xiaomi.com>
Revert "Parallelize depend file generation"
This reverts commit d5b6ec450f.
parallel depend ddc does not significantly speed up compilation,
intermediately generated .ddc files can cause problems if compilation is interrupted unexpectedly
Signed-off-by: xuxin19 <xuxin19@xiaomi.com>
reason:
1 On different architectures, we can utilize more optimized strategies
to implement up_current_regs/up_set_current_regs.
eg. use interrupt registersor percpu registers.
code size
before
text data bss dec hex filename
262848 49985 63893 376726 5bf96 nuttx
after
text data bss dec hex filename
262844 49985 63893 376722 5bf92 nuttx
size change -4
Configuring NuttX and compile:
$ ./tools/configure.sh -l qemu-armv8a:nsh_smp
$ make
Running with qemu
$ qemu-system-aarch64 -cpu cortex-a53 -smp 4 -nographic \
-machine virt,virtualization=on,gic-version=3 \
-net none -chardev stdio,id=con,mux=on -serial chardev:con \
-mon chardev=con,mode=readline -kernel ./nuttx
Signed-off-by: hujun5 <hujun5@xiaomi.com>
TX clock or ref clock can be driven either from outside (PHY / oscilator) or by the ENET block.
Typical connection with RMII PHY is that the PHY drives the refclk.
Signed-off-by: Jukka Laitinen <jukkax@ssrc.tii.ae>
New configuration IMX9_HAVE_ATF_FIRMWARE introduced,
it is default on and it selects ARM64_HAVE_PSCI, when compiling
bootloader or when using bootloader that does not have atf
this shall be disabled
Signed-off-by: Jouni Ukkonen <jouni.ukkonen@unikie.com>
Clean up the interrupt-driven logic in the driver; handle error cases properly,
remove dead code and simplify logic.
Signed-off-by: Jukka Laitinen <jukkax@ssrc.tii.ae>
Change "DMACH_HANDLE *handle" into "DMACH_HANDLE handle". The DMACH_HANDLE is already
defined as "void *".
Signed-off-by: Jukka Laitinen <jukkax@ssrc.tii.ae>
Enforcing the default 48-bit VA for everyone also implies a 4 page table
translation system. However, if less than 40 bits are needed, a full
translation table level can be dropped, making the translations faster.
Thus, make this into a configurable option, instead of enforcing the same
address widht for everyone.
There is a tiny possibility that when a process is started a trap is
taken which causes a context switch. This moves the kernel stack
unexpectedly and the task start logic no longer works.
Fix this by recording the initial context location, and use that to
trampoline into the user process with interrupts disabled. This ensures
the context stays intact AND the kernel stack is fully unwound before
the user process starts.
The register context is not needed, the original idea was to provide
the user stack pointer for signal handler delivery, but the user stack
can be obtained via sp_el0 so the context registers are not needed.
SP0 is not stored upon exception entry anyways, so this code is just
completely redundant and wrong.
reason:
In SMP, when a context switch occurs, restore_critical_section is executed.
To reduce the time taken for context switching, we directly pass the required
parameters to restore_critical_section instead of acquiring them repeatedly.
Signed-off-by: hujun5 <hujun5@xiaomi.com>