Signed-off-by: dulibo1 <dulibo1@xiaomi.com>
1.call_data->refcount is not remote_cpus;
2.nxsched_smp_call_add enqueue the call_data;
3.maybe nxsched_smp_call_handler is just doing;
4.dequeue the call_data but call_data->refcount is not corret;
5.unpredictability case will occur;
reason:
Currently, if we need to schedule a task to another CPU, we have to completely halt the other CPU,
manipulate the scheduling linked list, and then resume the operation of that CPU. This process is both time-consuming and unnecessary.
During this process, both the current CPU and the target CPU are inevitably subjected to busyloop.
The improved strategy is to simply send a cross-core interrupt to the target CPU.
The current CPU continues to run while the target CPU responds to the interrupt, eliminating the certainty of a busyloop occurring.
Signed-off-by: hujun5 <hujun5@xiaomi.com>
reason:
1In the scenario of active waiting, context switching is inevitable, and we can eliminate redundant judgments.
code size
before
hujun5@hujun5-OptiPlex-7070:~/downloads1/vela_sim/nuttx$ size nuttx
text data bss dec hex filename
262848 49985 63893 376726 5bf96 nuttx
after
hujun5@hujun5-OptiPlex-7070:~/downloads1/vela_sim/nuttx$ size nuttx
text data bss dec hex filename
263324 49985 63893 377202 5c172 nuttx
reduce code size by -476
Configuring NuttX and compile:
$ ./tools/configure.sh -l qemu-armv8a:nsh_smp
$ make
Running with qemu
$ qemu-system-aarch64 -cpu cortex-a53 -smp 4 -nographic \
-machine virt,virtualization=on,gic-version=3 \
-net none -chardev stdio,id=con,mux=on -serial chardev:con \
-mon chardev=con,mode=readline -kernel ./nuttx
Signed-off-by: hujun5 <hujun5@xiaomi.com>
The feature depends on ARCH_USE_SEPARATED_SECTION
the different memory area has different access speed and cache
capability, so the arch can custom allocate them based on
section names to achieve performance optimization
test:
sim:elf
sim:sotest
Signed-off-by: dongjiuzhu1 <dongjiuzhu1@xiaomi.com>
This commit simplified thread ID dispatching logic by integrating it into the `nxsig_dispatch` function.
Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>
There will be a large performance loss after SCHED_CRITMONITOR is enabled.
By isolating thread running time-related functions, CPU load can be run with less overhead.
Signed-off-by: yinshengkai <yinshengkai@xiaomi.com>
Signed-off-by: buxiasen <buxiasen@xiaomi.com>
When statistics when sched_cpuload_critMonitor, thread switching only
calculates the current CPU's running time. Other CPU running threads
are updated for updates
Signed-off-by: yinshengkai <yinshengkai@xiaomi.com>
Converted the timer overrun check from a loop-based approach to a division-based method. This change ensures a deterministic worst-case execution time (WCET), even if it might not outperform the loop in average scenarios.
Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>
This patch added support for SIGEV_THREAD_ID and sigev_notify_thread_id.
Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>
Signed-off-by: ligd <liguiding1@xiaomi.com>
When we do not drop notifier from g_notifier_pending, we need an
isolated dq entry for this queue, otherwise the queued work_s's dq entry
may be modified by the work queue and breaks the chain of
g_notifier_pending.
Signed-off-by: Zhe Weng <wengzhe@xiaomi.com>
Return 1 to indicate the work was not cancelled.
Because it is currently being processed by work thread,
but wait for it to finish.
Signed-off-by: ligd <liguiding1@xiaomi.com>
Revert "Parallelize depend file generation"
This reverts commit d5b6ec450f.
parallel depend ddc does not significantly speed up compilation,
intermediately generated .ddc files can cause problems if compilation is interrupted unexpectedly
Signed-off-by: xuxin19 <xuxin19@xiaomi.com>
reason:
1 On different architectures, we can utilize more optimized strategies
to implement up_current_regs/up_set_current_regs.
eg. use interrupt registersor percpu registers.
code size
before
text data bss dec hex filename
262848 49985 63893 376726 5bf96 nuttx
after
text data bss dec hex filename
262844 49985 63893 376722 5bf92 nuttx
size change -4
Configuring NuttX and compile:
$ ./tools/configure.sh -l qemu-armv8a:nsh_smp
$ make
Running with qemu
$ qemu-system-aarch64 -cpu cortex-a53 -smp 4 -nographic \
-machine virt,virtualization=on,gic-version=3 \
-net none -chardev stdio,id=con,mux=on -serial chardev:con \
-mon chardev=con,mode=readline -kernel ./nuttx
Signed-off-by: hujun5 <hujun5@xiaomi.com>
Using the ts/tick conversion functions provided in clock.h
Do this caused we want speed up the time calculation, so change:
clock_time2ticks, clock_ticks2time, clock_timespec_add,
clock_timespec_compare, clock_timespec_subtract... to MACRO
Signed-off-by: ligd <liguiding1@xiaomi.com>
1 Only the idle task can have the flag TCB_FLAG_CPU_LOCKED.
According to the code logic, btcb cannot be an idle task, so this check can be removed.
2 Optimized the preemption logic check and removed the call to nxsched_add_prioritized.
3 Speed up the scheduling time while avoiding the potential for
tasks to be moved multiple times between g_assignedtasks and g_readytorun.
Configuring NuttX and compile:
$ ./tools/configure.sh -l qemu-armv8a:nsh_smp
$ make
Running with qemu
$ qemu-system-aarch64 -cpu cortex-a53 -smp 4 -nographic \
-machine virt,virtualization=on,gic-version=3 \
-net none -chardev stdio,id=con,mux=on -serial chardev:con \
-mon chardev=con,mode=readline -kernel ./nuttx
Signed-off-by: hujun5 <hujun5@xiaomi.com>
Most tools used for compliance and SBOM generation use SPDX identifiers
This change brings us a step closer to an easy SBOM generation.
Signed-off-by: Alin Jerpelea <alin.jerpelea@sony.com>
After these wdog refactor:
We conducted a latency measurement using the rt-tests/cyclictest (commit cadd661) on an x86_64 NUC12 equipped with an i7-1255U processor and 16GB of LPDDR5 memory. The specific command used for this microbenchmark was cyclictest -q -l 100000 -h 30000, which is designed to assess the responsiveness of the cyclic timer.
The findings from our benchmark are summarized below, highlighting the minimum, median, and maximum latency values for each operating system tested:
Operating System Minimum Latency (us) Median Latency (us) Maximum Latency (us)
Linux 48 53 410
PreemptRT 6 57 148
Xenomai 53 53 64
NuttX 64 626 1212
NuttX (refactor) 1 1 3
In this table, "Min" indicates the shortest latency observed, "Median" represents the middle value of the latency distribution, and "Max" denotes the longest latency encountered.
The systems tested were as follows:
Linux: ACRN version 6.1.80 (commit f528146)
PreemptRT: Linux kernel 5.4.251 with the 5.4.254-rt85 patch applied
Xenomai: Linux kernel 5.4.251 patched with ipipe-core-5.4.239-x86-13
These results clearly demonstrate the varying performance of different operating systems in terms of timer latency, the refactored NuttX showing particularly low latency values.
Signed-off-by: ligd <liguiding1@xiaomi.com>
Now we have CONFIG_USEC_PER_TICK, and for our timer system, all the calculation used 'tick'.
And all the timespec should change to 'tick' before use wd_start(), so USEC2TICK() can NOT be avoided.
Then there must be an 'less then one tick' loss.
One resolution:
ticks++ anyway when wd_start(). But this will caused time expired more a tick.
Another resolution:
Change the testcase, and allow the following logic:
t1 = current_time();
sleep(3);
t2 = current_time();
allow: t2 - t1 >= 3;
(original test must be: t2- t1 > 3)
The original test think the time must be elapse-ing, and the (t2 - t1) must bigger then 3,
but in our system, we use 'tick' as the minimal wdog unit, then there must a precision loss.
Now we choose first resolution.
Signed-off-by: ligd <liguiding1@xiaomi.com>