The spawn proxy thread is a special existence in NuttX, usually some developers
spend a lot of time on stack overflow of spawn proxy thread:
https://github.com/apache/nuttx/issues/9046https://github.com/apache/nuttx/pull/9081
In order to avoid similar issues, this PR will remove spawn proxy thread to simplify
the process of task/posix_spawn().
1. Postpone the related processing of spawn file actions until after task_init()
2. Delete the temporary thread of spawn proxy and related global variables
Signed-off-by: chao an <anchao@xiaomi.com>
This commit adds Linux like adjtime() interface that is used to correct
the system time clock if it varies from real value. The adjustment is
done by slight adjustment of clock period and therefore the adjustment
is without time jumps (both forward and backwards)
The implementation is enabled by CONFIG_CLOCK_ADJTIME and separated from
CONFIG_CLOCK_TIMEKEEPING functions. Options CONFIG_CLOCK_ADJTIME_SLEWLIMIT
and CONFIG_CLOCK_ADJTIME_PERIOD can be used to control the adjustment
speed.
Interfaces up_get_timer_period() and up_adj_timer_period() has to be
defined by architecture level support.
This is not a POSIX interface but derives from 4.3BSD, System V.
It is also supported for Linux compatibility.
Signed-off-by: Michal Lenc <michallenc@seznam.cz>
Store the old environment in a local context so another temporary address
environment can be selected. This can happen especially when a process
is being loaded (the new process's mappings are temporarily instantiated)
and and interrupt occurs.
The current implementation requires the use of enter_critical_section, so the source code needs to be moved to kernel space
Signed-off-by: hujun5 <hujun5@xiaomi.com>
pthread_cond_wait is preempted after releasing the lock, sched_lock cannot lock threads from other CPUs, use enter_critical_section
Signed-off-by: hujun5 <hujun5@xiaomi.com>
Instead of using a volatile storage for the address environment in the
binfmt / loadinfo structures, always allocate the address environment
from kheap.
This serves two purposes:
- If the task creation fails, any kernel thread that depends on the
address environment created during task creation will not lose their
mappings (because they hold a reference to it)
- The current address environment variable (g_addrenv) will NEVER contain
a stale / incorrect value
- Releasing the address environment is simplified as any pointer given
to addrenv_drop() can be assumed to be heap memory
- Makes the kludge function addrenv_clear_current irrelevant, as the
system will NEVER have invalid mappings any more
Problem:
AppBringup task in default priority 240 ->
board_late_initialize() ->
some driver called work_queue() ->
nxsem_post(&(wqueue).sem) failed because sem_count is 0
hp work_thread in default priority 224 ->
nxsem_wait_uninterruptible(&wqueue->sem);
so hp_work_thread can't wake up, worker can't run immediately.
Signed-off-by: dongjiuzhu1 <dongjiuzhu1@xiaomi.com>
- Remove the temporary "saved" variable when temporarily changing MMU
mappings to access another process's memory. The fact that it has an
address environment is enough to make the choice
- Restore nxflat_addrenv_restore-macro. It was accidentally lost when
the address environment handling was re-factored.
- The code will detect an error condition described in
https://cwiki.apache.org/confluence/display/NUTTX/Signaling+Semaphores+and+Priority+Inheritance
- The kernel will go to PANIC if semaphore holder can't be allocated even
if CONFIG_DEBUG_ASSERTIONS is disabled
- Clean-up code that handled posing of semaphore with priority inheritance
enabled from the interrupt context (remove nxsem_restore_baseprio_irq())
Summary:
- Support arm64 pmu api, Currently only the cycle counter function is supported.
- Using ARM64 PMU hardware capability to implement perf interface, modify all
perf interface related code.
- Support for pmu init under smp.
Signed-off-by: wangming9 <wangming9@xiaomi.com>
After enabling this option, you can automatically trace the function instrumentation without adding tracepoint manually.
This is similar to the Function Trace effect of the linux kernel
Signed-off-by: yinshengkai <yinshengkai@xiaomi.com>
Assert in nxsem_post if:
- Priority inheritance is enabled on a semaphore
- A thread that does not hold the semaphore attempts to post it
This will detect an error condition described in https://cwiki.apache.org/confluence/display/NUTTX/Signaling+Semaphores+and+Priority+Inheritance
None. The debug instrumentation is only enabled if CONFIG_DEBUG_ASSERTIONS is enabled.
Use sim:ostest. Verify that no assertions occur.
Compilation error occurs after SCHED_CRITMONITOR is enabled
sched/sched_critmonitor.c:315: undefined reference to `serr'
Signed-off-by: yinshengkai <yinshengkai@xiaomi.com>
sem_t is user memory and the correct mappings are needed to perform
the semaphore wait interruption.
Otherwise either a page fault, or access to the WRONG address environment
happens.
Refer to issue #8867 for details and rational.
Convert sigset_t to an array type so that more than 32 signals can be supported.
Why not use a uin64_t?
- Using a uin32_t is more flexible if we decide to increase the number of signals beyound 64.
- 64-bit accesses are not atomic, at least not on 32-bit ARMv7-M and similar
- Keeping the base type as uint32_t does not introduce additional overhead due to padding to achieve 64-bit alignment of uin64_t
- Some architectures still supported by NuttX do not support uin64_t
types,
Increased the number of signals to 64. This matches Linux. This will support all xsignals defined by Linux and also 32 real time signals (also like Linux).
This is is a work in progress; a draft PR that you are encouraged to comment on.
Calling syslog to print logs in clock_gettime will cause the system to have recursive output, i.e., clock_gettime->sinfo->syslog->clock_gettime, with the consequences of stack overflow or non-stop log output.
Decouple the semcount and the work queue length.
Previous Problem:
If a work is queued and cancelled in high priority threads (or queued
by timer and cancelled by another high priority thread) before
work_thread runs, the queue operation will mark work_thread as ready to
run, but the cancel operation minus the semcount back to -1 and makes
wqueue->q empty. Then the work_thread still runs, found empty queue,
and wait sem again, then semcount becomes -2 (being minused by 1)
This can be done multiple times, then semcount can become very small
value. Test case to produce incorrect semcount:
high_priority_task()
{
for (int i = 0; i < 10000; i++)
{
work_queue(LPWORK, &work, worker, NULL, 0);
work_cancel(LPWORK, &work);
usleep(1);
}
/* Now the g_lpwork.sem.semcount is a value near -10000 */
}
With incorrect semcount, any queue operation when the work_thread is
busy, will only increase semcount and push work into queue, but cannot
trigger work_thread (semcount is negative but work_thread is not
waiting), then there will be more and more works left in queue while
the work_thread is waiting sem and cannot call them.
Signed-off-by: Zhe Weng <wengzhe@xiaomi.com>
The _unmasked_ signal action was never added if the task is in system call
and waiting for (a different) signal.
This fixes deliver especially for default signal actions / unmaskable
signals, like SIGTERM.
As far as I can interpret how signal delivery should work when the signal
is blocked, it should still be sent to the pending queue even if the signal
is masked. When the sigmask changes it will be delivered.
The original implementation did not add the pending signal action, if
stcb->task_state == TSTATE_WAIT_SIG is true.
An attempt to patch this was made in #8563 but it is insufficient as it
creates an issue when the task is not waiting for a signal, but is in
syscall, in this case the signal is incorrectly queued twice.
since the chip/board vendor could disable dirvers/note and
provide the implementation of sched_note_xxx by self
Signed-off-by: Xiang Xiao <xiaoxiang@xiaomi.com>
Remove calls to the userspace API exit() from the kernel. The problem
with doing such calls is that the exit functions are called with kernel
mode privileges which is a big security no-no.
Do not allow a deferred cancellation if the group is exiting, it is too
dangerous to allow the threads to execute any user space code after the
exit has started.
If the cancelled thread is not inside a cancellation point, just kill it
immediately via asynchronous cancellation. This will create far less
problems than allowing it to continue running user code.
For some reason the signal action was never performed if the receiveing
task was within a system call, the pending queue inser was simply missing.
This fixes the issue.
There is an issue where the wrong process exit code is given to the parent
when a process exits. This happens when the process has pthreads running
user code i.e. not within a cancel point / system call.
Why does this happen ?
When exit() is called, the following steps are done:
- group_kill_children(), which tells the children to die via pthread_cancel()
Then, one of two things can happen:
1. if the child is in a cancel point, it gets scheduled to allow it to leave
the cancel point and gets destroyed immediately
2. if the child is not in a cancel point, a "cancel pending" flag is set and
the child will die when the next cancel point is encountered
So what is the problem here?
The last thread alive dispatches SIGCHLD to the parent, which carries the
process's exit code. The group head has the only meaningful exit code and
this is what should be passed. However, in the second case, the group head
exits before the child, taking the process exit code to its grave. The child
that was alive will exit next and will pass its "status" to the parent process,
but this status is not the correct value to pass.
This commit fixes the issue by passing the group head's exit code ALWAYS to
the parent process.
The function is not relevant any longer, remove it. Also remove
save_addrenv_t, the parameter taken by up_addrenv_restore.
Implement addrenv_select() / addrenv_restore() to handle the temporary
instantiation of address environments, e.g. when a process is being
created.
There is currently a big problem in the address environment handling which
is that the address environment is released too soon when the process is
exiting. The current MMU mappings will always be the exiting process's, which means
the system needs them AT LEAST until the next context switch happens. If
the next thread is a kernel thread, the address environment is needed for
longer.
Kernel threads "lend" the address environment of the previous user process.
This is beneficial in two ways:
- The kernel processes do not need an allocated address environment
- When a context switch happens from user -> kernel or kernel -> kernel,
the TLB does not need to be flushed. This must be done only when
changing to a different user address environment.
Another issue is when a new process is created; the address environment
of the new process must be temporarily instantiated by up_addrenv_select().
However, the system scheduler does not know that the process has a different
address environment to its own and when / if a context restore happens, the
wrong MMU page directory is restored and the process will either crash or
do something horribly wrong.
The following changes are needed to fix the issues:
- Add mm_curr which is the current address environment of the process
- Add a reference counter to safeguard the address environment
- Whenever an address environment is mapped to MMU, its reference counter
is incremented
- Whenever and address environment is unmapped from MMU, its reference
counter is decremented, and tested. If no more references -> drop the
address environment and release the memory as well
- To limit the context switch delay, the address environment is freed in
a separate low priority clean-up thread (LPWORK)
- When a process temporarily instantiates another process's address
environment, the scheduler will now know of this and will restore the
correct mappings to MMU
Why is this not causing more noticeable issues ? The problem only happens
under the aforementioned special conditions, and if a context switch or
IRQ occurs during this time.
Detach the address environment handling from the group structure to the
tcb. This is preparation to fix rare cases where the system (MMU) is left
without a valid page directory, e.g. when a process exits.
NuttX kernel should not use the syscall functions, especially after
enabling CONFIG_SCHED_INSTRUMENTATION_SYSCALL, all system functions
will be traced to backend, which will impact system performance.
Signed-off-by: chao an <anchao@xiaomi.com>
Implement a function for dropping references to the group structure and
finally freeing the allocated memory, if the group has been marked for
destruction
The number of work entries will be inconsistent with semaphore count
if the work is canceled, in extreme case, semaphore count will overflow
and fallback to 0 the workqueue will stop scheduling the enqueue work.
Signed-off-by: chao an <anchao@xiaomi.com>
continue the follow work:
commit 43e7b13697
Author: Xiang Xiao <xiaoxiang@xiaomi.com>
Date: Sun Jan 22 19:31:32 2023 +0800
assert: Log the assertion expression in case of fail
Signed-off-by: Xiang Xiao <xiaoxiang@xiaomi.com>
A testcase as following:
child_task()
{
sleep(3);
}
main_task()
{
while (1)
{
ret = task_create("child_task", child_task, );
sleep(1);
task_delete(ret);
}
}
Root casuse:
task_delete hasn's cover the condition that the deleted-task
is justing running on the other CPU.
Fix:
Let the nxsched_remove_readytorun() do the real work
Signed-off-by: ligd <liguiding1@xiaomi.com>
1. When pthread exit, set the default cancellability state to NONCANCELABLE state.
2. Make sure modify tcb->flags is atomic operations.
Signed-off-by: zhangyuan21 <zhangyuan21@xiaomi.com>
This is just unnecessary, a process cannot be destroyed by another
process in any case, every time this is executed the active address
environment is the process getting destroyed.
Even in the hypothetical case this was possible, the system would
crash at once if a context switch happens between "select()" and
"restore()", which is possible as the granule allocator is protected by
a semaphore (which is a synchronization point).
- Also remove the nuttx private shm.h file nuttx/mm/shm.h, which became redundant
- Also remove the gran allocator initialization/release in binfmt since common
vpage allocator is initialized in group_create/group_leave
Signed-off-by: Jukka Laitinen <jukkax@ssrc.tii.ae>
tg_info is still in use after task_uninit_info(), unifies
lib_stream_* with life cycle of task info to avoid this issue.
| ==1940861==ERROR: AddressSanitizer: heap-use-after-free on address 0xf47032e0 at pc 0x5676dc4f bp 0xf2f38c68 sp 0xf2f38c58
|
|#10 0xf7abec89 in __asan::__asan_report_load2 (addr=4100993760) at ../../../../src/libsanitizer/asan/asan_rtl.cpp:119
|#11 0x5677356a in nxsem_destroy (sem=0xf47032e0) at semaphore/sem_destroy.c:73
|#12 0x56773695 in sem_destroy (sem=0xf47032e0) at semaphore/sem_destroy.c:120
|#13 0x5676faa2 in nxmutex_destroy (mutex=0xf47032e0) at include/nuttx/mutex.h:126
|#14 0x567a3430 in lib_stream_release (group=0xf4901ba0) at stdio/lib_libstream.c:98
|#15 0x5676da75 in group_release (group=0xf4901ba0) at group/group_leave.c:162
|#16 0x5676e51c in group_leave (tcb=0xf5377740) at group/group_leave.c:360
|#17 0x569fe79b in nxtask_exithook (tcb=0xf5377740, status=0) at task/task_exithook.c:455
|#18 0x569f90b9 in _exit (status=0) at task/exit.c:82
|#19 0x56742680 in exit (status=0) at stdlib/lib_exit.c:61
|#20 0x56a69c78 in iperf_showusage (progname=0xf2f28838 "iperf", exitcode=0) at iperf_main.c:91
|#21 0x56a6a6ec in iperf_main (argc=1, argv=0xf2f28830) at iperf_main.c:140
|#22 0x5679c148 in nxtask_startup (entrypt=0x56a69c78 <iperf_main>, argc=1, argv=0xf2f28830) at sched/task_startup.c:70
|#23 0x56767f58 in nxtask_start () at task/task_start.c:134
Signed-off-by: chao an <anchao@xiaomi.com>
The task_group specific list can be used to store information about
mmappings.
For a driver or filesystem performing mmap can also enable munmap by
adding an item to this list using mm_map_add(). The item is then
returned in the corresponding munmap call.
Signed-off-by: Jukka Laitinen <jukkax@ssrc.tii.ae>
Summary:
- I noticed that nxplayer (HTTP audio streaming) + command execution
via telnet sometimes causes memory corruption.
See https://github.com/apache/nuttx/pull/7947 for the detail.
- This commit fixes this issue by calling lib_stream_release() before
lib_stream_release() in group_leave.c
Impact:
- Should be none
Testing:
- Tested with spresense:wifi_smp
Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>
When the signal sent by the sender is blocked in the target task,
if the target task has an action registered with sa_flags SA_KENELHAND,
it will directly respond to the action in the context of the sender.
When the action is executed, it will have the parameters set by the
target task with sigaction:sa_user.
Signed-off-by: dongjiuzhu1 <dongjiuzhu1@xiaomi.com>
_assert is a kernel procedure, entered via system call to make the core
dump in privileged mode.
Running exit() from this context is not OK as it runs the registered
exit functions and flushes streams, which must not be done
from privileged mode as it is a security hole.
Thus, implement assert() into user space (again) and remove the exit()
call from the kernel procedure.
Instantiate the correct address environment when reading the process's
argument vector. Otherwise doing this will crash the system every time,
causing a recursive assert loop.
Also try to do a bit of sanity checking before attempting to read the
process's memory, it might be in a bad state in which case this will
fail anyway.
This is preparation for flushing streams from user space, like it should
be done.
- Move tg_streamlist (group, kernel space) ->
ta_streamlist (TLS, user space)
- Access stream list via tg_info in kernel
- Access stream list via TLS in user space
- Remove / rename nxsched_get_streams -> lib_getstreams
- Remove system call for nxsched_get_streams
The user stack is dependent on the user address environment. As the
process exits, this address environment is destroyed anyway, so the
stack does not need to be released separately.
There is also an issue with this when the process exits via exit(). The
problem is that the task group is released prior to this "up_release_stack()"
call along with the address environment, and trying to free the memory
either causes an immediate crash (no valid addrenv), or frees memory into
another process' heap (addrenv from a different process).
Signed-off-by: Ville Juven <ville.juven@unikie.com>
According to posix spec, this function should never return `EINTR`.
This fixes the call to `pthread_mutex_take` so it keeps retrying the
lock and doesn't return `EINTR`
It takes about 10 cycles to obtain the task list according to the task
status. In most cases, we know the task status, so we can directly
add the task from the specified task list to reduce time consuming.
It takes about 10 cycles to obtain the task list according to the task
status. In most cases, we know the task status, so we can directly
delete the task from the specified task list to reduce time consuming.
In the past, header file paths were generated by the incdir command
Now they are generated by concatenating environment variables
In this way, when executing makefile, no shell command will be executed,
it will improve the speed of executing makfile
Signed-off-by: yinshengkai <yinshengkai@xiaomi.com>
TIMER_SETTIME(2)
NAME
timer_settime, timer_gettime - arm/disarm and fetch state of POSIX per-process timer
SYNOPSIS
#include <time.h>
int timer_settime(timer_t timerid, int flags,
const struct itimerspec *new_value,
struct itimerspec *old_value);
int timer_gettime(timer_t timerid, struct itimerspec *curr_value);
...
ERRORS
...
EINVAL timerid is invalid.
Signed-off-by: chao an <anchao@xiaomi.com>
because if we pass predefined environment variables table explicitly,
the environment variables created by the board code will be lost.
Signed-off-by: Xiang Xiao <xiaoxiang@xiaomi.com>
This commit is intended to solve the bug caused by #7159.
It will fixed data abort issue when task restart in wait sem status.
If delete waitobj in the sem recover function, then we will get the wrong
task list when remove the task from task list.
1.Don't include unwind.h when arch specific backtrace is enable
2.Built arch specific backtrace wrapper only when enable
Signed-off-by: Xiang Xiao <xiaoxiang@xiaomi.com>
Optimize sched_note_begin/end, replace note_printf with note_string
sched_note_begin/end optimized from 50us to 1us per consumption
Signed-off-by: yinshengkai <yinshengkai@xiaomi.com>
Situation:
Assume we have 2 cpus, and busy run task0.
CPU0 CPU1
task0 -> task1 task2 -> task0
1. remove task0 form runninglist
2. take task1 as new tcb
3. add task0 to blocklist
4. clear spinlock
4.1 remove task2 form runninglist
4.2 take task0 as new tcb
4.3 add task2 to blocklist
4.4 use svc ISR swith to task0
4.5 crash
5. use svc ISR swith to task1
Fix:
Move clear spinlock to the end of svc ISR
Signed-off-by: ligd <liguiding1@xiaomi.com>
reason:
1. g_running_tasks = thread A
2. thread A exit (free thread A's tcb) -> thread B
3. thread B interrupt by irq
4. check g_running_tasks->flags -> kasan report used after free
rootcause:
g_running_tasks has't set completely when syscall hanppened
Resolve:
Use rtcb (get at ISR begining) instead
Signed-off-by: ligd <liguiding1@xiaomi.com>
There one ways can caused this:
mq_timedreceive
TIMER IRQ do wd_timer -> wd_func1 mq_send
-> wd_func2 nxmq_rcvtimeout -> crash
Resolve:
Stop the watchdog when mq_send
Signed-off-by: ligd <liguiding1@xiaomi.com>
it inappropriate to apply volatile to the task list:
1.The code access task list is already protected by critical section
2.The queue is complex struct, it isn't enough to protect by volatile
Signed-off-by: Xiang Xiao <xiaoxiang@xiaomi.com>