There is currently a big problem in the address environment handling which
is that the address environment is released too soon when the process is
exiting. The current MMU mappings will always be the exiting process's, which means
the system needs them AT LEAST until the next context switch happens. If
the next thread is a kernel thread, the address environment is needed for
longer.
Kernel threads "lend" the address environment of the previous user process.
This is beneficial in two ways:
- The kernel processes do not need an allocated address environment
- When a context switch happens from user -> kernel or kernel -> kernel,
the TLB does not need to be flushed. This must be done only when
changing to a different user address environment.
Another issue is when a new process is created; the address environment
of the new process must be temporarily instantiated by up_addrenv_select().
However, the system scheduler does not know that the process has a different
address environment to its own and when / if a context restore happens, the
wrong MMU page directory is restored and the process will either crash or
do something horribly wrong.
The following changes are needed to fix the issues:
- Add mm_curr which is the current address environment of the process
- Add a reference counter to safeguard the address environment
- Whenever an address environment is mapped to MMU, its reference counter
is incremented
- Whenever and address environment is unmapped from MMU, its reference
counter is decremented, and tested. If no more references -> drop the
address environment and release the memory as well
- To limit the context switch delay, the address environment is freed in
a separate low priority clean-up thread (LPWORK)
- When a process temporarily instantiates another process's address
environment, the scheduler will now know of this and will restore the
correct mappings to MMU
Why is this not causing more noticeable issues ? The problem only happens
under the aforementioned special conditions, and if a context switch or
IRQ occurs during this time.
Detach the address environment handling from the group structure to the
tcb. This is preparation to fix rare cases where the system (MMU) is left
without a valid page directory, e.g. when a process exits.
NuttX kernel should not use the syscall functions, especially after
enabling CONFIG_SCHED_INSTRUMENTATION_SYSCALL, all system functions
will be traced to backend, which will impact system performance.
Signed-off-by: chao an <anchao@xiaomi.com>
According to posix spec, this function should never return `EINTR`.
This fixes the call to `pthread_mutex_take` so it keeps retrying the
lock and doesn't return `EINTR`
D:\code\incubator-nuttx\sched\pthread\pthread_create.c(154,22):
warning C4189: “pjoin”: local variable is initialized but not referenced
[D:\code\incubator-nuttx\vs20221\sched\sched.vcxproj]
D:\code\incubator-nuttx\sched\group\group_setupidlefiles.c(61,28):
warning C4189: “group”: local variable is initialized but not referenced
[D:\code\incubator-nuttx\vs20221\sched\sched.vcxproj]
Reference:
https://docs.microsoft.com/en-us/cpp/error-messages/compiler-warnings/compiler-warning-level-4-c4189?view=msvc-170
Signed-off-by: chao.an <anchao@xiaomi.com>
The "p" format specifier already prepends the pointer address with
"0x" when printing.
Signed-off-by: Gustavo Henrique Nihei <gustavo.nihei@espressif.com>
since _exit may kill all sibling thread when
HAVE_GROUP_MEMBERS equal true. Regressed by:
commit 622677d4a1
Author: Ville Juven <ville.juven@unikie.com>
Date: Mon May 2 15:15:06 2022 +0300
libc: Implement exit, atexit, on_exit and cxa_exit on the user side
For CONFIG_BUILD_KERNEL using the sched/task/task_exithook implementation
will just not work. It calls user code with kernel privileges which is
a bit of a security issue.
Signed-off-by: Xiang Xiao <xiaoxiang@xiaomi.com>
pthread_condclockwait() can not distinguish between interrupt and timeout,
which cause these API not follow POSIX:
pthread_rwlock_timedrdlock()
pthread_rwlock_timedwrlock()
pthread_condtimedwait()
POSIX:
Upon return from the signal handler the thread resumes waiting for
the condition variable as if it wasnot interrupted
These functions shall not return an error code of [EINTR].
Replacing nxsem_wait() with nxsem_clockwait_uninterruptible() can solve it.
Signed-off-by: jihandong <jihandong@xiaomi.com>
if a pthread set attr is detach,and when call pthread_create,
new thread exit quikly,new thread's tcb be free,then pthread_create
use new thread's tcb will crash.
Signed-off-by: anjiahao <anjiahao@xiaomi.com>
pthread_join need check thread is DETACHED,
Whether to wait according to the result.And,
if a thread is DETACHED,it will not set a new
attr.
Signed-off-by: anjiahao <anjiahao@xiaomi.com>
There is a potential problem that can lead to deadlock at
condition wait, if the timeout of watchdog is a very small value
(1 tick ?), the timer interrupt will come before the nxsem_wait()
Revert "sched: pthread: Remove a redundant critical section in pthread_condclockwsait.c"
Revert "sched: semaphore: Remove a redundant critical section in nxsem_clockwait()"
Revert "sched: semaphore: Remove a redundant critical section in nxsem_tickwait()"
This reverts commit 7758f3dcb1.
This reverts commit 2976bb212e.
This reverts commit 65dec5d10a.
Signed-off-by: chao.an <anchao@xiaomi.com>
pthread_exit will be called recursive when pthread_cancel
or other cleanup operation with syscalls that support
cancellation, to avoid this by mark current tcb flag as
TCB_FLAG_CANCEL_DOING instead of TCB_FLAG_CANCEL_PENDING.
Signed-off-by: Huang Qi <huangqi3@xiaomi.com>
Drop to user-space in kernel/protected build with up_pthread_exit,
now all pthread_cleanup functions executed in user mode.
* A new syscall SYS_pthread_exit added
* A new tcb flag TCB_FLAG_CANCEL_DOING added
* up_pthread_exit implemented for riscv/arm arch
Signed-off-by: Huang Qi <huangqi3@xiaomi.com>
arch: Allocate the space from the beginning in up_stack_frame
and modify the affected portion:
1.Correct the stack dump and check
2.Allocate tls_info_s by up_stack_frame too
3.Move the stack fork allocation from arch to sched
Signed-off-by: Xiang Xiao <xiaoxiang@xiaomi.com>
getopt() in the FLAT build environment is not thread safe. This is because global variables that are process-specific in Unix are truly global in the FLAT build. Moving the getopt() variables into TLS resolves this issue.
No side-effects are expected other than to getopt()
Tested with sim:nsh
Summary:
- I found potential bugs in pthread_condclockwait.c
- Actually, sched_unlock() and wd_cancel() were misplaced.
- This commit fixes this issue
Impact:
- pthread_cond_clockwait() only
Testing:
- Tested with ostest with the following configs
- sabre-6quad:nsh, sabre-6quad:smp, sim:ostest
Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>
Summary:
- I noticed 'pthread_rwlock test' in ostest sometimes stops
- This issue happened with spresense:wifi_smp (NCPUS=4) and sim:smp
- Finally, I found an issue in pthread_join()
- In pthread_join(), sched_lock() is used to avoid pre-emption
- However, this is not enough for SMP
- Because another CPU would continue the pthread and exit sequences
- So we need to protect it with a critical section
Impact:
- Affect SMP only
Testing:
- Tested with ostest with the following configurations
- spresnese:smp
- spresense:wifi_smp (NCPUS=2, NCPUS=4)
- sabre-6quad:smp (QEMU)
- esp32-core:smp (QEMU)
- maix-bit:smp (QEMU)
- sim:smp
- lc823450-xgevk:rndis
Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>
to ensure the basic info(e.g. pid) setup correctly before call arch API
Signed-off-by: Xiang Xiao <xiaoxiang@xiaomi.com>
Change-Id: I851cb0fdf22f45844938dafc5981b3f576100dba
to save the preserved space(1KB) and also avoid the heap overhead
Signed-off-by: Xiang Xiao <xiaoxiang@xiaomi.com>
Change-Id: I694073f68e1bd63960cedeea1ddec441437be025
Prohibit use of pthread_cleanup API's by kernel threads. The pthread pthread_cleanup functions MUST run in user mode, making them unusable for kernel threads.
See Issue #1263
-Move task_init() and task_activate() prototypes from include/sched.h to include/nuttx/sched.h. These are internal OS functions and should not be exposed to the user.
-Remove references to task_init() and task_activate() from the User Manual.
-Rename task_init() to nxtask_init() since since it is an OS internal function
-Rename task_activate() to nxtask_activate since it is an OS internal function
All complaints fixed except for those that were not possible to fix:
- Used of Mixed case identifier in ESP32 files. These are references to Expressif ROM functions which are outside of the scope of NuttX.
1. Move pthread-specific data files from sched/pthread/ to libs/libc/pthread.
2. Remove pthread-specific data functions from syscalls.
3. Implement tls_alloc() and tls_free() with system calls.
4. Reimplement pthread_key_create() and pthread_key_free() using tls_alloc() and tls_free().
5. Reimplement pthread_set_specific() and pthread_get_specicif() using tls_set_value() and tls_get_value()
The new OS interface, sched_get_stackinfo() combines two pthread-specific interfaces into a single generic interface. The existing pthread_get_stackaddr_np() and pthread_get_stacksize_np() are moved from sched/pthread to libs/libc/pthread.
There are two motivations for this change: First, it reduces the number of system calls. Secondly, it adds a common hook that is going to used for a future implementation of TLS.
These warnings fix a class of warnings that I saw during CI checks for macOS sim builds. For example:
devif/devif_callback.c:111:49: warning: for loop has empty body [-Wempty-body]
prev = curr, curr = curr->nxtdev);
^
devif/devif_callback.c:111:49: note: put the semicolon on a separate line to silence this warning
I did not put the semi-colon on a separate line, but used braces.
This commit resolves issue #620:
Remove CONFIG_CAN_PASS_STRUCTS #620
The configuration option CONFIG_CAN_PASS_STRUCTS was added many years ago to support an old version of the SDCC compiler. That compiler is currently used only with the Z80 and Z180 targets. The limitation of that old compiler was that it could not pass structures or unions as either inputs or outputs. For example:
#ifdef CONFIG_CAN_PASS_STRUCTS
struct mallinfo mallinfo(void);
#else
int mallinfo(FAR struct mallinfo *info);
#endif
And even leads to violation of a few POSIX interfaces like:
#ifdef CONFIG_CAN_PASS_STRUCTS
int sigqueue(int pid, int signo, union sigval value);
#else
int sigqueue(int pid, int signo, FAR void *sival_ptr);
#endif
This breaks the 1st INVIOLABLES rule:
Strict POSIX compliance
-----------------------
o Strict conformance to the portable standard OS interface as defined at
OpenGroup.org.
o A deeply embedded system requires some special support. Special
support must be minimized.
o The portable interface must never be compromised only for the sake of
expediency.
o Expediency or even improved performance are not justifications for
violation of the strict POSIX interface
Also, it appears that the current SDCC compilers have resolve this issue and so, perhaps, this is no longer a problem: z88dk/z88dk#1132
NOTE: This commit cannot pass the PR checks because it depends on matching changes to the apps/ directory.
A mutex may be configured with rather exotic options such as recursive, unsafe, etc. The availability of these mutex options is controlled by configuation settings. When each option is enabled, additional fields are managed inside of the mutex structure.
pthread_cond_wait() and pthread_timed_wait() do the following atomically: (1) unlock the mutex, (2) wait for the condition, and (3) restore the mutex lock. When that lock is restored, pthread_cond_[timed]wait() must also restore the exact configuration of the mutex data structure if these "exotic" features are enabled.
Now that nxsem_wait_uninterruptible() returns an error when the thread is canceled, new logic is being executed. This revealed an existing error in pthread_cond_wait(). The nature of the error is as follows:
* pthread_cond_wait() must atomically (1) release the mutex, (2) wait on the condition, and (3) re-establish the mutex. Errors can occur on any of these steps. In this case the ECANCELED error was (very correctly) reported by nxsem_wait_uninterruptible().
* However, logic in step (3) had a bug; it re-acquired the mutex, but then a bug in existing logic cause the mutex to be improperly initialized. Essentially, the wrong error status value was being testing. This resulted in a nonsequitar and errors reported by the OS test.
This problem was found by apps/testing/ostest/pthread_cleanup.c