Fail case:
exit -> nxtask_terminate -> nxtask_exithook -> nxsched_release_tcb
group_leave || nxsched_releasepid & group_leave
/\
/ \
switch out & waitpid()
Thread A group_leave in nxtask_exithook, switch out,
Thread B do waitpid(thread A) then meet traget thread A group is NULL, error.
Change-Id: Ia181d7a13aa645ec1c3141a45839fbf79db35b17
Signed-off-by: ligd <liguiding1@xiaomi.com>
Summary:
- SP_SECTION was introduced to allocate spinlock in non-cachable
region mainly for Cortex-A to stabilize the NuttX SMP kernel
- However, all spinlocks are now allocated in cachable area and
works without any problems
- So SP_SECTION should be removed to simplify the kernel code
Impact:
- None
Testing:
- Build test only
Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>
arch: Allocate the space from the beginning in up_stack_frame
and modify the affected portion:
1.Correct the stack dump and check
2.Allocate tls_info_s by up_stack_frame too
3.Move the stack fork allocation from arch to sched
Signed-off-by: Xiang Xiao <xiaoxiang@xiaomi.com>
Fix missing semicolon at the end of a DEBUGASSERT statement:
sched/sched_sporadic.c: In function 'sporadic_budget_expire':
sched/sched_sporadic.c:512:15: error: expected ';' before 'period'
512 | period = (sporadic->repl_period >> 1) - unrealized;
| ^~~~~~
sched/sched_sporadic.c: In function 'nxsched_resume_sporadic':
sched/sched_sporadic.c:1078:19: error: expected ';' before 'period'
1078 | period = (sporadic->repl_period >> 1) - unrealized;
| ^~~~~~
Fix use of uninitialized variable in DEBUGASSERT statement:
sched/sched_sporadic.c:466:27: warning: 'sporadic' may be used uninitialized in this function [-Wmaybe-uninitialized]
466 | sporadic->nrepls > 0);
Also fixes some typos.
There should be no unexpected side-effects of this changed.
Tested with the stm32f4discovery:sporadic configuration (see PR #3097
it is wrong to define a new grpid_t, but not reuse pid_t,
because it make getpid(parent) == getppid(child) impossible.
Signed-off-by: Xiang Xiao <xiaoxiang@xiaomi.com>
Summary:
- Because this_task() and nxsched_get_tcb() are protected inside the funtcions
- Also, enter_critical_section() is used in nxsched_set_affinity()
Impact:
- No impact
Testing:
- Tested with spresense:wifi_smp
Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>
Summary:
- Calling sched_lock()/unlock() is unnecessary for SMP
because we've been using the critical section API
Impact:
- SMP only
Testing:
- Tested with smp and ostest with the following configurations
- sabre-6quad:smp (QEMU, dev board)
- spresense:smp, spresense:wifi_smp
- sim:smp, sim:ostest
- maix-bit:smp (QEMU)
- esp32-devkitc:smp (QEMU)
- lc823450-xgevk:rndis
Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>
Summary:
- This commit fixes global IRQ control logic
- In previous implementation, g_cpu_irqset for a remote CPU was
set in sched_add_readytorun(), sched_remove_readytorun() and
up_schedule_sigaction()
- In this implementation, they are removed.
- Instead, in the pause handler, call enter_critical_setion()
which will call up_cpu_paused() then acquire g_cpu_irqlock
- So if a new task with irqcount > 1 restarts on the remote CPU,
the CPU will only hold a critical section. Thus, the issue such as
'POSSIBLE FOR TWO CPUs TO HOLD A CRITICAL SECTION' could be resolved.
- Fix nxsched_resume_scheduler() so that it does not call spin_clrbit()
if a CPU does not hold a g_cpu_irqset
- Fix nxtask_exit() so that it acquires g_cpu_irqlock
- Update TODO
Impact:
- All SMP implementations
Testing:
- Tested with smp, ostest with the following configurations
- Tested with spresense:wifi_smp (NCPUS=2,4)
- Tested with sabre-6quad:smp (QEMU, dev board)
- Tested with maix-bit:smp (QEMU)
- Tested with esp32-core:smp (QEMU)
- Tested with lc823450-xgevk:rndis
Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>
Summary:
- I noticed waitpid_test stops with lc823450-xgevk:rndis
- The condition was CONFIG_DEBUG_ASSERTION=y
- Actually, the child task sent SIGCHILD but the parent couldn't catch the signal
- Then, I found that nx_waitid(), nx_waitpid() use sched_lock()
- However, a parent task and a child task are running on different CPUs
- So, sched_lock() is not enough and need to use a critical section
- Also, signal handling in nxtask_exithook() must be done in a critical section
Impact:
- SMP only
Testing:
- Tested with ostest with the following configurations
- lc823450-xgevk:rndis (CONFIG_DEBUG_ASSERTION=y and n)
- spresense:smp
- spresense:wifi_smp (NCPUS=2 and 4)
- sabre-6quad:smp (QEMU)
- esp32-core:smp (QEMU)
- maix-bit:smp (QEMU)
- sim:smp
Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>
Summary:
- sched_tasklistlock.c was introduced to stabilize NuttX SMP
- However, the current SMP implementation is stable by recent fixes
- So I decided to remove the file finally
Impact:
- SMP only
Testing:
- Tested with ostest with the following configurations
- spresense:smp
- spresense:wifi_smp (NCPUS=2 and 4)
- sabre-6quad:smp (QEMU)
- esp32-core:smp (QEMU)
- maix-bit:smp (QEMU)
- sim:smp
- lc823450-xgevk:rndis
Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>
Summary:
- I noticed sched_lock() logic is different from sched_unlock()
- I think sched_lock() should use critical section
- Also, the code should be simple like sched_unlock()
- This commit fixes these issues
Impact:
- Affects SMP only
Testing:
- Tested with spresense:wifi_smp (both NCPUS=2 and 3)
- Tested with lc823450-xgevk:rndis
- Tested with maix-bit:smp (QEMU)
- Tested with esp32-core:smp (QEMU)
- Tested with sabre-6quad:smp (QEMU)
Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>
Summary:
- I noticed DEBUGASSERT() happens in sched_unlock()
- The test was Wi-Fi audio streaming stress test with spresense 3cores
- Actually, g_cpu_schedlock was locked but g_cpu_lockset was incorrect
- Finally, I found that cpu was obtained before enter_critical_section()
- And the task was moved from one cpu to another cpu
- However, that call should be done within the critical section
- This commit fixes this issue
Impact:
- Affects SMP only
Testing:
- Tested with spresense:wifi_smp (both NCPUS=2 and 3)
- Tested with lc823450-xgevk:rndis
- Tested with maix-bit:smp (QEMU)
- Tested with esp32-core:smp (QEMU)
- Tested with sabre-6quad:smp (QEMU)
Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>
1.Reduce the default size of task_group_s(~512B each task)
2.Scale better between simple and complex application
Signed-off-by: Xiang Xiao <xiaoxiang@xiaomi.com>
Change-Id: Ia872137504fddcf64d89c48d6f0593d76d582710
Summary:
- ARCH_GLOBAL_IRQDISABLE was initially introduced for LC823450 SMP
- At that time, i.MX6 (quad Cortex-A9) did not use this config
- However, this option is now used for all CPUs which support SMP
- So it's good timing for refactoring the code
Impact:
- Should have no impact because the logic is the same for SMP
Testing:
- Tested with board: spresense:smp, spresense:wifi_smp
- Tested with qemu: esp32-core:smp, maix-bit:smp, sabre-6quad:smp
- Build only: lc823450-xgevk:rndis, sam4cmp-db:nsh
Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>
Found by clang-check:
sched/sched_waitpid.c:380:33: warning: Although the value stored to 'ret' is used in the enclosing expression, the value is never actually read from 'ret'
(pid != (pid_t)-1 && (ret = nxsig_kill(pid, 0)) < 0))
^ ~~~~~~~~~~~~~~~~~~
1 warning generated.
Since up_release_stack auto detect whether the memory come from builtin heap
if (ttype == TCB_FLAG_TTYPE_KERNEL)
{
if (kmm_heapmember(dtcb->stack_alloc_ptr))
{
kmm_free(dtcb->stack_alloc_ptr);
}
}
else
{
/* Use the user-space allocator if this is a task or pthread */
if (umm_heapmember(dtcb->stack_alloc_ptr))
{
kumm_free(dtcb->stack_alloc_ptr);
}
}
This reverts commit 124e6ee53d.
If there's no child status available immediately,
return 0 without blocking as specified by the standards.
I checked the following version of the standard.
I believe it has always been this way though.
The Open Group Base Specifications Issue 7, 2018 edition
IEEE Std 1003.1-2017 (Revision of IEEE Std 1003.1-2008)
If there's no child status available immediately,
return 0 without blocking as specified by the standards.
The implementation for non CONFIG_SCHED_HAVE_PARENT case
seems ok in this regard.
I checked the following version of the standard.
I believe it has always been this way though.
The Open Group Base Specifications Issue 7, 2018 edition
IEEE Std 1003.1-2017 (Revision of IEEE Std 1003.1-2008)
sched_releasetcb() will normally free the stack allocated for a task. However, a task with a custom, user-managed stack may be created using nxtask_init() followed by nxtask_activer. If such a custom stack is used then it must not be free in this many or a crash will most likely result.
This chagne addes a flag call TCB_FLAG_CUSTOM_STACK that may be passed in the the pre-allocted TCB to nxtask_init(). This flag is not used internally anywhere in the OS except that if set, it will prevent sched_releasetcb() from freeing that custom stack.
1. Internal scheduler functions should begin with nxsched_, not sched_
2. Follow the consistent naming patter of https://cwiki.apache.org/confluence/display/NUTTX/Naming+of+OS+Internal+Functions
# clock_systimer -> clock_systime_tick
# clock_systimespec -> clock_systime_timespec
sched_oneshot_extclk -> nxsched_oneshot_extclk
sched_period_extclk -> nxsched_period_extclk
# nxsem_setprotocol -> nxsem_set_protocol
# nxsem_getprotocol -> nxsem_get_protocol
# nxsem_getvalue -> nxsem_get_value
nxsem_initholders -> nxsem_initialize_holders
nxsem_addholder -> nxsem_add_holder
nxsem_addholder_tcb -> nxsem_add_holder_tcb
nxsem_boostpriority -> nxsem_boost_priority
nxsem_releaseholder -> nxsem_release_holder
nxsem_restorebaseprio -> nxsem_restore_baseprio
Some planned name changed were skipped for now because they effect too many files (and would require many hours of coding style fixups).
All complaints fixed except for those that were not possible to fix:
- Used of Mixed case identifier in ESP32 files. These are references to Expressif ROM functions which are outside of the scope of NuttX.
- Remove per-thread errno from the TCB structure (pterrno)
- Remove get_errno() and set_errno() as functions. The macros are still available as stubs and will be needed in the future if we need to access the errno from a different address environment (KERNEL mode).
- Add errno value to the tls_info_s structure definitions
- Move sched/errno to libs/libc/errno. Replace old TCB access to the errno with TLS access to the errno.
The sched_get_stackinfo() interface was just added. However, it occurs to me that it is a dangerous feature and could lead to security problems. In FLAT and PROTECTED modes, if you get access to any other threads stack, you could do harm.
This commit adds some level of security. Basically, it implements these rules:
1. Any thread may query its own stack,
2. A kernel thread may query the stack of any other thread
3. Application threads, however, may query only the stacks of threads within the same task group, i.e., the main thread and any of the child pthreads created with the main thread as a parent or grandparent or great-grandpart ...
The new OS interface, sched_get_stackinfo() combines two pthread-specific interfaces into a single generic interface. The existing pthread_get_stackaddr_np() and pthread_get_stacksize_np() are moved from sched/pthread to libs/libc/pthread.
There are two motivations for this change: First, it reduces the number of system calls. Secondly, it adds a common hook that is going to used for a future implementation of TLS.
These warnings fix a class of warnings that I saw during CI checks for macOS sim builds. For example:
devif/devif_callback.c:111:49: warning: for loop has empty body [-Wempty-body]
prev = curr, curr = curr->nxtdev);
^
devif/devif_callback.c:111:49: note: put the semicolon on a separate line to silence this warning
I did not put the semi-colon on a separate line, but used braces.
Hello,
I am testing priority inversion feauture of Nuttx scheduler. I encounter with problem in tests.
Here is problem:
nxsched_readytorun_setpriority() function not works as expected when DEBUG_ASSERTION not enabled. Because sched_removereadytorun() and sched_addreadytorun() is not called in this case. These functions wrapped with DEBUGASSERT macro and with the effect of DEBUGASSERT macro, sched_removereadytorun() and sched_addreadytorun() not called when DEBUG_ASSERTION not enabled. Therefore priority inversion not works as expected.
Could you check this problem ? Is this a bug, if not what is the purpose of this code ?
Thanks,
Şükrü Bahadır Arslan
Move the prototype for the internal OS from from sched/sched/sched.h to include/nuttx/sched.h. This was done because binfmt/binfmt/excecmodule.c requires the prototype for sched_releasetcb() and was illegally including sched/sched/sched.h. That is a blatant violation of the OS modular design and the person that did this should be hung up by their thumbs. Oh... I did that back in a bad moment in 2014. Now that is made right.
If SMP is enabled this function will return the number of the CPU that the thread is running on. This is non-standard but follows GLIBC if __GNU_SOURCE is enabled. The returned CPU number is, however, worthless since it returns the CPU number of the CPU that was executing the task when the function was called. The application can never know the true CPU number of the CPU tht it is running on since that value is volatile and change change at any time.
the width of all block comments. Includes a check to assure that all block
comments use the same line width.
Verified against all .c files under /sched. There were a few cosmetic changes to the coding style under /sched to account to new, correctly detected problems in the /sched files.
* Documentation/NuttXCCodingStandard.html: Remove requirement to decorate ignored returned values with (void).
* sched_mergepending.c: Correct some errors in comments.
Co-authored-by: Gregory Nutt <gnutt@nuttx.org>
* Simplify EINTR/ECANCEL error handling
1. Add semaphore uninterruptible wait function
2 .Replace semaphore wait loop with a single uninterruptible wait
3. Replace all sem_xxx to nxsem_xxx
* Unify the void cast usage
1. Remove void cast for function because many place ignore the returned value witout cast
2. Replace void cast for variable with UNUSED macro
We also make an attempt to avoid the thundering herd problem if there are multiple readers/pollers.
Patch also removes forcing CONFIG_RAMLOG_CRLF in nuttx/syslog/ramlog.h as there is no point of wasting precious RAM for useless characters.
libs/: Remove references to CONFIG_DISABLE_SIGNALS. Signals can no longer be disabled.
syscall/: Remove references to CONFIG_DISABLE_SIGNALS. Signals can no longer be disabled.
wireless/: Remove references to CONFIG_DISABLE_SIGNALS. Signals can no longer be disabled.
Documentation/: Remove references to CONFIG_DISABLE_SIGNALS. Signals can no longer be disabled.
include/: Remove references to CONFIG_DISABLE_SIGNALS. Signals can no longer be disabled.
drivers/: Remove references to CONFIG_DISABLE_SIGNALS. Signals can no longer be disabled.
sched/: Remove references to CONFIG_DISABLE_SIGNALS. Signals can no longer be disabled.
configs: Remove references to CONFIG_DISABLE_SIGNALS. Signals can no longer be disabled.
arch/xtensa: Remove references to CONFIG_DISABLE_SIGNALS. Signals can no longer be disabled.
arch/z80: Remove references to CONFIG_DISABLE_SIGNALS. Signals can no longer be disabled.
arch/x86: Remove references to CONFIG_DISABLE_SIGNALS. Signals can no longer be disabled.
arch/renesas and arch/risc-v: Remove references to CONFIG_DISABLE_SIGNALS. Signals can no longer be disabled.
arch/or1k: Remove all references to CONFIG_DISABLE_SIGNALS. Signals are always enabled.
arch/misoc: Remove all references to CONFIG_DISABLE_SIGNALS. Signals are always enabled.
arch/mips: Remove all references to CONFIG_DISABLE_SIGNALS. Signals are always enabled.
arch/avr: Remove all references to CONFIG_DISABLE_SIGNALS. Signals are always enabled.
arch/arm: Remove all references to CONFIG_DISABLE_SIGNALS. Signals are always enabled.
Squashed commit of the following:
sched/sched/sched_getsockets.c: Fix an error in conditional compilation.
fs/: Remove all conditional logic based on CONFIG_NSOCKET_DESCRIPTORS == 0
Documentation/: Remove all references to CONFIG_NSOCKET_DESCRIPTORS == 0
include/: Remove all conditional logic based on CONFIG_NSOCKET_DESCRIPTORS == 0
libs/: Remove all conditional logic based on CONFIG_NSOCKET_DESCRIPTORS == 0
net/: Remove all conditional logic based on CONFIG_NSOCKET_DESCRIPTORS == 0
sched/: Remove all conditional logic based on CONFIG_NSOCKET_DESCRIPTORS == 0
syscall/: Remove all conditional logic based on CONFIG_NSOCKET_DESCRIPTORS == 0
tools/: Fixups for CONFIG_NSOCKET_DESCRIPTORS no longer used to disable sockets.
Squashed commit of the following:
configs/: The few configurations that formerly set CONFIG_NFILE_DESCRIPTORS=0 should not default, rather they should set the number of descriptors to 3.
fs/: Remove all conditional logic based on CONFIG_NFILE_DESCRIPTORS == 0
tools/: Tools updates for changes to usage of CONFIG_NFILE_DESCRIPTORS.
syscall/: Remove all conditional logic based on CONFIG_NFILE_DESCRIPTORS == 0
libs/: Remove all conditional logic based on CONFIG_NFILE_DESCRIPTORS == 0
include/: Remove all conditional logic based on CONFIG_NFILE_DESCRIPTORS == 0
drivers/: Remove all conditional logic based on CONFIG_NFILE_DESCRIPTORS == 0
Documentation/: Remove all references to CONFIG_NFILE_DESCRIPTORS == 0
binfmt/: Remove all conditional logic based on CONFIG_NFILE_DESCRIPTORS == 0
arch/: Remove all conditional logic based on CONFIG_NFILE_DESCRIPTORS == 0
net/: Remove all conditional logic based on CONFIG_NFILE_DESCRIPTORS == 0
sched/: Remove all conditional logic based on CONFIG_NFILE_DESCRIPTORS == 0
sched/Kconfig: CONFIG_NFILE_DESCRIPTORS may no longer to set to a value less than 3
configs/: Remove all settings for CONFIG_NFILE_DESCRIPTORS < 3
sched/init/nx_bringup.c: Fix a naming collision.
sched/init: Rename os_start() to nx_start()
sched/init: Rename os_smp* to nx_smp*
sched/init: Rename os_bringup to nx_bringup
sched/init: rename all internal static functions to begin with nx_ vs os_
This solution to the problem noted by EunBong Song results in major memory fragmentation and and out-of-memory conditions on the PX4 platform. On that platform the lower priority work queue is very low priority and essentially never runs when the system is busy. As a result, the systems gets slowly starved of memory until failures and bad behaviors begin to occur.
This is an addition patch coming later to result the original problem in a different way that does not have cause memory starvation.
This reverts commit 91aa26774b.
fs/procfs/fs_procfsproc: Extended the process ID ProcFS output to show per-thread maximum time for pre-emption disabled and maximum time within a critical section.
sched/sched/sched_critmonitor.c: Adds data collection logic in support of monitoring critical sections and pre-emption state.
TASK A TASK B
malloc()
mm_takesemaphore()
heap holder is set to TASK B
<--- preempt
...
task_exit()
Set to current task to
TASK B
Try to release tcb, and
stack memory
free()
mm_takesemaphore()
- Successfully obtain
semaphore because current
task and heap holder is
same.
Free memory....
Heap corrupt.
This change forces all de-allocations via sched_kfree() and sched_ufree()
to be delayed. Eliminating the immediate de-allocation prevents the
above problem with the the re-entrant semaphore because the deallocation
always occurs on the worker thread, never on TASK B.
There could be consequences in the timing of memory availability. We
will see.
Squashed commit of the following:
Add procfs support to show stopped tasks. Add nxsig_action() to solve a chicken and egg problem: We needed to use sigaction to set default actions, but sigaction() would refuse to set actions if the default actions could not be caught or ignored.
sched/signal: Add configuration option to selectively enabled/disable default signal actions for SIGSTOP/SIGSTP/SIGCONT and SIGKILL/SIGINT. Fix some compilation issues.
sched/sched: Okay.. I figured out a way to handle state changes that may occur while they were stopped. If a task/thread was already blocked when SIGSTOP/SIGSTP was received, it will restart in the running state. I will appear that to the task/thread that the blocked condition was interrupt by a signal and returns the EINTR error.
sched/group and sched/sched: Finish framework for continue/resume logic.
sched/signal: Roughing out basic structure to support task suspend/resume