Summary:
- This commit fixes global IRQ control logic
- In previous implementation, g_cpu_irqset for a remote CPU was
set in sched_add_readytorun(), sched_remove_readytorun() and
up_schedule_sigaction()
- In this implementation, they are removed.
- Instead, in the pause handler, call enter_critical_setion()
which will call up_cpu_paused() then acquire g_cpu_irqlock
- So if a new task with irqcount > 1 restarts on the remote CPU,
the CPU will only hold a critical section. Thus, the issue such as
'POSSIBLE FOR TWO CPUs TO HOLD A CRITICAL SECTION' could be resolved.
- Fix nxsched_resume_scheduler() so that it does not call spin_clrbit()
if a CPU does not hold a g_cpu_irqset
- Fix nxtask_exit() so that it acquires g_cpu_irqlock
- Update TODO
Impact:
- All SMP implementations
Testing:
- Tested with smp, ostest with the following configurations
- Tested with spresense:wifi_smp (NCPUS=2,4)
- Tested with sabre-6quad:smp (QEMU, dev board)
- Tested with maix-bit:smp (QEMU)
- Tested with esp32-core:smp (QEMU)
- Tested with lc823450-xgevk:rndis
Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>
Summary:
- sched_tasklistlock.c was introduced to stabilize NuttX SMP
- However, the current SMP implementation is stable by recent fixes
- So I decided to remove the file finally
Impact:
- SMP only
Testing:
- Tested with ostest with the following configurations
- spresense:smp
- spresense:wifi_smp (NCPUS=2 and 4)
- sabre-6quad:smp (QEMU)
- esp32-core:smp (QEMU)
- maix-bit:smp (QEMU)
- sim:smp
- lc823450-xgevk:rndis
Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>
* Simplify EINTR/ECANCEL error handling
1. Add semaphore uninterruptible wait function
2 .Replace semaphore wait loop with a single uninterruptible wait
3. Replace all sem_xxx to nxsem_xxx
* Unify the void cast usage
1. Remove void cast for function because many place ignore the returned value witout cast
2. Replace void cast for variable with UNUSED macro
Fix SMP related bugs
* sched/sched: Fix a deadlock in SMP mode
Two months ago, I introduced sched_tasklist_lock() and
sched_tasklist_unlock() to protect tasklists in SMP mode.
Actually, this change works pretty well for HTTP audio
streaming aging test with lc823450-xgevk.
However, I found a deadlock in the scheduler when I tried
similar aging tests with DVFS autonomous mode where CPU
clock speed changed based on cpu load. In this case, call
sequences were as follows;
cpu1: sched_unlock()->sched_mergepending()->sched_addreadytorun()->up_cpu_pause()
cpu0: sched_lock()->sched_mergepending()
To avoid this deadlock, I added sched_tasklist_unlock() when calling
up_cpu_pause() and sched_addreadytorun(). Also, added
sched_tasklist_lock() after the call.
Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>
* libc: Add critical section in lib_filesem.c for SMP
To set my_pid into fs_folder atomically in SMP mode,
critical section API must be used.
Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>
* mm: Add critical section in mm_sem.c for SMP
To set my_pid into mm_folder atomically in SMP mode,
critical section API must be used.
Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>
* net: Add critical section in net_lock.c for SMP
To set my pid (me) into fs_folder atomically in SMP mode,
critical section API must be used.
Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>
Approved-by: Gregory Nutt <gnutt@nuttx.org>
The previous implementation of clearing global IRQ in sched_addreadytorun()
and sched_removereadytorun() was done too early. As a result, nxsem_post()
would have a chance to enter the critical section even nxsem_wait() is
still not in blocked state. This patch moves clearing global IRQ controls
from sched_addreadytorun() and sched_removereadytorun() to sched_resumescheduler()
to ensure that nxsem_post() can enter the critical section correctly.
For this change, sched_resumescheduler.c is always necessary for SMP configuration.
In addition, by this change, task_exit() had to be modified so that it calls
sched_resumescheduler() because it calls sched_removescheduler() inside the
function, otherwise it will cause a deadlock.
However, I encountered another DEBUGASSERT() in sched_cpu_select() during
HTTP streaming aging test on lc823450-xgevk. Actually sched_cpu_select()
accesses the g_assignedtasks which might be changed by another CPU. Similarly,
other tasklists might be modified simultaneously if both CPUs are executing
scheduling logic. To avoid this, I introduced tasklist protetion APIs.
With these changes, SMP kernel stability has been much improved.
Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>
This commit corrects this. This is matching logic in sched_addreadytorun to avoid starting new tasks within the critical section (unless the CPU is the holder of the lock). The holder of the IRQ lock must be permitted to do whatever it needs to do.
2. Move list of signal actions from the task TCB to the task group. Signal handlers are a property of the entire task group and not of individual threads in the group. I know, I preferred it the other way too but this is more compliant with POSIX.