Summary:
- I found a deadlock during Wi-Fi audio streaming test plus stress test
- The testing environment was spresense:wifi_smp (NCPUS=4)
- The deadlock happened because two CPUs called up_cpu_pause() almost simultaneously
- This situation should not happen, because up_cpu_pause() is called in a critical section
- Actually, the latter call was from nxsem_post() in an IRQ handler
- And when enter_critical_section() was called, irq_waitlock() detected a deadlock
- Then it called up_cpu_paused() to break the deadlock
- However, this resulted in setting g_cpu_irqset on the CPU
- Even though another CPU had held a g_cpu_irqlock
- This situation violates the critical section and should be avoided
- To avoid the situation, if a CPU sets g_cpu_irqset after calling up_cpu_paused()
- The CPU must release g_cpu_irqlock first
- Then retry irq_waitlock() to acquire g_cpu_irqlock
Impact:
- Affect SMP
Testing:
- Tested with spresense:wifi_smp (NCPUS=2 and 4)
- Tested with spresense:smp
- Tested with sim:smp
- Tested with sabre-6quad:smp (QEMU)
- Tested with maix-bit:smp (QEMU)
- Tested with esp32-core:smp (QEMU)
- Tested with lc823450-xgevk:rndis
Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>
Summary:
- ARCH_GLOBAL_IRQDISABLE was initially introduced for LC823450 SMP
- At that time, i.MX6 (quad Cortex-A9) did not use this config
- However, this option is now used for all CPUs which support SMP
- So it's good timing for refactoring the code
Impact:
- Should have no impact because the logic is the same for SMP
Testing:
- Tested with board: spresense:smp, spresense:wifi_smp
- Tested with qemu: esp32-core:smp, maix-bit:smp, sabre-6quad:smp
- Build only: lc823450-xgevk:rndis, sam4cmp-db:nsh
Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>
sched/init/nx_bringup.c: Fix a naming collision.
sched/init: Rename os_start() to nx_start()
sched/init: Rename os_smp* to nx_smp*
sched/init: Rename os_bringup to nx_bringup
sched/init: rename all internal static functions to begin with nx_ vs os_
fs/procfs/fs_procfsproc: Extended the process ID ProcFS output to show per-thread maximum time for pre-emption disabled and maximum time within a critical section.
sched/sched/sched_critmonitor.c: Adds data collection logic in support of monitoring critical sections and pre-emption state.
But certain logic interacts with tasks in different ways. The only one that comes to mind are wdogs. There is a tasking interface that to manipulate wdogs, and a different interface in the timer interrupt handling logic to manage wdog expirations.
In the normal case, this is fine. Since the tasking level code calls enter_critical_section, interrupts are disabled an no conflicts can occur. But that may not be the case in the SMP case. Most architectures do not permit disabling interrupts on other CPUs so enter_critical_section must work differently: Locks are required to protect code.
So this change adds locking (via enter_critical section) to wdog expiration logic for the the case if the SMP configuration.