From 01ade4803a5722c5e8104cf7c959ee55d3cc2d27 Mon Sep 17 00:00:00 2001 From: Gregory Nutt Date: Sun, 20 Nov 2016 09:51:54 -0600 Subject: [PATCH] Update README and TODO list --- TODO | 41 ++++++++++++++++++++++++++++-- configs/sabre-6quad/README.txt | 46 +++++++++++++++++++++++++--------- 2 files changed, 73 insertions(+), 14 deletions(-) diff --git a/TODO b/TODO index c0e268e3b0..252e195614 100644 --- a/TODO +++ b/TODO @@ -10,7 +10,7 @@ issues related to each board port. nuttx/: (13) Task/Scheduler (sched/) - (1) SMP + (2) SMP (1) Memory Management (mm/) (1) Power Management (drivers/pm) (3) Signals (sched/signal, arch/) @@ -324,13 +324,50 @@ o SMP The better option is to add compiler independent "ornamentation" to the spinlock so that the spinlocks are all linked together into a separate, non-cacheable memory regions. Because of - region aligment and minimum region mapping sizes this could + region alignment and minimum region mapping sizes this could still be wasteful of memory. This would work in systems that have both data cache and either an MPU or an MMU. Status: Open Priority: High. spinlocks, and hence SMP, will not work on such systems without this change. + Title: DEADLOCK SCENARIO WITH up_cpu_pause(). + Description: I think there is a possibilty for a hang in up_cpu_pause(). + Suppose this situation: + + - CPU1 is in a critical section and has the g_cpu_irqlock + spinlock. + - CPU0 takes an interrupt and attempts to enter the critical + section. It spins waiting on g_cpu_irqlock with interrupt + disabled. + - CPU1 calls up_cpu_pause() to pause operation on CPU1. This + will issue an inter-CPU interrupt to CPU0 + - But interrupts are disabled. What will happen? I think + that this is a deadlock: Interrupts will stay disabled on + CPU0 because it is spinning in the interrupt handler; + up_cpu_pause() will hang becuase the inter-CPU interrupt + is pending. + + Are inter-CPU interrupts maskable in the same way as other + interrupts? If the are not-maskable, then we must also handle + them as nested interrupts in some fashion. + + A work-around might be to check the state of other-CPU + interrupt handler inside the spin loop of up_cpu_pause(). + Having the other CPU spinning and waiting for up_cpu_pause() + provided that (1) the pending interrupt can be cleared, and + (2) leave_critical_section() is not called prior to the point + where up_cpu_resume() is called, and (3) up_cpu_resume() is + smart enough to know that it should not attempt to resume a + non-paused CPU. + + This would require some kind of information about each + interrupt handler: In an interrupt, waiting for spinlock, + have spinlock, etc. + + Status: Open + Priority: Medium-High. I don't know for certain that this is a problem but it seems like it could + o Memory Management (mm/) ^^^^^^^^^^^^^^^^^^^^^^^ diff --git a/configs/sabre-6quad/README.txt b/configs/sabre-6quad/README.txt index 4284c6de2a..72bb78f952 100644 --- a/configs/sabre-6quad/README.txt +++ b/configs/sabre-6quad/README.txt @@ -495,14 +495,12 @@ Open Issues: 1. Currently all device interrupts are handled on CPU0 only. Critical sections will attempt to disable interrupts but will now disable interrupts only on the current - CPU (which may not be CPU0). Perhaps that should be a spinlock to prohibit - execution of interrupts on CPU0 when other CPUs are in a critical section? + CPU (which may not be CPU0). There is a spinlock to prohibit entrance into these critical sections in interrupt handlers of other CPUs. - 2016-11-17: A fix was added to sched/irq/irq_csection that should correct this - problem. When the critical section is used to lock a resource that is also used by - interupt handling, the interrupt handling logic must also take the spinlock. This - will cause the interrupt handlers on other CPUs to spin until leave_critical_section() - is called. More verification is needed, however. + When the critical section is used to lock a resource that is also used by + interupt handling, the interrupt handling logic must also take the spinlock. + This will cause the interrupt handlers on other CPUs to spin until + leave_critical_section() is called. More verification is needed, however. 2. Cache Concurency. This is a complex problem. There is logic in place now to clean CPU0 D-cache before starting a new CPU and for invalidating the D-Cache @@ -551,16 +549,40 @@ Open Issues: spinlock "g_cpu_paused[cpu]". CPU1 correctly sets g_cpu_paused[cpu] to zero but CPU0 never sees the change. -3. Assertions. On a fatal assertions, other CPUs need to be stopped. The SCR, +3. Caching probabaly interferes with spinlocks as they are currently implemented. + Waiting on a cached copy of the spinlock may result in a hang or a failure to + wait. + + Should all spinlocks go into a special "strongly ordered" memory region? + +5. Assertions. On a fatal assertions, other CPUs need to be stopped. The SCR, however, only supports disabling CPUs 1 through 3. Perhaps if the assertion occurs on CPUn, n > 0, then it should use and SGI to perform the assertion on CPU0 always. From CPU0, CPU1-3 can be disabled. -4. Caching probabaly interferes with spinlocks as they are currently implemented. - Waiting on a cached copy of the spinlock may result in a hang or a failure to - wait. +6. I think there is a possibilty for a hang in up_cpu_pause(). Suppose this + situation: -5. Do spinlocks need to go into a special "strongly ordered" memory region? + - CPU1 is in a critical section and has the g_cpu_irqlock spinlock. + - CPU0 takes an interrupt and attempts to enter the critical section. It + spins waiting on g_cpu_irqlock with interrupt disabled. + - CPU1 calls up_cpu_pause() to pause operation on CPU1. This will issue + an inter-CPU interrupt to CPU0 + - But interrupts are disabled. What will happen? I think that this is + a deadlock: Interrupts will stay disabled on CPU0 because it is spinning + in the interrupt handler; up_cpu_pause() will hang becuase the inter- + CPU interrupt is pending. + + Are inter-CPU interrupts maskable in the same way as other interrupts? If + the are not-maskable, then we must also handle them as nested interrupts + in some fashion. + + A work-around might be to check the state of other-CPU interrupt handler + inside the spin loop of up_cpu_pause(). Having the other CPU spinning and + waiting for up_cpu_pause() provided that (1) the pending interrupt can be + cleared, and (2) leave_critical_section() is not called prior to the point + where up_cpu_resume() is called, and (3) up_cpu_resume() is smart enough + to know that it should not attempt to resume a non-paused CPU. Configurations ==============