If cancellation points are enabled, then the following logic is activated in sem_wait(). This causes ECANCELED to be returned every time that sem_wait is called.
int sem_wait(FAR sem_t *sem)
{
...
/* sem_wait() is a cancellation point */
if (enter_cancellation_point())
{
#ifdef CONFIG_CANCELLATION_POINTS
/* If there is a pending cancellation, then do not perform
* the wait. Exit now with ECANCELED.
*/
errcode = ECANCELED;
goto errout_with_cancelpt;
#endif
}
...
Normally this works fine. sem_wait() is the OS API called by the application and will cancel the thread just before it returns to the application. Since it is cancellation point, it should never be called from within the OS.
There there is is one perverse cases where sem_wait() may be nested within another cancellation point. If open() is called, it will attempt to lock a VFS data structure and will eventually call nxmutex_lock(). nxmutex_lock() waits on a semaphore:
int nxmutex_lock(FAR mutex_t *mutex)
{
...
for (; ; )
{
/* Take the semaphore (perhaps waiting) */
ret = _SEM_WAIT(&mutex->sem);
if (ret >= 0)
{
mutex->holder = _SCHED_GETTID();
break;
}
ret = _SEM_ERRVAL(ret);
if (ret != -EINTR && ret != -ECANCELED)
{
break;
}
}
...
}
In the FLAT build, _SEM_WAIT expands to sem_wait(). That causes the error in the logic: It should always expand to nxsem_wait(). That is because sem_wait() is cancellation point and should never be called from with the OS or the C library internally.
The failure occurs because the cancellation point logic in sem_wait() returns -ECANCELED (via _SEM_ERRVAL) because sem_wait() is nested; it needs to return the -ECANCELED error to the outermost cancellation point which is open() in this case. Returning -ECANCELED then causes an infinite loop to occur in nxmutex_lock().
The correct behavior in this case is to call nxsem_wait() instead of sem_wait(). nxsem_wait() is identical to sem_wait() except that it is not a cancelation point. It will return -ECANCELED if the thread is canceled, but only once. So no infinite loop results.
In addition, an nxsem_wait() system call was added to support the call from nxmutex_lock().
This resolves Issue #9695
sim/rpserver
NuttShell (NSH) NuttX-12.0.0
server> cu
_assert: Current Version: NuttX server 12.0.0 3ead669e7a-dirty Feb 2 2023 23:53:48 sim
_assert: Assertion failed : at file: libs/libc/misc/lib_mutex.c:303 task: cu 0x5662fff4
Signed-off-by: chao an <anchao@xiaomi.com>