diff --git a/Documentation/components/net/index.rst b/Documentation/components/net/index.rst
index c9bf369964..7573e02b09 100644
--- a/Documentation/components/net/index.rst
+++ b/Documentation/components/net/index.rst
@@ -9,6 +9,7 @@ Network Support
   socketcan.rst
   netguardsize.rst
   slip.rst
+  wqueuedeadlocks.rst
 
 ``net`` Directory Structure
 ::
diff --git a/Documentation/components/net/wqueuedeadlocks.rst b/Documentation/components/net/wqueuedeadlocks.rst
new file mode 100644
index 0000000000..4f274d3dea
--- /dev/null
+++ b/Documentation/components/net/wqueuedeadlocks.rst
@@ -0,0 +1,174 @@
+====================
+Work Queue Deadlocks
+====================
+
+Use of Work Queues
+==================
+
+Most network drivers use a work queue to handle network events. This is done
+for two reasons: (1) most of the example code available to leverage does it
+that way, and (2) using the work queue is easier and makes more efficient use
+of memory resources than creating a dedicated task/thread to service the
+network.
+
+High and Low Priority Work Queues
+=================================
+
+There are two kinds of work queues: a single, high priority work queue that
+is intended only to service back-end interrupt processing in a semi-normal
+tasking context, and the low priority work queue(s), which are similar but,
+as the name implies, lower in priority and not dedicated to time-critical
+back-end interrupt processing.
+
+Downsides of Work Queues
+========================
+
+There are two important downsides to the use of work queues. First, work
+queues are inherently non-deterministic. The delay between the point at which
+you schedule work and the time at which the work is actually performed is
+highly variable: it depends not only on strict priority scheduling but also
+on whatever work has been queued ahead of you.
+
+Why bother to use an RTOS if you rely on non-deterministic work queues to do
+most of the work?
+
+The second problem is related: only one work queue job can be performed at a
+time. Each job should be brief so that the work queue becomes available for
+the next job as soon as possible. And a job should never block waiting for
+resources! If a job blocks, it blocks the entire work queue and makes the
+whole work queue unavailable for the duration of the wait.
+
+Networking on Work Queues
+=========================
+
+As mentioned, most network drivers use a work queue to handle network events.
+(Some are even configurable to use the high priority work queue... YIKES!)
+Most network operations are not well suited for execution on a work queue:
+networking operations can be quite extended and can also block waiting for
+the availability of resources. So, at a minimum, networking should never use
+the high priority work queue.
+
+Deadlocks
+=========
+
+If there is only a single instance of a work queue, then it is easy to create
+a deadlock on the work queue if a work job blocks. Here is the generic work
+queue deadlock scenario:
+
+* A job runs on a work queue and waits for the availability of a resource.
+* The operation that provides that resource also runs on the same work queue.
+* But since the work queue is blocked waiting for the resource, the job that
+  provides the resource cannot run, and a deadlock results.
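+
+The fragment below is a minimal sketch of this scenario using the NuttX work
+queue and semaphore interfaces. It assumes a single low priority worker
+thread (``CONFIG_SCHED_LPNTHREADS=1``); the job and variable names are purely
+illustrative:
+
+.. code-block:: c
+
+   #include <nuttx/wqueue.h>
+   #include <nuttx/semaphore.h>
+
+   static struct work_s g_waiter_work;
+   static struct work_s g_provider_work;
+   static sem_t g_resource;   /* Stands in for a scarce resource, e.g., an IOB */
+
+   /* Job A blocks waiting for the resource.  While it waits, the (single)
+    * work queue thread is unavailable.
+    */
+
+   static void waiter_job(FAR void *arg)
+   {
+     nxsem_wait(&g_resource); /* Blocks the whole work queue */
+
+     /* ... use the resource ... */
+   }
+
+   /* Job B would provide the resource, but it is queued behind Job A on the
+    * same work queue and can never run.
+    */
+
+   static void provider_job(FAR void *arg)
+   {
+     nxsem_post(&g_resource); /* Never executes:  Deadlock! */
+   }
+
+   void trigger_deadlock(void)
+   {
+     nxsem_init(&g_resource, 0, 0);
+
+     /* Both jobs target the same singleton work queue */
+
+     work_queue(LPWORK, &g_waiter_work, waiter_job, NULL, 0);
+     work_queue(LPWORK, &g_provider_work, provider_job, NULL, 0);
+   }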
+
+IOBs
+====
+
+IOBs (I/O Blocks) are small I/O buffers that can be linked together in chains
+to efficiently buffer variable-sized network packet data. This is a much more
+efficient use of buffering space than full packet buffers since the packet
+content is often much smaller than the full packet size (the MSS).
+
+The network allocates IOBs to support TCP and UDP read-ahead buffering and
+write buffering. Read-ahead buffering is used when TCP/UDP data is received
+and there is no receiver in place waiting to accept the data. In this case,
+the received payload is buffered in the IOB-based read-ahead buffers. When
+the application next calls ``recv()`` or ``recvfrom()``, the data will be
+removed from the read-ahead buffer and returned to the caller immediately.
+
+Write buffering refers to the similar feature on the outgoing side. When the
+application calls ``send()`` or ``sendto()`` and the driver is not available
+to accept new packet data, the data is buffered in IOBs in the write buffer
+chain. When the network driver is finally available to take more data, packet
+data is removed from the write buffer and provided to the driver.
+
+IOBs are allocated with a fixed size. A fixed number of IOBs are
+pre-allocated when the system starts. If the network runs out of IOBs,
+additional IOBs will not be allocated dynamically; rather, the blocking IOB
+allocator, ``iob_alloc()``, will wait until an IOB is finally returned to the
+pool of free IOBs. There is also a non-blocking IOB allocator,
+``iob_tryalloc()``.
+
+Under conditions of high utilization, such as sending or receiving large
+amounts of data at high rates, it is inevitable that the system will run out
+of pre-allocated IOBs. For read-ahead buffering, the packets are simply
+dropped in this case. For TCP this means that there will be a subsequent
+timeout on the remote peer because no ACK will be received, and the remote
+peer will eventually re-transmit the packet. UDP is a lossy transport, and
+handling of lost or dropped datagrams must be included in any UDP design.
+
+For write buffering, there are three possible behaviors when the IOB pool has
+been exhausted. First, if there are no available IOBs at the beginning of a
+``send()`` or ``sendto()`` transfer and ``O_NONBLOCK`` is not selected, then
+the operation will block until IOBs are again available. This delay can be a
+substantial amount of time.
+
+Second, if ``O_NONBLOCK`` is selected, the send will, of course, return
+immediately, failing with ``errno`` set to ``EAGAIN`` if the first IOB for
+the transfer cannot be allocated.
+
+The third behavior occurs if we run out of IOBs in the middle of the
+transfer. Then the send operation will not wait but will instead report the
+number of bytes that it has successfully buffered. Applications should always
+check the return value from ``send()`` or ``sendto()``: if it is a byte count
+less than the requested transfer size, then the send function should be
+called again.
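+
+A loop like the following handles the partial-send case. This is standard
+POSIX socket usage, not NuttX-specific; the ``send_all()`` helper name is
+illustrative:
+
+.. code-block:: c
+
+   #include <sys/socket.h>
+   #include <errno.h>
+
+   /* Send all 'len' bytes from 'buf', retrying after partial sends.
+    * Returns 0 on success or -1 (with errno set) on failure.
+    */
+
+   int send_all(int sockfd, const void *buf, size_t len)
+   {
+     const char *ptr = buf;
+
+     while (len > 0)
+       {
+         ssize_t nsent = send(sockfd, ptr, len, 0);
+         if (nsent < 0)
+           {
+             if (errno == EINTR)
+               {
+                 continue;  /* Interrupted before sending; just retry */
+               }
+
+             return -1;     /* Includes EAGAIN under O_NONBLOCK */
+           }
+
+         ptr += nsent;      /* Partial send:  Advance and retry */
+         len -= nsent;
+       }
+
+     return 0;
+   }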
+
+The blocking ``iob_alloc()`` call is also a common cause of work queue
+deadlocks. The scenario again is:
+
+* Some logic in the OS runs on a work queue and blocks waiting for an IOB to
+  become available,
+* The logic that releases the IOB also runs on the same work queue, but
+* That logic cannot execute because the other job is blocked waiting for the
+  IOB on the same work queue.
+
+Alternatives to Work Queues
+===========================
+
+To avoid network deadlocks, here is the rule: never run the network on a
+singleton work queue! Unfortunately, most network implementations do just
+that! Here are a couple of alternatives:
+
+#. Use Multiple Low Priority Work Queues
+
+   Unlike the high priority work queue, the low priority work queue utilizes
+   a thread pool. The number of threads in the pool is controlled by
+   ``CONFIG_SCHED_LPNTHREADS``. If ``CONFIG_SCHED_LPNTHREADS`` is greater
+   than one, then such deadlocks should not be possible: in that case, if
+   one thread is busy with some other job (even if it is only waiting for a
+   resource), then the new job will be assigned to a different thread and
+   the deadlock will be broken. The cost of each additional low priority
+   work queue thread is primarily the memory set aside for the thread's
+   stack.
+
+#. Use a Dedicated Network Thread
+
+   The best solution would be to write a custom kernel thread to handle
+   driver network operations. This would be the highest performing and most
+   manageable approach. It would also, however, be substantially more work.
+
+#. Interactions with Network Locks
+
+   The network lock is a re-entrant mutex that enforces mutually exclusive
+   access to the network. The network lock can also cause deadlocks and can
+   also interact with the work queues to degrade performance. Consider this
+   scenario:
+
+   * Some network logic, perhaps running on the application thread, takes
+     the network lock and then waits for an IOB to become available (on the
+     application thread, not a work queue).
+   * Some network-related event runs on the work queue but is blocked
+     waiting for the network lock.
+   * Another job is queued behind that network job. This is the one that
+     would provide the IOB, but it cannot run because the job ahead of it is
+     blocked waiting for the network lock.
+
+   The network will never be unlocked because the application logic holds
+   the network lock and is waiting for an IOB that can never be released.
+
+   Within the network, this deadlock condition is avoided using a special
+   function, ``net_ioballoc()``. ``net_ioballoc()`` is a wrapper around the
+   blocking ``iob_alloc()`` that momentarily releases the network lock while
+   waiting for the IOB to become available.
+
+   Similarly, the network functions ``net_lockedwait()`` and
+   ``net_timedwait()`` are wrappers around ``nxsem_wait()`` and
+   ``nxsem_timedwait()``, respectively, and also release the network lock
+   for the duration of the wait.
+
+   Caution should be used with any of these wrapper functions. Because the
+   network lock is relinquished during the wait, the network state may
+   change before the lock is recovered. Your design should account for this
+   possibility.
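+
+   All of these wrappers follow the same unlock-wait-relock pattern, which
+   the sketch below illustrates. The function shown is illustrative only,
+   not the actual NuttX implementation, and the exact ``iob_alloc()``
+   signature varies between NuttX versions:
+
+   .. code-block:: c
+
+      #include <nuttx/net/net.h>
+      #include <nuttx/mm/iob.h>
+
+      /* Illustrative sketch:  Allocate an IOB, releasing the network lock
+       * while blocked so that other network logic can run (and eventually
+       * free an IOB).
+       */
+
+      FAR struct iob_s *example_net_ioballoc(bool throttled)
+      {
+        FAR struct iob_s *iob;
+
+        net_unlock();                /* Let other network logic run */
+        iob = iob_alloc(throttled);  /* May block until an IOB is freed */
+        net_lock();                  /* Re-acquire before touching state */
+
+        /* CAUTION:  The network state may have changed while the lock was
+         * released.  Re-validate any cached state before continuing.
+         */
+
+        return iob;
+      }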