tcp_close disposes the connection immediately if it's called in
TCP_LAST_ACK. If it happens, we will end up with responding the
last ACK with a RST.
This commit fixes it by making tcp_close wait for the completion
of the passive close.
This fixes connection closing issues with CONFIG_NET_TCP_WRITE_BUFFERS.
Because TCP_CLOSE is used for both of input and output for tcp_callback,
the close callback and the send callback confuses each other as
the following. As it effectively disposes the connection immediately,
we end up with responding to the consequent ACK and FIN/ACK from the peer
with RSTs.
tcp_timer
-> tcp_close_eventhandler
returns TCP_CLOSE (meaning an active close)
-> psock_send_eventhandler
called with TCP_CLOSE from tcp_close_eventhandler, misinterpet as
a passive close.
-> tcp_lost_connection
-> tcp_shutdown_monitor
-> tcp_callback
-> tcp_close_eventhandler
misinterpret TCP_CLOSE from itself as
a passive close
The current code just leave the window value from the segment
from the peer. It doesn't make sense.
Instead, always use 0.
This matches what NetBSD and Linux do.
(As far as I read their code correctly.)
* It doesn't make sense to have this conditional on our own
SO_KEEPALIVE support. (CONFIG_NET_TCP_KEEPALIVE)
Actually we don't have a control on the peer tcp stack,
who decides to send us keep-alive probes.
* We should respond them for non ESTABLISHED states. eg. FIN_WAIT_2
See also:
https://github.com/apache/incubator-nuttx/pull/3919#issuecomment-868248576
Do not bother to preserve segment boundaries in the tcp
readahead queues.
* Avoid wasting the tail IOB space for each segments.
Instead, pack the newly received data into the tail space
of the last IOB. Also, advertise the tail space as
a part of the window.
* Use IOB chain directly. Eliminate IOB queue overhead.
* Allow to accept only a part of a segment.
* This change improves the memory efficiency.
And probably more importantly, allows less-confusing
recv window advertisement behavior.
Previously, even when we advertise N bytes window,
we often couldn't actually accept N bytes. Depending on
the segment sizes and IOB configurations, it was causing
segment drops.
Also, the previous code was moving the right edge of the
window back and forth too often, even when nothing in
the system was competing on the IOBs. Shrinking the
window that way is a kinda well known recipe to confuse
the peer stack.
* Move the code to advance rcvseq for user data from tcp_input
to receive handlers.
Motivation: allow partial ack.
* If we drop a segment, ignore FIN as well. Note than tcp FIN bit is
logically after the user data in the same segment.
I assume this was just an oversight because I couldn't
find any obvious reason to special-case only the first IOB.
The commit message of the original commit is cited below.
```
commit bf21056001
Author: chao.an <anchao@xiaomi.com>
Date: Fri Nov 27 09:50:38 2020 +0800
net/tcp: fallback to unthrottle pool to avoid deadlock
Add a fallback mechanism to ensure that there are still available
iobs for an free connection, Guarantees all connections will have
a minimum threshold iob to keep the connection not be hanged.
Change-Id: I59bed98d135ccd1f16264b9ccacdd1b0d91261de
Signed-off-by: chao.an <anchao@xiaomi.com>
```
* Fixes the case where the window was small but not zero.
* tcp_recvfrom: Remove tcp_ackhandler. Instead, simply schedule TX for
a possible window update and make tcp_appsend decide.
* Replace rcv_wnd (the last advertized window size value) with
rcv_adv. (the window edge sequence number advertized to the peer)
rcv_wnd was complicated to deal with because its base (rcvseq) is
also moving.
* tcp_appsend: Send a window update even if there are no other reasons
to send an ack.
Namely, send an update if it increases the window by
* 2 * mss
* or the half of the max possible window size
Gregory Nutt is the copyright holder for those files and he has submitted the
SGA as a result we can migrate the licenses to Apache.
Signed-off-by: Alin Jerpelea <alin.jerpelea@sony.com>
My recent changes to buffered tcp send broke this. [1]
One of my local apps using non-blocking tcp is working
again with this fix.
[1]
```
commit 837e1a72a4
Author: YAMAMOTO Takashi <yamamoto@midokura.com>
Date: Mon Mar 15 16:19:42 2021 +0900
tcp_send_buffered.c: improve tcp write buffering
```
reset the connection refcount if SYN retry count has elapsed
Assertion:
up_assert: Assertion failed at file:tcp/tcp_conn.c line: 764 task: netdev_wq
N/A
Signed-off-by: chao.an <anchao@xiaomi.com>
Summary:
- Based on the discussion (PR#2772), let me revert the commit
Impact:
- None
Testing:
- N/A
This reverts commit ec8bf5c8c1.
Suggested-by: YAMAMOTO Takashi <yamamoto@midokura.com>
Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>
Summary:
- This commit adds DEBUGASSERT() to check the IOB size
Impact:
- None
Testing:
- Tested with sabre-6quad:netnsh with QEMU
Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>
Request the TCP ACK to estimate the receive window after handle
any data already buffered in a read-ahead buffer.
Change-Id: Id998a1125dd2991d73ba4bef081ddcb7adea4f0d
Signed-off-by: chao.an <anchao@xiaomi.com>
Urgent data preceded "normal" data in the TCP payload. If the urgent data is larger than the size of the TCP payload, this indicates that the entire payload is urgent data and that urgent data continues in the next packet.
This case was handled correctly for the case where urgent data was present but was not being handled correctly in the case where the urgent data was NOT present.
RFC2001: TCP Slow Start, Congestion Avoidance, Fast Retransmit,
and Fast Recovery Algorithms
...
3. Fast Retransmit
Modifications to the congestion avoidance algorithm were proposed in
1990 [3]. Before describing the change, realize that TCP may
generate an immediate acknowledgment (a duplicate ACK) when an out-
of-order segment is received (Section 4.2.2.21 of [1], with a note
that one reason for doing so was for the experimental fast-
retransmit algorithm). This duplicate ACK should not be delayed.
The purpose of this duplicate ACK is to let the other end know that a
segment was received out of order, and to tell it what sequence
number is expected.
Since TCP does not know whether a duplicate ACK is caused by a lost
segment or just a reordering of segments, it waits for a small number
of duplicate ACKs to be received. It is assumed that if there is
just a reordering of the segments, there will be only one or two
duplicate ACKs before the reordered segment is processed, which will
then generate a new ACK. If three or more duplicate ACKs are
received in a row, it is a strong indication that a segment has been
lost. TCP then performs a retransmission of what appears to be the
missing segment, without waiting for a retransmission timer to
expire.
Change-Id: Ie2cbcecab507c3d831f74390a6a85e0c5c8e0652
Signed-off-by: chao.an <anchao@xiaomi.com>
The number of available iobs is already sub in iob_navail(true) on line 114
net/tcp/tcp_recvwindow.c:
...
73 uint16_t tcp_get_recvwindow(FAR struct net_driver_s *dev)
...
114 niob_avail = iob_navail(true);
Change-Id: I230927904d8db08ed8d95d7fa18c5c5fce08aa5e
Signed-off-by: chao.an <anchao@xiaomi.com>
Add a fallback mechanism to ensure that there are still available
iobs for an free connection, Guarantees all connections will have
a minimum threshold iob to keep the connection not be hanged.
Change-Id: I59bed98d135ccd1f16264b9ccacdd1b0d91261de
Signed-off-by: chao.an <anchao@xiaomi.com>
remove the connection assertion since the instance will be invalid
if the network device has been taken down.
net/netdev/netdev_ioctl.c:
1847 void netdev_ifdown(FAR struct net_driver_s *dev)
1848 {
...
1871 /* Notify clients that the network has been taken down */
1872
1873 devif_dev_event(dev, NULL, NETDEV_DOWN);
...
1883 }
Change-Id: I492b97b5ebe035ea67bbdd7ed635cb13d085e89c
Signed-off-by: chao.an <anchao@xiaomi.com>
In the current net stack implementation, there is no mechanism
for notifying the loss of the wireless connection, if the network
is disconnected then application sends data packets through tcp,
the tcp_timer will keep retrying fetch the ack for awhile, the
connection status will not be able to be switched timely.
Change-Id: I84d1121527edafc6ee6ad56ba164838694e7e11c
Signed-off-by: chao.an <anchao@xiaomi.com>
Fixes an issue where tcp sockets with activated keepalives stalled and
were not properly closed. Poll would not indicate a POLLHUP and therefore
locks down the application.
* tcp_conn_s.tcp_conn_s & tcp_conn_s.keepintvl changed to uint32_t
According RFC1122 keepidle MUST have a default of 2 hours.
Found by clang-check:
tcp/tcp_timer.c:185:15: warning: Value stored to 'result' is never read
result = tcp_callback(dev, conn, TCP_TIMEDOUT);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
tcp/tcp_timer.c:264:23: warning: Value stored to 'result' is never read
result = tcp_callback(dev, listener, TCP_TIMEDOUT);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
tcp/tcp_timer.c:300:19: warning: Value stored to 'result' is never read
result = tcp_callback(dev, conn, TCP_TIMEDOUT);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3 warnings generated.
In some extreme scenarios(eg. crash, reboot, reset, etc...),
an established connection cannot guarantee that the port can be
closed properly, if we try to reconnect quickly after reset, the
connection will fail since the current port is same as the
previous one, the previous port connection has been hold on server side.
dynamically apply for the port base to avoid duplication.
Change-Id: I0089244b2707ea61f553a4dae09c7af3649c70bd
Signed-off-by: chao.an <anchao@xiaomi.com>
It's enough to check the buffer available in the net event handler
Change-Id: I2d7c7a03675cf6eff6ffb42a81b7c7245253e92c
Signed-off-by: Xiang Xiao <xiaoxiang@xiaomi.com>
Since the request address was not properly resolved before the handshake,
every time of connection, the handshake data will be overwitten into
arp packet and retransmitted until the next tcp timer.
Request the arp address before the handshake to avoid the retransmission.
Change-Id: I80118b9a8096c126c8e16cdf2f7b3d98fca92437
Signed-off-by: chao.an <anchao@xiaomi.com>
The tcp_backlog is used when there is not process running to accept a new connection, but it is limited by the number of allocated backlog containers. The current error message induces the user to believe there is not free memory to allocate a new container, but actually the containers were allocated during the initialization and were available until the last element of the list has been removed to use.
MSG_DONTWAIT (since Linux 2.2)
Enables nonblocking operation; if the operation would block, the
call fails with the error EAGAIN or EWOULDBLOCK. This provides
similar behavior to setting the O_NONBLOCK flag (via the fcntl(2)
F_SETFL operation), but differs in that MSG_DONTWAIT is a per-call
option, whereas O_NONBLOCK is a setting on the open file description
(see open(2)), which will affect all threads in the calling process
and as well as other processes that hold file descriptors referring
to the same open file description.
Author: Gregory Nutt <gnutt@nuttx.org>
net/tcp: Fix errors found in build testing.
Recent re-organization moved some functions from net/inet to net/tcp and net/udp. This include references to nxsem_wait(), SEM_PRIO_NONE, and other internal NuttX semaphore functions. These all failed to compile because nuttx/semaphore.h was not included in any of the files.
1.Consolidate absolute to relative timeout conversion into one place(_net_timedwait)
2.Drive the wait timeout logic by net_timedwait instead of devif_timer
This patch help us remove devif_timer(period tick) to save the power in the future.
Change-Id: I534748a5d767ca6da8a7843c3c2f993ed9ea77d4
Signed-off-by: Xiang Xiao <xiaoxiang@xiaomi.com>
Here is the email loop talk about why it is better to remove the option:
https://groups.google.com/forum/#!topic/nuttx/AaNkS7oU6R0
Change-Id: Ib66c037752149ad4b2787ef447f966c77aa12aad
Signed-off-by: Xiang Xiao <xiaoxiang@xiaomi.com>
* Simplify EINTR/ECANCEL error handling
1. Add semaphore uninterruptible wait function
2 .Replace semaphore wait loop with a single uninterruptible wait
3. Replace all sem_xxx to nxsem_xxx
* Unify the void cast usage
1. Remove void cast for function because many place ignore the returned value witout cast
2. Replace void cast for variable with UNUSED macro
* fix ieee802154/ieee802154_input.c:179:7: error: too few arguments to function 'iob_free'
iob_free(container->ic_iob);
^~~~~~~~
ieee802154/ieee802154_input.c:180:7: error: too many arguments to function 'ieee802154_container_free'
ieee802154_container_free(container, IOBUSER_NET_SOCK_IEEE802154);
^~~~~~~~~~~~~~~~~~~~~~~~~
* fix udp/udp_netpoll.c:327:10: warning: 'ret' may be used uninitialized in this function [-Wmaybe-uninitialized]
* fix local/local_netpoll.c:154:15: warning: implicit declaration of function 'nxsem_post'; did you mean 'sem_post'? [-Wimplicit-function-declaration]
nxsem_post(fds->sem);
^~~~~~~~~~
sem_post
Author: Gregory Nutt <gnutt@nuttx.org>
Run all .h and .c files modified in last PR through nxstyle.
Author: Xiang Xiao <xiaoxiang@xiaomi.com>
Net cleanup (#17)
* Fix the semaphore usage issue found in tcp/udp
1. The count semaphore need disable priority inheritance
2. Loop again if net_lockedwait return -EINTR
3. Call nxsem_trywait to avoid the race condition
4. Call nxsem_post instead of sem_post
* Put the work notifier into free list to avoid the heap fragment in the long run. Since the allocation strategy is encapsulated internally, we can even refine the implementation later.
* Network stack shouldn't allocate memory in the poll implementation to avoid the heap fragment in the long run, other modification include:
1. Select MM_IOB automatically since ICMP[v6] socket can't work without the read ahead buffer
2. Remove the net lock since xxx_callback_free already do the same thing
3. TCP/UDP poll should work even the read ahead buffer isn't enabled at all
* Add NET_ prefix for UDP_NOTIFIER and TCP_NOTIFIER option to align with other UDP/TCP option convention
* Remove the unused _SF_[IDLE|ACCEPT|SEND|RECV|MASK] flags since there are code to set/clear these flags, but nobody check them.
Squashed commit of the following:
net/tmp: Rename the unacked field of the tcp connection structure to tx_unacked. Too confusing with the implementation of delayed RX ACKs.
net/tcp: Initial implementation of TCP delayed ACKs.
net/tcp: Add delayed ACK configuration selection. Rename tcp_ack() to tcp_synack(). It may or may not send a ACK. It will always send SYN or SYN/ACK.
1. For buffered tcp/udp case, if CONFIG_NET_ARP_SEND/CONFIG_NET_ARP_IPIN/CONFIG_NET_ICMPv6_NEIGHBOR isn't enabled and the table doesn't contain ip<->ethaddr mapping yet, the logic will skip the realtransmission and then arp/neighbor can't steal the final buffer to generate arp/icmpv6 packet.
2.for all other case, the tcp layer or user program should already contain the retransmit logic, the check is redundancy and may generate many duplicated packets if arp/icmpv6 response is too slow because the cursor stop forward. If user still concern about the very first packet lost, he could fix the issue by enabling CONFIG_NET_ARP_SEND/CONFIG_NET_ICMPv6_NEIGHBOR at begin.
Iobinstrumentation
* mm/iob: Introduces producer/consumer id to every iob call. This is so that the calls can be instrumented to monitor the IOB resources.
* iob instrumentation - Merges producer/consumer enumeration for simpler IOB user.
* fs/procfs: Starts adding support for /proc/iobinfo
* fs/procfs: Finishes first pass of simple IOB user stastics and /proc/iobinfo entry
Approved-by: Gregory Nutt <gnutt@nuttx.org>
Squashed commit of the following:
net/: Fix some naming inconsistencies, Fix final compilation issies.
net/inet/inet_close(): Now that we have logic to drain the buffered TX data, we can implement a proper lingering close.
net/inet,tcp,udp: Add functions to wait for write buffers to drain.
net/udp: Add support for notification when the UDP write buffer becomes empty.
net/tcp: Add support for notification when the TCP write buffer becomes empty.
arch/: Removed all references to CONFIG_DISABLE_POLL. The standard POSIX poll() can not longer be disabled.
sched/ audio/ crypto/: Removed all references to CONFIG_DISABLE_POLL. The standard POSIX poll() can not longer be disabled.
Documentation/: Removed all references to CONFIG_DISABLE_POLL. The standard POSIX poll() can not longer be disabled.
fs/: Removed all references to CONFIG_DISABLE_POLL. The standard POSIX poll() can not longer be disabled.
graphics/: Removed all references to CONFIG_DISABLE_POLL. The standard POSIX poll() can not longer be disabled.
net/: Removed all references to CONFIG_DISABLE_POLL. The standard POSIX poll() can not longer be disabled.
drivers/: Removed all references to CONFIG_DISABLE_POLL. The standard POSIX poll() can not longer be disabled.
include/, syscall/, wireless/: Removed all references to CONFIG_DISABLE_POLL. The standard POSIX poll() can not longer be disabled.
configs/: Remove all references to CONFIG_DISABLE_POLL. Standard POSIX poll can no longer be disabled.
Squashed commit of the following:
sched/sched/sched_getsockets.c: Fix an error in conditional compilation.
fs/: Remove all conditional logic based on CONFIG_NSOCKET_DESCRIPTORS == 0
Documentation/: Remove all references to CONFIG_NSOCKET_DESCRIPTORS == 0
include/: Remove all conditional logic based on CONFIG_NSOCKET_DESCRIPTORS == 0
libs/: Remove all conditional logic based on CONFIG_NSOCKET_DESCRIPTORS == 0
net/: Remove all conditional logic based on CONFIG_NSOCKET_DESCRIPTORS == 0
sched/: Remove all conditional logic based on CONFIG_NSOCKET_DESCRIPTORS == 0
syscall/: Remove all conditional logic based on CONFIG_NSOCKET_DESCRIPTORS == 0
tools/: Fixups for CONFIG_NSOCKET_DESCRIPTORS no longer used to disable sockets.
sched/wqueue/kwork_notifier.c: Redesign some data structures. struct works_s must appear at the beginning of the notifier entry structure. That is because it contains the work queue indices. This solves a harfault issue.
net/tcp/tcp_netpoll.c: tcp_iob_work() needs to free the allocated argument when it is finished.
net/tcp/tcp_send_buffered.c: Extend psock_tcp_cansend() so that it also requires that at least on IOB is also avaialble.
mm/iob: iob_navail() was returning the number of free IOB chain queue entries, not the number of free IOBs. Completely misnamed.
net/tcp/tcp_netpoll.c: Add logic to receive notifications when IOBs are freed (Needs CONFIG_NET_TCP_WRITE_BUFFERS and CONFIG_IOB_NOTIFIER). At present, does nothing because the logic in in psock_tcp_cansend() does not check for the availability of IOBs. That will change.
Squashed commit of the following:
Fix up some final compile isses.
net/netdev: Convert the network down notification logic to use the new wqueue-based notification factility.
net/udp: Convert the UDP readahead notification logic to use the new wqueue-based notification factility.
net/tcp: Convert the TCP readahead notification logic to use the new wqueue-based notification factility.
mm/iob: Convert the IOB notification logic to use the new wqueue-based notification factility.
sched/wqueue: Signals are not good IPCs to support the target poll functionality for several reasons including the amount of data that can be passed with a signal and in the fact that in protected and kernel modes, user threads executing signal handlers in protected, kernel memory is problematic. Instead, convert the same logic to perform the notifications via function callback on the high priority work queue.
Squashed commit of the following:
net/tcp: Add signal notification for the case when UDP read-ahead data is buffered. This is basically of clone of the TCP notification logic with naming adapted for UDP.
net/tcp: Add signal notification for the case when TCP read-ahead data is buffered.
net/utils: return from net_breaklock() was being clobbered.
net/: Replace all calls to iob_alloc() with calls to net_ioballoc() which will release the network lock, if necessary.
net/utils, tcp, include/net: Separate out the special IOB allocation logic and place it in its own function. Prototype is available in a public header file where it can also be used by network drivers.
net/utils: net_timedwait() now uses new net_breaklock() and net_restorelock().
This makes the user interface a little hostile. People thing of an MTU of 1500 bytes, but the corresponding packet is really 1514 bytes (including the 14 byte Ethernet header). A more friendly solution would configure the MTU (as before), but then derive the packet buffer size by adding the MAC header length. Instead, we define the packet buffer size then derive the MTU.
The MTU is not common currency in networking. On the wire, the only real issue is the MSS which is derived from MTU by subtracting the IP header and TCP header sizes (for the case of TCP). Now it is derived for the PKTSIZE by subtracting the IP header, the TCP header, and the MAC header sizes. So we should be all good and without the recurring 14 byte error in MTU's and MSS's.
Squashed commit of the following:
Trivial update to fix some spacing issues.
net/: Rename several macros containing _MTU to _PKTSIZE.
net/: Rename CONFIG_NET_SLIP_MTU to CONFIG_NET_SLIP_PKTSIZE and similarly for CONFIG_NET_TUN_MTU. These are not the MTU which does not include the size of the link layer header. These are the full size of the packet buffer memory (minus any GUARD bytes).
net/: Rename CONFIG_NET_6LOWPAN_MTU to CONFIG_NET_6LOWPAN_PKTSIZE and similarly for CONFIG_NET_TUN_MTU. These are not the MTU which does not include the size of the link layer header. These are the full size of the packet buffer memory (minus any GUARD bytes).
net/: Rename CONFIG_NET_ETH_MTU to CONFIG_NET_ETH_PKTSIZE. This is not the MTU which does not include the size of the link layer header. This is the full size of the packet buffer memory (minus any GUARD bytes).
net/: Rename the file d_mtu in the network driver structure to d_pktsize. That value saved there is not the MTU. The packetsize is the memory large enough to hold the maximum packet PLUS the size of the link layer header. The MTU does not include the link layer header.
Fix a few typo/compilation problems.
net/: Remove all CONFIG_NET_xxx_TCP_RECVWNDO configuration variables. They were used only to initialize the d_recwndo of the network device structure which no longer exists.
net/: Remove the device TCP receive window field (d_recvwndo) from the device structure. That value is no longer retained, but is calculated dynamically.
Remove some dangling references to CONFIG_NET_TCP_RWND_CONTROL.
net/tcp: Take read-ahead throttling into account when calculating the TCP receive window size.
net/tcp: tcp_get_recvwindow() now returns the receive window size directly (vs. indirectly via the device structure).
net/tcp: Remove CONFIG_NET_TCP_RWND_CONTROL. TCP window algorithm is now trigged only by CONFIG_NET_TCP_READAHEAD.
Squashed commit of the following:
sched: Rename all use of system_t to clock_t.
syscall: Rename all use of system_t to clock_t.
net: Rename all use of system_t to clock_t.
libs: Rename all use of system_t to clock_t.
fs: Rename all use of system_t to clock_t.
drivers: Rename all use of system_t to clock_t.
arch: Rename all use of system_t to clock_t.
include: Remove definition of systime_t; rename all use of system_t to clock_t.