I noticed that the conn instance will leak during stress test,
The close work queued from tcp_close_eventhandler() will be canceled
by tcp_timer() immediately:
Breakpoint 1, tcp_close_eventhandler (dev=0x565cd338 <up_irq_restore+108>, pvpriv=0x5655e6ff <getpid+12>, flags=0) at tcp/tcp_close.c:71
(gdb) bt
| #0 tcp_close_eventhandler (dev=0x565cd338 <up_irq_restore+108>, pvpriv=0x5655e6ff <getpid+12>, flags=0) at tcp/tcp_close.c:71
| #1 0x5658bf1e in devif_conn_event (dev=0x5660bd80 <g_sim_dev>, flags=512, list=0x5660d558 <g_cbprealloc+312>) at devif/devif_callback.c:508
| #2 0x5658a219 in tcp_callback (dev=0x5660bd80 <g_sim_dev>, conn=0x5660c4a0 <g_tcp_connections>, flags=512) at tcp/tcp_callback.c:167
| #3 0x56589253 in tcp_timer (dev=0x5660bd80 <g_sim_dev>, conn=0x5660c4a0 <g_tcp_connections>) at tcp/tcp_timer.c:378
| #4 0x5658dd47 in tcp_poll (dev=0x5660bd80 <g_sim_dev>, conn=0x5660c4a0 <g_tcp_connections>) at tcp/tcp_devpoll.c:95
| #5 0x5658b95f in devif_poll_tcp_connections (dev=0x5660bd80 <g_sim_dev>, callback=0x565770f2 <netdriver_txpoll>) at devif/devif_poll.c:601
| #6 0x5658b9ea in devif_poll (dev=0x5660bd80 <g_sim_dev>, callback=0x565770f2 <netdriver_txpoll>) at devif/devif_poll.c:722
| #7 0x56577230 in netdriver_txavail_work (arg=0x5660bd80 <g_sim_dev>) at sim/up_netdriver.c:308
| #8 0x5655999e in work_thread (argc=2, argv=0xf3db5dd0) at wqueue/kwork_thread.c:178
| #9 0x5655983f in nxtask_start () at task/task_start.c:129
(gdb) c
Continuing.
Breakpoint 2, tcp_update_timer (conn=0x5660c4a0 <g_tcp_connections>) at tcp/tcp_timer.c:178
(gdb) bt
| #0 tcp_update_timer (conn=0x5660c4a0 <g_tcp_connections>) at tcp/tcp_timer.c:178
| #1 0x5658952a in tcp_timer (dev=0x5660bd80 <g_sim_dev>, conn=0x5660c4a0 <g_tcp_connections>) at tcp/tcp_timer.c:708
| #2 0x5658dd47 in tcp_poll (dev=0x5660bd80 <g_sim_dev>, conn=0x5660c4a0 <g_tcp_connections>) at tcp/tcp_devpoll.c:95
| #3 0x5658b95f in devif_poll_tcp_connections (dev=0x5660bd80 <g_sim_dev>, callback=0x565770f2 <netdriver_txpoll>) at devif/devif_poll.c:601
| #4 0x5658b9ea in devif_poll (dev=0x5660bd80 <g_sim_dev>, callback=0x565770f2 <netdriver_txpoll>) at devif/devif_poll.c:722
| #5 0x56577230 in netdriver_txavail_work (arg=0x5660bd80 <g_sim_dev>) at sim/up_netdriver.c:308
| #6 0x5655999e in work_thread (argc=2, argv=0xf3db5dd0) at wqueue/kwork_thread.c:178
| #7 0x5655983f in nxtask_start () at task/task_start.c:129
Since a separate work will add 24 bytes to each conn instance,
but in order to support the feature of asynchronous close(),
I can not find a better way than adding a separate work,
for resource constraints, I recommend the developers to enable
CONFIG_NET_ALLOC_CONNS, which will reduce the ram usage.
Signed-off-by: chao an <anchao@xiaomi.com>
Do not use 'pvconn' argument to get the connection pointer since
pvconn is normally NULL for some events like NETDEV_DOWN.
Instead, the connection pointer can be reliably obtained from the
corresponding private pointer.
Signed-off-by: chao.an <anchao@xiaomi.com>
When the free connection list is unenough to alloc a new instance,
the TCP stack will reuse the currently closed connection, but if
the handle is not released by the user via close(2), the reference
count of the connection remains in a non-zero value, it will cause
the assertion to fail, so when the handle is not released we should
not use such a conn instance when being actively closed, and ensure
that the reference count is assigned within the net lock protection
|(gdb) bt
|#0 up_assert (filename=0x565c78f7 "tcp/tcp_conn.c", lineno=771) at sim/up_assert.c:75
|#1 0x56566177 in _assert (filename=0x565c78f7 "tcp/tcp_conn.c", linenum=771) at assert/lib_assert.c:36
|#2 0x5657d620 in tcp_free (conn=0x565fb3e0 <g_tcp_connections>) at tcp/tcp_conn.c:771
|#3 0x5657d5a1 in tcp_alloc (domain=2 '\002') at tcp/tcp_conn.c:700
|#4 0x565b1f50 in inet_tcp_alloc (psock=0xf3dea150) at inet/inet_sockif.c:144
|#5 0x565b2082 in inet_setup (psock=0xf3dea150, protocol=0) at inet/inet_sockif.c:253
|#6 0x565b1bf0 in psock_socket (domain=2, type=1, protocol=0, psock=0xf3dea150) at socket/socket.c:121
|#7 0x56588f5f in socket (domain=2, type=1, protocol=0) at socket/socket.c:278
|#8 0x565b11c0 in hello_main (argc=1, argv=0xf3dfab10) at hello_main.c:35
|#9 0x56566631 in nxtask_startup (entrypt=0x565b10ef <hello_main>, argc=1, argv=0xf3dfab10) at sched/task_startup.c:70
|#10 0x565597fa in nxtask_start () at task/task_start.c:134
Signed-off-by: chao.an <anchao@xiaomi.com>
This reverts commit b88a1fd7fd. [1]
Because:
* It casues assertion failures like [2].
* I don't understand what it attempted to fix.
[1]
```
commit b88a1fd7fd
Author: chao.an <anchao@xiaomi.com>
Date: Sat Jul 2 13:17:41 2022 +0800
net/tcp: discard connect reference before free
connect reference should be set to 0 before free
Signed-off-by: chao.an <anchao@xiaomi.com>
```
[2]
```
#0 up_assert (filename=0x5516d0 "tcp/tcp_conn.c", lineno=771) at sim/up_assert.c:75
#1 0x000000000040a4bb in _assert (filename=0x5516d0 "tcp/tcp_conn.c", linenum=771) at assert/lib_assert.c:36
#2 0x000000000042a2ad in tcp_free (conn=0x597fe0 <g_tcp_connections+384>) at tcp/tcp_conn.c:771
#3 0x000000000053bdc2 in tcp_close_disconnect (psock=0x7f58d1abbd80) at tcp/tcp_close.c:331
#4 0x000000000053bc69 in tcp_close (psock=0x7f58d1abbd80) at tcp/tcp_close.c:366
#5 0x000000000052eefe in inet_close (psock=0x7f58d1abbd80) at inet/inet_sockif.c:1689
#6 0x000000000052eb9b in psock_close (psock=0x7f58d1abbd80) at socket/net_close.c:102
#7 0x0000000000440495 in sock_file_close (filep=0x7f58d1b35f40) at socket/socket.c:115
#8 0x000000000043b8b6 in file_close (filep=0x7f58d1b35f40) at vfs/fs_close.c:74
#9 0x000000000043ab22 in nx_close (fd=9) at inode/fs_files.c:544
#10 0x000000000043ab7f in close (fd=9) at inode/fs_files.c:578
```
The time consuming of tcp waving hands(close(2)) will be affected
by network jitter, especially the wireless device cannot receive
the last-ack under worst environment, in this change we move the
tcp close callback into background and invoke the resource free
from workqueue, which will avoid the user application from being
blocked for a long time and unable to return in the call of close
Signed-off-by: chao.an <anchao@xiaomi.com>
tcp_close disposes the connection immediately if it's called in
TCP_LAST_ACK. If it happens, we will end up with responding the
last ACK with a RST.
This commit fixes it by making tcp_close wait for the completion
of the passive close.
This fixes connection closing issues with CONFIG_NET_TCP_WRITE_BUFFERS.
Because TCP_CLOSE is used for both of input and output for tcp_callback,
the close callback and the send callback confuses each other as
the following. As it effectively disposes the connection immediately,
we end up with responding to the consequent ACK and FIN/ACK from the peer
with RSTs.
tcp_timer
-> tcp_close_eventhandler
returns TCP_CLOSE (meaning an active close)
-> psock_send_eventhandler
called with TCP_CLOSE from tcp_close_eventhandler, misinterpet as
a passive close.
-> tcp_lost_connection
-> tcp_shutdown_monitor
-> tcp_callback
-> tcp_close_eventhandler
misinterpret TCP_CLOSE from itself as
a passive close
remove the connection assertion since the instance will be invalid
if the network device has been taken down.
net/netdev/netdev_ioctl.c:
1847 void netdev_ifdown(FAR struct net_driver_s *dev)
1848 {
...
1871 /* Notify clients that the network has been taken down */
1872
1873 devif_dev_event(dev, NULL, NETDEV_DOWN);
...
1883 }
Change-Id: I492b97b5ebe035ea67bbdd7ed635cb13d085e89c
Signed-off-by: chao.an <anchao@xiaomi.com>
Author: Gregory Nutt <gnutt@nuttx.org>
net/tcp: Fix errors found in build testing.
Recent re-organization moved some functions from net/inet to net/tcp and net/udp. This include references to nxsem_wait(), SEM_PRIO_NONE, and other internal NuttX semaphore functions. These all failed to compile because nuttx/semaphore.h was not included in any of the files.