Commit Graph

129 Commits

Author SHA1 Message Date
YAMAMOTO Takashi
09f3a1ec8e tcp_send_buffered: throttle IOB allocations for send
Consider a bi-directional TCP connection:

1. we use all IOBs for the tx queue
2. we advertise a zero recv window because we have no free IOBs
3. if the peer tcp does the same thing,
   both sides advertise a zero window and can not drain the tx queue.

For a similar stall to happen, the peer doesn't need to be
a naive tcp implementation like nuttx. A naive application blocking
on send() without draining its read buffer is enough.
(Probably such an application should be fixed to drain rx even
when tx is full. However, it's another story.)

This commit avoids the situation by preventing tx from grabbing
all the IOBs in the first place. (assuming CONFIG_IOB_THROTTLE > 0)
2021-07-14 15:08:18 +08:00
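The throttling idea in the commit above can be illustrated with a toy pool. This is only a sketch under stated assumptions: `toy_pool`, `toy_pool_alloc`, and the `throttle` field are hypothetical names, not the NuttX IOB API, and the pool merely counts free buffers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified buffer pool: it only counts free buffers. */

struct toy_pool
{
  size_t nfree;     /* Buffers currently free */
  size_t throttle;  /* Buffers that throttled callers must leave free */
};

/* Take one buffer from the pool.
 *
 * A throttled caller (for example the tx path) fails once only the
 * reserved buffers remain.  An unthrottled caller (for example the rx
 * path) may still use the reserve, so rx can keep draining.
 */

static bool toy_pool_alloc(struct toy_pool *pool, bool throttled)
{
  size_t reserve = throttled ? pool->throttle : 0;

  if (pool->nfree <= reserve)
    {
      return false;
    }

  pool->nfree--;
  return true;
}
```

With a non-zero reserve, tx stops allocating before the pool is empty, so the receive side can still obtain buffers and advertise a non-zero window.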
chao.an
d4ce70979e net/tcp: change all window-related value types to uint32_t
1. change all window-related value types to uint32_t
2. move the window range validity check (UINT16_MAX) before assembling the TCP header

Signed-off-by: chao.an <anchao@xiaomi.com>
2021-07-07 03:55:41 -05:00
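A minimal sketch of the range check described in the commit above, assuming no window scaling is in effect. The helper name `toy_wnd_field` is hypothetical: the window is tracked as a 32-bit value and clamped only when the 16-bit header field is filled in.

```c
#include <stdint.h>

/* Hypothetical helper: convert an internally tracked 32-bit window
 * into the 16-bit value carried in the TCP header, clamping at
 * UINT16_MAX instead of silently truncating.
 */

static uint16_t toy_wnd_field(uint32_t wnd)
{
  return wnd > UINT16_MAX ? UINT16_MAX : (uint16_t)wnd;
}
```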
chao.an
aab03ef86d net/tcp: add window scale support
Reference here:
https://tools.ietf.org/html/rfc1323

Signed-off-by: chao.an <anchao@xiaomi.com>
2021-07-07 03:55:41 -05:00
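For readers unfamiliar with the window scale option referenced above, the arithmetic can be sketched as follows. The function names and `TOY_MAX_WSCALE` are hypothetical; the shift limit of 14 comes from RFC 1323. This does not reproduce the NuttX implementation.

```c
#include <stdint.h>

#define TOY_MAX_WSCALE 14  /* Largest shift count permitted by RFC 1323 */

/* Pick the smallest shift such that (rcvbuf >> shift) fits in the
 * 16-bit window field, limited to TOY_MAX_WSCALE.
 */

static uint8_t toy_wscale_for(uint32_t rcvbuf)
{
  uint8_t shift = 0;

  while (shift < TOY_MAX_WSCALE && (rcvbuf >> shift) > UINT16_MAX)
    {
      shift++;
    }

  return shift;
}

/* Compute the header field for a window of 'wnd' bytes once a shift
 * has been negotiated.  The result is still clamped to 16 bits.
 */

static uint16_t toy_scaled_field(uint32_t wnd, uint8_t shift)
{
  uint32_t field = wnd >> shift;

  return field > UINT16_MAX ? UINT16_MAX : (uint16_t)field;
}
```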
chao.an
b901f22c27 net/socket: add SO_RCVBUF support
Signed-off-by: chao.an <anchao@xiaomi.com>
2021-07-06 01:44:55 -05:00
chao.an
eabe535de7 net/inet: add support of FIONREAD
Signed-off-by: chao.an <anchao@xiaomi.com>
2021-07-05 06:20:52 -05:00
YAMAMOTO Takashi
52c237cb5f net/tcp/tcp.h: Update a comment about readahead 2021-06-30 06:40:13 -05:00
YAMAMOTO Takashi
4878b7729c tcp: simplify readahead
Do not bother to preserve segment boundaries in the tcp
readahead queues.

* Avoid wasting the tail IOB space for each segment.
  Instead, pack the newly received data into the tail space
  of the last IOB. Also, advertise the tail space as
  a part of the window.

* Use IOB chain directly. Eliminate IOB queue overhead.

* Allow accepting only a part of a segment.

* This change improves the memory efficiency.
  And probably more importantly, allows less-confusing
  recv window advertisement behavior.
  Previously, even when we advertise N bytes window,
  we often couldn't actually accept N bytes. Depending on
  the segment sizes and IOB configurations, it was causing
  segment drops.
  Also, the previous code was moving the right edge of the
  window back and forth too often, even when nothing in
  the system was competing on the IOBs. Shrinking the
  window that way is a kinda well known recipe to confuse
  the peer stack.
2021-06-30 06:22:14 +09:00
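The packing behavior described in the commit above can be sketched with a toy buffer chain. Everything here is an assumption made for illustration: `toy_chain` uses a fixed array of small buffers rather than a linked IOB chain, and the names and sizes are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BUFSIZE 8  /* Bytes per buffer (hypothetical) */
#define TOY_NBUFS   4  /* Buffers in the chain (hypothetical) */

struct toy_chain
{
  uint8_t data[TOY_NBUFS][TOY_BUFSIZE];
  size_t len;  /* Total bytes queued; buffers are filled in order */
};

/* Bytes the chain can still accept.  This includes the unused tail of
 * the last partially filled buffer, so it can be advertised as window.
 */

static size_t toy_chain_space(const struct toy_chain *c)
{
  return (size_t)TOY_NBUFS * TOY_BUFSIZE - c->len;
}

/* Append as much of a segment as fits, ignoring segment boundaries.
 * Returns the number of bytes accepted, which may be less than seglen.
 */

static size_t toy_chain_append(struct toy_chain *c, const uint8_t *seg,
                               size_t seglen)
{
  size_t space = toy_chain_space(c);
  size_t accepted = seglen < space ? seglen : space;
  size_t done = 0;

  while (done < accepted)
    {
      size_t buf = c->len / TOY_BUFSIZE;
      size_t off = c->len % TOY_BUFSIZE;
      size_t n = TOY_BUFSIZE - off;

      if (n > accepted - done)
        {
          n = accepted - done;
        }

      memcpy(&c->data[buf][off], seg + done, n);
      c->len += n;
      done += n;
    }

  return accepted;
}
```

Because new data starts in the tail of the last buffer, the space reported by `toy_chain_space` is exactly what a later append can accept, which is the property the commit message highlights.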
YAMAMOTO Takashi
eeafe070ec tcp.h: Add TCP_SEQ_ADD 2021-06-30 06:22:14 +09:00
YAMAMOTO Takashi
14ec75e7fc tcp: window update improvements
* Fixes the case where the window was small but not zero.

* tcp_recvfrom: Remove tcp_ackhandler. Instead, simply schedule TX for
  a possible window update and make tcp_appsend decide.

* Replace rcv_wnd (the last advertised window size value) with
  rcv_adv (the window edge sequence number advertised to the peer).
  rcv_wnd was complicated to deal with because its base (rcvseq) is
  also moving.

* tcp_appsend: Send a window update even if there are no other reasons
  to send an ack.
  Namely, send an update if it increases the window by
    * 2 * mss
    * or half of the max possible window size
2021-06-13 21:20:24 -05:00
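The window update rule in the commit above can be sketched as a small predicate. The function name and parameters are hypothetical, and the exact comparisons (`>=`) are an assumption: the message only says the update is sent when it increases the window by 2 * mss or by half of the maximum window.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decide whether a pure window update is worth sending.
 *
 *   rcvseq  - next sequence number expected from the peer
 *   rcv_adv - window edge (sequence number) last advertised
 *   avail   - receive space available right now, in bytes
 *   mss     - maximum segment size
 *   maxwin  - largest window this connection could ever advertise
 */

static bool toy_should_update_window(uint32_t rcvseq, uint32_t rcv_adv,
                                     uint32_t avail, uint16_t mss,
                                     uint32_t maxwin)
{
  uint32_t new_edge = rcvseq + avail;            /* Wraps modulo 2^32 */
  int32_t gain = (int32_t)(new_edge - rcv_adv);  /* Wraparound-safe */

  if (gain <= 0)
    {
      return false;  /* The edge would not move forward */
    }

  return (uint32_t)gain >= 2u * mss || (uint32_t)gain >= maxwin / 2;
}
```

Tracking the advertised edge as a sequence number keeps the comparison independent of how far `rcvseq` has advanced since the last advertisement.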
YAMAMOTO Takashi
1f6fdf04b7 tcp: Extract MSS calculation from tcp_synack
I plan to use it for recv window update decision logic.
2021-06-13 21:20:24 -05:00
YAMAMOTO Takashi
433a2b27d9 tcp: add macros to deal with sequence number wraparound 2021-06-10 22:47:04 -05:00
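Sequence number comparison across wraparound is commonly done with a signed 32-bit difference. The macros below are a sketch of that technique with hypothetical `TOY_` names; they are not copied from tcp.h. They assume the usual two's complement conversion from uint32_t to int32_t and that the two values are less than 2^31 apart.

```c
#include <stdint.h>

/* a < b in sequence space: the unsigned difference, reinterpreted as
 * signed, is negative when a is "behind" b, even across a wrap.
 */

#define TOY_SEQ_LT(a, b)  ((int32_t)((uint32_t)(a) - (uint32_t)(b)) < 0)
#define TOY_SEQ_GT(a, b)  TOY_SEQ_LT(b, a)
#define TOY_SEQ_LTE(a, b) (!TOY_SEQ_GT(a, b))
#define TOY_SEQ_GTE(a, b) (!TOY_SEQ_LT(a, b))
```

For example, `TOY_SEQ_LT(0xfffffff0u, 0x10u)` is true: 0x10 lies 0x20 bytes ahead of 0xfffffff0 once the counter wraps.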
YAMAMOTO Takashi
69b3f034a4 tcp: Move buffered/unbuffered common code to tcp_send.c 2021-06-03 21:33:10 -05:00
YAMAMOTO Takashi
09869e5d41 net/tcp/tcp.h: Remove unused extern g_netdevices 2021-03-30 12:27:50 -05:00
YAMAMOTO Takashi
837e1a72a4 tcp_send_buffered.c: improve tcp write buffering
* Send data chunk-by-chunk
  Note: A stream socket doesn't have atomicity requirement.

* Increase the chance to use full-sized segments

Benchmark numbers in my environment:

* Over ESP32 wifi
* The peer is NetBSD, which has traditional delayed ack TCP
* iperf uses 16384 bytes buffer

---

without this patch,
CONFIG_IOB_NBUFFERS=36
CONFIG_IOB_BUFSIZE=196

does not work.
see https://github.com/apache/incubator-nuttx/pull/2772#discussion_r592820639

---

without this patch,
CONFIG_IOB_NBUFFERS=128
CONFIG_IOB_BUFSIZE=196
```
nsh> iperf -c 192.168.8.1
       IP: 192.168.8.103

 mode=tcp-client sip=192.168.8.103:5001,dip=192.168.8.1:5001, interval=3, time=30

        Interval Bandwidth

   0-   3 sec,  4.11 Mbits/sec
   3-   6 sec,  4.63 Mbits/sec
   6-   9 sec,  4.89 Mbits/sec
   9-  12 sec,  4.63 Mbits/sec
  12-  15 sec,  4.85 Mbits/sec
  15-  18 sec,  4.85 Mbits/sec
  18-  21 sec,  5.02 Mbits/sec
  21-  24 sec,  3.67 Mbits/sec
  24-  27 sec,  4.94 Mbits/sec
  27-  30 sec,  4.81 Mbits/sec
   0-  30 sec,  4.64 Mbits/sec
nsh>
```

---

with this patch,
CONFIG_IOB_NBUFFERS=36
CONFIG_IOB_BUFSIZE=196
```
nsh> iperf -c 192.168.8.1
       IP: 192.168.8.103

 mode=tcp-client sip=192.168.8.103:5001,dip=192.168.8.1:5001, interval=3, time=30

        Interval Bandwidth

   0-   3 sec,  5.33 Mbits/sec
   3-   6 sec,  5.59 Mbits/sec
   6-   9 sec,  5.55 Mbits/sec
   9-  12 sec,  5.59 Mbits/sec
  12-  15 sec,  5.59 Mbits/sec
  15-  18 sec,  5.72 Mbits/sec
  18-  21 sec,  5.68 Mbits/sec
  21-  24 sec,  5.29 Mbits/sec
  24-  27 sec,  4.67 Mbits/sec
  27-  30 sec,  4.50 Mbits/sec
   0-  30 sec,  5.35 Mbits/sec
nsh>
```

---

with this patch,
CONFIG_IOB_NBUFFERS=128
CONFIG_IOB_BUFSIZE=196
```
nsh> iperf -c 192.168.8.1
       IP: 192.168.8.103

 mode=tcp-client sip=192.168.8.103:5001,dip=192.168.8.1:5001, interval=3, time=30

        Interval Bandwidth

   0-   3 sec,  5.51 Mbits/sec
   3-   6 sec,  4.67 Mbits/sec
   6-   9 sec,  4.54 Mbits/sec
   9-  12 sec,  5.42 Mbits/sec
  12-  15 sec,  5.37 Mbits/sec
  15-  18 sec,  5.11 Mbits/sec
  18-  21 sec,  5.07 Mbits/sec
  21-  24 sec,  5.29 Mbits/sec
  24-  27 sec,  5.77 Mbits/sec
  27-  30 sec,  4.63 Mbits/sec
   0-  30 sec,  5.14 Mbits/sec
nsh>
```
2021-03-22 01:12:59 -07:00
Alin Jerpelea
37d5c1b0d9 net: Author Gregory Nutt: update licenses to Apache
Gregory Nutt has submitted the SGA and we can migrate the licenses
 to Apache.

Signed-off-by: Alin Jerpelea <alin.jerpelea@sony.com>
2021-02-20 00:38:18 -08:00
Abdelatif Guettouche
7e3d4a5f29 net: Remove duplicate forward references.
Signed-off-by: Abdelatif Guettouche <abdelatif.guettouche@espressif.com>
2021-01-16 07:39:04 -08:00
Juha Niskanen
de1ad1fdb3 net: fix typos, incorrect comments, nxstyle
Signed-off-by: Juha Niskanen <juha.niskanen@haltian.com>
2020-12-13 09:06:28 -06:00
chao.an
8d0118569c [Performance]net/tcp: send the ACK in time after obtaining a read-ahead buffer from IOBs
Request the TCP ACK to estimate the receive window after handling
any data already buffered in a read-ahead buffer.

Change-Id: Id998a1125dd2991d73ba4bef081ddcb7adea4f0d
Signed-off-by: chao.an <anchao@xiaomi.com>
2020-12-10 12:23:47 +09:00
chao.an
881dd9d62d net/tcp: add a member to record the current receiving window
Change-Id: Ic4c46d643a905fdd3a828e563eab4814da70dbe5
Signed-off-by: chao.an <anchao@xiaomi.com>
2020-12-10 12:23:47 +09:00
chao.an
794a6ec23d net/tcp: rename winsize to snd_wnd to make the semantics more accurate
Change-Id: I8fdc7cf78a7f2cd53a30ef1de702b1a697c43238
Signed-off-by: chao.an <anchao@xiaomi.com>
2020-12-10 12:23:47 +09:00
chao.an
c2b0006dcd net/tcp: implement the fast retransmit
RFC2001: TCP Slow Start, Congestion Avoidance, Fast Retransmit,
         and Fast Recovery Algorithms

...

3.  Fast Retransmit
  Modifications to the congestion avoidance algorithm were proposed in
  1990 [3].  Before describing the change, realize that TCP may
  generate an immediate acknowledgment (a duplicate ACK) when an out-
  of-order segment is received (Section 4.2.2.21 of [1], with a note
  that one reason for doing so was for the experimental fast-
  retransmit algorithm).  This duplicate ACK should not be delayed.
  The purpose of this duplicate ACK is to let the other end know that a
  segment was received out of order, and to tell it what sequence
  number is expected.

  Since TCP does not know whether a duplicate ACK is caused by a lost
  segment or just a reordering of segments, it waits for a small number
  of duplicate ACKs to be received.  It is assumed that if there is
  just a reordering of the segments, there will be only one or two
  duplicate ACKs before the reordered segment is processed, which will
  then generate a new ACK.  If three or more duplicate ACKs are
  received in a row, it is a strong indication that a segment has been
  lost.  TCP then performs a retransmission of what appears to be the
  missing segment, without waiting for a retransmission timer to
  expire.

Change-Id: Ie2cbcecab507c3d831f74390a6a85e0c5c8e0652
Signed-off-by: chao.an <anchao@xiaomi.com>
2020-12-01 11:36:10 -06:00
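The duplicate ACK counting quoted above can be sketched as follows. This is a simplified illustration with hypothetical names (`toy_sender`, `toy_on_ack`): it treats any ACK repeating the previous acknowledgment number as a duplicate and ignores other conditions a real stack checks, such as whether the ACK carries data or changes the window.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_DUPACK_THRESHOLD 3  /* "Three or more duplicate ACKs" */

struct toy_sender
{
  uint32_t last_ack;  /* Highest acknowledgment number seen so far */
  uint8_t dupacks;    /* Consecutive duplicates of last_ack */
};

/* Process one incoming acknowledgment number.
 *
 * Returns true exactly when the third consecutive duplicate arrives,
 * which is the point where the segment starting at last_ack should be
 * retransmitted without waiting for the retransmission timer.
 */

static bool toy_on_ack(struct toy_sender *s, uint32_t ackno)
{
  if (ackno != s->last_ack)
    {
      /* New data was acknowledged: restart duplicate counting */

      s->last_ack = ackno;
      s->dupacks = 0;
      return false;
    }

  if (s->dupacks < UINT8_MAX)
    {
      s->dupacks++;
    }

  return s->dupacks == TOY_DUPACK_THRESHOLD;
}
```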
chao.an
bf21056001 net/tcp: fall back to the unthrottled pool to avoid deadlock
Add a fallback mechanism to ensure that IOBs are still available
for a free connection. This guarantees every connection a minimum
number of IOBs so that the connection does not hang.

Change-Id: I59bed98d135ccd1f16264b9ccacdd1b0d91261de
Signed-off-by: chao.an <anchao@xiaomi.com>
2020-11-28 00:03:47 -06:00
GAEHWILER Reto
83745652c4 TCP-stack fix for stalled tcp sockets due to broken keepalive
Fixes an issue where tcp sockets with activated keepalives stalled and
were not properly closed. Poll would not indicate a POLLHUP and therefore
locked up the application.

* tcp_conn_s.tcp_conn_s & tcp_conn_s.keepintvl changed to uint32_t
  According to RFC 1122, keepidle MUST have a default of 2 hours.
2020-10-27 11:21:56 -07:00
Xiang Xiao
63e3054ced No need to monitor the IOB buffer empty event for the POLLOUT implementation
It's enough to check buffer availability in the net event handler

Change-Id: I2d7c7a03675cf6eff6ffb42a81b7c7245253e92c
Signed-off-by: Xiang Xiao <xiaoxiang@xiaomi.com>
2020-05-13 06:50:07 -06:00
Xiang Xiao
bd4e8e19d3 Run codespell -w against all files
and fix the wrong correction
2020-02-22 14:45:07 -06:00
chao.an
c65d8e6a23 net/socket: add MSG_DONTWAIT support
MSG_DONTWAIT (since Linux 2.2)
  Enables nonblocking operation; if the operation would block, the
  call fails with the error EAGAIN or EWOULDBLOCK. This provides
  similar behavior to setting the O_NONBLOCK flag (via the fcntl(2)
  F_SETFL operation), but differs in that MSG_DONTWAIT is a per-call
  option, whereas O_NONBLOCK is a setting on the open file description
  (see open(2)), which will affect all threads in the calling process
  and as well as other processes that hold file descriptors referring
  to the same open file description.
2020-02-19 12:21:28 -06:00
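A short usage sketch of the per-call behavior described above, using the standard `recv()` call. The wrapper name `try_recv` and its return convention (-1 for "would block", -2 for other errors) are assumptions made for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Try to read from a socket without blocking, regardless of whether
 * O_NONBLOCK is set on the descriptor.
 *
 * Returns the number of bytes read (0 means the peer closed), -1 when
 * no data is available right now, or -2 on any other error.
 */

static ssize_t try_recv(int fd, void *buf, size_t len)
{
  ssize_t n = recv(fd, buf, len, MSG_DONTWAIT);

  if (n < 0)
    {
      return (errno == EAGAIN || errno == EWOULDBLOCK) ? -1 : -2;
    }

  return n;
}
```

Other calls on the same descriptor that omit the flag still block as usual, which is the difference from setting O_NONBLOCK.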
Juha Niskanen
15b78abccf Fix typos in comments 2020-02-14 08:50:45 -06:00
Gregory Nutt
6d4b86ff06 net/tcp/tcp.h: Correct spacing error introduced with the last PR. 2020-01-21 08:30:39 -06:00
Xiang Xiao
e75b5e9d86 net/tcp and udp: Move tcp/udp recvfrom into tcp/udp folder
Move tcp/udp recvfrom into tcp/udp folder and remove inet_recvfrom.c
2020-01-21 08:30:39 -06:00
Xiang Xiao
e869a10c18 net/tcp, udp: Move tcp/udp close operation into tcp/udp folder
Move tcp/udp close operation into tcp/udp folder and remove inet_close.c
2020-01-21 08:30:39 -06:00
Xiang Xiao
5c5c08efcd network: simplify the timeout process logic
1. Consolidate absolute to relative timeout conversion into one place (_net_timedwait)
2. Drive the wait timeout logic by net_timedwait instead of devif_timer
This patch helps us remove devif_timer (periodic tick) to save power in the future.

Change-Id: I534748a5d767ca6da8a7843c3c2f993ed9ea77d4
Signed-off-by: Xiang Xiao <xiaoxiang@xiaomi.com>
2020-01-11 08:24:49 -06:00
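The absolute to relative timeout conversion mentioned above can be sketched as below. The helper `toy_abs_to_rel_ms` is hypothetical and is not the body of `_net_timedwait`; it only shows the arithmetic, with the current time passed in explicitly so the example stays deterministic.

```c
#include <stdint.h>
#include <time.h>

/* Convert an absolute deadline into a relative timeout in
 * milliseconds, measured from 'now'.  A deadline that has already
 * passed yields 0, and very distant deadlines saturate at UINT32_MAX.
 */

static uint32_t toy_abs_to_rel_ms(const struct timespec *now,
                                  const struct timespec *abstime)
{
  int64_t ns = ((int64_t)abstime->tv_sec - (int64_t)now->tv_sec)
               * 1000000000LL
               + ((int64_t)abstime->tv_nsec - (int64_t)now->tv_nsec);
  int64_t ms = ns / 1000000;

  if (ms <= 0)
    {
      return 0;
    }

  return ms > (int64_t)UINT32_MAX ? UINT32_MAX : (uint32_t)ms;
}
```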
Xiang Xiao
346336bb9e Make the read ahead buffer unselectable
Here is the email thread discussing why it is better to remove the option:
https://groups.google.com/forum/#!topic/nuttx/AaNkS7oU6R0

Change-Id: Ib66c037752149ad4b2787ef447f966c77aa12aad
Signed-off-by: Xiang Xiao <xiaoxiang@xiaomi.com>
2020-01-11 08:24:49 -06:00
Masayuki Ishikawa
6e7c761fc8 net: tcp: Fix compile error in tcp.h 2020-01-02 09:36:25 -06:00
Xiang Xiao
90c52e6f8f Squashed commit of the following:
Author: Gregory Nutt <gnutt@nuttx.org>

    Run all .h and .c files modified in last PR through nxstyle.

Author: Xiang Xiao <xiaoxiang@xiaomi.com>

    Net cleanup (#17)

    * Fix the semaphore usage issue found in tcp/udp

    1. The counting semaphore needs priority inheritance disabled
    2. Loop again if net_lockedwait returns -EINTR
    3. Call nxsem_trywait to avoid the race condition
    4. Call nxsem_post instead of sem_post

    * Put the work notifier into a free list to avoid heap fragmentation in the long run.  Since the allocation strategy is encapsulated internally, we can even refine the implementation later.

    * The network stack shouldn't allocate memory in the poll implementation, to avoid heap fragmentation in the long run. Other modifications include:

    1. Select MM_IOB automatically since ICMP[v6] sockets can't work without the read ahead buffer
    2. Remove the net lock since xxx_callback_free already does the same thing
    3. TCP/UDP poll should work even if the read ahead buffer isn't enabled at all

    * Add NET_ prefix for UDP_NOTIFIER and TCP_NOTIFIER options to align with the other UDP/TCP option conventions

    * Remove the unused _SF_[IDLE|ACCEPT|SEND|RECV|MASK] flags since there is code to set/clear these flags, but nobody checks them.
2019-12-31 09:26:14 -06:00
Gregory Nutt
66ef6d143a This commit adds an initial implementation of TCP delayed ACKs as specified in RFC 1122.
Squashed commit of the following:

    net/tmp:  Rename the unacked field of the tcp connection structure to tx_unacked.  Too confusing with the implementation of delayed RX ACKs.

    net/tcp:  Initial implementation of TCP delayed ACKs.

    net/tcp:  Add delayed ACK configuration selection.  Rename tcp_ack() to tcp_synack().  It may or may not send an ACK.  It will always send SYN or SYN/ACK.
2019-12-08 13:13:51 -06:00
Gregory Nutt
6266e067e9 net/: Re-order the content of all address-family socket 'connection' structures so that they begin with a common prologue. This permits better use of logic for different address family types. 2019-09-01 08:47:01 -06:00
Anthony Merlino
70404ed0dc Merged in antmerlino/nuttx/iobinstrumentation (pull request #1001)
Iobinstrumentation

* mm/iob: Introduces producer/consumer id to every iob call. This is so that the calls can be instrumented to monitor the IOB resources.

* iob instrumentation - Merges producer/consumer enumeration for simpler IOB user.

* fs/procfs: Starts adding support for /proc/iobinfo

* fs/procfs: Finishes first pass of simple IOB user statistics and /proc/iobinfo entry

Approved-by: Gregory Nutt <gnutt@nuttx.org>
2019-08-16 22:42:25 +00:00
Gregory Nutt
361d85ae35 net/tcp and udp: Fix errors in the new implementation of SO_LINGER. The tcp_drain() and udp_drain() functions were casting the working argument to the wrong type, resulting in hangs and abnormal behavior. There is a complexity in the tcp drain logic when the remote peer closes the socket before all Tx data has been flushed. Sometimes we are not notified of this case and wait the entire timeout unnecessarily. There is a workaround in place in tcp_txdrain(), but this really should be revisited. 2019-07-27 10:26:52 -06:00
Gregory Nutt
574595dc32 Still fixing new warnings found in build testing. 2019-07-01 15:56:34 -06:00
Gregory Nutt
8774977f4d Fix warnings found in build testing. 2019-07-01 15:22:42 -06:00
Gregory Nutt
de5a6163d5 This commit implements a proper version of SO_LINGER. Not sufficiently tested on initial commit.
Squashed commit of the following:

    net/: Fix some naming inconsistencies. Fix final compilation issues.

    net/inet/inet_close():  Now that we have logic to drain the buffered TX data, we can implement a proper lingering close.

    net/inet,tcp,udp:  Add functions to wait for write buffers to drain.

    net/udp:  Add support for notification when the UDP write buffer becomes empty.

    net/tcp:  Add support for notification when the TCP write buffer becomes empty.
2019-07-01 12:25:32 -06:00
Gregory Nutt
b49be4bb20 Squashed commit of the following:
    arch/:  Removed all references to CONFIG_DISABLE_POLL.  The standard POSIX poll() can no longer be disabled.
    sched/ audio/ crypto/:  Removed all references to CONFIG_DISABLE_POLL.  The standard POSIX poll() can no longer be disabled.
    Documentation/:  Removed all references to CONFIG_DISABLE_POLL.  The standard POSIX poll() can no longer be disabled.
    fs/:  Removed all references to CONFIG_DISABLE_POLL.  The standard POSIX poll() can no longer be disabled.
    graphics/:  Removed all references to CONFIG_DISABLE_POLL.  The standard POSIX poll() can no longer be disabled.
    net/:  Removed all references to CONFIG_DISABLE_POLL.  The standard POSIX poll() can no longer be disabled.
    drivers/:  Removed all references to CONFIG_DISABLE_POLL.  The standard POSIX poll() can no longer be disabled.
    include/, syscall/, wireless/:  Removed all references to CONFIG_DISABLE_POLL.  The standard POSIX poll() can no longer be disabled.
    configs/:  Remove all references to CONFIG_DISABLE_POLL.  Standard POSIX poll can no longer be disabled.
2019-05-21 18:57:54 -06:00
Gregory Nutt
0bc800d71f net/tcp/tcp.h: Fix a muffed edit to conditional found in build testing. 2019-02-11 15:54:31 -06:00
Gregory Nutt
efe65749ce Fix condition logic: The setup seems to support a network without sockets. That is not the case.
Squashed commit of the following:

    sched/sched/sched_getsockets.c:  Fix an error in conditional compilation.
    fs/:  Remove all conditional logic based on CONFIG_NSOCKET_DESCRIPTORS == 0
    Documentation/:  Remove all references to CONFIG_NSOCKET_DESCRIPTORS == 0
    include/:  Remove all conditional logic based on CONFIG_NSOCKET_DESCRIPTORS == 0
    libs/:  Remove all conditional logic based on CONFIG_NSOCKET_DESCRIPTORS == 0
    net/:  Remove all conditional logic based on CONFIG_NSOCKET_DESCRIPTORS == 0
    sched/:  Remove all conditional logic based on CONFIG_NSOCKET_DESCRIPTORS == 0
    syscall/:  Remove all conditional logic based on CONFIG_NSOCKET_DESCRIPTORS == 0
    tools/:  Fixups for CONFIG_NSOCKET_DESCRIPTORS no longer used to disable sockets.
2019-02-11 15:47:25 -06:00
Xiang Xiao
d2cfd398ba Fix compiler error and warning when CONFIG_NET_SENDFILE=y 2018-11-09 11:17:43 -06:00
Gregory Nutt
42a018747e net/devif, net/tcp, and net/udp: Extend the logic of 6c0ab0e077 so that all transport protocols supported by IPv6 can handle the presence of IPv6 header extension options. 2018-11-02 17:50:01 -06:00
Gregory Nutt
09d5d05b95 net/TCP: Extend the TCP notification logic so that it will also report loss of connection events. 2018-09-09 17:32:10 -06:00
Gregory Nutt
9d3148406c Signals were not a good choice of IPC to implement the poll function for several reasons: In order to handle the asynchronous poll-related event, a substantial amount of state information is needed. Signals are only capable of passing minimal amounts of data. There are also complexities with performing kernel space signal handlers in kernel space code that is better to avoid. So, instead of signals, the equivalent logic was converted to run via a callback that executes on the high-priority work queue.
Squashed commit of the following:

    Fix up some final compile issues.

    net/netdev:  Convert the network down notification logic to use the new wqueue-based notification facility.

    net/udp:  Convert the UDP readahead notification logic to use the new wqueue-based notification facility.

    net/tcp:  Convert the TCP readahead notification logic to use the new wqueue-based notification facility.

    mm/iob:  Convert the IOB notification logic to use the new wqueue-based notification facility.

    sched/wqueue:  Signals are not good IPCs to support the target poll functionality for several reasons, including the amount of data that can be passed with a signal and the fact that in protected and kernel modes, user threads executing signal handlers in protected, kernel memory is problematic.  Instead, convert the same logic to perform the notifications via function callback on the high priority work queue.
2018-09-09 15:01:44 -06:00
Gregory Nutt
20814acad2 sched/signal: In signal notification facility, use sigqueue() to notify vs. kill(). With sigqueue, we can pass more info (but still not enough). 2018-09-09 11:57:25 -06:00
Gregory Nutt
28f73bd928 net/tcp and udp: Add logic to signal events when TCP or UDP read-ahead data is buffered.
Squashed commit of the following:

    net/tcp:  Add signal notification for the case when UDP read-ahead data is buffered.  This is basically a clone of the TCP notification logic with naming adapted for UDP.

    net/tcp:  Add signal notification for the case when TCP read-ahead data is buffered.
2018-09-09 09:21:39 -06:00