Commit Graph

51 Commits

Author SHA1 Message Date
chao an
34d2cde8a8 net/l2/l3/l4: add support of iob offload
1. Add new config CONFIG_NET_LL_GUARDSIZE to isolation of l2 stack,
   which will benefit l3(IP) layer for multi-MAC(l2) implementation,
   especially in some NICs such as celluler net driver.

new configuration options: CONFIG_NET_LL_GUARDSIZE

CONFIG_NET_LL_GUARDSIZE will reserved l2 buffer header size of
network buffer to isolate the L2/L3 (MAC/IP) data on network layer,
which will be beneficial to L3 network layer protocol transparent
transmission and forwarding

------------------------------------------------------------
Layout of frist iob entry:

        iob_data (aligned by CONFIG_IOB_ALIGNMENT)
            |
            |                  io_offset(CONFIG_NET_LL_GUARDSIZE)
            |                                |
            -------------------------------------------------
      iob   |            Reserved            |    io_len    |
            -------------------------------------------------

-------------------------------------------------------------
Layout of different NICs implementation:

        iob_data (aligned by CONFIG_IOB_ALIGNMENT)
            |
            |                 io_offset(CONFIG_NET_LL_GUARDSIZE)
            |                                |
            -------------------------------------------------
 Ethernet   |       Reserved    | ETH_HDRLEN |    io_len    |
            ---------------------------------|---------------
 8021Q      |   Reserved  | ETH_8021Q_HDRLEN |    io_len    |
            ---------------------------------|---------------
 ipforward  |            Reserved            |    io_len    |
            -------------------------------------------------

--------------------------------------------------------------------

2. Support iob offload to l2 driver to avoid unnecessary memory copy

Support send/receive iob vectors directly between the NICs and l3/l4
stack to avoid unnecessary memory copies, especially on hardware that
supports Scatter/gather, which can greatly improve performance.

new interface to support iob offload:

  ------------------------------------------
  |    IOB version     |     original      |
  |----------------------------------------|
  |  devif_iob_poll()  |   devif_poll()    |
  |       ...          |       ...         |
  ------------------------------------------

--------------------------------------------------------------------

1> NIC hardware support Scatter/gather transfer

TX:

                tcp_poll()/udp_poll()/pkt_poll()/...(l3|l4)
                           /              \
                          /                \
devif_poll_[l3|l4]_connections()     devif_iob_send() (nocopy:udp/icmp/...)
           /                                   \      (copy:tcp)
          /                                     \
  devif_iob_poll("NIC"_txpoll)                callback() // "NIC"_txpoll
                                                  |
                            dev->d_iob:           |
                                                ---------------         ---------------
                             io_data       iob1 |  |          |    iob3 |  |          |
                                    \           ---------------         ---------------
                                  ---------------  |       --------------- |
                             iob0 |  |          |  |  iob2 |  |          | |
                                  ---------------  |       --------------- |
                                     \             |          /           /
                                        \          |       /           /
                                   ----------------------------------------------
                    NICs io vector |    |    |    |    |    |    |    |    |    |
                                   ----------------------------------------------

RX:

  [tcp|udp|icmp|...]ipv[4|6]_data_handler()(iob_concat/append to readahead)
                    |
                    |
      [tcp|udp|icmp|...]_ipv[4|6]_in()/...
                    |
                    |
          pkt/ipv[4/6]_input()/...
                    |
                    |
     NICs io vector receive(iov_base to each iobs)

--------------------------------------------------------------------

2> CONFIG_IOB_BUFSIZE is greater than MTU:

TX:

"(CONFIG_IOB_BUFSIZE) > (MAX_NETDEV_PKTSIZE + CONFIG_NET_GUARDSIZE + CONFIG_NET_LL_GUARDSIZE)"

                tcp_poll()/udp_poll()/pkt_poll()/...(l3|l4)
                           /              \
                          /                \
devif_poll_[l3|l4]_connections()     devif_iob_send() (nocopy:udp/icmp/...)
           /                                   \      (copy:tcp)
          /                                     \
  devif_iob_poll("NIC"_txpoll)                callback() // "NIC"_txpoll
                                                  |
                                             "NIC"_send()
                          (dev->d_iob->io_data[CONFIG_NET_LL_GUARDSIZE - NET_LL_HDRLEN(dev)])

RX:

  [tcp|udp|icmp|...]ipv[4|6]_data_handler()(iob_concat/append to readahead)
                    |
                    |
      [tcp|udp|icmp|...]_ipv[4|6]_in()/...
                    |
                    |
          pkt/ipv[4/6]_input()/...
                    |
                    |
     NICs io vector receive(iov_base to io_data)

--------------------------------------------------------------------

3> Compatible with all old flat buffer NICs

TX:
                tcp_poll()/udp_poll()/pkt_poll()/...(l3|l4)
                           /              \
                          /                \
devif_poll_[l3|l4]_connections()     devif_iob_send() (nocopy:udp/icmp/...)
           /                                   \      (copy:tcp)
          /                                     \
  devif_iob_poll(devif_poll_callback())  devif_poll_callback() /* new interface, gather iobs to flat buffer */
       /                                           \
      /                                             \
 devif_poll("NIC"_txpoll)                     "NIC"_send()(dev->d_buf)

RX:

  [tcp|udp|icmp|...]ipv[4|6]_data_handler()(iob_concat/append to readahead)
                    |
                    |
      [tcp|udp|icmp|...]_ipv[4|6]_in()/...
                    |
                    |
               netdev_input()  /* new interface, Scatter/gather flat/iob buffer */
                    |
                    |
          pkt/ipv[4|6]_input()/...
                    |
                    |
    NICs io vector receive(Orignal flat buffer)

3. Iperf passthrough on NuttX simulator:

  -------------------------------------------------
  |  Protocol      | Server | Client |            |
  |-----------------------------------------------|
  |  TCP           |  813   |   834  |  Mbits/sec |
  |  TCP(Offload)  | 1720   |  1100  |  Mbits/sec |
  |  UDP           |   22   |   757  |  Mbits/sec |
  |  UDP(Offload)  |   25   |  1250  |  Mbits/sec |
  -------------------------------------------------

Signed-off-by: chao an <anchao@xiaomi.com>
2022-12-03 11:47:04 +08:00
chao.an
162fcd10ca net: cleanup pvconn reference to avoid confuse
More reference:
https://github.com/apache/incubator-nuttx/pull/5252
https://github.com/apache/incubator-nuttx/pull/5434

Signed-off-by: chao.an <anchao@xiaomi.com>
2022-08-26 20:58:11 +08:00
Xiang Xiao
ba9486de4a iob: Remove iob_user_e enum and related code
since it is impossible to track producer and consumer
correctly if TCP/IP stack pass IOB directly to netdev

Signed-off-by: Xiang Xiao <xiaoxiang@xiaomi.com>
2022-08-15 08:41:20 +03:00
chao.an
8f63596063 net/tcp: replace the common connect prologue
Signed-off-by: chao.an <anchao@xiaomi.com>
2022-02-10 15:04:33 -03:00
YAMAMOTO Takashi
4878b7729c tcp: simplify readahead
Do not bother to preserve segment boundaries in the tcp
readahead queues.

* Avoid wasting the tail IOB space for each segments.
  Instead, pack the newly received data into the tail space
  of the last IOB. Also, advertise the tail space as
  a part of the window.

* Use IOB chain directly. Eliminate IOB queue overhead.

* Allow to accept only a part of a segment.

* This change improves the memory efficiency.
  And probably more importantly, allows less-confusing
  recv window advertisement behavior.
  Previously, even when we advertise N bytes window,
  we often couldn't actually accept N bytes. Depending on
  the segment sizes and IOB configurations, it was causing
  segment drops.
  Also, the previous code was moving the right edge of the
  window back and forth too often, even when nothing in
  the system was competing on the IOBs. Shrinking the
  window that way is a kinda well known recipe to confuse
  the peer stack.
2021-06-30 06:22:14 +09:00
YAMAMOTO Takashi
022a2490d1 tcp: Change the way to advance rcvseq
* Move the code to advance rcvseq for user data from tcp_input
  to receive handlers.
  Motivation: allow partial ack.

* If we drop a segment, ignore FIN as well. Note than tcp FIN bit is
  logically after the user data in the same segment.
2021-06-30 06:22:14 +09:00
YAMAMOTO Takashi
64676641cb tcp_datahandler: try throttled=false on iob_trycopyin failure as well
I assume this was just an oversight because I couldn't
find any obvious reason to special-case only the first IOB.

The commit message of the original commit is cited below.

```
commit bf21056001
Author: chao.an <anchao@xiaomi.com>
Date:   Fri Nov 27 09:50:38 2020 +0800

    net/tcp: fallback to unthrottle pool to avoid deadlock

    Add a fallback mechanism to ensure that there are still available
    iobs for an free connection, Guarantees all connections will have
    a minimum threshold iob to keep the connection not be hanged.

    Change-Id: I59bed98d135ccd1f16264b9ccacdd1b0d91261de
    Signed-off-by: chao.an <anchao@xiaomi.com>
```
2021-06-15 01:17:38 -05:00
YAMAMOTO Takashi
7ac6c0a8de tcp_data_event: Add a comment 2021-05-31 01:37:51 -05:00
YAMAMOTO Takashi
92328792fd tcp_data_event: Fix an indent 2021-05-31 01:37:51 -05:00
Alin Jerpelea
37d5c1b0d9 net: Author Gregory Nutt: update licenses to Apache
Gregory Nutt has submitted the SGA and we can migrate the licenses
 to Apache.

Signed-off-by: Alin Jerpelea <alin.jerpelea@sony.com>
2021-02-20 00:38:18 -08:00
chao.an
4305718691 net/tcp: fix nxstyle warnings
Change-Id: Id885fefc80caa12806c73778e632c8b0c5888be0
Signed-off-by: chao.an <anchao@xiaomi.com>
2020-11-28 00:03:47 -06:00
chao.an
bf21056001 net/tcp: fallback to unthrottle pool to avoid deadlock
Add a fallback mechanism to ensure that there are still available
iobs for an free connection, Guarantees all connections will have
a minimum threshold iob to keep the connection not be hanged.

Change-Id: I59bed98d135ccd1f16264b9ccacdd1b0d91261de
Signed-off-by: chao.an <anchao@xiaomi.com>
2020-11-28 00:03:47 -06:00
Xiang Xiao
346336bb9e Make the read ahead buffer unselectable
Here is the email loop talk about why it is better to remove the option:
https://groups.google.com/forum/#!topic/nuttx/AaNkS7oU6R0

Change-Id: Ib66c037752149ad4b2787ef447f966c77aa12aad
Signed-off-by: Xiang Xiao <xiaoxiang@xiaomi.com>
2020-01-11 08:24:49 -06:00
Xiang Xiao
6a3c2aded6 Fix wait loop and void cast (#24)
* Simplify EINTR/ECANCEL error handling

1. Add semaphore uninterruptible wait function
2 .Replace semaphore wait loop with a single uninterruptible wait
3. Replace all sem_xxx to nxsem_xxx

* Unify the void cast usage

1. Remove void cast for function because many place ignore the returned value witout cast
2. Replace void cast for variable with UNUSED macro
2020-01-02 10:54:43 -06:00
Xiang Xiao
90c52e6f8f Squashed commit of the following:
Author: Gregory Nutt <gnutt@nuttx.org>

    Run all .h and .c files modified in last PR through nxstyle.

Author: Xiang Xiao <xiaoxiang@xiaomi.com>

    Net cleanup (#17)

    * Fix the semaphore usage issue found in tcp/udp

    1. The count semaphore need disable priority inheritance
    2. Loop again if net_lockedwait return -EINTR
    3. Call nxsem_trywait to avoid the race condition
    4. Call nxsem_post instead of sem_post

    * Put the work notifier into free list to avoid the heap fragment in the long run.  Since the allocation strategy is encapsulated internally, we can even refine the implementation later.

    * Network stack shouldn't allocate memory in the poll implementation to avoid the heap fragment in the long run, other modification include:

    1. Select MM_IOB automatically since ICMP[v6] socket can't work without the read ahead buffer
    2. Remove the net lock since xxx_callback_free already do the same thing
    3. TCP/UDP poll should work even the read ahead buffer isn't enabled at all

    * Add NET_ prefix for UDP_NOTIFIER and TCP_NOTIFIER option to align with other UDP/TCP option convention

    * Remove the unused _SF_[IDLE|ACCEPT|SEND|RECV|MASK] flags since there are code to set/clear these flags, but nobody check them.
2019-12-31 09:26:14 -06:00
Anthony Merlino
70404ed0dc Merged in antmerlino/nuttx/iobinstrumentation (pull request #1001)
Iobinstrumentation

* mm/iob: Introduces producer/consumer id to every iob call. This is so that the calls can be instrumented to monitor the IOB resources.

* iob instrumentation - Merges producer/consumer enumeration for simpler IOB user.

* fs/procfs: Starts adding support for /proc/iobinfo

* fs/procfs: Finishes first pass of simple IOB user stastics and /proc/iobinfo entry

Approved-by: Gregory Nutt <gnutt@nuttx.org>
2019-08-16 22:42:25 +00:00
Gregory Nutt
f6b00e1966 tools/nxstyle.c: Fix logic error that prevent detecion of '/' and '/=' as operators. net/: Minor updates resulting from testing tools/nxstyle. 2019-03-11 12:48:39 -06:00
Gregory Nutt
09d5d05b95 net/TCP: Extend the TCP notification logic logic so that it will also report loss of connection events. 2018-09-09 17:32:10 -06:00
Gregory Nutt
28f73bd928 net/tcp and udp: Add logic to signal events when TCP or UDP read-ahead data is buffered.
Squashed commit of the following:

    net/tcp:  Add signal notification for the case when UDP read-ahead data is buffered.  This is basically of clone of the TCP notification logic with naming adapted for UDP.

    net/tcp:  Add signal notification for the case when TCP read-ahead data is buffered.
2018-09-09 09:21:39 -06:00
Masayuki Ishikawa
ac5b2ea049 Merged in masayuki2009/nuttx.nuttx/fix_tcp_statistics (pull request #703)
net/tcp: Remove g_netstats.tcp.syndrop++ from tcp_data_event()

Signed-off-by: Masayuki Ishikawa <Masayuki.Ishikawa@jp.sony.com>

Approved-by: GregoryN <gnutt@nuttx.org>
2018-08-03 01:25:53 +00:00
Gregory Nutt
7cf88d7dbd Make sure that labeling is used consistently in all function headers. 2018-02-01 10:00:02 -06:00
Gregory Nutt
8ffb103adb networking: IGMP: Remove special support for interrupt level processing (there is none) and fix some timer cancellation logic. In many files, correct comments. There is no interrupt level processing in the networking layer. 2017-09-02 10:27:03 -06:00
Gregory Nutt
d40ee8e79d Networking: Start the network monitor for a socket when a TCP socket is dup'ed. 2017-08-29 10:53:04 -06:00
Gregory Nutt
9db65dea78 Networking: TCP disconnection callbacks are not retained in a list. This will support mutiple callbacks per lower-level TCP connection structure. That is necessary for the cae where a socket is dup'ed and shares the same lower-level connection structure. NOTE: There still needs to be a call to tcp_start_monitor() when the socket is dup'ed. 2017-08-29 10:38:01 -06:00
Gregory Nutt
2043e1a114 IOBs: Move from driver/iob to a better location in mm/iob 2017-05-09 07:35:30 -06:00
Gregory Nutt
d5207efb5a Be consistent... Use Name: consistent in function headers vs Function: 2017-04-21 16:33:14 -06:00
Gregory Nutt
bfb93338f6 Move net/iob to drivers/iob so that the I/O buffering feature can be available to other drivers when networking is disabled. 2017-04-20 16:08:49 -06:00
Gregory Nutt
2a751068e6 Without lowsyslog() *llerr() is not useful. Eliminate and replace with *err(). 2016-06-20 12:44:38 -06:00
Gregory Nutt
43eb04bb8f Without lowsyslog() *llinfo() is not useful. Eliminate and replace with *info(). 2016-06-20 11:59:15 -06:00
Gregory Nutt
e99301d7c2 Rename *lldbg to *llerr 2016-06-11 14:55:27 -06:00
Gregory Nutt
fc3540cffe Replace all occurrences of vdbg with vinfo 2016-06-11 11:59:51 -06:00
Gregory Nutt
4f208600aa Replace confusing references to uIP with just 'the network' 2016-05-30 09:31:44 -06:00
Gregory Nutt
2c95fef501 Remove some empty code section comments 2016-02-26 07:35:55 -06:00
Gregory Nutt
9c66bde5b0 Fix typo in pre-processor command noted by Pierre-noel Bouteville. Also move # of pre-processior command to column 1 2015-09-05 09:10:48 -06:00
Gregory Nutt
8b029fbbee TCP networking: Hook the network monitor into the device event notification logic 2015-05-30 11:29:47 -06:00
Gregory Nutt
e81f279315 Networking: Modify event list handling: Now there are two event lists each device structure: (1) One is for ARP and ICMP data related evetns, the other is for device related events. Callback allocation/free routines no accept a device paramter as well as a list: If the device paramter is added, then the callback goes into both the connection-related liast AND the device event list. Thus each socket type can received both custom data-related events as well as common device related events. 2015-05-28 12:01:38 -06:00
Gregory Nutt
c4bd6f52b5 Networking: The are issues with the TCP write-ahead buffering if CONFIG_NET_NOINTS is enabled: There is a possibility of deadlocks in certain timing conditions. I have not seen this with the Tiva driver that I have been users but other people claim to see the issue on other platforms. Certainly it is a logic error: The network should never wait for TCP read-ahead buffering space to be available. It should drop the packets immediately.
This was fixed by duplicating most of the IOB interfaces:  The versions that waited are still present (like iob_alloc()), but now there are non-waiting verisons of the same interfaces (like iob_tryalloc()).  The TCP read-ahead logic now uses only these non-waiting interfaces.
2015-01-27 21:23:42 -06:00
Gregory Nutt
73f3ecf7e2 NET: Rename network interrupt event flags more appropriately: TCP_, UDP_, ICMP_, or PKT_ vs UIP_ 2014-07-06 17:22:02 -06:00
Gregory Nutt
2d52d70d4c NET: Move private definitions from include/nuttx/net/tcp to net/tcp/tcp.h 2014-07-06 12:34:27 -06:00
Gregory Nutt
0bb153b8cb Remove all inclusion of uip.h 2014-07-04 16:58:22 -06:00
Gregory Nutt
85f42044df NET: Rename uip_dataevent to tcp_data_event 2014-06-30 19:17:22 -06:00
Gregory Nutt
8e706eb4ff Rename many functions in net/devif from uip_* to devif_* 2014-06-28 18:36:09 -06:00
Gregory Nutt
5790c94ba3 Rename net/uip to net/devif. Rename uip/uip.h to devif/devif.h 2014-06-28 18:07:02 -06:00
Gregory Nutt
fce2a79abd Rename uip_driver_s net_driver_s 2014-06-27 16:48:12 -06:00
Gregory Nutt
e1091251e6 NET: Move statistcs from uip.h to new netstats.h to remove nasty circular inclusion problem. 2014-06-26 09:32:39 -06:00
Gregory Nutt
04985d6d1e Clean up all TCP-related naming 2014-06-24 18:12:49 -06:00
Gregory Nutt
fabcb6d37e TCP Read-Ahead: Convert to use I/O buffer chains 2014-06-24 15:38:00 -06:00
Gregory Nutt
5d1f8180d4 Move the remaining files from include/nuttx/net/uip to include/nuttx/net; Rename *_internal.h header files in net/ to just *.h 2014-06-24 10:14:15 -06:00
Gregory Nutt
37646044ac Move include/nuttx/net/uip/uip-arch.h to include/nuttx/net/netdev.h 2014-06-24 09:28:44 -06:00
Gregory Nutt
626469e30c Move include/nuttx/net/uipopt.h to include/nuttx/net/netconfig.h 2014-06-24 08:53:28 -06:00