Commit Graph

25 Commits

Author SHA1 Message Date
chao an
34d2cde8a8 net/l2/l3/l4: add support of iob offload
1. Add new config CONFIG_NET_LL_GUARDSIZE to isolate the l2 stack,
   which will benefit the l3 (IP) layer for multi-MAC (l2) implementations,
   especially in NICs such as cellular net drivers.

New configuration option: CONFIG_NET_LL_GUARDSIZE

CONFIG_NET_LL_GUARDSIZE reserves space for the l2 header at the head of
each network buffer, isolating the L2/L3 (MAC/IP) data within the network
layer. This benefits transparent transmission and forwarding of the L3
network layer protocol (see the sketch after the layout diagrams below).

------------------------------------------------------------
Layout of the first iob entry:

        iob_data (aligned by CONFIG_IOB_ALIGNMENT)
            |
            |                  io_offset(CONFIG_NET_LL_GUARDSIZE)
            |                                |
            -------------------------------------------------
      iob   |            Reserved            |    io_len    |
            -------------------------------------------------

-------------------------------------------------------------
Layout for different NIC implementations:

        iob_data (aligned by CONFIG_IOB_ALIGNMENT)
            |
            |                 io_offset(CONFIG_NET_LL_GUARDSIZE)
            |                                |
            -------------------------------------------------
 Ethernet   |       Reserved    | ETH_HDRLEN |    io_len    |
            ---------------------------------|---------------
 8021Q      |   Reserved  | ETH_8021Q_HDRLEN |    io_len    |
            ---------------------------------|---------------
 ipforward  |            Reserved            |    io_len    |
            -------------------------------------------------

--------------------------------------------------------------------
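A minimal sketch of the idea, assuming the iob_s fields io_data, io_offset
and io_len shown above (the helper name is hypothetical):

  /* Lay out the first IOB so that L3 data starts right after the
   * reserved L2 guard area from the layout diagram. */

  static void net_iob_reserve_l2(FAR struct iob_s *iob)
  {
    iob->io_offset = CONFIG_NET_LL_GUARDSIZE; /* reserve the L2 area */
    iob->io_len    = 0;                       /* no L3 payload yet   */
  }

An Ethernet driver would then write its link-layer header at
iob->io_data + CONFIG_NET_LL_GUARDSIZE - ETH_HDRLEN, as in the second
diagram.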

2. Support iob offload to the l2 driver to avoid unnecessary memory copies

Send/receive iob vectors directly between the NICs and the l3/l4 stack
to avoid unnecessary memory copies, especially on hardware that supports
scatter/gather, which can greatly improve performance.

New interfaces to support iob offload:

  ------------------------------------------
  |    IOB version     |     original      |
  |----------------------------------------|
  |  devif_iob_poll()  |   devif_poll()    |
  |       ...          |       ...         |
  ------------------------------------------
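
A rough sketch of how a driver might use the new poll entry, assuming
devif_iob_poll() keeps the same (dev, callback) shape as devif_poll()
(the driver names are hypothetical):

  static void mynic_txavail_work(FAR void *arg)
  {
    FAR struct net_driver_s *dev = (FAR struct net_driver_s *)arg;

    net_lock();
    if (mynic_txready(dev))
      {
        /* IOB-aware poll: the stack hands IOB chains to mynic_txpoll() */

        devif_iob_poll(dev, mynic_txpoll);

        /* The original flat-buffer path would call:
         * devif_poll(dev, mynic_txpoll); */
      }

    net_unlock();
  }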

--------------------------------------------------------------------

1> NIC hardware supports scatter/gather transfer

TX:

                tcp_poll()/udp_poll()/pkt_poll()/...(l3|l4)
                           /              \
                          /                \
devif_poll_[l3|l4]_connections()     devif_iob_send() (nocopy:udp/icmp/...)
           /                                   \      (copy:tcp)
          /                                     \
  devif_iob_poll("NIC"_txpoll)                callback() // "NIC"_txpoll
                                                  |
                            dev->d_iob:           |
                                                ---------------         ---------------
                             io_data       iob1 |  |          |    iob3 |  |          |
                                    \           ---------------         ---------------
                                  ---------------  |       --------------- |
                             iob0 |  |          |  |  iob2 |  |          | |
                                  ---------------  |       --------------- |
                                     \             |          /           /
                                        \          |       /           /
                                   ----------------------------------------------
                    NICs io vector |    |    |    |    |    |    |    |    |    |
                                   ----------------------------------------------
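
A sketch of the gather step on scatter/gather hardware: each IOB in
dev->d_iob becomes one entry of the NIC's io vector (the iovec form is
illustrative; real drivers fill their own DMA descriptors):

  #include <sys/uio.h>

  static int mynic_gather(FAR struct iob_s *head, FAR struct iovec *iov,
                          int niov)
  {
    FAR struct iob_s *iob;
    int i = 0;

    for (iob = head; iob != NULL && i < niov; iob = iob->io_flink, i++)
      {
        iov[i].iov_base = iob->io_data + iob->io_offset;
        iov[i].iov_len  = iob->io_len;
      }

    return i; /* number of vector entries prepared for the transfer */
  }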

RX:

  [tcp|udp|icmp|...]ipv[4|6]_data_handler()(iob_concat/append to readahead)
                    |
                    |
      [tcp|udp|icmp|...]_ipv[4|6]_in()/...
                    |
                    |
          pkt/ipv[4/6]_input()/...
                    |
                    |
     NICs io vector receive(iov_base to each iobs)
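
The receive direction is the mirror image: each descriptor points at the
free payload area of one IOB, so the hardware scatters the frame straight
into the chain that is later passed up to pkt/ipv[4|6]_input(). Continuing
the mynic_gather() sketch above, the RX loop would differ only in the
length used for each entry:

  for (iob = rxchain; iob != NULL && i < niov; iob = iob->io_flink, i++)
    {
      iov[i].iov_base = iob->io_data + iob->io_offset;
      iov[i].iov_len  = CONFIG_IOB_BUFSIZE - iob->io_offset;
    }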

--------------------------------------------------------------------

2> CONFIG_IOB_BUFSIZE is greater than MTU:

TX:

"(CONFIG_IOB_BUFSIZE) > (MAX_NETDEV_PKTSIZE + CONFIG_NET_GUARDSIZE + CONFIG_NET_LL_GUARDSIZE)"

                tcp_poll()/udp_poll()/pkt_poll()/...(l3|l4)
                           /              \
                          /                \
devif_poll_[l3|l4]_connections()     devif_iob_send() (nocopy:udp/icmp/...)
           /                                   \      (copy:tcp)
          /                                     \
  devif_iob_poll("NIC"_txpoll)                callback() // "NIC"_txpoll
                                                  |
                                             "NIC"_send()
                          (dev->d_iob->io_data[CONFIG_NET_LL_GUARDSIZE - NET_LL_HDRLEN(dev)])
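
With a single IOB large enough for a whole frame, the driver can still do a
flat-style transfer; a sketch of locating the frame inside d_iob, following
the expression above (mynic_dma_send() is hypothetical, and the length
computation assumes io_pktlen counts only the L3 payload):

  FAR uint8_t *frame = dev->d_iob->io_data +
                       (CONFIG_NET_LL_GUARDSIZE - NET_LL_HDRLEN(dev));
  size_t framelen    = NET_LL_HDRLEN(dev) + dev->d_iob->io_pktlen;

  mynic_dma_send(frame, framelen);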

RX:

  [tcp|udp|icmp|...]ipv[4|6]_data_handler()(iob_concat/append to readahead)
                    |
                    |
      [tcp|udp|icmp|...]_ipv[4|6]_in()/...
                    |
                    |
          pkt/ipv[4/6]_input()/...
                    |
                    |
     NICs io vector receive(iov_base to io_data)

--------------------------------------------------------------------

3> Compatible with all old flat buffer NICs

TX:
                tcp_poll()/udp_poll()/pkt_poll()/...(l3|l4)
                           /              \
                          /                \
devif_poll_[l3|l4]_connections()     devif_iob_send() (nocopy:udp/icmp/...)
           /                                   \      (copy:tcp)
          /                                     \
  devif_iob_poll(devif_poll_callback())  devif_poll_callback() /* new interface, gather iobs to flat buffer */
       /                                           \
      /                                             \
 devif_poll("NIC"_txpoll)                     "NIC"_send()(dev->d_buf)
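
For these unmodified drivers the new callback flattens the chain before the
legacy send; a minimal sketch using iob_copyout(), assuming the frame to
transmit is fully described by dev->d_iob and its io_pktlen (mynic_send()
is hypothetical):

  /* Gather the IOB chain into the legacy flat buffer, then let the
   * existing driver transmit from dev->d_buf as before. */

  dev->d_len = iob_copyout(dev->d_buf, dev->d_iob,
                           dev->d_iob->io_pktlen, 0);
  mynic_send(dev);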

RX:

  [tcp|udp|icmp|...]ipv[4|6]_data_handler()(iob_concat/append to readahead)
                    |
                    |
      [tcp|udp|icmp|...]_ipv[4|6]_in()/...
                    |
                    |
               netdev_input()  /* new interface, Scatter/gather flat/iob buffer */
                    |
                    |
          pkt/ipv[4|6]_input()/...
                    |
                    |
     NICs io vector receive(Original flat buffer)

3. Iperf passthrough on NuttX simulator:

  -------------------------------------------------
  |  Protocol      | Server | Client |    Unit    |
  |-----------------------------------------------|
  |  TCP           |  813   |   834  |  Mbits/sec |
  |  TCP(Offload)  | 1720   |  1100  |  Mbits/sec |
  |  UDP           |   22   |   757  |  Mbits/sec |
  |  UDP(Offload)  |   25   |  1250  |  Mbits/sec |
  -------------------------------------------------

Signed-off-by: chao an <anchao@xiaomi.com>
2022-12-03 11:47:04 +08:00
liangchaozhong
0ec4b0a149 tcp: recv returns 0 without setting errno
Issue:
According to the man page, recv() returning 0 means the peer side has
already closed the connection. With the current design, 0 is returned
without setting errno when a TCP rx timeout happens.

Solution:
Return result instead of ir_result when ir_result is 0; then -1 will be
returned with errno set to EAGAIN for the TCP rx buffer empty case.
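
From the application's point of view the fix restores the standard
distinction (plain POSIX usage, nothing NuttX-specific):

  ssize_t n = recv(sockfd, buf, sizeof(buf), 0);

  if (n == 0)
    {
      /* Peer performed an orderly shutdown */
    }
  else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
    {
      /* Rx timeout (SO_RCVTIMEO) or no data yet; the connection is alive */
    }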

Signed-off-by: liangchaozhong <liangchaozhong@xiaomi.com>
2022-11-23 23:23:44 +08:00
chao an
ae4b5b5f50 net/tcp: TCP_WAITALL should use state flag
The MSG_WAITALL feature was broken because the wrong flag was used.
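
For reference, the flag in question asks recv() to block until the full
requested length has been received:

  ssize_t n = recv(sockfd, buf, sizeof(buf), MSG_WAITALL);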

Signed-off-by: chao an <anchao@xiaomi.com>
2022-11-21 17:21:10 +08:00
chao an
a8d3286258 net: move device buffer define to common header
Signed-off-by: chao an <anchao@xiaomi.com>
2022-10-28 00:32:16 -04:00
anjiahao
5724c6b2e4 sem: remove sem default protocol
Signed-off-by: anjiahao <anjiahao@xiaomi.com>
2022-10-22 14:50:48 +08:00
Fotis Panagiotopoulos
143d1322ea Added handling of MSG_WAITALL flag in TCP recv. 2022-10-13 18:22:05 +08:00
zhanghongyu
9bff29d7e7 udp: add IPVx_PKTINFO related support
Signed-off-by: zhanghongyu <zhanghongyu@xiaomi.com>
2022-09-07 10:49:47 +08:00
chao.an
162fcd10ca net: cleanup pvconn reference to avoid confusion
More references:
https://github.com/apache/incubator-nuttx/pull/5252
https://github.com/apache/incubator-nuttx/pull/5434

Signed-off-by: chao.an <anchao@xiaomi.com>
2022-08-26 20:58:11 +08:00
Xiang Xiao
ba9486de4a iob: Remove iob_user_e enum and related code
since it is impossible to track producer and consumer
correctly if the TCP/IP stack passes IOBs directly to the netdev

Signed-off-by: Xiang Xiao <xiaoxiang@xiaomi.com>
2022-08-15 08:41:20 +03:00
chao.an
8fb2468785 net/tcp: remove the socket hook reference from netdev callback
Signed-off-by: chao.an <anchao@xiaomi.com>
2022-02-10 15:04:33 -03:00
chao.an
3fce144aeb net/inet: move recv/send timeout into socket_conn_s
Signed-off-by: chao.an <anchao@xiaomi.com>
2022-02-10 15:04:33 -03:00
chao.an
99cde13a11 net/inet: move socket flags into socket_conn_s
Signed-off-by: chao.an <anchao@xiaomi.com>
2022-02-10 15:04:33 -03:00
Petro Karashchenko
08043fb5bc net: unify FAR keyword usage for all net buffer memory mapped buffers
Signed-off-by: Petro Karashchenko <petro.karashchenko@gmail.com>
2022-01-20 01:42:56 +08:00
YAMAMOTO Takashi
4878b7729c tcp: simplify readahead
Do not bother to preserve segment boundaries in the tcp
readahead queues.

* Avoid wasting the tail IOB space for each segment.
  Instead, pack the newly received data into the tail space
  of the last IOB. Also, advertise the tail space as
  a part of the window.

* Use IOB chain directly. Eliminate IOB queue overhead.

* Allow accepting only a part of a segment.

* This change improves the memory efficiency.
  And probably more importantly, allows less-confusing
  recv window advertisement behavior.
  Previously, even when we advertise N bytes window,
  we often couldn't actually accept N bytes. Depending on
  the segment sizes and IOB configurations, it was causing
  segment drops.
  Also, the previous code was moving the right edge of the
  window back and forth too often, even when nothing in
  the system was competing on the IOBs. Shrinking the
  window that way is a well-known recipe for confusing
  the peer stack.
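
A rough sketch of the packing described above, assuming conn->readahead is
now a plain, non-empty IOB chain and that buf/buflen hold the newly
received data (an illustration, not the committed code):

  FAR struct iob_s *tail = conn->readahead;
  unsigned int space;
  unsigned int ncopy;

  while (tail->io_flink != NULL)
    {
      tail = tail->io_flink;            /* find the last IOB in the chain */
    }

  space = CONFIG_IOB_BUFSIZE - (tail->io_offset + tail->io_len);
  ncopy = space < buflen ? space : buflen;   /* pack into tail free space */

  memcpy(tail->io_data + tail->io_offset + tail->io_len, buf, ncopy);
  tail->io_len += ncopy;
  conn->readahead->io_pktlen += ncopy;  /* total length lives in the head */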
2021-06-30 06:22:14 +09:00
YAMAMOTO Takashi
022a2490d1 tcp: Change the way to advance rcvseq
* Move the code to advance rcvseq for user data from tcp_input
  to receive handlers.
  Motivation: allow partial ack.

* If we drop a segment, ignore FIN as well. Note that the TCP FIN bit is
  logically after the user data in the same segment.
2021-06-30 06:22:14 +09:00
YAMAMOTO Takashi
14ec75e7fc tcp: window update improvements
* Fixes the case where the window was small but not zero.

* tcp_recvfrom: Remove tcp_ackhandler. Instead, simply schedule TX for
  a possible window update and make tcp_appsend decide.

* Replace rcv_wnd (the last advertised window size value) with
  rcv_adv (the window edge sequence number advertised to the peer).
  rcv_wnd was complicated to deal with because its base (rcvseq) is
  also moving.

* tcp_appsend: Send a window update even if there are no other reasons
  to send an ack.
  Namely, send an update if it increases the window by
    * 2 * mss
    * or half of the maximum possible window size
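
Expressed as a predicate, the heuristic is roughly (illustrative names,
not the actual tcp_appsend() code):

  send_update = (new_win > adv_win) &&
                (new_win - adv_win >= 2 * conn->mss ||
                 new_win - adv_win >= max_win / 2);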
2021-06-13 21:20:24 -05:00
YAMAMOTO Takashi
0c606ecb8e psock_tcp_recvfrom: Add a comment about window updates 2021-05-31 01:37:51 -05:00
chao.an
5e9e50991c net/tcp: send the ack on nonblock mode
Signed-off-by: chao.an <anchao@xiaomi.com>
2020-12-18 14:16:11 +09:00
chao.an
8d0118569c [Performance] net/tcp: send the ACK in time after obtaining read-ahead buffer from iobs
Request the TCP ACK to estimate the receive window after handling
any data already buffered in a read-ahead buffer.

Change-Id: Id998a1125dd2991d73ba4bef081ddcb7adea4f0d
Signed-off-by: chao.an <anchao@xiaomi.com>
2020-12-10 12:23:47 +09:00
Gregory Nutt
57bc329aac Run nxstyle all .c and .h files modified by PR. 2020-05-17 14:01:00 -03:00
Gregory Nutt
a569006fd8 sched/: Make more naming consistent
Rename various functions per the guidelines of https://cwiki.apache.org/confluence/display/NUTTX/Naming+of+OS+Internal+Functions

    nxsem_setprotocol -> nxsem_set_protocol
    nxsem_getprotocol -> nxsem_get_protocol
    nxsem_getvalue -> nxsem_get_value
2020-05-17 14:01:00 -03:00
patacongo
861efdf8a3 net/: Whenever the network initializes an IPv4 address, it must clear sin_zero. 2020-02-25 15:53:39 +01:00
chao.an
c65d8e6a23 net/socket: add MSG_DONTWAIT support
MSG_DONTWAIT (since Linux 2.2)
  Enables nonblocking operation; if the operation would block, the
  call fails with the error EAGAIN or EWOULDBLOCK. This provides
  similar behavior to setting the O_NONBLOCK flag (via the fcntl(2)
  F_SETFL operation), but differs in that MSG_DONTWAIT is a per-call
  option, whereas O_NONBLOCK is a setting on the open file description
  (see open(2)), which will affect all threads in the calling process
  as well as other processes that hold file descriptors referring
  to the same open file description.
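
Typical per-call usage, as opposed to setting O_NONBLOCK on the socket:

  ssize_t n = recv(sockfd, buf, sizeof(buf), MSG_DONTWAIT);
  if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
    {
      /* No data available right now; the call did not block */
    }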
2020-02-19 12:21:28 -06:00
Greg Nutt
6b413ec241 net/tcp: Fix errors found in build testing.
Author: Gregory Nutt <gnutt@nuttx.org>

    net/tcp: Fix errors found in build testing.

    Recent re-organization moved some functions from net/inet to net/tcp and net/udp.  This includes references to nxsem_wait(), SEM_PRIO_NONE, and other internal NuttX semaphore functions.  These all failed to compile because nuttx/semaphore.h was not included in any of the files.
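
    The fix amounts to adding the missing header to the affected files:

      #include <nuttx/semaphore.h>   /* nxsem_wait(), SEM_PRIO_NONE, ... */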
2020-01-22 12:29:26 -03:00
Xiang Xiao
e75b5e9d86 net/tcp and udp: Move tcp/udp recvfrom into tcp/udp folder
Move tcp/udp recvfrom into tcp/udp folder and remove inet_recvfrom.c
2020-01-21 08:30:39 -06:00