Flow and Retransmission Control Protocol

From Ouroboros
Revision as of 13:50, 17 May 2026 by Dimitri (talk | contribs)

FRCP - Flow and Retransmission Control Protocol

FRCP runs end-to-end between two peers over a flow. It delivers reliability, in-order delivery, flow control, and liveness. Congestion Control (CC) is not in FRCP - that lives in the IPC Process (IPCP) Congestion Avoidance (CA) policies, orthogonal to FRCP. Flow allocation, naming, and IPCP lifecycle are handled by the IPC Resource Manager daemon (IRMd).

FRCT (Flow and Retransmission Control Task) is the libouroboros implementation of FRCP; the task lives in src/lib/frct.c. The remainder of this document describes the FRCP wire protocol and the behaviour FRCT realises. Code symbols retain the FRCT_ prefix (FRCT_DATA, FRCT_RXM, ...) because they belong to the implementing task; this document references them verbatim.

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 (Best Current Practice; RFC 2119, RFC 8174) when, and only when, they appear in all capitals.


Notation

  u32, u8       Unsigned 32-bit / 8-bit integers (kernel-C style).
  ns            Nanoseconds.

Modular sequence-number comparators (32-bit, modulo 2^32):

    before(a, b)  ==  (int32_t)(a - b) < 0
    after(a, b)   ==  before(b, a)

Used throughout for ackno / seqno ordering checks.
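
As an illustration, the two comparators can be written in C as below. This is a minimal sketch of the definitions above, not a copy of the code in src/lib/frct.c; it relies on the usual two's-complement conversion to int32_t.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when a precedes b in modulo-2^32 sequence space. */
static inline bool before(uint32_t a, uint32_t b)
{
        return (int32_t)(a - b) < 0;
}

/* True when a follows b in modulo-2^32 sequence space. */
static inline bool after(uint32_t a, uint32_t b)
{
        return before(b, a);
}
```

The subtraction wraps, so a seqno just below 2^32 still compares as "before" a small seqno just past the wrap. The result is only meaningful while the two values are less than 2^31 apart.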

Round-Trip Time (RTT) abbreviations used throughout:

    SRTT          Smoothed RTT estimate (RFC 6298).
    mdev          Mean deviation of RTT (Linux variance estimator).
    EWMA          Exponentially Weighted Moving Average.
    RTO           Retransmission Timeout, max(RTO_MIN,
                  srtt + (mdev << MDEV_MUL)).
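
The RTO expression can be sketched as follows. The function name and the nanosecond units are assumptions for illustration; MDEV_MUL = 2 is the default listed in Section 3.

```c
#include <stdint.h>

#define MDEV_MUL 2 /* default shift from Section 3; build-tunable */

/* RTO = max(rto_min, srtt + (mdev << MDEV_MUL)); all values in ns. */
static uint64_t rto_ns(uint64_t srtt, uint64_t mdev, uint64_t rto_min)
{
        uint64_t rto = srtt + (mdev << MDEV_MUL);

        return rto > rto_min ? rto : rto_min;
}
```

With MDEV_MUL = 2 the deviation term is 4 * mdev, the same weighting as K = 4 in RFC 6298.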

Timer-bound symbols t_a (a-timer, ACK delay) and t_r (r-timer, retransmission window) are defined in Section 8; t_mpl (Maximum Packet Lifetime) is introduced in Section 2.1 (the inact field) with heritage in Section 15.

Wire-format diagrams follow the IETF convention: bit 0 is the leftmost (most significant) bit and fields are in network byte order unless stated otherwise.


Table of Contents

  1. Wire format
     1.1. PCI header
     1.2. Flag bits
     1.3. SACK payload
     1.4. RTTP payload
     1.5. Stream PCI extension
  2. Per-flow state and service modes
     2.1. Per-flow state
     2.2. Service modes (orthogonal axes)
  3. Protocol parameters
  4. Sequence-number rotation (DRF)
  5. Send path
  6. Receive path
     6.1. Early-exit dispatch
     6.2. Locked main path
  7. Read path and reassembly
     7.1. Read path
     7.2. Fragmentation and reassembly
  8. Retransmission
  9. Pre-DRF NACK
 10. Cumulative + selective ACK
 11. Flow control
 12. RTT estimation
 13. Liveness (keepalive)
 14. Linger / teardown
 15. Heritage and adopted techniques
     15.1. Original to FRCP (no clean prior art)
     15.2. Not adopted
 16. Stream-mode flows
     16.1. Send
     16.2. Receive
     16.3. Read
     16.4. Flow control
     16.5. Security considerations
 17. References
     17.1. IETF documents
     17.2. Books and journal papers
     17.3. Source-code references


1. Wire format

1.1. PCI header

Fixed 16-octet base Protocol-Control Information (PCI) header prefixed to every FRCP packet (RFC convention: bit 0 leftmost, most-significant bit first). All multi-byte fields except hcs are in network byte order; hcs is an opaque 16-bit value that the receiver recomputes from the wire bytes and compares to the in-place pci->hcs read, so its on-wire byte order need only match between peers running compatible builds. DATA packets on stream-mode flows carry an additional 8-octet extension (see Section 1.5); SACK and RTTP carry their own payloads after the base PCI.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |             flags             |              hcs              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                            window                             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                            seqno                              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                            ackno                              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                     payload (variable) ...
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

  flags   - feature/type bitmap (see 1.2).
  hcs     - CRC-16-CCITT-FALSE Header Check Sequence (HCS) over
            flags + window + seqno + ackno (+ stream extension when
            present); the two octets of the hcs field itself are
            omitted from the CRC input.  Verified on receive before
            any flag-driven dispatch.
  window  - receiver-advertised right window edge (valid iff FC).
  seqno   - per-flow sequence number.
  ackno   - cumulative Acknowledgement (ACK) (valid iff ACK).
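
A minimal sketch of the HCS computation over a serialized base PCI follows. It assumes hcs occupies octets 2..3 as drawn in the diagram; the helper names are hypothetical and the real code in frct.c may feed the CRC differently. CRC-16-CCITT-FALSE uses polynomial 0x1021, initial value 0xFFFF, no reflection and no final XOR.

```c
#include <stddef.h>
#include <stdint.h>

#define HCS_OFF 2 /* assumed: hcs sits at octets 2..3 of the PCI */
#define HCS_LEN 2

/* Bitwise CRC-16-CCITT-FALSE update (poly 0x1021, MSB first). */
static uint16_t crc16_update(uint16_t crc, const uint8_t *buf, size_t len)
{
        size_t i;
        int    b;

        for (i = 0; i < len; ++i) {
                crc ^= (uint16_t)(buf[i] << 8);
                for (b = 0; b < 8; ++b)
                        crc = (crc & 0x8000) ?
                                (uint16_t)((crc << 1) ^ 0x1021) :
                                (uint16_t)(crc << 1);
        }

        return crc;
}

/* HCS over the PCI octets, skipping the two hcs octets themselves. */
static uint16_t pci_hcs(const uint8_t *pci, size_t pci_len)
{
        uint16_t crc = 0xFFFF;

        crc = crc16_update(crc, pci, HCS_OFF);
        crc = crc16_update(crc, pci + HCS_OFF + HCS_LEN,
                           pci_len - HCS_OFF - HCS_LEN);
        return crc;
}
```

Passing pci_len = 24 would cover the stream extension of Section 1.5 in the same way. Because the hcs octets are skipped, the result does not depend on whatever value the field currently holds.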

A single packet can simultaneously carry DATA + ACK + FC (Flow Control) + RXM (Retransmission) by ORing flag bits; the PCI multiplexes control on the same wire frame in the spirit of SCTP chunk bundling (RFC 9260 sec. 6.10) and QUIC frame multiplexing (RFC 9000 sec. 12.4). DATA-bearing packets carry the caller's payload after the PCI; SACK (Selective Acknowledgement) and RTTP (Round-Trip Time Probe) carry their own typed payloads after the PCI.

Optional framing (per-flow, see Section 2.2). On the wire, the order from inside out is:

    [   PCI + body          ]    -- the FRCP packet
    [   PCI + body + CRC-32 ]    -- CRC-32 covers the body only (PCI
                                    is in HCS); appended iff qs.ber
                                    == 0 on DATA, or on every SACK
                                    packet
    [ AEAD-wrap of above    ]    -- iff Authenticated Encryption
                                    with Associated Data (AEAD) is
                                    enabled

  - HCS in the PCI covers the header fields on every packet and is
    verified before any flag-driven dispatch.
  - The CRC-32 trailer (IEEE 802.3 / zlib reflected polynomial
    0xEDB88320, init 0xFFFFFFFF, xor-out 0xFFFFFFFF) covers the
    body on DATA when qs.ber == 0 and on every SACK packet; the
    trailer is written as a raw uint32_t (the same convention as
    hcs: opaque on the wire as long as both peers run compatible
    builds).  The PCI is not under the CRC (Cyclic Redundancy
    Check) because the HCS already protects it.  It is
    appended before AEAD encryption and therefore rides inside the
    AEAD wrap when both are active; the AEAD tag (~2^-128 forgery
    probability) dominates the CRC (~2^-32) for integrity in that
    mode but the CRC trailer is currently retained.
  - When encryption is enabled, the entire (possibly-CRC'd) FRCP
    packet is wrapped with AEAD inside the shared-memory packet
    buffer (spb, struct ssm_pk_buff); the packet grows by the AEAD
    overhead, namely a leading nonce / Initialization Vector (IV)
    of headsz bytes (crypt_get_ivsz) and a trailing authentication
    tag of tailsz bytes (crypt_get_tagsz).

Both CRC and AEAD are layered around the FRCP wire format and are not visible to the FRCP machinery itself.
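
For reference, the CRC-32 variant named above (reflected polynomial 0xEDB88320, init and xor-out 0xFFFFFFFF) can be computed bit by bit as sketched here. The function name is hypothetical; a production build would more likely use a table-driven or hardware-assisted routine.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise IEEE 802.3 / zlib CRC-32 over the packet body. */
static uint32_t crc32_body(const uint8_t *buf, size_t len)
{
        uint32_t crc = 0xFFFFFFFFu; /* init */
        size_t   i;
        int      b;

        for (i = 0; i < len; ++i) {
                crc ^= buf[i];
                for (b = 0; b < 8; ++b)
                        crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u
                                         : (crc >> 1);
        }

        return crc ^ 0xFFFFFFFFu; /* xor-out */
}
```

The input is the body only; the PCI octets are left to the HCS.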


1.2. Flag bits

Flag bits are numbered most-significant-bit first to match the wire diagram (bit numbering per Section 1.1; bit 0 is the MSB of the 16-bit flags field and lands at wire-position 0 in network byte order). Bits 13..15 are reserved and MUST be transmitted as zero.

    +------+--------+--------+----------------------------------------+
    | Bit  | Mask   | Name   | Meaning                                |
    +------+--------+--------+----------------------------------------+
    |   0  | 0x8000 | DATA   | Carries caller payload                 |
    |   1  | 0x4000 | DRF    | Data Run Flag: start of a fresh run    |
    |   2  | 0x2000 | ACK    | Acknowledgement: ackno field valid     |
    |   3  | 0x1000 | NACK   | Negative ACK; seqno = arrival_seqno-1  |
    |   4  | 0x0800 | FC     | Flow Control: window field valid (rwe) |
    |   5  | 0x0400 | RDVS   | Rendezvous probe (window-closed)       |
    |   6  | 0x0200 | FFGM   | First Fragment (role bit 0; see below) |
    |   7  | 0x0100 | LFGM   | Last Fragment (role bit 1; see below)  |
    |   8  | 0x0080 | RXM    | Retransmission                         |
    |   9  | 0x0040 | SACK   | Selective ACK block list in payload    |
    |  10  | 0x0020 | RTTP   | RTT Probe / echo (payload follows)     |
    |  11  | 0x0010 | KA     | Keepalive                              |
    |  12  | 0x0008 | FIN    | End-of-stream marker (stream mode)     |
    | 13-15|   --   |  --    | Reserved (MUST be zero)                |
    +------+--------+--------+----------------------------------------+

The (FFGM, LFGM) pair encodes the fragment role of a DATA-bearing Service Data Unit (SDU), SCTP-style begin/end flags (RFC 9260 sec. 3.3.1):

    +-----------+-------------------------------------------------+
    | FFGM LFGM | Role                                            |
    +-----------+-------------------------------------------------+
    |   1   1   | Sole / un-fragmented SDU (begin AND end)        |
    |   1   0   | First fragment of a multi-fragment SDU          |
    |   0   0   | Middle fragment                                 |
    |   0   1   | Last fragment                                   |
    +-----------+-------------------------------------------------+

Each fragment is carried in its own FRCP packet with its own seqno; FRTX (the FRCT Retransmission service mode, see Section 2.2) recovers individual fragments via the normal Retransmission Timeout (RTO) / SACK / Recent Acknowledgement (RACK, RFC 8985) path. The receiver reassembles the SDU at consume time once the contiguous [FIRST .. LAST] run has fully arrived. On non-DATA packets the role bits are unused and MUST be transmitted as zero.
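
The role table can be decoded from the flags field as sketched below. The mask values come from the flag table; the FRCT_FFGM / FRCT_LFGM / FRCT_RSVD names and both helper functions are assumptions that follow the FRCT_ prefix convention, not identifiers confirmed from the source tree.

```c
#include <stdbool.h>
#include <stdint.h>

#define FRCT_DATA 0x8000u
#define FRCT_FFGM 0x0200u /* assumed name, mask from the flag table */
#define FRCT_LFGM 0x0100u /* assumed name, mask from the flag table */
#define FRCT_RSVD 0x0007u /* hypothetical name for bits 13..15 */

enum frag_role { FRAG_SOLE, FRAG_FIRST, FRAG_MID, FRAG_LAST };

/* Map the (FFGM, LFGM) pair of a DATA packet to its fragment role. */
static enum frag_role frag_role(uint16_t flags)
{
        bool first = (flags & FRCT_FFGM) != 0;
        bool last  = (flags & FRCT_LFGM) != 0;

        if (first && last)
                return FRAG_SOLE;
        if (first)
                return FRAG_FIRST;
        if (last)
                return FRAG_LAST;
        return FRAG_MID;
}

/* Sender-side sanity check for the two MUST-be-zero rules. */
static bool flags_wellformed(uint16_t flags)
{
        if (flags & FRCT_RSVD)
                return false; /* reserved bits set */
        if (!(flags & FRCT_DATA) && (flags & (FRCT_FFGM | FRCT_LFGM)))
                return false; /* role bits on a non-DATA packet */
        return true;
}
```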

In stream mode (qos.service == SVC_STREAM, see Section 16) there are no SDU boundaries to encode, so FFGM and LFGM are unused and MUST be transmitted as zero. End-of-stream uses a dedicated bit (FIN, bit 12) carried on a 0-byte DATA packet, emitted at write-half close (fccntl to FLOWFRDONLY), during linger drain, and at flow_dealloc; emission is idempotent (first call wins). After contiguous delivery of the FIN-bearing slot, the receiver latches byte_fin at the FIN's start offset; flow_read returns 0 (end-of-file, EOF) once buffered bytes have been drained up to byte_fin. Per-byte position is carried by the [start, end) extension (Section 1.5).


1.3. SACK payload

A SACK packet has the FRCT_ACK | FRCT_FC | FRCT_SACK flag bits set (bit numbering per Section 1.1). Following the 16-octet PCI, the payload is a 2-octet block count (network byte order), 2 octets of padding to 4-byte align the block list, then n_blocks pairs of 32-bit start/end seqnos describing *present* (received) ranges above the cumulative ACK.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |           n_blocks            |        padding (2 octets)     |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                           start[0]                            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                            end[0]                             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                           start[1]                            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                       ... n_blocks pairs total ...

n_blocks <= SACK_MAX_BLOCKS (2048). The per-flow effective cap is further bounded by (frag_mtu - PCI - 4) / 8 blocks per packet; SACK packets carry no stream extension, so PCI here is the 16-octet base header even on stream-mode flows.
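
The per-packet block cap works out as in this sketch (the function name is hypothetical). The 4 subtracted octets are the n_blocks field plus padding, and each block is two 32-bit seqnos, i.e. 8 octets.

```c
#include <stddef.h>

#define SACK_MAX_BLOCKS 2048
#define PCI_LEN         16 /* base PCI; SACK carries no stream extension */

/* Number of SACK blocks that fit in one packet of frag_mtu octets. */
static size_t sack_block_cap(size_t frag_mtu)
{
        size_t room;

        if (frag_mtu < PCI_LEN + 4)
                return 0;

        room = (frag_mtu - PCI_LEN - 4) / 8;

        return room < SACK_MAX_BLOCKS ? room : SACK_MAX_BLOCKS;
}
```

For a 1500-octet fragment MTU this gives (1500 - 16 - 4) / 8 = 185 blocks, so the wire cap of 2048 only binds on very large MTUs.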

Wire invariant: every block produced by the receiver, except an optional leading Duplicate SACK (D-SACK) block as described below, describes a range strictly above the cumulative ACK carried in the PCI ackno field (after(start[i], ackno)). This makes the D-SACK convention below unambiguous; the receiver-side builder MUST preserve it.

Duplicate SACK (D-SACK, RFC 2883) is signalled in-band: no flag bit, no extra framing. Modular seqno arithmetic uses the before() / after() comparators defined in the Notation block. Block[0] carries a D-SACK report when either:

  case 1 (RFC 2883 sec. 4.1.1, full duplicate):
      before(blocks[0].start, ackno) and ackno - blocks[0].start is
      within MAX_DSACK_LAG (== RQ_SIZE).  A single duplicate seqno
      observed below the cumulative ACK.
  case 2 (RFC 2883 sec. 4.1.2, partial duplicate):
      blocks[0] is a sub-range of some blocks[i>0] (not exactly
      equal).  Reports a duplicate of an in-window seqno that the
      same packet's remaining SACK blocks already describe as
      received.

Senders that do not implement D-SACK process block[0] through the normal SACK-mark loop, where the existing clamp-and-skip path makes case 1 a no-op (start < snd_cr.lwe clamps to snd_cr.lwe, and the inner loop then skips k == snd_cr.lwe) and case 2 idempotent (the same slots are NULL'd twice).

D-SACK-aware senders feed the report into the RACK reo_wnd_mult scaler (RFC 8985 sec. 6.2 step 4): bump on receipt (cap 20), halve once per 16 cumulatively ACK'd seqnos since the most recent D-SACK arrival or halve event, and reset to 1 when the RTO timer fires at the head of line. D-SACK alone never enters NewReno-careful recovery (see Section 8); only non-D-SACK blocks count as gaps.
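
The two D-SACK cases can be told apart on the sender as sketched below. The struct, enum and function names are hypothetical, and "within MAX_DSACK_LAG" is read here as an inclusive bound; the actual check in frct.c may differ in both respects.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sack_block { uint32_t start; uint32_t end; };

enum dsack_kind { DSACK_NONE, DSACK_FULL, DSACK_PARTIAL };

static bool before(uint32_t a, uint32_t b) { return (int32_t)(a - b) < 0; }
static bool after(uint32_t a, uint32_t b)  { return before(b, a); }

/* Classify blocks[0] of a SACK against the cumulative ackno. */
static enum dsack_kind dsack_classify(const struct sack_block *b, size_t n,
                                      uint32_t ackno, uint32_t max_lag)
{
        size_t i;

        if (n == 0)
                return DSACK_NONE;

        /* Case 1: block[0] starts below the cumulative ACK, within lag. */
        if (before(b[0].start, ackno) && ackno - b[0].start <= max_lag)
                return DSACK_FULL;

        /* Case 2: block[0] lies inside a later block, not equal to it. */
        for (i = 1; i < n; ++i) {
                bool inside = !before(b[0].start, b[i].start) &&
                              !after(b[0].end, b[i].end);
                bool equal  = b[0].start == b[i].start &&
                              b[0].end == b[i].end;
                if (inside && !equal)
                        return DSACK_PARTIAL;
        }

        return DSACK_NONE;
}
```

The containment test is the same whether block ends are inclusive or exclusive, which this section does not state.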


1.4. RTTP payload

An RTTP (Round-Trip Time Probe) packet has only the FRCT_RTTP flag set (bit numbering per Section 1.1). Following the 16-octet PCI, the payload is 24 octets (packed):

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                          probe_id                             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                          echo_id                              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                                                               |
    +                                                               +
    |                                                               |
    +                  nonce (16 octets, echoed verbatim)           +
    |                                                               |
    +                                                               +
    |                                                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

  probe_id - sender's probe counter; 0 on a reply (0 is reserved and
             never identifies a probe).
  echo_id  - peer's probe_id on a reply; 0 on an outbound probe.
  nonce    - random, echoed unmodified, memcmp'd to defeat spoof.
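
A sketch of the payload layout and the echo check follows. The struct and function names are hypothetical; the fields are treated as opaque wire values, so no byte-order conversion is shown. The packed attribute is a GCC / Clang extension.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define RTTP_NONCE_LEN 16

struct rttp_payload {
        uint32_t probe_id;              /* 0 on a reply */
        uint32_t echo_id;               /* 0 on a probe */
        uint8_t  nonce[RTTP_NONCE_LEN]; /* echoed verbatim */
} __attribute__((packed));

/* Build the echo for a received probe. */
static void rttp_make_reply(const struct rttp_payload *probe,
                            struct rttp_payload       *reply)
{
        reply->probe_id = 0;
        reply->echo_id  = probe->probe_id;
        memcpy(reply->nonce, probe->nonce, RTTP_NONCE_LEN);
}

/* Accept a reply only if it echoes our probe id and nonce exactly. */
static bool rttp_reply_matches(const struct rttp_payload *sent,
                               const struct rttp_payload *reply)
{
        return reply->probe_id == 0 &&
               reply->echo_id == sent->probe_id &&
               memcmp(reply->nonce, sent->nonce, RTTP_NONCE_LEN) == 0;
}
```

A reply that fails the nonce comparison would yield no RTT sample, which is the spoofing defence the field description refers to.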


1.5. Stream PCI extension

A stream-mode flow (qos.service == SVC_STREAM) carries an extra 8-octet extension after the 16-octet base PCI on every DATA packet (bit numbering per Section 1.1):

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                            start                              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                             end                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

  start - octet offset of the first payload byte in the stream.
  end   - octet offset one past the last payload byte;
          end - start equals the on-wire payload length.

Total stream-mode PCI for DATA packets is 24 octets (16 base + 8 extension); control packets (SACK, RTTP, bare ACK, KA, etc.) retain the 16-octet base PCI. Stream mode MUST be negotiated at flow allocation; the extension is present iff stream mode is in use, never on a per-packet basis. Both peers MUST treat start/end as monotonic 32-bit byte offsets; when a slot reaches the head of the contiguous run with start not equal to the prior packet's end the slot is silently dropped at delivery time (Section 16) rather than rejected at stash.
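
The delivery-time contiguity rule can be sketched as follows. The names are hypothetical, and checking end - start against the payload length at this point is an assumption; the text only states that the two are equal on the wire.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct stream_ext { uint32_t start; uint32_t end; };

/*
 * A slot at the head of the contiguous run is deliverable only if it
 * begins exactly where the previous packet ended. Offsets are modulo
 * 2^32, so the length is taken from the wrapped difference.
 */
static bool stream_slot_deliverable(uint32_t prev_end,
                                    struct stream_ext ext,
                                    size_t payload_len)
{
        if ((uint32_t)(ext.end - ext.start) != payload_len)
                return false;

        return ext.start == prev_end;
}
```

A slot that fails this check would be dropped silently at delivery rather than rejected when it is stashed.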

This is the QUIC STREAM-frame reassembly model (RFC 9000 sec. 19.8): each packet carries its packet seqno (this PCI's seqno field) and a separate stream byte position (start/end). Separating the two avoids TCP's conflation of packet identity with byte position, which forces Karn's algorithm for Round-Trip Time (RTT) sampling (no RTT sample on retransmits, RFC 6298 sec. 3); FRCP applies the Karn-equivalent gate via a combination of per-packet FRCT_RXM, per-slot SND_RTX flags, and a sample-fence rtt_lwe (see Section 2.1 and Section 12).

FRCP's fixed 32-bit start/end offsets wrap at 4 GiB of wire bytes, narrower than QUIC's 62-bit varint offset (cf. RFC 9000 sec. 16). The on-wire wrap is handled by the same modular before() / after() comparators (see Notation) that FRCP uses for seqnos, which remain unambiguous as long as the in-flight byte window stays strictly under 2 GiB (the half-range of the signed-int32 difference in before()). The default per-flow ring is 1 MiB; the implementation caps ring_sz at 128 MiB (FRCT_STREAM_RING_SZ_MAX), well below the 2 GiB half-range bound.

The runtime byte counters exposed via FUSE (Filesystem in Userspace) in the Ouroboros Resource Information Base (RIB, a virtual-filesystem introspection bridge) are platform size_t and do not wrap on 64-bit hosts.


2. Per-flow state and service modes

2.1. Per-flow state

Each flow keeps a sender control record and a receiver control record:

    lwe    : u32  snd: oldest unacked seqno (cumulative ACK
                  boundary as seen by sender);
                  rcv: next in-order seqno expected
    rwe    : u32  snd: peer-advertised right window edge;
                  rcv: locally-advertised right window edge
    cflags : u8   per-direction feature flags: retransmission
                  (FRCTFRTX), receiver flow control
                  (FRCTFRESCNTL), linger-on-close (FRCTFLINGER);
                  see <ouroboros/fccntl.h>
    seqno  : u32  snd: next seqno to send;
                  rcv: force-ACK trigger - set on a stale or dup
                  DATA so the next ack_snd emits a fresh
                  cumulative ACK
    ackno  : u32  snd: outbound ACK-packet seqno counter,
                  incremented for every ACK-bearing packet (bare
                  ACK, delayed ACK, SACK); used by wire-dup ACK
                  detection;
                  rcv: incoming-ACK dedup tracker
    act    : ns   last activity (used by inactivity / DRF)
    inact  : ns   inactivity threshold; sender = 3*mpl + a + r + 1s,
                  receiver = 2*mpl + a + r + 1s.  mpl is the
                  Maximum Packet Lifetime (delta-t terminology;
                  see Section 15); a and r are the FRCT a-timer
                  and r-timer bounds (see Section 8).  The
                  asymmetry is load-bearing for pre-DRF NACK
                  (Section 9).
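
The two inactivity thresholds can be computed as in this sketch. The function name and the nanosecond units are assumptions; mpl, a and r are supplied by the caller.

```c
#include <stdbool.h>
#include <stdint.h>

#define NS_PER_S 1000000000ULL

/* inact = (3 or 2) * mpl + a + r + 1 s, all in nanoseconds. */
static uint64_t inact_ns(bool sender, uint64_t mpl, uint64_t a, uint64_t r)
{
        return (sender ? 3 : 2) * mpl + a + r + NS_PER_S;
}
```

For identical mpl, a and r the sender threshold exceeds the receiver threshold by exactly one mpl, so the receiver always goes stale first.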

The sender holds a per-slot ring snd_slots[RQ_SIZE] keyed by (seqno mod RQ_SIZE). Each slot tracks its retransmit entry (rxm), last-send timestamp, and retransmit flag bits: SND_RTX (a retransmit is pending or has fired, gates the next RTT sample under Karn) and SND_FAST_RXM (one-shot fast-retransmit staged for this loss event).

The receiver holds a parallel reorder ring rcv_slots[RQ_SIZE] (referred to as rq[] in prose) holding stashed out-of-order packet-buffer indexes; both FRTX and best-effort flows share this path. The invariant rwe - lwe <= RQ_SIZE holds: on each consume the receiver advances rwe by the consumed count, capping the receive window at RQ_SIZE seqno slots.
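
Slot indexing and the receive-window test can be sketched as below. RQ_SIZE is shown at its default of 128; because it is a power of two, the modulo reduces to a mask. The in_rcv_window helper is hypothetical and assumes rwe is an exclusive edge.

```c
#include <stdbool.h>
#include <stdint.h>

#define RQ_SIZE        128 /* compile-time, power of 2 */
#define RQ_SLOT(seqno) ((seqno) & (RQ_SIZE - 1))

static bool before(uint32_t a, uint32_t b) { return (int32_t)(a - b) < 0; }

/* True when lwe <= seqno < rwe in modular sequence space. */
static bool in_rcv_window(uint32_t seqno, uint32_t lwe, uint32_t rwe)
{
        return !before(seqno, lwe) && before(seqno, rwe);
}
```

Since rwe - lwe never exceeds RQ_SIZE, no two seqnos inside the window map to the same slot.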

A separate fence variable rtt_lwe is bumped on every retransmit (timer-fire, SACK-driven, fast-rxm, NACK-driven) and on every seqno_rotate (Section 4) to mark the seqno range whose RTT samples MUST be discarded.


2.2. Service modes (orthogonal axes)

FRCP exposes its wire features as a vector of independent QoS axes selected at flow allocation time. All flows go through the same flow_alloc(name, qos, ...) primitive; the qosspec_t passed in determines which protocol machinery engages on the wire. This contrasts with the POSIX BSD socket model where TCP and UDP require different socket types (SOCK_STREAM / SOCK_DGRAM).

The axes:

  service   0 = unordered (no FRCP engagement: raw datagrams,
              no PCI on the wire, UDP-equivalent at this layer)
            1 = message-ordered (FRCP engaged; SDU boundaries
              preserved across fragmentation)
            2 = stream (byte-oriented, no SDU boundaries; FRTX
              required)
  loss      0 = lossless service requested: FRTX retransmit
              machinery engages (Section 8); MUST be 0 for
              service=2.  Non-zero = best-effort, FRTX off.
  ber       Bit Error Rate tolerance.
            0 = error-free service requested: a CRC trailer is
              appended after the body of DATA packets and verified
              on receive (added / checked outside the FRCP PCI;
              see Section 1.1).  Non-zero = peer accepts errors;
              trailer omitted.  SACK control packets carry a
              CRC32 trailer regardless of ber; the ber gate
              applies to DATA only.
  timeout   Peer-timeout (ms); 0 disables the keepalive timer.
              Independent of FRCP engagement.

Encryption is a separate per-flow attribute set at flow setup; when enabled it wraps the FRCP packet (PCI + body, plus the CRC trailer if any) under AEAD, expanding the spb by headsz + tailsz octets (nonce / tag). The CRC trailer is currently kept inside the AEAD wrap (see Section 1.1).

Reachable combinations exported by include/ouroboros/qos.h:

  +-----------------+---------+------+-----+-----------------------+
  | Cube            | service | loss | ber | Engaged               |
  +-----------------+---------+------+-----+-----------------------+
  | qos_raw         |    0    |   1  |   1 | Raw passthrough       |
  | qos_raw_safe    |    0    |   1  |   0 | Raw + CRC trailer     |
  | qos_rt          |    1    |   1  |   1 | FRCP, no FRTX, no CRC |
  | qos_rt_safe     |    1    |   1  |   0 | FRCP, no FRTX, CRC    |
  | qos_msg         |    1    |   0  |   0 | FRCP + FRTX           |
  | qos_stream      |    2    |   0  |   0 | FRCP + FRTX, stream   |
  +-----------------+---------+------+-----+-----------------------+

Forced couplings actually enforced by the public API:

  - service == SVC_STREAM (2) requires loss == 0; flow_alloc /
    flow_accept reject the pair otherwise with -EINVAL.
  - FRTX requires FRCP engagement (service != SVC_RAW); requesting
    loss = 0 with service = SVC_RAW is structurally a no-op
    because no frcti is created.
  - The QOS_DISABLE_CRC build flag globally forces ber = 1.
    Note: this flag defaults to ON, so default builds ship with
    CRC disabled until QOS_DISABLE_CRC is set to OFF.
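
The couplings above can be expressed as small predicates. This is an illustrative sketch only: struct qos_sketch and the three functions are hypothetical stand-ins and do not reproduce qosspec_t or the checks in flow_alloc / flow_accept.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

enum { SVC_RAW = 0, SVC_MESSAGE = 1, SVC_STREAM = 2 };

struct qos_sketch {
        uint32_t service;
        uint32_t loss;
        uint32_t ber;
};

/* Stream service requires lossless delivery. */
static int qos_check(const struct qos_sketch *q)
{
        if (q->service == SVC_STREAM && q->loss != 0)
                return -EINVAL;
        return 0;
}

/* FRTX engages only when FRCP is engaged and loss == 0. */
static bool frtx_engaged(const struct qos_sketch *q)
{
        return q->service != SVC_RAW && q->loss == 0;
}

/* CRC-32 trailer on DATA: ber == 0 and CRC not disabled at build time. */
static bool crc_on_data(const struct qos_sketch *q, bool crc_disabled)
{
        return !crc_disabled && q->ber == 0;
}
```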

Caveat: the API does NOT force ber = 0 when service != SVC_RAW. qos_rt has service = SVC_MESSAGE with ber = 1, which means DATA packets on that cube carry no CRC-32 trailer and the body is unprotected; the HCS (Section 1.1), which covers only the header, is the sole integrity check.

The FRCP-no-FRTX regime (service = SVC_MESSAGE, loss > 0) is meaningful and live: sequence numbering, in-order delivery, flow-control advertisement, KA, DRF rotation, and SDU fragmentation / reassembly (Section 7.2) all run. Lost packets are dropped rather than retransmitted; the partial SDU around a permanently lost fragment is discarded via skip-past-gap once a later SDU is visible in the reorder ring.


3. Protocol parameters

    +--------------------+------------------------+-------------------+
    | Parameter          | Value                  | Role              |
    +--------------------+------------------------+-------------------+
    | RQ_SIZE            | compile-time, power of | Slot ring / rcv   |
    |                    |  2 (default 128)       | window width      |
    | START_WINDOW       | compile-time, power of | Initial rwe-lwe   |
    |                    |  2 (default 128)       | after rotate      |
    | RTO_MIN            | MAX(250 us build-tun-  | RTO floor; also   |
    |                    |  able, 1<<RXMQ_RES);   | floored at the    |
    |                    |  per-flow via fccntl   | retransmit-wheel  |
    |                    |  (FRCTSRTOMIN).        | resolution        |
    |                    |  Default ~1 ms with    | (~1 ms by         |
    |                    |  RXMQ_RES=20.          | default).         |
    | MAX_RTO_MUL        | 20                     | Backoff shift cap |
    | RACK window R      | MIN(reo_wnd_mult       | Reorder window;   |
    |                    |  * min_RTT/4, SRTT)    | per RFC 8985      |
    |                    |  with MIN_REORDER_NS   | sec. 6.2;         |
    |                    |  = 250 us floor;       | reo_wnd_mult per  |
    |                    |  reo_wnd_mult scales   | sec. 6.2 step 4   |
    |                    |  on D-SACK, cap 20     |                   |
    | MIN_RTT_WIN_NS     | 300 s (5 min, Linux    | min_RTT windowed  |
    |                    |  tcp_min_rtt_wlen)     | re-anchor         |
    | REO_WND_MULT_MAX   | 20 (RFC 8985 sec.      | reo_wnd_mult cap  |
    |                    |  6.2 step 4)           |                   |
    | REO_DECAY_PKTS     | 16 (RFC 8985 sec.      | Fresh-ACK'd seq   |
    |                    |  6.2 step 4 /          | count per halving |
    |                    |  RACK.reo_wnd_persist) |                   |
    | MAX_DSACK_LAG      | RQ_SIZE                | D-SACK sanity cap |
    | RTT_QUARANTINE     | 32 (seqno steps)       | NewReno gate pad  |
    | SACK rate-limit    | SACK_MIN_GAP_NS        | Min SACK gap      |
    |                    |  (250 us, fixed)       |                   |
    | SACK_MAX_BLOCKS    | 2048 (wire cap; per-   | Per-SACK block    |
    |                    |  flow capped at        | cap               |
    |                    |  (frag_mtu-PCI-4)/8)   |                   |
    | SACK_RXM_MAX       | 32                     | Per-pass staged   |
    |                    |                        | retransmit cap    |
    | DUP_THRESH         | 3 (RFC 8985 default)   | Hybrid fast-rxm   |
    |                    |                        | trigger (Sec. 8)  |
    | MDEV_MUL           | 2 (build-tunable via   | mdev shift in     |
    |                    |  FRCT_RTO_MDEV_-       | RTO = srtt +      |
    |                    |  MULTIPLIER)           | (mdev << MDEV_MUL)|
    | RTTP nonce         | 16 octets              | Echoed verbatim   |
    | RTTP_RING          | 8                      | In-flight probes  |
    | RTT clamp          | 16 * srtt              | Probe-sample      |
    |                    |                        | upper bound       |
    |                    |                        | (ACK-derived RTT  |
    |                    |                        | samples gated by  |
    |                    |                        | Karn / recovery   |
    |                    |                        | only)             |
    | Cold-probe cadence | 100 ms (rx-driven;     | Pre-srtt RTTP     |
    |                    |  see Section 12)       | rate              |
    | DELT_RDV           | 100 ms                 | RDVS emit cadence |
    | MAX_RDV            | 1 s                    | RDVS give-up      |
    | Delayed-ACK fire   | 2 * TICTIME (TICTIME   | Fired after the   |
    |                    |  = FRCT tick gran-     | first in-order    |
    |                    |  ularity, default      | DATA arrival;     |
    |                    |  5 ms; 2*TICTIME       | tick is build-    |
    |                    |  = 10 ms by default)   | tunable           |
    | NACK send cooldown | srtt when an srtt      | Pre-DRF NACK      |
    |                    |  sample exists, else   | rate-limit        |
    |                    |  100 ms                |                   |
    | MAX_SDU            | 1 MiB                  | Max reassembled   |
    |                    |                        | SDU; configurable |
    |                    |                        | per flow          |
    +--------------------+------------------------+-------------------+
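
The RACK window row can be read as the computation sketched here. Applying the MIN_REORDER_NS floor after the SRTT cap is an assumption about how the two bounds interact, and the function name is hypothetical.

```c
#include <stdint.h>

#define MIN_REORDER_NS 250000ULL /* 250 us floor */

/* R = MIN(reo_wnd_mult * min_RTT / 4, SRTT), floored at 250 us. */
static uint64_t rack_reo_wnd(uint32_t reo_wnd_mult, uint64_t min_rtt,
                             uint64_t srtt)
{
        uint64_t wnd = (uint64_t) reo_wnd_mult * min_rtt / 4;

        if (wnd > srtt)
                wnd = srtt;
        if (wnd < MIN_REORDER_NS)
                wnd = MIN_REORDER_NS;

        return wnd;
}
```

With min_RTT = 4 ms and SRTT = 10 ms, a multiplier of 1 gives a 1 ms window, while the maximum multiplier of 20 is clipped to the 10 ms SRTT.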

The per-flow fragment Maximum Transmission Unit (MTU) is computed at flow setup from the lower IPCP's mtu minus encryption headsz / tailsz and CRC trailer; there is no FRCT-level default or environment-variable override.


4. Sequence-number rotation (DRF)

The DRF (Data Run Flag) bit on an outbound packet means "this is the start of a fresh data run" and is set whenever the sender has nothing in flight (snd_cr.seqno == snd_cr.lwe).

Independently of that, if the sender has been idle longer than snd_cr.inact AND the pipe is empty (snd_cr.seqno == snd_cr.lwe), seqno_rotate() rolls a random new seqno before the send and resets

    snd_cr.seqno  = random()
    snd_cr.lwe    = snd_cr.seqno
    snd_cr.rwe    = snd_cr.seqno + START_WINDOW
    rtt_lwe       = snd_cr.seqno
    in_recovery   = false   (recovery state, see Section 8)
    recovery_high = snd_cr.seqno
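
The sender-side rotation can be sketched as below. The struct layout and the rotate_due helper are hypothetical, and the fresh seqno is passed in so the example stays deterministic; the real seqno_rotate() draws it at random.

```c
#include <stdbool.h>
#include <stdint.h>

#define START_WINDOW 128 /* default from Section 3 */

struct snd_cr { uint32_t seqno, lwe, rwe; uint64_t act, inact; };

struct snd_state {
        struct snd_cr cr;
        uint32_t      rtt_lwe;
        bool          in_recovery;
        uint32_t      recovery_high;
};

/* Rotation is due when the pipe is empty and the sender is stale. */
static bool rotate_due(const struct snd_state *s, uint64_t now)
{
        return s->cr.seqno == s->cr.lwe && now - s->cr.act > s->cr.inact;
}

/* Apply the resets listed above for a freshly drawn seqno. */
static void seqno_rotate(struct snd_state *s, uint32_t fresh)
{
        s->cr.seqno      = fresh;
        s->cr.lwe        = fresh;
        s->cr.rwe        = fresh + START_WINDOW; /* wraps modulo 2^32 */
        s->rtt_lwe       = fresh;
        s->in_recovery   = false;
        s->recovery_high = fresh;
}
```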

The receiver, on observing rcv-side inactivity (now - rcv_cr.act > rcv_cr.inact), requires a DRF on the next DATA packet; otherwise it replies with a rate-limited NACK (see below). Non-DATA control packets pass through without the DRF requirement. On DRF the receiver releases the rq[] slots and rebases

    rcv_cr.lwe   = seqno
    rcv_cr.rwe   = seqno + RQ_SIZE
    rcv_cr.seqno = seqno

If a packet that arrives after receiver inactivity carries DATA but no DRF, a rate-limited NACK is fired back to the sender (cooldown per Section 3); stale non-DATA arrivals fall through to normal processing (no NACK, no drop).
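
The receiver's decision on a possibly stale arrival can be sketched as follows. The enum, struct and function names are hypothetical, and FRCT_DRF is assumed to follow the FRCT_ prefix convention with the mask from Section 1.2; the NACK cooldown is left out.

```c
#include <stdint.h>

#define RQ_SIZE   128
#define FRCT_DATA 0x8000u
#define FRCT_DRF  0x4000u /* assumed name, mask from Section 1.2 */

struct rcv_cr { uint32_t lwe, rwe, seqno; uint64_t act, inact; };

enum rcv_verdict { RCV_NORMAL, RCV_REBASE, RCV_NACK };

/* Decide how a packet is treated with respect to rcv-side inactivity. */
static enum rcv_verdict rcv_inact_verdict(const struct rcv_cr *r,
                                          uint64_t now, uint16_t flags)
{
        if (now - r->act <= r->inact)
                return RCV_NORMAL;  /* receive side is recent */
        if (flags & FRCT_DRF)
                return RCV_REBASE;  /* fresh run: rebase on its seqno */
        if (flags & FRCT_DATA)
                return RCV_NACK;    /* stale DATA without DRF */
        return RCV_NORMAL;          /* control packets pass through */
}

/* Rebase the receiver record on the seqno of a DRF packet. */
static void rcv_rebase(struct rcv_cr *r, uint32_t seqno)
{
        r->lwe   = seqno;
        r->rwe   = seqno + RQ_SIZE;
        r->seqno = seqno;
}
```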


5. Send path

    1. If the SDU exceeds (frag_mtu - data_hdr_len), the caller
       (dev.c) fans it out into ceil(count / (frag_mtu -
       data_hdr_len)) fragments, each emitted via frcti_snd as its
       own DATA packet with a per-fragment role (Section 7.2);
       both FRTX and best-effort flows fragment.  Raw flows (no
       FRCP engagement, qos.service == SVC_RAW) carry no PCI and
       return -EMSGSIZE for any SDU larger than one packet at the
       layer below.  An SDU that fits in a single packet is sent
       as SOLE.  frcti_snd reserves PCI head room; sets DATA, plus
       DRF when the pipe is empty (snd_cr.seqno == snd_cr.lwe).
    2. seqno_rotate() if past sender inactivity and the pipe is
       empty (Section 4).
    3. Advertise FC (pci.window = frcti_advert_rwe(frcti), i.e.
       rcv_cr.rwe clamped to rcv_cr.lwe + ring_seq_cap in stream
       mode) when the receiver side is recent: now - rcv_cr.act
       < rcv_cr.inact.
    4. Reliable mode (FRTX): leave snd_cr.lwe where it is; reset
       the slot at RQ_SLOT(seqno) (snd_slots[p].time = now,
       snd_slots[p].flags = 0); queue an rxm_entry (saves a packet
       copy, arms a wheel timer at now + (rto << rto_mul)).
       Piggyback ACK (pci.ackno = rcv_cr.lwe) while the a-timer
       for the most recent received DATA packet has not yet
       expired (now - rcv_cr.act <= t_a); on piggyback, set
       rcv_cr.seqno = rcv_cr.lwe so the next delayed-ACK fire is
       suppressed.  See Section 8 for t_a / t_r semantics.
    5. Best-effort mode (no FRTX): advance snd_cr.lwe immediately
       (snd_cr.lwe = snd_cr.lwe + 1, snd_cr.rwe = snd_cr.lwe +
       RQ_SIZE); no retransmit state.  No send-side RTT probe is
       armed in this mode (rtt_probe_arm requires an in-flight
       seqno, which best-effort never has); the rx-driven cold
       seeder in frcti_rcv is the only probe path.
    6. In reliable mode, optionally arm an RTT probe (Section 12).


6. Receive path

6.1. Early-exit dispatch

Keepalive (KA), RTT probe (RTTP), pre-DRF NACK, and rendezvous (RDVS) packets short-circuit out of frcti_rcv before the locked main path; each handler takes its own lock internally.

      incoming packet
            |
            v
       +---------+
       | KA?     |---yes--> ka_rcv  ; return
       +---------+
            |no
            v
       +---------+
       | RTTP?   |---yes--> rttp_rcv; return
       +---------+
            |no
            v
       +---------+
       | NACK?   |---yes--> nack_rcv; return  (see Section 9)
       +---------+
            |no
            v
       +---------+
       | RDVS?   |---yes--> rdv_rcv ; return  (reply bare FC, ackno=0)
       +---------+
            |no
            v
       acquire wrlock; enter locked main path

Handler summary:

  - KA   : refresh t_ka_rcv, honour piggybacked ACK.
  - RTTP : probe (echo back nonce) or echo (verify nonce, sample
           RTT).
  - NACK : pre-DRF, sender-side handler.  See Section 9.
  - RDVS : reply with a bare FC packet (ackno = 0); rdlock only.


6.2. Locked main path

Steps below run with the per-flow frcti.lock held for writing (pthread_rwlock_wrlock) unless noted.

  rcv_inact_check
      Only meaningful when the receive side is stale.  On DRF
      (Data Run Flag): release rq[] slots, rebase rcv_cr, continue.
      On stale DATA without DRF: fire a pre-DRF NACK if cooldown
      allows (Section 9), then discard the packet; on cooldown,
      drop without sending a NACK (a pending cumulative ACK from
      drop_packet may still go out).  Non-DATA, non-DRF arrivals
      bypass rcv_inact_check entirely; pure-DRF stale arrivals fall
      through after the DRF rebase branch.
  DATA-only act refresh
      Refresh rcv_cr.act only when FRCT_DATA is set, so that non-DATA
      packets never block the next DRF rebase.
  Wire-dup gate
      Before flag-driven dispatch, drop wire-duplicate ACKs and
      wire-duplicate DATA (is_dup_ack / is_dup_data).  The DATA
      check is bypassed for FRCT_RXM-bearing arrivals so the
      piggybacked ACK / SACK / FC carried on a retransmitted DATA
      at an already-ACK'd seqno is still applied; the stale-in-
      window branch below then drops the packet.
  ACK
      Drop ACKs whose ackno falls outside [snd_cr.lwe, snd_cr.seqno].
      If ackno == snd_cr.lwe (non-advancing cumulative ACK), drive
      RACK fast-retransmit consideration (Section 8).  Otherwise
      advance snd_cr.lwe = ackno, collapse rto_mul to 0 (Karn-gated
      by SND_RTX on the just-acknowledged slot, the old head-of-
      line), reset dup_thresh to 0, update t_latest_ack to the
      send-time of the slot at ackno-1 (consumed by RACK and SACK
      below), decay reo_wnd_mult per RFC 8985 sec. 6.2 step 4,
      exit NewReno-careful recovery (see Section 8) on
      ackno >= recovery_high or ackno == snd_cr.seqno, and feed an
      RTT sample if eligible (Section 12).
  SACK
      Walk the block list.  For each block (a present range above
      lwe) NULL out snd_slots[k].rxm, clear the slot's per-send
      flags, and advance t_latest_ack to the latest send-time
      covered (the Forward Acknowledgement / fack equivalent,
      Mathis & Mahdavi 1996); the first block whose start
      clamps to snd_cr.lwe skips this fack update so that a head-
      of-line clamp does not falsely advance fack.  For un-SACKed
      gaps below hi_sacked, stage a retransmit per slot that is
      (1) still owned (rxm != NULL), (2) not already SND_FAST_RXM,
      (3) not aged out past t_r, and (4) either outside the RACK
      reorder window R OR with dup_thresh >= DUP_THRESH (the RFC
      8985 sec. 6.2 hybrid trigger).  Mark the slot SND_FAST_RXM
      and NULL the rxm at stage time.  Capped at SACK_RXM_MAX
      staged retransmits per receive pass; what's left rides the
      next SACK.
  FC
      Bump snd_cr.rwe (clamped to lwe + RQ_SIZE, never shrinks)
      and mark window open.
  DATA
      Bounds-check seqno against window.  On stale-dup
      (seqno < rcv_cr.lwe), set rcv_cr.seqno = seqno to force a
      fresh ACK on the next ack_snd, then drop.  On accept: both
      FRTX and best-effort stash the packet-buffer index into
      rq[seqno mod RQ_SIZE].  Fragments stash unchanged - the role
      bits are inspected only at consume time (Section 7.2).  On
      out-of-order arrival, build a SACK reply if not rate-limited
      (per Section 3) and not deduplicated against the previous
      (rcv_cr.lwe, n_blocks) pair; D-SACK reports always bypass the
      dedup.  If both rate-limit and dedup suppress the reply,
      neither SACK nor delayed-ACK fires (the sender picks up the
      gap on its next ACK).  On in-order arrival, arm the delayed-
      ACK timer.
  drop_packet exit
      Releases the per-packet shared-memory buffer (spb), then
      calls ack_snd synchronously after the spb release to surface
      any pending cumulative ACK.
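
The ACK step's range check relies on the modular comparators from the Notation section. The sketch below classifies an incoming ackno under the reading that ackno == lwe is a non-advancing ACK, anything in (lwe, seqno] advances, and everything else is dropped. The enum and function names are hypothetical.

```c
#include <stdint.h>

/* Modular comparators, as defined in the Notation section. */
static int before(uint32_t a, uint32_t b)
{
        return (int32_t)(a - b) < 0;
}

static int after(uint32_t a, uint32_t b)
{
        return before(b, a);
}

enum ex_ack_class {
        EX_ACK_DROP,    /* outside the valid range       */
        EX_ACK_DUP,     /* non-advancing cumulative ACK  */
        EX_ACK_ADVANCE  /* moves snd_cr.lwe forward      */
};

/* Classify ackno against the sender window [lwe, seqno]. */
static enum ex_ack_class ex_ack_classify(uint32_t ackno,
                                         uint32_t lwe,
                                         uint32_t seqno)
{
        if (before(ackno, lwe) || after(ackno, seqno))
                return EX_ACK_DROP;
        if (ackno == lwe)
                return EX_ACK_DUP;

        return EX_ACK_ADVANCE;
}
```

Because the comparison is done on the signed 32-bit difference, the check stays correct across the 2^32 wrap: with lwe = 0xFFFFFFF0 and seqno = 0x10, an ackno of 0x5 advances the window.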


7. Read path and reassembly

7.1. Read path

flow_read returns a full reassembled SDU (Service Data Unit) via frcti_consume on every FRCP SDU-mode flow (FRTX or best-effort); stream-mode is covered in Section 16. An incomplete head-of-line (HoL) run yields -EAGAIN; an oversized run yields -EMSGSIZE (the run is dropped so the flow does not stall). On best-effort flows, a permanently-lost mid-fragment is dropped as soon as a later complete SDU becomes visible in the ring (Section 7.2, skip-past-gap).

Raw flows carry no frcti, so flow_read returns the next pending packet-buffer index directly, with no role-bit inspection. (Raw service is selected via qos.service == SVC_RAW at flow allocation, which suppresses frcti creation.)

frcti_pdu_ready is the no-advance peek used by fevent (the Ouroboros flow-event multiplexer, the poll(2)-equivalent on flows). It returns ready only when the head-of-line run is complete and the lead packet (a Protocol Data Unit, here one FRCP packet) is present at rcv_cr.rwe - RQ_SIZE; any other state (including the best-effort skip-past-gap case) returns not ready, and frcti_consume is left to drop the broken prefix and re-inspect.


7.2. Fragmentation and reassembly

Send side (flow_write_frag). An SDU larger than (frag_mtu - PCI) is split into ceil(count / (frag_mtu - PCI)) fragments; each fragment is its own FRCP packet with its own seqno and a per-fragment role flag pair (Section 1.2). Roles are assigned at emit time:

    +------+--------+
    | i    | Role   |
    +------+--------+
    | n=1  | SOLE   |
    | i=0  | FIRST  |
    | i=n-1| LAST   |
    | else | MID    |
    +------+--------+
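
The fragment count and the role table above can be sketched as two small helpers. The names are hypothetical; payload_max stands for (frag_mtu - PCI) and is assumed to be non-zero.

```c
#include <stddef.h>

enum ex_role { EX_SOLE, EX_FIRST, EX_MID, EX_LAST };

/* ceil(count / payload_max) without risking overflow in the sum. */
static size_t ex_frag_count(size_t count, size_t payload_max)
{
        return count / payload_max + (count % payload_max != 0);
}

/* Role of fragment i (0-based) out of n, following the table. */
static enum ex_role ex_frag_role(size_t i, size_t n)
{
        if (n == 1)
                return EX_SOLE;
        if (i == 0)
                return EX_FIRST;
        if (i == n - 1)
                return EX_LAST;

        return EX_MID;
}
```

A 3000-byte SDU with payload_max = 1400 yields three fragments with roles FIRST, MID, LAST.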

A mid-loop allocation or transmit failure may yield a partial write: the call returns the bytes already enqueued (off > 0) or the underlying error (off == 0). Best-effort flows fragment identically; on the receiver, a partial run with a permanently-lost fragment is dropped when a later complete SDU is visible in the ring (see skip-past-gap below). Raw flows carry no PCI and refuse anything larger than the layer's user MTU (-EMSGSIZE).

Wire-level recovery is fragment-agnostic on FRTX flows: each fragment's seqno flows through SACK / RACK / RTO / NACK exactly as for a SOLE DATA packet, and reassembly does not re-enter the loss-detection path. Best-effort flows run the same seqno machinery (DRF, FC, ACK piggyback, pre-DRF NACK emit) but queue no rxm state at the sender, so a lost MID is unrecoverable; skip-past-gap handles it (below).

Receive side. Fragments stash into rq[seqno] unchanged; role bits are read only at consume time. frag_run_inspect, called from frcti_consume, walks the ring starting at the oldest still-undelivered seqno base = rcv_cr.rwe - RQ_SIZE (equal to rcv_cr.lwe only when no partial run is in progress; during a partial run lwe has already advanced past base). It produces one of three outcomes:

    +---------------+---------------------------------------------+
    | Outcome       | Cause                                       |
    +---------------+---------------------------------------------+
    | DELIVER (n)   | rq[base]=SOLE (n=1), or rq[base]=FIRST and  |
    |               | a LAST follows in slots [base+1..base+n-1]  |
    |               | with all intermediate roles in {MID,FIRST,  |
    |               | LAST} contiguous.                           |
    | DROP (n)      | rq[base] is MID or LAST without a preceding |
    |               | FIRST (n=1); a FIRST..[non-LAST]..new-FIRST |
    |               | or new-SOLE mid-run (drop the broken prefix |
    |               | with n = run length minus 1, so the new     |
    |               | FIRST/SOLE stays); or, on best-effort       |
    |               | flows, a gap at base with a FIRST/SOLE      |
    |               | later in the ring (drop up to the new run   |
    |               | start).                                     |
    | NOT_READY     | rq[base] absent or FIRST..[non-LAST] with   |
    |               | no later FIRST/SOLE in the ring (FRTX waits |
    |               | for retx; best-effort waits for arrival).   |
    +---------------+---------------------------------------------+

DELIVER triggers frag_gather: a scatter-gather memcpy of the n consecutive fragments at rq[base..base+n-1] directly into the caller's buffer; each per-packet shared-memory buffer (spb) is released and rwe advances by n. lwe was already advanced incrementally as each contiguous fragment arrived; frag_gather only restores the fixed-width invariant rwe == lwe + RQ_SIZE. No intermediate reassembly buffer is allocated.

DROP advances rwe past the broken prefix (releasing the spbs) and pulls lwe up to the new trailing edge if needed; the next consume retries from the new base. Oversize or arithmetically overflowing delivery (sum of fragment lengths > max_rcv_sdu, sum > caller's buffer, or running-sum overflow) also drops the run with -EMSGSIZE.

Skip-past-gap (best-effort only). On FRTX, a gap in the run means "waiting for retransmit" and frag_run_inspect returns NOT_READY. On best-effort flows the gap is permanent, so frag_run_inspect scans forward in the ring for the next FIRST or SOLE; if one is visible within RQ_SIZE, it returns DROP for the broken prefix and the consume loop retries at the new lwe. Memory hold is bounded by RQ_SIZE; the partial releases on the next consume call once a later complete run exists. Voice-like flows (one SOLE per SDU) see no extra wait: any later SOLE makes the prior gap droppable immediately.
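
The three outcomes, including skip-past-gap, can be illustrated with a simplified walk over a ring that is already rebased so that index 0 is base. This is a sketch under stated assumptions, not frag_run_inspect itself: the types and names are hypothetical, cap stands for RQ_SIZE, and oversize handling is left out.

```c
#include <stdbool.h>
#include <stddef.h>

enum ex_role    { EX_SOLE, EX_FIRST, EX_MID, EX_LAST };
enum ex_outcome { EX_DELIVER, EX_DROP, EX_NOT_READY };

struct ex_slot {
        bool         present; /* a fragment is stashed in this slot */
        enum ex_role role;    /* valid only when present            */
};

/* Index of the next present FIRST/SOLE at or after from, else cap. */
static size_t ex_next_start(const struct ex_slot *rq,
                            size_t                from,
                            size_t                cap)
{
        size_t j;

        for (j = from; j < cap; ++j)
                if (rq[j].present && (rq[j].role == EX_FIRST ||
                                      rq[j].role == EX_SOLE))
                        return j;

        return cap;
}

/* Inspect the run at rq[0]; *n is the slot count for DELIVER/DROP. */
static enum ex_outcome ex_run_inspect(const struct ex_slot *rq,
                                      size_t                cap,
                                      bool                  frtx,
                                      size_t               *n)
{
        size_t i;
        size_t j;

        *n = 0;
        for (i = 0; i < cap; ++i) {
                if (!rq[i].present) {
                        if (frtx) /* gap: wait for the retransmit */
                                return EX_NOT_READY;
                        j = ex_next_start(rq, i + 1, cap);
                        if (j == cap)
                                return EX_NOT_READY;
                        *n = j;   /* skip-past-gap: drop the prefix */
                        return EX_DROP;
                }
                if (i == 0) {
                        if (rq[0].role == EX_SOLE) {
                                *n = 1;
                                return EX_DELIVER;
                        }
                        if (rq[0].role != EX_FIRST) {
                                *n = 1; /* orphan MID or LAST */
                                return EX_DROP;
                        }
                        continue;
                }
                if (rq[i].role == EX_LAST) {
                        *n = i + 1;
                        return EX_DELIVER;
                }
                if (rq[i].role != EX_MID) {
                        *n = i; /* new FIRST/SOLE stays in the ring */
                        return EX_DROP;
                }
        }

        return EX_NOT_READY;
}
```

For the ring FIRST, MID, (gap), SOLE the sketch returns NOT_READY on an FRTX flow and DROP with n = 3 on a best-effort flow, leaving the SOLE at the new base for the next consume call.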

The choice to defer reassembly to consume time keeps the receive path zero-copy: fragments stay in the shared-memory ring until the application pulls, and the SDU lands directly in the caller's buffer.


8. Retransmission

FRCP is bounded by two delta-t-derived timers (Watson 1981, see Section 15):

  - t_a (a-timer): upper bound on ACK delay.  An ACK for a received
    DATA packet MUST be emitted within t_a of receipt; an attempt
    to send an ACK after the a-timer has expired is suppressed
    (the sender's RTO is already in motion).
  - t_r (r-timer): upper bound on retransmission.  A given DATA
    packet MUST NOT be retransmitted after t_r has elapsed since
    its first send (t0); when the bound is hit, the flow is
    declared down (raising the Ouroboros asynchronous flow
    condition ACL_FLOWDOWN, which marks the flow dead to both
    endpoints) rather than retransmitted again.

Each in-flight FRTX seqno owns one rxm_entry, armed in a hashed timing wheel; the wheel deadline is the slot's next eligible retransmit time.

  RTO timer
      On fire (rxm_due), re-emit with FRCT_RXM, mark SND_RTX
      (Karn-suppress next ACK's RTT sample), and (for the head-of-
      line (HoL) slot only) bump rto_mul up to MAX_RTO_MUL.  Wheel
      deadline is t_send + (rto << rto_mul).  Re-armed unless
      consumed.  The RTO timer also clears SND_FAST_RXM (re-arming
      fast-retransmit eligibility), resets reo_wnd_mult to 1 on a
      HoL fire (RFC 8985 sec. 6.2 step 4 reset clause), and marks
      the flow ACL_FLOWDOWN if its frct_tx call fails.
  r-timer guard
      Before any retransmit attempt, check (now - t0) against t_r.
      If exceeded, the slot is no longer eligible for retransmit.
      Only the RTO timer (rxm_due) treats r-timer expiry as
      terminal: it marks the flow ACL_FLOWDOWN (peer unreachable).
      Fast-retransmit, SACK-driven retransmit, and NACK-driven
      head-of-line re-emit silently skip aged-out slots and defer
      the flow-down decision to the next RTO fire.
  Fast retransmit (hybrid trigger, RFC 8985 sec. 6.2)
      On a non-advancing cumulative ACK with the scoreboard
      advanced, fire one fast retransmit when EITHER (a) the head-
      of-line slot's latest send is older than the RACK reorder
      window R (Section 3) and not yet aged out, OR (b) the SACK
      dup-thresh count above snd_cr.lwe reaches DUP_THRESH (= 3,
      RFC 8985 sec. 6.2 step 4).  Fires at most once per non-
      advancing cumulative-ACK value, gated by rack_fired_lwe (the
      snd_cr.lwe at which fast-retransmit last fired).  Set
      SND_FAST_RXM on the slot (one-shot per-slot gate) and enter
      NewReno-style careful recovery (see NewReno below in this
      section).
      The RACK reorder window R uses the RFC 8985 sec. 6.2 form
      R = MIN(reo_wnd_mult * min_RTT / 4, SRTT) with a
      MIN_REORDER_NS = 250 us floor.  Before the first RTT sample
      seeds min_rtt, R falls back to MIN(reo_wnd_mult * SRTT / 4,
      SRTT), still floored at MIN_REORDER_NS (consistent with the
      windowed-minimum fallback described in Section 12).  min_rtt
      is a windowed minimum over the last MIN_RTT_WIN_NS = 5 min of
      RTT samples (matches the Linux tcp_min_rtt_wlen default) so a
      route change to a longer path eventually re-anchors the
      reorder window without relying on reo_wnd_mult growth alone.
  SACK-driven retransmit
      For each gap below hi_sacked whose slot is (1) still owned,
      (2) not already SND_FAST_RXM, (3) not aged out past t_r, and
      (4) either outside the RACK window R OR with dup_thresh >=
      DUP_THRESH (same hybrid as fast-retransmit, see Section 6.2),
      re-emit.  Each SACK-driven retransmit re-arms a fresh rxm so
      a lost retransmit can still be recovered by its own RTO
      timer.
  NewReno
      On entry, recovery_high = snd_cr.seqno + RTT_QUARANTINE.
      Exit when ackno >= recovery_high or ackno == snd_cr.seqno
      (the latter means everything sent has been acknowledged).
      seqno_rotate also clears recovery.
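
The reorder window and the hybrid trigger described above can be sketched as follows. The function names and argument order are hypothetical; the 250 us floor and DUP_THRESH = 3 are taken from the text, and all times are in nanoseconds.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_MIN_REORDER_NS 250000ULL /* 250 us floor, from the text   */
#define EX_DUP_THRESH     3         /* DUP_THRESH = 3, from the text */

/* R = MIN(mult * min_RTT / 4, SRTT), floored at MIN_REORDER_NS.
 * min_rtt == 0 means unset; the base then falls back to SRTT. */
static uint64_t ex_reorder_window(uint64_t min_rtt,
                                  uint64_t srtt,
                                  uint32_t reo_wnd_mult)
{
        uint64_t base = min_rtt != 0 ? min_rtt : srtt;
        uint64_t r    = (uint64_t) reo_wnd_mult * base / 4;

        if (r > srtt)
                r = srtt;
        if (r < EX_MIN_REORDER_NS)
                r = EX_MIN_REORDER_NS;

        return r;
}

/* Hybrid trigger: the slot must still be within t_r of its first
 * send (t0), and either its latest send (t_send) is older than R
 * or the SACK dup count has reached DUP_THRESH. */
static bool ex_fast_rxm_due(uint64_t now,
                            uint64_t t0,
                            uint64_t t_send,
                            uint64_t t_r,
                            uint64_t r,
                            unsigned dup_cnt)
{
        if (now - t0 > t_r) /* r-timer guard: aged out, skip */
                return false;

        return now - t_send > r || dup_cnt >= EX_DUP_THRESH;
}
```

Note that the floor is applied after the SRTT cap, so on very short paths (SRTT below 250 us) R ends up at the floor rather than at SRTT.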


9. Pre-DRF NACK

The two sides have different inactivity thresholds (snd_cr.inact > rcv_cr.inact), so a receiver can detect a "stale data run" before the sender's own DRF logic kicks in. NACK is the receiver-driven nudge that asks the sender to retransmit the head of the run.

  Send (frcti_nack_snd, called by frcti_rcv when rcv_inact_check
        returns FRCT_INACT_NEED_NACK)
      When an incoming DATA packet has no DRF and rcv-side activity
      is older than rcv_cr.inact, the receiver emits a bare packet
      with flags = FRCT_NACK and seqno = arrival_seqno - 1
      (informational only, not consulted by the receive handler).
      The cooldown in Section 3 rate-limits the burst.  Non-DATA
      non-DRF arrivals bypass rcv_inact_check entirely; non-DATA
      DRF still rebases via the DRF branch.
  Receive (frcti_nack_rcv)
      Dispatched in the early-exit branch (Section 6.1), before
      rcv_inact_check.  The sender copies the head-of-line (HoL)
      rxm packet, marks the slot SND_RTX | SND_FAST_RXM (Karn-
      suppress next ACK, one-shot fast-rxm gate), sets rtt_lwe =
      snd_cr.lwe + 1, and re-emits via fast_rxm_send with FRCT_RXM
      and a refreshed ackno.  The original rxm_entry and its RTO
      timer are left armed - the NACK emit is additive to the
      normal retransmit machinery, not a replacement.  No-op if
      nothing is in flight, the HoL slot has aged past t_r, or
      the HoL rxm pointer has been cleared by SACK or RACK.

NACK serves two roles:

  1. Lost first-of-run (DRF) packet recovery.  Required.  Until
     the DRF packet arrives, the receiver cannot rebase its
     window, so any subsequent in-flight packets look stale to
     the receiver.  The NACK fires the moment the second
     packet arrives at a stale receiver, telling the sender to
     re-emit the HoL (DRF) packet at NACK-cooldown latency rather
     than waiting for the initial RTO (which is the configured
     default until srtt is seeded by the first probe round-trip).
  2. General loss-recovery accelerator.  When loss is detected
     receiver-first, the NACK skips one RTO of latency relative to
     waiting for the sender's RTO to fire.

In both cases the existing rxm_entry and its RTO timer are left armed, so the RTO path remains the eventual fallback.


10. Cumulative + selective ACK

Cumulative ACK is ackno = rcv_cr.lwe. On out-of-order arrival the receiver also emits a SACK packet (Section 1.3) whose payload lists the present blocks above lwe (analogous to TCP SACK / QUIC ACK ranges). SACKs are rate-limited per Section 3 and suppressed when neither lwe nor block count has changed since the last SACK.

D-SACK reports (RFC 2883) are emitted in-band as block[0] of an otherwise normal SACK frame (see Section 1.3 for the encoding). Two receiver triggers arm a pending D-SACK report (single-slot, latest-wins):

  - DATA arrival with seqno < rcv_cr.lwe, both wire-dup (no RXM,
    is_dup_data path) and retransmit (RXM, post-FC branch)
    (RFC 2883 sec. 4.1.1, full duplicate)
  - rq_accept conflict, slot already occupied in [lwe, rwe)
    (RFC 2883 sec. 4.1.2, partial duplicate)

When a D-SACK is pending and the standard scoreboard SACK would be suppressed by dedup or rate-limit, the report is emitted as a stand-alone SACK frame through the normal ack_snd path. A pending D-SACK report bypasses dedup and the TICTIME rate-limit, but the a-timer suppression on rcv inactivity still applies.

Bare ACKs are deferred via a per-flow delayed-ACK timer (one in flight at a time, atomic test-and-set dedup; fires per Section 3 after the first in-order arrival). Suppressed if (1) no new seqno, (2) rcv side is inactive (older than t_a), or (3) the sender just sent within TICTIME. A pending D-SACK ride-through bypasses (1) and (3); the a-timer gate (2) is unconditional.
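
The three suppression gates and the D-SACK ride-through can be sketched as a single predicate. The function and parameter names are hypothetical; new_seqno stands for "there is a cumulative ACK the peer has not yet seen", last_snd for the time of the most recent send, and all times are in nanoseconds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true when a bare ACK must not be sent at time now. */
static bool ex_ack_suppressed(bool     new_seqno,
                              uint64_t now,
                              uint64_t rcv_act,
                              uint64_t t_a,
                              uint64_t last_snd,
                              uint64_t tictime,
                              bool     dsack_pending)
{
        if (now - rcv_act > t_a)      /* gate (2): unconditional */
                return true;
        if (dsack_pending)            /* bypasses (1) and (3)    */
                return false;
        if (!new_seqno)               /* gate (1): nothing new   */
                return true;
        if (now - last_snd < tictime) /* gate (3): just sent     */
                return true;

        return false;
}
```

Checking the a-timer first keeps gate (2) unconditional: a pending D-SACK report is still suppressed once the receive side has gone inactive.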


11. Flow control

The receiver advertises rwe in every FC field. The sender treats its snd_cr.rwe as the absolute right edge: when snd_cr.seqno >= snd_cr.rwe the window is closed and flow_write yields. While closed, the sender periodically emits RDVS (rendezvous) packets (cadence DELT_RDV); the receiver replies with a bare FC packet (ackno = 0) that reopens the window. Once the window has been closed for longer than MAX_RDV the sender stops emitting RDVS but does not tear the flow down - the writer keeps blocking until either a peer-driven FC arrives or the flow is marked failed by the KA (keepalive) timer (ACL_FLOWPEER, Section 13) or the r-timer (ACL_FLOWDOWN, Section 8).

rwe is clamped to lwe + RQ_SIZE on receipt and MUST NOT shrink: a backward rwe is silently clamped to the current snd_cr.rwe; the FC packet still reopens the window.
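
The clamp and non-shrink rules can be sketched with the modular comparators from the Notation section. The helper names are hypothetical; rq_size stands for RQ_SIZE.

```c
#include <stdbool.h>
#include <stdint.h>

static int before(uint32_t a, uint32_t b)
{
        return (int32_t)(a - b) < 0;
}

static int after(uint32_t a, uint32_t b)
{
        return before(b, a);
}

/* New snd_cr.rwe after receiving an advertised right edge adv_rwe. */
static uint32_t ex_fc_update(uint32_t cur_rwe,
                             uint32_t lwe,
                             uint32_t adv_rwe,
                             uint32_t rq_size)
{
        uint32_t max_rwe = lwe + rq_size;

        if (after(adv_rwe, max_rwe))  /* clamp to lwe + RQ_SIZE */
                adv_rwe = max_rwe;
        if (before(adv_rwe, cur_rwe)) /* never shrink           */
                return cur_rwe;

        return adv_rwe;
}

/* The window is closed when seqno has reached the right edge. */
static bool ex_window_closed(uint32_t seqno, uint32_t rwe)
{
        return !before(seqno, rwe);
}
```

With cur_rwe = 100, lwe = 50 and RQ_SIZE = 256, an advertisement of 90 leaves the edge at 100, and an advertisement of 1000 is clamped to 306.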


12. RTT estimation

Active RTTP probes (Section 1.4) carry a 32-bit probe_id (0 reserved) and a 16-byte random nonce that is echoed verbatim, which defends against spoofed replies. A ring of RTTP_RING in-flight probes is kept; an echo whose (id, nonce) does not match the ring slot is dropped. A single RTTP sample is clamped to RTT_CLAMP_MUL * srtt (compile-time RTT_CLAMP_MUL = 16) once srtt is seeded; the first cold-probe sample feeds rtt_update raw.

Probe arming gates:

  - Cold (no srtt yet): the receive path arms at most one probe
    per 100 ms via frcti_rcv_probe (PROBE_DUE_COLD); arming
    requires an incoming packet.  Active send-path arming bails
    while srtt == 0.
  - Warm (rtt_probe_arm, called from frcti_snd): outstanding
    data (snd_cr.seqno > snd_cr.lwe), AND at least 2 * srtt
    since t_rcv_rtt (last RTT receive of any kind), AND at
    least srtt since t_snd_probe (last probe emit).
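
The warm gate can be sketched as a predicate over the three conditions listed above. The function name and argument order are hypothetical, and the comparison uses the modular helper from the Notation section; the cold path (srtt == 0) simply bails, as described.

```c
#include <stdbool.h>
#include <stdint.h>

static int before(uint32_t a, uint32_t b)
{
        return (int32_t)(a - b) < 0;
}

static int after(uint32_t a, uint32_t b)
{
        return before(b, a);
}

/* True when the send path may arm a new RTT probe at time now. */
static bool ex_probe_warm_due(uint32_t seqno,
                              uint32_t lwe,
                              uint64_t srtt,
                              uint64_t now,
                              uint64_t t_rcv_rtt,
                              uint64_t t_snd_probe)
{
        if (srtt == 0)                  /* cold: send path bails     */
                return false;
        if (!after(seqno, lwe))         /* no outstanding data       */
                return false;
        if (now - t_rcv_rtt < 2 * srtt) /* an RTT arrived recently   */
                return false;

        return now - t_snd_probe >= srtt; /* space out probe emits */
}
```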

Each sample feeds either Linux's asymmetric mdev estimator (FRCT_LINUX_RTT_ESTIMATOR, default ON) or the RFC 6298 symmetric EWMA (compile option). srtt is floored at 10 ms when seeded from a hint and at 1 us after every update (including the first seeding sample); mdev is floored at 100 ns.

    RTO = max(rto_min, 2 * srtt, srtt + (mdev << MDEV_MUL))

(the 2 * srtt floor is an FRCT addition not in RFC 6298). Effective wheel deadline capped per Section 3.
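
A worked sketch of the RTO formula follows. The function name is hypothetical, and mdev_mul is passed as a parameter because the value of MDEV_MUL is not stated here; the wheel-deadline cap from Section 3 is not modelled.

```c
#include <stdint.h>

/* RTO = max(rto_min, 2 * srtt, srtt + (mdev << mdev_mul)), in ns. */
static uint64_t ex_rto(uint64_t srtt,
                       uint64_t mdev,
                       uint64_t rto_min,
                       unsigned mdev_mul)
{
        uint64_t rto = srtt + (mdev << mdev_mul);

        if (rto < 2 * srtt) /* FRCT-specific floor */
                rto = 2 * srtt;
        if (rto < rto_min)
                rto = rto_min;

        return rto;
}
```

With srtt = 10 ms, mdev = 1 ms and mdev_mul = 2, the variance term gives 14 ms, so the 2 * srtt floor dominates and the RTO is 20 ms. Raising mdev to 5 ms gives 30 ms from the variance term instead.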

ACK-derived samples (frcti_ack_rcv -> rtt_sample_eligible), beyond the cum-ACK advance gate in frcti_ack_rcv (ackno > lwe and ackno <= seqno), require all of: not in recovery; ACK packet does not carry FRCT_RXM; HoL slot's SND_RTX bit clear; slot's rxm pointer non-NULL (not SACK-consumed); lwe not below the rtt_lwe fence; srtt already seeded by an RTTP probe. There is no ACK-only seeding.

Every eligible sample also feeds RACK.min_RTT (RFC 8985 sec. 6.2) via a windowed minimum: replace whenever the sample is strictly smaller OR more than MIN_RTT_WIN_NS (5 min, matches Linux tcp_min_rtt_wlen) has elapsed since the current min was set. The downward branch is immediate (faster path picked up at once); the upward branch is gated on the window (a transient queue burst does not poison the estimate, but a sustained route change to a longer path re-anchors min_RTT after at most one window). Seeded from rtt_hint at rtt_init; 0 acts as the unset sentinel and the base in rack_reorder_window falls back from min_RTT to SRTT (so R = mult * SRTT/4, capped at SRTT, floored at MIN_REORDER_NS) until the first sample. See Section 6.2.
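
The windowed-minimum rule can be sketched as a two-field filter. The struct and function names are hypothetical; 0 is used as the unset sentinel, as in the text, and the window is the 5-minute value quoted above.

```c
#include <stdint.h>

#define EX_MIN_RTT_WIN_NS (300ULL * 1000000000ULL) /* 5 min, in ns */

struct ex_min_rtt {
        uint64_t val;   /* current minimum, 0 = unset       */
        uint64_t stamp; /* time at which val was set, in ns */
};

/* Feed one eligible RTT sample (ns) taken at time now (ns). */
static void ex_min_rtt_update(struct ex_min_rtt *m,
                              uint64_t           sample,
                              uint64_t           now)
{
        if (m->val == 0 ||                        /* unset sentinel */
            sample < m->val ||                    /* downward: now  */
            now - m->stamp > EX_MIN_RTT_WIN_NS) { /* upward: aged   */
                m->val   = sample;
                m->stamp = now;
        }
}
```

A larger sample inside the window is ignored, while the same sample arriving after the window has elapsed replaces the minimum. This is the re-anchoring behaviour on a route change to a longer path.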


13. Liveness (keepalive)

When qs.timeout > 0 a per-flow KA (keepalive) timer is armed. Arming uses rcv_cr.act for the deadline computation:

    deadline = min(snd_act + qs.timeout/4,
                   rcv_act + qs.timeout)

(clamped to now + qs.timeout/4 if already past). The timer fires either on sender idleness (to send a KA) or on receiver idleness (to declare the peer dead). On fire (ka_snd) the peer-dead test uses max(rcv_cr.act, t_ka_rcv) so a recent KA reply counts even when no DATA has arrived:

  - If now - max(rcv_cr.act, t_ka_rcv) > qs.timeout, mark the flow
    ACL_FLOWPEER and notify the per-process flow-event set
    (proc.fqset) with FLOW_PEER.
  - Else if snd_idle > qs.timeout/4, emit a bare KA | ACK
    (ackno = rcv_cr.lwe) and re-arm.
  - Else just re-arm.

Note: rx_rb and tx_rb are the receive and transmit shared-memory ring buffers. The r-timer raises ACL_FLOWDOWN on both (route is broken); keepalive raises ACL_FLOWPEER on rx_rb only and notifies the flow-event set (peer is silent, writer keeps tx_rb usable) - distinct ACLs. qs.timeout == 0 disables keepalive entirely; a silent peer crash is then undetected.
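
The deadline computation and the three-way decision on fire can be sketched as below. The names are hypothetical, snd_act is taken to be the time of the last send, timeout stands for qs.timeout, and all values share one time unit.

```c
#include <stdint.h>

enum ex_ka_action { EX_KA_PEER_DEAD, EX_KA_SEND, EX_KA_REARM };

/* Next timer deadline, pushed into the future if already past. */
static uint64_t ex_ka_deadline(uint64_t now,
                               uint64_t snd_act,
                               uint64_t rcv_act,
                               uint64_t timeout)
{
        uint64_t a = snd_act + timeout / 4;
        uint64_t b = rcv_act + timeout;
        uint64_t d = a < b ? a : b;

        if (d < now)
                d = now + timeout / 4;

        return d;
}

/* Decision taken when the keepalive timer fires. */
static enum ex_ka_action ex_ka_fire(uint64_t now,
                                    uint64_t snd_act,
                                    uint64_t rcv_act,
                                    uint64_t t_ka_rcv,
                                    uint64_t timeout)
{
        uint64_t last = rcv_act > t_ka_rcv ? rcv_act : t_ka_rcv;

        if (now - last > timeout)        /* peer silent too long */
                return EX_KA_PEER_DEAD;
        if (now - snd_act > timeout / 4) /* we have been idle    */
                return EX_KA_SEND;

        return EX_KA_REARM;
}
```

Using max(rcv_act, t_ka_rcv) means a peer that only answers keepalives is still considered alive, which matches the peer-dead test described above.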


14. Linger / teardown

On flow_dealloc, frcti_dealloc computes a grace timeout

    max(rcv_cr.act + rcv_cr.inact, snd_cr.act + snd_cr.inact) - now

(floored at 0 and converted to seconds) and returns it; flow_dealloc forwards this to the IRMd as the dealloc grace. The IRMd, not FRCT, performs the wait. Before computing the timeout, FRCT may emit a final ACK when rcv_cr.lwe != rcv_cr.seqno (the peer has not been told the most recent cumulative ACK) AND the rcv side has been active within t_a (a-timer not aged out).
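
The grace computation can be sketched as follows. The function name is hypothetical, inputs are in nanoseconds, and truncation toward zero in the conversion to seconds is an assumption: the text does not say how the value is rounded.

```c
#include <stdint.h>

#define EX_NS_PER_S 1000000000ULL

/* Dealloc grace in whole seconds, floored at 0. */
static uint64_t ex_linger_grace_s(uint64_t now,
                                  uint64_t rcv_act,
                                  uint64_t rcv_inact,
                                  uint64_t snd_act,
                                  uint64_t snd_inact)
{
        uint64_t r   = rcv_act + rcv_inact;
        uint64_t s   = snd_act + snd_inact;
        uint64_t end = r > s ? r : s;

        if (end <= now) /* both sides already inactive */
                return 0;

        return (end - now) / EX_NS_PER_S; /* assumed truncation */
}
```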

FRCTFLINGER is honoured only when snd_cr.lwe < edge, where edge = snd_fin_seqno after FIN has been sent in stream mode and snd_cr.seqno otherwise (data or FIN still in flight). The drain itself runs in flow_dealloc's while (FRCTI_LINGERING) loop, not in frcti_dealloc.

The fd is single-reader / single-writer (documented in the manpages). flow_write pumps rx_rb on every call (via flow_wait_window -> flow_drain_rx_nb) and additionally blocks on rx_rb when the send window is closed. A pure-writer thread thus consumes ACKs without a dedicated reader.


15. Heritage and adopted techniques

Delta-t (Watson, 1981) is the primary heritage; FRCP descends from the delta-t protocol family via the Recursive InterNetwork Architecture (RINA; Day, "Patterns in Network Architecture", 2008, ch. 9). Timer-based connection management (no SYN/FIN handshake, per-flow state born on first DATA and reclaimed after t_mpl + t_a + t_r of silence), the DRF marker, and the t_mpl / t_a / t_r timers all come from delta-t. See Watson, "Timer-Based Mechanisms in Reliable Transport Protocol Connection Management", Computer Networks 5 (1981).

The unified `flow_alloc(name, qos, ...)` primitive and its multi-axis QoS-cube argument (Section 2.2) also come from RINA (Day 2008, ch. 6; Grasa et al., "IRATI: investigating RINA as an alternative to TCP/IP", Computer Networks 92 (2015)) - reliability, ordering, CRC presence, and encryption are flow attributes, not separate sockets or protocols.

The table below summarises additional adopted techniques and their references.

+------------------------+------------------+------------------------+
| FRCP mechanism         | Heritage         | Reference / note       |
+------------------------+------------------+------------------------+
| Random new seqno on    | TCP ISN          | RFC 6528 (Gont &       |
| seqno_rotate           |                  | Bellovin, 2012).       |
|                        |                  | QUIC PN-space reset    |
|                        |                  | (RFC 9000 sec. 12.3)   |
|                        |                  | is a structural        |
|                        |                  | analogue.              |
+------------------------+------------------+------------------------+
| Cumulative ACK,        | TCP              | RFC 793 / RFC 9293     |
| left-window-edge       |                  |                        |
| advance                |                  |                        |
+------------------------+------------------+------------------------+
| Receive window with    | TCP              | RFC 793 sec. 3.7 /     |
| non-shrink rule        |                  | RFC 9293 sec. 3.8.6;   |
|                        |                  | RFC 1122 sec. 4.2.2.16 |
|                        |                  | for the explicit non-  |
|                        |                  | shrink prohibition     |
+------------------------+------------------+------------------------+
| Modular seqno          | TCP              | RFC 793 sec. 3.3 /     |
| arithmetic             |                  | RFC 9293 sec. 3.4      |
| (before/after helpers) |                  |                        |
+------------------------+------------------+------------------------+
| Selective ACK block    | TCP              | RFC 2018 (Mathis et    |
| list                   |                  | al., 1996).  Encoded   |
|                        |                  | as a typed FRCP packet |
|                        |                  | rather than a TCP      |
|                        |                  | option, so framing is  |
|                        |                  | closer to QUIC ACK     |
|                        |                  | frames.  D-SACK (RFC   |
|                        |                  | 2883) carried in-band  |
|                        |                  | as block[0]; see       |
|                        |                  | Section 1.3.           |
+------------------------+------------------+------------------------+
| NewReno-careful        | TCP              | RFC 6582 (Henderson    |
| recovery with          |                  | et al., 2012); QUIC    |
| recovery_high gate     |                  | builds on the same     |
|                        |                  | model in RFC 9002      |
|                        |                  | sec. 7.3.2.  Cwnd half |
|                        |                  | absent (CC in IPCP).   |
+------------------------+------------------+------------------------+
| RACK reordering        | TCP              | RFC 8985 (Cheng et     |
| window for fast        |                  | al., 2021).  FRCP      |
| retransmit             |                  | R = MIN(reo_wnd_mult * |
|                        |                  | min_RTT / 4, SRTT)     |
|                        |                  | with a MIN_REORDER_NS  |
|                        |                  | = 250 us floor against |
|                        |                  | srtt collapse; matches |
|                        |                  | RFC 8985 sec. 6.2 and  |
|                        |                  | Linux tcp_rack_reo_wnd.|
|                        |                  | DSACK-driven           |
|                        |                  | reo_wnd_mult (sec. 6.2 |
|                        |                  | step 4) is adopted;    |
|                        |                  | see Section 1.3 for    |
|                        |                  | the wire encoding.     |
|                        |                  | The hybrid RACK-or-    |
|                        |                  | DUP_THRESH trigger     |
|                        |                  | from RFC 8985 sec. 6.2 |
|                        |                  | step 4 is adopted      |
|                        |                  | (Section 8).  QUIC's   |
|                        |                  | analogue in RFC 9002   |
|                        |                  | sec. 6.1.2 uses        |
|                        |                  | max(srtt, latest_rtt)  |
|                        |                  | as the base.           |
+------------------------+------------------+------------------------+
| Karn's algorithm:      | TCP              | Karn & Partridge,      |
| no RTT sample on       |                  | "Improving Round-Trip  |
| retransmits, RTO-      |                  | Time Estimates in      |
| collapse freeze        |                  | Reliable Transport     |
|                        |                  | Protocols", SIGCOMM    |
|                        |                  | 1987; RFC 6298 sec. 3. |
+------------------------+------------------+------------------------+
| RTO formula            | TCP              | RFC 6298 (Paxson et    |
| RTO = max(RTO_MIN,     |                  | al., 2011).  RTO_MIN = |
| srtt + (mdev <<        |                  | 5 ms is below RFC 6298 |
| MDEV_MUL))             |                  | sec. 2.4's 1 s SHOULD- |
|                        |                  | floor - a recursive-   |
|                        |                  | layer choice.          |
+------------------------+------------------+------------------------+
| Linux asymmetric mdev  | Linux kernel     | tcp_rtt_estimator() in |
| estimator (default)    |                  | net/ipv4/tcp_input.c;  |
|                        |                  | the if(delta<0) m>>=3  |
|                        |                  | dampening is a         |
|                        |                  | kernel divergence from |
|                        |                  | RFC 6298.  RFC 6298    |
|                        |                  | EWMA available behind  |
|                        |                  | a compile flag.        |
+------------------------+------------------+------------------------+
| Delayed ACK with rate  | TCP              | RFC 813 (Clark, 1982); |
| suppression            |                  | RFC 1122 sec. 4.2.3.2; |
|                        |                  | RFC 5681 sec. 4.2.     |
|                        |                  | Single-deadline        |
|                        |                  | coalescing rather than |
|                        |                  | "ack-every-other-      |
|                        |                  | segment".              |
+------------------------+------------------+------------------------+
| Zero-window-probe /    | TCP              | RFC 1122 sec.          |
| persist-timer          |                  | 4.2.2.17 / RFC 9293    |
| analogue (RDVS)        |                  | sec. 3.8.6.1.  RDVS    |
|                        |                  | solicits an FC reply,  |
|                        |                  | distinct from QUIC     |
|                        |                  | DATA_BLOCKED (RFC 9000 |
|                        |                  | sec. 19.12), which is  |
|                        |                  | one-way notification.  |
|                        |                  | MAX_RDV give-up        |
|                        |                  | departs from TCP.      |
+------------------------+------------------+------------------------+
| Multiplexed control    | SCTP / QUIC      | SCTP chunk bundling    |
| on a single PCI        |                  | (RFC 9260 sec. 6.10);  |
|                        |                  | QUIC frame             |
|                        |                  | multiplexing (RFC 9000 |
|                        |                  | sec. 12.4).  Cleaner   |
|                        |                  | fit than TCP's         |
|                        |                  | separate-flag-bits     |
|                        |                  | design.                |
+------------------------+------------------+------------------------+
| ACK ranges as          | QUIC             | QUIC ACK frame (RFC    |
| multiple discontiguous |                  | 9000 sec. 19.3).  FRCP |
| acked blocks           |                  | SACK is conceptually   |
|                        |                  | QUIC-frame-shaped      |
|                        |                  | even though encoded    |
|                        |                  | as absolute            |
|                        |                  | [start,end] pairs.     |
+------------------------+------------------+------------------------+
| Nonce-authenticated    | QUIC             | PATH_CHALLENGE /       |
| active RTT / liveness  | PATH_CHALLENGE   | PATH_RESPONSE (RFC     |
| probing (RTTP)         |                  | 9000 sec. 8.2,         |
|                        |                  | sec. 19.17, sec.       |
|                        |                  | 19.18).  WebRTC ICE    |
|                        |                  | consent-freshness      |
|                        |                  | (RFC 7675) is the      |
|                        |                  | same pattern.  QUIC's  |
|                        |                  | nonce is 8 octets;     |
|                        |                  | FRCP chooses 16.       |
+------------------------+------------------+------------------------+
| Probing distinct from  | QUIC             | KA timer answers       |
| keepalive              |                  | "peer alive?", RTTP    |
|                        |                  | answers "path          |
|                        |                  | measurable?", as in    |
|                        |                  | QUIC PING (RFC 9000    |
|                        |                  | sec. 19.2) vs          |
|                        |                  | PATH_CHALLENGE.        |
+------------------------+------------------+------------------------+
| Bare KA + ACK          | QUIC / SCTP      | QUIC PING (RFC 9000    |
| keepalive packets      |                  | sec. 19.2); SCTP       |
|                        |                  | HEARTBEAT /            |
|                        |                  | HEARTBEAT-ACK (RFC     |
|                        |                  | 9260 sec. 8.3).  SCTP  |
|                        |                  | HEARTBEAT also carries |
|                        |                  | an opaque echoed blob, |
|                        |                  | structurally similar   |
|                        |                  | to FRCP RTTP.          |
+------------------------+------------------+------------------------+
| (FFGM, LFGM)           | SCTP             | RFC 9260 sec. 3.3.1    |
| fragment-role bits     |                  | DATA chunk B/E bits    |
| (Section 7.2)          |                  | encode the same four   |
|                        |                  | states (B+E=SOLE,      |
|                        |                  | B-only=FIRST,          |
|                        |                  | neither=MID,           |
|                        |                  | E-only=LAST).          |
|                        |                  | Each fragment carries  |
|                        |                  | its own seqno/TSN and  |
|                        |                  | is independently       |
|                        |                  | retransmitted.         |
+------------------------+------------------+------------------------+
| Stream byte-offset     | QUIC             | QUIC STREAM frame      |
| reassembly             |                  | (RFC 9000 sec. 19.8)   |
| (Sections 1.5, 16)     |                  | uses Offset + Length   |
|                        |                  | varints; FRCP uses     |
|                        |                  | fixed 32-bit start /   |
|                        |                  | end.  One stream per   |
|                        |                  | flow vs QUIC's many    |
|                        |                  | streams multiplexed.   |
+------------------------+------------------+------------------------+
| FIN end-of-stream      | TCP / QUIC       | TCP FIN flag (RFC 9293 |
| marker                 |                  | sec. 3.1) closes one   |
| (Sections 1.2, 16)     |                  | half of the byte       |
|                        |                  | stream; QUIC STREAM    |
|                        |                  | frame FIN bit (RFC     |
|                        |                  | 9000 sec. 19.8) does   |
|                        |                  | the same per stream    |
|                        |                  | with a final-size      |
|                        |                  | invariant              |
|                        |                  | (RFC 9000 sec. 4.5:    |
|                        |                  | the final size is      |
|                        |                  | fixed once observed).  |
|                        |                  | FRCP's FIN consumes    |
|                        |                  | one packet seqno (not  |
|                        |                  | one byte of stream     |
|                        |                  | space) and is          |
|                        |                  | idempotent on the      |
|                        |                  | sender side.           |
+------------------------+------------------+------------------------+
| Stream byte-credit     | QUIC             | MAX_STREAM_DATA (RFC   |
| flow control           |                  | 9000 sec. 4.1, sec.    |
| (Section 16)           |                  | 19.10).  FRCP projects |
|                        |                  | a per-flow byte budget |
|                        |                  | onto the seqno-space   |
|                        |                  | rwe.  Single stream    |
|                        |                  | per flow collapses     |
|                        |                  | QUIC's MAX_DATA /      |
|                        |                  | MAX_STREAM_DATA        |
|                        |                  | distinction.           |
+------------------------+------------------+------------------------+
| Header protection      | QUIC             | QUIC RFC 9001 sec. 5.4 |
| (encrypted seqnos)     |                  | applies header         |
|                        |                  | protection on top of   |
|                        |                  | AEAD to mask the       |
|                        |                  | packet number.  FRCP's |
|                        |                  | per-flow AEAD wrap     |
|                        |                  | (Section 16) is wider: |
|                        |                  | it encrypts the entire |
|                        |                  | PCI including seqno    |
|                        |                  | because the IPCP       |
|                        |                  | below already routes,  |
|                        |                  | so no destination      |
|                        |                  | connection-ID needs to |
|                        |                  | stay in clear (cf.     |
|                        |                  | RFC 9000 sec. 5.2).    |
+------------------------+------------------+------------------------+
| Two-bit fragment role  | SCTP             | The (FFGM, LFGM) pair  |
| polarity               |                  | follows SCTP B/E       |
|                        |                  | (begin = 1 / end = 1)  |
|                        |                  | rather than IPv4 MF    |
|                        |                  | (RFC 791 sec. 3.2),    |
|                        |                  | which has the inverse  |
|                        |                  | polarity (MF = 1 means |
|                        |                  | NOT last).             |
+------------------------+------------------+------------------------+
| Orthogonal reliability | SCTP             | PR-SCTP (RFC 3758,     |
| / ordering axes        |                  | per-message partial    |
| (Section 2.2)          |                  | reliability) and SCTP  |
|                        |                  | DATA U-bit (RFC 9260   |
|                        |                  | sec. 3.3.1, per-       |
|                        |                  | message unordered)     |
|                        |                  | are the closest        |
|                        |                  | precedents for         |
|                        |                  | decoupling reliability |
|                        |                  | from ordering; FRCP    |
|                        |                  | sets them per-flow     |
|                        |                  | rather than per-       |
|                        |                  | message.               |
+------------------------+------------------+------------------------+
| Orthogonal CRC         | UDP-Lite         | RFC 3828 (Larzon et    |
| (qs.ber == 0)          |                  | al., 2004) lets the    |
|                        |                  | sender pick a per-     |
|                        |                  | packet Checksum        |
|                        |                  | Coverage and the       |
|                        |                  | receiver enforce a     |
|                        |                  | locally configured     |
|                        |                  | minimum (no in-band    |
|                        |                  | negotiation; sec. 3.1, |
|                        |                  | sec. 3.3).  FRCP       |
|                        |                  | gates a full CRC       |
|                        |                  | trailer on qs.ber == 0 |
|                        |                  | at flow setup.         |
|                        |                  | Contrast TCP / SCTP    |
|                        |                  | (mandatory checksum)   |
|                        |                  | and QUIC (AEAD         |
|                        |                  | subsumes CRC).         |
+------------------------+------------------+------------------------+
| Setup-time service     | DCCP / SCTP /    | DCCP Service Codes     |
| negotiation            | QUIC             | (RFC 4340 sec. 8.1.2,  |
|                        |                  | RFC 5595); SCTP INIT   |
|                        |                  | parameters (RFC 9260   |
|                        |                  | sec. 3.3.2); QUIC      |
|                        |                  | transport parameters   |
|                        |                  | (RFC 9000 sec. 7.4).   |
|                        |                  | All negotiate service  |
|                        |                  | properties at          |
|                        |                  | connection setup; only |
|                        |                  | RINA's QoS cube        |
|                        |                  | exposes them as an     |
|                        |                  | orthogonal vector.     |
+------------------------+------------------+------------------------+
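
The (FFGM, LFGM) role encoding and its polarity, as described in the table above, can be sketched in C as follows. This is a minimal illustration only: the bit positions EX_FFGM and EX_LFGM and every ex_-prefixed name are hypothetical, and the real flag layout is whatever the PCI definition in this specification says.

```c
#include <stdint.h>

/* Hypothetical bit positions, chosen for illustration only. */
#define EX_FFGM (1u << 0) /* set on the first fragment of an SDU */
#define EX_LFGM (1u << 1) /* set on the last fragment of an SDU  */

enum ex_frag_role { EX_MID, EX_FIRST, EX_LAST, EX_SOLE };

/* Map the (FFGM, LFGM) pair onto the four roles (SCTP B/E polarity). */
static enum ex_frag_role ex_frag_role(uint16_t flags)
{
        int first = (flags & EX_FFGM) != 0;
        int last  = (flags & EX_LFGM) != 0;

        if (first && last)
                return EX_SOLE;
        if (first)
                return EX_FIRST;
        if (last)
                return EX_LAST;
        return EX_MID;
}

/* IPv4 MF has the inverse polarity: MF = 1 means "not last". */
static int ex_ipv4_is_last(int mf)
{
        return !mf;
}
```

With this polarity an all-zero flag field denotes a MID fragment, whereas an IPv4 MF of zero denotes the last fragment.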


15.1. Original to FRCP (no clean prior art)

  - Pre-DRF NACK (Section 9): receiver-driven nudge exploiting
    snd_cr.inact > rcv_cr.inact.  Closest analogues are SCTP Gap Ack
    Blocks (RFC 9260 sec. 3.3.4) and DCCP Ack Vector (RFC 4340
    sec. 11.4) - both let the receiver describe gaps to the sender,
    but neither targets the cross-epoch / pre-DRF case.
  - MAX_RDV window-probe give-up: neither TCP (persist-timer
    probes until application or R2 abort, RFC 9293 sec. 3.8.6.1)
    nor QUIC has an explicit FC-give-up counter.  A recursive-
    network choice: outer layers can drop the flow.
  - Skip-past-gap reassembly (Section 7.2): SCTP fragments and
    reassembles every flow regardless of reliability/ordering,
    using its own per-stream reassembly queue; QUIC fragments via
    STREAM offsets.  FRCP fragments best-effort flows too, but
    the receiver drops the broken prefix the moment a later run-
    start (FIRST or SOLE role) is visible inside the RQ_SIZE-wide
    reorder ring - no IP-frag-style timeout, no SCTP-style
    explicit abort.  If no later run-start arrives within the
    ring, frag_run_inspect returns NOT_READY and the partial run
    keeps its slots; the next inspect retries.  The trade-off: a
    permanently-lost MID in a long isolated run holds slots until
    either a later FIRST/SOLE appears in the ring or the writer
    stops, at which point the slots are reclaimed on flow
    teardown.
  - Reassembly deferred to consume time (Section 7.2), message
    mode only (qos.service == SVC_MESSAGE): SCTP (RFC 9260
    sec. 6.9), QUIC (RFC 9000 sec. 2.2), and TCP (RFC 9293) all
    hold reassembly state at the receive boundary.  FRCP message-
    mode leaves fragments in the shared-memory ring until
    flow_read pulls and lands the SDU directly in the caller's
    buffer.  Stream mode (Section 16) uses the standard QUIC-
    style direct ring placement on receive and does not defer.
    The optimisation is enabled by the Shared-Memory Subsystem
    (SSM) packet-buffer ring (see struct ssm_pk_buff in
    Section 1.1); the closest analogue is OS-level scatter-gather
    I/O (recvmsg+iovec), not transport-layer prior art.
  - TLP-equivalent tail-loss recovery (RFC 8985 sec. 7;
    RFC 9002 sec. 6.2): FRCP does not emit an explicit Tail Loss
    Probe packet, but the same goal is met implicitly by RACK
    loss detection (Section 8) firing on a non-advancing
    cumulative ACK once the head-of-line slot ages past the RACK
    reorder window R = MIN(reo_wnd_mult * min_RTT / 4, SRTT) -
    well below RTO = max(RTO_MIN, srtt + (mdev << MDEV_MUL)).
    A receiver-driven nudge is also available via the pre-DRF
    NACK (Section 9).
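
The gap between the RACK reorder window and the RTO in the last bullet can be checked with a little arithmetic. The sketch below uses the RTO definition from the Notation section and the 5 ms RTO_MIN from the heritage table. The shift EX_MDEV_MUL = 2 and all ex_-prefixed names are assumptions made for illustration; the value of MDEV_MUL is not stated in this section.

```c
#include <stdint.h>

#define EX_RTO_MIN_NS 5000000ULL /* 5 ms, per the heritage table      */
#define EX_MDEV_MUL   2          /* assumed shift, not from this spec */

static uint64_t ex_min(uint64_t a, uint64_t b) { return a < b ? a : b; }
static uint64_t ex_max(uint64_t a, uint64_t b) { return a > b ? a : b; }

/* RACK reorder window: MIN(reo_wnd_mult * min_RTT / 4, SRTT), in ns. */
static uint64_t ex_rack_reo_wnd(uint64_t reo_wnd_mult,
                                uint64_t min_rtt_ns,
                                uint64_t srtt_ns)
{
        return ex_min(reo_wnd_mult * min_rtt_ns / 4, srtt_ns);
}

/* RTO: max(RTO_MIN, srtt + (mdev << MDEV_MUL)), in ns. */
static uint64_t ex_rto(uint64_t srtt_ns, uint64_t mdev_ns)
{
        return ex_max(EX_RTO_MIN_NS, srtt_ns + (mdev_ns << EX_MDEV_MUL));
}
```

For example, with min_RTT = 8 ms, SRTT = 10 ms, mdev = 1 ms and reo_wnd_mult = 1, the reorder window is 2 ms while the RTO under these assumptions is 14 ms.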


15.2. Not adopted

  - Slow start, congestion window (cwnd), Additive Increase /
    Multiplicative Decrease (AIMD), NewReno cwnd inflation.
    Congestion control lives in the IPCP CA policies and is
    driven by Explicit Congestion Notification (ECN, RFC 3168).
  - Nagle / sender-side silly-window-syndrome (SWS) avoidance
    (RFC 896, RFC 1122 sec. 4.2.3.4).  Deferred work, not adopted
    in the current spec.
  - TCP Timestamps (RFC 7323) / Protection Against Wrapped
    Sequences (PAWS).  RTT measurement uses RTTP, not per-segment
    timestamps.  A peer-supplied timestamp echoed
    on every ACK lets a malicious peer drive the srtt estimate
    arbitrarily low, collapsing the RTO and triggering a self-
    inflicted retransmit storm.  RTTP confines RTT measurement to
    nonce-authenticated probe round-trips, where a forged echo is
    rejected before it can reach the estimator.
  - ECN response inside FRCP.  ECN marks are consumed by the IPCP
    CA policies instead.
  - IP-style fragment-offset reassembly (RFC 791 sec. 3.2; RFC 8200
    sec. 4.5).  Message-mode FRCP relies on the FRCT rq[] reorder
    ring keyed by seqno (shared by FRTX and best-effort flows) to
    put fragments back in order; no separate offset field is
    needed and no IP-style hole-list reassembly buffer is kept.
    Stream-mode FRCP does carry [start, end) byte offsets
    (Section 1.5) for direct ring placement on receive.
  - QUIC STREAM offset+length framing on *every* flow (RFC 9000
    sec. 19.8).  Message-mode FRCP uses the SCTP-style B/E flag-
    bit encoding (FFGM/LFGM) and skips the offsets; stream-mode
    FRCP adopts the QUIC offset model (heritage table above).
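
The reasoning in the TCP Timestamps bullet above (an RTT sample is taken only from a nonce-matched probe echo and is timed against the local clock) can be sketched as follows. This is an illustrative sketch, not the FRCT code: struct ex_probe and ex_probe_echo are hypothetical names, and the only value taken from this document is the 16-octet nonce length.

```c
#include <stdint.h>
#include <string.h>

#define EX_NONCE_LEN 16 /* FRCP uses a 16-octet nonce */

struct ex_probe {
        uint8_t  nonce[EX_NONCE_LEN]; /* random value sent in the probe */
        uint64_t t_sent_ns;           /* local clock at send time       */
        int      pending;             /* 1 while an echo is awaited     */
};

/*
 * Returns 1 and stores an RTT sample only if the echo matches the
 * outstanding nonce. Returns 0 otherwise, so a forged or replayed
 * echo never reaches the estimator. Both timestamps are local; no
 * peer-supplied time value is trusted. A production version would
 * likely prefer a constant-time comparison over memcmp.
 */
static int ex_probe_echo(struct ex_probe *p, const uint8_t *echo,
                         uint64_t now_ns, uint64_t *rtt_ns)
{
        if (!p->pending || memcmp(p->nonce, echo, EX_NONCE_LEN) != 0)
                return 0;

        p->pending = 0;
        *rtt_ns = now_ns - p->t_sent_ns;
        return 1;
}
```

Clearing pending on the first valid echo means a second copy of the same echo is rejected as well.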


16. Stream-mode flows

When a flow is allocated with qos.service == SVC_STREAM, both peers switch to byte-stream semantics, layered on top of the FRTX reorder machinery already described in Sections 6-8.

16.1. Send

The sender splits the caller's octets into chunks of at most (frag_mtu - base PCI - stream PCI extension) octets (Sections 1.1 and 1.5). Each chunk is one DATA packet with its own seqno and a [start, end) byte range copied from a monotonic stream counter. In stream mode FFGM and LFGM are unused and MUST be transmitted as zero; the per-byte position is carried by the [start, end) extension instead.
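
The chunking rule can be sketched as a loop over a monotonic stream counter. The sketch assumes per_pkt has already been computed as (frag_mtu - base PCI - stream PCI extension) and that offsets are 32-bit; struct ex_chunk and ex_stream_split are hypothetical names, not FRCT symbols.

```c
#include <stddef.h>
#include <stdint.h>

struct ex_chunk {
        uint32_t start; /* first stream byte carried      */
        uint32_t end;   /* one past the last byte carried */
};

/*
 * Split len octets into chunks of at most per_pkt octets, advancing
 * the monotonic stream counter *ctr. Fills at most max entries of
 * out and returns the number of chunks the whole write needs.
 */
static size_t ex_stream_split(uint32_t *ctr, size_t len, size_t per_pkt,
                              struct ex_chunk *out, size_t max)
{
        size_t n = 0;

        if (per_pkt == 0)
                return 0; /* no room for payload: nothing can be sent */

        while (len > 0) {
                size_t take = len < per_pkt ? len : per_pkt;

                if (n < max) {
                        out[n].start = *ctr;
                        out[n].end   = *ctr + (uint32_t) take;
                }
                *ctr += (uint32_t) take;
                len  -= take;
                ++n;
        }

        return n;
}
```

A 2500-octet write starting at stream position 100 with per_pkt = 1000 yields the ranges [100, 1100), [1100, 2100) and [2100, 2600); each would travel in its own DATA packet with its own seqno.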

End-of-stream is signalled with a 0-byte DATA packet that has FIN (bit 12) set, emitted on the FIN triggers listed in Section 1.2 (WR-half close, flow_dealloc, and any other path that yields the final byte). The sender MUST emit at most one FIN per flow; its [start, end) MUST equal [final-byte, final-byte), i.e., an empty interval at the final byte position (a final-size invariant analogous to QUIC's, RFC 9000 sec. 4.5). Idempotency is enforced by an snd_fin_sent guard.
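
A minimal sketch of the at-most-once FIN rule follows. Only the snd_fin_sent guard name comes from the text above; struct ex_snd, struct ex_fin and ex_send_fin are hypothetical, and actual packet emission is left out.

```c
#include <stdint.h>

struct ex_snd {
        uint32_t stream_ctr;   /* next stream byte position to send */
        int      snd_fin_sent; /* set once the FIN has been emitted */
};

struct ex_fin {
        uint32_t start;
        uint32_t end;
};

/* Returns 1 and fills *fin on the first call, 0 on every later call. */
static int ex_send_fin(struct ex_snd *s, struct ex_fin *fin)
{
        if (s->snd_fin_sent)
                return 0; /* idempotent: never emit a second FIN */

        s->snd_fin_sent = 1;
        fin->start = s->stream_ctr; /* empty interval at the */
        fin->end   = s->stream_ctr; /* final byte position   */

        return 1;
}
```

Every FIN trigger can then call the same function without coordinating with the others, since only the first call produces a packet.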

16.2. Receive

On arrival the receiver places the payload directly into a per-flow byte-indexed receive ring of width ring_sz (octets) at the position indicated by start, with a two-segment memcpy across the ring boundary if needed. Receipt is recorded in the FRTX reorder machinery (Section 6.2), augmented with the packet's start, end, and FIN bit per slot. When a packet's [start, end) front-overlaps bytes already at or below the byte high-water mark, the overlap is trimmed before placement so the same byte is never written twice. After stashing, the receiver advances lwe and the byte high-water mark across any newly-contiguous prefix. Each slot advanced MUST satisfy `start == the last-delivered slot's end`; a slot whose start does not equal that end is silently dropped at delivery time (the seqno is consumed, no stream bytes are contributed) and the high-water mark does not advance past it. The byte stream stalls at that point; a mismatch does not tear down the flow. This filters spliced or off-path-injected slots that fall in window without strong cryptographic authentication.
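
The placement step (front-overlap trim followed by a two-segment memcpy) might look like the sketch below. It assumes the caller has already checked that [start, end) fits the advertised window, and that ring_sz is a power of two so that 32-bit offset wrap-around stays aligned with the ring index. ex_ring_put is a hypothetical helper, not an FRCT function.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy payload bytes [start, end) into a byte ring of ring_sz octets,
 * trimming any front overlap with bytes below the high-water mark
 * hwm. Returns the number of octets written into the ring.
 */
static size_t ex_ring_put(uint8_t *ring, size_t ring_sz, uint32_t hwm,
                          uint32_t start, uint32_t end,
                          const uint8_t *data)
{
        size_t len;
        size_t pos;
        size_t first;

        if ((int32_t)(start - hwm) < 0) { /* front overlap */
                uint32_t trim = hwm - start;

                if (trim >= end - start)
                        return 0; /* nothing new in this packet */
                data  += trim;
                start += trim;
        }

        len   = (size_t)(end - start);
        pos   = start % ring_sz;
        first = ring_sz - pos; /* room before the ring boundary */
        if (first > len)
                first = len;

        memcpy(ring + pos, data, first);         /* up to the boundary */
        memcpy(ring, data + first, len - first); /* wrapped remainder  */

        return len;
}
```

With an 8-octet ring, hwm = 4 and a packet covering [2, 10), the first two octets are trimmed, four octets land at ring positions 4..7 and the remaining two wrap to positions 0..1.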

A FIN slot marks end-of-stream at advance time only if its byte position equals the last-delivered slot's end; otherwise the FIN is ignored and the corresponding seqno occupies a slot but contributes no stream bytes. No packet buffer is held after the ring copy.
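
The advance rule for DATA and FIN slots can be sketched as a loop over the reorder ring. The slot layout, the ring size EX_RQ_SIZE and all ex_-prefixed names are hypothetical; the sketch models only the position checks described above, not ACK generation or buffer release.

```c
#include <stdint.h>

#define EX_RQ_SIZE 8u /* hypothetical reorder-ring size */

struct ex_slot {
        int      present; /* a packet is stashed for this seqno */
        int      fin;     /* FIN bit of that packet             */
        uint32_t start;
        uint32_t end;
};

struct ex_rcv {
        uint32_t lwe; /* next seqno to deliver                   */
        uint32_t hwm; /* byte high-water mark (last slot's end)  */
        int      eos; /* end-of-stream observed                  */
};

/* Advance lwe over every present slot; only matching slots move hwm. */
static void ex_advance(struct ex_rcv *r, struct ex_slot *rq)
{
        for (;;) {
                struct ex_slot *s = &rq[r->lwe % EX_RQ_SIZE];

                if (!s->present)
                        break;

                if (s->start == r->hwm) {
                        if (!s->fin)
                                r->hwm = s->end; /* contiguous data   */
                        else if (s->end == s->start)
                                r->eos = 1;      /* well-placed FIN   */
                }
                /* On mismatch the seqno is consumed without bytes. */
                s->present = 0;
                ++r->lwe;
        }
}
```

After a mismatched slot, later slots keep failing the start check because hwm no longer moves, which is the stall described above.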

16.3. Read

flow_read returns up to count octets from the contiguous prefix [next, high-water), where next is the byte offset up to which the application has already consumed and high-water is the end of the contiguous run of bytes received so far. When the stream is fully drained AND end-of-stream (EOS) was observed (next == EOS byte position), flow_read returns 0 (EOF) - the same shape POSIX read(2) uses on TCP after a peer FIN.
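
The read rule can be sketched over the same byte ring. This is not the real flow_read: ex_stream_read and struct ex_stream are hypothetical, ring_sz is assumed to be a power of two, and the -1 return for "no data yet, no EOS" is a placeholder for whatever blocking or error behaviour the library actually applies.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct ex_stream {
        const uint8_t *ring;
        size_t         ring_sz;
        uint32_t       next;    /* bytes consumed up to here       */
        uint32_t       hwm;     /* end of the contiguous prefix    */
        int            eos;     /* end-of-stream observed          */
        uint32_t       eos_pos; /* byte position of end-of-stream  */
};

/*
 * Returns the number of octets copied, 0 on EOF, or -1 when nothing
 * is readable yet (placeholder for blocking behaviour).
 */
static long ex_stream_read(struct ex_stream *s, uint8_t *buf, size_t count)
{
        size_t avail = (size_t)(s->hwm - s->next);
        size_t n;
        size_t pos;
        size_t first;

        if (avail == 0)
                return (s->eos && s->next == s->eos_pos) ? 0 : -1;

        n     = count < avail ? count : avail;
        pos   = s->next % s->ring_sz;
        first = s->ring_sz - pos;
        if (first > n)
                first = n;

        memcpy(buf, s->ring + pos, first);       /* up to the boundary */
        memcpy(buf + first, s->ring, n - first); /* wrapped remainder  */
        s->next += (uint32_t) n;

        return (long) n;
}
```

EOF is reported only once the prefix is drained and next has reached the EOS position, so bytes received before the FIN are never lost to an early 0 return.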

16.4. Flow control

ACK / SACK / RACK / RTO machinery is unchanged; the FRTX reorder ring is reused as a per-seqno received-bitmap. Let per_pkt = (frag_mtu - base PCI - stream PCI extension), the maximum stream-byte payload one DATA packet can carry (Section 16.1). The receive window advertised in FC is clamped so the byte window (ring_sz) cannot be overrun: the seqno-space rwe is at most `rcv_cr.lwe + ring_sz / per_pkt`.
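
The clamp is a single modular comparison. The sketch below reuses the before() comparator from the Notation section; ex_clamp_rwe is a hypothetical helper and the numbers in the note are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Modular comparator from the Notation section. */
static int before(uint32_t a, uint32_t b)
{
        return (int32_t)(a - b) < 0;
}

/*
 * Clamp a candidate rwe so that the seqno window never admits more
 * than ring_sz octets: rwe <= lwe + ring_sz / per_pkt (modulo 2^32).
 */
static uint32_t ex_clamp_rwe(uint32_t lwe, uint32_t rwe,
                             size_t ring_sz, size_t per_pkt)
{
        uint32_t cap;

        assert(per_pkt > 0);
        cap = lwe + (uint32_t)(ring_sz / per_pkt);

        return before(cap, rwe) ? cap : rwe;
}
```

With ring_sz = 65536 and per_pkt = 1400 the cap is lwe + 46 packets; integer division rounds down, so 46 full-size packets (64400 octets) always fit in the ring.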

This is the QUIC byte-credit flow-control model (MAX_STREAM_DATA, RFC 9000 sec. 4.1 and sec. 19.10) projected onto seqno space. With one stream per flow there is no MAX_DATA / MAX_STREAM_DATA distinction. Receiver-side silly-window-syndrome (SWS) avoidance (RFC 9293 sec. 3.8.6.2.2) is achieved by combining the consume-time rwe bump with the global non-shrink rule from Section 11.

16.5. Security considerations

Threat model. An attacker that can observe (on-path passive) or predict (off-path blind) the flow's seqnos and byte offsets on an unencrypted stream flow can inject DATA or FIN at any in-window position. The in-line consistency checks above (start == prior end on advance; FIN MUST be 0-byte; FIN MUST sit at the final byte position) realise the spirit of RFC 5961's "sequence-window plus exact-position match for control bits" without an explicit challenge-ACK probe; they make a few specific blind attack shapes harder but are not cryptographic authentication. This is comparable to TCP without the TCP Authentication Option (TCP-AO, RFC 5925), tighter than a pre-RFC-5961 TCP stack, and roughly equivalent to a modern RFC 5961 stack against blind off-path injection - none of these help once the attacker can sniff. TLS over TCP (RFC 8446) encrypts only the TCP payload and leaves TCP seqnos, ACKs, FIN, and RST in the clear, so TLS does NOT defend against TCP-header-level injection; QUIC (RFC 9000) hides packet numbers under header protection (RFC 9001 sec. 5.4), so this specific weakness does not apply to QUIC.

Mitigation: AEAD. When the flow has encryption enabled, the recommended AEAD ciphers (AES-GCM, RFC 5288; or ChaCha20-Poly1305, RFC 8439) wrap the entire FRCP packet on the wire - PCI, stream extension, body, and the CRC trailer when ber == 0 - under a per-flow symmetric key derived from the flow's own key exchange (Section 1.1). The AEAD tag (~2^-128 forgery probability) dominates the CRC (~2^-32) for integrity in this mode, but the CRC trailer is currently retained inside the wrap (see Section 1.1). Implementations MUST NOT rely on the security properties below when a non-AEAD cipher (e.g. AES-CTR alone) is negotiated; non-AEAD modes provide confidentiality only and the threat-model claims do not hold.

With an AEAD cipher in use, seqnos, byte offsets, and the FIN bit are both authenticated and confidential. Against an off-path or on-path-passive attacker this is:

  - Stronger than TCP+TLS (TCP header in the clear).
  - Stronger than TCP+TCP-AO (header authenticated but visible).
  - Comparable to IPsec ESP transport mode (RFC 4303), which
    similarly authenticates and encrypts the upper-layer header
    plus payload, and to QUIC packet protection (RFC 9001 sec. 5),
    with the difference that QUIC must leave the destination
    connection ID in the clear for routing whereas FRCP relies on
    the IPCP below for delivery and can therefore encrypt its
    entire PCI.

Keying granularity. FRCP runs key exchange (kex) per flow, so each flow_alloc yields independent symmetric keys. This is finer-grained than QUIC (per-connection, RFC 9001, where one handshake covers all multiplexed streams) and finer-grained than typical IPsec deployment (per-host-pair Security Associations, SAs). Forward secrecy follows from the kex when an ephemeral Diffie-Hellman exchange (DHE), or a hybrid mode (classical DH + post-quantum Key Encapsulation Mechanism / KEM), is selected.

Replay protection. The AEAD layer itself does NOT carry an explicit anti-replay window (unlike IPsec ESP, RFC 4303 sec. 3.4.3, or DTLS, RFC 9147 sec. 4.5.1). For FRCP-engaged flows the seqno-space duplicate-suppression in Section 6.2 rejects replayed DATA after the AEAD strips the wrap: the AEAD authenticates the seqno, so a replay can only re-present an old seqno, which is then discarded either as a duplicate (still inside the receive window) or as outside the receive window, depending on how far lwe has advanced since the original packet was delivered. RAW (qos.service == SVC_RAW) flows have no FRCP layer and therefore no replay protection at all, since the AEAD layer provides none either; deployments that need replay rejection on RAW flows MUST provide it at a higher layer.
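
The two rejection outcomes for a replayed seqno can be sketched with the before() comparator from the Notation section. The window is assumed to be half-open, [lwe, rwe), and no wider than EX_RQ_SIZE; struct ex_rcv_win, the seen[] array and ex_check_seqno are hypothetical stand-ins for the Section 6.2 machinery.

```c
#include <stdint.h>

#define EX_RQ_SIZE 64u /* hypothetical reorder-ring size */

/* Modular comparator from the Notation section. */
static int before(uint32_t a, uint32_t b)
{
        return (int32_t)(a - b) < 0;
}

enum ex_verdict { EX_ACCEPT, EX_DUPLICATE, EX_OUT_OF_WINDOW };

struct ex_rcv_win {
        uint32_t lwe;              /* left window edge            */
        uint32_t rwe;              /* right window edge (assumed  */
                                   /* exclusive in this sketch)   */
        uint8_t  seen[EX_RQ_SIZE]; /* per-seqno received marks    */
};

/* Classify an authenticated seqno after the AEAD wrap is removed. */
static enum ex_verdict ex_check_seqno(struct ex_rcv_win *w, uint32_t seqno)
{
        if (before(seqno, w->lwe) || !before(seqno, w->rwe))
                return EX_OUT_OF_WINDOW; /* lwe already moved past it */

        if (w->seen[seqno % EX_RQ_SIZE])
                return EX_DUPLICATE;     /* still inside the window   */

        w->seen[seqno % EX_RQ_SIZE] = 1;

        return EX_ACCEPT;
}
```

Either way the replay is discarded; which branch fires depends only on whether lwe has passed the replayed seqno.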

Layering. The AEAD wrap sits below FRCP on the data path, so RAW best-effort flows (qos.service == SVC_RAW, the UDP-equivalent service of Section 2.2) inherit the same per-flow integrity + confidentiality scope as FRCP-engaged flows - whatever the IPCP and FRCP (if any) put on the wire is what the AEAD authenticates. No DTLS-equivalent layering is required for confidentiality and integrity; replay protection above AEAD is a separate concern as noted above.


17. References

This section lists the IETF documents, published works, and source-code references cited inline elsewhere in this document. IETF documents are cited inline as "RFC NNNN sec. X.Y"; books, journal papers, and source-code references are cited inline by author and year (or by file and function name) and are listed here for convenience.


17.1. IETF documents

  [RFC 791]   J. Postel, "Internet Protocol", STD 5, RFC 791,
              September 1981.
  [RFC 793]   J. Postel, "Transmission Control Protocol", STD 7,
              RFC 793, September 1981.  Obsoleted by RFC 9293.
  [RFC 813]   D. D. Clark, "Window and Acknowledgement Strategy
              in TCP", RFC 813, July 1982.
  [RFC 896]   J. Nagle, "Congestion Control in IP/TCP
              Internetworks", RFC 896, January 1984.
  [RFC 1122]  R. Braden (ed.), "Requirements for Internet Hosts
              -- Communication Layers", STD 3, RFC 1122,
              October 1989.
  [RFC 2018]  M. Mathis, J. Mahdavi, S. Floyd, A. Romanow,
              "TCP Selective Acknowledgment Options", RFC 2018,
              October 1996.
  [RFC 2119]  S. Bradner, "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.
  [RFC 2883]  S. Floyd, J. Mahdavi, M. Mathis, M. Podolsky,
              "An Extension to the Selective Acknowledgement
              (SACK) Option for TCP", RFC 2883, July 2000.
  [RFC 3168]  K. Ramakrishnan, S. Floyd, D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, September 2001.
  [RFC 3758]  R. Stewart, M. Ramalho, Q. Xie, M. Tuexen,
              P. Conrad, "Stream Control Transmission Protocol
              (SCTP) Partial Reliability Extension", RFC 3758,
              May 2004.
  [RFC 3828]  L-A. Larzon, M. Degermark, S. Pink, L-E. Jonsson
              (ed.), G. Fairhurst (ed.), "The Lightweight User
              Datagram Protocol (UDP-Lite)", RFC 3828,
              July 2004.
  [RFC 4303]  S. Kent, "IP Encapsulating Security Payload
              (ESP)", RFC 4303, December 2005.
  [RFC 4340]  E. Kohler, M. Handley, S. Floyd, "Datagram
              Congestion Control Protocol (DCCP)", RFC 4340,
              March 2006.
  [RFC 5288]  J. Salowey, A. Choudhury, D. McGrew, "AES Galois
              Counter Mode (GCM) Cipher Suites for TLS",
              RFC 5288, August 2008.
  [RFC 5595]  G. Fairhurst, "The Datagram Congestion Control
              Protocol (DCCP) Service Codes", RFC 5595,
              September 2009.
  [RFC 5681]  M. Allman, V. Paxson, E. Blanton, "TCP Congestion
              Control", RFC 5681, September 2009.
  [RFC 5925]  J. Touch, A. Mankin, R. Bonica, "The TCP
              Authentication Option", RFC 5925, June 2010.
  [RFC 5961]  A. Ramaiah, R. Stewart, M. Dalal, "Improving
              TCP's Robustness to Blind In-Window Attacks",
              RFC 5961, August 2010.
  [RFC 6298]  V. Paxson, M. Allman, J. Chu, M. Sargent,
              "Computing TCP's Retransmission Timer", RFC 6298,
              June 2011.
  [RFC 6528]  F. Gont, S. Bellovin, "Defending against Sequence
              Number Attacks", RFC 6528, February 2012.
              Obsoletes RFC 1948.
  [RFC 6582]  T. Henderson, S. Floyd, A. Gurtov, Y. Nishida,
              "The NewReno Modification to TCP's Fast Recovery
              Algorithm", RFC 6582, April 2012.
  [RFC 7323]  D. Borman, B. Braden, V. Jacobson,
              R. Scheffenegger (ed.), "TCP Extensions for High
              Performance", RFC 7323, September 2014.
  [RFC 7675]  M. Perumal, D. Wing, R. Ravindranath, T. Reddy,
              M. Thomson, "Session Traversal Utilities for NAT
              (STUN) Usage for Consent Freshness", RFC 7675,
              October 2015.
  [RFC 8174]  B. Leiba, "Ambiguity of Uppercase vs Lowercase in
              RFC 2119 Key Words", BCP 14, RFC 8174, May 2017.
  [RFC 8200]  S. Deering, R. Hinden, "Internet Protocol,
              Version 6 (IPv6) Specification", STD 86, RFC 8200,
              July 2017.
  [RFC 8439]  Y. Nir, A. Langley, "ChaCha20 and Poly1305 for IETF
              Protocols", RFC 8439, June 2018.
  [RFC 8446]  E. Rescorla, "The Transport Layer Security (TLS)
              Protocol Version 1.3", RFC 8446, August 2018.
  [RFC 8985]  Y. Cheng, N. Cardwell, N. Dukkipati, P. Jha,
              "The RACK-TLP Loss Detection Algorithm for TCP",
              RFC 8985, February 2021.
  [RFC 9000]  J. Iyengar (ed.), M. Thomson (ed.), "QUIC: A
              UDP-Based Multiplexed and Secure Transport",
              RFC 9000, May 2021.
  [RFC 9001]  M. Thomson (ed.), S. Turner (ed.), "Using TLS to
              Secure QUIC", RFC 9001, May 2021.
  [RFC 9002]  J. Iyengar (ed.), I. Swett (ed.), "QUIC Loss
              Detection and Congestion Control", RFC 9002,
              May 2021.
  [RFC 9147]  E. Rescorla, H. Tschofenig, N. Modadugu,
              "The Datagram Transport Layer Security (DTLS)
              Protocol Version 1.3", RFC 9147, April 2022.
  [RFC 9260]  R. Stewart, M. Tuexen, K. Nielsen, "Stream Control
              Transmission Protocol", RFC 9260, June 2022.
              Obsoletes RFC 4960.
  [RFC 9293]  W. Eddy (ed.), "Transmission Control Protocol
              (TCP)", STD 7, RFC 9293, August 2022.  Obsoletes
              RFC 793 and several follow-ons; updates RFC 1122
              and others.


17.2. Books and journal papers

  - J. Day, "Patterns in Network Architecture: A Return to
    Fundamentals", Prentice Hall, 2008.
  - E. Grasa et al., "IRATI: investigating RINA as an alternative
    to TCP/IP", Computer Networks, Vol. 92, December 2015.
  - P. Karn, C. Partridge, "Improving Round-Trip Time Estimates
    in Reliable Transport Protocols", ACM SIGCOMM, August 1987.
  - R. W. Watson, "Timer-Based Mechanisms in Reliable Transport
    Protocol Connection Management", Computer Networks, Vol. 5,
    1981.


17.3. Source-code references

  - tcp_rtt_estimator() in net/ipv4/tcp_input.c of the Linux
    kernel, defining the asymmetric mdev variance update used as
    FRCP's default RTT estimator (Section 12).  Line-stable
    browseable copy at
    https://elixir.bootlin.com/linux/latest/source/net/ipv4/tcp_input.c.