Flow and Retransmission Control Protocol: Difference between revisions

From Ouroboros
 
(10 intermediate revisions by the same user not shown)
Line 1: Line 1:
= FRCP - Flow and Retransmission Control Protocol =
{{DISPLAYTITLE:FRCP - Flow and Retransmission Control Protocol}}


FRCP runs end-to-end between two peers over a flow.  It delivers
FRCP runs end-to-end between two peers over a flow.  It delivers
Line 9: Line 9:


FRCT (Flow and Retransmission Control Task) is the libouroboros
FRCT (Flow and Retransmission Control Task) is the libouroboros
implementation of FRCP; the task lives in src/lib/frct.c.  The
implementation of FRCP; the task lives in <code>src/lib/frct.c</code>.  The
remainder of this document describes the FRCP wire protocol and the
remainder of this document describes the FRCP wire protocol and the
behaviour FRCT realises.  Code symbols retain the FRCT_ prefix
behaviour FRCT realises.  Code symbols retain the <code>FRCT_</code> prefix
(FRCT_DATA, FRCT_RXM, ...) because they belong to the implementing
(<code>FRCT_DATA</code>, <code>FRCT_RXM</code>, ...) because they belong to the implementing
task; this document references them verbatim.
task; this document references them verbatim.


The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
this document are to be interpreted as described in BCP 14 (Best
this document are to be interpreted as described in BCP 14 (Best
Current Practice; RFC 2119, RFC 8174) when, and only when, they
Current Practice; RFC 2119, RFC 8174) when, and only when, they
appear in all capitals.
appear in all capitals.
Line 24: Line 24:
== Notation ==
== Notation ==


<pre>
;<code>u32</code>, <code>u8</code>
  u32, u8       Unsigned 32-bit / 8-bit integers (kernel-C style).
:Unsigned 32-bit / 8-bit integers (kernel-C style).
  ns           Nanoseconds.
;<code>ns</code>
</pre>
:Nanoseconds.


Modular sequence-number comparators (32-bit, modulo 2^32):
Modular sequence-number comparators (32-bit, modulo 2^32):


<pre>
;<code>before(a, b)</code>
    before(a, b) ==  (int32_t)(a - b) < 0
:<code>(int32_t)(a - b) &lt; 0</code>
    after(a, b)   ==  before(b, a)
;<code>after(a, b)</code>
</pre>
:<code>before(b, a)</code>


Used throughout for ackno / seqno ordering checks.
Used throughout for <code>ackno</code> / <code>seqno</code> ordering checks.
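The two comparators can be sketched in C as below. This is a minimal illustration of the definitions above, not a copy of the libouroboros source; the signed cast relies on the usual two's-complement conversion that mainstream compilers provide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when a precedes b in modulo-2^32 sequence space. */
static inline bool before(uint32_t a, uint32_t b)
{
        return (int32_t)(a - b) < 0;
}

/* True when a follows b; the mirror image of before(). */
static inline bool after(uint32_t a, uint32_t b)
{
        return before(b, a);
}

/* Ordering survives the wrap from 0xFFFFFFFF back to 0. */
static inline void wrap_example(void)
{
        assert(before(0xFFFFFFFFu, 0u));
        assert(after(5u, 0xFFFFFFF0u));
}
```

The result is only meaningful while the two values are less than 2^31 apart, which is why the window bounds later in this document stay well under that half-range.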


Round-Trip Time (RTT) abbreviations used throughout:
Round-Trip Time (RTT) abbreviations used throughout:


<pre>
;<code>SRTT</code>
    SRTT          Smoothed RTT estimate (RFC 6298).
:Smoothed RTT estimate (RFC 6298).
    mdev         Mean deviation of RTT (Linux variance estimator).
;<code>mdev</code>
    EWMA         Exponentially Weighted Moving Average.
:Mean deviation of RTT (Linux variance estimator).
    RTO           Retransmission Timeout, max(RTO_MIN,
;<code>EWMA</code>
                  srtt + (mdev << MDEV_MUL)).
:Exponentially Weighted Moving Average.
</pre>
;<code>RTO</code>
:Retransmission Timeout, <code>max(RTO_MIN, srtt + (mdev &lt;&lt; MDEV_MUL))</code>.
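As an illustration of the RTO expression, the sketch below evaluates it in nanoseconds. The values chosen for <code>RTO_MIN</code> and <code>MDEV_MUL</code> are placeholders picked for the example; this document does not state the values FRCT uses.

```c
#include <assert.h>
#include <stdint.h>

#define RTO_MIN_NS 250000ULL /* hypothetical floor in ns (assumption) */
#define MDEV_MUL   2         /* hypothetical shift, i.e. mdev * 4 (assumption) */

/* RTO = max(RTO_MIN, srtt + (mdev << MDEV_MUL)), all values in ns. */
static inline uint64_t rto_ns(uint64_t srtt, uint64_t mdev)
{
        uint64_t rto = srtt + (mdev << MDEV_MUL);

        return rto < RTO_MIN_NS ? RTO_MIN_NS : rto;
}

/* A 1 ms SRTT with 0.1 ms deviation gives 1.4 ms under these constants. */
static inline void rto_example(void)
{
        assert(rto_ns(1000000, 100000) == 1400000);
}
```

The shift makes the deviation multiplier a power of two, so the whole estimate needs no multiplication or division.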


Timer-bound symbols t_a (a-timer, ACK delay) and t_r (r-timer,
Timer-bound symbols <code>t_a</code> (a-timer, ACK delay) and <code>t_r</code> (r-timer,
retransmission window) are defined in Section 8; t_mpl (Maximum
retransmission window) are defined in [[#8. Retransmission|Section 8]]; <code>t_mpl</code> (Maximum
Packet Lifetime) is introduced in Section 2.1 (the inact field)
Packet Lifetime) is introduced in [[#2.1. Per-flow state|Section 2.1]] (the <code>inact</code> field)
with heritage in Section 15.
with heritage in [[#15. Heritage and adopted techniques|Section 15]].


Wire-format diagrams follow the IETF convention: bit 0 is the
Wire-format diagrams follow the IETF convention: bit 0 is the
leftmost (most significant) bit and fields are in network byte
leftmost (most significant) bit and fields are in network byte
order unless stated otherwise.
order unless stated otherwise.
__TOC__




Line 64: Line 67:
Fixed 16-octet base Protocol-Control Information (PCI) header
Fixed 16-octet base Protocol-Control Information (PCI) header
prefixed to every FRCP packet (RFC convention: bit 0 leftmost,
prefixed to every FRCP packet (RFC convention: bit 0 leftmost,
most-significant bit first).  All multi-byte fields except hcs
most-significant bit first).  All multi-byte fields are in network byte order. DATA packets on
are in network byte order; hcs is an opaque 16-bit value that
the receiver recomputes from the wire bytes and compares to the
in-place pci->hcs read, so its on-wire byte order need only
match between peers running compatible builds. DATA packets on
stream-mode flows carry an additional 8-octet extension (see
stream-mode flows carry an additional 8-octet extension (see
Section 1.5); SACK and RTTP carry their own payloads after the
[[#1.5. Stream PCI extension|Section 1.5]]); SACK and RTTP carry their own payloads after the
base PCI.
base PCI.


<pre>
<syntaxhighlight lang="text">
     0                  1                  2                  3
     0                  1                  2                  3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
Line 87: Line 86:
     |                    payload (variable) ...
     |                    payload (variable) ...
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</pre>
</syntaxhighlight>


<pre>
;<code>flags</code>
  flags  - feature/type bitmap (see 1.2).
:feature/type bitmap (see [[#1.2. Flag bits|Section 1.2]]).
  hcs     - CRC-16-CCITT-FALSE Header Check Sequence (HCS) over
;<code>hcs</code>
            flags + window + seqno + ackno (+ stream extension when
:CRC-16-CCITT-FALSE Header Check Sequence (HCS) over <code>flags</code> + <code>window</code> + <code>seqno</code> + <code>ackno</code> (+ stream extension when present); the two octets of the <code>hcs</code> field itself are omitted from the CRC input.  Verified on receive before any flag-driven dispatch.
            present); the two octets of the hcs field itself are
;<code>window</code>
            omitted from the CRC input.  Verified on receive before
:receiver-advertised right window edge (valid iff FC).
            any flag-driven dispatch.
;<code>seqno</code>
  window - receiver-advertised right window edge (valid iff FC).
:per-flow sequence number.
  seqno   - per-flow sequence number.
;<code>ackno</code>
  ackno   - cumulative Acknowledgement (ACK) (valid iff ACK).
:cumulative Acknowledgement (ACK) (valid iff ACK).
</pre>
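The HCS computation can be sketched as a bitwise CRC-16/CCITT-FALSE (polynomial <code>0x1021</code>, init <code>0xFFFF</code>, no reflection, no final XOR) run over the base PCI with the two <code>hcs</code> octets skipped. The field offsets below assume the base header is laid out in the order of the field list above (<code>flags</code>, <code>hcs</code>, <code>window</code>, <code>seqno</code>, <code>ackno</code>); the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_LEN 16 /* base PCI size in octets */
#define HCS_OFF 2  /* assumed offset of hcs, directly after flags */
#define HCS_LEN 2

/* Feed len octets into a running CRC-16/CCITT-FALSE value. */
static inline uint16_t crc16_update(uint16_t crc, const uint8_t *buf,
                                    size_t len)
{
        size_t i;
        int    bit;

        for (i = 0; i < len; ++i) {
                crc ^= (uint16_t)(buf[i] << 8);
                for (bit = 0; bit < 8; ++bit)
                        crc = (crc & 0x8000)
                                ? (uint16_t)((crc << 1) ^ 0x1021)
                                : (uint16_t)(crc << 1);
        }

        return crc;
}

/* HCS over the base PCI, leaving the hcs field out of the CRC input. */
static inline uint16_t pci_hcs(const uint8_t pci[PCI_LEN])
{
        uint16_t crc = 0xFFFF;

        assert(pci != NULL);

        crc = crc16_update(crc, pci, HCS_OFF);
        crc = crc16_update(crc, pci + HCS_OFF + HCS_LEN,
                           PCI_LEN - HCS_OFF - HCS_LEN);

        return crc;
}
```

On a stream-mode DATA packet the 8-octet extension would be fed into the same running CRC after the base header.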


A single packet can simultaneously carry DATA + ACK + FC (Flow
A single packet can simultaneously carry DATA + ACK + FC (Flow
Line 110: Line 108:
PCI.
PCI.


Optional framing (per-flow, see Section 2.2).  On the wire, the
Optional framing (per-flow, see [[#2.2. Service modes (orthogonal axes)|Section 2.2]]).  On the wire, the
order from inside out is:
order from inside out is:


<pre>
{| class="wikitable"
    [   PCI + body         ]   -- the FRCP packet
! Layer !! Scope
    [   PCI + body + CRC-32 ]   -- CRC-32 covers the body only (PCI
|-
                                    is in HCS); appended iff qs.ber
| <code>[ PCI + body ]</code>
                                    == 0 on DATA, or on every SACK
| The FRCP packet.
                                    packet
|-
    [ AEAD-wrap of above   ]   -- iff Authenticated Encryption
| <code>[ PCI + body + CRC-32 ]</code>
                                    with Associated Data (AEAD) is
| CRC-32 covers the body only (PCI is in HCS); appended iff <code>qs.ber == 0</code> on DATA, or on every SACK packet.
                                    enabled
|-
</pre>
| <code>[ AEAD-wrap of above ]</code>
| Iff Authenticated Encryption with Associated Data (AEAD) is enabled.
|}


<pre>
* HCS in the PCI covers the header fields on every packet and is verified before any flag-driven dispatch.
  - HCS in the PCI covers the header fields on every packet and is
* The CRC-32 trailer (IEEE 802.3 / zlib reflected polynomial <code>0xEDB88320</code>, init <code>0xFFFFFFFF</code>, xor-out <code>0xFFFFFFFF</code>) covers the body on DATA when <code>qs.ber == 0</code> and on every SACK packet. The PCI is not under the CRC (Cyclic Redundancy Check) because the HCS already protects it. It is appended before AEAD encryption and therefore rides inside the AEAD wrap when both are active; the AEAD tag (~2^-128 forgery probability) dominates the CRC (~2^-32) for integrity in that mode but the CRC trailer is currently retained.
    verified before any flag-driven dispatch.
* When encryption is enabled, the entire (possibly-CRC'd) FRCP packet is wrapped with AEAD inside the shared-memory packet buffer (<code>spb</code>, <code>struct ssm_pk_buff</code>); the packet grows by the AEAD overhead, namely a leading nonce / Initialization Vector (IV) of <code>headsz</code> bytes (<code>crypt_get_ivsz</code>) and a trailing authentication tag of <code>tailsz</code> bytes (<code>crypt_get_tagsz</code>).
  - The CRC-32 trailer (IEEE 802.3 / zlib reflected polynomial
    0xEDB88320, init 0xFFFFFFFF, xor-out 0xFFFFFFFF) covers the
    body on DATA when qs.ber == 0 and on every SACK packet; the
    trailer is written as a raw uint32_t (the same convention as
    hcs: opaque on the wire as long as both peers run compatible
    builds). The PCI is not under the CRC (Cyclic Redundancy
    Check) because the HCS already protects it. It is
    appended before AEAD encryption and therefore rides inside the
    AEAD wrap when both are active; the AEAD tag (~2^-128 forgery
    probability) dominates the CRC (~2^-32) for integrity in that
    mode but the CRC trailer is currently retained.
  - When encryption is enabled, the entire (possibly-CRC'd) FRCP
    packet is wrapped with AEAD inside the shared-memory packet
    buffer (spb, struct ssm_pk_buff); the packet grows by the AEAD
    overhead, namely a leading nonce / Initialization Vector (IV)
    of headsz bytes (crypt_get_ivsz) and a trailing authentication
    tag of tailsz bytes (crypt_get_tagsz).
</pre>


Both CRC and AEAD are layered around the FRCP wire format and
Both CRC and AEAD are layered around the FRCP wire format and are not visible to the FRCP machinery itself.
are not visible to the FRCP machinery itself.
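The body trailer uses the standard reflected CRC-32 with the parameters quoted above. A bitwise sketch follows; a production build would more likely use a table-driven or hardware-assisted routine. The helper <code>needs_crc32</code> is a hypothetical name that restates the "appended iff" rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-32 (IEEE 802.3 / zlib): reflected poly 0xEDB88320,
 * init 0xFFFFFFFF, xor-out 0xFFFFFFFF. */
static inline uint32_t crc32_body(const uint8_t *buf, size_t len)
{
        uint32_t crc = 0xFFFFFFFFu;
        size_t   i;
        int      bit;

        assert(buf != NULL || len == 0);

        for (i = 0; i < len; ++i) {
                crc ^= buf[i];
                for (bit = 0; bit < 8; ++bit)
                        crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }

        return crc ^ 0xFFFFFFFFu;
}

/* Trailer is present on every SACK packet, and on DATA iff qs.ber == 0. */
static inline bool needs_crc32(bool is_sack, bool is_data, uint32_t qs_ber)
{
        return is_sack || (is_data && qs_ber == 0);
}
```

Only the body is passed to <code>crc32_body</code>; the PCI stays outside because the HCS already covers it.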


=== 1.2. Flag bits ===
=== 1.2. Flag bits ===


Flag bits are numbered most-significant-bit first to match the wire
Flag bits are numbered most-significant-bit first to match the wire
diagram (bit numbering per Section 1.1; bit 0 is the MSB of the
diagram (bit numbering per [[#1.1. PCI header|Section 1.1]]; bit 0 is the MSB of the
16-bit flags field and lands at wire-position 0 in network byte
16-bit <code>flags</code> field and lands at wire-position 0 in network byte
order).  Bits 13..15 are reserved and MUST be transmitted as zero.
order).  Bits 13..15 are reserved and MUST be transmitted as zero.


<pre>
{| class="wikitable"
    +------+--------+--------+----------------------------------------+
! Bit !! Mask !! Name !! Meaning
    | Bit | Mask   | Name   | Meaning                               |
|-
    +------+--------+--------+----------------------------------------+
| 0 || <code>0x8000</code> || <code>DATA</code> || Carries caller payload
    |   0 | 0x8000 | DATA   | Carries caller payload                 |
|-
    |   1 | 0x4000 | DRF   | Data Run Flag: start of a fresh run   |
| 1 || <code>0x4000</code> || <code>DRF</code> || Data Run Flag: start of a fresh run
    |   2 | 0x2000 | ACK   | Acknowledgement: ackno field valid     |
|-
    |   3 | 0x1000 | NACK   | Negative ACK; seqno = arrival_seqno-1 |
| 2 || <code>0x2000</code> || <code>ACK</code> || Acknowledgement: <code>ackno</code> field valid
    |   4 | 0x0800 | FC     | Flow Control: window field valid (rwe) |
|-
    |   5 | 0x0400 | RDVS   | Rendezvous probe (window-closed)       |
| 3 || <code>0x1000</code> || <code>NACK</code> || Negative ACK; <code>seqno = arrival_seqno-1</code>
    |   6 | 0x0200 | FFGM   | First Fragment (role bit 0; see below) |
|-
    |   7 | 0x0100 | LFGM   | Last Fragment (role bit 1; see below) |
| 4 || <code>0x0800</code> || <code>FC</code> || Flow Control: <code>window</code> field valid (<code>rwe</code>)
    |   8 | 0x0080 | RXM   | Retransmission                         |
|-
    |   9 | 0x0040 | SACK   | Selective ACK block list in payload   |
| 5 || <code>0x0400</code> || <code>RDVS</code> || Rendezvous probe (window-closed)
    | 10 | 0x0020 | RTTP   | RTT Probe / echo (payload follows)     |
|-
    | 11 | 0x0010 | KA     | Keepalive                             |
| 6 || <code>0x0200</code> || <code>FFGM</code> || First Fragment (role bit 0; see below)
    | 12 | 0x0008 | FIN   | End-of-stream marker (stream mode)     |
|-
    | 13-15|   --   | --   | Reserved (MUST be zero)               |
| 7 || <code>0x0100</code> || <code>LFGM</code> || Last Fragment (role bit 1; see below)
    +------+--------+--------+----------------------------------------+
|-
</pre>
| 8 || <code>0x0080</code> || <code>RXM</code> || Retransmission
|-
| 9 || <code>0x0040</code> || <code>SACK</code> || Selective ACK block list in payload
|-
| 10 || <code>0x0020</code> || <code>RTTP</code> || RTT Probe / echo (payload follows)
|-
| 11 || <code>0x0010</code> || <code>KA</code> || Keepalive
|-
| 12 || <code>0x0008</code> || <code>FIN</code> || End-of-stream marker (stream mode)
|-
| 13-15 || -- || -- || Reserved (MUST be zero)
|}
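The table maps directly onto a set of masks. In the sketch below, <code>FRCT_DATA</code>, <code>FRCT_ACK</code>, <code>FRCT_FC</code>, <code>FRCT_RXM</code>, <code>FRCT_SACK</code> and <code>FRCT_RTTP</code> are names this document uses; the remaining identifiers are assumed to follow the same pattern, and the two helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum frct_flag {
        FRCT_DATA = 0x8000,
        FRCT_DRF  = 0x4000, /* name assumed */
        FRCT_ACK  = 0x2000,
        FRCT_NACK = 0x1000, /* name assumed */
        FRCT_FC   = 0x0800,
        FRCT_RDVS = 0x0400, /* name assumed */
        FRCT_FFGM = 0x0200, /* name assumed */
        FRCT_LFGM = 0x0100, /* name assumed */
        FRCT_RXM  = 0x0080,
        FRCT_SACK = 0x0040,
        FRCT_RTTP = 0x0020,
        FRCT_KA   = 0x0010, /* name assumed */
        FRCT_FIN  = 0x0008  /* name assumed */
};

#define FRCT_RESERVED 0x0007 /* bits 13..15 */

/* MSB-first numbering: bit 0 is the top bit of the 16-bit field. */
static inline uint16_t flag_mask(unsigned bit)
{
        assert(bit <= 15);

        return (uint16_t)(0x8000u >> bit);
}

/* Reserved bits MUST be transmitted as zero. */
static inline bool flags_reserved_clear(uint16_t flags)
{
        return (flags & FRCT_RESERVED) == 0;
}
```

Because the numbering is most-significant-bit first, the mask for bit ''n'' is <code>0x8000 >> n</code> rather than the more common <code>1 << n</code>.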


The (FFGM, LFGM) pair encodes the fragment role of a DATA-bearing
The (<code>FFGM</code>, <code>LFGM</code>) pair encodes the fragment role of a DATA-bearing
Service Data Unit (SDU), SCTP-style begin/end flags (RFC 9260
Service Data Unit (SDU), SCTP-style begin/end flags (RFC 9260
sec. 3.3.1):
sec. 3.3.1):


<pre>
{| class="wikitable"
    +-----------+-------------------------------------------------+
! FFGM !! LFGM !! Role
    | FFGM LFGM | Role                                           |
|-
    +-----------+-------------------------------------------------+
| 1 || 1 || Sole / un-fragmented SDU (begin AND end)
    |   1   1   | Sole / un-fragmented SDU (begin AND end)       |
|-
    |   1   0   | First fragment of a multi-fragment SDU         |
| 1 || 0 || First fragment of a multi-fragment SDU
    |   0   0   | Middle fragment                                 |
|-
    |   0   1   | Last fragment                                   |
| 0 || 0 || Middle fragment
    +-----------+-------------------------------------------------+
|-
</pre>
| 0 || 1 || Last fragment
|}
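Decoding the role from the two bits is a four-way switch. The sketch below uses hypothetical names for the enum and the helper; only the mask values come from the flag table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FFGM_MASK 0x0200 /* bit 6: first fragment */
#define LFGM_MASK 0x0100 /* bit 7: last fragment  */

enum frag_role {
        FRAG_SOLE,   /* FFGM = 1, LFGM = 1 */
        FRAG_FIRST,  /* FFGM = 1, LFGM = 0 */
        FRAG_MIDDLE, /* FFGM = 0, LFGM = 0 */
        FRAG_LAST    /* FFGM = 0, LFGM = 1 */
};

/* Map the (FFGM, LFGM) pair of a DATA packet to its fragment role. */
static inline enum frag_role frag_role_of(uint16_t flags)
{
        bool first = (flags & FFGM_MASK) != 0;
        bool last  = (flags & LFGM_MASK) != 0;

        if (first && last)
                return FRAG_SOLE;
        if (first)
                return FRAG_FIRST;
        if (last)
                return FRAG_LAST;

        return FRAG_MIDDLE;
}

/* An un-fragmented SDU sets both bits. */
static inline void frag_example(void)
{
        assert(frag_role_of(0x8000 | FFGM_MASK | LFGM_MASK) == FRAG_SOLE);
}
```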


Each fragment is carried in its own FRCP packet with its own seqno;
Each fragment is carried in its own FRCP packet with its own <code>seqno</code>;
FRTX (the FRCT Retransmission service mode, see Section 2.2)
FRTX (the FRCT Retransmission service mode, see [[#2.2. Service modes (orthogonal axes)|Section 2.2]])
recovers individual fragments via the normal Retransmission Timeout
recovers individual fragments via the normal Retransmission Timeout
(RTO) / SACK / Recent Acknowledgement (RACK, RFC 8985) path.  The
(RTO) / SACK / Recent Acknowledgement (RACK, RFC 8985) path.  The
Line 200: Line 193:
bits are unused and MUST be transmitted as zero.
bits are unused and MUST be transmitted as zero.


In stream mode (qos.service == SVC_STREAM, see Section 16) there are
In stream mode (<code>qos.service == SVC_STREAM</code>, see [[#16. Stream-mode flows|Section 16]]) there are
no SDU boundaries to encode, so FFGM and LFGM are unused and MUST
no SDU boundaries to encode, so <code>FFGM</code> and <code>LFGM</code> are unused and MUST
be transmitted as zero.  End-of-stream uses a dedicated bit (FIN,
be transmitted as zero.  End-of-stream uses a dedicated bit (<code>FIN</code>,
bit 12) carried on a 0-byte DATA packet, emitted at write-half close
bit 12) carried on a 0-byte DATA packet, emitted at write-half close
(fccntl to FLOWFRDONLY), during linger drain, and at flow_dealloc;
(<code>fccntl</code> to <code>FLOWFRDONLY</code>), during linger drain, and at <code>flow_dealloc</code>;
emission is idempotent (first call wins).  After contiguous delivery
emission is idempotent (first call wins).  After contiguous delivery
of the FIN-bearing slot, the receiver latches byte_fin at the FIN's
of the FIN-bearing slot, the receiver latches <code>byte_fin</code> at the FIN's
start offset; flow_read returns 0 (end-of-file, EOF) once buffered
start offset; <code>flow_read</code> returns 0 (end-of-file, <code>EOF</code>) once buffered
bytes have been drained up to byte_fin.  Per-byte position is
bytes have been drained up to <code>byte_fin</code>.  Per-byte position is
carried by the [start, end) extension (Section 1.5).
carried by the [start, end) extension ([[#1.5. Stream PCI extension|Section 1.5]]).




=== 1.3. SACK payload ===
=== 1.3. SACK payload ===


A SACK packet has the FRCT_ACK | FRCT_FC | FRCT_SACK flag bits set
A SACK packet has the <code>FRCT_ACK | FRCT_FC | FRCT_SACK</code> flag bits set
(bit numbering per Section 1.1).  Following the 16-octet PCI, the
(bit numbering per [[#1.1. PCI header|Section 1.1]]).  Following the 16-octet PCI, the
payload is a 2-octet block count (network byte order), 2 octets of
payload is a 2-octet block count (network byte order), 2 octets of
padding to 4-byte align the block list, then n_blocks pairs of
padding to 4-byte align the block list, then <code>n_blocks</code> pairs of
32-bit start/end seqnos describing *present* (received) ranges
32-bit start/end seqnos describing ''present'' (received) ranges
above the cumulative ACK.
above the cumulative ACK.


<pre>
<syntaxhighlight lang="text">
     0                  1                  2                  3
     0                  1                  2                  3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
Line 234: Line 227:
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                       ... n_blocks pairs total ...
                       ... n_blocks pairs total ...
</pre>
</syntaxhighlight>


n_blocks <= SACK_MAX_BLOCKS (2048).  The per-flow effective cap is
<code>n_blocks &lt;= SACK_MAX_BLOCKS</code> (2048).  The per-flow effective cap is
further bounded by (frag_mtu - PCI - 4) / 8 blocks per packet; SACK
further bounded by <code>(frag_mtu - PCI - 4) / 8</code> blocks per packet; SACK
packets carry no stream extension, so PCI here is the 16-octet base
packets carry no stream extension, so PCI here is the 16-octet base
header even on stream-mode flows.
header even on stream-mode flows.
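The per-packet cap works out as in the sketch below, where the 4 octets are the block count plus padding and each block is a pair of 32-bit seqnos. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SACK_MAX_BLOCKS 2048
#define PCI_LEN         16 /* base PCI; SACK carries no stream extension */
#define SACK_HDR_LEN    4  /* 2-octet n_blocks + 2 octets of padding */
#define SACK_BLOCK_LEN  8  /* 32-bit start + 32-bit end */

/* Largest n_blocks that fits in one packet of frag_mtu octets. */
static inline size_t sack_block_cap(size_t frag_mtu)
{
        size_t fit;

        if (frag_mtu < PCI_LEN + SACK_HDR_LEN)
                return 0;

        fit = (frag_mtu - PCI_LEN - SACK_HDR_LEN) / SACK_BLOCK_LEN;

        return fit < SACK_MAX_BLOCKS ? fit : SACK_MAX_BLOCKS;
}

/* With a 1500-octet frag_mtu: (1500 - 16 - 4) / 8 = 185 blocks. */
static inline void sack_cap_example(void)
{
        assert(sack_block_cap(1500) == 185);
}
```

The global cap of 2048 only becomes the binding limit once <code>frag_mtu</code> exceeds roughly 16 KiB.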
Line 244: Line 237:
optional leading Duplicate SACK (D-SACK) block as described below,
optional leading Duplicate SACK (D-SACK) block as described below,
describes a range strictly above the cumulative ACK carried in the
describes a range strictly above the cumulative ACK carried in the
PCI ackno field (after(start[i], ackno)).  This makes the D-SACK
PCI <code>ackno</code> field (<code>after(start[i], ackno)</code>).  This makes the D-SACK
convention below unambiguous; the receiver-side builder MUST
convention below unambiguous; the receiver-side builder MUST
preserve it.
preserve it.
Line 250: Line 243:
Duplicate SACK (D-SACK, RFC 2883) is signalled in-band: no flag
Duplicate SACK (D-SACK, RFC 2883) is signalled in-band: no flag
bit, no extra framing.  Modular seqno arithmetic uses the
bit, no extra framing.  Modular seqno arithmetic uses the
before() / after() comparators defined in the Notation block.
<code>before()</code> / <code>after()</code> comparators defined in the Notation block.
Block[0] carries a D-SACK report when either:
 
Encoding.  When a duplicate is observed the receiver arms a
single-slot pending report (<code>dsack_seqno</code> + <code>dsack_valid</code>,
latest-wins across multiple arms before the next emit).  On the
next outbound SACK the receiver prepends <code>block[0] = [dsack_seqno,
dsack_seqno + 1)</code> - always a one-<code>seqno</code> range - and clears the
flag.  The three arm sites are listed in [[#10. Cumulative + selective ACK|Section 10]]; case-1 sites
yield <code>dsack_seqno &lt; rcv_cr.lwe</code> (the next <code>pci.ackno</code>), and the
case-2 site (<code>rq_accept</code> conflict) yields <code>dsack_seqno</code> in
<code>[rcv_cr.lwe, rcv_cr.rwe)</code>.
 
Detection.  The sender classifies <code>block[0]</code> by its relation to
<code>pci.ackno</code>:


<pre>
;case 1 (RFC 2883 sec. 4.1.1, full duplicate)
  case 1 (RFC 2883 sec. 4.1.1, full duplicate):
:<code>before(blocks[0].start, pci.ackno)</code> AND <code>pci.ackno - blocks[0].start &lt;= MAX_DSACK_LAG</code> (<code>== RQ_SIZE</code>).  The lag bound rejects stale or spoofed reports beyond one receive window.
      before(blocks[0].start, ackno) and ackno - blocks[0].start is
;case 2 (RFC 2883 sec. 4.1.2, partial duplicate)
      within MAX_DSACK_LAG (== RQ_SIZE).  A single duplicate seqno
:<code>blocks[0]</code> is a sub-range (with at least one endpoint differing) of some <code>blocks[i&gt;0]</code> - i.e. the same packet's remaining SACK blocks already describe the duplicated <code>seqno</code> as received.
      observed below the cumulative ACK.
</pre>


<pre>
On detect, the sender:
  case 2 (RFC 2883 sec. 4.1.2, partial duplicate):
      blocks[0] is a sub-range of some blocks[i>0] (not exactly
      equal).  Reports a duplicate of an in-window seqno that the
      same packet's remaining SACK blocks already describe as
      received.
</pre>


Senders that do not implement D-SACK process block[0] through the
* bumps <code>reo_wnd_mult</code> by 1, capped at <code>REO_WND_MULT_MAX</code> (= 20), per RFC 8985 sec. 6.2 step 4;
normal SACK-mark loop and the existing clamp-and-skip path makes
* snapshots <code>dsack_lwe_snap = snd_cr.lwe</code>, resetting the 16-cum-ACK halving counter so the multiplier doesn't decay while D-SACK evidence is still arriving;
case-1 a no-op (start < snd_cr.lwe clamps to snd_cr.lwe, the inner
* excludes <code>block[0]</code> from the gap-marking loop (<code>n_real = n - 1</code>), so a D-SACK alone never enters NewReno-careful recovery (see [[#8. Retransmission|Section 8]]); only non-D-SACK blocks count as gaps.
loop then skips k == snd_cr.lwe) and case-2 idempotent (same slots
NULL'd twice).  D-SACK-aware senders feed the report into the RACK
reo_wnd_mult scaler (RFC 8985 sec. 6.2 step 4): bump on receipt
(cap 20), halve once per 16 cumulatively-ACK'd seqnos since the
most recent D-SACK arrival or halve event, reset to 1 on an RTO
timer fire at the head-of-line.  D-SACK alone never enters
NewReno-careful recovery (see Section 8); only non-D-SACK blocks
count as gaps.


The <code>reo_wnd_mult</code> halving cadence (once per 16 cumulatively-ACK'd
seqnos since the most-recent D-SACK arrival or halve event) and
the reset-to-1 on a HoL RTO fire are both per the same RFC 8985
clause.  The clamp-and-skip path in the regular SACK-mark loop is
incidentally idempotent on any leftover case-1 or case-2 block
(<code>start &lt; snd_cr.lwe</code> clamps to <code>snd_cr.lwe</code> and the inner loop
skips <code>k == snd_cr.lwe</code>; case-2 re-NULLs slots already marked
received by later blocks), so <code>block[0]</code> is harmless even when fed
to the loop.
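The two detection cases and the multiplier bump can be sketched as follows. This is an illustration of the rules above rather than the FRCT code: <code>RQ_SIZE</code> is given a placeholder value, and the struct, enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RQ_SIZE          1024u /* placeholder ring size (assumption) */
#define MAX_DSACK_LAG    RQ_SIZE
#define REO_WND_MULT_MAX 20

struct sack_block { uint32_t start; uint32_t end; }; /* [start, end) */

enum dsack_kind { DSACK_NONE, DSACK_FULL, DSACK_PARTIAL };

static inline bool before(uint32_t a, uint32_t b)
{
        return (int32_t)(a - b) < 0;
}

/* Classify blocks[0] of a SACK against the cumulative ACK. */
static inline enum dsack_kind dsack_classify(uint32_t ackno,
                                             const struct sack_block *b,
                                             size_t n)
{
        size_t i;

        if (n == 0)
                return DSACK_NONE;

        /* Case 1: below the cumulative ACK and within the lag bound. */
        if (before(b[0].start, ackno))
                return ackno - b[0].start <= MAX_DSACK_LAG
                        ? DSACK_FULL : DSACK_NONE;

        /* Case 2: strict sub-range of a later block. */
        for (i = 1; i < n; ++i) {
                bool inside  = !before(b[0].start, b[i].start) &&
                               !before(b[i].end, b[0].end);
                bool differs = b[0].start != b[i].start ||
                               b[0].end != b[i].end;
                if (inside && differs)
                        return DSACK_PARTIAL;
        }

        return DSACK_NONE;
}

/* Bump reo_wnd_mult by 1 on a detected D-SACK, capped at the maximum. */
static inline unsigned reo_wnd_mult_bump(unsigned mult)
{
        assert(mult >= 1);

        return mult < REO_WND_MULT_MAX ? mult + 1 : mult;
}
```

When the result is not <code>DSACK_NONE</code>, the caller would skip <code>blocks[0]</code> in the gap-marking loop and process only the remaining <code>n - 1</code> blocks.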


=== 1.4. RTTP payload ===
=== 1.4. RTTP payload ===


An RTTP (Round-Trip Time Probe) packet has only the FRCT_RTTP flag
An RTTP (Round-Trip Time Probe) packet has only the <code>FRCT_RTTP</code> flag
set (bit numbering per Section 1.1).  Following the 16-octet PCI,
set (bit numbering per [[#1.1. PCI header|Section 1.1]]).  Following the 16-octet PCI,
the payload is 24 octets (packed):
the payload is 24 octets (packed):


<pre>
<syntaxhighlight lang="text">
     0                  1                  2                  3
     0                  1                  2                  3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
Line 299: Line 297:
     |                                                              |
     |                                                              |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</pre>
</syntaxhighlight>


<pre>
;<code>probe_id</code>
  probe_id - sender counter, 0 on reply, 0 reserved.
:sender counter, 0 on reply, 0 reserved.
  echo_id - peer's probe_id, 0 on outbound probe.
;<code>echo_id</code>
  nonce   - random, echoed unmodified, memcmp'd to defeat spoof.
:peer's <code>probe_id</code>, 0 on outbound probe.
</pre>
;<code>nonce</code>
:random, echoed unmodified, memcmp'd to defeat spoof.




=== 1.5. Stream PCI extension ===
=== 1.5. Stream PCI extension ===


A stream-mode flow (qos.service == SVC_STREAM) carries an extra
A stream-mode flow (<code>qos.service == SVC_STREAM</code>) carries an extra
8-octet extension after the 16-octet base PCI on every DATA packet
8-octet extension after the 16-octet base PCI on every DATA packet
(bit numbering per Section 1.1):
(bit numbering per [[#1.1. PCI header|Section 1.1]]):


<pre>
<syntaxhighlight lang="text">
     0                  1                  2                  3
     0                  1                  2                  3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
Line 322: Line 321:
     |                            end                              |
     |                            end                              |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</pre>
</syntaxhighlight>


<pre>
;<code>start</code>
  start - octet offset of the first payload byte in the stream.
:octet offset of the first payload byte in the stream.
  end   - octet offset one past the last payload byte;
;<code>end</code>
          end - start equals the on-wire payload length.
:octet offset one past the last payload byte; <code>end - start</code> equals the on-wire payload length.
</pre>
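The two invariants on the extension (length agreement and contiguity with the previous packet) can be checked as in the sketch below. The struct and function names are hypothetical; the subtraction is done modulo 2^32 so the check holds across the 4 GiB wrap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct stream_ext {
        uint32_t start; /* offset of the first payload byte */
        uint32_t end;   /* offset one past the last payload byte */
};

/* end - start must equal the on-wire payload length. */
static inline bool ext_len_ok(const struct stream_ext *e, size_t payload_len)
{
        assert(e != NULL);

        return (uint32_t)(e->end - e->start) == payload_len;
}

/* A slot is deliverable only if it starts where the prior packet ended. */
static inline bool ext_contiguous(uint32_t prior_end,
                                  const struct stream_ext *e)
{
        assert(e != NULL);

        return e->start == prior_end;
}
```

A 0-byte FIN packet passes the length check with <code>start == end</code>.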


Total stream-mode PCI for DATA packets is 24 octets (16 base + 8
Total stream-mode PCI for DATA packets is 24 octets (16 base + 8
Line 334: Line 332:
the 16-octet base PCI.  Stream mode MUST be negotiated at flow
the 16-octet base PCI.  Stream mode MUST be negotiated at flow
allocation; the extension is present iff stream mode is in use,
allocation; the extension is present iff stream mode is in use,
never on a per-packet basis.  Both peers MUST treat start/end as
never on a per-packet basis.  Both peers MUST treat <code>start</code>/<code>end</code> as
monotonic 32-bit byte offsets; when a slot reaches the head of the
monotonic 32-bit byte offsets; when a slot reaches the head of the
contiguous run with start not equal to the prior packet's end the
contiguous run with <code>start</code> not equal to the prior packet's <code>end</code> the
slot is silently dropped at delivery time (Section 16) rather
slot is silently dropped at delivery time ([[#16. Stream-mode flows|Section 16]]) rather
than rejected at stash.
than rejected at stash.


This is the QUIC STREAM-frame reassembly model (RFC 9000 sec. 19.8):
This is the QUIC STREAM-frame reassembly model (RFC 9000 sec. 19.8):
each packet carries its packet seqno (this PCI's seqno field) and a
each packet carries its packet <code>seqno</code> (this PCI's <code>seqno</code> field) and a
separate stream byte position (start/end).  Separating the two
separate stream byte position (<code>start</code>/<code>end</code>).  Separating the two
avoids TCP's conflation of packet identity with byte position which
avoids TCP's conflation of packet identity with byte position which
forces Karn's algorithm for Round-Trip Time (RTT) sampling (no RTT
forces Karn's algorithm for Round-Trip Time (RTT) sampling (no RTT
sample on retransmits, RFC 6298 sec. 3); FRCP applies the
sample on retransmits, RFC 6298 sec. 3); FRCP applies the
Karn-equivalent gate via a combination of per-packet FRCT_RXM,
Karn-equivalent gate via a combination of per-packet <code>FRCT_RXM</code>,
per-slot SND_RTX flags, and a sample-fence rtt_lwe (see Section 2.1
per-slot <code>SND_RTX</code> flags, and a sample-fence <code>rtt_lwe</code> (see [[#2.1. Per-flow state|Section 2.1]]
and Section 12).  FRCP's fixed-32-bit start/end wrap at 4 GiB of
and [[#12. RTT estimation|Section 12]]).  FRCP's fixed-32-bit <code>start</code>/<code>end</code> wrap at 4 GiB of
wire bytes, narrower than QUIC's 62-bit varint offset (cf. RFC 9000
wire bytes, narrower than QUIC's 62-bit varint offset (cf. RFC 9000
sec. 16); the on-wire wrap is handled by the same modular before()
sec. 16); the on-wire wrap is handled by the same modular <code>before()</code>
/ after() comparators (Section 1.3) FRCP uses for seqnos, which
/ <code>after()</code> comparators ([[#1.3. SACK payload|Section 1.3]]) FRCP uses for seqnos, which
remain unambiguous as long as the in-flight byte window stays
remain unambiguous as long as the in-flight byte window stays
strictly under 2 GiB (the half-range of the signed-int32 difference
strictly under 2 GiB (the half-range of the signed-int32 difference
in before()).  The default per-flow ring is 1 MiB; the
in <code>before()</code>).  The default per-flow ring is 1 MiB; the
implementation caps ring_sz at 128 MiB (FRCT_STREAM_RING_SZ_MAX),
implementation caps <code>ring_sz</code> at 128 MiB (<code>FRCT_STREAM_RING_SZ_MAX</code>),
well below the 2 GiB half-range bound.  The runtime byte counters
well below the 2 GiB half-range bound.  The runtime byte counters
exposed via FUSE (Filesystem in Userspace) in the Ouroboros
exposed via FUSE (Filesystem in Userspace) in the Ouroboros
Resource Information Base (RIB, a virtual-filesystem introspection
Resource Information Base (RIB, a virtual-filesystem introspection
bridge) are platform size_t and do not wrap on 64-bit hosts.
bridge) are platform <code>size_t</code> and do not wrap on 64-bit hosts.




Line 369: Line 367:
record:
record:


<pre>
;<code>lwe</code> : <code>u32</code>
    lwe   : u32 snd: oldest unacked seqno (cumulative ACK
:snd: oldest unacked <code>seqno</code> (cumulative ACK boundary as seen by sender); rcv: next in-order <code>seqno</code> expected
                  boundary as seen by sender);
;<code>rwe</code> : <code>u32</code>
                  rcv: next in-order seqno expected
:snd: peer-advertised right window edge; rcv: locally-advertised right window edge
    rwe   : u32 snd: peer-advertised right window edge;
;<code>cflags</code> : <code>u8</code>
                  rcv: locally-advertised right window edge
:per-direction feature flags: retransmission (<code>FRCTFRTX</code>), receiver flow control (<code>FRCTFRESCNTL</code>), linger-on-close (<code>FRCTFLINGER</code>); see <code>&lt;ouroboros/fccntl.h&gt;</code>
    cflags : u8   per-direction feature flags: retransmission
;<code>seqno</code> : <code>u32</code>
                  (FRCTFRTX), receiver flow control
:snd: next <code>seqno</code> to send; rcv: force-ACK trigger - set on a stale or dup DATA so the next <code>ack_snd</code> emits a fresh cumulative ACK
                  (FRCTFRESCNTL), linger-on-close (FRCTFLINGER);
;<code>ackno</code> : <code>u32</code>
                  see <ouroboros/fccntl.h>
:snd: <code>seqno</code> counter for standalone ACK-bearing control packets (delayed ACK, SACK, final ACK on dealloc); not bumped on piggybacked ACK riding a DATA packet (which uses the DATA <code>seqno</code>).  Used by wire-dup ACK detection; rcv: incoming-ACK dedup tracker
    seqno : u32 snd: next seqno to send;
;<code>act</code> : <code>ns</code>
                  rcv: force-ACK trigger - set on a stale or dup
:last activity (used by inactivity / DRF)
                  DATA so the next ack_snd emits a fresh
;<code>inact</code> : <code>ns</code>
                  cumulative ACK
:inactivity threshold; sender = <code>3*mpl + a + r + 1s</code>, receiver = <code>2*mpl + a + r + 1s</code>.  <code>mpl</code> is the Maximum Packet Lifetime (delta-t terminology; see [[#15. Heritage and adopted techniques|Section 15]]); <code>a</code> and <code>r</code> are the FRCT a-timer and r-timer bounds (see [[#8. Retransmission|Section 8]]).  The asymmetry is load-bearing for pre-DRF NACK ([[#9. Pre-DRF NACK|Section 9]]).
    ackno : u32 snd: outbound ACK-packet seqno counter,
                  incremented for every ACK-bearing packet (bare
                  ACK, delayed ACK, SACK); used by wire-dup ACK
                  detection;
                  rcv: incoming-ACK dedup tracker
    act   : ns   last activity (used by inactivity / DRF)
    inact : ns   inactivity threshold; sender = 3*mpl + a + r + 1s,
                  receiver = 2*mpl + a + r + 1s.  mpl is the
                  Maximum Packet Lifetime (delta-t terminology;
                  see Section 15); a and r are the FRCT a-timer
                  and r-timer bounds (see Section 8).  The
                  asymmetry is load-bearing for pre-DRF NACK
                  (Section 9).
</pre>
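
The two <code>inact</code> formulas in the record above can be checked with a few lines of C.  This is a minimal sketch, not the FRCT source: the function names are hypothetical, and all quantities are assumed to be in nanoseconds to match the <code>ns</code> type of the <code>inact</code> field.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_S 1000000000ULL /* nanoseconds per second */

/* Sender inactivity threshold: 3*mpl + a + r + 1s (all values in ns). */
uint64_t snd_inact_ns(uint64_t mpl, uint64_t a, uint64_t r)
{
        return 3 * mpl + a + r + NS_PER_S;
}

/* Receiver inactivity threshold: 2*mpl + a + r + 1s (all values in ns). */
uint64_t rcv_inact_ns(uint64_t mpl, uint64_t a, uint64_t r)
{
        return 2 * mpl + a + r + NS_PER_S;
}

/* Gap between the two thresholds; by construction this is one mpl. */
uint64_t inact_asymmetry_ns(uint64_t mpl, uint64_t a, uint64_t r)
{
        uint64_t snd = snd_inact_ns(mpl, a, r);
        uint64_t rcv = rcv_inact_ns(mpl, a, r);

        assert(snd >= rcv); /* the sender always waits at least as long */

        return snd - rcv;
}
```

Whatever values <code>a</code> and <code>r</code> take, the receiver goes stale exactly one <code>mpl</code> before the sender does, which is the asymmetry the record calls load-bearing.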


The sender holds a per-slot ring snd_slots[RQ_SIZE] keyed by
The sender holds a per-slot ring <code>snd_slots[RQ_SIZE]</code> keyed by
(seqno mod RQ_SIZE).  Each slot tracks its retransmit entry (rxm),
<code>(seqno mod RQ_SIZE)</code>.  Each slot tracks its retransmit entry (<code>rxm</code>),
last-send timestamp, and retransmit flag bits: SND_RTX (a
last-send timestamp, and retransmit flag bits: <code>SND_RTX</code> (a
retransmit is pending or has fired, gates the next RTT sample
retransmit is pending or has fired, gates the next RTT sample
under Karn) and SND_FAST_RXM (one-shot fast-retransmit staged for
under Karn) and <code>SND_FAST_RXM</code> (one-shot fast-retransmit staged for
this loss event).
this loss event).


The receiver holds a parallel reorder ring rcv_slots[RQ_SIZE]
The receiver holds a parallel reorder ring <code>rcv_slots[RQ_SIZE]</code>
(referred to as rq[] in prose) holding stashed out-of-order
(referred to as <code>rq[]</code> in prose) holding stashed out-of-order
packet-buffer indexes; both FRTX and best-effort flows share this
packet-buffer indexes; both FRTX and best-effort flows share this
path.  The invariant rwe - lwe <= RQ_SIZE holds: on each consume
path.  The invariant <code>rwe - lwe &lt;= RQ_SIZE</code> holds: on each consume
the receiver advances rwe by the consumed count, capping the
the receiver advances <code>rwe</code> by the consumed count, capping the
receive window at RQ_SIZE seqno slots.
receive window at <code>RQ_SIZE</code> <code>seqno</code> slots.
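
The slot keying and the window invariant can be sketched in C as follows.  The helper names are hypothetical, <code>RQ_SIZE</code> is set to the default of 128 from the parameter table in Section 3, and clamping <code>rwe</code> to <code>lwe + RQ_SIZE</code> on consume is an assumed reading of "capping the receive window", not a statement about the actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RQ_SIZE 128u /* assumed default; must be a power of 2 */

/* Ring slot for a seqno: seqno mod RQ_SIZE, computed as a mask. */
uint32_t rq_slot(uint32_t seqno)
{
        assert((RQ_SIZE & (RQ_SIZE - 1)) == 0);

        return seqno & (RQ_SIZE - 1);
}

/* Invariant rwe - lwe <= RQ_SIZE, in wrapping u32 arithmetic. */
bool window_ok(uint32_t lwe, uint32_t rwe)
{
        return (uint32_t) (rwe - lwe) <= RQ_SIZE;
}

/* Advance rwe by the consumed count, never past lwe + RQ_SIZE. */
uint32_t rwe_after_consume(uint32_t lwe, uint32_t rwe, uint32_t n)
{
        uint32_t adv = rwe + n;

        if ((uint32_t) (adv - lwe) > RQ_SIZE)
                adv = lwe + RQ_SIZE;

        return adv;
}
```

Because the subtraction is done in unsigned 32-bit arithmetic, the invariant check keeps working when <code>rwe</code> has wrapped past zero and <code>lwe</code> has not.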


A separate fence variable rtt_lwe is bumped on every retransmit
A separate fence variable <code>rtt_lwe</code> is bumped on every retransmit
(timer-fire, SACK-driven, fast-rxm, NACK-driven) and on every
(timer-fire, SACK-driven, fast-rxm, NACK-driven) and on every
seqno_rotate (Section 4) to mark the seqno range whose RTT samples
<code>seqno_rotate</code> ([[#4. Sequence-number rotation (DRF)|Section 4]]) to mark the <code>seqno</code> range whose RTT samples
MUST be discarded.
MUST be discarded.


Line 422: Line 406:
FRCP exposes its wire features as a vector of independent QoS
FRCP exposes its wire features as a vector of independent QoS
axes selected at flow allocation time.  All flows go through the
axes selected at flow allocation time.  All flows go through the
same flow_alloc(name, qos, ...) primitive; the qosspec_t passed
same <code>flow_alloc(name, qos, ...)</code> primitive; the <code>qosspec_t</code> passed
in determines which protocol machinery engages on the wire.  This
in determines which protocol machinery engages on the wire.  This
contrasts with the POSIX BSD socket model where TCP and UDP
contrasts with the POSIX BSD socket model where TCP and UDP
require different socket types (SOCK_STREAM / SOCK_DGRAM).
require different socket types (<code>SOCK_STREAM</code> / <code>SOCK_DGRAM</code>).


The axes:
The axes:


<pre>
;<code>service</code>
  service  0 = unordered (no FRCP engagement: raw datagrams,
:0 = unordered (no FRCP engagement: raw datagrams, no PCI on the wire, UDP-equivalent at this layer); 1 = message-ordered (FRCP engaged; SDU boundaries preserved across fragmentation); 2 = stream (byte-oriented, no SDU boundaries; FRTX required)
              no PCI on the wire, UDP-equivalent at this layer)
;<code>loss</code>
            1 = message-ordered (FRCP engaged; SDU boundaries
:0 = lossless service requested: FRTX retransmit machinery engages ([[#8. Retransmission|Section 8]]); MUST be 0 for <code>service=2</code>.  Non-zero = best-effort, FRTX off.
              preserved across fragmentation)
;<code>ber</code>
            2 = stream (byte-oriented, no SDU boundaries; FRTX
:Bit Error Rate tolerance. 0 = error-free service requested: a CRC trailer is appended after the body of DATA packets and verified on receive (added / checked outside the FRCP PCI; see [[#1.1. PCI header|Section 1.1]]).  Non-zero = peer accepts errors; trailer omitted.  SACK control packets carry a CRC32 trailer regardless of <code>ber</code>; the <code>ber</code> gate applies to DATA only.
              required)
;<code>timeout</code>
  loss     0 = lossless service requested: FRTX retransmit
:Peer-timeout (ms); 0 disables the keepalive timer. Independent of FRCP engagement.
              machinery engages (Section 8); MUST be 0 for
              service=2.  Non-zero = best-effort, FRTX off.
  ber       Bit Error Rate tolerance.
            0 = error-free service requested: a CRC trailer is
              appended after the body of DATA packets and verified
              on receive (added / checked outside the FRCP PCI;
              see Section 1.1).  Non-zero = peer accepts errors;
              trailer omitted.  SACK control packets carry a
              CRC32 trailer regardless of ber; the ber gate
              applies to DATA only.
  timeout   Peer-timeout (ms); 0 disables the keepalive timer.
              Independent of FRCP engagement.
</pre>


Encryption is a separate per-flow attribute set at flow setup;
Encryption is a separate per-flow attribute set at flow setup;
when enabled it wraps the FRCP packet (PCI + body, plus the CRC
when enabled it wraps the FRCP packet (PCI + body, plus the CRC
trailer if any) under AEAD, expanding the spb by headsz + tailsz
trailer if any) under AEAD, expanding the <code>spb</code> by <code>headsz</code> + <code>tailsz</code>
octets (nonce / tag).  The CRC trailer is currently kept inside
octets (nonce / tag).  The CRC trailer is currently kept inside
the AEAD wrap (see Section 1.1).
the AEAD wrap (see [[#1.1. PCI header|Section 1.1]]).


Reachable combinations exported by include/ouroboros/qos.h:
Reachable combinations exported by <code>include/ouroboros/qos.h</code>:


<pre>
{| class="wikitable"
  +-----------------+---------+------+-----+-----------------------+
! Cube !! <code>service</code> !! <code>loss</code> !! <code>ber</code> !! Engaged
  | Cube            | service | loss | ber | Engaged               |
|-
  +-----------------+---------+------+-----+-----------------------+
| <code>qos_raw</code> || 0 || 1 || 1 || Raw passthrough
  | qos_raw         |   0   |   1 |   1 | Raw passthrough       |
|-
  | qos_raw_safe   |   0   |   1 |   0 | Raw + CRC trailer     |
| <code>qos_raw_safe</code> || 0 || 1 || 0 || Raw + CRC trailer
  | qos_rt         |   1   |   1 |   1 | FRCP, no FRTX, no CRC |
|-
  | qos_rt_safe     |   1   |   1 |   0 | FRCP, no FRTX, CRC   |
| <code>qos_rt</code> || 1 || 1 || 1 || FRCP, no FRTX, no CRC
  | qos_msg         |   1   |   0 |   0 | FRCP + FRTX           |
|-
  | qos_stream     |   2   |   0 |   0 | FRCP + FRTX, stream   |
| <code>qos_rt_safe</code> || 1 || 1 || 0 || FRCP, no FRTX, CRC
  +-----------------+---------+------+-----+-----------------------+
|-
</pre>
| <code>qos_msg</code> || 1 || 0 || 0 || FRCP + FRTX
|-
| <code>qos_stream</code> || 2 || 0 || 0 || FRCP + FRTX, stream
|}
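
The "Engaged" column follows mechanically from the three axes, which the sketch below makes explicit.  The <code>cube_row</code> type and the helper functions are hypothetical stand-ins for illustration; they are not the real <code>qosspec_t</code> or anything exported by <code>include/ouroboros/qos.h</code>, and the effect of the <code>QOS_DISABLE_CRC</code> build flag is ignored here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical row type mirroring the table; not the real qosspec_t. */
struct cube_row {
        const char *name;
        int         service; /* 0 = raw, 1 = message, 2 = stream */
        int         loss;    /* 0 = lossless requested            */
        int         ber;     /* 0 = error-free requested          */
};

const struct cube_row cubes[] = {
        { "qos_raw",      0, 1, 1 },
        { "qos_raw_safe", 0, 1, 0 },
        { "qos_rt",       1, 1, 1 },
        { "qos_rt_safe",  1, 1, 0 },
        { "qos_msg",      1, 0, 0 },
        { "qos_stream",   2, 0, 0 },
};

/* FRCP engages for every service except raw (service 0). */
bool frcp_engaged(const struct cube_row *c)
{
        return c->service != 0;
}

/* FRTX needs FRCP engagement plus a lossless request. */
bool frtx_engaged(const struct cube_row *c)
{
        return frcp_engaged(c) && c->loss == 0;
}

/* A CRC trailer is requested whenever ber is 0. */
bool crc_trailer(const struct cube_row *c)
{
        return c->ber == 0;
}

/* Look up a row by cube name; returns NULL when not in the table. */
const struct cube_row *cube_find(const char *name)
{
        size_t i;

        assert(name != NULL);

        for (i = 0; i < sizeof(cubes) / sizeof(cubes[0]); ++i)
                if (strcmp(cubes[i].name, name) == 0)
                        return &cubes[i];

        return NULL;
}
```

For example, <code>qos_rt_safe</code> yields FRCP engaged, FRTX off, CRC trailer on, matching its table row.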


Forced couplings actually enforced by the public API:
Forced couplings actually enforced by the public API:


<pre>
* <code>service == SVC_STREAM</code> (2) requires <code>loss == 0</code>; <code>flow_alloc</code> / <code>flow_accept</code> reject the pair otherwise with <code>-EINVAL</code>.
  - service == SVC_STREAM (2) requires loss == 0; flow_alloc /
* FRTX requires FRCP engagement (<code>service != SVC_RAW</code>); requesting <code>loss = 0</code> with <code>service = SVC_RAW</code> is structurally a no-op because no <code>frcti</code> is created.
    flow_accept reject the pair otherwise with -EINVAL.
* The <code>QOS_DISABLE_CRC</code> build flag globally forces <code>ber = 1</code>. Note: this flag defaults to ON, so default builds ship with CRC disabled until <code>QOS_DISABLE_CRC</code> is set to OFF.
  - FRTX requires FRCP engagement (service != SVC_RAW); requesting
    loss = 0 with service = SVC_RAW is structurally a no-op
    because no frcti is created.
  - The QOS_DISABLE_CRC build flag globally forces ber = 1.
    Note: this flag defaults to ON, so default builds ship with
    CRC disabled until QOS_DISABLE_CRC is set to OFF.
</pre>
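
The three couplings can be written down as small predicates.  This is an illustrative sketch only: the function names are hypothetical, and the enum values for <code>SVC_RAW</code> and <code>SVC_MESSAGE</code> are assumed from the axis description (only <code>SVC_STREAM</code> = 2 is stated explicitly above).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed numeric values, taken from the service axis description. */
enum svc { SVC_RAW = 0, SVC_MESSAGE = 1, SVC_STREAM = 2 };

/* Stream service requires loss == 0; anything else is rejected. */
int qos_check(enum svc service, uint32_t loss)
{
        assert(service <= SVC_STREAM);

        if (service == SVC_STREAM && loss != 0)
                return -EINVAL;

        return 0;
}

/* FRTX only exists when FRCP is engaged and loss == 0 was requested. */
bool frtx_active(enum svc service, uint32_t loss)
{
        return service != SVC_RAW && loss == 0;
}

/* With CRC disabled at build time, ber is forced to 1 for every flow. */
uint32_t effective_ber(uint32_t ber, bool crc_disabled)
{
        return crc_disabled ? 1 : ber;
}
```

Note that <code>qos_check(SVC_RAW, 0)</code> succeeds in this sketch while <code>frtx_active(SVC_RAW, 0)</code> is false, which is the "structurally a no-op" case from the second bullet.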


Caveat: the API does NOT force ber = 0 when service != SVC_RAW.
Caveat: the API does NOT force <code>ber = 0</code> when <code>service != SVC_RAW</code>.
qos_rt has service = SVC_MESSAGE with ber = 1, which means the PCI
<code>qos_rt</code> has <code>service = SVC_MESSAGE</code> with <code>ber = 1</code>, which means the PCI
itself is not CRC-protected on that cube; the HCS (Section 1.1)
itself is not CRC-protected on that cube; the HCS ([[#1.1. PCI header|Section 1.1]])
remains the only integrity check on the header.
remains the only integrity check on the header.


The FRCP-no-FRTX regime (service = SVC_MESSAGE, loss > 0) is meaningful
The FRCP-no-FRTX regime (<code>service = SVC_MESSAGE</code>, <code>loss &gt; 0</code>) is meaningful
and live: sequence numbering, in-order delivery, flow-control
and live: sequence numbering, in-order delivery, flow-control
advertisement, KA, DRF rotation, and SDU fragmentation /
advertisement, KA, DRF rotation, and SDU fragmentation /
reassembly (Section 7.2) all run.  Lost packets are dropped
reassembly ([[#7.2. Fragmentation and reassembly|Section 7.2]]) all run.  Lost packets are dropped
rather than retransmitted; a permanently-lost mid-fragment is
rather than retransmitted; a permanently-lost mid-fragment is
dropped via skip-past-gap once a later SDU is visible in the
dropped via skip-past-gap once a later SDU is visible in the
Line 501: Line 468:
== 3. Protocol parameters ==
== 3. Protocol parameters ==


<pre>
{| class="wikitable"
    +--------------------+------------------------+-------------------+
! Parameter !! Value !! Role
    | Parameter         | Value                 | Role             |
|-
    +--------------------+------------------------+-------------------+
| <code>RQ_SIZE</code> || compile-time, power of 2 (default 128) || Slot ring / rcv window width
    | RQ_SIZE           | compile-time, power of | Slot ring / rcv  |
|-
    |                    |  2 (default 128)       | window width     |
| <code>START_WINDOW</code> || compile-time, power of 2 (default 128) || Initial <code>rwe-lwe</code> after rotate
    | START_WINDOW       | compile-time, power of | Initial rwe-lwe  |
|-
    |                    |  2 (default 128)       | after rotate     |
| <code>RTO_MIN</code> || <code>MAX(250 us build-tunable, 1&lt;&lt;RXMQ_RES)</code>; per-flow via <code>fccntl</code> (<code>FRCTSRTOMIN</code>).  Default ~1 ms with <code>RXMQ_RES=20</code>. || RTO floor; also floored at the retransmit-wheel resolution (~1 ms by default).
    | RTO_MIN           | MAX(250 us build-tun-  | RTO floor; also  |
|-
    |                    |  able, 1<<RXMQ_RES);   | floored at the    |
| <code>MAX_RTO_MUL</code> || 20 || Backoff shift cap
    |                    |  per-flow via fccntl   | retransmit-wheel  |
|-
    |                    |  (FRCTSRTOMIN).       | resolution        |
| RACK window <code>R</code> || <code>MIN(reo_wnd_mult * min_RTT/4, SRTT)</code> with <code>MIN_REORDER_NS = 250 us</code> floor; <code>reo_wnd_mult</code> scales on D-SACK, cap 20 || Reorder window; per RFC 8985 sec. 6.2; <code>reo_wnd_mult</code> per sec. 6.2 step 4
    |                    | Default ~1 ms with   | (~1 ms by         |
|-
    |                    |  RXMQ_RES=20.          | default).         |
| <code>MIN_RTT_WIN_NS</code> || 300 s (5 min, Linux <code>tcp_min_rtt_wlen</code>) || <code>min_RTT</code> windowed re-anchor
    | MAX_RTO_MUL       | 20                     | Backoff shift cap |
|-
    | RACK window R     | MIN(reo_wnd_mult       | Reorder window;  |
| <code>REO_WND_MULT_MAX</code> || 20 (RFC 8985 sec. 6.2 step 4) || <code>reo_wnd_mult</code> cap
    |                    |  * min_RTT/4, SRTT)   | per RFC 8985     |
|-
    |                    |  with MIN_REORDER_NS  | sec. 6.2;         |
| <code>REO_DECAY_PKTS</code> || 16 (RFC 8985 sec. 6.2 step 4 / <code>RACK.reo_wnd_persist</code>) || Fresh-ACK'd seq count per halving
    |                    |  = 250 us floor;      | reo_wnd_mult per |
|-
    |                    |  reo_wnd_mult scales  | sec. 6.2 step 4   |
| <code>MAX_DSACK_LAG</code> || <code>RQ_SIZE</code> || D-SACK sanity cap
    |                   |  on D-SACK, cap 20    |                  |
|-
    | MIN_RTT_WIN_NS     | 300 s (5 min, Linux   | min_RTT windowed |
| <code>RTT_QUARANTINE</code> || 32 (<code>seqno</code> steps) || NewReno gate pad
    |                    |  tcp_min_rtt_wlen)    | re-anchor         |
|-
    | REO_WND_MULT_MAX   | 20 (RFC 8985 sec.     | reo_wnd_mult cap  |
| SACK rate-limit || <code>SACK_MIN_GAP_NS</code> (250 us, fixed) || Min SACK gap
    |                    |  6.2 step 4)           |                   |
|-
    | REO_DECAY_PKTS     | 16 (RFC 8985 sec.     | Fresh-ACK'd seq   |
| <code>SACK_MAX_BLOCKS</code> || 2048 (wire cap; per-flow capped at <code>(frag_mtu-PCI-4)/8</code>) || Per-SACK block cap
    |                    |  6.2 step 4 /          | count per halving |
|-
    |                    |  RACK.reo_wnd_persist) |                  |
| <code>SACK_RXM_MAX</code> || 32 || Per-pass staged retransmit cap
    | MAX_DSACK_LAG     | RQ_SIZE               | D-SACK sanity cap |
|-
    | RTT_QUARANTINE     | 32 (seqno steps)       | NewReno gate pad |
| <code>DUP_THRESH</code> || 3 (RFC 8985 default) || Hybrid fast-rxm trigger ([[#8. Retransmission|Section 8]])
    | SACK rate-limit   | SACK_MIN_GAP_NS       | Min SACK gap      |
|-
    |                    |  (250 us, fixed)       |                   |
| <code>MDEV_MUL</code> || 2 (build-tunable via <code>FRCT_RTO_MDEV_MULTIPLIER</code>) || <code>mdev</code> shift in <code>RTO = srtt + (mdev &lt;&lt; MDEV_MUL)</code>
    | SACK_MAX_BLOCKS   | 2048 (wire cap; per-   | Per-SACK block    |
|-
    |                    |  flow capped at       | cap              |
| RTTP nonce || 16 octets || Echoed verbatim
    |                    |  (frag_mtu-PCI-4)/8)   |                   |
|-
    | SACK_RXM_MAX       | 32                     | Per-pass staged   |
| <code>RTTP_RING</code> || 8 || In-flight probes
    |                    |                        | retransmit cap   |
|-
    | DUP_THRESH         | 3 (RFC 8985 default)   | Hybrid fast-rxm   |
| RTT clamp || <code>16 * srtt</code> || Probe-sample upper bound (ACK-derived RTT samples gated by Karn / recovery only)
    |                    |                        | trigger (Sec. 8) |
|-
    | MDEV_MUL           | 2 (build-tunable via   | mdev shift in     |
| Cold-probe cadence || 100 ms (rx-driven; see [[#12. RTT estimation|Section 12]]) || Pre-<code>srtt</code> RTTP rate
    |                    |  FRCT_RTO_MDEV_-      | RTO = srtt +     |
|-
    |                    |  MULTIPLIER)          | (mdev << MDEV_MUL)|
| <code>DELT_RDV</code> || 100 ms || RDVS emit cadence
    | RTTP nonce         | 16 octets             | Echoed verbatim   |
|-
    | RTTP_RING         | 8                     | In-flight probes |
| <code>MAX_RDV</code> || 1 s || RDVS give-up
    | RTT clamp         | 16 * srtt             | Probe-sample     |
|-
    |                    |                        | upper bound       |
| Delayed-ACK fire || <code>2 * TICTIME</code> (<code>TICTIME</code> = FRCT tick granularity, default 5 ms; <code>2*TICTIME = 10 ms</code> by default) || Fired after the first in-order DATA arrival; tick is build-tunable
    |                    |                        | (ACK-derived RTT |
|-
    |                    |                        | samples gated by |
| NACK send cooldown || <code>srtt</code> when an <code>srtt</code> sample exists, else 100 ms || Pre-DRF NACK rate-limit
    |                    |                        | Karn / recovery   |
|-
    |                    |                        | only)             |
| <code>MAX_SDU</code> || 1 MiB || Max reassembled SDU; configurable per flow
    | Cold-probe cadence | 100 ms (rx-driven;     | Pre-srtt RTTP     |
|}
    |                    |  see Section 12)      | rate              |
    | DELT_RDV           | 100 ms                 | RDVS emit cadence |
    | MAX_RDV           | 1 s                   | RDVS give-up     |
    | Delayed-ACK fire   | 2 * TICTIME (TICTIME   | Fired after the  |
    |                    |  = FRCT tick gran-    | first in-order    |
    |                    |  ularity, default     | DATA arrival;    |
    |                    |  5 ms; 2*TICTIME       | tick is build-    |
    |                    |  = 10 ms by default)   | tunable           |
    | NACK send cooldown | srtt when an srtt     | Pre-DRF NACK     |
    |                    |  sample exists, else  | rate-limit       |
    |                    |  100 ms                |                  |
    | MAX_SDU           | 1 MiB                 | Max reassembled   |
    |                    |                        | SDU; configurable |
    |                    |                        | per flow         |
    +--------------------+------------------------+-------------------+
</pre>
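
Several rows of the table combine arithmetically.  The sketch below works through the RTO floor, the RTO formula, the backoff shift, and the RACK reorder window using the default values listed above.  The function names are hypothetical, all times are assumed to be in nanoseconds, and applying the <code>MIN_REORDER_NS</code> floor after the <code>MIN(...)</code> is an assumption about evaluation order.

```c
#include <assert.h>
#include <stdint.h>

#define RXMQ_RES        20        /* wheel resolution: 2^20 ns      */
#define RTO_MIN_TUNABLE 250000ULL /* 250 us build-tunable floor     */
#define MDEV_MUL        2         /* mdev shift                     */
#define MAX_RTO_MUL     20        /* backoff shift cap              */
#define MIN_REORDER_NS  250000ULL /* 250 us reorder window floor    */

#define MAX(a, b) ((a) > (b) ? (a) : (b))
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* RTO_MIN = MAX(250 us, 1 << RXMQ_RES); about 1 ms by default. */
uint64_t rto_min_ns(void)
{
        return MAX(RTO_MIN_TUNABLE, 1ULL << RXMQ_RES);
}

/* RTO = srtt + (mdev << MDEV_MUL), floored at RTO_MIN. */
uint64_t rto_ns(uint64_t srtt, uint64_t mdev)
{
        uint64_t rto = srtt + (mdev << MDEV_MUL);

        return MAX(rto, rto_min_ns());
}

/* Wheel deadline with exponential backoff: now + (rto << rto_mul). */
uint64_t rxm_deadline_ns(uint64_t now, uint64_t rto, unsigned rto_mul)
{
        assert(rto_mul <= MAX_RTO_MUL);

        return now + (rto << rto_mul);
}

/* R = MIN(reo_wnd_mult * min_RTT / 4, SRTT), floored at 250 us. */
uint64_t rack_window_ns(unsigned reo_wnd_mult, uint64_t min_rtt,
                        uint64_t srtt)
{
        uint64_t r = MIN(reo_wnd_mult * min_rtt / 4, srtt);

        return MAX(r, MIN_REORDER_NS);
}
```

With <code>RXMQ_RES=20</code> the floor evaluates to 1048576 ns, which is the "~1 ms" quoted in the <code>RTO_MIN</code> row.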


The per-flow fragment Maximum Transmission Unit (MTU) is computed
The per-flow fragment Maximum Transmission Unit (MTU) is computed
at flow setup from the lower IPCP's mtu minus encryption
at flow setup from the lower IPCP's mtu minus encryption
headsz / tailsz and CRC trailer; there is no FRCT-level default or
<code>headsz</code> / <code>tailsz</code> and CRC trailer; there is no FRCT-level default or
environment-variable override.
environment-variable override.


Line 579: Line 530:
The DRF (Data Run Flag) bit on an outbound packet means "this is
The DRF (Data Run Flag) bit on an outbound packet means "this is
the start of a fresh data run" and is set whenever the sender has
the start of a fresh data run" and is set whenever the sender has
nothing in flight (snd_cr.seqno == snd_cr.lwe).
nothing in flight (<code>snd_cr.seqno == snd_cr.lwe</code>).


Independently of that, if the sender has been idle longer than
Independently of that, if the sender has been idle longer than
snd_cr.inact AND the pipe is empty (snd_cr.seqno == snd_cr.lwe),
<code>snd_cr.inact</code> AND the pipe is empty (<code>snd_cr.seqno == snd_cr.lwe</code>),
seqno_rotate() rolls a random new seqno before the send and
<code>seqno_rotate()</code> rolls a random new <code>seqno</code> before the send and
resets
resets


<pre>
<syntaxhighlight lang="text">
     snd_cr.seqno  = random()
     snd_cr.seqno  = random()
     snd_cr.lwe    = snd_cr.seqno
     snd_cr.lwe    = snd_cr.seqno
Line 593: Line 544:
     in_recovery  = false  (recovery state, see Section 8)
     in_recovery  = false  (recovery state, see Section 8)
     recovery_high = snd_cr.seqno
     recovery_high = snd_cr.seqno
</pre>
</syntaxhighlight>


The receiver, on observing rcv-side inactivity
The receiver, on observing rcv-side inactivity
(now - rcv_cr.act > rcv_cr.inact), requires a DRF on the next
(<code>now - rcv_cr.act &gt; rcv_cr.inact</code>), requires a DRF on the next
DATA packet; otherwise it replies with a rate-limited NACK (see
DATA packet; otherwise it replies with a rate-limited NACK (see
below). Non-DATA control packets pass through without the DRF
below). Non-DATA control packets pass through without the DRF
requirement. On DRF the receiver releases the rq[] slots and
requirement (no impact on receiver state). On DRF the receiver releases the <code>rq[]</code> slots and
rebases
rebases


<pre>
<syntaxhighlight lang="text">
     rcv_cr.lwe  = seqno
     rcv_cr.lwe  = seqno
     rcv_cr.rwe  = seqno + RQ_SIZE
     rcv_cr.rwe  = seqno + RQ_SIZE
     rcv_cr.seqno = seqno
     rcv_cr.seqno = seqno
</pre>
</syntaxhighlight>
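
The sender and receiver decisions described in this section can be sketched as below.  The <code>struct cr</code> layout, the function names, and the verdict enum are hypothetical; sender idleness is assumed to be measured as <code>now - act</code>, the NACK cooldown is left out, and only the fields named in the text are touched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RQ_SIZE 128u /* assumed default from Section 3 */

/* Hypothetical per-direction record with only the fields used here. */
struct cr {
        uint32_t lwe;
        uint32_t rwe;
        uint32_t seqno;
        uint64_t act;   /* last activity (ns)       */
        uint64_t inact; /* inactivity threshold (ns) */
};

enum rcv_verdict { RCV_PASS, RCV_REBASE, RCV_NACK };

/* DRF is set whenever nothing is in flight. */
bool snd_sets_drf(const struct cr *snd)
{
        return snd->seqno == snd->lwe;
}

/* Rotate only when idle past inact AND the pipe is empty. */
bool snd_should_rotate(const struct cr *snd, uint64_t now)
{
        return now - snd->act > snd->inact && snd_sets_drf(snd);
}

/* Receiver-side check for one arrival; rebases in place on DRF. */
enum rcv_verdict rcv_inact_check(struct cr *rcv, uint64_t now,
                                 bool is_data, bool has_drf,
                                 uint32_t seqno)
{
        assert(now >= rcv->act);

        if (now - rcv->act <= rcv->inact)
                return RCV_PASS;   /* receive side is not stale */

        if (has_drf) {
                rcv->lwe   = seqno;
                rcv->rwe   = seqno + RQ_SIZE;
                rcv->seqno = seqno;
                return RCV_REBASE;
        }

        if (is_data)
                return RCV_NACK;   /* stale DATA without DRF */

        return RCV_PASS;           /* non-DATA control packet */
}
```

A caller would treat <code>RCV_NACK</code> as "send a pre-DRF NACK if the cooldown allows, then discard the packet".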


If the inactive packet has DATA but no DRF, a rate-limited NACK is
If the inactive packet has DATA but no DRF, a rate-limited NACK is
fired back to the sender (cooldown per Section 3); non-DATA stale
fired back to the sender (cooldown per [[#3. Protocol parameters|Section 3]]); non-DATA stale
arrivals fall through to normal processing (no NACK, no drop).
arrivals fall through to normal processing (no NACK, no drop).


== 5. Send path ==
== 5. Send path ==


<pre>
# If the SDU exceeds <code>(frag_mtu - data_hdr_len)</code>, the caller (<code>dev.c</code>) fans it out into <code>ceil(count / (frag_mtu - data_hdr_len))</code> fragments, each emitted via <code>frcti_snd</code> as its own DATA packet with a per-fragment role ([[#7.2. Fragmentation and reassembly|Section 7.2]]); both FRTX and best-effort flows fragment.  Raw flows (no FRCP engagement, <code>qos.service == SVC_RAW</code>) carry no PCI and return <code>-EMSGSIZE</code> for any SDU larger than one packet at the layer below.  An SDU that fits in a single packet is sent as SOLE.  <code>frcti_snd</code> reserves PCI head room; sets DATA, plus DRF when the pipe is empty (<code>snd_cr.seqno == snd_cr.lwe</code>).
    1. If the SDU exceeds (frag_mtu - data_hdr_len), the caller
# <code>seqno_rotate()</code> if past sender inactivity and the pipe is empty ([[#4. Sequence-number rotation (DRF)|Section 4]]).
      (dev.c) fans it out into ceil(count / (frag_mtu -
# Advertise FC (<code>pci.window = frcti_advert_rwe(frcti)</code>, i.e. <code>rcv_cr.rwe</code> clamped to <code>rcv_cr.lwe + ring_seq_cap</code> in stream mode) when the receiver side is recent: <code>now - rcv_cr.act &lt; rcv_cr.inact</code>.
      data_hdr_len)) fragments, each emitted via frcti_snd as its
# Reliable mode (FRTX): leave <code>snd_cr.lwe</code> where it is; reset the slot at <code>RQ_SLOT(seqno)</code> (<code>snd_slots[p].time = now</code>, <code>snd_slots[p].flags = 0</code>); queue an <code>rxm_entry</code> (saves a packet copy, arms a wheel timer at <code>now + (rto &lt;&lt; rto_mul)</code>). Piggyback ACK (<code>pci.ackno = rcv_cr.lwe</code>) while the a-timer for the most recent received DATA packet has not yet expired (<code>now - rcv_cr.act &lt;= t_a</code>); on piggyback, set <code>rcv_cr.seqno = rcv_cr.lwe</code> so the next delayed-ACK fire is suppressed.  See [[#8. Retransmission|Section 8]] for <code>t_a</code> / <code>t_r</code> semantics.
      own DATA packet with a per-fragment role (Section 7.2);
# Best-effort mode (no FRTX): advance <code>snd_cr.lwe</code> immediately (<code>snd_cr.lwe = snd_cr.lwe + 1</code>, <code>snd_cr.rwe = snd_cr.lwe + RQ_SIZE</code>); no retransmit state.  No send-side RTT probe is armed in this mode (<code>rtt_probe_arm</code> requires an in-flight <code>seqno</code>, which best-effort never has); the rx-driven cold seeder in <code>frcti_rcv</code> is the only probe path.
      both FRTX and best-effort flows fragment.  Raw flows (no
# In reliable mode, optionally arm an RTT probe ([[#12. RTT estimation|Section 12]]).
      FRCP engagement, qos.service == SVC_RAW) carry no PCI and
      return -EMSGSIZE for any SDU larger than one packet at the
      layer below.  An SDU that fits in a single packet is sent
      as SOLE.  frcti_snd reserves PCI head room; sets DATA, plus
      DRF when the pipe is empty (snd_cr.seqno == snd_cr.lwe).
    2. seqno_rotate() if past sender inactivity and the pipe is
      empty (Section 4).
    3. Advertise FC (pci.window = frcti_advert_rwe(frcti), i.e.
      rcv_cr.rwe clamped to rcv_cr.lwe + ring_seq_cap in stream
      mode) when the receiver side is recent: now - rcv_cr.act
      < rcv_cr.inact.
    4. Reliable mode (FRTX): leave snd_cr.lwe where it is; reset
      the slot at RQ_SLOT(seqno) (snd_slots[p].time = now,
      snd_slots[p].flags = 0); queue an rxm_entry (saves a packet
      copy, arms a wheel timer at now + (rto << rto_mul)).
      Piggyback ACK (pci.ackno = rcv_cr.lwe) while the a-timer
      for the most recent received DATA packet has not yet
      expired (now - rcv_cr.act <= t_a); on piggyback, set
      rcv_cr.seqno = rcv_cr.lwe so the next delayed-ACK fire is
      suppressed.  See Section 8 for t_a / t_r semantics.
    5. Best-effort mode (no FRTX): advance snd_cr.lwe immediately
      (snd_cr.lwe = snd_cr.lwe + 1, snd_cr.rwe = snd_cr.lwe +
      RQ_SIZE); no retransmit state.  No send-side RTT probe is
      armed in this mode (rtt_probe_arm requires an in-flight
      seqno, which best-effort never has); the rx-driven cold
      seeder in frcti_rcv is the only probe path.
    6. In reliable mode, optionally arm an RTT probe (Section 12).
</pre>
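
The fan-out arithmetic in step 1 is a ceiling division over the per-fragment payload.  The sketch below uses a hypothetical helper name; the parameter names follow the text, and returning 1 for an SDU that fits in one packet corresponds to the SOLE case.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of DATA packets needed for an SDU of count octets:
 * ceil(count / (frag_mtu - data_hdr_len)), or 1 when it fits (SOLE).
 */
size_t frag_count(size_t count, size_t frag_mtu, size_t data_hdr_len)
{
        size_t payload;

        assert(frag_mtu > data_hdr_len); /* room for at least 1 octet */

        payload = frag_mtu - data_hdr_len;

        if (count <= payload)
                return 1;

        return (count + payload - 1) / payload;
}
```

With an illustrative <code>frag_mtu</code> of 1500 and <code>data_hdr_len</code> of 16, each fragment carries 1484 octets, so a 4000-octet SDU fans out into three fragments.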




Line 656: Line 578:


Keepalive (KA), RTT probe (RTTP), pre-DRF NACK, and rendezvous
Keepalive (KA), RTT probe (RTTP), pre-DRF NACK, and rendezvous
(RDVS) packets short-circuit out of frcti_rcv before the locked
(RDVS) packets short-circuit out of <code>frcti_rcv</code> before the locked
main path; each handler takes its own lock internally.
main path; each handler takes its own lock internally.


<pre>
<syntaxhighlight lang="text">
       incoming packet
       incoming packet
             |
             |
Line 684: Line 606:
             v
             v
       acquire wrlock; enter locked main path
       acquire wrlock; enter locked main path
</pre>
</syntaxhighlight>


<pre>
;<code>KA</code>
  - KA  : refresh t_ka_rcv, honour piggybacked ACK.
:refresh <code>t_ka_rcv</code>, honour piggybacked ACK.
  - RTTP : probe (echo back nonce) or echo (verify nonce, sample
;<code>RTTP</code>
          RTT).
:probe (echo back nonce) or echo (verify nonce, sample RTT).
  - NACK : pre-DRF, sender-side handler.  See Section 9.
;<code>NACK</code>
  - RDVS : reply with a bare FC packet (ackno = 0); rdlock only.
:pre-DRF, sender-side handler.  See [[#9. Pre-DRF NACK|Section 9]].
</pre>
;<code>RDVS</code>
:reply with a bare FC packet (<code>ackno = 0</code>); <code>rdlock</code> only.




=== 6.2. Locked main path ===
=== 6.2. Locked main path ===


Steps below run with the per-flow frcti.lock held for writing
Steps below run with the per-flow <code>frcti.lock</code> held for writing
(pthread_rwlock_wrlock) unless noted.
(<code>pthread_rwlock_wrlock</code>) unless noted.


<pre>
;<code>rcv_inact_check</code>
  rcv_inact_check
:Only meaningful when the receive side is stale.  On DRF (Data Run Flag): release <code>rq[]</code> slots, rebase <code>rcv_cr</code>, continue. On stale DATA without DRF: fire a pre-DRF NACK if cooldown allows ([[#9. Pre-DRF NACK|Section 9]]), then discard the packet; on cooldown, drop without sending a NACK (a pending cumulative ACK from <code>drop_packet</code> may still go out).  Non-DATA, non-DRF arrivals bypass <code>rcv_inact_check</code> entirely; pure-DRF stale arrivals fall through after the DRF rebase branch.
      Only meaningful when the receive side is stale.  On DRF
      (Data Run Flag): release rq[] slots, rebase rcv_cr, continue.
      On stale DATA without DRF: fire a pre-DRF NACK if cooldown
      allows (Section 9), then discard the packet; on cooldown,
      drop without sending a NACK (a pending cumulative ACK from
      drop_packet may still go out).  Non-DATA, non-DRF arrivals
      bypass rcv_inact_check entirely; pure-DRF stale arrivals fall
      through after the DRF rebase branch.
</pre>


<pre>
;DATA-only act refresh
  DATA-only act refresh
:Refresh <code>rcv_cr.act</code> only when <code>FRCT_DATA</code> is set, so that non-DATA packets never block the next DRF rebase.
      Refresh rcv_cr.act only when FRCT_DATA is set, so that non-DATA
      packets never block the next DRF rebase.
</pre>


<pre>
;Wire-dup gate
  Wire-dup gate
:Before flag-driven dispatch, drop wire-duplicate ACKs and wire-duplicate DATA (<code>is_dup_ack</code> / <code>is_dup_data</code>).  The DATA check is bypassed for <code>FRCT_RXM</code>-bearing arrivals so the piggybacked ACK / SACK / FC carried on a retransmitted DATA at an already-ACK'd <code>seqno</code> is still applied; the stale-in-window branch below then drops the packet.
      Before flag-driven dispatch, drop wire-duplicate ACKs and
      wire-duplicate DATA (is_dup_ack / is_dup_data).  The DATA
      check is bypassed for FRCT_RXM-bearing arrivals so the
      piggybacked ACK / SACK / FC carried on a retransmitted DATA
      at an already-ACK'd seqno is still applied; the stale-in-
      window branch below then drops the packet.
</pre>


<pre>
;<code>ACK</code>
  ACK
:Drop ACKs whose <code>ackno</code> falls outside <code>(snd_cr.lwe, snd_cr.seqno]</code>. If <code>ackno == snd_cr.lwe</code> (non-advancing cumulative ACK), drive RACK fast-retransmit consideration ([[#8. Retransmission|Section 8]]).  Otherwise advance <code>snd_cr.lwe = ackno</code>, collapse <code>rto_mul</code> to 0 (Karn-gated by <code>SND_RTX</code> on the just-acknowledged slot, the old head-of-line), reset <code>dup_thresh</code> to 0, update <code>t_latest_ack</code> to the send-time of the slot at <code>ackno-1</code> (consumed by RACK and SACK below), decay <code>reo_wnd_mult</code> per RFC 8985 sec. 6.2 step 4, exit NewReno-careful recovery (see [[#8. Retransmission|Section 8]]) on <code>ackno &gt;= recovery_high</code> or <code>ackno == snd_cr.seqno</code>, and feed an RTT sample if eligible ([[#12. RTT estimation|Section 12]]).
      Drop ACKs whose ackno falls outside (snd_cr.lwe, snd_cr.seqno].
      If ackno == snd_cr.lwe (non-advancing cumulative ACK), drive
      RACK fast-retransmit consideration (Section 8).  Otherwise
      advance snd_cr.lwe = ackno, collapse rto_mul to 0 (Karn-gated
      by SND_RTX on the just-acknowledged slot, the old head-of-
      line), reset dup_thresh to 0, update t_latest_ack to the
      send-time of the slot at ackno-1 (consumed by RACK and SACK
      below), decay reo_wnd_mult per RFC 8985 sec. 6.2 step 4,
      exit NewReno-careful recovery (see Section 8) on
      ackno >= recovery_high or ackno == snd_cr.seqno, and feed an
      RTT sample if eligible (Section 12).
</pre>
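
The ACK bounds check has to be done in wrapping arithmetic because <code>seqno</code> values are <code>u32</code> and start at a random point.  The sketch below is one way to express it; the enum and function name are hypothetical, and it assumes the non-advancing case (<code>ackno == snd_cr.lwe</code>) is recognised before the range check rather than dropped by it.

```c
#include <assert.h>
#include <stdint.h>

enum ack_verdict { ACK_DROP, ACK_DUP, ACK_ADVANCE };

/*
 * Classify ackno against the in-flight range (lwe, seqno] using
 * offsets from lwe, so the comparison survives u32 wrap-around.
 */
enum ack_verdict ack_classify(uint32_t lwe, uint32_t seqno,
                              uint32_t ackno)
{
        uint32_t inflight = seqno - lwe; /* packets outstanding   */
        uint32_t off      = ackno - lwe; /* distance past the lwe */

        assert(inflight <= UINT32_MAX / 2); /* sane window only */

        if (off == 0)
                return ACK_DUP;     /* non-advancing cumulative ACK */

        if (off > inflight)
                return ACK_DROP;    /* outside (lwe, seqno]         */

        return ACK_ADVANCE;         /* caller sets lwe = ackno      */
}
```

An <code>ackno</code> just below <code>lwe</code> produces a very large offset after the unsigned subtraction, so it is dropped by the same comparison that rejects values beyond <code>seqno</code>.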


<pre>
;<code>SACK</code>
  SACK
:Walk the block list.  For each block (a present range above <code>lwe</code>) NULL out <code>snd_slots[k].rxm</code>, clear the slot's per-send flags, and advance <code>t_latest_ack</code> to the latest send-time covered (the Forward Acknowledgement / fack equivalent, Mathis &amp; Mahdavi 1996); the first block whose start clamps to <code>snd_cr.lwe</code> skips this fack update so that a head-of-line clamp does not falsely advance fack.  For un-SACKed gaps below <code>hi_sacked</code>, stage a retransmit per slot that is (1) still owned (<code>rxm != NULL</code>), (2) not already <code>SND_FAST_RXM</code>, (3) not aged out past <code>t_r</code>, and (4) either outside the RACK reorder window <code>R</code> OR with <code>dup_thresh &gt;= DUP_THRESH</code> (the RFC 8985 sec. 6.2 hybrid trigger).  Mark the slot <code>SND_FAST_RXM</code> and NULL the <code>rxm</code> at stage time.  Capped at <code>SACK_RXM_MAX</code> staged retransmits per receive pass; what's left rides the next SACK.
      Walk the block list.  For each block (a present range above
      lwe) NULL out snd_slots[k].rxm, clear the slot's per-send
      flags, and advance t_latest_ack to the latest send-time
      covered (the Forward Acknowledgement / fack equivalent,
      Mathis & Mahdavi 1996); the first block whose start
      clamps to snd_cr.lwe skips this fack update so that a head-
      of-line clamp does not falsely advance fack.  For un-SACKed
      gaps below hi_sacked, stage a retransmit per slot that is
      (1) still owned (rxm != NULL), (2) not already SND_FAST_RXM,
      (3) not aged out past t_r, and (4) either outside the RACK
      reorder window R OR with dup_thresh >= DUP_THRESH (the RFC
      8985 sec. 6.2 hybrid trigger).  Mark the slot SND_FAST_RXM
      and NULL the rxm at stage time.  Capped at SACK_RXM_MAX
      staged retransmits per receive pass; what's left rides the
      next SACK.
</pre>


<pre>
;<code>FC</code>
  FC
:Bump <code>snd_cr.rwe</code> (clamped to <code>lwe + RQ_SIZE</code>, never shrinks) and mark window open.
      Bump snd_cr.rwe (clamped to lwe + RQ_SIZE, never shrinks)
      and mark window open.
</pre>


<pre>
;<code>DATA</code>
  DATA
:Bounds-check <code>seqno</code> against window.  On stale-dup (<code>seqno &lt; rcv_cr.lwe</code>), set <code>rcv_cr.seqno = seqno</code> to force a fresh ACK on the next <code>ack_snd</code>, then drop.  On accept: both FRTX and best-effort stash the packet-buffer index into <code>rq[seqno mod RQ_SIZE]</code>.  Fragments stash unchanged - the role bits are inspected only at consume time ([[#7.2. Fragmentation and reassembly|Section 7.2]]).  On out-of-order arrival, build a SACK reply if not rate-limited (per [[#3. Protocol parameters|Section 3]]) and not deduplicated against the previous <code>(rcv_cr.lwe, n_blocks)</code> pair; D-SACK reports always bypass the dedup.  If both rate-limit and dedup suppress the reply, neither SACK nor delayed-ACK fires (the sender picks up the gap on its next ACK).  On in-order arrival, arm the delayed-ACK timer.
      Bounds-check seqno against window.  On stale-dup
      (seqno < rcv_cr.lwe), set rcv_cr.seqno = seqno to force a
      fresh ACK on the next ack_snd, then drop.  On accept: both
      FRTX and best-effort stash the packet-buffer index into
      rq[seqno mod RQ_SIZE].  Fragments stash unchanged - the role
      bits are inspected only at consume time (Section 7.2).  On
      out-of-order arrival, build a SACK reply if not rate-limited
      (per Section 3) and not deduplicated against the previous
      (rcv_cr.lwe, n_blocks) pair; D-SACK reports always bypass the
      dedup.  If both rate-limit and dedup suppress the reply,
      neither SACK nor delayed-ACK fires (the sender picks up the
      gap on its next ACK).  On in-order arrival, arm the delayed-
      ACK timer.
</pre>


<pre>
;<code>drop_packet</code> exit
  drop_packet exit
:Releases the per-packet shared-memory buffer (<code>spb</code>), then calls <code>ack_snd</code> synchronously after the <code>spb</code> release to surface any pending cumulative ACK.
      Releases the per-packet shared-memory buffer (spb), then
      calls ack_snd synchronously after the spb release to surface
      any pending cumulative ACK.
</pre>




Line 797: Line 652:
=== 7.1. Read path ===
=== 7.1. Read path ===


flow_read returns a full reassembled SDU (Service Data Unit) via
<code>flow_read</code> returns a full reassembled SDU (Service Data Unit) via
frcti_consume on every FRCP SDU-mode flow (FRTX or best-effort);
<code>frcti_consume</code> on every FRCP SDU-mode flow (FRTX or best-effort);
stream-mode is covered in Section 16.  An incomplete head-of-line
stream-mode is covered in [[#16. Stream-mode flows|Section 16]].  An incomplete head-of-line
(HoL) run yields -EAGAIN; an oversized run yields -EMSGSIZE (the
(HoL) run yields <code>-EAGAIN</code>; an oversized run yields <code>-EMSGSIZE</code> (the
run is dropped so the flow does not stall).  On best-effort flows,
run is dropped so the flow does not stall).  On best-effort flows,
a permanently-lost mid-fragment is dropped as soon as a later
a permanently-lost mid-fragment is dropped as soon as a later
complete SDU becomes visible in the ring (Section 7.2 skip-past-
complete SDU becomes visible in the ring ([[#7.2. Fragmentation and reassembly|Section 7.2]] skip-past-
gap).
gap).


Raw flows carry no frcti, so flow_read returns the next pending
Raw flows carry no <code>frcti</code>, so <code>flow_read</code> returns the next pending
packet-buffer index directly, with no role-bit inspection.  (Raw
packet-buffer index directly, with no role-bit inspection.  (Raw
service is selected via qos.service == SVC_RAW at flow allocation,
service is selected via <code>qos.service == SVC_RAW</code> at flow allocation,
which suppresses frcti creation.)
which suppresses <code>frcti</code> creation.)


frcti_pdu_ready is the no-advance peek used by fevent (the
<code>frcti_pdu_ready</code> is the no-advance peek used by <code>fevent</code> (the
Ouroboros flow-event multiplexer, the poll(2)-equivalent on
Ouroboros flow-event multiplexer, the <code>poll(2)</code>-equivalent on
flows).  It returns ready only when the head-of-line run is
flows).  It returns ready only when the head-of-line run is
complete and the lead packet (a Protocol Data Unit, here one FRCP
complete and the lead packet (a Protocol Data Unit, here one FRCP
packet) is present at rcv_cr.rwe - RQ_SIZE; any other state
packet) is present at <code>rcv_cr.rwe - RQ_SIZE</code>; any other state
(including the best-effort skip-past-gap case) returns not ready,
(including the best-effort skip-past-gap case) returns not ready,
and frcti_consume is left to drop the broken prefix and re-
and <code>frcti_consume</code> is left to drop the broken prefix and re-
inspect.
inspect.


Line 823: Line 678:
=== 7.2. Fragmentation and reassembly ===
=== 7.2. Fragmentation and reassembly ===


Send side (flow_write_frag).  An SDU larger than
Send side (<code>flow_write_frag</code>).  An SDU larger than
(frag_mtu - PCI) is split into ceil(count / (frag_mtu - PCI))
<code>(frag_mtu - PCI)</code> is split into <code>ceil(count / (frag_mtu - PCI))</code>
fragments; each fragment is its own FRCP packet with its own
fragments; each fragment is its own FRCP packet with its own
seqno and a per-fragment role flag pair (Section 1.2).  Roles are
<code>seqno</code> and a per-fragment role flag pair ([[#1.2. Flag bits|Section 1.2]]).  Roles are
assigned at emit time:
assigned at emit time:


<pre>
{| class="wikitable"
    +------+--------+
! i !! Role
    | i   | Role   |
|-
    +------+--------+
| <code>n=1</code> || <code>SOLE</code>
    | n=1 | SOLE   |
|-
    | i=0 | FIRST |
| <code>i=0</code> || <code>FIRST</code>
    | i=n-1| LAST   |
|-
    | else | MID   |
| <code>i=n-1</code> || <code>LAST</code>
    +------+--------+
|-
</pre>
| else || <code>MID</code>
|}
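The split and role assignment above can be sketched in C as follows. This is an illustration only, not the <code>flow_write_frag</code> code: the enum names, the function names and the assumption that <code>count &gt; 0</code> and <code>frag_mtu &gt; pci</code> are all hypothetical, and the wire encoding of the role flag pair is not reproduced.

```c
#include <stddef.h>

/* Hypothetical role names; the role flag encoding itself is not shown. */
enum frag_role { ROLE_SOLE, ROLE_FIRST, ROLE_MID, ROLE_LAST };

/*
 * Number of fragments for an SDU of count bytes:
 * ceil(count / (frag_mtu - pci)).
 * Assumes count > 0 and frag_mtu > pci (caller's responsibility).
 */
static size_t frag_count(size_t count, size_t frag_mtu, size_t pci)
{
        size_t room = frag_mtu - pci; /* payload bytes per fragment */

        return (count + room - 1) / room;
}

/* Role of fragment i (0-based) out of n, following the table above. */
static enum frag_role frag_role_of(size_t i, size_t n)
{
        if (n == 1)
                return ROLE_SOLE;
        if (i == 0)
                return ROLE_FIRST;
        if (i == n - 1)
                return ROLE_LAST;
        return ROLE_MID;
}
```

The <code>n == 1</code> test comes first so that a single-fragment SDU is never labelled <code>FIRST</code> or <code>LAST</code>.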


A mid-loop allocation or transmit failure may yield a partial
A mid-loop allocation or transmit failure may yield a partial
write: the call returns the bytes already enqueued (off > 0) or
write: the call returns the bytes already enqueued (<code>off &gt; 0</code>) or
the underlying error (off == 0).  Best-effort flows fragment
the underlying error (<code>off == 0</code>).  Best-effort flows fragment
identically; on the receiver, a partial run with a permanently-
identically; on the receiver, a partial run with a permanently-
lost fragment is dropped when a later complete SDU is visible in
lost fragment is dropped when a later complete SDU is visible in
the ring (see skip-past-gap below).  Raw flows carry no PCI and
the ring (see skip-past-gap below).  Raw flows carry no PCI and
refuse anything larger than the layer's user MTU (-EMSGSIZE).
refuse anything larger than the layer's user MTU (<code>-EMSGSIZE</code>).


Wire-level recovery is fragment-agnostic on FRTX flows: each
Wire-level recovery is fragment-agnostic on FRTX flows: each
fragment's seqno flows through SACK / RACK / RTO / NACK exactly
fragment's <code>seqno</code> flows through SACK / RACK / RTO / NACK exactly
as for a SOLE DATA packet, and reassembly does not re-enter the
as for a SOLE DATA packet, and reassembly does not re-enter the
loss-detection path.  Best-effort flows run the same seqno
loss-detection path.  Best-effort flows run the same <code>seqno</code>
machinery (DRF, FC, ACK piggyback, pre-DRF NACK emit) but queue
machinery (DRF, FC, ACK piggyback, pre-DRF NACK emit) but queue
no rxm state at the sender, so a lost MID is unrecoverable;
no <code>rxm</code> state at the sender, so a lost MID is unrecoverable;
skip-past-gap handles it (below).
skip-past-gap handles it (below).


Receive side.  Fragments stash into rq[seqno] unchanged; role bits
Receive side.  Fragments stash into <code>rq[seqno]</code> unchanged; role bits
are read only at consume time.  frag_run_inspect, called from
are read only at consume time.  <code>frag_run_inspect</code>, called from
frcti_consume, walks the ring starting at the oldest still-
<code>frcti_consume</code>, walks the ring starting at the oldest still-
undelivered seqno base = rcv_cr.rwe - RQ_SIZE (equal to rcv_cr.lwe
undelivered <code>seqno</code> <code>base = rcv_cr.rwe - RQ_SIZE</code> (equal to <code>rcv_cr.lwe</code>
only when no partial run is in progress; during a partial run lwe
only when no partial run is in progress; during a partial run <code>lwe</code>
has already advanced past base).  It produces one of three
has already advanced past <code>base</code>).  It produces one of three
outcomes:
outcomes:


<pre>
{| class="wikitable"
    +---------------+---------------------------------------------+
! Outcome !! Cause
    | Outcome       | Cause                                       |
|-
    +---------------+---------------------------------------------+
| <code>DELIVER (n)</code>
    | DELIVER (n)   | rq[base]=SOLE (n=1), or rq[base]=FIRST and |
| <code>rq[base]=SOLE</code> (<code>n=1</code>), or <code>rq[base]=FIRST</code> and a <code>LAST</code> follows in slots <code>[base+1..base+n-1]</code> with all intermediate roles in <code>{MID,FIRST,LAST}</code> contiguous.
    |              | a LAST follows in slots [base+1..base+n-1] |
|-
    |              | with all intermediate roles in {MID,FIRST, |
| <code>DROP (n)</code>
    |              | LAST} contiguous.                           |
| <code>rq[base]</code> is <code>MID</code> or <code>LAST</code> without a preceding <code>FIRST</code> (<code>n=1</code>); a <code>FIRST..[non-LAST]..new-FIRST</code> or new-<code>SOLE</code> mid-run (drop the broken prefix with <code>n</code> = run length minus 1, so the new <code>FIRST</code>/<code>SOLE</code> stays); or, on best-effort flows, a gap at <code>base</code> with a <code>FIRST</code>/<code>SOLE</code> later in the ring (drop up to the new run start).
    | DROP (n)     | rq[base] is MID or LAST without a preceding |
|-
    |              | FIRST (n=1); a FIRST..[non-LAST]..new-FIRST |
| <code>NOT_READY</code>
    |              | or new-SOLE mid-run (drop the broken prefix |
| <code>rq[base]</code> absent or <code>FIRST..[non-LAST]</code> with no later <code>FIRST</code>/<code>SOLE</code> in the ring (FRTX waits for retx; best-effort waits for arrival).
    |              | with n = run length minus 1, so the new     |
|}
    |              | FIRST/SOLE stays); or, on best-effort       |
    |              | flows, a gap at base with a FIRST/SOLE     |
    |              | later in the ring (drop up to the new run   |
    |              | start).                                     |
    | NOT_READY     | rq[base] absent or FIRST..[non-LAST] with   |
    |              | no later FIRST/SOLE in the ring (FRTX waits |
    |              | for retx; best-effort waits for arrival).   |
    +---------------+---------------------------------------------+
</pre>
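One possible reading of the outcome table, as a simplified C sketch. It is not the <code>frag_run_inspect</code> implementation: the ring is modelled as a flat array of roles starting at <code>base</code>, and all type and function names are hypothetical. Size checks and buffer handling are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one role per ring slot, index 0 is base. */
enum role    { R_ABSENT, R_SOLE, R_FIRST, R_MID, R_LAST };
enum outcome { O_DELIVER, O_DROP, O_NOT_READY };

struct verdict {
        enum outcome out;
        size_t       n;   /* slots to deliver or drop; 0 if not ready */
};

/* Index of the next FIRST or SOLE at or after from, or len if none. */
static size_t next_run_start(const enum role *rq, size_t from, size_t len)
{
        size_t k;

        for (k = from; k < len; ++k)
                if (rq[k] == R_FIRST || rq[k] == R_SOLE)
                        return k;
        return len;
}

/* A gap at index gap: only best-effort flows may skip past it. */
static struct verdict on_gap(const enum role *rq, size_t gap, size_t len,
                             bool best_effort)
{
        struct verdict v = { O_NOT_READY, 0 };
        size_t         k;

        if (!best_effort)
                return v;               /* FRTX waits for the retransmit */

        k = next_run_start(rq, gap + 1, len);
        if (k < len) {
                v.out = O_DROP;         /* drop up to the new run start */
                v.n   = k;
        }
        return v;
}

static struct verdict inspect(const enum role *rq, size_t len,
                              bool best_effort)
{
        struct verdict v = { O_NOT_READY, 0 };
        size_t         i;

        if (len == 0)
                return v;
        if (rq[0] == R_ABSENT)
                return on_gap(rq, 0, len, best_effort);
        if (rq[0] == R_SOLE) {
                v.out = O_DELIVER;
                v.n   = 1;
                return v;
        }
        if (rq[0] == R_MID || rq[0] == R_LAST) {
                v.out = O_DROP;         /* no preceding FIRST */
                v.n   = 1;
                return v;
        }
        /* rq[0] is FIRST: walk until LAST, a gap, or a new run start. */
        for (i = 1; i < len; ++i) {
                if (rq[i] == R_ABSENT)
                        return on_gap(rq, i, len, best_effort);
                if (rq[i] == R_LAST) {
                        v.out = O_DELIVER;
                        v.n   = i + 1;
                        return v;
                }
                if (rq[i] == R_FIRST || rq[i] == R_SOLE) {
                        v.out = O_DROP; /* keep the new FIRST/SOLE */
                        v.n   = i;
                        return v;
                }
        }
        return v;                       /* run still incomplete */
}
```

In this model a broken prefix is dropped up to, but not including, the slot that starts the next run, so the following consume call sees that run at <code>base</code>.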


DELIVER triggers frag_gather: a scatter-gather memcpy of the n
<code>DELIVER</code> triggers <code>frag_gather</code>: a scatter-gather <code>memcpy</code> of the <code>n</code>
consecutive fragments at rq[base..base+n-1] directly into the
consecutive fragments at <code>rq[base..base+n-1]</code> directly into the
caller's buffer; each per-packet shared-memory buffer (spb) is
caller's buffer; each per-packet shared-memory buffer (<code>spb</code>) is
released and rwe advances by n.  lwe was already advanced
released and <code>rwe</code> advances by <code>n</code>.  <code>lwe</code> was already advanced
incrementally as each contiguous fragment arrived; frag_gather
incrementally as each contiguous fragment arrived; <code>frag_gather</code>
only restores the fixed-width invariant rwe == lwe + RQ_SIZE.
only restores the fixed-width invariant <code>rwe == lwe + RQ_SIZE</code>.
No intermediate reassembly buffer is allocated.
No intermediate reassembly buffer is allocated.


DROP advances rwe past the broken prefix (releasing the spbs)
<code>DROP</code> advances <code>rwe</code> past the broken prefix (releasing the <code>spb</code>s)
and pulls lwe up to the new trailing edge if needed; the next
and pulls <code>lwe</code> up to the new trailing edge if needed; the next
consume retries from the new base.  Oversize or arithmetically
consume retries from the new <code>base</code>.  Oversize or arithmetically
overflowing delivery (sum of fragment lengths > max_rcv_sdu, sum
overflowing delivery (sum of fragment lengths &gt; <code>max_rcv_sdu</code>, sum
> caller's buffer, or running-sum overflow) also drops the run
&gt; caller's buffer, or running-sum overflow) also drops the run
with -EMSGSIZE.
with <code>-EMSGSIZE</code>.


Skip-past-gap (best-effort only).  On FRTX, a gap in the run means
Skip-past-gap (best-effort only).  On FRTX, a gap in the run means
"waiting for retransmit" and frag_run_inspect returns NOT_READY.
"waiting for retransmit" and <code>frag_run_inspect</code> returns <code>NOT_READY</code>.
On best-effort flows the gap is permanent, so frag_run_inspect
On best-effort flows the gap is permanent, so <code>frag_run_inspect</code>
scans forward in the ring for the next FIRST or SOLE; if one is
scans forward in the ring for the next <code>FIRST</code> or <code>SOLE</code>; if one is
visible within RQ_SIZE, it returns DROP for the broken prefix and
visible within <code>RQ_SIZE</code>, it returns <code>DROP</code> for the broken prefix and
the consume loop retries at the new lwe.  Memory hold is bounded
the consume loop retries at the new <code>lwe</code>.  Memory hold is bounded
by RQ_SIZE; the partial releases on the next consume call once a
by <code>RQ_SIZE</code>; the partial releases on the next consume call once a
later complete run exists.  Voice-like flows (one SOLE per SDU)
later complete run exists.  Voice-like flows (one <code>SOLE</code> per SDU)
see no extra wait: any later SOLE makes the prior gap droppable
see no extra wait: any later <code>SOLE</code> makes the prior gap droppable
immediately.
immediately.


Line 921: Line 768:


FRCP is bounded by two delta-t-derived timers (Watson 1981, see
FRCP is bounded by two delta-t-derived timers (Watson 1981, see
Section 15):
[[#15. Heritage and adopted techniques|Section 15]]):


<pre>
* <code>t_a</code> (a-timer): upper bound on ACK delay.  An ACK for a received DATA packet MUST NOT be emitted after <code>t_a</code> of receipt; an attempt to send an ACK after the a-timer has expired is suppressed.
  - t_a (a-timer): upper bound on ACK delay.  An ACK for a received
* <code>t_r</code> (r-timer): upper bound on retransmission.  A given DATA packet MUST NOT be retransmitted after <code>t_r</code> has elapsed since its first send (<code>t0</code>); when the bound is hit, the flow is declared down (raising the Ouroboros asynchronous flow condition <code>ACL_FLOWDOWN</code>, which marks the flow dead to both endpoints) rather than retransmitted again.
    DATA packet MUST be emitted within t_a of receipt; an attempt
    to send an ACK after the a-timer has expired is suppressed
    (the sender's RTO is already in motion).
  - t_r (r-timer): upper bound on retransmission.  A given DATA
    packet MUST NOT be retransmitted after t_r has elapsed since
    its first send (t0); when the bound is hit, the flow is
    declared down (raising the Ouroboros asynchronous flow
    condition ACL_FLOWDOWN, which marks the flow dead to both
    endpoints) rather than retransmitted again.
</pre>


Each in-flight FRTX seqno owns one rxm_entry, armed in a hashed
Each in-flight FRTX <code>seqno</code> owns one <code>rxm_entry</code>, armed in a hashed
timing wheel; the wheel deadline is the slot's next eligible
timing wheel; the wheel deadline is the slot's next eligible
retransmit time.
retransmit time.


<pre>
;RTO timer
  RTO timer
:On fire (<code>rxm_due</code>), re-emit with <code>FRCT_RXM</code>, mark <code>SND_RTX</code> (Karn-suppress next ACK's RTT sample), and (for the head-of-line (HoL) slot only) bump <code>rto_mul</code> up to <code>MAX_RTO_MUL</code>.  Wheel deadline is <code>t_send + (rto &lt;&lt; rto_mul)</code>.  Re-armed unless consumed.  The RTO timer also clears <code>SND_FAST_RXM</code> (re-arming fast-retransmit eligibility), resets <code>reo_wnd_mult</code> to 1 on a HoL fire (RFC 8985 sec. 6.2 step 4 reset clause), and marks the flow <code>ACL_FLOWDOWN</code> if its <code>frct_tx</code> call fails.
      On fire (rxm_due), re-emit with FRCT_RXM, mark SND_RTX
      (Karn-suppress next ACK's RTT sample), and (for the head-of-
      line (HoL) slot only) bump rto_mul up to MAX_RTO_MUL.  Wheel
      deadline is t_send + (rto << rto_mul).  Re-armed unless
      consumed.  The RTO timer also clears SND_FAST_RXM (re-arming
      fast-retransmit eligibility), resets reo_wnd_mult to 1 on a
      HoL fire (RFC 8985 sec. 6.2 step 4 reset clause), and marks
      the flow ACL_FLOWDOWN if its frct_tx call fails.
</pre>
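The wheel deadline and the head-of-line backoff can be sketched as below. This is a minimal illustration: the value given to <code>MAX_RTO_MUL</code> is a placeholder (this section does not state it), the function names are hypothetical, and times are taken to be nanoseconds in a 64-bit counter.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_RTO_MUL 10u /* placeholder value, not taken from the spec */

/* Next eligible retransmit time: t_send + (rto << rto_mul). */
static uint64_t rto_deadline(uint64_t t_send, uint64_t rto, unsigned rto_mul)
{
        return t_send + (rto << rto_mul);
}

/* Exponential backoff: only a head-of-line fire bumps rto_mul. */
static unsigned rto_backoff(unsigned rto_mul, bool is_hol)
{
        if (is_hol && rto_mul < MAX_RTO_MUL)
                ++rto_mul;
        return rto_mul;
}
```

With <code>rto_mul = 0</code> the deadline is one plain RTO after the send; each head-of-line fire doubles the interval until the cap is reached.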
 
<pre>
  r-timer guard
      Before any retransmit attempt, check (now - t0) against t_r.
      If exceeded, the slot is no longer eligible for retransmit.
      Only the RTO timer (rxm_due) treats r-timer expiry as
      terminal: it marks the flow ACL_FLOWDOWN (peer unreachable).
      Fast-retransmit, SACK-driven retransmit, and NACK-driven
      head-of-line re-emit silently skip aged-out slots and defer
      the flow-down decision to the next RTO fire.
</pre>
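The guard's two dispositions can be summarised in a short C sketch. The enum and function names are hypothetical; <code>from_rto</code> stands for "called from the RTO timer path (<code>rxm_due</code>)".

```c
#include <stdbool.h>
#include <stdint.h>

enum rxm_action { RXM_SEND, RXM_SKIP, RXM_FLOW_DOWN };

/*
 * r-timer guard: t0 is the first send time of the slot.
 * Only the RTO path treats expiry as terminal; fast, SACK-driven
 * and NACK-driven retransmits skip the aged-out slot silently.
 */
static enum rxm_action rtimer_guard(uint64_t now, uint64_t t0,
                                    uint64_t t_r, bool from_rto)
{
        if (now - t0 <= t_r)
                return RXM_SEND;
        return from_rto ? RXM_FLOW_DOWN : RXM_SKIP;
}
```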
 
<pre>
  Fast retransmit (hybrid trigger, RFC 8985 sec. 6.2)
      On a non-advancing cumulative ACK with the scoreboard
      advanced, fire one fast retransmit when EITHER (a) the head-
      of-line slot's latest send is older than the RACK reorder
      window R (Section 3) and not yet aged out, OR (b) the SACK
      dup-thresh count above snd_cr.lwe reaches DUP_THRESH (= 3,
      RFC 8985 sec. 6.2 step 4).  Fires at most once per non-
      advancing cumulative-ACK value, gated by rack_fired_lwe (the
      snd_cr.lwe at which fast-retransmit last fired).  Set
      SND_FAST_RXM on the slot (one-shot per-slot gate) and enter
      NewReno-style careful recovery (see NewReno below in this
      section).
</pre>


<pre>
;r-timer guard
      The RACK reorder window R uses the RFC 8985 sec. 6.2 form
:Before any retransmit attempt, check <code>(now - t0)</code> against <code>t_r</code>.  If exceeded, the slot is no longer eligible for retransmit.  Only the RTO timer (<code>rxm_due</code>) treats r-timer expiry as terminal: it marks the flow <code>ACL_FLOWDOWN</code> (peer unreachable).  Fast-retransmit, SACK-driven retransmit, and NACK-driven head-of-line re-emit silently skip aged-out slots and defer the flow-down decision to the next RTO fire.
      R = MIN(reo_wnd_mult * min_RTT / 4, SRTT) with a
      MIN_REORDER_NS = 250 us floor.  Before the first RTT sample
      seeds min_rtt, R falls back to MIN(reo_wnd_mult * SRTT / 4,
      SRTT), still floored at MIN_REORDER_NS (consistent with the
      windowed-minimum fallback described in Section 12).  min_rtt
      is a windowed minimum over the last MIN_RTT_WIN_NS = 5 min of
      RTT samples (matches the Linux tcp_min_rtt_wlen default) so a
      route change to a longer path eventually re-anchors the
      reorder window without relying on reo_wnd_mult growth alone.
</pre>
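The reorder-window formula can be written out as a small C function. This is a sketch under assumptions: times are nanoseconds, <code>min_rtt == 0</code> is used here to mean "not yet seeded" (the real sentinel is not stated), and the function names are hypothetical.

```c
#include <stdint.h>

#define MIN_REORDER_NS 250000ULL /* 250 us floor */

static uint64_t min_u64(uint64_t a, uint64_t b)
{
        return a < b ? a : b;
}

/*
 * R = MIN(reo_wnd_mult * min_rtt / 4, srtt), floored at MIN_REORDER_NS.
 * Before min_rtt is seeded (modelled as 0), srtt takes its place.
 */
static uint64_t reorder_window(uint64_t min_rtt, uint64_t srtt,
                               unsigned reo_wnd_mult)
{
        uint64_t base = min_rtt != 0 ? min_rtt : srtt;
        uint64_t r    = min_u64(reo_wnd_mult * base / 4, srtt);

        return r < MIN_REORDER_NS ? MIN_REORDER_NS : r;
}
```

For example, with <code>min_rtt</code> = 4 ms, <code>srtt</code> = 10 ms and <code>reo_wnd_mult</code> = 1 the window is 1 ms; growing <code>reo_wnd_mult</code> widens it until the <code>srtt</code> cap takes over.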


<pre>
;Fast retransmit (hybrid trigger, RFC 8985 sec. 6.2)
  SACK-driven retransmit
:On a non-advancing cumulative ACK with the scoreboard advanced, fire one fast retransmit when EITHER (a) the head-of-line slot's latest send is older than the RACK reorder window <code>R</code> ([[#3. Protocol parameters|Section 3]]) and not yet aged out, OR (b) the SACK <code>dup-thresh</code> count above <code>snd_cr.lwe</code> reaches <code>DUP_THRESH</code> (= 3, RFC 8985 sec. 6.2 step 4).  Fires at most once per non-advancing cumulative-ACK value, gated by <code>rack_fired_lwe</code> (the <code>snd_cr.lwe</code> at which fast-retransmit last fired).  Set <code>SND_FAST_RXM</code> on the slot (one-shot per-slot gate) and enter NewReno-style careful recovery (see NewReno below in this section).
      For each gap below hi_sacked whose slot is (1) still owned,
:The RACK reorder window <code>R</code> uses the RFC 8985 sec. 6.2 form <code>R = MIN(reo_wnd_mult * min_RTT / 4, SRTT)</code> with a <code>MIN_REORDER_NS = 250 us</code> floor.  Before the first RTT sample seeds <code>min_rtt</code>, <code>R</code> falls back to <code>MIN(reo_wnd_mult * SRTT / 4, SRTT)</code>, still floored at <code>MIN_REORDER_NS</code> (consistent with the windowed-minimum fallback described in [[#12. RTT estimation|Section 12]]).  <code>min_rtt</code> is a windowed minimum over the last <code>MIN_RTT_WIN_NS</code> = 5 min of RTT samples (matches the Linux <code>tcp_min_rtt_wlen</code> default) so a route change to a longer path eventually re-anchors the reorder window without relying on <code>reo_wnd_mult</code> growth alone.
      (2) not already SND_FAST_RXM, (3) not aged out past t_r, and
      (4) either outside the RACK window R OR with dup_thresh >=
      DUP_THRESH (same hybrid as fast-retransmit, see Section 6.2),
      re-emit.  Each SACK-driven retransmit re-arms a fresh rxm so
      a lost retransmit can still be recovered by its own RTO
      timer.
</pre>
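The four-part eligibility test for a SACK-driven retransmit can be expressed as a single predicate. The sketch below is illustrative only: the struct and its field names are hypothetical, and whether the slot lies outside the RACK window is passed in as a precomputed flag rather than derived here.

```c
#include <stdbool.h>
#include <stdint.h>

#define DUP_THRESH 3u

/* Hypothetical view of one sender slot. */
struct slot_view {
        bool     owned;    /* (1) rxm != NULL */
        bool     fast_rxm; /* (2) SND_FAST_RXM already set */
        uint64_t t0;       /* first send time, for the r-timer check */
};

static bool sack_rxm_eligible(const struct slot_view *s, uint64_t now,
                              uint64_t t_r, bool outside_reo_wnd,
                              unsigned dup_thresh)
{
        if (!s->owned || s->fast_rxm)
                return false;
        if (now - s->t0 > t_r)
                return false;      /* (3) aged out past t_r */
        /* (4) hybrid trigger: time-based OR count-based */
        return outside_reo_wnd || dup_thresh >= DUP_THRESH;
}
```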


<pre>
;SACK-driven retransmit
  NewReno
:For each gap below <code>hi_sacked</code> whose slot is (1) still owned, (2) not already <code>SND_FAST_RXM</code>, (3) not aged out past <code>t_r</code>, and (4) either outside the RACK window <code>R</code> OR with <code>dup_thresh &gt;= DUP_THRESH</code> (same hybrid as fast-retransmit, see [[#6.2. Locked main path|Section 6.2]]), re-emit. Each SACK-driven retransmit re-arms a fresh <code>rxm</code> so a lost retransmit can still be recovered by its own RTO timer.
      On entry, recovery_high = snd_cr.seqno + RTT_QUARANTINE.
      Exit when ackno >= recovery_high or ackno == snd_cr.seqno
      (the latter means everything sent has been acknowledged).
      seqno_rotate also clears recovery.
</pre>


;NewReno
:On entry, <code>recovery_high = snd_cr.seqno + RTT_QUARANTINE</code>.  Exit when <code>ackno &gt;= recovery_high</code> or <code>ackno == snd_cr.seqno</code> (the latter means everything sent has been acknowledged).  <code>seqno_rotate</code> also clears recovery.
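The entry and exit conditions can be sketched as below. Two things here are assumptions rather than statements about FRCT: the comparison is done with wrap-safe 32-bit serial arithmetic, and <code>RTT_QUARANTINE</code> is passed in as a parameter because its value is not given in this section. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a is at or after b modulo 2^32 (serial-number compare). */
static bool seq_geq(uint32_t a, uint32_t b)
{
        return (int32_t)(a - b) >= 0;
}

/* On entry: recovery_high = snd_cr.seqno + RTT_QUARANTINE. */
static uint32_t recovery_enter(uint32_t snd_seqno, uint32_t rtt_quarantine)
{
        return snd_seqno + rtt_quarantine;
}

/* Exit when ackno reaches recovery_high or everything sent is acked. */
static bool recovery_exit(uint32_t ackno, uint32_t recovery_high,
                          uint32_t snd_seqno)
{
        return seq_geq(ackno, recovery_high) || ackno == snd_seqno;
}
```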


== 9. Pre-DRF NACK ==
== 9. Pre-DRF NACK ==


The two sides have different inactivity thresholds
The two sides have different inactivity thresholds (<code>snd_cr.inact &gt; rcv_cr.inact</code>), so a receiver can detect "stale data run" before the sender's own DRF logic kicks in.  NACK is the receiver-driven nudge that asks the sender to re-transmit the head of the run.
(snd_cr.inact > rcv_cr.inact), so a receiver can detect "stale data
run" before the sender's own DRF logic kicks in.  NACK is the
receiver-driven nudge that asks the sender to re-transmit the head
of the run.
 
<pre>
  Send (frcti_nack_snd, called by frcti_rcv when rcv_inact_check
        returns FRCT_INACT_NEED_NACK)
      When an incoming DATA packet has no DRF and rcv-side activity
      is older than rcv_cr.inact, the receiver emits a bare packet
      with flags = FRCT_NACK and seqno = arrival_seqno - 1
      (informational only, not consulted by the receive handler).
      The cooldown in Section 3 rate-limits the burst.  Non-DATA
      non-DRF arrivals bypass rcv_inact_check entirely; non-DATA
      DRF still rebases via the DRF branch.
</pre>


<pre>
;Send (<code>frcti_nack_snd</code>, called by <code>frcti_rcv</code> when <code>rcv_inact_check</code> returns <code>FRCT_INACT_NEED_NACK</code>)
  Receive (frcti_nack_rcv)
:When an incoming DATA packet has no DRF and rcv-side activity is older than <code>rcv_cr.inact</code>, the receiver emits a bare packet with <code>flags = FRCT_NACK</code> and <code>seqno = arrival_seqno - 1</code> (informational only, not consulted by the receive handler).  The cooldown in [[#3. Protocol parameters|Section 3]] rate-limits the burst.  Non-DATA non-DRF arrivals bypass <code>rcv_inact_check</code> entirely; non-DATA DRF still rebases via the DRF branch.
      Dispatched in the early-exit branch (Section 6.1), before
      rcv_inact_check.  The sender copies the head-of-line (HoL)
      rxm packet, marks the slot SND_RTX | SND_FAST_RXM (Karn-
      suppress next ACK, one-shot fast-rxm gate), sets rtt_lwe =
      snd_cr.lwe + 1, and re-emits via fast_rxm_send with FRCT_RXM
      and a refreshed ackno.  The original rxm_entry and its RTO
      timer are left armed - the NACK emit is additive to the
      normal retransmit machinery, not a replacement.  No-op if
      nothing is in flight, the HoL slot has aged past t_r, or
      the HoL rxm pointer has been cleared by SACK or RACK.
</pre>


NACK serves two roles:
;Receive (<code>frcti_nack_rcv</code>)
:Dispatched in the early-exit branch ([[#6.1. Early-exit dispatch|Section 6.1]]), before <code>rcv_inact_check</code>.  The sender copies the head-of-line (HoL) <code>rxm</code> packet, marks the slot <code>SND_RTX | SND_FAST_RXM</code> (Karn-suppress next ACK, one-shot fast-rxm gate), sets <code>rtt_lwe = snd_cr.lwe + 1</code>, and re-emits via <code>fast_rxm_send</code> with <code>FRCT_RXM</code> and a refreshed <code>ackno</code>.  The original <code>rxm_entry</code> and its RTO timer are left armed - the NACK emit is additive to the normal retransmit machinery, not a replacement.  No-op if nothing is in flight, the HoL slot has aged past <code>t_r</code>, or the HoL <code>rxm</code> pointer has been cleared by SACK or RACK.


<pre>
NACK has exactly one role: lost first-of-run (DRF) packet recovery. Until the DRF packet arrives, the receiver cannot rebase its window, so any subsequent in-flight packets look stale to the receiver. The NACK fires the moment a stale receiver sees DATA without DRF, telling the sender to re-emit the head-of-line (DRF) packet at NACK-cooldown latency rather than waiting for the initial RTO (which is the configured default until <code>srtt</code> is seeded by the first probe round-trip). Mid-stream loss is NOT NACK-driven; it is recovered by the sender's RTO, fast retransmit, and SACK-driven retransmit paths ([[#8. Retransmission|Section 8]]) only.
  1. Lost first-of-run (DRF) packet recovery. Required.  Until
    the DRF packet arrives, the receiver cannot rebase its
    window, so any subsequent in-flight packets look stale to
    the receiver. The NACK fires the moment the second
    packet arrives at a stale receiver, telling the sender to
    re-emit the HoL (DRF) packet at NACK-cooldown latency rather
    than waiting for the initial RTO (which is the configured
    default until srtt is seeded by the first probe round-trip).
  2. General loss-recovery accelerator.  When loss is detected
    receiver-first, the NACK skips one RTO of latency relative to
    waiting for the sender's RTO to fire.
</pre>
 
In both cases the existing rxm_entry and its RTO timer are left
armed, so the RTO path remains the eventual fallback.


The existing <code>rxm_entry</code> and its RTO timer are left armed on a NACK re-emit, so the RTO path remains the eventual fallback.


== 10. Cumulative + selective ACK ==
== 10. Cumulative + selective ACK ==


Cumulative ACK is ackno = rcv_cr.lwe.  On out-of-order arrival the
Cumulative ACK is <code>ackno = rcv_cr.lwe</code>.  On out-of-order arrival the
receiver also emits a SACK packet (Section 1.3) whose payload lists
receiver also emits a SACK packet ([[#1.3. SACK payload|Section 1.3]]) whose payload lists
*present* blocks above lwe (analogous to TCP SACK / QUIC ACK
''present'' blocks above <code>lwe</code> (analogous to TCP SACK / QUIC ACK
ranges).  SACKs are rate-limited per Section 3 and suppressed when
ranges).  SACKs are rate-limited per [[#3. Protocol parameters|Section 3]] and suppressed when
neither lwe nor block count has changed since the last SACK.
neither <code>lwe</code> nor block count has changed since the last SACK.


D-SACK reports (RFC 2883) are emitted in-band as block[0] of an
D-SACK reports (RFC 2883) are emitted in-band as <code>block[0]</code> of an
otherwise normal SACK frame (see Section 1.3 for the encoding).
otherwise normal SACK frame (see [[#1.3. SACK payload|Section 1.3]] for the encoding).
Two receiver triggers arm a pending D-SACK report (single-slot,
Two receiver triggers arm a pending D-SACK report (single-slot,
latest-wins):
latest-wins):


<pre>
* DATA arrival with <code>seqno &lt; rcv_cr.lwe</code>, both wire-dup (no RXM, <code>is_dup_data</code> path) and retransmit (RXM, post-FC branch) (RFC 2883 sec. 4.1.1, full duplicate)
  - DATA arrival with seqno < rcv_cr.lwe, both wire-dup (no RXM,
* <code>rq_accept</code> conflict, slot already occupied in <code>[lwe, rwe)</code> (RFC 2883 sec. 4.1.2, partial duplicate)
    is_dup_data path) and retransmit (RXM, post-FC branch)
    (RFC 2883 sec. 4.1.1, full duplicate)
  - rq_accept conflict, slot already occupied in [lwe, rwe)
    (RFC 2883 sec. 4.1.2, partial duplicate)
</pre>


When a D-SACK is pending and the standard scoreboard SACK would be
When a D-SACK is pending and the standard scoreboard SACK would be
suppressed by dedup or rate-limit, the report is emitted as a
suppressed by dedup or rate-limit, the report is emitted as a
stand-alone SACK frame through the normal ack_snd path; when a
stand-alone SACK frame through the normal <code>ack_snd</code> path; when a
D-SACK report is pending the path bypasses dedup and the TICTIME
D-SACK report is pending the path bypasses dedup and the <code>TICTIME</code>
rate-limit, but the a-timer suppression on rcv inactivity still
rate-limit, but the a-timer suppression on rcv inactivity still
applies.
applies.


Bare ACKs are deferred via a per-flow delayed-ACK timer (one in
Bare ACKs are deferred via a per-flow delayed-ACK timer (one in
flight at a time, atomic test-and-set dedup; fires per Section 3
flight at a time, atomic test-and-set dedup; fires per [[#3. Protocol parameters|Section 3]]
after the first in-order arrival).  Suppressed if (1) no new
after the first in-order arrival).  Suppressed if (1) no new
seqno, (2) rcv side is inactive (older than t_a), or (3) the
<code>seqno</code>, (2) rcv side is inactive (older than <code>t_a</code>), or (3) the
sender just sent within TICTIME.  A pending D-SACK ride-through
sender just sent within <code>TICTIME</code>.  A pending D-SACK ride-through
bypasses (1) and (3); the a-timer gate (2) is unconditional.
bypasses (1) and (3); the a-timer gate (2) is unconditional.
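The three suppression conditions and the D-SACK bypass reduce to a small decision function. The sketch below models each condition as a precomputed boolean; the struct and function names are hypothetical and do not mirror <code>ack_snd</code>.

```c
#include <stdbool.h>

struct ack_state {
        bool new_seqno;     /* (1) inverse: something new to acknowledge */
        bool rcv_inactive;  /* (2) rcv side older than t_a */
        bool sent_recently; /* (3) sender sent within TICTIME */
        bool dsack_pending; /* a D-SACK report is waiting */
};

static bool ack_should_send(const struct ack_state *s)
{
        if (s->rcv_inactive)
                return false; /* a-timer gate is unconditional */
        if (s->dsack_pending)
                return true;  /* bypasses (1) and (3) */
        return s->new_seqno && !s->sent_recently;
}
```

Checking the a-timer gate first makes its unconditional nature explicit: no later branch can override it.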


Line 1,103: Line 840:
== 11. Flow control ==

The receiver advertises <code>rwe</code> in every FC field.  The sender treats
its <code>snd_cr.rwe</code> as the absolute right edge: when
<code>snd_cr.seqno &gt;= snd_cr.rwe</code> the window is closed and <code>flow_write</code>
yields.  While closed, the sender periodically emits RDVS
(rendezvous) packets (cadence <code>DELT_RDV</code>); the receiver replies with
a bare FC packet (<code>ackno = 0</code>) that reopens the window.  Once the
window has been closed for longer than <code>MAX_RDV</code> the sender stops
emitting RDVS but does not tear the flow down - the writer keeps
blocking until either a peer-driven FC arrives or the KA
(keepalive) / r-timer marks the flow (<code>ACL_FLOWPEER</code> or
<code>ACL_FLOWDOWN</code>; see [[#13. Liveness (keepalive)|Section 13]]).

<code>rwe</code> is clamped to <code>lwe + RQ_SIZE</code> on receipt and MUST NOT shrink:
a backward <code>rwe</code> is silently clamped to the current <code>snd_cr.rwe</code>;
the FC packet still reopens the window.
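
The window rules above can be sketched in C as follows.  This is an illustrative sketch, not the FRCT source: the <code>snd_win</code> struct, the helper names <code>seq_before</code>, <code>rwe_update</code> and <code>win_closed</code>, and the value chosen for <code>RQ_SIZE</code> are assumptions made for the example, as is the reading that the clamp uses the sender's own <code>lwe</code>.

```c
#include <stdbool.h>
#include <stdint.h>

#define RQ_SIZE 256u /* assumed value, for illustration only */

/* Hypothetical sender-side window state (mirrors snd_cr fields). */
struct snd_win {
        uint32_t lwe;   /* left window edge (oldest unacked seqno) */
        uint32_t seqno; /* next seqno to send                      */
        uint32_t rwe;   /* right window edge learned from the peer */
};

/* Modular comparison: true if a lies before b in seqno space. */
static bool seq_before(uint32_t a, uint32_t b)
{
        return (int32_t) (a - b) < 0;
}

/* Apply an advertised rwe: clamp to lwe + RQ_SIZE, never shrink. */
static void rwe_update(struct snd_win * w, uint32_t adv)
{
        uint32_t max = w->lwe + RQ_SIZE;

        if (seq_before(max, adv))
                adv = max;    /* clamp an over-generous advertisement */

        if (seq_before(w->rwe, adv))
                w->rwe = adv; /* only move the right edge forward     */
}

/* The window is closed when seqno has reached the right edge. */
static bool win_closed(const struct snd_win * w)
{
        return !seq_before(w->seqno, w->rwe);
}
```

A backward advertisement leaves <code>rwe</code> untouched in this sketch, which matches the "silently clamped to the current <code>snd_cr.rwe</code>" rule; waking the blocked writer is left out.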


== 12. RTT estimation ==

Active RTTP probes ([[#1.4. RTTP payload|Section 1.4]]) carry a 32-bit <code>probe_id</code> (0
reserved) and a 16-byte random nonce that the peer echoes verbatim, which defends
against spoofed replies.  A ring of <code>RTTP_RING</code> in-flight probes is
kept; an echo whose <code>(id, nonce)</code> does not match the ring slot is
dropped.  A single RTTP sample is clamped to <code>RTT_CLAMP_MUL * srtt</code>
(compile-time <code>RTT_CLAMP_MUL = 16</code>) once <code>srtt</code> is seeded; the first
cold-probe sample feeds <code>rtt_update</code> raw.

Probe arming gates:

;Cold (no <code>srtt</code> yet)
:the receive path arms at most one probe per 100 ms via <code>frcti_rcv_probe</code> (<code>PROBE_DUE_COLD</code>); arming requires an incoming packet.  Active send-path arming bails while <code>srtt == 0</code>.
;Warm (<code>rtt_probe_arm</code>, called from <code>frcti_snd</code>)
:outstanding data (<code>snd_cr.seqno &gt; snd_cr.lwe</code>), AND at least <code>2 * srtt</code> since <code>t_rcv_rtt</code> (last RTT receive of any kind), AND at least <code>srtt</code> since <code>t_snd_probe</code> (last probe emit).

Each sample feeds either Linux's asymmetric <code>mdev</code> estimator
(<code>FRCT_LINUX_RTT_ESTIMATOR</code>, default ON) or the RFC 6298 symmetric EWMA
(compile option).  <code>srtt</code> is floored at 10 ms when seeded from a
hint and at 1 us after every update (including the first seeding
sample); <code>mdev</code> is floored at 100 ns.

:<code>RTO = max(rto_min, 2 * srtt, srtt + (mdev &lt;&lt; MDEV_MUL))</code>

The <code>2 * srtt</code> floor is an FRCT addition not in RFC 6298.
The effective wheel deadline is capped per [[#3. Protocol parameters|Section 3]].
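
As a worked illustration of the RTO formula, the sketch below computes the three-way maximum in nanoseconds.  The function name <code>rto_calc</code> and the value <code>MDEV_MUL = 2</code> (which would reproduce RFC 6298's <code>4 * RTTVAR</code> term) are assumptions made for the example; the actual constants are those of [[#3. Protocol parameters|Section 3]], and the wheel-deadline cap is not modelled.

```c
#include <stdint.h>

#define MDEV_MUL 2 /* assumed shift; mdev << 2 equals 4 * mdev */

static uint64_t max_u64(uint64_t a, uint64_t b)
{
        return a > b ? a : b;
}

/* RTO = max(rto_min, 2 * srtt, srtt + (mdev << MDEV_MUL)), all in ns. */
static uint64_t rto_calc(uint64_t rto_min, uint64_t srtt, uint64_t mdev)
{
        uint64_t rto = srtt + (mdev << MDEV_MUL);

        rto = max_u64(rto, 2 * srtt); /* FRCT-specific 2 * srtt floor */

        return max_u64(rto, rto_min); /* configured lower bound       */
}
```

With <code>srtt</code> = 1 ms and <code>mdev</code> = 100 us the variance term gives 1.4 ms, so the <code>2 * srtt</code> floor (2 ms) decides the result; with <code>mdev</code> = 400 us the variance term (2.6 ms) wins.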


ACK-derived samples (<code>frcti_ack_rcv</code> -&gt; <code>rtt_sample_eligible</code>), beyond
the cum-ACK advance gate in <code>frcti_ack_rcv</code> (<code>ackno &gt; lwe</code> and
<code>ackno &lt;= seqno</code>), require all of: not in recovery; ACK packet does
not carry <code>FRCT_RXM</code>; HoL slot's <code>SND_RTX</code> bit clear; slot's <code>rxm</code>
pointer non-NULL (not SACK-consumed); <code>lwe</code> not below the <code>rtt_lwe</code>
fence; <code>srtt</code> already seeded by an RTTP probe.  There is no ACK-only
seeding.

Every eligible sample also feeds <code>RACK.min_RTT</code> (RFC 8985 sec. 6.2)
via a windowed minimum: the stored minimum is replaced whenever the sample is strictly
smaller OR more than <code>MIN_RTT_WIN_NS</code> (5 min, matches Linux
<code>tcp_min_rtt_wlen</code>) has elapsed since the current min was set.  The
downward branch is immediate (faster path picked up at once); the
upward branch is gated on the window (a transient queue burst does
not poison the estimate, but a sustained route change to a longer
path re-anchors <code>min_RTT</code> after at most one window).  <code>min_RTT</code> is seeded from
<code>rtt_hint</code> at <code>rtt_init</code>; 0 acts as the unset sentinel, and the base
in <code>rack_reorder_window</code> then falls back from <code>min_RTT</code> to <code>SRTT</code> (so
<code>R = mult * SRTT/4</code>, capped at <code>SRTT</code>, floored at <code>MIN_REORDER_NS</code>)
until the first sample.  See [[#6.2. Locked main path|Section 6.2]].
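
The windowed minimum can be sketched as below.  The struct <code>min_rtt</code>, its field names and the function <code>min_rtt_update</code> are hypothetical names chosen for the example, and treating a stored value of 0 as "take the next sample unconditionally" is an assumption derived from the unset-sentinel remark above.

```c
#include <stdint.h>

#define MIN_RTT_WIN_NS (300ULL * 1000000000ULL) /* 5 minutes in ns */

/* Hypothetical holder for the windowed minimum; val == 0 means unset. */
struct min_rtt {
        uint64_t val;   /* current minimum RTT in ns        */
        uint64_t t_set; /* time (ns) the minimum was stored */
};

/* Feed one eligible RTT sample taken at time now (both in ns). */
static void min_rtt_update(struct min_rtt * m,
                           uint64_t         sample,
                           uint64_t         now)
{
        if (m->val == 0 ||                      /* unset: seed         */
            sample < m->val ||                  /* downward: immediate */
            now - m->t_set > MIN_RTT_WIN_NS) {  /* upward: window aged */
                m->val   = sample;
                m->t_set = now;
        }
}
```

A larger sample inside the window is ignored, so a short queueing burst leaves the minimum alone, while the first sample after the window expires re-anchors it even if it is larger.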




== 13. Liveness (keepalive) ==

When <code>qs.timeout &gt; 0</code> a per-flow KA (keepalive) timer is armed.
Arming uses <code>rcv_cr.act</code> for the deadline computation:

:<code>deadline = min(snd_act + qs.timeout/4, rcv_act + qs.timeout)</code>

The deadline is clamped to <code>now + qs.timeout/4</code> if it is already past.  The timer fires
either on sender idleness (to send a KA) or on receiver idleness
(to declare the peer dead).  On fire (<code>ka_snd</code>) the peer-dead test
uses <code>max(rcv_cr.act, t_ka_rcv)</code> so a recent KA reply counts even
when no DATA has arrived:

* If <code>now - max(rcv_cr.act, t_ka_rcv) &gt; qs.timeout</code>, mark the flow <code>ACL_FLOWPEER</code> and notify the per-process flow-event set (<code>proc.fqset</code>) with <code>FLOW_PEER</code>.
* Else if <code>snd_idle &gt; qs.timeout/4</code>, emit a bare <code>KA | ACK</code> (<code>ackno = rcv_cr.lwe</code>) and re-arm.
* Else just re-arm.

Note: <code>rx_rb</code> and <code>tx_rb</code> are the receive and transmit shared-memory
ring buffers.  The r-timer raises <code>ACL_FLOWDOWN</code> on both (route is
broken); keepalive raises <code>ACL_FLOWPEER</code> on <code>rx_rb</code> only and notifies
the flow-event set (peer is silent, writer keeps <code>tx_rb</code> usable) -
distinct ACLs.  <code>qs.timeout == 0</code> disables keepalive entirely; a
silent peer crash is then undetected.
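
The arming and firing rules of this section can be sketched as two pure functions.  The names <code>ka_deadline</code>, <code>ka_fire</code> and the <code>ka_action</code> enum are hypothetical; the sketch assumes monotonic nanosecond timestamps, takes <code>snd_idle</code> to be <code>now - snd_act</code>, and treats a deadline strictly before <code>now</code> as "already past".

```c
#include <stdint.h>

/* Hypothetical outcome of a keepalive timer expiry. */
enum ka_action {
        KA_PEER_DEAD, /* mark ACL_FLOWPEER, notify FLOW_PEER */
        KA_SEND,      /* emit bare KA | ACK, then re-arm     */
        KA_REARM      /* nothing to send, just re-arm        */
};

static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }
static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }

/* Next KA deadline; clamped into the future if already past. */
static uint64_t ka_deadline(uint64_t now,
                            uint64_t snd_act,
                            uint64_t rcv_act,
                            uint64_t timeout)
{
        uint64_t d = min_u64(snd_act + timeout / 4, rcv_act + timeout);

        if (d < now)
                d = now + timeout / 4;

        return d;
}

/* Decision taken when the KA timer fires. */
static enum ka_action ka_fire(uint64_t now,
                              uint64_t snd_act,
                              uint64_t rcv_act,
                              uint64_t t_ka_rcv,
                              uint64_t timeout)
{
        uint64_t last_rcv = max_u64(rcv_act, t_ka_rcv);

        if (now - last_rcv > timeout)
                return KA_PEER_DEAD;

        if (now - snd_act > timeout / 4)
                return KA_SEND;

        return KA_REARM;
}
```

Because the peer-dead test runs first, a flow whose peer has been silent for longer than <code>qs.timeout</code> is marked rather than sent another keepalive.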


== 14. Linger / teardown ==

On <code>flow_dealloc</code>, <code>frcti_dealloc</code> computes a grace timeout

:<code>max(rcv_cr.act + rcv_cr.inact, snd_cr.act + snd_cr.inact) - now</code>

(floored at 0 and converted to seconds) and returns it; <code>flow_dealloc</code>
forwards this to the IRMd as the dealloc grace.  The IRMd, not FRCT,
performs the wait.  Before computing the timeout, FRCT may emit a
final ACK when <code>rcv_cr.lwe != rcv_cr.seqno</code> (the peer has not been
told the most recent cumulative ACK) AND the rcv side has been
active within <code>t_a</code> (a-timer not aged out).
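
A minimal sketch of the grace computation follows.  The function name <code>dealloc_grace_s</code> is hypothetical, the inputs are assumed to be nanosecond timestamps and intervals, and the conversion to seconds is assumed to truncate; the text above does not state the rounding that <code>frcti_dealloc</code> applies.

```c
#include <stdint.h>

#define NS_PER_S 1000000000ULL

/* Grace timeout in whole seconds, floored at 0 (truncation assumed). */
static uint64_t dealloc_grace_s(uint64_t now,
                                uint64_t rcv_act,
                                uint64_t rcv_inact,
                                uint64_t snd_act,
                                uint64_t snd_inact)
{
        uint64_t rcv_end = rcv_act + rcv_inact;
        uint64_t snd_end = snd_act + snd_inact;
        uint64_t end     = rcv_end > snd_end ? rcv_end : snd_end;

        if (end <= now)
                return 0; /* both sides already aged out */

        return (end - now) / NS_PER_S;
}
```

The later of the two inactivity horizons decides the grace, so a flow that has been idle on both sides for longer than its inactivity timers is released with a grace of zero.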


<code>FRCTFLINGER</code> is honoured only when <code>snd_cr.lwe &lt; edge</code>, where <code>edge</code> is
<code>snd_fin_seqno</code> after FIN has been sent in stream mode and
<code>snd_cr.seqno</code> otherwise (data or FIN still in flight).  The drain
itself runs in <code>flow_dealloc</code>'s <code>while (FRCTI_LINGERING)</code> loop, not in
<code>frcti_dealloc</code>.

The fd is single-reader / single-writer (documented in the
manpages).  <code>flow_write</code> pumps <code>rx_rb</code> on every call (via
<code>flow_wait_window</code> -&gt; <code>flow_drain_rx_nb</code>) and additionally blocks on
<code>rx_rb</code> when the send window is closed.  A pure-writer thread thus
consumes ACKs without a dedicated reader.


Line 1,245: Line 964:
ch. 9).  Timer-based connection management
(no SYN/FIN handshake, per-flow state born on first DATA and
reclaimed after <code>t_mpl + a + r</code> of silence), the DRF marker, and the
<code>t_mpl</code> / <code>t_a</code> / <code>t_r</code> timers all come from delta-t.  See Watson,
"Timer-Based Mechanisms in Reliable Transport Protocol Connection
Management", Computer Networks 5 (1981).

The unified <code>flow_alloc(name, qos, ...)</code> primitive and its
multi-axis QoS-cube argument ([[#2.2. Service modes (orthogonal axes)|Section 2.2]]) also come from RINA
(Day 2008, ch. 6; Grasa et al., "IRATI: investigating RINA as an
alternative to TCP/IP", Computer Networks 92 (2015)) - reliability,
Line 1,260: Line 979:
references.


{| class="wikitable"
! FRCP mechanism !! Heritage !! Reference / note
|-
| Random new <code>seqno</code> on <code>seqno_rotate</code> || TCP ISN || RFC 6528 (Gont &amp; Bellovin, 2012).  QUIC PN-space reset (RFC 9000 sec. 12.3) is a structural analogue.
|-
| Cumulative ACK, left-window-edge advance || TCP || RFC 793 / RFC 9293
|-
| Receive window with non-shrink rule || TCP || RFC 793 sec. 3.7 / RFC 9293 sec. 3.8.6; RFC 1122 sec. 4.2.2.16 for the explicit non-shrink prohibition
|-
| Modular <code>seqno</code> arithmetic (<code>before</code>/<code>after</code> helpers) || TCP || RFC 793 sec. 3.3 / RFC 9293 sec. 3.4
|-
| Selective ACK block list || TCP || RFC 2018 (Mathis et al., 1996).  Encoded as a typed FRCP packet rather than a TCP option, so framing is closer to QUIC ACK frames.  D-SACK (RFC 2883) carried in-band as <code>block[0]</code>; see [[#1.3. SACK payload|Section 1.3]].
|-
| NewReno-careful recovery with <code>recovery_high</code> gate || TCP || RFC 6582 (Henderson et al., 2012); QUIC builds on the same model in RFC 9002 sec. 7.3.2.  Cwnd half absent (CC in IPCP).
|-
| RACK reordering window for fast retransmit || TCP || RFC 8985 (Cheng et al., 2021). FRCP <code>R = MIN(reo_wnd_mult * min_RTT / 4, SRTT)</code> with a <code>MIN_REORDER_NS = 250 us</code> floor against <code>srtt</code> collapse; matches RFC 8985 sec. 6.2 and Linux <code>tcp_rack_reo_wnd</code>. DSACK-driven <code>reo_wnd_mult</code> (sec. 6.2 step 4) is adopted; see [[#1.3. SACK payload|Section 1.3]] for the wire encoding.  The hybrid RACK-or-<code>DUP_THRESH</code> trigger from RFC 8985 sec. 6.2 step 4 is adopted ([[#8. Retransmission|Section 8]]).  QUIC's analogue in RFC 9002 sec. 6.1.2 uses <code>max(srtt, latest_rtt)</code> as the base.
|-
| Karn's algorithm: no RTT sample on retransmits, RTO-collapse freeze || TCP || Karn &amp; Partridge, "Improving Round-Trip Time Estimates in Reliable Transport Protocols", SIGCOMM 1987; RFC 6298 sec. 3.
|-
| RTO formula <code>RTO = max(RTO_MIN, srtt + (mdev &lt;&lt; MDEV_MUL))</code> || TCP || RFC 6298 (Paxson et al., 2011).  <code>RTO_MIN</code> = 250 us is below RFC 6298 sec. 2.4's 1 s SHOULD-floor - a recursive-layer choice.
|-
| Linux asymmetric <code>mdev</code> estimator (default) || Linux kernel || <code>tcp_rtt_estimator()</code> in <code>net/ipv4/tcp_input.c</code>; the <code>if(delta&lt;0) m&gt;&gt;=3</code> dampening is a kernel divergence from RFC 6298.  RFC 6298 EWMA available behind a compile flag.
|-
| Delayed ACK with rate suppression || TCP || RFC 813 (Clark, 1982); RFC 1122 sec. 4.2.3.2; RFC 5681 sec. 4.2. Single-deadline coalescing rather than "ack-every-other-segment".
|-
| Zero-window-probe / persist-timer analogue (RDVS) || TCP || RFC 1122 sec. 4.2.2.17 / RFC 9293 sec. 3.8.6.1.  RDVS solicits an FC reply, distinct from QUIC <code>DATA_BLOCKED</code> (RFC 9000 sec. 19.12), which is one-way notification.  <code>MAX_RDV</code> give-up departs from TCP.
|-
| Multiplexed control on a single PCI || SCTP / QUIC || SCTP chunk bundling (RFC 9260 sec. 6.10); QUIC frame multiplexing (RFC 9000 sec. 12.4).  Cleaner fit than TCP's separate-flag-bits design.
|-
| ACK ranges as multiple discontiguous acked blocks || QUIC || QUIC ACK frame (RFC 9000 sec. 19.3).  FRCP SACK is conceptually QUIC-frame-shaped even though encoded as absolute <code>[start,end]</code> pairs.
|-
| Nonce-authenticated active RTT / liveness probing (RTTP) || QUIC <code>PATH_CHALLENGE</code> || <code>PATH_CHALLENGE</code> / <code>PATH_RESPONSE</code> (RFC 9000 sec. 8.2, sec. 19.17, sec. 19.18).  WebRTC ICE consent-freshness (RFC 7675) is the same pattern.  QUIC's nonce is 8 octets; FRCP chooses 16.
|-
| Probing distinct from keepalive || QUIC || KA timer answers "peer alive?", RTTP answers "path measurable?", as in QUIC PING (RFC 9000 sec. 19.2) vs <code>PATH_CHALLENGE</code>.
|-
| Bare KA + ACK keepalive packets || QUIC / SCTP || QUIC PING (RFC 9000 sec. 19.2); SCTP HEARTBEAT / HEARTBEAT-ACK (RFC 9260 sec. 8.3). SCTP HEARTBEAT also carries an opaque echoed blob, structurally similar to FRCP RTTP.
|-
| (<code>FFGM</code>, <code>LFGM</code>) fragment-role bits ([[#7.2. Fragmentation and reassembly|Section 7.2]]) || SCTP || RFC 9260 sec. 3.3.1 DATA chunk B/E bits encode the same four states (<code>B+E=SOLE</code>, <code>B-only=FIRST</code>, neither=<code>MID</code>, <code>E-only=LAST</code>).  Each fragment carries its own <code>seqno</code>/TSN and is independently retransmitted.
|-
| Stream byte-offset reassembly (Sections [[#1.5. Stream PCI extension|1.5]], [[#16. Stream-mode flows|16]]) || QUIC || QUIC STREAM frame (RFC 9000 sec. 19.8) uses Offset + Length varints; FRCP uses fixed 32-bit <code>start</code> / <code>end</code>.  One stream per flow vs QUIC's many streams multiplexed.
|-
| FIN end-of-stream marker (Sections [[#1.2. Flag bits|1.2]], [[#16. Stream-mode flows|16]]) || TCP / QUIC || TCP FIN flag (RFC 9293 sec. 3.1) closes one half of the byte stream; QUIC STREAM frame FIN bit (RFC 9000 sec. 19.8) does the same per stream with an immutable final-size invariance (RFC 9000 sec. 4.5: the final size is fixed once observed).  FRCP's FIN consumes one packet <code>seqno</code> (not one byte of stream space) and is idempotent on the sender side.
|-
| Stream byte-credit flow control ([[#16. Stream-mode flows|Section 16]]) || QUIC || <code>MAX_STREAM_DATA</code> (RFC 9000 sec. 4.1, sec. 19.10). FRCP projects a per-flow byte budget onto the <code>seqno</code>-space <code>rwe</code>.  Single stream per flow collapses QUIC's <code>MAX_DATA</code> / <code>MAX_STREAM_DATA</code> distinction.
|-
| Header protection (encrypted seqnos) || QUIC || QUIC RFC 9001 sec. 5.4 applies header protection on top of AEAD to mask the packet number.  FRCP's per-flow AEAD wrap ([[#16. Stream-mode flows|Section 16]]) is wider: it encrypts the entire PCI including <code>seqno</code> because the IPCP below already routes, so no destination connection-ID needs to stay in clear (cf. RFC 9000 sec. 5.2).
|-
| Two-bit fragment role polarity || SCTP || The (<code>FFGM</code>, <code>LFGM</code>) pair follows SCTP B/E (begin = 1 / end = 1) rather than IPv4 MF (RFC 791 sec. 3.2), which has the inverse polarity (MF = 1 means NOT last).
|-
| Orthogonal reliability / ordering axes ([[#2.2. Service modes (orthogonal axes)|Section 2.2]]) || SCTP || PR-SCTP (RFC 3758, per-message partial reliability) and SCTP DATA U-bit (RFC 9260 sec. 3.3.1, per-message unordered) are the closest precedents for decoupling reliability from ordering; FRCP sets them per-flow rather than per-message.
|-
| Orthogonal CRC (<code>qs.ber == 0</code>) || UDP-Lite || RFC 3828 (Larzon et al., 2004) lets the sender pick a per-packet Checksum Coverage and the receiver enforce a locally configured minimum (no in-band negotiation; sec. 3.1, sec. 3.3).  FRCP gates a full CRC trailer on <code>qs.ber == 0</code> at flow setup.  Contrast TCP / SCTP (mandatory checksum) and QUIC (AEAD subsumes CRC).
|-
| Setup-time service negotiation || DCCP / SCTP / QUIC || DCCP Service Codes (RFC 4340 sec. 8.1.2, RFC 5595); SCTP INIT parameters (RFC 9260 sec. 3.3.2); QUIC transport parameters (RFC 9000 sec. 7.4).  All negotiate service properties at connection setup; only RINA's QoS cube exposes them as an orthogonal vector.
|}
<pre>
+------------------------+------------------+------------------------+
| FRCP mechanism         | Heritage         | Reference / note       |
+------------------------+------------------+------------------------+
| Random new seqno on    | TCP ISN          | RFC 6528 (Gont &       |
| seqno_rotate           |                  | Bellovin, 2012).       |
|                        |                  | QUIC PN-space reset    |
|                        |                  | (RFC 9000 sec. 12.3)   |
|                        |                  | is a structural        |
|                        |                  | analogue.              |
+------------------------+------------------+------------------------+
| Cumulative ACK,        | TCP              | RFC 793 / RFC 9293     |
| left-window-edge       |                  |                        |
| advance                |                  |                        |
+------------------------+------------------+------------------------+
| Receive window with    | TCP              | RFC 793 sec. 3.7 /     |
| non-shrink rule        |                  | RFC 9293 sec. 3.8.6;   |
|                        |                  | RFC 1122 sec. 4.2.2.16 |
|                        |                  | for the explicit non-  |
|                        |                  | shrink prohibition     |
+------------------------+------------------+------------------------+
| Modular seqno          | TCP              | RFC 793 sec. 3.3 /     |
| arithmetic             |                  | RFC 9293 sec. 3.4      |
| (before/after helpers) |                  |                        |
+------------------------+------------------+------------------------+
| Selective ACK block    | TCP              | RFC 2018 (Mathis et    |
| list                   |                  | al., 1996).  Encoded   |
|                        |                  | as a typed FRCP packet |
|                        |                  | rather than a TCP      |
|                        |                  | option, so framing is  |
|                        |                  | closer to QUIC ACK     |
|                        |                  | frames.  D-SACK (RFC   |
|                        |                  | 2883) carried in-band  |
|                        |                  | as block[0]; see       |
|                        |                  | Section 1.3.           |
+------------------------+------------------+------------------------+
| NewReno-careful        | TCP              | RFC 6582 (Henderson    |
| recovery with          |                  | et al., 2012); QUIC    |
| recovery_high gate     |                  | builds on the same     |
|                        |                  | model in RFC 9002      |
|                        |                  | sec. 7.3.2.  Cwnd half |
|                        |                  | absent (CC in IPCP).   |
+------------------------+------------------+------------------------+
| RACK reordering        | TCP              | RFC 8985 (Cheng et     |
| window for fast        |                  | al., 2021). FRCP       |
| retransmit             |                  | R = MIN(reo_wnd_mult * |
|                        |                  | min_RTT / 4, SRTT)     |
|                        |                  | with a MIN_REORDER_NS  |
|                        |                  | = 250 us floor against |
|                        |                  | srtt collapse; matches |
|                        |                  | RFC 8985 sec. 6.2 and  |
|                        |                  | Linux tcp_rack_reo_wnd.|
|                        |                  | DSACK-driven           |
|                        |                  | reo_wnd_mult (sec. 6.2 |
|                        |                  | step 4) is adopted;    |
|                        |                  | see Section 1.3 for    |
|                        |                  | the wire encoding.    |
|                        |                  | The hybrid RACK-or-    |
|                        |                  | DUP_THRESH trigger    |
|                        |                  | from RFC 8985 sec. 6.2 |
|                       |                 | step 4 is adopted      |
|                       |                 | (Section 8)QUIC's   |
|                        |                  | analogue in RFC 9002  |
|                        |                  | sec. 6.1.2 uses        |
|                        |                  | max(srtt, latest_rtt) |
|                        |                  | as the base.           |
+------------------------+------------------+------------------------+
| Karn's algorithm:      | TCP              | Karn & Partridge,      |
| no RTT sample on      |                  | "Improving Round-Trip  |
| retransmits, RTO-     |                  | Time Estimates in     |
| collapse freeze        |                  | Reliable Transport    |
|                        |                  | Protocols", SIGCOMM    |
|                        |                  | 1987; RFC 6298 sec. 3. |
+------------------------+------------------+------------------------+
| RTO formula            | TCP              | RFC 6298 (Paxson et    |
| RTO = max(RTO_MIN,    |                  | al., 2011).  RTO_MIN = |
| srtt + (mdev <<        |                  | 5 ms is below RFC 6298 |
| MDEV_MUL))            |                  | sec. 2.4's 1 s SHOULD- |
|                        |                  | floor - a recursive-  |
|                        |                  | layer choice.          |
+------------------------+------------------+------------------------+
| Linux asymmetric mdev  | Linux kernel    | tcp_rtt_estimator() in |
| estimator (default)    |                  | net/ipv4/tcp_input.c;  |
|                        |                  | the if(delta<0) m>>=3  |
|                        |                  | dampening is a        |
|                        |                  | kernel divergence from |
|                        |                  | RFC 6298.  RFC 6298    |
|                        |                  | EWMA available behind  |
|                        |                  | a compile flag.        |
+------------------------+------------------+------------------------+
| Delayed ACK with rate  | TCP              | RFC 813 (Clark, 1982); |
| suppression            |                  | RFC 1122 sec. 4.2.3.2; |
|                        |                  | RFC 5681 sec. 4.2.    |
|                        |                  | Single-deadline        |
|                        |                  | coalescing rather than |
|                        |                  | "ack-every-other-      |
|                        |                  | segment".              |
+------------------------+------------------+------------------------+
| Zero-window-probe /    | TCP              | RFC 1122 sec.          |
| persist-timer          |                  | 4.2.2.17 / RFC 9293    |
| analogue (RDVS)        |                  | sec. 3.8.6.1.  RDVS    |
|                        |                  | solicits an FC reply,  |
|                        |                  | distinct from QUIC    |
|                        |                  | DATA_BLOCKED (RFC 9000 |
|                        |                  | sec. 19.12), which is  |
|                        |                  | one-way notification.  |
|                        |                  | MAX_RDV give-up        |
|                        |                  | departs from TCP.      |
+------------------------+------------------+------------------------+
| Multiplexed control    | SCTP / QUIC      | SCTP chunk bundling    |
| on a single PCI        |                  | (RFC 9260 sec. 6.10);  |
|                        |                  | QUIC frame            |
|                        |                  | multiplexing (RFC 9000 |
|                        |                  | sec. 12.4).  Cleaner  |
|                        |                  | fit than TCP's        |
|                        |                  | separate-flag-bits    |
|                        |                  | design.                |
+------------------------+------------------+------------------------+
| ACK ranges as          | QUIC            | QUIC ACK frame (RFC    |
| multiple discontiguous |                  | 9000 sec. 19.3).  FRCP |
| acked blocks          |                  | SACK is conceptually  |
|                        |                  | QUIC-frame-shaped      |
|                        |                  | even though encoded    |
|                        |                  | as absolute            |
|                        |                  | [start,end] pairs.    |
+------------------------+------------------+------------------------+
| Nonce-authenticated    | QUIC            | PATH_CHALLENGE /      |
| active RTT / liveness  | PATH_CHALLENGE  | PATH_RESPONSE (RFC    |
| probing (RTTP)        |                  | 9000 sec. 8.2,        |
|                        |                  | sec. 19.17, sec.      |
|                        |                  | 19.18).  WebRTC ICE    |
|                        |                  | consent-freshness      |
|                        |                  | (RFC 7675) is the      |
|                        |                  | same pattern.  QUIC's  |
|                        |                  | nonce is 8 octets;    |
|                        |                  | FRCP chooses 16.      |
+------------------------+------------------+------------------------+
| Probing distinct from  | QUIC            | KA timer answers      |
| keepalive              |                  | "peer alive?", RTTP    |
|                        |                  | answers "path          |
|                        |                  | measurable?", as in    |
|                        |                  | QUIC PING (RFC 9000    |
|                        |                  | sec. 19.2) vs          |
|                        |                  | PATH_CHALLENGE.        |
+------------------------+------------------+------------------------+
| Bare KA + ACK          | QUIC / SCTP      | QUIC PING (RFC 9000    |
| keepalive packets      |                  | sec. 19.2); SCTP      |
|                        |                  | HEARTBEAT /            |
|                        |                  | HEARTBEAT-ACK (RFC    |
|                        |                  | 9260 sec. 8.3).  SCTP  |
|                        |                  | HEARTBEAT also carries |
|                        |                  | an opaque echoed blob, |
|                        |                  | structurally similar  |
|                        |                  | to FRCP RTTP.          |
+------------------------+------------------+------------------------+
| (FFGM, LFGM)          | SCTP            | RFC 9260 sec. 3.3.1    |
| fragment-role bits    |                  | DATA chunk B/E bits    |
| (Section 7.2)          |                  | encode the same four  |
|                        |                  | states (B+E=SOLE,      |
|                        |                  | B-only=FIRST, neither  |
|                        |                  | =MID, E-only=LAST).    |
|                        |                  | Each fragment carries  |
|                        |                  | its own seqno/TSN and  |
|                        |                  | is independently      |
|                        |                  | retransmitted.        |
+------------------------+------------------+------------------------+
| Stream byte-offset    | QUIC            | QUIC STREAM frame      |
| reassembly            |                  | (RFC 9000 sec. 19.8)  |
| (Sections 1.5, 16)    |                  | uses Offset + Length  |
|                        |                  | varints; FRCP uses    |
|                        |                  | fixed 32-bit start /  |
|                        |                  | end.  One stream per  |
|                        |                  | flow vs QUIC's many    |
|                        |                  | streams multiplexed.  |
+------------------------+------------------+------------------------+
| FIN end-of-stream      | TCP / QUIC      | TCP FIN flag (RFC 9293 |
| marker                |                  | sec. 3.1) closes one  |
| (Sections 1.2, 16)    |                  | half of the byte      |
|                        |                  | stream; QUIC STREAM    |
|                        |                  | frame FIN bit (RFC    |
|                        |                  | 9000 sec. 19.8) does  |
|                        |                  | the same per stream    |
|                        |                  | with an immutable      |
|                        |                  | final-size invariance  |
|                        |                  | (RFC 9000 sec. 4.5:    |
|                        |                  | the final size is      |
|                        |                  | fixed once observed).  |
|                        |                  | FRCP's FIN consumes    |
|                        |                  | one packet seqno (not  |
|                        |                  | one byte of stream    |
|                        |                  | space) and is          |
|                        |                  | idempotent on the      |
|                        |                  | sender side.          |
+------------------------+------------------+------------------------+
| Stream byte-credit    | QUIC            | MAX_STREAM_DATA (RFC  |
| flow control          |                  | 9000 sec. 4.1, sec.    |
| (Section 16)          |                  | 19.10).  FRCP projects |
|                        |                  | a per-flow byte budget |
|                        |                  | onto the seqno-space  |
|                        |                  | rwe.  Single stream    |
|                        |                  | per flow collapses    |
|                        |                  | QUIC's MAX_DATA /      |
|                        |                  | MAX_STREAM_DATA        |
|                        |                  | distinction.           |
+------------------------+------------------+------------------------+
| Header protection      | QUIC            | QUIC RFC 9001 sec. 5.4 |
| (encrypted seqnos)    |                  | applies header        |
|                        |                  | protection on top of  |
|                        |                  | AEAD to mask the      |
|                        |                  | packet number.  FRCP's |
|                        |                  | per-flow AEAD wrap    |
|                        |                  | (Section 16) is wider: |
|                        |                  | it encrypts the entire |
|                        |                  | PCI including seqno    |
|                        |                  | because the IPCP      |
|                        |                  | below already routes,  |
|                        |                  | so no destination      |
|                        |                  | connection-ID needs to |
|                        |                  | stay in clear (cf.    |
|                        |                  | RFC 9000 sec. 5.2).    |
+------------------------+------------------+------------------------+
| Two-bit fragment role  | SCTP            | The (FFGM, LFGM) pair  |
| polarity              |                  | follows SCTP B/E      |
|                        |                  | (begin = 1 / end = 1)  |
|                        |                  | rather than IPv4 MF    |
|                        |                  | (RFC 791 sec. 3.2),    |
|                        |                  | which has the inverse  |
|                        |                  | polarity (MF = 1 means |
|                        |                  | NOT last).            |
+------------------------+------------------+------------------------+
| Orthogonal reliability | SCTP            | PR-SCTP (RFC 3758,    |
| / ordering axes        |                  | per-message partial    |
| (Section 2.2)          |                  | reliability) and SCTP  |
|                        |                  | DATA U-bit (RFC 9260  |
|                        |                  | sec. 3.3.1, per-      |
|                        |                  | message unordered)    |
|                        |                  | are the closest        |
|                        |                  | precedents for        |
|                        |                  | decoupling reliability |
|                        |                  | from ordering; FRCP    |
|                        |                  | sets them per-flow    |
|                        |                  | rather than per-      |
|                        |                  | message.              |
+------------------------+------------------+------------------------+
| Orthogonal CRC        | UDP-Lite        | RFC 3828 (Larzon et    |
| (qs.ber == 0)          |                  | al., 2004) lets the    |
|                        |                  | sender pick a per-    |
|                        |                  | packet Checksum        |
|                        |                  | Coverage and the      |
|                        |                  | receiver enforce a    |
|                        |                  | locally configured    |
|                        |                  | minimum (no in-band    |
|                        |                  | negotiation; sec. 3.1, |
|                        |                  | sec. 3.3).  FRCP      |
|                        |                  | gates a full CRC      |
|                        |                  | trailer on qs.ber == 0 |
|                        |                  | at flow setup.        |
|                        |                  | Contrast TCP / SCTP    |
|                        |                  | (mandatory checksum)  |
|                        |                  | and QUIC (AEAD        |
|                        |                  | subsumes CRC).        |
+------------------------+------------------+------------------------+
| Setup-time service    | DCCP / SCTP /    | DCCP Service Codes    |
| negotiation            | QUIC            | (RFC 4340 sec. 8.1.2,  |
|                        |                  | RFC 5595); SCTP INIT  |
|                        |                  | parameters (RFC 9260  |
|                        |                  | sec. 3.3.2); QUIC      |
|                        |                  | transport parameters  |
|                        |                  | (RFC 9000 sec. 7.4).  |
|                        |                  | All negotiate service  |
|                        |                  | properties at          |
|                        |                  | connection setup; only |
|                        |                  | RINA's QoS cube        |
|                        |                  | exposes them as an    |
|                        |                  | orthogonal vector.    |
+------------------------+------------------+------------------------+
</pre>
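
The four fragment roles named in the heritage table can be decoded from the two role bits as in the following minimal sketch. The bit positions <code>EX_FFGM</code> and <code>EX_LFGM</code>, the enum, and both function names are hypothetical and chosen only for illustration; the authoritative flag layout is the one in [[#1.2. Flag bits|Section 1.2]].

```c
#include <stdint.h>

/* Hypothetical bit positions, for illustration only (see Section 1.2). */
#define EX_FFGM (1u << 0) /* first fragment of a message */
#define EX_LFGM (1u << 1) /* last fragment of a message  */

enum ex_frag_role { EX_MID, EX_FIRST, EX_LAST, EX_SOLE };

/* Map the (FFGM, LFGM) pair onto the four roles, SCTP B/E style. */
static enum ex_frag_role ex_frag_role(uint16_t flags)
{
        int first = (flags & EX_FFGM) != 0;
        int last  = (flags & EX_LFGM) != 0;

        if (first && last)
                return EX_SOLE;  /* B+E: unfragmented message */
        if (first)
                return EX_FIRST; /* B only */
        if (last)
                return EX_LAST;  /* E only */

        return EX_MID;           /* neither bit set */
}

/* Only FIRST and SOLE can start a reassembly run. */
static int ex_is_run_start(uint16_t flags)
{
        enum ex_frag_role r = ex_frag_role(flags);

        return r == EX_FIRST || r == EX_SOLE;
}
```

Both bits being set means "sole fragment", which is the SCTP polarity; an IPv4-style MF bit would instead be clear on the last fragment.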




=== 15.1. Original to FRCP (no clean prior art) ===
=== 15.1. Original to FRCP (no clean prior art) ===


<pre>
* Pre-DRF NACK ([[#9. Pre-DRF NACK|Section 9]]): receiver-driven nudge exploiting <code>snd_cr.inact &gt; rcv_cr.inact</code>.  Closest analogues are SCTP Gap Ack Blocks (RFC 9260 sec. 3.3.4) and DCCP Ack Vector (RFC 4340 sec. 11.4) - both let the receiver describe gaps to the sender, but neither targets the cross-epoch / pre-DRF case.
  - Pre-DRF NACK (Section 9): receiver-driven nudge exploiting
* <code>MAX_RDV</code> window-probe give-up: neither TCP (persist-timer probes until application or R2 abort, RFC 9293 sec. 3.8.6.1) nor QUIC has an explicit FC-give-up counter.  A recursive-network choice: outer layers can drop the flow.
    snd_cr.inact > rcv_cr.inact.  Closest analogues are SCTP Gap Ack
* Skip-past-gap reassembly ([[#7.2. Fragmentation and reassembly|Section 7.2]]): SCTP fragments and reassembles every flow regardless of reliability/ordering, using its own per-stream reassembly queue; QUIC fragments via STREAM offsets.  FRCP fragments best-effort flows too, but the receiver drops the broken prefix the moment a later run-start (<code>FIRST</code> or <code>SOLE</code> role) is visible inside the <code>RQ_SIZE</code>-wide reorder ring - no IP-frag-style timeout, no SCTP-style explicit abort.  If no later run-start arrives within the ring, <code>frag_run_inspect</code> returns <code>NOT_READY</code> and the partial run keeps its slots; the next inspect retries.  The trade-off: a permanently-lost <code>MID</code> in a long isolated run holds slots until either a later <code>FIRST</code>/<code>SOLE</code> appears in the ring or the writer stops, at which point the slots are reclaimed on flow teardown.
    Blocks (RFC 9260 sec. 3.3.4) and DCCP Ack Vector (RFC 4340
* Reassembly deferred to consume time ([[#7.2. Fragmentation and reassembly|Section 7.2]]), message mode only (<code>qos.service == SVC_MESSAGE</code>): SCTP (RFC 9260 sec. 6.9), QUIC (RFC 9000 sec. 2.2), and TCP (RFC 9293) all hold reassembly state at the receive boundary.  FRCP message-mode leaves fragments in the shared-memory ring until <code>flow_read</code> pulls them and lands the SDU directly in the caller's buffer.  Stream mode ([[#16. Stream-mode flows|Section 16]]) uses the standard QUIC-style direct ring placement on receive and does not defer. The optimisation is enabled by the Shared-Memory Subsystem (SSM) packet-buffer ring (see <code>struct ssm_pk_buff</code> at [[#1.1. PCI header|Section 1.1]]); the analogue is OS-level scatter-gather I/O (<code>recvmsg+iovec</code>), not transport-layer prior art.
    sec. 11.4) - both let the receiver describe gaps to the sender,
* TLP-equivalent tail-loss recovery (RFC 8985 sec. 7; RFC 9002 sec. 6.2): FRCP does not emit an explicit Tail Loss Probe packet, but the same goal is met implicitly by RACK loss detection ([[#8. Retransmission|Section 8]]) firing on a non-advancing cumulative ACK once the head-of-line slot ages past the RACK reorder window <code>R = MIN(reo_wnd_mult * min_RTT / 4, SRTT)</code> - well below <code>RTO = max(2 * SRTT, SRTT + (mdev &lt;&lt; MDEV_MUL))</code>. A receiver-driven nudge is also available via the pre-DRF NACK ([[#9. Pre-DRF NACK|Section 9]]).
    but neither targets the cross-epoch / pre-DRF case.
  - MAX_RDV window-probe give-up: neither TCP (persist-timer
    probes until application or R2 abort, RFC 9293 sec. 3.8.6.1)
    nor QUIC has an explicit FC-give-up counter.  A recursive-
    network choice: outer layers can drop the flow.
  - Skip-past-gap reassembly (Section 7.2): SCTP fragments and
    reassembles every flow regardless of reliability/ordering,
    using its own per-stream reassembly queue; QUIC fragments via
    STREAM offsets.  FRCP fragments best-effort flows too, but
    the receiver drops the broken prefix the moment a later run-
    start (FIRST or SOLE role) is visible inside the RQ_SIZE-wide
    reorder ring - no IP-frag-style timeout, no SCTP-style
    explicit abort.  If no later run-start arrives within the
    ring, frag_run_inspect returns NOT_READY and the partial run
    keeps its slots; the next inspect retries.  The trade-off: a
    permanently-lost MID in a long isolated run holds slots until
    either a later FIRST/SOLE appears in the ring or the writer
    stops, at which point the slots are reclaimed on flow
    teardown.
  - Reassembly deferred to consume time (Section 7.2), message
    mode only (qos.service == SVC_MESSAGE): SCTP (RFC 9260
    sec. 6.9), QUIC (RFC 9000 sec. 2.2), and TCP (RFC 9293) all
    hold reassembly state at the receive boundary.  FRCP message-
    mode leaves fragments in the shared-memory ring until
    flow_read pulls and lands the SDU directly in the caller's
    buffer.  Stream mode (Section 16) uses the standard QUIC-
    style direct ring placement on receive and does not defer.
    The optimisation is enabled by the Shared-Memory Subsystem
    (SSM) packet-buffer ring (see struct ssm_pk_buff at
    Section 1.1); the analogue is OS-level scatter-gather I/O
    (recvmsg+iovec), not transport-layer prior art.
  - TLP-equivalent tail-loss recovery (RFC 8985 sec. 7;
    RFC 9002 sec. 6.2): FRCP does not emit an explicit Tail Loss
    Probe packet, but the same goal is met implicitly by RACK
    loss detection (Section 8) firing on a non-advancing
    cumulative ACK once the head-of-line slot ages past the RACK
    reorder window R = MIN(reo_wnd_mult * min_RTT / 4, SRTT) -
    well below RTO = max(2 * SRTT, SRTT + (mdev << MDEV_MUL)).
    A receiver-driven nudge is also available via the pre-DRF
    NACK (Section 9).
</pre>
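
The two timing formulas quoted in the tail-loss item can be compared directly. The sketch below is illustrative only: the function names are hypothetical, all times are assumed to share one unit (microseconds in the test values), and <code>EX_MDEV_MUL</code> is an assumed shift value, not the one FRCT uses.

```c
#include <stdint.h>

#define EX_MDEV_MUL 2 /* assumed shift; the real MDEV_MUL is set by FRCT */

static uint64_t ex_min(uint64_t a, uint64_t b) { return a < b ? a : b; }
static uint64_t ex_max(uint64_t a, uint64_t b) { return a > b ? a : b; }

/* RACK reorder window: R = MIN(reo_wnd_mult * min_RTT / 4, SRTT). */
static uint64_t ex_rack_reo_wnd(uint64_t reo_wnd_mult,
                                uint64_t min_rtt,
                                uint64_t srtt)
{
        return ex_min(reo_wnd_mult * min_rtt / 4, srtt);
}

/* RTO = max(2 * SRTT, SRTT + (mdev << MDEV_MUL)). */
static uint64_t ex_rto(uint64_t srtt, uint64_t mdev)
{
        return ex_max(2 * srtt, srtt + (mdev << EX_MDEV_MUL));
}
```

Because R is capped at SRTT and the RTO is floored at 2 * SRTT, R never exceeds half the RTO under these formulas, whatever value the shift takes.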




=== 15.2. Not adopted ===
=== 15.2. Not adopted ===


<pre>
* Slow start, congestion window (cwnd), Additive Increase / Multiplicative Decrease (AIMD), NewReno cwnd inflation. Congestion control lives in the IPCP CA policies and is driven by Explicit Congestion Notification (ECN, RFC 3168).
  - Slow start, congestion window (cwnd), Additive Increase /
* Nagle / silly-window-syndrome (SWS) avoidance (RFC 896, RFC 1122 sec. 4.2.3.4).  (Deferred work, not adopted in the current spec.)
    Multiplicative Decrease (AIMD), NewReno cwnd inflation.
* TCP Timestamps (RFC 7323) / Protection Against Wrapped Sequences (PAWS) - RTT measurement uses RTTP, not per-segment timestamps.  A peer-supplied timestamp echoed on every ACK lets a malicious peer drive the <code>srtt</code> estimate arbitrarily low, collapsing the RTO and triggering a self-inflicted retransmit storm.  RTTP confines RTT measurement to nonce-authenticated probe round-trips, where a forged echo is rejected before it can reach the estimator.
    Congestion control lives in the IPCP CA policies and is
* ECN (Explicit Congestion Notification) response inside FRCP (consumed by IPCP Congestion Avoidance / CA).
    driven by Explicit Congestion Notification (ECN, RFC 3168).
* IP-style fragment-offset reassembly (RFC 791 sec. 3.2; RFC 8200 sec. 4.5).  Message-mode FRCP relies on the FRCT <code>rq[]</code> reorder ring keyed by <code>seqno</code> (shared by FRTX and best-effort flows) to put fragments back in order; no separate offset field is needed and no IP-style hole-list reassembly buffer is kept. Stream-mode FRCP does carry <code>[start, end)</code> byte offsets ([[#1.5. Stream PCI extension|Section 1.5]]) for direct ring placement on receive.
  - Nagle / silly-window-syndrome (SWS) avoidance (RFC 896, RFC
* QUIC STREAM offset+length framing on ''every'' flow (RFC 9000 sec. 19.8).  Message-mode FRCP uses the SCTP-style B/E flag-bit encoding (<code>FFGM</code>/<code>LFGM</code>) and skips the offsets; stream-mode FRCP adopts the QUIC offset model (heritage table above).
    1122 sec. 4.2.3.4).  (Deferred work, not adopted in the
    current spec.)
  - TCP Timestamps (RFC 7323) / Protection Against Wrapped
    Sequences (PAWS) - RTT measurement uses RTTP,
    not per-segment timestamps.  A peer-supplied timestamp echoed
    on every ACK lets a malicious peer drive the srtt estimate
    arbitrarily low, collapsing the RTO and triggering a self-
    inflicted retransmit storm.  RTTP confines RTT measurement to
    nonce-authenticated probe round-trips, where a forged echo is
    rejected before it can reach the estimator.
  - ECN (Explicit Congestion Notification) response inside FRCP
    (consumed by IPCP Congestion Avoidance / CA).
  - IP-style fragment-offset reassembly (RFC 791 sec. 3.2; RFC 8200
    sec. 4.5).  Message-mode FRCP relies on the FRCT rq[] reorder
    ring keyed by seqno (shared by FRTX and best-effort flows) to
    put fragments back in order; no separate offset field is
    needed and no IP-style hole-list reassembly buffer is kept.
    Stream-mode FRCP does carry [start, end) byte offsets
    (Section 1.5) for direct ring placement on receive.
  - QUIC STREAM offset+length framing on *every* flow (RFC 9000
    sec. 19.8).  Message-mode FRCP uses the SCTP-style B/E flag-
    bit encoding (FFGM/LFGM) and skips the offsets; stream-mode
    FRCP adopts the QUIC offset model (heritage table above).
</pre>
 


== 16. Stream-mode flows ==
== 16. Stream-mode flows ==


When a flow is allocated with qos.service == SVC_STREAM both peers
When a flow is allocated with <code>qos.service == SVC_STREAM</code> both peers
switch to byte-stream semantics, layered on top of the FRTX reorder
switch to byte-stream semantics, layered on top of the FRTX reorder
machinery already described in Sections 6-8.
machinery already described in Sections [[#6. Receive path|6]]-[[#8. Retransmission|8]].


=== 16.1. Send ===
=== 16.1. Send ===


The sender splits the caller's octets into chunks of at most
The sender splits the caller's octets into chunks of at most
(frag_mtu - base PCI - stream PCI extension) octets (Sections 1.1
<code>(frag_mtu - base PCI - stream PCI extension)</code> octets (Sections [[#1.1. PCI header|1.1]]
and 1.5).  Each chunk is one DATA packet with its own seqno and a
and [[#1.5. Stream PCI extension|1.5]]).  Each chunk is one DATA packet with its own <code>seqno</code> and a
[start, end) byte range copied from a monotonic stream counter.
<code>[start, end)</code> byte range copied from a monotonic stream counter.
In stream mode FFGM and LFGM are unused and MUST be transmitted as
In stream mode <code>FFGM</code> and <code>LFGM</code> are unused and MUST be transmitted as
zero; the per-byte position is carried by the [start, end)
zero; the per-byte position is carried by the <code>[start, end)</code>
extension instead.
extension instead.


End-of-stream is signalled with a 0-byte DATA packet that has FIN
End-of-stream is signalled with a 0-byte DATA packet that has <code>FIN</code>
(bit 12) set, emitted on the FIN triggers listed in Section 1.2
(bit 12) set, emitted on the FIN triggers listed in [[#1.2. Flag bits|Section 1.2]]
(WR-half close, flow_dealloc, and any other path that yields the
(WR-half close, <code>flow_dealloc</code>, and any other path that yields the
final byte).  The sender MUST emit at most one FIN per flow; its
final byte).  The sender MUST emit at most one FIN per flow; its
[start, end) MUST equal [final-byte, final-byte) (i.e., empty
<code>[start, end)</code> MUST equal <code>[final-byte, final-byte)</code> (i.e., empty
interval at the final byte position; final-size invariance,
interval at the final byte position; final-size invariance,
analogous to QUIC RFC 9000 sec. 4.5).  Idempotency is enforced by
analogous to QUIC RFC 9000 sec. 4.5).  Idempotency is enforced by
an snd_fin_sent guard.
an <code>snd_fin_sent</code> guard.
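
The chunking rule and the empty FIN interval can be sketched as follows. This is a minimal illustration, not the FRCT code: <code>struct ex_range</code>, <code>ex_chunk</code> and <code>ex_fin_range</code> are hypothetical names, <code>per_pkt</code> stands for <code>(frag_mtu - base PCI - stream PCI extension)</code>, and the handling of 32-bit counter wrap is left to plain unsigned arithmetic as an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open byte range [start, end) carried in the stream extension. */
struct ex_range {
        uint32_t start;
        uint32_t end;
};

/*
 * Split len octets into ranges of at most per_pkt octets each, taking
 * positions from the monotonic stream counter *next.  Writes at most
 * max_out ranges and returns how many were written.
 */
static size_t ex_chunk(uint32_t        *next,
                       size_t           len,
                       size_t           per_pkt,
                       struct ex_range *out,
                       size_t           max_out)
{
        size_t n = 0;

        assert(per_pkt > 0);

        while (len > 0 && n < max_out) {
                size_t take = len < per_pkt ? len : per_pkt;

                out[n].start = *next;
                out[n].end   = *next + (uint32_t) take; /* unsigned wrap */
                *next        = out[n].end;
                len         -= take;
                ++n;
        }

        return n;
}

/* The FIN packet carries the empty interval at the final byte position. */
static struct ex_range ex_fin_range(uint32_t final_byte)
{
        struct ex_range r = { final_byte, final_byte };

        return r;
}
```

Each returned range corresponds to one DATA packet; consecutive ranges are back to back, so the receiver's <code>start == prior end</code> check holds for an undisturbed stream.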


=== 16.2. Receive ===
=== 16.2. Receive ===


On arrival the receiver places the payload directly into a per-flow
On arrival the receiver places the payload directly into a per-flow
byte-indexed receive ring of width ring_sz (octets) at the position
byte-indexed receive ring of width <code>ring_sz</code> (octets) at the position
indicated by start, with a two-segment memcpy across the ring
indicated by <code>start</code>, with a two-segment <code>memcpy</code> across the ring
boundary if needed.  Receipt is recorded in the FRTX reorder
boundary if needed.  Receipt is recorded in the FRTX reorder
machinery (Section 6.2) augmented with the packet's start, end, and
machinery ([[#6.2. Locked main path|Section 6.2]]) augmented with the packet's <code>start</code>, <code>end</code>, and
FIN bit per slot.  When a packet's [start, end) front-overlaps
FIN bit per slot.  When a packet's <code>[start, end)</code> front-overlaps
bytes already at or below the byte high-water mark, the overlap is
bytes already at or below the byte high-water mark, the overlap is
trimmed before placement so the same byte is never written twice.
trimmed before placement so the same byte is never written twice.
After stashing, the receiver advances lwe and the byte high-water
After stashing, the receiver advances <code>lwe</code> and the byte high-water
mark across any newly-contiguous prefix.  Each slot advanced MUST
mark across any newly-contiguous prefix.  Each slot advanced MUST
satisfy `start == the last-delivered slot's end`; a slot whose
satisfy <code>start == the last-delivered slot's end</code>; a slot whose
start does not equal that end is silently dropped at delivery time
<code>start</code> does not equal that <code>end</code> is silently dropped at delivery time
(the seqno is consumed, no stream bytes contributed) and the high-
(the <code>seqno</code> is consumed, no stream bytes contributed) and the high-
water mark does not advance past it.  The byte stream
water mark does not advance past it.  The byte stream
stalls at that point - there is no flow-tear-down on mismatch.
stalls at that point - there is no flow-tear-down on mismatch.
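
The front-overlap trim and the two-segment copy can be sketched as below. This is an illustration under stated assumptions, not the FRCT implementation: <code>ex_ring_place</code> is a hypothetical name, <code>hwm</code> is taken to be the offset just past the last contiguous byte, the ring index is <code>start % ring_sz</code> (which is only wrap-safe if <code>ring_sz</code> is a power of two), and flow control is assumed to keep the packet inside the ring.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy the payload of a packet covering [start, end) into the byte ring,
 * skipping any prefix that lies below the high-water mark hwm.
 * Returns the number of octets actually written.
 */
static size_t ex_ring_place(uint8_t       *ring,
                            size_t         ring_sz,
                            uint32_t       hwm,
                            uint32_t       start,
                            uint32_t       end,
                            const uint8_t *data)
{
        uint32_t skip = hwm - start; /* serial-number distance */
        size_t   len;
        size_t   pos;
        size_t   first;

        assert(ring_sz > 0);

        /* Front overlap: start is behind hwm, trim the old prefix. */
        if (skip != 0 && skip < 0x80000000u) {
                if (skip >= end - start)
                        return 0; /* nothing new in this packet */
                data  += skip;
                start  = hwm;
        }

        len = end - start;
        assert(len <= ring_sz);

        pos   = start % ring_sz;
        first = ring_sz - pos < len ? ring_sz - pos : len;

        memcpy(ring + pos, data, first);         /* up to the boundary */
        memcpy(ring, data + first, len - first); /* wrapped remainder  */

        return len;
}
```

The second <code>memcpy</code> copies zero octets when the payload does not cross the ring boundary, so no separate branch is needed for the common case.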
Line 1,669: Line 1,100:


A FIN slot marks end-of-stream at advance time only if its byte
A FIN slot marks end-of-stream at advance time only if its byte
position equals the last-delivered slot's end; otherwise the FIN
position equals the last-delivered slot's <code>end</code>; otherwise the FIN
is ignored and the corresponding seqno occupies a slot but
is ignored and the corresponding <code>seqno</code> occupies a slot but
contributes no stream bytes.  No packet buffer is held after the
contributes no stream bytes.  No packet buffer is held after the
ring copy.
ring copy.
Line 1,676: Line 1,107:
=== 16.3. Read ===
=== 16.3. Read ===


flow_read returns up to count octets from the contiguous prefix
<code>flow_read</code> returns up to <code>count</code> octets from the contiguous prefix
[next, high-water), where next is the first byte the application has
<code>[next, high-water)</code>, where <code>next</code> is the first byte the application has
not yet consumed and high-water is the rightmost contiguous
not yet consumed and <code>high-water</code> is the rightmost contiguous
byte received.  When the stream is fully drained AND end-of-stream
byte received.  When the stream is fully drained AND end-of-stream
(EOS) was observed (next == EOS byte position), flow_read returns
(EOS) was observed (<code>next == EOS</code> byte position), <code>flow_read</code> returns
0 (EOF) - the same shape POSIX read(2) uses on TCP after a peer
0 (<code>EOF</code>) - the same shape POSIX <code>read(2)</code> uses on TCP after a peer
FIN.
FIN.


Line 1,687: Line 1,118:


ACK / SACK / RACK / RTO machinery is unchanged; the FRTX reorder
ACK / SACK / RACK / RTO machinery is unchanged; the FRTX reorder
ring is reused as a per-seqno received-bitmap.  Let per_pkt =
ring is reused as a per-<code>seqno</code> received-bitmap.  Let <code>per_pkt =
(frag_mtu - base PCI - stream PCI extension), the maximum stream-
(frag_mtu - base PCI - stream PCI extension)</code>, the maximum stream-
byte payload one DATA packet can carry (Section 16.1).  The
byte payload one DATA packet can carry ([[#16.1. Send|Section 16.1]]).  The
receive window advertised in FC is clamped so the byte window
receive window advertised in FC is clamped so the byte window
(ring_sz) cannot be overrun: the seqno-space rwe is at most
(<code>ring_sz</code>) cannot be overrun: the <code>seqno</code>-space <code>rwe</code> is at most
`rcv_cr.lwe + ring_sz / per_pkt`.
<code>rcv_cr.lwe + ring_sz / per_pkt</code>.
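
As a worked illustration of the clamp, the sketch below limits an advertised <code>rwe</code> to <code>lwe + ring_sz / per_pkt</code>. The function name and the serial-number comparison are assumptions made for the example; only the bound itself comes from the text above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Clamp the advertised right window edge so that the packets it admits
 * cannot carry more than ring_sz stream octets.
 */
static uint32_t ex_clamp_rwe(uint32_t lwe,
                             uint32_t rwe,
                             uint32_t ring_sz,
                             uint32_t per_pkt)
{
        uint32_t max_rwe;
        uint32_t ahead;

        assert(per_pkt > 0);

        max_rwe = lwe + ring_sz / per_pkt; /* integer division rounds down */
        ahead   = rwe - max_rwe;           /* serial-number distance       */

        /* rwe lies beyond max_rwe iff the distance is in (0, 2^31). */
        if (ahead != 0 && ahead < 0x80000000u)
                return max_rwe;

        return rwe;
}
```

For example, with <code>ring_sz</code> = 65536 and <code>per_pkt</code> = 1400 the window spans at most 46 packets, which is 64400 octets; the rounding down keeps the bound on the safe side of the ring size.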


This is the QUIC byte-credit flow-control model
This is the QUIC byte-credit flow-control model
(MAX_STREAM_DATA, RFC 9000 sec. 4.1 and sec. 19.10) projected onto
(<code>MAX_STREAM_DATA</code>, RFC 9000 sec. 4.1 and sec. 19.10) projected onto
seqno space.  With one stream per flow there is no MAX_DATA /
<code>seqno</code> space.  With one stream per flow there is no <code>MAX_DATA</code> /
MAX_STREAM_DATA distinction.  Receiver-side silly-window-syndrome
<code>MAX_STREAM_DATA</code> distinction.  Receiver-side silly-window-syndrome
(SWS) avoidance (RFC 9293 sec. 3.8.6.2.2) is achieved by combining
(SWS) avoidance (RFC 9293 sec. 3.8.6.2.2) is achieved by combining
the consume-time rwe bump with the global non-shrink rule from
the consume-time <code>rwe</code> bump with the global non-shrink rule from
Section 11.
[[#11. Flow control|Section 11]].


=== 16.5. Security considerations ===
=== 16.5. Security considerations ===
Line 1,707: Line 1,138:
predict (off-path blind) the flow's seqnos and byte offsets on an
predict (off-path blind) the flow's seqnos and byte offsets on an
unencrypted stream flow can inject DATA or FIN at any in-window
unencrypted stream flow can inject DATA or FIN at any in-window
position.  The in-line consistency checks above (start == prior
position.  The in-line consistency checks above (<code>start</code> == prior
end on advance; FIN MUST be 0-byte; FIN MUST sit at the final
<code>end</code> on advance; FIN MUST be 0-byte; FIN MUST sit at the final
byte position) realise the spirit of RFC 5961's "sequence-window
byte position) realise the spirit of RFC 5961's "sequence-window
plus exact-position match for control bits" without an explicit
plus exact-position match for control bits" without an explicit
Line 1,727: Line 1,158:
recommended AEAD ciphers (AES-GCM, RFC 5288; or ChaCha20-Poly1305,
recommended AEAD ciphers (AES-GCM, RFC 5288; or ChaCha20-Poly1305,
RFC 8439) wrap the entire FRCP packet on the wire - PCI, stream
RFC 8439) wrap the entire FRCP packet on the wire - PCI, stream
extension, body, and the CRC trailer when ber == 0 - under a
extension, body, and the CRC trailer when <code>ber == 0</code> - under a
per-flow symmetric key derived from the flow's own key exchange
per-flow symmetric key derived from the flow's own key exchange
(Section 1.1).  The AEAD tag (~2^-128 forgery probability)
([[#1.1. PCI header|Section 1.1]]).  The AEAD tag (~2^-128 forgery probability)
dominates the CRC (~2^-32) for integrity in this mode but the CRC
dominates the CRC (~2^-32) for integrity in this mode but the CRC
trailer is currently retained inside the wrap (see Section 1.1).
trailer is currently retained inside the wrap (see [[#1.1. PCI header|Section 1.1]]).
Implementations MUST NOT rely on the security properties below
Implementations MUST NOT rely on the security properties below
when a non-AEAD cipher (e.g. AES-CTR alone) is negotiated; non-
when a non-AEAD cipher (e.g. AES-CTR alone) is negotiated; non-
Line 1,741: Line 1,172:
on-path-passive attacker this is:
on-path-passive attacker this is:


<pre>
* Stronger than TCP+TLS (TCP header in the clear).
  - Stronger than TCP+TLS (TCP header in the clear).
* Stronger than TCP+TCP-AO (header authenticated but visible).
  - Stronger than TCP+TCP-AO (header authenticated but visible).
* Comparable to IPsec ESP transport mode (RFC 4303), which similarly authenticates and encrypts the upper-layer header plus payload, and to QUIC packet protection (RFC 9001 sec. 5), with the difference that QUIC must leave the destination connection ID in the clear for routing whereas FRCP relies on the IPCP below for delivery and can therefore encrypt its entire PCI.
  - Comparable to IPsec ESP transport mode (RFC 4303), which
    similarly authenticates and encrypts the upper-layer header
    plus payload, and to QUIC packet protection (RFC 9001 sec. 5),
    with the difference that QUIC must leave the destination
    connection ID in the clear for routing whereas FRCP relies on
    the IPCP below for delivery and can therefore encrypt its
    entire PCI.
</pre>
 
Keying granularity.  FRCP runs key exchange (kex) per flow, so
each flow_alloc yields independent symmetric keys.  This is
finer-grained than QUIC (per-connection, RFC 9001, where one
handshake covers all multiplexed streams) and finer-grained than
typical IPsec deployment (per-host-pair Security Associations,
SAs).  Forward secrecy follows from the kex when an ephemeral
Diffie-Hellman exchange (DHE), or a hybrid mode (classical DH +
post-quantum Key Encapsulation Mechanism / KEM), is selected.


Replay protection. The AEAD layer itself does NOT carry an
Keying granularity. Ouroboros flow allocation runs key exchange (<code>kex</code>) per flow, so each <code>flow_alloc</code> yields independent symmetric keys. This is finer-grained than QUIC (per-connection, RFC 9001, where one handshake covers all multiplexed streams) and finer-grained than typical IPsec deployment (per-host-pair Security Associations, SAs). Forward secrecy follows from the <code>kex</code> when an ephemeral Diffie-Hellman exchange (DHE), or a hybrid mode (classical DH + post-quantum Key Encapsulation Mechanism / KEM), is selected.
explicit anti-replay window (unlike IPsec ESP, RFC 4303 sec.
3.4.3, or DTLS, RFC 9147 sec. 4.5.1).  For FRCP-engaged flows the
seqno-space duplicate-suppression in Section 6.2 rejects replayed
DATA after the AEAD strips the wrap, because the AEAD authenticates
the seqno and a replay re-presents an old seqno that is then
discarded either as a duplicate (still inside the receive window)
or as outside the receive window, depending on how far lwe has
advanced since the original packet was delivered.  RAW
(qos.service == SVC_RAW) flows have no FRCP layer and therefore
no replay protection at the AEAD layer either; deployments that
need replay rejection on RAW flows MUST provide it at a higher
layer.


Layering. The AEAD wrap sits below FRCP on the data path, so
Replay protection. The AEAD layer itself does NOT carry an explicit anti-replay window (unlike IPsec ESP, RFC 4303 sec. 3.4.3, or DTLS, RFC 9147 sec. 4.5.1).  For FRCP-engaged flows the <code>seqno</code>-space duplicate-suppression in [[#6.2. Locked main path|Section 6.2]] rejects replayed
RAW best-effort flows (qos.service == SVC_RAW, the UDP-equivalent
DATA after the AEAD strips the wrap, because the AEAD authenticates the <code>seqno</code> and a replay re-presents an old <code>seqno</code> that is then discarded either as a duplicate (still inside the receive window) or as outside the receive window, depending on how far <code>lwe</code> has advanced since the original packet was delivered. RAW (<code>qos.service == SVC_RAW</code>) flows have no FRCP layer and therefore no replay protection at the AEAD layer either; deployments that need replay rejection on RAW flows SHOULD use <code>SVC_MESSAGE</code>.
service of Section 2.2) inherit the same per-flow integrity +
confidentiality scope as FRCP-engaged flows - whatever the IPCP
and FRCP (if any) put on the wire is what the AEAD authenticates.
No DTLS-equivalent layering is required for confidentiality and
integrity; replay protection above AEAD is a separate concern as
noted above.


Layering. The AEAD wrap sits below FRCP on the data path, so RAW best-effort flows (<code>qos.service == SVC_RAW</code>, the UDP-equivalent service of [[#2.2. Service modes (orthogonal axes)|Section 2.2]]) inherit the same per-flow integrity + confidentiality scope as FRCP-engaged flows - whatever the process and FRCP (if any) put on the wire is what the AEAD authenticates. No DTLS-equivalent layering is required for confidentiality and integrity; replay protection above AEAD is a separate concern as noted above.
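
The replay argument above rests on how a re-presented <code>seqno</code> is classified once the AEAD has authenticated it. The following sketch illustrates that classification only; it is not the logic of [[#6.2. Locked main path|Section 6.2]]. The names, the <code>seen[]</code> occupancy array, and the assumption that the window <code>[lwe, rwe)</code> is no wider than <code>rq_size</code> slots are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum ex_verdict { EX_ACCEPT, EX_DUPLICATE, EX_OUT_OF_WINDOW };

/*
 * Classify an authenticated seqno against the receive window [lwe, rwe).
 * seen[i] is nonzero when ring slot i already holds a packet.
 */
static enum ex_verdict ex_classify(uint32_t       seqno,
                                   uint32_t       lwe,
                                   uint32_t       rwe,
                                   const uint8_t *seen,
                                   size_t         rq_size)
{
        assert(rq_size > 0);

        /* Unsigned distance test handles seqno wrap-around. */
        if (seqno - lwe >= rwe - lwe)
                return EX_OUT_OF_WINDOW; /* lwe has moved past it, or too new */

        if (seen[seqno % rq_size])
                return EX_DUPLICATE;     /* still in window, slot occupied */

        return EX_ACCEPT;
}
```

A replayed packet therefore ends in one of the two discard verdicts: <code>EX_DUPLICATE</code> while its slot is still occupied, and <code>EX_OUT_OF_WINDOW</code> once <code>lwe</code> has advanced beyond it.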


== 17. References ==
== 17. References ==
Line 1,798: Line 1,195:
=== 17.1. IETF documents ===
=== 17.1. IETF documents ===


;[RFC 791]
:J. Postel, "Internet Protocol", STD 5, RFC 791, September 1981.
;[RFC 793]
:J. Postel, "Transmission Control Protocol", STD 7, RFC 793, September 1981. Obsoleted by RFC 9293.
;[RFC 813]
:D. D. Clark, "Window and Acknowledgement Strategy in TCP", RFC 813, July 1982.
;[RFC 896]
:J. Nagle, "Congestion Control in IP/TCP Internetworks", RFC 896, January 1984.
;[RFC 1122]
:R. Braden (ed.), "Requirements for Internet Hosts -- Communication Layers", STD 3, RFC 1122, October 1989.
;[RFC 2018]
:M. Mathis, J. Mahdavi, S. Floyd, A. Romanow, "TCP Selective Acknowledgment Options", RFC 2018, October 1996.
;[RFC 2119]
:S. Bradner, "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
;[RFC 2883]
:S. Floyd, J. Mahdavi, M. Mathis, M. Podolsky, "An Extension to the Selective Acknowledgement (SACK) Option for TCP", RFC 2883, July 2000.
;[RFC 3758]
:R. Stewart, M. Ramalho, Q. Xie, M. Tuexen, P. Conrad, "Stream Control Transmission Protocol (SCTP) Partial Reliability Extension", RFC 3758, May 2004.
;[RFC 3828]
:L-A. Larzon, M. Degermark, S. Pink, L-E. Jonsson (ed.), G. Fairhurst (ed.), "The Lightweight User Datagram Protocol (UDP-Lite)", RFC 3828, July 2004.
;[RFC 4303]
:S. Kent, "IP Encapsulating Security Payload (ESP)", RFC 4303, December 2005.
;[RFC 4340]
:E. Kohler, M. Handley, S. Floyd, "Datagram Congestion Control Protocol (DCCP)", RFC 4340, March 2006.
;[RFC 5288]
:J. Salowey, A. Choudhury, D. McGrew, "AES Galois Counter Mode (GCM) Cipher Suites for TLS", RFC 5288, August 2008.
;[RFC 5595]
:G. Fairhurst, "The Datagram Congestion Control Protocol (DCCP) Service Codes", RFC 5595, September 2009.
;[RFC 5681]
:M. Allman, V. Paxson, E. Blanton, "TCP Congestion Control", RFC 5681, September 2009.
;[RFC 5925]
:J. Touch, A. Mankin, R. Bonica, "The TCP Authentication Option", RFC 5925, June 2010.
;[RFC 5961]
:A. Ramaiah, R. Stewart, M. Dalal, "Improving TCP's Robustness to Blind In-Window Attacks", RFC 5961, August 2010.
;[RFC 6298]
:V. Paxson, M. Allman, J. Chu, M. Sargent, "Computing TCP's Retransmission Timer", RFC 6298, June 2011.
;[RFC 6528]
:F. Gont, S. Bellovin, "Defending against Sequence Number Attacks", RFC 6528, February 2012. Obsoletes RFC 1948.
;[RFC 6582]
:T. Henderson, S. Floyd, A. Gurtov, Y. Nishida, "The NewReno Modification to TCP's Fast Recovery Algorithm", RFC 6582, April 2012.
;[RFC 7323]
:D. Borman, B. Braden, V. Jacobson, R. Scheffenegger (ed.), "TCP Extensions for High Performance", RFC 7323, September 2014.
;[RFC 7675]
:M. Perumal, D. Wing, R. Ravindranath, T. Reddy, M. Thomson, "Session Traversal Utilities for NAT (STUN) Usage for Consent Freshness", RFC 7675, October 2015.
;[RFC 8174]
:B. Leiba, "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, May 2017.
;[RFC 8200]
:S. Deering, R. Hinden, "Internet Protocol, Version 6 (IPv6) Specification", STD 86, RFC 8200, July 2017.
;[RFC 8439]
:Y. Nir, A. Langley, "ChaCha20 and Poly1305 for IETF Protocols", RFC 8439, June 2018.
;[RFC 8446]
:E. Rescorla, "The Transport Layer Security (TLS) Protocol Version 1.3", RFC 8446, August 2018.
;[RFC 8985]
:Y. Cheng, N. Cardwell, N. Dukkipati, P. Jha, "The RACK-TLP Loss Detection Algorithm for TCP", RFC 8985, February 2021.
;[RFC 9000]
:J. Iyengar (ed.), M. Thomson (ed.), "QUIC: A UDP-Based Multiplexed and Secure Transport", RFC 9000, May 2021.
;[RFC 9001]
:M. Thomson (ed.), S. Turner (ed.), "Using TLS to Secure QUIC", RFC 9001, May 2021.
;[RFC 9002]
:J. Iyengar (ed.), I. Swett (ed.), "QUIC Loss Detection and Congestion Control", RFC 9002, May 2021.
;[RFC 9147]
:E. Rescorla, H. Tschofenig, N. Modadugu, "The Datagram Transport Layer Security (DTLS) Protocol Version 1.3", RFC 9147, April 2022.
;[RFC 9260]
:R. Stewart, M. Tuexen, K. Nielsen, "Stream Control Transmission Protocol", RFC 9260, June 2022. Obsoletes RFC 4960.
;[RFC 9293]
:W. Eddy (ed.), "Transmission Control Protocol (TCP)", STD 7, RFC 9293, August 2022. Obsoletes RFC 793 and several follow-ons; updates RFC 1122 and others.




=== 17.2. Books and journal papers ===

;[Day08]
:J. Day, "Patterns in Network Architecture: A Return to Fundamentals", Prentice Hall, 2008.
;[Grasa15]
:E. Grasa et al., "IRATI: investigating RINA as an alternative to TCP/IP", Computer Networks, Vol. 92, December 2015.
;[KP87]
:P. Karn, C. Partridge, "Improving Round-Trip Time Estimates in Reliable Transport Protocols", ACM SIGCOMM, August 1987.
;[Wat81]
:R. W. Watson, "Timer-Based Mechanisms in Reliable Transport Protocol Connection Management", Computer Networks, Vol. 5, 1981.

=== 17.3. Source-code references ===

;[Linux-RTT]
:<code>tcp_rtt_estimator()</code> in <code>net/ipv4/tcp_input.c</code> of the Linux kernel, defining the asymmetric <code>mdev</code> variance update used as FRCP's default RTT estimator ([[#12. RTT estimation|Section 12]]). Line-stable browseable copy at https://elixir.bootlin.com/linux/latest/source/net/ipv4/tcp_input.c.

[[Category:Protocols]]
[[Category:Ouroboros internals]]

Latest revision as of 16:11, 18 May 2026


FRCP runs end-to-end between two peers over a flow. It delivers reliability, in-order delivery, flow control, and liveness. Congestion Control (CC) is not in FRCP - that lives in the IPC Process (IPCP) Congestion Avoidance (CA) policies, orthogonal to FRCP. Flow allocation, naming, and IPCP lifecycle are handled by the IPC Resource Manager daemon (IRMd).

FRCT (Flow and Retransmission Control Task) is the libouroboros implementation of FRCP; the task lives in src/lib/frct.c. The remainder of this document describes the FRCP wire protocol and the behaviour FRCT realises. Code symbols retain the FRCT_ prefix (FRCT_DATA, FRCT_RXM, ...) because they belong to the implementing task; this document references them verbatim.

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 (Best Current Practice; RFC 2119, RFC 8174) when, and only when, they appear in all capitals.


Notation

u32, u8
Unsigned 32-bit / 8-bit integers (kernel-C style).
ns
Nanoseconds.

Modular sequence-number comparators (32-bit, modulo 2^32):

before(a, b)
(int32_t)(a - b) < 0
after(a, b)
before(b, a)

Used throughout for ackno / seqno ordering checks.
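The comparators above can be sketched in C as follows. This is a minimal illustration, not the FRCT source; it assumes a compiler on which converting an out-of-range uint32_t to int32_t wraps (two's complement), which holds on common platforms.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when a precedes b in modulo-2^32 sequence space. */
static inline bool before(uint32_t a, uint32_t b)
{
        /* The unsigned difference wraps; its sign bit gives the order. */
        return (int32_t)(a - b) < 0;
}

/* True when a follows b: before() with the arguments swapped. */
static inline bool after(uint32_t a, uint32_t b)
{
        return before(b, a);
}
```

With these definitions, before(0xFFFFFFF0, 0x10) is true: a seqno just below the wrap point still orders ahead of one just past it, as long as the two are less than 2^31 apart.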

Round-Trip Time (RTT) abbreviations used throughout:

SRTT
Smoothed RTT estimate (RFC 6298).
mdev
Mean deviation of RTT (Linux variance estimator).
EWMA
Exponentially Weighted Moving Average.
RTO
Retransmission Timeout, max(RTO_MIN, srtt + (mdev << MDEV_MUL)).
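As a worked illustration of the RTO expression, the sketch below evaluates it in nanoseconds. The function name and the choice of passing RTO_MIN as an argument are assumptions made for the example; the shift value 2 is the default MDEV_MUL listed in Section 3.

```c
#include <stdint.h>

#define MDEV_MUL 2 /* default mdev shift (Section 3); build-tunable */

/* Hypothetical helper: RTO = max(rto_min, srtt + (mdev << MDEV_MUL)). */
static uint64_t rto_ns(uint64_t srtt, uint64_t mdev, uint64_t rto_min)
{
        uint64_t rto = srtt + (mdev << MDEV_MUL);

        return rto < rto_min ? rto_min : rto;
}
```

For srtt = 10 ms and mdev = 1 ms this gives 10 ms + 4 ms = 14 ms; for a very fast path (srtt = 100 us, mdev = 50 us) the sum of 300 us falls below a 1 ms floor and the floor wins.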

Timer-bound symbols t_a (a-timer, ACK delay) and t_r (r-timer, retransmission window) are defined in Section 8; t_mpl (Maximum Packet Lifetime) is introduced in Section 2.1 (the inact field) with heritage in Section 15.

Wire-format diagrams follow the IETF convention: bit 0 is the leftmost (most significant) bit and fields are in network byte order unless stated otherwise.


1. Wire format

1.1. PCI header

Fixed 16-octet base Protocol-Control Information (PCI) header prefixed to every FRCP packet (RFC convention: bit 0 leftmost, most-significant bit first). All multi-byte fields are in network byte order. DATA packets on stream-mode flows carry an additional 8-octet extension (see Section 1.5); SACK and RTTP carry their own payloads after the base PCI.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |             flags             |              hcs              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                            window                             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                            seqno                              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                            ackno                              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                     payload (variable) ...
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
flags
feature/type bitmap (see Section 1.2).
hcs
CRC-16-CCITT-FALSE Header Check Sequence (HCS) over flags + window + seqno + ackno (+ stream extension when present); the two octets of the hcs field itself are omitted from the CRC input. Verified on receive before any flag-driven dispatch.
window
receiver-advertised right window edge (valid iff FC).
seqno
per-flow sequence number.
ackno
cumulative Acknowledgement (ACK) (valid iff ACK).
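A minimal sketch of the HCS computation over a base PCI, assuming the conventional CRC-16/CCITT-FALSE parameters (polynomial 0x1021, init 0xFFFF, no reflection, no final xor). The helper names and offsets macros are hypothetical, and the sketch covers only the 16-octet base header, not the stream extension.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PCI_LEN     16 /* base PCI size in octets        */
#define PCI_HCS_OFF 2  /* hcs field occupies octets 2..3 */

/* Bitwise CRC-16/CCITT-FALSE over len octets. */
static uint16_t crc16_ccitt_false(const uint8_t *buf, size_t len)
{
        uint16_t crc = 0xFFFF;
        size_t   i;
        int      bit;

        for (i = 0; i < len; ++i) {
                crc ^= (uint16_t)(buf[i] << 8);
                for (bit = 0; bit < 8; ++bit)
                        crc = (crc & 0x8000)
                                ? (uint16_t)((crc << 1) ^ 0x1021)
                                : (uint16_t)(crc << 1);
        }

        return crc;
}

/* Hypothetical helper: HCS of a base PCI, skipping the hcs octets. */
static uint16_t pci_hcs(const uint8_t pci[PCI_LEN])
{
        uint8_t in[PCI_LEN - 2];

        /* flags, then window + seqno + ackno */
        memcpy(in, pci, PCI_HCS_OFF);
        memcpy(in + PCI_HCS_OFF, pci + PCI_HCS_OFF + 2,
               PCI_LEN - PCI_HCS_OFF - 2);

        return crc16_ccitt_false(in, sizeof(in));
}
```

Because the hcs octets are left out of the input, a receiver can recompute the value over the header as received and compare it directly with the hcs field.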

A single packet can simultaneously carry DATA + ACK + FC (Flow Control) + RXM (Retransmission) by ORing flag bits; the PCI multiplexes control on the same wire frame in the spirit of SCTP chunk bundling (RFC 9260 sec. 6.10) and QUIC frame multiplexing (RFC 9000 sec. 12.4). DATA-bearing packets carry the caller's payload after the PCI; SACK (Selective Acknowledgement) and RTTP (Round-Trip Time Probe) carry their own typed payloads after the PCI.

Optional framing (per-flow, see Section 2.2). On the wire, the order from inside out is:

Layer                      Scope
[ PCI + body ]             The FRCP packet.
[ PCI + body + CRC-32 ]    CRC-32 covers the body only (PCI is in HCS); appended iff qs.ber == 0 on DATA, or on every SACK packet.
[ AEAD-wrap of above ]     Iff Authenticated Encryption with Associated Data (AEAD) is enabled.

  • HCS in the PCI covers the header fields on every packet and is verified before any flag-driven dispatch.
  • The CRC-32 trailer (IEEE 802.3 / zlib reflected polynomial 0xEDB88320, init 0xFFFFFFFF, xor-out 0xFFFFFFFF) covers the body on DATA when qs.ber == 0 and on every SACK packet. The PCI is not under the CRC (Cyclic Redundancy Check) because the HCS already protects it. It is appended before AEAD encryption and therefore rides inside the AEAD wrap when both are active; the AEAD tag (~2^-128 forgery probability) dominates the CRC (~2^-32) for integrity in that mode but the CRC trailer is currently retained.
  • When encryption is enabled, the entire (possibly-CRC'd) FRCP packet is wrapped with AEAD inside the shared-memory packet buffer (spb, struct ssm_pk_buff); the packet grows by the AEAD overhead, namely a leading nonce / Initialization Vector (IV) of headsz bytes (crypt_get_ivsz) and a trailing authentication tag of tailsz bytes (crypt_get_tagsz).

Both CRC and AEAD are layered around the FRCP wire format and are not visible to the FRCP machinery itself.
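The CRC-32 parameters listed above (reflected polynomial 0xEDB88320, init and xor-out 0xFFFFFFFF) can be sketched as a bitwise routine. This is an illustrative version, not the library's implementation, which may well be table-driven; the function name is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise reflected CRC-32 (IEEE 802.3 / zlib parameters). */
static uint32_t crc32_body(const uint8_t *buf, size_t len)
{
        uint32_t crc = 0xFFFFFFFFu;
        size_t   i;
        int      bit;

        for (i = 0; i < len; ++i) {
                crc ^= buf[i];
                for (bit = 0; bit < 8; ++bit) {
                        /* All-ones mask when the low bit is set. */
                        uint32_t mask = 0u - (crc & 1u);

                        crc = (crc >> 1) ^ (0xEDB88320u & mask);
                }
        }

        return crc ^ 0xFFFFFFFFu;
}
```

The standard check string "123456789" yields 0xCBF43926 under these parameters, which is a convenient self-test when swapping in a faster implementation.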

1.2. Flag bits

Flag bits are numbered most-significant-bit first to match the wire diagram (bit numbering per Section 1.1; bit 0 is the MSB of the 16-bit flags field and lands at wire-position 0 in network byte order). Bits 13..15 are reserved and MUST be transmitted as zero.

Bit    Mask    Name  Meaning
0      0x8000  DATA  Carries caller payload
1      0x4000  DRF   Data Run Flag: start of a fresh run
2      0x2000  ACK   Acknowledgement: ackno field valid
3      0x1000  NACK  Negative ACK; seqno = arrival_seqno-1
4      0x0800  FC    Flow Control: window field valid (rwe)
5      0x0400  RDVS  Rendezvous probe (window-closed)
6      0x0200  FFGM  First Fragment (role bit 0; see below)
7      0x0100  LFGM  Last Fragment (role bit 1; see below)
8      0x0080  RXM   Retransmission
9      0x0040  SACK  Selective ACK block list in payload
10     0x0020  RTTP  RTT Probe / echo (payload follows)
11     0x0010  KA    Keepalive
12     0x0008  FIN   End-of-stream marker (stream mode)
13-15  --      --    Reserved (MUST be zero)
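The table can be transcribed into C constants as below. The mask values come from the table; the FRCT_ names for flags the text does not spell out (for example FRCT_DRF or FRCT_KA) are assumed to follow the same prefix convention, and the reserved-bit helper is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum frct_flag {
        FRCT_DATA = 0x8000, /* bit 0  */
        FRCT_DRF  = 0x4000, /* bit 1  */
        FRCT_ACK  = 0x2000, /* bit 2  */
        FRCT_NACK = 0x1000, /* bit 3  */
        FRCT_FC   = 0x0800, /* bit 4  */
        FRCT_RDVS = 0x0400, /* bit 5  */
        FRCT_FFGM = 0x0200, /* bit 6  */
        FRCT_LFGM = 0x0100, /* bit 7  */
        FRCT_RXM  = 0x0080, /* bit 8  */
        FRCT_SACK = 0x0040, /* bit 9  */
        FRCT_RTTP = 0x0020, /* bit 10 */
        FRCT_KA   = 0x0010, /* bit 11 */
        FRCT_FIN  = 0x0008  /* bit 12 */
};

#define FRCT_RESERVED 0x0007 /* bits 13..15 */

/* Hypothetical check: reserved bits are all zero. */
static bool frct_flags_reserved_clear(uint16_t flags)
{
        return (flags & FRCT_RESERVED) == 0;
}
```

Combining flags is a plain OR: a piggybacked DATA + ACK + FC packet carries 0x8000 | 0x2000 | 0x0800 = 0xA800 in the flags field.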

The (FFGM, LFGM) pair encodes the fragment role of a DATA-bearing Service Data Unit (SDU), SCTP-style begin/end flags (RFC 9260 sec. 3.3.1):

FFGM  LFGM  Role
1     1     Sole / un-fragmented SDU (begin AND end)
1     0     First fragment of a multi-fragment SDU
0     0     Middle fragment
0     1     Last fragment

Each fragment is carried in its own FRCP packet with its own seqno; FRTX (the FRCT Retransmission service mode, see Section 2.2) recovers individual fragments via the normal Retransmission Timeout (RTO) / SACK / Recent Acknowledgement (RACK, RFC 8985) path. The receiver reassembles the SDU at consume time once the contiguous [FIRST .. LAST] run has fully arrived. On non-DATA packets the role bits are unused and MUST be transmitted as zero.
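Decoding the fragment role from the flags field could look like the sketch below. The enum and function names are illustrative assumptions; only the two mask values are taken from the flag table.

```c
#include <stdint.h>

#define FRCT_FFGM 0x0200 /* first-fragment bit */
#define FRCT_LFGM 0x0100 /* last-fragment bit  */

enum frag_role {
        FRAG_MIDDLE, /* FFGM = 0, LFGM = 0 */
        FRAG_FIRST,  /* FFGM = 1, LFGM = 0 */
        FRAG_LAST,   /* FFGM = 0, LFGM = 1 */
        FRAG_SOLE    /* FFGM = 1, LFGM = 1 */
};

/* Hypothetical decoder for the (FFGM, LFGM) pair of a DATA packet. */
static enum frag_role frag_role(uint16_t flags)
{
        int first = (flags & FRCT_FFGM) != 0;
        int last  = (flags & FRCT_LFGM) != 0;

        if (first && last)
                return FRAG_SOLE;
        if (first)
                return FRAG_FIRST;
        if (last)
                return FRAG_LAST;

        return FRAG_MIDDLE;
}
```

A reassembler would then treat FRAG_SOLE as immediately deliverable and wait for a contiguous FRAG_FIRST .. FRAG_LAST run otherwise.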

In stream mode (qos.service == SVC_STREAM, see Section 16) there are no SDU boundaries to encode, so FFGM and LFGM are unused and MUST be transmitted as zero. End-of-stream uses a dedicated bit (FIN, bit 12) carried on a 0-byte DATA packet, emitted at write-half close (fccntl to FLOWFRDONLY), during linger drain, and at flow_dealloc; emission is idempotent (first call wins). After contiguous delivery of the FIN-bearing slot, the receiver latches byte_fin at the FIN's start offset; flow_read returns 0 (end-of-file, EOF) once buffered bytes have been drained up to byte_fin. Per-byte position is carried by the [start, end) extension (Section 1.5).


1.3. SACK payload

A SACK packet has the FRCT_ACK | FRCT_FC | FRCT_SACK flag bits set (bit numbering per Section 1.1). Following the 16-octet PCI, the payload is a 2-octet block count (network byte order), 2 octets of padding to 4-byte align the block list, then n_blocks pairs of 32-bit start/end seqnos describing present (received) ranges above the cumulative ACK.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |           n_blocks            |        padding (2 octets)     |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                           start[0]                            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                            end[0]                             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                           start[1]                            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                       ... n_blocks pairs total ...

n_blocks <= SACK_MAX_BLOCKS (2048). The per-flow effective cap is further bounded by (frag_mtu - PCI - 4) / 8 blocks per packet; SACK packets carry no stream extension, so PCI here is the 16-octet base header even on stream-mode flows.
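The per-packet block cap is simple arithmetic; a sketch under the sizes stated above (16-octet PCI, 4 octets for n_blocks plus padding, 8 octets per block). The function name is hypothetical, and returning 0 for an MTU too small to hold the SACK header is an assumption of this example.

```c
#include <stddef.h>

#define SACK_MAX_BLOCKS 2048 /* wire cap                     */
#define PCI_LEN         16   /* base PCI                     */
#define SACK_HDR_LEN    4    /* n_blocks + 2 padding octets  */
#define SACK_BLOCK_LEN  8    /* start + end, 4 octets each   */

/* Hypothetical helper: max SACK blocks that fit one packet. */
static size_t sack_blocks_cap(size_t frag_mtu)
{
        size_t n;

        if (frag_mtu < PCI_LEN + SACK_HDR_LEN)
                return 0;

        n = (frag_mtu - PCI_LEN - SACK_HDR_LEN) / SACK_BLOCK_LEN;

        return n < SACK_MAX_BLOCKS ? n : SACK_MAX_BLOCKS;
}
```

For a 1500-octet fragment MTU this gives (1500 - 16 - 4) / 8 = 185 blocks, so the 2048 wire cap only binds on very large MTUs.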

Wire invariant: every block produced by the receiver, except an optional leading Duplicate SACK (D-SACK) block as described below, describes a range strictly above the cumulative ACK carried in the PCI ackno field (after(start[i], ackno)). This makes the D-SACK convention below unambiguous; the receiver-side builder MUST preserve it.

Duplicate SACK (D-SACK, RFC 2883) is signalled in-band: no flag bit, no extra framing. Modular seqno arithmetic uses the before() / after() comparators defined in the Notation block.

Encoding. When a duplicate is observed the receiver arms a single-slot pending report (dsack_seqno + dsack_valid, latest-wins across multiple arms before the next emit). On the next outbound SACK the receiver prepends block[0] = [dsack_seqno, dsack_seqno + 1) - always a one-seqno range - and clears the flag. The three arm sites are listed in Section 10; case-1 sites yield dsack_seqno < rcv_cr.lwe (the next pci.ackno), and the case-2 site (rq_accept conflict) yields dsack_seqno in [rcv_cr.lwe, rcv_cr.rwe).

Detection. The sender classifies block[0] by its relation to pci.ackno:

case 1 (RFC 2883 sec. 4.1.1, full duplicate)
before(blocks[0].start, pci.ackno) AND pci.ackno - blocks[0].start <= MAX_DSACK_LAG (== RQ_SIZE). The lag bound rejects stale or spoofed reports beyond one receive window.
case 2 (RFC 2883 sec. 4.1.2, partial duplicate)
blocks[0] is a sub-range (with at least one endpoint differing) of some blocks[i>0] - i.e. the same packet's remaining SACK blocks already describe the duplicated seqno as received.
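The two detection cases can be sketched as a small classifier. The struct layout, enum, and function names are assumptions for illustration; the half-open [start, end) block convention follows the encoding paragraph above, and max_lag stands in for MAX_DSACK_LAG.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sack_block {
        uint32_t start; /* first seqno in the range  */
        uint32_t end;   /* one past the last seqno   */
};

enum dsack_kind { DSACK_NONE, DSACK_FULL, DSACK_PARTIAL };

static bool before(uint32_t a, uint32_t b)
{
        return (int32_t)(a - b) < 0;
}

/* Hypothetical classifier for blocks[0] of an incoming SACK. */
static enum dsack_kind dsack_classify(uint32_t                 ackno,
                                      const struct sack_block *blocks,
                                      size_t                   n,
                                      uint32_t                 max_lag)
{
        size_t i;

        if (n == 0)
                return DSACK_NONE;

        /* Case 1: block[0] lies below the cumulative ACK, within the lag. */
        if (before(blocks[0].start, ackno) &&
            ackno - blocks[0].start <= max_lag)
                return DSACK_FULL;

        /* Case 2: block[0] is a proper sub-range of a later block. */
        for (i = 1; i < n; ++i) {
                bool inside = !before(blocks[0].start, blocks[i].start) &&
                              !before(blocks[i].end, blocks[0].end);
                bool differ = blocks[0].start != blocks[i].start ||
                              blocks[0].end != blocks[i].end;

                if (inside && differ)
                        return DSACK_PARTIAL;
        }

        return DSACK_NONE;
}
```

A block that is too far below ackno (beyond max_lag) falls through both cases and is reported as DSACK_NONE, matching the intent of rejecting stale or spoofed reports.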

On detect, the sender:

  • bumps reo_wnd_mult by 1, capped at REO_WND_MULT_MAX (= 20), per RFC 8985 sec. 6.2 step 4;
  • snapshots dsack_lwe_snap = snd_cr.lwe, resetting the 16-cum-ACK halving counter so the multiplier doesn't decay while D-SACK evidence is still arriving;
  • excludes block[0] from the gap-marking loop (n_real = n - 1), so a D-SACK alone never enters NewReno-careful recovery (see Section 8); only non-D-SACK blocks count as gaps.

The reo_wnd_mult halving cadence (once per 16 cumulatively-ACK'd seqnos since the most-recent D-SACK arrival or halve event) and the reset-to-1 on a HoL RTO fire are both per the same RFC 8985 clause. The clamp-and-skip path in the regular SACK-mark loop is incidentally idempotent on any leftover case-1 or case-2 block (start < snd_cr.lwe clamps to snd_cr.lwe and the inner loop skips k == snd_cr.lwe; case-2 re-NULLs slots already marked received by later blocks), so block[0] is harmless even when fed to the loop.

1.4. RTTP payload

An RTTP (Round-Trip Time Probe) packet has only the FRCT_RTTP flag set (bit numbering per Section 1.1). Following the 16-octet PCI, the payload is 24 octets (packed):

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                          probe_id                             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                          echo_id                              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                                                               |
    +                  nonce (16 octets, echoed verbatim)           +
    |                                                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
probe_id
sender counter; set to 0 on a reply (the value 0 is reserved and never identifies a probe).
echo_id
peer's probe_id, 0 on outbound probe.
nonce
random, echoed unmodified, memcmp'd to defeat spoof.
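A sketch of the 24-octet payload and the echo check it enables. The struct and function names are hypothetical, and the fields are assumed to be already converted to host byte order by the decoder.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define RTTP_NONCE_LEN 16

struct rttp_payload {
        uint32_t probe_id;              /* sender counter; 0 on a reply  */
        uint32_t echo_id;               /* peer's probe_id; 0 on a probe */
        uint8_t  nonce[RTTP_NONCE_LEN]; /* random, echoed verbatim       */
};

_Static_assert(sizeof(struct rttp_payload) == 24,
               "RTTP payload must be 24 octets");

/* Hypothetical check: does echo answer the probe we sent? */
static bool rttp_echo_matches(const struct rttp_payload *probe,
                              const struct rttp_payload *echo)
{
        if (probe->probe_id == 0) /* 0 is reserved */
                return false;

        if (echo->probe_id != 0 || echo->echo_id != probe->probe_id)
                return false;

        /* The nonce comparison is what defeats a spoofed echo. */
        return memcmp(probe->nonce, echo->nonce, RTTP_NONCE_LEN) == 0;
}
```

An off-path attacker can guess a small probe_id counter, but not 16 random octets, which is why the nonce comparison carries the anti-spoofing weight.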


1.5. Stream PCI extension

A stream-mode flow (qos.service == SVC_STREAM) carries an extra 8-octet extension after the 16-octet base PCI on every DATA packet (bit numbering per Section 1.1):

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                            start                              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                             end                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
start
octet offset of the first payload byte in the stream.
end
octet offset one past the last payload byte; end - start equals the on-wire payload length.

Total stream-mode PCI for DATA packets is 24 octets (16 base + 8 extension); control packets (SACK, RTTP, bare ACK, KA, etc.) retain the 16-octet base PCI. Stream mode MUST be negotiated at flow allocation; the extension is present iff stream mode is in use, never on a per-packet basis. Both peers MUST treat start/end as monotonic 32-bit byte offsets; when a slot reaches the head of the contiguous run with start not equal to the prior packet's end the slot is silently dropped at delivery time (Section 16) rather than rejected at stash.
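The delivery-time rule can be illustrated with a short sketch. The names are hypothetical, and checking end - start against the payload length at the receiver is an assumption of this example rather than documented behaviour; the contiguity test (start equal to the prior packet's end) is the rule stated above.

```c
#include <stdbool.h>
#include <stdint.h>

struct stream_ext {
        uint32_t start; /* offset of the first payload byte   */
        uint32_t end;   /* offset one past the last payload   */
};

/* Payload length implied by the extension; wraps modulo 2^32. */
static uint32_t stream_ext_len(const struct stream_ext *e)
{
        return e->end - e->start;
}

/* Hypothetical delivery gate for the slot at the head of the run. */
static bool stream_ext_deliverable(uint32_t                 prev_end,
                                   const struct stream_ext *e,
                                   uint32_t                 payload_len)
{
        return e->start == prev_end && stream_ext_len(e) == payload_len;
}
```

Because the subtraction is unsigned, a segment that straddles the 4 GiB wrap (start = 0xFFFFFFF0, end = 0x10) still reports the correct length of 32 octets.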

This is the QUIC STREAM-frame reassembly model (RFC 9000 sec. 19.8): each packet carries its packet seqno (this PCI's seqno field) and a separate stream byte position (start/end). Separating the two avoids TCP's conflation of packet identity with byte position which forces Karn's algorithm for Round-Trip Time (RTT) sampling (no RTT sample on retransmits, RFC 6298 sec. 3); FRCP applies the Karn-equivalent gate via a combination of per-packet FRCT_RXM, per-slot SND_RTX flags, and a sample-fence rtt_lwe (see Section 2.1 and Section 12). FRCP's fixed-32-bit start/end wrap at 4 GiB of wire bytes, narrower than QUIC's 62-bit varint offset (cf. RFC 9000 sec. 16); the on-wire wrap is handled by the same modular before() / after() comparators (Section 1.3) FRCP uses for seqnos, which remain unambiguous as long as the in-flight byte window stays strictly under 2 GiB (the half-range of the signed-int32 difference in before()). The default per-flow ring is 1 MiB; the implementation caps ring_sz at 128 MiB (FRCT_STREAM_RING_SZ_MAX), well below the 2 GiB half-range bound. The runtime byte counters exposed via FUSE (Filesystem in Userspace) in the Ouroboros Resource Information Base (RIB, a virtual-filesystem introspection bridge) are platform size_t and do not wrap on 64-bit hosts.


2. Per-flow state and service modes

2.1. Per-flow state

Each flow keeps a sender control record and a receiver control record:

lwe (u32)
snd: oldest unacked seqno (cumulative ACK boundary as seen by sender); rcv: next in-order seqno expected
rwe (u32)
snd: peer-advertised right window edge; rcv: locally-advertised right window edge
cflags (u8)
per-direction feature flags: retransmission (FRCTFRTX), receiver flow control (FRCTFRESCNTL), linger-on-close (FRCTFLINGER); see <ouroboros/fccntl.h>
seqno (u32)
snd: next seqno to send; rcv: force-ACK trigger - set on a stale or dup DATA so the next ack_snd emits a fresh cumulative ACK
ackno (u32)
snd: seqno counter for standalone ACK-bearing control packets (delayed ACK, SACK, final ACK on dealloc); not bumped on piggybacked ACK riding a DATA packet (which uses the DATA seqno). Used by wire-dup ACK detection; rcv: incoming-ACK dedup tracker
act (ns)
last activity (used by inactivity / DRF)
inact (ns)
inactivity threshold; sender = 3*mpl + a + r + 1s, receiver = 2*mpl + a + r + 1s. mpl is the Maximum Packet Lifetime (delta-t terminology; see Section 15); a and r are the FRCT a-timer and r-timer bounds (see Section 8). The asymmetry is load-bearing for pre-DRF NACK (Section 9).
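The two inactivity thresholds can be written out as a worked example. The function names are illustrative assumptions; all quantities are in nanoseconds.

```c
#include <stdint.h>

#define NS_PER_S 1000000000ULL

/* Sender side: 3*mpl + a + r + 1s. */
static uint64_t inact_snd_ns(uint64_t mpl, uint64_t a, uint64_t r)
{
        return 3 * mpl + a + r + NS_PER_S;
}

/* Receiver side: 2*mpl + a + r + 1s. */
static uint64_t inact_rcv_ns(uint64_t mpl, uint64_t a, uint64_t r)
{
        return 2 * mpl + a + r + NS_PER_S;
}
```

With mpl = 1 s, a = 10 ms and r = 100 ms the sender threshold is 4.11 s and the receiver threshold 3.11 s; the sender always waits exactly one mpl longer than the receiver.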

The sender holds a per-slot ring snd_slots[RQ_SIZE] keyed by (seqno mod RQ_SIZE). Each slot tracks its retransmit entry (rxm), last-send timestamp, and retransmit flag bits: SND_RTX (a retransmit is pending or has fired, gates the next RTT sample under Karn) and SND_FAST_RXM (one-shot fast-retransmit staged for this loss event).

The receiver holds a parallel reorder ring rcv_slots[RQ_SIZE] (referred to as rq[] in prose) holding stashed out-of-order packet-buffer indexes; both FRTX and best-effort flows share this path. The invariant rwe - lwe <= RQ_SIZE holds: on each consume the receiver advances rwe by the consumed count, capping the receive window at RQ_SIZE seqno slots.

A separate fence variable rtt_lwe is bumped on every retransmit (timer-fire, SACK-driven, fast-rxm, NACK-driven) and on every seqno_rotate (Section 4) to mark the seqno range whose RTT samples MUST be discarded.


2.2. Service modes (orthogonal axes)

FRCP exposes its wire features as a vector of independent QoS axes selected at flow allocation time. All flows go through the same flow_alloc(name, qos, ...) primitive; the qosspec_t passed in determines which protocol machinery engages on the wire. This contrasts with the POSIX BSD socket model where TCP and UDP require different socket types (SOCK_STREAM / SOCK_DGRAM).

The axes:

service
0 = unordered (no FRCP engagement: raw datagrams, no PCI on the wire, UDP-equivalent at this layer); 1 = message-ordered (FRCP engaged; SDU boundaries preserved across fragmentation); 2 = stream (byte-oriented, no SDU boundaries; FRTX required)
loss
0 = lossless service requested: FRTX retransmit machinery engages (Section 8); MUST be 0 for service=2. Non-zero = best-effort, FRTX off.
ber
Bit Error Rate tolerance. 0 = error-free service requested: a CRC trailer is appended after the body of DATA packets and verified on receive (added / checked outside the FRCP PCI; see Section 1.1). Non-zero = peer accepts errors; trailer omitted. SACK control packets carry a CRC32 trailer regardless of ber; the ber gate applies to DATA only.
timeout
Peer-timeout (ms); 0 disables the keepalive timer. Independent of FRCP engagement.

Encryption is a separate per-flow attribute set at flow setup; when enabled it wraps the FRCP packet (PCI + body, plus the CRC trailer if any) under AEAD, expanding the spb by headsz + tailsz octets (nonce / tag). The CRC trailer is currently kept inside the AEAD wrap (see Section 1.1).

Reachable combinations exported by include/ouroboros/qos.h:

Cube          service  loss  ber  Engaged
qos_raw       0        1     1    Raw passthrough
qos_raw_safe  0        1     0    Raw + CRC trailer
qos_rt        1        1     1    FRCP, no FRTX, no CRC
qos_rt_safe   1        1     0    FRCP, no FRTX, CRC
qos_msg       1        0     0    FRCP + FRTX
qos_stream    2        0     0    FRCP + FRTX, stream

Forced couplings actually enforced by the public API:

  • service == SVC_STREAM (2) requires loss == 0; flow_alloc / flow_accept reject the pair otherwise with -EINVAL.
  • FRTX requires FRCP engagement (service != SVC_RAW); requesting loss = 0 with service = SVC_RAW is structurally a no-op because no frcti is created.
  • The QOS_DISABLE_CRC build flag globally forces ber = 1. Note: this flag defaults to ON, so default builds ship with CRC disabled until QOS_DISABLE_CRC is set to OFF.
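The couplings above can be sketched as three predicates. The struct demo_qos is a stand-in invented for this example and is not the real qosspec_t layout; the SVC_ values follow the numbering given for the service axis.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

enum { SVC_RAW = 0, SVC_MESSAGE = 1, SVC_STREAM = 2 };

/* Hypothetical stand-in for the relevant qosspec_t axes. */
struct demo_qos {
        uint8_t  service;
        uint32_t loss;
        uint32_t ber;
};

/* Stream service requires lossless delivery. */
static int demo_qos_check(const struct demo_qos *q)
{
        if (q->service == SVC_STREAM && q->loss != 0)
                return -EINVAL;

        return 0;
}

/* FRTX engages only when FRCP is engaged and loss == 0. */
static bool demo_frtx_engaged(const struct demo_qos *q)
{
        return q->service != SVC_RAW && q->loss == 0;
}

/* DATA packets carry a CRC trailer only when ber == 0. */
static bool demo_data_crc(const struct demo_qos *q)
{
        return q->ber == 0;
}
```

Note that a raw flow with loss = 0 passes demo_qos_check but reports FRTX as not engaged, mirroring the structural no-op described in the second bullet.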

Caveat: the API does NOT force ber = 0 when service != SVC_RAW. qos_rt has service = SVC_MESSAGE with ber = 1, which means no CRC-32 trailer is appended to DATA on that cube, so the body carries no integrity check; the HCS (Section 1.1) remains the only integrity check and covers the header only.

The FRCP-no-FRTX regime (service = SVC_MESSAGE, loss > 0) is meaningful and live: sequence numbering, in-order delivery, flow-control advertisement, KA, DRF rotation, and SDU fragmentation / reassembly (Section 7.2) all run. Lost packets are dropped rather than retransmitted; a permanently-lost mid-fragment is dropped via skip-past-gap once a later SDU is visible in the reorder ring.


3. Protocol parameters

Parameter Value Role
RQ_SIZE compile-time, power of 2 (default 128) Slot ring / rcv window width
START_WINDOW compile-time, power of 2 (default 128) Initial rwe-lwe after rotate
RTO_MIN MAX(250 us build-tunable, 1<<RXMQ_RES); per-flow via fccntl (FRCTSRTOMIN). Default ~1 ms with RXMQ_RES=20. RTO floor; also floored at the retransmit-wheel resolution (~1 ms by default).
MAX_RTO_MUL 20 Backoff shift cap
RACK window R MIN(reo_wnd_mult * min_RTT/4, SRTT) with MIN_REORDER_NS = 250 us floor; reo_wnd_mult scales on D-SACK, cap 20 Reorder window; per RFC 8985 sec. 6.2; reo_wnd_mult per sec. 6.2 step 4
MIN_RTT_WIN_NS 300 s (5 min, Linux tcp_min_rtt_wlen) min_RTT windowed re-anchor
REO_WND_MULT_MAX 20 (RFC 8985 sec. 6.2 step 4) reo_wnd_mult cap
REO_DECAY_PKTS 16 (RFC 8985 sec. 6.2 step 4 / RACK.reo_wnd_persist) Fresh-ACK'd seq count per halving
MAX_DSACK_LAG RQ_SIZE D-SACK sanity cap
RTT_QUARANTINE 32 (seqno steps) NewReno gate pad
SACK rate-limit SACK_MIN_GAP_NS (250 us, fixed) Min SACK gap
SACK_MAX_BLOCKS 2048 (wire cap; per-flow capped at (frag_mtu-PCI-4)/8) Per-SACK block cap
SACK_RXM_MAX 32 Per-pass staged retransmit cap
DUP_THRESH 3 (RFC 8985 default) Hybrid fast-rxm trigger (Section 8)
MDEV_MUL 2 (build-tunable via FRCT_RTO_MDEV_MULTIPLIER) mdev shift in RTO = srtt + (mdev << MDEV_MUL)
RTTP nonce 16 octets Echoed verbatim
RTTP_RING 8 In-flight probes
RTT clamp 16 * srtt Probe-sample upper bound (ACK-derived RTT samples gated by Karn / recovery only)
Cold-probe cadence 100 ms (rx-driven; see Section 12) Pre-srtt RTTP rate
DELT_RDV 100 ms RDVS emit cadence
MAX_RDV 1 s RDVS give-up
Delayed-ACK fire 2 * TICTIME (TICTIME = FRCT tick granularity, default 5 ms; 2*TICTIME = 10 ms by default) Fired after the first in-order DATA arrival; tick is build-tunable
NACK send cooldown srtt when an srtt sample exists, else 100 ms Pre-DRF NACK rate-limit
MAX_SDU 1 MiB Max reassembled SDU; configurable per flow

The per-flow fragment Maximum Transmission Unit (MTU) is computed at flow setup from the lower IPCP's mtu minus encryption headsz / tailsz and CRC trailer; there is no FRCT-level default or environment-variable override.


4. Sequence-number rotation (DRF)

The DRF (Data Run Flag) bit on an outbound packet means "this is the start of a fresh data run" and is set whenever the sender has nothing in flight (snd_cr.seqno == snd_cr.lwe).

Independently of that, if the sender has been idle longer than snd_cr.inact AND the pipe is empty (snd_cr.seqno == snd_cr.lwe), seqno_rotate() rolls a random new seqno before the send and resets

    snd_cr.seqno  = random()
    snd_cr.lwe    = snd_cr.seqno
    snd_cr.rwe    = snd_cr.seqno + START_WINDOW
    rtt_lwe       = snd_cr.seqno
    in_recovery   = false   (recovery state, see Section 8)
    recovery_high = snd_cr.seqno

The receiver, on observing rcv-side inactivity (now - rcv_cr.act > rcv_cr.inact), requires a DRF on the next DATA packet; otherwise it replies with a rate-limited NACK (see below). Non-DATA control packets pass through without the DRF requirement (no impact on receiver state). On DRF the receiver releases the rq[] slots and rebases

    rcv_cr.lwe   = seqno
    rcv_cr.rwe   = seqno + RQ_SIZE
    rcv_cr.seqno = seqno

If a DATA packet without DRF arrives at an inactive receiver, a rate-limited NACK is sent back to the sender (cooldown per Section 3); stale non-DATA arrivals fall through to normal processing (no NACK, no drop).
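
A minimal sketch of the receiver-side decision described in this section, under stated assumptions: the verdict names and the function inact_check_sketch are hypothetical, timestamps are plain nanosecond counters with now >= act, and the NACK cooldown is reduced to a boolean supplied by the caller.

```c
#include <stdbool.h>
#include <stdint.h>

enum inact_verdict {
        INACT_PASS,      /* normal processing */
        INACT_REBASE,    /* DRF seen: release rq[] and rebase rcv_cr */
        INACT_NEED_NACK, /* stale DATA without DRF, cooldown elapsed */
        INACT_DROP       /* stale DATA without DRF, still in cooldown */
};

static enum inact_verdict inact_check_sketch(uint64_t now,
                                             uint64_t act,
                                             uint64_t inact,
                                             bool     has_data,
                                             bool     has_drf,
                                             bool     nack_cooled)
{
        if (now - act <= inact)
                return INACT_PASS;   /* receive side is still fresh */

        if (has_drf)
                return INACT_REBASE; /* start of a fresh data run */

        if (!has_data)
                return INACT_PASS;   /* control packets need no DRF */

        return nack_cooled ? INACT_NEED_NACK : INACT_DROP;
}
```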

5. Send path

  1. If the SDU exceeds (frag_mtu - data_hdr_len), the caller (dev.c) fans it out into ceil(count / (frag_mtu - data_hdr_len)) fragments, each emitted via frcti_snd as its own DATA packet with a per-fragment role (Section 7.2); both FRTX and best-effort flows fragment. Raw flows (no FRCP engagement, qos.service == SVC_RAW) carry no PCI and return -EMSGSIZE for any SDU larger than one packet at the layer below. An SDU that fits in a single packet is sent as SOLE. frcti_snd reserves PCI head room; sets DATA, plus DRF when the pipe is empty (snd_cr.seqno == snd_cr.lwe).
  2. seqno_rotate() if past sender inactivity and the pipe is empty (Section 4).
  3. Advertise FC (pci.window = frcti_advert_rwe(frcti), i.e. rcv_cr.rwe clamped to rcv_cr.lwe + ring_seq_cap in stream mode) when the receiver side is recent: now - rcv_cr.act < rcv_cr.inact.
  4. Reliable mode (FRTX): leave snd_cr.lwe where it is; reset the slot at RQ_SLOT(seqno) (snd_slots[p].time = now, snd_slots[p].flags = 0); queue an rxm_entry (saves a packet copy, arms a wheel timer at now + (rto << rto_mul)). Piggyback ACK (pci.ackno = rcv_cr.lwe) while the a-timer for the most recent received DATA packet has not yet expired (now - rcv_cr.act <= t_a); on piggyback, set rcv_cr.seqno = rcv_cr.lwe so the next delayed-ACK fire is suppressed. See Section 8 for t_a / t_r semantics.
  5. Best-effort mode (no FRTX): advance snd_cr.lwe immediately (snd_cr.lwe = snd_cr.lwe + 1, snd_cr.rwe = snd_cr.lwe + RQ_SIZE); no retransmit state. No send-side RTT probe is armed in this mode (rtt_probe_arm requires an in-flight seqno, which best-effort never has); the rx-driven cold seeder in frcti_rcv is the only probe path.
  6. In reliable mode, optionally arm an RTT probe (Section 12).


6. Receive path

6.1. Early-exit dispatch

Keepalive (KA), RTT probe (RTTP), pre-DRF NACK, and rendezvous (RDVS) packets short-circuit out of frcti_rcv before the locked main path; each handler takes its own lock internally.

      incoming packet
            |
            v
       +---------+
       | KA?     |---yes--> ka_rcv  ; return
       +---------+
            |no
            v
       +---------+
       | RTTP?   |---yes--> rttp_rcv; return
       +---------+
            |no
            v
       +---------+
       | NACK?   |---yes--> nack_rcv; return  (see Section 9)
       +---------+
            |no
            v
       +---------+
       | RDVS?   |---yes--> rdv_rcv ; return  (reply bare FC, ackno=0)
       +---------+
            |no
            v
       acquire wrlock; enter locked main path
KA
refresh t_ka_rcv, honour piggybacked ACK.
RTTP
probe (echo back nonce) or echo (verify nonce, sample RTT).
NACK
pre-DRF, sender-side handler. See Section 9.
RDVS
reply with a bare FC packet (ackno = 0); rdlock only.


6.2. Locked main path

Steps below run with the per-flow frcti.lock held for writing (pthread_rwlock_wrlock) unless noted.

rcv_inact_check
Only meaningful when the receive side is stale. On DRF (Data Run Flag): release rq[] slots, rebase rcv_cr, continue. On stale DATA without DRF: fire a pre-DRF NACK if cooldown allows (Section 9), then discard the packet; on cooldown, drop without sending a NACK (a pending cumulative ACK from drop_packet may still go out). Non-DATA, non-DRF arrivals bypass rcv_inact_check entirely; pure-DRF stale arrivals fall through after the DRF rebase branch.
DATA-only act refresh
Refresh rcv_cr.act only when FRCT_DATA is set, so that non-DATA packets never block the next DRF rebase.
Wire-dup gate
Before flag-driven dispatch, drop wire-duplicate ACKs and wire-duplicate DATA (is_dup_ack / is_dup_data). The DATA check is bypassed for FRCT_RXM-bearing arrivals so the piggybacked ACK / SACK / FC carried on a retransmitted DATA at an already-ACK'd seqno is still applied; the stale-in-window branch below then drops the packet.
ACK
Drop ACKs whose ackno falls outside [snd_cr.lwe, snd_cr.seqno]. If ackno == snd_cr.lwe (non-advancing cumulative ACK), drive RACK fast-retransmit consideration (Section 8). Otherwise advance snd_cr.lwe = ackno, collapse rto_mul to 0 (Karn-gated by SND_RTX on the just-acknowledged slot, the old head-of-line), reset dup_thresh to 0, update t_latest_ack to the send-time of the slot at ackno-1 (consumed by RACK and SACK below), decay reo_wnd_mult per RFC 8985 sec. 6.2 step 4, exit NewReno-careful recovery (see Section 8) on ackno >= recovery_high or ackno == snd_cr.seqno, and feed an RTT sample if eligible (Section 12).
SACK
Walk the block list. For each block (a present range above lwe) NULL out snd_slots[k].rxm, clear the slot's per-send flags, and advance t_latest_ack to the latest send-time covered (the Forward Acknowledgement / fack equivalent, Mathis & Mahdavi 1996); the first block whose start clamps to snd_cr.lwe skips this fack update so that a head-of-line clamp does not falsely advance fack. For un-SACKed gaps below hi_sacked, stage a retransmit per slot that is (1) still owned (rxm != NULL), (2) not already SND_FAST_RXM, (3) not aged out past t_r, and (4) either outside the RACK reorder window R OR with dup_thresh >= DUP_THRESH (the RFC 8985 sec. 6.2 hybrid trigger). Mark the slot SND_FAST_RXM and NULL the rxm at stage time. Capped at SACK_RXM_MAX staged retransmits per receive pass; what's left rides the next SACK.
FC
Bump snd_cr.rwe (clamped to lwe + RQ_SIZE, never shrinks) and mark window open.
DATA
Bounds-check seqno against window. On stale-dup (seqno < rcv_cr.lwe), set rcv_cr.seqno = seqno to force a fresh ACK on the next ack_snd, then drop. On accept: both FRTX and best-effort stash the packet-buffer index into rq[seqno mod RQ_SIZE]. Fragments stash unchanged - the role bits are inspected only at consume time (Section 7.2). On out-of-order arrival, build a SACK reply if not rate-limited (per Section 3) and not deduplicated against the previous (rcv_cr.lwe, n_blocks) pair; D-SACK reports always bypass the dedup. If both rate-limit and dedup suppress the reply, neither SACK nor delayed-ACK fires (the sender picks up the gap on its next ACK). On in-order arrival, arm the delayed-ACK timer.
drop_packet exit
Releases the per-packet shared-memory buffer (spb), then calls ack_snd synchronously after the spb release to surface any pending cumulative ACK.
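
The ACK step above relies on modular 32-bit sequence comparison. The following is a minimal sketch, not the FRCT code: seq_before and ack_classify_sketch are hypothetical names, and the three verdicts only label the branches described under ACK (drop, non-advancing, advancing).

```c
#include <stdbool.h>
#include <stdint.h>

/* True when a is strictly before b in 32-bit modular sequence space.
 * Relies on the usual two's-complement conversion to int32_t. */
static bool seq_before(uint32_t a, uint32_t b)
{
        return (int32_t) (a - b) < 0;
}

enum ack_verdict { ACK_DROP, ACK_NON_ADVANCING, ACK_ADVANCE };

static enum ack_verdict ack_classify_sketch(uint32_t ackno,
                                            uint32_t lwe,
                                            uint32_t seqno)
{
        if (seq_before(ackno, lwe) || seq_before(seqno, ackno))
                return ACK_DROP;          /* below lwe or beyond seqno */

        if (ackno == lwe)
                return ACK_NON_ADVANCING; /* RACK fast-retransmit check */

        return ACK_ADVANCE;               /* lwe = ackno, rto_mul = 0, ... */
}
```

Because the comparison is done on the signed difference, the classification stays correct when the window straddles the 2^32 wrap.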


7. Read path and reassembly

7.1. Read path

flow_read returns a full reassembled SDU (Service Data Unit) via frcti_consume on every FRCP SDU-mode flow (FRTX or best-effort); stream-mode is covered in Section 16. An incomplete head-of-line (HoL) run yields -EAGAIN; an oversized run yields -EMSGSIZE (the run is dropped so the flow does not stall). On best-effort flows, a permanently-lost mid-fragment is dropped as soon as a later complete SDU becomes visible in the ring (Section 7.2 skip-past-gap).

Raw flows carry no frcti, so flow_read returns the next pending packet-buffer index directly, with no role-bit inspection. (Raw service is selected via qos.service == SVC_RAW at flow allocation, which suppresses frcti creation.)

frcti_pdu_ready is the no-advance peek used by fevent (the Ouroboros flow-event multiplexer, the poll(2)-equivalent on flows). It returns ready only when the head-of-line run is complete and the lead packet (a Protocol Data Unit, here one FRCP packet) is present at rcv_cr.rwe - RQ_SIZE; any other state (including the best-effort skip-past-gap case) returns not ready, and frcti_consume is left to drop the broken prefix and re-inspect.


7.2. Fragmentation and reassembly

Send side (flow_write_frag). An SDU larger than (frag_mtu - PCI) is split into ceil(count / (frag_mtu - PCI)) fragments; each fragment is its own FRCP packet with its own seqno and a per-fragment role flag pair (Section 1.2). Roles are assigned at emit time:

    Condition (fragment i of n)  Role
    n=1                          SOLE
    i=0                          FIRST
    i=n-1                        LAST
    else                         MID
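
The split and the role assignment can be sketched as below. The function and enum names are hypothetical; payload stands for the per-packet room (frag_mtu minus the PCI) and is assumed non-zero, and count is assumed greater than zero.

```c
#include <stddef.h>

enum frag_role { ROLE_SOLE, ROLE_FIRST, ROLE_MID, ROLE_LAST };

/* ceil(count / payload) without risking overflow in the addition. */
static size_t frag_count_sketch(size_t count, size_t payload)
{
        return count / payload + (count % payload != 0 ? 1 : 0);
}

/* Role of fragment i (0-based) out of n fragments. */
static enum frag_role frag_role_sketch(size_t i, size_t n)
{
        if (n == 1)
                return ROLE_SOLE;

        if (i == 0)
                return ROLE_FIRST;

        if (i == n - 1)
                return ROLE_LAST;

        return ROLE_MID;
}
```

For example, a 3000-octet SDU with 1400 octets of room per packet yields three fragments with roles FIRST, MID, LAST.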

A mid-loop allocation or transmit failure may yield a partial write: the call returns the bytes already enqueued (off > 0) or the underlying error (off == 0). Best-effort flows fragment identically; on the receiver, a partial run with a permanently-lost fragment is dropped when a later complete SDU is visible in the ring (see skip-past-gap below). Raw flows carry no PCI and refuse anything larger than the layer's user MTU (-EMSGSIZE).

Wire-level recovery is fragment-agnostic on FRTX flows: each fragment's seqno flows through SACK / RACK / RTO / NACK exactly as for a SOLE DATA packet, and reassembly does not re-enter the loss-detection path. Best-effort flows run the same seqno machinery (DRF, FC, ACK piggyback, pre-DRF NACK emit) but queue no rxm state at the sender, so a lost MID is unrecoverable; skip-past-gap handles it (below).

Receive side. Fragments stash into rq[seqno] unchanged; role bits are read only at consume time. frag_run_inspect, called from frcti_consume, walks the ring starting at the oldest still-undelivered seqno base = rcv_cr.rwe - RQ_SIZE (equal to rcv_cr.lwe only when no partial run is in progress; during a partial run lwe has already advanced past base). It produces one of three outcomes:

Outcome Cause
DELIVER (n) rq[base]=SOLE (n=1), or rq[base]=FIRST and a LAST follows in slots [base+1..base+n-1] with all intermediate roles in {MID,FIRST,LAST} contiguous.
DROP (n) rq[base] is MID or LAST without a preceding FIRST (n=1); a FIRST..[non-LAST]..new-FIRST or new-SOLE mid-run (drop the broken prefix with n = run length minus 1, so the new FIRST/SOLE stays); or, on best-effort flows, a gap at base with a FIRST/SOLE later in the ring (drop up to the new run start).
NOT_READY rq[base] absent or FIRST..[non-LAST] with no later FIRST/SOLE in the ring (FRTX waits for retx; best-effort waits for arrival).

DELIVER triggers frag_gather: a scatter-gather memcpy of the n consecutive fragments at rq[base..base+n-1] directly into the caller's buffer; each per-packet shared-memory buffer (spb) is released and rwe advances by n. lwe was already advanced incrementally as each contiguous fragment arrived; frag_gather only restores the fixed-width invariant rwe == lwe + RQ_SIZE. No intermediate reassembly buffer is allocated.

DROP advances rwe past the broken prefix (releasing the spbs) and pulls lwe up to the new trailing edge if needed; the next consume retries from the new base. Oversize or arithmetically overflowing delivery (sum of fragment lengths > max_rcv_sdu, sum > caller's buffer, or running-sum overflow) also drops the run with -EMSGSIZE.

Skip-past-gap (best-effort only). On FRTX, a gap in the run means "waiting for retransmit" and frag_run_inspect returns NOT_READY. On best-effort flows the gap is permanent, so frag_run_inspect scans forward in the ring for the next FIRST or SOLE; if one is visible within RQ_SIZE, it returns DROP for the broken prefix and the consume loop retries at the new lwe. Memory hold is bounded by RQ_SIZE; the partial run is released on the next consume call once a later complete run exists. Voice-like flows (one SOLE per SDU) see no extra wait: any later SOLE makes the prior gap droppable immediately.
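
The three outcomes, including skip-past-gap, can be illustrated with a simplified model. This is a sketch under assumptions, not frag_run_inspect itself: the ring is flattened into an array of roles whose index 0 is base, all names are hypothetical, and size checks (max_rcv_sdu, caller buffer) are left out.

```c
#include <stdbool.h>
#include <stddef.h>

enum slot_role   { SLOT_EMPTY, SLOT_SOLE, SLOT_FIRST, SLOT_MID, SLOT_LAST };
enum run_outcome { RUN_NOT_READY, RUN_DELIVER, RUN_DROP };

struct run_verdict {
        enum run_outcome outcome;
        size_t           n;       /* slots to deliver or drop */
};

static struct run_verdict verdict(enum run_outcome o, size_t n)
{
        struct run_verdict v = { o, n };

        return v;
}

/* rq[0] is the slot at base; len is the number of slots inspected. */
static struct run_verdict run_inspect_sketch(const enum slot_role * rq,
                                             size_t                 len,
                                             bool                   best_effort)
{
        size_t i;
        size_t k;

        if (len == 0)
                return verdict(RUN_NOT_READY, 0);

        if (rq[0] == SLOT_SOLE)
                return verdict(RUN_DELIVER, 1);

        if (rq[0] == SLOT_MID || rq[0] == SLOT_LAST)
                return verdict(RUN_DROP, 1); /* no preceding FIRST */

        /* rq[0] is FIRST or a gap: walk until the run closes or breaks. */
        for (i = (rq[0] == SLOT_FIRST) ? 1 : 0; i < len; ++i) {
                if (rq[i] == SLOT_LAST)
                        return verdict(RUN_DELIVER, i + 1);
                if (rq[i] == SLOT_EMPTY)
                        break;
                if (rq[i] != SLOT_MID)
                        return verdict(RUN_DROP, i); /* new FIRST / SOLE */
        }

        /* Gap or ring exhausted: only best-effort may skip past it. */
        if (!best_effort)
                return verdict(RUN_NOT_READY, 0);

        for (k = i + 1; k < len; ++k)
                if (rq[k] == SLOT_FIRST || rq[k] == SLOT_SOLE)
                        return verdict(RUN_DROP, k);

        return verdict(RUN_NOT_READY, 0);
}
```

In this model a run FIRST, gap, LAST, SOLE is NOT_READY on an FRTX flow and DROP (3) on a best-effort flow, leaving the SOLE at the new base.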

The choice to defer reassembly to consume time keeps the receive path zero-copy: fragments stay in the shared-memory ring until the application pulls, and the SDU lands directly in the caller's buffer.


8. Retransmission

FRCP is bounded by two delta-t-derived timers (Watson 1981, see Section 15):

  • t_a (a-timer): upper bound on ACK delay. An ACK for a received DATA packet MUST NOT be emitted more than t_a after its receipt; an ACK attempted after the a-timer has expired is suppressed.
  • t_r (r-timer): upper bound on retransmission. A given DATA packet MUST NOT be retransmitted after t_r has elapsed since its first send (t0); when the bound is hit, the flow is declared down (raising the Ouroboros asynchronous flow condition ACL_FLOWDOWN, which marks the flow dead to both endpoints) rather than retransmitted again.

Each in-flight FRTX seqno owns one rxm_entry, armed in a hashed timing wheel; the wheel deadline is the slot's next eligible retransmit time.

RTO timer
On fire (rxm_due), re-emit with FRCT_RXM, mark SND_RTX (Karn-suppress next ACK's RTT sample), and (for the head-of-line (HoL) slot only) bump rto_mul up to MAX_RTO_MUL. Wheel deadline is t_send + (rto << rto_mul). Re-armed unless consumed. The RTO timer also clears SND_FAST_RXM (re-arming fast-retransmit eligibility), resets reo_wnd_mult to 1 on a HoL fire (RFC 8985 sec. 6.2 step 4 reset clause), and marks the flow ACL_FLOWDOWN if its frct_tx call fails.
r-timer guard
Before any retransmit attempt, check (now - t0) against t_r. If exceeded, the slot is no longer eligible for retransmit. Only the RTO timer (rxm_due) treats r-timer expiry as terminal: it marks the flow ACL_FLOWDOWN (peer unreachable). Fast-retransmit, SACK-driven retransmit, and NACK-driven head-of-line re-emit silently skip aged-out slots and defer the flow-down decision to the next RTO fire.
Fast retransmit (hybrid trigger, RFC 8985 sec. 6.2)
On a non-advancing cumulative ACK with the scoreboard advanced, fire one fast retransmit when EITHER (a) the head-of-line slot's latest send is older than the RACK reorder window R (Section 3) and not yet aged out, OR (b) the SACK dup-thresh count above snd_cr.lwe reaches DUP_THRESH (= 3, RFC 8985 sec. 6.2 step 4). Fires at most once per non-advancing cumulative-ACK value, gated by rack_fired_lwe (the snd_cr.lwe at which fast-retransmit last fired). Set SND_FAST_RXM on the slot (one-shot per-slot gate) and enter NewReno-style careful recovery (see NewReno below in this section).
The RACK reorder window R uses the RFC 8985 sec. 6.2 form R = MIN(reo_wnd_mult * min_RTT / 4, SRTT) with a MIN_REORDER_NS = 250 us floor. Before the first RTT sample seeds min_rtt, R falls back to MIN(reo_wnd_mult * SRTT / 4, SRTT), still floored at MIN_REORDER_NS (consistent with the windowed-minimum fallback described in Section 12). min_rtt is a windowed minimum over the last MIN_RTT_WIN_NS = 5 min of RTT samples (matches the Linux tcp_min_rtt_wlen default) so a route change to a longer path eventually re-anchors the reorder window without relying on reo_wnd_mult growth alone.
SACK-driven retransmit
For each gap below hi_sacked whose slot is (1) still owned, (2) not already SND_FAST_RXM, (3) not aged out past t_r, and (4) either outside the RACK window R OR with dup_thresh >= DUP_THRESH (same hybrid as fast-retransmit, see Section 6.2), re-emit. Each SACK-driven retransmit re-arms a fresh rxm so a lost retransmit can still be recovered by its own RTO timer.
NewReno
On entry, recovery_high = snd_cr.seqno + RTT_QUARANTINE. Exit when ackno >= recovery_high or ackno == snd_cr.seqno (the latter means everything sent has been acknowledged). seqno_rotate also clears recovery.
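
The reorder window R used by the fast-retransmit and SACK-driven paths can be sketched as below. The function name is hypothetical and all times are assumed to be nanoseconds; the formula, the SRTT cap, the MIN_REORDER_NS floor and the fallback to SRTT while min_rtt is unset (0) follow the description above.

```c
#include <stdint.h>

#define MIN_REORDER_NS 250000ULL /* 250 us floor */

static uint64_t rack_reorder_window_sketch(uint64_t srtt_ns,
                                           uint64_t min_rtt_ns,
                                           uint32_t reo_wnd_mult)
{
        /* 0 is the unset sentinel: fall back from min_RTT to SRTT. */
        uint64_t base = min_rtt_ns != 0 ? min_rtt_ns : srtt_ns;
        uint64_t r    = reo_wnd_mult * base / 4;

        if (r > srtt_ns)
                r = srtt_ns;        /* MIN(..., SRTT) */

        if (r < MIN_REORDER_NS)
                r = MIN_REORDER_NS; /* floor against srtt collapse */

        return r;
}
```

Note that the floor is applied after the cap in this sketch, so R may exceed SRTT on paths with a sub-250 us round trip.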

9. Pre-DRF NACK

The two sides have different inactivity thresholds (snd_cr.inact > rcv_cr.inact), so a receiver can detect "stale data run" before the sender's own DRF logic kicks in. NACK is the receiver-driven nudge that asks the sender to re-transmit the head of the run.

Send (frcti_nack_snd, called by frcti_rcv when rcv_inact_check returns FRCT_INACT_NEED_NACK)
When an incoming DATA packet has no DRF and rcv-side activity is older than rcv_cr.inact, the receiver emits a bare packet with flags = FRCT_NACK and seqno = arrival_seqno - 1 (informational only, not consulted by the receive handler). The cooldown in Section 3 rate-limits the burst. Non-DATA non-DRF arrivals bypass rcv_inact_check entirely; non-DATA DRF still rebases via the DRF branch.
Receive (frcti_nack_rcv)
Dispatched in the early-exit branch (Section 6.1), before rcv_inact_check. The sender copies the head-of-line (HoL) rxm packet, marks the slot SND_RTX | SND_FAST_RXM (Karn-suppress next ACK, one-shot fast-rxm gate), sets rtt_lwe = snd_cr.lwe + 1, and re-emits via fast_rxm_send with FRCT_RXM and a refreshed ackno. The original rxm_entry and its RTO timer are left armed - the NACK emit is additive to the normal retransmit machinery, not a replacement. No-op if nothing is in flight, the HoL slot has aged past t_r, or the HoL rxm pointer has been cleared by SACK or RACK.

NACK has exactly one role: lost first-of-run (DRF) packet recovery. Until the DRF packet arrives, the receiver cannot rebase its window, so any subsequent in-flight packets look stale to the receiver. The NACK fires the moment a stale receiver sees DATA without DRF, telling the sender to re-emit the head-of-line (DRF) packet at NACK-cooldown latency rather than waiting for the initial RTO (which is the configured default until srtt is seeded by the first probe round-trip). Mid-stream loss is NOT NACK-driven; it is recovered by the sender's RTO, fast retransmit, and SACK-driven retransmit paths (Section 8) only.

The existing rxm_entry and its RTO timer are left armed on a NACK re-emit, so the RTO path remains the eventual fallback.

10. Cumulative + selective ACK

Cumulative ACK is ackno = rcv_cr.lwe. On out-of-order arrival the receiver also emits a SACK packet (Section 1.3) whose payload lists present blocks above lwe (analogous to TCP SACK / QUIC ACK ranges). SACKs are rate-limited per Section 3 and suppressed when neither lwe nor block count has changed since the last SACK.

D-SACK reports (RFC 2883) are emitted in-band as block[0] of an otherwise normal SACK frame (see Section 1.3 for the encoding). Two receiver triggers arm a pending D-SACK report (single-slot, latest-wins):

  • DATA arrival with seqno < rcv_cr.lwe, both wire-dup (no RXM, is_dup_data path) and retransmit (RXM, post-FC branch) (RFC 2883 sec. 4.1.1, full duplicate)
  • rq_accept conflict, slot already occupied in [lwe, rwe) (RFC 2883 sec. 4.1.2, partial duplicate)

When a D-SACK report is pending and the standard scoreboard SACK would be suppressed by dedup or rate-limit, the report is emitted as a stand-alone SACK frame through the normal ack_snd path. In that case the path bypasses dedup and the TICTIME rate-limit, but the a-timer suppression on rcv inactivity still applies.

Bare ACKs are deferred via a per-flow delayed-ACK timer (one in flight at a time, atomic test-and-set dedup; fires per Section 3 after the first in-order arrival). Suppressed if (1) no new seqno, (2) rcv side is inactive (older than t_a), or (3) the sender just sent within TICTIME. A pending D-SACK ride-through bypasses (1) and (3); the a-timer gate (2) is unconditional.


11. Flow control

The receiver advertises rwe in every FC field. The sender treats its snd_cr.rwe as the absolute right edge: when snd_cr.seqno >= snd_cr.rwe the window is closed and flow_write yields. While closed, the sender periodically emits RDVS (rendezvous) packets (cadence DELT_RDV); the receiver replies with a bare FC packet (ackno = 0) that reopens the window. Once the window has been closed for longer than MAX_RDV the sender stops emitting RDVS but does not tear the flow down - the writer keeps blocking until either a peer-driven FC arrives or the keepalive (KA) timer or the r-timer flags the flow (ACL_FLOWPEER or ACL_FLOWDOWN; Sections 13 and 8).

rwe is clamped to lwe + RQ_SIZE on receipt and MUST NOT shrink: a backward rwe is silently clamped to the current snd_cr.rwe; the FC packet still reopens the window.
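
A minimal sketch of the sender-side window update, assuming the default RQ_SIZE from Section 3 and modular 32-bit comparison. The function names are hypothetical; the clamp, the non-shrink rule and the seqno >= rwe closed test are the ones stated in this section.

```c
#include <stdbool.h>
#include <stdint.h>

#define RQ_SIZE 128u /* default from Section 3 */

/* True when a is strictly before b in 32-bit modular sequence space. */
static bool seq_before(uint32_t a, uint32_t b)
{
        return (int32_t) (a - b) < 0;
}

/* New snd_cr.rwe after receiving an advertised right edge. */
static uint32_t fc_rwe_update_sketch(uint32_t cur_rwe,
                                     uint32_t lwe,
                                     uint32_t adv_rwe)
{
        uint32_t cap = lwe + RQ_SIZE;

        if (seq_before(cap, adv_rwe))
                adv_rwe = cap;    /* clamp to lwe + RQ_SIZE */

        if (seq_before(adv_rwe, cur_rwe))
                return cur_rwe;   /* backward advertisement: keep edge */

        return adv_rwe;
}

/* The window is closed once seqno has reached the right edge. */
static bool window_closed_sketch(uint32_t seqno, uint32_t rwe)
{
        return !seq_before(seqno, rwe);
}
```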


12. RTT estimation

Active RTTP probes (Section 1.4) carry a 32-bit probe_id (0 reserved) and a 16-byte random nonce echoed verbatim - defends against spoofed replies. A ring of RTTP_RING in-flight probes is kept; an echo whose (id, nonce) doesn't match the ring slot is dropped. A single RTTP sample is clamped to RTT_CLAMP_MUL * srtt (compile-time RTT_CLAMP_MUL = 16) once srtt is seeded; the first cold-probe sample feeds rtt_update raw.

Probe arming gates:

Cold (no srtt yet)
the receive path arms at most one probe per 100 ms via frcti_rcv_probe (PROBE_DUE_COLD); arming requires an incoming packet. Active send-path arming bails while srtt == 0.
Warm (rtt_probe_arm, called from frcti_snd)
outstanding data (snd_cr.seqno > snd_cr.lwe), AND at least 2 * srtt since t_rcv_rtt (last RTT receive of any kind), AND at least srtt since t_snd_probe (last probe emit).

Sample feeds either Linux's asymmetric mdev estimator (FRCT_LINUX_RTT_ESTIMATOR, default ON) or RFC 6298 symmetric EWMA (compile option). srtt is floored at 10 ms when seeded from a hint, at 1 us after every update (including the first seeding sample); mdev floored at 100 ns.

RTO = max(rto_min, 2 * srtt, srtt + (mdev << MDEV_MUL))

(the 2 * srtt floor is an FRCT addition not in RFC 6298). Effective wheel deadline capped per Section 3.
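
As an illustration, the RTO formula and the backed-off wheel deadline can be written as below. Function names are hypothetical, all times share one unit (for example nanoseconds), and MDEV_MUL and MAX_RTO_MUL take the Section 3 defaults. Reading the Section 3 cap as a clamp of rto_mul to MAX_RTO_MUL is an assumption of this sketch.

```c
#include <stdint.h>

#define MDEV_MUL    2  /* build-tunable, Section 3 default */
#define MAX_RTO_MUL 20 /* backoff shift cap */

/* RTO = max(rto_min, 2 * srtt, srtt + (mdev << MDEV_MUL)) */
static uint64_t rto_sketch(uint64_t srtt, uint64_t mdev, uint64_t rto_min)
{
        uint64_t rto = srtt + (mdev << MDEV_MUL);

        if (rto < 2 * srtt)
                rto = 2 * srtt; /* FRCT floor, not in RFC 6298 */

        if (rto < rto_min)
                rto = rto_min;

        return rto;
}

/* Wheel deadline t_send + (rto << rto_mul), shift capped (assumption). */
static uint64_t rxm_deadline_sketch(uint64_t t_send,
                                    uint64_t rto,
                                    unsigned rto_mul)
{
        if (rto_mul > MAX_RTO_MUL)
                rto_mul = MAX_RTO_MUL;

        return t_send + (rto << rto_mul);
}
```

With srtt = 10 ms and mdev = 1 ms the variance term gives 14 ms, so the 2 * srtt floor (20 ms) decides the RTO.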

ACK-derived samples (frcti_ack_rcv -> rtt_sample_eligible), beyond the cum-ACK advance gate in frcti_ack_rcv (ackno > lwe and ackno <= seqno), require all of: not in recovery; ACK packet does not carry FRCT_RXM; HoL slot's SND_RTX bit clear; slot's rxm pointer non-NULL (not SACK-consumed); lwe not below the rtt_lwe fence; srtt already seeded by an RTTP probe. There is no ACK-only seeding.

Every eligible sample also feeds RACK.min_RTT (RFC 8985 sec. 6.2) via a windowed minimum: replace whenever the sample is strictly smaller OR more than MIN_RTT_WIN_NS (5 min, matches Linux tcp_min_rtt_wlen) has elapsed since the current min was set. The downward branch is immediate (faster path picked up at once); the upward branch is gated on the window (a transient queue burst does not poison the estimate, but a sustained route change to a longer path re-anchors min_RTT after at most one window). Seeded from rtt_hint at rtt_init; 0 acts as the unset sentinel and the base in rack_reorder_window falls back from min_RTT to SRTT (so R = mult * SRTT/4, capped at SRTT, floored at MIN_REORDER_NS) until the first sample. See Section 6.2.
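
The windowed-minimum rule can be sketched as follows. The struct and function names are hypothetical and times are assumed to be nanoseconds; the two replacement branches (strictly smaller sample, or window elapsed) and the 0 sentinel follow the paragraph above.

```c
#include <stdint.h>

#define MIN_RTT_WIN_NS (300ULL * 1000000000ULL) /* 5 min */

struct min_rtt_sketch {
        uint64_t val;   /* current windowed minimum, 0 = unset */
        uint64_t stamp; /* time at which val was set */
};

/* Returns the state after feeding one eligible RTT sample at time now. */
static struct min_rtt_sketch min_rtt_update_sketch(struct min_rtt_sketch m,
                                                   uint64_t              sample,
                                                   uint64_t              now)
{
        if (m.val == 0 ||                       /* first sample */
            sample < m.val ||                   /* downward: immediate */
            now - m.stamp > MIN_RTT_WIN_NS) {   /* upward: window aged out */
                m.val   = sample;
                m.stamp = now;
        }

        return m;
}
```

A larger sample therefore replaces the minimum only after the current value has stood for more than one full window.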


13. Liveness (keepalive)

When qs.timeout > 0 a per-flow KA (keepalive) timer is armed. Arming uses rcv_cr.act for the deadline computation:

deadline = min(snd_act + qs.timeout/4, rcv_act + qs.timeout)

(clamped to now + qs.timeout/4 if already past). The timer fires either on sender idleness (to send a KA) or on receiver idleness (to declare the peer dead). On fire (ka_snd) the peer-dead test uses max(rcv_cr.act, t_ka_rcv) so a recent KA reply counts even when no DATA has arrived:

  • If now - max(rcv_cr.act, t_ka_rcv) > qs.timeout, mark the flow ACL_FLOWPEER and notify the per-process flow-event set (proc.fqset) with FLOW_PEER.
  • Else if snd_idle > qs.timeout/4, emit a bare KA | ACK (ackno = rcv_cr.lwe) and re-arm.
  • Else just re-arm.

Note: rx_rb and tx_rb are the receive and transmit shared-memory ring buffers. The r-timer raises ACL_FLOWDOWN on both (route is broken); keepalive raises ACL_FLOWPEER on rx_rb only and notifies the flow-event set (peer is silent, writer keeps tx_rb usable) - distinct ACLs. qs.timeout == 0 disables keepalive entirely; a silent peer crash is then undetected.


14. Linger / teardown

On flow_dealloc, frcti_dealloc computes a grace timeout

max(rcv_cr.act + rcv_cr.inact, snd_cr.act + snd_cr.inact) - now

(floored at 0 and converted to seconds) and returns it; flow_dealloc forwards this to the IRMd as the dealloc grace. The IRMd, not FRCT, performs the wait. Before computing the timeout, FRCT may emit a final ACK when rcv_cr.lwe != rcv_cr.seqno (the peer has not been told the most recent cumulative ACK) AND the rcv side has been active within t_a (a-timer not aged out).

FRCTFLINGER is honoured only when snd_cr.lwe < edge, where edge = snd_fin_seqno after FIN has been sent in stream mode and snd_cr.seqno otherwise (data or FIN still in flight). The drain itself runs in flow_dealloc's while (FRCTI_LINGERING) loop, not in frcti_dealloc.

The fd is single-reader / single-writer (documented in the manpages). flow_write pumps rx_rb on every call (via flow_wait_window -> flow_drain_rx_nb) and additionally blocks on rx_rb when the send window is closed. A pure-writer thread thus consumes ACKs without a dedicated reader.


15. Heritage and adopted techniques

Delta-t (Watson, 1981) is the primary heritage; FRCP descends from the delta-t protocol family via the Recursive InterNetwork Architecture (RINA; Day, "Patterns in Network Architecture", 2008, ch. 9). Timer-based connection management (no SYN/FIN handshake, per-flow state born on first DATA and reclaimed after t_mpl + a + r of silence), the DRF marker, and the t_mpl / t_a / t_r timers all come from delta-t. See Watson, "Timer-Based Mechanisms in Reliable Transport Protocol Connection Management", Computer Networks 5 (1981).

The unified flow_alloc(name, qos, ...) primitive and its multi-axis QoS-cube argument (Section 2.2) also come from RINA (Day 2008, ch. 6; Grasa et al., "IRATI: investigating RINA as an alternative to TCP/IP", Computer Networks 92 (2015)) - reliability, ordering, CRC presence, and encryption are flow attributes, not separate sockets or protocols.

The table below summarises additional adopted techniques and their references.

FRCP mechanism Heritage Reference / note
Random new seqno on seqno_rotate TCP ISN RFC 6528 (Gont & Bellovin, 2012). QUIC PN-space reset (RFC 9000 sec. 12.3) is a structural analogue.
Cumulative ACK, left-window-edge advance TCP RFC 793 / RFC 9293
Receive window with non-shrink rule TCP RFC 793 sec. 3.7 / RFC 9293 sec. 3.8.6; RFC 1122 sec. 4.2.2.16 for the explicit non-shrink prohibition
Modular seqno arithmetic (before/after helpers) TCP RFC 793 sec. 3.3 / RFC 9293 sec. 3.4
Selective ACK block list TCP RFC 2018 (Mathis et al., 1996). Encoded as a typed FRCP packet rather than a TCP option, so framing is closer to QUIC ACK frames. D-SACK (RFC 2883) carried in-band as block[0]; see Section 1.3.
NewReno-careful recovery with recovery_high gate TCP RFC 6582 (Henderson et al., 2012); QUIC builds on the same model in RFC 9002 sec. 7.3.2. Cwnd half absent (CC in IPCP).
RACK reordering window for fast retransmit TCP RFC 8985 (Cheng et al., 2021). FRCP R = MIN(reo_wnd_mult * min_RTT / 4, SRTT) with a MIN_REORDER_NS = 250 us floor against srtt collapse; matches RFC 8985 sec. 6.2 and Linux tcp_rack_reo_wnd. DSACK-driven reo_wnd_mult (sec. 6.2 step 4) is adopted; see Section 1.3 for the wire encoding. The hybrid RACK-or-DUP_THRESH trigger from RFC 8985 sec. 6.2 step 4 is adopted (Section 8). QUIC's analogue in RFC 9002 sec. 6.1.2 uses max(srtt, latest_rtt) as the base.
Karn's algorithm: no RTT sample on retransmits, RTO-collapse freeze TCP Karn & Partridge, "Improving Round-Trip Time Estimates in Reliable Transport Protocols", SIGCOMM 1987; RFC 6298 sec. 3.
RTO formula RTO = max(RTO_MIN, 2 * srtt, srtt + (mdev << MDEV_MUL)) TCP RFC 6298 (Paxson et al., 2011). RTO_MIN = 250 us is below RFC 6298 sec. 2.4's 1 s SHOULD-floor - a recursive-layer choice. The 2 * srtt term is an FRCT addition (Section 12).
Linux asymmetric mdev estimator (default) Linux kernel tcp_rtt_estimator() in net/ipv4/tcp_input.c; the if(delta<0) m>>=3 dampening is a kernel divergence from RFC 6298. RFC 6298 EWMA available behind a compile flag.
Delayed ACK with rate suppression TCP RFC 813 (Clark, 1982); RFC 1122 sec. 4.2.3.2; RFC 5681 sec. 4.2. Single-deadline coalescing rather than "ack-every-other-segment".
Zero-window-probe / persist-timer analogue (RDVS) TCP RFC 1122 sec. 4.2.2.17 / RFC 9293 sec. 3.8.6.1. RDVS solicits an FC reply, distinct from QUIC DATA_BLOCKED (RFC 9000 sec. 19.12), which is one-way notification. MAX_RDV give-up departs from TCP.
Multiplexed control on a single PCI SCTP / QUIC SCTP chunk bundling (RFC 9260 sec. 6.10); QUIC frame multiplexing (RFC 9000 sec. 12.4). Cleaner fit than TCP's separate-flag-bits design.
ACK ranges as multiple discontiguous acked blocks QUIC QUIC ACK frame (RFC 9000 sec. 19.3). FRCP SACK is conceptually QUIC-frame-shaped even though encoded as absolute [start,end] pairs.
Nonce-authenticated active RTT / liveness probing (RTTP) QUIC PATH_CHALLENGE PATH_CHALLENGE / PATH_RESPONSE (RFC 9000 sec. 8.2, sec. 19.17, sec. 19.18). WebRTC ICE consent-freshness (RFC 7675) is the same pattern. QUIC's nonce is 8 octets; FRCP chooses 16.
Probing distinct from keepalive QUIC KA timer answers "peer alive?", RTTP answers "path measurable?", as in QUIC PING (RFC 9000 sec. 19.2) vs PATH_CHALLENGE.
Bare KA + ACK keepalive packets QUIC / SCTP QUIC PING (RFC 9000 sec. 19.2); SCTP HEARTBEAT / HEARTBEAT-ACK (RFC 9260 sec. 8.3). SCTP HEARTBEAT also carries an opaque echoed blob, structurally similar to FRCP RTTP.
(FFGM, LFGM) fragment-role bits (Section 7.2) SCTP RFC 9260 sec. 3.3.1 DATA chunk B/E bits encode the same four states (B+E=SOLE, B-only=FIRST, neither=MID, E-only=LAST). Each fragment carries its own seqno/TSN and is independently retransmitted.
Stream byte-offset reassembly (Sections 1.5, 16) QUIC QUIC STREAM frame (RFC 9000 sec. 19.8) uses Offset + Length varints; FRCP uses fixed 32-bit start / end. One stream per flow vs QUIC's many streams multiplexed.
FIN end-of-stream marker (Sections 1.2, 16) TCP / QUIC TCP FIN flag (RFC 9293 sec. 3.1) closes one half of the byte stream; QUIC STREAM frame FIN bit (RFC 9000 sec. 19.8) does the same per stream with an immutable final-size invariance (RFC 9000 sec. 4.5: the final size is fixed once observed). FRCP's FIN consumes one packet seqno (not one byte of stream space) and is idempotent on the sender side.
Stream byte-credit flow control (Section 16) QUIC MAX_STREAM_DATA (RFC 9000 sec. 4.1, sec. 19.10). FRCP projects a per-flow byte budget onto the seqno-space rwe. Single stream per flow collapses QUIC's MAX_DATA / MAX_STREAM_DATA distinction.
Header protection (encrypted seqnos) QUIC QUIC RFC 9001 sec. 5.4 applies header protection on top of AEAD to mask the packet number. FRCP's per-flow AEAD wrap (Section 16) is wider: it encrypts the entire PCI including seqno because the IPCP below already routes, so no destination connection-ID needs to stay in clear (cf. RFC 9000 sec. 5.2).
Two-bit fragment role polarity SCTP The (FFGM, LFGM) pair follows SCTP B/E (begin = 1 / end = 1) rather than IPv4 MF (RFC 791 sec. 3.2), which has the inverse polarity (MF = 1 means NOT last).
Orthogonal reliability / ordering axes (Section 2.2) SCTP PR-SCTP (RFC 3758, per-message partial reliability) and SCTP DATA U-bit (RFC 9260 sec. 3.3.1, per-message unordered) are the closest precedents for decoupling reliability from ordering; FRCP sets them per-flow rather than per-message.
Orthogonal CRC (qs.ber == 0) UDP-Lite RFC 3828 (Larzon et al., 2004) lets the sender pick a per-packet Checksum Coverage and the receiver enforce a locally configured minimum (no in-band negotiation; sec. 3.1, sec. 3.3). FRCP gates a full CRC trailer on qs.ber == 0 at flow setup. Contrast TCP / SCTP (mandatory checksum) and QUIC (AEAD subsumes CRC).
Setup-time service negotiation DCCP / SCTP / QUIC DCCP Service Codes (RFC 4340 sec. 8.1.2, RFC 5595); SCTP INIT parameters (RFC 9260 sec. 3.3.2); QUIC transport parameters (RFC 9000 sec. 7.4). All negotiate service properties at connection setup; only RINA's QoS cube exposes them as an orthogonal vector.
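
The four fragment roles named in the rows above can be sketched as a small decoder. This is an illustration only: it assumes FFGM plays the part of SCTP's B bit and LFGM the part of the E bit (as the ordering of the pair suggests), takes the two bits as already-extracted booleans rather than guessing their positions in the PCI, and uses hypothetical names (frag_role, FRAG_*, ipv4_is_last).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical role names for the four (FFGM, LFGM) states. */
enum frag_role { FRAG_MID, FRAG_FIRST, FRAG_LAST, FRAG_SOLE };

/*
 * Decode the role from the two bits. Assumption: ffgm == SCTP B
 * (begin) and lfgm == SCTP E (end); both are already extracted
 * from the PCI flags by the caller.
 */
static enum frag_role frag_role(bool ffgm, bool lfgm)
{
        if (ffgm && lfgm)
                return FRAG_SOLE;  /* B+E: unfragmented SDU */
        if (ffgm)
                return FRAG_FIRST; /* B only */
        if (lfgm)
                return FRAG_LAST;  /* E only */
        return FRAG_MID;           /* neither bit set */
}

/* IPv4 MF has the inverse polarity: MF = 1 means NOT last. */
static bool ipv4_is_last(bool mf)
{
        return !mf;
}
```

A run-start in the sense of Section 15.1 is then any packet whose role is FRAG_FIRST or FRAG_SOLE.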


15.1. Original to FRCP (no clean prior art)

  • Pre-DRF NACK (Section 9): receiver-driven nudge exploiting snd_cr.inact > rcv_cr.inact. Closest analogues are SCTP Gap Ack Blocks (RFC 9260 sec. 3.3.4) and DCCP Ack Vector (RFC 4340 sec. 11.4) - both let the receiver describe gaps to the sender, but neither targets the cross-epoch / pre-DRF case.
  • MAX_RDV window-probe give-up: neither TCP (persist-timer probes until application or R2 abort, RFC 9293 sec. 3.8.6.1) nor QUIC has an explicit FC-give-up counter. A recursive-network choice: outer layers can drop the flow.
  • Skip-past-gap reassembly (Section 7.2): SCTP fragments and reassembles every flow regardless of reliability/ordering, using its own per-stream reassembly queue; QUIC fragments via STREAM offsets. FRCP fragments best-effort flows too, but the receiver drops the broken prefix the moment a later run-start (FIRST or SOLE role) is visible inside the RQ_SIZE-wide reorder ring - no IP-frag-style timeout, no SCTP-style explicit abort. If no later run-start arrives within the ring, frag_run_inspect returns NOT_READY and the partial run keeps its slots; the next inspect retries. The trade-off: a permanently-lost MID in a long isolated run holds slots until either a later FIRST/SOLE appears in the ring or the writer stops, at which point the slots are reclaimed on flow teardown.
  • Reassembly deferred to consume time (Section 7.2), message mode only (qos.service == SVC_MESSAGE): SCTP (RFC 9260 sec. 6.9), QUIC (RFC 9000 sec. 2.2), and TCP (RFC 9293) all hold reassembly state at the receive boundary. FRCP message-mode leaves fragments in the shared-memory ring until flow_read pulls and lands the SDU directly in the caller's buffer. Stream mode (Section 16) uses the standard QUIC-style direct ring placement on receive and does not defer. The optimisation is enabled by the Shared-Memory Subsystem (SSM) packet-buffer ring (see struct ssm_pk_buff at Section 1.1); the analogue is OS-level scatter-gather I/O (recvmsg+iovec), not a transport-layer prior art.
  • TLP-equivalent tail-loss recovery (RFC 8985 sec. 7; RFC 9002 sec. 6.2): FRCP does not emit an explicit Tail Loss Probe packet, but the same goal is met implicitly by RACK loss detection (Section 8) firing on a non-advancing cumulative ACK once the head-of-line slot ages past the RACK reorder window R = MIN(reo_wnd_mult * min_RTT / 4, SRTT) - well below RTO = max(2 * SRTT, SRTT + (mdev << MDEV_MUL)). A receiver-driven nudge is also available via the pre-DRF NACK (Section 9).
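
The two timer formulas in the last bullet can be compared numerically with a short sketch. The function names, the use of 64-bit microsecond values, and passing MDEV_MUL as a parameter are assumptions made for illustration; the actual constants are defined elsewhere in this document.

```c
#include <assert.h>
#include <stdint.h>

/* R = MIN(reo_wnd_mult * min_RTT / 4, SRTT); times in microseconds. */
static uint64_t rack_reo_wnd(uint64_t reo_wnd_mult,
                             uint64_t min_rtt,
                             uint64_t srtt)
{
        uint64_t r = reo_wnd_mult * min_rtt / 4;

        return r < srtt ? r : srtt;
}

/* RTO = max(2 * SRTT, SRTT + (mdev << mdev_mul)). */
static uint64_t rto(uint64_t srtt,
                    uint64_t mdev,
                    unsigned mdev_mul)
{
        uint64_t a;
        uint64_t b;

        assert(mdev_mul < 32); /* keep the shift well-defined */

        a = 2 * srtt;
        b = srtt + (mdev << mdev_mul);

        return a > b ? a : b;
}
```

With illustrative values min_RTT = 8 ms, SRTT = 10 ms, mdev = 1 ms, reo_wnd_mult = 1 and a shift of 2, the reorder window is 2 ms against an RTO of 20 ms, which is the gap the bullet relies on.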


15.2. Not adopted

  • Slow start, congestion window (cwnd), Additive Increase / Multiplicative Decrease (AIMD), NewReno cwnd inflation. Congestion control lives in the IPCP CA policies and is driven by Explicit Congestion Notification (ECN, RFC 3168).
  • Nagle / sender-side silly-window-syndrome (SWS) avoidance (RFC 896, RFC 1122 sec. 4.2.3.4). (Deferred work, not adopted in the current spec.)
  • TCP Timestamps (RFC 7323) / Protection Against Wrapped Sequences (PAWS) - RTT measurement uses RTTP, not per-segment timestamps. A peer-supplied timestamp echoed on every ACK lets a malicious peer drive the srtt estimate arbitrarily low, collapsing the RTO and triggering a self-inflicted retransmit storm. RTTP confines RTT measurement to nonce-authenticated probe round-trips, where a forged echo is rejected before it can reach the estimator.
  • ECN (Explicit Congestion Notification) response inside FRCP (consumed by IPCP Congestion Avoidance / CA).
  • IP-style fragment-offset reassembly (RFC 791 sec. 3.2; RFC 8200 sec. 4.5). Message-mode FRCP relies on the FRCT rq[] reorder ring keyed by seqno (shared by FRTX and best-effort flows) to put fragments back in order; no separate offset field is needed and no IP-style hole-list reassembly buffer is kept. Stream-mode FRCP does carry [start, end) byte offsets (Section 1.5) for direct ring placement on receive.
  • QUIC STREAM offset+length framing on every flow (RFC 9000 sec. 19.8). Message-mode FRCP uses the SCTP-style B/E flag-bit encoding (FFGM/LFGM) and skips the offsets; stream-mode FRCP adopts the QUIC offset model (heritage table above).

16. Stream-mode flows

When a flow is allocated with qos.service == SVC_STREAM both peers switch to byte-stream semantics, layered on top of the FRTX reorder machinery already described in Sections 6-8.

16.1. Send

The sender splits the caller's octets into chunks of at most (frag_mtu - base PCI - stream PCI extension) octets (Sections 1.1 and 1.5). Each chunk is one DATA packet with its own seqno and a [start, end) byte range copied from a monotonic stream counter. In stream mode FFGM and LFGM are unused and MUST be transmitted as zero; the per-byte position is carried by the [start, end) extension instead.
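
The chunking rule above can be sketched as follows. The names (struct chunk, stream_split) are hypothetical, per_pkt stands for (frag_mtu - base PCI - stream PCI extension), and the sketch only computes the [start, end) ranges; building and sending the DATA packets is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open byte range [start, end) carried by one DATA packet. */
struct chunk {
        uint32_t start;
        uint32_t end;
};

/*
 * Split len octets into ranges of at most per_pkt octets, taking
 * positions from the monotonic stream counter *ctr. Writes at most
 * max ranges to out and returns how many were written.
 */
static size_t stream_split(uint32_t     *ctr,
                           size_t        len,
                           size_t        per_pkt,
                           struct chunk *out,
                           size_t        max)
{
        size_t n = 0;

        assert(per_pkt > 0);

        while (len > 0 && n < max) {
                size_t sz = len < per_pkt ? len : per_pkt;

                out[n].start = *ctr;
                out[n].end   = *ctr + (uint32_t) sz;
                *ctr         = out[n].end; /* next chunk starts here */
                len         -= sz;
                ++n;
        }

        return n;
}
```

Because each range starts where the previous one ended, consecutive packets satisfy the start == prior end invariant the receiver checks in Section 16.2.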

End-of-stream is signalled with a 0-byte DATA packet that has FIN (bit 12) set, emitted on the FIN triggers listed in Section 1.2 (WR-half close, flow_dealloc, and any other path that yields the final byte). The sender MUST emit at most one FIN per flow; its [start, end) MUST equal [final-byte, final-byte) (i.e., empty interval at the final byte position; final-size invariance, analogous to QUIC RFC 9000 sec. 4.5). Idempotency is enforced by an snd_fin_sent guard.

16.2. Receive

On arrival the receiver places the payload directly into a per-flow byte-indexed receive ring of width ring_sz (octets) at the position indicated by start, with a two-segment memcpy across the ring boundary if needed. Receipt is recorded in the FRTX reorder machinery (Section 6.2) augmented with the packet's start, end, and FIN bit per slot. When a packet's [start, end) front-overlaps bytes already at or below the byte high-water mark, the overlap is trimmed before placement so the same byte is never written twice. After stashing, the receiver advances lwe and the byte high-water mark across any newly-contiguous prefix. Each slot advanced MUST satisfy start == the last-delivered slot's end; a slot whose start does not equal that end is silently dropped at delivery time (the seqno is consumed, no stream bytes contributed) and the high-water mark does not advance past it. The byte stream stalls at that point - there is no flow teardown on mismatch. This filters spliced or off-path-injected slots that fall in window, without relying on strong cryptographic authentication.
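
The placement step (front-overlap trim followed by a two-segment memcpy) can be sketched as below. This is not the FRCT code: struct rx_ring and ring_place are hypothetical, the high-water mark is taken to be the exclusive end of the contiguous received prefix, ring_sz is assumed to be a power of two so that start % ring_sz stays consistent across 32-bit wrap, and the flow-control clamp of Section 16.4 is assumed to keep the packet inside the ring.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-flow receive ring. */
struct rx_ring {
        uint8_t *buf;     /* ring_sz octets of storage          */
        uint32_t ring_sz; /* assumed to be a power of two       */
        uint32_t hwm;     /* end of contiguous prefix, exclusive */
};

/* Place [start, end) into the ring; returns the octets written. */
static uint32_t ring_place(struct rx_ring *r,
                           uint32_t        start,
                           uint32_t        end,
                           const uint8_t  *data)
{
        uint32_t len = end - start;
        uint32_t pos;
        uint32_t first;

        /* Trim bytes that lie below the high-water mark. */
        if ((int32_t) (r->hwm - start) > 0) {
                uint32_t skip = r->hwm - start;

                if (skip >= len)
                        return 0; /* nothing new in this packet */

                data  += skip;
                start += skip;
                len   -= skip;
        }

        assert(len <= r->ring_sz);

        pos   = start % r->ring_sz;
        first = r->ring_sz - pos; /* room before the ring wraps */
        if (first > len)
                first = len;

        memcpy(r->buf + pos, data, first);         /* up to the boundary */
        memcpy(r->buf, data + first, len - first); /* wrapped remainder  */

        return len;
}
```

The sketch deliberately does not move hwm: advancing the high-water mark happens separately, when lwe moves across the newly-contiguous prefix.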

A FIN slot marks end-of-stream at advance time only if its byte position equals the last-delivered slot's end; otherwise the FIN is ignored and the corresponding seqno occupies a slot but contributes no stream bytes. No packet buffer is held after the ring copy.
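
The delivery-time checks of this section (start must equal the prior end; a FIN counts only at the final byte position and with an empty range) can be sketched as a single function. The types and names are hypothetical, and consuming the seqno itself is assumed to be done by the caller regardless of the result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Per-slot stream metadata, as recorded on receipt. */
struct slot {
        uint32_t start;
        uint32_t end;
        bool     fin;
};

/* Hypothetical delivery state of one stream flow. */
struct rx_state {
        uint32_t hwm; /* end of the last-delivered slot */
        bool     eos; /* end-of-stream observed         */
};

/*
 * Apply one slot at advance time. Returns true if the slot
 * contributed stream bytes or end-of-stream, false if it was
 * silently dropped.
 */
static bool slot_deliver(struct rx_state   *s,
                         const struct slot *sl)
{
        assert(s != NULL && sl != NULL);

        if (sl->start != s->hwm)
                return false; /* not contiguous: dropped */

        if (sl->fin) {
                if (sl->end != sl->start)
                        return false; /* FIN must be 0-byte */
                s->eos = true;
                return true;
        }

        s->hwm = sl->end; /* high-water mark advances */

        return true;
}
```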

16.3. Read

flow_read returns up to count octets from the contiguous prefix [next, high-water), where next is the byte position up to which the application has already consumed and high-water is the byte high-water mark, i.e. the end of the contiguous received prefix. When the stream is fully drained AND end-of-stream (EOS) was observed (next == EOS byte position), flow_read returns 0 (EOF) - the same shape POSIX read(2) uses on TCP after a peer FIN.
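
The read semantics can be sketched with a stand-in function. This is not the flow_read implementation: struct rx_stream and stream_read are hypothetical, ring_sz is assumed to be a power of two, and returning -1 when no data is available and no EOS was seen is a placeholder for whatever blocking or timeout behaviour the real call has.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h> /* ssize_t */

/* Hypothetical reader-side view of the receive ring. */
struct rx_stream {
        const uint8_t *buf;
        uint32_t       ring_sz; /* assumed power of two       */
        uint32_t       next;    /* first unconsumed byte      */
        uint32_t       hwm;     /* end of contiguous prefix   */
        bool           eos;     /* end-of-stream was observed */
        uint32_t       eos_pos; /* byte position of the EOS   */
};

static ssize_t stream_read(struct rx_stream *s,
                           uint8_t          *dst,
                           size_t            count)
{
        uint32_t avail = s->hwm - s->next;
        uint32_t n;
        uint32_t pos;
        uint32_t first;

        assert(s->ring_sz > 0);

        if (avail == 0) {
                if (s->eos && s->next == s->eos_pos)
                        return 0; /* drained and EOS seen: EOF */
                return -1;        /* placeholder: no data yet  */
        }

        n     = count < avail ? (uint32_t) count : avail;
        pos   = s->next % s->ring_sz;
        first = s->ring_sz - pos;
        if (first > n)
                first = n;

        memcpy(dst, s->buf + pos, first);       /* up to the boundary */
        memcpy(dst + first, s->buf, n - first); /* wrapped remainder  */

        s->next += n;

        return (ssize_t) n;
}
```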

16.4. Flow control

ACK / SACK / RACK / RTO machinery is unchanged; the FRTX reorder ring is reused as a per-seqno received-bitmap. Let per_pkt = (frag_mtu - base PCI - stream PCI extension), the maximum stream-byte payload one DATA packet can carry (Section 16.1). The receive window advertised in FC is clamped so the byte window (ring_sz) cannot be overrun: the seqno-space rwe is at most rcv_cr.lwe + ring_sz / per_pkt.
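
The clamp can be sketched in a few lines. The function name is hypothetical, rwe is assumed to be at or ahead of lwe in serial-number terms, and the non-shrink rule of Section 11 is not modelled here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Limit the advertised right window edge so that at most
 * ring_sz / per_pkt full-size packets fit beyond lwe.
 * Unsigned subtraction keeps the comparison wrap-safe.
 */
static uint32_t rwe_clamp(uint32_t lwe,
                          uint32_t rwe,
                          uint32_t ring_sz,
                          uint32_t per_pkt)
{
        uint32_t max_pkts;

        assert(per_pkt > 0);

        max_pkts = ring_sz / per_pkt;

        if (rwe - lwe > max_pkts)
                return lwe + max_pkts;

        return rwe;
}
```

For example, a 65536-octet ring with per_pkt = 1400 allows 46 packets of credit, so a requested window of 100 packets beyond lwe = 1000 is cut back to rwe = 1046.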

This is the QUIC byte-credit flow-control model (MAX_STREAM_DATA, RFC 9000 sec. 4.1 and sec. 19.10) projected onto seqno space. With one stream per flow there is no MAX_DATA / MAX_STREAM_DATA distinction. Receiver-side silly-window-syndrome (SWS) avoidance (RFC 9293 sec. 3.8.6.2.2) is achieved by combining the consume-time rwe bump with the global non-shrink rule from Section 11.

16.5. Security considerations

Threat model. An attacker that can observe (on-path passive) or predict (off-path blind) the flow's seqnos and byte offsets on an unencrypted stream flow can inject DATA or FIN at any in-window position. The in-line consistency checks above (start == prior end on advance; FIN MUST be 0-byte; FIN MUST sit at the final byte position) realise the spirit of RFC 5961's "sequence-window plus exact-position match for control bits" without an explicit challenge-ACK probe; they make a few specific blind attack shapes harder but are not cryptographic authentication. This is comparable to TCP without the TCP Authentication Option (TCP-AO, RFC 5925), tighter than a pre-RFC-5961 TCP stack, and roughly equivalent to a modern RFC 5961 stack against blind off-path injection - none of these help once the attacker can sniff. TLS over TCP (RFC 8446) encrypts only the TCP payload and leaves TCP seqnos, ACKs, FIN, and RST in the clear, so TLS does NOT defend against TCP-header-level injection; QUIC (RFC 9000) hides packet numbers under header protection (RFC 9001 sec. 5.4), so this specific weakness does not apply to QUIC.

Mitigation: AEAD. When the flow has encryption enabled the recommended AEAD ciphers (AES-GCM, RFC 5288; or ChaCha20-Poly1305, RFC 8439) wrap the entire FRCP packet on the wire - PCI, stream extension, body, and the CRC trailer when ber == 0 - under a per-flow symmetric key derived from the flow's own key exchange (Section 1.1). The AEAD tag (~2^-128 forgery probability) dominates the CRC (~2^-32) for integrity in this mode but the CRC trailer is currently retained inside the wrap (see Section 1.1). Implementations MUST NOT rely on the security properties below when a non-AEAD cipher (e.g. AES-CTR alone) is negotiated; non-AEAD modes provide confidentiality only and the threat-model claims do not hold.

With an AEAD cipher in use, seqnos, byte offsets, and the FIN bit are both authenticated and confidential. Against an off-path or on-path-passive attacker this is:

  • Stronger than TCP+TLS (TCP header in the clear).
  • Stronger than TCP+TCP-AO (header authenticated but visible).
  • Comparable to IPsec ESP transport mode (RFC 4303), which similarly authenticates and encrypts the upper-layer header plus payload, and to QUIC packet protection (RFC 9001 sec. 5), with the difference that QUIC must leave the destination connection ID in the clear for routing whereas FRCP relies on the IPCP below for delivery and can therefore encrypt its entire PCI.

Keying granularity. Ouroboros flow allocation runs key exchange (kex) per flow, so each flow_alloc yields independent symmetric keys. This is finer-grained than QUIC (per-connection, RFC 9001, where one handshake covers all multiplexed streams) and finer-grained than typical IPsec deployment (per-host-pair Security Associations, SAs). Forward secrecy follows from the kex when an ephemeral Diffie-Hellman exchange (DHE), or a hybrid mode (classical DH + post-quantum Key Encapsulation Mechanism / KEM), is selected.

Replay protection. The AEAD layer itself does NOT carry an explicit anti-replay window (unlike IPsec ESP, RFC 4303 sec. 3.4.3, or DTLS, RFC 9147 sec. 4.5.1). For FRCP-engaged flows the seqno-space duplicate-suppression in Section 6.2 rejects replayed DATA after the AEAD strips the wrap, because the AEAD authenticates the seqno and a replay re-presents an old seqno that is then discarded either as a duplicate (still inside the receive window) or as outside the receive window, depending on how far lwe has advanced since the original packet was delivered. RAW (qos.service == SVC_RAW) flows have no FRCP layer and therefore no replay protection at the AEAD layer either; deployments that need replay rejection on RAW flows SHOULD use SVC_MESSAGE.
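
The two ways a replayed seqno is discarded can be sketched as a classifier over the receive window. This is a simplified stand-in for the Section 6.2 logic, not a copy of it: the types and names are hypothetical, the RQ_SIZE value of 64 is chosen only for the example, and a plain boolean array replaces the real reorder ring.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RQ_SIZE 64u /* illustrative value only */

enum rx_verdict { RX_ACCEPT, RX_DUPLICATE, RX_OUT_OF_WINDOW };

/* Hypothetical receive window with a per-seqno received map. */
struct rx_win {
        uint32_t lwe;           /* left window edge             */
        uint32_t rwe;           /* right window edge, exclusive */
        bool     seen[RQ_SIZE]; /* indexed by seqno % RQ_SIZE   */
};

/* Classify an authenticated seqno after the AEAD wrap is removed. */
static enum rx_verdict rx_classify(struct rx_win *w,
                                   uint32_t       seqno)
{
        uint32_t off;

        assert(w != NULL);

        /* A seqno below lwe wraps to a large offset. */
        off = seqno - w->lwe;

        if (off >= w->rwe - w->lwe || off >= RQ_SIZE)
                return RX_OUT_OF_WINDOW;

        if (w->seen[seqno % RQ_SIZE])
                return RX_DUPLICATE; /* replay inside the window */

        w->seen[seqno % RQ_SIZE] = true;

        return RX_ACCEPT;
}
```

A replay of a packet that was delivered long ago lands below lwe and is rejected as out of window; a replay of a packet still held in the ring is rejected as a duplicate.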

Layering. The AEAD wrap sits below FRCP on the data path, so RAW best-effort flows (qos.service == SVC_RAW, the UDP-equivalent service of Section 2.2) inherit the same per-flow integrity + confidentiality scope as FRCP-engaged flows - whatever the process and FRCP (if any) put on the wire is what the AEAD authenticates. No DTLS-equivalent layering is required for confidentiality and integrity; replay protection above AEAD is a separate concern as noted above.

17. References

This section lists the IETF documents, published works, and source-code references cited inline elsewhere in this document. IETF documents are cited inline as "RFC NNNN sec. X.Y"; books, journal papers, and source-code references are cited inline by author and year (or by file and function name) and are listed here for convenience.


17.1. IETF documents

[RFC 791]
J. Postel, "Internet Protocol", STD 5, RFC 791, September 1981.
[RFC 793]
J. Postel, "Transmission Control Protocol", STD 7, RFC 793, September 1981. Obsoleted by RFC 9293.
[RFC 813]
D. D. Clark, "Window and Acknowledgement Strategy in TCP", RFC 813, July 1982.
[RFC 896]
J. Nagle, "Congestion Control in IP/TCP Internetworks", RFC 896, January 1984.
[RFC 1122]
R. Braden (ed.), "Requirements for Internet Hosts -- Communication Layers", STD 3, RFC 1122, October 1989.
[RFC 2018]
M. Mathis, J. Mahdavi, S. Floyd, A. Romanow, "TCP Selective Acknowledgment Options", RFC 2018, October 1996.
[RFC 2119]
S. Bradner, "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC 2883]
S. Floyd, J. Mahdavi, M. Mathis, M. Podolsky, "An Extension to the Selective Acknowledgement (SACK) Option for TCP", RFC 2883, July 2000.
[RFC 3758]
R. Stewart, M. Ramalho, Q. Xie, M. Tuexen, P. Conrad, "Stream Control Transmission Protocol (SCTP) Partial Reliability Extension", RFC 3758, May 2004.
[RFC 3828]
L-A. Larzon, M. Degermark, S. Pink, L-E. Jonsson (ed.), G. Fairhurst (ed.), "The Lightweight User Datagram Protocol (UDP-Lite)", RFC 3828, July 2004.
[RFC 4303]
S. Kent, "IP Encapsulating Security Payload (ESP)", RFC 4303, December 2005.
[RFC 4340]
E. Kohler, M. Handley, S. Floyd, "Datagram Congestion Control Protocol (DCCP)", RFC 4340, March 2006.
[RFC 5288]
J. Salowey, A. Choudhury, D. McGrew, "AES Galois Counter Mode (GCM) Cipher Suites for TLS", RFC 5288, August 2008.
[RFC 5595]
G. Fairhurst, "The Datagram Congestion Control Protocol (DCCP) Service Codes", RFC 5595, September 2009.
[RFC 5681]
M. Allman, V. Paxson, E. Blanton, "TCP Congestion Control", RFC 5681, September 2009.
[RFC 5925]
J. Touch, A. Mankin, R. Bonica, "The TCP Authentication Option", RFC 5925, June 2010.
[RFC 5961]
A. Ramaiah, R. Stewart, M. Dalal, "Improving TCP's Robustness to Blind In-Window Attacks", RFC 5961, August 2010.
[RFC 6298]
V. Paxson, M. Allman, J. Chu, M. Sargent, "Computing TCP's Retransmission Timer", RFC 6298, June 2011.
[RFC 6528]
F. Gont, S. Bellovin, "Defending against Sequence Number Attacks", RFC 6528, February 2012. Obsoletes RFC 1948.
[RFC 6582]
T. Henderson, S. Floyd, A. Gurtov, Y. Nishida, "The NewReno Modification to TCP's Fast Recovery Algorithm", RFC 6582, April 2012.
[RFC 7323]
D. Borman, B. Braden, V. Jacobson, R. Scheffenegger (ed.), "TCP Extensions for High Performance", RFC 7323, September 2014.
[RFC 7675]
M. Perumal, D. Wing, R. Ravindranath, T. Reddy, M. Thomson, "Session Traversal Utilities for NAT (STUN) Usage for Consent Freshness", RFC 7675, October 2015.
[RFC 8174]
B. Leiba, "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, May 2017.
[RFC 8200]
S. Deering, R. Hinden, "Internet Protocol, Version 6 (IPv6) Specification", STD 86, RFC 8200, July 2017.
[RFC 8439]
Y. Nir, A. Langley, "ChaCha20 and Poly1305 for IETF Protocols", RFC 8439, June 2018.
[RFC 8446]
E. Rescorla, "The Transport Layer Security (TLS) Protocol Version 1.3", RFC 8446, August 2018.
[RFC 8985]
Y. Cheng, N. Cardwell, N. Dukkipati, P. Jha, "The RACK-TLP Loss Detection Algorithm for TCP", RFC 8985, February 2021.
[RFC 9000]
J. Iyengar (ed.), M. Thomson (ed.), "QUIC: A UDP-Based Multiplexed and Secure Transport", RFC 9000, May 2021.
[RFC 9001]
M. Thomson (ed.), S. Turner (ed.), "Using TLS to Secure QUIC", RFC 9001, May 2021.
[RFC 9002]
J. Iyengar (ed.), I. Swett (ed.), "QUIC Loss Detection and Congestion Control", RFC 9002, May 2021.
[RFC 9147]
E. Rescorla, H. Tschofenig, N. Modadugu, "The Datagram Transport Layer Security (DTLS) Protocol Version 1.3", RFC 9147, April 2022.
[RFC 9260]
R. Stewart, M. Tuexen, K. Nielsen, "Stream Control Transmission Protocol", RFC 9260, June 2022. Obsoletes RFC 4960.
[RFC 9293]
W. Eddy (ed.), "Transmission Control Protocol (TCP)", STD 7, RFC 9293, August 2022. Obsoletes RFC 793 and several follow-ons; updates RFC 1122 and others.


17.2. Books and journal papers

[Day08]
J. Day, "Patterns in Network Architecture: A Return to Fundamentals", Prentice Hall, 2008.
[Grasa15]
E. Grasa et al., "IRATI: investigating RINA as an alternative to TCP/IP", Computer Networks, Vol. 92, December 2015.
[KP87]
P. Karn, C. Partridge, "Improving Round-Trip Time Estimates in Reliable Transport Protocols", ACM SIGCOMM, August 1987.
[Wat81]
R. W. Watson, "Timer-Based Mechanisms in Reliable Transport Protocol Connection Management", Computer Networks, Vol. 5, 1981.


17.3. Source-code references

[Linux-RTT]
tcp_rtt_estimator() in net/ipv4/tcp_input.c of the Linux kernel, defining the asymmetric mdev variance update used as FRCP's default RTT estimator (Section 12). Line-stable browseable copy at https://elixir.bootlin.com/linux/latest/source/net/ipv4/tcp_input.c.