Flow and Retransmission Control Protocol

From Ouroboros
Jump to navigation Jump to search

The Flow and Retransmission Control Protocol is, together with Ouroboros Data Transfer Protocol and Ouroboros Flow Allocation Protocol one of the three core protocols in Ouroboros.

Header

Packet switched networks use end-to-end protocols to deal with lost or corrupted packets. The Ouroboros End-to-End protocol (called the Flow and Retransmission Control Protocol, FRCP) has 4 fields:

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |            Flags              |            Window             |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                        Sequence Number                        |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                    Acknowledgment Number                      |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Flags

There are currently 7 flags defined for FRCP.

  1. DRF : Data Run Flag, indicates that there are no unacknowledged packets in flight for this connection.
  2. DATA: Indicates that the packet is carrying data (this allows for 0 length data).
  3. ACK : Indicates that this packet carries an acknowledgment.
  4. FC : Indicates that this packet updates the flow control window.
  5. RDVZ: Rendez-vous, this is used to break a zero-window deadlock that can arise when an update to the flow control window gets lost. RDVZ packets must be ACK’d.
  6. FFGM: First Fragment, this packet contains the first fragment of a fragmented payload.
  7. MFGM: More Fragments, this packet is not the last fragment of a fragmented payload.

Window

This updates the flow control window.

Sequence Number

A per-packet increasing sequence number used to acknowledge received packets, (re)order the packets at the receiver and indicate lost packets.

Acknowledgment Number

Set by the receiver to indicate the highest sequence number that has been received in order.

Operation

The operation of FRCP is based on the operation of the Delta-t protocol[1], which is a timer-based protocol with functionality equivalent to the ARQ and flow control functionalities in TCP. FRCP is configurable with a number of functions:

In-order delivery

FRCP is only enabled when needed (based on the requested application QoS). So for a UDP-like operation where packets don’t need to be delivered in order (or at all), Ouroboros doesn’t add an FRCP header. If FRCP is enabled, Ouroboros will track sequence numbers and deliver packets in-order.

The sender considers all packets as ACK’d. Since there are no unacknowledged packets, the Data Run Flag is set for all packets. The receiver tracks the highest received sequence number and drops all packets that have a lower sequence number. The receiver never really sends ACKs.

Reliable delivery

The Ouroboros receiver will keep track of a window of acceptable sequence numbers, indicated by the Left and Right Window Edges (LWE and RWE). The LWE is thus one greater than the highest received sequence number, and the receiver always acknowledges with LWE sequence number. An ACK for a sequence number thus means “I have received all previous sequence numbers”. Received packets with sequence numbers outside of the window are dropped. If a received packet has sequence number LWE, both window edges will be incremented until the LWE reaches a sequence number that has not been received yet. All the packets that are in the reordering buffer with a sequence number lower than the new LWE are delivered to the application. If a received packet has a greater sequence number than LWE but is within the window, it is stored for reordering.

The reliable delivery has to deal with lost packets, duplicates,etc. Automated-repeat request handles this: if a packet is not acknowledged within a certain time-frame, it is retransmitted by the sender.

For reliable transmission in the presence of lost packets to work, three timers need to be bounded[2]. These timers define a “data run”. The state is uni-directional, so for bi-directional communication, each side has a sender record and a receiver record.

  1. MPL: The maximum packet lifetime. This is bound by the network below, using the TTL mechanism. It is approximate with the probability of a packet still arriving after MPL close to zero.
  2. R: The time after which a packet with a given sequence number may not be retransmitted anymore.
  3. A: The maximum time a receiver will wait before acknowledging a given sequence number for the first time.

It’s not so important when to exactly retransmit a packet, as long as there are no retransmissions beyond the R timer. Ouroboros – like TCP – estimates average round-trip time (sRTT) and its deviation based on ACKs. The retranmission timeout (RTO) is set as the sRTT + 2 dev, and packets are retransmitted after RTO expires, with exponential back-off. The sRTT is measured with microsecond accuracy and is the actual response time of the server application.

If the receiver doesn’t hear from the sender for 2MPL + R + A, it may discard its state. If at this point there are packets received beyond the LWE at the receiver, the communication has failed in an unrecoverable way and an error should be returned. From this point, only packets with DRF will be accepted and they will create a new receiver state.

If the sender hasn’t received an ACK within 2MPL + R + A, the data run has failed and the sender must stop sending. If the sender has not received an ACK in 3MPL + R + A, the state associated with this data can be discarded (failed or not). From this point, new data to send on the flow will initiate new sender state. This data must be sent with DRF set and can use a randomly chosen sequence number.

Currently, Ouroboros has one FRCP connection for a flow. In theory there could be multiple connections supporting a flow, but we haven’t really found a reason for it. In the implementation, the connection state is initialized with invalidated timers instead of thrown away and recreated. If a flow is deallocated by the application, care must be taken that all sent packets are acknowledged or all retransmissions timed out (so, wait for R timer to expire). Flow deallocation will also trigger an ACK for the RWE from the receiver (ACKing all packets that can possibly be in flight, it doesn’t care anymore if it receives more packets). To be safe, i.e. be sure there are no packets in the network anymore, the state associated with the flow (i.e. the PORT ID assigned by the IRMd, and by extension the EID in the DT protocol) should only be reused after the sender and receiver state fully timed out.

Flow control

Flow control works for both reliable and unreliable modes of FRCT. If flow control is enabled, the receiver will notify the sender of its Right Window Edge, and the sender keeps track of it. If flow control is disabled, the sender will just keep sending and received packets with sequence numbers outside of the receiver window get dropped.

The unreliable mode with flow control can stall on a when an update to the flow control window gets lost and the sender has reached the RWE. If the sender has new data to send, it will send a packet with a Rendez-Vous (RDVZ) bit set. RDVZ packets must always be acknowledged (so they can be retransmitted). This requires a backoff mechanism. Note that the rendez-vous mechanism is just a way of being ‘nice’; it’s not really needed, since the request was for an unreliable flow and there is no delivery guarantee.

Fragmentation

The last mechanism in FRCP is fragmentation. Messages that are too large to be transmitted on the supporting flow are split up in different packets, called “fragments”. These are marked with two bits. First fragment (FFGM), and More fragments (MFGM). A message that fits in a single packet has the FFGM | MFGM bits set to “10”. If a message is fragmented, it will have a sequence of packets with the bits set to “11” for the first fragment, “01” for intermediate fragments and “00” for the last fragment. Single-bit fragmentation (e.g. only a MFGM bit) is more minimalistic , but it discards two consecutive messages if the last fragment of the first message is lost. This is just being ‘nice’ at little cost.

Implementation

The implementation of this protocol is part of the application library.

  • In-order delivery: Implemented
  • Duplicate detection: Implemented
  • ARQ: Proof-of-concept implemented, timerwheel needs to be moved from application to shared memory so IRMd can schedule retransmissions
  • Flow control: Proof-of-concept implemented
  • Fragmentation: not yet implemented

References