Diffstat (limited to 'content')
-rw-r--r-- | content/en/_index.html | 16
-rw-r--r-- | content/en/about/_index.html | 4
-rw-r--r-- | content/en/docs/Concepts/fa.md | 52
-rw-r--r-- | content/en/docs/Concepts/protocols.md | 218
-rw-r--r-- | content/en/docs/Concepts/what.md | 2
5 files changed, 218 insertions, 74 deletions
diff --git a/content/en/_index.html b/content/en/_index.html index 1c74cef..b6071e6 100644 --- a/content/en/_index.html +++ b/content/en/_index.html @@ -22,14 +22,14 @@ linkTitle = "Ouroboros" </div> {{< /blocks/cover >}} -{{% blocks/lead color="secondary" %}} -Ouroboros is a <b>peer-to-peer transport network</b> built on a new -<b>recursive network paradigm</b> according to a <b>UNIX design +{{% blocks/lead color="secondary" %}} Ouroboros is a <b>peer-to-peer +transport network prototype<b> inspired by a <b>recursive network +paradigm</b> and implemented according to a <b>UNIX design philosophy</b>. The aim is to provide a <b>secure and private networking</b> experience and to provide a simple API for writing -distributed software and networked application -libraries. Ouroboros provides a very compact API for -both <b>unicast and multicast</b> communications. All protocols -carry <b>minimal header information</b>, with -easy-to-enable <b>encryption</b>. +distributed software and networked application libraries. Ouroboros +provides a very <b>compact API<b> support +both <b>unicast<b> <b>multicast</b> communications. All protocols +carry <b>minimal header information</b>, with asy-to-enable +<b>encryption</b>. {{% /blocks/lead %}} diff --git a/content/en/about/_index.html b/content/en/about/_index.html index dc1b5fb..b83a688 100644 --- a/content/en/about/_index.html +++ b/content/en/about/_index.html @@ -17,11 +17,11 @@ menu: {{% blocks/lead %}} Ouroboros stems from our deep interest in computer networks, tackling -some long standing problems in IP networks such as achieving clean +some long standing problems in TCP/IP networks such as achieving clean fragmentation, routing scalability, efficient congestion control and simple multicast. 
Instead of trying to tackle these issues in isolation, we subscribe to the view that the entire TCP/IP design is -*fundamentally* broken, and a holistic approach is needed to build +*fundamentally* broken, and a different approach is needed to build efficient packet networks. {{% /blocks/lead %}} diff --git a/content/en/docs/Concepts/fa.md b/content/en/docs/Concepts/fa.md index e9eb9dd..d91cc00 100644 --- a/content/en/docs/Concepts/fa.md +++ b/content/en/docs/Concepts/fa.md @@ -11,16 +11,17 @@ description: > Arguably the most important concept to grasp in Ouroboros is flow allocation.[^1] It is the process by which a pair of programs agree to -start sending and receiving data. A flow is always unicast, thus -between a source program and a destination program, and is always -established from the source. Flows are provided by unicast layers, and -the endpoints of the flows are accessible for reading and writing by -the requesting processes using an identifier called a _flow -descriptor_. Think of a file descriptor but just for Ouroboros flows. -Maybe one important thing to keep in mind: in Ouroboros terminology, a -flow does not imply ordering or reliable transfer. It just denotes the -network resources inside a layer that are needed for forwarding -packets from a source to a destination in a best effort way. +start sending and receiving data, and the interface to the network. A +flow is always unicast, thus between a source program and a +destination program, and is always established from the source. Flows +are provided by unicast layers, and the endpoints of the flows are +accessible for reading and writing by the requesting processes using +an identifier called a _flow descriptor_. Think of a file descriptor +but just for Ouroboros flows. Maybe one important thing to keep in +mind: in Ouroboros terminology, a flow does not imply ordering or +reliable transfer. 
It just denotes the network resources inside a +layer that are needed for forwarding packets from a source to a +destination in a best effort way. {{<figure width="60%" src="/docs/concepts/fa_1.jpg">}} @@ -74,18 +75,21 @@ protocol. The third subcomponent in the IPCP that is relevant here -- the most important one -- is the Flow Allocator (FA). This component is -responsible for implementing the requested flows, in our case between -"client" and "server". It needs to establish some shared state between -the two endpoints. A (bidirectional) flow is fully identified in a -layer by a 4-tuple (A1,X,A2,Y) containing two addresses and two EIDs, -in our example A1=720 and A2=1000). This 4-tuple needs to be known at -both endpoints to identify where to send the packets it receives from -the higher-layer application (the client), and to deliver packets that -it reads from a lower layer flow. The flow allocation protocol is -responsible to send this information. It is a request-response -protocol. The flow allocator is identified by the DT component as EID -0. So, all packets in the layer with DT header __DST:0__ are delivered -to the flow allocator inside the destintation IPCP. +responsible for implementing and managing requested flows. It is also +responsible for congestion control. + +For establishing a flow, it needs to establish some shared state +between the two endpoints. A (bidirectional) flow is fully identified +in a layer by a 4-tuple (A1,X,A2,Y) containing two addresses and two +EIDs, in our example A1=720 and A2=1000). This 4-tuple needs to be +known at both endpoints to identify where to send the packets it +receives from the higher-layer application (the client), and to +deliver packets that it reads from a lower layer flow. The flow +allocation protocol is responsible to send this information. It is a +request-response protocol. The flow allocator is identified by the DT +component as EID 0. 
So, all packets in the layer with DT header +__DST:0__ are delivered to the flow allocator inside the destintation +IPCP. When the source FA in IPCP 1 receives a request for a flow to "server", it will query its DIR for _d197782_ and receive 1000 as the @@ -198,11 +202,11 @@ The translation of the header is an O(1) lookup on the send side, and a nop on the receiver side (since FD == EID and it's passed in the packet). -[^1]: This concept is also present in RINA, but there are differences. This only applies to Ouroboros. +[^1]: This concept is also present in RINA, but there are differences. This text only applies to Ouroboros. [^2]: This is a recursive network, adjancencies in layer N are implemented as flows in layer N - 1. -[^3]: If there is one DT, it is what is usually considered a "flat" address. More complex addressing schemes are accomplished by having more of these DT components inside one IPCP. But this would lead us too far. +[^3]: If there is one DT, it is what is usually considered a "flat" address. More complex addressing schemes are accomplished by having more of these DT components inside one IPCP. But this would lead us too far. It is described in more detail in the paper. [^4]: I will explain QoS in a different post. diff --git a/content/en/docs/Concepts/protocols.md b/content/en/docs/Concepts/protocols.md index 6e06087..fa6a3d0 100644 --- a/content/en/docs/Concepts/protocols.md +++ b/content/en/docs/Concepts/protocols.md @@ -13,7 +13,8 @@ description: > # Network protocol As Ouroboros tries to preserve privacy as much as possible, it has an -*absolutely minimal network protocol*: +*absolutely minimal network protocol*. 
The field widths are not that +important: ``` 0 1 2 3 @@ -25,8 +26,8 @@ As Ouroboros tries to preserve privacy as much as possible, it has an +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Time-to-Live | QoS | ECN | EID | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - | EID | - +-+-+-+-+-+-+-+-+ + | EID + + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ``` The 5 fields in the Ouroboros network protocol are: @@ -37,11 +38,13 @@ The 5 fields in the Ouroboros network protocol are: default is 64 bits. Note that there is _no source address_, this is agreed upon during _flow allocation_. -* **Time-to-Live**: Similar to IPv4 and IPv6 (where this field is called - Hop Limit), this is decremented at each hop to ensures that packets - don't get forwarded forever in the network, for instance due to - (transient) loops in the forwarding path. The Ouroboros default for - the width is one octet (byte). +* **Time-to-Live**: Similar to IPv4 (in IPv6 this field is replaced by + the Hop Limit), this is decremented at each hop to ensures that + packets don't get forwarded forever in the network, for instance due + to (transient) loops in the forwarding path. The Ouroboros default + for the width is one octet (byte), limiting the Maximum Packet + Lifetime in the network to 255 seconds. The initial TTL value for a + flow can be based on the maximum delay requested by the application. * **QoS**: Ouroboros supports Quality of Service via a number of methods (out of scope for this page), and this field is used to prioritize @@ -58,20 +61,21 @@ The 5 fields in the Ouroboros network protocol are: as packets are queued deeper and deeper in a congested routers' forwarding queues. Ouroboros enforces Forward ECN (FECN). -* **EID**: The Endpoint Identifier (EID) field specified the endpoint for - which to deliver the packet. The width of this field is configurable - (the figure shows 16 bits). 
The values of this field is chosen by - the endpoints, usually at _flow allocation_. It can be thought of as - similar to an ephemeral port. However, in Ouroboros there is no - hardcoded or standardized mapping of an EID to an application. - -# Transport protocol +* **EID**: The Endpoint Identifier (EID) field specified the endpoint + for which to deliver the packet. The width of this field is + configurable (the figure shows 16 bits). The values of this field is + chosen by the endpoints, usually at _flow allocation_. It can be + thought of as similar to an ephemeral port. However, in Ouroboros + there is no hardcoded or standardized mapping of an EID to an + application. For security, this field should be sufficiently large. + For efficiency, it should be easy to map to a flow descriptor at + the endpoints. -Packet switched networks use transport protocols on top of their -network protocol in order to deal with lost or corrupted packets. +# Flow and retransmission control protocol (FRCP) -The Ouroboros Transport protocol (called the _Flow and Retransmission -Control Protocol_, FRCP) has only 4 fields: +Packet switched networks use end-to-end protocols to deal with lost or +corrupted packets. The Ouroboros End-to-End protocol (called the _Flow +and Retransmission Control Protocol_, FRCP) has 4 fields: ``` 0 1 2 3 @@ -88,36 +92,172 @@ Control Protocol_, FRCP) has only 4 fields: * **Flags**: There are 7 flags defined for FRCP. + - **DRF** : Data Run Flag, indicates that there are no unacknowledged + packets in flight for this connection. + - **DATA**: Indicates that the packet is carrying data (this allows for 0 length data). - - **DRF** : Data Run Flag, indicates that there are no unacknowledged - packets in flight for this connection. - - **ACK** : Indicates that this packet carries an acknowledgment. + - **FC** : Indicates that this packet updates the flow control window. 
+ - **RDVZ**: Rendez-vous, this is used to break a zero-window deadlock - that can arise when an update to the flow control window - gets lost. RDVZ packets must be ACK'd. + that can arise when an update to the flow control window + gets lost. RDVZ packets must be ACK'd. + - **FFGM**: First Fragment, this packet contains the first fragment of - a fragmented payload. + a fragmented payload. + - **MFGM**: More Fragments, this packet is not the last fragment of a - fragmented payload. + fragmented payload. * **Window**: This updates the flow control window. -* **Sequence Number**: This is a monotonically increasing sequence number - used to (re)order the packets at the receiver. +* **Sequence Number**, a monotonically increasing sequence number + used to (re)order the packets at the receiver. + +* **Acknowledgment Number**, set by the receiver to indicate the + highest sequence number that has been + received in order. + +# Operation of FRCP + +The operation of FRCP is based on the +[Delta-t protocol](https://www.osti.gov/biblio/5542785-delta-protocol-specification-working-draft), +which is a timer-based protocol that is simpler in operation than the +equivalent ARQ and flow control functionalities in TCP. Watson's +[paper](https://doi.org/10.1016/0376-5075(81)90031-3) +is highly recommended reading; it is truly a thing of beauty. + +Before we proceed, a small note on what is meant by _reliability_ in +this discussion. We're going to use the following definition: _if a +piece of a communication is received, all previous pieces of this +communication will be received_. This means data can only be +delivered reliably if it is delivered in-order. + +FRCP is only enabled when needed (based on the requested application +QoS). So for a UDP-like operation where packets don't need to be +delivered in order (or at all), Ouroboros doesn't add an FRCP header. +If FRCP is enabled, Ouroboros will track sequence numbers and deliver +packets in-order. 
+ +Unreliable delivery: The sender considers all packets as ACK'd. Since +there are no unacknowledged packets, the Data Run Flag is set for all +packets. The receiver tracks the highest received sequence number and +drops all packets that have a lower sequence number. The receiver +never really sends ACKs. + +Reliable delivery: The Ouroboros receiver will keep track of a window +of acceptable sequence numbers, indicated by the Left and Right Window +Edges (LWE and RWE). The LWE is thus one greater than the highest +received sequence number, and the receiver always acknowledges with +LWE sequence number. An ACK for a sequence number thus means "I have +received all previous sequence numbers". Received packets with +sequence numbers outside of the window are dropped. If a received +packet has sequence number LWE, both window edges will be incremented +until the LWE reaches a sequence number that has not been received +yet. All the packets that are in the reordering buffer with a sequence +number lower than the new LWE are delivered to the application. If a +received packet has a greater sequence number than LWE but is within +the window, it is stored for reordering. + +The reliable delivery has to deal with lost packets, +duplicates,etc. Automated-repeat request handles this: if a packet is +not acknowledged within a certain time-frame, it is retransmitted by +the sender. + +For reliable transmission in the presence of lost packets to work, +three timers need to be bounded [^1]. These timers define a "data +run". The state is uni-directional, so for bi-directional +communication, each side has a sender record and a receiver record. + +* **MPL**: The maximum packet lifetime. This is bound by the network + below, using the TTL mechanism. It is approximate with the + probability of a packet still arriving after MPL close to + zero. + +* **R**: The time after which a packet with a given sequence number + may not be retransmitted anymore. 
+ +* **A**: The maximum time a receiver will wait before acknowledging a + given sequence number for the first time. + +It's not so important when to exactly retransmit a packet, as long as +there are no retransmissions beyond the R timer. Ouroboros -- like TCP +-- estimates average round-trip time (sRTT) and its deviation based on +ACKs. The retranmission timeout (RTO) is set as the sRTT + 2 dev, and +packets are retransmitted after RTO expires, with exponential +back-off. The sRTT is measured with microsecond accuracy[^2] and is +the actual response time of the server application. + +If the receiver doesn't hear from the sender for 2MPL + R + A, it may +discard its state. If at this point there are packets received beyond +the LWE at the receiver, the communication has failed in an +unrecoverable way and an error should be returned. From this point, +only packets with DRF will be accepted and they will create a new +receiver state. + +If the sender hasn't received an ACK within 2MPL + R + A, the data run +has failed and the sender must stop sending. If the sender has not +received an ACK in 3MPL + R + A, the state associated with this data +can be discarded (failed or not). From this point, new data to send on +the flow will initiate new sender state. This data must be sent with +DRF set and can use a randomly chosen sequence number. + +Currently, Ouroboros has one FRCP connection for a flow. In theory +there could be multiple connections supporting a flow, but we haven't +really found a reason for it. In the implementation, the connection +state is initialized with invalidated timers instead of thrown away +and recreated. If a flow is deallocated by the application, care must +be taken that all sent packets are acknowledged or all retransmissions +timed out (so, wait for R timer to expire). 
Flow deallocation will +also trigger an ACK for the RWE from the receiver (ACKing all packets +that can possibly be in flight, it doesn't care anymore if it receives +more packets). + +Flow control works for both reliable and unreliable modes of FRCT. If +flow control is enabled, the receiver will notify the sender of its +Right Window Edge, and the sender keeps track of it. If flow control +is disabled, the sender will just keep sending and received packets +with sequence numbers outside of the receiver window get dropped. + +The unreliable mode with flow control can stall on a when an update to +the flow control window gets lost and the sender has reached the +RWE. If the sender has new data to send, it will send a packet with a +Rendez-Vous (RDVZ) bit set. RDVZ packets must always be acknowledged +(so they can be retransmitted). This requires a backoff +mechanism. Note that the rendez-vous mechanism is just a way of being +'nice'; it's not really needed, since the request was for an +unreliable flow and there is no delivery guarantee. -* **Acknowledgment Number**: This is set by the receiver to indicate the - highest sequence number that was received in - order. +The last mechanism in FRCP is fragmentation. Messages that are too +large to be transmitted on the supporting flow are split up in +different packets, called "fragments"[^3]. These are marked with two +bits. First fragment (FFGM), and More fragments (MFGM). A message that +fits in a single packet has the FFGM | MFGM bits set to "10". If a +message is fragmented, it will have a sequence of packets with the +bits set to "11" for the first fragment, "01" for intermediate +fragments and "00" for the last fragment. Single-bit fragmentation +(e.g. only a MFGM bit) is more minimalistic , but it discards two +consecutive messages if the last fragment of the first message is +lost. This is just being 'nice' at little cost. 
-# Operation +We can't stress this enough: Ouroboros has this mechanism implemented +in the application. The (simple) logic is executed as part of the +read/write operations. **FRCP is in "the application", not in "the +network"**[^4]. If a packet is acknowledged, it is received by the remote +program, not possibly waiting in some buffer still to be delivered. If +an application crashes, it means all associated state at that endpoint +is gone and a new higher-level flow will need to be established. If a +program requests encryption, the entire FRCP header is encrypted. This +is probably the best course of action to protect against replay +attacks or other attacks based on guessing sequence numbers. Note that +the size of the sequence number space should be at least (2MPL + R + +A) * T, where T is the number of sequence numbers generated in a +certain unit of time. -The operation of the transport protocol is based on the [Delta-t -protocol](https://www.osti.gov/biblio/5542785-delta-protocol-specification-working-draft), -which is a timer-based protocol that is a bit simpler in operation -than the equivalent functionalities in TCP. In contrast with TCP/IP, -Ouroboros does congestion control purely in the network protocol, and -fragmentation and flow control purely in the transport protocol.
\ No newline at end of file +[^1]: This was proven by Watson in [Timer-Based Mechanisms in Reliable Transport Protocol Connection Management](https://doi.org/10.1016/0376-5075(81)90031-3). TCP also has these three timers bounded. +[^2]: Fast retransmit methods (retransmitting if a number of consecutive ACKs with the same sequence number are received) can still be useful. Underestimation of sRTT has little impact on throughput apart from possible unnecessary traffic duplication (the additional packets also update the RTT estimate). In Ouroboros, congestion avoidance is the responsability of the flow allocator. +[^3]: IPv4 and IPv6 fragmentation makes for some rather amusing reading. +[^4]: This doesn't mean it can't be implemented in hardware. diff --git a/content/en/docs/Concepts/what.md b/content/en/docs/Concepts/what.md index b0c6196..ac87754 100644 --- a/content/en/docs/Concepts/what.md +++ b/content/en/docs/Concepts/what.md @@ -15,7 +15,7 @@ was proposed, called the "__R__ecursive __I__nter__N__etwork __A__rchitecture", or [__RINA__](http://www.pouzinsociety.org). __Ouroboros__ follows the recursive principles of RINA, but deviates -quit a bit from its internal design. There are resources on the +quite a bit from its internal design. There are resources on the Internet explaining RINA, but here we will focus on its high level design and what is relevant for Ouroboros. |
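
Illustrative note on the fragmentation rules added in the protocols.md hunk above. This is a minimal sketch, not the Ouroboros implementation: the bit positions of FFGM and MFGM, the function names `fragment` and `reassemble`, and the `(flags, payload)` packet representation are all assumptions made for the example. It only encodes what the text states: a single-packet message carries FFGM | MFGM = "10", the first fragment "11", intermediate fragments "01", and the last fragment "00". It also assumes packets arrive in order, since FRCP reorders by sequence number before this step.

```python
# Hypothetical bit layout: the text gives the pair as "FFGM | MFGM",
# so FFGM is taken as the high bit and MFGM as the low bit.
FFGM = 0b10  # First Fragment
MFGM = 0b01  # More Fragments


def fragment(message: bytes, max_payload: int) -> list[tuple[int, bytes]]:
    """Split a message into (flags, payload) packets of at most max_payload bytes."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    # An empty message still produces one (empty) packet.
    chunks = [message[i:i + max_payload]
              for i in range(0, len(message), max_payload)] or [b""]
    last = len(chunks) - 1
    packets = []
    for i, chunk in enumerate(chunks):
        flags = 0
        if i == 0:
            flags |= FFGM  # marks the start of a message
        if i != last:
            flags |= MFGM  # more fragments of this message follow
        packets.append((flags, chunk))
    return packets


def reassemble(packets: list[tuple[int, bytes]]) -> list[bytes]:
    """Rebuild complete messages from in-order packets, dropping partial ones."""
    messages = []
    buf = None
    for flags, payload in packets:
        if flags & FFGM:
            # A first fragment always starts a fresh message; any partial
            # message still buffered (its tail was lost) is discarded.
            buf = bytearray()
        if buf is None:
            continue  # no first fragment seen yet: nothing to attach this to
        buf += payload
        if not flags & MFGM:
            messages.append(bytes(buf))
            buf = None
    return messages
```

With the two-bit scheme, losing the last fragment of one message costs only that message: the FFGM bit on the next packet marks a clean restart. For example, dropping the final fragment of `b"aaaaaa"` from `fragment(b"aaaaaa", 2) + fragment(b"bbbb", 2)` still lets `reassemble` return `[b"bbbb"]`. A real receiver would also use sequence numbers to detect the gap, which this sketch leaves out.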