Definitions
Definitions: Flow, Routing, and Forwarding
This document contains formal definitions used in the Ouroboros architecture. This section details a model and deals with (abstract) programs, it does not specify an implementation, and it should not be directly interpreted with respect to any particular implementation of a computing system or environment[1]. The definitions in this section may differ (even ever so slightly) from previously held notions of the term.
Graphs and Networks
Basic Graph Definitions
A graph is a pair $G = (V, E)$ consisting of a set of vertices $V$ and a set of edges $E \subset V^2$ of two-element subsets of $V$ Template:Cite. An edge $e = \{v_i,v_j\}$ has (distinct) end vertices $v_i$ and $v_j$.
A directed graph or digraph is a pair $G = (V, A)$ consisting of a set $V$ of vertices and a set $A$ of ordered pairs $(v_i, v_j)$ (called arcs) where $v_i \neq v_j$.
A network is a digraph $G = (V,A)$ on which a mapping $w: A \rightarrow \mathbb{R}$ from the edgeset to the reals is defined; the number $w(a)$ is called the weight of the arc $a$[2].
If $a = (v_i,v_j) \in A$, then $v_i$ is adjacent to $v_j$ and incident with $a$. The set of neighbors $N$ of a vertex $v$ is the set of vertices that are adjacent to $v$, $N(v) = \{u \in V: (v, u) \in A\}$.
Walks, Paths, and Cycles
A walk is a sequence of vertices $\mathcal{W} = (v_0, v_1, \ldots , v_n)$ so that $a_i = (v_{i-1},v_i) \in A , i = 1, \ldots, n$. So each walk also implies a sequence of edges. We define the weight of a walk $\mathcal{W}$ as the sum of the weights of its arcs, $w(\mathcal{W}) = \sum_{i=1}^nw(a_i)$.
If the source $v_0$ and destination $v_n$ are the same ($v_0 = v_n$), the walk is closed.
A walk where each of the arcs $a_i$ is distinct is called a trail.
A path is a trail for which each of the $v_i$ are distinct.
A closed trail for which the $v_i$ are distinct is called a cycle.
A directed acyclic graph (DAG) is a digraph that does not contain any cycles.
If for every pair of vertices $\{v_s, v_d\} \in V^2$ there is at least one path $\mathcal{P} = (v_s, \dots, v_d)$, we call the (di)graph connected.
Distance and Metrics
The distance function defined by the weight of the shortest path between two vertices in a network (or $\infty$ if no such path exists) is called the geodesic distance.
A metric is a (distance) function $d: X^2 \rightarrow [0, {+}\infty[$ fulfilling positivity, symmetry and triangle inequality:
$$\forall x,y,z \in X: d(x, y) \geq 0 \land d(x,y) = d(y,x) \land d(x,z) \leq d(x,y) + d(y,z)$$
A geodesic distance is in general not a metric since it doesn't always fulfil the positivity and symmetry requirements in a digraph.
Routing and Forwarding
Routing is broadly defined as the process of selecting a path for traffic in a network. In computer science literature, there are two main groups of approaches to routing.
On the one hand there is the hierarchical routing solution. This is the approach taken in IP networks, where a set of subnetworks is defined using prefixes or subnet masks Template:Cite. A scalability issue with IP stems from not following the partial ordering implied by the subnetting in the delegation of IP addresses, causing fragmentation of the IP address space Template:Cite and making prefix aggregration in routers increasingly inefficient[3].
On the other hand we have geographic or geometric routing Template:Cite, where each node is assigned a coordinate so next hops can be calculated making use of the coordinates of the direct neighbors[4].
The examples above illustrate that the concept of routing encompasses both the dissemination and gathering of information about the network, and the algorithms for calculation and selection of the paths. We will now define the concepts underpinning "routing" more formally. As far as we know the definitions are original, although the reasoning behind it is similar to the reasoning commonly used in formulating (integer) linear programming solutions to problems in graph theory.
Formal Definition of Routing
Let $G = (V, A)$ be a network.
ROUTING is any algorithm that, given source and destination vertices $v_s$ and $v_d$, for each vertex $v$ in $V$ returns a subset $H(v) \subseteq N(v)$ of neighbors with the associated set of arcs $L(v) = \{(v, u) \in A: u \in H(v))\}$, so that the following 4 conditions are met:
- The graph $D = (V', A') = (\cup_{v \in V}(H(v)), \cup_{v \in V}(L(v)))$ is a directed acyclic subgraph of $G$;
- $v_s$ is the only vertex in $D$ with only outgoing arcs;
- $v_d$ is the only vertex in $D$ with only incoming arcs; and
- $A' = \emptyset$ if and only if there is no path $\mathcal{P}$ between $v_s$ and $v_d$ in $G$.
Equivalently[5], ROUTING is any algorithm that, given source and destination vertices $v_s$ and $v_d$, for each vertex $v$ in $V$ returns a subset $H(v) \subseteq N(v)$ of neighbors so that:
- $\forall{u} \in H(v): d(u, v_d) < d(v, v_d)$ for any chosen distance function $d$ on $G$; and
- $\cup_{v \in V}(H(v)) = \emptyset$ if and only if there is no path $\mathcal{P}$ between $v_s$ and $v_d$ in $G$.
In other words, either the distance function bounds the routing solution, or the routing solution bounds the distance function.
Formal Definition of Forwarding
We define FORWARDING as any algorithm that, for each vertex $v$ in $V$ returns the set of arcs $L(v)$. FORWARDING is often implemented as ROUTING with an additional edge selection step.
The necessary and sufficient condition for ROUTING is full knowledge of the graph $G = (V, A)$ and a valid geodesic distance $d$. For a given vertex $v$, the necessary and sufficient set of information to obtain $L(v)$ is knowledge of its neighbor set $N(v)$ and the subset of the geodesic distances originating at its neighbors, $d_{v} \subset d: N({v)} \times V \rightarrow \mathbb{R}$.
In less formal terms, ROUTING and FORWARDING provide a set of vertices and arcs, respectively, so that there are never loops if one travels to a next vertex or along an arc in the set. If implemented in a centralized way, ROUTING and FORWARDING roughly need to know the full network. When implemented in a distributed way, a node roughly needs to know its neighbors and the distances to the destination from itself and from all its neighbors.
The definitions above show what information needs to be disseminated in a network to allow FORWARDING. Let's assume that vertices know their neighbors or incident outgoing arcs, then what is needed is a dissemination procedure for the (geodesic) distance function $d$. This is implemented in a class of dissemination protocols, called link-state routing protocols.
Flow
A flow is the abstraction of a collection of resources within a network layer that allow bidirectional communications using packets between two processes that are clients of this layer. A flow enables a point-to-point packet delivery service and can be viewed as a bidirectional pipe that has a number of observable quantities associated with it that describe the probability $p(S,t, \ldots) \in [0, 1]$ of a packet of given size $S$ being transferred within a certain period of time $[t_x, t_x + t]$. The maximum probability for error-free transfer depends on the packet-drop-rate (PDR) and bit-error-rate (BER) of the flow.
The Ouroboros architecture ensures that flows are content neutral, i.e. the probability $p(S, t, \ldots)$ above is independent of the bits that make up the packets sent along a flow.
Flow Characteristics
The delay or latency describes how long packets take to traverse the flow, and the variation on the delay is called jitter, or more precisely, packet delay variation Template:Cite. The delay for a flow has four main components:
- Propagation delay
- Queuing delay
- Transmission delay
- Processing delay
Flow Bounds
There are 2 upper bounds:
- The Maximum Packet Lifetime (MPL) is the maximum amount of time that a packet can take to transfer over the flow
- The Maximum Packet Size (MPS) is the maximum length for a packet to be acceptable for transfer
In other words, the probability for a packet to arrive after MPL expires should be 0[6], and the probability for a packet to arrive that exceeds the MPS is equal to 0.
Similarly, there can be lower bounds such as Minimum Packet Lifetime (mPL) and Minimum Packet Size (mPS).
Flow Resources
The resources that make up a layer are finite, limiting the total number of packets that can traverse the network layer in a given period of time. Flows provide the mechanism to put a network layer fully in control of its own resources. The resources that constitute the flow can either be shared with other flows or dedicated (reserved) for this particular flow.
Other externally measureable quantities can be associated with a flow, such as bandwidth and load for flows with reserved resources. The probability function may depend on these quantities (e.g. if the load reaches a certain threshold, delay could increase).
Flow Identifiers
A flow endpoint is identified in a system by a flow ID (FID), which defines the layer boundary[7]. For security reasons, a process has no direct access to the flow, but rather accesses the flow through a Flow Descriptor (FD) to read and write from a flow. Flow identifiers are unique within the scope of a system, flow descriptors are unique within the scope of a process[8].
Flows are an important concept for enabling Quality-of-Service (QoS) in a layer.
References
- ↑ ...but approached with a blank mind, consciously refusing to try to link it with what is already familiar... Template:Cite.
- ↑ The weight of an edge is often called a metric in computer science literature. To avoid confusion, we will use the term metric only in its mathematical meaning.
- ↑ Addressing can follow topology or topology can follow addressing. Choose one. -- Rekhter's Law Template:Cite.
- ↑ Geographic coordinates are a compound name, consisting of latitudes and longitudes, but there is no order implied between these two coordinate spaces. Hence the partial ordering in our definition of an address.
- ↑ Choose for $d$ the length of the longest path between the corresponding vertices in $D$.
- ↑ This may be hard to guarantee with 100% certainty, so MPL should be "large enough" so that this probability is 0 in practice. IP has a bound on the Maximum Datagram Lifetime (MDL) via the Time-To-Live or Hop Limit decrement in each router to a maximum of 255 seconds Template:Cite, with a recommended value of 64 seconds Template:Cite. In addition, TCP defines the Maximum Segment Lifetime (MSL) as 120s Template:Cite.
- ↑ In this respect, a flow id is similar to an OSI Service Access Point (SAP) or RINA port id.
- ↑ This is similar in function to a UNIX file descriptor. A UNIX kernel space implementation of Ouroboros would probably use file descriptors for flow descriptors.