From 84c6cf1980da5cb749657425b99419d80ffc0d15 Mon Sep 17 00:00:00 2001
From: Dimitri Staessens
Date: Wed, 7 Dec 2022 22:15:28 +0100
Subject: blog: Add post on loc-id split

---
 content/en/blog/20221207-loc-id-mobility-1.png | Bin 0 -> 403093 bytes
 content/en/blog/20221207-loc-id-mobility-2.png | Bin 0 -> 411592 bytes
 content/en/blog/20221207-loc-id-split.md       | 205 +++++++++++++++++++++++++
 content/en/blog/20221207-loc-id.png            | Bin 0 -> 81278 bytes
 4 files changed, 205 insertions(+)
 create mode 100644 content/en/blog/20221207-loc-id-mobility-1.png
 create mode 100644 content/en/blog/20221207-loc-id-mobility-2.png
 create mode 100644 content/en/blog/20221207-loc-id-split.md
 create mode 100644 content/en/blog/20221207-loc-id.png

diff --git a/content/en/blog/20221207-loc-id-mobility-1.png b/content/en/blog/20221207-loc-id-mobility-1.png
new file mode 100644
index 0000000..87bb04a
Binary files /dev/null and b/content/en/blog/20221207-loc-id-mobility-1.png differ
diff --git a/content/en/blog/20221207-loc-id-mobility-2.png b/content/en/blog/20221207-loc-id-mobility-2.png
new file mode 100644
index 0000000..4fedee9
Binary files /dev/null and b/content/en/blog/20221207-loc-id-mobility-2.png differ
diff --git a/content/en/blog/20221207-loc-id-split.md b/content/en/blog/20221207-loc-id-split.md
new file mode 100644
index 0000000..8c2a068
--- /dev/null
+++ b/content/en/blog/20221207-loc-id-split.md
@@ -0,0 +1,205 @@
+---
+date: 2022-12-07
+title: "Loc/Id split and the Ouroboros network model"
+linkTitle: "On Loc/Id split"
+author: Dimitri Staessens
+---
+
+A few weeks back I had a drink with Thijs, who is now doing a
+master's thesis on Loc/Id split, so we dug into the concepts behind
+Locators and Identifiers to see whether they match, or in any way
+interfere with, the Ouroboros network model.
+
+For this, we started from the paper _Locator/Identifier Split
+Networking: A Promising Future Internet Architecture_[^1].
+
+# Loc/Id split?
+

In a nutshell, Loc/Id split starts from the observation that the
transport layer (TCP, UDP) is tightly coupled to network (IP)
addresses: a TCP or UDP endpoint is bound to an IP address and a port.

Assume our local IPv4 address is 10.10.0.1/24 and there is an SSH
server on 10.10.5.253/24 listening on port 22. After making a
connection, our client application could be bound to 10.10.0.1 on
port 25406. If we move our laptop to another room, served by an access
point in a different subnet, and we receive IP address 10.10.4.7/24,
our TCP connection to the SSH server will break: the connection is
identified by 10.10.0.1, which is no longer our address.

Loc/Id split proposes to split the "address" into two parts: an
Identifier that is location-independent and specifies the _who_ at the
transport layer, and a Locator that is location-dependent and
specifies the _where_ at the network layer. Since an IPv6 address has
plenty of bits (128), there's enough space to chop it up and attach
some semantics to the individual pieces.

Of course, after the split, identifiers need to be mapped to locators,
so a mapping system is needed to resolve the locator for a given
identifier. This mapping system resides in a Sub-Layer between the
transport layer and the network layer. If this mapping system sounds a
lot like DNS to you, you're right, but remember that TCP doesn't bind
to a DNS name + port; it binds to an IP address + port. That is
exactly the problem the Identifier tries to solve.

Resolving the Locator from the Identifier usually happens in the
end-host, but some Loc/Id split proposals delegate this responsibility
to other nodes in the network. When only end-hosts perform Id->Loc
resolution, it's called a host-based Loc/Id split architecture; if
other nodes also perform Id->Loc resolution, it's called a
network-based architecture.
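
To make the coupling concrete, here is a minimal sketch in Python of how a server could find the connection an incoming packet belongs to. It is not taken from the survey paper or from any real TCP/IP stack: `Packet`, `ClassicServer`, `LocIdServer` and `update_mapping` are hypothetical names, and the reverse Loc->Id lookup for packets without an identifier is an assumption of this sketch. The classic server keys connections on (local IP, local port, remote IP, remote port), so a packet from the client's new address matches nothing; the Loc/Id-style server keys them on identifiers and keeps the locators in a separate mapping, so updating that mapping is enough to keep the connection.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical, simplified model of how a server finds the connection an
# incoming packet belongs to. No real sockets or wire formats are involved.

ConnKey = Tuple[str, int, str, int]


@dataclass(frozen=True)
class Packet:
    src_addr: str                 # IP address (classic) or locator (Loc/Id)
    src_port: int
    dst_addr: str
    dst_port: int
    src_id: Optional[str] = None  # identifier; optional in host-based Loc/Id


class ClassicServer:
    """Connections keyed on (local IP, local port, remote IP, remote port)."""

    def __init__(self) -> None:
        self.conns: Dict[ConnKey, str] = {}

    def accept(self, local_ip: str, local_port: int,
               remote_ip: str, remote_port: int, name: str) -> None:
        self.conns[(local_ip, local_port, remote_ip, remote_port)] = name

    def deliver(self, pkt: Packet) -> Optional[str]:
        key = (pkt.dst_addr, pkt.dst_port, pkt.src_addr, pkt.src_port)
        return self.conns.get(key)  # None: no connection matches


class LocIdServer:
    """Connections keyed on identifiers; locators are kept in a mapping."""

    def __init__(self, local_id: str) -> None:
        self.local_id = local_id
        self.conns: Dict[ConnKey, str] = {}
        self.loc_of: Dict[str, str] = {}  # Id -> Loc: where to reach an id
        # Loc -> Id, assumed here for packets that carry no identifier.
        # It becomes ambiguous if two identifiers ever share one locator.
        self.id_of: Dict[str, str] = {}

    def update_mapping(self, ident: str, locator: str) -> None:
        old = self.loc_of.get(ident)
        if old is not None:
            self.id_of.pop(old, None)  # forget the stale locator
        self.loc_of[ident] = locator
        self.id_of[locator] = ident

    def accept(self, local_port: int, remote_id: str,
               remote_port: int, name: str) -> None:
        self.conns[(self.local_id, local_port, remote_id, remote_port)] = name

    def deliver(self, pkt: Packet) -> Optional[str]:
        # Use the identifier from the header if present, else the mapping.
        remote_id = pkt.src_id or self.id_of.get(pkt.src_addr)
        if remote_id is None:
            return None
        return self.conns.get((self.local_id, pkt.dst_port,
                               remote_id, pkt.src_port))


# The example from the text: the client moves from 10.10.0.1 to 10.10.4.7.
moved = Packet("10.10.4.7", 25406, "10.10.5.253", 22)

classic = ClassicServer()
classic.accept("10.10.5.253", 22, "10.10.0.1", 25406, "ssh")
print(classic.deliver(moved))  # prints None: the connection is lost

locid = LocIdServer("008BADF00D")
locid.update_mapping("C0FF33D00D", "10.10.0.1")
locid.accept(22, "C0FF33D00D", 25406, "ssh")
locid.update_mapping("C0FF33D00D", "10.10.4.7")  # new locator, same id
print(locid.deliver(moved))  # prints ssh
```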
In a network-based architecture, the identifier MUST be part of the
packet header (in a host-based architecture it's optional): network
nodes first forward the packet towards a resolver node based on the
identifier, and then, once the locator is known, towards the end-host
based on the locator. I have my doubts that this can ever scale, so in
this article I'll focus on host-based Loc/Id split. Host-based
architectures are summarized in the figure below, taken from the
survey paper[^1].

{{
}}

My first reaction on seeing that was _sounds about right to me_: it's
almost identical to what O7s proposes for a fully scalable and
evolvable architecture. But before I get to that, let's first dig a
bit deeper into those locators and identifiers. What _are_ these
beasts?

# Mobility in Loc/Id split

{{
}}

Let's revisit the previous example where, from my laptop, I'm connected
to some SSH server, but this time we're in a Loc/Id split network. So
instead of a single address, my laptop's interface gets an identifier,
say C0FF33D00D, and, since I'm in the green network, a locator that is
conveniently the IPv4 address of my wireless LAN interface,
10.10.0.1/24. The TCP connection in the SSH client is Loc/Id aware,
and now bound to C0FF33D00D:25406. After I connect, the server at
008BADF00D learns that I'm C0FF33D00D and that my locator is
10.10.0.1.

When I move to another floor, the laptop's WLAN interface gets a new
locator, but my identifier stays the same: it's now
C0FF33D00D:10.10.4.7. The OS implements a host-based Loc/Id split
architecture, so my laptop quickly sends a _loc/id update_ message to
the server at 10.10.5.253, saying that the locator for C0FF33D00D has
changed to 10.10.4.7, and the server updates its mapping. The
Loc/Id-aware TCP state machine in my laptop had some packet loss to
deal with while I was in the elevator, but other than that, since it
is bound to my identifier, the connection remains intact.

Nice! Splitting an address into a locator and an identifier gives a
pretty elegant solution to mobility.

Notice that I didn't give the routers an identifier part in their
addresses? That's on purpose.

Let's do a little thought experiment.

Instead of moving to the other floor, suppose I already have another
laptop sitting there. Its WLAN interface has address
C0FFEEBABE:10.10.4.7.

{{
}}

Now, what I do in this thought experiment is copy the entire _program
state_ of my SSH client, _including_ the TCP state[^2], to that other
laptop and fork it there as a new process. What is needed to make
this work from a network perspective?

Well, just like when I physically moved my laptop, I need to tell the
server that my identifier C0FF33D00D has moved to another locator,
10.10.4.7. That should do the trick; quite easy.

Unless another application on that destination laptop is already
connected on port 25406. Then the destination laptop has no way of
knowing which process to deliver the incoming packets to, unless the
identifier is in the packet header. But host-based Loc/Id split made
it optional. This seems to hint that host-based Loc/Id split supports
device mobility, but not real application mobility[^3].

{{
}}

Let's compare the Ouroboros architecture above with the figure at the
top.

First, the similarities. The Ouroboros model conjectures a split of
the transport layer into an _application end-to-end layer_ (roughly
TCP without congestion avoidance) and a network end-to-end layer that
includes the _flow allocator_.

The _flow allocator_ in O7s performs the name <--> address mapping,
which is similar to the id <--> loc mapping. Interestingly, the flow
allocator is present in every network host, which is needed for
congestion notifications. Given that identifiers map to application
names, resolving name <--> address in nodes other than the source, as
in network-based Loc/Id split, does not violate the O7s architecture.
But we haven't considered this, as it doesn't look feasible from a
scalability perspective.

Now, the differences. First, the naming. The "identifier" in Ouroboros
is a network-wide unique application name[^5]. Processes[^6] can be
_bound_ to an application name. If a single process binds to an
application name, it's unicast; if multiple processes on the same
server bind to it, the server load-balances between these processes
on a per-connection basis. If multiple processes on different servers
bind to the same name, it's anycast.

Ouroboros endpoint identifiers (EIDs) are only known to the flow
allocator. This allows allocating a new flow (including new EIDs)
while keeping the connection state in the process (FRCP) intact, and
thus allows application mobility in addition to device mobility.

Taking another look at the Loc/Id split figure, note that Ouroboros
splits "network" from "application" just above the "Sub-layer",
instead of above the transport layer.

# Wrapping up

The discussions on Loc/Id split were quite interesting. A lot of the
steps and solutions it proposes are in line with the O7s model. What
strikes me most is that Loc/Id split is still not very well defined as
a _model_.
What exactly _are_ identifiers? What exactly _are_
locators? The thing that sets O7s apart is that the model consists of
a limited number of objects (forwarding elements and flooding
elements, which form Layers[^7], applications, processes, ...) with
well-defined names[^8] that are immutable and exist only as long as
the object exists. But that's a whole post by itself.


[^1]: https://doi.org/10.1109/COMST.2017.2728478

[^2]: This is hard to do with TCP state being in the kernel, but let's
    forget about that, and about memory addresses and other stuff, for
    a moment, and assume the complete application state is a nice
    containerized package.

[^3]: The Ouroboros model does allow complete application
    mobility. The problem in this Loc/Id proposal is that the port
    is still part of the Transport Layer state (see the figure at
    the start of the post).

[^4]: This, and a lot of other things in O7s, were proposed in the
    RINA architecture; that's where the credit should go.

[^5]: We might change that to "service name", but terminology is hard
    to get right.

[^6]: In O7s, processes are named with a process name (which in the
    implementation maps to the Linux process id, or pid). Process
    names have only local (system) scope and live until the process
    dies.

[^7]: I capitalize Layers, as these have a different meaning than
    the layers in the figure above. Maybe we should call them
    _strata_ instead of layers. Again, terminology is hard.

[^8]: Synonyms are allowed, but they serve no function in the
    architecture. As an example, application names are hashed (the
    hash is a synonym), which has practical implications for security
    and implementation simplicity, but the architecture is
    theoretically identical without that hash.
\ No newline at end of file
diff --git a/content/en/blog/20221207-loc-id.png b/content/en/blog/20221207-loc-id.png
new file mode 100644
index 0000000..51a046d
Binary files /dev/null and b/content/en/blog/20221207-loc-id.png differ
-- 
cgit v1.2.3