1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
|
---
date: 2022-12-07
title: "Loc/Id split and the Ouroboros network model"
linkTitle: "On Loc/Id split"
author: Dimitri Staessens
---
A few weeks back I had a drink with Thijs who is now doing a master's
thesis on Loc/Id split, so we dug into the concepts behind Locators
and Identifiers and see if matches or in any way interferes with the
Ouroboros network model.
For this, we started from the paper _Locator/Identifier Split
Networking: A Promising Future Internet Architecture_[^1].
# Loc/Id split?
In a nutshell, Loc/Id split starts from the observation that the
transport layer (TCP, UDP) is tightly coupled to network (IP)
addresses via a certain TCP/UDP port.
Assuming our IPv4 local address is 10.10.0.1 /24 and there is an SSH
server on 10.10.5.253 /24 listening on port 22, after making a
connection, our client application could be bound to 10.10.0.1 /24 on
port 25406. If we move our laptop to another room that is on an access
point in a different subnet, and we receive IP address 10.10.4.7 /24,
our TCP connection to the SSL server will break.
Loc/Id split suggest to split the "address" into two parts, an
Identifier that is location-independent and specifies the _who_ at the
transport layer, and a locator that is location-dependent and
specifies the _where_ at the network layer. Since an IPv6 address has
more than enough (128) bits, there's plenty of space to chop it up and
attach some semantics to the individual pieces.
Of course, after the split, identifiers need to be mapped to locators,
so there is a mapping system needed to resolve the locator given the
identifier. This mapping system resides in a Sub-Layer between the
transport layer and the network layer. If this mapping system sounds a
lot like DNS to you, then you're right, but then remember that TCP
doesn't bind to a DNS name + port, but to an IP address + port. That's
where the issue lies that the Identifier tries to solve.
Resolving the Locator from the Identifier usually happens in the
end-host, but some Loc/Id split proposals may forward this
responsibility to other nodes in the network. When only end-hosts
perfom Id->Loc resolution, it's called a host-based Loc/Id split
architecture, if some other nodes perform Id->Loc resolution it's
called a network-based architecture. In a network-based architecture,
the identifier MUST be part of the packet header (in a host-based
architecture it's optional), and the network nodes forward towards a
resolver node based on the identifier and then when the locator is
known based on the locator towards the end-host. I have my doubts that
this can ever scale, so in this article, I'll focus on host based
Loc/Id split. Host-based architectures are summarized in the figure
below, taken from the survey paper[^1].
{{<figure width="60%" src="/blog/20221207-loc-id.png">}}
My first reaction to seeing that was _sounds about right to me_, it's
almost identical to what O7s proposes for a fully scalable and
evolvable architecture. But before I get to that, let's first dig a
bit deeper into those locators and identifiers. What _are_ these
beasts?
# Mobility in Loc/Id split
{{<figure width="40%" src="/blog/20221207-loc-id-mobility-1.png">}}
Let's assume the previous example where, from my laptop, I'm connected
to some SSH server, but this time we're in a Loc/Id split network. So
my laptop got a different address for its interface, an identifier,
say COFF33D00D, and, since I'm in the green network, a locator that is
conveniently the IPv4 address for my wireless LAN interface,
10.10.0.1 /24. The TCP connection in the SSH client is Loc/Id aware,
and now bound to C0FF33D00D:25406. After connecting to the client at
008BADF00D, It learns that I'm C0FF33D00D and my locator is 10.10.0.1.
When I move to another floor, the laptop WLAN interface gets a new
locator, but my identifier stays the same. It's now
C0FF33D00D:10.10.4.7. The OS is implementing a host-based Loc/Id split
architecture, so I quickly send a _loc/id update_ message to the
server at 10.10.5.253 that my locator for C0FF33D00D has changed to
10.10.4.7, and it updates its mapping. The Loc/Id-aware TCP state
machine in my laptop had some packet loss to deal with while I was in
the elevator, but other than that, since it was bound to my identifier
the connection remains intact.
Nice! Splitting an address into a locator and identifier has a pretty
elegant solution to mobility.
Notice I didn't give the routers identifiers parts in their
address? That's on purpose.
Let's take a little thought experiment.
Instead of moving to the other floor, I already have a laptop already
sitting there. Its WLAN interface has address COFFEEBABE:10.10.4.7.
{{<figure width="40%" src="/blog/20221207-loc-id-mobility-2.png">}}
Now, what I do in this thought experiment, is copy the entire _program
state_ of my SSH client to that other laptop, _including_ the TCP
state[^2] and fork it as a new process on the other laptop. What is
needed to make it work from a network perspective?
Well, like when actually moving with my laptop, I need to update the
server that my identifier C0FF33D00D has moved to another locator at
10.10.4.7. That should do the trick, quite easy.
Unless there was already another application connected on port 25406
on that destination laptop. Then there is no way for the incoming
laptop to know where to deliver the packets to. Unless the identifier
is in the packet header. But host-based Loc/Id split had them
optional? This seems to hint that host-based Loc/Id split supports
device mobility but cannot fully support application mobility[^3].
So, what is that identifier actually naming? Well, all that moved was
the application state, and the identifier seemed to move with
it... And since the routers in the example don't run "end-host"
applications, they don't need identifiers.
# What does the Ouroboros model say?
Ouroboros[^4] gives each application process a name, which is mapped
to an IPCP's address[^5]. The O7s application name basically
corresponds to the _identifier_, and the IPCPs address maps to the
_locator_.
{{<figure width="30%" src="/blog/20220228-flm-app.png">}}
Let's compare the architecture of Ouroboros above with the figure at
the top.
First, the similarities. The Ouroboros model conjectures a split of
the transport layer into an _application end-to-end layer_ (roughly
TCP without congestion avoidance) and a network end-to-end layer that
includes the _flow allocator_.
The _flow allocator_ in O7s performs the name <--> address mapping
that is similar to id <--> loc mapping. Interesting to note is that in
O7s, the Flow allocator is present in every IPCP, which is needed for
Congestion Notifications. Given that identifiers are mapping to
application names, resolving in name <--> address in other nodes than
the source, like in network-based Loc/Id split, is not violating the
O7s architecture. But we haven't considered this as it doesn't look
feasible from a scalability perspective.
Now, the differences. First, the naming. The "identifier" in Ouroboros
is a network/globally unique application name[^6]. Processes[^7] can
be _bound_ to an application name. If a single process binds to an
application name it's unicast, if multiple processes on the same
server bind to the same name, it provides per-connection
load-balancing between these processes. If multiple processes on
different servers bind to the same name, it provides a form of anycast
name-based load-balancing.
Second, Ouroboros endpoint identifiers (EIDs) are only known to the
Flow Allocator at the endpoint and specify the application. The O7s
EID can be viewed as a combination of the L3 _protocol_ field and the
L4 _port_ field into a single field that sits in between L3 and L4
(the Loc/Id proposed sublayer). This allows O7s to allocate a new flow
(assigning new EIDs) while keeping the connection state in the process
(FRCP) intact, and thus allowing full application mobility in addition
to device mobility. Taking another look at the Loc/Id split figure,
note that Ouroboros splits "network" from "application" just above the
"Sub-layer", instead of above the "transport layer".
# Wrapping up
The discussions on Loc/Id split were quite interesting. A lot of the
steps and solutions it proposes are in line with the O7s model. What
strikes me most is that LoC/Id split is still not very well-defined as
a _model_. What exactly _are_ identifiers? What exactly _are_
locators? The thing that sets O7s apart is that the model consists of
a limited amount of objects (forwarding elements and flooding
elements, which form Layers[^8], application, process, ...) that have
well-defined names[^9] that are immutable and exist only for as long
as the object exists.
[^1]: https://doi.org/10.1109/COMST.2017.2728478
[^2]: This is hard to do with TCP state being in the kernel, but let's
forget about that and memory addresses and others stuff for a
moment and assume the complete application state is a nice
containerized package.
[^3]: The Ouroboros model does allow complete application
mobility. The problem in this Loc/Id proposal is that the port
is still part of the Transport Layer state (see the figure at
the start of the post).
[^4]: This, and a lot of other things in O7s, were proposed in the
RINA architecture, that's where the credit should go.
[^5]: To be accurate: we hash the application name.
[^6]: At least, for a public Internetwork, they should be globally
unique.
[^7]: In O7s, processes are named with a process name (which in the
implementation maps to the linux process id (pid). Process names
are only local (system) scope.
[^8]: I capitalize Layers, as these Layers that are made up of
forwarding elements (unicast Layers) or flooding elements
(broadcast Layers) have a different meaning than the layers in
the discussion above. Maybe we should call them _strata_ instead
of Layers...
[^9]: Synonyms are allowed, but they serve no function in the
architecture. As an example, application names are hashed (a
synonym) which has practical implications for security and
implementation simplicity, but the architecture is theoretically
identical without that hash.
|