aboutsummaryrefslogtreecommitdiff
path: root/content/en/blog
diff options
context:
space:
mode:
Diffstat (limited to 'content/en/blog')
-rw-r--r--content/en/blog/20220520-oping-flm.md87
-rw-r--r--content/en/blog/20221207-loc-id-mobility-1.pngbin0 -> 403093 bytes
-rw-r--r--content/en/blog/20221207-loc-id-mobility-2.pngbin0 -> 411592 bytes
-rw-r--r--content/en/blog/20221207-loc-id-split.md216
-rw-r--r--content/en/blog/20221207-loc-id.pngbin0 -> 81278 bytes
5 files changed, 303 insertions, 0 deletions
diff --git a/content/en/blog/20220520-oping-flm.md b/content/en/blog/20220520-oping-flm.md
new file mode 100644
index 0000000..31268e4
--- /dev/null
+++ b/content/en/blog/20220520-oping-flm.md
@@ -0,0 +1,87 @@
+---
+date: 2022-05-20
+title: "What is there to learn from oping about flow liveness monitoring?"
+linkTitle: "learning from oping (1): cleaning up"
+author: Thijs Paelman
+---
+
+### Cleaning up flows
+
+While I was browsing through some oping code
+(trying to get a feeling about how to do [broadcast](https://ouroboros.rocks/blog/2021/04/02/how-does-ouroboros-do-anycast-and-multicast/#broadcast)),
+I stumbled about the [cleaner thread](https://ouroboros.rocks/cgit/ouroboros/tree/src/tools/oping/oping_server.c?id=bec8f9ac7d6ebefbce6bd4c882c0f9616f561f1c#n54).
+As we can see, it was used to clean up 'stale' flows (sanitized):
+
+```C
+void * cleaner_thread(void * o)
+{
+ int deadline_ms = 10000;
+
+ while (true) {
+ for (/* all active flows i */) {
+
+ diff = /* diff in ms between last valid ping packet and now */;
+
+ if (diff > deadline_ms) {
+ printf("Flow %d timed out.\n", i);
+ flow_dealloc(i);
+ }
+ }
+ sleep(1);
+ }
+}
+```
+
+But we have since version 19.x flow liveness monitoring (FLM), which does this for us!
+So all this code could be thrown away, right?
+
+Turns out I was semi-wrong!
+It's all about semantics, or 'what do you want to achieve'.
+
+If this thread was there for cleaning up flows from which the peers stopped their flow (and stopped sending keep-alives),
+then we could throw it away by all means! Because FLM does that job.
+
+Or was it there to clean up valid flows, but from which the peers didn't send any ping packets anymore (they *do* send keep-alives, otherwise FLM kicks in)?
+Then we should of course keep it, because this is a server-side decision to cut those peers off.
+This might protect for example against client implementations which connect, send a few pings, but then leave the flow open.
+Or a better illustration of the 'cleaner' thread might be to cut off peers after a 100 pings,
+showing that this decision to 'clean up' has nothing to do with flow timeouts.
+
+### Keeping timed-out flows
+
+On the other side of the spectrum, we have those flows that are timing out (no keep-alives are coming in anymore).
+This is my proposal for the server side parsing of messages:
+
+```C
+while(/* get next fd on which an event happened */) {
+ msg_len = flow_read(fd, buf, OPING_BUF_SIZE);
+ if (msg_len < 0) {
+ /* if-statement is the only difference with before */
+ if (msg_len == -EFLOWPEER) {
+ fset_del(server.flows, fd);
+ flow_dealloc(fd);
+ }
+ continue;
+ }
+ /* continue with parsing and responding */
+}
+```
+
+We can see here that the decision is taken to 'clean up' (= `flow_dealloc`) those flows that are timing out.
+But, as we can see, it's an application decision!
+We might as well decide to keep it open for another 10 min to see if the client (or the network in between) recovers from interruptions, e.g..
+
+We might for example use this mechanism to show to the user that the peer seems to be down[^overleaf] and even take measures (like saving or removing state), but also allow to just wait until the peer is live again.
+
+### Conclusion
+
+As an application, you have total freedom (and responsibility) over your flows.
+Ouroboros will only inform you that your flow is timing out (and your peer thus appears to be down),
+but it's up to you to decide if you deallocate your side of the flow and when.
+
+Excited for my first blog post & always learning,
+
+Thijs
+
+
+[^overleaf]: I'm thinking about things like the Overleaf banner: `Lost Connection. Reconnecting in 2 secs. Try Now`
diff --git a/content/en/blog/20221207-loc-id-mobility-1.png b/content/en/blog/20221207-loc-id-mobility-1.png
new file mode 100644
index 0000000..87bb04a
--- /dev/null
+++ b/content/en/blog/20221207-loc-id-mobility-1.png
Binary files differ
diff --git a/content/en/blog/20221207-loc-id-mobility-2.png b/content/en/blog/20221207-loc-id-mobility-2.png
new file mode 100644
index 0000000..4fedee9
--- /dev/null
+++ b/content/en/blog/20221207-loc-id-mobility-2.png
Binary files differ
diff --git a/content/en/blog/20221207-loc-id-split.md b/content/en/blog/20221207-loc-id-split.md
new file mode 100644
index 0000000..bad82ac
--- /dev/null
+++ b/content/en/blog/20221207-loc-id-split.md
@@ -0,0 +1,216 @@
+---
+date: 2022-12-07
+title: "Loc/Id split and the Ouroboros network model"
+linkTitle: "On Loc/Id split"
+author: Dimitri Staessens
+---
+
+A few weeks back I had a drink with Thijs who is now doing a master's
+thesis on Loc/Id split, so we dug into the concepts behind Locators
+and Identifiers and see if matches or in any way interferes with the
+Ouroboros network model.
+
+For this, we started from the paper _Locator/Identifier Split
+Networking: A Promising Future Internet Architecture_[^1].
+
+# Loc/Id split?
+
+In a nutshell, Loc/Id split starts from the observation that the
+transport layer (TCP, UDP) is tightly coupled to network (IP)
+addresses via a certain TCP/UDP port.
+
+Assuming our IPv4 local address is 10.10.0.1 /24 and there is an SSH
+server on 10.10.5.253 /24 listening on port 22, after making a
+connection, our client application could be bound to 10.10.0.1 /24 on
+port 25406. If we move our laptop to another room that is on an access
+point in a different subnet, and we receive IP address 10.10.4.7 /24,
+our TCP connection to the SSL server will break.
+
+Loc/Id split suggest to split the "address" into two parts, an
+Identifier that is location-independent and specifies the _who_ at the
+transport layer, and a locator that is location-dependent and
+specifies the _where_ at the network layer. Since an IPv6 address has
+more than enough (128) bits, there's plenty of space to chop it up and
+attach some semantics to the individual pieces.
+
+Of course, after the split, identifiers need to be mapped to locators,
+so there is a mapping system needed to resolve the locator given the
+identifier. This mapping system resides in a Sub-Layer between the
+transport layer and the network layer. If this mapping system sounds a
+lot like DNS to you, then you're right, but then remember that TCP
+doesn't bind to a DNS name + port, but to an IP address + port. That's
+where the issue lies that the Identifier tries to solve.
+
+Resolving the Locator from the Identifier usually happens in the
+end-host, but some Loc/Id split proposals may forward this
+responsibility to other nodes in the network. When only end-hosts
+perfom Id->Loc resolution, it's called a host-based Loc/Id split
+architecture, if some other nodes perform Id->Loc resolution it's
+called a network-based architecture. In a network-based architecture,
+the identifier MUST be part of the packet header (in a host-based
+architecture it's optional), and the network nodes forward towards a
+resolver node based on the identifier and then when the locator is
+known based on the locator towards the end-host. I have my doubts that
+this can ever scale, so in this article, I'll focus on host based
+Loc/Id split. Host-based architectures are summarized in the figure
+below, taken from the survey paper[^1].
+
+{{<figure width="60%" src="/blog/20221207-loc-id.png">}}
+
+My first reaction to seeing that was _sounds about right to me_, it's
+almost identical to what O7s proposes for a fully scalable and
+evolvable architecture. But before I get to that, let's first dig a
+bit deeper into those locators and identifiers. What _are_ these
+beasts?
+
+# Mobility in Loc/Id split
+
+{{<figure width="40%" src="/blog/20221207-loc-id-mobility-1.png">}}
+
+Let's assume the previous example where, from my laptop, I'm connected
+to some SSH server, but this time we're in a Loc/Id split network. So
+my laptop got a different address for its interface, an identifier,
+say COFF33D00D, and, since I'm in the green network, a locator that is
+conveniently the IPv4 address for my wireless LAN interface,
+10.10.0.1 /24. The TCP connection in the SSH client is Loc/Id aware,
+and now bound to C0FF33D00D:25406. After connecting to the client at
+008BADF00D, It learns that I'm C0FF33D00D and my locator is 10.10.0.1.
+
+When I move to another floor, the laptop WLAN interface gets a new
+locator, but my identifier stays the same. It's now
+C0FF33D00D:10.10.4.7. The OS is implementing a host-based Loc/Id split
+architecture, so I quickly send a _loc/id update_ message to the
+server at 10.10.5.253 that my locator for C0FF33D00D has changed to
+10.10.4.7, and it updates its mapping. The Loc/Id-aware TCP state
+machine in my laptop had some packet loss to deal with while I was in
+the elevator, but other than that, since it was bound to my identifier
+the connection remains intact.
+
+Nice! Splitting an address into a locator and identifier has a pretty
+elegant solution to mobility.
+
+Notice I didn't give the routers identifiers parts in their
+address? That's on purpose.
+
+Let's take a little thought experiment.
+
+Instead of moving to the other floor, I already have a laptop already
+sitting there. Its WLAN interface has address COFFEEBABE:10.10.4.7.
+
+{{<figure width="40%" src="/blog/20221207-loc-id-mobility-2.png">}}
+
+Now, what I do in this thought experiment, is copy the entire _program
+state_ of my SSH client to that other laptop, _including_ the TCP
+state[^2] and fork it as a new process on the other laptop. What is
+needed to make it work from a network perspective?
+
+Well, like when actually moving with my laptop, I need to update the
+server that my identifier C0FF33D00D has moved to another locator at
+10.10.4.7. That should do the trick, quite easy.
+
+Unless there was already another application connected on port 25406
+on that destination laptop. Then there is no way for the incoming
+laptop to know where to deliver the packets to. Unless the identifier
+is in the packet header. But host-based Loc/Id split had them
+optional? This seems to hint that host-based Loc/Id split supports
+device mobility but cannot fully support application mobility[^3].
+
+So, what is that identifier actually naming? Well, all that moved was
+the application state, and the identifier seemed to move with
+it... And since the routers in the example don't run "end-host"
+applications, they don't need identifiers.
+
+# What does the Ouroboros model say?
+
+Ouroboros[^4] gives each application process a name, which is mapped
+to an IPCP's address[^5]. The O7s application name basically
+corresponds to the _identifier_, and the IPCPs address maps to the
+_locator_.
+
+{{<figure width="30%" src="/blog/20220228-flm-app.png">}}
+
+Let's compare the architecture of Ouroboros above with the figure at
+the top.
+
+First, the similarities. The Ouroboros model conjectures a split of
+the transport layer into an _application end-to-end layer_ (roughly
+TCP without congestion avoidance) and a network end-to-end layer that
+includes the _flow allocator_.
+
+The _flow allocator_ in O7s performs the name <--> address mapping
+that is similar to id <--> loc mapping. Interesting to note is that in
+O7s, the Flow allocator is present in every IPCP, which is needed for
+Congestion Notifications. Given that identifiers are mapping to
+application names, resolving in name <--> address in other nodes than
+the source, like in network-based Loc/Id split, is not violating the
+O7s architecture. But we haven't considered this as it doesn't look
+feasible from a scalability perspective.
+
+Now, the differences. First, the naming. The "identifier" in Ouroboros
+is a network/globally unique application name[^6]. Processes[^7] can
+be _bound_ to an application name. If a single process binds to an
+application name it's unicast, if multiple processes on the same
+server bind to the same name, it provides per-connection
+load-balancing between these processes. If multiple processes on
+different servers bind to the same name, it provides a form of anycast
+name-based load-balancing.
+
+Second, Ouroboros endpoint identifiers (EIDs) are only known to the
+Flow Allocator at the endpoint and specify the application. The O7s
+EID can be viewed as a combination of the L3 _protocol_ field and the
+L4 _port_ field into a single field that sits in between L3 and L4
+(the Loc/Id proposed sublayer). This allows O7s to allocate a new flow
+(assigning new EIDs) while keeping the connection state in the process
+(FRCP) intact, and thus allowing full application mobility in addition
+to device mobility. Taking another look at the Loc/Id split figure,
+note that Ouroboros splits "network" from "application" just above the
+"Sub-layer", instead of above the "transport layer".
+
+# Wrapping up
+
+The discussions on Loc/Id split were quite interesting. A lot of the
+steps and solutions it proposes are in line with the O7s model. What
+strikes me most is that LoC/Id split is still not very well-defined as
+a _model_. What exactly _are_ identifiers? What exactly _are_
+locators? The thing that sets O7s apart is that the model consists of
+a limited amount of objects (forwarding elements and flooding
+elements, which form Layers[^8], application, process, ...) that have
+well-defined names[^9] that are immutable and exist only for as long
+as the object exists.
+
+
+[^1]: https://doi.org/10.1109/COMST.2017.2728478
+
+[^2]: This is hard to do with TCP state being in the kernel, but let's
+ forget about that and memory addresses and others stuff for a
+ moment and assume the complete application state is a nice
+ containerized package.
+
+[^3]: The Ouroboros model does allow complete application
+ mobility. The problem in this Loc/Id proposal is that the port
+ is still part of the Transport Layer state (see the figure at
+ the start of the post).
+
+[^4]: This, and a lot of other things in O7s, were proposed in the
+ RINA architecture, that's where the credit should go.
+
+[^5]: To be accurate: we hash the application name.
+
+[^6]: At least, for a public Internetwork, they should be globally
+ unique.
+
+[^7]: In O7s, processes are named with a process name (which in the
+ implementation maps to the linux process id (pid). Process names
+ are only local (system) scope.
+
+[^8]: I capitalize Layers, as these Layers that are made up of
+ forwarding elements (unicast Layers) or flooding elements
+ (broadcast Layers) have a different meaning than the layers in
+ the discussion above. Maybe we should call them _strata_ instead
+ of Layers...
+
+[^9]: Synonyms are allowed, but they serve no function in the
+ architecture. As an example, application names are hashed (a
+ synonym) which has practical implications for security and
+ implementation simplicity, but the architecture is theoretically
+ identical without that hash. \ No newline at end of file
diff --git a/content/en/blog/20221207-loc-id.png b/content/en/blog/20221207-loc-id.png
new file mode 100644
index 0000000..51a046d
--- /dev/null
+++ b/content/en/blog/20221207-loc-id.png
Binary files differ