-rw-r--r-- | content/en/blog/20220520-oping-flm.md | 87
-rw-r--r-- | content/en/blog/20221207-loc-id-mobility-1.png | bin | 0 -> 403093 bytes
-rw-r--r-- | content/en/blog/20221207-loc-id-mobility-2.png | bin | 0 -> 411592 bytes
-rw-r--r-- | content/en/blog/20221207-loc-id-split.md | 216
-rw-r--r-- | content/en/blog/20221207-loc-id.png | bin | 0 -> 81278 bytes
-rw-r--r-- | content/en/docs/Contributions/_index.md | 8
6 files changed, 311 insertions, 0 deletions
diff --git a/content/en/blog/20220520-oping-flm.md b/content/en/blog/20220520-oping-flm.md
new file mode 100644
index 0000000..31268e4
--- /dev/null
+++ b/content/en/blog/20220520-oping-flm.md
@@ -0,0 +1,87 @@
+---
+date: 2022-05-20
+title: "What is there to learn from oping about flow liveness monitoring?"
+linkTitle: "learning from oping (1): cleaning up"
+author: Thijs Paelman
+---
+
+### Cleaning up flows
+
+While I was browsing through some oping code
+(trying to get a feeling for how to do [broadcast](https://ouroboros.rocks/blog/2021/04/02/how-does-ouroboros-do-anycast-and-multicast/#broadcast)),
+I stumbled upon the [cleaner thread](https://ouroboros.rocks/cgit/ouroboros/tree/src/tools/oping/oping_server.c?id=bec8f9ac7d6ebefbce6bd4c882c0f9616f561f1c#n54).
+As we can see, it was used to clean up 'stale' flows (sanitized):
+
+```C
+void * cleaner_thread(void * o)
+{
+        int deadline_ms = 10000;
+
+        while (true) {
+                for (/* all active flows i */) {
+
+                        diff = /* diff in ms between last valid ping packet and now */;
+
+                        if (diff > deadline_ms) {
+                                printf("Flow %d timed out.\n", i);
+                                flow_dealloc(i);
+                        }
+                }
+                sleep(1);
+        }
+}
+```
+
+But since version 19.x we have flow liveness monitoring (FLM), which does this for us!
+So all this code could be thrown away, right?
+
+Turns out I was semi-wrong!
+It's all about semantics, or 'what do you want to achieve'.
+
+If this thread was there to clean up flows whose peers stopped their side of the flow (and stopped sending keep-alives),
+then we could throw it away by all means! Because FLM does that job.
+
+Or was it there to clean up valid flows whose peers no longer send any ping packets (they *do* send keep-alives, otherwise FLM kicks in)?
+Then we should of course keep it, because this is a server-side decision to cut those peers off.
+This might protect for example against client implementations which connect, send a few pings, but then leave the flow open.
+Or a better illustration of the 'cleaner' thread might be to cut off peers after 100 pings,
+showing that this decision to 'clean up' has nothing to do with flow timeouts.
+
+### Keeping timed-out flows
+
+On the other side of the spectrum, we have those flows that are timing out (no keep-alives are coming in anymore).
+This is my proposal for the server-side parsing of messages:
+
+```C
+while(/* get next fd on which an event happened */) {
+        msg_len = flow_read(fd, buf, OPING_BUF_SIZE);
+        if (msg_len < 0) {
+                /* if-statement is the only difference with before */
+                if (msg_len == -EFLOWPEER) {
+                        fset_del(server.flows, fd);
+                        flow_dealloc(fd);
+                }
+                continue;
+        }
+        /* continue with parsing and responding */
+}
+```
+
+We can see here that the decision is taken to 'clean up' (= `flow_dealloc`) those flows that are timing out.
+But, as we can see, it's an application decision!
+We might as well decide to keep it open for another 10 min to see if the client (or the network in between) recovers from interruptions, for example.
+
+We might for example use this mechanism to show to the user that the peer seems to be down[^overleaf] and even take measures (like saving or removing state), but also simply wait until the peer is live again.
+
+### Conclusion
+
+As an application, you have total freedom (and responsibility) over your flows.
+Ouroboros will only inform you that your flow is timing out (and your peer thus appears to be down),
+but it's up to you to decide if you deallocate your side of the flow and when.
+
+Excited for my first blog post & always learning,
+
+Thijs
+
+
+[^overleaf]: I'm thinking about things like the Overleaf banner: `Lost Connection. Reconnecting in 2 secs. Try Now`
diff --git a/content/en/blog/20221207-loc-id-mobility-1.png b/content/en/blog/20221207-loc-id-mobility-1.png
Binary files differ
new file mode 100644
index 0000000..87bb04a
--- /dev/null
+++ b/content/en/blog/20221207-loc-id-mobility-1.png
diff --git a/content/en/blog/20221207-loc-id-mobility-2.png b/content/en/blog/20221207-loc-id-mobility-2.png
Binary files differ
new file mode 100644
index 0000000..4fedee9
--- /dev/null
+++ b/content/en/blog/20221207-loc-id-mobility-2.png
diff --git a/content/en/blog/20221207-loc-id-split.md b/content/en/blog/20221207-loc-id-split.md
new file mode 100644
index 0000000..bad82ac
--- /dev/null
+++ b/content/en/blog/20221207-loc-id-split.md
@@ -0,0 +1,216 @@
+---
+date: 2022-12-07
+title: "Loc/Id split and the Ouroboros network model"
+linkTitle: "On Loc/Id split"
+author: Dimitri Staessens
+---
+
+A few weeks back I had a drink with Thijs who is now doing a master's
+thesis on Loc/Id split, so we dug into the concepts behind Locators
+and Identifiers to see if they match or in any way interfere with the
+Ouroboros network model.
+
+For this, we started from the paper _Locator/Identifier Split
+Networking: A Promising Future Internet Architecture_[^1].
+
+# Loc/Id split?
+
+In a nutshell, Loc/Id split starts from the observation that the
+transport layer (TCP, UDP) is tightly coupled to network (IP)
+addresses via a certain TCP/UDP port.
+
+Assuming our IPv4 local address is 10.10.0.1 /24 and there is an SSH
+server on 10.10.5.253 /24 listening on port 22, after making a
+connection, our client application could be bound to 10.10.0.1 /24 on
+port 25406. If we move our laptop to another room that is on an access
+point in a different subnet, and we receive IP address 10.10.4.7 /24,
+our TCP connection to the SSH server will break.
+
+Loc/Id split suggests splitting the "address" into two parts, an
+Identifier that is location-independent and specifies the _who_ at the
+transport layer, and a locator that is location-dependent and
+specifies the _where_ at the network layer. Since an IPv6 address has
+more than enough (128) bits, there's plenty of space to chop it up and
+attach some semantics to the individual pieces.
+
+Of course, after the split, identifiers need to be mapped to locators,
+so there is a mapping system needed to resolve the locator given the
+identifier. This mapping system resides in a Sub-Layer between the
+transport layer and the network layer. If this mapping system sounds a
+lot like DNS to you, then you're right, but then remember that TCP
+doesn't bind to a DNS name + port, but to an IP address + port. That's
+where the issue lies that the Identifier tries to solve.
+
+Resolving the Locator from the Identifier usually happens in the
+end-host, but some Loc/Id split proposals may forward this
+responsibility to other nodes in the network. When only end-hosts
+perform Id->Loc resolution, it's called a host-based Loc/Id split
+architecture; if some other nodes perform Id->Loc resolution, it's
+called a network-based architecture. In a network-based architecture,
+the identifier MUST be part of the packet header (in a host-based
+architecture it's optional), and the network nodes forward towards a
+resolver node based on the identifier and then, once the locator is
+known, based on the locator towards the end-host. I have my doubts that
+this can ever scale, so in this article, I'll focus on host-based
+Loc/Id split. Host-based architectures are summarized in the figure
+below, taken from the survey paper[^1].
+
+{{<figure width="60%" src="/blog/20221207-loc-id.png">}}
+
+My first reaction to seeing that was _sounds about right to me_, it's
+almost identical to what O7s proposes for a fully scalable and
+evolvable architecture. But before I get to that, let's first dig a
+bit deeper into those locators and identifiers. What _are_ these
+beasts?
+
+# Mobility in Loc/Id split
+
+{{<figure width="40%" src="/blog/20221207-loc-id-mobility-1.png">}}
+
+Let's assume the previous example where, from my laptop, I'm connected
+to some SSH server, but this time we're in a Loc/Id split network. So
+my laptop got a different address for its interface, an identifier,
+say C0FF33D00D, and, since I'm in the green network, a locator that is
+conveniently the IPv4 address for my wireless LAN interface,
+10.10.0.1 /24. The TCP connection in the SSH client is Loc/Id aware,
+and now bound to C0FF33D00D:25406. After connecting to the server at
+008BADF00D, it learns that I'm C0FF33D00D and my locator is 10.10.0.1.
+
+When I move to another floor, the laptop WLAN interface gets a new
+locator, but my identifier stays the same. It's now
+C0FF33D00D:10.10.4.7. The OS is implementing a host-based Loc/Id split
+architecture, so I quickly send a _loc/id update_ message to the
+server at 10.10.5.253 that my locator for C0FF33D00D has changed to
+10.10.4.7, and it updates its mapping. The Loc/Id-aware TCP state
+machine in my laptop had some packet loss to deal with while I was in
+the elevator, but other than that, since it was bound to my identifier
+the connection remains intact.
+
+Nice! Splitting an address into a locator and identifier gives a pretty
+elegant solution to mobility.
+
+Notice I didn't give the routers identifier parts in their
+address? That's on purpose.
+
+Let's take a little thought experiment.
+
+Instead of moving to the other floor, I have another laptop already
+sitting there. Its WLAN interface has address COFFEEBABE:10.10.4.7.
+
+{{<figure width="40%" src="/blog/20221207-loc-id-mobility-2.png">}}
+
+Now, what I do in this thought experiment, is copy the entire _program
+state_ of my SSH client to that other laptop, _including_ the TCP
+state[^2] and fork it as a new process on the other laptop. What is
+needed to make it work from a network perspective?
+
+Well, like when actually moving with my laptop, I need to update the
+server that my identifier C0FF33D00D has moved to another locator at
+10.10.4.7. That should do the trick, quite easy.
+
+Unless there was already another application connected on port 25406
+on that destination laptop. Then there is no way for the receiving
+laptop to know where to deliver the packets to. Unless the identifier
+is in the packet header. But host-based Loc/Id split had them
+optional? This seems to hint that host-based Loc/Id split supports
+device mobility but cannot fully support application mobility[^3].
+
+So, what is that identifier actually naming? Well, all that moved was
+the application state, and the identifier seemed to move with
+it... And since the routers in the example don't run "end-host"
+applications, they don't need identifiers.
+
+# What does the Ouroboros model say?
+
+Ouroboros[^4] gives each application process a name, which is mapped
+to an IPCP's address[^5]. The O7s application name basically
+corresponds to the _identifier_, and the IPCP's address maps to the
+_locator_.
+
+{{<figure width="30%" src="/blog/20220228-flm-app.png">}}
+
+Let's compare the architecture of Ouroboros above with the figure at
+the top.
+
+First, the similarities. The Ouroboros model conjectures a split of
+the transport layer into an _application end-to-end layer_ (roughly
+TCP without congestion avoidance) and a network end-to-end layer that
+includes the _flow allocator_.
+
+The _flow allocator_ in O7s performs the name <--> address mapping
+that is similar to id <--> loc mapping. Interesting to note is that in
+O7s, the Flow allocator is present in every IPCP, which is needed for
+Congestion Notifications. Given that identifiers map to
+application names, resolving name <--> address in nodes other than
+the source, like in network-based Loc/Id split, is not violating the
+O7s architecture. But we haven't considered this as it doesn't look
+feasible from a scalability perspective.
+
+Now, the differences. First, the naming. The "identifier" in Ouroboros
+is a network/globally unique application name[^6]. Processes[^7] can
+be _bound_ to an application name. If a single process binds to an
+application name it's unicast; if multiple processes on the same
+server bind to the same name, it provides per-connection
+load-balancing between these processes. If multiple processes on
+different servers bind to the same name, it provides a form of anycast
+name-based load-balancing.
+
+Second, Ouroboros endpoint identifiers (EIDs) are only known to the
+Flow Allocator at the endpoint and specify the application. The O7s
+EID can be viewed as a combination of the L3 _protocol_ field and the
+L4 _port_ field into a single field that sits in between L3 and L4
+(the Loc/Id proposed sublayer). This allows O7s to allocate a new flow
+(assigning new EIDs) while keeping the connection state in the process
+(FRCP) intact, and thus allowing full application mobility in addition
+to device mobility. Taking another look at the Loc/Id split figure,
+note that Ouroboros splits "network" from "application" just above the
+"Sub-layer", instead of above the "transport layer".
+
+# Wrapping up
+
+The discussions on Loc/Id split were quite interesting. A lot of the
+steps and solutions it proposes are in line with the O7s model. What
+strikes me most is that Loc/Id split is still not very well-defined as
+a _model_. What exactly _are_ identifiers? What exactly _are_
+locators? The thing that sets O7s apart is that the model consists of
+a limited number of objects (forwarding elements and flooding
+elements, which form Layers[^8], application, process, ...) that have
+well-defined names[^9] that are immutable and exist only for as long
+as the object exists.
+
+
+[^1]: https://doi.org/10.1109/COMST.2017.2728478
+
+[^2]: This is hard to do with TCP state being in the kernel, but let's
+      forget about that and memory addresses and other stuff for a
+      moment and assume the complete application state is a nice
+      containerized package.
+
+[^3]: The Ouroboros model does allow complete application
+      mobility. The problem in this Loc/Id proposal is that the port
+      is still part of the Transport Layer state (see the figure at
+      the start of the post).
+
+[^4]: This, and a lot of other things in O7s, were proposed in the
+      RINA architecture, that's where the credit should go.
+
+[^5]: To be accurate: we hash the application name.
+
+[^6]: At least, for a public Internetwork, they should be globally
+      unique.
+
+[^7]: In O7s, processes are named with a process name (which in the
+      implementation maps to the Linux process id (pid)). Process names
+      are only local (system) scope.
+
+[^8]: I capitalize Layers, as these Layers that are made up of
+      forwarding elements (unicast Layers) or flooding elements
+      (broadcast Layers) have a different meaning than the layers in
+      the discussion above. Maybe we should call them _strata_ instead
+      of Layers...
+
+[^9]: Synonyms are allowed, but they serve no function in the
+      architecture. As an example, application names are hashed (a
+      synonym) which has practical implications for security and
+      implementation simplicity, but the architecture is theoretically
+      identical without that hash.
\ No newline at end of file
diff --git a/content/en/blog/20221207-loc-id.png b/content/en/blog/20221207-loc-id.png
Binary files differ
new file mode 100644
index 0000000..51a046d
--- /dev/null
+++ b/content/en/blog/20221207-loc-id.png
diff --git a/content/en/docs/Contributions/_index.md b/content/en/docs/Contributions/_index.md
index ad33af3..558298e 100644
--- a/content/en/docs/Contributions/_index.md
+++ b/content/en/docs/Contributions/_index.md
@@ -7,6 +7,14 @@ description: >
   How to contribute to Ouroboros.
 ---
 
+### Ongoing work
+
+Ouroboros is far from complete. Plenty of things need to be researched
+and implemented. We don't really keep a list, but this
+[epic board](https://tree.taiga.io/project/dstaesse-ouroboros/epics) can
+give you some ideas of what is still on our mind and where you may be
+able to contribute.
+
 ### Communication
 
 There are 2 ways that will be used to communicate: The mailing list