Diffstat (limited to 'content')
-rw-r--r--  content/en/blog/20210402-multicast.md   2
-rw-r--r--  content/en/blog/20220520-oping-flm.md  87
2 files changed, 88 insertions, 1 deletion
diff --git a/content/en/blog/20210402-multicast.md b/content/en/blog/20210402-multicast.md
index b363794..bce44f3 100644
--- a/content/en/blog/20210402-multicast.md
+++ b/content/en/blog/20210402-multicast.md
@@ -398,7 +398,7 @@ dissemination function _ROUTING_, but if you know a better word that
 avoids confusion, we'll take it. _ROUTING_ is distinct from adjacency
 management, in the sense that adjacency management is administrative,
 and tells the networks which links it is allowed to use, which links
-_exist_. _ROUTING- will make use of these links and make decisions
+_exist_. _ROUTING_ will make use of these links and make decisions
 when they are unavailable, for instance due to failures.
 
 Let's apply the Ouroboros model to Ethernet. Ethernet implements both
diff --git a/content/en/blog/20220520-oping-flm.md b/content/en/blog/20220520-oping-flm.md
new file mode 100644
index 0000000..31268e4
--- /dev/null
+++ b/content/en/blog/20220520-oping-flm.md
@@ -0,0 +1,87 @@
+---
+date: 2022-05-20
+title: "What is there to learn from oping about flow liveness monitoring?"
+linkTitle: "learning from oping (1): cleaning up"
+author: Thijs Paelman
+---
+
+### Cleaning up flows
+
+While I was browsing through some oping code
+(trying to get a feel for how to do [broadcast](https://ouroboros.rocks/blog/2021/04/02/how-does-ouroboros-do-anycast-and-multicast/#broadcast)),
+I stumbled upon the [cleaner thread](https://ouroboros.rocks/cgit/ouroboros/tree/src/tools/oping/oping_server.c?id=bec8f9ac7d6ebefbce6bd4c882c0f9616f561f1c#n54).
+As we can see, it was used to clean up 'stale' flows (sanitized):
+
+```C
+void * cleaner_thread(void * o)
+{
+        int deadline_ms = 10000;
+
+        while (true) {
+                for (/* all active flows i */) {
+
+                        diff = /* diff in ms between last valid ping packet and now */;
+
+                        if (diff > deadline_ms) {
+                                printf("Flow %d timed out.\n", i);
+                                flow_dealloc(i);
+                        }
+                }
+                sleep(1);
+        }
+}
+```
+
+But since version 19.x, we have flow liveness monitoring (FLM), which does this for us!
+So all this code could be thrown away, right?
+
+Turns out I was only half right!
+It's all about semantics, or 'what do you want to achieve?'
+
+If this thread was there to clean up flows whose peers have gone away (and stopped sending keep-alives),
+then by all means we could throw it away, because FLM does that job!
+
+Or was it there to clean up valid flows whose peers no longer send any ping packets (they *do* still send keep-alives, otherwise FLM would kick in)?
+Then we should of course keep it, because this is a server-side decision to cut those peers off.
+This might, for example, protect against client implementations that connect, send a few pings, and then leave the flow open.
+A better illustration of such a 'cleaner' thread might be one that cuts off peers after 100 pings,
+showing that this decision to 'clean up' has nothing to do with flow timeouts.
+
+### Keeping timed-out flows
+
+On the other side of the spectrum, there are the flows that are timing out (no keep-alives are coming in anymore).
+This is my proposal for the server-side parsing of messages:
+
+```C
+while (/* get next fd on which an event happened */) {
+        msg_len = flow_read(fd, buf, OPING_BUF_SIZE);
+        if (msg_len < 0) {
+                /* this if-statement is the only difference from before */
+                if (msg_len == -EFLOWPEER) {
+                        fset_del(server.flows, fd);
+                        flow_dealloc(fd);
+                }
+                continue;
+        }
+        /* continue with parsing and responding */
+}
+```
+
+Here, the decision is taken to 'clean up' (= `flow_dealloc`) the flows that are timing out.
+But note that this is an application decision!
+We might just as well decide to keep the flow open for another 10 minutes, e.g. to see whether the client (or the network in between) recovers from an interruption.
+
+We might, for example, use this mechanism to show the user that the peer seems to be down[^overleaf] and even take measures (like saving or removing state), while still allowing the application to wait until the peer is live again.
+
+### Conclusion
+
+As an application, you have total freedom (and responsibility) over your flows.
+Ouroboros will only inform you that your flow is timing out (and your peer thus appears to be down),
+but it's up to you to decide whether and when to deallocate your side of the flow.
+
+Excited for my first blog post & always learning,
+
+Thijs
+
+
+[^overleaf]: I'm thinking about things like the Overleaf banner: `Lost Connection. Reconnecting in 2 secs. Try Now`
