2 files changed, 88 insertions, 1 deletions
diff --git a/content/en/blog/20210402-multicast.md b/content/en/blog/20210402-multicast.md
index b363794..bce44f3 100644
--- a/content/en/blog/20210402-multicast.md
+++ b/content/en/blog/20210402-multicast.md
@@ -398,7 +398,7 @@ dissemination function _ROUTING_, but if you know a better word that
 avoids confusion, we'll take it. _ROUTING_ is distinct from adjacency
 management, in the sense that adjacency management is administrative,
 and tells the networks which links it is allowed to use, which links
-_exist_. _ROUTING- will make use of these links and make decisions
+_exist_. _ROUTING_ will make use of these links and make decisions
 when they are unavailable, for instance due to failures.
 
 Let's apply the Ouroboros model to Ethernet. Ethernet implements both
diff --git a/content/en/blog/20220520-oping-flm.md b/content/en/blog/20220520-oping-flm.md
new file mode 100644
index 0000000..31268e4
--- /dev/null
+++ b/content/en/blog/20220520-oping-flm.md
@@ -0,0 +1,87 @@
+---
+date: 2022-05-20
+title: "What is there to learn from oping about flow liveness monitoring?"
+linkTitle: "learning from oping (1): cleaning up"
+author: Thijs Paelman
+---
+
+### Cleaning up flows
+
+While I was browsing through some oping code
+(trying to get a feeling about how to do [broadcast](https://ouroboros.rocks/blog/2021/04/02/how-does-ouroboros-do-anycast-and-multicast/#broadcast)),
+I stumbled about the [cleaner thread](https://ouroboros.rocks/cgit/ouroboros/tree/src/tools/oping/oping_server.c?id=bec8f9ac7d6ebefbce6bd4c882c0f9616f561f1c#n54).
+As we can see, it was used to clean up 'stale' flows (sanitized):
+
+```C
+void * cleaner_thread(void * o)
+{
+        int deadline_ms = 10000;
+
+        while (true) {
+                for (/* all active flows i */) {
+
+                        diff = /* diff in ms between last valid ping packet and now */;
+
+                        if (diff > deadline_ms) {
+                                printf("Flow %d timed out.\n", i);
+                                flow_dealloc(i);
+                        }
+                }
+                sleep(1);
+        }
+}
+```
+
+But we have since version 19.x flow liveness monitoring (FLM), which does this for us!
+So all this code could be thrown away, right?
+
+Turns out I was semi-wrong!
+It's all about semantics, or 'what do you want to achieve'.
+
+If this thread was there for cleaning up flows from which the peers stopped their flow (and stopped sending keep-alives),
+then we could throw it away by all means! Because FLM does that job.
+
+Or was it there to clean up valid flows, but from which the peers didn't send any ping packets anymore (they *do* send keep-alives, otherwise FLM kicks in)?
+Then we should of course keep it, because this is a server-side decision to cut those peers off.
+This might protect for example against client implementations which connect, send a few pings, but then leave the flow open.
+Or a better illustration of the 'cleaner' thread might be to cut off peers after a 100 pings,
+showing that this decision to 'clean up' has nothing to do with flow timeouts.
+
+### Keeping timed-out flows
+
+On the other side of the spectrum, we have those flows that are timing out (no keep-alives are coming in anymore).
+This is my proposal for the server side parsing of messages:
+
+```C
+while(/* get next fd on which an event happened */) {
+        msg_len = flow_read(fd, buf, OPING_BUF_SIZE);
+        if (msg_len < 0) {
+                /* if-statement is the only difference with before */
+                if (msg_len == -EFLOWPEER) {
+                    fset_del(server.flows, fd);
+                    flow_dealloc(fd);
+                }
+                continue;
+        }
+        /* continue with parsing and responding */
+}
+```
+
+We can see here that the decision is taken to 'clean up' (= `flow_dealloc`) those flows that are timing out.
+But, as we can see, it's an application decision!
+We might as well decide to keep it open for another 10 min to see if the client (or the network in between) recovers from interruptions, e.g..
+
+We might for example use this mechanism to show to the user that the peer seems to be down[^overleaf] and even take measures (like saving or removing state), but also allow to just wait until the peer is live again.
+
+### Conclusion
+
+As an application, you have total freedom (and responsibility) over your flows.
+Ouroboros will only inform you that your flow is timing out (and your peer thus appears to be down),
+but it's up to you to decide if you deallocate your side of the flow and when.
+
+Excited for my first blog post & always learning,
+
+Thijs
+
+
+[^overleaf]: I'm thinking about things like the Overleaf banner: `Lost Connection. Reconnecting in 2 secs. Try Now`