aboutsummaryrefslogtreecommitdiff
path: root/content/en
diff options
context:
space:
mode:
Diffstat (limited to 'content/en')
-rw-r--r--content/en/blog/20210402-multicast.md2
-rw-r--r--content/en/blog/20220520-oping-flm.md87
2 files changed, 88 insertions, 1 deletions
diff --git a/content/en/blog/20210402-multicast.md b/content/en/blog/20210402-multicast.md
index b363794..bce44f3 100644
--- a/content/en/blog/20210402-multicast.md
+++ b/content/en/blog/20210402-multicast.md
@@ -398,7 +398,7 @@ dissemination function _ROUTING_, but if you know a better word that
avoids confusion, we'll take it. _ROUTING_ is distinct from adjacency
management, in the sense that adjacency management is administrative,
and tells the networks which links it is allowed to use, which links
-_exist_. _ROUTING- will make use of these links and make decisions
+_exist_. _ROUTING_ will make use of these links and make decisions
when they are unavailable, for instance due to failures.
Let's apply the Ouroboros model to Ethernet. Ethernet implements both
diff --git a/content/en/blog/20220520-oping-flm.md b/content/en/blog/20220520-oping-flm.md
new file mode 100644
index 0000000..31268e4
--- /dev/null
+++ b/content/en/blog/20220520-oping-flm.md
@@ -0,0 +1,87 @@
+---
+date: 2022-05-20
+title: "What is there to learn from oping about flow liveness monitoring?"
+linkTitle: "learning from oping (1): cleaning up"
+author: Thijs Paelman
+---
+
+### Cleaning up flows
+
+While I was browsing through some oping code
+(trying to get a feeling about how to do [broadcast](https://ouroboros.rocks/blog/2021/04/02/how-does-ouroboros-do-anycast-and-multicast/#broadcast)),
+I stumbled about the [cleaner thread](https://ouroboros.rocks/cgit/ouroboros/tree/src/tools/oping/oping_server.c?id=bec8f9ac7d6ebefbce6bd4c882c0f9616f561f1c#n54).
+As we can see, it was used to clean up 'stale' flows (sanitized):
+
+```C
+void * cleaner_thread(void * o)
+{
+ int deadline_ms = 10000;
+
+ while (true) {
+ for (/* all active flows i */) {
+
+ diff = /* diff in ms between last valid ping packet and now */;
+
+ if (diff > deadline_ms) {
+ printf("Flow %d timed out.\n", i);
+ flow_dealloc(i);
+ }
+ }
+ sleep(1);
+ }
+}
+```
+
+But we have since version 19.x flow liveness monitoring (FLM), which does this for us!
+So all this code could be thrown away, right?
+
+Turns out I was semi-wrong!
+It's all about semantics, or 'what do you want to achieve'.
+
+If this thread was there for cleaning up flows from which the peers stopped their flow (and stopped sending keep-alives),
+then we could throw it away by all means! Because FLM does that job.
+
+Or was it there to clean up valid flows, but from which the peers didn't send any ping packets anymore (they *do* send keep-alives, otherwise FLM kicks in)?
+Then we should of course keep it, because this is a server-side decision to cut those peers off.
+This might protect for example against client implementations which connect, send a few pings, but then leave the flow open.
+Or a better illustration of the 'cleaner' thread might be to cut off peers after a 100 pings,
+showing that this decision to 'clean up' has nothing to do with flow timeouts.
+
+### Keeping timed-out flows
+
+On the other side of the spectrum, we have those flows that are timing out (no keep-alives are coming in anymore).
+This is my proposal for the server side parsing of messages:
+
+```C
+while(/* get next fd on which an event happened */) {
+ msg_len = flow_read(fd, buf, OPING_BUF_SIZE);
+ if (msg_len < 0) {
+ /* if-statement is the only difference with before */
+ if (msg_len == -EFLOWPEER) {
+ fset_del(server.flows, fd);
+ flow_dealloc(fd);
+ }
+ continue;
+ }
+ /* continue with parsing and responding */
+}
+```
+
+We can see here that the decision is taken to 'clean up' (= `flow_dealloc`) those flows that are timing out.
+But, as we can see, it's an application decision!
+We might as well decide to keep it open for another 10 min to see if the client (or the network in between) recovers from interruptions, e.g..
+
+We might for example use this mechanism to show to the user that the peer seems to be down[^overleaf] and even take measures (like saving or removing state), but also allow to just wait until the peer is live again.
+
+### Conclusion
+
+As an application, you have total freedom (and responsibility) over your flows.
+Ouroboros will only inform you that your flow is timing out (and your peer thus appears to be down),
+but it's up to you to decide if you deallocate your side of the flow and when.
+
+Excited for my first blog post & always learning,
+
+Thijs
+
+
+[^overleaf]: I'm thinking about things like the Overleaf banner: `Lost Connection. Reconnecting in 2 secs. Try Now`