---
date: 2022-02-28
title: "Application-level flow liveness monitoring"
linkTitle: "Flows vs connections/sockets (3)"
author: Dimitri Staessens
---

This week I completed the (probably final) implementation of flow
liveness monitoring, this time in the application. The next prototype
version (0.19) of Ouroboros will allow setting a keepalive timeout on
flows. If there is no other traffic to send, either side will send
periodic keepalive packets to keep the flow alive. If no activity has
been observed within the keepalive time, the peer is considered down,
and IPC calls (flow_read / flow_write) will fail with -EFLOWPEER. This
does not remove any flow state from the system; it only notifies each
side that the peer is unresponsive (presumed dead, because it either
crashed or deallocated the flow). It's up to the application how to
respond to this event.
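
For example, a long-running read loop might handle it as follows (a
minimal sketch; BUF_SIZE and the allocated flow descriptor fd are
assumed from the surrounding program):

```C
        ssize_t count;
        char    buf[BUF_SIZE];

        while (true) {
                count = flow_read(fd, buf, BUF_SIZE);
                if (count == -EFLOWPEER) {
                        /* Peer presumed dead; the local flow state
                           is still there, so clean it up ourselves. */
                        flow_dealloc(fd);
                        break;
                }

                /* ... process regular traffic ... */
        }
```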

The duration can be set using the timeout value in the QoS
specification. It is specified in milliseconds, currently as a 32-bit
unsigned integer, which allows timeouts up to about 50 days. Each side
sends a keepalive packet at 1/4 of the specified period (not
configurable yet, but that may be useful at some point). To disable
keepalive, set the timeout to 0. I've set the current default to 2
minutes (so a keepalive goes out every 30 seconds), but I'm open to
other suggestions.
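
For example, opting out of keepalive on a flow (using the same
qos_raw spec as in the client example below) is just:

```C
        qosspec_t qs = qos_raw;

        qs.timeout = 0; /* 0 disables keepalive for this flow */
```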

The modified oecho application looks as follows (decluttered). On the
server side, we have:

```C
        while (true) {
                fd = flow_accept(NULL, NULL);

                printf("New flow.\n");

                count = flow_read(fd, buf, BUF_SIZE);

                printf("Message from client is %.*s.\n", (int) count, buf);

                flow_write(fd, buf, count);

                flow_dealloc(fd);
        }

        return 0;
```

And on the client side, the following example sets a keepalive timeout of 4 seconds:
```C
        char *    message = "Client says hi!";
        qosspec_t qs      = qos_raw;

        qs.timeout = 4000; /* keepalive timeout in milliseconds */

        fd = flow_alloc("oecho", &qs, NULL);

        flow_write(fd, message, strlen(message) + 1);

        count = flow_read(fd, buf, BUF_SIZE);

        printf("Server replied with %.*s\n", (int) count, buf);

        /* The server has deallocated the flow; this read should fail. */
        count = flow_read(fd, buf, BUF_SIZE);
        if (count < 0) {
                printf("Failed to read packet: %zd.\n", count);
                flow_dealloc(fd);
                return -1;
        }

        flow_dealloc(fd);
```

Running the client against the server results in the following output (error 1006 is EFLOWPEER):

```
[dstaesse@heteropoda website]$ oecho
Server replied with Client says hi!
Failed to read packet: -1006.
```

How does it work?

In the
[first post on this topic](/blog/2021/12/29/behaviour-of-ouroboros-flows-vs-udp-sockets-and-tcp-connections/sockets/),
I explained my reasoning about how Ouroboros should deal with
half-closed flows (flow deallocation from one side should eventually
result in a terminated flow at the other side). The implementation
should work with any kind of flow, which means we can't put it in the
FRCP protocol. And thus, I argued, it had to be below the application,
in the flow allocator. This is also where we implemented it in RINA a
few years back, so it was easy to think this would translate directly
to O7s. I was convinced it was right.

I was wrong.

After the initial implementation, I noticed that I needed to leak the
FRCP timeout (remaining Delta-t) into the IPCP. I was not planning on
doing that, as it's a _layer violation_. In RINA that is not as
obvious, since DTCP is already in the IPCP. But in O7s, the
deallocation first waits for Delta-t to expire in the application[^1]
before telling the IPCP to get rid of the flow (where it's an
instantaneous operation). This means that for flows with
retransmission, the keepalive timeout would first wait for the peer's
Delta-t timer to expire (because the flow isn't deallocated in the
peer's IPCP until it does), and then wait again for the keepalive to
expire in its own IPCP. With 2 minutes each, the application would
only time out 4 minutes after the deallocation. To solve that with
keepalive in the flow allocator, I would need to pass the timeout to
the flow allocator, tell it on dealloc to stop sending keepalives, and
wait for the longer of [keepalive, Delta-t] to expire before getting
rid of the flow state. It would work; it wouldn't even be a huge mess
to most eyes. But it bugged me tremendously. It had to be in the
application, as shown in the figure below.

{{<figure width="80%" src="/blog/20220228-flm-app.png">}}

But this poses a different problem: how to tell keepalive packets
apart from regular traffic. As I said many times before, it can't be
in FRCP, as that wouldn't work with raw flows. It also has to work
with encryption. Raw flows have no header, so I can't mark keepalive
packets easily, and adding a header just to mark them is a bridge too
far.

I think I found an elegant solution: _0-length packets_. No header. No
flags. Nothing. Nada. The receiver gets notified of a packet with a
length of 0 bytes on the flow, updates its last activity time, and
drops the packet without waking up application reads. This works with
any type of traffic on the flow. 0-byte reads at the application
already have the semantics of a partial read that completed at exactly
the buffer size[^2]. The sender can still send 0-length packets
itself, but the effect will simply be that of a purposeful keepalive
initiated at the sender.
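
A minimal sketch of the receiver-side idea (all names here are
hypothetical, for illustration only; the actual Ouroboros internals
look different):

```C
#include <stddef.h>
#include <time.h>

/* Hypothetical sketch, not the actual Ouroboros code. */
struct flow {
        time_t last_activity; /* time of last observed traffic */
        /* ... */
};

void deliver_to_application(struct flow * f, const void * data, size_t len);

static void packet_arrived(struct flow * f, const void * data, size_t len)
{
        f->last_activity = time(NULL); /* any packet refreshes liveness */

        if (len == 0) /* keepalive: drop it, don't wake up readers */
                return;

        deliver_to_application(f, data, len);
}
```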

[^1]: Logically in the application. After all packets are
      acknowledged, the application will exit and the IRMd will just
      wait for the remaining timeout before telling the IPCP to
      deallocate the flow. This is also a leak of the timeout from the
      application to the IRMd, but it's an optimization that is really
      needed. Nobody wants to wait 4 minutes for an application to
      terminate after hitting Ctrl-C. This isn't really a clear-cut
      "layer violation" as the IRMd should be considered part of the
      Operating System. It's similar to TCP connections being in
      TIME_WAIT in the kernel for 2 MSL.


[^2]: If flow\_read(fd, buf, 128) returns 128, it should be called
      again. If it then returns 0, the message was exactly 128 bytes
      long; if it returns another value, that data is still part of
      the previous message.