From 20df52a54fc03ef067cb4bce3e176e19129b4a84 Mon Sep 17 00:00:00 2001 From: Dimitri Staessens Date: Sun, 14 Feb 2021 17:42:22 +0100 Subject: content: Move releases to docs and add 0.18 notes --- content/en/blog/news/20191006-new-site.md | 7 - content/en/blog/news/20200116-hn.md | 30 -- content/en/blog/news/20200212-ecmp.md | 68 ---- content/en/blog/news/20200216-ecmp.md | 118 ------- content/en/blog/news/20200502-frcp.md | 236 -------------- content/en/blog/news/20200507-python-lb.png | Bin 218383 -> 0 bytes content/en/blog/news/20200507-python.md | 74 ----- .../en/blog/news/20201212-congestion-avoidance.md | 358 --------------------- content/en/blog/news/20201212-congestion.png | Bin 54172 -> 0 bytes .../en/blog/news/20201219-congestion-avoidance.md | 313 ------------------ content/en/blog/news/20201219-congestion.png | Bin 189977 -> 0 bytes content/en/blog/news/20201219-exp.svg | 1 - content/en/blog/news/20201219-ws-0.png | Bin 419135 -> 0 bytes content/en/blog/news/20201219-ws-1.png | Bin 432812 -> 0 bytes content/en/blog/news/20201219-ws-2.png | Bin 428663 -> 0 bytes content/en/blog/news/20201219-ws-3.png | Bin 417961 -> 0 bytes content/en/blog/news/20201219-ws-4.png | Bin 423835 -> 0 bytes content/en/blog/news/_index.md | 5 - 18 files changed, 1210 deletions(-) delete mode 100644 content/en/blog/news/20191006-new-site.md delete mode 100644 content/en/blog/news/20200116-hn.md delete mode 100644 content/en/blog/news/20200212-ecmp.md delete mode 100644 content/en/blog/news/20200216-ecmp.md delete mode 100644 content/en/blog/news/20200502-frcp.md delete mode 100644 content/en/blog/news/20200507-python-lb.png delete mode 100644 content/en/blog/news/20200507-python.md delete mode 100644 content/en/blog/news/20201212-congestion-avoidance.md delete mode 100644 content/en/blog/news/20201212-congestion.png delete mode 100644 content/en/blog/news/20201219-congestion-avoidance.md delete mode 100644 content/en/blog/news/20201219-congestion.png delete mode 100644 content/en/blog/news/20201219-exp.svg delete mode 100644 content/en/blog/news/20201219-ws-0.png delete mode 100644 content/en/blog/news/20201219-ws-1.png delete mode 100644 content/en/blog/news/20201219-ws-2.png delete mode 100644 content/en/blog/news/20201219-ws-3.png delete mode 100644 content/en/blog/news/20201219-ws-4.png delete mode 100644 content/en/blog/news/_index.md (limited to 'content/en/blog/news') diff --git a/content/en/blog/news/20191006-new-site.md b/content/en/blog/news/20191006-new-site.md deleted file mode 100644 index c04ff2d..0000000 --- a/content/en/blog/news/20191006-new-site.md +++ /dev/null @@ -1,7 +0,0 @@ ---- -date: 2019-10-06 -title: "New Website" -linkTitle: "New Ouroboros website" -description: "Announcing the new website" -author: Dimitri Staessens ---- diff --git a/content/en/blog/news/20200116-hn.md b/content/en/blog/news/20200116-hn.md deleted file mode 100644 index b80a7bd..0000000 --- a/content/en/blog/news/20200116-hn.md +++ /dev/null @@ -1,30 +0,0 @@ ---- -date: 2020-01-16 -title: "Getting back to work" -linkTitle: "Getting back to work" -description: "Show HN - Ouroboros" -author: Dimitri Staessens ---- - -Yesterday there was a bit of an unexpected spike in interest in -Ouroboros following a [post on -HN](https://news.ycombinator.com/item?id=22052416). I'm really -humbled by the response and grateful to all the people that show -genuine interest in this project. - -I fully understand that people would like to know a lot more details -about Ouroboros than the current site provides. 
It was the top -priority on the todo list, and this new interest gives me some -additional motivation to get to it. There's a lot to Ouroboros that's -not so trivial, which makes writing clear documentation a tricky -thing to do. - -I will also tackle some of the questions from the HN in a series of -blog posts in the next few days, replacing the (very old and outdated) -FAQ section. I hope these will be useful. - -Again thank you for your interest. - -Sincerely, - -Dimitri diff --git a/content/en/blog/news/20200212-ecmp.md b/content/en/blog/news/20200212-ecmp.md deleted file mode 100644 index 019b40d..0000000 --- a/content/en/blog/news/20200212-ecmp.md +++ /dev/null @@ -1,68 +0,0 @@ ---- -date: 2020-02-12 -title: "Equal-Cost Multipath (ECMP)" -linkTitle: "Adding Equal-Cost multipath (ECMP)" -description: "ECMP is coming to Ouroboros (finally)" -author: Dimitri Staessens ---- - -Some recent news -- Multi-Path TCP (MPTCP) implementation is [landing -in mainstream Linux kernel -5.6](https://www.phoronix.com/scan.php?page=news_item&px=Linux-5.6-Starts-Multipath-TCP) --- finally got me to integrate the equal-cost multipath (ECMP) -implementation from [Nick Aerts's master -thesis](https://lib.ugent.be/nl/catalog/rug01:002494958) into -Ouroboros. And working on the ECMP implementation in gives me an -excuse to rant a little bit about MPTCP. - -The first question that comes to mind is: _Why is it called -multi-**path** TCP_? IP is routing packets, not TCP, and there are -equal-cost multipath options for IP in both [IS-IS and -OSPF](https://tools.ietf.org/html/rfc2991). Maybe _multi-flow TCP_ -would be a better name? This would also be more transparent to the -fact that running MPTCP over longer hops will make less sense, since -the paths are more likely to converge over the same link. - -So _why is there a need for multi-path TCP_? The answer, of course, is -that the Internet Protocol routes packets between IP endpoints, which -are _interfaces_, not _hosts_. So, if a server is connected over 4 -interfaces, ECMP routing will not be of any help if one of them goes -down. The TCP connections will time out. Multipath TCP, however, is -actually establishing 4 subflows, each over a different interface. If -an interface goes down, MPTCP will still have 3 subflows ready. The -application is listening the the main TCP connection, and will not -notice a TCP-subflow timing out[^1]. - -This brings us, of course, to the crux of the problem. IP names the -[point of attachment](https://tools.ietf.org/html/rfc1498); IP -addresses are assigned to interfaces. Another commonly used workaround -is a virtual IP interface on the loopback, but then you need a lot of -additional configuration (and if that were the perfect solution, one -wouldn't need MPTCP!). MPTCP avoids the network configuration mess, -but does require direct modification in the application using -[additions to the sockets -API](https://tools.ietf.org/html/draft-hesmans-mptcp-socket-03) in the -form of a bunch of (ugly) setsockopts. - -Now this is a far from ideal situation, but given its constraints, -MPTCP is a workable engineering solution that will surely see its -uses. It's strange that it took years for MPTCP to get to this stage. - -Now, of course, Ouroboros does not assign addresses to -points-of-attachments ( _flow endpoints_). It doesn't even assign -addresses to hosts/nodes! Instead, the address is derived from the -forwarding protocol machines inside each node. (For the details, see -the [article](https://arxiv.org/pdf/2001.09707.pdf)). 
The net effect -is that an ECMP routing algorithm can cleanly handle hosts with -multiple interfaces. Details about the routing algorithm are not -exposed to application APIs. Instead, Ouroboros applications request -an implementation-independent _service_. - -The ECMP patch for Ouroboros is coming _soon_. Once it's available I -will also add a couple of tutorials on it. - -Peace. - -Dimitri - -[^1]: Question: Why are the subflows not UDP? That would avoid a lot of duplicated overhead (sequence numbers etc)... Would it be too messy on the socket API side? \ No newline at end of file diff --git a/content/en/blog/news/20200216-ecmp.md b/content/en/blog/news/20200216-ecmp.md deleted file mode 100644 index ce632c9..0000000 --- a/content/en/blog/news/20200216-ecmp.md +++ /dev/null @@ -1,118 +0,0 @@ ---- -date: 2020-02-16 -title: "Equal-Cost Multipath (ECMP) routing" -linkTitle: "Equal-Cost multipath (ECMP) example" -description: "A very quick example of ECMP" -author: Dimitri Staessens ---- - -As promised, I added equal cost multipath routing to the Ouroboros -unicast IPCP. I will add some more explanations later when it's fully -tested and merge into the master branch, but you can already try it. -You will need to pull the _be_ branch. You will also need to have -_fuse_ installed to monitor the flows from _/tmp/ouroboros/_. The -following script will bootstrap a 4-node unicast network on your -machine that routes using ECMP: - -```bash -#!/bin/bash - -# create a local IPCP. This emulates the "Internet" -irm i b t local n local l local - -#create the first unicast IPCP with ecmp -irm i b t unicast n uni.a l net routing ecmp - -#bind the unicast IPCP to the names net and uni.a -irm b i uni.a n net -irm b i uni.a n uni.a - -#register these 2 names in the local IPCP -irm n r net l local -irm n r uni.a l local - -#create 3 more unicast IPCPs, and enroll them with the first -irm i e t unicast n uni.b l net -irm b i uni.b n net -irm b i uni.b n uni.b -irm n r uni.b l local - -irm i e t unicast n uni.c l net -irm b i uni.c n net -irm b i uni.c n uni.c -irm n r uni.c l local - -irm i e t unicast n uni.d l net -irm b i uni.d n net -irm b i uni.d n uni.d -irm n r uni.d l local - -#connect uni.b to uni.a this creates a DT flow and a mgmt flow -irm i conn name uni.b dst uni.a - -#now do the same for the others, creating a square -irm i conn name uni.c dst uni.b -irm i conn name uni.d dst uni.c -irm i conn name uni.d dst uni.a - -#register the oping application at 4 different locations -#this allows us to check the multipath implementation -irm n r oping.a i uni.a -irm n r oping.b i uni.b -irm n r oping.c i uni.c -irm n r oping.d i uni.d - -#bind oping program to oping names -irm b prog oping n oping.a -irm b prog oping n oping.b -irm b prog oping n oping.c -irm b prog oping n oping.d - -#good to go! -``` - -In order to test the setup, start an irmd (preferably in a terminal so -you can see what's going on). In another terminal, run the above -script and then start an oping server: - -```bash -$ ./ecmpscript -$ oping -l -Ouroboros ping server started. -``` - -This single server program will accept all flows for oping from any of -the unicast IPCPs. Ouroboros _multi-homing_ in action. - -Open another terminal, and type the following command: - -```bash -$ watch -n 1 'grep "sent (packets)" /tmp/ouroboros/uni.a/dt.*/6* | sed -n -e 1p -e 7p' -``` - -This will show you the packet statistics from the 2 data transfer -flows from the first IPCP (uni.a). 
- -On my machine it looks like this: - -``` -Every 1,0s: grep "sent (packets)" /tmp/ouroboros/uni.a/dt.*/6* | sed -n -e 1p -e 7p - -/tmp/ouroboros/uni.a/dt.1896199821/65: sent (packets): 10 -/tmp/ouroboros/uni.a/dt.1896199821/67: sent (packets): 6 -``` - -Now, from yet another terminal, run connect an oping client to oping.c -(the client should attach to the first IPCP, so oping.c should be the -one with 2 equal cost paths) and watch both counters increase: - -```bash -oping -n oping.c -i 100ms -``` - -When you do this to the other destinations (oping.b and oping.d) you -should see only one of the flow counters increasing. - -Hope you enjoyed this little demo! - -Dimitri diff --git a/content/en/blog/news/20200502-frcp.md b/content/en/blog/news/20200502-frcp.md deleted file mode 100644 index 28c5794..0000000 --- a/content/en/blog/news/20200502-frcp.md +++ /dev/null @@ -1,236 +0,0 @@ ---- -date: 2020-05-02 -title: "Flow and Retransmission Control Protocol (FRCP) implementation" -linkTitle: "Flow and Retransmission Control Protocol (FRCP)" -description: "A quick demo of FRCP" -author: Dimitri Staessens ---- - -With the longer weekend I had some fun implementing (parts of) the -[Flow and Retransmission Control Protocol (FRCP)](/docs/concepts/protocols/#flow-and-retransmission-control-protocol-frcp) -to the point that it's stable enough to bring you a very quick demo of it. - -FRCP is the Ouroboros alternative to TCP / QUIC / LLC. It assures -delivery of packets when the network itself isn't very reliable. - -The setup is simple: we run Ouroboros over the Ethernet loopback -adapter _lo_, -``` -systemctl restart ouroboros -irm i b t eth-dix l dix n dix dev lo -``` -to which we add some impairment -[_qdisc_](http://man7.org/linux/man-pages/man8/tc-netem.8.html): - -``` -$ sudo tc qdisc add dev lo root netem loss 8% duplicate 3% reorder 10% delay 1 -``` - -This causes the link to lose, duplicate and reorder packets. - -We can use the oping tool to uses different [QoS -specs](https://ouroboros.rocks/cgit/ouroboros/tree/include/ouroboros/qos.h) -and watch the behaviour. Quality-of-Service (QoS) specs are a -technology-agnostic way to request a network service (current -status - not finalized yet). I'll also capture tcpdump output. - -We start an oping server and tell Ouroboros for it to listen to the _name_ "oping": -``` -#bind the program oping to the name oping -irm b prog oping n oping -#register the name oping in the Ethernet layer that is attached to the loopback -irm n r oping l dix -#run the oping server -oping -l -``` - -We'll now send 20 pings. If you try this, it can be that the flow -allocation fails, due to the loss of a flow allocation packet (a bit -similar to TCP losing the first SYN). The oping client currently -doesn't retry flow allocation. The default payload for oping is 64 -bytes (of zeros); oping waits 2 seconds for all packets it has -sent. It doesn't detect duplicates. - -Let's first look at the _raw_ QoS cube. That's like best-effort -UDP/IP. In Ouroboros, however, it doesn't require a packet header at -all. 
- -First, the output of the client using a _raw_ QoS cube: -``` -$ oping -n oping -c 20 -i 200ms -q raw -Pinging oping with 64 bytes of data (20 packets): - -64 bytes from oping: seq=0 time=0.880 ms -64 bytes from oping: seq=1 time=0.742 ms -64 bytes from oping: seq=4 time=1.303 ms -64 bytes from oping: seq=6 time=0.739 ms -64 bytes from oping: seq=6 time=0.771 ms [out-of-order] -64 bytes from oping: seq=6 time=0.789 ms [out-of-order] -64 bytes from oping: seq=7 time=0.717 ms -64 bytes from oping: seq=8 time=0.759 ms -64 bytes from oping: seq=9 time=0.716 ms -64 bytes from oping: seq=10 time=0.729 ms -64 bytes from oping: seq=11 time=0.720 ms -64 bytes from oping: seq=12 time=0.718 ms -64 bytes from oping: seq=13 time=0.722 ms -64 bytes from oping: seq=14 time=0.700 ms -64 bytes from oping: seq=16 time=0.670 ms -64 bytes from oping: seq=17 time=0.712 ms -64 bytes from oping: seq=18 time=0.716 ms -64 bytes from oping: seq=19 time=0.674 ms -Server timed out. - ---- oping ping statistics --- -20 packets transmitted, 18 received, 2 out-of-order, 10% packet loss, time: 6004.273 ms -rtt min/avg/max/mdev = 0.670/0.765/1.303/0.142 ms -``` - -The _netem_ did a good job of jumbling up the traffic! There were a -couple out-of-order, duplicates, and quite some packets lost. - -Let's dig into an Ethernet frame captured from the "wire". The most -interesting thing its small total size: 82 bytes. - -``` -13:37:25.875092 00:00:00:00:00:00 (oui Ethernet) > 00:00:00:00:00:00 (oui Ethernet), ethertype Unknown (0xa000), length 82: - 0x0000: 0042 0040 0000 0001 0000 0011 e90c 0000 .B.@............ - 0x0010: 0000 0000 203f 350f 0000 0000 0000 0000 .....?5......... - 0x0020: 0000 0000 0000 0000 0000 0000 0000 0000 ................ - 0x0030: 0000 0000 0000 0000 0000 0000 0000 0000 ................ - 0x0040: 0000 0000 -``` - -The first 12 bytes are the two MAC addresses (all zeros), then 2 bytes -for the "Ethertype" (the default for an Ouroboros layer is 0xa000, so -you can create more layers and seperate them by Ethertype[^1]. The -Ethernet Payload is thus 68 bytes. The Ouroboros _ipcpd-eth-dix_ adds -and extra header of 4 bytes with 2 extra "fields". The first field we -needed to take over from our [Data -Transfer](/docs/concepts/protocols/) protocol: the Endpoint Identifier -that identifies the flow. The _ipcpd-eth-dix_ has two endpoints, one -for the client and one for the server. 0x0042 (66) is the destination -EID of the server, 0x0043 (67) is the destination EID of the client. -The second field is the _length_ of the payload in octets, 0x0040 = -64. This is needed because Ethernet II has a minimum frame size of 64 -bytes and pads smaller frames (called _runt frames_)[^2]. The -remaining 64 bytes are the oping payload, giving us an 82 byte packet. - -That's it for the raw QoS. The next one is _voice_. A voice service -usually requires packets to be delivered with little delay and jitter -(i.e. ASAP). Out-of-order packets are rejected since they cause -artifacts in the audio output. The voice QoS will enable FRCP, because -it needs to track sequence numbers. 
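Conceptually, what "reject out-of-order" means for this QoS boils down to the receiver only delivering packets that are strictly newer than the last one it handed up, without asking for retransmissions. A minimal sketch of that idea (illustrative Python only, not the actual FRCP code; sequence-number wrap-around is ignored):

```Python
# Illustrative sketch: in-order delivery without retransmission, as for the
# voice QoS described above. Late or duplicate packets are simply dropped.
last_seqno = -1

def on_voice_packet(seqno, payload, deliver):
    global last_seqno
    if seqno <= last_seqno:
        return                 # out-of-order or duplicate: reject it
    last_seqno = seqno
    deliver(payload)           # deliver immediately; gaps are acceptable
```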
- -``` -$ oping -n oping -c 20 -i 200ms -q voice -Pinging oping with 64 bytes of data (20 packets): - -64 bytes from oping: seq=0 time=0.860 ms -64 bytes from oping: seq=2 time=0.704 ms -64 bytes from oping: seq=3 time=0.721 ms -64 bytes from oping: seq=4 time=0.706 ms -64 bytes from oping: seq=5 time=0.721 ms -64 bytes from oping: seq=6 time=0.710 ms -64 bytes from oping: seq=7 time=0.721 ms -64 bytes from oping: seq=8 time=0.691 ms -64 bytes from oping: seq=10 time=0.691 ms -64 bytes from oping: seq=12 time=0.702 ms -64 bytes from oping: seq=13 time=0.730 ms -64 bytes from oping: seq=14 time=0.716 ms -64 bytes from oping: seq=15 time=0.725 ms -64 bytes from oping: seq=16 time=0.709 ms -64 bytes from oping: seq=17 time=0.703 ms -64 bytes from oping: seq=18 time=0.693 ms -64 bytes from oping: seq=19 time=0.666 ms -Server timed out. - ---- oping ping statistics --- -20 packets transmitted, 17 received, 0 out-of-order, 15% packet loss, time: 6004.243 ms -rtt min/avg/max/mdev = 0.666/0.716/0.860/0.040 ms -``` - -As you can see, packets are delivered in-order, and some packets are -missing. Nothing fancy. Let's look at a data packet: - -``` -14:06:05.607699 00:00:00:00:00:00 (oui Ethernet) > 00:00:00:00:00:00 (oui Ethernet), ethertype Unknown (0xa000), length 94: - 0x0000: 0045 004c 0100 0000 eb1e 73ad 0000 0000 .E.L......s..... - 0x0010: 0000 0000 0000 0012 a013 0000 0000 0000 ................ - 0x0020: 705c e53a 0000 0000 0000 0000 0000 0000 p\.:............ - 0x0030: 0000 0000 0000 0000 0000 0000 0000 0000 ................ - 0x0040: 0000 0000 0000 0000 0000 0000 0000 0000 ................ - -``` - -The same 18-byte header is present. The flow endpoint ID is a -different one, and the length is also different. The packet is 94 -bytes, the payload length for the _ipcp-eth_dix_ is 0x004c = 76 -octets. So the FRCP header adds 12 bytes, the total overhead is 30 -bytes. Maybe a bit more detail on the FRCP header contents (more depth -is available the protocol documentation). The first 2 bytes are the -FLAGS (0x0100). There are only 7 flags, it's 16 bits for memory -alignment. This packet only has the DATA bit set. Then follows the -flow control window, which is 0 (not implemented yet). Then we have a -4 byte sequence number (eb1e 73ae = 3944641454)[^3] and a 4 byte ACK -number, which is 0. The remaining 64 bytes are the oping payload. 
- -Next, the data QoS: - -``` -$ oping -n oping -c 20 -i 200ms -q data -Pinging oping with 64 bytes of data (20 packets): - -64 bytes from oping: seq=0 time=0.932 ms -64 bytes from oping: seq=1 time=0.701 ms -64 bytes from oping: seq=2 time=200.949 ms -64 bytes from oping: seq=3 time=0.817 ms -64 bytes from oping: seq=4 time=0.753 ms -64 bytes from oping: seq=5 time=0.730 ms -64 bytes from oping: seq=6 time=0.726 ms -64 bytes from oping: seq=7 time=0.887 ms -64 bytes from oping: seq=8 time=0.878 ms -64 bytes from oping: seq=9 time=0.883 ms -64 bytes from oping: seq=10 time=0.865 ms -64 bytes from oping: seq=11 time=401.192 ms -64 bytes from oping: seq=12 time=201.047 ms -64 bytes from oping: seq=13 time=0.872 ms -64 bytes from oping: seq=14 time=0.966 ms -64 bytes from oping: seq=15 time=0.856 ms -64 bytes from oping: seq=16 time=0.849 ms -64 bytes from oping: seq=17 time=0.843 ms -64 bytes from oping: seq=18 time=0.797 ms -64 bytes from oping: seq=19 time=0.728 ms - ---- oping ping statistics --- -20 packets transmitted, 20 received, 0 out-of-order, 0% packet loss, time: 4004.491 ms -rtt min/avg/max/mdev = 0.701/40.864/401.192/104.723 ms -``` - -With the data spec, we have no packet loss, but some packets have been -retransmitted (hence the higher latency). The reason for the very high -latency is that the current implementation only ACKs on data packets, -this will be fixed soon. - -Looking at an Ethernet frame, it's again 94 bytes: - -``` -14:35:42.612066 00:00:00:00:00:00 (oui Ethernet) > 00:00:00:00:00:00 (oui Ethernet), ethertype Unknown (0xa000), length 94: - 0x0000: 0044 004c 0700 0000 81b8 0259 e2f3 eb59 .D.L.......Y...Y - 0x0010: 0000 0000 0000 0012 911a 0000 0000 0000 ................ - 0x0020: 86b3 273b 0000 0000 0000 0000 0000 0000 ..';............ - 0x0030: 0000 0000 0000 0000 0000 0000 0000 0000 ................ - 0x0040: 0000 0000 0000 0000 0000 0000 0000 0000 ................ - -``` - -The main difference is that it has 2 flags set (DATA + ACK), and it -thus contains both a sequence number (81b8 0259) and an -acknowledgement (e2f3 eb59). - -That's about it for now. More to come soon. - -Dimitri - -[^1]: Don't you love standards? One of the key design objectives for Ouroboros is exactly to avoid such shenanigans. Modify/abuse a header and Ouroboros should reject it because it _cannot work_, not because some standard says one shouldn't do it. -[^2]: Lesser known fact: Gigabit Ethernet has a 512 byte minimum frame size; but _carrier extension_ handles this transparently. -[^3]: In _network byte order_. \ No newline at end of file diff --git a/content/en/blog/news/20200507-python-lb.png b/content/en/blog/news/20200507-python-lb.png deleted file mode 100644 index 89e710e..0000000 Binary files a/content/en/blog/news/20200507-python-lb.png and /dev/null differ diff --git a/content/en/blog/news/20200507-python.md b/content/en/blog/news/20200507-python.md deleted file mode 100644 index d4b3504..0000000 --- a/content/en/blog/news/20200507-python.md +++ /dev/null @@ -1,74 +0,0 @@ ---- -date: 2020-05-07 -title: "A Python API for Ouroboros" -linkTitle: "Python" -description: "Python" -author: Dimitri Staessens ---- - -Support for other programming languages than C/C++ has been on my todo -list for quite some time. The initial approach was using -[SWIG](http://www.swig.org), but we found the conversion always -clunky, it didn't completely work as we wanted to, and a while back we -just decided to deprecate it. 
Apart from C/C++ we only had a [rust -wrapper](https://github.com/chritchens/ouroboros-rs). - -Until now! I finally took the time to sink my teeth into the bindings -for Python. I had some brief looks at the -[ctypes](https://docs.python.org/3/library/ctypes.html) library a -while back, but this time I looked into -[cffi](https://cffi.readthedocs.io/en/latest/) and I was amazed at how -simple it was to wrap the more difficult functions that manipulate -blocks of memory (flow\_read, but definitely the async fevent() call). -And now there is path towards a 'nice' Python API. - -Here is a taste of what the -[oecho](https://ouroboros.rocks/cgit/ouroboros/tree/src/tools/oecho/oecho.c) -tool looks like in Python: - -```Python -from ouroboros import * -import argparse - - -def client(): - f = flow_alloc("oecho") - f.writeline("Hello, PyOuroboros!") - print(f.readline()) - f.dealloc() - - -def server(): - print("Starting the server.") - while True: - f = flow_accept() - print("New flow.") - line = f.readline() - print("Message from client is " + line) - f.writeline(line) - f.dealloc() - - -if __name__ == "__main__": - parser = argparse.ArgumentParser(description='A simple echo client/server') - parser.add_argument('-l', '--listen', help='run as a server', action='store_true') - args = parser.parse_args() - if args.listen is True: - server() - else: - client() -``` - -I have more time in the next couple of days, so I expect this to be -released after the weekend. - -Oh, and here is a picture of Ouroboros load-balancing between the C (top right) -and Python (top left) implementations using the C and Python clients: - -{{
}} - -Can't wait to get the full API done! - -Cheers, - -Dimitri diff --git a/content/en/blog/news/20201212-congestion-avoidance.md b/content/en/blog/news/20201212-congestion-avoidance.md deleted file mode 100644 index f395a4f..0000000 --- a/content/en/blog/news/20201212-congestion-avoidance.md +++ /dev/null @@ -1,358 +0,0 @@ ---- -date: 2020-12-12 -title: "Congestion avoidance in Ouroboros" -linkTitle: "Congestion avoidance" -description: "API for congestion avoidance and the Ouroboros MB-ECN algorithm" -author: Dimitri Staessens ---- - -The upcoming 0.18 version of the prototype has a bunch of big -additions coming in, but the one that I'm most excited about is the -addition of congestion avoidance. Now that the implementation is -reaching its final shape, I just couldn't wait to share with the world -what it looks like, so here I'll talk a bit about how it works. - -# Congestion avoidance - -Congestion avoidance is a mechanism for a network to avoid situations -where the where the total traffic offered on a network element (link -or node) systemically exceeds its capacity to handle this traffic -(temporary overload due to traffic burstiness is not -congestion). While bursts can be handled with adding buffers to -network elements, the solution to congestion is to reduce the ingest -of traffic at the network endpoints that are sources for the traffic -over the congested element(s). - -I won't be going into too many details here, but there are two classes -of mechanisms to inform traffic sources of congestion. One is Explicit -Congestion Notification (ECN), where information is sent to the sender -that its traffic is traversing a congested element. This is a solution -that is, for instance, used by -[DataCenter TCP (DCTCP)](https://tools.ietf.org/html/rfc8257), -and is also supported by -[QUIC](https://www.ietf.org/archive/id/draft-ietf-quic-recovery-33.txt). -The other mechanism is implicit congestion detection, for instance by -inferring congestion from packet loss (most TCP flavors) or increases -in round-trip-time (TCP vegas). - -Once the sender is aware that its traffic is experiencing congestion, -it has to take action. A simple (proven) way is the AIMD algorithm -(Additive Increase, Multiplicative Decrease). When there is no sign of -congestion, senders will steadily increase the amount of traffic they -are sending (Additive Increase). When congestion is detected, they -will quickly back off (Multiplicative Decrease). Usually this is -augmented with a Slow Start (Multiplicative Increase) phase when the -senders begins to send, to reach the maximum bandwidth more -quickly. AIMD is used by TCP and QUIC (among others), and Ouroboros is -no different. It's been proven to work mathematically. - -Now that the main ingredients are known, we can get to the -preparation of the course. - -# Ouroboros congestion avoidance - -Congestion avoidance is in a very specific location in the Ouroboros -architecture: at the ingest point of the network; it is the -responsibility of the network, not the client application. In -OSI-layer terminology, we could say that in Ouroboros, it's in "Layer -3", not in "Layer 4". - -Congestion has to be dealt with for each individual traffic -source/destination pair. In TCP this is called a connection, in -Ouroboros we call it a _flow_. - -Ouroboros _flows_ are abstractions for the most basic of packet flows. 
-A flow is defined by two endpoints and all that a flow guarantees is -that there exist strings of bytes (packets) that, when offered at the -ingress endpoint, have a non-zero chance of emerging at the egress -endpoint. I say 'there exist' to allow, for instance, for maximum -packet lengths. If it helps, think of flow endpoints as an IP:UDP -address:port pair (but emphatically _NOT_ an IP:TCP address:port -pair). There is no protocol assumed for the packets that traverse the -flow. To the ingress and egress point, they are just a bunch of bytes. - -Now this has one major implication: We will need to add some -information to these packets to infer congestion indirectly or -explicitly. It should be obvious that explicit congestion notification -is the simplest solution here. The Ouroboros prototype (currently) -allows an octet for ECN. - -# Functional elements of the congestion API - -This section glances over the API in an informal way. A reference -manual for the actual C API will be added after 0.18 is in the master -branch of the prototype. The most important thing to keep in mind is -that the architecture dictates this API, not any particular algorithm -for congestion that we had in mind. In fact, to be perfectly honest, -up front I wasn't 100% sure that congestion avoidance was feasible -without adding additional fields fields to the DT protocol, such as a -packet counter, or sending some feedback for measuring the Round-Trip -Time (RTT). But as the algorithm below will show, it can be done. - -When flows are created, some state can be stored, which we call the -_congestion context_. For now it's not important to know what state is -stored in that context. If you're familiar with the inner workings of -TCP, think of it as a black-box generalization of the _tranmission -control block_. Both endpoints of a flow have such a congestion -context. - -At the sender side, the congestion context is updated for each packet -that is sent on the flow. Now, the only information that is known at -the ingress is 1) that there is a packet to be sent, and 2) the length -of this packet. The call at the ingress is thus: - -``` - update_context_at_sender -``` - -This function has to inform when it is allowed to actually send the -packet, for instance by blocking for a certain period. - -At the receiver flow endpoint, we have a bit more information, 1) that -a packet arrived, 2) the length of this packet, and 3) the value of -the ECN octet associated with this packet. The call at the egress is -thus: - -``` - update_context_at_receiver -``` - -Based on this information, receiver can decide if and when to update -the sender. We are a bit more flexible in what can be sent, at this -point, the prototype allows sending a packet (which we call -FLOW_UPDATE) with a 16-bit Explicit Congestion Experienced (ECE) field. - -This implies that the sender can get this information from the -receiver, so it knows 1) that such a packet arrived, and 2) the value -of the ECE field. - -``` - update_context_at_sender_ece -``` - -That is the API for the endpoints. In each Ouroboros IPCP (think -'router'), the value of the ECN field is updated. - -``` - update_ecn_in_router -``` - -That's about as lean as as it gets. Now let's have a look at the -algorithm that I designed and -[implemented](https://ouroboros.rocks/cgit/ouroboros/tree/src/ipcpd/unicast/pol/ca-mb-ecn.c?h=be) -as part of the prototype. - -# The Ouroboros multi-bit Forward ECN (MB-ECN) algorithm - -The algorithm is based on the workings of DataCenter TCP -(DCTCP). 
Before I dig into the details, I will list the main differences, without any judgement.

* The rate for additive increase is the same _constant_ for all flows (but could be made configurable for each network layer if needed). This is achieved by having a window that is independent of the Round-Trip Time (RTT). This may make it more fair, as congestion avoidance in DCTCP (and in most -- if not all -- TCP variants) is biased in favor of flows with smaller RTT[^1].

* Because it is operating at the _flow_ level, it estimates the _actual_ bandwidth sent, including retransmissions, ACKs and what not from protocols operating on the flow. DCTCP estimates bandwidth based on which data offsets are acknowledged.

* The algorithm uses 8 bits to indicate the queue depth in each router, instead of a single bit (due to IP header restrictions) for DCTCP.

* MB-ECN sends a (small) out-of-band FLOW_UPDATE packet, DCTCP updates in-band TCP ECN/ECE bits in acknowledgment (ACK) packets. Note that DCTCP sends an immediate ACK with ECE set at the start of congestion, and sends an immediate ACK with ECE not set at the end of congestion. Otherwise, the ECE is set accordingly for any "regular" ACKs.

* The MB-ECN algorithm can be implemented without the need for dividing numbers (apart from bit shifts). At least in the Linux kernel implementation, DCTCP has a division for estimating the number of bytes that experienced congestion from the received acks with ECE bits set. I'm not sure this can be avoided[^2].

Now, on to the MB-ECN algorithm. The values for some constants presented here have only been quickly tested; a _lot_ more scientific scrutiny is definitely needed here to make any statements about the performance of this algorithm. I will just explain the operation, and provide some very preliminary measurement results.

First, like DCTCP, the routers mark the ECN field based on the outgoing queue depth. The current minimum queue depth to trigger an ECN is 16 packets (implemented as a bit shift of the queue size when writing a packet). We perform a logical OR with the previous value in the packet. If the width of the ECN field were a single bit, this operation would be identical to DCTCP.

At the _receiver_ side, the context maintains two state variables.

* The floating sum (ECE) of the value of the (8-bit) ECN field over the last 2<sup>N</sup> packets is maintained (currently N=5, so 32 packets). This is a value between 0 and 2<sup>8+5</sup> - 1.

* The number of packets received during a period of congestion. This is just for internal use.

If the ECE value is 0, no actions are performed at the receiver.

If this ECE value becomes higher than 0 (there is some indication of the start of congestion), an immediate FLOW_UPDATE is sent with this value. If a packet arrives with ECN = 0, the ECE value is _halved_.

For every _increase_ in the ECE value, an immediate update is sent.

If the ECE value remains stable or decreases, an update is sent only every M packets (currently, M = 8). This is what the counter is for.

If the ECE value returns to 0 after a period of congestion, an immediate FLOW_UPDATE with the value 0 is sent.

At the _sender_ side, the context keeps track of the actual congestion window. The sender keeps track of:

* The current sender ECE value, which is updated when receiving a FLOW_UPDATE.

* A bool indicating Slow Start, which is set to false when a FLOW_UPDATE arrives.

* A sender-side packet counter.
If this exceeds the value of N, the - ECE is reset to 0. This protects the sender from lost FLOW_UPDATES - that signal the end of congestion. - -* The window size multiplier W. For all flows, the window starts at a - predetermined size, 2W ns. Currently W = 24, starting at - about 16.8ms. The power of 2 allows us to perform operations on the - window boundaries using bit shift arithmetic. - -* The current window start time (a single integer), based on the - multiplier. - -* The number of packets sent in the current window. If this is below a - PKT_MIN threshold before the start of a window period, the new - window size is doubled. If this is above a PKT_MAX threshold before - the start of a new window period, the new window size is halved. The - thresholds are currently set to 8 and 64, scaling the window width - to average sending ~36 packets in a window. When the window scales, - the value for the allowed bytes to send in this window (see below) - scales accordingly to keep the sender bandwidth at the same - level. These values should be set with the value of N at the - receiver side in mind. - -* The number bytes sent in this window. This is updated when sending - each packet. - -* The number of allowed bytes in this window. This is calculated at - the start of a new window: doubled at Slow Start, multiplied by a - factor based on sender ECE when there is congestion, and increased - by a fixed (scaled) value when there is no congestion outside of - Slow Start. Currently, the scaled value is 64KiB per 16.8ms. - -There is one caveat: what if no FLOW_UPDATE packets arrive at all? -DCTCP (being TCP) will timeout at the Retransmission TimeOut (RTO) -value (since its ECE information comes from ACK packets), but this -algorithm has no such mechanism at this point. The answer is that we -currently do not monitor flow liveness from the flow allocator, but a -Keepalive or Bidirectional Forwarding Detection (BFD)-like mechanism -for flows should be added for QoS maintenance, and can serve to -timeout the flow and reset it (meaning a full reset of the -context). - -# MB-ECN in action - -From version 0.18 onwards[^3], the state of the flow -- including its -congestion context -- can be monitored from the flow allocator -statics: - -```bash -$ cat /tmp/ouroboros/unicast.1/flow-allocator/66 -Flow established at: 2020-12-12 09:54:27 -Remote address: 99388 -Local endpoint ID: 2111124142794211394 -Remote endpoint ID: 4329936627666255938 -Sent (packets): 1605719 -Sent (bytes): 1605719000 -Send failed (packets): 0 -Send failed (bytes): 0 -Received (packets): 0 -Received (bytes): 0 -Receive failed (packets): 0 -Receive failed (bytes): 0 -Congestion avoidance algorithm: Multi-bit ECN -Upstream congestion level: 0 -Upstream packet counter: 0 -Downstream congestion level: 48 -Downstream packet counter: 0 -Congestion window size (ns): 65536 -Packets in this window: 7 -Bytes in this window: 7000 -Max bytes in this window: 51349 -Current congestion regime: Multiplicative dec -``` - -I ran a quick test using the ocbr tool (modified to show stats every -100ms) on a jFed testbed using 3 Linux servers (2 clients and a -server) in star configuration with a 'router' (a 4th Linux server) in -the center. The clients are connected to the 'router' over Gigabit -Ethernet, the link between the 'router' and server is capped to 100Mb -using ethtool[^4]. 
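Before looking at the output, here is a rough sketch of the sender-side bookkeeping described in the previous section. This is illustrative Python only -- the real code is the C implementation in ca-mb-ecn.c linked earlier -- and the names, constants and the exact back-off factor below are mine, not the prototype's:

```Python
# Illustrative sketch of the sender-side congestion context described above.
ADD_BYTES = 64 * 1024          # additive increase per window (scaled with width)

class SenderContext:
    def __init__(self):
        self.mul        = 24           # W: window width is 2**W ns (~16.8 ms)
        self.slow_start = True
        self.ece        = 0            # congestion level reported by the receiver
        self.allowed    = 1 << 16      # bytes allowed in the current window
        self.sent       = 0            # bytes sent in the current window

    def on_flow_update(self, ece):
        self.slow_start = False        # any FLOW_UPDATE ends Slow Start
        self.ece = ece

    def on_new_window(self):
        if self.slow_start:
            self.allowed *= 2                        # multiplicative increase
        elif self.ece > 0:
            # Multiplicative decrease: back off harder for larger ECE values.
            # The factor below is a placeholder, not the prototype's formula.
            self.allowed -= (self.allowed * min(self.ece, 4096)) // 8192
        else:
            self.allowed += ADD_BYTES                # additive increase
        self.sent = 0
        # The window width (2**mul ns) is also doubled or halved here, based on
        # how many packets were sent in the previous window; omitted for brevity.
```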
- -Output from the ocbr tool: - -``` -Flow 64: 998 packets ( 998000 bytes)in 101 ms => 9880.8946 pps, 79.0472 Mbps -Flow 64: 1001 packets ( 1001000 bytes)in 101 ms => 9904.6149 pps, 79.2369 Mbps -Flow 64: 999 packets ( 999000 bytes)in 101 ms => 9882.8697 pps, 79.0630 Mbps -Flow 64: 998 packets ( 998000 bytes)in 101 ms => 9880.0143 pps, 79.0401 Mbps -Flow 64: 999 packets ( 999000 bytes)in 101 ms => 9887.6627 pps, 79.1013 Mbps -Flow 64: 999 packets ( 999000 bytes)in 101 ms => 9891.0891 pps, 79.1287 Mbps -New flow. -Flow 64: 868 packets ( 868000 bytes)in 102 ms => 8490.6583 pps, 67.9253 Mbps -Flow 65: 542 packets ( 542000 bytes)in 101 ms => 5356.5781 pps, 42.8526 Mbps -Flow 64: 540 packets ( 540000 bytes)in 101 ms => 5341.5105 pps, 42.7321 Mbps -Flow 65: 534 packets ( 534000 bytes)in 101 ms => 5285.6111 pps, 42.2849 Mbps -Flow 64: 575 packets ( 575000 bytes)in 101 ms => 5691.4915 pps, 45.5319 Mbps -Flow 65: 535 packets ( 535000 bytes)in 101 ms => 5291.0053 pps, 42.3280 Mbps -Flow 64: 561 packets ( 561000 bytes)in 101 ms => 5554.3455 pps, 44.4348 Mbps -Flow 65: 533 packets ( 533000 bytes)in 101 ms => 5272.0079 pps, 42.1761 Mbps -Flow 64: 569 packets ( 569000 bytes)in 101 ms => 5631.3216 pps, 45.0506 Mbps -``` - -With only one client running, the flow is congestion controlled to -about ~80Mb/s (indicating the queue limit at 16 packets may be a bit -too low a bar). When the second client starts sending, both flows go -quite quickly (at most 100ms) to a fair state of about 42 Mb/s. - -The IO graph from wireshark shows a reasonably stable profile (i.e. no -big oscillations because of AIMD), when switching the flows on the -clients on and off which is on par with DCTCP and not unexpected -keeping in mind the similarities between the algorithms: - -{{
}}

The periodic "gaps" were not seen at the ocbr endpoint application and may have been due to tcpdump not capturing everything at those points, or possibly a bug somewhere.

As said, a lot more work is needed analyzing this algorithm in terms of performance and stability[^5]. But I am feeling some excitement about its simplicity and -- dare I say it? -- elegance.

Stay curious!

Dimitri

[^1]: Additive Increase increases the window size with 1 MSS each RTT. Slow Start doubles the window size each RTT.

[^2]: I'm pretty sure the kernel developers would if they could.
[^3]: Or the current "be" branch for the less patient.
[^4]: Using Linux traffic control (```tc```) to limit traffic adds kernel queues and may interfere with MB-ECN.
[^5]: And the prototype implementation as a whole!
diff --git a/content/en/blog/news/20201212-congestion.png b/content/en/blog/news/20201212-congestion.png
deleted file mode 100644
index 8e5b89f..0000000
Binary files a/content/en/blog/news/20201212-congestion.png and /dev/null differ
diff --git a/content/en/blog/news/20201219-congestion-avoidance.md b/content/en/blog/news/20201219-congestion-avoidance.md
deleted file mode 100644
index 7391091..0000000
--- a/content/en/blog/news/20201219-congestion-avoidance.md
+++ /dev/null
@@ -1,313 +0,0 @@
---
date: 2020-12-19
title: "Exploring Ouroboros with wireshark"
linkTitle: "Exploring Ouroboros with wireshark"
description: ""
author: Dimitri Staessens
---

I recently did some [quick tests](/blog/2020/12/12/congestion-avoidance-in-ouroboros/#mb-ecn-in-action) with the new congestion avoidance implementation, and thought to myself that it was a shame that Wireshark could not identify the Ouroboros flows, as that could give me some nicer graphs.

Just to be clear, I think generic network tools like tcpdump and wireshark -- however informative and nice-to-use they are -- are a symptom of a lack of network security. The whole point of Ouroboros is that it is _intentionally_ designed to make it hard to analyze network traffic. Ouroboros is not a _network stack_[^1]: one can't simply dump a packet from the wire and derive the packet contents all the way up to the application by following identifiers for protocols and well-known ports. Using encryption to hide the network structure from the packet is shutting the door after the horse has bolted.

To write an Ouroboros dissector, one needs to know the layered structure of the network at the capturing point at that specific point in time. It requires information from the Ouroboros runtime on the capturing machine and at the exact time of the capture, to correctly analyze traffic flows. I just wrote a dissector that works for my specific setup[^2].

## Congestion avoidance test

First, a quick refresh on the experiment layout: it's the same 4-node experiment as in the [previous post](/blog/2020/12/12/congestion-avoidance-in-ouroboros/#mb-ecn-in-action)

{{
}}

I tried to draw the setup as best I can in the figure above.

There are 4 rack-mounted 1U servers, connected over Gigabit Ethernet (GbE). Physically there is a big switch connecting all of them, but each "link" is separated as a port-based VLAN, so there are 3 independent Ethernet segments. We create 3 Ethernet _layers_, drawn in a lighter gray, with a single unicast layer -- consisting of 4 unicast IPC processes (IPCPs) -- on top, drawn in a darker shade of gray. The link between the router and server has been capped to 100 megabit/s using ```ethtool```[^3], and traffic is captured on the Ethernet NIC at the "Server" node using ```tcpdump```. All traffic is generated with our _constant bit rate_ ```ocbr``` tool trying to send about 80 Mbit/s of application-level throughput over the unicast layer.

{{
}}

The graph above shows the bandwidth -- as captured on the congested 100Mbit Ethernet link -- separated for each traffic flow, from the same pcap capture as in my previous post. A flow can be identified by a (destination address, endpoint ID)-pair, and since the destination is all the same, I could filter out the flows by simply selecting them based on the (64-bit) endpoint identifier.

What you're looking at is this: first, a flow (green) starts; at around T=14s, a new flow (red) enters that stops at around T=24s. At around T=44s, another flow (blue) enters for about 14 seconds, and finally, a fourth (orange) flow enters at T=63s. The first (green) flow exits at around T=70s, leaving all the available bandwidth for the orange flow.

The most important thing that I wanted to check is, when there are multiple flows, _if_ and _how fast_ they converge to the same bandwidth. I'm not dissatisfied with the initial result: the answers seem to be _yes_ and _pretty fast_, with no observable oscillation to boot[^4].

## Protocol overview

Now, the wireshark dissector can be used to present some more details about the Ouroboros protocols in a familiar setting -- making it more accessible to some -- so let's have a quick look.

The Ouroboros network protocol has [5 fields](/docs/concepts/protocols/#network-protocol):

```
| DST | TTL | QOS | ECN | EID |
```

which we had to map to the Ethernet II protocol for our ipcpd-eth-dix implementation. The basic Ethernet II MAC (layer-2) header is pretty simple. It has 2 6-byte addresses (dst, src) and a 2-byte Ethertype.

Since Ethernet doesn't do QoS or congestion, the main missing field here is the EID. We could have mapped it to the Ethertype, but we noticed that a lot of routers and switches drop unknown Ethertypes (and, for the purposes of this blog post: it would have all but prevented writing the dissector). So we made the Ethertype configurable per layer (so it can be set to a value that is not blocked by the network), and added 2 16-bit fields after the Ethernet MAC header for an Ouroboros layer:

* Endpoint ID **eid**, which works just like in the unicast layer, to identify the N+1 application (in our case: a data transfer flow and a management flow for a unicast IPC process).

* A length field **len**, which is needed because Ethernet NICs pad frames that are smaller than 64 bytes in length with trailing zeros (and we receive these zeros in our code). A length field is present in Ethernet type I, but since most "Layer 3" protocols also had a length field, it was re-purposed as Ethertype in Ethernet II. The value of the **len** field is the length of the **data** payload.

The Ethernet layer that spans that 100Mbit link has Ethertype 0xA000 set (which is the Ouroboros default); the Ouroboros plugin hooks into that Ethertype.

On top of the Ethernet layer, we have a unicast layer with the 5 fields specified above. The dissector also shows the contents of the flow allocation messages, which are (currently) sent to EID = 0.
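As an aside: once the dissector exposes the endpoint ID, reproducing the per-flow bandwidth graph above is just a matter of bucketing bytes per EID over time. A rough sketch of that bookkeeping (illustrative Python, assuming the capture has already been parsed into (timestamp, eid, frame length) tuples):

```Python
from collections import defaultdict

def per_flow_mbps(packets, bin_s=1.0):
    """packets: iterable of (timestamp, eid, frame_len) from a parsed capture."""
    bins = defaultdict(lambda: defaultdict(int))        # eid -> time bin -> bytes
    for ts, eid, frame_len in packets:
        bins[eid][int(ts / bin_s)] += frame_len
    return {eid: {t: nbytes * 8 / (bin_s * 1e6)         # Mbit/s per time bin
                  for t, nbytes in series.items()}
            for eid, series in bins.items()}
```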
- -So, the protocol header as analysed in the experiment is, starting -from the "wire": - -``` -+---------+---------+-----------+-----+-----+------ -| dst MAC | src MAC | Ethertype | eid | len | data /* ETH LAYER */ -+---------+---------+-----------+-----+-----+------ - - /* eid == 0 -> ipcpd-eth flow allocator, */ - /* this is not analysed */ - -+-----+-----+-----+-----+-----+------ -| DST | QOS | TTL | ECN | EID | DATA /* UNICAST LAYER */ -+-----+-----+-----+-----+-----+------ - - /* EID == 0 -> flow allocator */ - -+-----+-------+-------+------+------+-----+-------------+ -| SRC | R_EID | S_EID | CODE | RESP | ECE | ... QOS ....| /* FA */ -+-----+-------+-------+------+------+-----+-------------+ -``` - -## The network protocol - -{{
}} - -We will first have a look at packets captured around the point in time -where the second (red) flow enters the network, about 14 seconds into -the capture. The "N+1 Data" packets in the image above all belong to -the green flow. The ```ocbr``` tool that we use sends 1000-byte data -units that are zeroed-out. The packet captured on the wire is 1033 -bytes in length, so we have a protocol overhead of 33 bytes[^5]. We -can break this down to: - -``` - ETHERNET II HEADER / 14 / - 6 bytes Ethernet II dst - 6 bytes Ethernet II src - 2 bytes Ethernet II Ethertype - OUROBOROS ETH-DIX HEADER / 4 / - 2 bytes eid - 2 byte len - OUROBOROS UNICAST NETWORK HEADER / 15 / - 4 bytes DST - 1 byte QOS - 1 byte TTL - 1 byte ECN - 8 bytes EID - --- TOTAL / 33 / - 33 bytes -``` - -The **Data (1019 bytes)** reported by wireshark is what Ethernet II -sees as data, and thus includes the 19 bytes for the two Ouroboros -headers. Note that DST length is configurable, currently up to 64 -bits. - -Now, let's have a brief look at the values for these fields. The -**eid** is 65, this means that the _data-transfer flow_ established -between the unicast IPCPs on the router and the server (_uni-r_ and -_uni-s_ in our experiment figure) is identified by endpoint id 65 in -the eth-dix IPCP on the Server machine. The **len** is 1015. Again, no -surprises, this is the length of the Ouroboros unicast network header -(15 bytes) + the 1000 bytes payload. - -**DST**, the destination address is 4135366193, a 32-bit address -that was randomly assigned to the _uni-s_ IPCP. The QoS cube is 0, -which is the default best-effort QoS class. *TTL* is 59. The starting -TTL is configurable for a layer, the default is 60, and it was -decremented by 1 in the _uni-r_ process on the router node. The packet -experienced no congestion (**ECN** is 0), and the endpoint ID is a -64-bit random number, 475...56. This endpoint ID identifies the flow -endpoint for the ```ocbr``` server. - -## The flow request - -{{
}}

The first "red" packet that was captured is the one for the flow allocation request, **FLOW REQUEST**[^6]. As mentioned before, the endpoint ID for the flow allocator is 0.

A rather important remark is in place here: Ouroboros does not allow a UDP-like _datagram service_ from a layer. By which I mean: you cannot simply fabricate a packet with the correct destination address and some known EID and dump it into the network. All traffic that is offered to an Ouroboros layer requires a _flow_ to be allocated. This keeps the network layer in control of its resources; the protocol details inside a layer are a secret to that layer.

Now, what about that well-known EID=0 for the flow allocator (FA)? And the directory (Distributed Hash Table, DHT) for that matter, which is currently on EID=1? Doesn't that contradict the "no datagram service" statement above? Well, no. These components are part of the layer and are thus inside the layer. The DHT and FA are internal components. They are direct clients of the Data Transfer component. The globally known EID for these components is an absolute necessity since they need to be able to reach endpoints more than a hop (i.e. a flow in a lower layer) away.

Let's now look inside that **FLOW REQUEST** message. We know it is a request from the **msg code** field[^7].

This is the **only** packet that contains the source (and destination) address for this flow. There is a small twist: this value is decoded with different _endianness_ than the address in the DT protocol output (probably a bug in my dissector). The source address 232373199 in the FA message corresponds to the address 3485194509 in the DT protocol (and in the experiment image at the top): the source of our red flow is the "Client 2" node. Since this is a **FLOW REQUEST**, the remote endpoint ID is not yet known, and set to 0[^8]. The source endpoint ID -- a 64-bit randomly generated value unique to the source IPC process[^9] -- is sent to the remote. The other fields are not relevant for this message.

## The flow reply

{{
}}

Now, the **FLOW REPLY** message for our request. It originates at our machine, so you will notice that the TTL is the starting value of 60. The destination address is what we sent in our original **FLOW REQUEST** -- modulo some endianness shenanigans. The **FLOW REPLY** message sends the newly generated source endpoint[^10] ID, and this packet is the **only** packet that contains both endpoint IDs for this flow.

## Congestion / flow update

{{
}}

Now a quick look at the congestion avoidance mechanisms. The information for the Additive Increase / Multiplicative Decrease algorithm is gathered from the **ECN** field in the packets. When both flows are active, they experience congestion since the requested bandwidth from the two ```ocbr``` clients (180Mbit) exceeds the 100Mbit link, and the figure above shows a packet marked with an ECN value of 11.

{{
}} - -When the packets on a flow experience congestion, the flow allocator -at the endpoint (the one our _uni-s_ IPCP) will update the sender with -an **ECE** _Explicit Congestion Experienced_ value; in this case, 297. -The higher this value, the quicker the sender will decrease its -sending rate. The algorithm is explained a bit in my previous -post. - -That's it for today's post, I hope it provides some new insights how -Ouroboros works. As always, stay curious. - -Dimitri - -[^1]: Neither is RINA, for that matter. - -[^2]: This quick-and-dirty dissector is available in the - ouroboros-eth-uni branch on my - [github](https://github.com/dstaesse/wireshark/) - -[^3]: The prototype is able to handle Gigabit Ethernet, this is mostly - to make the size of the capture files somewhat manageable. - -[^4]: Of course, this needs more thorough evaluation with more - clients, distributions on the latency, different configurations - for the FRCP protocol in the N+1 and all that jazz. I have, - however, limited amounts of time to spare and am currently - focusing on building and documenting the prototype and tools so - that more thorough evaluations can be done if someone feels like - doing them. - -[^5]: A 4-byte Ethernet Frame Check Sequence (FCS) is not included in - the 'bytes on the wire'. As a reference, the minimum overhead - for this kind of setup using UDP/IPv4 is 14 bytes Ethernet + 20 - bytes IPv4 + 8 bytes UDP = 42 bytes. - -[^6]: Actually, in a larger network there could be some DHT traffic - related to resolving the address, but in such a small network, - the DHT is basically a replicated database between all 4 nodes. - -[^7]: The reason it's not the first field in the protocol has to to - with performance of memory alignment in x86 architectures. - -[^8]: We haven't optimised the FA protocol not to send fields it - doesn't need for that particular message type -- yet. - -[^9]: Not the host machine, but that particular IPCP on the host - machine. You can have multiple IPCPs for the same layer on the - same machine, but in this case, expect correlation between their - addresses. 64-bits / IPCP should provide some security against - remotes trying to hack into another service on the same host by - guessing EIDs. - -[^10]: This marks the point in space-time where I notice the - misspelling in the dissector. 
\ No newline at end of file diff --git a/content/en/blog/news/20201219-congestion.png b/content/en/blog/news/20201219-congestion.png deleted file mode 100644 index 5675438..0000000 Binary files a/content/en/blog/news/20201219-congestion.png and /dev/null differ diff --git a/content/en/blog/news/20201219-exp.svg b/content/en/blog/news/20201219-exp.svg deleted file mode 100644 index 68e09e2..0000000 --- a/content/en/blog/news/20201219-exp.svg +++ /dev/null @@ -1 +0,0 @@ - \ No newline at end of file diff --git a/content/en/blog/news/20201219-ws-0.png b/content/en/blog/news/20201219-ws-0.png deleted file mode 100644 index fd7a83a..0000000 Binary files a/content/en/blog/news/20201219-ws-0.png and /dev/null differ diff --git a/content/en/blog/news/20201219-ws-1.png b/content/en/blog/news/20201219-ws-1.png deleted file mode 100644 index 0f07fd0..0000000 Binary files a/content/en/blog/news/20201219-ws-1.png and /dev/null differ diff --git a/content/en/blog/news/20201219-ws-2.png b/content/en/blog/news/20201219-ws-2.png deleted file mode 100644 index 7cd8b7d..0000000 Binary files a/content/en/blog/news/20201219-ws-2.png and /dev/null differ diff --git a/content/en/blog/news/20201219-ws-3.png b/content/en/blog/news/20201219-ws-3.png deleted file mode 100644 index 2a6f6d5..0000000 Binary files a/content/en/blog/news/20201219-ws-3.png and /dev/null differ diff --git a/content/en/blog/news/20201219-ws-4.png b/content/en/blog/news/20201219-ws-4.png deleted file mode 100644 index 3a0ef8c..0000000 Binary files a/content/en/blog/news/20201219-ws-4.png and /dev/null differ diff --git a/content/en/blog/news/_index.md b/content/en/blog/news/_index.md deleted file mode 100644 index c10cfa2..0000000 --- a/content/en/blog/news/_index.md +++ /dev/null @@ -1,5 +0,0 @@ ---- -title: "News About Docsy" -linkTitle: "News" -weight: 20 ---- -- cgit v1.2.3