author: Dimitri Staessens <dimitri@ouroboros.rocks> 2020-02-12 21:26:43 +0100
committer: Dimitri Staessens <dimitri@ouroboros.rocks> 2020-02-12 21:26:43 +0100
commit: 31ef8203bc6e2df3b6e102a1d1979e1e756367e8
tree: 8991c477c28ea8793596ff19d4c1553df503623a
parent: 7aaa53420e9557a11d83793addcac92cab3bd5d2
blog: Add a blog post on ecmp
-rw-r--r-- content/en/blog/news/20200212-ecmp.md | 70
1 files changed, 70 insertions, 0 deletions
diff --git a/content/en/blog/news/20200212-ecmp.md b/content/en/blog/news/20200212-ecmp.md
new file mode 100644
index 0000000..74a39de
--- /dev/null
+++ b/content/en/blog/news/20200212-ecmp.md
@@ -0,0 +1,70 @@
+---
+date: 2020-02-12
+title: "Equal-Cost Multipath (ECMP)"
+linkTitle: "Adding Equal-Cost multipath (ECMP)"
+description: "ECMP is coming to Ouroboros (finally)"
+author: Dimitri Staessens
+---
+
+Some recent news -- a Multi-Path TCP (MPTCP) implementation is [landing
+in the mainline Linux kernel
+5.6](https://www.phoronix.com/scan.php?page=news_item&px=Linux-5.6-Starts-Multipath-TCP)
+-- finally got me to integrate the equal-cost multipath (ECMP)
+implementation from [Nick Aerts's master
+thesis](https://lib.ugent.be/nl/catalog/rug01:002494958) into
+Ouroboros. And working on the ECMP implementation gives me an
+excuse to rant a little bit about MPTCP.
+
+The first question that comes to mind is: _Why is it called
+multi-**path** TCP_? It is IP that routes packets, not TCP, and there
+are equal-cost multipath options for IP in both [IS-IS and
+OSPF](https://tools.ietf.org/html/rfc2991). Maybe _multi-flow TCP_
+would be a better name? That name would also make it more obvious
+that running MPTCP over paths with more hops makes less sense, since
+the paths are more likely to converge over the same link.
+
+So _why is there a need for multi-path TCP_? The answer, of course, is
+that the Internet Protocol routes packets between IP endpoints, which
+are _interfaces_, not _hosts_. So, if a server is connected over 4
+interfaces, ECMP routing will not be of any help if one of them goes
+down. The TCP connections will time out. Multipath TCP, however, is
+actually establishing 4 subflows, each over a different interface. If
+an interface goes down, MPTCP will still have 3 subflows ready. The
+application is listening on the main TCP connection, and will not
+notice a TCP-subflow timing out[^1].
+
+This brings us, of course, to the crux of the problem. IP names the
+[point of attachment](https://tools.ietf.org/html/rfc1498); IP
+addresses are assigned to interfaces. Another commonly used workaround
+is a virtual IP interface on the loopback, but then you need a lot of
+additional configuration (and if that were the perfect solution, one
+wouldn't need MPTCP!). MPTCP avoids the network configuration mess,
+but does require direct modification of the application using
+[additions to the sockets
+API](https://tools.ietf.org/html/draft-hesmans-mptcp-socket-03) in the
+form of a bunch of (ugly) setsockopts.
+
+Now this is a far from ideal situation, but given its constraints,
+MPTCP is a workable engineering solution that will surely see its
+uses. It's strange that it took years for MPTCP to get to this stage.
+
+Now, of course, Ouroboros does not assign addresses to
+points of attachment (_flow endpoints_). It doesn't even assign
+addresses to hosts/nodes! Instead, the address is derived from the
+forwarding protocol machines inside each node. (For the details, see
+the [article](https://arxiv.org/pdf/2001.09707.pdf).) The net effect
+is that an ECMP routing algorithm can cleanly handle hosts with
+multiple interfaces. Details about the routing algorithm are not
+exposed to application APIs. Instead, Ouroboros applications request
+an implementation-independent _service_.
+
+The ECMP patch for Ouroboros is coming _soon_. Once it's available I
+will also add a couple of tutorials on it.
+
+Peace.
+
+Dimitri
+
+[^1]: Question: Why are the subflows not UDP? That would avoid a lot
+of duplicated overhead (sequence numbers etc)... Would it be too messy
+on the socket API side?
\ No newline at end of file
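
As a rough illustration of the ECMP forwarding decision the post refers to, here is a minimal sketch of modulo-N hashing, one of the next-hop selection methods discussed in RFC 2991. It is not the Ouroboros implementation or the one from the thesis. The function name `ecmp_next_hop`, the flow tuple layout, the hop labels, and the choice of SHA-256 as the hash are all assumptions made for the example.

```python
import hashlib


def ecmp_next_hop(flow, next_hops):
    """Pick one of several equal-cost next hops for a flow (modulo-N hash).

    `flow` is any tuple identifying the flow (hypothetical layout: source
    address, source port, destination address, destination port, protocol).
    `next_hops` is the list of equal-cost next hops toward the destination.
    """
    if not next_hops:
        # Nothing left to forward to, e.g. the only addressed interface is down.
        raise ValueError("no equal-cost next hop available")
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    # The same flow always hashes to the same index, so its packets
    # stay on one path as long as the set of next hops is unchanged.
    index = int.from_bytes(digest[:8], "big") % len(next_hops)
    return next_hops[index]


# Hypothetical example values (documentation address ranges).
flow = ("10.0.0.1", 40000, "192.0.2.10", 443, "tcp")
hops = ["if0", "if1", "if2", "if3"]
first = ecmp_next_hop(flow, hops)
```

Note that the selection only spreads flows over the paths toward one destination address. If that address names a single interface and the interface goes down, the list of next hops toward it becomes empty and the flow has nowhere to go, which is the situation the post describes for IP.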