diff options
Diffstat (limited to 'content/en/blog')
-rw-r--r-- | content/en/blog/news/20200212-ecmp.md | 70 |
1 files changed, 70 insertions, 0 deletions
diff --git a/content/en/blog/news/20200212-ecmp.md b/content/en/blog/news/20200212-ecmp.md new file mode 100644 index 0000000..74a39de --- /dev/null +++ b/content/en/blog/news/20200212-ecmp.md @@ -0,0 +1,70 @@ +--- +date: 2020-02-12 +title: "Equal-Cost Multipath (ECMP)" +linkTitle: "Adding Equal-Cost multipath (ECMP)" +description: "ECMP is coming to Ouroboros (finally)" +author: Dimitri Staessens +--- + +Some recent news -- Multi-Path TCP (MPTCP) implementation is [landing +in mainstream Linux kernel +5.6](https://www.phoronix.com/scan.php?page=news_item&px=Linux-5.6-Starts-Multipath-TCP) +-- finally got me to integrate the equal-cost multipath (ECMP) +implementation from [Nick Aerts's master +thesis](https://lib.ugent.be/nl/catalog/rug01:002494958) into +Ouroboros. And working on the ECMP implementation in gives me an +excuse to rant a little bit about MPTCP. + +The first question that comes to mind is: _Why is it called +multi-**path** TCP_? IP is routing packets, not TCP, and there are +equal-cost multipath options for IP in both [IS-IS and +OSPF](https://tools.ietf.org/html/rfc2991). Maybe _multi-flow TCP_ +would be a better name? This would also be more transparent to the +fact that running MPTCP over longer hops will make less sense, since +the paths are more likely to converge over the same link. + +So _why is there a need for multi-path TCP_? The answer, of course, is +that the Internet Protocol routes packets between IP endpoints, which +are _interfaces_, not _hosts_. So, if a server is connected over 4 +interfaces, ECMP routing will not be of any help if one of them goes +down. The TCP connections will time out. Multipath TCP, however, is +actually establishing 4 subflows, each over a different interface. If +an interface goes down, MPTCP will still have 3 subflows ready. The +application is listening the the main TCP connection, and will not +notice a TCP-subflow timing out[^1]. + +This brings us, of course, to the crux of the problem. IP names the +[point of attachment](https://tools.ietf.org/html/rfc1498); IP +addresses are assigned to interfaces. Another commonly used workaround +is a virtual IP interface on the loopback, but then you need a lot of +additional configuration (and if that were the perfect solution, one +wouldn't need MPTCP!). MPTCP avoids the network configuration mess, +but does require direct modification in the application using +[additions to the sockets +API](https://tools.ietf.org/html/draft-hesmans-mptcp-socket-03) in the +form of a bunch of (ugly) setsockopts. + +Now this is a far from ideal situation, but given its constraints, +MPTCP is a workable engineering solution that will surely see its +uses. It's strange that it took years for MPTCP to get to this stage. + +Now, of course, Ouroboros does not assign addresses to +points-of-attachments ( _flow endpoints_). It doesn't even assign +addresses to hosts/nodes! Instead, the address is derived from the +forwarding protocol machines inside each node. (For the details, see +the [article](https://arxiv.org/pdf/2001.09707.pdf)). The net effect +is that an ECMP routing algorithm can cleanly handle hosts with +multiple interfaces. Details about the routing algorithm are not +exposed to application APIs. Instead, Ouroboros applications request +an implementation-independent _service_. + +The ECMP patch for Ouroboros is coming _soon_. Once it's available I +will also add a couple of tutorials on it. + +Peace. + +Dimitri + +[^1]: Question: Why are the subflows not UDP? That would avoid a lot +of duplicated overhead (sequence numbers etc)... Would it be too messy +on the socket API side?
\ No newline at end of file |