| Commit message | Author | Age | Files | Lines |
| |
The fset add function was notifying for each packet already stored in
the rx rbuff, which isn't needed.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
| |
It doesn't really make sense to manually and one-sidedly configure the
timeout of delayed acknowledgements, as setting it too high upsets the
peer's sRTT estimates. Even worse, it also causes a lot of spurious
retransmissions if it exceeds the sRTT mean deviation calculated by
the receiver. Compensating on bare acknowledgment for the ack delay
could improve the RTT estimate deviation, but not the spurious
retransmissions if it was set too high. This sets the delayed ack to
wait for a single RTT mean deviation. Probably needs more tweaking to
further reduce differences between the RTT estimates at the sender and
receiver, e.g. compensate the RTT estimate for delayed acks, or
increase the RTO to add 8 mdevs to sRTT instead of 4. However, it
looks like the mdev estimate is the trickiest one to get to sync, not
the RTT average. Linux reduces the sample weight for mdev from 1/4 to
1/32 in some cases; I will give that a shot some day too, to see if it
further aligns the sRTT estimates. In any case, this patch already
improves things a lot.
Also fixes a bug where the sender was sending acknowledgments on the
first packets in flight for the 0 sequence number. The receiver
activity was measured in seconds but compared to a timeout value in
nanoseconds.
There are still a lot of spurious retransmissions that start after
actual packet loss occurs; I'm still investigating what causes them.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
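The estimator arithmetic discussed in the commit above can be sketched
as follows. This is a minimal illustration, not the FRCP code: it
assumes the classic update with weight 1/8 for sRTT (an assumption) and
1/4 for mdev, an RTO of sRTT + 4 mdev, and a delayed-ack timeout of one
mdev. All names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical estimator state; all times in nanoseconds. */
struct rtt_est {
        int64_t srtt; /* smoothed round-trip time  */
        int64_t mdev; /* mean deviation of the RTT */
};

/* Fold in one RTT sample: weight 1/8 for sRTT, 1/4 for mdev. */
static void rtt_est_update(struct rtt_est * e, int64_t sample)
{
        int64_t err = sample - e->srtt;

        e->srtt += err / 8;

        if (err < 0)
                err = -err;

        e->mdev += (err - e->mdev) / 4;
}

/* Retransmission timeout: sRTT plus 4 mean deviations. */
static int64_t rtt_est_rto(const struct rtt_est * e)
{
        return e->srtt + 4 * e->mdev;
}

/* Delayed-ack timeout: a single mean deviation. */
static int64_t rtt_est_ack_delay(const struct rtt_est * e)
{
        return e->mdev;
}
```

Because the ack delay stays below the 4 mdev margin in the RTO, a
delayed ack on its own should not trigger a retransmission in this
model.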
| |
This exposes some additional metrics relating to FRCT / Flow control:
the number of duplicate packets received, number of packets received
out of the flow control window and / or reordering queue, and the
number of rendez-vous messages sent.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
| |
There still were a couple of bugs in the timerwheel. If the future
schedule was coinciding with the slot currently being processed
(i.e. exactly RXMQ_SLOTS in the future), the list_add_tail caused an
infinite loop. Another bug was causing the slots at higher levels to
be processed too soon.
Retransmissions should now schedule correctly.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
| |
The timerwheel was retransmitting packets, and the error check for
negative values of the rbuff allocation was instead checking for
non-zero values. As a result, a successful buffer allocation still
sent the program down the unhappy path, leaving that packet stuck in
the buffer unattended.
Also fixes wrongly scheduled retransmissions that cause packet storms.
FRCP is much more stable now. Still needs some work for high
bandwidth-delay products (fast-retransmit).
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
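The difference between the two error checks in the commit above can be
sketched like this. The allocator and its slot count are hypothetical
stand-ins, assuming only that a successful allocation returns an index
>= 0 and a failure returns a negative value.

```c
#include <stddef.h>
#include <sys/types.h>

#define RB_SLOTS 4 /* hypothetical buffer size */

/* Hypothetical allocator: slot index >= 0, or a negative error. */
static ssize_t rb_alloc(size_t used)
{
        return used < RB_SLOTS ? (ssize_t) used : -1;
}

/* Buggy check: every non-zero index is treated as a failure. */
static int alloc_failed_buggy(ssize_t idx)
{
        return idx != 0;
}

/* Correct check: only negative values signal an error. */
static int alloc_failed(ssize_t idx)
{
        return idx < 0;
}
```

With the buggy check, any allocation that lands in a slot other than 0
is reported as failed even though the slot is now occupied.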
| |
The maximum packet lifetime (MPL) is a property of the flow that needs
to be passed to the reliable transmission protocol (FRCP) for its
correct operation. Previously, the value of MPL was set fixed as one
of the (fixed) Delta-t parameters. This patch makes the MPL a property
of the layer, and it can now be set per layer-type at build time.
This is a step towards a proper MPL estimator in the flow allocator.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
| |
The parameters were set directly from the build configs. A first step
towards making FRCP configurable at runtime is to pass the parameters
to the frcti_create() function.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
| |
The notorious off-by-one hit again.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
| |
The keepalive would underflow if set to 1-3 ms.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
| |
On exit of the IRMd all flows will now be flagged as down, so external
applications will not hang anymore. Note: reads keep working on flows
that are down until there are no more remaining packets in the buffer,
but no more packets can be written.
When the RIB is used, the external application may exit a bit later
than the IRMd, so I added a brief sleep before the IRMd tries to
remove the fuse main directory.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
| |
There was a lock reversal in the timerwheel. There still is a thorough
revision needed of the locking in dev.c after the FRCP logic is
completed and tuned.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
| |
IPCPs would call rib_fini() twice, once after cleaning up their
managed RIB, and once again for the program-generic RIB, which is not
initialized for IPCPs. rib_fini() checked if the mount name was valid,
but it didn't unset it after execution.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
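A finalizer that is safe to call twice, as the commit above requires,
can be sketched as follows. The names, the mount path and the counter
are hypothetical; the point is only that the validity marker is unset
after cleanup.

```c
#include <stdio.h>

static char mnt[64];  /* hypothetical mount name; empty when unmounted */
static int  unmounts; /* counts how often the cleanup really ran       */

static void sk_rib_init(const char * name)
{
        snprintf(mnt, sizeof(mnt), "/tmp/sketch/%s", name);
}

static void sk_rib_fini(void)
{
        if (mnt[0] == '\0')
                return;

        ++unmounts;    /* stand-in for unmounting and freeing */
        mnt[0] = '\0'; /* unset, so a second call is a no-op  */
}
```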
| |
The rib_init return value wasn't checked.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
| |
Bare FRCP messages (ACKs without data, Rendez-vous packets) were not
encrypted on encrypted flows, causing the receiver to fail decryption.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
| |
The qosspec_t now has a timeout value that sets the timeout value of
the flow. Flows with a peer that has timed out will now return
-EFLOWPEER on flow_read() or flow_write().
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
| |
The checked condition can't happen.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
| |
This adds liveness monitoring for flows, with a fixed timeout of
120s. I will make it configurable at flow allocation later on (timeout
needs to be communicated to the peer). If one peer dies, or doesn't
make any IPC calls (flow_write/flow_read/fevent), it will stop sending
keepalives and the other peer's reads/writes will fail with
-EFLOWDOWN after the timeout expires.
Packets without a payload (0 length packets) are interpreted as
keepalive packets for the flow. They can be sent from any application,
but they will not trigger a message read at the receiver side (0 as a
return value on flow_read indicates a previous partial read has
completed at exactly the buffer size).
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
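The liveness check from the commit above can be sketched as follows,
assuming a single "last activity" timestamp per flow. The struct, the
function names and the numeric error value are hypothetical; only the
120 s timeout comes from the commit message. Both timestamps are kept
in the same unit (ns) on purpose.

```c
#include <stdint.h>

#define FLOW_TIMEOUT_NS  (120LL * 1000000000LL) /* fixed 120 s timeout */
#define EFLOWDOWN_SKETCH 1001 /* hypothetical errno-like value        */

struct flow_liveness {
        int64_t rcv_act; /* last time anything arrived from the peer (ns) */
};

/* Any packet, including a 0-length keepalive, proves the peer is alive. */
static void flow_on_packet(struct flow_liveness * f, int64_t now)
{
        f->rcv_act = now;
}

/* Returns 0 while the peer is live, a negative error after the timeout. */
static int flow_check(const struct flow_liveness * f, int64_t now)
{
        if (now - f->rcv_act > FLOW_TIMEOUT_NS)
                return -EFLOWDOWN_SKETCH;

        return 0;
}
```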
| |
The flow_set will now keep a list of the flows in the set, which makes
it more efficient to iterate over the flows. Extending the public API
for fset_t with an iterator will also be useful.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
| |
The blocking read from the rbuff was not correctly handling flow down
states, returning a valid index. The attempt to fetch the header then
failed on an assertion. The blocking read will now return -EFLOWDOWN
if the flow is marked down by the IPCP.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
| |
The free of the buffer in the failure path of the readdir RIB
functions was taking the wrong pointer in a couple of places. The FRCT
RIB readdir was missing error handling for malloc and strdup.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
| |
The fccntl call FRCTSFLAGS was using a pointer to the flags to set the
flags, which should just be a regular uint16_t.
For instance, the FRCTLINGER flag can now be turned off using
fccntl(fd, FRCTSFLAGS, FRCTFRESCNTL | FRCTFRTX)
leaving only resource control (flow control, FRCTFRESCNTL) and
retransmission enabled. Note that retransmission (FRCTFRTX) can't be
enabled or disabled on a live flow, it will be set on flow allocation.
Updates the man page for fccntl to add these FRCT options.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
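How passing the flags by value interacts with the "retransmission is
fixed at allocation" rule can be sketched as follows. The SK_ bit
values and the helper are hypothetical; the real definitions live in
the Ouroboros headers and may differ.

```c
#include <stdint.h>

/* Hypothetical bit values, for illustration only. */
#define SK_FRCTFRESCNTL 0x01 /* resource (flow) control */
#define SK_FRCTFRTX     0x02 /* retransmission          */
#define SK_FRCTFLINGER  0x04 /* linger                  */

/*
 * Apply a requested flag word to the current one. The retransmission
 * bit is kept as it was set at flow allocation.
 */
static uint16_t sk_set_flags(uint16_t cur, uint16_t req)
{
        uint16_t rtx = cur & SK_FRCTFRTX;

        return (uint16_t) ((req & ~SK_FRCTFRTX) | rtx);
}
```

Requesting resource control plus retransmission on a flow that has all
three bits set clears only the linger bit; on a flow allocated without
retransmission, the retransmission bit stays off.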
| |
It was taking a write lock when a read lock was sufficient.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
| |
This is a fix to wait for outstanding retransmissions when a flow is
deallocated. Instead of waiting the full timeout, it will now wait in
the same tick increments used within FRCT. This is a bit of a stopgap
at the moment; FRCT and the flows are in need of a serious refactor.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
| |
There was a missing unlock in FRCT. Also fixes some indentation.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
| |
If the timeout had already expired, the wait variable would be
negative and the __frcti_dealloc function would return a negative
value. The caller took this to mean that the timeout had not expired,
causing an unnecessary wait even if all packets were acknowledged.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
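One way to avoid the negative wait described above is to clamp the
remaining time at zero. This is a sketch with a hypothetical helper,
not the actual __frcti_dealloc logic.

```c
#include <stdint.h>

/*
 * Hypothetical helper: time still to wait before deallocation.
 * Never negative, so an expired deadline reads as "no wait needed".
 */
static int64_t dealloc_wait(int64_t deadline, int64_t now)
{
        int64_t wait = deadline - now;

        return wait < 0 ? 0 : wait;
}
```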
| |
The initial sender right window edge (indicating acknowledged packet
sequence number) was initialized to seqno - 1. This should be the same
as seqno, since we acknowledge with the next expected sequence number.
It also indicates that a flow without traffic has no outstanding
acknowledgements.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
| |
There was some leftover code in dev.c with respect to the process RIB
that is not needed anymore.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
| |
Arithmetic with NULL pointers is undefined behaviour. Caught by clang
13. Fixed by using uintptr_t, an integer type that can hold a pointer
value.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
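Doing address arithmetic on integers instead of on (possibly NULL)
pointers can be sketched as follows. The helpers are hypothetical
examples of the technique, not the code that was changed.

```c
#include <stddef.h>
#include <stdint.h>

/* Round an address up to the next multiple of align (a power of two). */
static uintptr_t align_up(uintptr_t addr, size_t align)
{
        return (addr + (align - 1)) & ~((uintptr_t) align - 1);
}

/* Distance in bytes between two addresses, computed on integers. */
static size_t byte_offset(const void * base, const void * member)
{
        return (size_t) ((uintptr_t) member - (uintptr_t) base);
}
```

Since both helpers work on uintptr_t values, a base address of 0 is
just another integer and no pointer arithmetic is involved.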
| |
This will skip rib_init() at __init() for IPCPs (or at least,
processes that have "ipcpd" in the executable name). The previous code
tried to unmount the generic mount and then remount under the ipcp
name, but it often failed because fuse_mount() is asynchronous and the
mount was not up at the time of the unmount() call. Renaming the mount
instead of unmounting failed for the same reason. This is a better
fix for now.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
| |
Application flows can now be monitored from the RIB, exposing FRCT
statistics (window edges, retransmission timeout, rtt estimate, etc).
Application RIB requires user permissions to be able to access
/dev/fuse.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
| |
The read functions for the RIB will now receive the full path, instead
of only the entry name. For IPCPs, we organized the RIB in an
/<ipcp>/<component>/entries
structure with a directory per component, so we don't need the full
path at this point. For process flow information, it's a lot more
convenient to organize it the following way
/<pid>/<fd>/stat
We can then register/unregister the flow descriptor when the frct
instance is created, and for getting the stats, we'd know the flow
descriptor from the fuse file path. If we would create a file per flow
instead of a directory per flow, something like
/<pid>/flows/<fd>
we'd need to do additional bookkeeping to list the contents of that
directory (we would need to track all flows with an active FRCT
instance), whereas fuse already knows the directories because it
tracks them.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
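Recovering the flow descriptor from a path of the /<pid>/<fd>/stat
form can be sketched as follows. The function name is hypothetical,
and whether the read callback sees the path with or without the <pid>
component is an assumption here.

```c
#include <stdio.h>

/*
 * Parse "/<pid>/<fd>/stat". Returns the fd, or -1 if the path does
 * not have that shape.
 */
static int path_to_fd(const char * path)
{
        int pid;
        int fd;
        int end = 0;

        /* %n stores the number of characters consumed so far. */
        if (sscanf(path, "/%d/%d/stat%n", &pid, &fd, &end) != 2)
                return -1;

        /* end stays 0 if "stat" did not match; reject trailing text. */
        if (end == 0 || path[end] != '\0')
                return -1;

        return fd < 0 ? -1 : fd;
}
```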
| |
The RIB API had a struct stat in the getattr() function, which made
all components that exposed variables via the RIB dependent on
<sys/stat.h>. The rib now has its own struct rib_attr to set
attributes such as size and last modified time.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
| |
Compilation failed on FreeBSD 14 with fuse enabled because of some
missing definitions. __XSI_VISIBLE must be set before including
<ouroboros/rib.h> for some definitions in <sys/stat.h>. FreeBSD
doesn't know the MSG_CONFIRM flag to sendto() or
CLOCK_REALTIME_COARSE, which are Linux-specific.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
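A common way to deal with the two Linux-specific names mentioned above
is a fallback definition. This is a sketch of that general technique
and not necessarily what the patch does; coarse_now() is a
hypothetical helper.

```c
/* For clock_gettime() under strict ISO C compiler modes. */
#define _POSIX_C_SOURCE 200809L

#include <sys/socket.h>
#include <time.h>

/* Linux-only hint flag for sendto(); becomes a no-op flag elsewhere. */
#ifndef MSG_CONFIRM
#define MSG_CONFIRM 0
#endif

/* Fall back to the regular realtime clock if the coarse one is missing. */
#ifndef CLOCK_REALTIME_COARSE
#define CLOCK_REALTIME_COARSE CLOCK_REALTIME
#endif

static int coarse_now(struct timespec * ts)
{
        return clock_gettime(CLOCK_REALTIME_COARSE, ts);
}
```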
| |
This adds an ouroboros/pthread.h header that wraps the
pthread_..._unlock() functions for cleanup using
pthread_cleanup_push(), as casting them to the cleanup handler type is
not safe (and there were definitely bad casts in the code). The close()
function is now also wrapped for cleanup in ouroboros/sockets.h.
This allows enabling more compiler checks.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
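The kind of wrapper described above can be sketched as follows. The
wrapper name and the surrounding example are hypothetical; the actual
names in ouroboros/pthread.h may differ.

```c
#include <pthread.h>

/* Wrapper with the exact void (*)(void *) type a cleanup handler needs. */
static void cleanup_mutex_unlock(void * mutex)
{
        pthread_mutex_unlock((pthread_mutex_t *) mutex);
}

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static int             counter;

/* The handler unlocks on normal exit (pop with 1) and on cancellation. */
static int locked_increment(void)
{
        int val;

        pthread_mutex_lock(&mtx);
        pthread_cleanup_push(cleanup_mutex_unlock, &mtx);

        val = ++counter;

        pthread_cleanup_pop(1);

        return val;
}
```

Passing pthread_mutex_unlock itself with a function pointer cast would
call it through an incompatible type, which is what the wrapper
avoids.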
| |
This assert() causes the ipcpd, and subsequently the irmd, to abort() when shutting
down debug builds. Should be fixed some day when other components are
more robust (frct retransmissions and routing).
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
| |
This moves Resource Information Base (RIB) initialization into the
ipcp_init() function, so all IPCPs initialize a RIB. The RIB now shows
some common IPCP information, such as the IPCP name, IPCP state and
the layer name if the IPCP is part of a layer.
The initialization of the hash algorithm and layer name was moved out
of the common ipcp source because IPCPs may only know this information
after enrollment. Some IPCPs were not even storing this information.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
| |
This removes the raptor IPCP. The code hasn't been updated for a
while, and wouldn't compile. Raptor served its purpose as a PoC for
Ouroboros-over-Ethernet-Layer-1, but given the extremely niche hardware
needed to run it, it's not worth maintaining this anymore.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
| |
The UDP layer will now use a single (configurable) UDP port, default
3435. This makes it easier to allocate flows as a client from behind a
NAT firewall without having to configure port forwarding rules. So
basically, from now on Ouroboros traffic is transported over a
bidirectional <src><port>:<dst><port> UDP tunnel. The reason for not
using/allowing different client/server ports is that it would require
reading from different sockets using select() or something similar,
but since we need the EID anyway (mgmt packets arrive on the same
server UDP port), there's not a lot of benefit in doing it. Now the
operation is similar to the ipcpd-eth, with the port somewhat
functioning as a "layer name", where in Ethernet, the Ethertype
functions as a "layer name".
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
| |
The ugent email addresses are shut down, updated to Ouroboros mail
addresses.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
| |
Happy New Year, Ouroboros!
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
| |
DH key creation was returning -ECRYPT if OpenSSL is not installed,
instead of success (0).
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
| |
This causes builds to fail on systems where OpenSSL is not available.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
| |
This adds congestion avoidance policies to the unicast IPCP. The
default policy is a multi-bit explicit congestion avoidance algorithm
based on data-center TCP congestion avoidance (DCTCP) to relay
information about the maximum queue depth that packets experienced to
the receiver. There's also a "nop" policy to disable congestion
avoidance for testing and benchmarking purposes.
The (initial) API for congestion avoidance policies is:
    void * (* ctx_create)(void);
    void   (* ctx_destroy)(void * ctx);
These calls create and/or destroy a context for congestion control
for a specific flow. Thread-safety of the context is the
responsibility of the flow allocator (operations on the ctx should be
performed under a lock).
    ca_wnd_t (* ctx_update_snd)(void * ctx,
                                size_t len);
This is the sender call to update the context, and should be called
for every packet that is sent on the flow. The len parameter in this
API is the packet length, which allows calculating the bandwidth. It
returns an opaque union type that is passed to the call that checks or
waits for the congestion window to open (which allows releasing locks
before waiting).
    bool (* ctx_update_rcv)(void * ctx,
                            size_t len,
                            uint8_t ecn,
                            uint16_t * ece);
This is the call to update the flow congestion context on the receiver
side. It should be called for every received packet. It gets the ecn
value from the packet and its length, and returns the ECE (explicit
congestion experienced) value to be sent to the sender in case of
congestion. The boolean returned signals whether or not a congestion
update needs to be sent.
    void (* ctx_update_ece)(void * ctx,
                            uint16_t ece);
This is the call for the sending side to update the context when it
receives an ECE update from the receiver.
    void (* wnd_wait)(ca_wnd_t wnd);
This is a (blocking) call that waits for the congestion window to
clear. It should be stateless (to avoid waiting under locks). This may
change later on if passing the context is needed for different algorithms.
    uint8_t (* calc_ecn)(int fd,
                         size_t len);
This is the call that intermediate IPCPs (routers) should use to update
the ECN field on passing packets.
The multi-bit ECN policy bases the value for the ECN field on the
depth of the rbuff queue packets will be sent on. I created another
call to grab the queue depth as fccntl is write-locking the
application. We can further optimize this to avoid most locking on the
rbuff.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
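The calls listed in the commit above can be gathered in one table of
function pointers, with a do-nothing policy filling it in. This is a
sketch: the struct name, the contents of ca_wnd_t and the nop
implementations are assumptions, only the signatures come from the
commit message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the opaque union handed to wnd_wait(). */
typedef union {
        uint64_t wait; /* assumption: time to wait, in ns */
} ca_wnd_t;

struct ca_ops {
        void *   (* ctx_create)(void);
        void     (* ctx_destroy)(void * ctx);
        ca_wnd_t (* ctx_update_snd)(void * ctx, size_t len);
        bool     (* ctx_update_rcv)(void * ctx, size_t len,
                                    uint8_t ecn, uint16_t * ece);
        void     (* ctx_update_ece)(void * ctx, uint16_t ece);
        void     (* wnd_wait)(ca_wnd_t wnd);
        uint8_t  (* calc_ecn)(int fd, size_t len);
};

static int nop_dummy; /* the nop policy keeps no per-flow state */

static void * nop_ctx_create(void) { return &nop_dummy; }

static void nop_ctx_destroy(void * ctx) { (void) ctx; }

static ca_wnd_t nop_ctx_update_snd(void * ctx, size_t len)
{
        ca_wnd_t wnd = { 0 }; /* window always open: nothing to wait for */

        (void) ctx;
        (void) len;

        return wnd;
}

static bool nop_ctx_update_rcv(void * ctx, size_t len,
                               uint8_t ecn, uint16_t * ece)
{
        (void) ctx;
        (void) len;
        (void) ecn;

        *ece = 0;

        return false; /* never ask for a congestion update */
}

static void nop_ctx_update_ece(void * ctx, uint16_t ece)
{
        (void) ctx;
        (void) ece;
}

static void nop_wnd_wait(ca_wnd_t wnd) { (void) wnd; }

static uint8_t nop_calc_ecn(int fd, size_t len)
{
        (void) fd;
        (void) len;

        return 0; /* never mark congestion */
}

static const struct ca_ops nop_ca_ops = {
        nop_ctx_create,
        nop_ctx_destroy,
        nop_ctx_update_snd,
        nop_ctx_update_rcv,
        nop_ctx_update_ece,
        nop_wnd_wait,
        nop_calc_ecn
};
```

A sender following this shape would call ctx_update_snd() under its
lock, release the lock, and only then pass the returned value to
wnd_wait().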
| |
The timerwheel is checked during IPC calls (fevent, flow_read),
causing a huge CPU load in IPCPs, since they have a lot of fevent()
threads for QoS. The timerwheel will need further optimization, but
for now I reduced the default tick time to 5 ms and added a boolean to
check that the wheel is actually used.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
| |
I mistakenly set the default to the (buggy) lockless rbuff
implementation instead of the pthread one in commit 3aec660e.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
| |
This reverts commit 978266fe4beba21292daad2d341fe5ff22e08aba.
We were incorrectly unmounting the directory under normal conditions.
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
| |
This adds the rendez-vous mechanism to handle the case where the
sending window is closed and window updates get lost. If the sending
window is closed, the sender side will send an RDVS every DELT_RDV
time (100ms), and give up after MAX_RDV time (1 second). Upon
reception of a RDVS packet, a window update is sent immediately. We
can make this much more configurable later on (build options for
defaults, fccntl for runtime tuning).
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
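The rendez-vous timing from the commit above can be sketched as a
small decision function. DELT_RDV and MAX_RDV carry the values from
the commit message; the enum and the function are hypothetical.

```c
#include <stdint.h>

#define MS       1000000LL
#define DELT_RDV (100 * MS)  /* interval between RDVS packets */
#define MAX_RDV  (1000 * MS) /* give up after this long       */

enum rdv_action { RDV_WAIT, RDV_SEND, RDV_GIVE_UP };

/*
 * closed: time since the window closed, last: time since the last
 * RDVS was sent (both in ns).
 */
static enum rdv_action rdv_next(int64_t closed, int64_t last)
{
        if (closed >= MAX_RDV)
                return RDV_GIVE_UP;

        if (last >= DELT_RDV)
                return RDV_SEND;

        return RDV_WAIT;
}
```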
| |
If the sending window for flow control is closed, the sending
application will now block until the window opens. Beware that until
the rendez-vous mechanism is implemented, shutting down a server while
the client is sending (with non-timed-out blocking write) will cause
the client to hang indefinitely because its window will close.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
| |
Refactor flow_write cleanup.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
| |
This adds sending and receiving window updates for flow control. I
used the 8 pad bits as part of the window update field, so it's 24
bits, allowing for ~16 million packets in flight.
Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks>
Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
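A 24-bit window field as described above can be sketched as follows.
The byte order and the helper names are assumptions; the actual FRCT
header layout is not shown in the commit message.

```c
#include <stdint.h>

#define WND_BITS 24
#define WND_MAX  ((1U << WND_BITS) - 1) /* 16777215, i.e. ~16 million */

/* Store a 24-bit window value in 3 bytes, most significant byte first. */
static void wnd_pack(uint8_t buf[3], uint32_t wnd)
{
        buf[0] = (uint8_t) (wnd >> 16);
        buf[1] = (uint8_t) (wnd >> 8);
        buf[2] = (uint8_t) wnd;
}

/* Read the 24-bit window value back from 3 bytes. */
static uint32_t wnd_unpack(const uint8_t buf[3])
{
        return ((uint32_t) buf[0] << 16) |
               ((uint32_t) buf[1] << 8)  |
               (uint32_t) buf[2];
}
```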