First of all, warn that a prefix was dropped. In the update generation
code, handle possible overflows of attributes and NLRI and withdraw the
affected prefix. This way the peer will not have stale data.
OK tb@
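A minimal sketch of the overflow handling described above, assuming a
fixed-size output buffer. The names up_pack, UP_OK and UP_WITHDRAW are
hypothetical and not taken from bgpd; the point is only that a prefix
which does not fit is reported and withdrawn instead of being silently
dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result codes for this sketch, not taken from bgpd. */
enum up_result { UP_OK, UP_WITHDRAW };

/*
 * Try to place the attributes and one NLRI into a fixed-size buffer.
 * If they do not fit, warn and tell the caller to withdraw the prefix
 * so that the peer is not left with stale data.
 */
enum up_result
up_pack(unsigned char *buf, size_t buflen,
    const unsigned char *attrs, size_t attrlen,
    const unsigned char *nlri, size_t nlrilen)
{
	assert(buf != NULL);
	if (attrlen > buflen || nlrilen > buflen - attrlen) {
		fprintf(stderr, "update too large, prefix dropped\n");
		return UP_WITHDRAW;
	}
	memcpy(buf, attrs, attrlen);
	memcpy(buf + attrlen, nlri, nlrilen);
	return UP_OK;
}
```

A caller would send a withdraw for the prefix whenever UP_WITHDRAW is
returned.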
filtered prefixes in the Loc-RIB
This keeps filtered prefixes in the Loc-RIB but marks them ineligible,
so nothing will select them, yet it is possible to show them in bgpctl.
So 'bgpctl show rib filtered' will return all prefixes filtered out by
the input filters.
OK tb@
Before, the RDE used host byte order for remote_bgpid but all the other
code used network byte order. The reason for that was that bgpid was
initially an IPv4 address but since RFC 6286 in 2011 this is much more
relaxed, so it makes more sense to just treat them as numbers and
therefore use host byte order.
OK tb@
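To illustrate the byte-order change above: once a BGP ID is converted to
host byte order it can be compared like any other number. This is a
sketch only; bgpid_from_wire and bgpid_cmp are hypothetical helper names,
not functions from bgpd.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* Convert a BGP ID as received on the wire into a host-order number. */
uint32_t
bgpid_from_wire(uint32_t wire_id)
{
	return ntohl(wire_id);
}

/* Compare two host-order BGP IDs numerically: <0, 0 or >0. */
int
bgpid_cmp(uint32_t a, uint32_t b)
{
	if (a < b)
		return -1;
	if (a > b)
		return 1;
	return 0;
}
```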
This converts community_add(), community_large_add() and community_ext_add()
and as a result removes some hacks from rde_attr_add() and rde_attr_parse().
OK tb@
Rewrite rde_update_dispatch() to use ibufs. Because of this
rde_update_err(), rde_get_mp_nexthop(), nlri_get_prefix() and
friends are switched to use ibufs. For rde_attr_parse() a minimal
change was done for now.
OK tb@
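The benefit of a bounds-checked read buffer for parsing can be sketched
as follows. Note that struct rbuf and rbuf_get are hypothetical stand-ins
written for this example and are not the ibuf API; nlri_get_v4 is likewise
a simplified illustration of reading one IPv4 NLRI (a length in bits
followed by the minimum number of address bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a read buffer; this is not the ibuf API. */
struct rbuf {
	const uint8_t	*p;
	size_t		 left;
};

/* Copy n bytes out of the buffer, failing instead of reading past the end. */
int
rbuf_get(struct rbuf *b, void *dst, size_t n)
{
	if (n > b->left)
		return -1;
	memcpy(dst, b->p, n);
	b->p += n;
	b->left -= n;
	return 0;
}

/* Read one IPv4 NLRI: a length in bits, then ceil(len / 8) address bytes. */
int
nlri_get_v4(struct rbuf *b, uint8_t addr[4], uint8_t *plen)
{
	uint8_t len;

	if (rbuf_get(b, &len, 1) == -1 || len > 32)
		return -1;
	memset(addr, 0, 4);
	if (rbuf_get(b, addr, (len + 7) / 8) == -1)
		return -1;
	if (len % 8 != 0)	/* clear bits beyond the prefix length */
		addr[len / 8] &= (uint8_t)(0xff << (8 - len % 8));
	*plen = len;
	return 0;
}
```

Every read goes through one length check, so a truncated NLRI results
in an error return rather than an out-of-bounds access.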
When a session is established determine the possible interface scope of that
session. The scope is only set when the remote address is directly connected.
This interface scope is passed to the RDE that uses this information when
link-local nexthops are received. Again checking that a link-local nexthop
is actually acceptable.
OK tb@
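The link-local nexthop check described above could look roughly like
this. It is a sketch under assumptions: nexthop_acceptable is a
hypothetical name, and an ifscope of 0 is assumed to mean that no
interface scope was determined for the session.

```c
#include <sys/socket.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <assert.h>

/*
 * Decide whether an IPv6 nexthop is acceptable. ifscope is the interface
 * scope learned for the session; 0 is assumed to mean "not set", i.e.
 * the remote address is not directly connected.
 */
int
nexthop_acceptable(const struct in6_addr *nh, unsigned int ifscope)
{
	if (!IN6_IS_ADDR_LINKLOCAL(nh))
		return 1;	/* non link-local nexthops need no scope */
	return ifscope != 0;	/* link-local needs a known interface */
}
```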
With draft-ietf-sidrops-aspa-profile-16 and
draft-ietf-sidrops-aspa-verification-15 the AFI dependence of ASPA
records was dropped. So remove this complication from the code.
This only removes the AFI handling internally in bgpd but still allows
the old syntax in aspa-set tables. The optional address family is just
ignored and records are merged together.
For RTR sessions draft-ietf-sidrops-8210bis has not yet been updated so
right now we still handle RTR sessions as specified there. The IPv4 and
IPv6 ASPA entries are handled in two trees and merged together into one
AFI independent tree. This is the best we can do for now until IETF
updates draft-ietf-sidrops-8210bis.
OK tb@ job@
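Merging the per-AFI provider sets into one AFI-independent set amounts
to a union of two sorted lists. The following is only a sketch of that
idea on plain arrays; aspa_merge is a hypothetical name and the actual
code works on trees, not arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Merge two sorted, duplicate-free provider AS lists (one per address
 * family) into one AFI-independent list. Returns the number of entries
 * written to out, which must have room for na + nb entries.
 */
size_t
aspa_merge(const uint32_t *a, size_t na, const uint32_t *b, size_t nb,
    uint32_t *out)
{
	size_t i = 0, j = 0, n = 0;

	while (i < na || j < nb) {
		if (j == nb || (i < na && a[i] < b[j]))
			out[n++] = a[i++];
		else if (i == na || b[j] < a[i])
			out[n++] = b[j++];
		else {		/* same provider AS in both sets */
			out[n++] = a[i++];
			j++;
		}
	}
	return n;
}
```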
This replaces the old way of using a static buffer and a len to build
UPDATEs with a pure ibuf solution. The result is much cleaner and a lot
of almost duplicate code can be removed because often a version for ibufs
and one for this static buffer was implemented (e.g. for mrt or bgpctl).
With and OK tb@
for IMSG_CTL_SHOW_RIB_ATTR. Also drop the attr_optlen() usage in
imsg_create() since it is not strictly needed. With this attr_optlen
follows the path of the dodo.
OK tb@
and Loc-RIB. Flowspec objects are collected in a single flowrib RIB
and then directly distributed into the various Adj-RIB-Outs.
For this to work add a bypass in the filter logic (flowspec AFI/SAFI
are currently accepted without any rule). The filter language lacks
a way to allow prefixes based on AFI/SAFI which is the minimum needed.
OK tb@
Introduce pt_get_flow() and pt_add_flow() to lookup and insert flowspec
objects. Add pt_getflowspec() which works somewhat similar to pt_getaddr()
to extract the flowspec NLRI from a pt_entry.
Make pt_getaddr() return the destination prefix of the flowspec rule and
handle flowspec in pt_write().
OK tb@
as argument instead of the bgpd_addr + prefixlen.
Do the same with prefix_adjout_update but leave prefix_adjout_lookup
and prefix_adjout_match since those are used by bgpctl code that does
not use pt_entry structs.
With this most of the update code no longer needs struct bgpd_addr and
pt_getaddr().
OK tb@
especially on route-servers the output filters are in the hot path so
reducing the number of rules to check has a big impact. I have seen a
25% to 30% speedup in my big IXP testbench.
The output ruleset is applied and copied for each peer during config reload
and when a peer is initially added.
OK tb@
too conservative. Fixes and changes include:
- add role output to bgpctl, also adjust the capability output.
Note, this changes the JSON output of neighbors a bit.
- adjust the config parser to enable the RFC9234 role capability when
there is a role set. iBGP and sessions with no role will not announce
the role capability.
- adjust the role capability announcement to be only on sessions that
use either AFI IPv4 or IPv6 and SAFI 1 (AID_INET, AID_INET6).
- if there is an OPEN notification indicating that the role capability
is bad only disable the capability if it is not enforced.
- Adjust capability negotiation, store remote_role on the peer since
the neighbor's role is no longer needed by the RDE.
- inject the OTC attribute on ingress only for AID_INET and AID_INET6.
For other AIDs clear the F_ATTR_OTC_LOOP flag.
- Adjust the role logic in the RDE and use the peer->role (local role of
the system) for all checks. Also remove the check if the role capability
was negotiated between peers.
- In prefix_eligible() check also if the F_ATTR_OTC_LOOP flag is set.
The RFC requires that prefixes must be considered ineligible (and not
treated as withdrawn, as done before).
- When generating an UPDATE include the OTC attribute unless the AID is
neither AID_INET nor AID_INET6.
Fixes https://github.com/openbgpd-portable/openbgpd-portable/issues/51
Reported by Pier Carlo Chiodi
OK tb@
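The OTC related items in the list above can be sketched like this. The
constant values are hypothetical and chosen for the example only, and
otc_applies, otc_ingress_flags and path_eligible are illustrative helper
names rather than bgpd functions.

```c
#include <assert.h>

/* Hypothetical values for this sketch; the bgpd constants may differ. */
enum aid { AID_INET = 1, AID_INET6 = 2, AID_VPN_IPv4 = 3 };
#define F_ATTR_OTC_LOOP	0x01u

/* OTC handling applies only to plain IPv4 and IPv6 unicast. */
int
otc_applies(enum aid aid)
{
	return aid == AID_INET || aid == AID_INET6;
}

/* On ingress, keep the OTC loop flag only where OTC applies. */
unsigned int
otc_ingress_flags(enum aid aid, unsigned int flags)
{
	if (!otc_applies(aid))
		flags &= ~F_ATTR_OTC_LOOP;
	return flags;
}

/* A path with the OTC loop flag set is ineligible, not withdrawn. */
int
path_eligible(unsigned int flags)
{
	return (flags & F_ATTR_OTC_LOOP) == 0;
}
```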
With this the newbest and oldbest arguments can go since the information
is part of the rib_entry. Especially the prefix in the rib_entry is
always valid so simplify some code in various functions below to use
this information.
OK tb@
stat numbers, just send the peerid and have the RDE respond with the
stats. The control code will then merge these counters into the real
peer struct and send that to bgpctl. This reduces the number of bytes
sent around a fair bit.
OK tb@
For this use the validation state (vstate) in struct prefix and
struct filterstate to store both the ASPA and ROA validity.
Introduce helper functions to set and get the various states for
struct prefix and make sure struct filterstate is also setup properly.
Change the ASPA state in rde_aspath to be AFI/AID and role independent
by storing all 4 possible outcomes. Also add an ASPA generation count
which is used to update the rde_aspath ASPA state cache on reloads.
Rework the rde_aspa.c code to be AFI/AID and role independent. Doing
this for roles is trivial but the AFI switch goes deep and is so
unnecessary.
The reload is combined with the ROA reload logic and renamed to RPKI
softreload.
OK tb@
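Storing both validation results in a single vstate field, with helpers
to set and get them, can be sketched as below. The bit layout (ROA state
in the low nibble, ASPA state in the high nibble) and all names are
assumptions made for this example, not the layout used by bgpd.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: ROA state in the low nibble, ASPA state above it. */
#define VSTATE_ROA_MASK		0x0f
#define VSTATE_ASPA_SHIFT	4

/* Combine a ROA and an ASPA validation state into one byte. */
uint8_t
vstate_set(uint8_t roa, uint8_t aspa)
{
	return (uint8_t)((roa & VSTATE_ROA_MASK) |
	    ((aspa & 0x0f) << VSTATE_ASPA_SHIFT));
}

/* Extract the ROA validation state. */
uint8_t
vstate_roa(uint8_t v)
{
	return v & VSTATE_ROA_MASK;
}

/* Extract the ASPA validation state. */
uint8_t
vstate_aspa(uint8_t v)
{
	return v >> VSTATE_ASPA_SHIFT;
}
```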
- rde_filterstate_init(): initialize a filterstate to default values
- rde_filterstate_copy(): copy from a filterstate into a new state object
- rde_filterstate_prep(): set filterstate based on prefix passed as argument.
This makes the code a bit easier to read.
OK tb@
This implements ASPA validation based on the current draft. Implementing
this showed various weaknesses in the current ASPA draft which I hope to
fix in the near future.
Unlike the algorithm specified in the draft our version validates the
AS_PATH attribute in a single pass doing one or two lookups depending on
the session's BGP role.
The code is not yet hooked up into the RDE (see the NOTYET blocks).
Missing are reload logic, bgpctl integration and the loading of the
merged ASPA set from the rtr process.
OK tb@
The generic add-path code up_generate_addpath() reevaluates everything
since this is the simplest way to select the announced paths. For
'add-path all' this is overkill since there is no dependency between
prefixes and so individual prefixes can be handled more efficiently.
Extend rde_generate_updates() to pass the current newbest and oldbest
prefixes (for the selected best path) but now also include newpath and
oldpath (which is the prefix that is added/removed/modified).
If newpath or oldpath is set then a single prefix was altered and
up_generate_addpath_all() can just remove or add this prefix.
If newpath and oldpath are NULL then the full list based on newbest
needs to be inserted and any old path/prefix removed in the process.
This improves update generation performance on big route collectors using
add-path all substantially.
OK tb@
Use a per peer path_id_tx to assign to paths received from non-add-path
enabled peers. This skips two extra walks of the RIB prefix list and is
a big speed-up when there are many regular sessions. If the session uses
add-path recv then the old way of assigning random path_ids needs to be
used.
With input and OK tb@
In some cases only a "small" part of the RIB needs to be looked at. For
example, bgpctl show rib 10/8 or-longer only needs to traverse nodes
under 10/8; all other RIB entries do not matter. By setting the start
node to RB_NFIND(10/8), all nodes before this point can be skipped.
Using prefix_compare() while walking the tree with RB_NEXT() the walker
knows when it steps outside of the 10/8 subtree and stops.
With this the or-longer commands become a lot faster.
Looks good to tb@
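The or-longer walk described above can be illustrated on a sorted array,
with a binary search standing in for RB_NFIND() and a linear step for
RB_NEXT(). This is a sketch, not the RIB code: all names are hypothetical,
only IPv4 is handled, and every table entry is assumed to be a valid
prefix with its host bits cleared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct pfx {
	uint32_t	addr;	/* host byte order, host bits cleared */
	uint8_t		len;
};

/* Order by address first, then by prefix length. */
int
pfx_cmp(const struct pfx *a, const struct pfx *b)
{
	if (a->addr != b->addr)
		return a->addr < b->addr ? -1 : 1;
	return (a->len > b->len) - (a->len < b->len);
}

/* Is p equal to or more specific than net? */
int
pfx_within(const struct pfx *p, const struct pfx *net)
{
	uint32_t mask = net->len == 0 ? 0 : 0xffffffffu << (32 - net->len);

	return p->len >= net->len && (p->addr & mask) == (net->addr & mask);
}

/* First index whose entry is >= key; plays the role of RB_NFIND(). */
size_t
pfx_nfind(const struct pfx *tbl, size_t n, const struct pfx *key)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (pfx_cmp(&tbl[mid], key) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/* Count entries under net and report how many entries were visited. */
size_t
walk_or_longer(const struct pfx *tbl, size_t n, const struct pfx *net,
    size_t *visited)
{
	size_t i, hits = 0;

	*visited = 0;
	for (i = pfx_nfind(tbl, n, net); i < n; i++) {
		(*visited)++;
		if (!pfx_within(&tbl[i], net))
			break;	/* stepped outside of the subtree */
		hits++;
	}
	return hits;
}
```

Only the matching entries plus the one entry that ends the walk are
visited; everything sorted before the start node is never touched.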
Only the RDE used a hashtable for lookups while the session engine
switched from a list to RB tree some time ago.
Use peer_foreach() in the mrt code instead of passing the peer list
as an argument.
OK benno@ tb@
them on the per peer imsg queue. This is mainly for IMSG_SESSION_DOWN.
Delaying the session down can race against IMSG_SESSION_ADD which is
handled immediately and as a result an established connection may be
removed in the RDE because of it.
The various graceful restart imsgs need similar treatment for similar
reasons. In the end when a session is reset/closed the RDE needs to
stop all work and flush the per peer imsg queue.
With this only update and route refresh messages are handled via the
imsg queue.
OK tb@
In rev 1.90 of rde_decide.c the re->active cache of the best prefix was
replaced with a call to prefix_best(). This introduced a bug because the
nexthop state at that time may have changed already. As a result when
a nexthop became unreachable prefix_evaluate() had oldbest = NULL and
newbest = NULL and did not withdraw the prefix from FIB and Adj-RIB-Out.
To fix this store the nexthop state per prefix and introduce
prefix_evaluate_nexthop() which removes the prefix from the decision list,
updates the nexthop state of the prefix and reinserts the prefix. Doing
this ensures that prefix_best() always reports the same result once the
decision process is done. prefix_best() and prefix_eligible() only depend
on data stored on the prefix itself.
OK tb@
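The idea of the fix above, caching the nexthop state on the prefix so
that prefix_best() depends only on stored data, can be sketched as
follows. The function names come from the message above but the
signatures and struct are hypothetical, and the sketch re-sorts an array
with qsort() where the real code removes and reinserts a list element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct prefix {
	int		nh_reachable;	/* nexthop state cached on the prefix */
	unsigned int	localpref;
};

/* Eligibility depends only on data stored on the prefix itself. */
int
prefix_eligible(const struct prefix *p)
{
	return p->nh_reachable != 0;
}

/* Eligible prefixes sort first, then the higher localpref wins. */
static int
prefix_cmp(const void *a, const void *b)
{
	const struct prefix *pa = a, *pb = b;

	if (prefix_eligible(pa) != prefix_eligible(pb))
		return prefix_eligible(pb) - prefix_eligible(pa);
	return (pa->localpref < pb->localpref) -
	    (pa->localpref > pb->localpref);
}

/* Update the cached nexthop state and put the list back in order. */
void
prefix_evaluate_nexthop(struct prefix *list, size_t n, size_t idx,
    int reachable)
{
	assert(idx < n);
	list[idx].nh_reachable = reachable;
	qsort(list, n, sizeof(list[0]), prefix_cmp);
}

/* The best prefix is the first one, provided it is eligible. */
const struct prefix *
prefix_best(const struct prefix *list, size_t n)
{
	if (n == 0 || !prefix_eligible(&list[0]))
		return NULL;
	return &list[0];
}
```

Because the state lives on the prefix, calling prefix_best() before and
after an unrelated nexthop change gives the same answer until
prefix_evaluate_nexthop() is called for that prefix.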
This allows sending more than one path per prefix to a neighbor that
supports add-path receive. OpenBGPD supports a few different modes to
select which paths to send:
- all: send all valid paths (the ones with a * in bgpctl output)
- best: send out only the single best path
- ecmp: send out paths that evaluate the same up to and including
the nexthop metric
- as-wide-best: send out paths that evaluate the same up to but not
including the nexthop metric
Currently ecmp and as-wide-best are the same. On top of this best, ecmp
and as-wide-best allow including extra paths (e.g. best plus 2) and
for the multipath modes there is also a maximum (e.g. ecmp plus 2 max 4).
OK tb@
Adjust prefix_adjout_update() to properly handle path_id_tx.
Move the lookup of the prefix out of prefix_adjout_update() and to
up_generate_updates(). While that code uses prefix_adjout_lookup() to
find the current prefix in the Adj-RIB-Out, an add-path aware function
will use prefix_adjout_get().
In up_generate_default() just use 0 for path_id_tx since for this peer
that is the only prefix installed into the Adj-RIB-Out.
OK tb@
For add-path a unique path_id needs to be assigned to all prefixes.
Use a random number since the RFC explicitly mentions that the value
has no meaning. The local path_id is inherited by all the RIBs.
Adj-RIB-Out handling is not yet done.
OK tb@