them on the per peer imsg queue. This is mainly for IMSG_SESSION_DOWN.
Delaying the session down can race against IMSG_SESSION_ADD which is
handled immediatly and as a result an establised connection may be
removed in the RDE because of it.
The various graceful restart imsgs need similar treatment for similar
reasons. In the end when a session is reset/closed the RDE needs to
stop all work and flush the per peer imsg queue.
With this only update and route refresh messages are handled via the
imsg queue.
OK tb@
Remove F_KERNEL and replace the checks using the F_BGPD flag.
Also do not convert the priority in kr_tofull() instead provide
kr_priority() which does this and is used by the bgpctl imsg
commands. Also in dispatch_rtmsg_addr() convert to RTP_MINE if
the priority is equal to the configured priority.
OK tb@
Only problem is when route(8) is used to modify/delete a bgpd owned route.
Exact behaviour for that is still a bit unclear but F_KERNEL does not help
in this case either. In the kr_fib_delete/change remove F_BGPD_INSERTED
in that case as a first step.
OK tb@
- introduce network_free() to properly free a network struct including
the possible rtlabel reference.
- change expand_networks() and the reload code to not only expand the
main network config but also the network configs inside L3VPN sections.
- adjust reload logic to properly match any kind of network struct.
Up until now rtlabel and priority network statememnts were not correctly
reloaded.
OK tb@
This allows to send out more then one path per perfix to a neighbor that
supports add-path receive. OpenBGPD supports a few different modes to
select which paths to send:
- all: send all valid paths (the ones with a * in bgpctl output)
- best: send out only the single best path
- ecmp: send out paths that evaluate the same up and including
the nexthop metric
- as-wide-best: send out paths that evaluete the same up but not including
the nexthop metric
Currently ecmp and as-wide-best are the same. On top of this best, ecmp
and as-wide-best allow to include extra paths (e.g. best plus 2) and
for the multipath modes there is also a maximum (e.g. ecmp plus 2 max 4)
OK tb@
this prefix with respect to its previous one.
Currently the plan is to distinguish the best prefix (only one), ecmp
prefixes (currently the same as as-wide-multipath), as-wide-multipath
prefixes, valid prefixes and invalid prefixes.
This information will be used to implement add-path send but also for
ecmp support in bgpd.
OK tb@
The only user of struct kif was the session engine for the 'depend on'
feature. Switch the imsg exchange to a new struct session_dependon and
rename the IMSG as well.
OK tb@
struct kroute and kroute6.
Rename knexthop_node to knexthop as well. Mostly mechanical but fix
at least one log format string to have the correct order of arguments.
OK tb@
indicates that the route was successfully added to the FIB.
Filter out dynamic routes, like it is already done for ARP and ND routes) and
kill F_DYNAMIC.
Also remove the protect_lo() bits. Adding dummy kroute entries does no longer
prevent bad routes to hit the FIB. Also loopback IPs are checked in a few
other places to prevent bad routes to be installed into the FIB.
OK tb@
kr_nexthop_add() and kr_nexthop_delete() only operate on the main table
so just pass in the right rdomain id.
kr_shutdown() and kr_dispatch_msg() don't really need the rdomain passed.
The was done for kif_remove(), since that function needs to remove connected
routes from the rdomain table. Connected routes can only exists in the
interfaces rdomain so just use kif->k.rdomain. If such routes exist that
table exists as well. If the table does not exists there are also no
connected routes to track.
OK tb@
Instead of passing it around all the time put the fib_priority into the
kroute state. It is only needed in send_rtmsg() in the end.
Additionally insert F_BGP_INSERTED routes with a special RTP_MINE priority.
This makes changing the fib_priority at runtime simpler because there
is no need to alter the kroute table anymore.
OK tb@ deraadt@
When max-communities X is set on a filterrule the filter will match when
more than X communities are present in the path. In other words
max-communities 0 means no communities are allowed and max-communities 3
limits it up to 3 communities.
There is max-communities, max-ext-communities and max-large-communities
for each of the 3 community attributes. These three max checks can be used
together.
OK tb@ job@
First of all the detection logic was totally wrong. Then filter out
non-transitive extended communities when received from an ebgp peer.
Also cleanup the type handling of ext-communities. Mainly to not have
to handle the transitive vs non-transitive versions the type is masked
with EXT_COMMUNITY_VALUE before doing the switch case for the various
types.
With this my test using ext-communities works.
OK tb@
optional expires timestamp. The rtr process is walking the roa-set every
5min and removes every prefix that is expired.
With this stale RPKI data will slowly disapear and not linger around.
OK job@
side of RFC7911 and the send portion will follow.
The path-id is extracted from the NLRI encoding an put into struct
prefix. To do this the prefix_by_peer() function gets a path-id
argument. If a session is not path-id enabled this argument will
be always 0. If a session is path-id enabled the value is taken
from the NLRI and can be anything, including 0. The value has no
meaning in itself. Still to make sure the decision process is able
to break a tie the path-id is checked as the last step (this is not
part of the RFC but required).
OK benno@
to dump add-path enabled systems because the NLRI format changes based
on the add-path capability and there is no way to know which format is
in use so new message types had to be added.
Also extend the ctl_show_rib structure to include the path_id.
OK benno@
can be enabled with 'announce enhanced refresh yes'
Similar to graceful restart this allows to mark routes as stale, refresh
them and the flush out routes that are still stale. Enhanced route refresh
uses a begin of rr and a end of rr message to signal the various stages.
A future enhancement would be the addition of a timeout in case the EoRR
message is not sent in reasonable time.
OK denis@ job@
(RFC7313). This is the frist step toward this.
It adds the capability parsers for the two no capabilities, extends the
capability struct and adds the capability negotiation bits.
The route refresh message parser and generator are extended to support
the BoRR and EoRR message. Also add the new NOTIFICATION type and subtype
for the route refresh message.
route-server environments.
By default only the best path is sent to peers and if that path is filtered
then the path is hidden for that peer. On route-servers this is sometimes
not desried. For this 'rde evaluate all' will cause the evaluation process
to fall back to alternate routes and will redistribute the first non-filtered
path to the peer. This is very similar to per-peer RIBs but accomplishes
the same effect without the massive increase in memory usage. Compared to
the default mode this requires more CPU resources but it is probably less
than what per-peer RIBs would require.
'rde evaluate all' can be set and reset globally, on groups and on idividual
neighbors. It is not limited to route-server configs but route loops are
possible if not properly used.
OK benno@
The RTR client runs in a new process where the protocol handling is done
and when new data is available all sources are merged into one ROA set
which is then loaded into the RDE. The roa-set from the config is also
handled by the new RTR engine.
Tested by and ok job@
The main reason is that AS_SET does not play nice with RPKI ROA.
Introduce a per neighbor and global config option
'reject as-set yes' and 'reject as-set no'
If set to yes received UPDATES with AS_SET segements are rejected.
This is done the same way other ASPATH soft-errors are handled. The UPDATE
is marked invalid and all prefixes are treated as withdraws.
`bgpctl show rib in error` can be used to show prefixes that where denied
and treated as withdraws because of errors.
By default this feature is off.
OK benno@
equal versions put the RD and lable stack right into struct bgpd_addr.
For non-VPN addresses these extra fields are ignored. Since VPN and non-VPN
addresses encode the prefix in the same way now some code can be simplified.
In most cases a fallthrough or reuse of encoding functions is now possible.
It should also reduce the size of struct bgpd_addr a bit.
OK denis@
prefix-sets loaded into the RDE. For now only the number of prefixes or
asnumbers are shown plus the time since the last change was done to the table.
OK benno@
in the parent to a simple RB tree based on struct roa. With this overlapping
ROAs (same prefix & source-as but different maxlen) are now merged in the RDE
when the lookup trie is constructed.
OK benno@
The problem is that this timer only looks at the receive side of the TCP
session. If for some reason the send side stalls the system fully depends
on the remote BGP peer to reset the session. As seen in an ever growing
OutQ and as a result important changes can get stalled and cause routing
troubles.
This change introduces a SEND HOLD timer. The timer is reset whenever the
session engine was able to write data to the TCP socket. If the send hold
timer expires bgpd was not able to send any data to that neighbor for at
least 90 seconds and therefor the session is forcefully closed with a hold
timer expired notification.
The send hold timer acts as a last resort to detect faulty peers. On an
idle session it can take a long time until this timer triggers but the
main goal here is to reset a stuck session at some point which did not
happen before.
With and OK job@
bgpd_process and changing the behaviour that way add a new filterset
type ACTION_SET_NEXTHOP_REF which is used when the nexthop reference
of the union is used. Adjust the RDE to convert ACTION_SET_NEXTHOP to
ACTION_SET_NEXTHOP_REF when receiving the filtersets.
OK benno@
an IPv4 and IPv6 local-address on a group and the neighbors bind to the
right local-address. Also implement 'no local-address' to reset a previously
set local address back to zero. This should help with IBGP and multihop
session config and hopefully reduce repetition in bgpd configs.
OK sthen@ benno@
sessions and vice versa) from the RDE to the SE. The SE is the right place
for this since there getsockname(2) fetches the local address and so the
alternate one can be fetched there as well.
With this the route pledge is no longer needed in the RDE and the pledge
is now just "stdio recvfd".
OK benno@
This is an easy safety switch to not leak full tables to upstreams and
peers. If the limit is hit a Cease notification is sent and the session
is closed.
This implements most of https://tools.ietf.org/html/draft-sa-idr-maxprefix-00
OK job@
uses CLOCK_MONOTONIC. Convert the control messages to return the relative
age of the prefix instead of the absolute age. Adjust the mrt dump code
to stil dump the route age in seconds since epoch as defined in the RFC.
With this all times in bgpd are now based on CLOCK_MONOTONIC.
OK denis@
match all prefixes that have a shorter prefixlen than the one in the request.
It will print all routes which cover the specified prefix.
OK job@ sthen@
route evaluation is modified. In both cases the softreconfig code will
now walk the RIB and ensure that everything is in proper sync.
Additionally remove 'route-collector yes|no' from the bgpd config, instead
use 'rde rib Loc-RIB no evaluate' with the benefit that you can alter
the setting now during runtime.
Tested and OK benno@
and l3vpns instead of temporary globals. Also rework rde_reload_done to
free filters and sets earlier. The soft-reconfiguration process no longer
needs the previous filters / sets to do its work since there is a full
Adj-RIB-Out.
OK benno@
from the RDE. Make sure that all nexthops don't get removed in the FIB
when a FIB table is removed. This should only happen for the main FIB.
Remove F_RIB_HASNOFIB which is just confusing since there is already
F_RIB_NOFIB and F_RIB_NOFIBSYNC.
OK benno@
4 times the read size. This helps to increase the efficency of poll()
since now most of the time the read and write call can operate on full
buffers.
OK benno@ phessler@
attributes are put into a new data structure when parsing the UPDATE.
The filter code can quickly lookup and modify this data structure.
When creating an UPDATE the data is put back into wire format.
Setups using a lot of communities benefit a lot from this.
Input and OK benno@
process in this process. The refreshing of the keys is done whenever the
session state is changes to state IDLE or ACTIVE. This should behave better
when reloading configs with auth changes.
OK benno@
were missed before (e.g. network related objects). This helps to detect
memory leaks.
Start using new_config() and free_config() in all places where bgpd_config
structure are used. This way the struct is properly initialised and cleaned
up. Introduce copy_config() to only copy the values into the other struct
leaving the pointers as they were.
Looks good to benno@
instead of sockaddr_storage. This again helps protability and simplifies
some code. sa2addr now takes an optional pointer to return the port of
the sockaddr.
OK benno@
and setting. This allows rules like:
ext-community * * # delete any ext-community
ext-community ovs * # delete any ext-community of specified type
ext-community rt 1.2.3.4:*
and
ext-community rt 65001:local-as
ext-community rt local-as:11111
Note: Sometimes the type of the ext-community is underspecified when using
wildchars or expands. So 'ext-community rt *' or 'ext-community soo *' will
match for any of the 3 possible types (2-byte AS, 4-byte AS and IP address).
If local-as/neighbor-as is used as an expand of as-number like
ext-community rt local-as:11111
then bgpd will default to the 4-byte AS type to encode the community.
OK benno@
export the interface info in a way that does not need OS specific functions
to print it. Link state and media are now strings that are set by bgpd.
bgpctl can just print them. Move get_linkstate and get_media_descr to
kroute.c where all other system specific stuff is.
OK sthen@
of some strange maximum. The poll loop in bgpd.c already limits the
maximum wait time so there is no need to double it.
While there switch to using time_t for the calculation.
OK phessler@
mpeX' config was a bit redundant. Also to make it more flexible (e.g. having
more than one mpeX interface per rdomain the syntax was changed.
To make this possible especially the network distribution logic had to be
adjusted and cleaned up. This should in general make network statements
well defined and conflicts between 'network A.B.C.D/N' and e.g. 'network static'
are handled in a well defined way ('network A.B.C.D/N' has preference).
With and OK dlg@, OK denis@
local AS in AS paths. This is sometimes needed in bigger transport networks
where private AS numbers are used in multiple locations.
The implementation is done using a filterset which modifies the AS path -
somewhat inspired by the set attribute code. Setting as-override yes will add
match from <neighbor> set { as-override }
to the start of the filter rules. Since this is filters the Adj-RIB-In still
holds the original path and so reloads changing the setting just work.
With and OK markus@
If it is used abort startup or let a reload fail.
Sockets are now not unlinked anymore on regular shutdown.
This helps a lot when one tries to do a config check without -n.
Inputs and OK claudio@
but then bgpctl can quickly exit and bgpd still has to do all the work.
Instead introduce a terminate imsg to stop such long running commands if
bgpctl closes the connection before the run is over.
OK benno@, sthen@, deraadt@
multiple ext-communities at the same time as well. Additionally this fixes
parsing some of the ext-community types. Now all communities are handled
by one common struct.
OK benno@ plus some input from denis@
communities into one filter_community struct and allow it that more then
one community can be used in filter rules (currently up to 3).
Also rework the code handling bgpctl show rib commands. The special IMSG
types for the various filters are gone and the code is in general simpler.
OK job@, phessler@
lists are no longer needed and make it possible to share rde_aspath between
peers & prefixes. Instead of the lists the rde_aspath is now reference counted.
With this struct prefix is now the central place where everything is connected
to making the RIB a bit easier to handle.
With input and OK denis@