rfc9816.original   rfc9816.txt 
Network Working Group K. Patel Internet Engineering Task Force (IETF) K. Patel
Internet-Draft Arrcus, Inc. Request for Comments: 9816 Arrcus, Inc.
Intended status: Informational A. Lindem Category: Informational A. Lindem
Expires: 27 July 2025 LabN Consulting, L.L.C. ISSN: 2070-1721 LabN Consulting, L.L.C.
S. Zandi S. Zandi
LinkedIn
G. Dawra G. Dawra
Linkedin Linkedin
J. Dong J. Dong
Huawei Technologies Huawei Technologies
23 January 2025 July 2025
Usage and Applicability of BGP Link-State Shortest Path Routing (BGP- Usage and Applicability of BGP Link-State Shortest Path Routing (BGP-
SPF) in Data Centers SPF) in Data Centers
draft-ietf-lsvr-applicability-22
Abstract Abstract
This document discusses the usage and applicability of BGP Link-State This document discusses the usage and applicability of BGP Link-State
Shortest Path First (BGP-SPF) extensions in data center networks Shortest Path First (BGP-SPF) extensions in data center networks
utilizing Clos or Fat-Tree topologies. The document is intended to utilizing Clos or Fat Tree topologies. The document is intended to
provide simplified guidance for the deployment of BGP-SPF extensions. provide simplified guidance for the deployment of BGP-SPF extensions.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This document is not an Internet Standards Track specification; it is
provisions of BCP 78 and BCP 79. published for informational purposes.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months This document is a product of the Internet Engineering Task Force
and may be updated, replaced, or obsoleted by other documents at any (IETF). It represents the consensus of the IETF community. It has
time. It is inappropriate to use Internet-Drafts as reference received public review and has been approved for publication by the
material or to cite them other than as "work in progress." Internet Engineering Steering Group (IESG). Not all documents
approved by the IESG are candidates for any level of Internet
Standard; see Section 2 of RFC 7841.
This Internet-Draft will expire on 27 July 2025. Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
https://www.rfc-editor.org/info/rfc9816.
Copyright Notice Copyright Notice
Copyright (c) 2025 IETF Trust and the persons identified as the Copyright (c) 2025 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/ Provisions Relating to IETF Documents
license-info) in effect on the date of publication of this document. (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
Please review these documents carefully, as they describe your rights carefully, as they describe your rights and restrictions with respect
and restrictions with respect to this document. Code Components to this document. Code Components extracted from this document must
extracted from this document must include Revised BSD License text as include Revised BSD License text as described in Section 4.e of the
described in Section 4.e of the Trust Legal Provisions and are Trust Legal Provisions and are provided without warranty as described
provided without warranty as described in the Revised BSD License. in the Revised BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 1. Introduction
2. Recommended Reading . . . . . . . . . . . . . . . . . . . . . 3 2. Recommended Reading
3. Common Deployment Scenario . . . . . . . . . . . . . . . . . 3 3. Common Deployment Scenario
4. Justification for BGP-SPF Extension . . . . . . . . . . . . . 4 4. Justification for the BGP-SPF Extension
5. BGP-SPF Applicability to Clos Networks . . . . . . . . . . . 4 5. BGP-SPF Applicability to Clos Networks
5.1. Usage of BGP-LS SPF SAFI . . . . . . . . . . . . . . . . 5 5.1. Usage of BGP-LS-SPF SAFI
5.1.1. Relationship to Other BGP AFI/SAFI Tuples . . . . . . 5 5.1.1. Relationship to Other BGP AFI/SAFI Tuples
5.2. Peering Models . . . . . . . . . . . . . . . . . . . . . 5 5.2. Peering Models
5.2.1. Sparse Peering Model . . . . . . . . . . . . . . . . 6 5.2.1. Sparse Peering Model
5.2.2. Bi-Connected Graph Heuristic . . . . . . . . . . . . 7 5.2.2. Biconnected Graph Heuristic
5.3. BGP Spine/Leaf Topology Policy . . . . . . . . . . . . . 7 5.3. BGP Spine/Leaf Topology Policy
5.4. BGP Peer Discovery Considerations . . . . . . . . . . . . 8 5.4. BGP Peer Discovery Considerations
5.5. BGP Peer Discovery . . . . . . . . . . . . . . . . . . . 9 5.5. BGP Peer Discovery
5.5.1. BGP IPv6 Simplified Peering . . . . . . . . . . . . . 9 5.5.1. BGP IPv6 Simplified Peering
5.5.2. BGP-LS SPF Topology Visibility for Management . . . . 9 5.5.2. BGP-LS SPF Topology Visibility for Management
5.5.3. Data Center Interconnect (DCI) Applicability . . . . 10 5.5.3. Data Center Interconnect (DCI) Applicability
6. Non-CLOS/FAT Tree Topology Applicability . . . . . . . . . . 10 6. Non-Clos / Fat Tree Topology Applicability
7. Non-Transit Node Capability . . . . . . . . . . . . . . . . . 10 7. Non-Transit Node Capability
8. BGP Policy Applicability . . . . . . . . . . . . . . . . . . 10 8. BGP Policy Applicability
9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10 9. IANA Considerations
10. Security Considerations . . . . . . . . . . . . . . . . . . . 11 10. Security Considerations
11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 11 11. References
12. References . . . . . . . . . . . . . . . . . . . . . . . . . 11 11.1. Normative References
12.1. Normative References . . . . . . . . . . . . . . . . . . 11 11.2. Informative References
12.2. Informative References . . . . . . . . . . . . . . . . . 11 Acknowledgements
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 13 Authors' Addresses
1. Introduction 1. Introduction
This document complements [I-D.ietf-lsvr-bgp-spf] by discussing the This document complements [RFC9815] by discussing the applicability
applicability of the BGP-SPF technology in a simple and fairly common of the BGP-SPF technology in a simple and fairly common deployment
deployment scenario, which is described in Section 3. scenario, which is described in Section 3.
Section 4 describes the reasons for BGP modifications for such Section 4 describes the reasons for BGP modifications for such
deployments. deployments.
Section 5 covers the BGP Link-State Shortest Path First (BGP-SPF) Section 5 covers the BGP-SPF protocol enhancements to BGP to meet
protocol enhancements to BGP to meet these requirements and their these requirements and their applicability to data center [Clos]
applicability to data center [Clos] networks. networks.
2. Recommended Reading 2. Recommended Reading
This document assumes knowledge of existing data center networks and This document assumes knowledge of existing data center networks and
data center network topologies [Clos]. This document also assumes data center network topologies [Clos]. This document also assumes
knowledge of data center routing protocols such as BGP [RFC4271], knowledge of data center routing protocols such as BGP [RFC4271],
BGP-SPF [I-D.ietf-lsvr-bgp-spf], OSPF [RFC2328] [RFC5340], as well as BGP-SPF [RFC9815], and OSPF [RFC2328] [RFC5340] as well as data
data center Operations, Administration, and Maintenance (OAM) center Operations, Administration, and Maintenance (OAM) protocols
protocols like Link Layer Discovery Protocol (LLDP) [RFC4957] and Bi- like the Link Layer Discovery Protocol (LLDP) [RFC4957] and
Directional Forwarding Detection (BFD) [RFC5880]. Bidirectional Forwarding Detection (BFD) [RFC5880].
3. Common Deployment Scenario 3. Common Deployment Scenario
Within a data center, servers are commonly interconnected using the Within a data center, servers are commonly interconnected using the
Clos topology [Clos]. The Clos topology is fully non-blocking and Clos topology [Clos]. The Clos topology is fully non-blocking, and
the topology is realized using Equal Cost Multi-Path (ECMP). In a the topology is realized using Equal-Cost Multipath (ECMP). In a
multi-stage Clos topology, the minimum number of parallel paths in multi-stage Clos topology, the minimum number of parallel paths in
each tier is determined by the width of the stage as shown in the each tier is determined by the width of the stage as shown in
figure 1. Figure 1.
Tier 1 Tier 1
+-----+ +-----+
|NODE | |NODE |
+->| 1 |--+ +->| 1 |--+
| +-----+ | | +-----+ |
Tier 2 | | Tier 2 Tier 2 | | Tier 2
+-----+ | +-----+ | +-----+ +-----+ | +-----+ | +-----+
+------------->|NODE |--+->|NODE |--+--|NODE |--------------+ +------------->|NODE |--+->|NODE |--+--|NODE |--------------+
| +-----| 5 |--+ | 2 | +--| 7 |-----+ | | +-----| 5 |--+ | 2 | +--| 7 |-----+ |
skipping to change at page 3, line 47 skipping to change at line 139
| | | +---| 6 |--+->| 3 |--+--| 8 |---+ | | | | | | +---| 6 |--+->| 3 |--+--| 8 |---+ | | |
| | | | +-----+ | +-----+ | +-----+ | | | | | | | | +-----+ | +-----+ | +-----+ | | | |
| |Tier 3| | | | | |Tier 3| | | |Tier 3| | | | | |Tier 3| |
+-----+ +-----+ | +-----+ | +-----+ +-----+ +-----+ +-----+ | +-----+ | +-----+ +-----+
|NODE | |NODE | +->|NODE |--+ |NODE | |NODE | |NODE | |NODE | +->|NODE |--+ |NODE | |NODE |
| 9 | | 10 | | 4 | | 11 | | 12 | | 9 | | 10 | | 4 | | 11 | | 12 |
+-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+
| | | | | | | | | | | | | | | | | | | | | | | |
<- Servers -> <- Servers -> <- Servers -> <- Servers ->
Tier 1 is comprised of Nodes 1, 2, 3, and 4 Figure 1: Illustration of the Basic Clos
Tier 2 is comprised of Nodes 5, 6, 7, and 8
Tier 3 is comprised of Nodes 9, 10, 11, and 12
Figure 1: Illustration of the basic Clos * Tier 1 is comprised of Nodes 1, 2, 3, and 4
4. Justification for BGP-SPF Extension * Tier 2 is comprised of Nodes 5, 6, 7, and 8
To simplify L3 routing and operations, many data centers use BGP as a * Tier 3 is comprised of Nodes 9, 10, 11, and 12
routing protocol to create both an underlay and an overlay network
for their Clos Topologies [RFC7938]. However, BGP is a path-vector 4. Justification for the BGP-SPF Extension
routing protocol. Since it does not create a fabric topology, it
uses hop-by-hop External BGP (EBGP) peering to facilitate hop-by-hop To simplify Layer 3 (L3) routing and operations, many data centers
routing to create the underlay network and to resolve any overlay use BGP as a routing protocol to create both an underlay and an
next hops. The hop-by-hop BGP peering paradigm imposes several overlay network for their Clos topologies [RFC7938]. However, BGP is
restrictions within a Clos. It prohibits the deployment of Route a path-vector routing protocol. Since it does not create a fabric
Reflectors/Route Controllers as the EBGP sessions are congruent with topology, it uses hop-by-hop External BGP (EBGP) peering to
the data path. The BGP best-path algorithm is prefix-based and it facilitate hop-by-hop routing to create the underlay network and to
prevents announcements of prefixes to other BGP speakers until the resolve any overlay next hops. The hop-by-hop BGP peering paradigm
best-path decision process has been performed for the prefix at each imposes several restrictions within a Clos. It prohibits the
intermediate hop. These restrictions significantly delay the overall deployment of route reflectors / route controllers as the EBGP
convergence of the underlay network within a Clos network. sessions are congruent with the data path. The BGP best-path
algorithm is prefix based, and it prevents announcements of prefixes
to other BGP speakers until the best-path decision process has been
performed for the prefix at each intermediate hop. These
restrictions significantly delay the overall convergence of the
underlay network within a Clos network.
The BGP-SPF modifications allow BGP to overcome these limitations. The BGP-SPF modifications allow BGP to overcome these limitations.
Furthermore, using the BGP-LS Network Layer Reachability Information Furthermore, using the BGP-LS Network Layer Reachability Information
(NLRI) format allows the BGP-SPF data to be advertised for nodes, (NLRI) format allows the BGP-SPF data to be advertised for nodes,
links, and prefixes in the BGP routing domain and used for Shortest- links, and prefixes in the BGP routing domain and used for SPF
Path-First (SPF) computations [RFC9552]. computations [RFC9552].
Additional motivation for deploying BGP-SPF is included in Additional motivation for deploying BGP-SPF is included in [RFC9815].
[I-D.ietf-lsvr-bgp-spf].
5. BGP-SPF Applicability to Clos Networks 5. BGP-SPF Applicability to Clos Networks
With the BGP-SPF extensions [I-D.ietf-lsvr-bgp-spf], the BGP best- With the BGP-SPF extensions [RFC9815], the BGP best-path computation
path computation and route computation are replaced with link-state and route computation are replaced with link-state algorithms such as
algorithms such as those used by OSPF [RFC2328], both to determine those used by OSPF [RFC2328], both to determine whether a BGP-LS-SPF
whether an BGP-LS-SPF NLRI has changed and needs to be re-advertised NLRI has changed and needs to be readvertised and to compute the BGP
and to compute the BGP routes. These modifications will routes. These modifications will significantly improve convergence
significantly improve convergence of the underlay while affording the of the underlay while affording the operational benefits of a single
operational benefits of a single routing protocol [RFC7938]. routing protocol [RFC7938].
Data center controllers typically require visibility to the BGP Data center controllers typically require visibility to the BGP
topology to compute traffic-engineered paths. These controllers topology to compute traffic-engineered paths. These controllers
learn the topology and other relevant information via the BGP-LS learn the topology and other relevant information via the BGP-LS
address family [RFC9552] which is totally independent of the underlay address family [RFC9552], which is totally independent of the
address families (usually IPv4/IPv6 unicast). Furthermore, in underlay address families (usually IPv4/IPv6 unicast). Furthermore,
traditional BGP underlays, all the BGP routers will need to advertise in traditional BGP underlays, all the BGP routers will need to
their BGP-LS information independently. With the BGP-SPF extensions, advertise their BGP-LS information independently. With the BGP-SPF
controllers can learn the topology using the same BGP advertisements extensions, controllers can learn the topology using the same BGP
used to compute the underlay routes. Furthermore, these data center advertisements used to compute the underlay routes. Furthermore,
controllers can take advantage of the convergence benefits of the BGP-SPF these data center controllers can take advantage of the convergence benefits of
extensions. The placement of controllers can be outside of the the BGP-SPF extensions. The placement of controllers can be outside
forwarding path or within the forwarding path. of the forwarding path or within the forwarding path.
Alternatively, as each and every router in the BGP-SPF domain will Alternatively, as each and every router in the BGP-SPF domain will
have a complete view of the topology, the operator can also choose to have a complete view of the topology, the operator can also choose to
configure BGP sessions in the hop-by-hop peering model described in configure BGP sessions in the hop-by-hop peering model described in
[RFC7938] along with BFD [RFC5880]. In doing so, while the hop-by- [RFC7938] along with BFD [RFC5880]. In doing so, while the hop-by-
hop peering model lacks the inherent benefits of the controller-based hop peering model lacks the inherent benefits of the controller-based
model, BGP updates need not be serialized by the BGP best-path model, BGP updates need not be serialized by the BGP best-path
algorithm in either of these models. This helps overall network algorithm in either of these models. This helps overall network
convergence. convergence.
5.1. Usage of BGP-LS SPF SAFI 5.1. Usage of BGP-LS-SPF SAFI
Section 5.1 of [I-D.ietf-lsvr-bgp-spf] defines a new BGP-LS-SPF SAFI Section 5.1 of [RFC9815] defines a new BGP-LS-SPF SAFI for
for announcement of the BGP-SPF link-state. The NLRI format and its announcement of the BGP-SPF link-state. The NLRI format and its
associated attributes follow the format of BGP-LS for node, link, and associated attributes follow the format of BGP-LS for node, link, and
prefix announcements. Whether the peering model within a Clos prefix announcements. Whether the peering model within a Clos
follows hop-by-hop peering described in [RFC7938] or any controller- follows hop-by-hop peering described in [RFC7938] or any controller-
based or route-reflector peering, an operator can exchange BGP-LS-SPF based or route-reflector peering, an operator can exchange BGP-LS-SPF
SAFI routes over the BGP peering by simply configuring BGP-LS-SPF SAFI routes over the BGP peering by simply configuring BGP-LS-SPF
SAFI between the necessary BGP speakers. SAFI between the necessary BGP speakers.
The BGP-LS-SPF SAFI can also co-exist with BGP IP Unicast SAFI The BGP-LS-SPF SAFI can also coexist with BGP IP Unicast SAFI
[RFC4760] which could exchange overlapping IP routes. One use case [RFC4760], which could exchange overlapping IP routes. One use case
for this is where BGP-LS-SPF routes are used for the underlay and BGP for this is where BGP-LS-SPF routes are used for the underlay and BGP
IP Unicast routes for VPNs are advertised in the overlay as described IP Unicast routes for VPNs are advertised in the overlay as described
in [RFC4364]. The routes received by these SAFIs are evaluated, in [RFC4364]. The routes received by these SAFIs are evaluated,
stored, and announced independently according to the rules of stored, and announced independently according to the rules of
[RFC4760]. The tie-breaking of route installation is a matter of the [RFC4760]. The tiebreaking of route installation is a matter of the
local policies and preferences of the network operator. local policies and preferences of the network operator.
Finally, as the BGP-SPF peering is done following the procedures Finally, as the BGP-SPF peering is done following the procedures
described in [RFC4271], all the existing transport security described in [RFC4271], all the existing transport security
mechanisms including [RFC5925] are available for the BGP-LS-SPF SAFI. mechanisms including those in [RFC5925] are available for the BGP-LS-
SPF SAFI.
5.1.1. Relationship to Other BGP AFI/SAFI Tuples 5.1.1. Relationship to Other BGP AFI/SAFI Tuples
Normally, the BGP-LS-SPF AFI/SAFI is used solely to compute the Normally, the BGP-LS-SPF AFI/SAFI is used solely to compute the
underlay and is given precedence over other AFI/SAFIs in route underlay and is given precedence over other AFI/SAFIs in route
processing. Other BGP SAFIs, e.g., IPv4/IPv6 Unicast VPN would use processing. Other BGP SAFIs, e.g., IPv4/IPv6 unicast VPN, would use
the BGP-SPF computed routes for next hop resolution. the BGP-SPF computed routes for next-hop resolution.
5.2. Peering Models 5.2. Peering Models
As previously stated, BGP-SPF can be deployed using the existing As previously stated, BGP-SPF can be deployed using the existing
peering model where there is a single-hop BGP session on each and peering model where there is a single-hop BGP session on each and
every link in the data center fabric [RFC7938]. This provides for every link in the data center fabric [RFC7938]. This provides for
both the advertisement of routes and the determination of link and both the advertisement of routes and the determination of link and
neighboring router availability. With BGP-SPF, the underlay will neighboring router availability. With BGP-SPF, the underlay will
converge faster due to changes to the decision process that will converge faster due to changes to the decision process that will
allow NLRI changes to be advertised faster after detecting a change. allow NLRI changes to be advertised faster after detecting a change.
5.2.1. Sparse Peering Model 5.2.1. Sparse Peering Model
Alternatively, BFD [RFC5880] can be used to swiftly determine the Alternatively, BFD [RFC5880] can be used to swiftly determine the
availability of links and the BGP peering model can be significantly availability of links, and the BGP peering model can be significantly
sparser than the data center fabric. BGP-SPF sessions only need to sparser than the data center fabric. BGP-SPF sessions only need to
be established with enough peers to provide a bi-connected graph. If be established with enough peers to provide a biconnected graph. If
Internal BGP (IBGP) is used, then the BGP routers at tier N-1 will Internal BGP (IBGP) is used, then the BGP routers at tier N-1 will
act as route-reflectors for the routers at tier N. act as route-reflectors for the routers at tier N.
The obvious usage of sparse peering is to avoid parallel BGP sessions The obvious usage of sparse peering is to avoid parallel BGP sessions
on links between the same two routers in the data center fabric. on links between the same two routers in the data center fabric.
However, this use case is not very useful since parallel L3 links However, this use case is not very useful since parallel L3 links
between the same two BGP routers are rare in Clos or Fat-Tree between the same two BGP routers are rare in Clos or Fat Tree
topologies. Additionally, when there are multiple links, they are topologies. Additionally, when there are multiple links, they are
often aggregated at the link layer using Link Aggregation Groups often aggregated using Link Aggregation Groups (LAGs) at the link
(LAGs) [IEEE.802.1AX] rather than at the IP layer. Two more layer [IEEE.802.1AX] rather than at the IP layer. Two more
interesting scenarios are described below. interesting scenarios are described below.
In current data center topologies, there is often a very dense mesh In current data center topologies, there is often a very dense mesh
of links between levels, e.g., leaf and spine, providing 32-way, of links between levels, e.g., leaf and spine, providing 32-way
64-way, or more Equal-Cost Multi-Path (ECMP) paths. In these paths, 64-way paths, or more ECMPs. In these topologies, it is
topologies, it is desirable not to have a BGP session on every link desirable not to have a BGP session on every link, and techniques
and techniques such as the one described in Section 5.2.2 can be used such as the one described in Section 5.2.2 can be used to establish
to establish sessions on some subset of northbound links. For sessions on some subset of northbound links. For example, in a
example, in a Spine-Leaf topology, each leaf router would only peer Spine/Leaf topology, each leaf router would only peer with a subset
with a subset of the spines dependent on the flooding redundancy of the spines dependent on the flooding redundancy required to be
required to be reasonably certain that every node within the BGP-SPF reasonably certain that every node within the BGP-SPF routing domain
routing domain has the complete topology. has the complete topology.
Alternatively, controller-based data center topologies are envisioned Alternatively, controller-based data center topologies are envisioned
where BGP speakers within the data center only establish BGP sessions where BGP speakers within the data center only establish BGP sessions
with two or more controllers. In these topologies, fabric nodes with two or more controllers. In these topologies, fabric nodes
below the first tier, as shown in Figure 1 of [RFC7938], will below the first tier, as shown in Figure 1 of [RFC7938], will
establish BGP multi-hop sessions with the controllers. For the establish BGP multi-hop sessions with the controllers. For the
multi-hop sessions, determining the route to the controllers without multi-hop sessions, determining the route to the controllers without
depending on BGP would need to be through some other means beyond the depending on BGP would need to be through some other means beyond the
scope of this document. However, the BGP discovery mechanisms scope of this document. However, the BGP discovery mechanisms
described in Section 5.5 would be one possibility. described in Section 5.5 would be one possibility.
5.2.2. Bi-Connected Graph Heuristic 5.2.2. Biconnected Graph Heuristic
With this heuristic, discovery of BGP SPF peers is assumed, e.g., as With a biconnected graph heuristic, discovery of BGP SPF peers is
described in Section 5.5. In this context, "bi-connected" refers to assumed, e.g., as described in Section 5.5. In this context,
the fact that there must be an adverised link NLRI for both BGP SPF "biconnected" refers to the fact that there must be an advertised
peers associated with the link before the link can be used in the BGP Link NLRI for both BGP SPF peers associated with the link before
SPF route calcuation. Additionally, it assumed that the direction of the link can be used in the BGP SPF route calculation. Additionally,
the peering can be ascertained. In the context of a data center it is assumed that the direction of the peering can be ascertained.
fabric, the direction is either northbound (toward the spine), In the context of a data center fabric, the direction is either
southbound (toward the Top-Of-Rack (ToR) routers) or east-west (same northbound (toward the spine), southbound (toward the Top-of-Rack
level in the hierarchy). The determination of the direction is (ToR) routers), or east-west (same level in the hierarchy). The
beyond the scope of this document. However, it would be reasonable determination of the direction is beyond the scope of this document.
to assume a technique where the ToR routers can be identified and the However, it would be reasonable to assume a technique where the ToR
number of hops to the ToR is used to determine the direction. routers can be identified and the number of hops to the ToR is used
to determine the direction.
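The two-way requirement described above can be sketched as follows. This is a minimal illustration, not the procedure specified in [RFC9815]: the function name usable_links and the representation of each advertised Link NLRI as a (local node, remote node) pair are assumptions made only for this example.

```python
def usable_links(link_nlris):
    """Return only the links that both endpoints have advertised.

    link_nlris: iterable of (local_node, remote_node) pairs, one per
    advertised Link NLRI. This pair representation is a simplification
    assumed for illustration; real Link NLRIs carry more descriptors.
    """
    advertised = set(link_nlris)
    return {
        (local, remote)
        for (local, remote) in advertised
        if (remote, local) in advertised
    }


# Nodes 5 and 1 have each advertised their side of the link between them.
# Node 6 has advertised a link to Node 2, but Node 2 has not advertised
# the reverse direction, so that link is excluded from the SPF input.
nlris = [(5, 1), (1, 5), (6, 2)]
links = usable_links(nlris)
```

Only the links returned by such a check would be considered in the route calculation; a link advertised by just one endpoint is ignored until the other endpoint advertises it as well.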
In this heuristic, BGP speakers allow passive session establishment In this heuristic, BGP speakers allow passive session establishment
for southbound BGP sessions. For northbound sessions, BGP speakers for southbound BGP sessions. For northbound sessions, BGP speakers
will attempt to maintain two northbound BGP sessions with different will attempt to maintain two northbound BGP sessions with different
routers. For east-west sessions, passive BGP session establishment routers. For east-west sessions, passive BGP session establishment
is allowed. However, a BGP speaker will never actively establish an is allowed. However, a BGP speaker will never actively establish an
east-west BGP session unless it cannot establish two northbound BGP east-west BGP session unless it cannot establish two northbound BGP
sessions. sessions.
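The session establishment rules of this heuristic might be sketched as shown below. The Peer type, the direction labels, and the function names are hypothetical. The choice to open only as many east-west sessions as are needed to make up a northbound shortfall is also an assumption; the text above only states when an east-west session may be actively established, not how many.

```python
from dataclasses import dataclass

NORTHBOUND_TARGET = 2  # the heuristic tries to keep two northbound sessions


@dataclass(frozen=True)
class Peer:
    router_id: str
    direction: str  # "north", "south", or "east-west" (labels assumed)


def distinct_routers(peers):
    """Keep the first discovered peer entry for each router, in order."""
    seen = set()
    result = []
    for peer in peers:
        if peer.router_id not in seen:
            seen.add(peer.router_id)
            result.append(peer)
    return result


def sessions_to_initiate(discovered_peers):
    """Select the peers toward which this speaker actively opens sessions."""
    north = distinct_routers(
        p for p in discovered_peers if p.direction == "north"
    )
    active = north[:NORTHBOUND_TARGET]
    if len(active) < NORTHBOUND_TARGET:
        # East-west sessions are opened only when two northbound
        # sessions with different routers cannot be established.
        east_west = distinct_routers(
            p for p in discovered_peers if p.direction == "east-west"
        )
        active += east_west[: NORTHBOUND_TARGET - len(active)]
    return active


def accepts_passive(peer):
    """Southbound and east-west sessions may be established passively."""
    return peer.direction in ("south", "east-west")
```

In this sketch, a speaker that discovers two links to the same northbound router counts that router once, reflecting the requirement that the two northbound sessions be with different routers.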
BGP SPF sparse peering deployments not using this hueristic are BGP SPF sparse peering deployments not using this heuristic are
possible but are not described herein and are considered out of possible but are not described herein and are considered out of
scope. scope.
5.3. BGP Spine/Leaf Topology Policy 5.3. BGP Spine/Leaf Topology Policy
One of the advantages of using BGP-SPF as the underlay protocol is One of the advantages of using BGP-SPF as the underlay protocol is
that BGP policy can be applied at any level. For example, depending that BGP policy can be applied at any level. For example, depending
on the topology, it may be possible to aggregate or filter prefix on the topology, it may be possible to aggregate or filter prefix
advertisements using existing BGP policy. In Spine/Leaf topologies, advertisements using the existing BGP policy. In Spine/Leaf
it is not necessary to advertise BGP-LS Prefix NLRI received by leaf topologies, it is not necessary to advertise a BGP-LS Prefix NLRI
nodes from the spine back to other spine nodes. If a common AS is received by leaf nodes from the spine back to other spine nodes. If
used for the spine nodes, this can easily be accomplished with EBGP a common Autonomous System (AS) is used for the spine nodes, this can
and a simple policy to filter advertisements from the leaves to the easily be accomplished with EBGP and a simple policy to filter
spine if the first AS in the AS path is the spine AS. advertisements from the leaves to the spine if the first AS in the AS
path is the spine AS.
In the figure below, the leaves would not advertise any NLRI with AS In the figure below, the leaves would not advertise any NLRIs with AS
64512 as the first AS in the AS path. 64512 as the first AS in the AS path.
+--------+ +--------+ +--------+ +--------+ +--------+ +--------+
AS 64512 | | | | | | AS 64512 | | | | | |
for Spine | Spine 1+----+ Spine 2+- ......... -+ Spine N| for Spine | Spine 1+----+ Spine 2+- ......... -+ Spine N|
Nodes at | | | | | | Nodes at | | | | | |
this Level +-+-+-+-++ ++-+-+-+-+ +-+-+-+-++ this Level +-+-+-+-++ ++-+-+-+-+ +-+-+-+-++
+------+ | | | | | | | | | | | +------+ | | | | | | | | | | |
| +-----|-|-|------+ | | | | | | | | +-----|-|-|------+ | | | | | | |
| | +--|-|-|--------+-|-|-----------------+ | | | | | +--|-|-|--------+-|-|-----------------+ | | |
skipping to change at page 8, line 29 skipping to change at line 354
| | | | +------|--|--------------+ | | | | | | | | | +------|--|--------------+ | | | | |
| | | +------+ | | | | | | | | | | | +------+ | | | | | | | |
++--+--++ +-+-+--++ ++-+--+-+ ++-+--+-+ ++--+--++ +-+-+--++ ++-+--+-+ ++-+--+-+
| Leaf 1| | Leaf 2| ........ | Leaf X| | Leaf Y| | Leaf 1| | Leaf 2| ........ | Leaf X| | Leaf Y|
+-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+
Figure 2: Spine/Leaf Topology Policy Figure 2: Spine/Leaf Topology Policy
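The leaf export policy described in this section can be illustrated with a short sketch. It assumes the AS path is available as a list of AS numbers with the most recently prepended AS first, and that the check runs before the leaf prepends its own AS on export. The function name and the treatment of an empty AS path (locally originated NLRI) as permitted are assumptions of this example rather than the syntax or behavior of any particular implementation.

```python
SPINE_AS = 64512  # the common spine AS used in Figure 2


def leaf_may_advertise_to_spine(as_path, spine_as=SPINE_AS):
    """Decide whether a leaf advertises an NLRI to a spine node.

    as_path: list of AS numbers, most recently prepended AS first
    (a representation assumed for illustration).
    Returns False when the first AS in the AS path is the spine AS,
    meaning the NLRI was learned from the spine level.
    """
    if not as_path:
        # Locally originated NLRI: nothing to filter (assumption).
        return True
    return as_path[0] != spine_as
```

With such a policy, an NLRI received from Spine 1 with AS path [64512, 65001] would not be sent back to Spine 2, while an NLRI originated by the leaf itself would still be advertised northbound. AS 65001 is an arbitrary example value.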
5.4. BGP Peer Discovery Considerations 5.4. BGP Peer Discovery Considerations
The basic functionality of peer discovery is to be discover the The basic functionality of peer discovery is to discover the address
address of a single-hop peer in case where the peer address is not of a single-hop peer in the case where the peer address is not
pre-configured. This is being accomplished today by using IPv6 preconfigured. This is being accomplished today by using IPv6 Router
Router Advertisements (RA) [RFC4861] and assuming that a BGP session Advertisements (RAs) [RFC4861] and assuming that a BGP session is
is desired with any discovered peer. Beyond the basic functionality, desired with any discovered peer. Beyond the basic functionality, it
it may be useful to have the following information relating to the may be useful to have the following information relating to the BGP
BGP session: session:
* Autonomous System (AS) and BGP Identifier of a potential peer. * The AS and BGP Identifier of a potential peer.
* Security capabilities supported and for cryptographic * Supported security capabilities, and for cryptographic
authentication, the security capabilities and possibly a key-chain authentication, the security capabilities and possibly a key chain
[RFC8177] to be used. [RFC8177] for use.
* Session Policy Identifier - A group number or name used to * A Session Policy Identifier, which is a group number or name used
associate common session parameters with the peer. For example, to associate common session parameters with the peer. For
in a data center, BGP sessions with a ToR device could have example, in a data center, BGP sessions with a ToR device could
different parameters than BGP sessions between leaf and spine. have different parameters than BGP sessions between leaf and
spine.
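As an illustration only, the items listed above could be carried in a single per-peer record. The sketch below is not taken from any discovery protocol specification: the DiscoveredPeer class, its field names, and the policy table contents are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DiscoveredPeer:
    """Hypothetical record of what peer discovery might learn about a neighbor."""
    address: str                                   # single-hop peer address (e.g., IPv6 link-local)
    asn: Optional[int] = None                      # AS of the potential peer
    bgp_identifier: Optional[str] = None           # BGP Identifier of the potential peer
    security_capabilities: Tuple[str, ...] = ()    # supported security capabilities
    key_chain: Optional[str] = None                # name of a key chain [RFC8177], if any
    session_policy_id: Optional[str] = None        # group number or name for session parameters


def session_parameters(peer, policies, default):
    """Select session parameters by Session Policy Identifier, else a default."""
    return policies.get(peer.session_policy_id, default)


# Illustrative policy groups; the names and values are assumptions.
POLICIES = {"tor": {"multihop": False}, "fabric": {"multihop": True}}
DEFAULT = {"multihop": False}

peer = DiscoveredPeer(address="fe80::1", asn=65001,
                      bgp_identifier="192.0.2.1", session_policy_id="fabric")
params = session_parameters(peer, POLICIES, DEFAULT)
```

Keying the parameters on the Session Policy Identifier keeps the per-peer record small; a peer that advertises no identifier simply falls back to the default group.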
In a data center fabric, it is often useful to know whether a peer is In a data center fabric, it is often useful to know whether a peer is
southbound (towards the servers) or northbound (towards the spine or southbound (towards the servers) or northbound (towards the spine or
super-spine), e.g., Section 5.2.2. One mechanism, without specifying super-spine), e.g., see Section 5.2.2. One mechanism, without
all the details, might be for the ToR routers to be identified when specifying all the details, might be for the ToR routers to be
installed and for the others routers in the fabric to determine their identified when installed and for the other routers in the fabric to
level based on the distance from the closest ToR router. determine their level based on the distance from the closest ToR
router.
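The level determination mentioned above can be sketched as a breadth-first search that starts from the ToR routers. This is a minimal sketch under stated assumptions, not a specified mechanism: it assumes the full adjacency is known in one place, and the function names and router names are hypothetical.

```python
from collections import deque


def assign_levels(adjacency, tor_routers):
    """Return {router: level}, where level is the hop count to the closest ToR."""
    levels = {tor: 0 for tor in tor_routers}
    queue = deque(tor_routers)
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in levels:          # first visit is the shortest distance
                levels[neighbor] = levels[node] + 1
                queue.append(neighbor)
    return levels


def direction(levels, local, peer):
    """Classify a peer relative to the local router using the computed levels."""
    if levels[peer] > levels[local]:
        return "northbound"
    if levels[peer] < levels[local]:
        return "southbound"
    return "same-level"


# Hypothetical three-tier fabric: ToR routers, an aggregation tier, and a spine.
fabric = {
    "tor1": ["agg1", "agg2"],
    "tor2": ["agg1", "agg2"],
    "agg1": ["tor1", "tor2", "spine1"],
    "agg2": ["tor1", "tor2", "spine1"],
    "spine1": ["agg1", "agg2"],
}
levels = assign_levels(fabric, ["tor1", "tor2"])
```

With these levels, a peer at a higher level is northbound and a peer at a lower level is southbound. A distributed implementation would have to derive the same distances hop by hop rather than from a global view.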
If there are multiple links between BGP speakers or the links between If there are multiple links between BGP speakers or the links between
BGP speakers are unnumbered, it is also useful to be able to BGP speakers are unnumbered, it is also useful to be able to
establish multi-hop sessions using the loopback addresses. This will establish multi-hop sessions using the loopback addresses. This will
often require the discovery protocol to install route(s) toward the often require the discovery protocol to install one or more routes
potential peer loopback addresses prior to BGP session establishment. toward the potential peer loopback addresses prior to BGP session
establishment.
Finally, a simple BGP discovery protocol may be used to establish a Finally, a simple BGP discovery protocol may be used to establish a
multi-hop session with one or more controllers by advertising multi-hop session with one or more controllers by advertising
connectivity to one or more controllers. connectivity to one or more controllers.
5.5. BGP Peer Discovery 5.5. BGP Peer Discovery
5.5.1. BGP IPv6 Simplified Peering 5.5.1. BGP IPv6 Simplified Peering
To conserve IPv4 address space and simplify operations, BGP-SPF To conserve IPv4 address space and simplify operations, BGP-SPF
routers in Clos/Fat Tree deployments can use IPv6 addresses as peer routers in Clos / Fat Tree deployments can use IPv6 addresses as the
address. For IPv4 address families, IPv6 peering as specified in peer address. For IPv4 address families, IPv6 peering as specified
[RFC8950] can be deployed to avoid configuring IPv4 addresses on in [RFC8950] can be deployed to avoid configuring IPv4 addresses on
router interfaces. When this is done, dynamic discovery mechanisms, router interfaces. When this is done, dynamic discovery mechanisms,
as described in Section 5.5, can be used to learn the global or link- as described in Section 5.5, can be used to learn the global or link-
local IPv6 peer addresses and IPv4 addresses need not be configured local IPv6 peer addresses, and IPv4 addresses need not be configured
on these interfaces. If IPv6 link-local peering is used, then on these interfaces. If IPv6 link-local peering is used, then
configuration of IPv6 global addresses is also not required [RFC7404] configuration of IPv6 global addresses is also not required
. The Link Local/Remote Identifiers of the peering interfaces MUST be [RFC7404]. The Link Local/Remote Identifiers of the peering
used in the link NLRI as described in section 5.2.2 of interfaces MUST be used in the Link NLRI as described in
[I-D.ietf-lsvr-bgp-spf]. Section 5.2.2 of [RFC9815].
5.5.2. BGP-LS SPF Topology Visibility for Management 5.5.2. BGP-LS SPF Topology Visibility for Management
Irrespective of whether or not BGP-SPF is used for route calculation, Irrespective of whether or not BGP-SPF is used for route calculation,
the BGP-LS-SPF route advertisements can be used to periodically the BGP-LS-SPF route advertisements can be used to periodically
construct the Clos/Fat Tree topology. This is especially useful in construct the Clos / Fat Tree topology. This is especially useful in
deployments where an Interior Gateway Protocol (IGP) is not used and deployments where an Interior Gateway Protocol (IGP) is not used and
the base BGP-LS routes [RFC9552] are not available. The resultant the base BGP-LS routes [RFC9552] are not available. The resultant
topology visibility can then be used for troubleshooting and topology visibility can then be used for troubleshooting and
consistency checking. This would normally be done on a central consistency checking. This would normally be done on a central
controller or other management tool which could also be used for controller or other management tool that could also be used for
fabric data path verification. The precise algorithms and fabric data path verification. The precise algorithms and
heuristics, as well as the complete set of management applications is heuristics, as well as the complete set of management applications,
beyond the scope of this document. is beyond the scope of this document.
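One simple consistency check a management tool might run on the constructed topology is to look for links advertised in only one direction. The sketch below is illustrative and assumes each link advertisement has already been reduced to a (local node, remote node) pair; real Link NLRIs carry more information, and the function names are hypothetical.

```python
def build_topology(link_advertisements):
    """Build {node: set(neighbors)} from (local, remote) link advertisements."""
    adjacency = {}
    for local, remote in link_advertisements:
        adjacency.setdefault(local, set()).add(remote)
        adjacency.setdefault(remote, set())     # make sure the remote node exists
    return adjacency


def one_way_links(adjacency):
    """Return links advertised by one end but not by the other."""
    return sorted(
        (local, remote)
        for local, neighbors in adjacency.items()
        for remote in neighbors
        if local not in adjacency.get(remote, ())
    )


# Hypothetical input: spine1 has not advertised its link back to leaf2.
advertisements = [
    ("leaf1", "spine1"),
    ("spine1", "leaf1"),
    ("leaf2", "spine1"),
]
topology = build_topology(advertisements)
suspect = one_way_links(topology)
```

A link reported by only one end is a candidate for troubleshooting, for example a session that has not come up or an advertisement removed by policy.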
5.5.3. Data Center Interconnect (DCI) Applicability 5.5.3. Data Center Interconnect (DCI) Applicability
Since BGP-SPF is to be used for the routing underlay and DCI gateway Since BGP-SPF is to be used for the routing underlay and Data Center
boxes typically have direct or very simple connectivity, BGP external Interconnect (DCI) gateway boxes typically have direct or very simple
sessions would typically not include the BGP-LS-SPF SAFI. connectivity, BGP external sessions would typically not include the
BGP-LS-SPF SAFI.
6. Non-CLOS/FAT Tree Topology Applicability 6. Non-Clos / Fat Tree Topology Applicability
The BGP-SPF extensions [I-D.ietf-lsvr-bgp-spf] can be used in other The BGP-SPF extensions [RFC9815] can be used in other topologies and
topologies and avail the inherent convergence improvements. avail the inherent convergence improvements. Additionally, sparse
Additionally, sparse peering techniques may be utilized (see Section 5.2). peering techniques may be utilized (see Section 5.2). However, determining
However, determining whether to establish a BGP session is more whether to establish a BGP session is more complex, and the heuristic
complex and the heuristic described in Section 5.2.2 cannot be used. described in Section 5.2.2 cannot be used. In such topologies, other
In such topologies, other techniques such as those described in techniques such as those described in [RFC9667] may be employed. One
[RFC9667] may be employed. One potential deployment would be the potential deployment would be the underlay for a Service Provider
underlay for a Service Provider (SP) backbone where usage of a single (SP) backbone where usage of a single protocol, i.e., BGP, is
protocol, i.e., BGP, is desired. desired.
7. Non-Transit Node Capability 7. Non-Transit Node Capability
In certain scenarios, a BGP node wishes to participate in the BGP-SPF In certain scenarios, a BGP node wishes to participate in the BGP-SPF
topology but never be used for transit traffic. These include topology but never be used for transit traffic. These include
situations where a server wants to make application services situations where a server wants to make application services
available to clients homed at subnets throughout the BGP-SPF domain available to clients homed at subnets throughout the BGP-SPF domain
but does not ever want to be used as a router (i.e., carry transit but does not ever want to be used as a router (i.e., carry transit
traffic). Another specific instance is where a controller is traffic). Another specific instance is where a controller is
resident on a server and direct connectivity to the controller is resident on a server and direct connectivity to the controller is
required throughout the entire domain. This can readily be required throughout the entire domain. This can readily be
accomplished using the BGP-LS Node NLRI Attribute SPF Status TLV as accomplished using the BGP-LS-SPF Node NLRI Attribute SPF Status TLV
described in [I-D.ietf-lsvr-bgp-spf]. as described in [RFC9815].
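The effect of a non-transit node on path computation can be illustrated with a generic Dijkstra computation that accepts such a node as a destination but never expands it. This is a sketch only and is not the SPF algorithm specified for BGP-SPF: the no_transit set stands in for nodes that have signaled the non-transit status, and all names and metrics are hypothetical.

```python
import heapq


def spf(graph, source, no_transit=frozenset()):
    """Return {node: cost} from source, never routing through no_transit nodes."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist[node]:
            continue                            # stale heap entry
        if node != source and node in no_transit:
            continue                            # reachable, but not used for transit
        for neighbor, metric in graph.get(node, {}).items():
            new_cost = cost + metric
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return dist


# Hypothetical topology: server1 is dual-homed to leaf1 and leaf2.
graph = {
    "leaf1": {"server1": 1, "spine1": 10},
    "leaf2": {"server1": 1, "spine1": 10},
    "server1": {"leaf1": 1, "leaf2": 1},
    "spine1": {"leaf1": 10, "leaf2": 10},
}
with_transit = spf(graph, "leaf1")
without_transit = spf(graph, "leaf1", no_transit={"server1"})
```

In this example, the cheapest path from leaf1 to leaf2 runs through server1 unless server1 is marked non-transit, in which case the path through spine1 is used while server1 itself remains reachable.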
8. BGP Policy Applicability 8. BGP Policy Applicability
Existing BGP policy such as prefix filtering may be used in Existing BGP policy such as prefix filtering may be used in
conjunction with the BGP-LS-SPF SAFI. When BGP policy is used with conjunction with the BGP-LS-SPF SAFI. When BGP policy is used with
the BGP-LS-SPF SAFI, BGP speakers in the BGP-LS-SPF routing domain the BGP-LS-SPF SAFI, BGP speakers in the BGP-LS-SPF routing domain
will not all have the same set of NLRI and will compute a different will not all have the same set of NLRIs and will compute a different
BGP local routing table. Consequently, care must be taken to assure BGP local routing table. Consequently, care must be taken to assure
routing is consistent and blackholes or routing loops do not ensue. routing is consistent and blackholes or routing loops do not ensue.
However, this is no different than if traditional BGP routing using However, this is no different than if traditional BGP routing using
the IPv4 and IPv6 address families were used. the IPv4 and IPv6 address families were used.
9. IANA Considerations 9. IANA Considerations
No IANA updates are requested by this document. This document has no IANA actions.
10. Security Considerations 10. Security Considerations
This document introduces no new security considerations above and This document introduces no new security considerations above and
beyond those already specified in the [RFC4271] and beyond those already specified in [RFC4271] and [RFC9815].
[I-D.ietf-lsvr-bgp-spf].
11. Acknowledgements
The authors would like to thank Alvaro Retana, Yan Filyurin, Boris
Hassanov, Stig Venaas, Ron Bonica, Mallory Knodel, Dhruv Dhody, Erik
Kline, Eric Vyncke, and John Scudder for their review and comments.
12. References 11. References
12.1. Normative References 11.1. Normative References
[I-D.ietf-lsvr-bgp-spf] [RFC9815] Patel, K., Lindem, A., Zandi, S., and W. Henderickx, "BGP
Patel, K., Lindem, A., Zandi, S., and W. Henderickx, "BGP Link-State Shortest Path First (SPF) Routing", RFC 9815,
Link-State Shortest Path First (SPF) Routing", Work in DOI 10.17487/RFC9815, July 2025,
Progress, Internet-Draft, draft-ietf-lsvr-bgp-spf-51, 14 <https://www.rfc-editor.org/info/rfc9815>.
January 2025,
<https://datatracker.ietf.org/api/v1/doc/document/draft-
ietf-lsvr-bgp-spf/>.
12.2. Informative References 11.2. Informative References
[Clos] "A Study of Non-Blocking Switching Networks", The Bell [Clos] Clos, C., "A Study of Non-Blocking Switching Networks",
System Technical Journal, Vol. 32(2), DOI The Bell System Technical Journal, vol. 32, no. 2, pp.
10.1002/j.1538-7305.1953.tb01433.x, March 1953. 406-424, DOI 10.1002/j.1538-7305.1953.tb01433.x, March
1953,
<https://doi.org/10.1002/j.1538-7305.1953.tb01433.x>.
[IEEE.802.1AX] [IEEE.802.1AX]
IEEE, "IEEE Standard for Local and Metropolitan Area IEEE, "IEEE Standard for Local and Metropolitan Area
Networks: Link Aggregation", IEEE Std 802.1AX-2020, 2020, Networks--Link Aggregation", IEEE Std 802.1AX-2020,
<https://standards.ieee.org/standard/802_1AX-2020.html>. DOI 10.1109/IEEESTD.2020.9105034, May 2020,
<https://doi.org/10.1109/IEEESTD.2020.9105034>.
[RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328, [RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328,
DOI 10.17487/RFC2328, April 1998, DOI 10.17487/RFC2328, April 1998,
<https://www.rfc-editor.org/info/rfc2328>. <https://www.rfc-editor.org/info/rfc2328>.
[RFC4271] Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A [RFC4271] Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A
Border Gateway Protocol 4 (BGP-4)", RFC 4271, Border Gateway Protocol 4 (BGP-4)", RFC 4271,
DOI 10.17487/RFC4271, January 2006, DOI 10.17487/RFC4271, January 2006,
<https://www.rfc-editor.org/info/rfc4271>. <https://www.rfc-editor.org/info/rfc4271>.
skipping to change at page 12, line 30 skipping to change at line 537
[RFC5340] Coltun, R., Ferguson, D., Moy, J., and A. Lindem, "OSPF [RFC5340] Coltun, R., Ferguson, D., Moy, J., and A. Lindem, "OSPF
for IPv6", RFC 5340, DOI 10.17487/RFC5340, July 2008, for IPv6", RFC 5340, DOI 10.17487/RFC5340, July 2008,
<https://www.rfc-editor.org/info/rfc5340>. <https://www.rfc-editor.org/info/rfc5340>.
[RFC5580] Tschofenig, H., Ed., Adrangi, F., Jones, M., Lior, A., and [RFC5580] Tschofenig, H., Ed., Adrangi, F., Jones, M., Lior, A., and
B. Aboba, "Carrying Location Objects in RADIUS and B. Aboba, "Carrying Location Objects in RADIUS and
Diameter", RFC 5580, DOI 10.17487/RFC5580, August 2009, Diameter", RFC 5580, DOI 10.17487/RFC5580, August 2009,
<https://www.rfc-editor.org/info/rfc5580>. <https://www.rfc-editor.org/info/rfc5580>.
[RFC5880] Katz, D. and D. Ward, "Bidirectional Forwarding Detection
(BFD)", RFC 5880, DOI 10.17487/RFC5880, June 2010,
<https://www.rfc-editor.org/info/rfc5880>.
[RFC5925] Touch, J., Mankin, A., and R. Bonica, "The TCP [RFC5925] Touch, J., Mankin, A., and R. Bonica, "The TCP
Authentication Option", RFC 5925, DOI 10.17487/RFC5925, Authentication Option", RFC 5925, DOI 10.17487/RFC5925,
June 2010, <https://www.rfc-editor.org/info/rfc5925>. June 2010, <https://www.rfc-editor.org/info/rfc5925>.
[RFC7404] Behringer, M. and E. Vyncke, "Using Only Link-Local [RFC7404] Behringer, M. and E. Vyncke, "Using Only Link-Local
Addressing inside an IPv6 Network", RFC 7404, Addressing inside an IPv6 Network", RFC 7404,
DOI 10.17487/RFC7404, November 2014, DOI 10.17487/RFC7404, November 2014,
<https://www.rfc-editor.org/info/rfc7404>. <https://www.rfc-editor.org/info/rfc7404>.
[RFC7938] Lapukhov, P., Premji, A., and J. Mitchell, Ed., "Use of [RFC7938] Lapukhov, P., Premji, A., and J. Mitchell, Ed., "Use of
skipping to change at page 13, line 21 skipping to change at line 576
[RFC9552] Talaulikar, K., Ed., "Distribution of Link-State and [RFC9552] Talaulikar, K., Ed., "Distribution of Link-State and
Traffic Engineering Information Using BGP", RFC 9552, Traffic Engineering Information Using BGP", RFC 9552,
DOI 10.17487/RFC9552, December 2023, DOI 10.17487/RFC9552, December 2023,
<https://www.rfc-editor.org/info/rfc9552>. <https://www.rfc-editor.org/info/rfc9552>.
[RFC9667] Li, T., Ed., Psenak, P., Ed., Chen, H., Jalil, L., and S. [RFC9667] Li, T., Ed., Psenak, P., Ed., Chen, H., Jalil, L., and S.
Dontula, "Dynamic Flooding on Dense Graphs", RFC 9667, Dontula, "Dynamic Flooding on Dense Graphs", RFC 9667,
DOI 10.17487/RFC9667, October 2024, DOI 10.17487/RFC9667, October 2024,
<https://www.rfc-editor.org/info/rfc9667>. <https://www.rfc-editor.org/info/rfc9667>.
Acknowledgements
The authors would like to thank Alvaro Retana, Yan Filyurin, Boris
Hassanov, Stig Venaas, Ron Bonica, Mallory Knodel, Dhruv Dhody, Erik
Kline, Éric Vyncke, and John Scudder for their reviews and comments.
Authors' Addresses Authors' Addresses
Keyur Patel Keyur Patel
Arrcus, Inc. Arrcus, Inc.
2077 Gateway Pl 2077 Gateway Pl
San Jose, CA, 95110 San Jose, CA 95110
United States of America United States of America
Email: keyur@arrcus.com Email: keyur@arrcus.com
Acee Lindem Acee Lindem
LabN Consulting, L.L.C. LabN Consulting, L.L.C.
301 Midenhall Way 301 Midenhall Way
Cary, NC, 95110 Cary, NC 95110
United States of America United States of America
Email: acee.ietf@gmail.com Email: acee.ietf@gmail.com
Shawn Zandi Shawn Zandi
Linkedin LinkedIn
222 2nd Street 222 2nd Street
San Francisco, CA 94105 San Francisco, CA 94105
United States of America United States of America
Email: szandi@linkedin.com Email: szandi@linkedin.com
Gaurav Dawra Gaurav Dawra
Linkedin Linkedin
222 2nd Street 222 2nd Street
San Francisco, CA 94105 San Francisco, CA 94105
United States of America United States of America
 End of changes. 69 change blocks. 
226 lines changed or deleted 237 lines changed or added

This html diff was produced by rfcdiff 1.48.