rfc9722v1.txt   rfc9722.txt 
skipping to change at line 136 skipping to change at line 136
of a site. of a site.
NDF: Non-Designated Forwarder. A PE that is currently blocking NDF: Non-Designated Forwarder. A PE that is currently blocking
traffic (see DF above). traffic (see DF above).
EVI: EVPN Instance. It spans the PE devices participating in that EVI: EVPN Instance. It spans the PE devices participating in that
EVPN. EVPN.
HRW: Highest Random Weight algorithm [HRW98] HRW: Highest Random Weight algorithm [HRW98]
Service carving: DF Election is also referred to as "service Service carving: This refers to DF election, as defined in
carving" in [RFC7432] [RFC7432].
SCT: Service Carving Time. Defined in this document as the time at SCT: Service Carving Time. Defined in this document as the time at
which all nodes participating in an Ethernet Segment perform DF which all nodes participating in an Ethernet Segment perform DF
Election. Election.
1.3. Challenges with Existing Mechanism 1.3. Challenges with Existing Mechanism
In EVPN technology, multiple PE devices encapsulate and decapsulate In EVPN technology, multiple PE devices encapsulate and decapsulate
data belonging to the same VLAN. Under certain conditions, this may data belonging to the same VLAN. Under certain conditions, this may
cause duplicated Ethernet packets and potential loops if there is a cause duplicated Ethernet packets and potential loops if there is a
skipping to change at line 206 skipping to change at line 206
is no handshake mechanism between PE1 and PE2, overlapping of DF is no handshake mechanism between PE1 and PE2, overlapping of DF
roles for a given VLAN is possible, which leads to duplication of roles for a given VLAN is possible, which leads to duplication of
traffic as well as Layer 2 loops. traffic as well as Layer 2 loops.
Current EVPN specifications [RFC7432] and [RFC8584] rely on a timer- Current EVPN specifications [RFC7432] and [RFC8584] rely on a timer-
based approach for transferring the DF role to the newly inserted based approach for transferring the DF role to the newly inserted
device. This can cause the following issues: device. This can cause the following issues:
* Loops and duplicates, if the timer value is too short * Loops and duplicates, if the timer value is too short
* Prolonged traffic blackholing, if the timer value is too long * Prolonged traffic loss, if the timer value is too long
1.4. Design Principles for a Solution 1.4. Design Principles for a Solution
The clock-synchronization solution for fast DF recovery presented in The clock-synchronization solution for fast DF recovery presented in
this document follows several design principles and offers multiple this document follows several design principles and offers multiple
advantages, namely: advantages, namely:
* Complex handshake signaling mechanisms and state machines are * Complex handshake signaling mechanisms and state machines are
avoided in favor of a simple unidirectional signaling approach. avoided in favor of a simple unidirectional signaling approach.
skipping to change at line 232 skipping to change at line 232
* The fast DF recovery solution is independent of any BGP delays in * The fast DF recovery solution is independent of any BGP delays in
propagation of Ethernet Segment routes (Route Type 4) propagation of Ethernet Segment routes (Route Type 4)
* The fast DF recovery solution is agnostic of the actual time * The fast DF recovery solution is agnostic of the actual time
synchronization mechanism used; however, an NTP-based synchronization mechanism used; however, an NTP-based
representation of time is used for EVPN signaling. representation of time is used for EVPN signaling.
The solution in this document relies on nodes in the topology, more The solution in this document relies on nodes in the topology, more
specifically the peering nodes of each Ethernet-Segment, to be clock- specifically the peering nodes of each Ethernet-Segment, to be clock-
synchronized and to advertise Time Synchronization capability. When synchronized and to advertise the Time Synchronization capability.
this is not the case, or when clocks are badly desynchronized, When this is not the case, or when clocks are badly desynchronized,
network convergence and DF Election is no worse than that described network convergence and DF Election is no worse than that described
in [RFC7432] due to the timestamp range checking (Section 2.2). in [RFC7432] due to the timestamp range checking (Section 2.2).
2. DF Election Synchronization Solution 2. DF Election Synchronization Solution
The fast DF recovery solution relies on the concept of common clock The fast DF recovery solution relies on the concept of common clock
alignment between partner PEs participating in a common Ethernet alignment between partner PEs participating in a common Ethernet
Segment, i.e., PE1 and PE2 in Figure 1. The main idea is to have all Segment, i.e., PE1 and PE2 in Figure 1. The main idea is to have all
peering PEs of that Ethernet Segment perform DF election and apply peering PEs of that Ethernet Segment perform DF election and apply
the result at the same previously announced time. the result at the same previously announced time.
skipping to change at line 300 skipping to change at line 300
2.1. BGP Encoding 2.1. BGP Encoding
A BGP extended community, with Type 0x06 and Sub-Type 0x0F, is A BGP extended community, with Type 0x06 and Sub-Type 0x0F, is
defined to communicate the SCT for each Ethernet Segment: defined to communicate the SCT for each Ethernet Segment:
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type = 0x06 | Sub-Type(0x0F)| Timestamp Seconds ~ | Type = 0x06 | Sub-Type(0x0F)| Timestamp Seconds ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~ Timestamp Seconds | Timestamp Fractional Seconds | ~ Timestamp Seconds | Timestamp Fraction |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 2: Service Carving Time Figure 2: Service Carving Time
The timestamp exchanged uses the NTP prime epoch of January 1, 1900 The timestamp exchanged uses the NTP prime epoch of 0 h 1 January
[RFC5905] and an adapted form of the 64-bit NTP Timestamp Format. 1900 UTC [RFC5905] and an adapted form of the 64-bit NTP timestamp
format.
The 64-bit NTP Timestamp Format consists of a 32-bit part for Seconds The 64-bit NTP timestamp format consists of a 32-bit unsigned seconds
and a 32-bit part for Fractional Seconds, which are encoded in the field and a 32-bit fraction field, which are encoded in the Service
Service Carving Time as follows: Carving Time as follows:
Timestamp Seconds: 32-bit NTP seconds are encoded in this field. Timestamp Seconds: 32-bit NTP seconds are encoded in this field.
Timestamp Fractional Seconds: The high-order 16 bits of the NTP Timestamp Fraction: The high-order 16 bits of the NTP "Fraction"
"Fraction" field are encoded in this field. field are encoded in this field.
When rebuilding a 64-bit NTP Timestamp Format using the values from a When rebuilding a 64-bit NTP timestamp format using the values from a
received SCT BGP extended community, the lower-order 16 bits of the received SCT BGP extended community, the lower-order 16 bits of the
Fractional field are set to 0. The use of a 16-bit fractional NTP "Fraction" field are set to 0. The use of a 16-bit fractional
seconds value yields adequate precision of 15 microseconds (2^-16 s). seconds value yields adequate precision of 15 microseconds (2^-16 s).
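The encoding described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names encode_sct and decode_sct and the constant names are assumptions introduced here, and only the field layout (Type 0x06, Sub-Type 0x0F, 32-bit seconds, high-order 16 bits of the fraction) comes from the text.

```python
import struct
from typing import Tuple

SCT_TYPE = 0x06
SCT_SUB_TYPE = 0x0F
# Resolution of the 16-bit fraction carried in the community (about 15 microseconds).
FRACTION_STEP_SECONDS = 2 ** -16


def encode_sct(ntp_seconds: int, ntp_fraction: int) -> bytes:
    """Pack a 64-bit NTP timestamp into the 8-octet SCT extended community.

    Only the high-order 16 bits of the 32-bit NTP fraction are kept.
    """
    if not 0 <= ntp_seconds <= 0xFFFFFFFF:
        raise ValueError("NTP seconds must fit in 32 bits")
    if not 0 <= ntp_fraction <= 0xFFFFFFFF:
        raise ValueError("NTP fraction must fit in 32 bits")
    return struct.pack("!BBIH", SCT_TYPE, SCT_SUB_TYPE,
                       ntp_seconds, ntp_fraction >> 16)


def decode_sct(community: bytes) -> Tuple[int, int]:
    """Rebuild (seconds, 32-bit fraction) from a received SCT community.

    The low-order 16 bits of the fraction are set to 0, as the text requires.
    """
    if len(community) != 8:
        raise ValueError("an extended community is 8 octets long")
    ctype, sub_type, seconds, fraction_high = struct.unpack("!BBIH", community)
    if (ctype, sub_type) != (SCT_TYPE, SCT_SUB_TYPE):
        raise ValueError("not a Service Carving Time extended community")
    return seconds, fraction_high << 16
```

A round trip through encode_sct and decode_sct therefore loses at most 2^-16 s of the original fraction, which is the precision figure quoted above.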
This document introduces a new flag called Time Synchronization The format of the DF Election Extended Community that is used in this
indicated by "T" in the "DF Election Capabilities" registry defined document is:
in [RFC8584] for use in DF Election Extended Community.
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type = 0x06 | Sub-Type(0x06)| RSV | DF Alg | Bitmap ~ | Type = 0x06 | Sub-Type(0x06)| RSV | DF Alg | Bitmap ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~ Bitmap | Reserved | ~ Bitmap | Reserved |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 3: DF Election Extended Community (Figure 4 in RFC 8584) Figure 3: DF Election Extended Community (RFC 8584)
The Bitmap field (2 octets) encodes "capabilities" [RFC8584], where
this document introduces a new Time Synchronization capability
indicated by "T".
1 1 1 1
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |A| |T| | | |A| |T| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 4: DF Election Capabilities Figure 4: Bitmap Field in the DF Election Extended Community
Bit 3: Time Synchronization (corresponds to Bit 27 of the DF Bit 3: Time Synchronization (corresponds to Bit 27 of the DF
Election Extended Community). When set to 1, it indicates the Election Extended Community). When set to 1, it indicates the
desire to use Time Synchronization capability with the rest of the desire to use the Time Synchronization capability with the rest of
PEs in the Ethernet Segment. the PEs in the Ethernet Segment.
This capability is utilized in conjunction with the agreed-upon DF This capability is utilized in conjunction with the agreed-upon DF
Election Type. For instance, if all the PE devices in the Ethernet Election Type. For instance, if all the PE devices in the Ethernet
Segment indicate the desire to use the Time Synchronization Segment indicate the desire to use the Time Synchronization
capability and request the DF Election Type to be the HRW, then the capability and request the DF Election Type to be the HRW, then the
HRW algorithm is used in conjunction with this capability. A PE that HRW algorithm is used in conjunction with this capability. A PE that
does not support the procedures set out in this document or that does not support the procedures set out in this document or that
receives a route from another PE in which the capability is not set receives a route from another PE in which the capability is not set
MUST NOT delay DF election as this could lead to duplicate traffic in MUST NOT delay DF election as this could lead to duplicate traffic in
some instances (overlapping DFs). some instances (overlapping DFs).
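As an illustration of the capability check, the sketch below extracts the Bitmap from a DF Election Extended Community and tests the "T" bit. The helper names are hypothetical. The 3-bit RSV and 5-bit DF Alg split of the third octet is taken from the RFC 8584 layout, and the rule "use the SCT only when every peer sets T" is one reading of the paragraph above, not normative text.

```python
import struct
from typing import Iterable, Tuple

# Bit 0 is the most significant bit of the 16-bit Bitmap, so Bit 3 is 0x1000.
TIME_SYNC_MASK = 1 << (15 - 3)


def parse_df_election(community: bytes) -> Tuple[int, int]:
    """Return (df_alg, bitmap) from an 8-octet DF Election Extended Community."""
    if len(community) != 8:
        raise ValueError("an extended community is 8 octets long")
    ctype, sub_type, rsv_alg, bitmap, _reserved = struct.unpack("!BBBH3s", community)
    if (ctype, sub_type) != (0x06, 0x06):
        raise ValueError("not a DF Election Extended Community")
    # Low 5 bits of the third octet carry DF Alg; the top 3 bits are RSV.
    return rsv_alg & 0x1F, bitmap


def has_time_sync(bitmap: int) -> bool:
    """True when the Time Synchronization ("T") capability bit is set."""
    return bool(bitmap & TIME_SYNC_MASK)


def all_peers_time_sync(bitmaps: Iterable[int]) -> bool:
    """True only if at least one peer is known and every peer sets the T bit."""
    bitmaps = list(bitmaps)
    return bool(bitmaps) and all(has_time_sync(b) for b in bitmaps)
```

When all_peers_time_sync returns False, a PE following this sketch would run DF election without any SCT-based delay.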
skipping to change at line 419 skipping to change at line 423
accompanying forwarding updates to the DF and NDF states are also accompanying forwarding updates to the DF and NDF states are also
deferred. deferred.
Item 9 in Section 2.1 of [RFC8584], in the list "Corresponding Item 9 in Section 2.1 of [RFC8584], in the list "Corresponding
actions when transitions are performed or states are entered/exited", actions when transitions are performed or states are entered/exited",
is changed as follows: is changed as follows:
| 9. DF_CALC on CALCULATED: Mark the election result for the VLAN | 9. DF_CALC on CALCULATED: Mark the election result for the VLAN
| or VLAN bundle. | or VLAN bundle.
| |
| 9.1 If an SCT timestamp is present during the RCVD_ES event | 9.1 If no Service Carving Time is present during the RCVD_ES
| of Action 11, wait until the time indicated by the SCT | event of Action 11, proceed to step 9.4
| minus skew before proceeding to step 9.3.
| |
| 9.2 If an SCT timestamp is present during the RCVD_ES event | 9.2 If a Service Carving Time is present during the RCVD_ES
| of Action 11, wait until the time indicated by the SCT | event of Action 11, wait until the time indicated by the
| before proceeding to step 9.4. | SCT minus skew before proceeding to step 9.3.
| |
| 9.3 Assume the role of NDF for the local PE concerning the | 9.3 Assume the role of NDF for the local PE concerning the
| VLAN or VLAN bundle and transition to the DF_DONE state. | VLAN or VLAN bundle. Wait the remaining skew time before
| proceeding to step 9.4.
| |
| 9.4 Assume the role of DF for the local PE concerning the | 9.4 Assume the election result's role (DF or NDF) for the
| VLAN or VLAN bundle and transition to the DF_DONE state. | local PE concerning the VLAN or VLAN bundle and
| transition to the DF_DONE state.
This revised approach ensures proper timing and synchronization in This revised approach ensures proper timing and synchronization in
the DF election process, avoiding conflicts and ensuring accurate the DF election process, avoiding conflicts and ensuring accurate
forwarding updates. forwarding updates.
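The revised steps 9.1 through 9.4 (the right-hand text above) can be read as a small scheduling rule. The following sketch is only an interpretation: plan_role_changes, its parameters, and the choice to clamp times that are already in the past to "now" are assumptions made for illustration, and the timestamp range checking of Section 2.2 is not modeled.

```python
from typing import List, Optional, Tuple


def plan_role_changes(elected_role: str,
                      sct: Optional[float] = None,
                      skew: float = 0.0,
                      now: float = 0.0) -> List[Tuple[float, str]]:
    """Return the (time, role) changes a PE applies for one VLAN or VLAN bundle.

    elected_role is the DF_CALC result, "DF" or "NDF". Times are seconds on
    the shared clock; sct is None when no Service Carving Time was received.
    """
    if elected_role not in ("DF", "NDF"):
        raise ValueError("elected_role must be 'DF' or 'NDF'")
    if sct is None:
        # Step 9.1: no SCT, so go straight to step 9.4.
        return [(now, elected_role)]
    # Step 9.2: wait until SCT minus skew (assumption: never earlier than now).
    ndf_at = max(now, sct - skew)
    # Step 9.3: become NDF, then wait out the remaining skew.
    final_at = max(ndf_at, sct)
    # Step 9.4: assume the elected role and move to DF_DONE.
    return [(ndf_at, "NDF"), (final_at, elected_role)]
```

For example, with sct=100.0 and skew=0.125, a PE elected DF is NDF from t=99.875 and becomes DF at t=100.0, while a PE elected NDF simply stays NDF across both steps.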
3. Synchronization Scenarios 3. Synchronization Scenarios
Consider Figure 1 as an example, where initially PE2 has failed and Consider Figure 1 as an example, where initially PE2 has failed and
PE1 has taken over. This scenario illustrates the problem with the PE1 has taken over. This scenario illustrates the problem with the
DF Election mechanism described in Section 8.5 of [RFC7432], DF Election mechanism described in Section 8.5 of [RFC7432],
skipping to change at line 502 skipping to change at line 507
the following: the following:
* DF-to-NDF Transition(s): at t=SCT minus skew, where both PEs are * DF-to-NDF Transition(s): at t=SCT minus skew, where both PEs are
NDF for the skew duration. NDF for the skew duration.
* NDF-to-DF Transition(s): at t=SCT. * NDF-to-DF Transition(s): at t=SCT.
This split behavior ensures a smooth DF role transition with minimal This split behavior ensures a smooth DF role transition with minimal
loss. loss.
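The two transitions above can be checked with a toy timeline for the Figure 1 scenario. This is a hedged sketch: roles_at is a hypothetical helper, both PEs are assumed to share a perfectly synchronized clock, and PE2 is assumed to win the election for the VLAN in question.

```python
from typing import Tuple


def roles_at(t: float, sct: float, skew: float) -> Tuple[str, str]:
    """Return (PE1 role, PE2 role) at time t on the shared clock.

    PE1 is the current DF handing over; PE2 is the recovering PE that
    becomes DF at the Service Carving Time.
    """
    if t < sct - skew:
        return ("DF", "NDF")   # before the DF-to-NDF transition
    if t < sct:
        return ("NDF", "NDF")  # skew window: no PE forwards for the VLAN
    return ("NDF", "DF")       # NDF-to-DF transition applied at t = SCT
```

At no value of t does the function return ("DF", "DF"), which reflects the absence of overlapping DF roles under these assumptions; the cost is the short skew window in which both PEs are NDF.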
Using the SCT approach, the negative effect of the timer to allow the The SCT approach mitigates the negative effect of requiring a timer
reception of Ethernet Segment (ES) RT-4 from other PE nodes is for discovery of Ethernet Segment (ES) RT-4 from other PE nodes.
mitigated. Furthermore, the BGP transmission delay (from PE2 to PE1) Furthermore, the BGP transmission delay (from PE2 to PE1) of the ES
of the ES RT-4 becomes a non-issue. The SCT approach shortens the RT-4 becomes a non-issue. The SCT approach shortens the 3-second
3-second timer window to the order of milliseconds. timer window to the order of milliseconds.
The peering timer is a configurable value where 3 seconds represents The peering timer is a configurable value where 3 seconds represents
the default. Configuring a timer value of 0, or so small as to the default. Configuring a timer value of 0, or so small as to
expire during propagation of the BGP routes, is outside the scope of expire during propagation of the BGP routes, is outside the scope of
this document. In reality, the use of the SCT approach presented in this document. In reality, the use of the SCT approach presented in
this document encourages the use of larger peering timer values to this document encourages the use of larger peering timer values to
overcome any sort of BGP route propagation delays. overcome any sort of BGP route propagation delays.
3.1. Concurrent Recoveries 3.1. Concurrent Recoveries
skipping to change at line 709 skipping to change at line 714
Authors' Addresses Authors' Addresses
Patrice Brissette Patrice Brissette
Cisco Cisco
Email: pbrisset@cisco.com Email: pbrisset@cisco.com
Ali Sajassi Ali Sajassi
Cisco Cisco
Email: sajassi@cisco.com Email: sajassi@cisco.com
Luc Andre Burdet (editor) Luc André Burdet (editor)
Cisco Cisco
Email: lburdet@cisco.com Email: lburdet@cisco.com
John Drake John Drake
Independent Independent
Email: je_drake@yahoo.com Email: je_drake@yahoo.com
Jorge Rabadan Jorge Rabadan
Nokia Nokia
Email: jorge.rabadan@nokia.com Email: jorge.rabadan@nokia.com
 End of changes. 19 change blocks. 
37 lines changed or deleted 42 lines changed or added

This html diff was produced by rfcdiff 1.48.