rfc9722v1.txt | rfc9722.txt | |||
---|---|---|---|---|
skipping to change at line 136 ¶ | skipping to change at line 136 ¶ | |||
of a site. | of a site. | |||
NDF: Non-Designated Forwarder. A PE that is currently blocking | NDF: Non-Designated Forwarder. A PE that is currently blocking | |||
traffic (see DF above). | traffic (see DF above). | |||
EVI: EVPN Instance. It spans the PE devices participating in that | EVI: EVPN Instance. It spans the PE devices participating in that | |||
EVPN. | EVPN. | |||
HRW: Highest Random Weight algorithm [HRW98] | HRW: Highest Random Weight algorithm [HRW98] | |||
Service carving: DF Election is also referred to as "service | Service carving: This refers to DF election, as defined in | |||
carving" in [RFC7432] | [RFC7432]. | |||
SCT: Service Carving Time. Defined in this document as the time at | SCT: Service Carving Time. Defined in this document as the time at | |||
which all nodes participating in an Ethernet Segment perform DF | which all nodes participating in an Ethernet Segment perform DF | |||
Election. | Election. | |||
1.3. Challenges with Existing Mechanism | 1.3. Challenges with Existing Mechanism | |||
In EVPN technology, multiple PE devices encapsulate and decapsulate | In EVPN technology, multiple PE devices encapsulate and decapsulate | |||
data belonging to the same VLAN. Under certain conditions, this may | data belonging to the same VLAN. Under certain conditions, this may | |||
cause duplicated Ethernet packets and potential loops if there is a | cause duplicated Ethernet packets and potential loops if there is a | |||
skipping to change at line 206 ¶ | skipping to change at line 206 ¶ | |||
is no handshake mechanism between PE1 and PE2, overlapping of DF | is no handshake mechanism between PE1 and PE2, overlapping of DF | |||
roles for a given VLAN is possible, which leads to duplication of | roles for a given VLAN is possible, which leads to duplication of | |||
traffic as well as Layer 2 loops. | traffic as well as Layer 2 loops. | |||
Current EVPN specifications [RFC7432] and [RFC8584] rely on a timer- | Current EVPN specifications [RFC7432] and [RFC8584] rely on a timer- | |||
based approach for transferring the DF role to the newly inserted | based approach for transferring the DF role to the newly inserted | |||
device. This can cause the following issues: | device. This can cause the following issues: | |||
* Loops and duplicates, if the timer value is too short | * Loops and duplicates, if the timer value is too short | |||
* Prolonged traffic blackholing, if the timer value is too long | * Prolonged traffic loss, if the timer value is too long | |||
1.4. Design Principles for a Solution | 1.4. Design Principles for a Solution | |||
The clock-synchronization solution for fast DF recovery presented in | The clock-synchronization solution for fast DF recovery presented in | |||
this document follows several design principles and offers multiple | this document follows several design principles and offers multiple | |||
advantages, namely: | advantages, namely: | |||
* Complex handshake signaling mechanisms and state machines are | * Complex handshake signaling mechanisms and state machines are | |||
avoided in favor of a simple unidirectional signaling approach. | avoided in favor of a simple unidirectional signaling approach. | |||
skipping to change at line 232 ¶ | skipping to change at line 232 ¶ | |||
* The fast DF recovery solution is independent of any BGP delays in | * The fast DF recovery solution is independent of any BGP delays in | |||
propagation of Ethernet Segment routes (Route Type 4) | propagation of Ethernet Segment routes (Route Type 4) | |||
* The fast DF recovery solution is agnostic of the actual time | * The fast DF recovery solution is agnostic of the actual time | |||
synchronization mechanism used; however, an NTP-based | synchronization mechanism used; however, an NTP-based | |||
representation of time is used for EVPN signaling. | representation of time is used for EVPN signaling. | |||
The solution in this document relies on nodes in the topology, more | The solution in this document relies on nodes in the topology, more | |||
specifically the peering nodes of each Ethernet-Segment, to be clock- | specifically the peering nodes of each Ethernet-Segment, to be clock- | |||
synchronized and to advertise Time Synchronization capability. When | synchronized and to advertise the Time Synchronization capability. | |||
this is not the case, or when clocks are badly desynchronized, | When this is not the case, or when clocks are badly desynchronized, | |||
network convergence and DF Election is no worse than that described | network convergence and DF Election is no worse than that described | |||
in [RFC7432] due to the timestamp range checking (Section 2.2). | in [RFC7432] due to the timestamp range checking (Section 2.2). | |||
2. DF Election Synchronization Solution | 2. DF Election Synchronization Solution | |||
The fast DF recovery solution relies on the concept of common clock | The fast DF recovery solution relies on the concept of common clock | |||
alignment between partner PEs participating in a common Ethernet | alignment between partner PEs participating in a common Ethernet | |||
Segment, i.e., PE1 and PE2 in Figure 1. The main idea is to have all | Segment, i.e., PE1 and PE2 in Figure 1. The main idea is to have all | |||
peering PEs of that Ethernet Segment perform DF election and apply | peering PEs of that Ethernet Segment perform DF election and apply | |||
the result at the same previously announced time. | the result at the same previously announced time. | |||
skipping to change at line 300 ¶ | skipping to change at line 300 ¶ | |||
2.1. BGP Encoding | 2.1. BGP Encoding | |||
A BGP extended community, with Type 0x06 and Sub-Type 0x0F, is | A BGP extended community, with Type 0x06 and Sub-Type 0x0F, is | |||
defined to communicate the SCT for each Ethernet Segment: | defined to communicate the SCT for each Ethernet Segment: | |||
1 2 3 | 1 2 3 | |||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| Type = 0x06 | Sub-Type(0x0F)| Timestamp Seconds ~ | | Type = 0x06 | Sub-Type(0x0F)| Timestamp Seconds ~ | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
~ Timestamp Seconds | Timestamp Fractional Seconds | | ~ Timestamp Seconds | Timestamp Fraction | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
Figure 2: Service Carving Time | Figure 2: Service Carving Time | |||
The timestamp exchanged uses the NTP prime epoch of January 1, 1900 | The timestamp exchanged uses the NTP prime epoch of 0 h 1 January | |||
[RFC5905] and an adapted form of the 64-bit NTP Timestamp Format. | 1900 UTC [RFC5905] and an adapted form of the 64-bit NTP timestamp | |||
format. | ||||
The 64-bit NTP Timestamp Format consists of a 32-bit part for Seconds | The 64-bit NTP timestamp format consists of a 32-bit unsigned seconds | |||
and a 32-bit part for Fractional Seconds, which are encoded in the | field and a 32-bit fraction field, which are encoded in the Service | |||
Service Carving Time as follows: | Carving Time as follows: | |||
Timestamp Seconds: 32-bit NTP seconds are encoded in this field. | Timestamp Seconds: 32-bit NTP seconds are encoded in this field. | |||
Timestamp Fractional Seconds: The high-order 16 bits of the NTP | Timestamp Fraction: The high-order 16 bits of the NTP "Fraction" | |||
"Fraction" field are encoded in this field. | field are encoded in this field. | |||
When rebuilding a 64-bit NTP Timestamp Format using the values from a | When rebuilding a 64-bit NTP timestamp format using the values from a | |||
received SCT BGP extended community, the lower-order 16 bits of the | received SCT BGP extended community, the lower-order 16 bits of the | |||
Fractional field are set to 0. The use of a 16-bit fractional | NTP "Fraction" field are set to 0. The use of a 16-bit fractional | |||
seconds value yields adequate precision of 15 microseconds (2^-16 s). | seconds value yields adequate precision of 15 microseconds (2^-16 s). | |||
This document introduces a new flag called Time Synchronization | The format of the DF Election Extended Community that is used in this | |||
indicated by "T" in the "DF Election Capabilities" registry defined | document is: | |||
in [RFC8584] for use in DF Election Extended Community. | ||||
1 2 3 | 1 2 3 | |||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| Type = 0x06 | Sub-Type(0x06)| RSV | DF Alg | Bitmap ~ | | Type = 0x06 | Sub-Type(0x06)| RSV | DF Alg | Bitmap ~ | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
~ Bitmap | Reserved | | ~ Bitmap | Reserved | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
Figure 3: DF Election Extended Community (Figure 4 in RFC 8584) | Figure 3: DF Election Extended Community (RFC 8584) | |||
The Bitmap field (2 octets) encodes "capabilities" [RFC8584], where | ||||
this document introduces a new Time Synchronization capability | ||||
indicated by "T". | ||||
1 1 | 1 1 | |||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| |A| |T| | | | |A| |T| | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
Figure 4: DF Election Capabilities | Figure 4: Bitmap Field in the DF Election Extended Community | |||
Bit 3: Time Synchronization (corresponds to Bit 27 of the DF | Bit 3: Time Synchronization (corresponds to Bit 27 of the DF | |||
Election Extended Community). When set to 1, it indicates the | Election Extended Community). When set to 1, it indicates the | |||
desire to use Time Synchronization capability with the rest of the | desire to use the Time Synchronization capability with the rest of | |||
PEs in the Ethernet Segment. | the PEs in the Ethernet Segment. | |||
This capability is utilized in conjunction with the agreed-upon DF | This capability is utilized in conjunction with the agreed-upon DF | |||
Election Type. For instance, if all the PE devices in the Ethernet | Election Type. For instance, if all the PE devices in the Ethernet | |||
Segment indicate the desire to use the Time Synchronization | Segment indicate the desire to use the Time Synchronization | |||
capability and request the DF Election Type to be the HRW, then the | capability and request the DF Election Type to be the HRW, then the | |||
HRW algorithm is used in conjunction with this capability. A PE that | HRW algorithm is used in conjunction with this capability. A PE that | |||
does not support the procedures set out in this document or that | does not support the procedures set out in this document or that | |||
receives a route from another PE in which the capability is not set | receives a route from another PE in which the capability is not set | |||
MUST NOT delay DF election as this could lead to duplicate traffic in | MUST NOT delay DF election as this could lead to duplicate traffic in | |||
some instances (overlapping DFs). | some instances (overlapping DFs). | |||
skipping to change at line 419 ¶ | skipping to change at line 423 ¶ | |||
accompanying forwarding updates to the DF and NDF states are also | accompanying forwarding updates to the DF and NDF states are also | |||
deferred. | deferred. | |||
Item 9 in Section 2.1 of [RFC8584], in the list "Corresponding | Item 9 in Section 2.1 of [RFC8584], in the list "Corresponding | |||
actions when transitions are performed or states are entered/exited", | actions when transitions are performed or states are entered/exited", | |||
is changed as follows: | is changed as follows: | |||
| 9. DF_CALC on CALCULATED: Mark the election result for the VLAN | | 9. DF_CALC on CALCULATED: Mark the election result for the VLAN | |||
| or VLAN bundle. | | or VLAN bundle. | |||
| | | | |||
| 9.1 If an SCT timestamp is present during the RCVD_ES event | | 9.1 If no Service Carving Time is present during the RCVD_ES | |||
| of Action 11, wait until the time indicated by the SCT | | event of Action 11, proceed to step 9.4 | |||
| minus skew before proceeding to step 9.3. | ||||
| | | | |||
| 9.2 If an SCT timestamp is present during the RCVD_ES event | | 9.2 If a Service Carving Time is present during the RCVD_ES | |||
| of Action 11, wait until the time indicated by the SCT | | event of Action 11, wait until the time indicated by the | |||
| before proceeding to step 9.4. | | SCT minus skew before proceeding to step 9.3. | |||
| | | | |||
| 9.3 Assume the role of NDF for the local PE concerning the | | 9.3 Assume the role of NDF for the local PE concerning the | |||
| VLAN or VLAN bundle and transition to the DF_DONE state. | | VLAN or VLAN bundle. Wait the remaining skew time before | |||
| proceeding to step 9.4. | ||||
| | | | |||
| 9.4 Assume the role of DF for the local PE concerning the | | 9.4 Assume the election result's role (DF or NDF) for the | |||
| VLAN or VLAN bundle and transition to the DF_DONE state. | | local PE concerning the VLAN or VLAN bundle and | |||
| transition to the DF_DONE state. | ||||
This revised approach ensures proper timing and synchronization in | This revised approach ensures proper timing and synchronization in | |||
the DF election process, avoiding conflicts and ensuring accurate | the DF election process, avoiding conflicts and ensuring accurate | |||
forwarding updates. | forwarding updates. | |||
3. Synchronization Scenarios | 3. Synchronization Scenarios | |||
Consider Figure 1 as an example, where initially PE2 has failed and | Consider Figure 1 as an example, where initially PE2 has failed and | |||
PE1 has taken over. This scenario illustrates the problem with the | PE1 has taken over. This scenario illustrates the problem with the | |||
DF Election mechanism described in Section 8.5 of [RFC7432], | DF Election mechanism described in Section 8.5 of [RFC7432], | |||
skipping to change at line 502 ¶ | skipping to change at line 507 ¶ | |||
the following: | the following: | |||
* DF-to-NDF Transition(s): at t=SCT minus skew, where both PEs are | * DF-to-NDF Transition(s): at t=SCT minus skew, where both PEs are | |||
NDF for the skew duration. | NDF for the skew duration. | |||
* NDF-to-DF Transition(s): at t=SCT. | * NDF-to-DF Transition(s): at t=SCT. | |||
This split behavior ensures a smooth DF role transition with minimal | This split behavior ensures a smooth DF role transition with minimal | |||
loss. | loss. | |||
Using the SCT approach, the negative effect of the timer to allow the | The SCT approach mitigates the negative effect of requiring a timer | |||
reception of Ethernet Segment (ES) RT-4 from other PE nodes is | for discovery of Ethernet Segment (ES) RT-4 from other PE nodes. | |||
mitigated. Furthermore, the BGP transmission delay (from PE2 to PE1) | Furthermore, the BGP transmission delay (from PE2 to PE1) of the ES | |||
of the ES RT-4 becomes a non-issue. The SCT approach shortens the | RT-4 becomes a non-issue. The SCT approach shortens the 3-second | |||
3-second timer window to the order of milliseconds. | timer window to the order of milliseconds. | |||
The peering timer is a configurable value where 3 seconds represents | The peering timer is a configurable value where 3 seconds represents | |||
the default. Configuring a timer value of 0, or so small as to | the default. Configuring a timer value of 0, or so small as to | |||
expire during propagation of the BGP routes, is outside the scope of | expire during propagation of the BGP routes, is outside the scope of | |||
this document. In reality, the use of the SCT approach presented in | this document. In reality, the use of the SCT approach presented in | |||
this document encourages the use of larger peering timer values to | this document encourages the use of larger peering timer values to | |||
overcome any sort of BGP route propagation delays. | overcome any sort of BGP route propagation delays. | |||
3.1. Concurrent Recoveries | 3.1. Concurrent Recoveries | |||
skipping to change at line 709 ¶ | skipping to change at line 714 ¶ | |||
Authors' Addresses | Authors' Addresses | |||
Patrice Brissette | Patrice Brissette | |||
Cisco | Cisco | |||
Email: pbrisset@cisco.com | Email: pbrisset@cisco.com | |||
Ali Sajassi | Ali Sajassi | |||
Cisco | Cisco | |||
Email: sajassi@cisco.com | Email: sajassi@cisco.com | |||
Luc Andre Burdet (editor) | Luc André Burdet (editor) | |||
Cisco | Cisco | |||
Email: lburdet@cisco.com | Email: lburdet@cisco.com | |||
John Drake | John Drake | |||
Independent | Independent | |||
Email: je_drake@yahoo.com | Email: je_drake@yahoo.com | |||
Jorge Rabadan | Jorge Rabadan | |||
Nokia | Nokia | |||
Email: jorge.rabadan@nokia.com | Email: jorge.rabadan@nokia.com | |||
End of changes. 19 change blocks. | ||||
37 lines changed or deleted | 42 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. |
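
Note on the revised text in Sections 2.1 and 3 (right-hand column): the Service Carving Time encoding, the "T" capability bit, and the split DF/NDF transition times can be illustrated with a short sketch. This is only an illustration under stated assumptions, not an implementation taken from either version of the document. The function names, the constant names, and the skew parameter are hypothetical. The bit position of "T" assumes the usual RFC convention that bit 0 in Figure 4 is the most significant bit of the 2-octet Bitmap field.

```python
import struct
from typing import Tuple

# Values taken from Figure 2 of the diffed text.
EC_TYPE_EVPN = 0x06
EC_SUBTYPE_SCT = 0x0F

# Assumption: Figure 4 numbers bits from the most significant bit,
# so "Bit 3" of the 16-bit Bitmap is mask 0x1000.
T_BIT_MASK = 1 << (15 - 3)


def encode_sct(ntp_seconds: int, ntp_fraction: int) -> bytes:
    """Pack an NTP timestamp into the 8-octet SCT extended community.

    Only the high-order 16 bits of the 32-bit NTP fraction are carried.
    """
    return struct.pack(
        "!BBIH",
        EC_TYPE_EVPN,
        EC_SUBTYPE_SCT,
        ntp_seconds & 0xFFFFFFFF,
        (ntp_fraction >> 16) & 0xFFFF,
    )


def decode_sct(ec: bytes) -> Tuple[int, int]:
    """Rebuild (seconds, 32-bit fraction); the low 16 fraction bits are 0."""
    ec_type, sub_type, seconds, frac16 = struct.unpack("!BBIH", ec)
    if (ec_type, sub_type) != (EC_TYPE_EVPN, EC_SUBTYPE_SCT):
        raise ValueError("not a Service Carving Time extended community")
    return seconds, frac16 << 16


def has_time_sync(bitmap: int) -> bool:
    """True when the Time Synchronization ("T") capability bit is set."""
    return bool(bitmap & T_BIT_MASK)


def transition_times(sct: float, skew: float) -> Tuple[float, float]:
    """Return (DF-to-NDF time, NDF-to-DF time) for a given SCT.

    The skew value is a hypothetical parameter; the diffed text shown here
    does not state its magnitude.
    """
    return sct - skew, sct
```

For example, `decode_sct(encode_sct(s, f))` returns `s` together with `f` truncated to its upper 16 bits, which matches the rule that the lower-order 16 bits of the NTP "Fraction" field are set to 0 on receipt.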