Internet-Draft EVPN MH Port-Active November 2021
Brissette, et al. Expires 16 May 2022 [Page]
BESS Working Group
Intended Status:
Standards Track
P. Brissette, Ed.
Cisco Systems
A. Sajassi
Cisco Systems
LA. Burdet, Ed.
Cisco Systems
S. Thoria
Cisco Systems
B. Wen
E. Leyton
Verizon Wireless
J. Rabadan

EVPN multi-homing port-active load-balancing


The Multi-Chassis Link Aggregation Group (MC-LAG) technology enables establishing a logical link-aggregation connection with a redundant group of independent nodes. The purpose of multi-chassis LAG is to provide a solution to achieve higher network availability, while providing different modes of sharing/balancing of traffic. RFC7432 defines EVPN based MC-LAG with single-active and all-active multi‑homing load‑balancing mode. The current draft expands on existing redundancy mechanisms supported by EVPN and introduces support for a new port-active load‑balancing mode.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 16 May 2022.

Table of Contents

1. Introduction

EVPN, as per [RFC7432], provides all-active per flow load‑balancing for multi‑homing. It also defines single-active with service carving mode, where one of the PEs, in redundancy relationship, is active per service.

While these two multi‑homing scenarios are most widely utilized in data center and service provider access networks, there are scenarios where active-standby per interface multi‑homing load‑balancing is useful and required. The main consideration for this mode of load‑balancing is the determinism of traffic forwarding through a specific interface rather than statistical per flow load‑balancing across multiple PEs providing multi‑homing. The determinism provided by active-standby per interface is also required for certain QOS features to work. While using this mode, customers also expect minimized convergence during failures.

A new type of load‑balancing mode, port-active load‑balancing, is defined. This draft describes how the new load‑balancing mode can be supported via EVPN. The new mode may also be referred to as per interface active/standby.

                 | PE3 |
              |  MPLS/IP  |
              |  CORE     |
            +-----+   +-----+
            | PE1 |   | PE2 |
            +-----+   +-----+
               |         |
               I1       I2
                 \     /
                  \   /
Figure 1: MC-LAG Topology

Figure 1 shows a MC-LAG multi‑homing topology where PE1 and PE2 are part of the same redundancy group providing multi‑homing to CE1 via interfaces I1 and I2. Interfaces I1 and I2 are members of a LAG running LACP protocol. The core, shown as IP or MPLS enabled, provides wide range of L2 and L3 services. MC-LAG multi‑homing functionality is decoupled from those services in the core and it focuses on providing multi‑homing to the CE. With per-port active/standby load‑balancing, only one of the two interface I1 or I2 would be in forwarding, the other interface will be in standby. This also implies that all services on the active interface are in active mode and all services on the standby interface operate in standby mode.

1.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

When a CE is multi‑homed to a set of PE nodes using the [802.1AX] Link Aggregation Control Protocol (LACP), the PEs must act as if they were a single LACP speaker for the Ethernet links to form and operate as a Link Aggregation Group (LAG). To achieve this, the PEs connected to the same multi‑homed CE must synchronize LACP configuration and operational data among them. Interchassis Communication Protocol (ICCP) [RFC7275] has been used for that purpose. EVPN LAG simplifies greatly that solution. Along with the simplification comes few assumptions:

Any discrepancies from this list are left for future study. Furthermore, mis-configuration and mis-wiring detection across peering PEs are also left for further study.

3. Port-active Load-balancing Procedure

Following steps describe the proposed procedure with EVPN LAG to support port-active load‑balancing mode:

  1. The Ethernet-Segment Identifier (ESI) MUST be assigned per access interface as described in [RFC7432], which may be auto derived or manually assigned. Access interface MAY be a Layer‑2 or Layer‑3 interface. The usage of ESI over Layer‑3 interfce is newly described in this document.
  2. Ethernet-Segment (ES) MUST be configured in port-active load‑balancing mode on peering PEs for specific access interface.
  3. Peering PEs MAY exchange only Ethernet-Segment (ES) route (Route Type‑4) when ESI is configured on a Layer‑3 interface.
  4. PEs in the redundancy group leverage the DF election defined in [RFC8584] to determine which PE keeps the port in active mode and which one(s) keep it in standby mode. While the DF election defined in [RFC8584] is per [ES, Ethernet Tag] granularity, for port-active mode of multi‑homing, the DF election is done per [ES]. The details of this algorithm are described in Section 4.
  5. DF router MUST keep corresponding access interface in up and forwarding active state for that Ethernet-Segment
  6. Non-DF routers MAY bring and keep peering access interface attached to it in operational down state. If the interface is running LACP protocol, then the non-DF PE MAY also set the LACP state to OOS (Out of Sync) as opposed to interface state down. This allows for better convergence on standby to active transition.
  7. For EVPN-VPWS service, the usage of primary/backup bits of EVPN Layer‑2 attributes extended community [RFC8214] is highly recommended to achieve better convergence.

4. Designated Forwarder Algorithm to Elect per Port-active PE

The ES routes, running in port-active load‑balancing mode, are advertised with a new capability in the DF Election Extended Community as defined in [RFC8584]. Moreover, the ES associated to the port leverages existing procedure of single-active, and signals single-active bit along with Ethernet-AD per-ES route. Finally, as in [RFC7432], the ESI-label based split-horizon procedures should be used to avoid transient echo'ed packets when Layer‑2 circuits are involved.

The various algorithms for DF Election are discussed in Sections 4.2 to 4.4 for completeness, although the choice of algorithm in this solution doesn't affect complexity or performance as in other load-balancing modes.

4.1. Capability Flag

[RFC8584] defines a DF Election extended community, and a Bitmap field to encode "capabilities" to use with the DF election algorithm in the DF algorithm field. Bitmap (2 octets) is extended by the following value:

                         1 1 1 1 1 1
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
    |D|A|     |P|                   |
Figure 2: Amended Bitmap field in the DF Election Extended Community
Bit 0: D bit or 'Don't Preempt' bit', as explained in [I-D.ietf-bess-evpn-pref-df].
Bit 1: AC-DF Capability (AC-Influenced DF election), as explained in [RFC8584].
Bit 5: (corresponds to Bit 29 of the DF Election Extended Community and it is defined by this document): P bit or 'Port Mode' bit (P hereafter), determines that the DF-Algorithm should be modified to consider the port only and not the Ethernet Tags.

4.2. Modulo-based Algorithm

The default DF Election algorithm, or modulus-based algorithm as in [RFC7432] and updated by [RFC8584], is used here, at the granularity of ES only. Given the fact, ES-Import RT community inherits from ESI only byte 1-6, many deployments differentiate ESI within these bytes only. For Modulo calculation, bytes 3‑6 are used to determine the designated forwarder using Modulo-based DF assignment.

4.3. HRW Algorithm

Highest Random Weight (HRW) algorithm defined in [RFC8584] MAY also be used and signaled, and modified to operate at the granularity of [ES] rather than per [ES, VLAN].

[RFC8584] describes computing a 32 bit CRC over the concatenation of Ethernet Tag and ESI. For port-active load‑balancing mode, the Ethernet Tag is simply removed from the CRC computation.

4.4. Preference-based DF Election

When the new capability 'Port-Mode' is signaled, the algorithm is modified to consider the port only and not any associated Ethernet Tags. Furthermore, the "port-based" capability MUST be compatible with the "Don't Preempt" bit. When an interface recovers, a peering PE signaling D-bit will enable non-revertive behaviour at the port level. The AC-DF bit MUST be set to zero. When an AC (sub-interface) goes down, it does not influence the DF election.

5. Convergence considerations

To improve the convergence, upon failure and recovery, when port‑active load‑balancing mode is used, some advanced synchronization between peering PEs may be required. Port-active is challenging in a sense that the "standby" port is in down state. It takes some time to bring a "standby" port in up-state and settle the network. For IRB and L3 services, ARP / ND cache may be synchronized. Moreover, associated VRF tables may also be synchronized. For L2 services, MAC table synchronization may be considered.

Finally, for members of a LAG running LACP the ability to set the "standby" port in "out-of-sync" state a.k.a "warm‑standby" can be leveraged.

5.1. Primary / Backup per Ethernet-Segment

The L2 Info Extended Community MAY be advertised in Ethernet A-D per ES route for fast convergence. Only the P and B bits are relevant to this specification. When advertised, the L2 Info Extended Community SHALL have only P or B bits set and all other bits must be zero. MTU must also be zero. Remote PE receiving optional L2 Info Extended Community on Ethernet A-D per ES routes SHALL consider only P and B bits. P and B bits received on Ethernet A-D per EVI routes per [RFC8214] are overridden.

5.2. Backward Compatibility

Implementations that comply with [RFC7432] or [RFC8214] only (i.e., implementations that predate this specification) will not advertise the L2 Info Extended Community in Ethernet A-D per ES routes. That means that all remote PEs in the ES will not receive P and B bit per ES and will continue to receive and honour the P and B bits received in Ethernet A-D per EVI route(s). Similarly, an implementation that complies with [RFC7432] or [RFC8214] only and that receives a L2 Info Extended Community will ignore it and will continue to use the default path resolution algorithm.

6. Applicability

A common deployment is to provide L2 or L3 service on the PEs providing multi‑homing. The services could be any L2 EVPN such as EVPN VPWS, EVPN [RFC7432], etc. L3 service could be in VPN context [RFC4364] or in global routing context. When a PE provides first hop routing, EVPN IRB could also be deployed on the PEs. The mechanism defined in this draft is used between the PEs providing the L2 and/or L3 service, when the requirement is to use per port active.

A possible alternate solution is the one described in this draft is MC-LAG with ICCP [RFC7275] active-standby redundancy. However, ICCP requires LDP to be enabled as a transport of ICCP messages. There are many scenarios where LDP is not required e.g. deployments with VXLAN or SRv6. The solution defined in this draft with EVPN does not mandate the need to use LDP or ICCP and is independent of the underlay encapsulation.

7. Overall Advantages

The use of port-active multi‑homing brings the following benefits to EVPN networks:

  1. Open standards based per interface single-active load‑balancing mechanism that eliminates the need to run ICCP and LDP.
  2. Agnostic of underlay technology (MPLS, VXLAN, SRv6) and associated services (L2, L3, Bridging, E-LINE, etc).
  3. Provides a way to enable deterministic QOS over MC-LAG attachment circuits.
  4. Fully compliant with [RFC7432], does not require any new protocol enhancement to existing EVPN RFCs.
  5. Can leverage various DF election algorithms e.g. modulo, HRW, etc.
  6. Replaces legacy MC-LAG ICCP-based solution, and offers following additional benefits:

    • Efficiently supports 1+N redundancy mode (with EVPN using BGP RR) where as ICCP requires full mesh of LDP sessions among PEs in redundancy group.
    • Fast convergence with mass-withdraw is possible with EVPN, no equivalent in ICCP.
  7. Customers want per interface single-active load‑balancing, but don't want to enable LDP (e.g. they may be running VXLAN or SRv6 in the network). Currently there is no alternative to this.

8. IANA Considerations

This document solicits the allocation of the following values:

9. Security Considerations

The same Security Considerations described in [RFC7432] are valid for this document.

10. Acknowledgements

The authors thank Anoop Ghanwani for his comments and suggestions.

11. References

11.1. Normative References

Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, , <>.
Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A., Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, , <>.
Boutros, S., Sajassi, A., Salam, S., Drake, J., and J. Rabadan, "Virtual Private Wire Service Support in Ethernet VPN", RFC 8214, DOI 10.17487/RFC8214, , <>.
Rabadan, J., Ed., Mohanty, S., Ed., Sajassi, A., Drake, J., Nagaraj, K., and S. Sathappan, "Framework for Ethernet VPN Designated Forwarder Election Extensibility", RFC 8584, DOI 10.17487/RFC8584, , <>.

11.2. Informative References

Rabadan, J., Sathappan, S., Przygienda, T., Lin, W., Drake, J., Sajassi, A., and, "Preference-based EVPN DF Election", Work in Progress, Internet-Draft, draft-ietf-bess-evpn-pref-df-07, , <>.
Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, , <>.
Martini, L., Salam, S., Sajassi, A., Bocci, M., Matsushima, S., and T. Nadeau, "Inter-Chassis Communication Protocol for Layer 2 Virtual Private Network (L2VPN) Provider Edge (PE) Redundancy", RFC 7275, DOI 10.17487/RFC7275, , <>.

Authors' Addresses

Patrice Brissette (editor)
Cisco Systems
Ottawa ON
Ali Sajassi
Cisco Systems
United States of America
Luc Andre Burdet (editor)
Cisco Systems
Samir Thoria
Cisco Systems
United States of America
Bin Wen
United States of America
Edward Leyton
Verizon Wireless
United States of America
Jorge Rabadan
United States of America