rfc9957.txt
skipping to change at line 70
1.1. Document Roadmap
1.2. Terminology
2. Approach (In Brief)
2.1. Mechanism
2.2. Policy
2.2.1. Policy Conditions
2.2.2. Policy Action
3. Necessary Flow Behaviour
4. Pseudocode Walk-Through
4.1. Input Parameters, Constants, and Variables
4.2. QProt Data Path
4.2.1. The qprotect() Function
4.2.2. The pick_bucket() Function
4.2.3. The fill_bucket() Function
4.2.4. The calcProbNative() Function
5. Rationale
5.1. Rationale: Blame for Queuing, Not for Rate in Itself
5.2. Rationale: Constant Aging of the Queuing Score
5.3. Rationale: Transformed Queuing Score
5.4. Rationale: Policy Conditions
5.5. Rationale: Reclassification as the Policy Action
skipping to change at line 110
Although the algorithm is defined in Annex P of [DOCSIS], it relies
on cross references to other parts of the set of specifications.
This document pulls all the strands together into one self-contained
document. The core of the document is a similar pseudocode walk-
through to that in the DOCSIS specification, but it also includes
additional material:
i. a brief overview,
ii. a definition of how a data sender needs to behave to avoid
triggering queue protection, and
iii. a section giving the rationale for the design choices.
Low queuing delay depends on hosts sending their data smoothly,
either at a low rate or responding to Explicit Congestion
Notification (ECN) (see [RFC8311] and [RFC9331]). Therefore, low-
latency queuing is something hosts create themselves, not something
the network gives them. This tends to ensure that self-interest
alone does not drive flows to mismark their packets for the low-
latency queue. However, traffic from an application that does not
skipping to change at line 211
contained within packets of a flow that have the Congestion
Experienced (CE) codepoint set in the IP-ECN field [RFC3168]
(including IP headers unless specified otherwise). Congestion-
bit-rate and congestion-volume were introduced in [RFC7713] and
[RFC6789].
DOCSIS: Data-Over-Cable System Interface Specification. "DOCSIS" is
a registered trademark of Cable Television Laboratories, Inc.
("CableLabs").
QProt: The DOCSIS Queue Protection function
non-queue-building: A flow that tends not to build a queue.
Note the use of lowercase "non-queue-building", which describes
the behaviour of a flow, in contrast to uppercase Non-Queue-
Building (NQB), which refers to a Diffserv marking [RFC9956]. A
flow identifying itself as uppercase NQB may not behave in a non-
queue-building way, which is what the QProt algorithm is intended
to detect.
queue-building: A flow that builds a queue. If it is classified
into the Low-Latency queue, it is therefore a candidate for the
QProt algorithm to detect and sanction.
ECN: Explicit Congestion Notification [RFC3168]
ECN marking or ECN signalling: Setting the IP-ECN field of an
increasing proportion of packets to the Congestion Experienced
(CE) codepoint [RFC3168] as queuing worsens.
2. Approach (In Brief)
The QProt algorithm is divided into mechanism and policy. There is
only a tiny amount of policy code, but policy might need to be
changed in the future. So, where hardware implementation is being
considered, it would be advisable to implement the policy aspects in
firmware or software:
* The mechanism aspects identify flows, maintain flow state, and
accumulate per-flow queuing scores;
skipping to change at line 287
accumulates faster:
* the higher the degree of queuing and
* the faster the flow's packets arrive when there is queuing.
Section 5.1 explains further why this score represents blame for
queuing.
The algorithm, as described so far, would accumulate a number that
would rise at the so-called congestion-rate of the flow (see
Section 1.2), i.e., the rate at which the flow is contributing to
congestion or the rate at which the AQM is forwarding bytes of the
flow that are ECN-marked. However, rather than growing continually,
the queuing score is also reduced (or "aged") at a constant rate.
This is because it is unavoidable for capacity-seeking flows to
induce a continuous low level of congestion as they track available
capacity. Section 5.2 explains why this allowance can be set to the
same constant for any scalable flow, whatever its bit rate.
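The accumulate-then-age behaviour described above can be sketched as
follows. This is an illustrative Python fragment, not the normative
pseudocode: the function name and the per-packet update shape are
assumptions based on this description alone.

```python
AGING = 2 ** 19  # queuing-score aging rate [B/s]; the spec's default

def update_score(score, pkt_size, prob_native, dt):
    """Per-packet queuing-score update: age the score at a constant
    rate, then accumulate blame in proportion to the packet's size and
    the current degree of queuing (the LL AQM's marking probability).
    """
    score = max(score - AGING * dt, 0.0)  # constant aging, floored at 0
    score += pkt_size * prob_native       # blame for queuing
    return score
```

A flow whose congestion-rate (bytes times marking probability, per
second) stays below AGING sees its score settle at a small bounded
value; above AGING, the score grows without limit.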
For implementation efficiency, the queuing score is transformed into
skipping to change at line 334
low-latency queue is to reclassify a packet into the Classic queue
(this is justified in Section 5.5).
3. Necessary Flow Behaviour
The QProt algorithm described here can be used for responsive and/or
unresponsive flows.
* It is possible to objectively describe the least responsive way
that a flow will need to respond to congestion signals in order to
avoid triggering queue protection, no matter the link capacity and
no matter how much other traffic there is.
* It is not possible to describe how fast or smooth an unresponsive
flow should be to avoid queue protection because this depends on
how much other traffic there is and the capacity of the link,
which an application is unable to know. However, the more
smoothly an unresponsive flow paces its packets and the lower its
rate relative to typical broadband link capacities, the less
likelihood that it will risk causing enough queueing to trigger
queue protection.
Responsive low-latency flows can use a Low Latency, Low Loss, and
Scalable throughput (L4S) ECN codepoint [RFC9331] to get classified
into the low-latency queue.
A sender can arrange for flows that are smooth but do not respond to
ECN marking to be classified into the low-latency queue by using the
Non-Queue-Building (NQB) Diffserv codepoint [RFC9956], which the
DOCSIS specifications support, or an operator can use various other
local classifiers.
skipping to change at line 370
The algorithm that calculates this internal variable is run on the
arrival of every LL packet, whether or not it is ECN-capable, so that
it can be used by the QProt algorithm. But the variable is only used
to ECN-mark packets that are ECN-capable.
Not only does this dual use of the variable improve processing
efficiency, but it also makes the basis of the QProt algorithm
visible and transparent, at least for responsive ECN-capable flows.
Then, it is possible to state objectively that a flow can avoid
triggering queue protection by keeping the bit rate of ECN-marked
packets (the congestion-rate) below AGING, which is a configured
constant of the algorithm (default 2^19 B/s ≈ 4 Mb/s). Note that it
is in a congestion controller's own interest to keep its average
congestion-rate well below this level (e.g., ~1 Mb/s) to ensure that
it does not trigger queue protection during transient dynamics.
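That objective statement reduces to simple arithmetic, sketched below
in illustrative Python (the function name is not from the spec):

```python
AGING = 2 ** 19  # default aging constant [B/s]; 2**19 * 8 b/s ≈ 4.2 Mb/s

def avoids_qprot_longterm(marked_bytes, interval_s, aging=AGING):
    """True if the flow's average congestion-rate -- the bytes of its
    packets that were ECN-marked, per second -- stays below the aging
    rate, so its queuing score cannot accumulate over the long term."""
    return marked_bytes / interval_s < aging
```

For example, an average congestion-rate of 2^17 B/s (~1 Mb/s) leaves
the recommended headroom below the 2^19 B/s default for transients.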
If the QProt algorithm is used in other settings, it would still need
to be based on the visible level of congestion signalling, in a
similar way to the DOCSIS approach. Without transparency of the
basis of the algorithm's decisions, end-systems would not be able to
avoid triggering queue protection on an objective basis.
4. Pseudocode Walk-Through
4.1. Input Parameters, Constants, and Variables
The operator input parameters that set the parameters in the first
two blocks of pseudocode below are defined for cable modems (CMs) in
[DOCSIS-CM-OSS] and for CMTSs in [DOCSIS-CCAP-OSS]. Then, further
constants are either derived from the input parameters or hard-coded.
skipping to change at line 403
latest DOCSIS specifications are the definitive references. Also,
any operator might set certain parameters to non-default values.
<CODE BEGINS>
// Input Parameters
MAX_RATE; // Configured maximum sustained rate [b/s]
QPROTECT_ON; // Queue Protection is enabled [Default: TRUE]
CRITICALqL_us; // LL queue threshold delay [µs] Default: MAXTH_us
CRITICALqLSCORE_us;// The threshold queuing score [Default: 4000 µs]
LG_AGING; // The aging rate of the q'ing score [Default: 19]
// as log base 2 of the congestion-rate [lg(B/s)]
// Input Parameters for the calcProbNative() algorithm:
MAXTH_us; // Max LL AQM marking threshold [Default: 1000 µs]
LG_RANGE; // Log base 2 of the range of ramp [lg(ns)]
// Default: 2^19 = 524288 ns (roughly 525 µs)
<CODE ENDS>
<CODE BEGINS>
// Constants, either derived from input parameters or hard-coded
T_RES; // Resolution of t_exp [ns]
skipping to change at line 500
Payload (ESP) [RFC4303].
The Microflow Classification section of the Queue Protection Annex of
the DOCSIS specification [DOCSIS] defines various strategies to find
these headers by skipping extension headers or encapsulations. If
they cannot be found, the specification defines various less-specific
3-tuples that would be used. The DOCSIS specification should be
referred to for all these strategies, which will not be repeated
here.
The array of bucket structures defined below is used by all the QProt
functions:
<CODE BEGINS>
struct bucket { // The leaky bucket structure to hold per-flow state
id; // identifier (e.g., 5-tuple) of flow using bucket
t_exp; // expiry time in units of T_RES
// (t_exp - now) = flow's transformed q'ing score
};
struct bucket buckets[NBUCKETS+1];
<CODE ENDS>
4.2. QProt Data Path
All the functions of QProt operate on the data path, driven by packet
arrivals.
The following functions that maintain per-flow queuing scores and
manage per-flow state are considered primarily as mechanism:
pick_bucket(uflow_id); // Returns bucket identifier
fill_bucket(bucket_id, pkt_size, probNative); /* Returns queuing
* score */
calcProbNative(qdelay); /* Returns ECN-marking probability of the
* Native LL AQM */
skipping to change at line 892
This suggests that the QProt algorithm will not sanction a well-
behaved scalable flow if it ages out the queuing score at a
sufficient constant rate. The constant will need to be somewhat
above the average of a well-behaved scalable flow to allow for normal
dynamics.
Relating QProt's aging constant to a scalable flow does not mean that
a flow has to behave like a scalable flow: it can be less aggressive
but not more aggressive. For instance, a longer RTT flow can run at
a lower congestion-rate than the aging rate, but it can also increase
its aggressiveness to equal the rate of short RTT scalable flows
[ScalingCC]. The constant aging of QProt also means that a long-
running unresponsive flow will be prone to trigger QProt if it runs
faster than a competing responsive scalable flow would. And, of
course, if a flow causes excessive queuing in the short term, its
queuing score will still rise faster than the constant aging process
will decrease it. Then, QProt will still eject the flow's packets
before they harm the low latency of the shared queue.
5.3. Rationale: Transformed Queuing Score
skipping to change at line 928
Figure 3: Transformation of Queuing Score
The accumulation and aging of the queuing score is shown on the left
of Figure 3 in token bucket form. Dt is the difference between the
times when the scores of the current and previous packets were
processed.
A transformed equivalent of this token bucket is shown on the right
of Figure 3, dividing both the input and output by the constant rate,
AGING. The result is a bucket-depth that represents time and it
drains at the rate that time passes.
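This transformation, together with the expiry-time optimization
described next, can be sketched in Python. The names and signatures
below are illustrative only; the spec's fill_bucket() pseudocode
(which works in units of T_RES) is definitive.

```python
def fill_bucket_sketch(t_exp, now, pkt_size, prob_native, aging=2**19):
    """Hold the transformed queuing score as an absolute expiry time.

    Dividing the byte increment pkt_size * prob_native [B] by the
    aging rate [B/s] turns it into seconds, so the bucket drains
    simply because wall-clock time passes -- no explicit aging step.
    """
    base = max(t_exp, now)  # an expired bucket restarts from "now"
    return base + pkt_size * prob_native / aging

def queuing_score(t_exp, now):
    """The flow's transformed queuing score: (t_exp - now), floored."""
    return max(t_exp - now, 0.0)
```

Storing only t_exp means expired flow state identifies itself: if
t_exp is in the past, the score is zero and the bucket is reusable.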
As a further optimization, the time the bucket was last updated is
not stored with the flow state. Instead, when the bucket is
initialized, the queuing score is added to the system time "now" and
the resulting expiry time is written into the bucket. Subsequently,
if the bucket has not expired, the incremental queuing score is added
to the time already held in the bucket. Then, the queuing score
always represents the expiry time of the flow state itself. This
means that the queuing score does not need to be aged explicitly
skipping to change at line 1183
attack flows.
For an attacker to keep buckets busy, it is more efficient to hold
onto them by cycling regularly through a set of port numbers (94 in
the above example) rather than to keep occupying and releasing
buckets with single packet flows across a much larger number of
ports.
During such an attack, the coupled marking probability will have
saturated at 100%. Therefore, to hold a bucket, the rate of an attack
flow needs to be no less than the aging rate of each bucket: 4 Mb/s
by default. However, for an attack flow to be sure to hold on to its
bucket, it would need to send somewhat faster. Thus, an attack with
100 flows would need a total force of more than 100 * 4 Mb/s = 400
Mb/s.
This attack can be mitigated (but not prevented) by increasing the
number of buckets. The required attack force scales linearly with
the number of buckets, NBUCKETS. Therefore, if NBUCKETS were doubled
to 64, twice as many 4 Mb/s flows would be needed to maintain the
same impact on innocent flows.
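The linear scaling of the required attack force is one line of
arithmetic (illustrative Python; the function name is not from the
spec):

```python
def min_attack_force_bps(nbuckets, aging_Bps=2**19):
    """Lower bound on the aggregate sending rate [b/s] needed to keep
    every bucket's score topped up at least as fast as it ages, once
    the coupled marking probability has saturated at 100%."""
    return nbuckets * aging_Bps * 8  # one >= ~4 Mb/s flow per bucket
```

Keeping 100 buckets busy needs more than 100 * 2^19 B/s * 8, i.e.
just over 419 Mb/s, and doubling NBUCKETS doubles that requirement.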