rfc9969v1.txt   rfc9969.txt 
Internet Architecture Board (IAB) M. Nottingham Internet Architecture Board (IAB) M. Nottingham
Request for Comments: 9969 Request for Comments: 9969
Category: Informational S. Krishnan Category: Informational S. Krishnan
ISSN: 2070-1721 April 2026 ISSN: 2070-1721 May 2026
IAB AI-CONTROL Workshop Report Report from the IAB Workshop on AI-CONTROL
Abstract Abstract
The AI-CONTROL Workshop was convened by the Internet Architecture The AI-CONTROL Workshop was convened by the Internet Architecture
Board (IAB) in September 2024. This report summarizes its Board (IAB) in September 2024. This report summarizes its
significant points of discussion and identifies topics that may significant points of discussion and identifies topics that may
warrant further consideration and work. warrant further consideration and work.
Note that this document is a report on the proceedings of the Note that this document is a report on the proceedings of the
workshop. The views and positions documented in this report are workshop. The views and positions documented in this report are
skipping to change at line 88 skipping to change at line 88
1. Introduction 1. Introduction
The Internet Architecture Board (IAB) holds occasional workshops The Internet Architecture Board (IAB) holds occasional workshops
designed to consider long-term issues and strategies for the designed to consider long-term issues and strategies for the
Internet, and to suggest future directions for the Internet Internet, and to suggest future directions for the Internet
architecture. This long-term planning function of the IAB is architecture. This long-term planning function of the IAB is
complementary to the ongoing engineering efforts performed by working complementary to the ongoing engineering efforts performed by working
groups of the Internet Engineering Task Force (IETF). groups of the Internet Engineering Task Force (IETF).
The Internet is one of the major sources of data used to train large The Internet is one of the major sources of data used to train Large
language models (Large Language Models (LLMs) or, more generally, Language Models (LLMs) (or, more generally, Artificial Intelligence
Artificial Intelligence (AI)). Because this use was not envisioned (AI)). Because this use was not envisioned by most publishers of
by most publishers of information on the Internet, a means of information on the Internet, a means of expressing the owners'
expressing the owners' preferences regarding AI crawling has emerged, preferences regarding AI crawling has emerged, sometimes backed by
sometimes backed by law (e.g., in the European Union's AI Act law (e.g., in the European Union's AI Act [AI-ACT]).
[AI-ACT]).
The IAB convened the AI-CONTROL Workshop on 19-20 September 2024 to The IAB convened the AI-CONTROL Workshop on 19-20 September 2024 to
"explore practical opt-out mechanisms for AI and build an "explore practical opt-out mechanisms for AI and build an
understanding of use cases, requirements, and other considerations in understanding of use cases, requirements, and other considerations in
this space" [CFP]. In particular, the emerging practice of using the this space" [CFP]. In particular, the emerging practice of using the
Robots Exclusion Protocol [RFC9309] -- also known as "robots.txt" -- Robots Exclusion Protocol [RFC9309] -- also known as "robots.txt" --
has not been coordinated between AI crawlers, resulting in has not been coordinated between AI crawlers, resulting in
considerable differences in how they treat it. Furthermore, considerable differences in how they treat it. Furthermore,
robots.txt may or may not be a suitable way to control AI crawlers. robots.txt may or may not be a suitable way to control AI crawlers.
However, discussion was not limited to consideration of robots.txt, However, discussion was not limited to consideration of robots.txt,
skipping to change at line 167 skipping to change at line 166
whole industries. whole industries.
However, there was quick agreement that both viewpoints were harmed However, there was quick agreement that both viewpoints were harmed
by the current state of AI opt-out -- a situation where "no one is by the current state of AI opt-out -- a situation where "no one is
better off" (in the words of one participant). better off" (in the words of one participant).
Much of that dysfunction was attributed to the lack of coordination Much of that dysfunction was attributed to the lack of coordination
and standards for AI opt-out. Currently, content publishers need to and standards for AI opt-out. Currently, content publishers need to
consult with each AI vendor to understand how to opt out of training consult with each AI vendor to understand how to opt out of training
their products, as there is significant variance in each vendor's their products, as there is significant variance in each vendor's
behavior. Furthermore, publishers need to continually monitor for behavior. Furthermore, publishers need to continually monitor both
both new vendors and changes to the policies of the vendors they are new vendors and policy updates from the vendors they are aware of.
aware of.
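
As a rough illustration of the per-vendor burden described above, the following Python sketch uses the standard library's urllib.robotparser to evaluate a robots.txt file that opts out one crawler by name. The product tokens "ExampleAIBot" and "AnotherAIBot" and the example.com URL are hypothetical and are not names discussed at the workshop. The sketch only shows that a crawler the publisher has not listed falls through to the default group.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one AI crawler is opted out by its product token,
# everything else is covered by the default ("*") group.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


URL = "https://example.com/articles/1"  # placeholder URL
listed = is_allowed("ExampleAIBot", URL)    # crawler the publisher knows about
unlisted = is_allowed("AnotherAIBot", URL)  # crawler the publisher has not listed
print(listed, unlisted)
```

Under these assumptions, the listed crawler is refused while the unlisted one is still allowed, which is why a publisher would need to track each new vendor's token separately.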
Underlying those immediate issues, however, are significant Underlying those immediate issues, however, are significant
constraints that could be attributed to uncertainties in the legal constraints that could be attributed to uncertainties in the legal
context, the nature of AI, and the implications of needing to opt out context, the nature of AI, and the implications of needing to opt out
of crawling for it. of crawling for it.
2.1. Crawl Time vs. Inference Time 2.1. Crawl Time vs. Inference Time
Perhaps most significant is the "crawl time vs. inference time" Perhaps most significant is the "crawl time vs. inference time"
problem. Statements of preference are apparent at crawl time, bound problem. Statements of preference are apparent at crawl time, bound
skipping to change at line 244 skipping to change at line 242
Compounding this issue was the observation that preferences change Compounding this issue was the observation that preferences change
over time, whereas LLMs are created over long time frames and cannot over time, whereas LLMs are created over long time frames and cannot
easily be updated to reflect those changes. Of particular concern to easily be updated to reflect those changes. Of particular concern to
some was how this makes an opt-out regime "stickier" because content some was how this makes an opt-out regime "stickier" because content
that has no associated preference (such as that which predates the that has no associated preference (such as that which predates the
authors' knowledge of LLMs) is allowed to be used for these authors' knowledge of LLMs) is allowed to be used for these
unforeseen purposes. unforeseen purposes.
2.2. Trust 2.2. Trust
This disconnection between the statement of preferences and its Participants felt that the disconnection between the statement of
application was felt by participants to contribute to a lack of trust preferences and its application contributed to a lack of trust in the
in the ecosystem, along with the typical lack of attribution for data ecosystem, along with the typical lack of attribution for data
sources in LLMs, lack of an incentive for publishers to contribute sources in LLMs, a lack of an incentive for publishers to contribute
data, and finally (and most noted) lack of any means of monitoring data, and finally (and most noted) a lack of any means of monitoring
compliance with preferences. compliance with preferences.
This lack of trust led some participants to question whether This lack of trust led some participants to question whether
communicating preferences is sufficient in all cases without an communicating preferences is sufficient in all cases without an
accompanying way to enforce them, or even to audit adherence to them. accompanying way to enforce them, or even to audit adherence to them.
Some participants also indicated that a lack of trust was the primary Some participants also indicated that a lack of trust was the primary
cause of the increasingly prevalent blocking of AI crawler IP cause of the increasingly prevalent blocking of AI crawler IP
addresses, among other measures. addresses, among other measures.
2.3. Attachment 2.3. Attachment
skipping to change at line 301 skipping to change at line 299
2.3.2. Embedding 2.3.2. Embedding
Another mechanism for associating preferences with content is to Another mechanism for associating preferences with content is to
embed them into the content itself. Many formats used on the embed them into the content itself. Many formats used on the
Internet allow this; for example, HTML has the <meta> tag, images Internet allow this; for example, HTML has the <meta> tag, images
have Extensible Metadata Platform (XMP) and similar metadata have Extensible Metadata Platform (XMP) and similar metadata
sections, and XML and JSON have rich potential for extensions to sections, and XML and JSON have rich potential for extensions to
carry such data. carry such data.
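
A minimal sketch of how a consumer might read a preference embedded in HTML, using Python's standard html.parser module. The meta name "ai-preference" and the value "no-training" are assumptions made for illustration only; the report does not define any such vocabulary.

```python
from html.parser import HTMLParser


class PreferenceExtractor(HTMLParser):
    """Collect the content of <meta> tags carrying a (hypothetical) preference name."""

    def __init__(self, meta_name="ai-preference"):
        super().__init__()
        self.meta_name = meta_name
        self.preferences = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Attribute values can be None (e.g. a bare attribute), so default to "".
        if (attributes.get("name") or "").lower() == self.meta_name:
            self.preferences.append(attributes.get("content") or "")


def extract_preferences(html_text):
    """Return every embedded preference value found in html_text."""
    extractor = PreferenceExtractor()
    extractor.feed(html_text)
    extractor.close()
    return extractor.preferences


PAGE = """<!doctype html>
<html><head>
<meta charset="utf-8">
<meta name="ai-preference" content="no-training">
</head><body><p>Example article.</p></body></html>"""

found = extract_preferences(PAGE)
print(found)
```

Because the preference sits inside the document, it is available wherever the file ends up, provided that nothing rewrites the markup along the way.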
Embedded preferences were seen to have the advantage of granularity, Embedded preferences were seen to have the advantage of granularity,
and of "traveling with" content as it is produced, when it is moved and of "traveling with" content as it is produced, when the content
from site to site or when it is stored offline. that embeds the preferences is moved from site to site or when it is
stored offline.
However, several participants pointed out that embedded preferences However, several participants pointed out that embedded preferences
are easily stripped from most formats. This is a common practice for are easily stripped from most formats. This is a common practice for
reducing the size of a file (thereby improving performance when reducing the size of a file (thereby improving performance when
downloading it) and for assuring privacy (since metadata often leaks downloading it) and for assuring privacy (since metadata often leaks
information unintentionally). information unintentionally).
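
The stripping concern can be sketched with a JSON document. The "ai_preferences" field and the allow-list of known fields below are hypothetical; the sketch assumes a size-reducing step that keeps only the fields it recognizes, which is one plausible way embedded metadata gets lost.

```python
import json

# Hypothetical document with an embedded preference extension field.
DOCUMENT = {
    "title": "Example article",
    "body": "Text of the article.",
    "ai_preferences": {"training": "disallow"},
}

# Fields a (hypothetical) size-reducing pipeline knows about and keeps.
KNOWN_FIELDS = ("title", "body")


def minimize(record, keep=KNOWN_FIELDS):
    """Drop every top-level field that is not in the allow-list."""
    return {key: value for key, value in record.items() if key in keep}


original = json.dumps(DOCUMENT)
minimized = json.dumps(minimize(json.loads(original)))

# The output is smaller, but the embedded preference did not survive.
preference_survived = "ai_preferences" in json.loads(minimized)
print(len(original), len(minimized), preference_survived)
```

The pipeline in this sketch is not malicious; it simply has no reason to preserve a field it does not recognize.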
Furthermore, some types of content are not suitable for embedding. Furthermore, some types of content are not suitable for embedding.
For example, it is not possible to embed preferences into purely For example, it is not possible to embed preferences into purely
textual content, and web pages with content from several producers textual content, and web pages with content from several producers
 End of changes. 6 change blocks. 
19 lines changed or deleted 18 lines changed or added

This html diff was produced by rfcdiff 1.48.