| rfc9969v1.txt | rfc9969.txt | |||
|---|---|---|---|---|
| Internet Architecture Board (IAB) M. Nottingham | Internet Architecture Board (IAB) M. Nottingham | |||
| Request for Comments: 9969 | Request for Comments: 9969 | |||
| Category: Informational S. Krishnan | Category: Informational S. Krishnan | |||
| ISSN: 2070-1721 April 2026 | ISSN: 2070-1721 May 2026 | |||
| IAB AI-CONTROL Workshop Report | Report from the IAB Workshop on AI-CONTROL | |||
| Abstract | Abstract | |||
| The AI-CONTROL Workshop was convened by the Internet Architecture | The AI-CONTROL Workshop was convened by the Internet Architecture | |||
| Board (IAB) in September 2024. This report summarizes its | Board (IAB) in September 2024. This report summarizes its | |||
| significant points of discussion and identifies topics that may | significant points of discussion and identifies topics that may | |||
| warrant further consideration and work. | warrant further consideration and work. | |||
| Note that this document is a report on the proceedings of the | Note that this document is a report on the proceedings of the | |||
| workshop. The views and positions documented in this report are | workshop. The views and positions documented in this report are | |||
| skipping to change at line 88 | skipping to change at line 88 | |||
| 1. Introduction | 1. Introduction | |||
| The Internet Architecture Board (IAB) holds occasional workshops | The Internet Architecture Board (IAB) holds occasional workshops | |||
| designed to consider long-term issues and strategies for the | designed to consider long-term issues and strategies for the | |||
| Internet, and to suggest future directions for the Internet | Internet, and to suggest future directions for the Internet | |||
| architecture. This long-term planning function of the IAB is | architecture. This long-term planning function of the IAB is | |||
| complementary to the ongoing engineering efforts performed by working | complementary to the ongoing engineering efforts performed by working | |||
| groups of the Internet Engineering Task Force (IETF). | groups of the Internet Engineering Task Force (IETF). | |||
| The Internet is one of the major sources of data used to train large | The Internet is one of the major sources of data used to train Large | |||
| language models (Large Language Models (LLMs) or, more generally, | Language Models (LLMs) (or, more generally, Artificial Intelligence | |||
| Artificial Intelligence (AI)). Because this use was not envisioned | (AI)). Because this use was not envisioned by most publishers of | |||
| by most publishers of information on the Internet, a means of | information on the Internet, a means of expressing the owners' | |||
| expressing the owners' preferences regarding AI crawling has emerged, | preferences regarding AI crawling has emerged, sometimes backed by | |||
| sometimes backed by law (e.g., in the European Union's AI Act | law (e.g., in the European Union's AI Act [AI-ACT]). | |||
| [AI-ACT]). | ||||
| The IAB convened the AI-CONTROL Workshop on 19-20 September 2024 to | The IAB convened the AI-CONTROL Workshop on 19-20 September 2024 to | |||
| "explore practical opt-out mechanisms for AI and build an | "explore practical opt-out mechanisms for AI and build an | |||
| understanding of use cases, requirements, and other considerations in | understanding of use cases, requirements, and other considerations in | |||
| this space" [CFP]. In particular, the emerging practice of using the | this space" [CFP]. In particular, the emerging practice of using the | |||
| Robots Exclusion Protocol [RFC9309] -- also known as "robots.txt" -- | Robots Exclusion Protocol [RFC9309] -- also known as "robots.txt" -- | |||
| has not been coordinated between AI crawlers, resulting in | has not been coordinated between AI crawlers, resulting in | |||
| considerable differences in how they treat it. Furthermore, | considerable differences in how they treat it. Furthermore, | |||
| robots.txt may or may not be a suitable way to control AI crawlers. | robots.txt may or may not be a suitable way to control AI crawlers. | |||
| However, discussion was not limited to consideration of robots.txt, | However, discussion was not limited to consideration of robots.txt, | |||
| skipping to change at line 167 | skipping to change at line 166 | |||
| whole industries. | whole industries. | |||
| However, there was quick agreement that both viewpoints were harmed | However, there was quick agreement that both viewpoints were harmed | |||
| by the current state of AI opt-out -- a situation where "no one is | by the current state of AI opt-out -- a situation where "no one is | |||
| better off" (in the words of one participant). | better off" (in the words of one participant). | |||
| Much of that dysfunction was attributed to the lack of coordination | Much of that dysfunction was attributed to the lack of coordination | |||
| and standards for AI opt-out. Currently, content publishers need to | and standards for AI opt-out. Currently, content publishers need to | |||
| consult with each AI vendor to understand how to opt out of training | consult with each AI vendor to understand how to opt out of training | |||
| their products, as there is significant variance in each vendor's | their products, as there is significant variance in each vendor's | |||
| behavior. Furthermore, publishers need to continually monitor for | behavior. Furthermore, publishers need to continually monitor both | |||
| both new vendors and changes to the policies of the vendors they are | new vendors and policy updates from the vendors they are aware of. | |||
| aware of. | ||||
| Underlying those immediate issues, however, are significant | Underlying those immediate issues, however, are significant | |||
| constraints that could be attributed to uncertainties in the legal | constraints that could be attributed to uncertainties in the legal | |||
| context, the nature of AI, and the implications of needing to opt out | context, the nature of AI, and the implications of needing to opt out | |||
| of crawling for it. | of crawling for it. | |||
| 2.1. Crawl Time vs. Inference Time | 2.1. Crawl Time vs. Inference Time | |||
| Perhaps most significant is the "crawl time vs. inference time" | Perhaps most significant is the "crawl time vs. inference time" | |||
| problem. Statements of preference are apparent at crawl time, bound | problem. Statements of preference are apparent at crawl time, bound | |||
| skipping to change at line 244 | skipping to change at line 242 | |||
| Compounding this issue was the observation that preferences change | Compounding this issue was the observation that preferences change | |||
| over time, whereas LLMs are created over long time frames and cannot | over time, whereas LLMs are created over long time frames and cannot | |||
| easily be updated to reflect those changes. Of particular concern to | easily be updated to reflect those changes. Of particular concern to | |||
| some was how this makes an opt-out regime "stickier" because content | some was how this makes an opt-out regime "stickier" because content | |||
| that has no associated preference (such as that which predates the | that has no associated preference (such as that which predates the | |||
| authors' knowledge of LLMs) is allowed to be used for these | authors' knowledge of LLMs) is allowed to be used for these | |||
| unforeseen purposes. | unforeseen purposes. | |||
| 2.2. Trust | 2.2. Trust | |||
| This disconnection between the statement of preferences and its | Participants felt that the disconnection between the statement of | |||
| application was felt by participants to contribute to a lack of trust | preferences and its application contribute to a lack of trust in the | |||
| in the ecosystem, along with the typical lack of attribution for data | ecosystem, along with the typical lack of attribution for data | |||
| sources in LLMs, lack of an incentive for publishers to contribute | sources in LLMs, a lack of an incentive for publishers to contribute | |||
| data, and finally (and most noted) lack of any means of monitoring | data, and finally (and most noted) a lack of any means of monitoring | |||
| compliance with preferences. | compliance with preferences. | |||
| This lack of trust led some participants to question whether | This lack of trust led some participants to question whether | |||
| communicating preferences is sufficient in all cases without an | communicating preferences is sufficient in all cases without an | |||
| accompanying way to enforce them, or even to audit adherence to them. | accompanying way to enforce them, or even to audit adherence to them. | |||
| Some participants also indicated that a lack of trust was the primary | Some participants also indicated that a lack of trust was the primary | |||
| cause of the increasingly prevalent blocking of AI crawler IP | cause of the increasingly prevalent blocking of AI crawler IP | |||
| addresses, among other measures. | addresses, among other measures. | |||
| 2.3. Attachment | 2.3. Attachment | |||
| skipping to change at line 301 | skipping to change at line 299 | |||
| 2.3.2. Embedding | 2.3.2. Embedding | |||
| Another mechanism for associating preferences with content is to | Another mechanism for associating preferences with content is to | |||
| embed them into the content itself. Many formats used on the | embed them into the content itself. Many formats used on the | |||
| Internet allow this; for example, HTML has the <meta> tag, images | Internet allow this; for example, HTML has the <meta> tag, images | |||
| have Extensible Metadata Platform (XMP) and similar metadata | have Extensible Metadata Platform (XMP) and similar metadata | |||
| sections, and XML and JSON have rich potential for extensions to | sections, and XML and JSON have rich potential for extensions to | |||
| carry such data. | carry such data. | |||
| Embedded preferences were seen to have the advantage of granularity, | Embedded preferences were seen to have the advantage of granularity, | |||
| and of "traveling with" content as it is produced, when it is moved | and of "traveling with" content as it is produced, when the content | |||
| from site to site or when it is stored offline. | that embeds the preferences is moved from site to site or when it is | |||
| stored offline. | ||||
| However, several participants pointed out that embedded preferences | However, several participants pointed out that embedded preferences | |||
| are easily stripped from most formats. This is a common practice for | are easily stripped from most formats. This is a common practice for | |||
| reducing the size of a file (thereby improving performance when | reducing the size of a file (thereby improving performance when | |||
| downloading it) and for assuring privacy (since metadata often leaks | downloading it) and for assuring privacy (since metadata often leaks | |||
| information unintentionally). | information unintentionally). | |||
| Furthermore, some types of content are not suitable for embedding. | Furthermore, some types of content are not suitable for embedding. | |||
| For example, it is not possible to embed preferences into purely | For example, it is not possible to embed preferences into purely | |||
| textual content, and web pages with content from several producers | textual content, and web pages with content from several producers | |||
| End of changes. 6 change blocks. | ||||
| 19 lines changed or deleted | 18 lines changed or added | |||
| This html diff was produced by rfcdiff 1.48. | ||||