Diff: rfc9969.original

	rfc9969.original	rfc9969.txt


	Network Working Group M. Nottingham	Internet Architecture Board (IAB) M. Nottingham
	Internet-Draft	Request for Comments: 9969
	Intended status: Informational S. Krishnan	Category: Informational S. Krishnan
	Expires: 10 March 2026 6 September 2025	ISSN: 2070-1721 May 2026


	IAB AI-CONTROL Workshop Report	Report from the IAB Workshop on AI-CONTROL
	draft-iab-ai-control-report-02

	Abstract	Abstract

	The AI-CONTROL Workshop was convened by the Internet Architecture	The AI-CONTROL Workshop was convened by the Internet Architecture
	Board (IAB) in September 2024. This report summarizes its	Board (IAB) in September 2024. This report summarizes its
	significant points of discussion and identifies topics that may	significant points of discussion and identifies topics that may
	warrant further consideration and work.	warrant further consideration and work.

	Note that this document is a report on the proceedings of the	Note that this document is a report on the proceedings of the
	workshop. The views and positions documented in this report are	workshop. The views and positions documented in this report are
	those of the workshop participants and do not necessarily reflect IAB	those of the workshop participants and do not necessarily reflect IAB
	views and positions.	views and positions.


	Discussion Venues

	This note is to be removed before publishing as an RFC.

	Source for this draft and an issue tracker can be found at
	https://github.com/intarchboard/draft-iab-ai-control-report.

	Status of This Memo	Status of This Memo


	This Internet-Draft is submitted in full conformance with the	This document is not an Internet Standards Track specification; it is
	provisions of BCP 78 and BCP 79.	published for informational purposes.

	Internet-Drafts are working documents of the Internet Engineering
	Task Force (IETF). Note that other groups may also distribute
	working documents as Internet-Drafts. The list of current Internet-
	Drafts is at https://datatracker.ietf.org/drafts/current/.


	Internet-Drafts are draft documents valid for a maximum of six months	This document is a product of the Internet Architecture Board (IAB)
	and may be updated, replaced, or obsoleted by other documents at any	and represents information that the IAB has deemed valuable to
	time. It is inappropriate to use Internet-Drafts as reference	provide for permanent record. It represents the consensus of the
	material or to cite them other than as "work in progress."	Internet Architecture Board (IAB). Documents approved for
		publication by the IAB are not candidates for any level of Internet
		Standard; see Section 2 of RFC 7841.


	This Internet-Draft will expire on 10 March 2026.	Information about the current status of this document, any errata,
		and how to provide feedback on it may be obtained at
		https://www.rfc-editor.org/info/rfc9969.

	Copyright Notice	Copyright Notice


	Copyright (c) 2025 IETF Trust and the persons identified as the	Copyright (c) 2026 IETF Trust and the persons identified as the
	document authors. All rights reserved.	document authors. All rights reserved.

	This document is subject to BCP 78 and the IETF Trust's Legal	This document is subject to BCP 78 and the IETF Trust's Legal

	Provisions Relating to IETF Documents (https://trustee.ietf.org/	Provisions Relating to IETF Documents
	license-info) in effect on the date of publication of this document.	(https://trustee.ietf.org/license-info) in effect on the date of
	Please review these documents carefully, as they describe your rights	publication of this document. Please review these documents
	and restrictions with respect to this document.	carefully, as they describe your rights and restrictions with respect
		to this document.

	Table of Contents	Table of Contents


	1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2	1. Introduction
	1.1. Chatham House Rule . . . . . . . . . . . . . . . . . . . 3	1.1. Chatham House Rule
	1.2. Views Expressed in this Report . . . . . . . . . . . . . 3	1.2. Views Expressed in This Report
	2. Workshop Scope and Discussion . . . . . . . . . . . . . . . . 4	2. Workshop Scope and Discussion
	2.1. Crawl Time vs. Inference Time . . . . . . . . . . . . . . 5	2.1. Crawl Time vs. Inference Time
	2.1.1. Multiple Uses for Crawl Data . . . . . . . . . . . . 5	2.1.1. Multiple Uses for Crawl Data
	2.1.2. Application of Preferences . . . . . . . . . . . . . 5	2.1.2. Application of Preferences
	2.2. Trust . . . . . . . . . . . . . . . . . . . . . . . . . . 6	2.2. Trust
	2.3. Attachment . . . . . . . . . . . . . . . . . . . . . . . 6	2.3. Attachment
	2.3.1. robots.txt (and similar) . . . . . . . . . . . . . . 6	2.3.1. robots.txt (and Similar)
	2.3.2. Embedding . . . . . . . . . . . . . . . . . . . . . . 7	2.3.2. Embedding
	2.3.3. Registries . . . . . . . . . . . . . . . . . . . . . 8	2.3.3. Registries
	2.4. Vocabulary . . . . . . . . . . . . . . . . . . . . . . . 8	2.4. Vocabulary
	3. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 8	3. Conclusions
	3.1. Potential Standards Work . . . . . . . . . . . . . . . . 9	3.1. Potential Standards Work
	3.1.1. Out of Initial Scope . . . . . . . . . . . . . . . . 9	3.1.1. Out of Initial Scope
	4. Security Considerations . . . . . . . . . . . . . . . . . . . 9	4. IANA Considerations
	5. Informative References . . . . . . . . . . . . . . . . . . . 9	5. Security Considerations
	Appendix A. About the Workshop . . . . . . . . . . . . . . . . . 10	6. Informative References
	A.1. Agenda . . . . . . . . . . . . . . . . . . . . . . . . . 10	Appendix A. About the Workshop
	A.1.1. Thursday 2024-09-19 . . . . . . . . . . . . . . . . . 11	A.1. Agenda
	A.1.2. Friday 2024-09-20 . . . . . . . . . . . . . . . . . . 11	A.1.1. Thursday, 2024-09-19
	A.2. Attendees . . . . . . . . . . . . . . . . . . . . . . . . 11	A.1.2. Friday, 2024-09-20
	IAB Members at the Time of Approval . . . . . . . . . . . . . . . 12	A.2. Attendees
	Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 12	IAB Members at the Time of Approval
	Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 13	Acknowledgements
		Authors' Addresses

	1. Introduction	1. Introduction

	The Internet Architecture Board (IAB) holds occasional workshops	The Internet Architecture Board (IAB) holds occasional workshops
	designed to consider long-term issues and strategies for the	designed to consider long-term issues and strategies for the
	Internet, and to suggest future directions for the Internet	Internet, and to suggest future directions for the Internet
	architecture. This long-term planning function of the IAB is	architecture. This long-term planning function of the IAB is
	complementary to the ongoing engineering efforts performed by working	complementary to the ongoing engineering efforts performed by working
	groups of the Internet Engineering Task Force (IETF).	groups of the Internet Engineering Task Force (IETF).


	The Internet is one of the major sources of data used to train large	The Internet is one of the major sources of data used to train Large
	language models (Large Language Models (LLMs), or more generally,	Language Models (LLMs) (or, more generally, Artificial Intelligence
	"Artificial Intelligence (AI)"). Because this use was not envisioned	(AI)). Because this use was not envisioned by most publishers of
	by most publishers of information on the Internet, a means of	information on the Internet, a means of expressing the owners'
	expressing the owners' preferences regarding AI crawling has emerged,	preferences regarding AI crawling has emerged, sometimes backed by
	sometimes backed by law (e.g., in the European Union's AI Act	law (e.g., in the European Union's AI Act [AI-ACT]).
	[AI-ACT]).

	The IAB convened the AI-CONTROL Workshop on 19-20 September 2024 to	The IAB convened the AI-CONTROL Workshop on 19-20 September 2024 to
	"explore practical opt-out mechanisms for AI and build an	"explore practical opt-out mechanisms for AI and build an
	understanding of use cases, requirements, and other considerations in	understanding of use cases, requirements, and other considerations in
	this space" [CFP]. In particular, the emerging practice of using the	this space" [CFP]. In particular, the emerging practice of using the
	Robots Exclusion Protocol [RFC9309] -- also known as "robots.txt" --	Robots Exclusion Protocol [RFC9309] -- also known as "robots.txt" --
	has not been coordinated between AI crawlers, resulting in	has not been coordinated between AI crawlers, resulting in
	considerable differences in how they treat it. Furthermore,	considerable differences in how they treat it. Furthermore,
	robots.txt may or may not be a suitable way to control AI crawlers.	robots.txt may or may not be a suitable way to control AI crawlers.
	However, discussion was not limited to consideration of robots.txt,	However, discussion was not limited to consideration of robots.txt,

	skipping to change at page 3, line 39 ¶	skipping to change at line 121 ¶

	Participants agreed to conduct the workshop under the Chatham House	Participants agreed to conduct the workshop under the Chatham House
	Rule [CHATHAM-HOUSE], so this report does not attribute statements to	Rule [CHATHAM-HOUSE], so this report does not attribute statements to
	individuals or organizations without express permission. Most	individuals or organizations without express permission. Most
	submissions to the workshop were public and thus attributable; they	submissions to the workshop were public and thus attributable; they
	are used here to provide substance and context.	are used here to provide substance and context.

	Appendix A.2 lists the workshop participants, unless they requested	Appendix A.2 lists the workshop participants, unless they requested
	that this information be withheld.	that this information be withheld.


	1.2. Views Expressed in this Report	1.2. Views Expressed in This Report

	This document is a report on the proceedings of the workshop. The	This document is a report on the proceedings of the workshop. The
	views and positions documented in this report were expressed during	views and positions documented in this report were expressed during

	the workshop by participants and do not necessarily reflect IAB's	the workshop by participants and do not necessarily reflect the IAB's
	views and positions.	views and positions.

	Furthermore, the content of the report comes from presentations given	Furthermore, the content of the report comes from presentations given
	by workshop participants and notes taken during the discussions,	by workshop participants and notes taken during the discussions,
	without interpretation or validation. Thus, the content of this	without interpretation or validation. Thus, the content of this

	report follows the flow and dialogue of the workshop but does not	report follows the flow and dialog of the workshop but does not
	attempt to capture a consensus.	attempt to capture a consensus.

	2. Workshop Scope and Discussion	2. Workshop Scope and Discussion

	The workshop began by surveying the state of AI control.	The workshop began by surveying the state of AI control.

	Currently, Internet publishers express their preferences for how	Currently, Internet publishers express their preferences for how

	their content is treated for purposes of AI training using a variety	their content is treated for the purposes of AI training using a
	of mechanisms, including declarative ones, such as terms of service,	variety of mechanisms. These include declarative mechanisms, such as
	embedded metadata, and robots.txt [RFC9309], and active ones, such as	terms of service, embedded metadata, and robots.txt [RFC9309], as
	use of paywalls and selective blocking of crawlers (e.g., by IP	well as active mechanisms, such as use of paywalls and selective
	address, User-Agent).	blocking of crawlers (e.g., by IP address or User-Agent).

	There was disagreement about the implications of AI opt-out overall.	There was disagreement about the implications of AI opt-out overall.
	Research presented at the workshop [DECLINE] indicates that the use	Research presented at the workshop [DECLINE] indicates that the use
	of such controls is becoming more prevalent, reducing the	of such controls is becoming more prevalent, reducing the
	availability of data to AI (for purposes including training and	availability of data to AI (for purposes including training and
	inference-time usage). Some of the participants expressed concern	inference-time usage). Some of the participants expressed concern
	about the implications of this -- although at least one AI vendor	about the implications of this -- although at least one AI vendor
	seemed less concerned by this, indicating that "there are plenty of	seemed less concerned by this, indicating that "there are plenty of
	tokens available" for training, even if many opt out. Others	tokens available" for training, even if many opt out. Others
	expressed a need to opt out of AI training because of how they	expressed a need to opt out of AI training because of how they

	skipping to change at page 4, line 37 ¶	skipping to change at line 166 ¶
	whole industries.	whole industries.

	However, there was quick agreement that both viewpoints were harmed	However, there was quick agreement that both viewpoints were harmed
	by the current state of AI opt-out -- a situation where "no one is	by the current state of AI opt-out -- a situation where "no one is
	better off" (in the words of one participant).	better off" (in the words of one participant).

	Much of that dysfunction was attributed to the lack of coordination	Much of that dysfunction was attributed to the lack of coordination
	and standards for AI opt-out. Currently, content publishers need to	and standards for AI opt-out. Currently, content publishers need to
	consult with each AI vendor to understand how to opt out of training	consult with each AI vendor to understand how to opt out of training
	their products, as there is significant variance in each vendor's	their products, as there is significant variance in each vendor's

	behaviour. Furthermore, publishers need to continually monitor both	behavior. Furthermore, publishers need to continually monitor both
	for new vendors, and for changes to the policies of the vendors they	new vendors and policy updates from the vendors they are aware of.
	are aware of.

	Underlying those immediate issues, however, are significant	Underlying those immediate issues, however, are significant
	constraints that could be attributed to uncertainties in the legal	constraints that could be attributed to uncertainties in the legal
	context, the nature of AI, and the implications of needing to opt out	context, the nature of AI, and the implications of needing to opt out
	of crawling for it.	of crawling for it.

	2.1. Crawl Time vs. Inference Time	2.1. Crawl Time vs. Inference Time

	Perhaps most significant is the "crawl time vs. inference time"	Perhaps most significant is the "crawl time vs. inference time"
	problem. Statements of preference are apparent at crawl time, bound	problem. Statements of preference are apparent at crawl time, bound
	to content either by location (e.g., robots.txt) or embedded inside	to content either by location (e.g., robots.txt) or embedded inside
	the content itself as metadata. However, the target of those	the content itself as metadata. However, the target of those
	directives is often disassociated from the crawler, either because	directives is often disassociated from the crawler, either because

	the crawl data is not only used for training AI models, or because	the crawl data is not only used for training AI models or because the
	the preferences could be applicable at inference time.	preferences could be applicable at inference time.

	2.1.1. Multiple Uses for Crawl Data	2.1.1. Multiple Uses for Crawl Data

	A crawl's data might have multiple uses because the vendor also has	A crawl's data might have multiple uses because the vendor also has

	another product that uses it (e.g., a search engine), or because the	another product that uses it (e.g., a search engine) or because the
	crawl is performed by a party other than the AI vendor. Both are	crawl is performed by a party other than the AI vendor. Both are

	very common patterns: operators of many Internet search engines also	very common patterns: Operators of many Internet search engines also
	train AI models, and many AI models use third-party crawl data. In	train AI models, and many AI models use third-party crawl data. In
	either case, conflating different uses can change the incentives for	either case, conflating different uses can change the incentives for
	publishers to cooperate with the crawler.	publishers to cooperate with the crawler.


	Well-established uses of crawling, such as Internet search, were seen	Well-established uses of crawling, such as Internet searches, were
	by participants as at least partially aligned with the interests of	seen by participants as at least partially aligned with the interests
	publishers: they allow their sites to be crawled, and in return, they	of publishers: They allow their sites to be crawled, and in return,
	receive higher traffic and attention due to being in the search	they receive higher traffic and attention due to being in the search
	index. However, several participants pointed out that this symbiotic	index. However, several participants pointed out that this symbiotic
	relationship does not exist for AI training uses -- with some viewing	relationship does not exist for AI training uses -- with some viewing

	AI as hostile to publishers, because it has the capacity to take	AI as hostile to publishers because it has the capacity to take
	traffic away from their sites.	traffic away from their sites.

	Therefore, when a crawler has multiple uses that include AI,	Therefore, when a crawler has multiple uses that include AI,
	participants observed that "collateral damage" was likely for non-AI	participants observed that "collateral damage" was likely for non-AI
	uses, especially when publishers take more active control measures,	uses, especially when publishers take more active control measures,
	such as blocking or paywalls, to protect their interests.	such as blocking or paywalls, to protect their interests.

	Several participants expressed concerns about this phenomenon's	Several participants expressed concerns about this phenomenon's
	effects on the ecosystem, effectively "locking down the Web", with one	effects on the ecosystem, effectively "locking down the Web", with one
	opining that there were implications for freedom of expression	opining that there were implications for freedom of expression
	overall.	overall.

	2.1.2. Application of Preferences	2.1.2. Application of Preferences

	When data is used to train an LLM, the resulting model does not have	When data is used to train an LLM, the resulting model does not have
	the ability to only selectively use a portion of it when performing a	the ability to only selectively use a portion of it when performing a

	task, because inference uses the whole model, and it is not possible	task because inference uses the whole model, and it is not possible
	to identify specific input data for its use in doing so.	to identify specific input data for its use in doing so.

	This means that while publishers' preferences may be available when	This means that while publishers' preferences may be available when
	content is crawled, they generally are not when inference takes	content is crawled, they generally are not when inference takes
	place. Those preferences that are stated in reference to use by AI	place. Those preferences that are stated in reference to use by AI

	-- for example, "no military uses" or "non-commercial only" cannot be	-- for example, "no military uses" or "non-commercial only" -- cannot
	applied by a general-purpose "foundation" model.	be applied by a general-purpose "foundation" model.

	This leaves a few unappealing choices to AI vendors that wish to	This leaves a few unappealing choices to AI vendors that wish to
	comply with those preferences. They can simply omit such data from	comply with those preferences. They can simply omit such data from

	foundation models, thereby reducing their viability. Or, they can	foundation models, thereby reducing their viability. Or they can
	create a separate model for each permutation of preferences -- with a	create a separate model for each permutation of preferences -- with a
	likely proliferation of models as the set of permutations expands.	likely proliferation of models as the set of permutations expands.

	Compounding this issue was the observation that preferences change	Compounding this issue was the observation that preferences change
	over time, whereas LLMs are created over long time frames and cannot	over time, whereas LLMs are created over long time frames and cannot
	easily be updated to reflect those changes. Of particular concern to	easily be updated to reflect those changes. Of particular concern to
	some was how this makes an opt-out regime "stickier" because content	some was how this makes an opt-out regime "stickier" because content
	that has no associated preference (such as that which predates the	that has no associated preference (such as that which predates the
	authors' knowledge of LLMs) is allowed to be used for these	authors' knowledge of LLMs) is allowed to be used for these
	unforeseen purposes.	unforeseen purposes.

	2.2. Trust	2.2. Trust


	This disconnection between the statement of preferences and its	Participants felt that the disconnection between the statement of
	application was felt by participants to contribute to a lack of trust	preferences and its application contributes to a lack of trust in the
	in the ecosystem, along with the typical lack of attribution for data	ecosystem, along with the typical lack of attribution for data
	sources in LLMs, lack of an incentive for publishers to contribute	sources in LLMs, a lack of an incentive for publishers to contribute
	data, and finally (and most noted) a lack of any means of monitoring	data, and finally (and most noted) a lack of any means of monitoring
	compliance with preferences.	compliance with preferences.

	This lack of trust led some participants to question whether	This lack of trust led some participants to question whether
	communicating preferences is sufficient in all cases without an	communicating preferences is sufficient in all cases without an
	accompanying way to enforce them, or even to audit adherence to them.	accompanying way to enforce them, or even to audit adherence to them.
	Some participants also indicated that a lack of trust was the primary	Some participants also indicated that a lack of trust was the primary
	cause of the increasingly prevalent blocking of AI crawler IP	cause of the increasingly prevalent blocking of AI crawler IP
	addresses, among other measures.	addresses, among other measures.

	2.3. Attachment	2.3. Attachment


	One of the primary focuses of the workshop was on _attachment_ -- how	One of the primary focuses of the workshop was on _attachment_, i.e.,
	preferences are associated with content on the Internet. A range of	how preferences are associated with content on the Internet. A range
	mechanisms was discussed.	of mechanisms was discussed.


	2.3.1. robots.txt (and similar)	2.3.1. robots.txt (and Similar)


	The Robots Exclusion Protocol [RFC9309] is widely recognised by AI	The Robots Exclusion Protocol [RFC9309] is widely recognized by AI
	vendors as an attachment mechanism for preferences. Several	vendors as an attachment mechanism for preferences. Several
	deficiencies were discussed.	deficiencies were discussed.

	First, it does not scale to offer granular control over large sites	First, it does not scale to offer granular control over large sites
	where authors might want to express different policies for a range of	where authors might want to express different policies for a range of
	content (for example, YouTube).	content (for example, YouTube).


	Robots.txt is also typically under the control of the site	robots.txt is also typically under the control of the site
	administrator. If a site has content from many creators (as is often	administrator. If a site has content from many creators (as is often
	the case for social media and similar platforms), the administrator	the case for social media and similar platforms), the administrator
	may not allow them to express their preferences fully, or at all.	may not allow them to express their preferences fully, or at all.

	If content is copied or moved to a different site, the preferences at	If content is copied or moved to a different site, the preferences at

	the new site need to be explicitly transferred, because robots.txt is	the new site need to be explicitly transferred because robots.txt is
	a separate resource.	a separate resource.

	These deficiencies led many participants to feel that robots.txt	These deficiencies led many participants to feel that robots.txt

	cannot be the only solution to opt-out: rather, it should be part of	cannot be the only solution to opt-out: Rather, it should be part of
	a larger system that addresses its shortcomings.	a larger system that addresses its shortcomings.


	Participants noted that other, similar attachment mechanisms have	Participants noted that other similar attachment mechanisms have been
	been proposed. However, none appear to have gained as much attention	proposed. However, none appear to have gained as much attention or
	or implementation (both by AI vendors and content owners) as	implementation (both by AI vendors and content owners) as robots.txt.
	robots.txt.
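
As a rough illustration of attachment by location, the following Python sketch evaluates a robots.txt file for two crawlers using the standard library's urllib.robotparser module. The product tokens (ExampleAIBot, ExampleSearchBot), the paths, and the URLs are hypothetical and were not discussed at the workshop; the helper name allowed() is likewise invented for this sketch.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the product tokens and paths are invented
# for illustration and do not correspond to any real crawler.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt above permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/1"
    print("ExampleAIBot:", allowed("ExampleAIBot", page))
    print("ExampleSearchBot:", allowed("ExampleSearchBot", page))
```

Note that the rules are keyed only by crawler token and URL path. Nothing in this sketch lets an individual author on a shared site state a preference, which is one of the deficiencies described above.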

	2.3.2. Embedding	2.3.2. Embedding

	Another mechanism for associating preferences with content is to	Another mechanism for associating preferences with content is to
	embed them into the content itself. Many formats used on the	embed them into the content itself. Many formats used on the
	Internet allow this; for example, HTML has the <meta> tag, images	Internet allow this; for example, HTML has the <meta> tag, images

	have XMP and similar metadata sections, and XML and JSON have rich	have Extensible Metadata Platform (XMP) and similar metadata
	potential for extensions to carry such data.	sections, and XML and JSON have rich potential for extensions to
		carry such data.
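
A minimal sketch of how a consumer might read an embedded preference from HTML follows, using Python's standard html.parser module. The metadata name "example-ai-preference" and the value "no-training" are assumptions made purely for illustration; the workshop did not agree on any such name or vocabulary.

```python
from html.parser import HTMLParser
from typing import List

# Hypothetical metadata name; no such name was agreed on at the workshop.
PREFERENCE_META_NAME = "example-ai-preference"


class PreferenceExtractor(HTMLParser):
    """Collect the content of every <meta> tag carrying the hypothetical name."""

    def __init__(self) -> None:
        super().__init__()
        self.preferences: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        # Attribute values may be None when written without a value.
        if (attr_map.get("name") or "").lower() == PREFERENCE_META_NAME:
            self.preferences.append(attr_map.get("content") or "")


def extract_preferences(html: str) -> List[str]:
    """Return all embedded preference values found in the HTML text."""
    extractor = PreferenceExtractor()
    extractor.feed(html)
    return extractor.preferences


if __name__ == "__main__":
    doc = '<html><head><meta name="example-ai-preference" content="no-training"></head></html>'
    print(extract_preferences(doc))
```

If the <meta> element is removed by an optimization or privacy tool, extract_preferences() returns an empty list and the preference is lost with it.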

	Embedded preferences were seen to have the advantage of granularity,	Embedded preferences were seen to have the advantage of granularity,

	and of "travelling with" content as it is produced, when it is moved	and of "traveling with" content as it is produced, when the content
	from site to site, or when it is stored offline.	that embeds the preferences is moved from site to site or when it is
		stored offline.

	However, several participants pointed out that embedded preferences	However, several participants pointed out that embedded preferences
	are easily stripped from most formats. This is a common practice for	are easily stripped from most formats. This is a common practice for
	reducing the size of a file (thereby improving performance when	reducing the size of a file (thereby improving performance when

	downloading it), and for assuring privacy (since metadata often leaks	downloading it) and for assuring privacy (since metadata often leaks
	information unintentionally).	information unintentionally).

	Furthermore, some types of content are not suitable for embedding.	Furthermore, some types of content are not suitable for embedding.
	For example, it is not possible to embed preferences into purely	For example, it is not possible to embed preferences into purely

	textual content, and Web pages with content from several producers	textual content, and web pages with content from several producers
	(such as a social media or comments feed) cannot easily reflect	(such as social media or comment feeds) cannot easily reflect
	preferences for each one.	preferences for each one.

	Participants noted that the means of embedding preferences in many	Participants noted that the means of embedding preferences in many
	formats would need to be determined by or coordinated with	formats would need to be determined by or coordinated with

	organisations outside the IETF. For example, HTML and many image	organizations outside the IETF. For example, HTML and many image
	formats are maintained by external bodies.	formats are maintained by external bodies.

	2.3.3. Registries	2.3.3. Registries

	In some existing copyright management regimes, it is already common	In some existing copyright management regimes, it is already common
	to have a registry of works that is consulted upon use. For example,	to have a registry of works that is consulted upon use. For example,
	this approach is often used for photographs, music, and video.	this approach is often used for photographs, music, and video.

	Typically, registries use hashing mechanisms to create a	Typically, registries use hashing mechanisms to create a
	"fingerprint" for the content that is robust to changes.	"fingerprint" for the content that is robust to changes.
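
The idea of a fingerprint-keyed registry can be sketched in Python as follows. This is a deliberately simplified stand-in: it hashes normalized text with SHA-256, so it tolerates only changes in case, Unicode form, and whitespace, whereas registries for photographs, music, and video would be expected to use perceptual fingerprints that survive re-encoding. The Registry class, its method names, and the preference value are hypothetical.

```python
import hashlib
import unicodedata
from typing import Dict, Optional


def fingerprint(text: str) -> str:
    """Hash text after normalizing Unicode form, case, and whitespace."""
    normalized = unicodedata.normalize("NFKC", text).casefold()
    collapsed = " ".join(normalized.split())
    return hashlib.sha256(collapsed.encode("utf-8")).hexdigest()


class Registry:
    """Hypothetical in-memory registry mapping fingerprints to preferences."""

    def __init__(self) -> None:
        self._preferences: Dict[str, str] = {}

    def register(self, text: str, preference: str) -> None:
        self._preferences[fingerprint(text)] = preference

    def lookup(self, text: str) -> Optional[str]:
        # The lookup depends only on the content, not on where it was found.
        return self._preferences.get(fingerprint(text))


if __name__ == "__main__":
    registry = Registry()
    registry.register("Hello,   World!", "no-training")
    print(registry.lookup("hello, world!"))
    print(registry.lookup("Something else"))
```

Because the key is derived from the content alone, a copy found at a different location resolves to the same entry, provided the change falls within what the normalization tolerates.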


	Using a registry decouples the content in question from its location,	Using a registry decouples the content in question from its location
	so that it can be found even if moved. It is also claimed to be	so that it can be found even if moved. It is also claimed to be
	robust against stripping of embedded metadata, which is a common	robust against stripping of embedded metadata, which is a common
	practice to improve performance and/or privacy.	practice to improve performance and/or privacy.

	However, several participants pointed out issues with deploying	However, several participants pointed out issues with deploying

	registries at Internet scale. While they may be effective for	registries at the scale of the Internet. While they may be effective
	(relatively) closed and well-known ecosystems such as commercial	for (relatively) closed and well-known ecosystems, such as commercial
	music publishing, applying them to a diverse and very large ecosystem	music publishing, applying them to a diverse and very large ecosystem
	like the Internet has proven problematic.	like the Internet has proven problematic.

	2.4. Vocabulary	2.4. Vocabulary

	Another major focus area for the workshop was on _vocabulary_ -- the	Another major focus area for the workshop was on _vocabulary_ -- the
	specific semantics of the opt-out signal. Several participants noted	specific semantics of the opt-out signal. Several participants noted
	that there are already many proposals for vocabularies, as well as	that there are already many proposals for vocabularies, as well as
	many conflicting vocabularies already in use. Several examples were	many conflicting vocabularies already in use. Several examples were
	discussed, including where existing terms were ambiguous, did not	discussed, including where existing terms were ambiguous, did not

	skipping to change at page 8, line 48 ¶	skipping to change at line 358 ¶
	different actors.	different actors.

	Although no conclusions regarding exact vocabulary were reached, it	Although no conclusions regarding exact vocabulary were reached, it
	was generally agreed that a complex vocabulary is unlikely to	was generally agreed that a complex vocabulary is unlikely to
	succeed.	succeed.

	3. Conclusions	3. Conclusions

	Participants generally agreed that on its current path, the ecosystem	Participants generally agreed that on its current path, the ecosystem
	is not sustainable. As one remarked, "robots.txt is broken and we	is not sustainable. As one remarked, "robots.txt is broken and we

	broke it."	broke it".

	Legal uncertainty, along with fundamental limitations of opt-out	Legal uncertainty, along with fundamental limitations of opt-out
	regimes pointed out above, limits the effectiveness of any technical	regimes pointed out above, limits the effectiveness of any technical
	solution, which will be operating in a system unlike either	solution, which will be operating in a system unlike either
	robots.txt (where there is a symbiotic relationship between content	robots.txt (where there is a symbiotic relationship between content
	owners and the crawlers) or copyright (where the default is	owners and the crawlers) or copyright (where the default is
	effectively opt-in, not opt-out).	effectively opt-in, not opt-out).

	However, the workshop ended with general agreement that positive	However, the workshop ended with general agreement that positive
	steps could be taken to improve the communication of preferences from	steps could be taken to improve the communication of preferences from
	content owners for AI use cases. In discussion, it was evident that	content owners for AI use cases. In discussion, it was evident that
	the discovery of preferences from multiple attachment mechanisms is	the discovery of preferences from multiple attachment mechanisms is

	necessary to meet the diverse needs of content authors, and that	necessary to meet the diverse needs of content authors and,
	therefore defining how they are combined is important.	therefore, that defining how they are combined is important.

	We outline a proposed standard program below.	We outline a proposed standard program below.

	3.1. Potential Standards Work	3.1. Potential Standards Work


	The following items were felt to be good starting points for IETF	The following items were identified as good starting points for IETF
	work:	work:


	* Attachment to Web sites by location (in robots.txt or a similar	* Attachment to websites by location (in robots.txt or a similar
	mechanism)	mechanism)


	* Attachment via embedding in IETF-controlled formats (e.g., HTTP	* Attachment via embedding in IETF-controlled formats (e.g., HTTP
	headers)	headers)


	* Definition of a common core vocabulary	* Definition of a common core vocabulary

	* Definition of the overall regime; e.g., how to combine preferences	* Definition of the overall regime, e.g., how to combine preferences
	discovered from multiple attachment mechanisms	discovered from multiple attachment mechanisms


	It would be expected that the IETF would coordinate with other SDOs	It would be expected that the IETF would coordinate with other
	to define embedding in other formats (e.g., HTML).	Standards Development Organizations (SDOs) to define embedding in
		other formats (e.g., HTML).

	3.1.1. Out of Initial Scope	3.1.1. Out of Initial Scope

	It was broadly agreed that it would not be useful to work on the	It was broadly agreed that it would not be useful to work on the
	following items, at least to begin with:	following items, at least to begin with:

	* Enforcement mechanisms for preferences	* Enforcement mechanisms for preferences


	* Registry-based solutions	* Registry-based solutions


	* Identifying or authenticating crawlers and/or content owners	* Identifying or authenticating crawlers and/or content owners


	* Audit or transparency mechanisms	* Audit or transparency mechanisms


	4. Security Considerations	4. IANA Considerations

		This document has no IANA actions.

		5. Security Considerations

	This document is a workshop report and does not impact the security	This document is a workshop report and does not impact the security
	of the Internet.	of the Internet.


	5. Informative References	6. Informative References


	[CHATHAM-HOUSE]	[AI-ACT] European Parliament, "Regulation (EU) 2024/1689 of the
	Chatham House, "Chatham House Rule", n.d.,	European Parliament and of the Council of 13 June 2024
	<https://www.chathamhouse.org/about-us/chatham-house-	laying down harmonised rules on artificial intelligence
	rule>.	and amending Regulations (EC) No 300/2008, (EU) No
		167/2013, (EU) No 168/2013, (EU) 2018/858, (EU) 2018/1139
		and (EU) 2019/2144 and Directives 2014/90/EU, (EU)
		2016/797 and (EU) 2020/1828 (Artificial Intelligence Act)
		(Text with EEA relevance)", 13 June 2024,
		<https://eur-lex.europa.eu/eli/reg/2024/1689/oj>.

	[CFP] Internet Architecture Board, "IAB Workshop on AI-CONTROL",	[CFP] Internet Architecture Board, "IAB Workshop on AI-CONTROL",

	n.d.,
	<https://datatracker.ietf.org/group/aicontrolws/about/>.	<https://datatracker.ietf.org/group/aicontrolws/about/>.


	[PAPERS] Internet Architecture Board, "IAB Workshop on AI-CONTROL	[CHATHAM-HOUSE]
	Materials", n.d.,	Chatham House, "Chatham House Rule",
	<https://datatracker.ietf.org/group/aicontrolws/	<https://www.chathamhouse.org/about-us/chatham-house-
	materials/>.	rule>.

	[AI-ACT] European Parliament, "Regulation (eu) 2024/1689 of the
	European Parliament and of the Council", 13 June 2024,
	<https://eur-lex.europa.eu/eli/reg/2024/1689/oj>.

	[DECLINE] Longpre, S., Mahari, R., Lee, A., and C. Lund, "Consent in	[DECLINE] Longpre, S., Mahari, R., Lee, A., and C. Lund, "Consent in
	Crisis: The Rapid Decline of the AI Data Commons", 2025,	Crisis: The Rapid Decline of the AI Data Commons", 2025,
	<https://www.ietf.org/slides/slides-aicontrolws-consent-	<https://www.ietf.org/slides/slides-aicontrolws-consent-
	in-crisis-the-rapid-decline-of-the-ai-data-commons-	in-crisis-the-rapid-decline-of-the-ai-data-commons-
	00.pdf>.	00.pdf>.


		[PAPERS] Internet Architecture Board, "IAB Workshop on AI-CONTROL
		Materials",
		<https://datatracker.ietf.org/group/aicontrolws/
		materials/>.

	[RFC9309] Koster, M., Illyes, G., Zeller, H., and L. Sassman,	[RFC9309] Koster, M., Illyes, G., Zeller, H., and L. Sassman,
	"Robots Exclusion Protocol", RFC 9309,	"Robots Exclusion Protocol", RFC 9309,
	DOI 10.17487/RFC9309, September 2022,	DOI 10.17487/RFC9309, September 2022,

	<https://www.rfc-editor.org/rfc/rfc9309>.	<https://www.rfc-editor.org/info/rfc9309>.

	Appendix A. About the Workshop	Appendix A. About the Workshop

	The AI-CONTROL Workshop was held on 2024-09-19 and 2024-09-20 at	The AI-CONTROL Workshop was held on 2024-09-19 and 2024-09-20 at

	Wilkinson Barker Knauer in Washington DC, USA.	Wilkinson Barker Knauer in Washington, D.C., USA.

	Workshop attendees were asked to submit position papers. These	Workshop attendees were asked to submit position papers. These
	papers are published on the IAB website [PAPERS], unless the	papers are published on the IAB website [PAPERS], unless the
	submitter requested it be withheld.	submitter requested it be withheld.

	The workshop was conducted under the Chatham House Rule	The workshop was conducted under the Chatham House Rule
	[CHATHAM-HOUSE], meaning that statements cannot be attributed to	[CHATHAM-HOUSE], meaning that statements cannot be attributed to
	individuals or organizations without explicit authorization.	individuals or organizations without explicit authorization.

	A.1. Agenda	A.1. Agenda

	This section outlines the broad areas of discussion on each day.	This section outlines the broad areas of discussion on each day.


	A.1.1. Thursday 2024-09-19	A.1.1. Thursday, 2024-09-19


	Setting the stage An overview of the current state of AI opt-out,	Setting the stage: An overview of the current state of AI opt-out,
	its impact, and existing work in this space	its impact, and existing work in this space


	Lightning talks A variety of perspectives from participants	Lightning talks: A variety of perspectives from participants


	A.1.2. Friday 2024-09-20	A.1.2. Friday, 2024-09-20


	Opt-Out Attachment: robots.txt and beyond Considerations in how	Opt-Out Attachment: robots.txt and beyond: Considerations in how
	preferences are attached to content on the Internet	preferences are attached to content on the Internet


	Vocabulary: what opt-out means What information the opt-out signal	Vocabulary: what opt-out means: What information the opt-out signal
	needs to convey	needs to convey


	Discussion and wrap-up Synthesis of the workshop's topics and how	Discussion and wrap-up: Synthesis of the workshop's topics and how
	future work might unfold	future work might unfold

	A.2. Attendees	A.2. Attendees

	Attendees of the workshop are listed with their primary affiliation.	Attendees of the workshop are listed with their primary affiliation.
	Attendees from the program committee (PC) and the Internet	Attendees from the program committee (PC) and the Internet
	Architecture Board (IAB) are also marked.	Architecture Board (IAB) are also marked.

	* Jari Arkko, Ericsson	* Jari Arkko, Ericsson


	* Hirochika Asai, Preferred Networks	* Hirochika Asai, Preferred Networks


	* Farzaneh Badiei, Digital Medusa (PC)	* Farzaneh Badiei, Digital Medusa (PC)


	* Fabrice Canel, Microsoft (PC)	* Fabrice Canel, Microsoft (PC)


	* Lena Cohen, EFF	* Lena Cohen, EFF


	* Alissa Cooper, Knight-Georgetown Institute (PC, IAB)	* Alissa Cooper, Knight-Georgetown Institute (PC, IAB)


	* Marwan Fayed, Cloudflare	* Marwan Fayed, Cloudflare


	* Christopher Flammang, Elsevier	* Christopher Flammang, Elsevier


	* Carl Gahnberg	* Carl Gahnberg


	* Max Gendler, The News Corporation	* Max Gendler, The News Corporation


	* Ted Hardie	* Ted Hardie


	* Dominique Hazaël-Massieux, W3C	* Dominique Hazaël-Massieux, W3C


	* Gary Illyes, Google (PC)	* Gary Illyes, Google (PC)


	* Sarah Jennings, UK Department for Science, Innovation and	* Sarah Jennings, UK Department for Science, Innovation and
	Technology	Technology


	* Paul Keller, Open Future	* Paul Keller, Open Future


	* Elizabeth Kendall, Meta	* Elizabeth Kendall, Meta


	* Suresh Krishnan, Cisco (PC, IAB)	* Suresh Krishnan, Cisco (PC, IAB)


	* Mirja Kühlewind, Ericsson (PC, IAB)	* Mirja Kühlewind, Ericsson (PC, IAB)


	* Greg Leppert, Berkman Klein Center	* Greg Leppert, Berkman Klein Center


	* Greg Lindahl, Common Crawl Foundation	* Greg Lindahl, Common Crawl Foundation


	* Mike Linksvayer, GitHub	* Mike Linksvayer, GitHub


	* Fred von Lohmann, OpenAI	* Fred von Lohmann, OpenAI


	* Shayne Longpre, Data Provenance Initiative	* Shayne Longpre, Data Provenance Initiative


	* Don Marti, Raptive	* Don Marti, Raptive


	* Sarah McKenna, Alliance for Responsible Data Collection; Sequentum	* Sarah McKenna, Alliance for Responsible Data Collection; Sequentum


	* Eric Null, Center for Democracy and Technology	* Eric Null, Center for Democracy and Technology


	* Chris Needham, BBC	* Chris Needham, BBC


	* Mark Nottingham, Cloudflare (PC)	* Mark Nottingham, Cloudflare (PC)


	* Paul Ohm, Georgetown Law (PC)	* Paul Ohm, Georgetown Law (PC)


	* Braxton Perkins, NBC Universal	* Braxton Perkins, NBC Universal


	* Chris Petrillo, Wikimedia	* Chris Petrillo, Wikimedia


	* Sebastian Posth, Liccium	* Sebastian Posth, Liccium


	* Michael Prorock	* Michael Prorock


	* Matt Rogerson, Financial Times	* Matt Rogerson, Financial Times


	* Peter Santhanam, IBM	* Peter Santhanam, IBM


	* Jeffrey Sedlik, IPTC/PLUS	* Jeffrey Sedlik, IPTC/PLUS


	* Rony Shalit, Alliance For Responsible Data Collection; Bright Data	* Rony Shalit, Alliance For Responsible Data Collection; Bright Data


	* Ian Sohl, OpenAI	* Ian Sohl, OpenAI


	* Martin Thomson, Mozilla	* Martin Thomson, Mozilla


	* Thom Vaughan, Common Crawl Foundation (PC)	* Thom Vaughan, Common Crawl Foundation (PC)


	* Kat Walsh, Creative Commons	* Kat Walsh, Creative Commons


	* James Whymark, Meta	* James Whymark, Meta

	The following participants requested that their identity and/or	The following participants requested that their identity and/or
	affiliation not be revealed:	affiliation not be revealed:

	* A government official	* A government official

	IAB Members at the Time of Approval	IAB Members at the Time of Approval

	Internet Architecture Board members at the time this document was	Internet Architecture Board members at the time this document was
	approved for publication were:	approved for publication were:

	* Matthew Bocci	* Matthew Bocci


	* Roman Danyliw	* Roman Danyliw


	* Dhruv Dhody	* Dhruv Dhody


	* Jana Iyengar	* Jana Iyengar


	* Cullen Jennings	* Cullen Jennings


	* Suresh Krishnan	* Suresh Krishnan


	* Mirja Kühlewind	* Mirja Kühlewind


	* Warren Kumari	* Warren Kumari


	* Jason Livingood	* Jason Livingood


	* Mark Nottingham	* Mark Nottingham


	* Tommy Pauly	* Tommy Pauly


	* Alvaro Retana	* Alvaro Retana


	* Qin Wu	* Qin Wu

	Acknowledgements	Acknowledgements


	The Program Committee and the IAB would like to thank Wilkinson	The program committee and the IAB would like to thank Wilkinson
	Barker Knauer for their generosity in hosting the workshop.	Barker Knauer for their generosity in hosting the workshop.

	We also thank our scribes for capturing notes that assisted in the	We also thank our scribes for capturing notes that assisted in the
	production of this report:	production of this report:

	* Zander Arnao	* Zander Arnao


	* Andrea Dean	* Andrea Dean


	* Patrick Yurky	* Patrick Yurky

	Authors' Addresses	Authors' Addresses

	Mark Nottingham	Mark Nottingham
	Melbourne	Melbourne
	Australia	Australia
	Email: mnot@mnot.net	Email: mnot@mnot.net
	URI: https://www.mnot.net/	URI: https://www.mnot.net/


End of changes. 120 change blocks.
	147 lines changed or deleted	210 lines changed or added
This html diff was produced by rfcdiff 1.48.