| rfc9969.original | rfc9969.txt | |||
|---|---|---|---|---|
| Network Working Group M. Nottingham | Internet Architecture Board (IAB) M. Nottingham | |||
| Internet-Draft | Request for Comments: 9969 | |||
| Intended status: Informational S. Krishnan | Category: Informational S. Krishnan | |||
| Expires: 10 March 2026 6 September 2025 | ISSN: 2070-1721 April 2026 | |||
| IAB AI-CONTROL Workshop Report | IAB AI-CONTROL Workshop Report | |||
| draft-iab-ai-control-report-02 | ||||
| Abstract | Abstract | |||
| The AI-CONTROL Workshop was convened by the Internet Architecture | The AI-CONTROL Workshop was convened by the Internet Architecture | |||
| Board (IAB) in September 2024. This report summarizes its | Board (IAB) in September 2024. This report summarizes its | |||
| significant points of discussion and identifies topics that may | significant points of discussion and identifies topics that may | |||
| warrant further consideration and work. | warrant further consideration and work. | |||
| Note that this document is a report on the proceedings of the | Note that this document is a report on the proceedings of the | |||
| workshop. The views and positions documented in this report are | workshop. The views and positions documented in this report are | |||
| those of the workshop participants and do not necessarily reflect IAB | those of the workshop participants and do not necessarily reflect IAB | |||
| views and positions. | views and positions. | |||
| Discussion Venues | ||||
| This note is to be removed before publishing as an RFC. | ||||
| Source for this draft and an issue tracker can be found at | ||||
| https://github.com/intarchboard/draft-iab-ai-control-report. | ||||
| Status of This Memo | Status of This Memo | |||
| This Internet-Draft is submitted in full conformance with the | This document is not an Internet Standards Track specification; it is | |||
| provisions of BCP 78 and BCP 79. | published for informational purposes. | |||
| Internet-Drafts are working documents of the Internet Engineering | ||||
| Task Force (IETF). Note that other groups may also distribute | ||||
| working documents as Internet-Drafts. The list of current Internet- | ||||
| Drafts is at https://datatracker.ietf.org/drafts/current/. | ||||
| Internet-Drafts are draft documents valid for a maximum of six months | This document is a product of the Internet Architecture Board (IAB) | |||
| and may be updated, replaced, or obsoleted by other documents at any | and represents information that the IAB has deemed valuable to | |||
| time. It is inappropriate to use Internet-Drafts as reference | provide for permanent record. It represents the consensus of the | |||
| material or to cite them other than as "work in progress." | Internet Architecture Board (IAB). Documents approved for | |||
| publication by the IAB are not candidates for any level of Internet | ||||
| Standard; see Section 2 of RFC 7841. | ||||
| This Internet-Draft will expire on 10 March 2026. | Information about the current status of this document, any errata, | |||
| and how to provide feedback on it may be obtained at | ||||
| https://www.rfc-editor.org/info/rfc9969. | ||||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2025 IETF Trust and the persons identified as the | Copyright (c) 2026 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents | |||
| license-info) in effect on the date of publication of this document. | (https://trustee.ietf.org/license-info) in effect on the date of | |||
| Please review these documents carefully, as they describe your rights | publication of this document. Please review these documents | |||
| and restrictions with respect to this document. | carefully, as they describe your rights and restrictions with respect | |||
| to this document. | ||||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | 1. Introduction | |||
| 1.1. Chatham House Rule . . . . . . . . . . . . . . . . . . . 3 | 1.1. Chatham House Rule | |||
| 1.2. Views Expressed in this Report . . . . . . . . . . . . . 3 | 1.2. Views Expressed in This Report | |||
| 2. Workshop Scope and Discussion . . . . . . . . . . . . . . . . 4 | 2. Workshop Scope and Discussion | |||
| 2.1. Crawl Time vs. Inference Time . . . . . . . . . . . . . . 5 | 2.1. Crawl Time vs. Inference Time | |||
| 2.1.1. Multiple Uses for Crawl Data . . . . . . . . . . . . 5 | 2.1.1. Multiple Uses for Crawl Data | |||
| 2.1.2. Application of Preferences . . . . . . . . . . . . . 5 | 2.1.2. Application of Preferences | |||
| 2.2. Trust . . . . . . . . . . . . . . . . . . . . . . . . . . 6 | 2.2. Trust | |||
| 2.3. Attachment . . . . . . . . . . . . . . . . . . . . . . . 6 | 2.3. Attachment | |||
| 2.3.1. robots.txt (and similar) . . . . . . . . . . . . . . 6 | 2.3.1. robots.txt (and Similar) | |||
| 2.3.2. Embedding . . . . . . . . . . . . . . . . . . . . . . 7 | 2.3.2. Embedding | |||
| 2.3.3. Registries . . . . . . . . . . . . . . . . . . . . . 8 | 2.3.3. Registries | |||
| 2.4. Vocabulary . . . . . . . . . . . . . . . . . . . . . . . 8 | 2.4. Vocabulary | |||
| 3. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 8 | 3. Conclusions | |||
| 3.1. Potential Standards Work . . . . . . . . . . . . . . . . 9 | 3.1. Potential Standards Work | |||
| 3.1.1. Out of Initial Scope . . . . . . . . . . . . . . . . 9 | 3.1.1. Out of Initial Scope | |||
| 4. Security Considerations . . . . . . . . . . . . . . . . . . . 9 | 4. IANA Considerations | |||
| 5. Informative References . . . . . . . . . . . . . . . . . . . 9 | 5. Security Considerations | |||
| Appendix A. About the Workshop . . . . . . . . . . . . . . . . . 10 | 6. Informative References | |||
| A.1. Agenda . . . . . . . . . . . . . . . . . . . . . . . . . 10 | Appendix A. About the Workshop | |||
| A.1.1. Thursday 2024-09-19 . . . . . . . . . . . . . . . . . 11 | A.1. Agenda | |||
| A.1.2. Friday 2024-09-20 . . . . . . . . . . . . . . . . . . 11 | A.1.1. Thursday, 2024-09-19 | |||
| A.2. Attendees . . . . . . . . . . . . . . . . . . . . . . . . 11 | A.1.2. Friday, 2024-09-20 | |||
| IAB Members at the Time of Approval . . . . . . . . . . . . . . . 12 | A.2. Attendees | |||
| Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 12 | IAB Members at the Time of Approval | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 13 | Acknowledgements | |||
| Authors' Addresses | ||||
| 1. Introduction | 1. Introduction | |||
| The Internet Architecture Board (IAB) holds occasional workshops | The Internet Architecture Board (IAB) holds occasional workshops | |||
| designed to consider long-term issues and strategies for the | designed to consider long-term issues and strategies for the | |||
| Internet, and to suggest future directions for the Internet | Internet, and to suggest future directions for the Internet | |||
| architecture. This long-term planning function of the IAB is | architecture. This long-term planning function of the IAB is | |||
| complementary to the ongoing engineering efforts performed by working | complementary to the ongoing engineering efforts performed by working | |||
| groups of the Internet Engineering Task Force (IETF). | groups of the Internet Engineering Task Force (IETF). | |||
| The Internet is one of the major sources of data used to train large | The Internet is one of the major sources of data used to train large | |||
| language models (Large Language Models (LLMs), or more generally, | language models (Large Language Models (LLMs) or, more generally, | |||
| "Artificial Intelligence (AI)"). Because this use was not envisioned | Artificial Intelligence (AI)). Because this use was not envisioned | |||
| by most publishers of information on the Internet, a means of | by most publishers of information on the Internet, a means of | |||
| expressing the owners' preferences regarding AI crawling has emerged, | expressing the owners' preferences regarding AI crawling has emerged, | |||
| sometimes backed by law (e.g., in the European Union's AI Act | sometimes backed by law (e.g., in the European Union's AI Act | |||
| [AI-ACT]). | [AI-ACT]). | |||
| The IAB convened the AI-CONTROL Workshop on 19-20 September 2024 to | The IAB convened the AI-CONTROL Workshop on 19-20 September 2024 to | |||
| "explore practical opt-out mechanisms for AI and build an | "explore practical opt-out mechanisms for AI and build an | |||
| understanding of use cases, requirements, and other considerations in | understanding of use cases, requirements, and other considerations in | |||
| this space" [CFP]. In particular, the emerging practice of using the | this space" [CFP]. In particular, the emerging practice of using the | |||
| Robots Exclusion Protocol [RFC9309] -- also known as "robots.txt" -- | Robots Exclusion Protocol [RFC9309] -- also known as "robots.txt" -- | |||
| skipping to change at page 3, line 39 | skipping to change at line 122 | |||
| Participants agreed to conduct the workshop under the Chatham House | Participants agreed to conduct the workshop under the Chatham House | |||
| Rule [CHATHAM-HOUSE], so this report does not attribute statements to | Rule [CHATHAM-HOUSE], so this report does not attribute statements to | |||
| individuals or organizations without express permission. Most | individuals or organizations without express permission. Most | |||
| submissions to the workshop were public and thus attributable; they | submissions to the workshop were public and thus attributable; they | |||
| are used here to provide substance and context. | are used here to provide substance and context. | |||
| Appendix A.2 lists the workshop participants, unless they requested | Appendix A.2 lists the workshop participants, unless they requested | |||
| that this information be withheld. | that this information be withheld. | |||
| 1.2. Views Expressed in this Report | 1.2. Views Expressed in This Report | |||
| This document is a report on the proceedings of the workshop. The | This document is a report on the proceedings of the workshop. The | |||
| views and positions documented in this report are expressed during | views and positions documented in this report are expressed during | |||
| the workshop by participants and do not necessarily reflect IAB's | the workshop by participants and do not necessarily reflect the IAB's | |||
| views and positions. | views and positions. | |||
| Furthermore, the content of the report comes from presentations given | Furthermore, the content of the report comes from presentations given | |||
| by workshop participants and notes taken during the discussions, | by workshop participants and notes taken during the discussions, | |||
| without interpretation or validation. Thus, the content of this | without interpretation or validation. Thus, the content of this | |||
| report follows the flow and dialogue of the workshop but does not | report follows the flow and dialog of the workshop but does not | |||
| attempt to capture a consensus. | attempt to capture a consensus. | |||
| 2. Workshop Scope and Discussion | 2. Workshop Scope and Discussion | |||
| The workshop began by surveying the state of AI control. | The workshop began by surveying the state of AI control. | |||
| Currently, Internet publishers express their preferences for how | Currently, Internet publishers express their preferences for how | |||
| their content is treated for purposes of AI training using a variety | their content is treated for the purposes of AI training using a | |||
| of mechanisms, including declarative ones, such as terms of service, | variety of mechanisms. These include declarative mechanisms, such as | |||
| embedded metadata, and robots.txt [RFC9309], and active ones, such as | terms of service, embedded metadata, and robots.txt [RFC9309], as | |||
| use of paywalls and selective blocking of crawlers (e.g., by IP | well as active mechanisms, such as use of paywalls and selective | |||
| address, User-Agent). | blocking of crawlers (e.g., by IP address or User-Agent). | |||
| There was disagreement about the implications of AI opt-out overall. | There was disagreement about the implications of AI opt-out overall. | |||
| Research presented at the workshop [DECLINE] indicates that the use | Research presented at the workshop [DECLINE] indicates that the use | |||
| of such controls is becoming more prevalent, reducing the | of such controls is becoming more prevalent, reducing the | |||
| availability of data to AI (for purposes including training and | availability of data to AI (for purposes including training and | |||
| inference-time usage). Some of the participants expressed concern | inference-time usage). Some of the participants expressed concern | |||
| about the implications of this -- although at least one AI vendor | about the implications of this -- although at least one AI vendor | |||
| seemed less concerned by this, indicating that "there are plenty of | seemed less concerned by this, indicating that "there are plenty of | |||
| tokens available" for training, even if many opt out. Others | tokens available" for training, even if many opt out. Others | |||
| expressed a need to opt out of AI training because of how they | expressed a need to opt out of AI training because of how they | |||
| skipping to change at page 4, line 37 | skipping to change at line 167 | |||
| whole industries. | whole industries. | |||
| However, there was quick agreement that both viewpoints were harmed | However, there was quick agreement that both viewpoints were harmed | |||
| by the current state of AI opt-out -- a situation where "no one is | by the current state of AI opt-out -- a situation where "no one is | |||
| better off" (in the words of one participant). | better off" (in the words of one participant). | |||
| Much of that dysfunction was attributed to the lack of coordination | Much of that dysfunction was attributed to the lack of coordination | |||
| and standards for AI opt-out. Currently, content publishers need to | and standards for AI opt-out. Currently, content publishers need to | |||
| consult with each AI vendor to understand how to opt out of training | consult with each AI vendor to understand how to opt out of training | |||
| their products, as there is significant variance in each vendor's | their products, as there is significant variance in each vendor's | |||
| behaviour. Furthermore, publishers need to continually monitor both | behavior. Furthermore, publishers need to continually monitor for | |||
| for new vendors, and for changes to the policies of the vendors they | both new vendors and changes to the policies of the vendors they are | |||
| are aware of. | aware of. | |||
| Underlying those immediate issues, however, are significant | Underlying those immediate issues, however, are significant | |||
| constraints that could be attributed to uncertainties in the legal | constraints that could be attributed to uncertainties in the legal | |||
| context, the nature of AI, and the implications of needing to opt out | context, the nature of AI, and the implications of needing to opt out | |||
| of crawling for it. | of crawling for it. | |||
| 2.1. Crawl Time vs. Inference Time | 2.1. Crawl Time vs. Inference Time | |||
| Perhaps most significant is the "crawl time vs. inference time" | Perhaps most significant is the "crawl time vs. inference time" | |||
| problem. Statements of preference are apparent at crawl time, bound | problem. Statements of preference are apparent at crawl time, bound | |||
| to content either by location (e.g., robots.txt) or embedded inside | to content either by location (e.g., robots.txt) or embedded inside | |||
| the content itself as metadata. However, the target of those | the content itself as metadata. However, the target of those | |||
| directives is often disassociated from the crawler, either because | directives is often disassociated from the crawler, either because | |||
| the crawl data is not only used for training AI models, or because | the crawl data is not only used for training AI models or because the | |||
| the preferences could be applicable at inference time. | preferences could be applicable at inference time. | |||
| 2.1.1. Multiple Uses for Crawl Data | 2.1.1. Multiple Uses for Crawl Data | |||
| A crawl's data might have multiple uses because the vendor also has | A crawl's data might have multiple uses because the vendor also has | |||
| another product that uses it (e.g., a search engine), or because the | another product that uses it (e.g., a search engine) or because the | |||
| crawl is performed by a party other than the AI vendor. Both are | crawl is performed by a party other than the AI vendor. Both are | |||
| very common patterns: operators of many Internet search engines also | very common patterns: Operators of many Internet search engines also | |||
| train AI models, and many AI models use third-party crawl data. In | train AI models, and many AI models use third-party crawl data. In | |||
| either case, conflating different uses can change the incentives for | either case, conflating different uses can change the incentives for | |||
| publishers to cooperate with the crawler. | publishers to cooperate with the crawler. | |||
| Well-established uses of crawling, such as Internet search, were seen | Well-established uses of crawling, such as Internet searches, were | |||
| by participants as at least partially aligned with the interests of | seen by participants as at least partially aligned with the interests | |||
| publishers: they allow their sites to be crawled, and in return, they | of publishers: They allow their sites to be crawled, and in return, | |||
| receive higher traffic and attention due to being in the search | they receive higher traffic and attention due to being in the search | |||
| index. However, several participants pointed out that this symbiotic | index. However, several participants pointed out that this symbiotic | |||
| relationship does not exist for AI training uses -- with some viewing | relationship does not exist for AI training uses -- with some viewing | |||
| AI as hostile to publishers, because it has the capacity to take | AI as hostile to publishers because it has the capacity to take | |||
| traffic away from their sites. | traffic away from their sites. | |||
| Therefore, when a crawler has multiple uses that include AI, | Therefore, when a crawler has multiple uses that include AI, | |||
| participants observed that "collateral damage" was likely for non-AI | participants observed that "collateral damage" was likely for non-AI | |||
| uses, especially when publishers take more active control measures, | uses, especially when publishers take more active control measures, | |||
| such as blocking or paywalls, to protect their interests. | such as blocking or paywalls, to protect their interests. | |||
| Several participants expressed concerns about this phenomenon's | Several participants expressed concerns about this phenomenon's | |||
| effects on the ecosystem, effectively "locking down the Web" with one | effects on the ecosystem, effectively "locking down the Web" with one | |||
| opining that there were implications for freedom of expression | opining that there were implications for freedom of expression | |||
| overall. | overall. | |||
| 2.1.2. Application of Preferences | 2.1.2. Application of Preferences | |||
| When data is used to train an LLM, the resulting model does not have | When data is used to train an LLM, the resulting model does not have | |||
| the ability to only selectively use a portion of it when performing a | the ability to only selectively use a portion of it when performing a | |||
| task, because inference uses the whole model, and it is not possible | task because inference uses the whole model, and it is not possible | |||
| to identify specific input data for its use in doing so. | to identify specific input data for its use in doing so. | |||
| This means that while publishers' preferences may be available when | This means that while publishers' preferences may be available when | |||
| content is crawled, they generally are not when inference takes | content is crawled, they generally are not when inference takes | |||
| place. Those preferences that are stated in reference to use by AI | place. Those preferences that are stated in reference to use by AI | |||
| -- for example, "no military uses" or "non-commercial only" cannot be | -- for example, "no military uses" or "non-commercial only" -- cannot | |||
| applied by a general-purpose "foundation" model. | be applied by a general-purpose "foundation" model. | |||
| This leaves a few unappealing choices to AI vendors that wish to | This leaves a few unappealing choices to AI vendors that wish to | |||
| comply with those preferences. They can simply omit such data from | comply with those preferences. They can simply omit such data from | |||
| foundation models, thereby reducing their viability. Or, they can | foundation models, thereby reducing their viability. Or they can | |||
| create a separate model for each permutation of preferences -- with a | create a separate model for each permutation of preferences -- with a | |||
| likely proliferation of models as the set of permutations expands. | likely proliferation of models as the set of permutations expands. | |||
| Compounding this issue was the observation that preferences change | Compounding this issue was the observation that preferences change | |||
| over time, whereas LLMs are created over long time frames and cannot | over time, whereas LLMs are created over long time frames and cannot | |||
| easily be updated to reflect those changes. Of particular concern to | easily be updated to reflect those changes. Of particular concern to | |||
| some was how this makes an opt-out regime "stickier" because content | some was how this makes an opt-out regime "stickier" because content | |||
| that has no associated preference (such as that which predates the | that has no associated preference (such as that which predates the | |||
| authors' knowledge of LLMs) is allowed to be used for these | authors' knowledge of LLMs) is allowed to be used for these | |||
| unforeseen purposes. | unforeseen purposes. | |||
| 2.2. Trust | 2.2. Trust | |||
| This disconnection between the statement of preferences and its | This disconnection between the statement of preferences and its | |||
| application was felt by participants to contribute to a lack of trust | application was felt by participants to contribute to a lack of trust | |||
| in the ecosystem, along with the typical lack of attribution for data | in the ecosystem, along with the typical lack of attribution for data | |||
| sources in LLMs, lack of an incentive for publishers to contribute | sources in LLMs, lack of an incentive for publishers to contribute | |||
| data, and finally (and most noted) a lack of any means of monitoring | data, and finally (and most noted) lack of any means of monitoring | |||
| compliance with preferences. | compliance with preferences. | |||
| This lack of trust led some participants to question whether | This lack of trust led some participants to question whether | |||
| communicating preferences is sufficient in all cases without an | communicating preferences is sufficient in all cases without an | |||
| accompanying way to enforce them, or even to audit adherence to them. | accompanying way to enforce them, or even to audit adherence to them. | |||
| Some participants also indicated that a lack of trust was the primary | Some participants also indicated that a lack of trust was the primary | |||
| cause of the increasingly prevalent blocking of AI crawler IP | cause of the increasingly prevalent blocking of AI crawler IP | |||
| addresses, among other measures. | addresses, among other measures. | |||
| 2.3. Attachment | 2.3. Attachment | |||
| One of the primary focuses of the workshop was on _attachment_ -- how | One of the primary focuses of the workshop was on _attachment_, i.e., | |||
| preferences are associated with content on the Internet. A range of | how preferences are associated with content on the Internet. A range | |||
| mechanisms was discussed. | of mechanisms was discussed. | |||
| 2.3.1. robots.txt (and similar) | 2.3.1. robots.txt (and Similar) | |||
| The Robots Exclusion Protocol [RFC9309] is widely recognised by AI | The Robots Exclusion Protocol [RFC9309] is widely recognized by AI | |||
| vendors as an attachment mechanism for preferences. Several | vendors as an attachment mechanism for preferences. Several | |||
| deficiencies were discussed. | deficiencies were discussed. | |||
| First, it does not scale to offer granular control over large sites | First, it does not scale to offer granular control over large sites | |||
| where authors might want to express different policies for a range of | where authors might want to express different policies for a range of | |||
| content (for example, YouTube). | content (for example, YouTube). | |||
| Robots.txt is also typically under the control of the site | robots.txt is also typically under the control of the site | |||
| administrator. If a site has content from many creators (as is often | administrator. If a site has content from many creators (as is often | |||
| the case for social media and similar platforms), the administrator | the case for social media and similar platforms), the administrator | |||
| may not allow them to express their preferences fully, or at all. | may not allow them to express their preferences fully, or at all. | |||
| If content is copied or moved to a different site, the preferences at | If content is copied or moved to a different site, the preferences at | |||
| the new site need to be explicitly transferred, because robots.txt is | the new site need to be explicitly transferred because robots.txt is | |||
| a separate resource. | a separate resource. | |||
| These deficiencies led many participants to feel that robots.txt | These deficiencies led many participants to feel that robots.txt | |||
| cannot be the only solution to opt-out: rather, it should be part of | cannot be the only solution to opt-out: Rather, it should be part of | |||
| a larger system that addresses its shortcomings. | a larger system that addresses its shortcomings. | |||
| Participants noted that other, similar attachment mechanisms have | Participants noted that other similar attachment mechanisms have been | |||
| been proposed. However, none appear to have gained as much attention | proposed. However, none appear to have gained as much attention or | |||
| or implementation (both by AI vendors and content owners) as | implementation (both by AI vendors and content owners) as robots.txt. | |||
| robots.txt. | ||||
| 2.3.2. Embedding | 2.3.2. Embedding | |||
| Another mechanism for associating preferences with content is to | Another mechanism for associating preferences with content is to | |||
| embed them into the content itself. Many formats used on the | embed them into the content itself. Many formats used on the | |||
| Internet allow this; for example, HTML has the <meta> tag, images | Internet allow this; for example, HTML has the <meta> tag, images | |||
| have XMP and similar metadata sections, and XML and JSON have rich | have Extensible Metadata Platform (XMP) and similar metadata | |||
| potential for extensions to carry such data. | sections, and XML and JSON have rich potential for extensions to | |||
| carry such data. | ||||
| Embedded preferences were seen to have the advantage of granularity, | Embedded preferences were seen to have the advantage of granularity, | |||
| and of "travelling with" content as it is produced, when it is moved | and of "traveling with" content as it is produced, when it is moved | |||
| from site to site, or when it is stored offline. | from site to site or when it is stored offline. | |||
| However, several participants pointed out that embedded preferences | However, several participants pointed out that embedded preferences | |||
| are easily stripped from most formats. This is a common practice for | are easily stripped from most formats. This is a common practice for | |||
| reducing the size of a file (thereby improving performance when | reducing the size of a file (thereby improving performance when | |||
| downloading it), and for assuring privacy (since metadata often leaks | downloading it) and for assuring privacy (since metadata often leaks | |||
| information unintentionally). | information unintentionally). | |||
| Furthermore, some types of content are not suitable for embedding. | Furthermore, some types of content are not suitable for embedding. | |||
| For example, it is not possible to embed preferences into purely | For example, it is not possible to embed preferences into purely | |||
| textual content, and Web pages with content from several producers | textual content, and web pages with content from several producers | |||
| (such as a social media or comments feed) cannot easily reflect | (such as a social media or comment feeds) cannot easily reflect | |||
| preferences for each one. | preferences for each one. | |||
| Participants noted that the means of embedding preferences in many | Participants noted that the means of embedding preferences in many | |||
| formats would need to be determined by or coordinated with | formats would need to be determined by or coordinated with | |||
| organisations outside the IETF. For example, HTML and many image | organizations outside the IETF. For example, HTML and many image | |||
| formats are maintained by external bodies. | formats are maintained by external bodies. | |||
| 2.3.3. Registries | 2.3.3. Registries | |||
| In some existing copyright management regimes, it is already common | In some existing copyright management regimes, it is already common | |||
| to have a registry of works that is consulted upon use. For example, | to have a registry of works that is consulted upon use. For example, | |||
| this approach is often used for photographs, music, and video. | this approach is often used for photographs, music, and video. | |||
| Typically, registries use hashing mechanisms to create a | Typically, registries use hashing mechanisms to create a | |||
| "fingerprint" for the content that is robust to changes. | "fingerprint" for the content that is robust to changes. | |||
| Using a registry decouples the content in question from its location, | Using a registry decouples the content in question from its location | |||
| so that it can be found even if moved. It is also claimed to be | so that it can be found even if moved. It is also claimed to be | |||
| robust against stripping of embedded metadata, which is a common | robust against stripping of embedded metadata, which is a common | |||
| practice to improve performance and/or privacy. | practice to improve performance and/or privacy. | |||
| However, several participants pointed out issues with deploying | However, several participants pointed out issues with deploying | |||
| registries at Internet scale. While they may be effective for | registries at the scale of the Internet. While they may be effective | |||
| (relatively) closed and well-known ecosystems such as commercial | for (relatively) closed and well-known ecosystems, such as commercial | |||
| music publishing, applying them to a diverse and very large ecosystem | music publishing, applying them to a diverse and very large ecosystem | |||
| like the Internet has proven problematic. | like the Internet has proven problematic. | |||
| 2.4. Vocabulary | 2.4. Vocabulary | |||
| Another major focus area for the workshop was on _vocabulary_ -- the | Another major focus area for the workshop was on _vocabulary_ -- the | |||
| specific semantics of the opt-out signal. Several participants noted | specific semantics of the opt-out signal. Several participants noted | |||
| that there are already many proposals for vocabularies, as well as | that there are already many proposals for vocabularies, as well as | |||
| many conflicting vocabularies already in use. Several examples were | many conflicting vocabularies already in use. Several examples were | |||
| discussed, including where existing terms were ambiguous, did not | discussed, including where existing terms were ambiguous, did not | |||
| skipping to change at page 8, line 48 ¶ | skipping to change at line 359 ¶ | |||
| different actors. | different actors. | |||
| Although no conclusions regarding exact vocabulary were reached, it | Although no conclusions regarding exact vocabulary were reached, it | |||
| was generally agreed that a complex vocabulary is unlikely to | was generally agreed that a complex vocabulary is unlikely to | |||
| succeed. | succeed. | |||
| 3. Conclusions | 3. Conclusions | |||
| Participants generally agreed that on its current path, the ecosystem | Participants generally agreed that on its current path, the ecosystem | |||
| is not sustainable. As one remarked, "robots.txt is broken and we | is not sustainable. As one remarked, "robots.txt is broken and we | |||
| broke it." | broke it". | |||
| Legal uncertainty, along with fundamental limitations of opt-out | Legal uncertainty, along with fundamental limitations of opt-out | |||
| regimes pointed out above, limit the effectiveness of any technical | regimes pointed out above, limit the effectiveness of any technical | |||
| solution, which will be operating in a system unlike either | solution, which will be operating in a system unlike either | |||
| robots.txt (where there is a symbiotic relationship between content | robots.txt (where there is a symbiotic relationship between content | |||
| owners and the crawlers) or copyright (where the default is | owners and the crawlers) or copyright (where the default is | |||
| effectively opt-in, not opt-out). | effectively opt-in, not opt-out). | |||
| However, the workshop ended with general agreement that positive | However, the workshop ended with general agreement that positive | |||
| steps could be taken to improve the communication of preferences from | steps could be taken to improve the communication of preferences from | |||
| content owners for AI use cases. In discussion, it was evident that | content owners for AI use cases. In discussion, it was evident that | |||
| the discovery of preferences from multiple attachment mechanisms is | the discovery of preferences from multiple attachment mechanisms is | |||
| necessary to meet the diverse needs of content authors, and that | necessary to meet the diverse needs of content authors and, | |||
| therefore defining how they are combined is important. | therefore, that defining how they are combined is important. | |||
| We outline a proposed standard program below. | We outline a proposed standard program below. | |||
| 3.1. Potential Standards Work | 3.1. Potential Standards Work | |||
| The following items were felt to be good starting points for IETF | The following items were identified as good starting points for IETF | |||
| work: | work: | |||
| * Attachment to Web sites by location (in robots.txt or a similar | * Attachment to websites by location (in robots.txt or a similar | |||
| mechanism) | mechanism) | |||
| * Attachment via embedding in IETF-controlled formats (e.g., HTTP | * Attachment via embedding in IETF-controlled formats (e.g., HTTP | |||
| headers) | headers) | |||
| * Definition of a common core vocabulary | * Definition of a common core vocabulary | |||
| * Definition of the overall regime; e.g., how to combine preferences | ||||
| * Definition of the overall regime, e.g., how to combine preferences | ||||
| discovered from multiple attachment mechanisms | discovered from multiple attachment mechanisms | |||
| It would be expected that the IETF would coordinate with other SDOs | It would be expected that the IETF would coordinate with other | |||
| to define embedding in other formats (e.g., HTML). | Standards Development Organizations (SDOs) to define embedding in | |||
| other formats (e.g., HTML). | ||||
| 3.1.1. Out of Initial Scope | 3.1.1. Out of Initial Scope | |||
| It was broadly agreed that it would not be useful to work on the | It was broadly agreed that it would not be useful to work on the | |||
| following items, at least to begin with: | following items, at least to begin with: | |||
| * Enforcement mechanisms for preferences | * Enforcement mechanisms for preferences | |||
| * Registry-based solutions | * Registry-based solutions | |||
| * Identifying or authenticating crawlers and/or content owners | * Identifying or authenticating crawlers and/or content owners | |||
| * Audit or transparency mechanisms | * Audit or transparency mechanisms | |||
| 4. Security Considerations | 4. IANA Considerations | |||
| This document has no IANA actions. | ||||
| 5. Security Considerations | ||||
| This document is a workshop report and does not impact the security | This document is a workshop report and does not impact the security | |||
| of the Internet. | of the Internet. | |||
| 5. Informative References | 6. Informative References | |||
| [CHATHAM-HOUSE] | [AI-ACT] European Parliament, "Regulation (EU) 2024/1689 of the | |||
| Chatham House, "Chatham House Rule", n.d., | European Parliament and of the Council of 13 June 2024 | |||
| <https://www.chathamhouse.org/about-us/chatham-house- | laying down harmonised rules on artificial intelligence | |||
| rule>. | and amending Regulations (EC) No 300/2008, (EU) No | |||
| 167/2013, (EU) No 168/2013, (EU) 2018/858, (EU) 2018/1139 | ||||
| and (EU) 2019/2144 and Directives 2014/90/EU, (EU) | ||||
| 2016/797 and (EU) 2020/1828 (Artificial Intelligence Act) | ||||
| (Text with EEA relevance)", 13 June 2024, | ||||
| <https://eur-lex.europa.eu/eli/reg/2024/1689/oj>. | ||||
| [CFP] Internet Architecture Board, "IAB Workshop on AI-CONTROL", | [CFP] Internet Architecture Board, "IAB Workshop on AI-CONTROL", | |||
| n.d., | ||||
| <https://datatracker.ietf.org/group/aicontrolws/about/>. | <https://datatracker.ietf.org/group/aicontrolws/about/>. | |||
| [PAPERS] Internet Architecture Board, "IAB Workshop on AI-CONTROL | [CHATHAM-HOUSE] | |||
| Materials", n.d., | Chatham House, "Chatham House Rule", | |||
| <https://datatracker.ietf.org/group/aicontrolws/ | <https://www.chathamhouse.org/about-us/chatham-house- | |||
| materials/>. | rule>. | |||
| [AI-ACT] European Parliament, "Regulation (eu) 2024/1689 of the | ||||
| European Parliament and of the Council", 13 June 2024, | ||||
| <https://eur-lex.europa.eu/eli/reg/2024/1689/oj>. | ||||
| [DECLINE] Longpre, S., Mahari, R., Lee, A., and C. Lund, "Consent in | [DECLINE] Longpre, S., Mahari, R., Lee, A., and C. Lund, "Consent in | |||
| Crisis: The Rapid Decline of the AI Data Commons", 2025, | Crisis: The Rapid Decline of the AI Data Commons", 2025, | |||
| <https://www.ietf.org/slides/slides-aicontrolws-consent- | <https://www.ietf.org/slides/slides-aicontrolws-consent- | |||
| in-crisis-the-rapid-decline-of-the-ai-data-commons- | in-crisis-the-rapid-decline-of-the-ai-data-commons- | |||
| 00.pdf>. | 00.pdf>. | |||
| [PAPERS] Internet Architecture Board, "IAB Workshop on AI-CONTROL | ||||
| Materials", | ||||
| <https://datatracker.ietf.org/group/aicontrolws/ | ||||
| materials/>. | ||||
| [RFC9309] Koster, M., Illyes, G., Zeller, H., and L. Sassman, | [RFC9309] Koster, M., Illyes, G., Zeller, H., and L. Sassman, | |||
| "Robots Exclusion Protocol", RFC 9309, | "Robots Exclusion Protocol", RFC 9309, | |||
| DOI 10.17487/RFC9309, September 2022, | DOI 10.17487/RFC9309, September 2022, | |||
| <https://www.rfc-editor.org/rfc/rfc9309>. | <https://www.rfc-editor.org/info/rfc9309>. | |||
| Appendix A. About the Workshop | Appendix A. About the Workshop | |||
| The AI-CONTROL Workshop was held on 2024-09-19 and 2024-09-20 at | The AI-CONTROL Workshop was held on 2024-09-19 and 2024-09-20 at | |||
| Wilkinson Barker Knauer in Washington DC, USA. | Wilkinson Barker Knauer in Washington, D.C., USA. | |||
| Workshop attendees were asked to submit position papers. These | Workshop attendees were asked to submit position papers. These | |||
| papers are published on the IAB website [PAPERS], unless the | papers are published on the IAB website [PAPERS], unless the | |||
| submitter requested it be withheld. | submitter requested it be withheld. | |||
| The workshop was conducted under the Chatham House Rule | The workshop was conducted under the Chatham House Rule | |||
| [CHATHAM-HOUSE], meaning that statements cannot be attributed to | [CHATHAM-HOUSE], meaning that statements cannot be attributed to | |||
| individuals or organizations without explicit authorization. | individuals or organizations without explicit authorization. | |||
| A.1. Agenda | A.1. Agenda | |||
| This section outlines the broad areas of discussion on each day. | This section outlines the broad areas of discussion on each day. | |||
| A.1.1. Thursday 2024-09-19 | A.1.1. Thursday, 2024-09-19 | |||
| Setting the stage An overview of the current state of AI opt-out, | Setting the stage: An overview of the current state of AI opt-out, | |||
| its impact, and existing work in this space | its impact, and existing work in this space | |||
| Lightning talks A variety of perspectives from participants | Lightning talks: A variety of perspectives from participants | |||
| A.1.2. Friday 2024-09-20 | A.1.2. Friday, 2024-09-20 | |||
| Opt-Out Attachment: robots.txt and beyond Considerations in how | Opt-Out Attachment: robots.txt and beyond: Considerations in how | |||
| preferences are attached to content on the Internet | preferences are attached to content on the Internet | |||
| Vocabulary: what opt-out means What information the opt-out signal | Vocabulary: what opt-out means: What information the opt-out signal | |||
| needs to convey | needs to convey | |||
| Discussion and wrap-up Synthesis of the workshop's topics and how | Discussion and wrap-up: Synthesis of the workshop's topics and how | |||
| future work might unfold | future work might unfold | |||
| A.2. Attendees | A.2. Attendees | |||
| Attendees of the workshop are listed with their primary affiliation. | Attendees of the workshop are listed with their primary affiliation. | |||
| Attendees from the program committee (PC) and the Internet | Attendees from the program committee (PC) and the Internet | |||
| Architecture Board (IAB) are also marked. | Architecture Board (IAB) are also marked. | |||
| * Jari Arkko, Ericsson | * Jari Arkko, Ericsson | |||
| * Hirochika Asai, Preferred Networks | * Hirochika Asai, Preferred Networks | |||
| * Farzaneh Badiei, Digital Medusa (PC) | * Farzaneh Badiei, Digital Medusa (PC) | |||
| * Fabrice Canel, Microsoft (PC) | * Fabrice Canel, Microsoft (PC) | |||
| * Lena Cohen, EFF | * Lena Cohen, EFF | |||
| * Alissa Cooper, Knight-Georgetown Institute (PC, IAB) | * Alissa Cooper, Knight-Georgetown Institute (PC, IAB) | |||
| * Marwan Fayed, Cloudflare | * Marwan Fayed, Cloudflare | |||
| * Christopher Flammang, Elsevier | * Christopher Flammang, Elsevier | |||
| * Carl Gahnberg | * Carl Gahnberg | |||
| * Max Gendler, The News Corporation | * Max Gendler, The News Corporation | |||
| * Ted Hardie | * Ted Hardie | |||
| * Dominique Hazaël-Massieux, W3C | * Dominique Hazaël-Massieux, W3C | |||
| * Gary Illyes, Google (PC) | * Gary Illyes, Google (PC) | |||
| * Sarah Jennings, UK Department for Science, Innovation and | * Sarah Jennings, UK Department for Science, Innovation and | |||
| Technology | Technology | |||
| * Paul Keller, Open Future | * Paul Keller, Open Future | |||
| * Elizabeth Kendall, Meta | * Elizabeth Kendall, Meta | |||
| * Suresh Krishnan, Cisco (PC, IAB) | * Suresh Krishnan, Cisco (PC, IAB) | |||
| * Mirja Kühlewind, Ericsson (PC, IAB) | * Mirja Kühlewind, Ericsson (PC, IAB) | |||
| * Greg Leppert, Berkman Klein Center | * Greg Leppert, Berkman Klein Center | |||
| * Greg Lindahl, Common Crawl Foundation | * Greg Lindahl, Common Crawl Foundation | |||
| * Mike Linksvayer, GitHub | * Mike Linksvayer, GitHub | |||
| * Fred von Lohmann, OpenAI | * Fred von Lohmann, OpenAI | |||
| * Shayne Longpre, Data Provenance Initiative | * Shayne Longpre, Data Provenance Initiative | |||
| * Don Marti, Raptive | * Don Marti, Raptive | |||
| * Sarah McKenna, Alliance for Responsible Data Collection; Sequentum | * Sarah McKenna, Alliance for Responsible Data Collection; Sequentum | |||
| * Eric Null, Center for Democracy and Technology | * Eric Null, Center for Democracy and Technology | |||
| * Chris Needham, BBC | * Chris Needham, BBC | |||
| * Mark Nottingham, Cloudflare (PC) | * Mark Nottingham, Cloudflare (PC) | |||
| * Paul Ohm, Georgetown Law (PC) | * Paul Ohm, Georgetown Law (PC) | |||
| * Braxton Perkins, NBC Universal | * Braxton Perkins, NBC Universal | |||
| * Chris Petrillo, Wikimedia | * Chris Petrillo, Wikimedia | |||
| * Sebastian Posth, Liccium | * Sebastian Posth, Liccium | |||
| * Michael Prorock | * Michael Prorock | |||
| * Matt Rogerson, Financial Times | * Matt Rogerson, Financial Times | |||
| * Peter Santhanam, IBM | * Peter Santhanam, IBM | |||
| * Jeffrey Sedlik, IPTC/PLUS | * Jeffrey Sedlik, IPTC/PLUS | |||
| * Rony Shalit, Alliance For Responsible Data Collection; Bright Data | * Rony Shalit, Alliance For Responsible Data Collection; Bright Data | |||
| * Ian Sohl, OpenAI | * Ian Sohl, OpenAI | |||
| * Martin Thomson, Mozilla | * Martin Thomson, Mozilla | |||
| * Thom Vaughan, Common Crawl Foundation (PC) | * Thom Vaughan, Common Crawl Foundation (PC) | |||
| * Kat Walsh, Creative Commons | * Kat Walsh, Creative Commons | |||
| * James Whymark, Meta | * James Whymark, Meta | |||
| The following participants requested that their identity and/or | The following participants requested that their identity and/or | |||
| affiliation not be revealed: | affiliation not be revealed: | |||
| * A government official | * A government official | |||
| IAB Members at the Time of Approval | IAB Members at the Time of Approval | |||
| Internet Architecture Board members at the time this document was | Internet Architecture Board members at the time this document was | |||
| approved for publication were: | approved for publication were: | |||
| * Matthew Bocci | * Matthew Bocci | |||
| * Roman Danyliw | * Roman Danyliw | |||
| * Dhruv Dhody | * Dhruv Dhody | |||
| * Jana Iyengar | * Jana Iyengar | |||
| * Cullen Jennings | * Cullen Jennings | |||
| * Suresh Krishnan | * Suresh Krishnan | |||
| * Mirja Kühlewind | * Mirja Kühlewind | |||
| * Warren Kumari | * Warren Kumari | |||
| * Jason Livingood | * Jason Livingood | |||
| * Mark Nottingham | * Mark Nottingham | |||
| * Tommy Pauly | * Tommy Pauly | |||
| * Alvaro Retana | * Alvaro Retana | |||
| * Qin Wu | * Qin Wu | |||
| Acknowledgements | Acknowledgements | |||
| The Program Committee and the IAB would like to thank Wilkinson | The program committee and the IAB would like to thank Wilkinson | |||
| Barker Knauer for their generosity in hosting the workshop. | Barker Knauer for their generosity in hosting the workshop. | |||
| We also thank our scribes for capturing notes that assisted in the | We also thank our scribes for capturing notes that assisted in the | |||
| production of this report: | production of this report: | |||
| * Zander Arnao | * Zander Arnao | |||
| * Andrea Dean | * Andrea Dean | |||
| * Patrick Yurky | * Patrick Yurky | |||
| Authors' Addresses | Authors' Addresses | |||
| Mark Nottingham | Mark Nottingham | |||
| Melbourne | Melbourne | |||
| Australia | Australia | |||
| Email: mnot@mnot.net | Email: mnot@mnot.net | |||
| URI: https://www.mnot.net/ | URI: https://www.mnot.net/ | |||
| End of changes. 120 change blocks. | ||||
| 138 lines changed or deleted | 202 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. | ||||