| rfc9969.original.md | rfc9969.md | |||
|---|---|---|---|---|
| --- | --- | |||
| title: "IAB AI-CONTROL Workshop Report" | title: IAB AI-CONTROL Workshop Report | |||
| abbrev: IAB AI-CONTROL Workshop Report | ||||
| category: info | category: info | |||
| docname: draft-iab-ai-control-report-latest | docname: draft-iab-ai-control-report-02 | |||
| number: 9969 | ||||
| ipr: trust200902 | ||||
| submissiontype: IAB | submissiontype: IAB | |||
| number: | date: 2026-04 | |||
| date: | obsoletes: | |||
| updates: | ||||
| consensus: true | consensus: true | |||
| pi: [toc, symrefs, sortrefs] | ||||
| v: 3 | v: 3 | |||
| lang: en | ||||
| keyword: | keyword: | |||
| - policy | - policy | |||
| - Artificial Intelligence | - Artificial Intelligence | |||
| - Robots Exclusion Protocol | - Robots Exclusion Protocol | |||
| - web crawler | - web crawler | |||
| - robots.txt | - robots.txt | |||
| pi: | ||||
| compact: yes | ||||
| subcompact: yes | ||||
| author: | author: | |||
| - | - | |||
| ins: M. Nottingham | ins: M. Nottingham | |||
| name: Mark Nottingham | fullname: Mark Nottingham | |||
| postal: | city: Melbourne | |||
| - Melbourne | ||||
| country: Australia | country: Australia | |||
| email: mnot@mnot.net | email: mnot@mnot.net | |||
| uri: https://www.mnot.net/ | uri: https://www.mnot.net/ | |||
| - | - | |||
| ins: S. Krishnan | ins: S. Krishnan | |||
| name: Suresh Krishnan | fullname: Suresh Krishnan | |||
| email: suresh.krishnan@gmail.com | email: suresh.krishnan@gmail.com | |||
| normative: | normative: | |||
| informative: | informative: | |||
| CHATHAM-HOUSE: | CHATHAM-HOUSE: | |||
| title: Chatham House Rule | title: Chatham House Rule | |||
| target: https://www.chathamhouse.org/about-us/chatham-house-rule | target: https://www.chathamhouse.org/about-us/chatham-house-rule | |||
| date: false | ||||
| author: | author: | |||
| - | - | |||
| org: Chatham House | org: Chatham House | |||
| CFP: | CFP: | |||
| title: IAB Workshop on AI-CONTROL | title: IAB Workshop on AI-CONTROL | |||
| target: https://datatracker.ietf.org/group/aicontrolws/about/ | target: https://datatracker.ietf.org/group/aicontrolws/about/ | |||
| date: false | ||||
| author: | author: | |||
| - | - | |||
| org: Internet Architecture Board | org: Internet Architecture Board | |||
| PAPERS: | PAPERS: | |||
| title: IAB Workshop on AI-CONTROL Materials | title: IAB Workshop on AI-CONTROL Materials | |||
| target: https://datatracker.ietf.org/group/aicontrolws/materials/ | target: https://datatracker.ietf.org/group/aicontrolws/materials/ | |||
| date: false | ||||
| author: | author: | |||
| - | - | |||
| org: Internet Architecture Board | org: Internet Architecture Board | |||
| AI-ACT: | AI-ACT: | |||
| title: Regulation (eu) 2024/1689 of the European Parliament and of the Council | title: Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence and amending Regulations (EC) No 300/2008, (EU) No 167/2013, (EU) No 168/2013, (EU) 2018/858, (EU) 2018/1139 and (EU) 2019/2144 and Directives 2014/90/EU, (EU) 2016/797 and (EU) 2020/1828 (Artificial Intelligence Act) (Text with EEA relevance) | |||
| target: https://eur-lex.europa.eu/eli/reg/2024/1689/oj | target: https://eur-lex.europa.eu/eli/reg/2024/1689/oj | |||
| author: | author: | |||
| - | - | |||
| org: European Parliament | org: European Parliament | |||
| date: 2024-06-13 | date: 2024-06-13 | |||
| DECLINE: | DECLINE: | |||
| title: "Consent in Crisis: The Rapid Decline of the AI Data Commons" | title: "Consent in Crisis: The Rapid Decline of the AI Data Commons" | |||
| target: https://www.ietf.org/slides/slides-aicontrolws-consent-in-crisis-the-rapid-decline-of-the-ai-data-commons-00.pdf | target: https://www.ietf.org/slides/slides-aicontrolws-consent-in-crisis-the-rapid-decline-of-the-ai-data-commons-00.pdf | |||
| author: | author: | |||
| skipping to change at line 89 ¶ | skipping to change at line 93 ¶ | |||
| - | - | |||
| ins: A. Lee | ins: A. Lee | |||
| name: Ariel Lee | name: Ariel Lee | |||
| - | - | |||
| ins: C. Lund | ins: C. Lund | |||
| name: Campbell Lund | name: Campbell Lund | |||
| date: 2025 | date: 2025 | |||
| --- abstract | --- abstract | |||
| <!--[rfced] May we update the title to follow the format in other | ||||
| workshop reports? | ||||
| Original: | ||||
| IAB AI-CONTROL Workshop Report | ||||
| Perhaps: | ||||
| Report from the IAB Workshop on AI-CONTROL | ||||
| --> | ||||
| The AI-CONTROL Workshop was convened by the Internet Architecture Board (IAB) in September 2024. This report summarizes its significant points of discussion and identifies topics that may warrant further consideration and work. | The AI-CONTROL Workshop was convened by the Internet Architecture Board (IAB) in September 2024. This report summarizes its significant points of discussion and identifies topics that may warrant further consideration and work. | |||
| Note that this document is a report on the proceedings of the workshop. The views and positions documented in this report are those of the workshop participants and do not necessarily reflect IAB views and positions. | Note that this document is a report on the proceedings of the workshop. The views and positions documented in this report are those of the workshop participants and do not necessarily reflect IAB views and positions. | |||
| --- middle | --- middle | |||
| # Introduction | # Introduction | |||
| <!--[rfced] May we update the text as shown below (i.e., replace | ||||
| "large language models" with "Large Language Models (LLMs)", or would | ||||
| this update change the intended meaning? | ||||
| Original: | ||||
| The Internet is one of the major sources of data used to train | ||||
| large language models (Large Language Models (LLMs), or more | ||||
| generally, "Artificial Intelligence (AI)"). | ||||
| Perhaps: | ||||
| The Internet is one of the major sources of data used to train | ||||
| Large Language Models (LLMs) (or, more generally, Artificial | ||||
| Intelligence (AI)). | ||||
| --> | ||||
| The Internet Architecture Board (IAB) holds occasional workshops designed to consider long-term issues and strategies for the Internet, and to suggest future directions for the Internet architecture. This long-term planning function of the IAB is complementary to the ongoing engineering efforts performed by working groups of the Internet Engineering Task Force (IETF). | The Internet Architecture Board (IAB) holds occasional workshops designed to consider long-term issues and strategies for the Internet, and to suggest future directions for the Internet architecture. This long-term planning function of the IAB is complementary to the ongoing engineering efforts performed by working groups of the Internet Engineering Task Force (IETF). | |||
| The Internet is one of the major sources of data used to train large language models (Large Language Models (LLMs), or more generally, "Artificial Intelligence (AI)"). Because this use was not envisioned by most publishers of information on the Internet, a means of expressing the owners' preferences regarding AI crawling has emerged, sometimes backed by law (e.g., in the European Union's AI Act {{AI-ACT}}). | The Internet is one of the major sources of data used to train large language models (Large Language Models (LLMs) or, more generally, Artificial Intelligence (AI)). Because this use was not envisioned by most publishers of information on the Internet, a means of expressing the owners' preferences regarding AI crawling has emerged, sometimes backed by law (e.g., in the European Union's AI Act {{AI-ACT}}). | |||
| The IAB convened the AI-CONTROL Workshop on 19-20 September 2024 to "explore practical opt-out mechanisms for AI and build an understanding of use cases, requirements, and other considerations in this space" {{CFP}}. In particular, the emerging practice of using the Robots Exclusion Protocol {{?RFC9309}} -- also known as "robots.txt" -- has not been coordinated between AI crawlers, resulting in considerable differences in how they treat it. Furthermore, robots.txt may or may not be a suitable way to control AI crawlers. However, discussion was not limited to consideration of robots.txt, and approaches other than opt-out were considered. | The IAB convened the AI-CONTROL Workshop on 19-20 September 2024 to "explore practical opt-out mechanisms for AI and build an understanding of use cases, requirements, and other considerations in this space" {{CFP}}. In particular, the emerging practice of using the Robots Exclusion Protocol {{?RFC9309}} -- also known as "robots.txt" -- has not been coordinated between AI crawlers, resulting in considerable differences in how they treat it. Furthermore, robots.txt may or may not be a suitable way to control AI crawlers. However, discussion was not limited to consideration of robots.txt, and approaches other than opt-out were considered. | |||
| To ensure many viewpoints were represented, the program committee invited a broad selection of technical experts, AI vendors, content publishers, civil society advocates, and policymakers. | To ensure many viewpoints were represented, the program committee invited a broad selection of technical experts, AI vendors, content publishers, civil society advocates, and policymakers. | |||
| ## Chatham House Rule | ## Chatham House Rule | |||
| Participants agreed to conduct the workshop under the Chatham House Rule {{CHATHAM-HOUSE}}, so this report does not attribute statements to individuals or organizations without express permission. Most submissions to the workshop were public and thus attributable; they are used here to provide substance and context. | Participants agreed to conduct the workshop under the Chatham House Rule {{CHATHAM-HOUSE}}, so this report does not attribute statements to individuals or organizations without express permission. Most submissions to the workshop were public and thus attributable; they are used here to provide substance and context. | |||
| {{attendees}} lists the workshop participants, unless they requested that this information be withheld. | {{attendees}} lists the workshop participants, unless they requested that this information be withheld. | |||
| ## Views Expressed in this Report | ## Views Expressed in This Report | |||
| This document is a report on the proceedings of the workshop. The views and positions documented in this report are expressed during the workshop by participants and do not necessarily reflect IAB's views and positions. | This document is a report on the proceedings of the workshop. The views and positions documented in this report are expressed during the workshop by participants and do not necessarily reflect the IAB's views and positions. | |||
| Furthermore, the content of the report comes from presentations given by workshop participants and notes taken during the discussions, without interpretation or validation. Thus, the content of this report follows the flow and dialogue of the workshop but does not attempt to capture a consensus. | Furthermore, the content of the report comes from presentations given by workshop participants and notes taken during the discussions, without interpretation or validation. Thus, the content of this report follows the flow and dialog of the workshop but does not attempt to capture a consensus. | |||
| # Workshop Scope and Discussion | # Workshop Scope and Discussion | |||
| The workshop began by surveying the state of AI control. | The workshop began by surveying the state of AI control. | |||
| Currently, Internet publishers express their preferences for how their content is treated for purposes of AI training using a variety of mechanisms, including declarative ones, such as terms of service, embedded metadata, and robots.txt {{RFC9309}}, and active ones, such as use of paywalls and selective blocking of crawlers (e.g., by IP address, User-Agent). | Currently, Internet publishers express their preferences for how their content is treated for the purposes of AI training using a variety of mechanisms. These include declarative mechanisms, such as terms of service, embedded metadata, and robots.txt {{RFC9309}}, as well as active mechanisms, such as use of paywalls and selective blocking of crawlers (e.g., by IP address or User-Agent). | |||
| There was disagreement about the implications of AI opt-out overall. Research presented at the workshop {{DECLINE}} indicates that the use of such controls is becoming more prevalent, reducing the availability of data to AI (for purposes including training and inference-time usage). Some of the participants expressed concern about the implications of this -- although at least one AI vendor seemed less concerned by this, indicating that "there are plenty of tokens available" for training, even if many opt out. Others expressed a need to opt out of AI training because of how they perceive its effects on their control over content, seeing AI as usurping their relationships with customers and a potential threat to whole industries. | There was disagreement about the implications of AI opt-out overall. Research presented at the workshop {{DECLINE}} indicates that the use of such controls is becoming more prevalent, reducing the availability of data to AI (for purposes including training and inference-time usage). Some of the participants expressed concern about the implications of this -- although at least one AI vendor seemed less concerned by this, indicating that "there are plenty of tokens available" for training, even if many opt out. Others expressed a need to opt out of AI training because of how they perceive its effects on their control over content, seeing AI as usurping their relationships with customers and a potential threat to whole industries. | |||
| However, there was quick agreement that both viewpoints were harmed by the current state of AI opt-out -- a situation where "no one is better off" (in the words of one participant). | However, there was quick agreement that both viewpoints were harmed by the current state of AI opt-out -- a situation where "no one is better off" (in the words of one participant). | |||
| <!--[rfced] In the last sentence below, please clarify what "both" | ||||
| refers to - is it new vendors and policy updates? | ||||
| Current: | ||||
| Much of that dysfunction was attributed to the lack of coordination | ||||
| and standards for AI opt-out. Currently, content publishers need to | ||||
| consult with each AI vendor to understand how to opt out of training | ||||
| their products, as there is significant variance in each vendor's | ||||
| behavior. Furthermore, publishers need to continually monitor both for | ||||
| new vendors and changes to the policies of the vendors they are | ||||
| aware of. | ||||
| Perhaps: | ||||
| ... Furthermore, publishers need to continually monitor both new | ||||
| vendors and policy updates from the vendors they are aware | ||||
| of. | ||||
| --> | ||||
| Much of that dysfunction was attributed to the lack of coordination and standards for AI opt-out. Currently, content publishers need to consult with each AI vendor to understand how to opt out of training their products, as there is significant variance in each vendor's behaviour. Furthermore, publishers need to continually monitor both for new vendors, and for changes to the policies of the vendors they are aware of. | Much of that dysfunction was attributed to the lack of coordination and standards for AI opt-out. Currently, content publishers need to consult with each AI vendor to understand how to opt out of training their products, as there is significant variance in each vendor's behavior. Furthermore, publishers need to continually monitor for both new vendors and changes to the policies of the vendors they are aware of. | |||
| Underlying those immediate issues, however, are significant constraints that could be attributed to uncertainties in the legal context, the nature of AI, and the implications of needing to opt out of crawling for it. | Underlying those immediate issues, however, are significant constraints that could be attributed to uncertainties in the legal context, the nature of AI, and the implications of needing to opt out of crawling for it. | |||
| ## Crawl Time vs. Inference Time | ## Crawl Time vs. Inference Time | |||
| Perhaps most significant is the "crawl time vs. inference time" problem. Statements of preference are apparent at crawl time, bound to content either by location (e.g., robots.txt) or embedded inside the content itself as metadata. However, the target of those directives is often disassociated from the crawler, either because the crawl data is not only used for training AI models, or because the preferences could be applicable at inference time. | Perhaps most significant is the "crawl time vs. inference time" problem. Statements of preference are apparent at crawl time, bound to content either by location (e.g., robots.txt) or embedded inside the content itself as metadata. However, the target of those directives is often disassociated from the crawler, either because the crawl data is not only used for training AI models or because the preferences could be applicable at inference time. | |||
| ### Multiple Uses for Crawl Data | ### Multiple Uses for Crawl Data | |||
| A crawl's data might have multiple uses because the vendor also has another product that uses it (e.g., a search engine), or because the crawl is performed by a party other than the AI vendor. Both are very common patterns: operators of many Internet search engines also train AI models, and many AI models use third-party crawl data. In either case, conflating different uses can change the incentives for publishers to cooperate with the crawler. | A crawl's data might have multiple uses because the vendor also has another product that uses it (e.g., a search engine) or because the crawl is performed by a party other than the AI vendor. Both are very common patterns: Operators of many Internet search engines also train AI models, and many AI models use third-party crawl data. In either case, conflating different uses can change the incentives for publishers to cooperate with the crawler. | |||
| Well-established uses of crawling, such as Internet search, were seen by participants as at least partially aligned with the interests of publishers: they allow their sites to be crawled, and in return, they receive higher traffic and attention due to being in the search index. However, several participants pointed out that this symbiotic relationship does not exist for AI training uses -- with some viewing AI as hostile to publishers, because it has the capacity to take traffic away from their sites. | Well-established uses of crawling, such as Internet searches, were seen by participants as at least partially aligned with the interests of publishers: They allow their sites to be crawled, and in return, they receive higher traffic and attention due to being in the search index. However, several participants pointed out that this symbiotic relationship does not exist for AI training uses -- with some viewing AI as hostile to publishers because it has the capacity to take traffic away from their sites. | |||
| Therefore, when a crawler has multiple uses that include AI, participants observed that "collateral damage" was likely for non-AI uses, especially when publishers take more active control measures, such as blocking or paywalls, to protect their interests. | Therefore, when a crawler has multiple uses that include AI, participants observed that "collateral damage" was likely for non-AI uses, especially when publishers take more active control measures, such as blocking or paywalls, to protect their interests. | |||
| Several participants expressed concerns about this phenomenon's effects on the ecosystem, effectively "locking down the Web" with one opining that there were implications for freedom of expression overall. | Several participants expressed concerns about this phenomenon's effects on the ecosystem, effectively "locking down the Web" with one opining that there were implications for freedom of expression overall. | |||
| ### Application of Preferences | ### Application of Preferences | |||
| When data is used to train an LLM, the resulting model does not have the ability to only selectively use a portion of it when performing a task, because inference uses the whole model, and it is not possible to identify specific input data for its use in doing so. | When data is used to train an LLM, the resulting model does not have the ability to only selectively use a portion of it when performing a task because inference uses the whole model, and it is not possible to identify specific input data for its use in doing so. | |||
| This means that while publishers' preferences may be available when content is crawled, they generally are not when inference takes place. Those preferences that are stated in reference to use by AI -- for example, "no military uses" or "non-commercial only" cannot be applied by a general-purpose "foundation" model. | This means that while publishers' preferences may be available when content is crawled, they generally are not when inference takes place. Those preferences that are stated in reference to use by AI -- for example, "no military uses" or "non-commercial only" -- cannot be applied by a general-purpose "foundation" model. | |||
| This leaves a few unappealing choices to AI vendors that wish to comply with those preferences. They can simply omit such data from foundation models, thereby reducing their viability. Or, they can create a separate model for each permutation of preferences -- with a likely proliferation of models as the set of permutations expands. | This leaves a few unappealing choices to AI vendors that wish to comply with those preferences. They can simply omit such data from foundation models, thereby reducing their viability. Or they can create a separate model for each permutation of preferences -- with a likely proliferation of models as the set of permutations expands. | |||
| Compounding this issue was the observation that preferences change over time, whereas LLMs are created over long time frames and cannot easily be updated to reflect those changes. Of particular concern to some was how this makes an opt-out regime "stickier" because content that has no associated preference (such as that which predates the authors' knowledge of LLMs) is allowed to be used for these unforeseen purposes. | Compounding this issue was the observation that preferences change over time, whereas LLMs are created over long time frames and cannot easily be updated to reflect those changes. Of particular concern to some was how this makes an opt-out regime "stickier" because content that has no associated preference (such as that which predates the authors' knowledge of LLMs) is allowed to be used for these unforeseen purposes. | |||
| ## Trust | ## Trust | |||
| <!--[rfced] May we update "was felt by participants to contribute to" | ||||
| as shown below for easier readability? | ||||
| Original: | ||||
| This disconnection between the statement of preferences and its | ||||
| application was felt by participants to contribute to a lack of | ||||
| trust in the ecosystem, along with the typical lack of attribution | ||||
| for data sources in LLMs, lack of an incentive for publishers to | ||||
| contribute data, and finally (and most noted) a lack of any means | ||||
| of monitoring compliance with preferences. | ||||
| Perhaps: | ||||
| Participants felt that the disconnection between the statement of | ||||
| preferences and its application contributes to a lack of trust in | ||||
| the ecosystem, along with the typical lack of attribution for data | ||||
| sources in LLMs, a lack of an incentive for publishers to | ||||
| contribute data, and finally (and most noted) a lack of any means | ||||
| of monitoring compliance with preferences. | ||||
| --> | ||||
| This disconnection between the statement of preferences and its application was felt by participants to contribute to a lack of trust in the ecosystem, along with the typical lack of attribution for data sources in LLMs, lack of an incentive for publishers to contribute data, and finally (and most noted) a lack of any means of monitoring compliance with preferences. | This disconnection between the statement of preferences and its application was felt by participants to contribute to a lack of trust in the ecosystem, along with the typical lack of attribution for data sources in LLMs, lack of an incentive for publishers to contribute data, and finally (and most noted) lack of any means of monitoring compliance with preferences. | |||
| This lack of trust led some participants to question whether communicating preferences is sufficient in all cases without an accompanying way to enforce them, or even to audit adherence to them. Some participants also indicated that a lack of trust was the primary cause of the increasingly prevalent blocking of AI crawler IP addresses, among other measures. | This lack of trust led some participants to question whether communicating preferences is sufficient in all cases without an accompanying way to enforce them, or even to audit adherence to them. Some participants also indicated that a lack of trust was the primary cause of the increasingly prevalent blocking of AI crawler IP addresses, among other measures. | |||
| ## Attachment | ## Attachment | |||
| One of the primary focuses of the workshop was on _attachment_ -- how preferences are associated with content on the Internet. A range of mechanisms was discussed. | One of the primary focuses of the workshop was on _attachment_, i.e., how preferences are associated with content on the Internet. A range of mechanisms was discussed. | |||
| ### robots.txt (and similar) | ### robots.txt (and Similar) | |||
| The Robots Exclusion Protocol {{RFC9309}} is widely recognised by AI vendors as an attachment mechanism for preferences. Several deficiencies were discussed. | The Robots Exclusion Protocol {{RFC9309}} is widely recognized by AI vendors as an attachment mechanism for preferences. Several deficiencies were discussed. | |||
| First, it does not scale to offer granular control over large sites where authors might want to express different policies for a range of content (for example, YouTube). | First, it does not scale to offer granular control over large sites where authors might want to express different policies for a range of content (for example, YouTube). | |||
| Robots.txt is also typically under the control of the site administrator. If a site has content from many creators (as is often the case for social media and similar platforms), the administrator may not allow them to express their preferences fully, or at all. | robots.txt is also typically under the control of the site administrator. If a site has content from many creators (as is often the case for social media and similar platforms), the administrator may not allow them to express their preferences fully, or at all. | |||
| If content is copied or moved to a different site, the preferences at the new site need to be explicitly transferred, because robots.txt is a separate resource. | If content is copied or moved to a different site, the preferences at the new site need to be explicitly transferred because robots.txt is a separate resource. | |||
| These deficiencies led many participants to feel that robots.txt cannot be the only solution to opt-out: rather, it should be part of a larger system that addresses its shortcomings. | These deficiencies led many participants to feel that robots.txt cannot be the only solution to opt-out: Rather, it should be part of a larger system that addresses its shortcomings. | |||
| Participants noted that other, similar attachment mechanisms have been proposed. However, none appear to have gained as much attention or implementation (both by AI vendors and content owners) as robots.txt. | Participants noted that other similar attachment mechanisms have been proposed. However, none appear to have gained as much attention or implementation (both by AI vendors and content owners) as robots.txt. | |||
| ### Embedding | ### Embedding | |||
| Another mechanism for associating preferences with content is to embed them into the content itself. Many formats used on the Internet allow this; for example, HTML has the `<meta>` tag, images have XMP and similar metadata sections, and XML and JSON have rich potential for extensions to carry such data. | Another mechanism for associating preferences with content is to embed them into the content itself. Many formats used on the Internet allow this; for example, HTML has the `<meta>` tag, images have Extensible Metadata Platform (XMP) and similar metadata sections, and XML and JSON have rich potential for extensions to carry such data. | |||
| <!--[rfced] Is "when it is moved" referring to "preferences"? If yes, | ||||
| may we update the text as follows? | ||||
| Original: | ||||
| Embedded preferences were seen to have the advantage of granularity, | ||||
| and of "travelling with" content as it is produced, when it is moved | ||||
| from site to site, or when it is stored offline. | ||||
| Perhaps: | ||||
| Embedded preferences were seen to have the advantage of granularity, | ||||
| and of "traveling with" content as it is produced, when they are moved | ||||
| from site to site or when they are stored offline. | ||||
| --> | ||||
| Embedded preferences were seen to have the advantage of granularity, and of "travelling with" content as it is produced, when it is moved from site to site, or when it is stored offline. | Embedded preferences were seen to have the advantage of granularity, and of "traveling with" content as it is produced, when it is moved from site to site or when it is stored offline. | |||
| However, several participants pointed out that embedded preferences are easily stripped from most formats. This is a common practice for reducing the size of a file (thereby improving performance when downloading it), and for assuring privacy (since metadata often leaks information unintentionally). | However, several participants pointed out that embedded preferences are easily stripped from most formats. This is a common practice for reducing the size of a file (thereby improving performance when downloading it) and for assuring privacy (since metadata often leaks information unintentionally). | |||
| Furthermore, some types of content are not suitable for embedding. For example, it is not possible to embed preferences into purely textual content, and Web pages with content from several producers (such as a social media or comments feed) cannot easily reflect preferences for each one. | Furthermore, some types of content are not suitable for embedding. For example, it is not possible to embed preferences into purely textual content, and web pages with content from several producers (such as a social media or comment feeds) cannot easily reflect preferences for each one. | |||
| Participants noted that the means of embedding preferences in many formats would need to be determined by or coordinated with organisations outside the IETF. For example, HTML and many image formats are maintained by external bodies. | Participants noted that the means of embedding preferences in many formats would need to be determined by or coordinated with organizations outside the IETF. For example, HTML and many image formats are maintained by external bodies. | |||
| ### Registries | ### Registries | |||
| In some existing copyright management regimes, it is already common to have a registry of works that is consulted upon use. For example, this approach is often used for photographs, music, and video. | In some existing copyright management regimes, it is already common to have a registry of works that is consulted upon use. For example, this approach is often used for photographs, music, and video. | |||
| Typically, registries use hashing mechanisms to create a "fingerprint" for the content that is robust to changes. | Typically, registries use hashing mechanisms to create a "fingerprint" for the content that is robust to changes. | |||
| Using a registry decouples the content in question from its location, so that it can be found even if moved. It is also claimed to be robust against stripping of embedded metadata, which is a common practice to improve performance and/or privacy. | Using a registry decouples the content in question from its location so that it can be found even if moved. It is also claimed to be robust against stripping of embedded metadata, which is a common practice to improve performance and/or privacy. | |||
| However, several participants pointed out issues with deploying registries at Internet scale. While they may be effective for (relatively) closed and well-known ecosystems such as commercial music publishing, applying them to a diverse and very large ecosystem like the Internet has proven problematic. | However, several participants pointed out issues with deploying registries at the scale of the Internet. While they may be effective for (relatively) closed and well-known ecosystems, such as commercial music publishing, applying them to a diverse and very large ecosystem like the Internet has proven problematic. | |||
| ## Vocabulary | ## Vocabulary | |||
| Another major focus area for the workshop was on _vocabulary_ -- the specific semantics of the opt-out signal. Several participants noted that there are already many proposals for vocabularies, as well as many conflicting vocabularies already in use. Several examples were discussed, including where existing terms were ambiguous, did not address common use cases, or were used in conflicting ways by different actors. | Another major focus area for the workshop was on _vocabulary_ -- the specific semantics of the opt-out signal. Several participants noted that there are already many proposals for vocabularies, as well as many conflicting vocabularies already in use. Several examples were discussed, including where existing terms were ambiguous, did not address common use cases, or were used in conflicting ways by different actors. | |||
| Although no conclusions regarding exact vocabulary were reached, it was generally agreed that a complex vocabulary is unlikely to succeed. | Although no conclusions regarding exact vocabulary were reached, it was generally agreed that a complex vocabulary is unlikely to succeed. | |||
| # Conclusions | # Conclusions | |||
| Participants generally agreed that on its current path, the ecosystem is not sustainable. As one remarked, "robots.txt is broken and we broke it." | Participants generally agreed that on its current path, the ecosystem is not sustainable. As one remarked, "robots.txt is broken and we broke it". | |||
| Legal uncertainty, along with fundamental limitations of opt-out regimes pointed out above, limit the effectiveness of any technical solution, which will be operating in a system unlike either robots.txt (where there is a symbiotic relationship between content owners and the crawlers) or copyright (where the default is effectively opt-in, not opt-out). | Legal uncertainty, along with fundamental limitations of opt-out regimes pointed out above, limit the effectiveness of any technical solution, which will be operating in a system unlike either robots.txt (where there is a symbiotic relationship between content owners and the crawlers) or copyright (where the default is effectively opt-in, not opt-out). | |||
| However, the workshop ended with general agreement that positive steps could be taken to improve the communication of preferences from content owners for AI use cases. In discussion, it was evident that the discovery of preferences from multiple attachment mechanisms is necessary to meet the diverse needs of content authors, and that therefore defining how they are combined is important. | However, the workshop ended with general agreement that positive steps could be taken to improve the communication of preferences from content owners for AI use cases. In discussion, it was evident that the discovery of preferences from multiple attachment mechanisms is necessary to meet the diverse needs of content authors and, therefore, that defining how they are combined is important. | |||
| We outline a proposed standard program below. | We outline a proposed standard program below. | |||
| ## Potential Standards Work | ## Potential Standards Work | |||
| The following items were felt to be good starting points for IETF work: | The following items were identified as good starting points for IETF work: | |||
| * Attachment to Web sites by location (in robots.txt or a similar mechanism) | * Attachment to websites by location (in robots.txt or a similar mechanism) | |||
| * Attachment via embedding in IETF-controlled formats (e.g., HTTP headers) | * Attachment via embedding in IETF-controlled formats (e.g., HTTP headers) | |||
| * Definition of a common core vocabulary | * Definition of a common core vocabulary | |||
| * Definition of the overall regime; e.g., how to combine preferences discovered from multiple attachment mechanisms | * Definition of the overall regime, e.g., how to combine preferences discovered from multiple attachment mechanisms | |||
| It would be expected that the IETF would coordinate with other SDOs to define embedding in other formats (e.g., HTML). | It would be expected that the IETF would coordinate with other Standards Development Organizations (SDOs) to define embedding in other formats (e.g., HTML). | |||
| ### Out of Initial Scope | ### Out of Initial Scope | |||
| It was broadly agreed that it would not be useful to work on the following items, at least to begin with: | It was broadly agreed that it would not be useful to work on the following items, at least to begin with: | |||
| * Enforcement mechanisms for preferences | * Enforcement mechanisms for preferences | |||
| * Registry-based solutions | * Registry-based solutions | |||
| * Identifying or authenticating crawlers and/or content owners | * Identifying or authenticating crawlers and/or content owners | |||
| * Audit or transparency mechanisms | * Audit or transparency mechanisms | |||
| # IANA Considerations | ||||
| This document has no IANA actions. | ||||
| # Security Considerations | # Security Considerations | |||
| This document is a workshop report and does not impact the security of the Internet. | This document is a workshop report and does not impact the security of the Internet. | |||
| --- back | --- back | |||
| # About the Workshop | # About the Workshop | |||
| The AI-CONTROL Workshop was held on 2024-09-19 and 2024-09-20 at Wilkinson Barker Knauer in Washington DC, USA. | The AI-CONTROL Workshop was held on 2024-09-19 and 2024-09-20 at Wilkinson Barker Knauer in Washington, D.C., USA. | |||
| Workshop attendees were asked to submit position papers. These papers are published on the IAB website [PAPERS], unless the submitter requested it be withheld. | Workshop attendees were asked to submit position papers. These papers are published on the IAB website {{PAPERS}}, unless the submitter requested it be withheld. | |||
| The workshop was conducted under the Chatham House Rule [CHATHAM-HOUSE], meaning that statements cannot be attributed to individuals or organizations without explicit authorization. | The workshop was conducted under the Chatham House Rule {{CHATHAM-HOUSE}}, meaning that statements cannot be attributed to individuals or organizations without explicit authorization. | |||
| ## Agenda | ## Agenda | |||
| This section outlines the broad areas of discussion on each day. | This section outlines the broad areas of discussion on each day. | |||
| ### Thursday 2024-09-19 | ### Thursday, 2024-09-19 | |||
| Setting the stage | Setting the stage: | |||
| : An overview of the current state of AI opt-out, its impact, and existing work in this space | : An overview of the current state of AI opt-out, its impact, and existing work in this space | |||
| Lightning talks | Lightning talks: | |||
| : A variety of perspectives from participants | : A variety of perspectives from participants | |||
| ### Friday 2024-09-20 | ### Friday, 2024-09-20 | |||
| Opt-Out Attachment: robots.txt and beyond | Opt-Out Attachment: robots.txt and beyond: | |||
| : Considerations in how preferences are attached to content on the Internet | : Considerations in how preferences are attached to content on the Internet | |||
| Vocabulary: what opt-out means | Vocabulary: what opt-out means: | |||
| : What information the opt-out signal needs to convey | : What information the opt-out signal needs to convey | |||
| Discussion and wrap-up | Discussion and wrap-up: | |||
| : Synthesis of the workshop's topics and how future work might unfold | : Synthesis of the workshop's topics and how future work might unfold | |||
| ## Attendees {#attendees} | ## Attendees {#attendees} | |||
| Attendees of the workshop are listed with their primary affiliation. Attendees from the program committee (PC) and the Internet Architecture Board (IAB) are also marked. | Attendees of the workshop are listed with their primary affiliation. Attendees from the program committee (PC) and the Internet Architecture Board (IAB) are also marked. | |||
| * Jari Arkko, Ericsson | * {{{Jari Arkko}}}, Ericsson | |||
| * Hirochika Asai, Preferred Networks | * {{{Hirochika Asai}}}, Preferred Networks | |||
| * Farzaneh Badiei, Digital Medusa (PC) | * {{{Farzaneh Badiei}}}, Digital Medusa (PC) | |||
| * Fabrice Canel, Microsoft (PC) | * {{{Fabrice Canel}}}, Microsoft (PC) | |||
| * Lena Cohen, EFF | * {{{Lena Cohen}}}, EFF | |||
| * Alissa Cooper, Knight-Georgetown Institute (PC, IAB) | * {{{Alissa Cooper}}}, Knight-Georgetown Institute (PC, IAB) | |||
| * Marwan Fayed, Cloudflare | * {{{Marwan Fayed}}}, Cloudflare | |||
| * Christopher Flammang, Elsevier | * {{{Christopher Flammang}}}, Elsevier | |||
| * Carl Gahnberg | * {{{Carl Gahnberg}}} | |||
| * Max Gendler, The News Corporation | * {{{Max Gendler}}}, The News Corporation | |||
| * Ted Hardie | * {{{Ted Hardie}}} | |||
| * Dominique Hazaël-Massieux, W3C | * {{{Dominique Hazaël-Massieux}}}, W3C | |||
| * Gary Ilyes, Google (PC) | * {{{Gary Ilyes}}}, Google (PC) | |||
| * Sarah Jennings, UK Department for Science, Innovation and Technology | * {{{Sarah Jennings}}}, UK Department for Science, Innovation and Technology | |||
| * Paul Keller, Open Future | * {{{Paul Keller}}}, Open Future | |||
| * Elizabeth Kendall, Meta | * {{{Elizabeth Kendall}}}, Meta | |||
| * Suresh Krishnan, Cisco (PC, IAB) | * {{{Suresh Krishnan}}}, Cisco (PC, IAB) | |||
| * Mirja Kühlewind, Ericsson (PC, IAB) | * {{{Mirja Kühlewind}}}, Ericsson (PC, IAB) | |||
| * Greg Leppert, Berkman Klein Center | * {{{Greg Leppert}}}, Berkman Klein Center | |||
| * Greg Lindahl, Common Crawl Foundation | * {{{Greg Lindahl}}}, Common Crawl Foundation | |||
| * Mike Linksvayer, GitHub | * {{{Mike Linksvayer}}}, GitHub | |||
| * Fred von Lohmann, OpenAI | * {{{Fred von Lohmann}}}, OpenAI | |||
| * Shayne Longpre, Data Provenance Initiative | * {{{Shayne Longpre}}}, Data Provenance Initiative | |||
| * Don Marti, Raptive | * {{{Don Marti}}}, Raptive | |||
| * Sarah McKenna, Alliance for Responsible Data Collection; Sequentum | * {{{Sarah McKenna}}}, Alliance for Responsible Data Collection; Sequentum | |||
| * Eric Null, Center for Democracy and Technology | * {{{Eric Null}}}, Center for Democracy and Technology | |||
| * Chris Needham, BBC | * {{{Chris Needham}}}, BBC | |||
| * Mark Nottingham, Cloudflare (PC) | * {{{Mark Nottingham}}}, Cloudflare (PC) | |||
| * Paul Ohm, Georgetown Law (PC) | * {{{Paul Ohm}}}, Georgetown Law (PC) | |||
| * Braxton Perkins, NBC Universal | * {{{Braxton Perkins}}}, NBC Universal | |||
| * Chris Petrillo, Wikimedia | * {{{Chris Petrillo}}}, Wikimedia | |||
| * Sebastian Posth, Liccium | * {{{Sebastian Posth}}}, Liccium | |||
| * Michael Prorock | * {{{Michael Prorock}}} | |||
| * Matt Rogerson, Financial Times | * {{{Matt Rogerson}}}, Financial Times | |||
| * Peter Santhanam, IBM | * {{{Peter Santhanam}}}, IBM | |||
| * Jeffrey Sedlik, IPTC/PLUS | * {{{Jeffrey Sedlik}}}, IPTC/PLUS | |||
| * Rony Shalit, Alliance For Responsible Data Collection; Bright Data | * {{{Rony Shalit}}}, Alliance For Responsible Data Collection; Bright Data | |||
| * Ian Sohl, OpenAI | * {{{Ian Sohl}}}, OpenAI | |||
| * Martin Thomson, Mozilla | * {{{Martin Thomson}}}, Mozilla | |||
| * Thom Vaughan, Common Crawl Foundation (PC) | * {{{Thom Vaughan}}}, Common Crawl Foundation (PC) | |||
| * Kat Walsh, Creative Commons | * {{{Kat Walsh}}}, Creative Commons | |||
| * James Whymark, Meta | * {{{James Whymark}}}, Meta | |||
| The following participants requested that their identity and/or affiliation not be revealed: | The following participants requested that their identity and/or affiliation not be revealed: | |||
| * A government official | * A government official | |||
| # IAB Members at the Time of Approval | # IAB Members at the Time of Approval | |||
| {:numbered="false"} | {:numbered="false"} | |||
| Internet Architecture Board members at the time this document was approved for publication were: | Internet Architecture Board members at the time this document was approved for publication were: | |||
| - Matthew Bocci | - {{{Matthew Bocci}}} | |||
| - Roman Danyliw | - {{{Roman Danyliw}}} | |||
| - Dhruv Dhody | - {{{Dhruv Dhody}}} | |||
| - Jana Iyengar | - {{{Jana Iyengar}}} | |||
| - Cullen Jennings | - {{{Cullen Jennings}}} | |||
| - Suresh Krishnan | - {{{Suresh Krishnan}}} | |||
| - Mirja Kühlewind | - {{{Mirja Kühlewind}}} | |||
| - Warren Kumari | - {{{Warren Kumari}}} | |||
| - Jason Livingood | - {{{Jason Livingood}}} | |||
| - Mark Nottingham | - {{{Mark Nottingham}}} | |||
| - Tommy Pauly | - {{{Tommy Pauly}}} | |||
| - Alvaro Retana | - {{{Alvaro Retana}}} | |||
| - Qin Wu | - {{{Qin Wu}}} | |||
| # Acknowledgements | # Acknowledgements | |||
| {:numbered="false"} | {:numbered="false"} | |||
| The Program Committee and the IAB would like to thank Wilkinson Barker Knauer for their generosity in hosting the workshop. | The program committee and the IAB would like to thank Wilkinson Barker Knauer for their generosity in hosting the workshop. | |||
| We also thank our scribes for capturing notes that assisted in the production of this report: | We also thank our scribes for capturing notes that assisted in the production of this report: | |||
| * Zander Arnao | * {{{Zander Arnao}}} | |||
| * Andrea Dean | * {{{Andrea Dean}}} | |||
| * Patrick Yurky | * {{{Patrick Yurky}}} | |||
| <!-- [rfced] FYI - We have added expansions for the following abbreviations | ||||
| per Section 3.6 of RFC 7322 ("RFC Style Guide"). Please review each | ||||
| expansion in the document carefully to ensure correctness. | ||||
| Standards Development Organization (SDO) | ||||
| Extensible Metadata Platform (XMP) | ||||
| --> | ||||
| <!-- [rfced] Please review the "Inclusive Language" portion of the online | ||||
| Style Guide <https://www.rfc-editor.org/styleguide/part2/#inclusive_language> | ||||
| and let us know if any changes are needed. Updates of this nature typically | ||||
| result in more precise language, which is helpful for readers. | ||||
| Note that our script did not flag any words in particular, but this should | ||||
| still be reviewed as a best practice. | ||||
| --> | ||||
| End of changes. 63 change blocks. | ||||
| 131 lines changed or deleted | 215 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. | ||||