AI and Copyright: The European Framework Bringing Order to the Chaos of Massive Data Training

Spanish Office, 30.07.2025

In the race to regulate artificial intelligence, Europe took a decisive step in June 2024 with the adoption of Regulation (EU) 2024/1689 on artificial intelligence (the “AI Regulation” or “AI Act”). This pioneering legislation establishes a comprehensive legal framework for the development, use, and placing on the market of AI systems in the European Union. In Article 56, it already provides for the promotion and drafting of Union-level codes of practice to facilitate its proper implementation.

On 10 July 2025, the final version of the first EU General-Purpose AI Code of Practice was released. It was promoted by the European Commission to bridge the gap between the legal provisions and their practical implementation. The instrument is voluntary, and therefore not legally binding per se. Even so, it gives providers of general-purpose AI models a practical framework for complying with the AI Regulation, particularly its requirements on transparency, safety, and respect for copyright. It thus enables major players such as OpenAI, Google DeepMind, or Mistral to demonstrate compliance with the obligations imposed by the AI Regulation, especially concerning copyright.

Beyond questions of high-risk systems or algorithmic transparency, this Code includes a crucial chapter on the use of copyright-protected content to train AI models. That chapter directly affects creators, technology companies, and intellectual property rights holders. It matters because generative AI models are trained on vast volumes of data, much of which is protected by intellectual property rights. What happens when a model accesses, processes, or reproduces content without authorization? How should AI providers act to avoid infringing these rights?

In particular, Article 53(1)(c) of the AI Act requires providers of general-purpose AI models to put in place a policy to comply with Union copyright law. Among other things, they must identify and respect, including through state-of-the-art technologies, any reservation of rights expressly made by rights holders.

This obligation is directly linked to Article 4(3) of Directive (EU) 2019/790 on copyright in the Digital Single Market. That provision allows rights holders to exclude their content from text and data mining (TDM), provided they reserve their rights in an appropriate manner. Examples include machine-readable means such as robots.txt, noai or noimageai metadata, ai.txt files, or similar technologies. This opt-out is legally valid, but its effectiveness depends on technical standards that are not yet fully established across all sectors.
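To make the robots.txt mechanism concrete, here is a minimal sketch of how a crawler operator might check whether a site's robots.txt reserves rights against a given AI crawler before fetching a page. It is not a compliance tool. It uses Python's standard urllib.robotparser module. The robots.txt content, the example.org URL and the user-agent names are illustrative assumptions. GPTBot and CCBot are real crawler identifiers, used here only as examples. The sketch does not cover other opt-out signals such as noai metadata or ai.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative publisher site (example.org).
# It disallows two AI crawlers everywhere and allows all other agents.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt lets `user_agent` fetch `url`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    article_url = "https://example.org/journal/article-123"
    for agent in ("GPTBot", "CCBot", "OtherBot"):
        verdict = "allowed" if may_crawl(ROBOTS_TXT, agent, article_url) else "reserved"
        print(f"{agent}: {verdict}")
```

Run as a script, it reports both named AI crawlers as excluded and any other agent as allowed. Because robots.txt is only one of several possible reservation signals, a real policy would combine this check with others.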

The transposition into Spanish law is still pending.

Although this Directive should have been transposed into Spanish law by 7 June 2021, the current Spanish legal framework does not contain a provision that precisely reflects this requirement. That is true of the Consolidated Text of the Intellectual Property Law (TRLPI) and of every other law. Spain did approve Royal Decree-Law 24/2021 to partially transpose the Directive. However, the TRLPI does not expressly provide for objecting to TDM through a reservation of rights, which leaves a significant regulatory gap. Consequently, barring a broad interpretation by the courts, providers of AI models operating in Spain will need to rely directly on the European legislation mentioned above. That legislation requires proactive compliance mechanisms and respect for rights reservations even in the absence of explicit transposition.

This is where the new Code of Practice comes into play, specifying how a copyright compliance policy should be implemented. Some of its key measures include:
(a) refraining from accessing protected content whose use has been expressly reserved through technical mechanisms such as those mentioned above;
(b) excluding from crawling any websites that courts or administrative authorities have recognized as infringing;
(c) implementing technical safeguards to minimize the risk that AI models generate outputs reproducing protected content without the required authorization; and
(d) providing complaint mechanisms for rights holders, including a designated point of contact and an effective system for processing claims.
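Measures (a) and (b) amount to a gate applied before each URL is fetched. The sketch below is a hypothetical illustration, not the Code's prescribed method. The domain list, the `decide` function and the reason strings are invented for this example. The `robots_allows` flag is assumed to come from a separate check of robots.txt or page metadata. Recording a reason with each decision is one way to keep the policy actually applied documented.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit

# Hypothetical list of domains that a court or administrative authority has
# recognized as infringing. In practice this would come from a maintained,
# authoritative source; here it is invented for illustration.
INFRINGING_DOMAINS = frozenset({"pirated-journals.example", "free-scores.example"})


@dataclass(frozen=True)
class CrawlDecision:
    """One documented crawl decision: the URL, the outcome, and why."""
    url: str
    allowed: bool
    reason: str


def is_listed(host: str, blocklist: frozenset) -> bool:
    """True if `host` is a listed domain or a subdomain of one."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in blocklist)


def decide(url: str, robots_allows: bool,
           blocklist: frozenset = INFRINGING_DOMAINS) -> CrawlDecision:
    """Apply measure (b), then measure (a), before any content is fetched.

    `robots_allows` is assumed to be the result of a separate check of the
    site's machine-readable reservations (robots.txt, metadata, etc.).
    """
    host = urlsplit(url).hostname or ""
    if is_listed(host, blocklist):
        return CrawlDecision(url, False, "site listed as infringing (measure b)")
    if not robots_allows:
        return CrawlDecision(url, False, "rights reserved by machine-readable means (measure a)")
    return CrawlDecision(url, True, "no exclusion found")


if __name__ == "__main__":
    audit_log = [
        decide("https://cdn.pirated-journals.example/paper.pdf", robots_allows=True),
        decide("https://publisher.example/article", robots_allows=False),
        decide("https://university.example/thesis.pdf", robots_allows=True),
    ]
    for entry in audit_log:
        print(entry)
```

Subdomains of a listed site are excluded as well. A real blocklist would also need to track changes to the underlying court or administrative decisions over time.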

Moreover, the Code states that these obligations must be fulfilled even when the provider operates through third parties (such as crawlers or APIs collecting data on the company’s behalf) and recommends active transparency regarding the content used to train the models.


On a practical level, consider a provider that trains a language model on data scraped from scientific journal websites. Many of these sites are accessible only by paid subscription, and the provider does not check whether they have reserved their rights through robots.txt or metadata. The model could then reproduce entire paragraphs of protected articles. Suppose the rights holder detects this use and can prove that a visible and technically enforceable rights reservation was in place. The provider could then be liable for unauthorized reproduction, having infringed the rights holder's exclusive rights.

Another example could involve an AI model trained on musical scores of uncertain ownership that generates melodies substantially similar to works registered by a music publisher. Could the provider be held liable? In principle, yes, for infringing the reproduction right under Article 18 of the TRLPI. This is particularly likely if no technical protocol was implemented to exclude such sources during training, or if the preventive measures required by Article 53 of the AI Act were not properly documented.

Conversely, suppose a public university publishes a thesis in open access without any technical rights reservation. A general-purpose AI model could, in principle, be trained on it, as there would be no clear, machine-readable opt-out.
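Whether a page carries such an opt-out can be checked in part by looking for noai or noimageai directives in its robots meta tags. This is a convention adopted by some platforms rather than a formal standard. The sketch below uses Python's standard html.parser module and rests on stated assumptions: the HTML snippets are invented, and a real crawler would also need to consider robots.txt, HTTP headers and other signals.

```python
from html.parser import HTMLParser

# Directives treated as AI/TDM opt-outs in this sketch. These are informal
# conventions used by some platforms, not a formal standard (assumption).
OPT_OUT_DIRECTIVES = {"noai", "noimageai"}


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        for token in attributes.get("content", "").split(","):
            self.directives.add(token.strip().lower())


def has_ai_opt_out(html: str) -> bool:
    """True if the page's robots meta tags contain an AI opt-out directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return bool(parser.directives & OPT_OUT_DIRECTIVES)


if __name__ == "__main__":
    reserved_page = '<head><meta name="robots" content="noai, noimageai"></head>'
    thesis_page = '<head><meta name="robots" content="index, follow"></head>'
    print(has_ai_opt_out(reserved_page), has_ai_opt_out(thesis_page))
```

In the thesis scenario, a page like `thesis_page` would show no such directive. The absence of a signal is not the same as a licence, however, so any other terms attached to the publication would still need to be considered.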

Given the above, it is worth highlighting that the Court of Justice of the European Union (CJEU) has not yet directly addressed cases involving generative AI. It has, however, established key principles on the right of reproduction (Infopaq, C‑5/08), communication to the public (Svensson, C‑466/12, and GS Media, C‑160/15), and the sampling of musical excerpts (Pelham, C‑476/17). These principles could prove decisive once the first disputes over AI-generated outputs arise.

Adherence to and compliance with the Code (soft-law self-regulation) may therefore be key to demonstrating good faith and diligence on the part of the AI model provider. Following the Code does not guarantee full compliance with the AI Act. It does, however, serve as relevant evidence of best practices in the event of investigations by the AI Office or third-party claims. In Spain, this aligns with the principle of proactive responsibility under the TRLPI. It also reinforces the idea that declaring intentions is not enough: clear, transparent, and documented policies must be implemented.

In short, the new Code of Practice should not be seen as an additional burden but as an opportunity for AI developers to operate with legal certainty. Creativity, innovation, and intellectual property rights now coexist under increasing tension. In that environment, having clear technical and legal tools is not only a regulatory obligation but also an ethical responsibility. Ultimately, for those developing or using AI in Spain, the AI Act–TRLPI pairing is no longer an academic matter; it is the real framework shaping the future of European digital culture.

Blanca de Planchard de Cussac Vegas-Latapie is an associate lawyer at act legal Spain.

Published in Vozpópuli
