Knowledge follows a simple rule: the more it is consumed, the more it is reshaped and shared. Human intelligence grows by absorbing information, combining it, and passing it forward in new forms. From oral stories to inscriptions, from letters to books, and from computers to algorithms, every stage of progress has relied on preserving what came before and transforming it for what comes next. Artificial intelligence is built on the same principle. Systems designed to deliver answers on everything, everywhere, all at once require access to the widest and most reliable record of human thought.
For centuries, the most trusted and durable form of information has been the book. Books record humanity’s long arc, from early tools and survival to rockets reaching Mars, from handwritten letters to digital networks, from foraging for food to ordering dinner in minutes. They preserve ideas across generations. Inside Anthropic, planners viewed books as a concentrated form of human knowledge, shaped by editors, authors, and time. They believed long-form texts could teach artificial intelligence systems to reason and write more clearly than fragmented online content.
That belief led to an internal effort later known as Project Panama. Court filings unsealed in a copyright lawsuit reveal how it worked. Anthropic bought physical books in bulk. The books were cut apart and scanned at high speed. Once digitised, the paper copies were recycled. The goal was to rapidly expand the amount of book-based data used to train the company’s AI systems.
The scope of the project became public after a report by The Washington Post, which offered a rare glimpse into how aggressively AI companies pursued high-quality text as competition to build more capable chatbots intensified.
Internal documents show the company chose this approach instead of negotiating licences at scale. Executives argued that buying physical copies and digitising them internally was faster and more practical. The strategy also reflected the increasingly fierce race to dominate artificial intelligence, where each advance can translate directly into market share, funding, and revenue. In an industry moving at breakneck speed, new developments emerge almost daily, and access to high-quality data has become one of the most valuable assets in the push to turn AI capability into commercial power.
How Project Panama processed millions of books
Vendor proposals and court records indicate Anthropic sought scanning capacity for 500,000 to 2 million books over roughly six months. Although the precise final number remains redacted, filings repeatedly describe the acquisition and destruction of millions of volumes, acquired in batches of tens of thousands. The project involved tens of millions of dollars in spending on books, logistics, and scanning services, underscoring how central books had become to AI training strategies.
Once purchased, books were sent to commercial vendors equipped for industrial document processing. Hydraulic cutting machines removed the spines, allowing pages to be scanned on high-speed production equipment. After digitisation, the paper copies were scheduled for recycling. The process was intentionally irreversible, leaving no physical archive behind. Preservation experts note that this distinguishes Project Panama from earlier digitisation efforts, which typically retained original copies.
Court records suggest Anthropic viewed destructive scanning as a safer alternative to downloading large pirated digital libraries. The approach drew on lessons from earlier mass digitisation efforts, including Google Books, and was shaped in part by Tom Turvey, who previously worked on that project. Unlike Google Books, however, Project Panama prioritised speed and exclusivity over public access or preservation.
A federal judge later ruled that training AI models on books can qualify as fair use when the process is transformative. However, the court also found that Anthropic’s earlier downloads of pirated books raised separate copyright concerns, making clear that how training data is acquired remains legally significant even when the training itself is permitted.
Authors’ response and settlement
Authors reacted strongly to the disclosures, arguing that AI companies benefited from creative work without consent or compensation. Ed Newton-Rex, a former AI executive, said the case illustrated a growing imbalance between technology companies and creators whose work underpins modern AI systems. He and others have argued that existing copyright frameworks do not adequately address large-scale machine learning.
In 2025, Anthropic agreed to pay $1.5 billion to settle claims related to its earlier use of pirated books, without admitting wrongdoing. Under the settlement, authors whose works were included can seek compensation estimated at about $3,000 per title, though payouts vary. Anthropic has said the settlement addressed acquisition practices rather than the legality of AI training itself.
Part of a wider industry pattern
Project Panama is not an isolated case. Court filings in other lawsuits show Meta employees debated downloading large shadow libraries of books, while OpenAI has acknowledged downloading similar datasets in the past before deleting them. Google and Microsoft also face ongoing legal challenges over AI training data.
Legal scholar James Grimmelmann has said the industry carried academic data-use norms into a commercial arms race, only confronting legal risks after massive investments had already been made. By then, he noted, companies were effectively locked into data pipelines that would be difficult to unwind.
The unsealed records surrounding Project Panama offer one of the clearest views yet into how modern AI systems are built. They show that behind consumer-facing chatbots lies an industrial pipeline involving large capital outlays, legal risk, and irreversible data extraction. The case has also sharpened debate over whether existing copyright law is equipped to handle machine learning at scale.
As courts continue to define the boundaries of fair use in AI training, Project Panama stands as a defining example of the pressures shaping artificial intelligence development and the unresolved tension between technological progress and the rights of creators.










