The Expert IP Crisis
By Dan Curran
TL;DR — Key Takeaways
- AI companies have already ingested millions of expert works without consent — Anthropic settled for $1.5 billion over 7 million scraped books
- 52% of newly published articles are now AI-generated, collapsing the trust layer that makes expert content valuable
- Traditional publishing contracts offer zero protection against AI scraping, licensing, or IP monetization beyond book sales
- Authentication at the point of creation — not after publication — is the only defensible strategy for protecting expert IP
- The book is not the product; it's the catalyst for a compounding IP portfolio of courses, licensed datasets, and advisory frameworks
The publishing industry built its entire business model on a single assumption: that expert knowledge is valuable because it's scarce. That assumption just broke.
The Lawsuits That Changed Everything
Last September, Anthropic settled a copyright infringement case with a class of American authors and publishers for about $1.5 billion. The allegation was straightforward. The company had downloaded roughly seven million books to train its AI models. Not licensed. Not purchased. Downloaded. The books belonged to thousands of writers who never gave consent, never received attribution, and never saw a dollar.
A few months earlier, a federal judge in Northern California had ruled that Anthropic's use of those copyrighted works for AI training was "transformative, spectacularly so." The legal reasoning was fascinating: training an AI model on a book isn't the same as copying a book, the court argued, because the model doesn't reproduce the text. It learns from it. Like a student reading in a library.
Except students don't memorize seven million books in an afternoon and then compete with the authors for a living.
That case wasn't an anomaly. By the start of 2026, more than 100 copyright infringement lawsuits had been filed against AI companies in the United States. The New York Times is suing OpenAI and Microsoft over the use of its journalism in AI training and outputs. Disney, NBCUniversal, and DreamWorks sued Midjourney over alleged misuse of their films and characters. Universal Music Publishing Group just filed a lawsuit against Anthropic seeking over $3 billion in damages for infringement of musical works.
The courts are still sorting it out. But the expert who spent thirty years building a body of knowledge, a framework, a methodology, a practice? That person isn't waiting for a ruling. Their content is already inside the models. Their ideas are already being regurgitated without credit. And the value of their expertise is eroding in real time.
This is the Expert IP Crisis. And if you create original thinking for a living, it is already happening to you.
The Authenticity Apocalypse
Here's the number that should keep every publisher, speaker, and executive coach up at night: as of mid-2025, 52 percent of all newly published articles on the internet were AI-generated. Not AI-assisted. AI-generated. A study by the SEO firm Graphite analyzed 65,000 English-language articles and found that the tipping point arrived quietly, with almost no one noticing.
A separate analysis by Ahrefs put the number even higher. According to their study of 900,000 web pages published in April 2025, 74.2 percent contained AI-generated content. Only 2.5 percent were pure AI output. The rest were blends. Human ideas filtered through machine language. Or machine language dressed up to look human. Increasingly, it's hard to tell which.
Europol warned this was coming. Their 2022 report estimated that 90 percent of online content could be synthetically generated by 2026. That forecast seemed outlandish at the time. It doesn't anymore.
For the publishing industry, this isn't just a content quality problem. It's an existential one. The entire value proposition of nonfiction publishing rests on a single premise: that the knowledge between the covers is trustworthy because a real expert created it. When you can't distinguish expert-authored content from machine-generated content, that premise collapses. And with it, the economic logic of the entire industry.
A publisher sells authenticated expertise. Strip away that trust layer and a book is just another piece of content in a sea of content. No more valuable than a ChatGPT summary.
This is the authenticity apocalypse. The flood of AI-generated content isn't just competition. It's contamination. When everything looks credible and nothing is verifiable, the genuine expert gets buried alongside the generated noise.
Your Expertise Is Already Inside the Machine
If you've published a book, given a keynote, written a white paper, posted a LinkedIn article, recorded a podcast, or taught a course, some portion of your intellectual output is almost certainly inside an AI training dataset. You didn't agree to this. You weren't asked. And you weren't paid.
The mechanics are simple. AI companies need text to train language models. The more text, the better. The highest-quality text comes from experts: scientists, business leaders, consultants, coaches, academics, and authors. So the companies scrape it. From websites. From digital bookshelves. From public repositories. Sometimes from pirated databases.
The Anthropic case made this concrete. Court filings revealed that the company had trained its models using content from Library Genesis, a piracy site. Seven million books, including works by living authors who had spent years creating them. Anthropic argued that training is transformative. The authors argued that it's theft. The $1.5 billion settlement tells you who the market thought was right.
But the settlement only addresses one company and one dataset. The broader problem is structural. There is currently no standard mechanism for an expert to know whether their content was used to train an AI model. No notification system. No opt-in. No registry. The U.S. Copyright Office released a major report on AI training in May 2025 acknowledging this gap. The report was critical of arguments favoring unlicensed training. Days later, President Trump fired the Register of Copyrights, the official who leads the office.
The legal framework is catching up. Slowly. The Generative AI Copyright Disclosure Act would require AI companies to reveal their training datasets. But it hasn't passed yet. In the meantime, every expert who has ever published anything is operating in the dark about how their work is being used.
When a keynote speaker sees ChatGPT parroting their signature framework with no attribution, that's not a glitch. That's the system working exactly as designed.
Why Traditional Publishing Can't Protect Your IP Anymore
The traditional publishing model was designed for a world where content was physical, distribution was controlled, and copying was expensive. None of those things are true anymore.
A traditionally published author typically earns between 7.5 and 15 percent of the retail price on a print book. For ebooks, the standard at major houses is roughly 25 percent of the publisher's net receipts, which works out to about 12.5 percent of what the reader actually pays. The publisher keeps the rest in exchange for editing, design, printing, distribution, and marketing.
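To make those percentages concrete, here is a short sketch of the arithmetic, using hypothetical prices (a $30 hardcover, a $15 ebook) and the assumption stated above that the publisher's net on an ebook is roughly half of what the reader pays:

```python
# Illustrative royalty arithmetic for the figures cited above.
# The prices and the 50% net-share assumption are hypothetical examples,
# not terms from any specific contract.

def print_royalty(retail_price: float, royalty_rate: float) -> float:
    """Print-book royalty: a percentage of the retail (list) price."""
    return retail_price * royalty_rate

def ebook_royalty(reader_price: float, net_share: float = 0.5,
                  royalty_rate: float = 0.25) -> float:
    """Ebook royalty: 25% of the publisher's net receipts,
    where net receipts are assumed to be ~50% of the reader's price."""
    publisher_net = reader_price * net_share
    return publisher_net * royalty_rate

hardcover = print_royalty(30.00, 0.10)   # 10% of a $30 hardcover
ebook = ebook_royalty(15.00)             # 25% of 50% of a $15 ebook
print(f"Hardcover royalty: ${hardcover:.2f} per copy")
print(f"Ebook royalty:     ${ebook:.2f} per copy "
      f"({ebook / 15.00:.1%} of what the reader paid)")
```

The compounding effect is visible in the last line: 25 percent of net sounds generous until you notice it resolves to about 12.5 percent of the price the reader actually paid.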
This was a defensible arrangement when the book was the primary asset and the publisher controlled access to the market. But the book is no longer the primary asset. The expertise is the asset. The book is just one expression of it. And traditional publishers have no infrastructure for protecting or monetizing the broader IP.
Think about what a traditional publishing deal actually covers. It gives the publisher rights to produce and sell the book in specified formats and territories. It does not protect the author's frameworks from being scraped by AI. It does not create a provenance record for the author's ideas. It does not enable licensing of the author's content to AI training platforms on fair terms. It does not transform the manuscript into social content, course material, or advisory IP. And it does not generate ongoing revenue beyond book sales, which for most nonfiction titles peak within six to eight weeks of publication and then collapse.
The Authors Guild reported that the median annual income for full-time authors across all publishing routes was $25,000 in 2023. Only about $15,000 of that came directly from book sales. The book is not the product. The expertise is the product.
Traditional publishers aren't equipped for this reality because their entire business model is optimized for a single transaction: sell the book. Everything else — the speaking career, the consulting practice, the course revenue, the licensing deals — falls outside their scope. They hand you a book and wish you luck.
Meanwhile, the AI companies are treating your published work as raw material. Your publisher has no answer for this. They don't even have a question.
The Book Is the Beginning, Not the Product
Here's what most experts get wrong about publishing: they think the goal is to produce a book. It isn't. The goal is to create a structured, authenticated, monetizable body of intellectual property. The book is one output of that process. There should be dozens of others.
A single manuscript, properly structured, contains enough intellectual raw material to generate social content for a year. Speaking engagements for a decade. Course curriculum. Advisory frameworks. Licensable methodologies. AI training data that, if properly authenticated and priced, could generate revenue every time a model is retrained.
But most experts never get to that point, because the traditional manuscript process demands the one thing they don't have: time. Writing a book takes a year or more. Most high-value experts can't pause revenue-generating work to sit at a keyboard. This is why we built the Chapters Manuscript Accelerator — an interview-based process that captures an expert's thinking through conversation and delivers a professionally published manuscript in 90 days. The author never has to write a word. Their job is to know things. Our job is to turn that knowledge into structured IP.
This isn't theoretical. It's already happening in music. The major record labels fought AI companies in court, won settlements, and then negotiated licensing deals that give artists control over how their work is used. Universal Music's settlement with Udio includes opt-in licensing. Warner Music's deal with Suno requires a new model trained entirely on licensed content. Artists keep control. Artists get paid.
The publishing industry has not done this. Not because it can't. Because it hasn't been forced to yet.
The experts who figure this out first will have a compounding advantage. Not because they wrote a book, but because they built a system around their expertise.
This is the concept behind what we call the Chapters Living Library: a structured, searchable, living repository of an expert's entire body of knowledge. The manuscript comes from it. The social content comes from it. The licensing revenue comes from it. It begins on day one of the process, not at publication.
The book is the catalyst. What you build around it determines whether your expertise compounds or decays.
Authenticated Authorship: The Standard the Industry Doesn't Have Yet
In music, there's a system. ASCAP and BMI track who wrote what. Streaming platforms pay per play. Licensing agreements specify terms. The system isn't perfect, but it exists. If Spotify plays your song, you get paid. If an AI model trains on your song without permission, you have legal recourse and an evidentiary trail.
In publishing, there is nothing comparable. There is no universal registry of who created what content and when. No chain-of-custody system that tracks how an idea moved from an expert's mind to a manuscript to a published work. No authentication layer that proves, in a legally meaningful way, that a specific piece of intellectual property originated with a specific person at a specific point in time.
This is a problem that gets worse every day. As AI-generated content floods the market, the ability to prove that your work is authentically yours becomes not just valuable but essential. It's the difference between your expertise having market value and your expertise dissolving into the noise.
The U.S. Copyright Office has acknowledged this gap. Their January 2025 report on copyrightability confirmed that works created with AI assistance can still receive copyright protection, provided there is sufficient human authorship. But "sufficient human authorship" is a moving target. And without a provenance system that documents the creative process from the start, proving authorship after the fact becomes expensive, slow, and uncertain.
Authentication at the point of creation. Not after publication. Not in response to a lawsuit. At the very beginning.
When an expert first articulates an idea, that idea should be captured with metadata that establishes who created it, when, and in what context. Every subsequent expression of that idea — the manuscript, the keynote, the social post, the licensed dataset — should carry that provenance forward. This is what we built Chapters Proof-of-Creation to do: establish line-level chain of custody at the origin of an idea, so that authorship is documented before anyone has a reason to dispute it.
This isn't a technology problem. The technology exists. Timestamped records, cryptographic hashes, structured metadata, chain-of-custody logs. These tools are already standard in financial services, legal discovery, and supply chain management. They've just never been applied to intellectual property in publishing.
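As a minimal sketch of what point-of-creation provenance could look like with only standard tooling: hash the content together with who/when/context metadata, and chain each record to the one before it. Every name and field here is illustrative, not any product's actual schema.

```python
# Minimal sketch of a chained provenance record. All field names are
# illustrative assumptions, not a real registry's implementation.
import hashlib
import json
from datetime import datetime, timezone

def provenance_record(author: str, context: str, content: str,
                      prev_hash: str = "") -> dict:
    """Create a timestamped record that fingerprints the content and
    links back to the previous record, forming a chain of custody."""
    record = {
        "author": author,
        "context": context,                  # e.g. "interview session 3"
        "created_at": datetime.now(timezone.utc).isoformat(),
        "content_sha256": hashlib.sha256(content.encode()).hexdigest(),
        "prev_record_sha256": prev_hash,     # links records into a chain
    }
    # Fingerprint the record itself so later tampering is detectable.
    record["record_sha256"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    return record

# Each later expression of the idea carries the chain forward:
origin = provenance_record("Jane Expert", "interview",
                           "The three-lever pricing framework")
draft = provenance_record("Jane Expert", "manuscript draft",
                          "Chapter 2: the three-lever pricing framework",
                          prev_hash=origin["record_sha256"])
```

The point of the sketch is how little machinery is required: a hash proves the content existed in exactly this form, the timestamp says when, and the chained hashes document the path from first articulation to published work.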
The question is who builds it. Not whether it's possible.
What Comes Next
The Expert IP Crisis is not going away. If anything, it will accelerate. AI models are getting better, which means they need more training data. The most valuable training data comes from experts. And the experts currently have no systematic way to protect, license, or monetize that data on their own terms.
Here's what I believe will happen over the next two to three years:
The lawsuits will force licensing frameworks. The music industry proved the model. Sue first, then negotiate. Publishing is following the same playbook, just two years behind. The settlements already emerging will establish precedents for how expert content gets licensed to AI companies.
Authentication will become a differentiator. Right now, authenticated authorship is a niche concept. Within two years, it will be a standard expectation. Experts who can prove the provenance of their ideas will command premium prices. Those who can't will compete with machines.
The book will become a starting point, not an endpoint. The economics of single-product publishing are already broken. The future belongs to platforms that turn one manuscript into a compounding IP portfolio: books, courses, licensed datasets, social content, advisory frameworks.
Publishers who don't adapt will lose their best authors. The most valuable experts — the ones whose ideas are worth protecting — will migrate to platforms that offer protection, distribution, and monetization beyond the book. Publishers who can only offer a print run and a royalty check will be left with the authors nobody is trying to scrape.
The crisis is here. The question for every expert, every publisher, and every organization that trades in human knowledge is straightforward: are you building infrastructure to protect and compound your IP, or are you waiting for someone else to solve it?
Because the AI companies aren't waiting. They already have your content. The only question is what you do about it.
Ready to turn your expertise into lasting impact?
Get Started