TL;DR

Most users try to reduce their AI bill by switching models.

GPT or Claude?

Pro version or API?

Smaller or larger context?

Yet in many cases, the real problem isn't the model.

The problem is what you send to it.

PDF, DOCX, PowerPoint, screenshots, images... all these formats were designed for humans. Not for LLMs.

Result:

more tokens consumed;
more context wasted;
higher costs;
sometimes even less relevant responses.

The best optimization is often not changing the model.

It's preparing documents better before sending them.

🚨 The Hidden Token Tax

When you send a document to an LLM, it doesn't "read" the file like you do.

Before even analyzing the content, it must understand:

where the text begins;
where the text ends;
which elements are headings;
which elements are paragraphs;
which areas represent a table;
which parts are decorative;
in what order the elements should be read.

In other words, some of your tokens are used solely to reconstruct the document's structure.

So you're paying for noise.

📄 Why PDFs Are Particularly Inefficient

PDF is probably the world's most popular document format.

But it was designed for printing.

Not for artificial intelligence.

A PDF mainly describes:

positions;
coordinates;
graphic blocks;
layout elements.

For a human, this is perfect.

For a model, it often requires preliminary reconstruction work:

determining the reading order;
reconstructing tables;
identifying headings;
understanding columns;
linking captions to images.

Some of the context is consumed before the business content is even analyzed.

📝 DOCX Files Aren't Much Better

Many think DOCX is naturally suited for LLMs.

That's not really the case.

A DOCX file contains:

text;
styles;
metadata;
layout information;
XML representation elements.

Even if the cost is generally lower than that of a complex PDF, a significant portion of the data sent doesn't directly contribute to understanding the document's business content.

🖼️ Images Are Often the Worst Case Scenario

The situation becomes even more costly when sending:

screenshots;
photos;
scanned documents;
slides exported as images.

Before understanding the content, the model must:

detect text;
perform OCR;
identify important areas;
reconstruct the logical structure.

You're asking the model to solve a vision problem before even addressing the business problem.

It's like photographing a sheet of paper and then asking someone to copy it before reading it.

📊 Not All Formats Are Equal

If the goal is to efficiently transmit information to an LLM, some formats are much better suited than others.

Format	Useful Signal	Structural Noise	Token Impact
Markdown	Very High	Low	Excellent
TXT	High	Very Low	Excellent
DOCX	Medium	Medium	Fair
Native PDF	Medium	High	Poor
Scanned PDF	Low	Very High	Very Poor
Image	Low	Extremely High	Catastrophic

This table summarizes a simple principle:

The closer a format is to structured plain text, the more efficient it is for an LLM.

💰 The False Debate: Changing Models

Most discussions revolve around questions like:

Which model consumes the least?

Which provider is the cheapest?

Which subscription offers the most context?

These questions matter.

But they often come too early.

Because before optimizing the engine, you need to optimize the fuel.

Reducing a model's cost by 30% is interesting.

Reducing the tokens sent with each request by 30 to 40% is often much more cost-effective.

And it generally improves response quality.

🚀 A Smarter Approach: Preparing Documents for LLMs

For a few months now, a new category of tools has begun to emerge.

Their goal isn't to create a new model.

Their goal is to better represent documents for existing models.

This is exactly the philosophy behind the DocLang project:

https://github.com/doclang-project/doclang

DocLang is an open-source project supported by the Linux Foundation AI & Data that starts with a simple observation:

PDF was designed for printing;
DOCX for editing;
HTML for display;
none of these formats were designed for LLMs.

DocLang's goal is therefore to provide an intermediate representation better suited to artificial intelligence.

Specifically, DocLang preserves:

document hierarchy;
headings;
sections;
tables;
metadata;
logical relationships between elements.

While removing much of the noise related to visual presentation.

The result:

more compact documents;
fewer tokens consumed;
better content understanding;
more effective context for AI agents and RAG pipelines.

According to figures published by the project, this approach significantly reduces the volume of tokens needed to represent a document while preserving its semantic structure.

🎯 The Real AI Challenge in 2026

For a long time, we optimized infrastructure.

Then we optimized models.

The next step is to optimize data.

In many AI projects, the biggest waste doesn't come from the model used.

It comes from how documents are prepared before ingestion.

Before searching for the next miracle model:

look at your prompts;
look at your workflows;
look at the files you send.

Because the easiest way to reduce your AI bill is often not to change models.

It's to stop sending it noise.

Why You Should Almost Never Send a PDF, DOCX, or Image to ChatGPT or Claude

TL;DR

🚨 The Hidden Token Tax

📄 Why PDFs Are Particularly Inefficient

📝 DOCX Files Aren't Much Better

🖼️ Images Are Often the Worst Case Scenario

📊 Not All Formats Are Equal

💰 The False Debate: Changing Models

🚀 A Smarter Approach: Preparing Documents for LLMs

🎯 The Real AI Challenge in 2026

Nicolas Dabène

Follow this blog

Follow my AI and e-commerce analysis

Read next

Countdown to PrestaShop Developer Conference 2025: What to Expect!

Anthropic Launches Claude Security: The AI That Detects Vulnerabilities in Your Code