I just heard a teaser for a story on how PDFs have become ubiquitous, with the supposed downside that AIs have a lot of trouble reading them. The implication was that this was bad, but I thought “Awesome! I’m going to have to switch to PDFs for more of my output! Oh, and I think I’ll start using TeX to produce more of that output!”

If you’ve ever read the contents of a PDF file produced by TeX you’ll understand.

Block of unreadable text at the beginning of a PDF file produced by TeX
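One likely reason the start of such a file looks like noise: PDF writers such as pdfTeX typically store page content with the FlateDecode filter, which is zlib compression. Below is a minimal sketch of that round trip, assuming a tiny hand-written content stream (a hypothetical example, not actual TeX output).

```python
import zlib

# Hypothetical, hand-written PDF page content stream (not real TeX output):
# BT/ET bracket a text object, Tf selects a font and size, Td moves the
# text position, and Tj shows a string.
content = b"BT /F1 12 Tf 72 712 Td (Hello, reader) Tj ET"

# FlateDecode streams are zlib-compressed; this mimics what a writer
# like pdfTeX typically does before embedding the stream in the file.
compressed = zlib.compress(content, 9)

# What a human (or a naive scraper) sees in the raw file: opaque bytes.
print(compressed[:20])

# Inflating recovers the drawing operators, but the text is still a set of
# positioned fragments rather than flowing prose.
restored = zlib.decompress(compressed)
print(restored.decode("ascii"))
```

Even after inflating, what you get is operators and coordinates, not paragraphs, so any reader has to reconstruct the reading order itself.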

Update: Turns out it was a story in The Economist. Here’s a gift link to the story (it should get the first few people who click on it past the paywall):

https://www.economist.com/business/2026/02/24/the-war-against-pdfs-is-heating-up?giftId=OTNkOGVmNTgtN2ZmMi00NjAzLWExMmQtMDg0NjU5YzM1ZTY2&utm_campaign=gifted_article

And here’s the money quote:

The large language models underpinning generative AI are often bamboozled by PDFs, reading a page set in columns from left to right rather than top to bottom, say, or getting confused by headers and footers. Trouble parsing PDFs is one of the reasons AI chatbots occasionally “hallucinate”, generating nonsense.

I mean, for values of “money” that are totally confused about why LLMs hallucinate.
