OCR via LLM

2024-11-17


My stepfather has a box of old newspaper clippings and wants to make digital copies of them. I asked him to send me a scan. It was an ordinary-looking image: three pairs of text columns, with two pairs across the top and one below, centered between them, to be read in that order. There were background flecks throughout the image, but the text was fully legible.

I attached it to a request to Claude with the instruction “Extract as much text as you can from the attached image.” It extracted the full text, including a bullet list in the article, in the correct reading order.

This surprised me. I know little about OCR, but I did not expect a general-purpose tool to do such a good job. What will this do to businesses built around more traditional approaches to OCR? Will they all need to switch to LLMs, or at least incorporate them into their operations?

No doubt many more such surprises are coming, and the disruptions they cause will be fun to watch!