SuperScript

Blog

Man Page in Plain Text

Here is a command to print a man page in plain text:

    man -P cat page | col -b

This uses cat as the pager, and strips backspace characters with col. Why would you want to produce plain-text output from man? So you can create embeddings from your man pages, of course! I am experime…
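To see what col -b is doing in that pipeline, here is a minimal sketch that does not need a real man page. It assumes the traditional overstrike style of bold, where each letter is printed, backspaced over, and printed again; the NAME string is a made-up stand-in for a man page heading, not output captured from man.

```shell
# Simulate overstrike bold: letter, backspace, same letter again.
# "NAME" is a hypothetical stand-in for a man page section heading.
bold_name=$(printf 'N\bNA\bAM\bME\bE')

# col -b drops the backspaces and keeps only the last character
# written to each column, leaving plain text.
printf '%s\n' "$bold_name" | col -b
```

Without col -b, the backspace characters would stay in the output and end up in whatever consumes the text downstream.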

OCR via LLM

My stepfather has a box of newspaper clippings from long ago and wants to create digital copies. I asked him to send me a scan. It was an ordinary-looking image. Three pairs of text columns, two pairs across the top, one below between the other two, reading in that order. It had …

A Useful Shell Pattern

If you work at the unix command line you probably execute pipelines of programs to test ideas. It’s part of the unix way. Sometimes you need a little more than a simple pipeline. A small shell script here or there is an easy lightweight way to provide a slightly larger pipeline co…

Split for Embedding

Different embedding models have different limits on input tokens. When you want to create embeddings for a large corpus, one major annoyance is splitting the content so that it fits within this limit. My recent work has centered around Simon Willison’s excellent llm tool. He also…

Checking Make Vars

If your make target recipes use make variables you may want to test that they are set before using them. This would look something like this:

    test -n '$(varname)' || { echo 'variable not set: varname'; exit 1; }

Rather than sprinkling that sort of code all over yo…
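The check above works because make substitutes $(varname) into the recipe line before the shell ever runs it. The following sketch shows the two lines the shell would receive, one for an unset variable and one for a variable set to the hypothetical value hello. It only illustrates the inline check quoted above and is not the approach the full post goes on to describe.

```shell
# Unset variable: make expands $(varname) to nothing, so the shell sees ''.
# The subshell keeps "exit 1" from ending the current session.
if ( test -n '' || { echo 'variable not set: varname'; exit 1; } ); then
    echo 'check passed'
else
    echo 'check failed'
fi

# Variable set to "hello" (a hypothetical value): the test succeeds silently.
if ( test -n 'hello' || { echo 'variable not set: varname'; exit 1; } ); then
    echo 'check passed'
else
    echo 'check failed'
fi
```

Inside a recipe, the non-zero exit status is what makes make stop the target with an error.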