Man Page in Plain Text

Here is a command to print a man page in plain text (replace page with the name of the page you want):

man -P cat page | col -b

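The -P option sets cat as the pager, so the formatted page goes straight to standard output, and col -b strips the backspace overstrikes that man uses for bold and underlined text. Why would you want to produce plain-text output from man? So you can create embeddings from your man pages, of course!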
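To make the backspace handling concrete, here is a small sketch of what gets removed. The helper strip_overstrike is hypothetical and only approximates col -b for the simple character-plus-backspace case; col itself does more (it also resolves reverse line feeds, for instance), so treat this as an illustration rather than a replacement.

```shell
# strip_overstrike: hypothetical helper that approximates `col -b` for the
# simple case of character-plus-backspace overstrikes.
strip_overstrike() {
    bs=$(printf '\b')      # a literal backspace character
    sed "s/.${bs}//g"      # drop every character that is followed by a backspace
}

# Bold "NAME" as an overstruck sequence: letter, backspace, same letter.
printf 'N\bNA\bAM\bME\bE\n' | strip_overstrike    # prints: NAME

# Underlined "ls": underscore, backspace, letter.
printf '_\bl_\bs\n' | strip_overstrike            # prints: ls
```

Without this step the backspaces and doubled letters end up in the text, which is not what you want to feed to an embedding model.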

I am experimenting with this on a FreeBSD system. Today I converted 9k+ man pages to text, split them into chunks suitable for embedding, and calculated the vectors. The ultimate destination is a PostgreSQL database, where I create a new table for each set of embeddings and run RAG queries against it.
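How the chunks are cut is not described above, so the following is only one possible approach: a minimal sketch that groups blank-line-separated paragraphs until a character budget is reached. The function name chunk_text, the ---- delimiter line, and the budget value are all assumptions made for illustration.

```shell
# chunk_text MAX_CHARS: hypothetical helper. Reads plain text on stdin, groups
# blank-line-separated paragraphs into chunks of at most roughly MAX_CHARS
# characters, and prints each chunk followed by a line containing only "----".
chunk_text() {
    awk -v max="$1" '
        BEGIN { RS = "" }   # paragraph mode: records are separated by blank lines
        {
            # Start a new chunk if adding this paragraph would exceed the budget.
            if (chunk != "" && length(chunk) + length($0) + 2 > max) {
                print chunk
                print "----"
                chunk = ""
            }
            chunk = (chunk == "" ? $0 : chunk "\n\n" $0)
        }
        END { if (chunk != "") { print chunk; print "----" } }
    '
}

# Three short paragraphs with a budget of 10 characters yield two chunks.
printf 'aaaa\n\nbbbb\n\ncccc\n' | chunk_text 10
```

With the command from the top of this post, man -P cat ls | col -b | chunk_text 2000 would print the ls page in chunks of at most about 2000 characters. A single paragraph longer than the budget is kept whole as its own chunk, and a real pipeline would want a delimiter that cannot occur in a man page, or one file per chunk.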

Plain text is gaining prominence as an input format to LLMs. For those of us accustomed to working on the command line, this is a very welcome development!