Multiple Llamafiles With llm

In playing with llamafile under llm I initially used the llm-llamafile plugin, which makes initial setup trivial. But soon I wanted to run multiple llamafile models in parallel, and the plugin doesn’t support that: it hardcodes the service endpoint at http://localhost:8080/v1.

Initially I thought a new plugin was needed, but some digging revealed a solution already in place, one that in fact obviates the need for llm-llamafile altogether: the extra-openai-models.yaml file.

The file does appear in the llm documentation (here and here). Because llamafile exposes an OpenAI-compatible API, one can register a llamafile server in extra-openai-models.yaml. For example:

- model_id: llamafile-test
  model_name: llamafile-test
  api_base: "http://llamafile:8080/v1"
  api_key_name:
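
Because the file is just a YAML list, one entry per server is all it takes to register several llamafiles running side by side. Here is a sketch, assuming a second llamafile instance is listening on another port; the model IDs, hostnames, and ports are placeholders:

# first llamafile server
- model_id: llamafile-one
  model_name: llamafile-one
  api_base: "http://llamafile:8080/v1"
  api_key_name:

# second llamafile server, listening on a different port
- model_id: llamafile-two
  model_name: llamafile-two
  api_base: "http://llamafile:8081/v1"
  api_key_name:

Each entry then appears as a distinct model: llm models should list them alongside the built-ins, and llm -m llamafile-one or llm -m llamafile-two selects one for a prompt.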

With this capability in place, there’s no point in using llm-llamafile. The extra-models feature is mentioned in several additional places as well.

It’s a great feature that deserves more prominence. Should every model family have an analogous file?