Multiple Llamafiles With llm
In playing with llamafile under llm I initially used the llm-llamafile plugin. This plugin makes initial setup trivial. But soon I wanted to run multiple llamafile models in parallel, and the plugin doesn't support that. In fact the plugin hardcodes the service at http://localhost:8080/v1.
Initially I thought a new plugin was needed, but some digging revealed a solution already in place, one that in fact obviates the need for llm-llamafile altogether: the extra-openai-models.yaml file.
The file does appear in the documentation here and here. Because llamafile exposes an OpenAI-compatible API, you can register it in extra-openai-models.yaml. For example:
- model_id: llamafile-test
  model_name: llamafile-test
  api_base: "http://llamafile:8080/v1"
  api_key_name:
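
That single entry covers one endpoint, but the same file can describe several. Here is a minimal sketch of the multi-model case, assuming two llamafile servers are already listening on ports 8080 and 8081; the model IDs, host, and ports are placeholders for whatever your own setup uses:

# hypothetical extra-openai-models.yaml with two parallel llamafile servers
# each entry maps an llm model_id to one OpenAI-compatible endpoint
- model_id: llamafile-8080
  model_name: llamafile-8080
  api_base: "http://localhost:8080/v1"
  api_key_name:
- model_id: llamafile-8081
  model_name: llamafile-8081
  api_base: "http://localhost:8081/v1"
  api_key_name:

With entries like these, `llm -m llamafile-8080` and `llm -m llamafile-8081` address the two servers independently, which is exactly the parallel arrangement the plugin can't provide.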
With this capability in place, there's no point in using llm-llamafile. Here are several additional mentions of the extra-models feature:
It’s a great feature that deserves more prominence. Should every model family have an analogous file?