Multiple Llamafiles With llm
Playing with llamafile under llm, I initially used the llm-llamafile plugin. It makes initial setup trivial, but I soon wanted to run multiple llamafile models in parallel, and the plugin doesn't support that: it hardcodes the service endpoint at http://localhost:8080/v1.
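For context, the plugin route really is minimal: something like this, assuming the plugin registers its model under the name llamafile (a detail I'm recalling from memory rather than checking against the plugin's docs).

```sh
# Install the plugin, then prompt whatever llamafile is serving on localhost:8080.
llm install llm-llamafile
llm -m llamafile "Three neat facts about pelicans"
```

But every prompt goes to that one hardcoded endpoint, no matter how many llamafiles you have running.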
Initially I thought a new plugin was needed, but some digging revealed a solution already in place, one that in fact obviates the need for llm-llamafile altogether: the extra-openai-models.yaml file.
It does appear in the documentation, here and here. Because llamafile exposes an OpenAI-compatible API, you can register it in extra-openai-models.yaml. For example:
```yaml
- model_id: llamafile-test
  model_name: llamafile-test
  api_base: "http://llamafile:8080/v1"
  api_key_name:
```
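The same file happily holds more than one entry, which is exactly what I wanted. Here is a sketch of a two-llamafile setup; the model names, ports, and localhost host are my own illustrative choices, not anything prescribed by the llm docs:

```yaml
# Hypothetical entries registering two llamafiles, each served on its own port.
- model_id: llamafile-llama3
  model_name: llamafile-llama3
  api_base: "http://localhost:8080/v1"
- model_id: llamafile-mistral
  model_name: llamafile-mistral
  api_base: "http://localhost:8081/v1"
```

Each entry then shows up as its own model, addressable with llm -m <model_id>.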
With this capability in place, there's no point in using llm-llamafile. The extra-models feature has come up in several other discussions of llm as well.
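To tie the pieces together, here is roughly what running two llamafiles side by side and prompting each one through llm looks like. This is a sketch: the server flags (--server, --nobrowser, --port) and the model filenames are my assumptions and may differ by llamafile version.

```sh
# Start two llamafiles on the ports referenced in extra-openai-models.yaml
# (flags and filenames are illustrative, not verified against a specific release).
./Meta-Llama-3-8B-Instruct.Q4_0.llamafile --server --nobrowser --port 8080 &
./mistral-7b-instruct-v0.2.Q4_0.llamafile --server --nobrowser --port 8081 &

# Confirm llm picked up the new entries...
llm models | grep llamafile

# ...then prompt each model by its model_id.
llm -m llamafile-llama3 "Summarize Hamlet in one sentence."
llm -m llamafile-mistral "Summarize Hamlet in one sentence."
```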
It’s a great feature that deserves more prominence. Should every model family have an analogous file?