Features of the LLM-VM

  • Implicit Agents - The Anarchy LLM-VM can be set up to use external tools through our agents, such as REBEL, simply by supplying tool descriptions!

  • Inference Optimization - The Anarchy LLM-VM is optimized from the agent level all the way down to assembly on known LLM architectures to get the most bang for your buck. With state-of-the-art batching, sparse inference and quantization, distillation, and multi-level colocation, we aim to provide the fastest framework available.

  • Task Auto-Optimization - The Anarchy LLM-VM will analyze your use cases for repetitive tasks where it can activate student-teacher distillation to train a super-efficient small model from a larger, more general model without losing accuracy. It can also take advantage of data-synthesis techniques to improve results.

  • Library Callable - Our library can be called directly from any Python codebase.

  • HTTP Endpoint - We provide a standalone HTTP server that handles completion requests through a convenient API.
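To make the Task Auto-Optimization idea more concrete, here is a minimal sketch of the data-collection half of student-teacher distillation: completions from a large "teacher" model are recorded per task, and once a task repeats often enough its (prompt, completion) pairs become candidate fine-tuning data for a smaller "student" model. This is not the LLM-VM's implementation. The class DistillationCollector, its threshold parameter, and the caller-supplied task label are all hypothetical; how the LLM-VM actually detects repetitive tasks is not described here.

```python
from collections import Counter
from typing import Callable


# Hypothetical helper, NOT part of the LLM-VM API. It wraps a large "teacher"
# model, records every (prompt, completion) pair per task, and reports which
# tasks have repeated often enough to be worth distilling into a small model.
class DistillationCollector:
    def __init__(self, teacher: Callable[[str], str], threshold: int = 3):
        self.teacher = teacher      # large, general model (assumed to be a plain callable)
        self.threshold = threshold  # assumed: repeats needed before a task counts as "repetitive"
        self.counts: Counter = Counter()
        self.pairs: dict[str, list[tuple[str, str]]] = {}

    def complete(self, task: str, prompt: str) -> str:
        # Answer with the teacher and keep the pair as potential student training data.
        completion = self.teacher(prompt)
        self.counts[task] += 1
        self.pairs.setdefault(task, []).append((prompt, completion))
        return completion

    def ready_for_distillation(self) -> list[str]:
        # Tasks seen at least `threshold` times, in first-seen order.
        return [task for task, n in self.counts.items() if n >= self.threshold]

    def training_data(self, task: str) -> list[tuple[str, str]]:
        # Copy of the recorded pairs; a student model would be fine-tuned on these.
        return list(self.pairs.get(task, []))


if __name__ == "__main__":
    # A stand-in "teacher" that just upper-cases its input.
    collector = DistillationCollector(teacher=lambda p: p.upper(), threshold=2)
    collector.complete("shout", "hello")
    collector.complete("shout", "world")
    collector.complete("summarize", "long text")
    print(collector.ready_for_distillation())  # ['shout']
    print(collector.training_data("shout"))    # [('hello', 'HELLO'), ('world', 'WORLD')]
```

The fine-tuning step itself is left out on purpose, since it depends entirely on which small model and training stack are used.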

Why use our LLM-VM?

  • Speed up development - With Anarchy, one interface is all you need to interact with the latest LLMs available.

  • Lower costs - Running models locally can reduce the pay-as-you-go costs of development and testing.

  • Flexibility - Anarchy allows you to rapidly switch between popular models so you can pinpoint the right tool for your project.

  • Community - Join our active community of highly motivated developers and engineers working passionately to democratize AGI.


The LLM-VM aims to centralize and optimize the functionality of modern completion endpoints in an opinionated way, allowing calls that might otherwise be extremely costly across multiple endpoints to be batched efficiently.
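As an illustration of the batching idea, the sketch below groups pending (model, prompt) requests by model, sends each group to a backend in chunks of at most max_batch prompts, and returns the completions in the original request order. It is a generic request-batching pattern, not the LLM-VM's own scheduler; batch_complete, the backend signature, and max_batch are hypothetical names chosen for this example.

```python
from collections import defaultdict
from typing import Callable

# Hypothetical backend signature: (model name, list of prompts) -> one completion
# per prompt, in the same order. NOT an LLM-VM API.
BatchFn = Callable[[str, list[str]], list[str]]


def batch_complete(requests: list[tuple[str, str]], backend: BatchFn,
                   max_batch: int = 8) -> list[str]:
    """Group (model, prompt) requests by model, send each group in chunks of
    at most `max_batch` prompts, and return completions in request order."""
    # Remember the original position of every request, bucketed by model.
    by_model: dict[str, list[int]] = defaultdict(list)
    for i, (model, _) in enumerate(requests):
        by_model[model].append(i)

    results: list[str] = [""] * len(requests)
    for model, indices in by_model.items():
        for start in range(0, len(indices), max_batch):
            chunk = indices[start:start + max_batch]
            # One backend call per chunk instead of one call per prompt.
            outputs = backend(model, [requests[i][1] for i in chunk])
            for i, out in zip(chunk, outputs):
                results[i] = out
    return results


if __name__ == "__main__":
    calls = []

    def fake_backend(model: str, prompts: list[str]) -> list[str]:
        calls.append((model, len(prompts)))
        return [f"{model}:{p}" for p in prompts]

    reqs = [("gpt", "a"), ("neo", "b"), ("gpt", "c")]
    print(batch_complete(reqs, fake_backend))  # ['gpt:a', 'neo:b', 'gpt:c']
    print(calls)                               # [('gpt', 2), ('neo', 1)]
```

Three requests become two backend calls here; with many callers sharing one endpoint, that reduction in round trips is where the savings would come from.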

We want to make the LLM-VM model- and architecture-agnostic. Our goal is a backend that gives you an optimized solution regardless of which model you choose and which architecture and hardware you run it on.