NOS

A fast, flexible inference server

Run LLMs and multi-modal models cost-efficiently and at scale on any cloud or AI hardware with NOS, a fast and flexible multi-modal inference server built from the ground up.