LLAMA
Base class for AI models served via a local LLAMA server process.
Phases
Intelligence — Detect OS (os.name) and architecture (os.arch)
Fetch — Download the correct LLAMA-Server binary + GGUF model (with resume support)
Ignition — Launch the server via ProcessBuilder on a local port
Request — All inference goes through http://127.0.0.1:{port}/v1/chat/completions.
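A minimal client sketch of the Request phase, assuming Java 11+; the class and helper names here are illustrative, not part of the plugin's API:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Illustrative client; LlamaClient is not a class from the plugin.
class LlamaClient {
    // Build the local chat-completions endpoint for a given port.
    static String endpoint(int port) {
        return "http://127.0.0.1:" + port + "/v1/chat/completions";
    }

    // Send an OpenAI-style JSON chat request to the local LLAMA-Server.
    static String chat(int port, String jsonBody) throws Exception {
        HttpRequest req = HttpRequest.newBuilder()
                .uri(URI.create(endpoint(port)))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
        return HttpClient.newHttpClient()
                .send(req, HttpResponse.BodyHandlers.ofString())
                .body();
    }
}
```

Because the server speaks the OpenAI-compatible `/v1/chat/completions` protocol, any standard HTTP client works; no vendor SDK is required.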
Crash isolation
Because the AI runs in a separate OS process, an out-of-memory model at worst gets the LLAMA-Server process killed by the OS; the Formcycle Tomcat JVM is unaffected.
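The isolation-and-restart idea can be sketched as a small supervisor loop. Everything here (class name, backoff policy, CLI flag) is an assumption for illustration, not the plugin's actual implementation:

```java
import java.io.IOException;

// Illustrative supervisor; names and policy are assumptions, not plugin API.
class LlamaSupervisor {
    // Assumed restart backoff: 1s, 2s, 4s, ... capped at 60s.
    static long restartDelayMs(int attempt) {
        return Math.min(60_000L, 1_000L << Math.min(attempt, 6));
    }

    // Launch llama-server as a separate OS process; a crash in the child
    // never propagates into the host JVM.
    static Process launch(String binaryPath, int port) throws IOException {
        ProcessBuilder pb = new ProcessBuilder(binaryPath, "--port", String.valueOf(port));
        pb.redirectErrorStream(true);
        return pb.start();
    }

    // Watcher loop: block until the child dies (e.g. OOM-killed), then restart.
    static void supervise(String binaryPath, int port)
            throws IOException, InterruptedException {
        int attempt = 0;
        while (true) {
            Process p = launch(binaryPath, port);
            p.waitFor();                        // returns once the OS reaps the child
            Thread.sleep(restartDelayMs(attempt++));
        }
    }
}
```

The key design point is that `Process.waitFor()` observes the child's death without sharing its fate: an OOM kill of llama-server simply returns an exit code to the JVM.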
Plugin properties
| Property | Default | Description |
|---|---|---|
| Active_AI | — | Must contain llama_engine |
| AI_Remove | — | If contains llama_engine, clean up all |
| AI_LLAMA_ENGINE_Port | 8392 | Local port for LLAMA-Server |
| AI_LLAMA_ENGINE_Threads | physical cores | Number of CPU threads |
| AI_LLAMA_ENGINE_CtxSize | 32768 | Context window size (shared across parallel slots) |
| AI_LLAMA_ENGINE_GpuLayers | auto-detect | Layers offloaded to GPU (-1 = auto) |
| AI_LLAMA_ENGINE_Release | b8175 | llama.cpp release tag for downloads |
| AI_LLAMA_ENGINE_ServerArgs | — | Extra CLI args for LLAMA-Server |
| AI_LLAMA_ENGINE_MaxConcurrent | 2 | Maximum concurrent inferences allowed across all local servers |
| AI_LLAMA_ENGINE_Parallel | 4 | DEPRECATED: number of parallel inference slots per server (use MaxConcurrent instead) |
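Reading these properties with their documented defaults could look like the following sketch. The property keys match the table; the class itself is illustrative, and note that `availableProcessors()` reports logical rather than physical cores, so it is only an approximation of the documented default:

```java
import java.util.Properties;

// Illustrative config holder; the property names are from the table above,
// but this class is not part of the plugin's API.
class LlamaEngineConfig {
    final int port;
    final int threads;
    final int ctxSize;
    final int maxConcurrent;

    LlamaEngineConfig(Properties p) {
        port = Integer.parseInt(p.getProperty("AI_LLAMA_ENGINE_Port", "8392"));
        // Documented default is the physical core count; availableProcessors()
        // returns logical cores, so treat this default as an approximation.
        threads = Integer.parseInt(p.getProperty("AI_LLAMA_ENGINE_Threads",
                String.valueOf(Runtime.getRuntime().availableProcessors())));
        ctxSize = Integer.parseInt(p.getProperty("AI_LLAMA_ENGINE_CtxSize", "32768"));
        maxConcurrent = Integer.parseInt(
                p.getProperty("AI_LLAMA_ENGINE_MaxConcurrent", "2"));
    }
}
```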
Domains to whitelist
github.com — llama-server binary releases
objects.githubusercontent.com — GitHub release asset CDN
DSGVO / EU-AI Act
All data stays on the local machine.
No external API calls.
Same compliance advantages as all other CodBi AI implementations.
Inheritors
Functions
Initializes the LLAMA infrastructure: creates directories, reads plugin properties. Subclasses should call super.initialize(configData) then proceed with downloading and starting the server.
Initiates a task that removes unused, expired images (msExpirationIDedImages) from the cache (cacheIDedImages).
Rejects tenant-level installation. CodBi must be installed as a system plugin because its AI services (Whisper, LLAMA) bind local server ports and manage heavyweight processes that would conflict when instantiated once per tenant.
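The super.initialize(configData) contract described under Functions can be sketched as follows; the class names here are illustrative assumptions, not the plugin's actual types:

```java
// Minimal sketch of the initialization contract; names are illustrative.
abstract class LlamaBase {
    protected boolean baseInitialized;

    // Base step: create directories and read plugin properties (elided here).
    public void initialize(java.util.Properties configData) {
        baseInitialized = true;
    }
}

class MyLlamaModel extends LlamaBase {
    @Override
    public void initialize(java.util.Properties configData) {
        super.initialize(configData);  // must run first, per the contract above
        // then: download the llama-server binary + GGUF model and
        // start the server process (elided in this sketch)
    }
}
```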