Companion

object Companion

Properties

Stores the last MAX_HISTORY_SIZE inference durations (in ms) per model-type key. Used by estimateWaitMs to compute average inference time per model type.

Shared semaphore that limits concurrent local AI inferences across all modules (LLAMA, Tesseract, Whisper). Configured via AI_LLAMA_ENGINE_MaxConcurrent (default 2).
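A minimal sketch of how such a shared semaphore gates inference, assuming the gating happens in an acquire/try/finally pattern. The names `maxConcurrent`, `inferenceSemaphore`, and `withInferenceSlot` are hypothetical; the real property and configuration lookup are internal to this class.

```kotlin
import java.util.concurrent.Semaphore

// Hypothetical names: the actual property and config plumbing are internal.
val maxConcurrent = System.getenv("AI_LLAMA_ENGINE_MaxConcurrent")?.toIntOrNull() ?: 2
val inferenceSemaphore = Semaphore(maxConcurrent, true) // fair ordering

fun <T> withInferenceSlot(block: () -> T): T {
    inferenceSemaphore.acquire()     // blocks until one of the shared slots is free
    try {
        return block()               // run the local AI inference
    } finally {
        inferenceSemaphore.release() // always free the slot, even on failure
    }
}
```

Because the semaphore is shared across LLAMA, Tesseract, and Whisper, a long OCR job and an LLM completion compete for the same permit pool.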

Whether the queue-position badge is enabled globally. Configured via AI_QueueBadge.

Tracks every request that is waiting for or currently holding the inference semaphore. Streaming threads register before acquiring the semaphore; retry-based clients (sync LLAMA, Tesseract) register on their first failed tryAcquire. Tickets are removed when inference completes (in the finally block after release). The map value is the creation timestamp for waiting tickets, or Long.MAX_VALUE for running inferences (immune to stale cleanup).

Maps ticket UUID → model-type key (e.g. "llama-thinking", "llama-fast", "tesseract"). Registered alongside queueTickets so estimateWaitMs can look up which model types are ahead in the queue and calculate an approximate wait time.
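The two registries described above can be sketched as a pair of concurrent maps keyed by ticket UUID. The names `queueTickets`, `ticketModelTypes`, and the helper functions here are assumptions for illustration; only the lifecycle (register while waiting, mark with Long.MAX_VALUE while running, remove on completion) comes from the documentation.

```kotlin
import java.util.UUID
import java.util.concurrent.ConcurrentHashMap

// Hypothetical registries mirroring the documented ticket lifecycle.
val queueTickets = ConcurrentHashMap<String, Long>()       // ticket id -> created-at, or Long.MAX_VALUE when running
val ticketModelTypes = ConcurrentHashMap<String, String>() // ticket id -> model-type key, e.g. "llama-fast"

fun registerTicket(modelType: String): String {
    val id = UUID.randomUUID().toString()
    queueTickets[id] = System.currentTimeMillis() // waiting: store creation timestamp
    ticketModelTypes[id] = modelType
    return id
}

fun markRunning(id: String) {
    queueTickets[id] = Long.MAX_VALUE             // running: immune to stale cleanup
}

fun removeTicket(id: String) {                    // called in the finally block after release
    queueTickets.remove(id)
    ticketModelTypes.remove(id)
}
```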

Functions

Removes waiting tickets older than 30 s (abandoned clients). Active tickets (Long.MAX_VALUE) are not affected.
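A sketch of that cleanup rule, assuming the ticket map shape described under Properties (creation timestamp for waiting tickets, Long.MAX_VALUE for running ones). The function name and parameters are hypothetical.

```kotlin
// Hypothetical sketch: drop waiting tickets older than the stale threshold.
fun cleanupStaleTickets(queueTickets: MutableMap<String, Long>, staleMs: Long = 30_000) {
    val cutoff = System.currentTimeMillis() - staleMs
    // Long.MAX_VALUE marks a running inference; it is always above the cutoff,
    // so only abandoned waiting tickets are removed.
    queueTickets.entries.removeIf { (_, created) -> created != Long.MAX_VALUE && created < cutoff }
}
```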

fun estimateWaitMs(excludeTicket: String?): Long?

Estimates the total wait time (in ms) for a ticket by summing the average inference duration of every ticket ahead of it in the queue. Only tickets whose model type has recorded history contribute to the estimate. Returns null if no estimate is possible (no history for any of the queued model types).
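The estimation logic above can be sketched as follows, with the registries and history passed in explicitly for clarity. This is an illustrative reconstruction under the stated rules (exclude the caller's ticket, average per model type, null when no queued model type has history), not the actual implementation.

```kotlin
// Hypothetical sketch of the documented estimation rules.
fun estimateWaitMs(
    excludeTicket: String?,
    tickets: Map<String, Long>,       // ticket id -> created-at timestamp
    modelTypes: Map<String, String>,  // ticket id -> model-type key
    history: Map<String, List<Long>>, // model-type key -> recent durations (ms)
): Long? {
    var total = 0L
    var haveEstimate = false
    for (id in tickets.keys) {
        if (id == excludeTicket) continue                      // don't count the caller itself
        val durations = history[modelTypes[id] ?: continue] ?: continue
        if (durations.isEmpty()) continue                      // no history -> no contribution
        total += durations.sum() / durations.size              // average duration per model type
        haveEstimate = true
    }
    return if (haveEstimate) total else null                   // null when no estimate is possible
}
```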

fun recordInferenceDuration(modelType: String, durationMs: Long)

Records the duration of a completed inference for a given model type.
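Together with the MAX_HISTORY_SIZE property above, this amounts to a bounded per-model-type history. A sketch, assuming a simple deque-based ring; the constant's actual value and the backing store are internal, so both are assumptions here.

```kotlin
import java.util.ArrayDeque

const val MAX_HISTORY_SIZE = 10 // assumed value; the real constant is internal

// Hypothetical backing store: model-type key -> last N inference durations (ms).
val inferenceHistory = HashMap<String, ArrayDeque<Long>>()

fun recordInferenceDuration(modelType: String, durationMs: Long) {
    val samples = inferenceHistory.getOrPut(modelType) { ArrayDeque() }
    samples.addLast(durationMs)
    if (samples.size > MAX_HISTORY_SIZE) samples.removeFirst() // evict the oldest sample
}
```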

fun updateMaxConcurrent(maxConcurrent: Int)

Replaces the shared inference semaphore with a new one using the given permit limit.
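A sketch of what "replaces" implies here, assuming the semaphore is held in a mutable reference: swapping the instance means in-flight inferences release against the old semaphore, while new requests acquire from the fresh one with the new limit. The mutable `inferenceSemaphore` holder is hypothetical.

```kotlin
import java.util.concurrent.Semaphore

// Hypothetical mutable holder for the shared semaphore (default limit 2).
var inferenceSemaphore = Semaphore(2)

fun updateMaxConcurrent(maxConcurrent: Int) {
    // In-flight inferences keep (and release) permits of the old instance;
    // only new acquisitions see the new limit.
    inferenceSemaphore = Semaphore(maxConcurrent)
}
```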