sanitizeQuery

fun sanitizeQuery(raw: String, language: String? = null, filterOverride: Boolean? = null): String

Sanitizes a search query by removing personally identifiable information (PII) and identifiers that should not be forwarded to an external search engine. This acts as a second layer of defense, complementing any prompt instructions that tell the model to keep sensitive information out of the query.

Strips:

  • Serial numbers (S/N..., SN:..., s/n ...)

  • Case references / Aktenzeichen (Az., Az:, Aktenzeichen)

  • Generic alphanumeric IDs that look like codes (6+ chars with mixed letters/digits/dashes)

  • "unless / except / not" clauses that typically reference specific people

  • Email addresses

  • Phone numbers (international and local formats)

  • IBANs and SSN-style national ID numbers

  • Date-of-birth patterns (DOB:, born on, Geb.)

  • Street addresses (house number + street name, EN/DE)

  • Trailing noise (whitespace, commas, dots)
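
To make the list above more concrete, here is a minimal sketch of how a few of these rules could be expressed as regular expressions. It is not the actual implementation of sanitizeQuery: the function name stripSketch and every pattern in it are assumptions chosen for illustration, and they cover only email addresses, serial numbers, and trailing noise.

```kotlin
// Illustrative only: these patterns are assumptions, not the rules sanitizeQuery really uses.

// Email addresses such as bob@example.com
private val EMAIL = Regex("""[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}""")

// Serial numbers introduced by S/N, SN: or s/n, followed by the code itself
private val SERIAL = Regex("""(?i)\bS/?N\b[:\s]*[A-Z0-9-]+""")

// Runs of whitespace left behind after a removal
private val MULTI_SPACE = Regex("""\s{2,}""")

// Trailing whitespace, commas and dots
private val TRAILING_NOISE = Regex("""[\s,.]+$""")

/** Hypothetical helper: applies the sketch rules in a fixed order. */
fun stripSketch(raw: String): String =
    raw.replace(EMAIL, "")
        .replace(SERIAL, "")
        .replace(MULTI_SPACE, " ")
        .replace(TRAILING_NOISE, "")
        .trim()
```

With these assumed patterns, stripSketch("printer driver S/N: AB12-34CD, ") yields "printer driver". Running the trailing-noise rule last matters, because earlier removals can leave dangling commas and spaces at the end of the query.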

Words wrapped in << WORD >> bypass all sieve rules and are kept verbatim.
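
One common way to implement such a bypass is to swap each protected span for a placeholder before the rules run and restore it afterwards. The sketch below illustrates that idea under stated assumptions: the placeholder scheme, the helper name sanitizeWithProtection, the single demo rule for code-like IDs, and the choice to drop the << >> markers from the output are all hypothetical and may differ from what sanitizeQuery does.

```kotlin
// Hypothetical sketch: only the << WORD >> marker syntax comes from the description above.

// Matches << WORD >> and captures WORD without the surrounding spaces
private val PROTECTED = Regex("""<<\s*(.+?)\s*>>""")

// Demo rule: 6+ characters of letters/digits/dashes containing at least one letter and one digit
private val CODE_LIKE_ID =
    Regex("""\b(?=[A-Za-z0-9-]*[A-Za-z])(?=[A-Za-z0-9-]*\d)[A-Za-z0-9-]{6,}\b""")

// Placeholders are wrapped in private-use characters so that no rule matches them
private val PLACEHOLDER = Regex("\uE000(\\d+)\uE001")

fun sanitizeWithProtection(raw: String): String {
    val kept = mutableListOf<String>()

    // 1. Replace every protected span with an indexed placeholder.
    val masked = PROTECTED.replace(raw) { m ->
        kept += m.groupValues[1]
        "\uE000${kept.lastIndex}\uE001"
    }

    // 2. Apply the strip rules (here just one) to the masked text.
    val stripped = CODE_LIKE_ID.replace(masked, "")

    // 3. Put the protected words back, then tidy up leftover whitespace.
    val restored = PLACEHOLDER.replace(stripped) { m -> kept[m.groupValues[1].toInt()] }
    return restored.replace(Regex("""\s{2,}"""), " ").trim()
}
```

In this sketch, sanitizeWithProtection("reset <<ABC-123456>> XY-998877 router") returns "reset ABC-123456 router": the unprotected ID is stripped while the protected one survives unchanged.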

Return

The sanitized query string.

Parameters

raw

The raw query string to sanitize.

language

Optional language code (e.g. "en", "de") to help with language-specific patterns (e.g. name detection). If null, only generic patterns are applied.

filterOverride

Per-request override for filterResults. When non-null, takes precedence over the global filterResults flag.
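
The precedence rule can be sketched with Kotlin's Elvis operator. The SearchConfig object and the effectiveFilter helper below are hypothetical stand-ins, since this page does not show where the global filterResults flag actually lives or what its default is.

```kotlin
// Hypothetical holder for the global flag; the real configuration object is not shown here.
object SearchConfig {
    var filterResults: Boolean = true // assumed default, for illustration only
}

/** Returns the per-request override when it is non-null, otherwise the global flag. */
fun effectiveFilter(filterOverride: Boolean?): Boolean =
    filterOverride ?: SearchConfig.filterResults
```

Under these assumptions, effectiveFilter(null) falls back to the global value, while effectiveFilter(false) disables filtering for that single request regardless of the global setting.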