Local LLM & AI Setup

CICADA IR uses large language models to summarise investigation findings, identify attack patterns, and draft reports. AI processing can run entirely on your network with any local LLM server — Ollama, LM Studio, llama.cpp server, litellm, or any OpenAI-compatible local proxy — or you can connect a cloud LLM (Anthropic Claude, OpenAI, Google Gemini). All configuration is done through the CICADA IR web interface.

Supported local LLM providers

Pick the one that best fits your environment; the integration is identical from CICADA's side. In Settings → LLM Provider you choose the provider, paste the endpoint URL, and CICADA probes for available models.
  • Ollama — default port 11434. Easiest CLI workflow (ollama pull). The example walkthrough on this page uses Ollama.
  • LM Studio — default port 1234. GUI for browsing and loading GGUF models on macOS / Windows / Linux.
  • llama.cpp server — default port 8080. Lowest overhead; ideal for headless GPU hosts.
  • litellm — OpenAI-compatible proxy in front of multiple backends. Useful if you already run litellm to manage cost and routing centrally.
  • Any OpenAI-compatible local proxy — if it speaks /v1/chat/completions on your network, it works.
Note: The CICADA IR VM is a hardened closed appliance, so the local LLM server cannot be installed on the VM itself. Run it on a separate host on your network, or use a cloud LLM instead.
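
As a rough illustration of the provider list above, the following shell sketch maps the three providers with documented default ports to a base URL and asks the server for its model list. The function names, the 192.0.2.10 example address, and the use of the OpenAI-style /v1/models route are assumptions for illustration; CICADA's own probe may work differently, and litellm is left out because no default port is stated here.

```shell
#!/bin/sh
# Default ports as listed above. Other providers have no default in this sketch.
default_port() {
  case "$1" in
    ollama)   echo 11434 ;;
    lmstudio) echo 1234 ;;
    llamacpp) echo 8080 ;;
    *)        return 1 ;;
  esac
}

# Build the base URL for a provider running on a given host.
# Usage: base_url <host> <provider>
base_url() {
  port=$(default_port "$2") || return 1
  echo "http://$1:$port"
}

# Ask the server which models it offers (assumes an OpenAI-style /v1/models route).
# Usage: probe_models <base-url>
probe_models() {
  curl -fsS --max-time 5 "$1/v1/models"
}

# Example (hypothetical address):
#   probe_models "$(base_url 192.0.2.10 ollama)"
```

Run this from any machine on the same network as the LLM host; a JSON response suggests the endpoint URL is the one to paste into the settings page.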

Overview

| Option | Privacy | Speed | Quality | Cost |
|---|---|---|---|---|
| Local LLM (Ollama / LM Studio / llama.cpp / litellm) | All data stays on your network | Depends on host resources / GPU | Good (model-dependent) | Free (open-source models) |
| Cloud LLM (Anthropic, OpenAI, Gemini) | Data sent to provider API (PII blocked unless allowed) | Fast | Excellent | Pay-per-use |

Option 1: Local LLM on your network (worked example: Ollama)

Run a local LLM server on a machine on the same network as the CICADA IR VM — a workstation, a dedicated server, or a spare VM. The walkthrough below uses Ollama (port 11434) because it has the shortest install path. The same flow applies to LM Studio (port 1234), llama.cpp server (port 8080), or litellm proxy — pick your provider in Settings → LLM Provider, paste the endpoint URL, and CICADA probes for available models. For the best performance, use a host with an NVIDIA GPU.

Step 1: Install Ollama on your host

# Linux / macOS — one-line installer
curl -fsSL https://ollama.com/install.sh | sh

# Windows — download the installer from https://ollama.com/download

Verify the installation:

ollama --version

Step 2: Pull a model

CICADA IR works with any Ollama-compatible model. We recommend starting with one of these:

| Model | Size | RAM needed | Best for |
|---|---|---|---|
| llama3.1:8b | 4.7 GB | 8 GB | Good balance of speed and quality (recommended default) |
| mistral:7b | 4.1 GB | 8 GB | Fast, good for lighter hardware |
| gemma2:9b | 5.4 GB | 10 GB | Strong reasoning for its size |
| qwen2.5:7b | 4.4 GB | 8 GB | Good all-rounder, strong at structured output |
| llama3.1:70b | 40 GB | 48 GB | Best local quality (needs a GPU with sufficient VRAM) |

# Pull the recommended default
ollama pull llama3.1:8b

# Verify
ollama list
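
Before pulling a larger model, it can help to compare the host's memory against the "RAM needed" figures listed for the recommended models. The sketch below encodes those figures in a lookup; the function names are hypothetical, and total_ram_gb assumes a Linux host with /proc/meminfo.

```shell
#!/bin/sh
# RAM needed (GB) per model, taken from the recommendations on this page.
ram_needed_gb() {
  case "$1" in
    llama3.1:8b|mistral:7b|qwen2.5:7b) echo 8 ;;
    gemma2:9b)                         echo 10 ;;
    llama3.1:70b)                      echo 48 ;;
    *)                                 return 1 ;;
  esac
}

# Succeeds if the model's RAM figure fits in the given amount of RAM.
# Usage: fits_in_ram <model> <available-gb>
fits_in_ram() {
  need=$(ram_needed_gb "$1") || return 2   # 2 = model not in the lookup
  [ "$2" -ge "$need" ]
}

# Total RAM rounded to the nearest GB (Linux only; MemTotal is reported in kB).
total_ram_gb() {
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1048576 + 0.5 }' /proc/meminfo
}

# Example:
#   fits_in_ram llama3.1:8b "$(total_ram_gb)" && ollama pull llama3.1:8b
```

The figures are guidance rather than hard limits, so treat a failed check as a prompt to try a smaller model first.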

Step 3: Expose Ollama to the network

By default, Ollama only listens on localhost. To allow the CICADA IR VM to connect, bind it to all interfaces:

Linux (systemd)

sudo systemctl edit ollama

# Add these lines in the editor that opens:
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"

# Save, then restart
sudo systemctl restart ollama

macOS

launchctl setenv OLLAMA_HOST "0.0.0.0:11434"
# Quit and reopen Ollama from the menu bar

Windows

Set a system environment variable OLLAMA_HOST to 0.0.0.0:11434 via System Properties > Environment Variables, then restart Ollama.

Verify it's accessible from another machine on the network:

curl http://<ollama-host-ip>:11434/api/tags
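
The /api/tags response is JSON, which is hard to scan by eye. A minimal sketch for pulling out just the model names is shown here; it assumes each installed model appears with a "name" field in that response and uses plain grep and sed rather than a JSON parser, so treat it as a convenience rather than a robust tool. The function name is hypothetical.

```shell
#!/bin/sh
# Read an /api/tags JSON response on stdin and print one model name per line.
list_model_names() {
  grep -o '"name"[[:space:]]*:[[:space:]]*"[^"]*"' |
    sed 's/.*"\([^"]*\)"$/\1/'
}

# Example (replace the placeholder with the Ollama host's IP address):
#   curl -fsS "http://$OLLAMA_HOST_IP:11434/api/tags" | list_model_names
```

The names printed here are the values to use for the Model field in the next step.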

Step 4: Configure in CICADA IR

  1. Open the CICADA IR web interface
  2. Navigate to Settings > AI Configuration
  3. Set Provider to Ollama
  4. Set Ollama URL to http://<ollama-host-ip>:11434
  5. Set Model to the model you pulled (e.g., llama3.1:8b)
  6. Click Test Connection to verify
  7. Click Save

The CICADA IR VM must be able to reach the Ollama host on TCP 11434. If the machines are on different networks or behind firewalls, open port 11434 between them (see Network Requirements).
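
If Test Connection fails, one way to narrow things down is to send a small request to Ollama from another machine on the same network. The sketch below posts to Ollama's native /api/generate route with streaming disabled. The function names and prompt are illustrative, the payload builder does no JSON escaping (so avoid quotes and backslashes in the prompt), and this is not necessarily the request CICADA IR itself sends.

```shell
#!/bin/sh
# Build a minimal non-streaming request body for /api/generate.
# Usage: generate_payload <model> <prompt>
generate_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}' "$1" "$2"
}

# Send one short prompt and print the raw JSON reply.
# Usage: smoke_test <base-url> <model>
smoke_test() {
  generate_payload "$2" "Reply with the word ok." |
    curl -fsS --max-time 120 -H 'Content-Type: application/json' \
      -d @- "$1/api/generate"
}

# Example (replace the placeholder with the Ollama host's IP address):
#   smoke_test "http://$OLLAMA_HOST_IP:11434" llama3.1:8b
```

A JSON reply here means the host, port, and model name are all working, which points any remaining problem at the path between the VM and the Ollama host.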

GPU acceleration

If the Ollama host has an NVIDIA GPU, Ollama will detect and use it automatically. This typically reduces inference time from 30–120 seconds (CPU) to 2–10 seconds (GPU) for the 7–9 billion parameter models.

# Confirm the GPU is visible
nvidia-smi

# Ollama uses the GPU automatically
ollama run llama3.1:8b "test"

# Watch GPU usage during inference
watch -n 1 nvidia-smi

CPU-only inference works fine for the smaller models — it's just slower.
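
To confirm whether a loaded model is actually running on the GPU, recent Ollama versions provide ollama ps, which reports a processor column for each loaded model. The sketch below simply looks for "GPU" in that output; the function name is hypothetical, the output layout is an assumption that may differ between versions, and a model split between CPU and GPU also counts as a match.

```shell
#!/bin/sh
# Read `ollama ps` output on stdin; succeed if any loaded model reports GPU use.
uses_gpu() {
  awk 'NR > 1 && /GPU/ { found = 1 } END { exit !found }'
}

# Example (run shortly after an inference, while the model is still loaded):
#   ollama ps | uses_gpu && echo "GPU in use" || echo "CPU only or no model loaded"
```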

Model management

On the Ollama host:

# List installed models
ollama list

# Remove a model you no longer need
ollama rm mistral:7b

# Update a model to the latest version
ollama pull llama3.1:8b

Option 2: Cloud LLM

For the best analysis quality without managing local infrastructure, CICADA IR supports three cloud LLM providers (Professional and Enterprise tiers):

| Provider | Where to get an API key | API endpoint |
|---|---|---|
| Anthropic Claude | console.anthropic.com | api.anthropic.com |
| OpenAI | platform.openai.com | api.openai.com |
| Google Gemini | aistudio.google.com | generativelanguage.googleapis.com |

To configure:

  1. Obtain an API key from one of the providers above
  2. In CICADA IR, navigate to Settings > AI Configuration
  3. Set Provider to Anthropic, OpenAI, or Google Gemini
  4. Paste your API key
  5. Select the model. Reasonable defaults:
    • Anthropic: claude-sonnet-4-6
    • OpenAI: gpt-4o or gpt-4.1
    • Google Gemini: gemini-1.5-pro or gemini-2.0-flash
  6. Click Test Connection and then Save

The VM needs outbound HTTPS access (port 443) to whichever provider endpoint you're using. See Network Requirements.
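
Because the VM is a closed appliance, outbound access is easiest to check from another host that sits behind the same firewall rules. The sketch below maps each provider to the endpoint named in this section and attempts an HTTPS connection. The function names are hypothetical, and any HTTP status (including an error page) is treated as proof that port 443 is reachable.

```shell
#!/bin/sh
# Endpoint hostnames for each cloud provider, as listed in this section.
provider_host() {
  case "$1" in
    anthropic) echo api.anthropic.com ;;
    openai)    echo api.openai.com ;;
    gemini)    echo generativelanguage.googleapis.com ;;
    *)         return 1 ;;
  esac
}

# Attempt a TLS connection on port 443. curl exits 0 on any HTTP response,
# so a failure here points to DNS, routing, or firewall problems.
# Usage: check_https <provider>
check_https() {
  host=$(provider_host "$1") || return 1
  curl -sS -o /dev/null --max-time 10 "https://$host/" &&
    echo "$host reachable on 443"
}

# Example:
#   check_https anthropic
```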


Choosing the right setup

  • Air-gapped / high-security environments: Run Ollama on a host inside your secure network. No data leaves your environment.
  • Spare hardware available: A workstation or server with an NVIDIA GPU running Ollama gives you fast local inference with full data sovereignty.
  • No spare hardware: Use a cloud LLM (Anthropic, OpenAI, or Gemini) — no local infrastructure needed, just an API key.
  • Best quality analysis: Use a frontier cloud model (Claude Sonnet, GPT-4o/4.1, or Gemini 1.5 Pro) for the most thorough investigation summaries and attack pattern identification.

Troubleshooting

| Issue | Solution |
|---|---|
| Test Connection fails for Ollama | Confirm the URL uses the Ollama host's IP address, not localhost (on the VM, localhost refers to the VM itself, not the Ollama host). Verify the host is listening with curl http://<host-ip>:11434/api/tags from any other machine on the network, and check that no host firewall is blocking port 11434. |
| AI analysis is very slow | The model may be too large for the available RAM, causing it to swap. Try a smaller model (e.g., mistral:7b) or add a GPU to the Ollama host. |
| "Model not found" error | The model name in CICADA IR settings must exactly match what's installed on the Ollama host. Run ollama list there and copy the exact name into the settings. |
| Cloud LLM returns an error | Verify the API key in Settings > AI Configuration. Check that the VM has outbound access on port 443 to the provider endpoint (api.anthropic.com, api.openai.com, or generativelanguage.googleapis.com), and that your API key has sufficient credits in the provider's console. |
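
For the "Model not found" case, the exact-match requirement can be checked mechanically. This sketch compares a name against the first column of ollama list output. The function name is hypothetical, and it assumes the model name is the first whitespace-separated field on each row after the header.

```shell
#!/bin/sh
# Read `ollama list` output on stdin; succeed only on an exact name match.
# Usage: ollama list | model_installed <name>
model_installed() {
  awk -v want="$1" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'
}

# Example (run on the Ollama host):
#   ollama list | model_installed "llama3.1:8b" || echo "name does not match exactly"
```

A name without its tag (for example llama3.1 instead of llama3.1:8b) fails this check, which mirrors the most common cause of the error.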

Next steps