Local LLM & AI Setup
CICADA IR uses large language models to summarise investigation findings, identify attack patterns, and draft reports. AI processing can run entirely on your network with any local LLM server — Ollama, LM Studio, llama.cpp server, litellm, or any OpenAI-compatible local proxy — or you can connect a cloud LLM (Anthropic Claude, OpenAI, Google Gemini). All configuration is done through the CICADA IR web interface.
- Ollama — default port
11434. Easiest CLI workflow (ollama pull). The example walkthrough on this page uses Ollama. - LM Studio — default port
1234. GUI for browsing and loading GGUF models on macOS / Windows / Linux. - llama.cpp server — default port
8080. Lowest overhead; ideal for headless GPU hosts. - litellm — OpenAI-compatible proxy in front of multiple backends. Useful if you already run litellm to manage cost and routing centrally.
- Any OpenAI-compatible local proxy — if it speaks
/v1/chat/completionson your network, it works.
Overview
| Option | Privacy | Speed | Quality | Cost |
|---|---|---|---|---|
| Local LLM (Ollama / LM Studio / llama.cpp / litellm) | All data stays on your network | Depends on host resources / GPU | Good (model-dependent) | Free (open-source models) |
| Cloud LLM (Anthropic, OpenAI, Gemini) | Data sent to provider API (PII blocked unless allowed) | Fast | Excellent | Pay-per-use |
Option 1: Local LLM on your network (worked example: Ollama)
Run a local LLM server on a machine on the same network as the CICADA IR VM — a workstation, a dedicated server, or a spare VM. The walkthrough below uses Ollama (port 11434) because it has the shortest install path. The same flow applies to LM Studio (port 1234), llama.cpp server (port 8080), or litellm proxy — pick your provider in Settings → LLM Provider, paste the endpoint URL, and CICADA probes for available models. For the best performance, use a host with an NVIDIA GPU.
Step 1: Install Ollama on your host
# Linux / macOS — one-line installer
curl -fsSL https://ollama.com/install.sh | sh
# Windows — download the installer from https://ollama.com/downloadVerify the installation:
ollama --versionStep 2: Pull a model
CICADA IR works with any Ollama-compatible model. We recommend starting with one of these:
| Model | Size | RAM needed | Best for |
|---|---|---|---|
llama3.1:8b | 4.7 GB | 8 GB | Good balance of speed and quality (recommended default) |
mistral:7b | 4.1 GB | 8 GB | Fast, good for lighter hardware |
gemma2:9b | 5.4 GB | 10 GB | Strong reasoning for its size |
qwen2.5:7b | 4.4 GB | 8 GB | Good all-rounder, strong at structured output |
llama3.1:70b | 40 GB | 48 GB | Best local quality (needs a GPU with sufficient VRAM) |
# Pull the recommended default
ollama pull llama3.1:8b
# Verify
ollama listStep 3: Expose Ollama to the network
By default, Ollama only listens on localhost. To allow the CICADA IR VM to connect, bind it to all interfaces:
Linux (systemd)
sudo systemctl edit ollama
# Add these lines in the editor that opens:
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
# Save, then restart
sudo systemctl restart ollamamacOS
launchctl setenv OLLAMA_HOST "0.0.0.0:11434"
# Quit and reopen Ollama from the menu barWindows
Set a system environment variable OLLAMA_HOST to 0.0.0.0:11434via System Properties > Environment Variables, then restart Ollama.
Verify it's accessible from another machine on the network:
curl http://<ollama-host-ip>:11434/api/tagsStep 4: Configure in CICADA IR
- Open the CICADA IR web interface
- Navigate to Settings > AI Configuration
- Set Provider to Ollama
- Set Ollama URL to
http://<ollama-host-ip>:11434 - Set Model to the model you pulled (e.g.,
llama3.1:8b) - Click Test Connection to verify
- Click Save
The CICADA IR VM must be able to reach the Ollama host on TCP 11434. If the machines are on different networks or behind firewalls, open port 11434 between them (see Network Requirements).
GPU acceleration
If the Ollama host has an NVIDIA GPU, Ollama will detect and use it automatically. This typically reduces inference time from 30–120 seconds (CPU) to 2–10 seconds (GPU) for the 7–9 billion parameter models.
# Confirm the GPU is visible
nvidia-smi
# Ollama uses the GPU automatically
ollama run llama3.1:8b "test"
# Watch GPU usage during inference
watch -n 1 nvidia-smiCPU-only inference works fine for the smaller models — it's just slower.
Model management
On the Ollama host:
# List installed models
ollama list
# Remove a model you no longer need
ollama rm mistral:7b
# Update a model to the latest version
ollama pull llama3.1:8bOption 2: Cloud LLM
For the best analysis quality without managing local infrastructure, CICADA IR supports three cloud LLM providers (Professional and Enterprise tiers):
| Provider | Where to get an API key | API endpoint |
|---|---|---|
| Anthropic Claude | console.anthropic.com | api.anthropic.com |
| OpenAI | platform.openai.com | api.openai.com |
| Google Gemini | aistudio.google.com | generativelanguage.googleapis.com |
To configure:
- Obtain an API key from one of the providers above
- In CICADA IR, navigate to Settings > AI Configuration
- Set Provider to Anthropic, OpenAI, or Google Gemini
- Paste your API key
- Select the model. Reasonable defaults:
- Anthropic:
claude-sonnet-4-6 - OpenAI:
gpt-4oorgpt-4.1 - Google Gemini:
gemini-1.5-proorgemini-2.0-flash
- Anthropic:
- Click Test Connection and then Save
The VM needs outbound HTTPS access (port 443) to whichever provider endpoint you're using. See Network Requirements.
Choosing the right setup
- Air-gapped / high-security environments: Run Ollama on a host inside your secure network. No data leaves your environment.
- Spare hardware available: A workstation or server with an NVIDIA GPU running Ollama gives you fast local inference with full data sovereignty.
- No spare hardware: Use a cloud LLM (Anthropic, OpenAI, or Gemini) — no local infrastructure needed, just an API key.
- Best quality analysis: Use a frontier cloud model (Claude Sonnet, GPT-4o/4.1, or Gemini 1.5 Pro) for the most thorough investigation summaries and attack pattern identification.
Troubleshooting
| Issue | Solution |
|---|---|
| Test Connection fails for Ollama | Confirm the URL uses the Ollama host's IP, not localhost (the VM can't reach its own localhost for this). Verify the host is listening with curl http://<host-ip>:11434/api/tags from any other machine on the network, and check there's no host firewall blocking port 11434. |
| AI analysis is very slow | The model may be too large for the available RAM, causing it to swap. Try a smaller model (e.g., mistral:7b) or add a GPU to the Ollama host. |
| "Model not found" error | The model name in CICADA IR settings must exactly match what's installed on the Ollama host. Run ollama list there and copy the exact name into the settings. |
| Cloud LLM returns an error | Verify the API key in Settings > AI Configuration. Check that the VM has outbound access on port 443 to the provider endpoint (api.anthropic.com, api.openai.com, or generativelanguage.googleapis.com), and that your API key has sufficient credits in the provider's console. |
Next steps
- Getting Started — Create your first investigation
- Network Requirements — Firewall rules for AI providers
- Troubleshooting — General troubleshooting