Is pentagi free and open source?

Yes. pentagi is open source, available under the MIT license.

Is pentagi actively maintained?

pentagi is actively maintained — the latest commit landed within the last month. Its health score is 100/100.

How popular is pentagi?

pentagi has 17.8k GitHub stars.

pentagi

Go MIT★ 17,829

100

Health 100/100

Baseline50
Commit recency+25
Popularity+20
Release cadence+10
Open-issue load+5

pentagi is an open-source Go project with 17.7k GitHub stars. It is associated with ai-agents, ai-security-tool, anthropic, autonomous-agents. The repository is actively maintained, with recent commits.

Raw .md

Embed badge

[![health](https://opensources.dev/resource/pentagi/badge.svg)](https://opensources.dev/resource/pentagi)

MITPermissive — free to use in commercial and proprietary software, with attribution.View license →

Production readiness

4/5

Actively maintainedCommits in the last 6 months
No known vulnerabilitiesNot yet scanned
Clear, usable licenseMIT (permissive)
Proven adoptionWidely used
Has documentationDocumentation indexed

Install

go get pentagi

MaintenanceActively maintainedIs pentagi still maintained? →

Documentation from pentagi, shown under MIT with attribution. Source: https://pentagi.com

PentAGI

Join the Community! Connect with security researchers, AI enthusiasts, and fellow ethical hackers. Get support, share insights, and stay updated with the latest PentAGI developments.

⠀

Overview
Features
Architecture
- Agent Supervision
Quick Start
How to Use PentAGI After Login
API Access
- LLM Provider Configuration
Advanced Setup
Development
Testing LLM Agents
Embedding Configuration and Testing
Function Testing with ftester
Building
Credits
License

Overview

PentAGI is an innovative tool for automated security testing that leverages cutting-edge artificial intelligence technologies. The project is designed for information security professionals, researchers, and enthusiasts who need a powerful and flexible solution for conducting penetration tests.

You can watch the video PentAGI overview:

Features

Secure & Isolated. All operations are performed in a sandboxed Docker environment with complete isolation.
Fully Autonomous. AI-powered agent that automatically determines and executes penetration testing steps with optional execution monitoring and intelligent task planning for enhanced reliability.
Professional Pentesting Tools. Built-in suite of 20+ professional security tools including nmap, metasploit, sqlmap, and more.
Smart Memory System. Long-term storage of research results and successful approaches for future use.
Knowledge Graph Integration. Graphiti-powered knowledge graph using Neo4j for semantic relationship tracking and advanced context understanding.
Web Intelligence. Built-in browser via scraper for gathering latest information from web sources.
External Search Systems. Integration with advanced search APIs including Tavily, Traversaal, Perplexity, DuckDuckGo, Google Custom Search, Sploitus Search and Searxng for comprehensive information gathering.
Team of Specialists. Delegation system with specialized AI agents for research, development, and infrastructure tasks, enhanced with optional execution monitoring and intelligent task planning for optimal performance with smaller models.
Comprehensive Monitoring. Detailed logging and integration with Grafana/Prometheus for real-time system observation.
Detailed Reporting. Generation of thorough vulnerability reports with exploitation guides.
Smart Container Management. Automatic Docker image selection based on specific task requirements.
Modern Interface. Clean and intuitive web UI for system management and monitoring.
Comprehensive APIs. Full-featured REST and GraphQL APIs with Bearer token authentication for automation and integration.
Persistent Storage. All commands and outputs are stored in PostgreSQL with pgvector extension.
Scalable Architecture. Microservices-based design supporting horizontal scaling.
Self-Hosted Solution. Complete control over your deployment and data.
Flexible Authentication. Support for 10+ LLM providers (OpenAI, Anthropic, Google AI/Gemini, AWS Bedrock, Ollama, DeepSeek, GLM, Kimi, Qwen, Custom) plus aggregators (OpenRouter, DeepInfra). For production local deployments, see our vLLM + Qwen3.5-27B-FP8 guide.
API Token Authentication. Secure Bearer token system for programmatic access to REST and GraphQL APIs.
Quick Deployment. Easy setup through Docker Compose with comprehensive environment configuration.

Current Capability Boundaries

PentAGI today is an autonomous and assistant-guided penetration testing platform, not a CALDERA-style Breach and Attack Simulation (BAS) or adversary emulation product with predefined campaigns or attack plans.
BAS-like agent-authored attack scripts should be treated as conceptual or future work, not as a feature that is implemented today.
The current flow report UI supports web view, copy to clipboard, Markdown download, and PDF download. JSON flow-report export is not documented as a supported output format today.
Provider flexibility is available today through built-in providers and custom/OpenAI-compatible endpoints. See Custom LLM Provider Configuration and the vLLM + Qwen3.5-27B-FP8 guide.

Architecture

System Context

flowchart TB
    classDef person fill:#08427B,stroke:#073B6F,color:#fff
    classDef system fill:#1168BD,stroke:#0B4884,color:#fff
    classDef external fill:#666666,stroke:#0B4884,color:#fff

    pentester["👤 Security Engineer
    (User of the system)"]

    pentagi["✨ PentAGI
    (Autonomous penetration testing system)"]

    target["🎯 target-system
    (System under test)"]
    llm["🧠 llm-provider
    (OpenAI/Anthropic/Ollama/Bedrock/Gemini/Custom)"]
    search["🔍 search-systems
    (Google/DuckDuckGo/Tavily/Traversaal/Perplexity/Sploitus/Searxng)"]
    langfuse["📊 langfuse-ui
    (LLM Observability Dashboard)"]
    grafana["📈 grafana
    (System Monitoring Dashboard)"]

    pentester --> |Uses HTTPS| pentagi
    pentester --> |Monitors AI HTTPS| langfuse
    pentester --> |Monitors System HTTPS| grafana
    pentagi --> |Tests Various protocols| target
    pentagi --> |Queries HTTPS| llm
    pentagi --> |Searches HTTPS| search
    pentagi --> |Reports HTTPS| langfuse
    pentagi --> |Reports HTTPS| grafana

    class pentester person
    class pentagi system
    class target,llm,search,langfuse,grafana external

    linkStyle default stroke:#ffffff,color:#ffffff

graph TB
    subgraph Core Services
        UI[Frontend UI<br/>React + TypeScript]
        API[Backend API<br/>Go + GraphQL]
        DB[(Vector Store<br/>PostgreSQL + pgvector)]
        MQ[Task Queue<br/>Async Processing]
        Agent[AI Agents<br/>Multi-Agent System]
    end

    subgraph Knowledge Graph
        Graphiti[Graphiti<br/>Knowledge Graph API]
        Neo4j[(Neo4j<br/>Graph Database)]
    end

    subgraph Monitoring
        Grafana[Grafana<br/>Dashboards]
        VictoriaMetrics[VictoriaMetrics<br/>Time-series DB]
        Jaeger[Jaeger<br/>Distributed Tracing]
        Loki[Loki<br/>Log Aggregation]
        OTEL[OpenTelemetry<br/>Data Collection]
    end

    subgraph Analytics
        Langfuse[Langfuse<br/>LLM Analytics]
        ClickHouse[ClickHouse<br/>Analytics DB]
        Redis[Redis<br/>Cache + Rate Limiter]
        MinIO[MinIO<br/>S3 Storage]
    end

    subgraph Security Tools
        Scraper[Web Scraper<br/>Isolated Browser]
        PenTest[Security Tools<br/>20+ Pro Tools<br/>Sandboxed Execution]
    end

    UI --> |HTTP/WS| API
    API --> |SQL| DB
    API --> |Events| MQ
    MQ --> |Tasks| Agent
    Agent --> |Commands| PenTest
    Agent --> |Queries| DB
    Agent --> |Knowledge| Graphiti
    Graphiti --> |Graph| Neo4j

    API --> |Telemetry| OTEL
    OTEL --> |Metrics| VictoriaMetrics
    OTEL --> |Traces| Jaeger
    OTEL --> |Logs| Loki

    Grafana --> |Query| VictoriaMetrics
    Grafana --> |Query| Jaeger
    Grafana --> |Query| Loki

    API --> |Analytics| Langfuse
    Langfuse --> |Store| ClickHouse
    Langfuse --> |Cache| Redis
    Langfuse --> |Files| MinIO

    classDef core fill:#f9f,stroke:#333,stroke-width:2px,color:#000
    classDef knowledge fill:#ffa,stroke:#333,stroke-width:2px,color:#000
    classDef monitoring fill:#bbf,stroke:#333,stroke-width:2px,color:#000
    classDef analytics fill:#bfb,stroke:#333,stroke-width:2px,color:#000
    classDef tools fill:#fbb,stroke:#333,stroke-width:2px,color:#000

    class UI,API,DB,MQ,Agent core
    class Graphiti,Neo4j knowledge
    class Grafana,VictoriaMetrics,Jaeger,Loki,OTEL monitoring
    class Langfuse,ClickHouse,Redis,MinIO analytics
    class Scraper,PenTest tools

erDiagram
    Flow ||--o{ Task : contains
    Task ||--o{ SubTask : contains
    SubTask ||--o{ Action : contains
    Action ||--o{ Artifact : produces
    Action ||--o{ Memory : stores

    Flow {
        string id PK
        string name "Flow name"
        string description "Flow description"
        string status "active/completed/failed"
        json parameters "Flow parameters"
        timestamp created_at
        timestamp updated_at
    }

    Task {
        string id PK
        string flow_id FK
        string name "Task name"
        string description "Task description"
        string status "pending/running/done/failed"
        json result "Task results"
        timestamp created_at
        timestamp updated_at
    }

    SubTask {
        string id PK
        string task_id FK
        string name "Subtask name"
        string description "Subtask description"
        string status "queued/running/completed/failed"
        string agent_type "researcher/developer/executor"
        json context "Agent context"
        timestamp created_at
        timestamp updated_at
    }

    Action {
        string id PK
        string subtask_id FK
        string type "command/search/analyze/etc"
        string status "success/failure"
        json parameters "Action parameters"
        json result "Action results"
        timestamp created_at
    }

    Artifact {
        string id PK
        string action_id FK
        string type "file/report/log"
        string path "Storage path"
        json metadata "Additional info"
        timestamp created_at
    }

    Memory {
        string id PK
        string action_id FK
        string type "observation/conclusion"
        vector embedding "Vector representation"
        text content "Memory content"
        timestamp created_at
    }

sequenceDiagram
    participant O as Orchestrator
    participant R as Researcher
    participant D as Developer
    participant E as Executor
    participant VS as Vector Store
    participant KB as Knowledge Base

    Note over O,KB: Flow Initialization
    O->>VS: Query similar tasks
    VS-->>O: Return experiences
    O->>KB: Load relevant knowledge
    KB-->>O: Return context

    Note over O,R: Research Phase
    O->>R: Analyze target
    R->>VS: Search similar cases
    VS-->>R: Return patterns
    R->>KB: Query vulnerabilities
    KB-->>R: Return known issues
    R->>VS: Store findings
    R-->>O: Research results

    Note over O,D: Planning Phase
    O->>D: Plan attack
    D->>VS: Query exploits
    VS-->>D: Return techniques
    D->>KB: Load tools info
    KB-->>D: Return capabilities
    D-->>O: Attack plan

    Note over O,E: Execution Phase
    O->>E: Execute plan
    E->>KB: Load tool guides
    KB-->>E: Return procedures
    E->>VS: Store results
    E-->>O: Execution status

graph TB
    subgraph "Long-term Memory"
        VS[(Vector Store<br/>Embeddings DB)]
        KB[Knowledge Base<br/>Domain Expertise]
        Tools[Tools Knowledge<br/>Usage Patterns]
    end

    subgraph "Working Memory"
        Context[Current Context<br/>Task State]
        Goals[Active Goals<br/>Objectives]
        State[System State<br/>Resources]
    end

    subgraph "Episodic Memory"
        Actions[Past Actions<br/>Commands History]
        Results[Action Results<br/>Outcomes]
        Patterns[Success Patterns<br/>Best Practices]
    end

    Context --> |Query| VS
    VS --> |Retrieve| Context

    Goals --> |Consult| KB
    KB --> |Guide| Goals

    State --> |Record| Actions
    Actions --> |Learn| Patterns
    Patterns --> |Store| VS

    Tools --> |Inform| State
    Results --> |Update| Tools

    VS --> |Enhance| KB
    KB --> |Index| VS

    classDef ltm fill:#f9f,stroke:#333,stroke-width:2px,color:#000
    classDef wm fill:#bbf,stroke:#333,stroke-width:2px,color:#000
    classDef em fill:#bfb,stroke:#333,stroke-width:2px,color:#000

    class VS,KB,Tools ltm
    class Context,Goals,State wm
    class Actions,Results,Patterns em

The chain summarization system manages conversation context growth by selectively summarizing older messages. This is critical for preventing token limits from being exceeded while maintaining conversation coherence.

flowchart TD
    A[Input Chain] --> B{Needs Summarization?}
    B -->|No| C[Return Original Chain]
    B -->|Yes| D[Convert to ChainAST]
    D --> E[Apply Section Summarization]
    E --> F[Process Oversized Pairs]
    F --> G[Manage Last Section Size]
    G --> H[Apply QA Summarization]
    H --> I[Rebuild Chain with Summaries]
    I --> J{Is New Chain Smaller?}
    J -->|Yes| K[Return Optimized Chain]
    J -->|No| C

    classDef process fill:#bbf,stroke:#333,stroke-width:2px,color:#000
    classDef decision fill:#bfb,stroke:#333,stroke-width:2px,color:#000
    classDef output fill:#fbb,stroke:#333,stroke-width:2px,color:#000

    class A,D,E,F,G,H,I process
    class B,J decision
    class C,K output

The algorithm operates on a structured representation of conversation chains (ChainAST) that preserves message types including tool calls and their responses. All summarization operations maintain critical conversation flow while reducing context size.

Global Summarizer Configuration Options

ParameterEnvironment VariableDefaultDescriptionPreserve LastSUMMARIZER_PRESERVE_LASTtrueWhether to keep all messages in the last section intactUse QA PairsSUMMARIZER_USE_QAtrueWhether to use QA pair summarization strategySummarize Human in QASUMMARIZER_SUM_MSG_HUMAN_IN_QAfalseWhether to summarize human messages in QA pairsLast Section SizeSUMMARIZER_LAST_SEC_BYTES51200Maximum byte size for last section (50KB)Max Body Pair SizeSUMMARIZER_MAX_BP_BYTES16384Maximum byte size for a single body pair (16KB)Max QA SectionsSUMMARIZER_MAX_QA_SECTIONS10Maximum QA pair sections to preserveMax QA SizeSUMMARIZER_MAX_QA_BYTES65536Maximum byte size for QA pair sections (64KB)Keep QA SectionsSUMMARIZER_KEEP_QA_SECTIONS1Number of recent QA sections to keep without summarization

Assistant Summarizer Configuration Options

Assistant instances can use customized summarization settings to fine-tune context management behavior:

ParameterEnvironment VariableDefaultDescriptionPreserve LastASSISTANT_SUMMARIZER_PRESERVE_LASTtrueWhether to preserve all messages in the assistant's last sectionLast Section SizeASSISTANT_SUMMARIZER_LAST_SEC_BYTES76800Maximum byte size for assistant's last section (75KB)Max Body Pair SizeASSISTANT_SUMMARIZER_MAX_BP_BYTES16384Maximum byte size for a single body pair in assistant context (16KB)Max QA SectionsASSISTANT_SUMMARIZER_MAX_QA_SECTIONS7Maximum QA sections to preserve in assistant contextMax QA SizeASSISTANT_SUMMARIZER_MAX_QA_BYTES76800Maximum byte size for assistant's QA sections (75KB)Keep QA SectionsASSISTANT_SUMMARIZER_KEEP_QA_SECTIONS3Number of recent QA sections to preserve without summarization

The assistant summarizer configuration provides more memory for context retention compared to the global settings, preserving more recent conversation history while still ensuring efficient token usage.

Summarizer Environment Configuration

# Default values for global summarizer logic
SUMMARIZER_PRESERVE_LAST=true
SUMMARIZER_USE_QA=true
SUMMARIZER_SUM_MSG_HUMAN_IN_QA=false
SUMMARIZER_LAST_SEC_BYTES=51200
SUMMARIZER_MAX_BP_BYTES=16384
SUMMARIZER_MAX_QA_SECTIONS=10
SUMMARIZER_MAX_QA_BYTES=65536
SUMMARIZER_KEEP_QA_SECTIONS=1

# Default values for assistant summarizer logic
ASSISTANT_SUMMARIZER_PRESERVE_LAST=true
ASSISTANT_SUMMARIZER_LAST_SEC_BYTES=76800
ASSISTANT_SUMMARIZER_MAX_BP_BYTES=16384
ASSISTANT_SUMMARIZER_MAX_QA_SECTIONS=7
ASSISTANT_SUMMARIZER_MAX_QA_BYTES=76800
ASSISTANT_SUMMARIZER_KEEP_QA_SECTIONS=3

PentAGI includes sophisticated multi-layered agent supervision mechanisms to ensure efficient task execution, prevent infinite loops, and provide intelligent recovery from stuck states:

Execution Monitoring (Beta)

Automatic Mentor Intervention: Adviser agent (mentor) is automatically invoked when execution patterns indicate potential issues
Pattern Detection: Monitors identical tool calls (threshold: 5, configurable) and total tool calls (threshold: 10, configurable)
Progress Analysis: Evaluates whether agent advances toward subtask objective, detects loops and inefficiencies
Alternative Strategies: Recommends different approaches when current strategy fails
Information Retrieval Guidance: Suggests searching for established solutions instead of reinventing
Enhanced Response Format: Tool responses include both <original_result> and <mentor_analysis> sections
Configurable: Enable via EXECUTION_MONITOR_ENABLED (default: false), customize thresholds with EXECUTION_MONITOR_SAME_TOOL_LIMIT and EXECUTION_MONITOR_TOTAL_TOOL_LIMIT

Best for: Smaller models (< 32B parameters), complex attack scenarios requiring continuous guidance, preventing agents from getting stuck on single approach

Performance Impact: 2-3x increase in execution time and token usage, but delivers 2x improvement in result quality based on testing with Qwen3.5-27B-FP8

Intelligent Task Planning (Beta)

Automated Decomposition: Planner (adviser in planning mode) generates 3-7 specific, actionable steps before specialist agents begin work
Context-Aware Plans: Analyzes full execution context via enricher agent to create informed plans
Structured Assignment: Original request wrapped in <task_assignment> structure with execution plan and instructions
Scope Management: Prevents scope creep by keeping agents focused on current subtask only
Enriched Instructions: Plans highlight critical actions, potential pitfalls, and verification points
Configurable: Enable via AGENT_PLANNING_STEP_ENABLED (default: false)

Best for: Models < 32B parameters, complex penetration testing workflows, improving success rates on sophisticated tasks

Enhanced Adviser Configuration: Works exceptionally well when adviser agent uses stronger model or enhanced settings. Example: using same base model with maximum reasoning mode for adviser (see vllm-qwen3.5-27b-fp8.provider.yml) enables comprehensive task analysis and strategic planning from identical model architecture.

Performance Impact: Adds planning overhead but significantly improves completion rates and reduces redundant work

Tool Call Limits (Always Active)

Hard Limits: Prevent runaway executions regardless of supervision mode status
Differentiated by Agent Type:
- General agents (Assistant, Primary Agent, Pentester, Coder, Installer): MAX_GENERAL_AGENT_TOOL_CALLS (default: 100)
- Limited agents (Searcher, Enricher, Memorist, Generator, Reporter, Adviser, Reflector, Planner): MAX_LIMITED_AGENT_TOOL_CALLS (default: 20)
Graceful Termination: Reflector guides agents to proper completion when approaching limits
Resource Protection: Ensures system stability and prevents resource exhaustion

Reflector Integration (Always Active)

Automatic Correction: Invoked when LLM fails to generate tool calls after 3 attempts
Strategic Guidance: Analyzes failures and guides agents toward proper tool usage or barrier tools (done, ask)
Recovery Mechanism: Provides contextual guidance based on specific failure patterns
Limit Enforcement: Coordinates graceful termination when tool call limits are reached

Recommendations for Open Source Models

Must-Have for Models < 32B Parameters: Testing with Qwen3.5-27B-FP8 demonstrates that enabling both Execution Monitoring and Task Planning is essential for smaller open source models:

Quality Improvement: 2x better results compared to baseline execution without supervision
Loop Prevention: Significantly reduces infinite loops and redundant work
Attack Diversity: Encourages exploration of multiple attack vectors instead of fixating on single approach
Air-Gapped Deployments: Enables production-grade autonomous pentesting in closed network environments with local LLM inference

Trade-offs:

Token consumption: 2-3x increase due to mentor/planner invocations
Execution time: 2-3x longer due to analysis and planning steps
Result quality: 2x improvement in completeness, accuracy, and attack coverage
Model requirements: Works best when adviser uses enhanced configuration (higher reasoning parameters, stronger model variant, or different model)

Configuration Strategy: For optimal performance with smaller models, configure adviser agent with enhanced settings:

Use same model with maximum reasoning mode (example: vllm-qwen3.5-27b-fp8.provider.yml)
Or use stronger model for adviser while keeping base model for other agents
Adjust monitoring thresholds based on task complexity and model capabilities

The architecture of PentAGI is designed to be modular, scalable, and secure. Here are the key components:

Core Services
- Frontend UI: React-based web interface with TypeScript for type safety
- Backend API: Go-based REST and GraphQL APIs with Bearer token authentication for programmatic access
- Vector Store: PostgreSQL with pgvector for semantic search and memory storage
- Task Queue: Async task processing system for reliable operation
- AI Agent: Multi-agent system with specialized roles for efficient testing
Knowledge Graph
- Graphiti: Knowledge graph API for semantic relationship tracking and contextual understanding
- Neo4j: Graph database for storing and querying relationships between entities, actions, and outcomes
- Automatic capturing of agent responses and tool executions for building comprehensive knowledge base
Monitoring Stack
- OpenTelemetry: Unified observability data collection and correlation
- Grafana: Real-time visualization and alerting dashboards
- VictoriaMetrics: High-performance time-series metrics storage
- Jaeger: End-to-end distributed tracing for debugging
- Loki: Scalable log aggregation and analysis
Analytics Platform
- Langfuse: Advanced LLM observability and performance analytics
- ClickHouse: Column-oriented analytics data warehouse
- Redis: High-speed caching and rate limiting
- MinIO: S3-compatible object storage for artifacts
Security Tools
- Web Scraper: Isolated browser environment for safe web interaction
- Pentesting Tools: Comprehensive suite of 20+ professional security tools
- Sandboxed Execution: All operations run in isolated containers
Memory Systems
- Long-term Memory: Persistent storage of knowledge and experiences
- Working Memory: Active context and goals for current operations
- Episodic Memory: Historical actions and success patterns
- Knowledge Base: Structured domain expertise and tool capabilities
- Context Management: Intelligently manages growing LLM context windows using chain summarization

The system uses Docker containers for isolation and easy deployment, with separate networks for core services, monitoring, and analytics to ensure proper security boundaries. Each component is designed to scale horizontally and can be configured for high availability in production environments.

Quick Start

System Requirements

Docker and Docker Compose (or Podman - see Podman configuration)
Minimum 2 vCPU
Minimum 4GB RAM
20GB free disk space
Internet access for downloading images and updates

Using Installer (Recommended)

PentAGI provides an interactive installer with a terminal-based UI for streamlined configuration and deployment. The installer guides you through system checks, LLM provider setup, search engine configuration, and security hardening.

Supported Platforms:

Linux: amd64 download | arm64 download
Windows: amd64 download
macOS: amd64 (Intel) download | arm64 (M-series) download

Quick Installation (Linux amd64):

# Create installation directory
mkdir -p pentagi && cd pentagi

# Download installer
wget -O installer.zip https://pentagi.com/downloads/linux/amd64/installer-latest.zip

# Extract
unzip installer.zip

# Run interactive installer
./installer

Prerequisites & Permissions:

The installer requires appropriate privileges to interact with the Docker API for proper operation. By default, it uses the Docker socket (/var/run/docker.sock) which requires either:

Option 1 (Recommended for production): Run the installer as root:
```
sudo ./installer
```
Option 2 (Development environments): Grant your user access to the Docker socket by adding them to the docker group:
```
# Add your user to the docker group
sudo usermod -aG docker $USER

# Log out and log back in, or activate the group immediately
newgrp docker

# Verify Docker access (should run without sudo)
docker ps
```
⚠️ Security Note: Adding a user to the docker group grants root-equivalent privileges. Only do this for trusted users in controlled environments. For production deployments, consider using rootless Docker mode or running the installer with sudo.

The installer will:

System Checks: Verify Docker, network connectivity, and system requirements
Environment Setup: Create and configure .env file with optimal defaults
Provider Configuration: Set up LLM providers (OpenAI, Anthropic, Gemini, Bedrock, Ollama, Custom)
Search Engines: Configure DuckDuckGo, Google, Tavily, Traversaal, Perplexity, Sploitus, Searxng
Security Hardening: Generate secure credentials and configure SSL certificates
Deployment: Start PentAGI with docker-compose

Current Web Settings Coverage

The PentAGI web console already manages several settings areas after the server is up and running:

Settings -> Providers: Create, edit, delete, and test user-defined provider profiles for supported provider types. These profiles control per-agent model selection, runtime parameters, reasoning options, and pricing metadata.
Settings -> Prompts: Manage system, human, and tool prompt templates.
Settings -> PentAGI API: Create and manage PentAGI Bearer tokens for REST and GraphQL access.
Other UI-managed preferences: Favorite flows are stored as user preferences, and theme selection is handled from the main sidebar/profile controls rather than the Settings pages.

Still Server-Managed

The following configuration areas still need to be set on the server through environment variables, compose files, or mounted config files:

LLM credentials and connection details: API keys, endpoints, auth modes, and provider-specific connection settings for OpenAI, Anthropic, Bedrock, Ollama, custom providers, and similar backends; config-path settings apply only where supported, such as OLLAMA_SERVER_CONFIG_PATH and LLM_SERVER_CONFIG_PATH.
Search provider credentials and options: Settings such as DUCKDUCKGO_*, GOOGLE_*, TAVILY_API_KEY, TRAVERSAAL_API_KEY, PERPLEXITY_*, SEARXNG_*, and SPLOITUS_ENABLED.
Third-party integrations: Langfuse, Graphiti, and similar external services remain server-side configuration.
MCP server management: MCP settings pages are not currently exposed as a live web-console feature.

For Production & Enhanced Security:

For production deployments or security-sensitive environments, we strongly recommend using a distributed two-node architecture where worker operations are isolated on a separate server. This prevents untrusted code execution and network access issues on your main system.

See detailed guide: Worker Node Setup

The two-node setup provides:

Isolated Execution: Worker containers run on dedicated hardware
Network Isolation: Separate network boundaries for penetration testing
Security Boundaries: Docker-in-Docker with TLS authentication
OOB Attack Support: Dedicated port ranges for out-of-band techniques

Manual Installation

Create a working directory or clone the repository:

mkdir pentagi && cd pentagi

Copy .env.example to .env or download it:

curl -o .env https://raw.githubusercontent.com/vxcontrol/pentagi/master/.env.example

Touch examples files (example.custom.provider.yml, example.ollama.provider.yml) or download it:

curl -o example.custom.provider.yml https://raw.githubusercontent.com/vxcontrol/pentagi/master/examples/configs/custom-openai.provider.yml
curl -o example.ollama.provider.yml https://raw.githubusercontent.com/vxcontrol/pentagi/master/examples/configs/ollama-llama318b.provider.yml

Fill in the required API keys in .env file.

# Required: At least one of these LLM providers
OPEN_AI_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
GEMINI_API_KEY=your_gemini_key

# Optional: AWS Bedrock provider (enterprise-grade models)
BEDROCK_REGION=us-east-1
# Choose one authentication method:
BEDROCK_DEFAULT_AUTH=true                        # Option 1: Use AWS SDK default credential chain (recommended for EC2/ECS)
# BEDROCK_BEARER_TOKEN=your_bearer_token         # Option 2: Bearer token authentication
# BEDROCK_ACCESS_KEY_ID=your_aws_access_key      # Option 3: Static credentials
# BEDROCK_SECRET_ACCESS_KEY=your_aws_secret_key

# Optional: Ollama provider (local or cloud)
# OLLAMA_SERVER_URL=http://ollama-server:11434   # Local server
# OLLAMA_SERVER_URL=https://ollama.com           # Cloud service
# OLLAMA_SERVER_API_KEY=your_ollama_cloud_key    # Required for cloud, empty for local

# Optional: Chinese AI providers
# DEEPSEEK_API_KEY=your_deepseek_key             # DeepSeek (strong reasoning)
# GLM_API_KEY=your_glm_key                       # GLM (Zhipu AI)
# KIMI_API_KEY=your_kimi_key                     # Kimi (Moonshot AI, ultra-long context)
# QWEN_API_KEY=your_qwen_key                     # Qwen (Alibaba Cloud, multimodal)

# Optional: Local LLM provider (zero-cost inference)
OLLAMA_SERVER_URL=http://localhost:11434
OLLAMA_SERVER_MODEL=your_model_name

# Optional: Additional search capabilities
DUCKDUCKGO_ENABLED=true
DUCKDUCKGO_REGION=us-en
DUCKDUCKGO_SAFESEARCH=
DUCKDUCKGO_TIME_RANGE=
SPLOITUS_ENABLED=true
GOOGLE_API_KEY=your_google_key
GOOGLE_CX_KEY=your_google_cx
TAVILY_API_KEY=your_tavily_key
TRAVERSAAL_API_KEY=your_traversaal_key
PERPLEXITY_API_KEY=your_perplexity_key
PERPLEXITY_MODEL=sonar-pro
PERPLEXITY_CONTEXT_SIZE=medium

# Searxng meta search engine (aggregates results from multiple sources)
SEARXNG_URL=http://your-searxng-instance:8080
SEARXNG_CATEGORIES=general
SEARXNG_LANGUAGE=
SEARXNG_SAFESEARCH=0
SEARXNG_TIME_RANGE=
SEARXNG_TIMEOUT=

## Graphiti knowledge graph settings
GRAPHITI_ENABLED=true
GRAPHITI_TIMEOUT=30
GRAPHITI_URL=http://graphiti:8000
GRAPHITI_MODEL_NAME=gpt-5-mini

# Neo4j settings (used by Graphiti stack)
NEO4J_USER=neo4j
NEO4J_DATABASE=neo4j
NEO4J_PASSWORD=devpassword
NEO4J_URI=bolt://neo4j:7687

# Assistant configuration
ASSISTANT_USE_AGENTS=false         # Default value for agent usage when creating new assistants

Change all security related environment variables in .env file to improve security.

Main Security Settings

COOKIE_SIGNING_SALT - Salt for cookie signing, change to random value
PUBLIC_URL - Public URL of your server (eg. https://pentagi.example.com)
SERVER_SSL_CRT and SERVER_SSL_KEY - Custom paths to your existing SSL certificate and key for HTTPS (these paths should be used in the docker-compose.yml file to mount as volumes)

Scraper Access

SCRAPER_PUBLIC_URL - Public URL for scraper if you want to use different scraper server for public URLs
SCRAPER_PRIVATE_URL - Private URL for scraper (local scraper server in docker-compose.yml file to access it to local URLs)

Access Credentials

PENTAGI_POSTGRES_USER and PENTAGI_POSTGRES_PASSWORD - PostgreSQL credentials
NEO4J_USER and NEO4J_PASSWORD - Neo4j credentials (for Graphiti knowledge graph)

Remove all inline comments from .env file if you want to use it in VSCode or other IDEs as a envFile option:

perl -i -pe 's/\s+#.*$//' .env

Run the PentAGI stack:

curl -O https://raw.githubusercontent.com/vxcontrol/pentagi/master/docker-compose.yml
docker compose up -d

Visit localhost:8443 to access PentAGI Web UI (default is admin@pentagi.com / admin)

Web UI Accounts

PentAGI does not expose public self-service sign-up from the login page. A fresh installation creates the default local administrator account:

Email: admin@pentagi.com
Password: admin

On first login, change the default password before using the instance for real work. If the administrator password is lost later, use the installer maintenance menu to reset the default admin@pentagi.com account password.

For multi-user setups, an authenticated administrator can manage local users through the Users REST API (/api/v1/users/). The OpenAPI UI is available at https://localhost:8443/api/v1/swagger/index.html after the instance is running.

[!NOTE] If you caught an error about pentagi-network or observability-network or langfuse-network you need to run docker-compose.yml firstly to create these networks and after that run docker-compose-langfuse.yml, docker-compose-graphiti.yml, and docker-compose-observability.yml to use Langfuse, Graphiti, and Observability services.
You have to set at least one Language Model provider (OpenAI, Anthropic, Gemini, AWS Bedrock, or Ollama) to use PentAGI. AWS Bedrock provides enterprise-grade access to multiple foundation models from leading AI companies, while Ollama provides zero-cost local inference if you have sufficient computational resources. Additional API keys for search engines are optional but recommended for better results.
For fully local deployment with advanced models: See our comprehensive guide on Running PentAGI with vLLM and Qwen3.5-27B-FP8 for a production-grade local LLM setup. This configuration achieves ~13,000 TPS for prompt processing and ~650 TPS for completion on 4× RTX 5090 GPUs, supporting 12+ concurrent flows with complete independence from cloud providers.
LLM_SERVER_* environment variables are experimental feature and will be changed in the future. Right now you can use them to specify custom LLM server URL and one model for all agent types.
PROXY_URL is a global proxy URL for all LLM providers and external search systems. You can use it for isolation from external networks.
The docker-compose.yml file runs the PentAGI service as root user because it needs access to docker.sock for container management. If you're using TCP/IP network connection to Docker instead of socket file, you can remove root privileges and use the default pentagi user for better security.

Accessing PentAGI from External Networks

By default, PentAGI binds to 127.0.0.1 (localhost only) for security. To access PentAGI from other machines on your network, you need to configure external access.

Configuration Steps

Update .env file with your server's IP address:

# Network binding - allow external connections
PENTAGI_LISTEN_IP=0.0.0.0
PENTAGI_LISTEN_PORT=8443

# Public URL - use your actual server IP or hostname
# Replace 192.168.1.100 with your server's IP address
PUBLIC_URL=https://192.168.1.100:8443

# CORS origins - list all URLs that will access PentAGI
# Include localhost for local access AND your server IP for external access
CORS_ORIGINS=https://localhost:8443,https://192.168.1.100:8443

[!IMPORTANT]
Replace 192.168.1.100 with your actual server's IP address
Do NOT use 0.0.0.0 in PUBLIC_URL or CORS_ORIGINS - use the actual IP address
Include both localhost and your server IP in CORS_ORIGINS for flexibility

Recreate containers to apply the changes:

docker compose down
docker compose up -d --force-recreate

Verify port binding:

docker ps | grep pentagi

You should see 0.0.0.0:8443->8443/tcp or :::8443->8443/tcp.

If you see 127.0.0.1:8443->8443/tcp, the environment variable wasn't picked up. In this case, directly edit docker-compose.yml line 31:

ports:
  - "0.0.0.0:8443:8443"

Then recreate containers again.

Configure firewall to allow incoming connections on port 8443:

# Ubuntu/Debian with UFW
sudo ufw allow 8443/tcp
sudo ufw reload

# CentOS/RHEL with firewalld
sudo firewall-cmd --permanent --add-port=8443/tcp
sudo firewall-cmd --reload

Access PentAGI:

Local access: https://localhost:8443
Network access: https://your-server-ip:8443

[!NOTE] You'll need to accept the self-signed SSL certificate warning in your browser when accessing via IP address.

Running PentAGI with Podman

PentAGI fully supports Podman as a Docker alternative. However, when using Podman in rootless mode, the scraper service requires special configuration because rootless containers cannot bind privileged ports (ports below 1024).

Podman Rootless Configuration

The default scraper configuration uses port 443 (HTTPS), which is a privileged port. For Podman rootless, reconfigure the scraper to use a non-privileged port:

1. Edit docker-compose.yml - modify the scraper service (around line 199):

scraper:
  image: vxcontrol/scraper:latest
  restart: unless-stopped
  container_name: scraper
  hostname: scraper
  expose:
    - 3000/tcp  # Changed from 443 to 3000
  ports:
    - "${SCRAPER_LISTEN_IP:-127.0.0.1}:${SCRAPER_LISTEN_PORT:-9443}:3000"  # Map to port 3000
  environment:
    - MAX_CONCURRENT_SESSIONS=${LOCAL_SCRAPER_MAX_CONCURRENT_SESSIONS:-10}
    - USERNAME=${LOCAL_SCRAPER_USERNAME:-someuser}
    - PASSWORD=${LOCAL_SCRAPER_PASSWORD:-somepass}
  logging:
    options:
      max-size: 50m
      max-file: "7"
  volumes:
    - scraper-ssl:/usr/src/app/ssl
  networks:
    - pentagi-network
  shm_size: 2g

2. Update .env file - change the scraper URL to use HTTP and port 3000:

# Scraper configuration for Podman rootless
SCRAPER_PRIVATE_URL=http://someuser:somepass@scraper:3000/
LOCAL_SCRAPER_USERNAME=someuser
LOCAL_SCRAPER_PASSWORD=somepass

[!IMPORTANT] Key changes for Podman:
Use HTTP instead of HTTPS for SCRAPER_PRIVATE_URL
Use port 3000 instead of 443
Change internal expose to 3000/tcp
Update port mapping to target 3000 instead of 443

3. Recreate containers:

podman-compose down
podman-compose up -d --force-recreate

4. Test scraper connectivity:

# Test from within the pentagi container
podman exec -it pentagi wget -O- "http://someuser:somepass@scraper:3000/html?url=http://example.com"

If you see HTML output, the scraper is working correctly.

Podman Rootful Mode

If you're running Podman in rootful mode (with sudo), you can use the default configuration without modifications. The scraper will work on port 443 as intended.

Docker Compatibility

All Podman configurations remain fully compatible with Docker. The non-privileged port approach works identically on both container runtimes.

Assistant Configuration

PentAGI allows you to configure default behavior for assistants:

VariableDefaultDescriptionASSISTANT_USE_AGENTSfalseControls the default value for agent usage when creating new assistants

The ASSISTANT_USE_AGENTS setting affects the initial state of the "Use Agents" toggle when creating a new assistant in the UI:

false (default): New assistants are created with agent delegation disabled by default
true: New assistants are created with agent delegation enabled by default

Note that users can always override this setting by toggling the "Use Agents" button in the UI when creating or editing an assistant. This environment variable only controls the initial default state.

Once the stack is running and you can sign in to the web UI, the fastest way to start is through the Flows workflow.

1. Create your first flow

Open Flows in the sidebar.
Click New Flow.
Choose the mode that fits your goal:
- Automation: fully autonomous execution for a testing goal you want PentAGI to carry out end-to-end
- Assistant: interactive back-and-forth help when you want to steer the investigation step by step. In this mode you can also enable the Use Agents toggle to let PentAGI delegate subtasks to specialized sub-agents for more complex investigations.
Select the LLM provider you want to use for this flow.
Describe the target and the objective in natural language in the message box.

Good first prompts usually include:

the target system or URL
the type of assessment you want
any scope limitations or rules of engagement
the result you expect, such as a vulnerability report or validation of a hypothesis

Example:

Assess https://target.example for common web application vulnerabilities. Focus on authentication, file handling, and injection issues. Stay within the provided target only and summarize confirmed findings with reproduction steps.

Only test systems you own or are explicitly authorized to assess. See EULA.md for the acceptable use requirements.

2. Use templates for repeatable workflows

The new flow form includes a template picker, which can prefill the message box with a saved flow template. This is useful when you run similar assessments repeatedly.

Use an existing template if you already have one saved in Templates
Start from the example prompt in examples/prompts/base_web_pentest.md if you need a practical baseline for web testing
Adjust the target, scope, and constraints before starting the flow

Templates are starting points. You do not need special syntax to use PentAGI: plain natural-language instructions work well as long as the target and goal are clear.

3. Monitor execution and review output

After submitting the flow, PentAGI opens the flow page automatically.

Use the main flow view to follow messages, agent activity, and task progress
Inspect tool activity and terminal output as the flow runs
Review generated tasks and subtasks to understand what PentAGI is doing

Once the flow has enough results, use the Report menu on the flow page to:

open the report in a web view
copy the generated report to the clipboard
download the report as Markdown
download the report as PDF

4. Use the Assistant view to steer an active flow

Each flow also includes an Assistant view for interactive guidance. This is useful when the autonomous run uncovers something that needs human direction instead of a hard restart.

Open the Assistant view for the same flow when you want to inspect the current state before changing anything.
Use the assistant to check flow status, stop the current task, submit follow-up instructions, or patch the remaining planned subtasks before the next step runs.
Treat this as an explicit control path for the current flow, not as an invisible background queue. If you want to change direction, say so clearly and keep the new instruction tied to the current engagement scope.
This works best for clarifying scope, redirecting priorities after intermediate findings, or answering an automation checkpoint without losing the rest of the flow context.

5. Manage flow-scoped files

Each flow has its own Files tab in the flow page. Files are scoped to the parent flow: they live in {dataDir}/flow-{id}-data/ on the host and never leak into other flows.

The tab exposes three sources of files:

Uploads (uploads/): files you provide from the web UI. Use the Upload files action, or drag and drop directly onto the Files tab. While the agent container is running, uploaded files are also pushed into it at /work/uploads/ so the agent can read them with normal shell tools.
Resources (resources/): files attached from your saved user resources library via Attach resources from library. Attached resources are copied into the flow and pushed into the running container at /work/resources/.
Container (container/): snapshots pulled from the running agent container via Pull file or directory from container. These are read-only on the flow side and are never sent back to the container.

Per-file actions in the Files tab include Download, Copy path, Save as resource (promote a flow file into your reusable resources library), and Delete. The Pull action is disabled when the container is not running, with the tooltip "Container is not running".

Uploaded files and attached resources are listed automatically in the agent's system prompts via the {{.UserFiles}} template variable, which renders a compact <task_files> XML block (with nested <uploads> and <resources> sections), so the assistant and automation agents can reference them by path without you pasting the contents into chat. Container snapshots are visible in the UI only and are not auto-injected back into the prompt.

Current limits and limitations to be aware of:

Maximum upload file size is 300 MB; per upload request up to 1000 files and 2 GB total. File names are capped at 255 bytes (roughly 255 ASCII characters; non-ASCII names use multiple bytes per character).
Uploads and resources are mirrored into the running container at the fixed paths /work/uploads/ and /work/resources/; files written to other container paths are not auto-mirrored back into the flow file model. Container snapshots can originate from any container path you pull (for example /etc/...) and are cached on the flow side under container/; they are not pushed back into the container.
Container snapshots are point-in-time pulls. Editing a snapshot in the UI does not write back into the running container.
Deleting a flow today removes the flow record and its long-term memory entries, but does not yet archive or remove the flow's flow-{id}-data/ directory on disk. Operators are still expected to clean up the data directory manually if they want to reclaim the space.

For early testing, start with a narrow target and a single clear objective. This makes the output easier to review and helps you refine your prompts before running larger assessments.

API Access

PentAGI provides comprehensive programmatic access through both REST and GraphQL APIs, allowing you to integrate penetration testing workflows into your automation pipelines, CI/CD processes, and custom applications.

Generating API Tokens

API tokens are managed through the PentAGI web interface:

Navigate to Settings → API Tokens in the web UI
Click Create Token to generate a new API token
Configure token properties:
- Name (optional): A descriptive name for the token
- Expiration Date: When the token will expire (minimum 1 minute, maximum 3 years)
Click Create and copy the token immediately - it will only be shown once for security reasons
Use the token as a Bearer token in your API requests

Each token is associated with your user account and inherits your role's permissions.

Using API Tokens

Include the API token in the Authorization header of your HTTP requests:

# GraphQL API example
curl -X POST https://your-pentagi-instance:8443/api/v1/graphql \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query": "{ flows { id title status } }"}'

# REST API example
curl https://your-pentagi-instance:8443/api/v1/flows \
  -H "Authorization: Bearer YOUR_API_TOKEN"

API Exploration and Testing

PentAGI provides interactive documentation for exploring and testing API endpoints:

GraphQL Playground

Access the GraphQL Playground at https://your-pentagi-instance:8443/api/v1/graphql/playground

Click the HTTP Headers tab at the bottom

Add your authorization header:

{
  "Authorization": "Bearer YOUR_API_TOKEN"
}

Explore the schema, run queries, and test mutations interactively

Swagger UI

Access the REST API documentation at https://your-pentagi-instance:8443/api/v1/swagger/index.html

Click the Authorize button
Enter your token in the format: Bearer YOUR_API_TOKEN
Click Authorize to apply
Test endpoints directly from the Swagger UI

Generating API Clients

You can generate type-safe API clients for your preferred programming language using the schema files included with PentAGI:

GraphQL Clients

The GraphQL schema is available at:

Web UI: Navigate to Settings to download schema.graphqls
Direct file: backend/pkg/graph/schema.graphqls in the repository

Generate clients using tools like:

GraphQL Code Generator (JavaScript/TypeScript): https://the-guild.dev/graphql/codegen
genqlient (Go): https://github.com/Khan/genqlient
Apollo iOS (Swift): https://www.apollographql.com/docs/ios

REST API Clients

The OpenAPI specification is available at:

Swagger JSON: https://your-pentagi-instance:8443/api/v1/swagger/doc.json
Swagger YAML: Available in backend/pkg/server/docs/swagger.yaml

Generate clients using:

OpenAPI Generator: https://openapi-generator.tech

openapi-generator-cli generate \
  -i https://your-pentagi-instance:8443/api/v1/swagger/doc.json \
  -g python \
  -o ./pentagi-client

Swagger Codegen: https://github.com/swagger-api/swagger-codegen

swagger-codegen generate \
  -i https://your-pentagi-instance:8443/api/v1/swagger/doc.json \
  -l typescript-axios \
  -o ./pentagi-client

swagger-typescript-api (TypeScript): https://github.com/acacode/swagger-typescript-api

npx swagger-typescript-api \
  -p https://your-pentagi-instance:8443/api/v1/swagger/doc.json \
  -o ./src/api \
  -n pentagi-api.ts

API Usage Examples

mutation CreateFlow {
  createFlow(
    modelProvider: "openai"
    input: "Test the security of https://example.com"
  ) {
    id
    title
    status
    createdAt
  }
}

curl https://your-pentagi-instance:8443/api/v1/flows \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  | jq '.flows[] | {id, title, status}'

import requests

class PentAGIClient:
    def __init__(self, base_url, api_token):
        self.base_url = base_url
        self.headers = {
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json"
        }
    
    def create_flow(self, provider, target):
        query = """
        mutation CreateFlow($provider: String!, $input: String!) {
          createFlow(modelProvider: $provider, input: $input) {
            id
            title
            status
          }
        }
        """
        response = requests.post(
            f"{self.base_url}/api/v1/graphql",
            json={
                "query": query,
                "variables": {
                    "provider": provider,
                    "input": target
                }
            },
            headers=self.headers
        )
        return response.json()
    
    def get_flows(self):
        response = requests.get(
            f"{self.base_url}/api/v1/flows",
            headers=self.headers
        )
        return response.json()

# Usage
client = PentAGIClient(
    "https://your-pentagi-instance:8443",
    "your_api_token_here"
)

# Create a new flow
flow = client.create_flow("openai", "Scan https://example.com for vulnerabilities")
print(f"Created flow: {flow}")

# List all flows
flows = client.get_flows()
print(f"Total flows: {len(flows['flows'])}")

import axios, { AxiosInstance } from 'axios';

interface Flow {
  id: string;
  title: string;
  status: string;
  createdAt: string;
}

class PentAGIClient {
  private client: AxiosInstance;

  constructor(baseURL: string, apiToken: string) {
    this.client = axios.create({
      baseURL: `${baseURL}/api/v1`,
      headers: {
        'Authorization': `Bearer ${apiToken}`,
        'Content-Type': 'application/json',
      },
    });
  }

  async createFlow(provider: string, input: string): Promise<Flow> {
    const query = `
      mutation CreateFlow($provider: String!, $input: String!) {
        createFlow(modelProvider: $provider, input: $input) {
          id
          title
          status
          createdAt
        }
      }
    `;

    const response = await this.client.post('/graphql', {
      query,
      variables: { provider, input },
    });

    return response.data.data.createFlow;
  }

  async getFlows(): Promise<Flow[]> {
    const response = await this.client.get('/flows');
    return response.data.flows;
  }

  async getFlow(flowId: string): Promise<Flow> {
    const response = await this.client.get(`/flows/${flowId}`);
    return response.data;
  }
}

// Usage
const client = new PentAGIClient(
  'https://your-pentagi-instance:8443',
  'your_api_token_here'
);

// Create a new flow
const flow = await client.createFlow(
  'openai',
  'Perform penetration test on https://example.com'
);
console.log('Created flow:', flow);

// List all flows
const flows = await client.getFlows();
console.log(`Total flows: ${flows.length}`);

Security Best Practices

When working with API tokens:

Never commit tokens to version control - use environment variables or secrets management
Rotate tokens regularly - set appropriate expiration dates and create new tokens periodically
Use separate tokens for different applications - makes it easier to revoke access if needed
Monitor token usage - review API token activity in the Settings page
Revoke unused tokens - disable or delete tokens that are no longer needed
Use HTTPS only - never send API tokens over unencrypted connections

Token Management

View tokens: See all your active tokens in Settings → API Tokens
Edit tokens: Update token names or revoke tokens
Delete tokens: Permanently remove tokens (this action cannot be undone)
Token ID: Each token has a unique ID that can be copied for reference

The token list shows:

Token name (if provided)
Token ID (unique identifier)
Status (active/revoked/expired)
Creation date
Expiration date

Custom LLM Provider Configuration

When using custom LLM providers with the LLM_SERVER_* variables, you can fine-tune the reasoning format used in requests.

[!TIP] For production-grade local deployments, consider using vLLM with Qwen3.5-27B-FP8 for optimal performance. See our comprehensive deployment guide which includes hardware requirements, configuration templates (thinking mode and non-thinking mode), and performance benchmarks showing 13K TPS prompt processing on 4× RTX 5090 GPUs.

VariableDefaultDescriptionLLM_SERVER_URLBase URL for the custom LLM API endpointLLM_SERVER_KEYAPI key for the custom LLM providerLLM_SERVER_MODELDefault model to use (can be overridden in provider config)LLM_SERVER_CONFIG_PATHPath to the YAML configuration file for agent-specific modelsLLM_SERVER_PROVIDERProvider name prefix for model names (e.g., openrouter, deepseek for LiteLLM proxy)LLM_SERVER_LEGACY_REASONINGfalseControls reasoning format in API requestsLLM_SERVER_PRESERVE_REASONINGfalsePreserve reasoning content in multi-turn conversations (required by some providers)

The LLM_SERVER_PROVIDER setting is particularly useful when using LiteLLM proxy, which adds a provider prefix to model names. For example, when connecting to Moonshot API through LiteLLM, models like kimi-2.5 become moonshot/kimi-2.5. By setting LLM_SERVER_PROVIDER=moonshot, you can use the same provider configuration file for both direct API access and LiteLLM proxy access without modifications.

The LLM_SERVER_LEGACY_REASONING setting affects how reasoning parameters are sent to the LLM:

false (default): Uses modern format where reasoning is sent as a structured object with max_tokens parameter
true: Uses legacy format with string-based reasoning_effort parameter

This setting is important when working with different LLM providers as they may expect different reasoning formats in their API requests. If you encounter reasoning-related errors with custom providers, try changing this setting.

The LLM_SERVER_PRESERVE_REASONING setting controls whether reasoning content is preserved in multi-turn conversations:

false (default): Reasoning content is not preserved in conversation history
true: Reasoning content is preserved and sent in subsequent API calls

This setting is required by some LLM providers (e.g., Moonshot) that return errors like "thinking is enabled but reasoning_content is missing in assistant tool call message" when reasoning content is not included in multi-turn conversations. Enable this setting if your provider requires reasoning content to be preserved.

Ollama Provider Configuration

PentAGI supports Ollama for both local LLM inference (zero-cost, enhanced privacy) and Ollama Cloud (managed service with free tier).

Configuration Variables

VariableDefaultDescriptionOLLAMA_SERVER_URLURL of your Ollama server or Ollama CloudOLLAMA_SERVER_API_KEYAPI key for Ollama Cloud authenticationOLLAMA_SERVER_MODELDefault model for inferenceOLLAMA_SERVER_CONFIG_PATHPath to custom agent configuration fileOLLAMA_SERVER_PULL_MODELS_TIMEOUT600Timeout for model downloads (seconds)OLLAMA_SERVER_PULL_MODELS_ENABLEDfalseAuto-download models on startupOLLAMA_SERVER_LOAD_MODELS_ENABLEDfalseQuery server for available models

Ollama Cloud Configuration

Ollama Cloud provides managed inference with a generous free tier and scalable paid plans.

Free Tier Setup (Single Model)

# Free tier allows one model at a time
OLLAMA_SERVER_URL=https://ollama.com
OLLAMA_SERVER_API_KEY=your_ollama_cloud_api_key
OLLAMA_SERVER_MODEL=gpt-oss:120b  # Example: OpenAI OSS 120B model

Paid Tier Setup (Multi-Model with Pre-built Configuration)

For paid tiers supporting multiple concurrent models, use the pre-built Ollama Cloud configuration:

# Using pre-built Ollama Cloud configuration (included in Docker image)
OLLAMA_SERVER_URL=https://ollama.com
OLLAMA_SERVER_API_KEY=your_ollama_cloud_api_key
OLLAMA_SERVER_CONFIG_PATH=/opt/pentagi/conf/ollama-cloud.provider.yml

The pre-built ollama-cloud.provider.yml configuration includes optimized model assignments for all agent types:

Simple/Assistant: nemotron-3-super:cloud - Fast general-purpose model
Primary Agent: qwen3-coder-next:cloud - Advanced reasoning with high effort mode
Coder/Pentester: qwen3-coder-next:cloud - Specialized coding models
Searcher: qwen3.5:397b-cloud - Large context for information gathering
Refiner/Refactor: glm-5:cloud - High-quality text refinement
Adviser/Enricher: minimax-m2.7:cloud - Efficient advisory tasks
Installer: devstral-2:123b-cloud - Installation and setup tasks

Custom Configuration (Advanced)

To create your own agent configuration, mount a custom file from your host filesystem:

# Using custom provider configuration
OLLAMA_SERVER_URL=https://ollama.com
OLLAMA_SERVER_API_KEY=your_ollama_cloud_api_key
OLLAMA_SERVER_CONFIG_PATH=/opt/pentagi/conf/ollama.provider.yml

# Mount custom configuration from host filesystem (in .env or docker-compose override)
PENTAGI_OLLAMA_SERVER_CONFIG_PATH=/path/on/host/my-ollama-config.yml

The PENTAGI_OLLAMA_SERVER_CONFIG_PATH environment variable maps your host configuration file to /opt/pentagi/conf/ollama.provider.yml inside the container.

Example custom configuration (my-ollama-config.yml):

primary_agent:
  model: "qwen3-coder-next:cloud"
  temperature: 1.0
  top_p: 0.9
  max_tokens: 32768
  reasoning:
    effort: high

coder:
  model: "qwen3-coder:32b"
  temperature: 1.0
  max_tokens: 20480

Local Ollama Configuration

For self-hosted Ollama instances:

# Basic local Ollama setup
OLLAMA_SERVER_URL=http://localhost:11434
OLLAMA_SERVER_MODEL=llama3.1:8b-instruct-q8_0

# Production setup with auto-pull and model discovery
OLLAMA_SERVER_URL=http://ollama-server:11434
OLLAMA_SERVER_PULL_MODELS_ENABLED=true
OLLAMA_SERVER_PULL_MODELS_TIMEOUT=900
OLLAMA_SERVER_LOAD_MODELS_ENABLED=true

# Using pre-built configurations from Docker image
OLLAMA_SERVER_CONFIG_PATH=/opt/pentagi/conf/ollama-llama318b.provider.yml
# or
OLLAMA_SERVER_CONFIG_PATH=/opt/pentagi/conf/ollama-qwen332b-fp16-tc.provider.yml
# or
OLLAMA_SERVER_CONFIG_PATH=/opt/pentagi/conf/ollama-qwq32b-fp16-tc.provider.yml

Performance Considerations:

Model Discovery (OLLAMA_SERVER_LOAD_MODELS_ENABLED=true): Adds 1-2s startup latency querying Ollama API
Auto-pull (OLLAMA_SERVER_PULL_MODELS_ENABLED=true): First startup may take several minutes downloading models
Pull timeout (OLLAMA_SERVER_PULL_MODELS_TIMEOUT=900): 15 minutes in seconds
Static Config: Disable both flags and specify models in config file for fastest startup

Creating Custom Ollama Models with Extended Context

PentAGI requires models with larger context windows than the default Ollama configurations. You need to create custom models with increased num_ctx parameter through Modelfiles. While typical agent workflows consume around 64K tokens, PentAGI uses 110K context size for safety margin and handling complex penetration testing scenarios.

Important: The num_ctx parameter can only be set during model creation via Modelfile - it cannot be changed after model creation or overridden at runtime.

Example: Qwen3 32B FP16 with Extended Context

Create a Modelfile named Modelfile_qwen3_32b_fp16_tc:

FROM qwen3:32b-fp16
PARAMETER num_ctx 110000
PARAMETER temperature 0.3
PARAMETER top_p 0.8
PARAMETER min_p 0.0
PARAMETER top_k 20
PARAMETER repeat_penalty 1.1

Build the custom model:

ollama create qwen3:32b-fp16-tc -f Modelfile_qwen3_32b_fp16_tc

Example: QwQ 32B FP16 with Extended Context

Create a Modelfile named Modelfile_qwq_32b_fp16_tc:

FROM qwq:32b-fp16
PARAMETER num_ctx 110000
PARAMETER temperature 0.2
PARAMETER top_p 0.7
PARAMETER min_p 0.0
PARAMETER top_k 40
PARAMETER repeat_penalty 1.2

Build the custom model:

ollama create qwq:32b-fp16-tc -f Modelfile_qwq_32b_fp16_tc

Note: The QwQ 32B FP16 model requires approximately 71.3 GB VRAM for inference. Ensure your system has sufficient GPU memory before attempting to use this model.

These custom models are referenced in the pre-built provider configuration files (ollama-qwen332b-fp16-tc.provider.yml and ollama-qwq32b-fp16-tc.provider.yml) that are included in the Docker image at /opt/pentagi/conf/.

OpenAI Provider Configuration

PentAGI integrates with OpenAI's comprehensive model lineup, featuring advanced reasoning capabilities with extended chain-of-thought, agentic models with enhanced tool integration, and specialized code models for security engineering.

Configuration Variables

VariableDefaultDescriptionOPEN_AI_KEYAPI key for OpenAI servicesOPEN_AI_SERVER_URLhttps://api.openai.com/v1OpenAI API endpoint

Configuration Examples

# Basic OpenAI setup
OPEN_AI_KEY=your_openai_api_key
OPEN_AI_SERVER_URL=https://api.openai.com/v1

# Using with proxy for enhanced security
OPEN_AI_KEY=your_openai_api_key
PROXY_URL=http://your-proxy:8080

Supported Models

PentAGI supports 31 OpenAI models with tool calling, streaming, reasoning modes, and prompt caching. Models marked with * are used in default configuration.

GPT-5.2 Series - Latest Flagship Agentic (December 2025)

Model IDThinkingPrice (Input/Output/Cache)Use Casegpt-5.2*✅$1.75/$14.00/$0.18Latest flagship with enhanced reasoning and tool integration, autonomous security researchgpt-5.2-pro✅$21.00/$168.00/$0.00Premium version with superior agentic coding, mission-critical security research, zero-day discoverygpt-5.2-codex✅$1.75/$14.00/$0.18Most advanced code-specialized, context compaction, strong cybersecurity capabilities

GPT-5/5.1 Series - Advanced Agentic Models

Model IDThinkingPrice (Input/Output/Cache)Use Casegpt-5✅$1.25/$10.00/$0.13Premier agentic with advanced reasoning, autonomous security research, exploit chain developmentgpt-5.1✅$1.25/$10.00/$0.13Enhanced agentic with adaptive reasoning, balanced penetration testing with strong tool coordinationgpt-5-pro✅$15.00/$120.00/$0.00Premium version with major reasoning improvements, reduced hallucinations, critical security operationsgpt-5-mini✅$0.25/$2.00/$0.03Efficient balancing speed and intelligence, automated vulnerability analysis, exploit generationgpt-5-nano✅$0.05/$0.40/$0.01Fastest for high-throughput scanning, reconnaissance, bulk vulnerability detection

GPT-5/5.1 Codex Series - Code-Specialized

Model IDThinkingPrice (Input/Output/Cache)Use Casegpt-5.1-codex-max✅$1.25/$10.00/$0.13Enhanced reasoning for sophisticated coding, proven CVE findings, systematic exploit developmentgpt-5.1-codex✅$1.25/$10.00/$0.13Standard code-optimized with strong reasoning, exploit generation, vulnerability analysisgpt-5-codex✅$1.25/$10.00/$0.13Foundational code-specialized, vulnerability scanning, basic exploit generationgpt-5.1-codex-mini✅$0.25/$2.00/$0.03Compact high-performance, 4x higher capacity, rapid vulnerability detectioncodex-mini-latest✅$1.50/$6.00/$0.38Latest compact code model, automated code review, basic vulnerability analysis

GPT-4.1 Series - Enhanced Intelligence

Model IDThinkingPrice (Input/Output/Cache)Use Casegpt-4.1❌$2.00/$8.00/$0.50Enhanced flagship with superior function calling, complex threat analysis, sophisticated exploit developmentgpt-4.1-mini*❌$0.40/$1.60/$0.10Balanced performance with improved efficiency, routine security assessments, automated code analysisgpt-4.1-nano❌$0.10/$0.40/$0.03Ultra-fast lightweight, bulk security scanning, rapid reconnaissance, continuous monitoring

GPT-4o Series - Multimodal Flagship

Model IDThinkingPrice (Input/Output/Cache)Use Casegpt-4o❌$2.50/$10.00/$1.25Multimodal flagship with vision, image analysis, web UI assessment, multi-tool orchestrationgpt-4o-mini❌$0.15/$0.60/$0.08Compact multimodal with strong function calling, high-frequency scanning, cost-effective bulk operations

o-Series - Advanced Reasoning Models

Model IDThinkingPrice (Input/Output/Cache)Use Caseo4-mini*✅$1.10/$4.40/$0.28Next-gen reasoning with enhanced speed, methodical security assessments, systematic exploit developmento3*✅$2.00/$8.00/$0.50Advanced reasoning powerhouse, multi-stage attack chains, deep vulnerability analysiso3-mini✅$1.10/$4.40/$0.55Compact reasoning with extended thinking, step-by-step attack planning, logical vulnerability chainingo1✅$15.00/$60.00/$7.50Premier reasoning with maximum depth, advanced penetration testing, novel exploit researcho3-pro✅$20.00/$80.00/$0.00Most advanced reasoning, 80% cheaper than o1-pro, zero-day research, critical security investigationso1-pro✅$150.00/$600.00/$0.00Previous-gen premium reasoning, exhaustive security analysis, mission-critical challenges

Prices: Per 1M tokens. Reasoning models include thinking tokens in output pricing.

[!WARNING] GPT-5 Models - Trusted Access Required*
All GPT-5 series models (gpt-5, gpt-5.1, gpt-5.2, gpt-5-pro, gpt-5.2-pro, and all Codex variants) work unstably with PentAGI and may trigger OpenAI's cybersecurity safety mechanisms without verified access.
To use GPT-5 models reliably:*
Individual users: Verify your identity at chatgpt.com/cyber
Enterprise teams: Request trusted access through your OpenAI representative
Security researchers: Apply for the Cybersecurity Grant Program (includes $10M in API credits)
Recommended alternatives without verification:
Use o-series models (o3, o4-mini, o1) for reasoning tasks
Use gpt-4.1 series for general intelligence and function calling
All o-series and gpt-4.x models work reliably without special access

Reasoning Effort Levels:

High: Maximum reasoning depth (refiner - o3 with high effort)
Medium: Balanced reasoning (primary_agent, assistant, reflector - o4-mini/o3 with medium effort)
Low: Efficient targeted reasoning (coder, installer, pentester - o3/o4-mini with low effort; adviser - gpt-5.2 with low effort)

Key Features:

Extended Reasoning: o-series models with chain-of-thought for complex security analysis
Agentic Intelligence: GPT-5/5.1/5.2 series with enhanced tool integration and autonomous capabilities
Prompt Caching: Cost reduction on repeated context (10-50% of input price)
Code Specialization: Dedicated Codex models for vulnerability discovery and exploit development
Multimodal Support: GPT-4o series for vision-based security assessments
Tool Calling: Robust function calling across all models for pentesting tool orchestration
Streaming: Real-time response streaming for interactive workflows
Proven Track Record: Industry-leading models with CVE discoveries and real-world security applications

Anthropic Provider Configuration

PentAGI integrates with Anthropic's Claude models, featuring advanced extended thinking capabilities, exceptional safety mechanisms, and sophisticated understanding of complex security contexts with prompt caching.

Configuration Variables

VariableDefaultDescriptionANTHROPIC_API_KEYAPI key for Anthropic servicesANTHROPIC_SERVER_URLhttps://api.anthropic.com/v1Anthropic API endpoint

Configuration Examples

# Basic Anthropic setup
ANTHROPIC_API_KEY=your_anthropic_api_key
ANTHROPIC_SERVER_URL=https://api.anthropic.com/v1

# Using with proxy for secure environments
ANTHROPIC_API_KEY=your_anthropic_api_key
PROXY_URL=http://your-proxy:8080

[!NOTE] Google Vertex AI for Claude models
PentAGI does not currently expose a dedicated Google Vertex AI configuration path for Anthropic Claude in .env. There is no separate Vertex AI API key field at this time, and the existing Anthropic variables (ANTHROPIC_API_KEY, ANTHROPIC_SERVER_URL) target the direct Anthropic API. Supported routes for Claude are:
Direct Anthropic API: ANTHROPIC_API_KEY and ANTHROPIC_SERVER_URL (see above).
AWS Bedrock: BEDROCK_* variables (see AWS Bedrock Provider Configuration).
If you need to use Vertex AI today, the safest supported workaround is to expose Vertex AI through an OpenAI-compatible proxy or gateway that translates Vertex AI calls into the Chat Completions format while preserving the chat and tool-call behavior PentAGI relies on, then point the Custom LLM provider at that gateway via LLM_SERVER_URL, LLM_SERVER_KEY, and LLM_SERVER_MODEL. This path is only as reliable as the gateway you choose.

Supported Models

PentAGI supports 10 Claude models with tool calling, streaming, extended thinking, adaptive thinking, and prompt caching. Models marked with * are used in default configuration.

Claude 4 Series - Latest Models (2025-2026)

Model IDThinkingRelease DatePrice (Input/Output/Cache R/W)Use Caseclaude-opus-4-6*✅May 2025$5.00/$25.00/$0.50/$6.25Most intelligent model for autonomous agents and coding. Extended + adaptive thinking for complex exploit development, multi-stage attack simulationclaude-sonnet-4-6*✅Aug 2025$3.00/$15.00/$0.30/$3.75Best speed/intelligence balance with adaptive thinking. Multi-phase security assessments, intelligent vulnerability analysis, real-time threat huntingclaude-haiku-4-5*✅Oct 2025$1.00/$5.00/$0.10/$1.25Fastest model with near-frontier intelligence. High-frequency scanning, real-time monitoring, bulk automated testing

Legacy Models - Still Supported

Model IDThinkingRelease DatePrice (Input/Output/Cache R/W)Use Caseclaude-sonnet-4-5✅Sep 2025$3.00/$15.00/$0.30/$3.75State-of-the-art reasoning (superseded by 4-6). Sophisticated penetration testing, advanced threat analysisclaude-opus-4-5✅Nov 2025$5.00/$25.00/$0.50/$6.25Ultimate reasoning (superseded by opus-4-6). Critical security research, zero-day discovery, red team operationsclaude-opus-4-1✅Aug 2025$15.00/$75.00/$1.50/$18.75Advanced reasoning (superseded). Complex penetration testing, sophisticated threat modelingclaude-sonnet-4-0✅May 2025$3.00/$15.00/$0.30/$3.75High-performance reasoning (superseded). Complex threat modeling, multi-tool coordinationclaude-opus-4-0✅May 2025$15.00/$75.00/$1.50/$18.75First generation Opus (superseded). Multi-step exploit development, autonomous pentesting workflows

Deprecated Models - Migrate to Current Models

Model IDThinkingRelease DatePrice (Input/Output/Cache R/W)Notesclaude-3-haiku-20240307❌Mar 2024$0.25/$1.25/$0.03/$0.30Will be retired April 19, 2026. Migrate to claude-haiku-4-5

Prices: Per 1M tokens. Cache pricing includes both Read and Write costs.

Extended Thinking Configuration:

Max Tokens 4096: Generator (claude-opus-4-6) for maximum reasoning depth on complex exploit development
Max Tokens 2048: Coder (claude-sonnet-4-6) for balanced code analysis and vulnerability research
Max Tokens 1024: Primary agent, assistant, refiner, adviser, reflector, searcher, installer, pentester for focused reasoning on specific tasks
Extended Thinking: All Claude 4.5+ and 4.6 models support configurable extended thinking for deep reasoning tasks

Key Features:

Extended Thinking: All Claude 4.5+ and 4.6 models with configurable chain-of-thought reasoning depths for complex security analysis
Adaptive Thinking: Claude 4.6 series (Opus/Sonnet) dynamically adjusts reasoning depth based on task complexity for optimal performance
Prompt Caching: Significant cost reduction with separate read/write pricing (10% read, 125% write of input)
Extended Context Window: 200K tokens standard, up to 1M tokens (beta) for Claude Opus/Sonnet 4.6 for comprehensive codebase analysis
Tool Calling: Robust function calling with exceptional accuracy for security tool orchestration
Streaming: Real-time response streaming for interactive penetration testing workflows
Safety-First Design: Built-in safety mechanisms ensuring responsible security testing practices
Multimodal Support: Vision capabilities in latest models for screenshot analysis and UI security assessment
Constitutional AI: Advanced safety training providing reliable and ethical security guidance

Google AI (Gemini) Provider Configuration

PentAGI integrates with Google's Gemini models through the Google AI API, offering state-of-the-art multimodal reasoning capabilities with extended thinking and context caching.

Configuration Variables

VariableDefaultDescriptionGEMINI_API_KEYAPI key for Google AI servicesGEMINI_SERVER_URLhttps://generativelanguage.googleapis.comGoogle AI API endpoint

Configuration Examples

# Basic Gemini setup
GEMINI_API_KEY=your_gemini_api_key
GEMINI_SERVER_URL=https://generativelanguage.googleapis.com

# Using with proxy
GEMINI_API_KEY=your_gemini_api_key
PROXY_URL=http://your-proxy:8080

Supported Models

PentAGI supports 9 Gemini models with tool calling, streaming, thinking modes, and context caching. Models marked with * are used in default configuration.

Gemini 3.5 Series - Latest Stable Flash (May 2026)

Model IDThinkingContextPrice (Input/Output/Cache)Use Casegemini-3.5-flash*✅1M$1.50/$9.00/$0.15Most intelligent Flash model with sustained frontier performance on agentic and coding tasks, superior search and grounding

Gemini 3.1 Series - Stable Flash-Lite + Pro Preview (Feb-May 2026)

Model IDThinkingContextPrice (Input/Output/Cache)Use Casegemini-3.1-pro-preview*✅1M$2.00/$12.00/$0.20Latest flagship with refined thinking, improved token efficiency, optimized for software engineering and agentic workflowsgemini-3.1-pro-preview-customtools✅1M$2.00/$12.00/$0.20Custom tools endpoint optimized for bash and custom tools (view_file, search_code) prioritizationgemini-3.1-flash-lite*✅1M$0.25/$1.50/$0.025Most cost-efficient stable multimodal model, frontier-class performance for high-volume agentic tasks and low-latency applications

Gemini 2.5 Series - Advanced Thinking Models (active until October 16, 2026)

Model IDThinkingContextPrice (Input/Output/Cache)Use Casegemini-2.5-pro✅1M$1.25/$10.00/$0.125State-of-the-art for complex coding and reasoning, sophisticated threat modelinggemini-2.5-flash✅1M$0.30/$2.50/$0.03First hybrid reasoning model with thinking budgets, best price-performance for large-scale assessmentsgemini-2.5-flash-lite✅1M$0.10/$0.40/$0.01Smallest and most cost-effective for at-scale usage, high-throughput scanning

Gemma 4 Open-Source Models (Apache 2.0, Free Tier)

Model IDThinkingContextPrice (Input/Output/Cache)Use Casegemma-4-31b-it✅256KFree/Free/FreeLargest open-source Gemma 4 dense model (~31B params), multimodal text+image, 140+ languages, on-premises security operationsgemma-4-26b-a4b-it✅256KFree/Free/FreeMoE architecture (~26B total / ~3.8B active params), highly efficient inference on consumer GPUs for on-premises high-throughput scanning

Prices: Per 1M tokens (Standard Paid tier). Context window is input token limit.

[!NOTE] Gemini 2.5 Series Shutdown
gemini-2.5-pro, gemini-2.5-flash, and gemini-2.5-flash-lite will be shut down on October 16, 2026. Recommended migrations:
gemini-2.5-pro → gemini-3.1-pro-preview (same $2.00 input pricing tier)
gemini-2.5-flash → gemini-3.5-flash (improved frontier capabilities)
gemini-2.5-flash-lite → gemini-3.1-flash-lite (same $0.25 input pricing)

Default Model Assignments (config.yml):

gemini-3.1-pro-preview - primary_agent, assistant, generator, refiner, adviser, coder, pentester
gemini-3.5-flash - reflector, searcher, enricher, installer
gemini-3.1-flash-lite - simple, simple_json

Key Features:

Extended Thinking: Step-by-step reasoning for complex security analysis (all Gemini 3.x, 2.5 series, and Gemma 4 with toggleable thinking)
Context Caching: Significant cost reduction on repeated context (10% of input price for most models)
Ultra-Long Context: 1M tokens for Gemini chat models, 256K tokens for Gemma 4 open-source models
Multimodal Support: Text, image, video, audio, and PDF processing for comprehensive assessments
Tool Calling: Seamless integration with 20+ pentesting tools via function calling
Streaming: Real-time response streaming for interactive security workflows
Code Execution: Built-in code execution for offensive tool testing and exploit validation
Search Grounding: Google Search integration for threat intelligence and CVE research
File Search: Document retrieval and RAG capabilities for knowledge-based assessments
Batch API: 50% cost reduction for non-real-time batch processing
Custom Tools Endpoint: Dedicated gemini-3.1-pro-preview-customtools route for tool-heavy agentic workflows that prefer registered tools over bash

Reasoning Effort Levels:

High: Maximum thinking depth for complex multi-step analysis (generator)
Medium: Balanced reasoning for general agentic tasks (primary_agent, assistant, refiner, adviser)
Low: Efficient thinking for focused tasks (coder, installer, pentester)

AWS Bedrock Provider Configuration

PentAGI integrates with Amazon Bedrock, offering access to 20+ foundation models from leading AI companies including Anthropic, Amazon, Cohere, DeepSeek, OpenAI, Qwen, Mistral, and Moonshot.

Configuration Variables

VariableDefaultDescriptionBEDROCK_REGIONus-east-1AWS region for Bedrock serviceBEDROCK_DEFAULT_AUTHfalseUse AWS SDK default credential chain (environment, EC2 role, ~/.aws/credentials) - highest priorityBEDROCK_BEARER_TOKENBearer token authentication - priority over static credentialsBEDROCK_ACCESS_KEY_IDAWS access key ID for static credentialsBEDROCK_SECRET_ACCESS_KEYAWS secret access key for static credentialsBEDROCK_SESSION_TOKENAWS session token for temporary credentials (optional, used with static credentials)BEDROCK_SERVER_URLCustom Bedrock endpoint (VPC endpoints, local testing)

Authentication Priority: BEDROCK_DEFAULT_AUTH → BEDROCK_BEARER_TOKEN → BEDROCK_ACCESS_KEY_ID+BEDROCK_SECRET_ACCESS_KEY

Configuration Examples

# Recommended: Default AWS SDK authentication (EC2/ECS/Lambda roles)
BEDROCK_REGION=us-east-1
BEDROCK_DEFAULT_AUTH=true

# Bearer token authentication (AWS STS, custom auth)
BEDROCK_REGION=us-east-1
BEDROCK_BEARER_TOKEN=your_bearer_token

# Static credentials (development, testing)
BEDROCK_REGION=us-east-1
BEDROCK_ACCESS_KEY_ID=your_aws_access_key
BEDROCK_SECRET_ACCESS_KEY=your_aws_secret_key

# With proxy and custom endpoint
BEDROCK_REGION=us-east-1
BEDROCK_DEFAULT_AUTH=true
BEDROCK_SERVER_URL=https://bedrock-runtime.us-east-1.vpce-xxx.amazonaws.com
PROXY_URL=http://your-proxy:8080

Supported Models

PentAGI supports 21 AWS Bedrock models with tool calling, streaming, and multimodal capabilities. Models marked with * are used in default configuration.

Model IDProviderThinkingMultimodalPrice (Input/Output)Use Caseus.amazon.nova-2-lite-v1:0Amazon Nova❌✅$0.33/$2.75Adaptive reasoning, efficient thinkingus.amazon.nova-premier-v1:0Amazon Nova❌✅$2.50/$12.50Complex reasoning, advanced analysisus.amazon.nova-pro-v1:0Amazon Nova❌✅$0.80/$3.20Balanced accuracy, speed, costus.amazon.nova-lite-v1:0Amazon Nova❌✅$0.06/$0.24Fast processing, high-volume operationsus.amazon.nova-micro-v1:0Amazon Nova❌❌$0.035/$0.14Ultra-low latency, real-time monitoringus.anthropic.claude-opus-4-6-v1*Anthropic✅✅$5.00/$25.00World-class coding, enterprise agentsus.anthropic.claude-sonnet-4-6Anthropic✅✅$3.00/$15.00Frontier intelligence, enterprise scaleus.anthropic.claude-opus-4-5-20251101-v1:0Anthropic✅✅$5.00/$25.00Multi-day software developmentus.anthropic.claude-haiku-4-5-20251001-v1:0*Anthropic✅✅$1.00/$5.00Near-frontier performance, high speedus.anthropic.claude-sonnet-4-5-20250929-v1:0*Anthropic✅✅$3.00/$15.00Real-world agents, coding excellenceus.anthropic.claude-sonnet-4-20250514-v1:0Anthropic✅✅$3.00/$15.00Balanced performance, production-readyus.anthropic.claude-3-5-haiku-20241022-v1:0Anthropic❌❌$0.80/$4.00Fastest model, cost-effective scanningcohere.command-r-plus-v1:0Cohere❌❌$3.00/$15.00Large-scale operations, superior RAGdeepseek.v3.2DeepSeek❌❌$0.58/$1.68Long-context reasoning, efficiencyopenai.gpt-oss-120b-1:0*OpenAI (OSS)✅❌$0.15/$0.60Strong reasoning, scientific analysisopenai.gpt-oss-20b-1:0OpenAI (OSS)✅❌$0.07/$0.30Efficient coding, software developmentqwen.qwen3-next-80b-a3bQwen❌❌$0.15/$1.20Ultra-long context, flagship reasoningqwen.qwen3-32b-v1:0Qwen❌❌$0.15/$0.60Balanced reasoning, research use casesqwen.qwen3-coder-30b-a3b-v1:0Qwen❌❌$0.15/$0.60Vibe coding, natural-language firstqwen.qwen3-coder-nextQwen❌❌$0.45/$1.80Tool use, function calling optimizedmistral.mistral-large-3-675b-instructMistral❌✅$4.00/$12.00Advanced multimodal, long-contextmoonshotai.kimi-k2.5Moonshot❌✅$0.60/$3.00Vision, language, code in one model

Prices: Per 1M tokens. Models with thinking/reasoning support additional compute costs during reasoning phase.

Tested but Incompatible Models

Some AWS Bedrock models were tested but are not supported due to technical limitations:

Model FamilyReason for IncompatibilityGLM (Z.AI)Tool calling format incompatible with Converse API (expects string instead of JSON)AI21 JambaSevere rate limits (1-2 req/min) prevent reliable testing and production useMeta Llama 3.3/3.1Unstable tool call result processing, causes unexpected failures in multi-turn workflowsMistral MagistralTool calling not supported by the modelMoonshot K2-ThinkingUnstable streaming behavior with tool calls, unreliable in productionQwen3-VLUnstable streaming with tool calling, multimodal + tools combination fails intermittently

[!IMPORTANT] Rate Limits & Quota Management
Default AWS Bedrock quotas for Claude models are extremely restrictive (2-20 requests/minute for new accounts). For production penetration testing:
Request quota increases through AWS Service Quotas console for models you plan to use
Use Amazon Nova models - higher default quotas and excellent performance
Enable provisioned throughput for consistent high-volume testing
Monitor usage - AWS throttles aggressively at quota limits
Without quota increases, expect frequent delays and workflow interruptions.

[!WARNING] Converse API Requirements
PentAGI uses Amazon Bedrock Converse API for unified model access. All supported models require:
✅ Converse/ConverseStream API support
✅ Tool use (function calling) for penetration testing workflows
✅ Streaming tool use for real-time feedback
Verify model capabilities at: AWS Bedrock Model Features

Key Features:

Automatic Prompt Caching: 40-70% cost reduction on repeated context (Claude 4.x models)
Extended Thinking: Step-by-step reasoning for complex security analysis (Claude, DeepSeek R1, OpenAI GPT)
Multimodal Analysis: Process screenshots, diagrams, video for comprehensive testing (Nova, Claude, Mistral, Kimi)
Tool Calling: Seamless integration with 20+ pentesting tools via function calling
Streaming: Real-time response streaming for interactive security assessment workflows

DeepSeek Provider Configuration

PentAGI integrates with DeepSeek, providing access to advanced AI models with strong reasoning, coding capabilities, and context caching at competitive prices.

Configuration Variables

VariableDefault ValueDescriptionDEEPSEEK_API_KEYDeepSeek API key for authenticationDEEPSEEK_SERVER_URLhttps://api.deepseek.comDeepSeek API endpoint URLDEEPSEEK_PROVIDERProvider prefix for LiteLLM integration (optional)

Configuration Examples

# Direct API usage
DEEPSEEK_API_KEY=your_deepseek_api_key
DEEPSEEK_SERVER_URL=https://api.deepseek.com

# With LiteLLM proxy
DEEPSEEK_API_KEY=your_litellm_key
DEEPSEEK_SERVER_URL=http://litellm-proxy:4000
DEEPSEEK_PROVIDER=deepseek  # Adds prefix to model names (deepseek/deepseek-v4-flash) for LiteLLM

Supported Models

PentAGI supports 2 DeepSeek V4 models with tool calling, streaming, hybrid thinking/non-thinking modes, and context caching. Both models support thinking mode by default and can be switched to non-thinking mode via extra_body. Models marked with * are used in default configuration.

Model IDThinkingMax OutputContextPrice (Input/Output/Cache)Use Casedeepseek-v4-flash*✅ hybrid384K1M$0.14/$0.28/$0.0028Utility agents, general dialogue, fast tool callingdeepseek-v4-pro*✅ hybrid384K1M$1.74/$3.48/$0.0145Advanced reasoning, complex logic, security analysis

Prices: Per 1M tokens. Cache pricing applies to prompt tokens served from cache (input cache hit, reduced to 1/10 of launch price since 2026-04-26). Both models support hybrid thinking — thinking mode is enabled by default; pass extra_body.thinking.type: disabled to switch to non-thinking mode for faster/cheaper responses.

Pricing Note (deepseek-v4-pro): The 75% promotional discount on deepseek-v4-pro officially ended on 2026-05-31 15:59 UTC. The prices above reflect the standard post-promotional pricing. If you have legacy configurations using the discounted prices ($0.435/$0.87/$0.003625), update them to the current rates for accurate cost tracking.

The legacy model names deepseek-chat and deepseek-reasoner are scheduled for deprecation by DeepSeek on 2026-07-24. Existing user configurations referencing the legacy names continue to work until then; the defaults above use the current V4 names. deepseek-chat maps to deepseek-v4-flash non-thinking mode; deepseek-reasoner maps to deepseek-v4-flash thinking mode.

Default Agent Configuration:

Strategy: prefer deepseek-v4-flash (12x cheaper input, 12x cheaper output) as the workhorse for utility/lightweight agents; reserve deepseek-v4-pro for complex multi-step reasoning. The installer agent runs on Flash with thinking enabled because environment setup tasks (shell commands, config edits) rarely require pro-level reasoning. Run A/B tests on your own workloads before promoting more agents to Pro.

Agent RoleDefault ModelThinkingReasoning EffortMax OutputTemperatureTop PGenerator / Refinerdeepseek-v4-proEnabledHigh32768(auto)(auto)Coderdeepseek-v4-proEnabledHigh20480(auto)(auto)Primary Agent / Assistant / Pentesterdeepseek-v4-proEnabledHigh16384(auto)(auto)Adviser (mentor/planner)deepseek-v4-proEnabledHigh8192(auto)(auto)Installerdeepseek-v4-flashEnabledHigh12288(auto)(auto)Reflector / Searcher / Enricherdeepseek-v4-flashDisabled—40960.50.9Simple / Simple JSONdeepseek-v4-flashDisabled—20480.30.9

Note: When thinking mode is enabled, DeepSeek silently ignores temperature, top_p, presence_penalty, and frequency_penalty. The langchaingo client automatically nullifies temperature/top_p when reasoning_effort is set, so they appear as "(auto)" in the table above. All thinking-enabled agents also explicitly pass extra_body.thinking.type: enabled as defensive coding against future provider default changes.

Key Features:

Hybrid Thinking Modes: Switch between thinking (deep reasoning) and non-thinking (fast) modes via extra_body.thinking.type
Automatic Prompt Caching: Significant cost reduction on repeated context via cache-hit pricing (1/10 of launch price)
Extended Thinking: Reinforcement learning CoT for complex security analysis (both V4 models)
Strong Coding: Optimized for code generation and exploit development
Long Context: 1M token context window with up to 384K output tokens
Tool Calling: Seamless integration with 20+ pentesting tools via function calling
Streaming: Real-time response streaming for interactive workflows
Multilingual: Strong Chinese and English support
Additional Features: JSON Output, Chat Prefix Completion (beta), FIM/Fill-in-the-Middle Completion (non-thinking mode only)

Concurrency Limits: deepseek-v4-flash: 2500 concurrent requests; deepseek-v4-pro: 500 concurrent requests.

LiteLLM Integration: Set DEEPSEEK_PROVIDER=deepseek to enable model name prefixing when using default PentAGI configurations with LiteLLM proxy. Leave empty for direct API usage.

GLM Provider Configuration

PentAGI integrates with GLM from Zhipu AI (Z.AI), providing advanced language models with MoE architecture, strong reasoning, and agentic capabilities developed by Tsinghua University.

Configuration Variables

VariableDefault ValueDescriptionGLM_API_KEYGLM API key for authenticationGLM_SERVER_URLhttps://api.z.ai/api/paas/v4GLM API endpoint URL (international)GLM_PROVIDERProvider prefix for LiteLLM integration (optional)

Configuration Examples

# Direct API usage (international endpoint)
GLM_API_KEY=your_glm_api_key
GLM_SERVER_URL=https://api.z.ai/api/paas/v4

# Alternative endpoints
GLM_SERVER_URL=https://open.bigmodel.cn/api/paas/v4  # China
GLM_SERVER_URL=https://api.z.ai/api/coding/paas/v4   # Coding-specific

# With LiteLLM proxy
GLM_API_KEY=your_litellm_key
GLM_SERVER_URL=http://litellm-proxy:4000
GLM_PROVIDER=zai  # Adds prefix to model names (zai/glm-4) for LiteLLM

Supported Models

PentAGI supports 13 GLM models with tool calling, streaming, hybrid thinking modes, and prompt caching. Models marked with * are used in default configuration. Thinking is controlled via extra_body.thinking.type ("enabled"/"disabled"); unlike Kimi, GLM is permissive about temperature in either mode.

GLM-5.x Series - Latest Generation (200K context, 128K max output)

Model IDThinkingContextMax OutputPrice (Input/Output/Cache)Use Caseglm-5.1*✅ Hybrid200K128K$1.40/$4.40/$0.26Newest flagship: 8h sustained autonomous execution, Claude Opus 4.6-aligned (generator/refiner/adviser/coder/pentester default)glm-5✅ Hybrid200K128K$1.00/$3.20/$0.20Foundation for Agentic Engineering, MoE 744B/40B active, Claude Opus 4.5-level codingglm-5-turbo*✅ Hybrid200K128K$1.20/$4.00/$0.24OpenClaw-native: optimized for tool invocation, persistent tasks, long-chain execution (primary_agent/assistant default)

GLM-4.7 Series - Premium with Interleaved Thinking

Model IDThinkingContextMax OutputPrice (Input/Output/Cache)Use Caseglm-4.7✅ Hybrid200K128K$0.60/$2.20/$0.11Enhanced programming, stable multi-step reasoningglm-4.7-flashx✅ Hybrid200K128K$0.07/$0.40/$0.01Ultra-cheap with priority GPU, but lower RPM limits (avoid for high-frequency use)glm-4.7-flash✅ Hybrid200K128KFree/Free/FreeFree ~30B SOTA model, 1 concurrent request

GLM-4.6 Series - Balanced with Auto-Thinking

Model IDThinkingContextMax OutputPrice (Input/Output/Cache)Use Caseglm-4.6✅ Auto200K128K$0.60/$2.20/$0.11Balanced, streaming tool calls, token-efficient

GLM-4.5 Series - Unified Reasoning/Coding/Agents

Model IDThinkingContextMax OutputPrice (Input/Output/Cache)Use Caseglm-4.5✅ Auto128K96K$0.60/$2.20/$0.11Unified, MoE 355B/32B activeglm-4.5-x✅ Auto128K96K$2.20/$8.90/$0.45Ultra-fast premium, lowest latencyglm-4.5-air*✅ Auto128K96K$0.20/$1.10/$0.03Cost-effective MoE 106B/12B (simple/simple_json/reflector/searcher/enricher/installer default)glm-4.5-airx✅ Auto128K96K$1.10/$4.50/$0.22Accelerated Air with priority GPUglm-4.5-flash✅ Auto128K96KFree/Free/FreeFree with reasoning/coding/agents support

GLM-4 Legacy - Dense Architecture

Model IDThinkingContextMax OutputPrice (Input/Output)Use Caseglm-4-32b-0414-128k❌128K16K$0.10/$0.10Ultra-budget dense 32B, parsing without reasoning

Prices: Per 1M tokens. Cache pricing is for prompt cache hit; cache storage is currently free per Z.AI promotion. GLM-4-32B has no cache support.

Default Agent Configuration:

Strategy: glm-5.1 (newest flagship, $1.40 input) for critical reasoning, glm-5-turbo (OpenClaw-native, agent-optimized) for orchestration, glm-4.5-air (cheap MoE with hybrid thinking and reliable RPM) for all utility/installer agents. glm-4.7-flashx is avoided as default due to lower RPM limits causing frequent 429 errors at high frequency.

Agent RoleDefault ModelThinkingTemperatureTop PMax OutputGenerator / Refinerglm-5.1Enabled1.00.9532768Coderglm-5.1Enabled1.00.9520480Adviser / Pentesterglm-5.1Enabled1.00.9516384Primary Agent / Assistantglm-5-turboEnabled1.00.9516384Installerglm-4.5-airEnabled1.00.9516384Simple / Reflectorglm-4.5-airDisabled0.60.98192Searcher / Enricher / Simple JSONglm-4.5-airDisabled0.60.94096

Note on temperature: GLM accepts both 1.0 and 0.6 in either thinking/non-thinking mode (per Z.AI docs). langchaingo's IsReasoningModel matches glm-4.5*/glm-4.6*/glm-4.7* prefixes and force-overrides temperature to 1.0 in createChatRequest — this is harmless for GLM (unlike Kimi) but means temperature values for those models in YAML are advisory. glm-5/glm-5.1/glm-5-turbo are not matched, so explicit values pass through unchanged.

Thinking Modes:

Hybrid (GLM-5.x, GLM-4.7): Explicit toggle via extra_body.thinking.type
Auto (GLM-4.6, GLM-4.5 series): Model automatically determines when reasoning is needed
Preserved Thinking (Z.AI Coding capability): all thinking-enabled agents in PentAGI also pass extra_body.thinking.clear_thinking: false so that reasoning_content from previous assistant turns is retained across the conversation. This is required on the standard API endpoint (/api/paas/v4) — on the Coding Plan endpoint it would be enabled by default. Improves reasoning continuity and cache hit rates in multi-turn tool call chains.
All thinking-enabled agents also pass extra_body.tool_choice: auto defensively

Key Features:

Long-Horizon Tasks: GLM-5.1 supports 8-hour sustained autonomous execution, ideal for complex multi-stage agentic workflows
OpenClaw-Native Orchestration: GLM-5-Turbo is specifically optimized for tool invocation, instruction following, and long-chain execution
Prompt Caching: Significant cost reduction on repeated context (cached input pricing shown)
Ultra-Long Context: 200K tokens for GLM-5.x/4.7/4.6 series
MoE Architecture: Efficient 744B/40B active (GLM-5/5.1), 355B/32B (GLM-4.5), 106B/12B (GLM-4.5-Air)
Tool Calling: Seamless integration with 20+ pentesting tools via function calling
Streaming: Real-time streaming with streaming tool calls support (GLM-4.6+)
Multilingual: Exceptional Chinese and English NLP capabilities
Free Options: GLM-4.7-Flash and GLM-4.5-Flash for prototyping and experimentation

LiteLLM Integration: Set GLM_PROVIDER=zai to enable model name prefixing when using default PentAGI configurations with LiteLLM proxy. Leave empty for direct API usage.

Kimi Provider Configuration

PentAGI integrates with Kimi from Moonshot AI, providing ultra-long context models with multimodal capabilities perfect for analyzing extensive codebases and documentation.

Configuration Variables

VariableDefault ValueDescriptionKIMI_API_KEYKimi API key for authenticationKIMI_SERVER_URLhttps://api.moonshot.ai/v1Kimi API endpoint URL (international)KIMI_PROVIDERProvider prefix for LiteLLM integration (optional)

Configuration Examples

# Direct API usage (international endpoint)
KIMI_API_KEY=your_kimi_api_key
KIMI_SERVER_URL=https://api.moonshot.ai/v1

# Alternative endpoint
KIMI_SERVER_URL=https://api.moonshot.cn/v1  # China

# With LiteLLM proxy
KIMI_API_KEY=your_litellm_key
KIMI_SERVER_URL=http://litellm-proxy:4000
KIMI_PROVIDER=moonshot  # Adds prefix to model names (moonshot/kimi-k2.5) for LiteLLM

Supported Models

PentAGI supports 8 Kimi/Moonshot models with tool calling, streaming, hybrid thinking modes, and multimodal capabilities (text/image/video for K2.x). All kimi-k2-* legacy models (turbo-preview, 0905-preview, 0711-preview, thinking, thinking-turbo) were deprecated by Moonshot on 2026-05-25 and are NOT included. Models marked with * are used in default configuration.

Kimi K2.x Series - Multimodal Flagship

Model IDThinkingMultimodalContextPrice (Input Miss / Output / Cache Hit)Use Casekimi-k2.6*✅ hybrid✅256K$0.95 / $4.00 / $0.16Latest flagship: native multimodal, stronger code, improved instruction compliance (generator/refiner/adviser/coder/pentester default)kimi-k2.5*✅ hybrid✅256K$0.60 / $3.00 / $0.10Previous-gen: 36% cheaper input, same architecture (primary/assistant/installer/utility default)

Moonshot V1 Series - Generation Models (Flexible Parameters)

Model IDThinkingMultimodalContextPrice (Input / Output)Use Casemoonshot-v1-8k❌❌8K$0.20 / $2.00Short text generation, ultra-cheapmoonshot-v1-32k❌❌32K$1.00 / $3.00Long text generationmoonshot-v1-128k❌❌128K$2.00 / $5.00Very long context

Moonshot V1 Vision Series - Image Understanding

Model IDThinkingMultimodalContextPrice (Input / Output)Use Casemoonshot-v1-8k-vision-preview❌✅8K$0.20 / $2.00Vision + short contextmoonshot-v1-32k-vision-preview❌✅32K$1.00 / $3.00Vision + medium contextmoonshot-v1-128k-vision-preview❌✅128K$2.00 / $5.00Vision + long context

Prices: Per 1M tokens. Cache pricing applies to prompt tokens served from automatic context cache (only Kimi K2.x models support cache).

CRITICAL — Kimi K2.6/K2.5 parameter constraints: API returns invalid_request_error for any deviation:
temperature: MUST be 1.0 in thinking mode, MUST be 0.6 in non-thinking mode
top_p: MUST be 0.95
n: MUST be 1
presence_penalty and frequency_penalty: MUST be 0 (not modifiable)
Moonshot V1 models use standard OpenAI-compatible parameters with no such constraints.

Default Agent Configuration:

Strategy: prefer kimi-k2.5 as cost-effective workhorse (36% cheaper input vs kimi-k2.6); reserve kimi-k2.6 for critical reasoning. All kimi-k2.x agents are configured with the API-required fixed parameters (temp/top_p/n) and explicit extra_body.thinking.type. For thinking-enabled agents, extra_body.thinking.keep: "all" is set to preserve historical reasoning_content in multi-turn tool call chains (without it Moonshot returns "thinking is enabled but reasoning_content is missing").

Agent RoleDefault ModelThinkingTemperatureTop PMax OutputGenerator / Refinerkimi-k2.6Enabled (keep=all)1.00.9532768Coderkimi-k2.6Enabled (keep=all)1.00.9520480Pentesterkimi-k2.6Enabled (keep=all)1.00.9516384Adviser (mentor/planner)kimi-k2.6Enabled (keep=all)1.00.958192Primary Agent / Assistantkimi-k2.5Enabled (keep=all)1.00.9516384Installerkimi-k2.5Enabled (keep=all)1.00.9512288Reflector / Searcher / Enricherkimi-k2.5Disabled0.60.954096Simple / Simple JSONkimi-k2.5Disabled0.60.952048

Key Features:

Ultra-Long Context: Up to 256K tokens (K2.x) for comprehensive codebase/documentation analysis
Native Multimodal: K2.6/K2.5 support text + image + video input out of the box
Hybrid Thinking: K2.6/K2.5 toggle between thinking and non-thinking via extra_body.thinking.type
Preserved Thinking (K2.6): thinking.keep: "all" preserves historical reasoning_content across turns — required for multi-turn tool call chains
Automatic Context Caching: K2.x models cache repeated prefixes (~17% of miss price for K2.6, ~17% for K2.5)
Tool Calling: Full function-calling support for K2.x and Moonshot V1
Self-Correction: K2.6 features improved instruction compliance and self-correction
Multilingual: Strong Chinese, English, and multi-language support

Multi-turn with thinking + tool calls: PentAGI's universal reasoning preservation pattern (TextPartWithReasoning + WithPreserveReasoningContent) automatically ensures reasoning_content is sent back in the required TextContent → ToolCall order, satisfying Moonshot's "thinking is enabled but reasoning_content is missing in assistant tool call message" requirement.

LiteLLM Integration: Set KIMI_PROVIDER=moonshot to enable model name prefixing when using default PentAGI configurations with LiteLLM proxy. Leave empty for direct API usage.

Qwen Provider Configuration

PentAGI integrates with Qwen from Alibaba Cloud Model Studio (DashScope), providing powerful multilingual models with reasoning capabilities and context caching support.

Configuration Variables

VariableDefault ValueDescriptionQWEN_API_KEYQwen API key for authenticationQWEN_SERVER_URLhttps://dashscope-us.aliyuncs.com/compatible-mode/v1Qwen API endpoint URL (international)QWEN_PROVIDERProvider prefix for LiteLLM integration (optional)

Configuration Examples

# Direct API usage (Global/US endpoint)
QWEN_API_KEY=your_qwen_api_key
QWEN_SERVER_URL=https://dashscope-us.aliyuncs.com/compatible-mode/v1

# Alternative endpoints
QWEN_SERVER_URL=https://dashscope-intl.aliyuncs.com/compatible-mode/v1  # International (Singapore)
QWEN_SERVER_URL=https://dashscope.aliyuncs.com/compatible-mode/v1       # Chinese Mainland (Beijing)

# With LiteLLM proxy
QWEN_API_KEY=your_litellm_key
QWEN_SERVER_URL=http://litellm-proxy:4000
QWEN_PROVIDER=dashscope  # Adds prefix to model names (dashscope/qwen-plus) for LiteLLM

Supported Models

PentAGI supports 33 Qwen models curated for agent workflows: text reasoning, code generation, and vision-language (browser screenshots). All models are non-snapshot main aliases with tool calling, streaming, thinking modes, and context caching. Models marked with * are used in default configuration.

Flagship Models (Top-tier Reasoning)

Model IDThinkingIntlGlobal/USChinaPrice (Input/Output/Cache)Use Caseqwen3.7-max*✅✅✅✅$2.50/$7.50/$0.50Next-gen flagship for agent-centric era (generator/refiner/adviser default)qwen3.6-max-preview✅✅✅✅$1.30/$7.80/$0.13Preview Max with enhanced vibe coding & front-end skillsqwen3-max✅✅✅✅$1.20/$6.00/$0.24Previous-gen flagship with agent programming upgradesqwen-plus✅✅✅✅$0.40/$4.00/$0.08Qwen3-backbone Plus with switchable thinking modes

Balanced Plus Models (Mid-tier)

Model IDThinkingIntlGlobal/USChinaPrice (Input/Output/Cache)Use Caseqwen3.6-plus*✅✅✅✅$0.50/$3.00/$0.05Native VL Plus with agentic coding (primary/assistant/pentester default)qwen3.5-plus✅✅✅✅$0.40/$2.40/$0.04Previous-gen native VL with strong multimodal capabilities

Fast Flash Models (Cost-optimized)

Model IDThinkingIntlGlobal/USChinaPrice (Input/Output/Cache)Use Caseqwen3.6-flash✅✅✅✅$0.25/$1.50/$0.025Latest Flash with significant agentic-coding boostqwen3.5-flash*✅✅✅✅$0.10/$0.40/$0.01Ultra-fast lightweight (simple/reflector/searcher/enricher default)qwen-flash✅✅✅✅$0.05/$0.40/$0.01Qwen3-series Flash with 1M context, tiered pricing

Code-Specialized Models

Model IDThinkingIntlGlobal/USChinaPrice (Input/Output/Cache)Use Caseqwen3-coder-plus*❌✅✅✅$1.00/$5.00/$0.20Strong coding agent with autonomous programming (coder default)qwen3-coder-flash*❌✅✅✅$0.30/$1.50/$0.06Fast code-gen with multi-turn tool stability (installer default)qwen3-coder-next❌✅✅✅$0.30/$1.50/—Open-source code generation, SOTA at same scale

Vision-Language Models (Browser & Screenshot Analysis)

Model IDThinkingIntlGlobal/USChinaPrice (Input/Output/Cache)Use Caseqwen3-vl-plus✅✅✅✅$0.20/$1.60/$0.04VL with visual agent capabilities, ultra-long video understandingqwen3-vl-flash✅✅✅✅$0.05/$0.40/$0.01Small VL with 2D/3D localization for browser triageqvq-max✅✅✅✅$1.20/$4.80/—Visual reasoning with chain-of-thought

Open-Source Qwen3.6 Series

Model IDThinkingIntlGlobal/USChinaPrice (Input/Output/Cache)Use Caseqwen3.6-27b✅✅✅✅$0.60/$3.60/—Native VL on hybrid architecture, on-premises readyqwen3.6-35b-a3b✅✅✅✅$0.25/$1.49/—Efficient 35B MoE (~3B active) for continuous monitoring

Open-Source Qwen3.5 Series

Model IDThinkingIntlGlobal/USChinaPrice (Input/Output/Cache)Use Caseqwen3.5-397b-a17b✅✅✅✅$0.60/$3.60/—Largest 397B params (~17B active), exceptional reasoningqwen3.5-122b-a10b✅✅✅✅$0.40/$3.20/—Large 122B params (~10B active), strong balanceqwen3.5-35b-a3b✅✅✅✅$0.25/$2.00/—Efficient 35B MoE (~3B active), cost-effectiveqwen3.5-27b✅✅✅✅$0.30/$2.40/—Medium 27B with hybrid linear attention + sparse MoE

Open-Source Qwen3 Coder Series

Model IDThinkingIntlGlobal/USChinaPrice (Input/Output/Cache)Use Caseqwen3-coder-480b-a35b-instruct❌✅✅✅$1.50/$7.50/—Largest open coder MoE (480B/~35B active)qwen3-coder-30b-a3b-instruct❌✅✅✅$0.45/$2.25/—Efficient 30B MoE (~3B active), repository-scale

Open-Source Qwen3 Dense & MoE Series

Model IDThinkingIntlGlobal/USChinaPrice (Input/Output/Cache)Use Caseqwen3-next-80b-a3b-thinking✅✅✅✅$0.15/$1.20/—Next-gen 80B MoE (~3B active) thinking-onlyqwen3-next-80b-a3b-instruct❌✅✅✅$0.15/$1.20/—Next-gen 80B MoE instruction-followingqwen3-235b-a22b✅✅✅✅$0.70/$8.40/—Dual-mode 235B MoE (~22B active)qwen3-32b✅✅✅✅$0.16/$0.64/—Versatile 32B dense dual-modeqwen3-30b-a3b✅✅✅✅$0.20/$2.40/—Efficient 30B MoE (~3B active)qwen3-14b✅✅✅✅$0.35/$4.20/—Medium 14B dense performance-cost balanceqwen3-8b✅✅✅✅$0.18/$2.10/—Compact 8B dense efficiencyqwen3-4b✅✅✅✅$0.11/$1.26/—Lightweight 4B dense for simple tasksqwen3-1.7b✅✅✅✅$0.11/$1.26/—Ultra-compact 1.7B basic checksqwen3-0.6b✅✅✅✅$0.11/$1.26/—Smallest 0.6B for edge monitoring

Prices: Per 1M tokens. Cache pricing reflects implicit cache hit (when available); MoE/dense open-source models do not expose cache pricing. Tiered models (Max/Plus) show lowest-tier pricing (typically ≤32k or ≤256k input); larger contexts incur higher rates per Alibaba Cloud pricing.

Region Availability:

Intl (International): Singapore region (dashscope-intl.aliyuncs.com)
Global/US: US Virginia region (dashscope-us.aliyuncs.com)
China: Chinese Mainland Beijing region (dashscope.aliyuncs.com)

Default Agent Configuration:

Agent RoleDefault ModelTierGenerator / Refiner / Adviser (planning, mentor)qwen3.7-maxFlagshipPrimary / Assistant / Pentesterqwen3.6-plusBalancedCoder (exploit development)qwen3-coder-plusCode+Installer (env setup)qwen3-coder-flashCode FastSimple / Reflector / Searcher / Enricherqwen3.5-flashFast

Key Features:

Agent-Centric Design: Qwen3.7-Max is purpose-built for long-horizon autonomous execution and tool invocation
Automatic Context Caching: 30-50% cost reduction on repeated context with implicit cache
Extended Thinking: Chain-of-thought reasoning for complex security analysis (Qwen3.7/3.6/3.5/3-Max, QVQ-Max)
Code Specialization: Qwen3-Coder series with multi-turn tool interaction and repository-level understanding
Vision-Language: Qwen3-VL series for browser screenshot triage, 2D/3D localization, OCR-level analysis
Tool Calling: Seamless integration with 20+ pentesting tools via function calling
Streaming: Real-time response streaming for interactive workflows
Multilingual: Strong Chinese, English, and multi-language support
Open-Source Variants: Dense and MoE models from 0.6B to 480B for on-premises/air-gapped deployments

LiteLLM Integration: Set QWEN_PROVIDER=dashscope to enable model name prefixing when using default PentAGI configurations with LiteLLM proxy. Leave empty for direct API usage.

Alternative Integrations

DashScope is fully OpenAI-compatible, so Qwen can also power two other PentAGI subsystems through the standard OpenAI client.

As embedding provider (text-embedding-v4, see Alibaba Cloud Model Studio pricing):

EMBEDDING_PROVIDER=openai
EMBEDDING_URL=https://dashscope-intl.aliyuncs.com/compatible-mode/v1  # International (Singapore)
# EMBEDDING_URL=https://dashscope.aliyuncs.com/compatible-mode/v1     # Chinese Mainland
EMBEDDING_KEY=sk-*******
EMBEDDING_MODEL=text-embedding-v4
EMBEDDING_BATCH_SIZE=         # optional, default applies
EMBEDDING_STRIP_NEW_LINES=    # optional, default applies

Note: the Global/US DashScope endpoint (dashscope-us.aliyuncs.com) does not expose embedding APIs — use the International or China endpoints for text-embedding-v4.

As OpenAI-typed custom LLM provider: instead of the dedicated QWEN_* variables, you can wire any Qwen chat model through PentAGI's custom OpenAI-compatible provider by pointing OPENAI_SERVER_URL (or a custom provider entry) to the DashScope /compatible-mode/v1 endpoint and selecting the desired Qwen model name. Useful when you already manage all model traffic through a single OpenAI-shaped client (e.g. shared with LiteLLM/OneAPI proxies).

Advanced Setup

Langfuse Integration

Langfuse provides advanced capabilities for monitoring and analyzing AI agent operations.

Configure Langfuse environment variables in existing .env file.

Database Credentials

LANGFUSE_POSTGRES_USER and LANGFUSE_POSTGRES_PASSWORD - Langfuse PostgreSQL credentials
LANGFUSE_CLICKHOUSE_USER and LANGFUSE_CLICKHOUSE_PASSWORD - ClickHouse credentials
LANGFUSE_REDIS_AUTH - Redis password

Encryption and Security Keys

LANGFUSE_SALT - Salt for hashing in Langfuse Web UI
LANGFUSE_ENCRYPTION_KEY - Encryption key (32 bytes in hex)
LANGFUSE_NEXTAUTH_SECRET - Secret key for NextAuth

Admin Credentials

LANGFUSE_INIT_USER_EMAIL - Admin email
LANGFUSE_INIT_USER_PASSWORD - Admin password
LANGFUSE_INIT_USER_NAME - Admin username

API Keys and Tokens

LANGFUSE_INIT_PROJECT_PUBLIC_KEY - Project public key (used from PentAGI side too)
LANGFUSE_INIT_PROJECT_SECRET_KEY - Project secret key (used from PentAGI side too)

S3 Storage

LANGFUSE_S3_ACCESS_KEY_ID - S3 access key ID
LANGFUSE_S3_SECRET_ACCESS_KEY - S3 secret access key

Enable integration with Langfuse for PentAGI service in .env file.

LANGFUSE_BASE_URL=http://langfuse-web:3000
LANGFUSE_PROJECT_ID= # default: value from ${LANGFUSE_INIT_PROJECT_ID}
LANGFUSE_PUBLIC_KEY= # default: value from ${LANGFUSE_INIT_PROJECT_PUBLIC_KEY}
LANGFUSE_SECRET_KEY= # default: value from ${LANGFUSE_INIT_PROJECT_SECRET_KEY}

Run the Langfuse stack:

curl -O https://raw.githubusercontent.com/vxcontrol/pentagi/master/docker-compose-langfuse.yml
docker compose -f docker-compose.yml -f docker-compose-langfuse.yml up -d

Visit localhost:4000 to access Langfuse Web UI with credentials from .env file:

LANGFUSE_INIT_USER_EMAIL - Admin email
LANGFUSE_INIT_USER_PASSWORD - Admin password

Monitoring and Observability

For detailed system operation tracking, integration with monitoring tools is available.

Enable integration with OpenTelemetry and all observability services for PentAGI in .env file.

OTEL_HOST=otelcol:8148

Run the observability stack:

curl -O https://raw.githubusercontent.com/vxcontrol/pentagi/master/docker-compose-observability.yml
docker compose -f docker-compose.yml -f docker-compose-observability.yml up -d

Visit localhost:3000 to access Grafana Web UI.

[!NOTE] If you want to use Observability stack with Langfuse, you need to enable integration in .env file to set LANGFUSE_OTEL_EXPORTER_OTLP_ENDPOINT to http://otelcol:4318.
To run all available stacks together (Langfuse, Graphiti, and Observability):
docker compose -f docker-compose.yml -f docker-compose-langfuse.yml -f docker-compose-graphiti.yml -f docker-compose-observability.yml up -d
You can also register aliases for these commands in your shell to run it faster:
alias pentagi="docker compose -f docker-compose.yml -f docker-compose-langfuse.yml -f docker-compose-graphiti.yml -f docker-compose-observability.yml"
alias pentagi-up="docker compose -f docker-compose.yml -f docker-compose-langfuse.yml -f docker-compose-graphiti.yml -f docker-compose-observability.yml up -d"
alias pentagi-down="docker compose -f docker-compose.yml -f docker-compose-langfuse.yml -f docker-compose-graphiti.yml -f docker-compose-observability.yml down"

Knowledge Graph Integration (Graphiti)

[!IMPORTANT] The Graphiti integration is currently a beta feature and has notable provider limitations. See Current Limitations below before enabling it in production.

PentAGI integrates with Graphiti, a temporal knowledge graph system powered by Neo4j, to provide advanced semantic understanding and relationship tracking for AI agent operations. The vxcontrol fork provides custom entity and edge types that are specific to pentesting purposes.

What is Graphiti?

Graphiti automatically extracts and stores structured knowledge from agent interactions, building a graph of entities, relationships, and temporal context. This enables:

Semantic Memory: Store and recall relationships between tools, targets, vulnerabilities, and techniques
Contextual Understanding: Track how different pentesting actions relate to each other over time
Knowledge Reuse: Learn from past penetration tests and apply insights to new assessments
Advanced Querying: Search for complex patterns like "What tools were effective against similar targets?"

Enabling Graphiti

The Graphiti knowledge graph is optional and disabled by default. To enable it:

Configure Graphiti environment variables in .env file:

## Graphiti knowledge graph settings
GRAPHITI_ENABLED=true
GRAPHITI_TIMEOUT=30
GRAPHITI_URL=http://graphiti:8000
GRAPHITI_MODEL_NAME=gpt-5-mini

# Neo4j settings (used by Graphiti stack)
NEO4J_USER=neo4j
NEO4J_DATABASE=neo4j
NEO4J_PASSWORD=devpassword
NEO4J_URI=bolt://neo4j:7687

# OpenAI API key (required by Graphiti for entity extraction)
OPEN_AI_KEY=your_openai_api_key

Run the Graphiti stack along with the main PentAGI services:

# Download the Graphiti compose file if needed
curl -O https://raw.githubusercontent.com/vxcontrol/pentagi/master/docker-compose-graphiti.yml

# Start PentAGI with Graphiti
docker compose -f docker-compose.yml -f docker-compose-graphiti.yml up -d

Verify Graphiti is running:

# Check service health
docker compose -f docker-compose.yml -f docker-compose-graphiti.yml ps graphiti neo4j

# View Graphiti logs
docker compose -f docker-compose.yml -f docker-compose-graphiti.yml logs -f graphiti

# Access Neo4j Browser (optional)
# Visit http://localhost:7474 and login with NEO4J_USER/NEO4J_PASSWORD

# Access Graphiti API (optional, for debugging)
# Visit http://localhost:8000/docs for Swagger API documentation

[!NOTE] The Graphiti service is defined in docker-compose-graphiti.yml as a separate stack. You must run both compose files together to enable the knowledge graph functionality. The pre-built Docker image vxcontrol/graphiti:latest is used by default.

What Gets Stored

When enabled, PentAGI automatically captures:

Agent Responses: All agent reasoning, analysis, and decisions
Tool Executions: Commands executed, tools used, and their results
Context Information: Flow, task, and subtask hierarchy

Current Limitations

The Graphiti integration is currently a beta feature. Operators should plan around the following constraints before enabling it in production:

OpenAI-compatible LLM only. The bundled vxcontrol/graphiti image authenticates against a single OpenAI-compatible endpoint configured through PentAGI's .env variables OPEN_AI_KEY and OPEN_AI_SERVER_URL (default https://api.openai.com/v1). docker-compose-graphiti.yml maps these into the container as OPENAI_API_KEY and OPENAI_BASE_URL, so operators do not set the container variables directly. Provider credentials configured elsewhere in PentAGI for Anthropic, Google AI (Gemini), AWS Bedrock, DeepSeek, GLM, Kimi, or Qwen are not used by Graphiti for entity extraction. If your deployment cannot reach an OpenAI-compatible endpoint, leave GRAPHITI_ENABLED=false.
Single fixed model per deployment. Graphiti uses one model name (GRAPHITI_MODEL_NAME, default gpt-5-mini) for all extractions. The model cannot be selected per agent or per flow.
Independent billing. Even when a flow runs against a non-OpenAI provider, Graphiti still incurs cost on the configured OpenAI-compatible endpoint.
No in-app graph explorer yet. Browsing the captured graph relies on the Neo4j Browser at http://localhost:7474 and the Graphiti Swagger UI at http://localhost:8000/docs. There is no PentAGI UI surface for the graph today.

When GRAPHITI_ENABLED=false, PentAGI continues to operate with its primary memory and vector store; only the additional knowledge graph features are skipped.

GitHub and Google OAuth Integration

OAuth integration with GitHub and Google allows users to authenticate using their existing accounts on these platforms. This provides several benefits:

Simplified login process without need to create separate credentials
Enhanced security through trusted identity providers
Access to user profile information from GitHub/Google accounts
Seamless integration with existing development workflows

PentAGI uses PUBLIC_URL as the public origin/base URL for OAuth redirects. In the default deployment, both GitHub and Google callbacks are handled by:

${PUBLIC_URL}/api/v1/auth/login-callback

For GitHub OAuth:

Create a new OAuth App in your GitHub account.
Set Homepage URL to your PUBLIC_URL.
Set Authorization callback URL to ${PUBLIC_URL}/api/v1/auth/login-callback.
Add the client credentials to your .env file:

PUBLIC_URL=https://pentagi.example.com
OAUTH_GITHUB_CLIENT_ID=your_github_client_id
OAUTH_GITHUB_CLIENT_SECRET=your_github_client_secret

For Google OAuth:

Create OAuth credentials in your Google Cloud project.
Use the same callback endpoint: ${PUBLIC_URL}/api/v1/auth/login-callback.
Add the client credentials to your .env file:

PUBLIC_URL=https://pentagi.example.com
OAUTH_GOOGLE_CLIENT_ID=your_google_client_id
OAUTH_GOOGLE_CLIENT_SECRET=your_google_client_secret

Make sure PUBLIC_URL matches the externally accessible HTTPS address of your PentAGI instance and does not include the callback path itself. If the URL configured in the OAuth provider does not exactly match the callback generated by PentAGI, the provider will reject the login attempt with a redirect URI mismatch error.

Docker Image Configuration

PentAGI allows you to configure Docker image selection for executing various tasks. The system automatically chooses the most appropriate image based on the task type, but you can constrain this selection by specifying your preferred images:

VariableDefaultDescriptionPENTAGI_IMAGEvxcontrol/pentagi:latestDocker image used for the main PentAGI application serviceDOCKER_DEFAULT_IMAGEdebian:latestDefault Docker image for general tasks and ambiguous casesDOCKER_DEFAULT_IMAGE_FOR_PENTESTvxcontrol/kali-linuxDefault Docker image for security/penetration testing tasks

PENTAGI_IMAGE changes the image used by the main pentagi service in docker-compose.yml. The DOCKER_DEFAULT_IMAGE and DOCKER_DEFAULT_IMAGE_FOR_PENTEST variables only affect automatic worker image selection for task execution inside PentAGI. They do not rewrite the rest of the Compose stack, so services such as pgvector, scraper, and the optional graphiti stack still use the image references defined in the compose files.

When DOCKER_DEFAULT_IMAGE and DOCKER_DEFAULT_IMAGE_FOR_PENTEST are set, AI agents will be limited to the image choices you specify. This is particularly useful for:

Security Enforcement: Restricting usage to only verified and trusted images
Environment Standardization: Using corporate or customized images across all operations
Performance Optimization: Utilizing pre-built images with necessary tools already installed

Configuration examples:

# Using a custom PentAGI application image
PENTAGI_IMAGE=registry.example.com/security/pentagi:latest

# Using a custom image for general tasks
DOCKER_DEFAULT_IMAGE=mycompany/custom-debian:latest

# Using a specialized image for penetration testing
DOCKER_DEFAULT_IMAGE_FOR_PENTEST=mycompany/pentest-tools:v2.0

[!NOTE] If a user explicitly specifies a particular Docker image in their task, the system will try to use that exact image, ignoring these settings. These variables only affect the system's automatic image selection process.

For an advanced OpenVAS/GVM experiment that uses a custom pentest image, see OpenVAS via a Custom Pentest Image.

Restricted Networks, Docker Mirrors, and Proxies

If your environment cannot reach Docker Hub (docker.io) directly, changing PentAGI environment variables is usually not enough to fix image download failures. PentAGI still relies on Docker's own registry access for Compose-managed services, and the installer network checks also validate Docker Hub reachability.

For restricted networks:

Confirm that the host can resolve and reach docker.io.
If your environment requires an outbound proxy for PentAGI or installer HTTP traffic, set the PROXY_URL environment variable. To route Docker image pulls through a proxy, configure the Docker daemon or Docker Desktop proxy separately — Docker does not use PentAGI's PROXY_URL for registry access.
If Docker Hub is blocked or heavily rate-limited, configure an organization-approved registry mirror or registry proxy before running the installer or docker compose up.
Restart Docker after changing the daemon configuration, then rerun the installer checks or Compose startup.

Example Docker daemon mirror configuration:

{
  "registry-mirrors": ["https://mirror.example.com"]
}

On Linux, this is typically configured in /etc/docker/daemon.json. On Docker Desktop, use the equivalent Docker Engine or proxy settings. A Docker Hub mirror covers Docker Hub-hosted images such as vxcontrol/*, but the main Compose stack already includes quay.io/prometheuscommunity/postgres-exporter, and the optional observability stack includes gcr.io/cadvisor/cadvisor. Those registries still need direct access or individually approved proxy/mirror paths.

See the official Docker documentation for registry mirrors and daemon proxy configuration.

Development

Development Requirements

golang
nodejs
docker
postgres
commitlint

Environment Setup

Backend Setup

Run once cd backend && go mod download to install needed packages.

For generating swagger files have to run

swag init -g ../../pkg/server/router.go -o pkg/server/docs/ --parseDependency --parseInternal --parseDepth 2 -d cmd/pentagi

before installing swag package via

go install github.com/swaggo/swag/cmd/swag@v1.8.7

For generating graphql resolver files have to run

go run github.com/99designs/gqlgen --config ./gqlgen/gqlgen.yml

after that you can see the generated files in pkg/graph folder.

For generating ORM methods (database package) from sqlc configuration

docker run --rm -v $(pwd):/src -w /src --network pentagi-network -e DATABASE_URL="{URL}" sqlc/sqlc:1.27.0 generate -f sqlc/sqlc.yml

For generating Langfuse SDK from OpenAPI specification

fern generate --local

and to install fern-cli

pnpm add -g fern-api

Testing

For running tests cd backend && go test -v ./...

Frontend Setup

Run once cd frontend && pnpm install to install needed packages.

For generating graphql files have to run pnpm run graphql:generate which using graphql-codegen.ts file.

Be sure that you have graphql-codegen installed globally:

pnpm add -g graphql-codegen

After that you can run:

pnpm run prettier to check if your code is formatted correctly
pnpm run prettier:fix to fix it
pnpm run lint to check if your code is linted correctly
pnpm run lint:fix to fix it

For generating SSL certificates you need to run pnpm run ssl:generate which using generate-ssl.ts file or it will be generated automatically when you run pnpm run dev.

Backend Configuration

Edit the configuration for backend in .vscode/launch.json file:

DATABASE_URL - PostgreSQL database URL (eg. postgres://postgres:postgres@localhost:5432/pentagidb?sslmode=disable)
DOCKER_HOST - Docker SDK API (eg. for macOS DOCKER_HOST=unix:///Users/<my-user>/Library/Containers/com.docker.docker/Data/docker.raw.sock) more info

Optional:

SERVER_PORT - Port to run the server (default: 8443)
SERVER_USE_SSL - Enable SSL for the server (default: false)

PostgreSQL / pgvector connection pool sizing

PentAGI opens two independent connection pools to the same Postgres instance:

PoolEnv varDefaultUsed byShared sql.DBDATABASE_MAX_OPEN_CONNS25All sqlc queries and GORM handlers share a single *sql.DBShared pgxpoolDATABASE_VECTOR_MAX_CONNS10All pgvector stores (agent memory + knowledge API) share a single pool

Additional tuning knob:

DATABASE_MAX_IDLE_CONNS — maximum idle connections kept open in the sql.DB pool between requests (default: 5).

Budget for the stock vxcontrol/pgvector image (max_connections = 100, superuser_reserved_connections = 3):

Available for client connections  = 97
  pentagi sql.DB  (DATABASE_MAX_OPEN_CONNS)   = 25
  pentagi pgxpool (DATABASE_VECTOR_MAX_CONNS) = 10
  pgexporter                                  =  3
  autovacuum workers                          =  3
  ─────────────────────────────────────────
  Total consumed                              = 41
  Free buffer                                 = 56  (≈ 58 %)

The defaults are sized for 10 parallel flows with concurrent API requests. If you run more flows or deploy multiple PentAGI instances against the same Postgres, raise max_connections via the command override in docker-compose.yml and increase the pool sizes proportionally:

pgvector:
  image: vxcontrol/pgvector:latest
  command: postgres -c max_connections=200

To inspect the live connection budget on a running deployment:

# Postgres limits
docker exec pgvector sh -c 'psql -U "$POSTGRES_USER" -d "$POSTGRES_DB" -c \
  "SELECT name, setting FROM pg_settings
   WHERE name IN ('"'"'max_connections'"'"', '"'"'superuser_reserved_connections'"'"');"'

# Current usage vs. available
docker exec pgvector sh -c 'psql -U "$POSTGRES_USER" -d "$POSTGRES_DB" -c \
  "SELECT max_conn, used, max_conn - used AS available
   FROM (SELECT current_setting('"'"'max_connections'"'"')::int AS max_conn,
                count(*) AS used FROM pg_stat_activity) t;"'

# Breakdown by client
docker exec pgvector sh -c 'psql -U "$POSTGRES_USER" -d "$POSTGRES_DB" -c \
  "SELECT application_name, client_addr, state, count(*)
   FROM pg_stat_activity
   WHERE pid <> pg_backend_pid()
   GROUP BY 1, 2, 3 ORDER BY count DESC;"'

Frontend Configuration

Edit the configuration for frontend in .vscode/launch.json file:

VITE_API_URL - Backend API URL. Omit the URL scheme (e.g., localhost:8080 NOT http://localhost:8080)
VITE_USE_HTTPS - Enable SSL for the server (default: false)
VITE_PORT - Port to run the server (default: 8000)
VITE_HOST - Host to run the server (default: 0.0.0.0)

Running the Application

Backend

Run the command(s) in backend folder:

Use .env file to set environment variables like a source .env
Run go run cmd/pentagi/main.go to start the server

[!NOTE] The first run can take a while as dependencies and docker images need to be downloaded to setup the backend environment.

Frontend

Run the command(s) in frontend folder:

Run pnpm install to install the dependencies
Run pnpm run dev to run the web app
Run pnpm run build to build the web app

Open your browser and visit the web app URL.

Testing LLM Agents

PentAGI includes a powerful utility called ctester for testing and validating LLM agent capabilities. This tool helps ensure your LLM provider configurations work correctly with different agent types, allowing you to optimize model selection for each specific agent role.

The utility features parallel testing of multiple agents, detailed reporting, and flexible configuration options.

Key Features

Parallel Testing: Tests multiple agents simultaneously for faster results
Comprehensive Test Suite: Evaluates basic completion, JSON responses, function calling, and penetration testing knowledge
Detailed Reporting: Generates markdown reports with success rates and performance metrics
Flexible Configuration: Test specific agents or test groups as needed
Specialized Test Groups: Includes domain-specific tests for cybersecurity and penetration testing scenarios

Usage Scenarios

For Developers (with local Go environment)

If you've cloned the repository and have Go installed:

# Default configuration with .env file
cd backend
go run cmd/ctester/*.go -verbose

# Custom provider configuration
go run cmd/ctester/*.go -config ../examples/configs/openrouter.provider.yml -verbose

# Generate a report file
go run cmd/ctester/*.go -config ../examples/configs/deepinfra.provider.yml -report ../test-report.md

# Test specific agent types only
go run cmd/ctester/*.go -agents simple,simple_json,primary_agent -verbose

# Test specific test groups only
go run cmd/ctester/*.go -groups basic,advanced -verbose

For Users (using Docker image)

If you prefer to use the pre-built Docker image without setting up a development environment:

# Using Docker to test with default environment
docker run --rm -v $(pwd)/.env:/opt/pentagi/.env vxcontrol/pentagi /opt/pentagi/bin/ctester -verbose

# Test with your custom provider configuration
docker run --rm \
  -v $(pwd)/.env:/opt/pentagi/.env \
  -v $(pwd)/my-config.yml:/opt/pentagi/config.yml \
  vxcontrol/pentagi /opt/pentagi/bin/ctester -config /opt/pentagi/config.yml -agents simple,primary_agent,coder -verbose

# Generate a detailed report
docker run --rm \
  -v $(pwd)/.env:/opt/pentagi/.env \
  -v $(pwd):/opt/pentagi/output \
  vxcontrol/pentagi /opt/pentagi/bin/ctester -report /opt/pentagi/output/report.md

Using Pre-configured Providers

The Docker image comes with built-in support for major providers (OpenAI, Anthropic, Gemini, Ollama) and pre-configured provider files for additional services (OpenRouter, DeepInfra, DeepSeek, Moonshot, Novita):

# Test with OpenRouter configuration
docker exec -it pentagi /opt/pentagi/bin/ctester -config /opt/pentagi/conf/openrouter.provider.yml

# Test with DeepInfra configuration
docker exec -it pentagi /opt/pentagi/bin/ctester -config /opt/pentagi/conf/deepinfra.provider.yml

# Test with DeepSeek configuration
docker exec -it pentagi /opt/pentagi/bin/ctester -provider deepseek

# Test with GLM configuration
docker exec -it pentagi /opt/pentagi/bin/ctester -provider glm

# Test with Kimi configuration
docker exec -it pentagi /opt/pentagi/bin/ctester -provider kimi

# Test with Qwen configuration
docker exec -it pentagi /opt/pentagi/bin/ctester -provider qwen

# Test with DeepSeek configuration file for custom provider
docker exec -it pentagi /opt/pentagi/bin/ctester -config /opt/pentagi/conf/deepseek.provider.yml

# Test with Moonshot configuration file for custom provider
docker exec -it pentagi /opt/pentagi/bin/ctester -config /opt/pentagi/conf/moonshot.provider.yml

# Test with Novita configuration
docker exec -it pentagi /opt/pentagi/bin/ctester -config /opt/pentagi/conf/novita.provider.yml

# Test with OpenAI configuration
docker exec -it pentagi /opt/pentagi/bin/ctester -type openai

# Test with Anthropic configuration
docker exec -it pentagi /opt/pentagi/bin/ctester -type anthropic

# Test with Gemini configuration
docker exec -it pentagi /opt/pentagi/bin/ctester -type gemini

# Test with AWS Bedrock configuration
docker exec -it pentagi /opt/pentagi/bin/ctester -type bedrock

# Test with Custom OpenAI configuration
docker exec -it pentagi /opt/pentagi/bin/ctester -config /opt/pentagi/conf/custom-openai.provider.yml

# Test with Ollama configuration (local inference)
docker exec -it pentagi /opt/pentagi/bin/ctester -config /opt/pentagi/conf/ollama-llama318b.provider.yml

# Test with Ollama Qwen3 32B configuration (requires custom model creation)
docker exec -it pentagi /opt/pentagi/bin/ctester -config /opt/pentagi/conf/ollama-qwen332b-fp16-tc.provider.yml

# Test with Ollama QwQ 32B configuration (requires custom model creation and 71.3GB VRAM)
docker exec -it pentagi /opt/pentagi/bin/ctester -config /opt/pentagi/conf/ollama-qwq32b-fp16-tc.provider.yml

To use these configurations, your .env file only needs to contain:

LLM_SERVER_URL=https://openrouter.ai/api/v1      # or https://api.deepinfra.com/v1/openai or https://api.openai.com/v1 or https://api.novita.ai/openai
LLM_SERVER_KEY=your_api_key
LLM_SERVER_MODEL=                                # Leave empty, as models are specified in the config
LLM_SERVER_CONFIG_PATH=/opt/pentagi/conf/openrouter.provider.yml  # or deepinfra.provider.ymll or custom-openai.provider.yml or novita.provider.yml
LLM_SERVER_PROVIDER=                             # Provider name for LiteLLM proxy (e.g., openrouter, deepseek, moonshot, novita)
LLM_SERVER_LEGACY_REASONING=false                # Controls reasoning format, for OpenAI must be true (default: false)
LLM_SERVER_PRESERVE_REASONING=false              # Preserve reasoning content in multi-turn conversations (required by Moonshot, default: false)

# For OpenAI (official API)
OPEN_AI_KEY=your_openai_api_key                  # Your OpenAI API key
OPEN_AI_SERVER_URL=https://api.openai.com/v1     # OpenAI API endpoint

# For Anthropic (Claude models)
ANTHROPIC_API_KEY=your_anthropic_api_key         # Your Anthropic API key
ANTHROPIC_SERVER_URL=https://api.anthropic.com/v1  # Anthropic API endpoint

# For Gemini (Google AI)
GEMINI_API_KEY=your_gemini_api_key               # Your Google AI API key
GEMINI_SERVER_URL=https://generativelanguage.googleapis.com  # Google AI API endpoint

# For AWS Bedrock (enterprise foundation models)
BEDROCK_REGION=us-east-1                         # AWS region for Bedrock service
# Authentication (choose one method, priority: DefaultAuth > BearerToken > AccessKey):
BEDROCK_DEFAULT_AUTH=false                       # Use AWS SDK credential chain (env vars, EC2 role, ~/.aws/credentials)
BEDROCK_BEARER_TOKEN=                            # Bearer token authentication (takes priority over static credentials)
BEDROCK_ACCESS_KEY_ID=your_aws_access_key        # AWS access key ID (static credentials)
BEDROCK_SECRET_ACCESS_KEY=your_aws_secret_key    # AWS secret access key (static credentials)
BEDROCK_SESSION_TOKEN=                           # AWS session token (optional, for temporary credentials with static auth)
BEDROCK_SERVER_URL=                              # Optional custom Bedrock endpoint (VPC endpoints, local testing)

# For Ollama (local server or cloud)
OLLAMA_SERVER_URL=                               # Local: http://ollama-server:11434, Cloud: https://ollama.com
OLLAMA_SERVER_API_KEY=                           # Required for Ollama Cloud (https://ollama.com/settings/keys), leave empty for local
OLLAMA_SERVER_MODEL=
OLLAMA_SERVER_CONFIG_PATH=
OLLAMA_SERVER_PULL_MODELS_TIMEOUT=
OLLAMA_SERVER_PULL_MODELS_ENABLED=
OLLAMA_SERVER_LOAD_MODELS_ENABLED=

# For DeepSeek (Chinese AI with strong reasoning)
DEEPSEEK_API_KEY=                                # DeepSeek API key
DEEPSEEK_SERVER_URL=https://api.deepseek.com     # DeepSeek API endpoint
DEEPSEEK_PROVIDER=                               # Optional: LiteLLM prefix (e.g., 'deepseek')

# For GLM (Zhipu AI)
GLM_API_KEY=                                     # GLM API key
GLM_SERVER_URL=https://api.z.ai/api/paas/v4      # GLM API endpoint (international)
GLM_PROVIDER=                                    # Optional: LiteLLM prefix (e.g., 'zai')

# For Kimi (Moonshot AI)
KIMI_API_KEY=                                    # Kimi API key
KIMI_SERVER_URL=https://api.moonshot.ai/v1       # Kimi API endpoint (international)
KIMI_PROVIDER=                                   # Optional: LiteLLM prefix (e.g., 'moonshot')

# For Qwen (Alibaba Cloud DashScope)
QWEN_API_KEY=                                    # Qwen API key
QWEN_SERVER_URL=https://dashscope-us.aliyuncs.com/compatible-mode/v1  # Qwen API endpoint (US)
QWEN_PROVIDER=                                   # Optional: LiteLLM prefix (e.g., 'dashscope')

# For Ollama (local inference) use variables above
OLLAMA_SERVER_URL=http://localhost:11434
OLLAMA_SERVER_MODEL=llama3.1:8b-instruct-q8_0
OLLAMA_SERVER_CONFIG_PATH=/opt/pentagi/conf/ollama-llama318b.provider.yml
OLLAMA_SERVER_PULL_MODELS_ENABLED=false
OLLAMA_SERVER_LOAD_MODELS_ENABLED=false

Using OpenAI with Unverified Organizations

For OpenAI accounts with unverified organizations that don't have access to the latest reasoning models (o1, o3, o4-mini), you need to use a custom configuration.

To use OpenAI with unverified organization accounts, configure your .env file as follows:

LLM_SERVER_URL=https://api.openai.com/v1
LLM_SERVER_KEY=your_openai_api_key
LLM_SERVER_MODEL=                                # Leave empty, models are specified in config
LLM_SERVER_CONFIG_PATH=/opt/pentagi/conf/custom-openai.provider.yml
LLM_SERVER_LEGACY_REASONING=true                 # Required for OpenAI reasoning format

This configuration uses the pre-built custom-openai.provider.yml file that maps all agent types to models available for unverified organizations, using o3-mini instead of models like o1, o3, and o4-mini.

You can test this configuration using:

# Test with custom OpenAI configuration for unverified accounts
docker exec -it pentagi /opt/pentagi/bin/ctester -config /opt/pentagi/conf/custom-openai.provider.yml

[!NOTE] The LLM_SERVER_LEGACY_REASONING=true setting is crucial for OpenAI compatibility as it ensures reasoning parameters are sent in the format expected by OpenAI's API.

Using LiteLLM Proxy

When using LiteLLM proxy to access various LLM providers, model names are prefixed with the provider name (e.g., moonshot/kimi-2.5 instead of kimi-2.5). To use the same provider configuration files with both direct API access and LiteLLM proxy, set the LLM_SERVER_PROVIDER variable:

# Direct access to Moonshot API
LLM_SERVER_URL=https://api.moonshot.ai/v1
LLM_SERVER_KEY=your_moonshot_api_key
LLM_SERVER_CONFIG_PATH=/opt/pentagi/conf/moonshot.provider.yml
LLM_SERVER_PROVIDER=                             # Empty for direct access

# Access via LiteLLM proxy
LLM_SERVER_URL=http://litellm-proxy:4000
LLM_SERVER_KEY=your_litellm_api_key
LLM_SERVER_CONFIG_PATH=/opt/pentagi/conf/moonshot.provider.yml
LLM_SERVER_PROVIDER=moonshot                     # Provider prefix for LiteLLM

With LLM_SERVER_PROVIDER=moonshot, the system automatically prefixes all model names from the configuration file with moonshot/, making them compatible with LiteLLM's model naming convention.

LiteLLM Provider Name Mapping:

When using LiteLLM proxy, set the corresponding *_PROVIDER variable to enable model prefixing:

deepseek - for DeepSeek models (DEEPSEEK_PROVIDER=deepseek → deepseek/deepseek-v4-flash)
zai - for GLM models (GLM_PROVIDER=zai → zai/glm-4)
moonshot - for Kimi models (KIMI_PROVIDER=moonshot → moonshot/kimi-k2.5)
dashscope - for Qwen models (QWEN_PROVIDER=dashscope → dashscope/qwen-plus)
openai, anthropic, gemini - for major cloud providers
openrouter - for OpenRouter aggregator
deepinfra - for DeepInfra hosting
novita - for Novita AI
Any other provider name configured in your LiteLLM instance

Example with LiteLLM:

# Use DeepSeek models via LiteLLM proxy with model prefixing
DEEPSEEK_API_KEY=your_litellm_proxy_key
DEEPSEEK_SERVER_URL=http://litellm-proxy:4000
DEEPSEEK_PROVIDER=deepseek  # Models become deepseek/deepseek-v4-flash, deepseek/deepseek-v4-pro for LiteLLM

# Direct DeepSeek API usage (no prefix needed)
DEEPSEEK_API_KEY=your_deepseek_api_key
DEEPSEEK_SERVER_URL=https://api.deepseek.com
# Leave DEEPSEEK_PROVIDER empty

This approach allows you to:

Use the same configuration files for both direct and proxied access
Switch between providers without modifying configuration files
Easily test different routing strategies with LiteLLM

Running Tests in a Production Environment

If you already have a running PentAGI container and want to test the current configuration:

# Run ctester in an existing container using current environment variables
docker exec -it pentagi /opt/pentagi/bin/ctester -verbose

# Test specific agent types with deterministic ordering
docker exec -it pentagi /opt/pentagi/bin/ctester -agents simple,primary_agent,pentester -groups basic,knowledge -verbose

# Generate a report file inside the container
docker exec -it pentagi /opt/pentagi/bin/ctester -report /opt/pentagi/data/agent-test-report.md

# Access the report from the host
docker cp pentagi:/opt/pentagi/data/agent-test-report.md ./

Command-line Options

The utility accepts several options:

-env <path> - Path to environment file (default: .env)
-type <provider> - Provider type: custom, openai, anthropic, ollama, bedrock, gemini (default: custom)
-config <path> - Path to custom provider config (default: from LLM_SERVER_CONFIG_PATH env variable)
-tests <path> - Path to custom tests YAML file (optional)
-report <path> - Path to write the report file (optional)
-agents <list> - Comma-separated list of agent types to test (default: all)
-groups <list> - Comma-separated list of test groups to run (default: all)
-verbose - Enable verbose output with detailed test results for each agent

Available Agent Types

Agents are tested in the following deterministic order:

simple - Basic completion tasks
simple_json - JSON-structured responses
primary_agent - Main reasoning agent
assistant - Interactive assistant mode
generator - Content generation
refiner - Content refinement and improvement
adviser - Expert advice and consultation
reflector - Self-reflection and analysis
searcher - Information gathering and search
enricher - Data enrichment and expansion
coder - Code generation and analysis
installer - Installation and setup tasks
pentester - Penetration testing and security assessment

Available Test Groups

basic - Fundamental completion and prompt response tests
advanced - Complex reasoning and function calling tests
json - JSON format validation and structure tests (specifically designed for simple_json agent)
knowledge - Domain-specific cybersecurity and penetration testing knowledge tests

Note: The json test group is specifically designed for the simple_json agent type, while all other agents are tested with basic, advanced, and knowledge groups. This specialization ensures optimal testing coverage for each agent's intended purpose.

Example Provider Configuration

Provider configuration defines which models to use for different agent types:

simple:
  model: "provider/model-name"
  temperature: 0.7
  top_p: 0.95
  n: 1
  max_tokens: 4000

simple_json:
  model: "provider/model-name"
  temperature: 0.7
  top_p: 1.0
  n: 1
  max_tokens: 4000
  json: true

# ... other agent types ...

Optimization Workflow

Create a baseline: Run tests with default configuration to establish benchmark performance
Analyze agent-specific performance: Review the deterministic agent ordering to identify underperforming agents
Test specialized configurations: Experiment with different models for each agent type using provider-specific configs
Focus on domain knowledge: Pay special attention to knowledge group tests for cybersecurity expertise
Validate function calling: Ensure tool-based tests pass consistently for critical agent types
Compare results: Look for the best success rate and performance across all test groups
Deploy optimal configuration: Use in production with your optimized setup

This tool helps ensure your AI agents are using the most effective models for their specific tasks, improving reliability while optimizing costs.

Embedding Configuration and Testing

PentAGI uses vector embeddings for semantic search, knowledge storage, and memory management. The system supports multiple embedding providers that can be configured according to your needs and preferences.

Supported Embedding Providers

PentAGI supports the following embedding providers:

OpenAI (default): Uses OpenAI's text embedding models
Ollama: Local embedding model through Ollama
Mistral: Mistral AI's embedding models
Jina: Jina AI's embedding service
HuggingFace: Models from HuggingFace
GoogleAI: Google's embedding models
VoyageAI: VoyageAI's embedding models

OpenAI-compatible third parties: any provider exposing OpenAI's /embeddings API can be plugged in via EMBEDDING_PROVIDER=openai with a custom EMBEDDING_URL. For example, Qwen DashScope offers text-embedding-v4 through the /compatible-mode/v1 endpoint (International and Chinese Mainland regions only — the US region does not expose embeddings). See the Qwen Alternative Integrations subsection for the full configuration snippet.

Environment Variables

To configure the embedding provider, set the following environment variables in your .env file:

# Primary embedding configuration
EMBEDDING_PROVIDER=openai       # Provider type (openai, ollama, mistral, jina, huggingface, googleai, voyageai)
EMBEDDING_MODEL=text-embedding-3-small  # Model name to use
EMBEDDING_URL=                  # Optional custom API endpoint
EMBEDDING_KEY=                  # API key for the provider (if required)
EMBEDDING_BATCH_SIZE=100        # Number of documents to process in a batch
EMBEDDING_STRIP_NEW_LINES=true  # Whether to remove new lines from text before embedding
EMBEDDING_MAX_TEXT_BYTES=8192   # Max bytes of text sent to embedding model per document (byte proxy for token limit)

# Advanced settings
PROXY_URL=                      # Optional proxy for all API calls
HTTP_CLIENT_TIMEOUT=600         # Timeout in seconds for external API calls (default: 600, 0 = no timeout)
TERMINAL_TOOL_TIMEOUT=1200      # Default timeout in seconds for terminal tool commands when timeout=0 or negative (range: 1–10800; values <= 0 or above 10800 are clamped to 10800 = 3 hours)

# SSL/TLS Certificate Configuration (for external communication with LLM backends and tool servers)
EXTERNAL_SSL_CA_PATH=           # Path to custom CA certificate file (PEM format) inside the container
                                # Must point to /opt/pentagi/ssl/ directory (e.g., /opt/pentagi/ssl/ca-bundle.pem)
EXTERNAL_SSL_INSECURE=false     # Skip certificate verification (use only for testing)

If you see this error: tls: failed to verify certificate: x509: certificate signed by unknown authority

Step 1: Get your CA certificate bundle in PEM format (can contain multiple certificates)

Step 2: Place the file in the SSL directory on your host machine:

# Default location (if PENTAGI_SSL_DIR is not set)
cp ca-bundle.pem ./pentagi-ssl/

# Or custom location (if using PENTAGI_SSL_DIR in docker-compose.yml)
cp ca-bundle.pem /path/to/your/ssl/dir/

Step 3: Set the path in .env file (path must be inside the container):

# The volume pentagi-ssl is mounted to /opt/pentagi/ssl inside the container
EXTERNAL_SSL_CA_PATH=/opt/pentagi/ssl/ca-bundle.pem
EXTERNAL_SSL_INSECURE=false

Step 4: Restart PentAGI:

docker compose restart pentagi

Notes:

The pentagi-ssl volume is mounted to /opt/pentagi/ssl inside the container
You can change host directory using PENTAGI_SSL_DIR variable in docker-compose.yml
File supports multiple certificates and intermediate CAs in one PEM file
Use EXTERNAL_SSL_INSECURE=true only for testing (not recommended for production)

Provider-Specific Limitations

Each provider has specific limitations and supported features:

OpenAI: Supports all configuration options
Ollama: Does not support EMBEDDING_KEY as it uses local models
Mistral: Does not support EMBEDDING_MODEL or custom HTTP client
Jina: Does not support custom HTTP client
HuggingFace: Requires EMBEDDING_KEY and supports all other options
GoogleAI: Does not support EMBEDDING_URL, requires EMBEDDING_KEY
VoyageAI: Supports all configuration options

If EMBEDDING_URL and EMBEDDING_KEY are not specified, the system will attempt to use the corresponding LLM provider settings (e.g., OPEN_AI_KEY when EMBEDDING_PROVIDER=openai).

Why Consistent Embedding Providers Matter

It's crucial to use the same embedding provider consistently because:

Vector Compatibility: Different providers produce vectors with different dimensions and mathematical properties
Semantic Consistency: Changing providers can break semantic similarity between previously embedded documents
Memory Corruption: Mixed embeddings can lead to poor search results and broken knowledge base functionality

If you change your embedding provider, you should flush and reindex your entire knowledge base (see etester utility below).

Embedding Tester Utility (etester)

PentAGI includes a specialized etester utility for testing, managing, and debugging embedding functionality. This tool is essential for diagnosing and resolving issues related to vector embeddings and knowledge storage.

# Test embedding provider and database connection
cd backend
go run cmd/etester/main.go test -verbose

# Show statistics about the embedding database
go run cmd/etester/main.go info

# Delete all documents from the embedding database (use with caution!)
go run cmd/etester/main.go flush

# Recalculate embeddings for all documents (after changing provider)
go run cmd/etester/main.go reindex

# Search for documents in the embedding database
go run cmd/etester/main.go search -query "How to install PostgreSQL" -limit 5

Using Docker

If you're running PentAGI in Docker, you can use etester from within the container:

# Test embedding provider
docker exec -it pentagi /opt/pentagi/bin/etester test

# Show detailed database information
docker exec -it pentagi /opt/pentagi/bin/etester info -verbose

Advanced Search Options

The search command supports various filters to narrow down results:

# Filter by document type
docker exec -it pentagi /opt/pentagi/bin/etester search -query "Security vulnerability" -doc_type guide -threshold 0.8

# Filter by flow ID
docker exec -it pentagi /opt/pentagi/bin/etester search -query "Code examples" -doc_type code -flow_id 42

# All available search options
docker exec -it pentagi /opt/pentagi/bin/etester search -help

Available search parameters:

-query STRING: Search query text (required)
-doc_type STRING: Filter by document type (answer, memory, guide, code)
-flow_id NUMBER: Filter by flow ID (positive number)
-answer_type STRING: Filter by answer type (guide, vulnerability, code, tool, other)
-guide_type STRING: Filter by guide type (install, configure, use, pentest, development, other)
-limit NUMBER: Maximum number of results (default: 3)
-threshold NUMBER: Similarity threshold (0.0-1.0, default: 0.7)

Memory Lifecycle Across Flows

PentAGI stores several kinds of vector documents, and they serve different purposes:

memory captures flow-specific execution history such as tool results and agent observations
guide, answer, and code are intended for reusable knowledge that can help future runs

If you want to inspect what happened in one engagement, search the vector store with the related flow_id. If you want knowledge to survive beyond a single run, store the durable result explicitly as a guide, answer, or code document instead of relying on execution memory alone.

For example, if a target has recurring setup notes, authentication quirks, or target-specific testing methodology, instruct the agent to save that information as a guide and search for it at the beginning of the next engagement. This is the safest current workflow when you want a new flow to start with reusable context.

Flow deletion removes the flow from normal queries through PentAGI's soft-delete mechanism, so reusable knowledge should be treated as a separate concern from per-flow execution history. If you enable the optional Graphiti knowledge graph described earlier in this README, treat its current search context as scoped to the active flow or engagement unless you explicitly build a separate cross-flow reuse workflow.

Common Troubleshooting Scenarios

After changing embedding provider: Always run flush or reindex to ensure consistency
Poor search results: Try adjusting the similarity threshold or check if embeddings are correctly generated
Database connection issues: Verify PostgreSQL is running with pgvector extension installed
Missing API keys: Check environment variables for your chosen embedding provider

Function Testing with ftester

PentAGI includes a versatile utility called ftester for debugging, testing, and developing specific functions and AI agent behaviors. While ctester focuses on testing LLM model capabilities, ftester allows you to directly invoke individual system functions and AI agent components with precise control over execution context.

Key Features

Direct Function Access: Test individual functions without running the entire system
Mock Mode: Test functions without a live PentAGI deployment using built-in mocks
Interactive Input: Fill function arguments interactively for exploratory testing
Detailed Output: Color-coded terminal output with formatted responses and errors
Context-Aware Testing: Debug AI agents within the context of specific flows, tasks, and subtasks
Observability Integration: All function calls are logged to Langfuse and Observability stack

Usage Modes

Command Line Arguments

Run ftester with specific function and arguments directly from the command line:

# Basic usage with mock mode
cd backend
go run cmd/ftester/main.go [function_name] -[arg1] [value1] -[arg2] [value2]

# Example: Test terminal command in mock mode
go run cmd/ftester/main.go terminal -command "ls -la" -message "List files"

# Using a real flow context
go run cmd/ftester/main.go -flow 123 terminal -command "whoami" -message "Check user"

# Testing AI agent in specific task/subtask context
go run cmd/ftester/main.go -flow 123 -task 456 -subtask 789 pentester -message "Find vulnerabilities"

Interactive Mode

Run ftester without arguments for a guided interactive experience:

# Start interactive mode
go run cmd/ftester/main.go [function_name]

# For example, to interactively fill browser tool arguments
go run cmd/ftester/main.go browser

Environment Functions

terminal: Execute commands in a container and return the output
file: Perform file operations (read, write, list) in a container

Search Functions

browser: Access websites and capture screenshots
google: Search the web using Google Custom Search
duckduckgo: Search the web using DuckDuckGo
tavily: Search using Tavily AI search engine
traversaal: Search using Traversaal AI search engine
perplexity: Search using Perplexity AI
sploitus: Search for security exploits, vulnerabilities (CVEs), and pentesting tools
searxng: Search using Searxng meta search engine (aggregates results from multiple engines)

Vector Database Functions

search_in_memory: Search for information in vector database
search_guide: Find guidance documents in vector database
search_answer: Find answers to questions in vector database
search_code: Find code examples in vector database

AI Agent Functions

advice: Get expert advice from an AI agent
coder: Request code generation or modification
maintenance: Run system maintenance tasks
memorist: Store and organize information in vector database
pentester: Perform security tests and vulnerability analysis
search: Complex search across multiple sources

Utility Functions

describe: Show information about flows, tasks, and subtasks

The describe function provides detailed information about tasks and subtasks within a flow. This is particularly useful for diagnosing issues when PentAGI encounters problems or gets stuck.

# List all flows in the system
go run cmd/ftester/main.go describe

# Show all tasks and subtasks for a specific flow
go run cmd/ftester/main.go -flow 123 describe

# Show detailed information for a specific task
go run cmd/ftester/main.go -flow 123 -task 456 describe

# Show detailed information for a specific subtask
go run cmd/ftester/main.go -flow 123 -task 456 -subtask 789 describe

# Show verbose output with full descriptions and results
go run cmd/ftester/main.go -flow 123 describe -verbose

This function allows you to identify the exact point where a flow might be stuck and resume processing by directly invoking the appropriate agent function.

Each function has a help mode that shows available parameters:

# Get help for a specific function
go run cmd/ftester/main.go [function_name] -help

# Examples:
go run cmd/ftester/main.go terminal -help
go run cmd/ftester/main.go browser -help
go run cmd/ftester/main.go describe -help

You can also run ftester without arguments to see a list of all available functions:

go run cmd/ftester/main.go

The ftester utility uses color-coded output to make interpretation easier:

Blue headers: Section titles and key names
Cyan [INFO]: General information messages
Green [SUCCESS]: Successful operations
Red [ERROR]: Error messages
Yellow [WARNING]: Warning messages
Yellow [MOCK]: Indicates mock mode operation
Magenta values: Function arguments and results

JSON and Markdown responses are automatically formatted for readability.

Debugging Stuck AI Flows

When PentAGI gets stuck in a flow:

Pause the flow through the UI
Use describe to identify the current task and subtask
Directly invoke the agent function with the same task/subtask IDs
Examine the detailed output to identify the issue
Resume the flow or manually intervene as needed

Testing Environment Variables

Verify that API keys and external services are configured correctly:

# Test Google search API configuration
go run cmd/ftester/main.go google -query "pentesting tools"

# Test browser access to external websites
go run cmd/ftester/main.go browser -url "https://example.com"

Developing New AI Agent Behaviors

When developing new prompt templates or agent behaviors:

Create a test flow in the UI
Use ftester to directly invoke the agent with different prompts
Observe responses and adjust prompts accordingly
Check Langfuse for detailed traces of all function calls

Verifying Docker Container Setup

Ensure containers are properly configured:

go run cmd/ftester/main.go -flow 123 terminal -command "env | grep -i proxy" -message "Check proxy settings"

If you have PentAGI running in Docker, you can use ftester from within the container:

# Run ftester inside the running PentAGI container
docker exec -it pentagi /opt/pentagi/bin/ftester [arguments]

# Examples:
docker exec -it pentagi /opt/pentagi/bin/ftester -flow 123 describe
docker exec -it pentagi /opt/pentagi/bin/ftester -flow 123 terminal -command "ps aux" -message "List processes"

This is particularly useful for production deployments where you don't have a local development environment.

All function calls made through ftester are logged to:

Langfuse: Captures the entire AI agent interaction chain, including prompts, responses, and function calls
OpenTelemetry: Records metrics, traces, and logs for system performance analysis
Terminal Output: Provides immediate feedback on function execution

To access detailed logs:

Check Langfuse UI for AI agent traces (typically at http://localhost:4000)
Use Grafana dashboards for system metrics (typically at http://localhost:3000)
Examine terminal output for immediate function results and errors

Command-line Options

The main utility accepts several options:

-env <path> - Path to environment file (optional, default: .env)
-provider <type> - Provider type to use (default: custom, options: openai, anthropic, ollama, bedrock, gemini, custom)
-flow <id> - Flow ID for testing (0 means using mocks, default: 0)
-task <id> - Task ID for agent context (optional)
-subtask <id> - Subtask ID for agent context (optional)

Function-specific arguments are passed after the function name using -name value format.

Pentesting Prompt Methodology

When refining prompts for offensive security work, give the agent a clear methodology instead of a flat list of payloads:

Start with explicit scope, authorization, and success criteria
Map the application first: roles, routes, parameters, uploads, integrations, and trust boundaries
Prioritize attack surfaces systematically instead of testing everything at once
Validate findings with reproducible evidence before escalating to deeper exploitation
Finish with report-ready notes that capture impact, prerequisites, and next steps

For PentAGI-specific prompt guidance, see backend/docs/prompt_engineering_pentagi.md. For a practical starting point, reuse and adapt examples/prompts/base_web_pentest.md to match the target application, technology stack, and engagement scope.

Building

Building Docker Image

The Docker build process automatically embeds version information from git tags. To properly version your build, use the provided scripts:

Linux/macOS

# Load version variables
source ./scripts/version.sh

# Standard build
docker build \
  --build-arg PACKAGE_VER=$PACKAGE_VER \
  --build-arg PACKAGE_REV=$PACKAGE_REV \
  -t pentagi:$PACKAGE_VER .

# Multi-platform build
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --build-arg PACKAGE_VER=$PACKAGE_VER \
  --build-arg PACKAGE_REV=$PACKAGE_REV \
  -t pentagi:$PACKAGE_VER .

# Build and push
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --build-arg PACKAGE_VER=$PACKAGE_VER \
  --build-arg PACKAGE_REV=$PACKAGE_REV \
  -t myregistry/pentagi:$PACKAGE_VER \
  --push .

Windows (PowerShell)

# Load version variables
. .\scripts\version.ps1

# Standard build
docker build `
  --build-arg PACKAGE_VER=$env:PACKAGE_VER `
  --build-arg PACKAGE_REV=$env:PACKAGE_REV `
  -t pentagi:$env:PACKAGE_VER .

# Multi-platform build
docker buildx build `
  --platform linux/amd64,linux/arm64 `
  --build-arg PACKAGE_VER=$env:PACKAGE_VER `
  --build-arg PACKAGE_REV=$env:PACKAGE_REV `
  -t pentagi:$env:PACKAGE_VER .

Quick build without version

For development builds without version tracking:

docker build -t pentagi:dev .

[!NOTE]
The build scripts automatically determine version from git tags
Release builds (on tag commit) have no revision suffix
Development builds (after tag) include commit hash as revision (e.g., 1.1.0-bc6e800)
To use the built image locally, update the image name in docker-compose.yml or use the build option

Credits

This project is made possible thanks to the following research and developments:

Emerging Architectures for LLM Applications
A Survey of Autonomous LLM Agents
Codel by Andriy Semenets - initial architectural inspiration for agent-based automation

License

PentAGI is licensed under the MIT License.

Third-Party Dependencies

All third-party dependencies use MIT-compatible licenses. See licenses/ directory for detailed license reports.

VXControl Cloud Services

⚠️ Note: While the VXControl Cloud SDK code is MIT licensed, accessing VXControl Cloud Services (threat intelligence, AI support, premium features) requires a separate License Key and compliance with Terms of Service.

The SDK code itself is free to use - service access requires registration.

For questions contact: info@pentagi.com or info@vxcontrol.com

View on GitHub

On this page

PentAGI
Table of Contents
Overview
Features
Current Capability Boundaries
Architecture
System Context
Global Summarizer Configuration Options
Assistant Summarizer Configuration Options
Summarizer Environment Configuration
Execution Monitoring (Beta)
Intelligent Task Planning (Beta)
Tool Call Limits (Always Active)
Reflector Integration (Always Active)
Recommendations for Open Source Models
Quick Start
System Requirements
Using Installer (Recommended)
Current Web Settings Coverage
Still Server-Managed
Manual Installation
Main Security Settings
Scraper Access
Access Credentials
Web UI Accounts
Accessing PentAGI from External Networks
Configuration Steps
Running PentAGI with Podman
Podman Rootless Configuration
Podman Rootful Mode
Docker Compatibility
Assistant Configuration
How to Use PentAGI After Login
1. Create your first flow
2. Use templates for repeatable workflows
3. Monitor execution and review output
4. Use the Assistant view to steer an active flow
5. Manage flow-scoped files
API Access
Generating API Tokens
Using API Tokens
API Exploration and Testing
GraphQL Playground
Swagger UI
Generating API Clients
GraphQL Clients
REST API Clients
API Usage Examples
Security Best Practices
Token Management
Custom LLM Provider Configuration
Ollama Provider Configuration
Configuration Variables
Ollama Cloud Configuration
Local Ollama Configuration
Creating Custom Ollama Models with Extended Context
Example: Qwen3 32B FP16 with Extended Context
Example: QwQ 32B FP16 with Extended Context
OpenAI Provider Configuration
Configuration Variables
Configuration Examples
Supported Models
Anthropic Provider Configuration
Configuration Variables
Configuration Examples
Supported Models
Google AI (Gemini) Provider Configuration
Configuration Variables
Configuration Examples
Supported Models
AWS Bedrock Provider Configuration
Configuration Variables
Configuration Examples
Supported Models
Tested but Incompatible Models
DeepSeek Provider Configuration
Configuration Variables
Configuration Examples
Supported Models
GLM Provider Configuration
Configuration Variables
Configuration Examples
Supported Models
Kimi Provider Configuration
Configuration Variables
Configuration Examples
Supported Models
Qwen Provider Configuration
Configuration Variables
Configuration Examples
Supported Models
Alternative Integrations
Advanced Setup
Langfuse Integration
Database Credentials
Encryption and Security Keys
Admin Credentials
API Keys and Tokens
S3 Storage
Monitoring and Observability
Knowledge Graph Integration (Graphiti)
What is Graphiti?
Enabling Graphiti
What Gets Stored
Current Limitations
GitHub and Google OAuth Integration
Docker Image Configuration
Restricted Networks, Docker Mirrors, and Proxies
Development
Development Requirements
Environment Setup
Backend Setup
Testing
Frontend Setup
Backend Configuration
PostgreSQL / pgvector connection pool sizing
Frontend Configuration
Running the Application
Backend
Frontend
Testing LLM Agents
Key Features
Usage Scenarios
For Developers (with local Go environment)
For Users (using Docker image)
Using Pre-configured Providers
Using OpenAI with Unverified Organizations
Using LiteLLM Proxy
Running Tests in a Production Environment
Command-line Options
Available Agent Types
Available Test Groups
Example Provider Configuration
Optimization Workflow
Embedding Configuration and Testing
Supported Embedding Providers
Environment Variables
Provider-Specific Limitations
Why Consistent Embedding Providers Matter
Embedding Tester Utility (etester)
Using Docker
Advanced Search Options
Memory Lifecycle Across Flows
Common Troubleshooting Scenarios
Function Testing with ftester
Key Features
Usage Modes
Command Line Arguments
Interactive Mode
Environment Functions
Search Functions
Vector Database Functions
AI Agent Functions
Utility Functions
Debugging Stuck AI Flows
Testing Environment Variables
Developing New AI Agent Behaviors
Verifying Docker Container Setup
Command-line Options
Pentesting Prompt Methodology
Building
Building Docker Image
Linux/macOS
Windows (PowerShell)
Quick build without version
Credits
License
Third-Party Dependencies
VXControl Cloud Services

pentagi

Go MIT★ 17,829

100

Health 100/100

Baseline50
Commit recency+25
Popularity+20
Release cadence+10
Open-issue load+5

Raw .md

Embed badge

[![health](https://opensources.dev/resource/pentagi/badge.svg)](https://opensources.dev/resource/pentagi)

MITPermissive — free to use in commercial and proprietary software, with attribution.View license →

Production readiness

4/5

Actively maintainedCommits in the last 6 months
No known vulnerabilitiesNot yet scanned
Clear, usable licenseMIT (permissive)
Proven adoptionWidely used
Has documentationDocumentation indexed

Install

go get pentagi

MaintenanceActively maintainedIs pentagi still maintained? →

Documentation from pentagi, shown under MIT with attribution. Source: https://pentagi.com

PentAGI

Join the Community! Connect with security researchers, AI enthusiasts, and fellow ethical hackers. Get support, share insights, and stay updated with the latest PentAGI developments.

⠀

Overview
Features
Architecture
- Agent Supervision
Quick Start
How to Use PentAGI After Login
API Access
- LLM Provider Configuration
Advanced Setup
Development
Testing LLM Agents
Embedding Configuration and Testing
Function Testing with ftester
Building
Credits
License

Overview

You can watch the video PentAGI overview:

Features

Secure & Isolated. All operations are performed in a sandboxed Docker environment with complete isolation.
Fully Autonomous. AI-powered agent that automatically determines and executes penetration testing steps with optional execution monitoring and intelligent task planning for enhanced reliability.
Professional Pentesting Tools. Built-in suite of 20+ professional security tools including nmap, metasploit, sqlmap, and more.
Smart Memory System. Long-term storage of research results and successful approaches for future use.
Knowledge Graph Integration. Graphiti-powered knowledge graph using Neo4j for semantic relationship tracking and advanced context understanding.
Web Intelligence. Built-in browser via scraper for gathering latest information from web sources.
External Search Systems. Integration with advanced search APIs including Tavily, Traversaal, Perplexity, DuckDuckGo, Google Custom Search, Sploitus Search and Searxng for comprehensive information gathering.
Team of Specialists. Delegation system with specialized AI agents for research, development, and infrastructure tasks, enhanced with optional execution monitoring and intelligent task planning for optimal performance with smaller models.
Comprehensive Monitoring. Detailed logging and integration with Grafana/Prometheus for real-time system observation.
Detailed Reporting. Generation of thorough vulnerability reports with exploitation guides.
Smart Container Management. Automatic Docker image selection based on specific task requirements.
Modern Interface. Clean and intuitive web UI for system management and monitoring.
Comprehensive APIs. Full-featured REST and GraphQL APIs with Bearer token authentication for automation and integration.
Persistent Storage. All commands and outputs are stored in PostgreSQL with pgvector extension.
Scalable Architecture. Microservices-based design supporting horizontal scaling.
Self-Hosted Solution. Complete control over your deployment and data.
Flexible Authentication. Support for 10+ LLM providers (OpenAI, Anthropic, Google AI/Gemini, AWS Bedrock, Ollama, DeepSeek, GLM, Kimi, Qwen, Custom) plus aggregators (OpenRouter, DeepInfra). For production local deployments, see our vLLM + Qwen3.5-27B-FP8 guide.
API Token Authentication. Secure Bearer token system for programmatic access to REST and GraphQL APIs.
Quick Deployment. Easy setup through Docker Compose with comprehensive environment configuration.

Current Capability Boundaries

PentAGI today is an autonomous and assistant-guided penetration testing platform, not a CALDERA-style Breach and Attack Simulation (BAS) or adversary emulation product with predefined campaigns or attack plans.
BAS-like agent-authored attack scripts should be treated as conceptual or future work, not as a feature that is implemented today.
The current flow report UI supports web view, copy to clipboard, Markdown download, and PDF download. JSON flow-report export is not documented as a supported output format today.
Provider flexibility is available today through built-in providers and custom/OpenAI-compatible endpoints. See Custom LLM Provider Configuration and the vLLM + Qwen3.5-27B-FP8 guide.

Architecture

System Context

flowchart TB
    classDef person fill:#08427B,stroke:#073B6F,color:#fff
    classDef system fill:#1168BD,stroke:#0B4884,color:#fff
    classDef external fill:#666666,stroke:#0B4884,color:#fff

    pentester["👤 Security Engineer
    (User of the system)"]

    pentagi["✨ PentAGI
    (Autonomous penetration testing system)"]

    target["🎯 target-system
    (System under test)"]
    llm["🧠 llm-provider
    (OpenAI/Anthropic/Ollama/Bedrock/Gemini/Custom)"]
    search["🔍 search-systems
    (Google/DuckDuckGo/Tavily/Traversaal/Perplexity/Sploitus/Searxng)"]
    langfuse["📊 langfuse-ui
    (LLM Observability Dashboard)"]
    grafana["📈 grafana
    (System Monitoring Dashboard)"]

    pentester --> |Uses HTTPS| pentagi
    pentester --> |Monitors AI HTTPS| langfuse
    pentester --> |Monitors System HTTPS| grafana
    pentagi --> |Tests Various protocols| target
    pentagi --> |Queries HTTPS| llm
    pentagi --> |Searches HTTPS| search
    pentagi --> |Reports HTTPS| langfuse
    pentagi --> |Reports HTTPS| grafana

    class pentester person
    class pentagi system
    class target,llm,search,langfuse,grafana external

    linkStyle default stroke:#ffffff,color:#ffffff

graph TB
    subgraph Core Services
        UI[Frontend UI<br/>React + TypeScript]
        API[Backend API<br/>Go + GraphQL]
        DB[(Vector Store<br/>PostgreSQL + pgvector)]
        MQ[Task Queue<br/>Async Processing]
        Agent[AI Agents<br/>Multi-Agent System]
    end

    subgraph Knowledge Graph
        Graphiti[Graphiti<br/>Knowledge Graph API]
        Neo4j[(Neo4j<br/>Graph Database)]
    end

    subgraph Monitoring
        Grafana[Grafana<br/>Dashboards]
        VictoriaMetrics[VictoriaMetrics<br/>Time-series DB]
        Jaeger[Jaeger<br/>Distributed Tracing]
        Loki[Loki<br/>Log Aggregation]
        OTEL[OpenTelemetry<br/>Data Collection]
    end

    subgraph Analytics
        Langfuse[Langfuse<br/>LLM Analytics]
        ClickHouse[ClickHouse<br/>Analytics DB]
        Redis[Redis<br/>Cache + Rate Limiter]
        MinIO[MinIO<br/>S3 Storage]
    end

    subgraph Security Tools
        Scraper[Web Scraper<br/>Isolated Browser]
        PenTest[Security Tools<br/>20+ Pro Tools<br/>Sandboxed Execution]
    end

    UI --> |HTTP/WS| API
    API --> |SQL| DB
    API --> |Events| MQ
    MQ --> |Tasks| Agent
    Agent --> |Commands| PenTest
    Agent --> |Queries| DB
    Agent --> |Knowledge| Graphiti
    Graphiti --> |Graph| Neo4j

    API --> |Telemetry| OTEL
    OTEL --> |Metrics| VictoriaMetrics
    OTEL --> |Traces| Jaeger
    OTEL --> |Logs| Loki

    Grafana --> |Query| VictoriaMetrics
    Grafana --> |Query| Jaeger
    Grafana --> |Query| Loki

    API --> |Analytics| Langfuse
    Langfuse --> |Store| ClickHouse
    Langfuse --> |Cache| Redis
    Langfuse --> |Files| MinIO

    classDef core fill:#f9f,stroke:#333,stroke-width:2px,color:#000
    classDef knowledge fill:#ffa,stroke:#333,stroke-width:2px,color:#000
    classDef monitoring fill:#bbf,stroke:#333,stroke-width:2px,color:#000
    classDef analytics fill:#bfb,stroke:#333,stroke-width:2px,color:#000
    classDef tools fill:#fbb,stroke:#333,stroke-width:2px,color:#000

    class UI,API,DB,MQ,Agent core
    class Graphiti,Neo4j knowledge
    class Grafana,VictoriaMetrics,Jaeger,Loki,OTEL monitoring
    class Langfuse,ClickHouse,Redis,MinIO analytics
    class Scraper,PenTest tools

erDiagram
    Flow ||--o{ Task : contains
    Task ||--o{ SubTask : contains
    SubTask ||--o{ Action : contains
    Action ||--o{ Artifact : produces
    Action ||--o{ Memory : stores

    Flow {
        string id PK
        string name "Flow name"
        string description "Flow description"
        string status "active/completed/failed"
        json parameters "Flow parameters"
        timestamp created_at
        timestamp updated_at
    }

    Task {
        string id PK
        string flow_id FK
        string name "Task name"
        string description "Task description"
        string status "pending/running/done/failed"
        json result "Task results"
        timestamp created_at
        timestamp updated_at
    }

    SubTask {
        string id PK
        string task_id FK
        string name "Subtask name"
        string description "Subtask description"
        string status "queued/running/completed/failed"
        string agent_type "researcher/developer/executor"
        json context "Agent context"
        timestamp created_at
        timestamp updated_at
    }

    Action {
        string id PK
        string subtask_id FK
        string type "command/search/analyze/etc"
        string status "success/failure"
        json parameters "Action parameters"
        json result "Action results"
        timestamp created_at
    }

    Artifact {
        string id PK
        string action_id FK
        string type "file/report/log"
        string path "Storage path"
        json metadata "Additional info"
        timestamp created_at
    }

    Memory {
        string id PK
        string action_id FK
        string type "observation/conclusion"
        vector embedding "Vector representation"
        text content "Memory content"
        timestamp created_at
    }

sequenceDiagram
    participant O as Orchestrator
    participant R as Researcher
    participant D as Developer
    participant E as Executor
    participant VS as Vector Store
    participant KB as Knowledge Base

    Note over O,KB: Flow Initialization
    O->>VS: Query similar tasks
    VS-->>O: Return experiences
    O->>KB: Load relevant knowledge
    KB-->>O: Return context

    Note over O,R: Research Phase
    O->>R: Analyze target
    R->>VS: Search similar cases
    VS-->>R: Return patterns
    R->>KB: Query vulnerabilities
    KB-->>R: Return known issues
    R->>VS: Store findings
    R-->>O: Research results

    Note over O,D: Planning Phase
    O->>D: Plan attack
    D->>VS: Query exploits
    VS-->>D: Return techniques
    D->>KB: Load tools info
    KB-->>D: Return capabilities
    D-->>O: Attack plan

    Note over O,E: Execution Phase
    O->>E: Execute plan
    E->>KB: Load tool guides
    KB-->>E: Return procedures
    E->>VS: Store results
    E-->>O: Execution status

graph TB
    subgraph "Long-term Memory"
        VS[(Vector Store<br/>Embeddings DB)]
        KB[Knowledge Base<br/>Domain Expertise]
        Tools[Tools Knowledge<br/>Usage Patterns]
    end

    subgraph "Working Memory"
        Context[Current Context<br/>Task State]
        Goals[Active Goals<br/>Objectives]
        State[System State<br/>Resources]
    end

    subgraph "Episodic Memory"
        Actions[Past Actions<br/>Commands History]
        Results[Action Results<br/>Outcomes]
        Patterns[Success Patterns<br/>Best Practices]
    end

    Context --> |Query| VS
    VS --> |Retrieve| Context

    Goals --> |Consult| KB
    KB --> |Guide| Goals

    State --> |Record| Actions
    Actions --> |Learn| Patterns
    Patterns --> |Store| VS

    Tools --> |Inform| State
    Results --> |Update| Tools

    VS --> |Enhance| KB
    KB --> |Index| VS

    classDef ltm fill:#f9f,stroke:#333,stroke-width:2px,color:#000
    classDef wm fill:#bbf,stroke:#333,stroke-width:2px,color:#000
    classDef em fill:#bfb,stroke:#333,stroke-width:2px,color:#000

    class VS,KB,Tools ltm
    class Context,Goals,State wm
    class Actions,Results,Patterns em

flowchart TD
    A[Input Chain] --> B{Needs Summarization?}
    B -->|No| C[Return Original Chain]
    B -->|Yes| D[Convert to ChainAST]
    D --> E[Apply Section Summarization]
    E --> F[Process Oversized Pairs]
    F --> G[Manage Last Section Size]
    G --> H[Apply QA Summarization]
    H --> I[Rebuild Chain with Summaries]
    I --> J{Is New Chain Smaller?}
    J -->|Yes| K[Return Optimized Chain]
    J -->|No| C

    classDef process fill:#bbf,stroke:#333,stroke-width:2px,color:#000
    classDef decision fill:#bfb,stroke:#333,stroke-width:2px,color:#000
    classDef output fill:#fbb,stroke:#333,stroke-width:2px,color:#000

    class A,D,E,F,G,H,I process
    class B,J decision
    class C,K output

Global Summarizer Configuration Options

Assistant Summarizer Configuration Options

Assistant instances can use customized summarization settings to fine-tune context management behavior:

Summarizer Environment Configuration

# Default values for global summarizer logic
SUMMARIZER_PRESERVE_LAST=true
SUMMARIZER_USE_QA=true
SUMMARIZER_SUM_MSG_HUMAN_IN_QA=false
SUMMARIZER_LAST_SEC_BYTES=51200
SUMMARIZER_MAX_BP_BYTES=16384
SUMMARIZER_MAX_QA_SECTIONS=10
SUMMARIZER_MAX_QA_BYTES=65536
SUMMARIZER_KEEP_QA_SECTIONS=1

# Default values for assistant summarizer logic
ASSISTANT_SUMMARIZER_PRESERVE_LAST=true
ASSISTANT_SUMMARIZER_LAST_SEC_BYTES=76800
ASSISTANT_SUMMARIZER_MAX_BP_BYTES=16384
ASSISTANT_SUMMARIZER_MAX_QA_SECTIONS=7
ASSISTANT_SUMMARIZER_MAX_QA_BYTES=76800
ASSISTANT_SUMMARIZER_KEEP_QA_SECTIONS=3

PentAGI includes sophisticated multi-layered agent supervision mechanisms to ensure efficient task execution, prevent infinite loops, and provide intelligent recovery from stuck states:

Execution Monitoring (Beta)

Automatic Mentor Intervention: Adviser agent (mentor) is automatically invoked when execution patterns indicate potential issues
Pattern Detection: Monitors identical tool calls (threshold: 5, configurable) and total tool calls (threshold: 10, configurable)
Progress Analysis: Evaluates whether agent advances toward subtask objective, detects loops and inefficiencies
Alternative Strategies: Recommends different approaches when current strategy fails
Information Retrieval Guidance: Suggests searching for established solutions instead of reinventing
Enhanced Response Format: Tool responses include both <original_result> and <mentor_analysis> sections
Configurable: Enable via EXECUTION_MONITOR_ENABLED (default: false), customize thresholds with EXECUTION_MONITOR_SAME_TOOL_LIMIT and EXECUTION_MONITOR_TOTAL_TOOL_LIMIT

Best for: Smaller models (< 32B parameters), complex attack scenarios requiring continuous guidance, preventing agents from getting stuck on single approach

Performance Impact: 2-3x increase in execution time and token usage, but delivers 2x improvement in result quality based on testing with Qwen3.5-27B-FP8

Intelligent Task Planning (Beta)

Automated Decomposition: Planner (adviser in planning mode) generates 3-7 specific, actionable steps before specialist agents begin work
Context-Aware Plans: Analyzes full execution context via enricher agent to create informed plans
Structured Assignment: Original request wrapped in <task_assignment> structure with execution plan and instructions
Scope Management: Prevents scope creep by keeping agents focused on current subtask only
Enriched Instructions: Plans highlight critical actions, potential pitfalls, and verification points
Configurable: Enable via AGENT_PLANNING_STEP_ENABLED (default: false)

Best for: Models < 32B parameters, complex penetration testing workflows, improving success rates on sophisticated tasks

Performance Impact: Adds planning overhead but significantly improves completion rates and reduces redundant work

Tool Call Limits (Always Active)

Hard Limits: Prevent runaway executions regardless of supervision mode status
Differentiated by Agent Type:
- General agents (Assistant, Primary Agent, Pentester, Coder, Installer): MAX_GENERAL_AGENT_TOOL_CALLS (default: 100)
- Limited agents (Searcher, Enricher, Memorist, Generator, Reporter, Adviser, Reflector, Planner): MAX_LIMITED_AGENT_TOOL_CALLS (default: 20)
Graceful Termination: Reflector guides agents to proper completion when approaching limits
Resource Protection: Ensures system stability and prevents resource exhaustion

Reflector Integration (Always Active)

Automatic Correction: Invoked when LLM fails to generate tool calls after 3 attempts
Strategic Guidance: Analyzes failures and guides agents toward proper tool usage or barrier tools (done, ask)
Recovery Mechanism: Provides contextual guidance based on specific failure patterns
Limit Enforcement: Coordinates graceful termination when tool call limits are reached

Recommendations for Open Source Models

Must-Have for Models < 32B Parameters: Testing with Qwen3.5-27B-FP8 demonstrates that enabling both Execution Monitoring and Task Planning is essential for smaller open source models:

Quality Improvement: 2x better results compared to baseline execution without supervision
Loop Prevention: Significantly reduces infinite loops and redundant work
Attack Diversity: Encourages exploration of multiple attack vectors instead of fixating on single approach
Air-Gapped Deployments: Enables production-grade autonomous pentesting in closed network environments with local LLM inference

Trade-offs:

Token consumption: 2-3x increase due to mentor/planner invocations
Execution time: 2-3x longer due to analysis and planning steps
Result quality: 2x improvement in completeness, accuracy, and attack coverage
Model requirements: Works best when adviser uses enhanced configuration (higher reasoning parameters, stronger model variant, or different model)

Configuration Strategy: For optimal performance with smaller models, configure adviser agent with enhanced settings:

Use same model with maximum reasoning mode (example: vllm-qwen3.5-27b-fp8.provider.yml)
Or use stronger model for adviser while keeping base model for other agents
Adjust monitoring thresholds based on task complexity and model capabilities

The architecture of PentAGI is designed to be modular, scalable, and secure. Here are the key components:

Core Services
- Frontend UI: React-based web interface with TypeScript for type safety
- Backend API: Go-based REST and GraphQL APIs with Bearer token authentication for programmatic access
- Vector Store: PostgreSQL with pgvector for semantic search and memory storage
- Task Queue: Async task processing system for reliable operation
- AI Agent: Multi-agent system with specialized roles for efficient testing
Knowledge Graph
- Graphiti: Knowledge graph API for semantic relationship tracking and contextual understanding
- Neo4j: Graph database for storing and querying relationships between entities, actions, and outcomes
- Automatic capturing of agent responses and tool executions for building comprehensive knowledge base
Monitoring Stack
- OpenTelemetry: Unified observability data collection and correlation
- Grafana: Real-time visualization and alerting dashboards
- VictoriaMetrics: High-performance time-series metrics storage
- Jaeger: End-to-end distributed tracing for debugging
- Loki: Scalable log aggregation and analysis
Analytics Platform
- Langfuse: Advanced LLM observability and performance analytics
- ClickHouse: Column-oriented analytics data warehouse
- Redis: High-speed caching and rate limiting
- MinIO: S3-compatible object storage for artifacts
Security Tools
- Web Scraper: Isolated browser environment for safe web interaction
- Pentesting Tools: Comprehensive suite of 20+ professional security tools
- Sandboxed Execution: All operations run in isolated containers
Memory Systems
- Long-term Memory: Persistent storage of knowledge and experiences
- Working Memory: Active context and goals for current operations
- Episodic Memory: Historical actions and success patterns
- Knowledge Base: Structured domain expertise and tool capabilities
- Context Management: Intelligently manages growing LLM context windows using chain summarization

Quick Start

System Requirements

Docker and Docker Compose (or Podman - see Podman configuration)
Minimum 2 vCPU
Minimum 4GB RAM
20GB free disk space
Internet access for downloading images and updates

Using Installer (Recommended)

Supported Platforms:

Linux: amd64 download | arm64 download
Windows: amd64 download
macOS: amd64 (Intel) download | arm64 (M-series) download

Quick Installation (Linux amd64):

# Create installation directory
mkdir -p pentagi && cd pentagi

# Download installer
wget -O installer.zip https://pentagi.com/downloads/linux/amd64/installer-latest.zip

# Extract
unzip installer.zip

# Run interactive installer
./installer

Prerequisites & Permissions:

The installer requires appropriate privileges to interact with the Docker API for proper operation. By default, it uses the Docker socket (/var/run/docker.sock) which requires either:

Option 1 (Recommended for production): Run the installer as root:
```
sudo ./installer
```
Option 2 (Development environments): Grant your user access to the Docker socket by adding them to the docker group:
```
# Add your user to the docker group
sudo usermod -aG docker $USER

# Log out and log back in, or activate the group immediately
newgrp docker

# Verify Docker access (should run without sudo)
docker ps
```
⚠️ Security Note: Adding a user to the docker group grants root-equivalent privileges. Only do this for trusted users in controlled environments. For production deployments, consider using rootless Docker mode or running the installer with sudo.

The installer will:

System Checks: Verify Docker, network connectivity, and system requirements
Environment Setup: Create and configure .env file with optimal defaults
Provider Configuration: Set up LLM providers (OpenAI, Anthropic, Gemini, Bedrock, Ollama, Custom)
Search Engines: Configure DuckDuckGo, Google, Tavily, Traversaal, Perplexity, Sploitus, Searxng
Security Hardening: Generate secure credentials and configure SSL certificates
Deployment: Start PentAGI with docker-compose

Current Web Settings Coverage

The PentAGI web console already manages several settings areas after the server is up and running:

Settings -> Providers: Create, edit, delete, and test user-defined provider profiles for supported provider types. These profiles control per-agent model selection, runtime parameters, reasoning options, and pricing metadata.
Settings -> Prompts: Manage system, human, and tool prompt templates.
Settings -> PentAGI API: Create and manage PentAGI Bearer tokens for REST and GraphQL access.
Other UI-managed preferences: Favorite flows are stored as user preferences, and theme selection is handled from the main sidebar/profile controls rather than the Settings pages.

Still Server-Managed

The following configuration areas still need to be set on the server through environment variables, compose files, or mounted config files:

LLM credentials and connection details: API keys, endpoints, auth modes, and provider-specific connection settings for OpenAI, Anthropic, Bedrock, Ollama, custom providers, and similar backends; config-path settings apply only where supported, such as OLLAMA_SERVER_CONFIG_PATH and LLM_SERVER_CONFIG_PATH.
Search provider credentials and options: Settings such as DUCKDUCKGO_*, GOOGLE_*, TAVILY_API_KEY, TRAVERSAAL_API_KEY, PERPLEXITY_*, SEARXNG_*, and SPLOITUS_ENABLED.
Third-party integrations: Langfuse, Graphiti, and similar external services remain server-side configuration.
MCP server management: MCP settings pages are not currently exposed as a live web-console feature.

For Production & Enhanced Security:

See detailed guide: Worker Node Setup

The two-node setup provides:

Isolated Execution: Worker containers run on dedicated hardware
Network Isolation: Separate network boundaries for penetration testing
Security Boundaries: Docker-in-Docker with TLS authentication
OOB Attack Support: Dedicated port ranges for out-of-band techniques

Manual Installation

Create a working directory or clone the repository:

mkdir pentagi && cd pentagi

Copy .env.example to .env or download it:

curl -o .env https://raw.githubusercontent.com/vxcontrol/pentagi/master/.env.example

Touch examples files (example.custom.provider.yml, example.ollama.provider.yml) or download it:

curl -o example.custom.provider.yml https://raw.githubusercontent.com/vxcontrol/pentagi/master/examples/configs/custom-openai.provider.yml
curl -o example.ollama.provider.yml https://raw.githubusercontent.com/vxcontrol/pentagi/master/examples/configs/ollama-llama318b.provider.yml

Fill in the required API keys in .env file.

# Required: At least one of these LLM providers
OPEN_AI_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
GEMINI_API_KEY=your_gemini_key

# Optional: AWS Bedrock provider (enterprise-grade models)
BEDROCK_REGION=us-east-1
# Choose one authentication method:
BEDROCK_DEFAULT_AUTH=true                        # Option 1: Use AWS SDK default credential chain (recommended for EC2/ECS)
# BEDROCK_BEARER_TOKEN=your_bearer_token         # Option 2: Bearer token authentication
# BEDROCK_ACCESS_KEY_ID=your_aws_access_key      # Option 3: Static credentials
# BEDROCK_SECRET_ACCESS_KEY=your_aws_secret_key

# Optional: Ollama provider (local or cloud)
# OLLAMA_SERVER_URL=http://ollama-server:11434   # Local server
# OLLAMA_SERVER_URL=https://ollama.com           # Cloud service
# OLLAMA_SERVER_API_KEY=your_ollama_cloud_key    # Required for cloud, empty for local

# Optional: Chinese AI providers
# DEEPSEEK_API_KEY=your_deepseek_key             # DeepSeek (strong reasoning)
# GLM_API_KEY=your_glm_key                       # GLM (Zhipu AI)
# KIMI_API_KEY=your_kimi_key                     # Kimi (Moonshot AI, ultra-long context)
# QWEN_API_KEY=your_qwen_key                     # Qwen (Alibaba Cloud, multimodal)

# Optional: Local LLM provider (zero-cost inference)
OLLAMA_SERVER_URL=http://localhost:11434
OLLAMA_SERVER_MODEL=your_model_name

# Optional: Additional search capabilities
DUCKDUCKGO_ENABLED=true
DUCKDUCKGO_REGION=us-en
DUCKDUCKGO_SAFESEARCH=
DUCKDUCKGO_TIME_RANGE=
SPLOITUS_ENABLED=true
GOOGLE_API_KEY=your_google_key
GOOGLE_CX_KEY=your_google_cx
TAVILY_API_KEY=your_tavily_key
TRAVERSAAL_API_KEY=your_traversaal_key
PERPLEXITY_API_KEY=your_perplexity_key
PERPLEXITY_MODEL=sonar-pro
PERPLEXITY_CONTEXT_SIZE=medium

# Searxng meta search engine (aggregates results from multiple sources)
SEARXNG_URL=http://your-searxng-instance:8080
SEARXNG_CATEGORIES=general
SEARXNG_LANGUAGE=
SEARXNG_SAFESEARCH=0
SEARXNG_TIME_RANGE=
SEARXNG_TIMEOUT=

## Graphiti knowledge graph settings
GRAPHITI_ENABLED=true
GRAPHITI_TIMEOUT=30
GRAPHITI_URL=http://graphiti:8000
GRAPHITI_MODEL_NAME=gpt-5-mini

# Neo4j settings (used by Graphiti stack)
NEO4J_USER=neo4j
NEO4J_DATABASE=neo4j
NEO4J_PASSWORD=devpassword
NEO4J_URI=bolt://neo4j:7687

# Assistant configuration
ASSISTANT_USE_AGENTS=false         # Default value for agent usage when creating new assistants

Change all security related environment variables in .env file to improve security.

Main Security Settings

COOKIE_SIGNING_SALT - Salt for cookie signing, change to random value
PUBLIC_URL - Public URL of your server (eg. https://pentagi.example.com)
SERVER_SSL_CRT and SERVER_SSL_KEY - Custom paths to your existing SSL certificate and key for HTTPS (these paths should be used in the docker-compose.yml file to mount as volumes)

Scraper Access

SCRAPER_PUBLIC_URL - Public URL for scraper if you want to use different scraper server for public URLs
SCRAPER_PRIVATE_URL - Private URL for scraper (local scraper server in docker-compose.yml file to access it to local URLs)

Access Credentials

PENTAGI_POSTGRES_USER and PENTAGI_POSTGRES_PASSWORD - PostgreSQL credentials
NEO4J_USER and NEO4J_PASSWORD - Neo4j credentials (for Graphiti knowledge graph)

Remove all inline comments from .env file if you want to use it in VSCode or other IDEs as a envFile option:

perl -i -pe 's/\s+#.*$//' .env

Run the PentAGI stack:

curl -O https://raw.githubusercontent.com/vxcontrol/pentagi/master/docker-compose.yml
docker compose up -d

Visit localhost:8443 to access PentAGI Web UI (default is admin@pentagi.com / admin)

Web UI Accounts

PentAGI does not expose public self-service sign-up from the login page. A fresh installation creates the default local administrator account:

Email: admin@pentagi.com
Password: admin

[!NOTE] If you caught an error about pentagi-network or observability-network or langfuse-network you need to run docker-compose.yml firstly to create these networks and after that run docker-compose-langfuse.yml, docker-compose-graphiti.yml, and docker-compose-observability.yml to use Langfuse, Graphiti, and Observability services.
You have to set at least one Language Model provider (OpenAI, Anthropic, Gemini, AWS Bedrock, or Ollama) to use PentAGI. AWS Bedrock provides enterprise-grade access to multiple foundation models from leading AI companies, while Ollama provides zero-cost local inference if you have sufficient computational resources. Additional API keys for search engines are optional but recommended for better results.
For fully local deployment with advanced models: See our comprehensive guide on Running PentAGI with vLLM and Qwen3.5-27B-FP8 for a production-grade local LLM setup. This configuration achieves ~13,000 TPS for prompt processing and ~650 TPS for completion on 4× RTX 5090 GPUs, supporting 12+ concurrent flows with complete independence from cloud providers.
LLM_SERVER_* environment variables are experimental feature and will be changed in the future. Right now you can use them to specify custom LLM server URL and one model for all agent types.
PROXY_URL is a global proxy URL for all LLM providers and external search systems. You can use it for isolation from external networks.
The docker-compose.yml file runs the PentAGI service as root user because it needs access to docker.sock for container management. If you're using TCP/IP network connection to Docker instead of socket file, you can remove root privileges and use the default pentagi user for better security.

Accessing PentAGI from External Networks

By default, PentAGI binds to 127.0.0.1 (localhost only) for security. To access PentAGI from other machines on your network, you need to configure external access.

Configuration Steps

Update .env file with your server's IP address:

# Network binding - allow external connections
PENTAGI_LISTEN_IP=0.0.0.0
PENTAGI_LISTEN_PORT=8443

# Public URL - use your actual server IP or hostname
# Replace 192.168.1.100 with your server's IP address
PUBLIC_URL=https://192.168.1.100:8443

# CORS origins - list all URLs that will access PentAGI
# Include localhost for local access AND your server IP for external access
CORS_ORIGINS=https://localhost:8443,https://192.168.1.100:8443

[!IMPORTANT]
Replace 192.168.1.100 with your actual server's IP address
Do NOT use 0.0.0.0 in PUBLIC_URL or CORS_ORIGINS - use the actual IP address
Include both localhost and your server IP in CORS_ORIGINS for flexibility

Recreate containers to apply the changes:

docker compose down
docker compose up -d --force-recreate

Verify port binding:

docker ps | grep pentagi

You should see 0.0.0.0:8443->8443/tcp or :::8443->8443/tcp.

If you see 127.0.0.1:8443->8443/tcp, the environment variable wasn't picked up. In this case, directly edit docker-compose.yml line 31:

ports:
  - "0.0.0.0:8443:8443"

Then recreate containers again.

Configure firewall to allow incoming connections on port 8443:

# Ubuntu/Debian with UFW
sudo ufw allow 8443/tcp
sudo ufw reload

# CentOS/RHEL with firewalld
sudo firewall-cmd --permanent --add-port=8443/tcp
sudo firewall-cmd --reload

Access PentAGI:

Local access: https://localhost:8443
Network access: https://your-server-ip:8443

[!NOTE] You'll need to accept the self-signed SSL certificate warning in your browser when accessing via IP address.

Running PentAGI with Podman

Podman Rootless Configuration

The default scraper configuration uses port 443 (HTTPS), which is a privileged port. For Podman rootless, reconfigure the scraper to use a non-privileged port:

1. Edit docker-compose.yml - modify the scraper service (around line 199):

scraper:
  image: vxcontrol/scraper:latest
  restart: unless-stopped
  container_name: scraper
  hostname: scraper
  expose:
    - 3000/tcp  # Changed from 443 to 3000
  ports:
    - "${SCRAPER_LISTEN_IP:-127.0.0.1}:${SCRAPER_LISTEN_PORT:-9443}:3000"  # Map to port 3000
  environment:
    - MAX_CONCURRENT_SESSIONS=${LOCAL_SCRAPER_MAX_CONCURRENT_SESSIONS:-10}
    - USERNAME=${LOCAL_SCRAPER_USERNAME:-someuser}
    - PASSWORD=${LOCAL_SCRAPER_PASSWORD:-somepass}
  logging:
    options:
      max-size: 50m
      max-file: "7"
  volumes:
    - scraper-ssl:/usr/src/app/ssl
  networks:
    - pentagi-network
  shm_size: 2g

2. Update .env file - change the scraper URL to use HTTP and port 3000:

# Scraper configuration for Podman rootless
SCRAPER_PRIVATE_URL=http://someuser:somepass@scraper:3000/
LOCAL_SCRAPER_USERNAME=someuser
LOCAL_SCRAPER_PASSWORD=somepass

[!IMPORTANT] Key changes for Podman:
Use HTTP instead of HTTPS for SCRAPER_PRIVATE_URL
Use port 3000 instead of 443
Change internal expose to 3000/tcp
Update port mapping to target 3000 instead of 443

3. Recreate containers:

podman-compose down
podman-compose up -d --force-recreate

4. Test scraper connectivity:

# Test from within the pentagi container
podman exec -it pentagi wget -O- "http://someuser:somepass@scraper:3000/html?url=http://example.com"

If you see HTML output, the scraper is working correctly.

Podman Rootful Mode

If you're running Podman in rootful mode (with sudo), you can use the default configuration without modifications. The scraper will work on port 443 as intended.

Docker Compatibility

All Podman configurations remain fully compatible with Docker. The non-privileged port approach works identically on both container runtimes.

Assistant Configuration

PentAGI allows you to configure default behavior for assistants:

VariableDefaultDescriptionASSISTANT_USE_AGENTSfalseControls the default value for agent usage when creating new assistants

The ASSISTANT_USE_AGENTS setting affects the initial state of the "Use Agents" toggle when creating a new assistant in the UI:

false (default): New assistants are created with agent delegation disabled by default
true: New assistants are created with agent delegation enabled by default

Once the stack is running and you can sign in to the web UI, the fastest way to start is through the Flows workflow.

1. Create your first flow

Open Flows in the sidebar.
Click New Flow.
Choose the mode that fits your goal:
- Automation: fully autonomous execution for a testing goal you want PentAGI to carry out end-to-end
- Assistant: interactive back-and-forth help when you want to steer the investigation step by step. In this mode you can also enable the Use Agents toggle to let PentAGI delegate subtasks to specialized sub-agents for more complex investigations.
Select the LLM provider you want to use for this flow.
Describe the target and the objective in natural language in the message box.

Good first prompts usually include:

the target system or URL
the type of assessment you want
any scope limitations or rules of engagement
the result you expect, such as a vulnerability report or validation of a hypothesis

Example:

Assess https://target.example for common web application vulnerabilities. Focus on authentication, file handling, and injection issues. Stay within the provided target only and summarize confirmed findings with reproduction steps.

Only test systems you own or are explicitly authorized to assess. See EULA.md for the acceptable use requirements.

2. Use templates for repeatable workflows

The new flow form includes a template picker, which can prefill the message box with a saved flow template. This is useful when you run similar assessments repeatedly.

Use an existing template if you already have one saved in Templates
Start from the example prompt in examples/prompts/base_web_pentest.md if you need a practical baseline for web testing
Adjust the target, scope, and constraints before starting the flow

Templates are starting points. You do not need special syntax to use PentAGI: plain natural-language instructions work well as long as the target and goal are clear.

3. Monitor execution and review output

After submitting the flow, PentAGI opens the flow page automatically.

Use the main flow view to follow messages, agent activity, and task progress
Inspect tool activity and terminal output as the flow runs
Review generated tasks and subtasks to understand what PentAGI is doing

Once the flow has enough results, use the Report menu on the flow page to:

open the report in a web view
copy the generated report to the clipboard
download the report as Markdown
download the report as PDF

4. Use the Assistant view to steer an active flow

Each flow also includes an Assistant view for interactive guidance. This is useful when the autonomous run uncovers something that needs human direction instead of a hard restart.

Open the Assistant view for the same flow when you want to inspect the current state before changing anything.
Use the assistant to check flow status, stop the current task, submit follow-up instructions, or patch the remaining planned subtasks before the next step runs.
Treat this as an explicit control path for the current flow, not as an invisible background queue. If you want to change direction, say so clearly and keep the new instruction tied to the current engagement scope.
This works best for clarifying scope, redirecting priorities after intermediate findings, or answering an automation checkpoint without losing the rest of the flow context.

5. Manage flow-scoped files

Each flow has its own Files tab in the flow page. Files are scoped to the parent flow: they live in {dataDir}/flow-{id}-data/ on the host and never leak into other flows.

The tab exposes three sources of files:

Uploads (uploads/): files you provide from the web UI. Use the Upload files action, or drag and drop directly onto the Files tab. While the agent container is running, uploaded files are also pushed into it at /work/uploads/ so the agent can read them with normal shell tools.
Resources (resources/): files attached from your saved user resources library via Attach resources from library. Attached resources are copied into the flow and pushed into the running container at /work/resources/.
Container (container/): snapshots pulled from the running agent container via Pull file or directory from container. These are read-only on the flow side and are never sent back to the container.

Current limits and limitations to be aware of:

Maximum upload file size is 300 MB; per upload request up to 1000 files and 2 GB total. File names are capped at 255 bytes (roughly 255 ASCII characters; non-ASCII names use multiple bytes per character).
Uploads and resources are mirrored into the running container at the fixed paths /work/uploads/ and /work/resources/; files written to other container paths are not auto-mirrored back into the flow file model. Container snapshots can originate from any container path you pull (for example /etc/...) and are cached on the flow side under container/; they are not pushed back into the container.
Container snapshots are point-in-time pulls. Editing a snapshot in the UI does not write back into the running container.
Deleting a flow today removes the flow record and its long-term memory entries, but does not yet archive or remove the flow's flow-{id}-data/ directory on disk. Operators are still expected to clean up the data directory manually if they want to reclaim the space.

For early testing, start with a narrow target and a single clear objective. This makes the output easier to review and helps you refine your prompts before running larger assessments.

API Access

Generating API Tokens

API tokens are managed through the PentAGI web interface:

Navigate to Settings → API Tokens in the web UI
Click Create Token to generate a new API token
Configure token properties:
- Name (optional): A descriptive name for the token
- Expiration Date: When the token will expire (minimum 1 minute, maximum 3 years)
Click Create and copy the token immediately - it will only be shown once for security reasons
Use the token as a Bearer token in your API requests

Each token is associated with your user account and inherits your role's permissions.

Using API Tokens

Include the API token in the Authorization header of your HTTP requests:

# GraphQL API example
curl -X POST https://your-pentagi-instance:8443/api/v1/graphql \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query": "{ flows { id title status } }"}'

# REST API example
curl https://your-pentagi-instance:8443/api/v1/flows \
  -H "Authorization: Bearer YOUR_API_TOKEN"

API Exploration and Testing

PentAGI provides interactive documentation for exploring and testing API endpoints:

GraphQL Playground

Access the GraphQL Playground at https://your-pentagi-instance:8443/api/v1/graphql/playground

Click the HTTP Headers tab at the bottom

Add your authorization header:

{
  "Authorization": "Bearer YOUR_API_TOKEN"
}

Explore the schema, run queries, and test mutations interactively

Swagger UI

Access the REST API documentation at https://your-pentagi-instance:8443/api/v1/swagger/index.html

Click the Authorize button
Enter your token in the format: Bearer YOUR_API_TOKEN
Click Authorize to apply
Test endpoints directly from the Swagger UI

Generating API Clients

You can generate type-safe API clients for your preferred programming language using the schema files included with PentAGI:

GraphQL Clients

The GraphQL schema is available at:

Web UI: Navigate to Settings to download schema.graphqls
Direct file: backend/pkg/graph/schema.graphqls in the repository

Generate clients using tools like:

GraphQL Code Generator (JavaScript/TypeScript): https://the-guild.dev/graphql/codegen
genqlient (Go): https://github.com/Khan/genqlient
Apollo iOS (Swift): https://www.apollographql.com/docs/ios

REST API Clients

The OpenAPI specification is available at:

Swagger JSON: https://your-pentagi-instance:8443/api/v1/swagger/doc.json
Swagger YAML: Available in backend/pkg/server/docs/swagger.yaml

Generate clients using:

OpenAPI Generator: https://openapi-generator.tech

openapi-generator-cli generate \
  -i https://your-pentagi-instance:8443/api/v1/swagger/doc.json \
  -g python \
  -o ./pentagi-client

Swagger Codegen: https://github.com/swagger-api/swagger-codegen

swagger-codegen generate \
  -i https://your-pentagi-instance:8443/api/v1/swagger/doc.json \
  -l typescript-axios \
  -o ./pentagi-client

swagger-typescript-api (TypeScript): https://github.com/acacode/swagger-typescript-api

npx swagger-typescript-api \
  -p https://your-pentagi-instance:8443/api/v1/swagger/doc.json \
  -o ./src/api \
  -n pentagi-api.ts

API Usage Examples

mutation CreateFlow {
  createFlow(
    modelProvider: "openai"
    input: "Test the security of https://example.com"
  ) {
    id
    title
    status
    createdAt
  }
}

curl https://your-pentagi-instance:8443/api/v1/flows \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  | jq '.flows[] | {id, title, status}'

import requests

class PentAGIClient:
    def __init__(self, base_url, api_token):
        self.base_url = base_url
        self.headers = {
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json"
        }
    
    def create_flow(self, provider, target):
        query = """
        mutation CreateFlow($provider: String!, $input: String!) {
          createFlow(modelProvider: $provider, input: $input) {
            id
            title
            status
          }
        }
        """
        response = requests.post(
            f"{self.base_url}/api/v1/graphql",
            json={
                "query": query,
                "variables": {
                    "provider": provider,
                    "input": target
                }
            },
            headers=self.headers
        )
        return response.json()
    
    def get_flows(self):
        response = requests.get(
            f"{self.base_url}/api/v1/flows",
            headers=self.headers
        )
        return response.json()

# Usage
client = PentAGIClient(
    "https://your-pentagi-instance:8443",
    "your_api_token_here"
)

# Create a new flow
flow = client.create_flow("openai", "Scan https://example.com for vulnerabilities")
print(f"Created flow: {flow}")

# List all flows
flows = client.get_flows()
print(f"Total flows: {len(flows['flows'])}")

import axios, { AxiosInstance } from 'axios';

interface Flow {
  id: string;
  title: string;
  status: string;
  createdAt: string;
}

class PentAGIClient {
  private client: AxiosInstance;

  constructor(baseURL: string, apiToken: string) {
    this.client = axios.create({
      baseURL: `${baseURL}/api/v1`,
      headers: {
        'Authorization': `Bearer ${apiToken}`,
        'Content-Type': 'application/json',
      },
    });
  }

  async createFlow(provider: string, input: string): Promise<Flow> {
    const query = `
      mutation CreateFlow($provider: String!, $input: String!) {
        createFlow(modelProvider: $provider, input: $input) {
          id
          title
          status
          createdAt
        }
      }
    `;

    const response = await this.client.post('/graphql', {
      query,
      variables: { provider, input },
    });

    return response.data.data.createFlow;
  }

  async getFlows(): Promise<Flow[]> {
    const response = await this.client.get('/flows');
    return response.data.flows;
  }

  async getFlow(flowId: string): Promise<Flow> {
    const response = await this.client.get(`/flows/${flowId}`);
    return response.data;
  }
}

// Usage
const client = new PentAGIClient(
  'https://your-pentagi-instance:8443',
  'your_api_token_here'
);

// Create a new flow
const flow = await client.createFlow(
  'openai',
  'Perform penetration test on https://example.com'
);
console.log('Created flow:', flow);

// List all flows
const flows = await client.getFlows();
console.log(`Total flows: ${flows.length}`);

Security Best Practices

When working with API tokens:

Never commit tokens to version control - use environment variables or secrets management
Rotate tokens regularly - set appropriate expiration dates and create new tokens periodically
Use separate tokens for different applications - makes it easier to revoke access if needed
Monitor token usage - review API token activity in the Settings page
Revoke unused tokens - disable or delete tokens that are no longer needed
Use HTTPS only - never send API tokens over unencrypted connections

Token Management

View tokens: See all your active tokens in Settings → API Tokens
Edit tokens: Update token names or revoke tokens
Delete tokens: Permanently remove tokens (this action cannot be undone)
Token ID: Each token has a unique ID that can be copied for reference

The token list shows:

Token name (if provided)
Token ID (unique identifier)
Status (active/revoked/expired)
Creation date
Expiration date

Custom LLM Provider Configuration

When using custom LLM providers with the LLM_SERVER_* variables, you can fine-tune the reasoning format used in requests.

[!TIP] For production-grade local deployments, consider using vLLM with Qwen3.5-27B-FP8 for optimal performance. See our comprehensive deployment guide which includes hardware requirements, configuration templates (thinking mode and non-thinking mode), and performance benchmarks showing 13K TPS prompt processing on 4× RTX 5090 GPUs.

The LLM_SERVER_LEGACY_REASONING setting affects how reasoning parameters are sent to the LLM:

false (default): Uses modern format where reasoning is sent as a structured object with max_tokens parameter
true: Uses legacy format with string-based reasoning_effort parameter

The LLM_SERVER_PRESERVE_REASONING setting controls whether reasoning content is preserved in multi-turn conversations:

false (default): Reasoning content is not preserved in conversation history
true: Reasoning content is preserved and sent in subsequent API calls

Ollama Provider Configuration

PentAGI supports Ollama for both local LLM inference (zero-cost, enhanced privacy) and Ollama Cloud (managed service with free tier).

Configuration Variables

Ollama Cloud Configuration

Ollama Cloud provides managed inference with a generous free tier and scalable paid plans.

Free Tier Setup (Single Model)

# Free tier allows one model at a time
OLLAMA_SERVER_URL=https://ollama.com
OLLAMA_SERVER_API_KEY=your_ollama_cloud_api_key
OLLAMA_SERVER_MODEL=gpt-oss:120b  # Example: OpenAI OSS 120B model

Paid Tier Setup (Multi-Model with Pre-built Configuration)

For paid tiers supporting multiple concurrent models, use the pre-built Ollama Cloud configuration:

# Using pre-built Ollama Cloud configuration (included in Docker image)
OLLAMA_SERVER_URL=https://ollama.com
OLLAMA_SERVER_API_KEY=your_ollama_cloud_api_key
OLLAMA_SERVER_CONFIG_PATH=/opt/pentagi/conf/ollama-cloud.provider.yml

The pre-built ollama-cloud.provider.yml configuration includes optimized model assignments for all agent types:

Simple/Assistant: nemotron-3-super:cloud - Fast general-purpose model
Primary Agent: qwen3-coder-next:cloud - Advanced reasoning with high effort mode
Coder/Pentester: qwen3-coder-next:cloud - Specialized coding models
Searcher: qwen3.5:397b-cloud - Large context for information gathering
Refiner/Refactor: glm-5:cloud - High-quality text refinement
Adviser/Enricher: minimax-m2.7:cloud - Efficient advisory tasks
Installer: devstral-2:123b-cloud - Installation and setup tasks

Custom Configuration (Advanced)

To create your own agent configuration, mount a custom file from your host filesystem:

# Using custom provider configuration
OLLAMA_SERVER_URL=https://ollama.com
OLLAMA_SERVER_API_KEY=your_ollama_cloud_api_key
OLLAMA_SERVER_CONFIG_PATH=/opt/pentagi/conf/ollama.provider.yml

# Mount custom configuration from host filesystem (in .env or docker-compose override)
PENTAGI_OLLAMA_SERVER_CONFIG_PATH=/path/on/host/my-ollama-config.yml

The PENTAGI_OLLAMA_SERVER_CONFIG_PATH environment variable maps your host configuration file to /opt/pentagi/conf/ollama.provider.yml inside the container.

Example custom configuration (my-ollama-config.yml):

primary_agent:
  model: "qwen3-coder-next:cloud"
  temperature: 1.0
  top_p: 0.9
  max_tokens: 32768
  reasoning:
    effort: high

coder:
  model: "qwen3-coder:32b"
  temperature: 1.0
  max_tokens: 20480

Local Ollama Configuration

For self-hosted Ollama instances:

# Basic local Ollama setup
OLLAMA_SERVER_URL=http://localhost:11434
OLLAMA_SERVER_MODEL=llama3.1:8b-instruct-q8_0

# Production setup with auto-pull and model discovery
OLLAMA_SERVER_URL=http://ollama-server:11434
OLLAMA_SERVER_PULL_MODELS_ENABLED=true
OLLAMA_SERVER_PULL_MODELS_TIMEOUT=900
OLLAMA_SERVER_LOAD_MODELS_ENABLED=true

# Using pre-built configurations from Docker image
OLLAMA_SERVER_CONFIG_PATH=/opt/pentagi/conf/ollama-llama318b.provider.yml
# or
OLLAMA_SERVER_CONFIG_PATH=/opt/pentagi/conf/ollama-qwen332b-fp16-tc.provider.yml
# or
OLLAMA_SERVER_CONFIG_PATH=/opt/pentagi/conf/ollama-qwq32b-fp16-tc.provider.yml

Performance Considerations:

Model Discovery (OLLAMA_SERVER_LOAD_MODELS_ENABLED=true): Adds 1-2s startup latency querying Ollama API
Auto-pull (OLLAMA_SERVER_PULL_MODELS_ENABLED=true): First startup may take several minutes downloading models
Pull timeout (OLLAMA_SERVER_PULL_MODELS_TIMEOUT=900): 15 minutes in seconds
Static Config: Disable both flags and specify models in config file for fastest startup

Creating Custom Ollama Models with Extended Context

Important: The num_ctx parameter can only be set during model creation via Modelfile - it cannot be changed after model creation or overridden at runtime.

Example: Qwen3 32B FP16 with Extended Context

Create a Modelfile named Modelfile_qwen3_32b_fp16_tc:

FROM qwen3:32b-fp16
PARAMETER num_ctx 110000
PARAMETER temperature 0.3
PARAMETER top_p 0.8
PARAMETER min_p 0.0
PARAMETER top_k 20
PARAMETER repeat_penalty 1.1

Build the custom model:

ollama create qwen3:32b-fp16-tc -f Modelfile_qwen3_32b_fp16_tc

Example: QwQ 32B FP16 with Extended Context

Create a Modelfile named Modelfile_qwq_32b_fp16_tc:

FROM qwq:32b-fp16
PARAMETER num_ctx 110000
PARAMETER temperature 0.2
PARAMETER top_p 0.7
PARAMETER min_p 0.0
PARAMETER top_k 40
PARAMETER repeat_penalty 1.2

Build the custom model:

ollama create qwq:32b-fp16-tc -f Modelfile_qwq_32b_fp16_tc

Note: The QwQ 32B FP16 model requires approximately 71.3 GB VRAM for inference. Ensure your system has sufficient GPU memory before attempting to use this model.

OpenAI Provider Configuration

Configuration Variables

VariableDefaultDescriptionOPEN_AI_KEYAPI key for OpenAI servicesOPEN_AI_SERVER_URLhttps://api.openai.com/v1OpenAI API endpoint

Configuration Examples

# Basic OpenAI setup
OPEN_AI_KEY=your_openai_api_key
OPEN_AI_SERVER_URL=https://api.openai.com/v1

# Using with proxy for enhanced security
OPEN_AI_KEY=your_openai_api_key
PROXY_URL=http://your-proxy:8080

Supported Models

PentAGI supports 31 OpenAI models with tool calling, streaming, reasoning modes, and prompt caching. Models marked with * are used in default configuration.

GPT-5.2 Series - Latest Flagship Agentic (December 2025)

GPT-5/5.1 Series - Advanced Agentic Models

GPT-5/5.1 Codex Series - Code-Specialized

GPT-4.1 Series - Enhanced Intelligence

GPT-4o Series - Multimodal Flagship

o-Series - Advanced Reasoning Models

Prices: Per 1M tokens. Reasoning models include thinking tokens in output pricing.

[!WARNING] GPT-5 Models - Trusted Access Required*
All GPT-5 series models (gpt-5, gpt-5.1, gpt-5.2, gpt-5-pro, gpt-5.2-pro, and all Codex variants) work unstably with PentAGI and may trigger OpenAI's cybersecurity safety mechanisms without verified access.
To use GPT-5 models reliably:*
Individual users: Verify your identity at chatgpt.com/cyber
Enterprise teams: Request trusted access through your OpenAI representative
Security researchers: Apply for the Cybersecurity Grant Program (includes $10M in API credits)
Recommended alternatives without verification:
Use o-series models (o3, o4-mini, o1) for reasoning tasks
Use gpt-4.1 series for general intelligence and function calling
All o-series and gpt-4.x models work reliably without special access

Reasoning Effort Levels:

High: Maximum reasoning depth (refiner - o3 with high effort)
Medium: Balanced reasoning (primary_agent, assistant, reflector - o4-mini/o3 with medium effort)
Low: Efficient targeted reasoning (coder, installer, pentester - o3/o4-mini with low effort; adviser - gpt-5.2 with low effort)

Key Features:

Extended Reasoning: o-series models with chain-of-thought for complex security analysis
Agentic Intelligence: GPT-5/5.1/5.2 series with enhanced tool integration and autonomous capabilities
Prompt Caching: Cost reduction on repeated context (10-50% of input price)
Code Specialization: Dedicated Codex models for vulnerability discovery and exploit development
Multimodal Support: GPT-4o series for vision-based security assessments
Tool Calling: Robust function calling across all models for pentesting tool orchestration
Streaming: Real-time response streaming for interactive workflows
Proven Track Record: Industry-leading models with CVE discoveries and real-world security applications

Anthropic Provider Configuration

Configuration Variables

VariableDefaultDescriptionANTHROPIC_API_KEYAPI key for Anthropic servicesANTHROPIC_SERVER_URLhttps://api.anthropic.com/v1Anthropic API endpoint

Configuration Examples

# Basic Anthropic setup
ANTHROPIC_API_KEY=your_anthropic_api_key
ANTHROPIC_SERVER_URL=https://api.anthropic.com/v1

# Using with proxy for secure environments
ANTHROPIC_API_KEY=your_anthropic_api_key
PROXY_URL=http://your-proxy:8080

[!NOTE] Google Vertex AI for Claude models
PentAGI does not currently expose a dedicated Google Vertex AI configuration path for Anthropic Claude in .env. There is no separate Vertex AI API key field at this time, and the existing Anthropic variables (ANTHROPIC_API_KEY, ANTHROPIC_SERVER_URL) target the direct Anthropic API. Supported routes for Claude are:
Direct Anthropic API: ANTHROPIC_API_KEY and ANTHROPIC_SERVER_URL (see above).
AWS Bedrock: BEDROCK_* variables (see AWS Bedrock Provider Configuration).
If you need to use Vertex AI today, the safest supported workaround is to expose Vertex AI through an OpenAI-compatible proxy or gateway that translates Vertex AI calls into the Chat Completions format while preserving the chat and tool-call behavior PentAGI relies on, then point the Custom LLM provider at that gateway via LLM_SERVER_URL, LLM_SERVER_KEY, and LLM_SERVER_MODEL. This path is only as reliable as the gateway you choose.

Supported Models

PentAGI supports 10 Claude models with tool calling, streaming, extended thinking, adaptive thinking, and prompt caching. Models marked with * are used in default configuration.

Claude 4 Series - Latest Models (2025-2026)

Legacy Models - Still Supported

Deprecated Models - Migrate to Current Models

Model IDThinkingRelease DatePrice (Input/Output/Cache R/W)Notesclaude-3-haiku-20240307❌Mar 2024$0.25/$1.25/$0.03/$0.30Will be retired April 19, 2026. Migrate to claude-haiku-4-5

Prices: Per 1M tokens. Cache pricing includes both Read and Write costs.

Extended Thinking Configuration:

Max Tokens 4096: Generator (claude-opus-4-6) for maximum reasoning depth on complex exploit development
Max Tokens 2048: Coder (claude-sonnet-4-6) for balanced code analysis and vulnerability research
Max Tokens 1024: Primary agent, assistant, refiner, adviser, reflector, searcher, installer, pentester for focused reasoning on specific tasks
Extended Thinking: All Claude 4.5+ and 4.6 models support configurable extended thinking for deep reasoning tasks

Key Features:

Extended Thinking: All Claude 4.5+ and 4.6 models with configurable chain-of-thought reasoning depths for complex security analysis
Adaptive Thinking: Claude 4.6 series (Opus/Sonnet) dynamically adjusts reasoning depth based on task complexity for optimal performance
Prompt Caching: Significant cost reduction with separate read/write pricing (10% read, 125% write of input)
Extended Context Window: 200K tokens standard, up to 1M tokens (beta) for Claude Opus/Sonnet 4.6 for comprehensive codebase analysis
Tool Calling: Robust function calling with exceptional accuracy for security tool orchestration
Streaming: Real-time response streaming for interactive penetration testing workflows
Safety-First Design: Built-in safety mechanisms ensuring responsible security testing practices
Multimodal Support: Vision capabilities in latest models for screenshot analysis and UI security assessment
Constitutional AI: Advanced safety training providing reliable and ethical security guidance

Google AI (Gemini) Provider Configuration

PentAGI integrates with Google's Gemini models through the Google AI API, offering state-of-the-art multimodal reasoning capabilities with extended thinking and context caching.

Configuration Variables

VariableDefaultDescriptionGEMINI_API_KEYAPI key for Google AI servicesGEMINI_SERVER_URLhttps://generativelanguage.googleapis.comGoogle AI API endpoint

Configuration Examples

# Basic Gemini setup
GEMINI_API_KEY=your_gemini_api_key
GEMINI_SERVER_URL=https://generativelanguage.googleapis.com

# Using with proxy
GEMINI_API_KEY=your_gemini_api_key
PROXY_URL=http://your-proxy:8080

Supported Models

PentAGI supports 9 Gemini models with tool calling, streaming, thinking modes, and context caching. Models marked with * are used in default configuration.

Gemini 3.5 Series - Latest Stable Flash (May 2026)

Gemini 3.1 Series - Stable Flash-Lite + Pro Preview (Feb-May 2026)

Gemini 2.5 Series - Advanced Thinking Models (active until October 16, 2026)

Gemma 4 Open-Source Models (Apache 2.0, Free Tier)

Prices: Per 1M tokens (Standard Paid tier). Context window is input token limit.

[!NOTE] Gemini 2.5 Series Shutdown
gemini-2.5-pro, gemini-2.5-flash, and gemini-2.5-flash-lite will be shut down on October 16, 2026. Recommended migrations:
gemini-2.5-pro → gemini-3.1-pro-preview (same $2.00 input pricing tier)
gemini-2.5-flash → gemini-3.5-flash (improved frontier capabilities)
gemini-2.5-flash-lite → gemini-3.1-flash-lite (same $0.25 input pricing)

Default Model Assignments (config.yml):

gemini-3.1-pro-preview - primary_agent, assistant, generator, refiner, adviser, coder, pentester
gemini-3.5-flash - reflector, searcher, enricher, installer
gemini-3.1-flash-lite - simple, simple_json

Key Features:

Extended Thinking: Step-by-step reasoning for complex security analysis (all Gemini 3.x, 2.5 series, and Gemma 4 with toggleable thinking)
Context Caching: Significant cost reduction on repeated context (10% of input price for most models)
Ultra-Long Context: 1M tokens for Gemini chat models, 256K tokens for Gemma 4 open-source models
Multimodal Support: Text, image, video, audio, and PDF processing for comprehensive assessments
Tool Calling: Seamless integration with 20+ pentesting tools via function calling
Streaming: Real-time response streaming for interactive security workflows
Code Execution: Built-in code execution for offensive tool testing and exploit validation
Search Grounding: Google Search integration for threat intelligence and CVE research
File Search: Document retrieval and RAG capabilities for knowledge-based assessments
Batch API: 50% cost reduction for non-real-time batch processing
Custom Tools Endpoint: Dedicated gemini-3.1-pro-preview-customtools route for tool-heavy agentic workflows that prefer registered tools over bash

Reasoning Effort Levels:

High: Maximum thinking depth for complex multi-step analysis (generator)
Medium: Balanced reasoning for general agentic tasks (primary_agent, assistant, refiner, adviser)
Low: Efficient thinking for focused tasks (coder, installer, pentester)

AWS Bedrock Provider Configuration

PentAGI integrates with Amazon Bedrock, offering access to 20+ foundation models from leading AI companies including Anthropic, Amazon, Cohere, DeepSeek, OpenAI, Qwen, Mistral, and Moonshot.

Configuration Variables

Authentication Priority: BEDROCK_DEFAULT_AUTH → BEDROCK_BEARER_TOKEN → BEDROCK_ACCESS_KEY_ID+BEDROCK_SECRET_ACCESS_KEY

Configuration Examples

# Recommended: Default AWS SDK authentication (EC2/ECS/Lambda roles)
BEDROCK_REGION=us-east-1
BEDROCK_DEFAULT_AUTH=true

# Bearer token authentication (AWS STS, custom auth)
BEDROCK_REGION=us-east-1
BEDROCK_BEARER_TOKEN=your_bearer_token

# Static credentials (development, testing)
BEDROCK_REGION=us-east-1
BEDROCK_ACCESS_KEY_ID=your_aws_access_key
BEDROCK_SECRET_ACCESS_KEY=your_aws_secret_key

# With proxy and custom endpoint
BEDROCK_REGION=us-east-1
BEDROCK_DEFAULT_AUTH=true
BEDROCK_SERVER_URL=https://bedrock-runtime.us-east-1.vpce-xxx.amazonaws.com
PROXY_URL=http://your-proxy:8080

Supported Models

PentAGI supports 21 AWS Bedrock models with tool calling, streaming, and multimodal capabilities. Models marked with * are used in default configuration.

Prices: Per 1M tokens. Models with thinking/reasoning support additional compute costs during reasoning phase.

Tested but Incompatible Models

Some AWS Bedrock models were tested but are not supported due to technical limitations:

[!IMPORTANT] Rate Limits & Quota Management
Default AWS Bedrock quotas for Claude models are extremely restrictive (2-20 requests/minute for new accounts). For production penetration testing:
Request quota increases through AWS Service Quotas console for models you plan to use
Use Amazon Nova models - higher default quotas and excellent performance
Enable provisioned throughput for consistent high-volume testing
Monitor usage - AWS throttles aggressively at quota limits
Without quota increases, expect frequent delays and workflow interruptions.

[!WARNING] Converse API Requirements
PentAGI uses Amazon Bedrock Converse API for unified model access. All supported models require:
✅ Converse/ConverseStream API support
✅ Tool use (function calling) for penetration testing workflows
✅ Streaming tool use for real-time feedback
Verify model capabilities at: AWS Bedrock Model Features

Key Features:

Automatic Prompt Caching: 40-70% cost reduction on repeated context (Claude 4.x models)
Extended Thinking: Step-by-step reasoning for complex security analysis (Claude, DeepSeek R1, OpenAI GPT)
Multimodal Analysis: Process screenshots, diagrams, video for comprehensive testing (Nova, Claude, Mistral, Kimi)
Tool Calling: Seamless integration with 20+ pentesting tools via function calling
Streaming: Real-time response streaming for interactive security assessment workflows

DeepSeek Provider Configuration

PentAGI integrates with DeepSeek, providing access to advanced AI models with strong reasoning, coding capabilities, and context caching at competitive prices.

Configuration Variables

Configuration Examples

# Direct API usage
DEEPSEEK_API_KEY=your_deepseek_api_key
DEEPSEEK_SERVER_URL=https://api.deepseek.com

# With LiteLLM proxy
DEEPSEEK_API_KEY=your_litellm_key
DEEPSEEK_SERVER_URL=http://litellm-proxy:4000
DEEPSEEK_PROVIDER=deepseek  # Adds prefix to model names (deepseek/deepseek-v4-flash) for LiteLLM

Supported Models

Pricing Note (deepseek-v4-pro): The 75% promotional discount on deepseek-v4-pro officially ended on 2026-05-31 15:59 UTC. The prices above reflect the standard post-promotional pricing. If you have legacy configurations using the discounted prices ($0.435/$0.87/$0.003625), update them to the current rates for accurate cost tracking.

The legacy model names deepseek-chat and deepseek-reasoner are scheduled for deprecation by DeepSeek on 2026-07-24. Existing user configurations referencing the legacy names continue to work until then; the defaults above use the current V4 names. deepseek-chat maps to deepseek-v4-flash non-thinking mode; deepseek-reasoner maps to deepseek-v4-flash thinking mode.

Default Agent Configuration:

Note: When thinking mode is enabled, DeepSeek silently ignores temperature, top_p, presence_penalty, and frequency_penalty. The langchaingo client automatically nullifies temperature/top_p when reasoning_effort is set, so they appear as "(auto)" in the table above. All thinking-enabled agents also explicitly pass extra_body.thinking.type: enabled as defensive coding against future provider default changes.

Key Features:

Hybrid Thinking Modes: Switch between thinking (deep reasoning) and non-thinking (fast) modes via extra_body.thinking.type
Automatic Prompt Caching: Significant cost reduction on repeated context via cache-hit pricing (1/10 of launch price)
Extended Thinking: Reinforcement learning CoT for complex security analysis (both V4 models)
Strong Coding: Optimized for code generation and exploit development
Long Context: 1M token context window with up to 384K output tokens
Tool Calling: Seamless integration with 20+ pentesting tools via function calling
Streaming: Real-time response streaming for interactive workflows
Multilingual: Strong Chinese and English support
Additional Features: JSON Output, Chat Prefix Completion (beta), FIM/Fill-in-the-Middle Completion (non-thinking mode only)

Concurrency Limits: deepseek-v4-flash: 2500 concurrent requests; deepseek-v4-pro: 500 concurrent requests.

LiteLLM Integration: Set DEEPSEEK_PROVIDER=deepseek to enable model name prefixing when using default PentAGI configurations with LiteLLM proxy. Leave empty for direct API usage.

GLM Provider Configuration

PentAGI integrates with GLM from Zhipu AI (Z.AI), providing advanced language models with MoE architecture, strong reasoning, and agentic capabilities developed by Tsinghua University.

Configuration Variables

Configuration Examples

# Direct API usage (international endpoint)
GLM_API_KEY=your_glm_api_key
GLM_SERVER_URL=https://api.z.ai/api/paas/v4

# Alternative endpoints
GLM_SERVER_URL=https://open.bigmodel.cn/api/paas/v4  # China
GLM_SERVER_URL=https://api.z.ai/api/coding/paas/v4   # Coding-specific

# With LiteLLM proxy
GLM_API_KEY=your_litellm_key
GLM_SERVER_URL=http://litellm-proxy:4000
GLM_PROVIDER=zai  # Adds prefix to model names (zai/glm-4) for LiteLLM

Supported Models

GLM-5.x Series - Latest Generation (200K context, 128K max output)

GLM-4.7 Series - Premium with Interleaved Thinking

GLM-4.6 Series - Balanced with Auto-Thinking

Model IDThinkingContextMax OutputPrice (Input/Output/Cache)Use Caseglm-4.6✅ Auto200K128K$0.60/$2.20/$0.11Balanced, streaming tool calls, token-efficient

GLM-4.5 Series - Unified Reasoning/Coding/Agents

GLM-4 Legacy - Dense Architecture

Model IDThinkingContextMax OutputPrice (Input/Output)Use Caseglm-4-32b-0414-128k❌128K16K$0.10/$0.10Ultra-budget dense 32B, parsing without reasoning

Prices: Per 1M tokens. Cache pricing is for prompt cache hit; cache storage is currently free per Z.AI promotion. GLM-4-32B has no cache support.

Default Agent Configuration:

Note on temperature: GLM accepts both 1.0 and 0.6 in either thinking/non-thinking mode (per Z.AI docs). langchaingo's IsReasoningModel matches glm-4.5*/glm-4.6*/glm-4.7* prefixes and force-overrides temperature to 1.0 in createChatRequest — this is harmless for GLM (unlike Kimi) but means temperature values for those models in YAML are advisory. glm-5/glm-5.1/glm-5-turbo are not matched, so explicit values pass through unchanged.

Thinking Modes:

Hybrid (GLM-5.x, GLM-4.7): Explicit toggle via extra_body.thinking.type
Auto (GLM-4.6, GLM-4.5 series): Model automatically determines when reasoning is needed
Preserved Thinking (Z.AI Coding capability): all thinking-enabled agents in PentAGI also pass extra_body.thinking.clear_thinking: false so that reasoning_content from previous assistant turns is retained across the conversation. This is required on the standard API endpoint (/api/paas/v4) — on the Coding Plan endpoint it would be enabled by default. Improves reasoning continuity and cache hit rates in multi-turn tool call chains.
All thinking-enabled agents also pass extra_body.tool_choice: auto defensively

Key Features:

Long-Horizon Tasks: GLM-5.1 supports 8-hour sustained autonomous execution, ideal for complex multi-stage agentic workflows
OpenClaw-Native Orchestration: GLM-5-Turbo is specifically optimized for tool invocation, instruction following, and long-chain execution
Prompt Caching: Significant cost reduction on repeated context (cached input pricing shown)
Ultra-Long Context: 200K tokens for GLM-5.x/4.7/4.6 series
MoE Architecture: Efficient 744B/40B active (GLM-5/5.1), 355B/32B (GLM-4.5), 106B/12B (GLM-4.5-Air)
Tool Calling: Seamless integration with 20+ pentesting tools via function calling
Streaming: Real-time streaming with streaming tool calls support (GLM-4.6+)
Multilingual: Exceptional Chinese and English NLP capabilities
Free Options: GLM-4.7-Flash and GLM-4.5-Flash for prototyping and experimentation

LiteLLM Integration: Set GLM_PROVIDER=zai to enable model name prefixing when using default PentAGI configurations with LiteLLM proxy. Leave empty for direct API usage.

Kimi Provider Configuration

PentAGI integrates with Kimi from Moonshot AI, providing ultra-long context models with multimodal capabilities perfect for analyzing extensive codebases and documentation.

Configuration Variables

Configuration Examples

# Direct API usage (international endpoint)
KIMI_API_KEY=your_kimi_api_key
KIMI_SERVER_URL=https://api.moonshot.ai/v1

# Alternative endpoint
KIMI_SERVER_URL=https://api.moonshot.cn/v1  # China

# With LiteLLM proxy
KIMI_API_KEY=your_litellm_key
KIMI_SERVER_URL=http://litellm-proxy:4000
KIMI_PROVIDER=moonshot  # Adds prefix to model names (moonshot/kimi-k2.5) for LiteLLM

Supported Models

Kimi K2.x Series - Multimodal Flagship

Moonshot V1 Series - Generation Models (Flexible Parameters)

Moonshot V1 Vision Series - Image Understanding

Prices: Per 1M tokens. Cache pricing applies to prompt tokens served from automatic context cache (only Kimi K2.x models support cache).

CRITICAL — Kimi K2.6/K2.5 parameter constraints: API returns invalid_request_error for any deviation:
temperature: MUST be 1.0 in thinking mode, MUST be 0.6 in non-thinking mode
top_p: MUST be 0.95
n: MUST be 1
presence_penalty and frequency_penalty: MUST be 0 (not modifiable)
Moonshot V1 models use standard OpenAI-compatible parameters with no such constraints.

Default Agent Configuration:

Key Features:

Ultra-Long Context: Up to 256K tokens (K2.x) for comprehensive codebase/documentation analysis
Native Multimodal: K2.6/K2.5 support text + image + video input out of the box
Hybrid Thinking: K2.6/K2.5 toggle between thinking and non-thinking via extra_body.thinking.type
Preserved Thinking (K2.6): thinking.keep: "all" preserves historical reasoning_content across turns — required for multi-turn tool call chains
Automatic Context Caching: K2.x models cache repeated prefixes (~17% of miss price for K2.6, ~17% for K2.5)
Tool Calling: Full function-calling support for K2.x and Moonshot V1
Self-Correction: K2.6 features improved instruction compliance and self-correction
Multilingual: Strong Chinese, English, and multi-language support

LiteLLM Integration: Set KIMI_PROVIDER=moonshot to enable model name prefixing when using default PentAGI configurations with LiteLLM proxy. Leave empty for direct API usage.

Qwen Provider Configuration

PentAGI integrates with Qwen from Alibaba Cloud Model Studio (DashScope), providing powerful multilingual models with reasoning capabilities and context caching support.

Configuration Variables

Configuration Examples

# Direct API usage (Global/US endpoint)
QWEN_API_KEY=your_qwen_api_key
QWEN_SERVER_URL=https://dashscope-us.aliyuncs.com/compatible-mode/v1

# Alternative endpoints
QWEN_SERVER_URL=https://dashscope-intl.aliyuncs.com/compatible-mode/v1  # International (Singapore)
QWEN_SERVER_URL=https://dashscope.aliyuncs.com/compatible-mode/v1       # Chinese Mainland (Beijing)

# With LiteLLM proxy
QWEN_API_KEY=your_litellm_key
QWEN_SERVER_URL=http://litellm-proxy:4000
QWEN_PROVIDER=dashscope  # Adds prefix to model names (dashscope/qwen-plus) for LiteLLM

Supported Models

Flagship Models (Top-tier Reasoning)

Balanced Plus Models (Mid-tier)

Fast Flash Models (Cost-optimized)

Code-Specialized Models

Vision-Language Models (Browser & Screenshot Analysis)

Open-Source Qwen3.6 Series

Open-Source Qwen3.5 Series

Open-Source Qwen3 Coder Series

Open-Source Qwen3 Dense & MoE Series

Region Availability:

Intl (International): Singapore region (dashscope-intl.aliyuncs.com)
Global/US: US Virginia region (dashscope-us.aliyuncs.com)
China: Chinese Mainland Beijing region (dashscope.aliyuncs.com)

Default Agent Configuration:

Key Features:

Agent-Centric Design: Qwen3.7-Max is purpose-built for long-horizon autonomous execution and tool invocation
Automatic Context Caching: 30-50% cost reduction on repeated context with implicit cache
Extended Thinking: Chain-of-thought reasoning for complex security analysis (Qwen3.7/3.6/3.5/3-Max, QVQ-Max)
Code Specialization: Qwen3-Coder series with multi-turn tool interaction and repository-level understanding
Vision-Language: Qwen3-VL series for browser screenshot triage, 2D/3D localization, OCR-level analysis
Tool Calling: Seamless integration with 20+ pentesting tools via function calling
Streaming: Real-time response streaming for interactive workflows
Multilingual: Strong Chinese, English, and multi-language support
Open-Source Variants: Dense and MoE models from 0.6B to 480B for on-premises/air-gapped deployments

LiteLLM Integration: Set QWEN_PROVIDER=dashscope to enable model name prefixing when using default PentAGI configurations with LiteLLM proxy. Leave empty for direct API usage.

Alternative Integrations

DashScope is fully OpenAI-compatible, so Qwen can also power two other PentAGI subsystems through the standard OpenAI client.

As embedding provider (text-embedding-v4, see Alibaba Cloud Model Studio pricing):

EMBEDDING_PROVIDER=openai
EMBEDDING_URL=https://dashscope-intl.aliyuncs.com/compatible-mode/v1  # International (Singapore)
# EMBEDDING_URL=https://dashscope.aliyuncs.com/compatible-mode/v1     # Chinese Mainland
EMBEDDING_KEY=sk-*******
EMBEDDING_MODEL=text-embedding-v4
EMBEDDING_BATCH_SIZE=         # optional, default applies
EMBEDDING_STRIP_NEW_LINES=    # optional, default applies

Note: the Global/US DashScope endpoint (dashscope-us.aliyuncs.com) does not expose embedding APIs — use the International or China endpoints for text-embedding-v4.

Advanced Setup

Langfuse Integration

Langfuse provides advanced capabilities for monitoring and analyzing AI agent operations.

Configure Langfuse environment variables in existing .env file.

Database Credentials

LANGFUSE_POSTGRES_USER and LANGFUSE_POSTGRES_PASSWORD - Langfuse PostgreSQL credentials
LANGFUSE_CLICKHOUSE_USER and LANGFUSE_CLICKHOUSE_PASSWORD - ClickHouse credentials
LANGFUSE_REDIS_AUTH - Redis password

Encryption and Security Keys

LANGFUSE_SALT - Salt for hashing in Langfuse Web UI
LANGFUSE_ENCRYPTION_KEY - Encryption key (32 bytes in hex)
LANGFUSE_NEXTAUTH_SECRET - Secret key for NextAuth

Admin Credentials

LANGFUSE_INIT_USER_EMAIL - Admin email
LANGFUSE_INIT_USER_PASSWORD - Admin password
LANGFUSE_INIT_USER_NAME - Admin username

API Keys and Tokens

LANGFUSE_INIT_PROJECT_PUBLIC_KEY - Project public key (used from PentAGI side too)
LANGFUSE_INIT_PROJECT_SECRET_KEY - Project secret key (used from PentAGI side too)

S3 Storage

LANGFUSE_S3_ACCESS_KEY_ID - S3 access key ID
LANGFUSE_S3_SECRET_ACCESS_KEY - S3 secret access key

Enable integration with Langfuse for PentAGI service in .env file.

LANGFUSE_BASE_URL=http://langfuse-web:3000
LANGFUSE_PROJECT_ID= # default: value from ${LANGFUSE_INIT_PROJECT_ID}
LANGFUSE_PUBLIC_KEY= # default: value from ${LANGFUSE_INIT_PROJECT_PUBLIC_KEY}
LANGFUSE_SECRET_KEY= # default: value from ${LANGFUSE_INIT_PROJECT_SECRET_KEY}

Run the Langfuse stack:

curl -O https://raw.githubusercontent.com/vxcontrol/pentagi/master/docker-compose-langfuse.yml
docker compose -f docker-compose.yml -f docker-compose-langfuse.yml up -d

Visit localhost:4000 to access Langfuse Web UI with credentials from .env file:

LANGFUSE_INIT_USER_EMAIL - Admin email
LANGFUSE_INIT_USER_PASSWORD - Admin password

Monitoring and Observability

For detailed system operation tracking, integration with monitoring tools is available.

Enable integration with OpenTelemetry and all observability services for PentAGI in .env file.

OTEL_HOST=otelcol:8148

Run the observability stack:

curl -O https://raw.githubusercontent.com/vxcontrol/pentagi/master/docker-compose-observability.yml
docker compose -f docker-compose.yml -f docker-compose-observability.yml up -d

Visit localhost:3000 to access Grafana Web UI.

[!NOTE] If you want to use Observability stack with Langfuse, you need to enable integration in .env file to set LANGFUSE_OTEL_EXPORTER_OTLP_ENDPOINT to http://otelcol:4318.
To run all available stacks together (Langfuse, Graphiti, and Observability):
docker compose -f docker-compose.yml -f docker-compose-langfuse.yml -f docker-compose-graphiti.yml -f docker-compose-observability.yml up -d
You can also register aliases for these commands in your shell to run it faster:
alias pentagi="docker compose -f docker-compose.yml -f docker-compose-langfuse.yml -f docker-compose-graphiti.yml -f docker-compose-observability.yml"
alias pentagi-up="docker compose -f docker-compose.yml -f docker-compose-langfuse.yml -f docker-compose-graphiti.yml -f docker-compose-observability.yml up -d"
alias pentagi-down="docker compose -f docker-compose.yml -f docker-compose-langfuse.yml -f docker-compose-graphiti.yml -f docker-compose-observability.yml down"

Knowledge Graph Integration (Graphiti)

[!IMPORTANT] The Graphiti integration is currently a beta feature and has notable provider limitations. See Current Limitations below before enabling it in production.

What is Graphiti?

Graphiti automatically extracts and stores structured knowledge from agent interactions, building a graph of entities, relationships, and temporal context. This enables:

Semantic Memory: Store and recall relationships between tools, targets, vulnerabilities, and techniques
Contextual Understanding: Track how different pentesting actions relate to each other over time
Knowledge Reuse: Learn from past penetration tests and apply insights to new assessments
Advanced Querying: Search for complex patterns like "What tools were effective against similar targets?"

Enabling Graphiti

The Graphiti knowledge graph is optional and disabled by default. To enable it:

Configure Graphiti environment variables in .env file:

## Graphiti knowledge graph settings
GRAPHITI_ENABLED=true
GRAPHITI_TIMEOUT=30
GRAPHITI_URL=http://graphiti:8000
GRAPHITI_MODEL_NAME=gpt-5-mini

# Neo4j settings (used by Graphiti stack)
NEO4J_USER=neo4j
NEO4J_DATABASE=neo4j
NEO4J_PASSWORD=devpassword
NEO4J_URI=bolt://neo4j:7687

# OpenAI API key (required by Graphiti for entity extraction)
OPEN_AI_KEY=your_openai_api_key

Run the Graphiti stack along with the main PentAGI services:

# Download the Graphiti compose file if needed
curl -O https://raw.githubusercontent.com/vxcontrol/pentagi/master/docker-compose-graphiti.yml

# Start PentAGI with Graphiti
docker compose -f docker-compose.yml -f docker-compose-graphiti.yml up -d

Verify Graphiti is running:

# Check service health
docker compose -f docker-compose.yml -f docker-compose-graphiti.yml ps graphiti neo4j

# View Graphiti logs
docker compose -f docker-compose.yml -f docker-compose-graphiti.yml logs -f graphiti

# Access Neo4j Browser (optional)
# Visit http://localhost:7474 and login with NEO4J_USER/NEO4J_PASSWORD

# Access Graphiti API (optional, for debugging)
# Visit http://localhost:8000/docs for Swagger API documentation

[!NOTE] The Graphiti service is defined in docker-compose-graphiti.yml as a separate stack. You must run both compose files together to enable the knowledge graph functionality. The pre-built Docker image vxcontrol/graphiti:latest is used by default.

What Gets Stored

When enabled, PentAGI automatically captures:

Agent Responses: All agent reasoning, analysis, and decisions
Tool Executions: Commands executed, tools used, and their results
Context Information: Flow, task, and subtask hierarchy

Current Limitations

The Graphiti integration is currently a beta feature. Operators should plan around the following constraints before enabling it in production:

OpenAI-compatible LLM only. The bundled vxcontrol/graphiti image authenticates against a single OpenAI-compatible endpoint configured through PentAGI's .env variables OPEN_AI_KEY and OPEN_AI_SERVER_URL (default https://api.openai.com/v1). docker-compose-graphiti.yml maps these into the container as OPENAI_API_KEY and OPENAI_BASE_URL, so operators do not set the container variables directly. Provider credentials configured elsewhere in PentAGI for Anthropic, Google AI (Gemini), AWS Bedrock, DeepSeek, GLM, Kimi, or Qwen are not used by Graphiti for entity extraction. If your deployment cannot reach an OpenAI-compatible endpoint, leave GRAPHITI_ENABLED=false.
Single fixed model per deployment. Graphiti uses one model name (GRAPHITI_MODEL_NAME, default gpt-5-mini) for all extractions. The model cannot be selected per agent or per flow.
Independent billing. Even when a flow runs against a non-OpenAI provider, Graphiti still incurs cost on the configured OpenAI-compatible endpoint.
No in-app graph explorer yet. Browsing the captured graph relies on the Neo4j Browser at http://localhost:7474 and the Graphiti Swagger UI at http://localhost:8000/docs. There is no PentAGI UI surface for the graph today.

When GRAPHITI_ENABLED=false, PentAGI continues to operate with its primary memory and vector store; only the additional knowledge graph features are skipped.

GitHub and Google OAuth Integration

OAuth integration with GitHub and Google allows users to authenticate using their existing accounts on these platforms. This provides several benefits:

Simplified login process without need to create separate credentials
Enhanced security through trusted identity providers
Access to user profile information from GitHub/Google accounts
Seamless integration with existing development workflows

PentAGI uses PUBLIC_URL as the public origin/base URL for OAuth redirects. In the default deployment, both GitHub and Google callbacks are handled by:

${PUBLIC_URL}/api/v1/auth/login-callback

For GitHub OAuth:

Create a new OAuth App in your GitHub account.
Set Homepage URL to your PUBLIC_URL.
Set Authorization callback URL to ${PUBLIC_URL}/api/v1/auth/login-callback.
Add the client credentials to your .env file:

PUBLIC_URL=https://pentagi.example.com
OAUTH_GITHUB_CLIENT_ID=your_github_client_id
OAUTH_GITHUB_CLIENT_SECRET=your_github_client_secret

For Google OAuth:

Create OAuth credentials in your Google Cloud project.
Use the same callback endpoint: ${PUBLIC_URL}/api/v1/auth/login-callback.
Add the client credentials to your .env file:

PUBLIC_URL=https://pentagi.example.com
OAUTH_GOOGLE_CLIENT_ID=your_google_client_id
OAUTH_GOOGLE_CLIENT_SECRET=your_google_client_secret

Docker Image Configuration

When DOCKER_DEFAULT_IMAGE and DOCKER_DEFAULT_IMAGE_FOR_PENTEST are set, AI agents will be limited to the image choices you specify. This is particularly useful for:

Security Enforcement: Restricting usage to only verified and trusted images
Environment Standardization: Using corporate or customized images across all operations
Performance Optimization: Utilizing pre-built images with necessary tools already installed

Configuration examples:

# Using a custom PentAGI application image
PENTAGI_IMAGE=registry.example.com/security/pentagi:latest

# Using a custom image for general tasks
DOCKER_DEFAULT_IMAGE=mycompany/custom-debian:latest

# Using a specialized image for penetration testing
DOCKER_DEFAULT_IMAGE_FOR_PENTEST=mycompany/pentest-tools:v2.0

[!NOTE] If a user explicitly specifies a particular Docker image in their task, the system will try to use that exact image, ignoring these settings. These variables only affect the system's automatic image selection process.

For an advanced OpenVAS/GVM experiment that uses a custom pentest image, see OpenVAS via a Custom Pentest Image.

Restricted Networks, Docker Mirrors, and Proxies

For restricted networks:

Confirm that the host can resolve and reach docker.io.
If your environment requires an outbound proxy for PentAGI or installer HTTP traffic, set the PROXY_URL environment variable. To route Docker image pulls through a proxy, configure the Docker daemon or Docker Desktop proxy separately — Docker does not use PentAGI's PROXY_URL for registry access.
If Docker Hub is blocked or heavily rate-limited, configure an organization-approved registry mirror or registry proxy before running the installer or docker compose up.
Restart Docker after changing the daemon configuration, then rerun the installer checks or Compose startup.

Example Docker daemon mirror configuration:

{
  "registry-mirrors": ["https://mirror.example.com"]
}

See the official Docker documentation for registry mirrors and daemon proxy configuration.

Development

Development Requirements

golang
nodejs
docker
postgres
commitlint

Environment Setup

Backend Setup

Run once cd backend && go mod download to install needed packages.

For generating swagger files have to run

swag init -g ../../pkg/server/router.go -o pkg/server/docs/ --parseDependency --parseInternal --parseDepth 2 -d cmd/pentagi

before installing swag package via

go install github.com/swaggo/swag/cmd/swag@v1.8.7

For generating graphql resolver files have to run

go run github.com/99designs/gqlgen --config ./gqlgen/gqlgen.yml

after that you can see the generated files in pkg/graph folder.

For generating ORM methods (database package) from sqlc configuration

docker run --rm -v $(pwd):/src -w /src --network pentagi-network -e DATABASE_URL="{URL}" sqlc/sqlc:1.27.0 generate -f sqlc/sqlc.yml

For generating Langfuse SDK from OpenAPI specification

fern generate --local

and to install fern-cli

pnpm add -g fern-api

Testing

For running tests cd backend && go test -v ./...

Frontend Setup

Run once cd frontend && pnpm install to install needed packages.

For generating graphql files have to run pnpm run graphql:generate which using graphql-codegen.ts file.

Be sure that you have graphql-codegen installed globally:

pnpm add -g graphql-codegen

After that you can run:

pnpm run prettier to check if your code is formatted correctly
pnpm run prettier:fix to fix it
pnpm run lint to check if your code is linted correctly
pnpm run lint:fix to fix it

For generating SSL certificates you need to run pnpm run ssl:generate which using generate-ssl.ts file or it will be generated automatically when you run pnpm run dev.

Backend Configuration

Edit the configuration for backend in .vscode/launch.json file:

DATABASE_URL - PostgreSQL database URL (eg. postgres://postgres:postgres@localhost:5432/pentagidb?sslmode=disable)
DOCKER_HOST - Docker SDK API (eg. for macOS DOCKER_HOST=unix:///Users/<my-user>/Library/Containers/com.docker.docker/Data/docker.raw.sock) more info

Optional:

SERVER_PORT - Port to run the server (default: 8443)
SERVER_USE_SSL - Enable SSL for the server (default: false)

PostgreSQL / pgvector connection pool sizing

PentAGI opens two independent connection pools to the same Postgres instance:

Additional tuning knob:

DATABASE_MAX_IDLE_CONNS — maximum idle connections kept open in the sql.DB pool between requests (default: 5).

Budget for the stock vxcontrol/pgvector image (max_connections = 100, superuser_reserved_connections = 3):

Available for client connections  = 97
  pentagi sql.DB  (DATABASE_MAX_OPEN_CONNS)   = 25
  pentagi pgxpool (DATABASE_VECTOR_MAX_CONNS) = 10
  pgexporter                                  =  3
  autovacuum workers                          =  3
  ─────────────────────────────────────────
  Total consumed                              = 41
  Free buffer                                 = 56  (≈ 58 %)

pgvector:
  image: vxcontrol/pgvector:latest
  command: postgres -c max_connections=200

To inspect the live connection budget on a running deployment:

# Postgres limits
docker exec pgvector sh -c 'psql -U "$POSTGRES_USER" -d "$POSTGRES_DB" -c \
  "SELECT name, setting FROM pg_settings
   WHERE name IN ('"'"'max_connections'"'"', '"'"'superuser_reserved_connections'"'"');"'

# Current usage vs. available
docker exec pgvector sh -c 'psql -U "$POSTGRES_USER" -d "$POSTGRES_DB" -c \
  "SELECT max_conn, used, max_conn - used AS available
   FROM (SELECT current_setting('"'"'max_connections'"'"')::int AS max_conn,
                count(*) AS used FROM pg_stat_activity) t;"'

# Breakdown by client
docker exec pgvector sh -c 'psql -U "$POSTGRES_USER" -d "$POSTGRES_DB" -c \
  "SELECT application_name, client_addr, state, count(*)
   FROM pg_stat_activity
   WHERE pid <> pg_backend_pid()
   GROUP BY 1, 2, 3 ORDER BY count DESC;"'

Frontend Configuration

Edit the configuration for frontend in .vscode/launch.json file:

VITE_API_URL - Backend API URL. Omit the URL scheme (e.g., localhost:8080 NOT http://localhost:8080)
VITE_USE_HTTPS - Enable SSL for the server (default: false)
VITE_PORT - Port to run the server (default: 8000)
VITE_HOST - Host to run the server (default: 0.0.0.0)

Running the Application

Backend

Run the command(s) in backend folder:

Use .env file to set environment variables like a source .env
Run go run cmd/pentagi/main.go to start the server

[!NOTE] The first run can take a while as dependencies and docker images need to be downloaded to setup the backend environment.

Frontend

Run the command(s) in frontend folder:

Run pnpm install to install the dependencies
Run pnpm run dev to run the web app
Run pnpm run build to build the web app

Open your browser and visit the web app URL.

Testing LLM Agents

The utility features parallel testing of multiple agents, detailed reporting, and flexible configuration options.

Key Features

Parallel Testing: Tests multiple agents simultaneously for faster results
Comprehensive Test Suite: Evaluates basic completion, JSON responses, function calling, and penetration testing knowledge
Detailed Reporting: Generates markdown reports with success rates and performance metrics
Flexible Configuration: Test specific agents or test groups as needed
Specialized Test Groups: Includes domain-specific tests for cybersecurity and penetration testing scenarios

Usage Scenarios

For Developers (with local Go environment)

If you've cloned the repository and have Go installed:

# Default configuration with .env file
cd backend
go run cmd/ctester/*.go -verbose

# Custom provider configuration
go run cmd/ctester/*.go -config ../examples/configs/openrouter.provider.yml -verbose

# Generate a report file
go run cmd/ctester/*.go -config ../examples/configs/deepinfra.provider.yml -report ../test-report.md

# Test specific agent types only
go run cmd/ctester/*.go -agents simple,simple_json,primary_agent -verbose

# Test specific test groups only
go run cmd/ctester/*.go -groups basic,advanced -verbose

For Users (using Docker image)

If you prefer to use the pre-built Docker image without setting up a development environment:

# Using Docker to test with default environment
docker run --rm -v $(pwd)/.env:/opt/pentagi/.env vxcontrol/pentagi /opt/pentagi/bin/ctester -verbose

# Test with your custom provider configuration
docker run --rm \
  -v $(pwd)/.env:/opt/pentagi/.env \
  -v $(pwd)/my-config.yml:/opt/pentagi/config.yml \
  vxcontrol/pentagi /opt/pentagi/bin/ctester -config /opt/pentagi/config.yml -agents simple,primary_agent,coder -verbose

# Generate a detailed report
docker run --rm \
  -v $(pwd)/.env:/opt/pentagi/.env \
  -v $(pwd):/opt/pentagi/output \
  vxcontrol/pentagi /opt/pentagi/bin/ctester -report /opt/pentagi/output/report.md

Using Pre-configured Providers

# Test with OpenRouter configuration
docker exec -it pentagi /opt/pentagi/bin/ctester -config /opt/pentagi/conf/openrouter.provider.yml

# Test with DeepInfra configuration
docker exec -it pentagi /opt/pentagi/bin/ctester -config /opt/pentagi/conf/deepinfra.provider.yml

# Test with DeepSeek configuration
docker exec -it pentagi /opt/pentagi/bin/ctester -provider deepseek

# Test with GLM configuration
docker exec -it pentagi /opt/pentagi/bin/ctester -provider glm

# Test with Kimi configuration
docker exec -it pentagi /opt/pentagi/bin/ctester -provider kimi

# Test with Qwen configuration
docker exec -it pentagi /opt/pentagi/bin/ctester -provider qwen

# Test with DeepSeek configuration file for custom provider
docker exec -it pentagi /opt/pentagi/bin/ctester -config /opt/pentagi/conf/deepseek.provider.yml

# Test with Moonshot configuration file for custom provider
docker exec -it pentagi /opt/pentagi/bin/ctester -config /opt/pentagi/conf/moonshot.provider.yml

# Test with Novita configuration
docker exec -it pentagi /opt/pentagi/bin/ctester -config /opt/pentagi/conf/novita.provider.yml

# Test with OpenAI configuration
docker exec -it pentagi /opt/pentagi/bin/ctester -type openai

# Test with Anthropic configuration
docker exec -it pentagi /opt/pentagi/bin/ctester -type anthropic

# Test with Gemini configuration
docker exec -it pentagi /opt/pentagi/bin/ctester -type gemini

# Test with AWS Bedrock configuration
docker exec -it pentagi /opt/pentagi/bin/ctester -type bedrock

# Test with Custom OpenAI configuration
docker exec -it pentagi /opt/pentagi/bin/ctester -config /opt/pentagi/conf/custom-openai.provider.yml

# Test with Ollama configuration (local inference)
docker exec -it pentagi /opt/pentagi/bin/ctester -config /opt/pentagi/conf/ollama-llama318b.provider.yml

# Test with Ollama Qwen3 32B configuration (requires custom model creation)
docker exec -it pentagi /opt/pentagi/bin/ctester -config /opt/pentagi/conf/ollama-qwen332b-fp16-tc.provider.yml

# Test with Ollama QwQ 32B configuration (requires custom model creation and 71.3GB VRAM)
docker exec -it pentagi /opt/pentagi/bin/ctester -config /opt/pentagi/conf/ollama-qwq32b-fp16-tc.provider.yml

To use these configurations, your .env file only needs to contain:

LLM_SERVER_URL=https://openrouter.ai/api/v1      # or https://api.deepinfra.com/v1/openai or https://api.openai.com/v1 or https://api.novita.ai/openai
LLM_SERVER_KEY=your_api_key
LLM_SERVER_MODEL=                                # Leave empty, as models are specified in the config
LLM_SERVER_CONFIG_PATH=/opt/pentagi/conf/openrouter.provider.yml  # or deepinfra.provider.ymll or custom-openai.provider.yml or novita.provider.yml
LLM_SERVER_PROVIDER=                             # Provider name for LiteLLM proxy (e.g., openrouter, deepseek, moonshot, novita)
LLM_SERVER_LEGACY_REASONING=false                # Controls reasoning format, for OpenAI must be true (default: false)
LLM_SERVER_PRESERVE_REASONING=false              # Preserve reasoning content in multi-turn conversations (required by Moonshot, default: false)

# For OpenAI (official API)
OPEN_AI_KEY=your_openai_api_key                  # Your OpenAI API key
OPEN_AI_SERVER_URL=https://api.openai.com/v1     # OpenAI API endpoint

# For Anthropic (Claude models)
ANTHROPIC_API_KEY=your_anthropic_api_key         # Your Anthropic API key
ANTHROPIC_SERVER_URL=https://api.anthropic.com/v1  # Anthropic API endpoint

# For Gemini (Google AI)
GEMINI_API_KEY=your_gemini_api_key               # Your Google AI API key
GEMINI_SERVER_URL=https://generativelanguage.googleapis.com  # Google AI API endpoint

# For AWS Bedrock (enterprise foundation models)
BEDROCK_REGION=us-east-1                         # AWS region for Bedrock service
# Authentication (choose one method, priority: DefaultAuth > BearerToken > AccessKey):
BEDROCK_DEFAULT_AUTH=false                       # Use AWS SDK credential chain (env vars, EC2 role, ~/.aws/credentials)
BEDROCK_BEARER_TOKEN=                            # Bearer token authentication (takes priority over static credentials)
BEDROCK_ACCESS_KEY_ID=your_aws_access_key        # AWS access key ID (static credentials)
BEDROCK_SECRET_ACCESS_KEY=your_aws_secret_key    # AWS secret access key (static credentials)
BEDROCK_SESSION_TOKEN=                           # AWS session token (optional, for temporary credentials with static auth)
BEDROCK_SERVER_URL=                              # Optional custom Bedrock endpoint (VPC endpoints, local testing)

# For Ollama (local server or cloud)
OLLAMA_SERVER_URL=                               # Local: http://ollama-server:11434, Cloud: https://ollama.com
OLLAMA_SERVER_API_KEY=                           # Required for Ollama Cloud (https://ollama.com/settings/keys), leave empty for local
OLLAMA_SERVER_MODEL=
OLLAMA_SERVER_CONFIG_PATH=
OLLAMA_SERVER_PULL_MODELS_TIMEOUT=
OLLAMA_SERVER_PULL_MODELS_ENABLED=
OLLAMA_SERVER_LOAD_MODELS_ENABLED=

# For DeepSeek (Chinese AI with strong reasoning)
DEEPSEEK_API_KEY=                                # DeepSeek API key
DEEPSEEK_SERVER_URL=https://api.deepseek.com     # DeepSeek API endpoint
DEEPSEEK_PROVIDER=                               # Optional: LiteLLM prefix (e.g., 'deepseek')

# For GLM (Zhipu AI)
GLM_API_KEY=                                     # GLM API key
GLM_SERVER_URL=https://api.z.ai/api/paas/v4      # GLM API endpoint (international)
GLM_PROVIDER=                                    # Optional: LiteLLM prefix (e.g., 'zai')

# For Kimi (Moonshot AI)
KIMI_API_KEY=                                    # Kimi API key
KIMI_SERVER_URL=https://api.moonshot.ai/v1       # Kimi API endpoint (international)
KIMI_PROVIDER=                                   # Optional: LiteLLM prefix (e.g., 'moonshot')

# For Qwen (Alibaba Cloud DashScope)
QWEN_API_KEY=                                    # Qwen API key
QWEN_SERVER_URL=https://dashscope-us.aliyuncs.com/compatible-mode/v1  # Qwen API endpoint (US)
QWEN_PROVIDER=                                   # Optional: LiteLLM prefix (e.g., 'dashscope')

# For Ollama (local inference) use variables above
OLLAMA_SERVER_URL=http://localhost:11434
OLLAMA_SERVER_MODEL=llama3.1:8b-instruct-q8_0
OLLAMA_SERVER_CONFIG_PATH=/opt/pentagi/conf/ollama-llama318b.provider.yml
OLLAMA_SERVER_PULL_MODELS_ENABLED=false
OLLAMA_SERVER_LOAD_MODELS_ENABLED=false

Using OpenAI with Unverified Organizations

For OpenAI accounts with unverified organizations that don't have access to the latest reasoning models (o1, o3, o4-mini), you need to use a custom configuration.

To use OpenAI with unverified organization accounts, configure your .env file as follows:

LLM_SERVER_URL=https://api.openai.com/v1
LLM_SERVER_KEY=your_openai_api_key
LLM_SERVER_MODEL=                                # Leave empty, models are specified in config
LLM_SERVER_CONFIG_PATH=/opt/pentagi/conf/custom-openai.provider.yml
LLM_SERVER_LEGACY_REASONING=true                 # Required for OpenAI reasoning format

You can test this configuration using:

# Test with custom OpenAI configuration for unverified accounts
docker exec -it pentagi /opt/pentagi/bin/ctester -config /opt/pentagi/conf/custom-openai.provider.yml

[!NOTE] The LLM_SERVER_LEGACY_REASONING=true setting is crucial for OpenAI compatibility as it ensures reasoning parameters are sent in the format expected by OpenAI's API.

Using LiteLLM Proxy

# Direct access to Moonshot API
LLM_SERVER_URL=https://api.moonshot.ai/v1
LLM_SERVER_KEY=your_moonshot_api_key
LLM_SERVER_CONFIG_PATH=/opt/pentagi/conf/moonshot.provider.yml
LLM_SERVER_PROVIDER=                             # Empty for direct access

# Access via LiteLLM proxy
LLM_SERVER_URL=http://litellm-proxy:4000
LLM_SERVER_KEY=your_litellm_api_key
LLM_SERVER_CONFIG_PATH=/opt/pentagi/conf/moonshot.provider.yml
LLM_SERVER_PROVIDER=moonshot                     # Provider prefix for LiteLLM

With LLM_SERVER_PROVIDER=moonshot, the system automatically prefixes all model names from the configuration file with moonshot/, making them compatible with LiteLLM's model naming convention.

LiteLLM Provider Name Mapping:

When using LiteLLM proxy, set the corresponding *_PROVIDER variable to enable model prefixing:

deepseek - for DeepSeek models (DEEPSEEK_PROVIDER=deepseek → deepseek/deepseek-v4-flash)
zai - for GLM models (GLM_PROVIDER=zai → zai/glm-4)
moonshot - for Kimi models (KIMI_PROVIDER=moonshot → moonshot/kimi-k2.5)
dashscope - for Qwen models (QWEN_PROVIDER=dashscope → dashscope/qwen-plus)
openai, anthropic, gemini - for major cloud providers
openrouter - for OpenRouter aggregator
deepinfra - for DeepInfra hosting
novita - for Novita AI
Any other provider name configured in your LiteLLM instance

Example with LiteLLM:

# Use DeepSeek models via LiteLLM proxy with model prefixing
DEEPSEEK_API_KEY=your_litellm_proxy_key
DEEPSEEK_SERVER_URL=http://litellm-proxy:4000
DEEPSEEK_PROVIDER=deepseek  # Models become deepseek/deepseek-v4-flash, deepseek/deepseek-v4-pro for LiteLLM

# Direct DeepSeek API usage (no prefix needed)
DEEPSEEK_API_KEY=your_deepseek_api_key
DEEPSEEK_SERVER_URL=https://api.deepseek.com
# Leave DEEPSEEK_PROVIDER empty

This approach allows you to:

Use the same configuration files for both direct and proxied access
Switch between providers without modifying configuration files
Easily test different routing strategies with LiteLLM

Running Tests in a Production Environment

If you already have a running PentAGI container and want to test the current configuration:

# Run ctester in an existing container using current environment variables
docker exec -it pentagi /opt/pentagi/bin/ctester -verbose

# Test specific agent types with deterministic ordering
docker exec -it pentagi /opt/pentagi/bin/ctester -agents simple,primary_agent,pentester -groups basic,knowledge -verbose

# Generate a report file inside the container
docker exec -it pentagi /opt/pentagi/bin/ctester -report /opt/pentagi/data/agent-test-report.md

# Access the report from the host
docker cp pentagi:/opt/pentagi/data/agent-test-report.md ./

Command-line Options

The utility accepts several options:

-env <path> - Path to environment file (default: .env)
-type <provider> - Provider type: custom, openai, anthropic, ollama, bedrock, gemini (default: custom)
-config <path> - Path to custom provider config (default: from LLM_SERVER_CONFIG_PATH env variable)
-tests <path> - Path to custom tests YAML file (optional)
-report <path> - Path to write the report file (optional)
-agents <list> - Comma-separated list of agent types to test (default: all)
-groups <list> - Comma-separated list of test groups to run (default: all)
-verbose - Enable verbose output with detailed test results for each agent

Available Agent Types

Agents are tested in the following deterministic order:

simple - Basic completion tasks
simple_json - JSON-structured responses
primary_agent - Main reasoning agent
assistant - Interactive assistant mode
generator - Content generation
refiner - Content refinement and improvement
adviser - Expert advice and consultation
reflector - Self-reflection and analysis
searcher - Information gathering and search
enricher - Data enrichment and expansion
coder - Code generation and analysis
installer - Installation and setup tasks
pentester - Penetration testing and security assessment

Available Test Groups

basic - Fundamental completion and prompt response tests
advanced - Complex reasoning and function calling tests
json - JSON format validation and structure tests (specifically designed for simple_json agent)
knowledge - Domain-specific cybersecurity and penetration testing knowledge tests

Note: The json test group is specifically designed for the simple_json agent type, while all other agents are tested with basic, advanced, and knowledge groups. This specialization ensures optimal testing coverage for each agent's intended purpose.

Example Provider Configuration

Provider configuration defines which models to use for different agent types:

simple:
  model: "provider/model-name"
  temperature: 0.7
  top_p: 0.95
  n: 1
  max_tokens: 4000

simple_json:
  model: "provider/model-name"
  temperature: 0.7
  top_p: 1.0
  n: 1
  max_tokens: 4000
  json: true

# ... other agent types ...

Optimization Workflow

Create a baseline: Run tests with default configuration to establish benchmark performance
Analyze agent-specific performance: Review the deterministic agent ordering to identify underperforming agents
Test specialized configurations: Experiment with different models for each agent type using provider-specific configs
Focus on domain knowledge: Pay special attention to knowledge group tests for cybersecurity expertise
Validate function calling: Ensure tool-based tests pass consistently for critical agent types
Compare results: Look for the best success rate and performance across all test groups
Deploy optimal configuration: Use in production with your optimized setup

This tool helps ensure your AI agents are using the most effective models for their specific tasks, improving reliability while optimizing costs.

Embedding Configuration and Testing

Supported Embedding Providers

PentAGI supports the following embedding providers:

OpenAI (default): Uses OpenAI's text embedding models
Ollama: Local embedding model through Ollama
Mistral: Mistral AI's embedding models
Jina: Jina AI's embedding service
HuggingFace: Models from HuggingFace
GoogleAI: Google's embedding models
VoyageAI: VoyageAI's embedding models

OpenAI-compatible third parties: any provider exposing OpenAI's /embeddings API can be plugged in via EMBEDDING_PROVIDER=openai with a custom EMBEDDING_URL. For example, Qwen DashScope offers text-embedding-v4 through the /compatible-mode/v1 endpoint (International and Chinese Mainland regions only — the US region does not expose embeddings). See the Qwen Alternative Integrations subsection for the full configuration snippet.

Environment Variables

To configure the embedding provider, set the following environment variables in your .env file:

# Primary embedding configuration
EMBEDDING_PROVIDER=openai       # Provider type (openai, ollama, mistral, jina, huggingface, googleai, voyageai)
EMBEDDING_MODEL=text-embedding-3-small  # Model name to use
EMBEDDING_URL=                  # Optional custom API endpoint
EMBEDDING_KEY=                  # API key for the provider (if required)
EMBEDDING_BATCH_SIZE=100        # Number of documents to process in a batch
EMBEDDING_STRIP_NEW_LINES=true  # Whether to remove new lines from text before embedding
EMBEDDING_MAX_TEXT_BYTES=8192   # Max bytes of text sent to embedding model per document (byte proxy for token limit)

# Advanced settings
PROXY_URL=                      # Optional proxy for all API calls
HTTP_CLIENT_TIMEOUT=600         # Timeout in seconds for external API calls (default: 600, 0 = no timeout)
TERMINAL_TOOL_TIMEOUT=1200      # Default timeout in seconds for terminal tool commands when timeout=0 or negative (range: 1–10800; values <= 0 or above 10800 are clamped to 10800 = 3 hours)

# SSL/TLS Certificate Configuration (for external communication with LLM backends and tool servers)
EXTERNAL_SSL_CA_PATH=           # Path to custom CA certificate file (PEM format) inside the container
                                # Must point to /opt/pentagi/ssl/ directory (e.g., /opt/pentagi/ssl/ca-bundle.pem)
EXTERNAL_SSL_INSECURE=false     # Skip certificate verification (use only for testing)

If you see this error: tls: failed to verify certificate: x509: certificate signed by unknown authority

Step 1: Get your CA certificate bundle in PEM format (can contain multiple certificates)

Step 2: Place the file in the SSL directory on your host machine:

# Default location (if PENTAGI_SSL_DIR is not set)
cp ca-bundle.pem ./pentagi-ssl/

# Or custom location (if using PENTAGI_SSL_DIR in docker-compose.yml)
cp ca-bundle.pem /path/to/your/ssl/dir/

Step 3: Set the path in .env file (path must be inside the container):

# The volume pentagi-ssl is mounted to /opt/pentagi/ssl inside the container
EXTERNAL_SSL_CA_PATH=/opt/pentagi/ssl/ca-bundle.pem
EXTERNAL_SSL_INSECURE=false

Step 4: Restart PentAGI:

docker compose restart pentagi

Notes:

The pentagi-ssl volume is mounted to /opt/pentagi/ssl inside the container
You can change host directory using PENTAGI_SSL_DIR variable in docker-compose.yml
File supports multiple certificates and intermediate CAs in one PEM file
Use EXTERNAL_SSL_INSECURE=true only for testing (not recommended for production)

Provider-Specific Limitations

Each provider has specific limitations and supported features:

OpenAI: Supports all configuration options
Ollama: Does not support EMBEDDING_KEY as it uses local models
Mistral: Does not support EMBEDDING_MODEL or custom HTTP client
Jina: Does not support custom HTTP client
HuggingFace: Requires EMBEDDING_KEY and supports all other options
GoogleAI: Does not support EMBEDDING_URL, requires EMBEDDING_KEY
VoyageAI: Supports all configuration options

If EMBEDDING_URL and EMBEDDING_KEY are not specified, the system will attempt to use the corresponding LLM provider settings (e.g., OPEN_AI_KEY when EMBEDDING_PROVIDER=openai).

Why Consistent Embedding Providers Matter

It's crucial to use the same embedding provider consistently because:

Vector Compatibility: Different providers produce vectors with different dimensions and mathematical properties
Semantic Consistency: Changing providers can break semantic similarity between previously embedded documents
Memory Corruption: Mixed embeddings can lead to poor search results and broken knowledge base functionality

If you change your embedding provider, you should flush and reindex your entire knowledge base (see etester utility below).

Embedding Tester Utility (etester)

# Test embedding provider and database connection
cd backend
go run cmd/etester/main.go test -verbose

# Show statistics about the embedding database
go run cmd/etester/main.go info

# Delete all documents from the embedding database (use with caution!)
go run cmd/etester/main.go flush

# Recalculate embeddings for all documents (after changing provider)
go run cmd/etester/main.go reindex

# Search for documents in the embedding database
go run cmd/etester/main.go search -query "How to install PostgreSQL" -limit 5

Using Docker

If you're running PentAGI in Docker, you can use etester from within the container:

# Test embedding provider
docker exec -it pentagi /opt/pentagi/bin/etester test

# Show detailed database information
docker exec -it pentagi /opt/pentagi/bin/etester info -verbose

Advanced Search Options

The search command supports various filters to narrow down results:

# Filter by document type
docker exec -it pentagi /opt/pentagi/bin/etester search -query "Security vulnerability" -doc_type guide -threshold 0.8

# Filter by flow ID
docker exec -it pentagi /opt/pentagi/bin/etester search -query "Code examples" -doc_type code -flow_id 42

# All available search options
docker exec -it pentagi /opt/pentagi/bin/etester search -help

Available search parameters:

-query STRING: Search query text (required)
-doc_type STRING: Filter by document type (answer, memory, guide, code)
-flow_id NUMBER: Filter by flow ID (positive number)
-answer_type STRING: Filter by answer type (guide, vulnerability, code, tool, other)
-guide_type STRING: Filter by guide type (install, configure, use, pentest, development, other)
-limit NUMBER: Maximum number of results (default: 3)
-threshold NUMBER: Similarity threshold (0.0-1.0, default: 0.7)

Memory Lifecycle Across Flows

PentAGI stores several kinds of vector documents, and they serve different purposes:

memory captures flow-specific execution history such as tool results and agent observations
guide, answer, and code are intended for reusable knowledge that can help future runs

Common Troubleshooting Scenarios

After changing embedding provider: Always run flush or reindex to ensure consistency
Poor search results: Try adjusting the similarity threshold or check if embeddings are correctly generated
Database connection issues: Verify PostgreSQL is running with pgvector extension installed
Missing API keys: Check environment variables for your chosen embedding provider

Function Testing with ftester

Key Features

Direct Function Access: Test individual functions without running the entire system
Mock Mode: Test functions without a live PentAGI deployment using built-in mocks
Interactive Input: Fill function arguments interactively for exploratory testing
Detailed Output: Color-coded terminal output with formatted responses and errors
Context-Aware Testing: Debug AI agents within the context of specific flows, tasks, and subtasks
Observability Integration: All function calls are logged to Langfuse and Observability stack

Usage Modes

Command Line Arguments

Run ftester with specific function and arguments directly from the command line:

# Basic usage with mock mode
cd backend
go run cmd/ftester/main.go [function_name] -[arg1] [value1] -[arg2] [value2]

# Example: Test terminal command in mock mode
go run cmd/ftester/main.go terminal -command "ls -la" -message "List files"

# Using a real flow context
go run cmd/ftester/main.go -flow 123 terminal -command "whoami" -message "Check user"

# Testing AI agent in specific task/subtask context
go run cmd/ftester/main.go -flow 123 -task 456 -subtask 789 pentester -message "Find vulnerabilities"

Interactive Mode

Run ftester without arguments for a guided interactive experience:

# Start interactive mode
go run cmd/ftester/main.go [function_name]

# For example, to interactively fill browser tool arguments
go run cmd/ftester/main.go browser

Environment Functions

terminal: Execute commands in a container and return the output
file: Perform file operations (read, write, list) in a container

Search Functions

browser: Access websites and capture screenshots
google: Search the web using Google Custom Search
duckduckgo: Search the web using DuckDuckGo
tavily: Search using Tavily AI search engine
traversaal: Search using Traversaal AI search engine
perplexity: Search using Perplexity AI
sploitus: Search for security exploits, vulnerabilities (CVEs), and pentesting tools
searxng: Search using Searxng meta search engine (aggregates results from multiple engines)

Vector Database Functions

search_in_memory: Search for information in vector database
search_guide: Find guidance documents in vector database
search_answer: Find answers to questions in vector database
search_code: Find code examples in vector database

AI Agent Functions

advice: Get expert advice from an AI agent
coder: Request code generation or modification
maintenance: Run system maintenance tasks
memorist: Store and organize information in vector database
pentester: Perform security tests and vulnerability analysis
search: Complex search across multiple sources

Utility Functions

describe: Show information about flows, tasks, and subtasks

The describe function provides detailed information about tasks and subtasks within a flow. This is particularly useful for diagnosing issues when PentAGI encounters problems or gets stuck.

# List all flows in the system
go run cmd/ftester/main.go describe

# Show all tasks and subtasks for a specific flow
go run cmd/ftester/main.go -flow 123 describe

# Show detailed information for a specific task
go run cmd/ftester/main.go -flow 123 -task 456 describe

# Show detailed information for a specific subtask
go run cmd/ftester/main.go -flow 123 -task 456 -subtask 789 describe

# Show verbose output with full descriptions and results
go run cmd/ftester/main.go -flow 123 describe -verbose

This function allows you to identify the exact point where a flow might be stuck and resume processing by directly invoking the appropriate agent function.

Each function has a help mode that shows available parameters:

# Get help for a specific function
go run cmd/ftester/main.go [function_name] -help

# Examples:
go run cmd/ftester/main.go terminal -help
go run cmd/ftester/main.go browser -help
go run cmd/ftester/main.go describe -help

You can also run ftester without arguments to see a list of all available functions:

go run cmd/ftester/main.go

The ftester utility uses color-coded output to make interpretation easier:

Blue headers: Section titles and key names
Cyan [INFO]: General information messages
Green [SUCCESS]: Successful operations
Red [ERROR]: Error messages
Yellow [WARNING]: Warning messages
Yellow [MOCK]: Indicates mock mode operation
Magenta values: Function arguments and results

JSON and Markdown responses are automatically formatted for readability.

Debugging Stuck AI Flows

When PentAGI gets stuck in a flow:

Pause the flow through the UI
Use describe to identify the current task and subtask
Directly invoke the agent function with the same task/subtask IDs
Examine the detailed output to identify the issue
Resume the flow or manually intervene as needed

Testing Environment Variables

Verify that API keys and external services are configured correctly:

# Test Google search API configuration
go run cmd/ftester/main.go google -query "pentesting tools"

# Test browser access to external websites
go run cmd/ftester/main.go browser -url "https://example.com"

Developing New AI Agent Behaviors

When developing new prompt templates or agent behaviors:

Create a test flow in the UI
Use ftester to directly invoke the agent with different prompts
Observe responses and adjust prompts accordingly
Check Langfuse for detailed traces of all function calls

Verifying Docker Container Setup

Ensure containers are properly configured:

go run cmd/ftester/main.go -flow 123 terminal -command "env | grep -i proxy" -message "Check proxy settings"

If you have PentAGI running in Docker, you can use ftester from within the container:

# Run ftester inside the running PentAGI container
docker exec -it pentagi /opt/pentagi/bin/ftester [arguments]

# Examples:
docker exec -it pentagi /opt/pentagi/bin/ftester -flow 123 describe
docker exec -it pentagi /opt/pentagi/bin/ftester -flow 123 terminal -command "ps aux" -message "List processes"

This is particularly useful for production deployments where you don't have a local development environment.

All function calls made through ftester are logged to:

Langfuse: Captures the entire AI agent interaction chain, including prompts, responses, and function calls
OpenTelemetry: Records metrics, traces, and logs for system performance analysis
Terminal Output: Provides immediate feedback on function execution

To access detailed logs:

Check Langfuse UI for AI agent traces (typically at http://localhost:4000)
Use Grafana dashboards for system metrics (typically at http://localhost:3000)
Examine terminal output for immediate function results and errors

Command-line Options

The main utility accepts several options:

-env <path> - Path to environment file (optional, default: .env)
-provider <type> - Provider type to use (default: custom, options: openai, anthropic, ollama, bedrock, gemini, custom)
-flow <id> - Flow ID for testing (0 means using mocks, default: 0)
-task <id> - Task ID for agent context (optional)
-subtask <id> - Subtask ID for agent context (optional)

Function-specific arguments are passed after the function name using -name value format.

Pentesting Prompt Methodology

When refining prompts for offensive security work, give the agent a clear methodology instead of a flat list of payloads:

Start with explicit scope, authorization, and success criteria
Map the application first: roles, routes, parameters, uploads, integrations, and trust boundaries
Prioritize attack surfaces systematically instead of testing everything at once
Validate findings with reproducible evidence before escalating to deeper exploitation
Finish with report-ready notes that capture impact, prerequisites, and next steps

Building

Building Docker Image

The Docker build process automatically embeds version information from git tags. To properly version your build, use the provided scripts:

Linux/macOS

# Load version variables
source ./scripts/version.sh

# Standard build
docker build \
  --build-arg PACKAGE_VER=$PACKAGE_VER \
  --build-arg PACKAGE_REV=$PACKAGE_REV \
  -t pentagi:$PACKAGE_VER .

# Multi-platform build
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --build-arg PACKAGE_VER=$PACKAGE_VER \
  --build-arg PACKAGE_REV=$PACKAGE_REV \
  -t pentagi:$PACKAGE_VER .

# Build and push
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --build-arg PACKAGE_VER=$PACKAGE_VER \
  --build-arg PACKAGE_REV=$PACKAGE_REV \
  -t myregistry/pentagi:$PACKAGE_VER \
  --push .

Windows (PowerShell)

# Load version variables
. .\scripts\version.ps1

# Standard build
docker build `
  --build-arg PACKAGE_VER=$env:PACKAGE_VER `
  --build-arg PACKAGE_REV=$env:PACKAGE_REV `
  -t pentagi:$env:PACKAGE_VER .

# Multi-platform build
docker buildx build `
  --platform linux/amd64,linux/arm64 `
  --build-arg PACKAGE_VER=$env:PACKAGE_VER `
  --build-arg PACKAGE_REV=$env:PACKAGE_REV `
  -t pentagi:$env:PACKAGE_VER .

Quick build without version

For development builds without version tracking:

docker build -t pentagi:dev .

[!NOTE]
The build scripts automatically determine version from git tags
Release builds (on tag commit) have no revision suffix
Development builds (after tag) include commit hash as revision (e.g., 1.1.0-bc6e800)
To use the built image locally, update the image name in docker-compose.yml or use the build option

Credits

This project is made possible thanks to the following research and developments:

Emerging Architectures for LLM Applications
A Survey of Autonomous LLM Agents
Codel by Andriy Semenets - initial architectural inspiration for agent-based automation

License

PentAGI is licensed under the MIT License.

Third-Party Dependencies

All third-party dependencies use MIT-compatible licenses. See licenses/ directory for detailed license reports.

VXControl Cloud Services

The SDK code itself is free to use - service access requires registration.

For questions contact: info@pentagi.com or info@vxcontrol.com

View on GitHub

On this page

PentAGI
Table of Contents
Overview
Features
Current Capability Boundaries
Architecture
System Context
Global Summarizer Configuration Options
Assistant Summarizer Configuration Options
Summarizer Environment Configuration
Execution Monitoring (Beta)
Intelligent Task Planning (Beta)
Tool Call Limits (Always Active)
Reflector Integration (Always Active)
Recommendations for Open Source Models
Quick Start
System Requirements
Using Installer (Recommended)
Current Web Settings Coverage
Still Server-Managed
Manual Installation
Main Security Settings
Scraper Access
Access Credentials
Web UI Accounts
Accessing PentAGI from External Networks
Configuration Steps
Running PentAGI with Podman
Podman Rootless Configuration
Podman Rootful Mode
Docker Compatibility
Assistant Configuration
How to Use PentAGI After Login
1. Create your first flow
2. Use templates for repeatable workflows
3. Monitor execution and review output
4. Use the Assistant view to steer an active flow
5. Manage flow-scoped files
API Access
Generating API Tokens
Using API Tokens
API Exploration and Testing
GraphQL Playground
Swagger UI
Generating API Clients
GraphQL Clients
REST API Clients
API Usage Examples
Security Best Practices
Token Management
Custom LLM Provider Configuration
Ollama Provider Configuration
Configuration Variables
Ollama Cloud Configuration
Local Ollama Configuration
Creating Custom Ollama Models with Extended Context
Example: Qwen3 32B FP16 with Extended Context
Example: QwQ 32B FP16 with Extended Context
OpenAI Provider Configuration
Configuration Variables
Configuration Examples
Supported Models
Anthropic Provider Configuration
Configuration Variables
Configuration Examples
Supported Models
Google AI (Gemini) Provider Configuration
Configuration Variables
Configuration Examples
Supported Models
AWS Bedrock Provider Configuration
Configuration Variables
Configuration Examples
Supported Models
Tested but Incompatible Models
DeepSeek Provider Configuration
Configuration Variables
Configuration Examples
Supported Models
GLM Provider Configuration
Configuration Variables
Configuration Examples
Supported Models
Kimi Provider Configuration
Configuration Variables
Configuration Examples
Supported Models
Qwen Provider Configuration
Configuration Variables
Configuration Examples
Supported Models
Alternative Integrations
Advanced Setup
Langfuse Integration
Database Credentials
Encryption and Security Keys
Admin Credentials
API Keys and Tokens
S3 Storage
Monitoring and Observability
Knowledge Graph Integration (Graphiti)
What is Graphiti?
Enabling Graphiti
What Gets Stored
Current Limitations
GitHub and Google OAuth Integration
Docker Image Configuration
Restricted Networks, Docker Mirrors, and Proxies
Development
Development Requirements
Environment Setup
Backend Setup
Testing
Frontend Setup
Backend Configuration
PostgreSQL / pgvector connection pool sizing
Frontend Configuration
Running the Application
Backend
Frontend
Testing LLM Agents
Key Features
Usage Scenarios
For Developers (with local Go environment)
For Users (using Docker image)
Using Pre-configured Providers
Using OpenAI with Unverified Organizations
Using LiteLLM Proxy
Running Tests in a Production Environment
Command-line Options
Available Agent Types
Available Test Groups
Example Provider Configuration
Optimization Workflow
Embedding Configuration and Testing
Supported Embedding Providers
Environment Variables
Provider-Specific Limitations
Why Consistent Embedding Providers Matter
Embedding Tester Utility (etester)
Using Docker
Advanced Search Options
Memory Lifecycle Across Flows
Common Troubleshooting Scenarios
Function Testing with ftester
Key Features
Usage Modes
Command Line Arguments
Interactive Mode
Environment Functions
Search Functions
Vector Database Functions
AI Agent Functions
Utility Functions
Debugging Stuck AI Flows
Testing Environment Variables
Developing New AI Agent Behaviors
Verifying Docker Container Setup
Command-line Options
Pentesting Prompt Methodology
Building
Building Docker Image
Linux/macOS
Windows (PowerShell)
Quick build without version
Credits
License
Third-Party Dependencies
VXControl Cloud Services

Production readiness

PentAGI

Table of Contents

Overview

Features

Current Capability Boundaries

Architecture

System Context

Global Summarizer Configuration Options

Assistant Summarizer Configuration Options

Summarizer Environment Configuration

Execution Monitoring (Beta)

Intelligent Task Planning (Beta)

Tool Call Limits (Always Active)

Reflector Integration (Always Active)

Recommendations for Open Source Models

Quick Start

System Requirements

Using Installer (Recommended)

Current Web Settings Coverage

Still Server-Managed

Manual Installation

Main Security Settings

Scraper Access

Access Credentials

Web UI Accounts

Accessing PentAGI from External Networks

Configuration Steps

Running PentAGI with Podman

Podman Rootless Configuration

Podman Rootful Mode

Docker Compatibility

Assistant Configuration

How to Use PentAGI After Login

1. Create your first flow

2. Use templates for repeatable workflows

3. Monitor execution and review output

4. Use the Assistant view to steer an active flow

5. Manage flow-scoped files

API Access

Generating API Tokens

Using API Tokens

API Exploration and Testing

GraphQL Playground

Swagger UI

Generating API Clients

GraphQL Clients

REST API Clients

API Usage Examples

Security Best Practices

Token Management

Custom LLM Provider Configuration

Ollama Provider Configuration

Configuration Variables

Ollama Cloud Configuration

Local Ollama Configuration

Creating Custom Ollama Models with Extended Context

Example: Qwen3 32B FP16 with Extended Context

Example: QwQ 32B FP16 with Extended Context

OpenAI Provider Configuration

Configuration Variables

Configuration Examples

Supported Models

Anthropic Provider Configuration

Configuration Variables

Configuration Examples

Supported Models

Google AI (Gemini) Provider Configuration

Configuration Variables

Configuration Examples

Supported Models

AWS Bedrock Provider Configuration

Configuration Variables

Configuration Examples

Supported Models

Tested but Incompatible Models

DeepSeek Provider Configuration

Configuration Variables

Configuration Examples

Supported Models