AI Distiller (aid)
> Note: This is the very first version of this tool. We would be very grateful for any feedback in the form of a discussion or by creating an issue on GitHub. Thank you!
🚀 MCP Server Available: Install the Model Context Protocol server for AI Distiller from NPM: @janreges/ai-distiller-mcp - seamlessly integrate with Claude, Cursor, and other MCP-compatible AI tools!
Detection priority:
- .aidrc file - Create this empty file to explicitly mark your project root
- Language markers - go.mod, package.json, pyproject.toml, etc.
- Version control - .git directory
- Environment variable - AID_PROJECT_ROOT (fallback if no markers found)
- Current directory - Final fallback with warning
# Mark a specific directory as project root (recommended)
touch /my/project/.aidrc
# Run from anywhere in your project - outputs always go to project root
cd deep/nested/directory
aid ../../../src # Output: <project-root>/.aid/aid.src.txt
# Use environment variable as fallback (useful for CI/CD)
AID_PROJECT_ROOT=/build/workspace aid src/
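The detection order above can be sketched roughly as follows. This is a minimal Python illustration, not the tool's actual implementation: the function name `find_project_root` is hypothetical, the marker list contains only the three files named above (the source says "etc."), and the sketch assumes each priority level is checked across all parent directories before the next level is tried.

```python
import os
from pathlib import Path

# Only the markers named in this README; the real list is longer ("etc.").
LANGUAGE_MARKERS = ("go.mod", "package.json", "pyproject.toml")


def find_project_root(start, env=None):
    """Return (root, reason) following the documented priority order.

    Hypothetical helper: assumes every level is searched upward through
    all parent directories before falling through to the next level.
    """
    env = os.environ if env is None else env
    start = Path(start).resolve()
    candidates = [start, *start.parents]

    # 1. Explicit .aidrc marker file
    for directory in candidates:
        if (directory / ".aidrc").is_file():
            return directory, ".aidrc"

    # 2. Language markers
    for directory in candidates:
        if any((directory / marker).is_file() for marker in LANGUAGE_MARKERS):
            return directory, "language marker"

    # 3. Version control directory
    for directory in candidates:
        if (directory / ".git").is_dir():
            return directory, ".git"

    # 4. Environment variable, used only when no marker was found
    env_root = env.get("AID_PROJECT_ROOT")
    if env_root:
        return Path(env_root), "AID_PROJECT_ROOT"

    # 5. Final fallback: the starting directory (the tool warns here)
    return start, "current directory (fallback)"
```

Calling `find_project_root("deep/nested/directory")` returns the directory where output would be rooted together with the rule that matched, which makes it easy to see why `.aidrc` is the recommended way to pin the root.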
🌍 Language Support
Currently supports 12 languages via tree-sitter:
- Full Support: Python, Go, JavaScript, PHP, Ruby
- Beta: TypeScript, Java, C#, Rust, Kotlin, Swift, C++
- Coming Soon: Zig, Scala, Clojure
Language-Specific Documentation:
- C++ - C++11/14/17/20 support with templates, namespaces, modern features
- C# - Complete C# 12 support with records, nullable reference types, pattern matching
- Go - Full Go support with interfaces, goroutines, generics (1.18+)
- Java - Java 8-21 support with records, sealed classes, pattern matching
- JavaScript - ES6+ support with classes, modules, async/await
- Kotlin - Kotlin 1.x support with coroutines, data classes, sealed classes
- PHP - PHP 7.4+ with PHP 8.x features (attributes, union types, enums)
- Python - Full Python 3.x support with type hints, async/await, decorators
- Ruby - Ruby 2.x/3.x support with blocks, modules, metaprogramming
- Rust - Rust 2018/2021 editions with traits, lifetimes, async
- Swift - Swift 5.x support with protocols, extensions, property wrappers
- TypeScript - TypeScript 4.x/5.x with generics, decorators, type system
🎯 How It Works
- Scans your codebase recursively for supported file types (12 languages)
- Parses each file using language-specific tree-sitter parsers (all bundled, no dependencies)
- Extracts only what you need: public APIs, type signatures, class hierarchies
- Outputs in your preferred format: compact text, markdown, or structured JSON
All tree-sitter grammars are compiled into the aid binary - zero external dependencies!
🚀 Transform Massive Codebases Into AI-Friendly Context
> The Problem: Modern codebases contain thousands of files with millions of lines. But for AI to understand your code architecture, suggest improvements, or help with development, it doesn't need to see every implementation detail - it needs the structure and public interfaces.
> The Solution: AI Distiller extracts only what matters - public APIs, types, and signatures - reducing codebase size by 90-98% while preserving all essential information for AI comprehension.
Project | Files | Original Tokens | Distilled Tokens | Fits in Context¹ | Speed² |
---|---|---|---|---|---|
⚛️ react | 1,781 | ~5.5M | 250K (-95%) | ✅ Gemini³ | 2,875 files/s |
🎨 vscode | 4,768 | ~22.5M | 2M (-91%) | ⚠️ Needs chunking | 5,072 files/s |
🐍 django | 970 | ~10M | 256K (-97%) | ✅ Gemini³ | 4,199 files/s |
📦 prometheus | 685 | ~8.5M | 154K (-98%) | ✅ Claude/Gemini | 3,071 files/s |
🦀 rust-analyzer | 1,275 | ~5.5M | 172K (-97%) | ✅ Claude/Gemini | 10,451 files/s |
🚀 astro | 1,058 | ~10.5M | 149K (-99%) | ✅ Claude/Gemini | 5,212 files/s |
💎 rails | 394 | ~1M | 104K (-90%) | ✅ ChatGPT-4o | 4,864 files/s |
🐘 laravel | 1,443 | ~3M | 238K (-92%) | ✅ Gemini³ | 4,613 files/s |
⚡ nestjs | 802 | ~1.5M | 107K (-93%) | ✅ ChatGPT-4o | 8,813 files/s |
👻 ghost | 2,184 | ~8M | 235K (-97%) | ✅ Gemini³ | 4,719 files/s |
² Processing speed with 12 parallel workers on AMD Ryzen 7945HX. Use `-w 1` for serial mode or `-w N` for custom workers.
³ These frameworks exceed 200K tokens and work only with Gemini due to its larger 1M token context window.
🎯 Why This Matters for AI-Assisted Development
Large codebases are overwhelming for AI models. A typical web framework like Django has ~10 million tokens of source code. Even with Claude's 200K context window, you'd need to split it into 50+ chunks, losing coherence and relationships between components.
But here's the good news: Most real-world projects that teams have invested hundreds to thousands of hours developing are much smaller. Thanks to AI Distiller, the vast majority of typical business applications, SaaS products, and internal tools can fit entirely within AI context windows, enabling unprecedented AI assistance quality.
⚠️ The Hidden Problem with AI Coding Tools
Most AI agents and IDEs are "context misers" - they try to save tokens at the expense of actual codebase knowledge. They rely on:
- 🔍 Grep/search to find relevant code snippets
- 📄 Limited context showing only 10-50 lines around matches
- 🎲 Guessing interfaces based on partial information
This is why AI-generated code often fails on first attempts - the AI is literally guessing method signatures, parameter types, and return values because it can't see the full picture.
AI Distiller changes the game by giving AI complete knowledge of:
- ✅ Exact interfaces of all classes, methods, and functions
- ✅ All parameter types and their expected values
- ✅ Return types and data structures
- ✅ Full inheritance hierarchies and relationships
Instead of playing "code roulette", AI can now write correct code from the start.
Result: Django's 10M tokens compress to just 256K tokens - suddenly the entire framework fits in a single AI conversation, leading to:
- 🎯 More accurate suggestions - AI sees all available APIs at once
- 🚀 Less hallucination - No more inventing methods that don't exist
- 💡 Better architecture advice - AI understands the full system design
- ⚡ Faster development - Especially for "vibe coding" with AI assistance
- 💰 40x cost reduction - Pay for 256K tokens instead of 10M on API calls
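The figures quoted in this section follow from simple arithmetic. The short Python check below uses only the token counts stated above; the helper names are illustrative, and the cost ratio assumes API cost scales linearly with input tokens.

```python
import math


def chunks_needed(total_tokens: int, context_window: int) -> int:
    """Number of context-window-sized chunks needed to cover total_tokens."""
    return math.ceil(total_tokens / context_window)


def reduction_percent(original: int, distilled: int) -> float:
    """Share of tokens removed by distillation, in percent."""
    return 100 * (original - distilled) / original


def cost_ratio(original: int, distilled: int) -> float:
    """How many times cheaper the distilled input is (linear pricing assumed)."""
    return original / distilled


django_original = 10_000_000   # ~10M tokens of source
django_distilled = 256_000     # 256K tokens after distillation
claude_window = 200_000        # 200K context window

print(chunks_needed(django_original, claude_window))                # 50
print(round(reduction_percent(django_original, django_distilled)))  # 97
print(round(cost_ratio(django_original, django_distilled)))         # 39
```

The ratio comes out at about 39, which the list above rounds to 40x.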
🔧 Flexible for Different Use Cases
# Process entire codebase (default: public APIs only)
aid ./my-project
# Process specific directory or module
aid ./my-project/src/auth
aid ./my-project/src/api
# Process a directory
aid ./my-project/core/
# Process individual file
aid src/main.py
# Include protected/private for deeper analysis
aid ./my-project --private=1 --protected=1
# Include implementations for small projects
aid ./my-small-lib --implementation=1
# Everything for complete understanding
aid ./micro-service --private=1 --protected=1 --implementation=1
Granular Control: Process your entire codebase, specific modules, directories, or even individual files. Perfect for focusing AI on the exact context it needs - whether that's understanding the whole system architecture or diving deep into a specific authentication module.
📈 Full benchmark details | 🧪 Reproduce these results
🚀 Quick Start
One-Line Installation
macOS / Linux / WSL:
# Install to ~/.aid/bin (recommended, no sudo required)
curl -sSL https://raw.githubusercontent.com/janreges/ai-distiller/main/install.sh | bash
# Install to /usr/local/bin (requires sudo)
curl -sSL https://raw.githubusercontent.com/janreges/ai-distiller/main/install.sh | bash -s -- --sudo
Windows PowerShell:
iwr https://raw.githubusercontent.com/janreges/ai-distiller/main/install.ps1 -useb | iex
The installer will:
- Detect your OS and architecture automatically
- Download the appropriate pre-built binary
- Install to ~/.aid/bin/aid by default (no sudo required)
- Or to /usr/local/bin/aid with --sudo flag
- Guide you through PATH configuration if needed
Basic Usage
# Basic usage
aid . # Current directory, output is saved to a file in ./.aid/
aid . --stdout # Current directory, output is printed to STDOUT
aid src/ # Specific directory
aid main.py # Specific file
📖 Example Output
Python Class Example
Input (car.py):
class Car:
    """A car with basic attributes and methods."""
    def __init__(self, make: str, model: str):
        self.make = make
        self.model = model
        self._mileage = 0  # Private
    def drive(self, distance: int) -> None:
        """Drive the car."""
        if distance > 0:
            self._mileage += distance
Output (aid car.py --format text --implementation=0):
<file path="car.py">
class Car:
    +def __init__(self, make: str, model: str)
    +def drive(self, distance: int) -> None
</file>
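To make the idea of distillation concrete, here is a rough Python sketch that yields the same class and method signatures for the Car example. It is an illustration only: it uses Python's standard `ast` module rather than the bundled tree-sitter parsers, handles only classes and functions, and treats a single leading underscore as "private" (an assumption based on Python convention). The names `distill` and `signature` are hypothetical.

```python
import ast


def is_public(name: str) -> bool:
    """Dunder methods count as public; a single leading underscore is private."""
    return not name.startswith("_") or (name.startswith("__") and name.endswith("__"))


def signature(fn) -> str:
    """Render a def line without its body, keeping annotations."""
    keyword = "async def" if isinstance(fn, ast.AsyncFunctionDef) else "def"
    text = f"{keyword} {fn.name}({ast.unparse(fn.args)})"
    if fn.returns is not None:
        text += f" -> {ast.unparse(fn.returns)}"
    return text


def distill(source: str) -> str:
    """Keep class headers and public signatures; drop bodies and docstrings."""
    functions = (ast.FunctionDef, ast.AsyncFunctionDef)
    lines = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}:")
            for item in node.body:
                if isinstance(item, functions) and is_public(item.name):
                    lines.append("    +" + signature(item))
        elif isinstance(node, functions) and is_public(node.name):
            lines.append("+" + signature(node))
    return "\n".join(lines)


CAR_SOURCE = '''
class Car:
    """A car with basic attributes and methods."""
    def __init__(self, make: str, model: str):
        self.make = make
        self.model = model
        self._mileage = 0  # Private
    def drive(self, distance: int) -> None:
        """Drive the car."""
        if distance > 0:
            self._mileage += distance
'''

print(distill(CAR_SOURCE))
```

The sketch requires Python 3.9+ for `ast.unparse`. The real tool does considerably more (visibility levels, imports, docstrings, multiple languages), so treat this only as a mental model of what "public APIs only" means.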
TypeScript Interface Example
Input (api.ts):
export interface User {
  id: number;
  name: string;
  email?: string;
}
export class UserService {
  private cache = new Map<number, User>();
  async getUser(id: number): Promise<User | null> {
    return this.cache.get(id) || null;
  }
}
Output (aid api.ts --format text --implementation=0):
<file path="api.ts">
export interface User {
  id: number;
  name: string;
  email?: string;
}
export class UserService {
  +async getUser(id: number): Promise<User | null>
}
</file>
📖 Guides & Examples
Deep Code Analysis Prompt Generation
AI Distiller generates sophisticated analysis prompts that AI assistants can execute for comprehensive codebase understanding:
aid internal \
--private=1 --protected=1 --implementation=1 \
--ai-action=flow-for-deep-file-to-file-analysis
✅ AI Analysis Task List generated successfully!
📋 Task List: .aid/ANALYSIS-TASK-LIST.internal.2025-06-20.md
📊 Summary File: .aid/ANALYSIS-SUMMARY.internal.2025-06-20.md
📁 Analysis Reports Directory: .aid/analysis.internal/2025-06-20
🤖 Ready for AI-driven analysis workflow!
📂 Files to analyze: 158
💡 If you are an AI agent, please read the Task List above and carefully follow all instructions to systematically analyze each file.
What AI Distiller generates:
- 📋 Task list prompt - A structured checklist for AI to follow (.aid/ANALYSIS-TASK-LIST.PROJECT.DATE.md)
- 🎯 Analysis instructions - Detailed prompts guiding AI through security, performance, and quality checks
- 📊 Code structure - Distilled code included in the prompt files for AI to analyze
- 📁 Directory structure - Pre-created folders where AI agents can save their analysis results
How to use the generated prompts:
- For AI agents: Direct the agent to read the generated task list file and follow instructions
- For web AI tools: Copy the content of generated files and paste into Gemini (best for large codebases due to 1M context)
- For small codebases: Use --stdout to get the prompt directly without saving it to a file
Note: The analysis dimensions (Security, Performance, Maintainability, Readability) are part of the prompts that guide the AI - AI Distiller itself doesn't perform any analysis.
🤖 Use with Claude Code/Desktop (MCP)
AI Distiller now integrates seamlessly with Claude Code/Desktop through the Model Context Protocol (MCP), enabling AI agents to analyze and understand codebases directly within conversations.
# One-line installation
claude mcp add aid -- npx -y @janreges/ai-distiller-mcp
📦 NPM Package: @janreges/ai-distiller-mcp - Full documentation and examples available
Available MCP Tools
🔍 Code Structure Tools:
- distill_file - Extract structure from a single file
- distill_directory - Extract structure from entire directories
- list_files - Browse directories with file statistics
- get_capabilities - Get info about AI Distiller capabilities
🎯 Specialized AI Analysis Tools:
- aid_hunt_bugs - Generate bug-hunting prompts with distilled code
- aid_suggest_refactoring - Create refactoring analysis prompts
- aid_generate_diagram - Produce diagram generation prompts (Mermaid)
- aid_analyze_security - Generate security audit prompts (OWASP Top 10)
- aid_generate_docs - Create documentation generation prompts
- aid_deep_file_analysis - Systematic file-by-file analysis workflow
- aid_multi_file_docs - Multi-file documentation workflow
- aid_complex_analysis - Enterprise-grade analysis prompts
- aid_performance_analysis - Performance optimization prompts
- aid_best_practices - Code quality and best practices prompts
🔧 Core Analysis Engine:
- aid_analyze - Direct access to all AI actions for custom workflows
Important: AI Distiller generates analysis prompts with distilled code - it does NOT perform the actual analysis! The output is a specialized prompt + distilled code that AI agents (like Claude) then execute. For large codebases, you can copy the output to tools like Gemini 2.0 with 1M context window.
Smart Context Management: AI agents can analyze your entire project for understanding the big picture, then zoom into specific modules (auth, API, database) for detailed work. No more overwhelming AI with irrelevant code!
📖 Complete CLI Reference
Command Synopsis
aid <path> [OPTIONS]
Core Arguments and Options
🎯 Primary Arguments
Argument | Type | Default | Description |
---|---|---|---|
<path> | String | (required) | Path to source file or directory to analyze. Use .git for git history mode, - (or empty) for stdin input |
📁 Output Options
Option | Type | Default | Description |
---|---|---|---|
-o, --output | String | .aid/<dirname>.[options].txt | Output file path. Auto-generated based on input directory basename and options if not specified |
--stdout | Flag | false | Print output to stdout in addition to file. When used alone, no file is created |
--format | String | text | Output format: text (ultra-compact), md (clean Markdown), jsonl (one JSON per file), json-structured (rich semantic data), xml (structured XML) |
🤖 AI Actions
Option | Type | Default | Description |
---|---|---|---|
--ai-action | String | (none) | Generate pre-configured prompts with distilled code for AI analysis. See AI Actions section below |
--ai-output | String | ./.aid/<action>.<timestamp>.<dirname>.md | Custom output path for generated AI prompt files |
👁️ Visibility Filtering
Option | Type | Default | Description |
---|---|---|---|
--public | 0|1 | 1 | Include public members (methods, functions, classes) |
--protected | 0|1 | 0 | Include protected members |
--internal | 0|1 | 0 | Include internal/package-private members |
--private | 0|1 | 0 | Include private members |
📝 Content Filtering
Option | Type | Default | Description |
---|---|---|---|
--comments | 0|1 | 0 | Include inline and block comments |
--docstrings | 0|1 | 1 | Include documentation comments (docstrings, JSDoc, etc.) |
--implementation | 0|1 | 0 | Include function/method bodies (implementation details) |
--imports | 0|1 | 1 | Include import/require statements |
--annotations | 0|1 | 1 | Include decorators and annotations |
🎛️ Alternative Filtering Syntax
Option | Type | Default | Description |
---|---|---|---|
--include-only | String | (none) | Include ONLY these categories (comma-separated: public,protected,imports ) |
--exclude-items | String | (none) | Exclude these categories (comma-separated: private,comments,implementation ) |
📂 File Selection
Option | Type | Default | Description |
---|---|---|---|
--include | String | (all files) | Include file patterns (comma-separated: *.go,*.py or multiple: --include "*.go" --include "*.py" ) |
--exclude | String | (none) | Exclude file patterns (comma-separated: *test*,*.json or multiple: --exclude "*test*" --exclude "vendor/**" ) |
-r, --recursive | 0|1 | 1 | Process directories recursively. Set to 0 to process only immediate directory contents |
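As a mental model for how comma-separated --include and --exclude patterns combine, consider the following Python sketch. It is an assumption-laden illustration, not the tool's code: `select_files` is a hypothetical helper, patterns are matched against the file name only, and path patterns such as vendor/** are not handled.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def split_patterns(value: str) -> list:
    """Split a comma-separated option value into individual glob patterns."""
    return [part.strip() for part in value.split(",") if part.strip()]


def select_files(paths, include: str = "", exclude: str = "") -> list:
    """Keep paths matching any include pattern and no exclude pattern.

    An empty include value means "all files", mirroring the documented default.
    """
    include_patterns = split_patterns(include)
    exclude_patterns = split_patterns(exclude)
    selected = []
    for path in paths:
        name = PurePosixPath(path).name
        if include_patterns and not any(fnmatch(name, p) for p in include_patterns):
            continue
        if any(fnmatch(name, p) for p in exclude_patterns):
            continue
        selected.append(path)
    return selected


files = ["main.go", "pkg/util.py", "pkg/util_test.py", "web/app.spec.ts", "README.md"]
print(select_files(files, include="*.py,*.go", exclude="*test*,*spec*"))
```

With the sample list this keeps only main.go and pkg/util.py: the test file is excluded, and the TypeScript and Markdown files never match an include pattern.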
🔧 Processing Options
Option | Type | Default | Description |
---|---|---|---|
--raw | Flag | false | Process all text files without language parsing. Overrides all content filters |
--lang | String | auto | Force language detection: auto , python , typescript , javascript , go , rust , java , csharp , kotlin , cpp , php , ruby , swift |
📍 Path Control
Option | Type | Default | Description |
---|---|---|---|
--file-path-type | String | relative | Path format in output: relative or absolute |
--relative-path-prefix | String | (empty) | Custom prefix for relative paths (e.g., module/ → module/src/file.go ) |
⚡ Performance Options
Option | Type | Default | Description |
---|---|---|---|
-w, --workers | Integer | 0 | Number of parallel workers. 0 = auto (80% of CPU cores), 1 = serial processing, 2+ = specific worker count |
📊 Summary Output Options
Option | Type | Default | Description |
---|---|---|---|
--summary-type | String | visual-progress-bar | Summary format after processing. See Summary Types below |
--no-emoji | Flag | false | Disable emojis in summary output for plain text terminals |
📜 Git Mode Options (when path is .git)
Option | Type | Default | Description |
---|---|---|---|
--git-limit | Integer | 200 | Number of commits to analyze. Use 0 for all commits |
--with-analysis-prompt | Flag | false | Add comprehensive AI analysis prompt for commit quality, patterns, and insights |
🐛 Diagnostic Options
Option | Type | Default | Description |
---|---|---|---|
-v, --verbose | Count | 0 | Verbose output. Use -vv for detailed info, -vvv for full trace with data dumps |
--version | Flag | false | Show version information and exit |
--help | Flag | false | Show help message |
--help-extended | Flag | false | Show complete documentation (man page style) |
--cheat | Flag | false | Show quick reference card |
AI Actions Detailed
AI actions generate pre-configured prompts combined with distilled code that AI agents can then execute for specific analysis tasks:
Action | Generated Prompt Type | AI Agent Will |
---|---|---|
prompt-for-refactoring-suggestion | Refactoring analysis prompt with distilled code | Analyze code for improvements, technical debt, effort sizing |
prompt-for-complex-codebase-analysis | Enterprise-grade analysis prompt with full codebase | Generate architecture diagrams, compliance checks, findings |
prompt-for-security-analysis | Security audit prompt with OWASP Top 10 guidelines | Detect vulnerabilities, suggest remediation steps |
prompt-for-performance-analysis | Performance optimization prompt with complexity focus | Identify bottlenecks, analyze scalability issues |
prompt-for-best-practices-analysis | Code quality prompt with industry standards | Assess code quality, suggest improvements |
prompt-for-bug-hunting | Bug detection prompt with pattern analysis | Find bugs, analyze quality metrics |
prompt-for-single-file-docs | Documentation generation prompt for single file | Create comprehensive API documentation |
prompt-for-diagrams | Diagram generation prompt with Mermaid syntax | Generate 10+ architecture and process diagrams |
flow-for-deep-file-to-file-analysis | Systematic analysis task list with directory structure | Perform file-by-file deep analysis |
flow-for-multi-file-docs | Documentation workflow with file relationships | Create interconnected documentation |
Summary Types
Type | Description | Example Output |
---|---|---|
visual-progress-bar | Default. Shows compression progress bar with colors | ✅ Distilled 150 files [████████░░] 85% (5MB → 750KB) |
stock-ticker | Compact stock market style | 📊 AID 97.5% ▲ | 5MB→128KB | ~1.2M tokens saved |
speedometer-dashboard | Multi-line dashboard with detailed metrics | Shows files, size, tokens, processing time in box format |
minimalist-sparkline | Single line with sparkline visualization | ▁▃▅▇█ 150 files → 97.5% reduction (750KB) ✓ |
ci-friendly | Clean format for CI/CD pipelines | [aid] ✓ 85.9% saved | 21 kB → 2.9 kB | 4ms |
json | Machine-readable JSON output | {"original_bytes":5242880,"distilled_bytes":131072,...} |
off | Disable summary output | No summary displayed |
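The json summary type is convenient for scripts. The sketch below reads only the two fields visible in the example above (original_bytes and distilled_bytes); the remaining fields are elided in the source, so nothing else is assumed about the schema. The helper name `percent_saved` is illustrative.

```python
import json


def percent_saved(summary_json: str) -> float:
    """Compute the size reduction in percent from a JSON summary line."""
    data = json.loads(summary_json)
    original = data["original_bytes"]
    distilled = data["distilled_bytes"]
    return 100 * (original - distilled) / original


# Values taken from the example row above (5 MB -> 128 KB).
summary_line = '{"original_bytes": 5242880, "distilled_bytes": 131072}'
print(percent_saved(summary_line))  # 97.5
```

The result matches the 97.5% figure shown in the stock-ticker example for the same sizes.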
Exit Codes
Code | Meaning |
---|---|
0 | Success |
1 | General error (file not found, parse error, etc.) |
2 | Invalid arguments or conflicting options |
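In a CI script the exit codes above can be turned into readable messages. The following Python sketch is a hypothetical wrapper: `describe_exit_code` and `run_aid` are illustrative names, and `run_aid` assumes the aid binary is available on PATH.

```python
import subprocess

# Meanings copied from the exit code table above.
EXIT_CODES = {
    0: "success",
    1: "general error (file not found, parse error, etc.)",
    2: "invalid arguments or conflicting options",
}


def describe_exit_code(code: int) -> str:
    """Translate an aid exit code into a human-readable message."""
    return EXIT_CODES.get(code, f"unexpected exit code {code}")


def run_aid(args: list) -> str:
    """Run aid with the given arguments and describe how it exited.

    Assumes `aid` is installed and on PATH; raises FileNotFoundError otherwise.
    """
    result = subprocess.run(["aid", *args])
    return describe_exit_code(result.returncode)
```

For example, `run_aid(["./internal", "--summary-type=ci-friendly", "--no-emoji"])` would run the CI-style invocation from the examples in this reference and report the outcome.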
Examples
# Basic usage - distill with default settings (public APIs only)
aid ./src
# Include all visibility levels and implementation
aid ./src --private=1 --protected=1 --internal=1 --implementation=1
# Generate security analysis prompt (AI agent will execute the analysis)
aid --ai-action prompt-for-security-analysis ./api --private=1
# Process only Python and Go files, exclude tests
aid --include "*.py,*.go" --exclude "*test*,*spec*" ./
# Git history analysis with AI insights
aid .git --with-analysis-prompt --git-limit=500
# Raw text processing for documentation
aid ./docs --raw
# Force single-threaded processing for debugging (-v, -vv, -vvv)
aid ./complex-code -w 1 -vv
# Custom output with absolute paths
aid ./lib --output=/tmp/analysis.txt --file-path-type=absolute
# CI/CD integration with clean output
aid ./internal --summary-type=ci-friendly --no-emoji
⚠️ Limitations
- Syntax Errors: Files with syntax errors may be skipped or partially processed
- Dynamic Features: Runtime-determined types/interfaces in dynamic languages are not resolved
- Macro Expansion: Complex macros (Rust, C++) show pre-expansion source
- Generated Code: Consider using .aidignore to skip generated files
🔒 Security Considerations
⚠️ Important: AI Distiller extracts code structure which may include:
- Function and variable names that could reveal business logic (e.g., processPayment, calculateTaxEvasion)
- API endpoints and internal routes (e.g., /api/v1/internal/user-data)
- Type information and data structures
- Comments and docstrings (unless stripped)
- File paths revealing project structure or codenames
Recommendations:
- Always review output before sending to external services
- Use --comments=0 --docstrings=0 to strip comments and docstrings that may contain sensitive documentation (docstrings are included by default)
- Consider running a secrets scanner on your codebase first
- For maximum security, run AI Distiller in an isolated environment
- Future: We're exploring an --obfuscate flag to anonymize sensitive identifiers
🛠️ Advanced Usage
⚡ Parallel Processing
AI Distiller now supports parallel processing for significantly faster analysis of large codebases:
# Use default parallel processing (80% of CPU cores)
aid ./src
# Force serial processing (original behavior)
aid ./src -w 1
# Use specific number of workers
aid ./src -w 16
# Check performance with verbose output
aid ./src -v # Shows: "Using 25 parallel workers (32 CPU cores available)"
Performance Benefits:
- React packages: 3.5s → 0.5s (7x faster)
- Large codebases: Near-linear speedup with CPU cores
- Maintains identical output order as serial processing
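The worker-count rule and the order guarantee can be pictured with a short Python sketch. It is not the tool's implementation: `resolve_workers` and `process_all` are hypothetical names, rounding down is an assumption that happens to match the "25 workers on 32 cores" example, and a thread pool merely stands in for whatever concurrency mechanism the binary uses.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def resolve_workers(requested: int, cpu_cores=None) -> int:
    """Map the -w value to a worker count: 0 = 80% of cores, otherwise as given."""
    cores = cpu_cores or os.cpu_count() or 1
    if requested == 0:
        return max(1, cores * 8 // 10)  # 80% of cores, rounded down (assumption)
    return max(1, requested)


def process_all(paths, process_file, workers: int = 0) -> list:
    """Process files in parallel while returning results in input order."""
    count = resolve_workers(workers)
    if count == 1:
        return [process_file(path) for path in paths]  # serial mode
    with ThreadPoolExecutor(max_workers=count) as pool:
        # Executor.map yields results in the order of the inputs,
        # regardless of which worker finishes first.
        return list(pool.map(process_file, paths))


print(resolve_workers(0, cpu_cores=32))  # 25
```

Collecting results in input order is one simple way to keep parallel output identical to serial output.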
Processing from stdin
AI Distiller can process code directly from stdin, perfect for:
- Quick code snippet analysis
- Pipeline integration
- Testing without creating files
- Dynamic code generation workflows
# Auto-detect language from stdin
echo 'class User { getName() { return this.name; } }' | aid --format text
# Explicit language specification
cat mycode.php | aid --lang php --private=0 --protected=0
# Use "-" to explicitly read from stdin
aid - --lang python < snippet.py
# Pipeline example: extract structure from generated code
generate-code.sh | aid --lang typescript --format json-structured
Language Detection: When using stdin without --lang, AI Distiller automatically detects the language based on syntax patterns. Supported languages for auto-detection: python, typescript, javascript, go, ruby, swift, rust, java, c#, kotlin, c++, php.
Integration with AI Tools
# Create a context file for Claude or GPT
aid ./src --format text --implementation=0 --stdout > context.txt
# Generate a codebase summary for RAG systems
aid . --format json-structured --stdout | jq -r '.files[].symbols[].name' > symbols.txt
# Extract API surface for documentation
aid ./api --comments=0 --implementation=0 --format md --stdout > api-ref.md
🚫 Ignoring Files with .aidignore
AI Distiller respects .aidignore files for excluding files and directories from processing. The syntax is similar to .gitignore.
Important: What AI Distiller Processes
AI Distiller only processes source code files with these extensions:
- Python: .py, .pyw, .pyi
- JavaScript: .js, .mjs, .cjs, .jsx
- TypeScript: .ts, .tsx, .d.ts
- Go: .go
- Rust: .rs
- Ruby: .rb, .rake, .gemspec
- Java: .java
- C#: .cs
- Kotlin: .kt, .kts
- C++: .cpp, .cc, .cxx, .c++, .h, .hpp, .hh, .hxx, .h++
- PHP: .php, .phtml, .php3, .php4, .php5, .php7, .phps, .inc
- Swift: .swift
Note: Files like .log, .txt, .md, images, PDFs, and other non-source files are automatically ignored by AI Distiller, so you don't need to add them to .aidignore.
Default Ignored Directories
AI Distiller automatically ignores these common dependency and build directories:
- node_modules/ - npm packages
- vendor/ - Go and PHP dependencies
- target/ - Rust build output
- build/, dist/ - Common build directories
- __pycache__/, .pytest_cache/, venv/, .venv/, env/, .env/ - Python
- .gradle/, gradle/ - Java/Kotlin
- Pods/ - Swift/iOS dependencies
- .bundle/ - Ruby bundler
- bin/, obj/ - Compiled binaries
- .vs/, .idea/, .vscode/ - IDE directories
- coverage/, .nyc_output/ - Test coverage
- bower_components/ - Legacy JavaScript
- .terraform/ - Terraform
- .git/, .svn/, .hg/ - Version control
You can override these defaults using ! patterns in .aidignore (see Advanced Usage below).
Basic Syntax
Create a .aidignore file in your project root or any subdirectory:
# Comments start with hash
*.test.js # Ignore test files
*.spec.ts # Ignore spec files
temp/ # Ignore temp directory
build/ # Ignore build directory
/secrets.py # Ignore secrets.py only in root
node_modules/ # Ignore node_modules everywhere
**/*.bak # Ignore .bak files in any directory
src/test_* # Ignore test_* files in src/
!important.test.js # Don't ignore important.test.js (negation)
How It Works
- .aidignore files work recursively - place them in any directory
- Patterns are relative to the directory containing the .aidignore file
- Use / prefix for patterns relative to the .aidignore location
- Use ** for recursive matching
- Directory patterns should end with /
- Use ! prefix to negate a pattern (re-include previously ignored files)
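The interaction between ordinary patterns and ! negation can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the tool's matcher: it borrows the .gitignore convention that the last matching pattern wins, matches file patterns against the file name only, and ignores the / prefix and ** forms. The name `is_ignored` is hypothetical.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def is_ignored(rel_path: str, patterns: list) -> bool:
    """Decide whether rel_path is ignored; the last matching pattern wins."""
    path = PurePosixPath(rel_path)
    ignored = False
    for raw in patterns:
        pattern = raw.strip()
        if not pattern or pattern.startswith("#"):
            continue  # blank lines and comments
        negate = pattern.startswith("!")
        if negate:
            pattern = pattern[1:]
        if pattern.endswith("/"):
            # Directory pattern: matches when any parent directory has that name.
            matched = pattern.rstrip("/") in path.parts[:-1]
        else:
            matched = fnmatch(path.name, pattern)
        if matched:
            ignored = not negate
    return ignored


rules = ["# tests", "*.test.js", "temp/", "!important.test.js"]
print(is_ignored("src/app.test.js", rules))        # True
print(is_ignored("src/important.test.js", rules))  # False
```

Because later rules override earlier ones in this model, a negation only has an effect when it appears after the pattern it is meant to override.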
Examples
# .aidignore in project root
node_modules/ # Excludes all node_modules directories
*.test.js # Excludes all test files
*.spec.ts # Excludes all spec files
dist/ # Excludes dist directory
.env.py # Excludes environment config files
vendor/ # Excludes vendor directory
# More specific patterns
src/**/test_*.py # Test files in src subdirectories
!src/test_utils.py # But include this specific test file
/config/*.local.py # Local config files in root config dir
**/*_generated.go # Generated Go files anywhere
Advanced Usage: Including Normally Ignored Content
Include Default-Ignored Directories
Use ! patterns to include directories that are ignored by default:
# Include vendor directory for analysis
!vendor/
# Include specific node_modules package
!node_modules/my-local-package/
# Include Python virtual environment
!venv/
Include Non-Source Files
You can also include files that AI Distiller normally doesn't process:
# Include all markdown files
!*.md
!**/*.md
# Include configuration files
!*.yaml
!*.json
!.env
# Include specific documentation
!docs/**/*.txt
!README.md
!CHANGELOG.md
When you include non-source files with ! patterns, AI Distiller will include their raw content in the output.
Nested .aidignore Files
You can place .aidignore files in subdirectories for more specific control:
# project/.aidignore
*.test.py
!vendor/ # Include vendor in this project
# project/src/.aidignore
test_*.go
*.mock.ts
!test_helpers.ts # Exception: include test_helpers.ts
🎯 Git History Analysis Mode
AI Distiller includes a special mode for analyzing git repositories. When you pass a .git directory, it switches to git log mode:
# View formatted git history
aid .git
# Limit to recent commits (default is 200)
aid .git --git-limit=500
# Include AI analysis prompt for comprehensive insights
aid .git --git-limit=1000 --with-analysis-prompt
The --with-analysis-prompt flag adds a sophisticated prompt combined with git history that AI agents can use to generate:
- Contributor statistics with expertise areas and collaboration patterns
- Timeline analysis with development phases and activity visualization
- Functional categorization of commits (features, fixes, refactoring)
- Codebase evolution insights including technology shifts
- Actionable recommendations based on discovered patterns
The output file contains both the analysis prompt and formatted git history, ready for AI agents to process. Perfect for understanding project history, identifying knowledge silos, or generating impressive development reports.
❓ FAQ
How accurate are the token counts?
Token counts are estimated using OpenAI's cl100k_base tokenizer (1 token ≈ 4 characters). Actual token usage varies by model - Claude and GPT-4 use similar tokenizers, while others may differ by ±10%.
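The "1 token ≈ 4 characters" rule of thumb mentioned above can be written as a one-line estimator. This Python sketch shows only the heuristic; an exact count would require running the tokenizer itself, and the function name `estimate_tokens` is illustrative.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using the ~4 characters per token heuristic."""
    return math.ceil(len(text) / chars_per_token)


print(estimate_tokens("x" * 4000))  # 1000
```

Real counts depend on the content: code with many short symbols or non-English text typically tokenizes less efficiently than the heuristic suggests.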
Can AI Distiller handle very large repositories?
Yes! We've tested on repositories with 50,000+ files. The parallel processing mode (-w flag) scales near-linearly with CPU cores. Memory usage is bounded - large files are processed in streaming chunks.
What about generated code and vendor directories?
Create a .aidignore file (same syntax as .gitignore) to exclude generated files, vendor directories, or any paths you don't want processed.
What happens with unsupported file types?
Files with unknown or unsupported extensions are automatically skipped - no errors, no interruption. AI Distiller only processes files it has parsers for, ensuring clean and relevant output. This means you can safely run it on mixed repositories containing documentation, images, configs, etc.
Is my code sent anywhere?
No! AI Distiller runs 100% locally. It only extracts and formats your code structure - you decide what to do with the output. The tool itself makes no network connections.
Which programming languages are supported?
Currently 12 languages via tree-sitter: Python, TypeScript, JavaScript, Go, Java, C#, Rust, Ruby, Swift, Kotlin, PHP, C++. All parsers are bundled in the binary - no external dependencies needed.
🤝 Contributing
We welcome contributions! See CONTRIBUTING.md for guidelines.
Development Setup
# Clone and setup
git clone https://github.com/janreges/ai-distiller
cd ai-distiller
make dev-init # Initialize development environment
# Run tests
make test # Unit tests
make test-integration # Integration tests
# Build binary
make build # Build for current platform
Building Release Binaries
AI Distiller requires CGO for full language support via tree-sitter parsers. To build release binaries for all supported platforms:
Prerequisites
Ubuntu/Debian:
# Install cross-compilation toolchains
sudo apt-get update
sudo apt-get install -y gcc-aarch64-linux-gnu gcc-mingw-w64-x86-64
# For macOS cross-compilation, you need osxcross:
# 1. Clone osxcross: git clone https://github.com/tpoechtrager/osxcross tools/osxcross
# 2. Obtain macOS SDK (see https://github.com/tpoechtrager/osxcross#packaging-the-sdk)
# 3. Place SDK in tools/osxcross/tarballs/
# 4. Build osxcross: cd tools/osxcross && ./build.sh
Build All Platforms
# Build release archives for all platforms
./scripts/build-releases.sh
# This creates:
# - aid-linux-amd64.tar.gz (Linux 64-bit)
# - aid-linux-arm64.tar.gz (Linux ARM64)
# - aid-darwin-amd64.tar.gz (macOS Intel)
# - aid-darwin-arm64.tar.gz (macOS Apple Silicon)
# - aid-windows-amd64.zip (Windows 64-bit)
The script automatically detects available toolchains and builds for all possible platforms. Each archive contains the aid binary (or aid.exe for Windows) with full language support.
Note: Without proper toolchains, only the native platform will be built.
📄 License
MIT License - see LICENSE for details.
🙏 Acknowledgments
- Built on tree-sitter for accurate parsing
- Inspired by the need for better AI-code interaction
- Created with ❤️ by Claude Code & Ján Regeš from SiteOne (Czech Republic).