AI Trading System Tech Stack
Last updated: January 21, 2026
Our autonomous AI trading system combines multiple LLMs, retrieval-augmented memory, and standardized tool protocols to execute options trades. This page documents the complete technical architecture powering our 86% win rate iron condor strategy.
System Architecture Overview
Broker")] FRED[("FRED API
Treasury Yields")] NEWS[("Market News
Sentiment")] end subgraph AI["AI Layer"] CLAUDE["Claude Opus 4.5
(Critical Decisions)"] OPENROUTER["OpenRouter Gateway
(DeepSeek, Mistral, Kimi)"] RAG["Vertex AI RAG
(Lessons + Trades)"] GEMINI["Gemini 2.0 Flash
(Retrieval)"] end subgraph CORE["Core Trading System"] ORCH["Trading Orchestrator"] GATES["Gate Pipeline
(Momentum, Sentiment, Risk)"] EXEC["Trade Executor"] MCP["MCP Servers
(Protocol Layer)"] end subgraph OUTPUT["Output Layer"] WEBHOOK["Dialogflow Webhook
(Cloud Run)"] BLOG["GitHub Pages Blog"] DEVTO["Dev.to Articles"] end ALPACA --> ORCH FRED --> ORCH NEWS --> OPENROUTER ORCH --> GATES GATES --> CLAUDE GATES --> OPENROUTER GATES --> RAG RAG --> GEMINI GATES --> EXEC EXEC --> ALPACA ORCH --> MCP MCP --> WEBHOOK ORCH --> BLOG ORCH --> DEVTO
AI/ML Technologies
1. Claude (Anthropic SDK)
ACTIVE: Primary LLM for Critical Decisions
Claude Opus 4.5 is our primary reasoning engine, used for all trade-critical decisions where accuracy matters more than cost.
```mermaid
flowchart LR
    subgraph Routing["Model Routing by Task"]
        SIMPLE["Simple Tasks"] --> HAIKU["Claude Haiku<br/>$0.25/1M tokens"]
        MEDIUM["Medium Tasks"] --> SONNET["Claude Sonnet<br/>$3/1M tokens"]
        CRITICAL["Trade Decisions"] --> OPUS["Claude Opus 4.5<br/>$15/1M tokens"]
    end
```
Key Integration Points:
- src/agents/base_agent.py - All agents inherit Claude reasoning
- src/utils/self_healing.py - Autonomous error recovery
- src/orchestrator/gates.py - Trade gate decisions
Why Claude for Trading:
- Highest reasoning accuracy for financial decisions
- Strong instruction following (critical for risk rules)
- Low hallucination rate on numerical data
```python
from anthropic import Anthropic

class BaseAgent:
    def __init__(self, name: str, model: str = "claude-opus-4-5-20251101"):
        self.name = name
        self.client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
        self.model = model

    def reason_with_llm(self, prompt: str) -> dict:
        # All trade-critical reasoning funnels through this single entry point
        response = self.client.messages.create(
            model=self.model,
            max_tokens=4096,
            messages=[{"role": "user", "content": prompt}],
        )
        # Return a plain dict so callers don't depend on the SDK's Message type
        return {"text": response.content[0].text, "stop_reason": response.stop_reason}
```
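A hypothetical usage sketch (the agent name and prompt are illustrative, not from the codebase):

```python
# Hypothetical usage; agent name and prompt are illustrative
agent = BaseAgent(name="risk_gate")
decision = agent.reason_with_llm(
    "SPY iron condor, short strikes at 470/490, IV rank 62. Pass the risk gate?"
)
print(decision["text"])
```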
2. OpenRouter (Multi-LLM Gateway)
ACTIVE: Cost-Optimized Inference
OpenRouter provides access to multiple LLMs through a single API, enabling us to route tasks to the most cost-effective model.
```mermaid
flowchart TB
    subgraph Gateway["OpenRouter Gateway"]
        API["OpenRouter API"]
        subgraph Models["Models"]
            DS["DeepSeek<br/>$0.14/$0.28 per 1M"]
            MISTRAL["Mistral Medium 3<br/>$0.40/$2.00 per 1M"]
            KIMI["Kimi K2<br/>$0.39/$1.90 per 1M<br/>#1 Trading Benchmark"]
        end
    end
    subgraph Tasks["Task Routing"]
        SENT["Sentiment Analysis"] --> DS
        RESEARCH["Market Research"] --> MISTRAL
        TRADE["Trade Signals"] --> KIMI
    end
    API --> Models
```
Model Selection (from StockBench benchmarks):
| Model | Cost per 1M tokens (in/out) | Trading Sortino | Use Case |
|---|---|---|---|
| DeepSeek | $0.14 / $0.28 | 0.021 | Sentiment, News |
| Mistral Medium 3 | $0.40 / $2.00 | n/a | Research, Analysis |
| Kimi K2 | $0.39 / $1.90 | 0.042 | Trade Signals |
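In code, the routing can be a simple lookup from task type to OpenRouter model slug. A minimal sketch; the mapping, function name, and the Mistral/Kimi slugs are our assumptions, and only deepseek/deepseek-chat appears verbatim in the snippets on this page:

```python
# Illustrative task-to-model routing; mapping and slugs other than
# "deepseek/deepseek-chat" are our best guesses, not from the repo
TASK_MODEL_MAP = {
    "sentiment": "deepseek/deepseek-chat",     # cheapest: sentiment, news
    "research": "mistralai/mistral-medium-3",  # mid-tier: market research
    "trade_signal": "moonshotai/kimi-k2",      # best Trading Sortino on StockBench
}

def route_model(task_type: str) -> str:
    # Fall back to the cheapest model for unknown task types
    return TASK_MODEL_MAP.get(task_type, "deepseek/deepseek-chat")
```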
MCP Server Integration:
```python
# mcp/servers/openrouter/sentiment.py
import os
from openai import OpenAI

class SentimentAnalyzer:
    def __init__(self):
        # OpenRouter exposes an OpenAI-compatible endpoint
        self.client = OpenAI(base_url="https://openrouter.ai/api/v1",
                             api_key=os.environ["OPENROUTER_API_KEY"])

    def analyze(self, news: list[str]) -> float:
        # Routes to DeepSeek for cost efficiency
        news_prompt = "Rate aggregate sentiment from -1 to 1:\n" + "\n".join(news)
        response = self.client.chat.completions.create(
            model="deepseek/deepseek-chat",
            messages=[{"role": "user", "content": news_prompt}],
        )
        return parse_sentiment(response.choices[0].message.content)
        # parse_sentiment() is defined elsewhere in this module
```
3. Vertex AI RAG (Retrieval-Augmented Generation)
ACTIVE: Cloud Semantic Search
Our RAG system stores all trade history and lessons learned, enabling the system to learn from past mistakes and successes.
```mermaid
flowchart TB
    subgraph Ingestion["Ingestion Pipeline"]
        TRADES["Trade History"] --> CHUNK["Chunking<br/>512 tokens, 100 overlap"]
        LESSONS["Lessons Learned"] --> CHUNK
        CHUNK --> EMBED["text-embedding-004<br/>768 dimensions"]
    end
    subgraph Storage["Vector Storage"]
        EMBED --> CORPUS["Vertex AI RAG Corpus<br/>(GCP Managed)"]
    end
    subgraph Query["Query Pipeline"]
        QUERY["User Query"] --> RETRIEVAL["Hybrid Search<br/>(Semantic + Keyword)"]
        CORPUS --> RETRIEVAL
        RETRIEVAL --> RERANK["Re-ranking"]
        RERANK --> GEMINI["Gemini 2.0 Flash<br/>Generation"]
        GEMINI --> RESPONSE["Contextual Response"]
    end
```
Architecture Decisions:
- 768D Embeddings: Google’s text-embedding-004 (best price/performance)
- Hybrid Search: Combines semantic similarity with keyword matching
- Chunking Strategy: 512 tokens with 100 overlap (optimal for financial docs; see the sketch after this list)
- Top-K: Returns 5 most relevant chunks per query
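To make the chunking parameters concrete, here is a minimal sketch of what 512-token windows with 100-token overlap mean. Whitespace tokenization and the function name are illustrative assumptions; in practice this style of chunking is typically configured on the Vertex AI side at corpus import time:

```python
# Illustrative chunker: 512-token windows advancing 412 tokens per step,
# so consecutive chunks share 100 tokens. Whitespace "tokens" are a
# stand-in for a real tokenizer.
def chunk_document(text: str, chunk_size: int = 512, overlap: int = 100) -> list[str]:
    tokens = text.split()
    step = chunk_size - overlap
    return [
        " ".join(tokens[i : i + chunk_size])
        for i in range(0, max(len(tokens) - overlap, 1), step)
    ]
```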
Key Files:
- src/rag/vertex_rag.py - Core RAG implementation
- rag_knowledge/lessons_learned/ - 200+ documented lessons
- scripts/query_vertex_rag.py - CLI query interface
```python
from vertexai.preview import rag
from vertexai.preview.generative_models import GenerativeModel, Tool

class VertexRAG:
    def query(self, query_text: str) -> str:
        # Expose the RAG corpus to Gemini as a retrieval tool
        rag_retrieval_tool = Tool.from_retrieval(
            retrieval=rag.Retrieval(
                source=rag.VertexRagStore(
                    rag_corpora=[self.corpus.name],
                    similarity_top_k=5,            # return 5 most relevant chunks
                    vector_distance_threshold=0.7,
                ),
            )
        )
        model = GenerativeModel(
            model_name="gemini-2.0-flash",
            tools=[rag_retrieval_tool],
        )
        # Gemini grounds its answer in the retrieved chunks
        return model.generate_content(query_text).text
```
4. MCP (Model Context Protocol)
ACTIVE: Protocol Layer for Tool Integration
MCP provides a standardized way for AI agents to interact with external tools and data sources.
```mermaid
flowchart TB
    subgraph System["Trading System"]
        Agents["AI Agents"]
        CLIENT["MCP Client"]
        subgraph Servers["MCP Servers"]
            ALPACA_MCP["Alpaca Server<br/>(Orders, Market)"]
            OPENROUTER_MCP["OpenRouter Server<br/>(Sentiment, Stocks)"]
            TRADE_MCP["Trade Server<br/>(Execution)"]
        end
    end
    subgraph External["External APIs"]
        ALPACA_API["Alpaca API"]
        OR_API["OpenRouter API"]
    end
    Agents --> CLIENT
    CLIENT --> Servers
    ALPACA_MCP --> ALPACA_API
    OPENROUTER_MCP --> OR_API
```
Server Implementations:
- mcp/servers/alpaca/ - Market data, order execution
- mcp/servers/openrouter/ - Sentiment, stock analysis, IPO research
- mcp/servers/trade_agent.py - High-level trade coordination
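For flavor, here is a minimal MCP server sketch using the official Python SDK's FastMCP helper. The tool name and body are illustrative assumptions, not the repo's actual implementation:

```python
# Illustrative MCP server exposing one tool; the real servers live under mcp/servers/
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("sentiment")

@mcp.tool()
def analyze_sentiment(headlines: list[str]) -> float:
    """Score aggregate sentiment of news headlines from -1.0 to 1.0."""
    return 0.0  # placeholder: delegate to the OpenRouter-backed SentimentAnalyzer

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```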
5. LangGraph (Pipeline Checkpointing)
ACTIVE: Fault-Tolerant Execution
LangGraph patterns enable checkpoint-based recovery for our trade execution pipeline.
```mermaid
flowchart LR
    subgraph Pipeline["Gate Pipeline"]
        G1["Momentum Gate<br/>✓ Checkpoint"] --> G2["Sentiment Gate<br/>✓ Checkpoint"]
        G2 --> G3["Risk Gate<br/>✓ Checkpoint"]
        G3 --> EXEC["Execute Trade"]
    end
    subgraph Recovery["Failure Recovery"]
        FAIL["Gate Failure"] --> LOAD["Load Last Checkpoint"]
        LOAD --> RETRY["Retry from Checkpoint"]
    end
    G2 -.-> FAIL
```
Implementation:
```python
from dataclasses import dataclass

@dataclass
class PipelineCheckpoint:
    thread_id: str      # e.g., "trade:SPY:2026-01-21T14:30:00"
    checkpoint_id: str  # unique ID for this checkpoint
    gate_index: int     # position in the gate pipeline
    gate_name: str      # e.g., "sentiment"
    context_json: str   # serialized pipeline state
```
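A sketch of how the checkpoint enables resuming a failed run. The store object, its load_latest/save methods, and the gate interface are our assumptions; only PipelineCheckpoint comes from the snippet above:

```python
import json

def run_gates(gates: list, context: dict, store) -> dict:
    # store, load_latest/save, and gate.evaluate are hypothetical interfaces
    last = store.load_latest(thread_id=context["thread_id"])
    start = last.gate_index + 1 if last else 0
    if last:
        context = json.loads(last.context_json)  # restore serialized state

    for i in range(start, len(gates)):
        context = gates[i].evaluate(context)     # may raise on gate failure
        store.save(PipelineCheckpoint(
            thread_id=context["thread_id"],
            checkpoint_id=f"{context['thread_id']}#{i}",
            gate_index=i,
            gate_name=gates[i].name,
            context_json=json.dumps(context),
        ))
    return context
```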
Data Flow Architecture
Trade Execution Flow
Blog Generation Flow
Infrastructure
Cloud Services
| Service | Provider | Purpose |
|---|---|---|
| RAG Corpus | Google Cloud (Vertex AI) | Vector search, embeddings |
| Webhook | Google Cloud Run | Dialogflow integration |
| CI/CD | GitHub Actions | Automated testing, deployment |
| Blog Hosting | GitHub Pages | Static site hosting |
| Broker | Alpaca | Paper/Live trading |
Cost Optimization
Budget Controls:
- BATS framework routes 80% of queries to cost-effective models (see the cost sketch after this list)
- RAG reduces repeated LLM calls via cached knowledge
- Batch processing during off-peak hours
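Back-of-envelope arithmetic for the routing savings, using input-token prices from the tables above. The 80/20 split is the stated BATS target; the actual traffic mix varies:

```python
# Blended input cost per 1M tokens when 80% of queries go to DeepSeek
opus, deepseek = 15.00, 0.14           # $/1M input tokens, from the tables above
blended = 0.8 * deepseek + 0.2 * opus  # = $3.11 per 1M tokens
savings = 1 - blended / opus           # ≈ 79% cheaper than all-Opus
print(f"blended=${blended:.2f}/1M, savings={savings:.0%}")
```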
Technology Status
| Technology | Status | Notes |
|---|---|---|
| Claude (Anthropic) | ACTIVE | Primary reasoning engine |
| OpenRouter | ACTIVE | Multi-LLM gateway |
| Vertex AI RAG | ACTIVE | Cloud semantic search |
| MCP Protocol | ACTIVE | Tool integration layer |
| LangGraph | ACTIVE | Pipeline checkpointing |
| Gemini 2.0 Flash | ACTIVE | RAG retrieval |
| LangSmith | DEPRECATED | Removed Jan 2026, replaced by RAG |
| LangChain | DEPRECATED | Migrated to direct SDK |
How Tech Stack Affects Trading
1. Decision Quality
- Claude Opus 4.5 provides highest reasoning accuracy for trade decisions
- RAG enables learning from 200+ documented mistakes
- Result: 86% win rate on iron condors
2. Cost Efficiency
- OpenRouter routing reduces LLM costs by 70%
- BATS framework matches task complexity to model cost
- Result: <$50/month AI costs for full system
3. Reliability
- LangGraph checkpoints enable recovery from failures
- MCP protocol standardizes tool interactions
- Result: Zero trade execution failures in 90 days
4. Continuous Learning
- Vertex AI RAG captures every lesson automatically
- Blog sync shares learnings publicly
- Result: System improves with every trade
This tech stack documentation is auto-updated. View source at GitHub.