🔧 Troubleshooting Guide
Last Updated: November 19, 2025
Purpose: Quick reference for common issues and recovery procedures
🚨 Quick Diagnosis
Workflow Failed - Where to Look
- GitHub Actions Logs: Actions tab → Failed run → Job logs
- Sentry (if configured): Check for error notifications
- Performance Log: data/performance_log.json - Last successful update
- System State: data/system_state.json - Current system status
Common Issues & Fixes
1. Workflow Cancelled (Timeout)
Symptoms:
- Workflow shows “Cancelled” status
- No trades executed
- Performance log not updated
Root Causes:
- Alpha Vantage exponential backoff (FIXED: Now 90s max timeout)
- Data source failures cascading (FIXED: Reliable sources first)
- Old code running (FIXED: Always checkout latest main)
Fix:
# Verify latest code is deployed
git log --oneline -1
# Manual recovery: Update performance log
python3 scripts/update_performance_log.py
# Next run will use latest code automatically
Prevention: ✅ Fixed on Nov 19, 2025 by reordering data source priority
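The first root cause above is unbounded exponential backoff. A minimal sketch of backoff with a hard overall deadline, assuming the 90-second cap mentioned above; `fetch_with_deadline` and its parameters are illustrative names, not the project's actual code:

```python
import time


def fetch_with_deadline(fetch, max_total_seconds=90.0, base_delay=1.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Retry `fetch` with exponential backoff until an overall deadline.

    `fetch` is a hypothetical zero-argument callable that raises on failure.
    `sleep` and `clock` are injectable so the logic can be tested without
    actually waiting.
    """
    deadline = clock() + max_total_seconds
    delay = base_delay
    while True:
        try:
            return fetch()
        except Exception as exc:
            remaining = deadline - clock()
            if remaining <= 0:
                raise TimeoutError(
                    f"gave up after {max_total_seconds}s"
                ) from exc
            # Never sleep past the deadline, however large the delay has grown.
            sleep(min(delay, remaining))
            delay *= 2
```

The point of the design is that the total wait is bounded regardless of how many retries occur, so a rate-limited source cannot push the workflow past its own timeout.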
2. Data Source Failures
Symptoms:
- “yfinance returned insufficient data”
- “Alpaca API failed”
- “Alpha Vantage rate-limited”
Diagnosis:
# Check which data sources are configured
python3 -c "
import os
from dotenv import load_dotenv
load_dotenv()
print('Alpaca:', '✅' if os.getenv('ALPACA_API_KEY') else '❌')
print('Polygon:', '✅' if os.getenv('POLYGON_API_KEY') else '❌')
print('Alpha Vantage:', '✅' if os.getenv('ALPHA_VANTAGE_API_KEY') else '❌')
"
Fix:
- System automatically falls back through this priority order:
  1. Alpaca API (most reliable)
  2. Polygon.io (reliable paid)
  3. Cached data (< 24 hours old)
  4. yfinance (unreliable free)
  5. Alpha Vantage (avoid if rate-limited)
If All Fail:
- System will skip trading day (better than bad data)
- Use cached data if available
- Check API status pages
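The fallback behaviour described above can be sketched as an ordered list of fetchers, with the cache treated as one more source. This is an illustration only: `fetch_with_fallback`, `cached_source`, and the cache layout are hypothetical, and the sketch assumes each fetcher returns a sized container such as a list.

```python
import time


def cached_source(cache, max_age_seconds=24 * 3600, now=time.time):
    """Build a fetcher that serves cache entries younger than max_age_seconds.

    `cache` is a hypothetical dict mapping symbol -> (saved_at_epoch, data).
    """
    def fetch(symbol):
        entry = cache.get(symbol)
        if entry is None:
            return None
        saved_at, data = entry
        return data if now() - saved_at < max_age_seconds else None
    return fetch


def fetch_with_fallback(symbol, sources):
    """Try (name, fetcher) pairs in order; return (name, data) or None.

    A fetcher that raises or returns nothing counts as a failure and the
    next source is tried. None means every source failed, so the caller
    should skip the trading day rather than trade on bad data.
    """
    for name, fetcher in sources:
        try:
            data = fetcher(symbol)
        except Exception as exc:
            print(f"{name} failed: {exc}")
            continue
        if data is not None and len(data) > 0:
            return name, data
        print(f"{name} returned no data")
    return None


if __name__ == "__main__":
    def failing(symbol):
        # Stand-in for a provider that is down or rate-limited.
        raise ConnectionError("API unavailable")

    cache = {"SPY": (time.time() - 3600, [101.2, 101.9])}  # saved an hour ago
    sources = [
        ("alpaca", failing),
        ("polygon", failing),
        ("cache", cached_source(cache)),
    ]
    print(fetch_with_fallback("SPY", sources))
```

Returning the source name alongside the data makes it easy to log which tier actually served a given run.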
3. Performance Log Not Updated
Symptoms:
- data/performance_log.json is missing today's entry
- Last entry is from yesterday
Causes:
- Workflow cancelled before completion
- Script error before log update
- Manual execution didn’t complete
Fix:
# Manual update
python3 scripts/update_performance_log.py
# Verify update
cat data/performance_log.json | jq '.[-1]'
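The same verification can be scripted. The sketch below assumes the log is a JSON array (as the `jq '.[-1]'` check implies) and that each entry carries an ISO date string under a `date` key; that key name is a guess, so adjust it to the real schema.

```python
import json
from datetime import date
from pathlib import Path


def log_is_current(log_path="data/performance_log.json", today=None,
                   date_key="date"):
    """Return True if the last log entry is dated today.

    `date_key` is a hypothetical field name; the real log schema may differ.
    A missing file or an empty log counts as not current.
    """
    today = today or date.today()
    path = Path(log_path)
    if not path.exists():
        return False
    entries = json.loads(path.read_text())
    if not entries:
        return False
    last = str(entries[-1].get(date_key, ""))
    # Compare only the YYYY-MM-DD prefix so full timestamps also match.
    return last[:10] == today.isoformat()
```

If this returns False after a run, rerun `scripts/update_performance_log.py` as shown above.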
4. API Authentication Errors
Symptoms:
- “401 Unauthorized”
- “Invalid API key”
- “Authentication failed”
Fix:
- Verify GitHub Secrets are set:
  - ALPACA_API_KEY
  - ALPACA_SECRET_KEY
  - POLYGON_API_KEY (optional but recommended)
  - ALPHA_VANTAGE_API_KEY (optional)
- Check API keys are valid:
```bash
# Test Alpaca
python3 scripts/check_alpaca_status.py
# Test Polygon (if configured)
# Note: this only confirms the client object can be created; it does not
# make a request, so it cannot detect an invalid key.
python3 -c "
from polygon import RESTClient
import os
from dotenv import load_dotenv
load_dotenv()
client = RESTClient(os.getenv('POLYGON_API_KEY'))
print('Polygon API:', '✅' if client else '❌')
"
```
---
### 5. Import Errors
**Symptoms**:
- "ModuleNotFoundError: No module named 'X'"
- "ImportError: cannot import name 'Y'"
**Fix**:
```bash
# Verify requirements.txt is up to date
pip install -r requirements.txt
# Check Python version (should be 3.11)
python3 --version
```
6. Order Execution Failures
Symptoms:
- Orders submitted but not filled
- “Insufficient buying power”
- “Invalid order parameters”
Diagnosis:
# Check account status
python3 scripts/check_alpaca_status.py
# Check recent orders
python3 -c "
import alpaca_trade_api as tradeapi
import os
from dotenv import load_dotenv
load_dotenv()
api = tradeapi.REST(os.getenv('ALPACA_API_KEY'), os.getenv('ALPACA_SECRET_KEY'), 'https://paper-api.alpaca.markets')
orders = api.list_orders(status='all', limit=5)
for o in orders:
    # \$ keeps the shell from expanding \${...} inside the double-quoted script
    print(f'{o.symbol}: {o.status} - {o.filled_qty}/{o.qty} @ \${o.filled_avg_price}')
"
Fix:
- Check account balance
- Verify order size is valid (> $1 minimum)
- Check market hours (9:30 AM - 4:00 PM ET)
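The market-hours check in the last item can be done locally. A minimal sketch, assuming regular hours of 9:30 AM to 4:00 PM ET on weekdays; it does not handle market holidays or early closes, and `in_regular_hours` is an illustrative name:

```python
from datetime import datetime, time
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")
MARKET_OPEN = time(9, 30)
MARKET_CLOSE = time(16, 0)


def in_regular_hours(moment=None):
    """Return True if `moment` falls in 9:30 AM - 4:00 PM ET on a weekday.

    `moment` must be timezone-aware; it defaults to the current time.
    Holidays and early closes are not covered by this sketch.
    """
    moment = (moment or datetime.now(EASTERN)).astimezone(EASTERN)
    if moment.weekday() >= 5:  # Saturday = 5, Sunday = 6
        return False
    return MARKET_OPEN <= moment.time() < MARKET_CLOSE


if __name__ == "__main__":
    print("Market open (regular hours):", in_regular_hours())
```

For an authoritative answer that includes holidays, the broker's own market clock is the better source; this local check is only a quick first filter.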
Manual Recovery Procedures
When Workflow Fails Completely
Step 1: Diagnose
# Check GitHub Actions logs
gh run view --log
# Check system state
cat data/system_state.json | jq '.last_updated'
Step 2: Update Performance Log
python3 scripts/update_performance_log.py
Step 3: Verify Next Run
# Check workflow is scheduled
gh workflow view daily-trading.yml
# Verify latest code is deployed
git log --oneline -1
Emergency Stop
Stop All Trading:
# Disable workflow
gh workflow disable .github/workflows/daily-trading.yml
# Trip circuit breaker
python3 -c "
from src.safety.circuit_breakers import CircuitBreaker
CircuitBreaker()._trip_breaker('MANUAL', 'Emergency stop')
"
Resume Trading:
# Re-enable workflow
gh workflow enable .github/workflows/daily-trading.yml
# Reset circuit breaker (if needed)
python3 -c "
from src.safety.circuit_breakers import CircuitBreaker
CircuitBreaker().manual_reset()
"
Health Check Script
Run this to verify system health:
python3 scripts/health_check.py
Checks:
- ✅ API keys configured
- ✅ API connectivity
- ✅ Data freshness
- ✅ System state valid
- ✅ Performance log up to date
Getting Help
- Check Logs: logs/trading_system.log
- Check Sentry: If configured, errors appear there
- Review Recent Changes: git log --oneline -10
- Check Documentation: docs/ directory
Prevention Checklist
Before each trading day, verify:
- Latest code is deployed (git log -1)
- API keys are valid (scripts/check_alpaca_status.py)
- Workflow is enabled (gh workflow view daily-trading.yml)
- Performance log updated yesterday (cat data/performance_log.json | jq '.[-1]')
- No circuit breakers tripped (cat data/system_state.json | jq '.circuit_breaker')
Related Documentation
- docs/PLAN.md - Infrastructure reliability fixes
- docs/CI_ARCHITECTURE.md - Workflow details
- .claude/skills/error_handling_protocols/SKILL.md - Error handling protocols