Screeps Monitoring
The Screeps Monitoring workflow is a comprehensive autonomous analysis system that serves as the “strategic brain” of the Screeps GPT project. It combines bot performance monitoring, PTR telemetry analysis, repository health assessment, and intelligent decision-making to guide project development.
Overview
Workflow: .github/workflows/screeps-monitoring.yml
Schedule: Every 30 minutes (via cron schedule)
Duration: Up to 45 minutes
MCP Servers: github, screeps-mcp, screeps-api
This workflow provides comprehensive autonomous oversight by analyzing the Screeps bot’s in-game performance, monitoring PTR stats for anomalies, evaluating repository health, and making strategic decisions about priorities and improvements. It consolidates the functionality of the former copilot-autonomous-monitor.yml and screeps-stats-monitor.yml workflows.
Architecture
Multi-Phase Analysis Pipeline
The workflow executes in seven mandatory phases:
Phase 1: Authentication & Connection Validation
- Authenticates GitHub CLI with repository access
- Verifies Screeps MCP server connection
- Fetches PTR telemetry data using scripts/fetch-screeps-stats.mjs
- Fetches bot performance data from game console
- Logs all connection states for debugging
PTR Telemetry Collection:
The workflow executes the telemetry fetch script which:
- Uses environment variables: SCREEPS_TOKEN (or SCREEPS_STATS_TOKEN), SCREEPS_HOST, SCREEPS_STATS_API
- Fetches user stats from the Screeps API endpoint /api/user/stats
- Stores results in reports/screeps-stats/latest.json
- Copies snapshot to reports/copilot/ptr-stats.json for analysis
- Creates failure snapshot with error details if API is unavailable
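The success and failure paths above can be sketched as a pure function that turns a fetch outcome into the snapshot to be written. This is a minimal sketch, not the contents of scripts/fetch-screeps-stats.mjs: the FetchOutcome and Snapshot shapes, the function names, and the token precedence in pickToken are all assumptions made for illustration.

```typescript
// Hypothetical shapes: the real snapshot format is not documented here.
type FetchOutcome =
  | { ok: true; stats: unknown }
  | { ok: false; error: string };

interface Snapshot {
  fetchedAt: string; // ISO 8601 timestamp
  status: "success" | "failure";
  payload?: unknown; // present on success
  error?: string; // present on failure
}

// Assumption: SCREEPS_STATS_TOKEN takes precedence when both tokens are set.
function pickToken(env: Record<string, string | undefined>): string | undefined {
  return env["SCREEPS_STATS_TOKEN"] || env["SCREEPS_TOKEN"];
}

// Builds the object that would be serialized to reports/screeps-stats/latest.json.
function buildSnapshot(outcome: FetchOutcome, now: Date): Snapshot {
  const fetchedAt = now.toISOString();
  if (outcome.ok) {
    return { fetchedAt, status: "success", payload: outcome.stats };
  }
  // Keep the error text so later phases can report why telemetry is missing.
  return { fetchedAt, status: "failure", error: outcome.error };
}
```

Keeping the builder free of I/O means the same object can be written to both report paths by the caller.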
Phase 2: Bot Performance Analysis
Evaluates game-side performance through three dimensions:
A. Game State Assessment
- Spawning status and creep population across rooms
- CPU usage patterns and efficiency metrics
- Energy economy (income, expenses, storage, construction)
- Room control level (RCL) progress and upgrade rates
- Defense capabilities and threat responses
B. Strategic Execution Evaluation
- Strategy alignment with documented goals
- Resource allocation and creep behavior bottlenecks
- Room expansion opportunities and territory control
- Trade and market activity analysis
C. Memory & Performance Health
- Memory usage and leak detection
- Tick execution time and CPU bucket trends
- Error logs and exception patterns
- Memory segment usage and cleanup
Phase 3: PTR Stats Anomaly Detection
Analyzes the PTR telemetry snapshot for critical performance anomalies requiring immediate attention.
Anomaly Detection Criteria:
Critical Priority Anomalies (priority/critical):
- CPU usage > 95% for 3+ consecutive ticks
- Memory crashes or persistent errors
- Zero creep spawning for 10+ ticks when resources available
- Room abandonment without explicit strategy
High Priority Anomalies (priority/high):
- CPU usage > 80% for 10+ consecutive ticks
- Energy efficiency drop > 20% from baseline
- Creep population deviation > 30% from target
- Construction progress stalled for 50+ ticks
Medium Priority Anomalies (priority/medium):
- Suboptimal resource allocation patterns
- Minor performance degradations < 10%
- Non-critical strategy execution delays
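The CPU thresholds above depend on consecutive ticks, which comes down to finding the longest run of samples over a limit. The sketch below illustrates that check. It assumes the input is one CPU-usage percentage per tick, a format the snapshot may not actually use, and the function names are hypothetical.

```typescript
type CpuPriority = "priority/critical" | "priority/high";

// Length of the longest run of consecutive samples strictly above `threshold`.
function longestRunAbove(samples: number[], threshold: number): number {
  let best = 0;
  let current = 0;
  for (const value of samples) {
    current = value > threshold ? current + 1 : 0;
    if (current > best) best = current;
  }
  return best;
}

// `cpuPercent` holds one CPU-usage percentage per tick (assumed input format).
function classifyCpu(cpuPercent: number[]): CpuPriority | null {
  // Critical: > 95% for 3+ consecutive ticks.
  if (longestRunAbove(cpuPercent, 95) >= 3) return "priority/critical";
  // High: > 80% for 10+ consecutive ticks.
  if (longestRunAbove(cpuPercent, 80) >= 10) return "priority/high";
  return null;
}
```

Checking the critical rule first ensures a series that satisfies both rules is reported at the higher severity.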
Requirements:
- All anomaly issues must have concrete evidence with exact metric values and thresholds
- All issue titles must start with PTR: to identify monitoring findings
- All severity labels must be justified with specific impact assessment
- All analysis must be reproducible with stored snapshot data
After Copilot analysis completes, the workflow also executes scripts/check-ptr-alerts.ts which:
- Reads the PTR stats snapshot from reports/screeps-stats/latest.json
- Analyzes for high CPU usage (>80% sustained), critical CPU (>95%), and low energy reserves
- Controller Health Monitoring: Checks controller downgrade timers across all rooms
  - Critical Alert (< 12 hours): Immediate email + push notification
  - Warning Alert (< 24 hours): Email + push notification for attention
  - Info Alert (< 48 hours): Logged for monitoring awareness
  - Tracks upgrader count, energy availability, and controller progress per room
- Sends push notifications via Push by Techulus for critical and high severity alerts
- Sends email notifications with detailed controller status for warning/critical alerts
- Provides real-time alerting independent of issue creation
Controller Health Data:
The workflow collects comprehensive controller metrics through console telemetry:
- ticksToDowngrade: Actual downgrade timer from game state
- controllerProgress: Current progress toward next RCL
- controllerProgressTotal: Total energy required for next level
- upgraderCount: Number of active upgrader creeps per room
- energyAvailable: Energy immediately available for upgrading
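The controller alert tiers can be illustrated with the metrics listed above. The field names come from this page, but the rest is a sketch under assumptions: the secondsPerTick parameter is hypothetical (tick duration varies by server and is not specified here), and scripts/check-ptr-alerts.ts may convert ticks to hours differently.

```typescript
interface ControllerHealth {
  ticksToDowngrade: number;
  controllerProgress: number;
  controllerProgressTotal: number;
  upgraderCount: number;
  energyAvailable: number;
}

type AlertLevel = "critical" | "warning" | "info" | "none";

// Converts the downgrade timer to hours; `secondsPerTick` must be supplied by the caller.
function hoursToDowngrade(health: ControllerHealth, secondsPerTick: number): number {
  return (health.ticksToDowngrade * secondsPerTick) / 3600;
}

// Applies the documented tiers: < 12 h critical, < 24 h warning, < 48 h info.
function classifyDowngrade(hours: number): AlertLevel {
  if (hours < 12) return "critical";
  if (hours < 24) return "warning";
  if (hours < 48) return "info";
  return "none";
}
```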
Alert history is preserved in reports/bot-snapshots/ for trend analysis and incident investigation.
Phase 4: Repository Health Analysis
Evaluates development infrastructure through GitHub MCP tools:
A. Codebase Quality
- Recent CI/CD failures and workflow health
- Open issues and PR blockers
- Code coverage trends and test quality
- Technical debt and refactoring needs
B. Automation Effectiveness
- Copilot agent activity assessment
- Deployment frequency and success rates
- Monitoring alert patterns
- Documentation freshness
C. Development Velocity
- Commit frequency and momentum
- Feature implementation backlog
- Dependency and blocking analysis
Phase 5: Strategic Decision Making
Applies intelligent prioritization based on impact assessment:
Priority Levels:
- Critical (priority/critical): Bot non-functional, memory crashes, security vulnerabilities, complete automation failures
- High (priority/high): Major performance degradation (>20%), strategy execution failures, important CI/CD issues, documentation gaps preventing improvements
- Medium (priority/medium): Optimization opportunities, refactoring needs, workflow improvements, non-blocking doc updates
- Low (priority/low): Minor quality improvements, nice-to-have features, documentation polish
Phase 6: Autonomous Issue Management
For each identified action:
- Searches existing issues to prevent duplicates
- Creates new issues with evidence-based descriptions
  - For strategic issues: Title prefixed with [Autonomous Monitor]
  - For PTR anomalies: Title prefixed with PTR:
- Updates existing issues with new analysis
- Closes resolved issues when fixes are validated
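The title prefixes and the search-before-create rule can be sketched as follows. The prefixes are the ones documented here. The duplicate test (case-insensitive title equality after collapsing whitespace) is an assumption, since this page does not specify how duplicates are matched, and the function names are hypothetical.

```typescript
type FindingKind = "strategic" | "ptr";

// Builds an issue title with the documented prefix for its finding type.
function issueTitle(kind: FindingKind, summary: string): string {
  const prefix = kind === "ptr" ? "PTR:" : "[Autonomous Monitor]";
  return `${prefix} ${summary.trim()}`;
}

// Assumed matching rule: ignore case and collapse runs of whitespace.
function normalizeTitle(title: string): string {
  return title.toLowerCase().replace(/\s+/g, " ").trim();
}

// Decides whether to open a new issue or update an existing open one.
function planAction(title: string, openTitles: string[]): "create" | "update" {
  const wanted = normalizeTitle(title);
  return openTitles.some((t) => normalizeTitle(t) === wanted) ? "update" : "create";
}
```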
Issue Quality Requirements:
- Concrete evidence from bot performance, PTR stats, or repository analysis
- Measurable impact assessment
- Actionable recommendations with alternatives
- Clear success criteria and validation methods
Phase 7: Strategic Recommendations
Generates comprehensive analysis report:
- Overall bot health score (0-100 scale)
- PTR performance status (operational/degraded/critical)
- Top 3 priorities for game performance
- Top 3 priorities for development infrastructure
- Emerging opportunities (expansion, optimization, automation)
- Risk assessment and mitigation strategies
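One way to picture the report is as a typed object with a small validity check. The field names below are hypothetical, since the workflow's real output schema is not shown on this page. The check treats "Top 3" as "at most three entries", which is also an assumption.

```typescript
type PtrStatus = "operational" | "degraded" | "critical";

// Hypothetical report shape mirroring the bullet list above.
interface StrategicReport {
  botHealthScore: number; // 0-100 scale
  ptrStatus: PtrStatus;
  gamePriorities: string[]; // top 3 for game performance
  infrastructurePriorities: string[]; // top 3 for development infrastructure
  opportunities: string[];
  risks: string[];
}

// Returns a list of problems; an empty list means the report passes these checks.
function validateReport(report: StrategicReport): string[] {
  const problems: string[] = [];
  const score = report.botHealthScore;
  if (!isFinite(score) || score < 0 || score > 100) {
    problems.push("botHealthScore must be between 0 and 100");
  }
  if (report.gamePriorities.length > 3) {
    problems.push("gamePriorities lists more than 3 items");
  }
  if (report.infrastructurePriorities.length > 3) {
    problems.push("infrastructurePriorities lists more than 3 items");
  }
  return problems;
}
```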
Safety Controls
Allowed Actions
✅ Read bot state, memory, and console output
✅ Execute read-only console commands for analysis
✅ Fetch and analyze PTR telemetry data
✅ Create, update, comment on, and close GitHub issues
✅ Search repository code and documentation
✅ Analyze workflow logs and automation health
Prohibited Actions
❌ Execute destructive console commands
❌ Modify Memory without explicit approval
❌ Create or merge pull requests automatically
❌ Change repository settings or secrets
❌ Deploy code changes automatically
Rate Limiting
- Maximum 10 GitHub issues created per run
- Maximum 5 Screeps console commands per analysis phase
- Graceful degradation if APIs unavailable
- Runs every 30 minutes (not continuously)
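The two numeric limits above amount to a per-run issue budget and a per-phase command budget. The class below is a minimal sketch of that bookkeeping. RunBudget and its methods are hypothetical names, and the workflow may enforce these limits in its prompt rather than in code.

```typescript
class RunBudget {
  private issuesCreated = 0;
  private commandsThisPhase = 0;
  private readonly maxIssues: number;
  private readonly maxCommandsPerPhase: number;

  // Defaults match the documented limits: 10 issues per run, 5 commands per phase.
  constructor(maxIssues = 10, maxCommandsPerPhase = 5) {
    this.maxIssues = maxIssues;
    this.maxCommandsPerPhase = maxCommandsPerPhase;
  }

  // Resets the console-command counter; the issue counter spans the whole run.
  startPhase(): void {
    this.commandsThisPhase = 0;
  }

  // Returns true and consumes budget if another issue may be created.
  tryCreateIssue(): boolean {
    if (this.issuesCreated >= this.maxIssues) return false;
    this.issuesCreated += 1;
    return true;
  }

  // Returns true and consumes budget if another console command may run.
  tryConsoleCommand(): boolean {
    if (this.commandsThisPhase >= this.maxCommandsPerPhase) return false;
    this.commandsThisPhase += 1;
    return true;
  }
}
```

A caller would check the returned boolean before each action and skip the action, with a log entry, when it is false.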
Error Handling
- Screeps API unavailable: Creates monitoring issue, continues with repository analysis
- PTR telemetry fetch fails: Documents failure, continues with strategic monitoring
- GitHub API fails: Logs error, stores analysis locally
- MCP tools fail: Falls back to available tools, notes limitations in output
Configuration
Required Secrets
Screeps Access:
- SCREEPS_TOKEN (required) - Screeps API authentication token
- SCREEPS_STATS_TOKEN (optional) - Alternative stats token
- SCREEPS_HOST (optional) - Server hostname, defaults to screeps.com
- SCREEPS_SHARD (optional) - Default shard, defaults to shard3
- SCREEPS_PORT, SCREEPS_PROTOCOL (optional) - Server connection parameters
- SCREEPS_STATS_HOST (optional) - PTR stats endpoint
- SCREEPS_STATS_API (optional) - PTR stats API URL
GitHub Access:
- COPILOT_TOKEN (required) - GitHub token with Copilot Requests scope
- PUSH_TOKEN (optional) - Push by Techulus token for real-time PTR alerts
- Default GITHUB_TOKEN used for repository operations (issues, PRs)
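The required/optional split and the documented defaults for the Screeps secrets can be sketched as a small resolver. The variable names and default values (screeps.com, shard3) are taken from this page. The ScreepsConfig shape and resolveConfig function are hypothetical, and the environment is passed in as a plain object so the sketch does not depend on any runtime.

```typescript
interface ScreepsConfig {
  token: string;
  host: string;
  shard: string;
}

// Resolves Screeps settings from an environment-like map, applying documented defaults.
function resolveConfig(env: Record<string, string | undefined>): ScreepsConfig {
  const token = env["SCREEPS_TOKEN"];
  if (!token) {
    // SCREEPS_TOKEN is the only Screeps secret marked as required.
    throw new Error("SCREEPS_TOKEN is required");
  }
  return {
    token,
    host: env["SCREEPS_HOST"] || "screeps.com",
    shard: env["SCREEPS_SHARD"] || "shard3",
  };
}
```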
Permissions
permissions:
Usage
Manual Trigger
Execute the workflow manually from GitHub Actions UI:
- Navigate to Actions → Screeps Monitoring
- Click “Run workflow” button
- Select branch (typically main)
- Monitor execution in workflow run logs
Schedule
Automatically runs every 30 minutes (cron: */30 * * * *) to provide high-frequency monitoring of both strategic health and PTR performance metrics. Also triggers automatically on completion of the “Deploy Screeps AI” workflow.
Viewing Results
Workflow Logs:
- Detailed execution logs available in GitHub Actions run
- Verbose logging enabled for debugging and audit trail
- JSON summary output at end of logs
Issue Creation:
- New issues tagged with monitoring, copilot, automation labels
- Strategic issue titles prefixed with [Autonomous Monitor]
- PTR anomaly issue titles prefixed with PTR:
- Evidence and recommendations included in issue body
Artifacts:
- Analysis report uploaded as workflow artifact
- PTR stats snapshot stored in reports/screeps-stats/latest.json
- Copilot analysis snapshot in reports/copilot/ptr-stats.json
- 30-day retention for historical tracking
- Download from workflow run page
Push Notifications:
- Critical and high severity PTR alerts sent via Push by Techulus
- Notifications include alert type, severity, and link to workflow run
- Requires PUSH_TOKEN secret for real-time alerting
- See Push Notifications Guide for configuration details
Integration with Other Workflows
Consolidated Monitoring
This workflow consolidates two previously separate monitoring systems:
- Strategic Autonomous Monitoring: Comprehensive analysis of bot performance and repository health using MCP servers
- PTR Stats Monitoring: High-frequency telemetry collection with anomaly detection and push notifications
The consolidation provides:
- Single workflow execution instead of two parallel runs every 30 minutes
- Combined analysis correlating PTR metrics with strategic performance
- Unified issue creation with consistent labeling and evidence
- Reduced workflow complexity and execution overhead
Triggers Downstream Automation
Issues created by the monitoring workflow can trigger:
- Copilot Todo Automation when labeled with Todo
- CI Autofix if monitoring identifies workflow failures
Data Flow
Every 30 Minutes (Cron + Deploy Completion)
Best Practices
Monitoring the Monitor
- Review workflow execution logs weekly for patterns
- Validate that created issues are actionable and accurate
- Adjust priority thresholds if too many/few issues created
- Monitor execution time to ensure 45-minute timeout is sufficient
Tuning Analysis
- Update prompt template (.github/copilot/prompts/screeps-monitor) to refine analysis criteria
- Adjust console commands in Phase 2 for specific metrics
- Customize priority thresholds in Phase 5 based on project needs
- Configure PTR anomaly detection thresholds in Phase 3
Issue Quality
- Issues should be self-contained with all evidence included
- Validate that recommendations are actionable and specific
- Check for duplicate prevention (search before create)
- Ensure severity labels match actual impact
Safety Validation
- Audit issue creation patterns to prevent noise
- Verify no destructive actions attempted
- Review rate limiting effectiveness
- Check error handling for API failures
Troubleshooting
Workflow Fails to Start
- Check COPILOT_TOKEN secret is configured
- Verify SCREEPS_TOKEN secret exists
- Review workflow syntax with yamllint
Screeps MCP Connection Fails
- Validate SCREEPS_TOKEN has correct permissions
- Check SCREEPS_HOST if using private server
- Review MCP config in .github/mcp/screeps-mcp.json
No Issues Created
- Review strategic decision-making logs for criteria matching
- Check if existing issues prevent duplicates
- Verify bot performance is within normal thresholds
Timeout Issues
- Review execution logs for slow operations
- Check if MCP servers are responsive
- Consider reducing analysis scope or increasing timeout
Rate Limiting Hit
- Verify max 10 issues per run not exceeded
- Check max 5 console commands per phase
- Review error handling logs for API failures
Future Enhancements
Potential improvements to consider:
- Trend Analysis: Track bot health score over time for regression detection
- Predictive Analysis: Machine learning to predict issues before they occur
- Resource Optimization: Automatic tuning of spawning and upgrade strategies
- Cross-Shard Analysis: Compare performance across multiple shards
- Market Intelligence: Automated trade and market strategy optimization
- Expansion Planning: Territory analysis for optimal room claiming
Related Documentation
- Automation Overview - Complete workflow documentation
- Push Notifications - PTR alert notification setup
- Copilot Repository Review - Code quality audits
- Issue Triage Workflow - Issue processing
- Todo Automation - Automated implementation