Screeps Monitoring

The Screeps Monitoring workflow is a comprehensive autonomous analysis system that serves as the “strategic brain” of the Screeps GPT project. It combines bot performance monitoring, PTR telemetry analysis, repository health assessment, and intelligent decision-making to guide project development.

Overview

Workflow: .github/workflows/screeps-monitoring.yml
Schedule: Every 30 minutes (via cron schedule)
Duration: Up to 45 minutes
MCP Servers: github, screeps-mcp, screeps-api

This workflow provides comprehensive autonomous oversight by analyzing the Screeps bot’s in-game performance, monitoring PTR stats for anomalies, evaluating repository health, and making strategic decisions about priorities and improvements. It consolidates the functionality of the former copilot-autonomous-monitor.yml and screeps-stats-monitor.yml workflows.

Architecture

Multi-Phase Analysis Pipeline

The workflow executes in seven mandatory phases:

Phase 1: Authentication & Connection Validation

Authenticates GitHub CLI with repository access
Verifies Screeps MCP server connection
Fetches PTR telemetry data using scripts/fetch-screeps-stats.mjs
Fetches bot performance data from game console
Logs all connection states for debugging

PTR Telemetry Collection:

The workflow executes the telemetry fetch script which:

Uses environment variables: SCREEPS_TOKEN (or SCREEPS_STATS_TOKEN), SCREEPS_HOST, SCREEPS_STATS_API
Fetches user stats from the Screeps API endpoint /api/user/stats
Stores results in reports/screeps-stats/latest.json
Copies snapshot to reports/copilot/ptr-stats.json for analysis
Creates failure snapshot with error details if API is unavailable

Phase 2: Bot Performance Analysis

Evaluates game-side performance through three dimensions:

A. Game State Assessment

Spawning status and creep population across rooms
CPU usage patterns and efficiency metrics
Energy economy (income, expenses, storage, construction)
Room control level (RCL) progress and upgrade rates
Defense capabilities and threat responses

B. Strategic Execution Evaluation

Strategy alignment with documented goals
Resource allocation and creep behavior bottlenecks
Room expansion opportunities and territory control
Trade and market activity analysis

C. Memory & Performance Health

Memory usage and leak detection
Tick execution time and CPU bucket trends
Error logs and exception patterns
Memory segment usage and cleanup

Phase 3: PTR Stats Anomaly Detection

Analyzes the PTR telemetry snapshot for critical performance anomalies requiring immediate attention.

Anomaly Detection Criteria:

Critical Priority Anomalies (priority/critical):

CPU usage > 95% for 3+ consecutive ticks
Memory crashes or persistent errors
Zero creep spawning for 10+ ticks when resources available
Room abandonment without explicit strategy

High Priority Anomalies (priority/high):

CPU usage > 80% for 10+ consecutive ticks
Energy efficiency drop > 20% from baseline
Creep population deviation > 30% from target
Construction progress stalled for 50+ ticks

Medium Priority Anomalies (priority/medium):

Suboptimal resource allocation patterns
Minor performance degradations < 10%
Non-critical strategy execution delays

Requirements:

All anomaly issues must have concrete evidence with exact metric values and thresholds
All issue titles must start with PTR: to identify monitoring findings
All severity labels must be justified with specific impact assessment
All analysis must be reproducible with stored snapshot data

After Copilot analysis completes, the workflow also executes scripts/check-ptr-alerts.ts which:

Reads the PTR stats snapshot from reports/screeps-stats/latest.json
Analyzes for high CPU usage (>80% sustained), critical CPU (>95%), and low energy reserves
Controller Health Monitoring: Checks controller downgrade timers across all rooms
- Critical Alert (< 12 hours): Immediate email + push notification
- Warning Alert (< 24 hours): Email + push notification for attention
- Info Alert (< 48 hours): Logged for monitoring awareness
Tracks upgrader count, energy availability, and controller progress per room
Sends push notifications via Push by Techulus for critical and high severity alerts
Sends email notifications with detailed controller status for warning/critical alerts
Provides real-time alerting independent of issue creation

Controller Health Data:

The workflow collects comprehensive controller metrics through console telemetry:

ticksToDowngrade: Actual downgrade timer from game state
controllerProgress: Current progress toward next RCL
controllerProgressTotal: Total energy required for next level
upgraderCount: Number of active upgrader creeps per room
energyAvailable: Energy immediately available for upgrading

Alert history is preserved in reports/bot-snapshots/ for trend analysis and incident investigation.

Phase 4: Repository Health Analysis

Evaluates development infrastructure through GitHub MCP tools:

A. Codebase Quality

Recent CI/CD failures and workflow health
Open issues and PR blockers
Code coverage trends and test quality
Technical debt and refactoring needs

B. Automation Effectiveness

Copilot agent activity assessment
Deployment frequency and success rates
Monitoring alert patterns
Documentation freshness

C. Development Velocity

Commit frequency and momentum
Feature implementation backlog
Dependency and blocking analysis

Phase 5: Strategic Decision Making

Applies intelligent prioritization based on impact assessment:

Priority Levels:

Critical (priority/critical): Bot non-functional, memory crashes, security vulnerabilities, complete automation failures
High (priority/high): Major performance degradation (>20%), strategy execution failures, important CI/CD issues, documentation gaps preventing improvements
Medium (priority/medium): Optimization opportunities, refactoring needs, workflow improvements, non-blocking doc updates
Low (priority/low): Minor quality improvements, nice-to-have features, documentation polish

Phase 6: Autonomous Issue Management

For each identified action:

Searches existing issues to prevent duplicates
Creates new issues with evidence-based descriptions
- For strategic issues: Title prefixed with [Autonomous Monitor]
- For PTR anomalies: Title prefixed with PTR:
Updates existing issues with new analysis
Closes resolved issues when fixes are validated

Issue Quality Requirements:

Concrete evidence from bot performance, PTR stats, or repository analysis
Measurable impact assessment
Actionable recommendations with alternatives
Clear success criteria and validation methods

Phase 7: Strategic Recommendations

Generates comprehensive analysis report:

Overall bot health score (0-100 scale)
PTR performance status (operational/degraded/critical)
Top 3 priorities for game performance
Top 3 priorities for development infrastructure
Emerging opportunities (expansion, optimization, automation)
Risk assessment and mitigation strategies

Safety Controls

Allowed Actions

✅ Read bot state, memory, and console output
✅ Execute read-only console commands for analysis
✅ Fetch and analyze PTR telemetry data
✅ Create, update, comment on, and close GitHub issues
✅ Search repository code and documentation
✅ Analyze workflow logs and automation health

Prohibited Actions

❌ Execute destructive console commands
❌ Modify Memory without explicit approval
❌ Create or merge pull requests automatically
❌ Change repository settings or secrets
❌ Deploy code changes automatically

Rate Limiting

Maximum 10 GitHub issues created per run
Maximum 5 Screeps console commands per analysis phase
Graceful degradation if APIs unavailable
Runs every 30 minutes (not continuously)

Error Handling

Screeps API unavailable: Creates monitoring issue, continues with repository analysis
PTR telemetry fetch fails: Documents failure, continues with strategic monitoring
GitHub API fails: Logs error, stores analysis locally
MCP tools fail: Fallbacks to available tools, notes limitations in output

Configuration

Required Secrets

Screeps Access:

SCREEPS_TOKEN (required) - Screeps API authentication token
SCREEPS_STATS_TOKEN (optional) - Alternative stats token
SCREEPS_HOST (optional) - Server hostname, defaults to screeps.com
SCREEPS_SHARD (optional) - Default shard, defaults to shard3
SCREEPS_PORT, SCREEPS_PROTOCOL (optional) - Server connection parameters
SCREEPS_STATS_HOST (optional) - PTR stats endpoint
SCREEPS_STATS_API (optional) - PTR stats API URL

GitHub Access:

COPILOT_TOKEN (required) - GitHub token with Copilot Requests scope
PUSH_TOKEN (optional) - Push by Techulus token for real-time PTR alerts
Default GITHUB_TOKEN used for repository operations (issues, PRs)

Permissions

permissions:
  contents: read
  issues: write
  pull-requests: read

Usage

Manual Trigger

Execute the workflow manually from GitHub Actions UI:

Navigate to Actions → Screeps Monitoring
Click “Run workflow” button
Select branch (typically main)
Monitor execution in workflow run logs

Schedule

Automatically runs every 30 minutes (cron: */30 * * * *) to provide high-frequency monitoring of both strategic health and PTR performance metrics. Also triggers automatically on completion of the “Deploy Screeps AI” workflow.

Viewing Results

Workflow Logs:

Detailed execution logs available in GitHub Actions run
Verbose logging enabled for debugging and audit trail
JSON summary output at end of logs

Issue Creation:

New issues tagged with monitoring, copilot, automation labels
Strategic issue titles prefixed with [Autonomous Monitor]
PTR anomaly issue titles prefixed with PTR:
Evidence and recommendations included in issue body

Artifacts:

Analysis report uploaded as workflow artifact
PTR stats snapshot stored in reports/screeps-stats/latest.json
Copilot analysis snapshot in reports/copilot/ptr-stats.json
30-day retention for historical tracking
Download from workflow run page

Push Notifications:

Critical and high severity PTR alerts sent via Push by Techulus
Notifications include alert type, severity, and link to workflow run
Requires PUSH_TOKEN secret for real-time alerting
See Push Notifications Guide for configuration details

Integration with Other Workflows

Consolidated Monitoring

This workflow consolidates two previously separate monitoring systems:

Strategic Autonomous Monitoring: Comprehensive analysis of bot performance and repository health using MCP servers
PTR Stats Monitoring: High-frequency telemetry collection with anomaly detection and push notifications

The consolidation provides:

Single workflow execution instead of two parallel runs every 30 minutes
Combined analysis correlating PTR metrics with strategic performance
Unified issue creation with consistent labeling and evidence
Reduced workflow complexity and execution overhead

Triggers Downstream Automation

Issues created by the monitoring workflow can trigger:

Copilot Todo Automation when labeled with Todo
CI Autofix if monitoring identifies workflow failures

Data Flow

Every 30 Minutes (Cron + Deploy Completion)
    ↓
[Authenticate & Connect]
    ↓
[Fetch PTR Telemetry] → reports/screeps-stats/latest.json
    ↓
[Analyze Bot Performance] ← Screeps MCP Server
    ↓
[Detect PTR Anomalies] ← PTR Stats Snapshot
    ↓
[Analyze Repository Health] ← GitHub MCP Server
    ↓
[Strategic Decision Making]
    ↓
[Issue Management] → Creates/Updates Issues (Strategic + PTR)
    ↓
[Check PTR Alerts] → Send Push Notifications
    ↓
[Strategic Report] → Workflow Artifact
    ↓
[Triggers Downstream] → Todo/Spec-Kit Workflows

Best Practices

Monitoring the Monitor

Review workflow execution logs weekly for patterns
Validate that created issues are actionable and accurate
Adjust priority thresholds if too many/few issues created
Monitor execution time to ensure 45-minute timeout is sufficient

Tuning Analysis

Update prompt template (.github/copilot/prompts/screeps-monitor) to refine analysis criteria
Adjust console commands in Phase 2 for specific metrics
Customize priority thresholds in Phase 5 based on project needs
Configure PTR anomaly detection thresholds in Phase 3

Issue Quality

Issues should be self-contained with all evidence included
Validate that recommendations are actionable and specific
Check for duplicate prevention (search before create)
Ensure severity labels match actual impact

Safety Validation

Audit issue creation patterns to prevent noise
Verify no destructive actions attempted
Review rate limiting effectiveness
Check error handling for API failures

Troubleshooting

Workflow Fails to Start

Check COPILOT_TOKEN secret is configured
Verify SCREEPS_TOKEN secret exists
Review workflow syntax with yamllint

Screeps MCP Connection Fails

Validate SCREEPS_TOKEN has correct permissions
Check SCREEPS_HOST if using private server
Review MCP config in .github/mcp/screeps-mcp.json

No Issues Created

Review strategic decision-making logs for criteria matching
Check if existing issues prevent duplicates
Verify bot performance is within normal thresholds

Timeout Issues

Review execution logs for slow operations
Check if MCP servers are responsive
Consider reducing analysis scope or increasing timeout

Rate Limiting Hit

Verify max 10 issues per run not exceeded
Check max 5 console commands per phase
Review error handling logs for API failures

Future Enhancements

Potential improvements to consider:

Trend Analysis: Track bot health score over time for regression detection
Predictive Analysis: Machine learning to predict issues before they occur
Resource Optimization: Automatic tuning of spawning and upgrade strategies
Cross-Shard Analysis: Compare performance across multiple shards
Market Intelligence: Automated trade and market strategy optimization
Expansion Planning: Territory analysis for optimal room claiming

Automation Overview - Complete workflow documentation
Push Notifications - PTR alert notification setup
Copilot Repository Review - Code quality audits
Issue Triage Workflow - Issue processing
Todo Automation - Automated implementation