# Improvement Metrics and Measurement

This document defines metrics for measuring AI strategy effectiveness and validating that changes improve performance without introducing regressions.
## Overview

Measuring Screeps AI performance requires tracking multiple dimensions: resource efficiency, CPU usage, progression rate, and stability. This document provides a framework for objective evaluation.
## Core Metrics Categories

### 1. Resource Efficiency Metrics
#### Energy Per Tick (EPT)

**Definition:** Average energy harvested per game tick

**Calculation:**

```typescript
EPT = totalEnergyHarvested / ticksElapsed;
```

**Baseline Targets:**

- RCL 1: 8-12 EPT (2 harvesters, 1 source)
- RCL 2: 15-20 EPT (3-4 harvesters, 2 sources)
- RCL 3: 20-25 EPT (4-5 harvesters, 2 sources)
- RCL 4+: 25-35 EPT (optimized harvesters)

**Collection Method:**

```typescript
// Add to Memory tracking
```

**Improvement Indicator:** Higher EPT = better efficiency
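The truncated collection snippet above can be fleshed out as a minimal sketch. The `MetricsState` shape and the `recordHarvest` helper are assumptions for illustration, not part of the stock Screeps API; in-game you would call `recordHarvest` whenever a harvest intent returns `OK`, and pass `Game.time` as the current tick.

```typescript
// Minimal EPT tracker; the MetricsState shape is an illustrative assumption.
interface MetricsState {
  energyHarvested: number; // running total since startTick
  startTick: number;       // tick (Game.time) when tracking began
}

// Call when a harvest intent succeeds (amount = energy gained that tick).
function recordHarvest(state: MetricsState, amount: number): void {
  state.energyHarvested += amount;
}

// EPT = totalEnergyHarvested / ticksElapsed
function energyPerTick(state: MetricsState, currentTick: number): number {
  const elapsed = currentTick - state.startTick;
  return elapsed > 0 ? state.energyHarvested / elapsed : 0;
}
```

In the main loop this would persist under `Memory` so it survives global resets.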
#### Harvest Efficiency Ratio

**Definition:** Percentage of time harvesters spend actively harvesting

**Calculation:**

```typescript
HarvestEfficiency = (harvestTicks / totalHarvesterTicks) * 100;
```

**Baseline Targets:**

- Good: >60% (most time harvesting)
- Acceptable: 40-60% (some travel overhead)
- Poor: <40% (too much idle/travel time)

**Collection Method:**

```typescript
// Track per-harvester in Memory
```

**Improvement Indicator:** Higher percentage = better
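One way to complete the per-harvester tracking stub, as a sketch: the `HarvesterStats` shape is an assumption, and in-game the `harvesting` flag would be true when `creep.harvest(source)` returned `OK` that tick.

```typescript
// Per-harvester activity counters; field names are illustrative assumptions.
interface HarvesterStats {
  harvestTicks: number; // ticks spent actively harvesting
  totalTicks: number;   // ticks the creep has been alive and tracked
}

// Call once per tick per harvester.
function recordHarvesterTick(stats: HarvesterStats, harvesting: boolean): void {
  stats.totalTicks += 1;
  if (harvesting) stats.harvestTicks += 1;
}

// HarvestEfficiency = (harvestTicks / totalHarvesterTicks) * 100, over all harvesters
function harvestEfficiency(all: HarvesterStats[]): number {
  const harvest = all.reduce((sum, s) => sum + s.harvestTicks, 0);
  const total = all.reduce((sum, s) => sum + s.totalTicks, 0);
  return total > 0 ? (harvest / total) * 100 : 0;
}
```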
#### Energy Waste Rate

**Definition:** Energy lost due to overharvesting or storage overflow

**Calculation:**

```typescript
WasteRate = (energyWasted / totalEnergyHarvested) * 100;
```

**Baseline Targets:**

- Excellent: <5% waste
- Good: 5-10% waste
- Poor: >10% waste

**Collection Method:**

```typescript
// Detect waste events
```

**Improvement Indicator:** Lower percentage = better
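A sketch of what the waste-event detection could look like; the `WasteLedger` shape is an assumption. In-game, `recordWaste` would be called when dropped energy decays or a container/storage deposit fails with `ERR_FULL`.

```typescript
// Running waste ledger; shape is an illustrative assumption.
interface WasteLedger {
  wasted: number;    // energy lost to decay/overflow
  harvested: number; // total energy harvested in the same window
}

// Call when a waste event is detected (decay tick, overflow, etc.).
function recordWaste(ledger: WasteLedger, amount: number): void {
  ledger.wasted += amount;
}

// WasteRate = (energyWasted / totalEnergyHarvested) * 100
function wasteRate(ledger: WasteLedger): number {
  return ledger.harvested > 0 ? (ledger.wasted / ledger.harvested) * 100 : 0;
}
```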
### 2. CPU Efficiency Metrics

#### CPU Per Tick

**Definition:** Average CPU consumed per game tick

**Calculation:**

```typescript
avgCPU = totalCPU / ticksElapsed;
```

**Baseline Targets (10 CPU limit):**

- RCL 1: 1.5-2.5 CPU/tick
- RCL 2: 2.5-4.0 CPU/tick
- RCL 3: 3.5-5.5 CPU/tick
- RCL 4+: 5.0-8.0 CPU/tick

**Collection Method:**

```typescript
// Tracked by PerformanceTracker automatically
```

**Improvement Indicator:** Lower CPU = better (provided functionality is maintained)
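If you are not using the project's `PerformanceTracker`, the same data can be collected manually. This is a sketch under assumptions: the 100-sample window is arbitrary, and in-game you would push `Game.cpu.getUsed()` at the end of the main loop each tick.

```typescript
// Rolling CPU sample window; the 100-tick window size is an arbitrary assumption.
function pushSample(history: number[], sample: number, window = 100): void {
  history.push(sample);
  if (history.length > window) history.shift(); // drop the oldest sample
}

// avgCPU = totalCPU / ticksElapsed, computed over the retained window
function averageCpu(history: number[]): number {
  if (history.length === 0) return 0;
  return history.reduce((sum, x) => sum + x, 0) / history.length;
}
```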
#### CPU Per Creep

**Definition:** Average CPU consumed per living creep

**Calculation:**

```typescript
CPUPerCreep = cpuUsed / creepCount;
```

**Baseline Targets:**

- Excellent: <0.3 CPU/creep
- Good: 0.3-0.5 CPU/creep
- Acceptable: 0.5-0.8 CPU/creep
- Poor: >0.8 CPU/creep

**Collection Method:**

```typescript
const cpuPerCreep = snapshot.cpuUsed / snapshot.creepCount;
```

**Improvement Indicator:** Lower CPU/creep = better efficiency
#### CPU Bucket Trend

**Definition:** Change in CPU bucket over time

**Calculation:**

```typescript
BucketTrend = (currentBucket - startBucket) / ticksElapsed;
```

**Baseline Targets:**

- Increasing: >0 (gaining bucket)
- Stable: ~0 (balanced usage)
- Decreasing: <0 (losing bucket) ⚠️

**Collection Method:**

```typescript
Memory.metrics.bucketHistory = Memory.metrics.bucketHistory || [];
```

**Improvement Indicator:** Positive or zero trend = sustainable
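Given a `bucketHistory` array like the one initialized above, the trend reduces to a simple slope. This sketch assumes one sample per tick (in-game, push `Game.cpu.bucket` each tick alongside the initialization line).

```typescript
// Trend of the CPU bucket over a sampled history (one sample per tick assumed).
// BucketTrend = (currentBucket - startBucket) / ticksElapsed
function bucketTrend(history: number[]): number {
  if (history.length < 2) return 0; // not enough samples for a trend
  return (history[history.length - 1] - history[0]) / (history.length - 1);
}
```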
### 3. Progression Metrics

#### Room Control Level (RCL) Progression Rate

**Definition:** Average ticks per RCL level

**Calculation:**

```typescript
TicksPerLevel = ticksElapsed / (currentRCL - startRCL);
```

**Baseline Targets:**

- RCL 1→2: ~5,000-8,000 ticks
- RCL 2→3: ~10,000-15,000 ticks
- RCL 3→4: ~15,000-25,000 ticks
- RCL 4→5: ~25,000-40,000 ticks

**Collection Method:**

```typescript
// Track RCL changes
```

**Improvement Indicator:** Fewer ticks per level = faster progression
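One way to track RCL changes, sketched under assumptions: the `RclEvent` shape is illustrative, and in-game `recordLevelUp` would fire when `room.controller.level` increases, with `Game.time` as the tick.

```typescript
// RCL change log; the event shape is an illustrative assumption.
interface RclEvent {
  level: number; // controller level reached
  tick: number;  // tick at which it was reached
}

// Call when controller.level increases (compare against a cached previous level).
function recordLevelUp(log: RclEvent[], level: number, tick: number): void {
  log.push({ level, tick });
}

// TicksPerLevel = ticksElapsed / (currentRCL - startRCL)
function ticksPerLevel(log: RclEvent[]): number {
  if (log.length < 2) return 0;
  const first = log[0];
  const last = log[log.length - 1];
  return (last.tick - first.tick) / (last.level - first.level);
}
```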
#### Controller Upgrade Rate (CPT)

**Definition:** Average control points generated per tick

**Calculation:**

```typescript
CPT = controllerPoints / ticksElapsed;
```

**Baseline Targets:**

- RCL 1: 0.5-1.0 CPT (1 upgrader)
- RCL 2: 1.0-2.0 CPT (1-2 upgraders)
- RCL 3: 2.0-4.0 CPT (2-3 upgraders)
- RCL 4+: 4.0-8.0 CPT (3-5 upgraders)

**Collection Method:**

```typescript
// Track controller progress
```

**Improvement Indicator:** Higher CPT = faster upgrades
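Controller progress can be tracked by sampling `room.controller.progress` each tick and summing the deltas, as in this sketch. A level-up resets progress and produces a negative delta; that tick's gain is skipped here for simplicity, so the estimate slightly undercounts across level boundaries.

```typescript
// CPT from per-tick samples of controller.progress (one sample per tick assumed).
function controlPointsPerTick(progressSamples: number[]): number {
  if (progressSamples.length < 2) return 0;
  let gained = 0;
  for (let i = 1; i < progressSamples.length; i++) {
    const delta = progressSamples[i] - progressSamples[i - 1];
    if (delta > 0) gained += delta; // negative delta = level-up reset, skipped
  }
  return gained / (progressSamples.length - 1);
}
```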
### 4. Stability Metrics

#### Population Stability

**Definition:** Variance in creep population over time

**Calculation:**

```typescript
StdDev = sqrt(variance(populationHistory));
```

**Baseline Targets:**

- Excellent: StdDev <1 (very stable)
- Good: StdDev 1-2 (minor fluctuations)
- Poor: StdDev >2 (unstable population)

**Collection Method:**

```typescript
Memory.metrics.populationHistory = Memory.metrics.populationHistory || [];
```

**Improvement Indicator:** Lower variance = more stable
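The standard deviation in the formula above can be computed directly from the history array (in-game, push `Object.keys(Game.creeps).length` each tick). This sketch uses the population variance, which is adequate for a descriptive dashboard metric.

```typescript
// StdDev = sqrt(variance(populationHistory)), using population (not sample) variance.
function populationStdDev(history: number[]): number {
  if (history.length === 0) return 0;
  const mean = history.reduce((a, b) => a + b, 0) / history.length;
  const variance =
    history.reduce((sum, n) => sum + (n - mean) ** 2, 0) / history.length;
  return Math.sqrt(variance);
}
```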
#### Spawn Uptime Percentage

**Definition:** Percentage of ticks spawns are actively spawning

**Calculation:**

```typescript
SpawnUptime = (spawningTicks / totalTicks) * 100;
```

**Baseline Targets:**

- Healthy: 60-80% (continuous production)
- Acceptable: 40-60% (periodic production)
- Poor: <40% (insufficient energy or demand)

**Collection Method:**

```typescript
let spawningTicks = 0;
```

**Improvement Indicator:** 60-80% is optimal (too high = bottleneck, too low = underutilized)
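Completing the counter above, as a sketch: the `SpawnUptime` shape is an assumption, and in-game `spawning` would be `spawn.spawning !== null`, checked once per tick per spawn.

```typescript
// Spawn uptime counters; the shape is an illustrative assumption.
interface SpawnUptime {
  spawningTicks: number; // ticks with an active spawn order
  totalTicks: number;    // ticks observed
}

// Call once per tick per spawn.
function recordSpawnTick(u: SpawnUptime, spawning: boolean): void {
  u.totalTicks += 1;
  if (spawning) u.spawningTicks += 1;
}

// SpawnUptime = (spawningTicks / totalTicks) * 100
function spawnUptimePct(u: SpawnUptime): number {
  return u.totalTicks > 0 ? (u.spawningTicks / u.totalTicks) * 100 : 0;
}
```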
#### Error Rate

**Definition:** Number of errors or warnings per 1000 ticks

**Calculation:**

```typescript
ErrorRate = (errorCount / totalTicks) * 1000;
```

**Baseline Targets:**

- Excellent: 0 errors per 1000 ticks
- Acceptable: <5 errors per 1000 ticks
- Poor: >10 errors per 1000 ticks

**Collection Method:**

```typescript
// Count warnings and errors from logs
```

**Improvement Indicator:** Lower rate = more stable
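Rather than parsing console logs, errors can be counted at the source by wrapping each subsystem in a try/catch. This is a sketch; the `ErrorCounter` shape is an assumption, and `counter.ticks` would be incremented once per game tick in the main loop.

```typescript
// Error counters; the shape is an illustrative assumption.
interface ErrorCounter {
  errors: number; // caught exceptions
  ticks: number;  // ticks observed (increment once per tick in the main loop)
}

// Run a subsystem, counting (and swallowing) any thrown error.
function safeRun(counter: ErrorCounter, fn: () => void): void {
  try {
    fn();
  } catch (err) {
    counter.errors += 1; // in-game you would also log `err` to the console
  }
}

// ErrorRate = (errorCount / totalTicks) * 1000
function errorRatePer1000(counter: ErrorCounter): number {
  return counter.ticks > 0 ? (counter.errors / counter.ticks) * 1000 : 0;
}
```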
## Composite Metrics

### Overall Efficiency Score (OES)

**Definition:** Weighted combination of key metrics

**Calculation:**

```typescript
OES = EPT_score * 0.3 + CPU_score * 0.3 + CPT_score * 0.2 + Stability_score * 0.2;
```

**Scoring (0-100 scale):**

- Each metric is normalized to a 0-100 range
- Baseline = 50 (acceptable)
- Target = 75+ (good)
- Excellent = 90+ (optimal)

**Example:**

```typescript
// Normalize EPT (baseline: 10, target: 20)
```
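One plausible normalization that matches the scale above (baseline → 50, target → 100, clamped to 0-100); the linear mapping itself is an assumption, not prescribed by the document. For lower-is-better metrics such as CPU, pass a baseline larger than the target and the same formula still scores improvement upward.

```typescript
// Linear normalization: baseline maps to 50, target maps to 100, clamped to [0, 100].
// Anchor values (e.g. EPT baseline 10, target 20) come from the baseline tables.
function normalize(value: number, baseline: number, target: number): number {
  const score = 50 + ((value - baseline) / (target - baseline)) * 50;
  return Math.max(0, Math.min(100, score));
}

// OES = EPT_score * 0.3 + CPU_score * 0.3 + CPT_score * 0.2 + Stability_score * 0.2
function overallEfficiencyScore(
  eptScore: number,
  cpuScore: number,
  cptScore: number,
  stabilityScore: number
): number {
  return eptScore * 0.3 + cpuScore * 0.3 + cptScore * 0.2 + stabilityScore * 0.2;
}
```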
## Metric Collection Infrastructure

### Automated Collection (in Kernel)

Add to `src/runtime/metrics/MetricsCollector.ts`:

```typescript
export class MetricsCollector {
```

### Manual Inspection (Console)

**View Current Metrics:**

```typescript
console.log(JSON.stringify(Memory.metrics, null, 2));
```

**Calculate Summary:**

```typescript
function summarizeMetrics() {
```
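A possible body for the truncated `summarizeMetrics` stub, sketched against an assumed `Memory.metrics` shape (the fields shown are illustrative; adapt them to whatever your collector actually stores). In the Screeps console you would pass `Memory.metrics` and `Game.time`.

```typescript
// Assumed metrics shape; align with your actual Memory.metrics fields.
interface MetricsSummaryInput {
  energyHarvested: number;
  startTick: number;
  cpuHistory: number[];
}

// Produce a one-line console summary of the headline metrics.
function summarizeMetrics(m: MetricsSummaryInput, currentTick: number): string {
  const elapsed = Math.max(1, currentTick - m.startTick);
  const ept = m.energyHarvested / elapsed;
  const avgCpu = m.cpuHistory.length
    ? m.cpuHistory.reduce((a, b) => a + b, 0) / m.cpuHistory.length
    : 0;
  return `EPT: ${ept.toFixed(2)} | avg CPU/tick: ${avgCpu.toFixed(2)}`;
}
```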
## A/B Testing Framework

### Baseline Collection

**Phase 1: Establish Baseline (1000+ ticks)**

```typescript
Memory.baseline = {
```

### Comparison Collection

**Phase 2: Test New Strategy (1000+ ticks)**

```typescript
Memory.comparison = {
```
## Regression Detection

### Statistical Significance

Use a t-test to determine whether the difference between baseline and comparison samples is meaningful:

```typescript
function tTest(sample1: number[], sample2: number[]): { significant: boolean; pValue: number } {
```
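A possible implementation of the truncated signature, as a sketch: Welch's two-sample t-test. For the 1000+ tick samples recommended in this document, the t statistic is approximately standard normal under the null hypothesis, so the two-tailed p-value is computed from the normal CDF (via the Abramowitz & Stegun erf approximation) rather than the exact t distribution.

```typescript
function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

function sampleVariance(xs: number[]): number {
  const m = mean(xs);
  return xs.reduce((s, x) => s + (x - m) ** 2, 0) / (xs.length - 1);
}

// Abramowitz & Stegun 7.1.26 approximation of erf, accurate to ~1.5e-7.
function erf(x: number): number {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) *
      t + 0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

function tTest(
  sample1: number[],
  sample2: number[]
): { significant: boolean; pValue: number } {
  // Welch's t statistic (no equal-variance assumption).
  const v1 = sampleVariance(sample1) / sample1.length;
  const v2 = sampleVariance(sample2) / sample2.length;
  const t = (mean(sample1) - mean(sample2)) / Math.sqrt(v1 + v2);
  // Two-tailed p-value under the large-sample normal approximation.
  const pValue = 2 * (1 - 0.5 * (1 + erf(Math.abs(t) / Math.SQRT2)));
  return { significant: pValue < 0.05, pValue };
}
```

For small samples (under a few hundred points) this normal approximation understates the p-value; use an exact t-distribution CDF if you must test short runs.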
### Automated Alerts

In `SystemEvaluator`:

```typescript
// Add regression checking
```
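One shape the regression check could take, sketched under assumptions: the 5% tolerance mirrors the validation checklist below, the function name is illustrative, and this convention applies to higher-is-better metrics (for CPU-style metrics, invert the sign before calling).

```typescript
// Alert when a higher-is-better metric drops more than tolerancePct below baseline.
// Returns an alert string, or null when the metric is within tolerance.
function checkRegression(
  metric: string,
  baseline: number,
  current: number,
  tolerancePct = 5
): string | null {
  if (baseline === 0) return null; // no meaningful baseline to compare against
  const changePct = ((current - baseline) / baseline) * 100;
  return changePct < -tolerancePct
    ? `REGRESSION: ${metric} dropped ${Math.abs(changePct).toFixed(1)}% vs baseline`
    : null;
}
```

In an evaluator loop you would run this per tracked metric and surface non-null results to the console or notifications.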
## Improvement Validation Checklist

Before declaring an improvement successful:

- [ ] Collected 1000+ ticks of baseline metrics
- [ ] Collected 1000+ ticks of comparison metrics
- [ ] EPT improved or stable (within 5%)
- [ ] CPU/creep improved or stable (within 5%)
- [ ] Bucket trend neutral or positive
- [ ] No increase in error rate
- [ ] Population stability maintained
- [ ] Spawn uptime maintained
- [ ] Statistical significance verified (p < 0.05)
- [ ] No adverse side effects observed
## Reporting Template

### Improvement Report Format

```markdown
## Strategy Change: [Brief Description]
```
## Best Practices

**DO:**

- ✓ Collect a baseline before making changes
- ✓ Run comparisons for sufficient time (1000+ ticks)
- ✓ Verify statistical significance
- ✓ Track multiple dimensions (not just one metric)
- ✓ Document all measurements
- ✓ Compare like-for-like (same RCL, same room conditions)

**DON'T:**

- ✗ Cherry-pick favorable metrics
- ✗ Compare different RCL levels directly
- ✗ Ignore side effects (e.g. a CPU increase traded for an EPT gain)
- ✗ Draw conclusions from samples shorter than 100 ticks
- ✗ Optimize a single metric at the expense of others

**MONITOR:**

- ⚠ Metric trends over time
- ⚠ Correlation between metrics
- ⚠ External factors (attacks, room conditions)
- ⚠ Long-term stability (10,000+ ticks)
## Related Documentation

- Strategy Testing - Testing methodologies for changes
- Safe Refactoring - How to modify code safely
- Performance Monitoring - Real-time monitoring
- Creep Roles - Expected performance characteristics
- Scaling Strategies - Performance by RCL