
Performance Optimization

This guide covers profiling, benchmarking, and optimization techniques for AstraWeave games and the engine itself.

Performance Philosophy

AstraWeave targets consistent 60+ FPS gameplay with AI-native features. Key principles:

  1. Measure First: Always profile before optimizing
  2. Budget Time: Allocate frame time across systems
  3. Batch Operations: Minimize per-entity overhead
  4. Cache Strategically: Trade memory for speed where appropriate
  5. Offload to Threads: Parallelize independent work

Frame Budget

At 60 FPS, each frame has ~16.67ms. Recommended allocation:

| System     | Budget  | Notes                           |
|------------|---------|---------------------------------|
| Game Logic | 2-3ms   | ECS systems, gameplay           |
| AI         | 2-4ms   | Perception, planning, behaviors |
| Physics    | 2-3ms   | Collision, dynamics             |
| Rendering  | 6-8ms   | Draw calls, GPU submission      |
| Audio      | 0.5-1ms | Mixing, spatial                 |
| Buffer     | 1-2ms   | Headroom for spikes             |
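To make the table concrete, a frame-budget check can be sketched as below. The budget figures come from the table above; the helper itself (`run_with_budget`) is illustrative, not engine API:

```rust
use std::time::Instant;

/// Illustrative per-system budgets in milliseconds, taken from the
/// table above (upper bound of each range).
const BUDGETS_MS: &[(&str, f32)] = &[
    ("game_logic", 3.0),
    ("ai", 4.0),
    ("physics", 3.0),
    ("rendering", 8.0),
    ("audio", 1.0),
];

/// Runs a system closure, times it, and reports whether it stayed
/// within its per-frame budget.
pub fn run_with_budget(name: &str, budget_ms: f32, system: impl FnOnce()) -> bool {
    let start = Instant::now();
    system();
    let elapsed_ms = start.elapsed().as_secs_f32() * 1000.0;
    if elapsed_ms > budget_ms {
        eprintln!("{name} overran its budget: {elapsed_ms:.2}ms > {budget_ms}ms");
    }
    elapsed_ms <= budget_ms
}
```

For example, `run_with_budget("ai", 4.0, || ai_tick())` returns `false` when the AI pass overruns its 4ms slice, which is exactly the signal the tick-budgeting pattern later in this guide acts on.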

Profiling Tools

Tracy Integration

AstraWeave integrates with Tracy for real-time profiling:

[dependencies]
astraweave-profiling = { version = "0.1", features = ["tracy"] }
use astraweave_profiling::*;

fn my_system(query: Query<&MyComponent>) {
    profile_scope!("my_system");

    for component in query.iter() {
        profile_scope!("process_entity");
        // Work
    }
}

Run with Tracy:

cargo run --release --features tracy

Built-in Profiler

Enable the debug overlay:

use astraweave_profiling::prelude::*;

app.add_plugin(ProfilingPlugin::default());

Press F3 in-game to toggle the performance overlay showing:

  • Frame time graph
  • System timing breakdown
  • Memory usage
  • Draw call count

CPU Profiling

For detailed CPU analysis:

# Linux (perf)
perf record -g cargo run --release
perf report

# Windows (ETW)
cargo build --release
# Use Windows Performance Analyzer

# macOS (Instruments)
cargo instruments --release -t time

GPU Profiling

# NVIDIA Nsight
cargo run --release
# Attach Nsight Graphics

# RenderDoc
cargo run --release --features renderdoc
# Press F12 to capture frame

Optimization Techniques

ECS Optimization

Query Optimization

// Bad: Iterating all entities
fn slow_system(query: Query<&Transform>) {
    for transform in query.iter() {
        // Processes all entities with Transform
    }
}

// Good: Filter to relevant entities
fn fast_system(
    query: Query<&Transform, (With<Enemy>, Without<Dead>)>,
) {
    for transform in query.iter() {
        // Only active enemies
    }
}

Parallel Iteration

use rayon::prelude::*;

fn parallel_system(query: Query<&mut Transform>) {
    query.par_iter_mut().for_each(|mut transform| {
        // Thread-safe processing
        transform.translation.y += 0.1;
    });
}

Change Detection

fn efficient_update(
    query: Query<&MyComponent, Changed<MyComponent>>,
) {
    for component in query.iter() {
        // Only processes recently changed entities
    }
}

Archetypes

Group components that are commonly accessed together:

// Good: Components often queried together
#[derive(Bundle)]
struct EnemyBundle {
    transform: Transform,
    health: Health,
    ai: AiAgent,
    collider: Collider,
}

// Avoid: Rarely used components on common entities
struct RarelyUsedData { /* ... */ }

Memory Optimization

Component Size

// Bad: Large component
#[derive(Component)]
struct LargeComponent {
    data: [f32; 1000],  // 4KB per entity
    name: String,
}

// Good: Split into data and reference
#[derive(Component)]
struct SmallComponent {
    data_handle: Handle<LargeData>,  // 8 bytes
    flags: u8,
}

Object Pools

use astraweave_ecs::pool::*;

#[derive(Resource)]
struct BulletPool {
    pool: EntityPool<BulletBundle>,
}

impl BulletPool {
    fn spawn(&mut self, commands: &mut Commands) -> Entity {
        self.pool.get_or_spawn(commands, || BulletBundle::default())
    }

    fn despawn(&mut self, entity: Entity) {
        self.pool.return_entity(entity);
    }
}

Arena Allocation

For temporary allocations:

use bumpalo::Bump;

fn batch_process(entities: &[Entity]) {
    let arena = Bump::new();

    let temp_data: &mut [Vec3] = arena.alloc_slice_fill_default(entities.len());

    // Work with temp_data
    // Arena automatically freed at scope end
}

Rendering Optimization

Batching

// Enable instanced rendering for repeated meshes
#[derive(Component)]
struct InstancedMesh {
    mesh: Handle<Mesh>,
    material: Handle<Material>,
    instances: Vec<Transform>,
}

Level of Detail

#[derive(Component)]
struct LodGroup {
    distances: [f32; 3],
    meshes: [Handle<Mesh>; 3],
}

fn lod_system(
    camera: Query<&Transform, With<Camera>>,
    mut lod_query: Query<(&Transform, &LodGroup, &mut Handle<Mesh>)>,
) {
    let camera_pos = camera.single().translation;

    for (transform, lod, mut mesh) in lod_query.iter_mut() {
        let distance = transform.translation.distance(camera_pos);

        let lod_level = if distance < lod.distances[0] { 0 }
            else if distance < lod.distances[1] { 1 }
            else { 2 };

        *mesh = lod.meshes[lod_level].clone();
    }
}

Culling

#[derive(Component)]
struct Visibility {
    pub visible: bool,
    pub render_layers: u32,
}

fn frustum_culling_system(
    camera: Query<(&Camera, &Transform)>,
    mut renderables: Query<(&Transform, &Aabb, &mut Visibility)>,
) {
    let (camera, cam_transform) = camera.single();
    let frustum = camera.compute_frustum(cam_transform);

    for (transform, aabb, mut visibility) in renderables.iter_mut() {
        let world_aabb = aabb.transformed(transform);
        visibility.visible = frustum.intersects_aabb(&world_aabb);
    }
}

AI Optimization

Tick Budgeting

#[derive(Resource)]
pub struct AiBudget {
    pub max_ms_per_frame: f32,
    pub agents_processed: usize,
}

fn budgeted_ai_system(
    mut budget: ResMut<AiBudget>,
    mut agents: Query<&mut AiAgent>,
) {
    let start = std::time::Instant::now();
    budget.agents_processed = 0;

    for mut agent in agents.iter_mut() {
        if start.elapsed().as_secs_f32() * 1000.0 > budget.max_ms_per_frame {
            break;
        }

        agent.tick();
        budget.agents_processed += 1;
    }
}

LOD for AI

#[derive(Component)]
pub struct AiLod {
    pub distance_from_player: f32,
    pub update_frequency: u32,
    pub frames_since_update: u32,
}

fn ai_lod_system(
    player: Query<&Transform, With<Player>>,
    mut ai_query: Query<(&Transform, &mut AiLod, &mut AiAgent)>,
) {
    let player_pos = player.single().translation;

    for (transform, mut lod, mut agent) in ai_query.iter_mut() {
        lod.distance_from_player = transform.translation.distance(player_pos);

        lod.update_frequency = match lod.distance_from_player {
            d if d < 20.0 => 1,   // Every frame
            d if d < 50.0 => 2,   // Every 2 frames
            d if d < 100.0 => 5,  // Every 5 frames
            _ => 10,              // Every 10 frames
        };

        lod.frames_since_update += 1;
        if lod.frames_since_update >= lod.update_frequency {
            lod.frames_since_update = 0;
            agent.tick();
        }
    }
}

LLM Caching

#[derive(Resource)]
pub struct LlmCache {
    cache: LruCache<u64, String>,
    hit_count: u64,
    miss_count: u64,
}

impl LlmCache {
    pub fn get_or_generate<F>(
        &mut self,
        prompt: &str,
        generator: F,
    ) -> &str
    where
        F: FnOnce() -> String,
    {
        // Stable hash of the prompt text, used as the cache key
        let hash = hash_prompt(prompt);

        if !self.cache.contains(&hash) {
            self.miss_count += 1;
            let response = generator();
            self.cache.put(hash, response);
        } else {
            self.hit_count += 1;
        }

        self.cache.get(&hash).unwrap()
    }
}

Physics Optimization

Broad Phase

#[derive(Resource)]
pub struct PhysicsConfig {
    pub broad_phase: BroadPhaseType,
    pub substeps: u32,
    pub velocity_iterations: u32,
}

pub enum BroadPhaseType {
    BruteForce,     // < 100 entities
    SpatialHash,    // 100-1000 entities
    BvhTree,        // > 1000 entities
}
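The entity-count thresholds in the enum comments suggest a simple selection rule. A self-contained sketch (the `select_broad_phase` helper is hypothetical, not engine API; the enum is re-declared so the snippet compiles on its own):

```rust
/// Hypothetical helper: pick a broad phase from the entity count,
/// using the thresholds noted above.
#[derive(Debug, PartialEq)]
pub enum BroadPhaseType {
    BruteForce,  // < 100 entities
    SpatialHash, // 100-1000 entities
    BvhTree,     // > 1000 entities
}

pub fn select_broad_phase(entity_count: usize) -> BroadPhaseType {
    match entity_count {
        0..=99 => BroadPhaseType::BruteForce,
        100..=1000 => BroadPhaseType::SpatialHash,
        _ => BroadPhaseType::BvhTree,
    }
}
```

For example, `select_broad_phase(500)` yields `SpatialHash`, so a scene that grows past a thousand colliders would automatically switch to the BVH tree.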

Sleeping

#[derive(Component)]
pub struct RigidBody {
    pub sleeping: bool,
    pub sleep_threshold: f32,
    pub sleep_timer: f32,
}

fn sleep_system(time: Res<Time>, mut bodies: Query<(&Velocity, &mut RigidBody)>) {
    let delta = time.delta_seconds();

    for (velocity, mut body) in bodies.iter_mut() {
        if velocity.linear.length_squared() < body.sleep_threshold {
            body.sleep_timer += delta;
            if body.sleep_timer > 0.5 {
                body.sleeping = true;
            }
        } else {
            body.sleep_timer = 0.0;
            body.sleeping = false;
        }
    }
}

Benchmarking

Criterion Benchmarks

use criterion::{criterion_group, criterion_main, Criterion, BenchmarkId};

fn bench_ecs_iteration(c: &mut Criterion) {
    let mut group = c.benchmark_group("ecs_iteration");

    for entity_count in [1000, 10000, 100000] {
        let world = create_world_with_entities(entity_count);

        group.bench_with_input(
            BenchmarkId::new("query", entity_count),
            &world,
            |b, world| {
                b.iter(|| {
                    let mut count = 0;
                    for _ in world.query::<&Transform>().iter() {
                        count += 1;
                    }
                    count
                });
            },
        );
    }

    group.finish();
}

criterion_group!(benches, bench_ecs_iteration);
criterion_main!(benches);

Performance Regression Testing

# .github/workflows/bench.yml
name: Benchmarks
on: [push, pull_request]

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run benchmarks
        run: cargo bench --all -- --save-baseline new
      - name: Compare with main
        run: |
          git fetch origin main
          cargo bench --all -- --baseline main --save-baseline new

Common Performance Issues

Issue: Frame Spikes

Symptoms: Occasional stutters, inconsistent frame times

Causes:

  • GC in scripting
  • Asset loading on main thread
  • Large allocations

Solutions:

// Pre-warm asset loading during startup to avoid mid-game spikes
fn warmup_system(asset_server: Res<AssetServer>) {
    asset_server.load::<Mesh>("meshes/common.gltf");
    asset_server.load::<Texture>("textures/atlas.png");
}

// Use streaming for large assets (from an async task)
async fn stream_large_asset(asset_server: AssetServer) {
    let _handle = asset_server.load_async::<LargeAsset>("path").await;
}

Issue: High CPU Usage

Symptoms: High CPU, low GPU utilization

Causes:

  • Inefficient queries
  • Too many systems
  • Excessive allocations

Solutions:

// Combine related systems
fn combined_system(
    mut query: Query<(&mut Transform, &Velocity, &mut Health)>,
) {
    for (mut transform, velocity, mut health) in query.iter_mut() {
        transform.translation += velocity.0;
        health.regen();
    }
}

Issue: Memory Growth

Symptoms: Increasing memory over time

Causes:

  • Entity leaks
  • Cache growth
  • Asset retention

Solutions:

// Periodic cleanup
fn cleanup_system(
    mut commands: Commands,
    dead_entities: Query<Entity, With<Dead>>,
    time: Res<Time>,
    mut cleanup_timer: ResMut<CleanupTimer>,
) {
    cleanup_timer.tick(time.delta());

    if cleanup_timer.just_finished() {
        for entity in dead_entities.iter() {
            commands.entity(entity).despawn_recursive();
        }
    }
}

Best Practices

1. **Profile in Release**: Debug builds are 10-50x slower
2. **Measure Realistically**: Test with actual content, not empty scenes
3. **Test on Target Hardware**: Don't only test on dev machines
4. **Budget Early**: Set performance targets before development
5. **Automate Testing**: Catch regressions in CI

Pitfalls to avoid:

- **Premature Optimization**: Don't optimize without profiling data
- **Micro-benchmarks**: Real-world performance may differ
- **Ignoring Memory**: CPU speed means nothing if you're thrashing the cache
- **Single-threaded Thinking**: Utilize all cores
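Profiling release builds (point 1) works best when symbols are kept, so tools like perf and Tracy can resolve function names. The standard Cargo setting:

```toml
[profile.release]
debug = true  # keep debug symbols in release builds for profilers
```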