Performance
GaleforceCSS is faster than the Tailwind v3 CLI by roughly 7x to 38x in the measurements below, which cover one real project and one synthetic workload.
Real-world baseline
horizon-tailwind-react (2,551 candidates → 706 rules):
| | Tailwind 3 CLI | GaleforceCSS CLI | Speedup |
|---|---|---|---|
| Cold build | 246 ms | 10 ms | 23x |
| Warm import | 15 ms | 2 ms | 7.5x |
Synthetic smoke (734 candidates):
| | Tailwind 3 CLI | GaleforceCSS CLI | Speedup |
|---|---|---|---|
| Cold build | 195 ms | 5 ms | 38x |
Pure compute
The compile loop on its own (no process spawn, no I/O) runs in 1.24 ms on the real corpus.
| Candidate | Cost |
|---|---|
| `flex` (static) | 0.34 µs |
| Unknown utility | 0.21 µs |
| `bg-red-500` (value lookup) | 0.41 µs |
| `dark:hover:bg-blue-500/50` | 0.41 µs |
Where the gains come from
- `default_theme()` cached in `OnceLock`: was rebuilding ~12k JSON-tree allocations per build. 245 ms → 16.5 ms (15x).
- Prefix-indexed `find_value_utilities`: replaces a linear walk over ~150 entries with `HashMap<&str, Vec<&'static ValueUtility>>`. 16.5 ms → 2.4 ms (6.8x).
- Rayon parallelization: `par_iter` over candidates when `n ≥ 256`. 2.4 ms → 1.2 ms (1.9x).
- Scanner fast path for explicit file paths: bypass `WalkBuilder` instances when the Vite plugin pre-expands globs. ~16k files: 2.0 s of overhead removed.
- Parallel content scan in the CLI: `par_iter` across 8 cores. 3.9 s → 460 ms on a 16k-file repo.
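The first two items in the list (a one-time cached theme and a prefix-keyed utility index) can be illustrated with the standard library alone. This is a minimal sketch, not the project's actual code: the `Theme` and `ValueUtility` field layouts, the three sample utilities, and the `utility_index` helper are assumptions made up for the example. Only the names `default_theme`, `find_value_utilities`, `OnceLock`, and the `HashMap<&str, Vec<&'static ValueUtility>>` shape come from the text.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

// Hypothetical stand-in for the real theme type.
struct Theme {
    colors: HashMap<&'static str, &'static str>,
}

// Hypothetical stand-in for the real `ValueUtility` entries.
struct ValueUtility {
    prefix: &'static str,
    property: &'static str,
}

// Illustrative sample data; the real table is said to hold ~150 entries.
static UTILITIES: [ValueUtility; 3] = [
    ValueUtility { prefix: "bg", property: "background-color" },
    ValueUtility { prefix: "text", property: "color" },
    ValueUtility { prefix: "text", property: "font-size" },
];

/// Builds the theme on the first call only; later calls return the cached value.
fn default_theme() -> &'static Theme {
    static THEME: OnceLock<Theme> = OnceLock::new();
    THEME.get_or_init(|| Theme {
        colors: HashMap::from([("red-500", "#ef4444"), ("blue-500", "#3b82f6")]),
    })
}

/// Builds the prefix index once, grouping utilities that share a prefix.
fn utility_index() -> &'static HashMap<&'static str, Vec<&'static ValueUtility>> {
    static INDEX: OnceLock<HashMap<&'static str, Vec<&'static ValueUtility>>> = OnceLock::new();
    INDEX.get_or_init(|| {
        let mut index: HashMap<&'static str, Vec<&'static ValueUtility>> = HashMap::new();
        for utility in UTILITIES.iter() {
            index.entry(utility.prefix).or_default().push(utility);
        }
        index
    })
}

/// Returns every utility registered under `prefix` with one hash lookup
/// instead of a walk over all entries. Unknown prefixes yield an empty slice.
fn find_value_utilities(prefix: &str) -> &'static [&'static ValueUtility] {
    utility_index().get(prefix).map(Vec::as_slice).unwrap_or(&[])
}

fn main() {
    // Both calls hand back the same cached allocation.
    assert!(std::ptr::eq(default_theme(), default_theme()));
    println!("red-500 = {:?}", default_theme().colors.get("red-500"));
    for utility in find_value_utilities("text") {
        println!("text -> {}", utility.property);
    }
}
```

Keeping a `Vec` per prefix allows several utilities to share one prefix (here `text` maps to both a color and a font size), which is presumably why the index described above stores a vector rather than a single entry.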
The hot loop is now ~0.4 µs/candidate for color resolution. Further gains need lower-level work (skipping String allocations in the rule emitter, tighter rayon chunking).
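The size-gated parallelism from the list above (serial for small inputs, parallel once the candidate count reaches 256) can be sketched as follows. The project uses Rayon's `par_iter`; this sketch swaps in `std::thread::scope` with fixed chunks so it runs without external crates. `compile_candidate`, `compile_all`, and `PARALLEL_THRESHOLD` are hypothetical names, and the emitted rule text is a placeholder.

```rust
use std::thread;

/// Hypothetical constant mirroring the `n ≥ 256` cutoff quoted in the text.
const PARALLEL_THRESHOLD: usize = 256;

/// Hypothetical stand-in for compiling one candidate into a CSS rule.
fn compile_candidate(candidate: &str) -> String {
    format!(".{candidate} {{ /* declarations */ }}")
}

/// Compiles serially below the threshold and across threads at or above it.
/// Output order always matches input order.
fn compile_all(candidates: &[&str], workers: usize) -> Vec<String> {
    if candidates.len() < PARALLEL_THRESHOLD || workers <= 1 {
        return candidates.iter().map(|c| compile_candidate(c)).collect();
    }
    // Ceiling division so every candidate lands in exactly one chunk.
    let chunk_size = (candidates.len() + workers - 1) / workers;
    thread::scope(|scope| {
        // Spawn all workers first; collecting eagerly keeps them running concurrently.
        let handles: Vec<_> = candidates
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk.iter().map(|c| compile_candidate(c)).collect::<Vec<String>>()
                })
            })
            .collect();
        // Join in spawn order so results stay in input order.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker panicked"))
            .collect()
    })
}

fn main() {
    let names: Vec<String> = (0..1000).map(|i| format!("p-{i}")).collect();
    let candidates: Vec<&str> = names.iter().map(String::as_str).collect();
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let rules = compile_all(&candidates, workers);
    println!("compiled {} rules on up to {workers} threads", rules.len());
}
```

The threshold exists because handing work to other threads has a fixed cost. At roughly 0.4 µs per candidate, a few hundred candidates finish serially in about the time it takes to dispatch them.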
HMR latency
For small edits, the compile loop is rarely the bottleneck:
| Phase | Typical |
|---|---|
| File watch debounce | 200 ms (fixed) |
| Scanner re-walk | 2–10 ms |
| Compile | 1–5 ms |
| IPC round-trip (CLI bridge) | 1–3 ms |
| Vite HMR send | 5–20 ms (browser-side) |
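To see why the compile loop is rarely the bottleneck, it helps to add up the table. The sketch below sums the best-case and worst-case ends of each phase, assuming the phases run back to back with no overlap (an assumption; the table does not say so). The `Phase` struct and `total_range_ms` function are hypothetical names for illustration.

```rust
/// One phase of the HMR pipeline with its typical range in milliseconds.
struct Phase {
    name: &'static str,
    min_ms: u32,
    max_ms: u32,
}

// Figures copied from the table above.
const PHASES: [Phase; 5] = [
    Phase { name: "File watch debounce", min_ms: 200, max_ms: 200 },
    Phase { name: "Scanner re-walk", min_ms: 2, max_ms: 10 },
    Phase { name: "Compile", min_ms: 1, max_ms: 5 },
    Phase { name: "IPC round-trip", min_ms: 1, max_ms: 3 },
    Phase { name: "Vite HMR send", min_ms: 5, max_ms: 20 },
];

/// Sums the low and high end of every phase, assuming they run sequentially.
fn total_range_ms(phases: &[Phase]) -> (u32, u32) {
    phases
        .iter()
        .fold((0, 0), |(lo, hi), p| (lo + p.min_ms, hi + p.max_ms))
}

fn main() {
    let (lo, hi) = total_range_ms(&PHASES);
    println!("end-to-end: {lo}-{hi} ms");
    for p in &PHASES {
        // Share of the worst-case total, where the fixed debounce weighs least.
        let share = 100.0 * f64::from(p.max_ms) / f64::from(hi);
        println!("{:<22} {:>5.1}% of worst case", p.name, share);
    }
}
```

Under that assumption the total comes to 209–238 ms, of which the fixed 200 ms debounce is roughly 84–96%. Compile accounts for at most 5 ms.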
Levers if you need more:
- Lower watch debounce. Default 200 ms in `galeforcecss watch`.
- Narrow `content` globs. Fewer files = faster scan.
- Use `createCompileStream()` in long-running tools. Amortizes process startup.
Benchmarking
```bash
pnpm --filter @galeforcecss/conformance bench   # synthetic
pnpm --filter @galeforcecss/conformance bench -- --project /path/to/your/app   # real
cargo bench -p galeforce-compiler   # pure compute
```
Memory
The CLI binary uses mimalloc, which is ~25% faster than the system allocator on warm-stream compile. Peak resident memory on the real corpus is well under 50 MB.