Performance
Seedling is designed for high-throughput data generation. This page covers performance characteristics, benchmarks, and optimization strategies.
Benchmarks
| Configuration | Rows | Time | Throughput |
|---|---|---|---|
| SQL writer, simple schema | 10,000 | 0.3s | 33K rows/s |
| SQL writer, simple schema | 500,000 | 12s | 42K rows/s |
| SQL writer, complex schema (15 tables) | 500,000 | 18s | 28K rows/s |
| COPY (Postgres) | 1,000,000 | 8s | 125K rows/s |
| Direct DB insert | 500,000 | 22s | 23K rows/s |
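The Throughput column is rows divided by elapsed seconds, rounded to the nearest thousand. As a quick check of that arithmetic, the sketch below recomputes it from the Rows and Time columns; the helper name `throughput_k` is made up for illustration and is not part of Seedling.

```python
# Rows and wall-clock seconds copied from the benchmark table above.
benchmarks = [
    ("SQL writer, simple schema", 10_000, 0.3),
    ("SQL writer, simple schema", 500_000, 12),
    ("SQL writer, complex schema (15 tables)", 500_000, 18),
    ("COPY (Postgres)", 1_000_000, 8),
    ("Direct DB insert", 500_000, 22),
]


def throughput_k(rows: int, seconds: float) -> int:
    """Return throughput in thousands of rows per second, rounded."""
    return round(rows / seconds / 1000)


for name, rows, seconds in benchmarks:
    print(f"{name}: {throughput_k(rows, seconds)}K rows/s")
```

The same division is a convenient way to compare your own runs against the table: time a run, divide the row count by the elapsed seconds, and see which configuration it lands closest to.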
Parallel Generation
When --parallel is enabled, Seedling identifies independent table subgraphs and generates them concurrently:
- Tables with no FK dependencies on each other run in parallel workers
- Performance scales with available CPU cores
- Caution: Parallel generation breaks determinism (row ordering varies between runs)
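One way to picture "independent table subgraphs" is as connected components of the foreign-key graph. The sketch below is only an illustration under that assumption: the schema, `independent_subgraphs`, and `generate_group` are hypothetical, and Seedling's actual scheduling may differ. It uses threads for brevity, whereas scaling with CPU cores in Python would typically need processes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical schema: table -> tables it references via foreign keys.
fk_deps = {
    "users": [],
    "orders": ["users"],
    "order_items": ["orders"],
    "audit_log": [],
    "tags": [],
}


def independent_subgraphs(deps):
    """Group tables into connected components of the undirected FK graph."""
    neighbors = defaultdict(set)
    for table, parents in deps.items():
        neighbors[table]  # touch the key so isolated tables are included
        for parent in parents:
            neighbors[table].add(parent)
            neighbors[parent].add(table)

    seen, groups = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:  # depth-first walk of one component
            node = stack.pop()
            group.append(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(group))
    return groups


def generate_group(group):
    # Placeholder for real generation; a real worker would emit parent
    # tables before the tables that reference them.
    return f"generated {', '.join(group)}"


groups = independent_subgraphs(fk_deps)
with ThreadPoolExecutor() as pool:
    results = list(pool.map(generate_group, groups))

print(groups)
```

In this sketch the three groups never reference each other, so they can be generated in any order. That freedom is also why output ordering can vary between runs: workers finish in whatever order the scheduler allows.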
Batching
The --batch-size flag controls how many rows are generated per batch:
- Smaller batches use less memory but incur more per-batch overhead
- Larger batches are faster but use more RAM per table
- Default: 1000
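The memory trade-off is easier to see in code. The following is a minimal sketch of batched generation, not Seedling's implementation: `generate_rows` and `batched` are hypothetical helpers, and the default of 1000 mirrors the documented default batch size.

```python
from itertools import islice


def generate_rows(count):
    """Hypothetical row source: lazily yields one dict per row."""
    for i in range(count):
        yield {"id": i, "name": f"user_{i}"}


def batched(rows, batch_size=1000):
    """Yield lists of at most batch_size rows, one batch at a time."""
    iterator = iter(rows)
    while batch := list(islice(iterator, batch_size)):
        yield batch


batch_sizes = [len(b) for b in batched(generate_rows(2500), batch_size=1000)]
print(batch_sizes)  # [1000, 1000, 500]
```

Only one batch is materialized at a time, so peak memory grows with the batch size, while the number of batches (and any fixed cost paid per batch) shrinks as the batch size grows.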
Optimization Tips
- Use COPY for Postgres: --copy mode is 3-5x faster than batched INSERTs
- Increase batch size for large tables: --batch-size 5000 reduces per-batch overhead
- Use deterministic mode with smaller samples for development: --seed 42 --count 1000
- Avoid parallelism if you need deterministic output
- Truncate before insert with --truncate to avoid unique constraint violations from existing data
- Dry-run first with --dry-run to verify the plan before spending time generating
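To illustrate why a fixed seed gives repeatable samples, here is a small sketch of seeded generation. It assumes a single generator that draws values in a fixed order; `sample_rows` is a hypothetical helper, and how Seedling uses its seed internally is not described on this page.

```python
import random


def sample_rows(seed, count):
    """Generate `count` hypothetical rows from a private, seeded RNG."""
    rng = random.Random(seed)
    return [{"id": i, "score": rng.randint(0, 999)} for i in range(count)]


first = sample_rows(seed=42, count=1000)
second = sample_rows(seed=42, count=1000)

# Same seed and same draw order give identical rows.
print(first == second)
```

The guarantee depends on values being drawn in the same order every time, which is the property that concurrent workers give up.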