How It Works
Seedling follows a four-stage pipeline: Introspect → Plan → Generate → Write.
Pipeline
Stage 1: Introspect
Reads information_schema from a live Postgres or MySQL database and produces a structured Schema object containing:
- All tables and their columns with types
- Foreign key relationships
- Unique constraints, NOT NULL, CHECK, DEFAULT values
- Column comments (used as generator hints)
The result is written to schema.yaml (or JSON), which serves as the input for generation.
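The structure listed above can be pictured as a small set of Go types. This is a minimal sketch only: the type names, field names, and the `Parents` helper are assumptions made for illustration, not Seedling's actual definitions.

```go
package main

import "fmt"

// Column describes one table column (hypothetical shape).
type Column struct {
	Name    string
	Type    string
	NotNull bool
	Unique  bool
	Default string // empty when the column has no DEFAULT
	Check   string // empty when the column has no CHECK constraint
	Comment string // column comment, usable as a generator hint
}

// ForeignKey links a column to a column in a parent table.
type ForeignKey struct {
	Column    string
	RefTable  string
	RefColumn string
}

// Table groups columns and foreign keys.
type Table struct {
	Name        string
	Columns     []Column
	ForeignKeys []ForeignKey
}

// Schema is the root object produced by introspection.
type Schema struct {
	Tables []Table
}

// Parents lists the distinct tables that t references through foreign keys.
func (t Table) Parents() []string {
	seen := make(map[string]bool)
	var parents []string
	for _, fk := range t.ForeignKeys {
		if !seen[fk.RefTable] {
			seen[fk.RefTable] = true
			parents = append(parents, fk.RefTable)
		}
	}
	return parents
}

// exampleSchema builds a two-table schema by hand for demonstration.
func exampleSchema() Schema {
	return Schema{Tables: []Table{
		{
			Name: "users",
			Columns: []Column{
				{Name: "id", Type: "bigserial", NotNull: true, Unique: true},
				{Name: "email", Type: "varchar(255)", NotNull: true, Unique: true},
			},
		},
		{
			Name: "orders",
			Columns: []Column{
				{Name: "id", Type: "bigserial", NotNull: true, Unique: true},
				{Name: "user_id", Type: "bigint", NotNull: true},
			},
			ForeignKeys: []ForeignKey{
				{Column: "user_id", RefTable: "users", RefColumn: "id"},
			},
		},
	}}
}

func main() {
	for _, t := range exampleSchema().Tables {
		fmt.Printf("%s: %d columns, parents=%v\n", t.Name, len(t.Columns), t.Parents())
	}
}
```

Keeping foreign keys on the child table makes it straightforward to derive the dependency edges that the planning stage needs.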
Stage 2: Plan
The PlanBuilder performs a topological sort of tables by FK dependency (Kahn’s algorithm). This ensures parent rows exist before child rows referencing them.
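As an illustration of the ordering step, here is a minimal sketch of Kahn's algorithm over a table dependency map. The function name `topoSort` and the map-based input are hypothetical; the sketch also assumes that self-referencing foreign keys are ignored for ordering purposes and that ties are broken alphabetically to keep the output stable.

```go
package main

import (
	"fmt"
	"sort"
)

// topoSort orders tables so that every parent precedes its children.
// deps maps a table name to the tables it references via foreign keys.
// It returns an error when a cycle prevents a complete ordering.
func topoSort(deps map[string][]string) ([]string, error) {
	inDegree := make(map[string]int)
	children := make(map[string][]string)
	for table, parents := range deps {
		if _, ok := inDegree[table]; !ok {
			inDegree[table] = 0
		}
		for _, p := range parents {
			if p == table {
				continue // assumption: self-references do not affect table order
			}
			if _, ok := inDegree[p]; !ok {
				inDegree[p] = 0
			}
			inDegree[table]++
			children[p] = append(children[p], table)
		}
	}

	// Start with tables that depend on nothing; sort for a stable result.
	var ready []string
	for t, d := range inDegree {
		if d == 0 {
			ready = append(ready, t)
		}
	}
	sort.Strings(ready)

	order := make([]string, 0, len(inDegree))
	for len(ready) > 0 {
		t := ready[0]
		ready = ready[1:]
		order = append(order, t)
		kids := children[t]
		sort.Strings(kids)
		for _, c := range kids {
			inDegree[c]--
			if inDegree[c] == 0 {
				ready = append(ready, c)
			}
		}
	}
	if len(order) != len(inDegree) {
		return nil, fmt.Errorf("cycle detected among %d tables", len(inDegree)-len(order))
	}
	return order, nil
}

func main() {
	order, err := topoSort(map[string][]string{
		"users":       nil,
		"products":    nil,
		"orders":      {"users"},
		"order_items": {"orders", "products"},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(order)
}
```

When the function reports a cycle, the remaining tables are exactly the ones that would need special handling, such as the multi-pass groups used during generation.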
Each column is automatically assigned a generator based on its type and name:
| Column Type | Auto-detected Generator |
|---|---|
| serial / bigserial | Sequence |
| varchar(255) with name “email” | Email |
| varchar with name “phone” | Phone |
| timestamptz | Now |
| FK column | FK lookup from parent |
| varchar | Random string |
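The mapping in the table can be read as an ordered list of rules, where more specific matches win over the generic varchar fallback. The sketch below is one possible way to express that; `ColumnInfo`, `pickGenerator`, the rule order, and the substring match on column names are all assumptions, since the source does not specify how names are matched.

```go
package main

import (
	"fmt"
	"strings"
)

// ColumnInfo carries the facts the rules need (hypothetical shape).
type ColumnInfo struct {
	Name string
	Type string
	IsFK bool
}

// pickGenerator returns the name of the generator for a column.
// Rules are checked from most to least specific.
func pickGenerator(c ColumnInfo) string {
	typ := strings.ToLower(c.Type)
	name := strings.ToLower(c.Name)
	isVarchar := strings.HasPrefix(typ, "varchar")
	switch {
	case c.IsFK:
		return "FK lookup from parent"
	case typ == "serial" || typ == "bigserial":
		return "Sequence"
	case typ == "timestamptz":
		return "Now"
	case isVarchar && strings.Contains(name, "email"):
		return "Email"
	case isVarchar && strings.Contains(name, "phone"):
		return "Phone"
	case isVarchar:
		return "Random string"
	default:
		return "Unassigned" // types outside the table are not covered here
	}
}

func main() {
	cols := []ColumnInfo{
		{Name: "id", Type: "bigserial"},
		{Name: "email", Type: "varchar(255)"},
		{Name: "user_id", Type: "bigint", IsFK: true},
		{Name: "nickname", Type: "varchar"},
	}
	for _, c := range cols {
		fmt.Printf("%-10s %-14s -> %s\n", c.Name, c.Type, pickGenerator(c))
	}
}
```

Checking the FK rule first means a foreign key column never falls through to a type-based generator, which would produce values with no matching parent row.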
Stage 3: Generate
The StreamGenerator iterates tables in dependency order:
- For each table, generate N rows
- Each row’s columns are produced by their assigned generators
- FK columns look up values from already-generated parent rows via FKPool
- Unique constraints are tracked and enforced by UniqueTracker
- Circular FK dependencies are split into multi-pass groups
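The two helpers named in the list could look roughly like the sketch below. Only the names FKPool and UniqueTracker come from the text above; the method names, the integer key type, and the use of a PCG source for picking are assumptions chosen to keep the example short. Multi-pass handling of circular dependencies is not shown.

```go
package main

import (
	"fmt"
	"math/rand/v2"
)

// FKPool remembers generated parent keys so child rows can reference them.
type FKPool struct {
	keys map[string][]int64 // "table.column" -> values generated so far
	rng  *rand.Rand
}

// NewFKPool creates an empty pool with a seeded random source.
func NewFKPool(seed uint64) *FKPool {
	return &FKPool{
		keys: make(map[string][]int64),
		rng:  rand.New(rand.NewPCG(seed, 0)),
	}
}

// Add records a value generated for a parent column.
func (p *FKPool) Add(ref string, v int64) {
	p.keys[ref] = append(p.keys[ref], v)
}

// Pick returns one previously recorded value, or false if none exist yet.
func (p *FKPool) Pick(ref string) (int64, bool) {
	vals := p.keys[ref]
	if len(vals) == 0 {
		return 0, false
	}
	return vals[p.rng.IntN(len(vals))], true
}

// UniqueTracker remembers values already used in unique columns.
type UniqueTracker struct {
	seen map[string]map[string]struct{}
}

// NewUniqueTracker creates an empty tracker.
func NewUniqueTracker() *UniqueTracker {
	return &UniqueTracker{seen: make(map[string]map[string]struct{})}
}

// Claim reports whether value is new for the column and records it if so.
// A caller would regenerate the value when Claim returns false.
func (u *UniqueTracker) Claim(column, value string) bool {
	set, ok := u.seen[column]
	if !ok {
		set = make(map[string]struct{})
		u.seen[column] = set
	}
	if _, dup := set[value]; dup {
		return false
	}
	set[value] = struct{}{}
	return true
}

func main() {
	pool := NewFKPool(42)
	for id := int64(1); id <= 3; id++ {
		pool.Add("users.id", id)
	}
	userID, ok := pool.Pick("users.id")
	fmt.Println("picked parent key:", userID, ok)

	unique := NewUniqueTracker()
	fmt.Println(unique.Claim("users.email", "a@example.com")) // first use
	fmt.Println(unique.Claim("users.email", "a@example.com")) // duplicate
}
```

Because tables are processed in dependency order, a pool lookup for a parent column normally finds values already in place; the `false` return covers the case where it does not.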
Stage 4: Write
Generated rows are streamed to a writer. Supported formats:
| Writer | Description |
|---|---|
| SqlWriter | Batched INSERT INTO ... statements |
| CsvWriter | One file per table |
| JsonLinesWriter | One JSON object per row |
| ParquetWriter | Tabular output |
| DbWriter | Direct batched INSERT into database |
| CopyWriter | Postgres COPY protocol (max throughput) |
Determinism
When a seed is provided (--seed <int>), all generators use a ChaCha8-based deterministic PRNG. Every column gets a derived sub-seed, ensuring:
- Same schema + seed + count = identical output
- Parallel generation is deterministic within a run
- No crypto/rand usage in deterministic mode
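One way to obtain per-column sub-seeds is to hash the root seed together with the table and column names and feed the result to a ChaCha8 source. The sketch below uses Go's `math/rand/v2` (Go 1.22 or later) and SHA-256 for the derivation. The derivation scheme and the names `subSeed`, `columnRNG`, and `sample` are assumptions; the source does not describe how Seedling derives its sub-seeds.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"math/rand/v2"
)

// subSeed derives a 32-byte seed for one column from the root seed.
// Hashing root + table + column is an assumption made for this sketch.
func subSeed(root uint64, table, column string) [32]byte {
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], root)
	h := sha256.New()
	h.Write(buf[:])
	h.Write([]byte(table))
	h.Write([]byte{0}) // separator so ("ab", "c") and ("a", "bc") differ
	h.Write([]byte(column))
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

// columnRNG returns a deterministic ChaCha8-backed generator for a column.
func columnRNG(root uint64, table, column string) *rand.Rand {
	return rand.New(rand.NewChaCha8(subSeed(root, table, column)))
}

// sample draws n values from the column's generator.
func sample(root uint64, table, column string, n int) []uint64 {
	rng := columnRNG(root, table, column)
	out := make([]uint64, n)
	for i := range out {
		out[i] = rng.Uint64()
	}
	return out
}

func main() {
	fmt.Println(sample(42, "users", "email", 3))
	fmt.Println(sample(42, "users", "email", 3)) // same inputs, same values
	fmt.Println(sample(42, "users", "phone", 3)) // different column, different stream
}
```

Because each column owns an independent stream, the values for one column do not depend on the order in which other columns or tables are generated, which is what allows parallel workers to produce the same output.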