How It Works

Seedling follows a four-stage pipeline: Introspect → Plan → Generate → Write.

Pipeline

Stage 1: Introspect

Seedling reads information_schema from a live Postgres or MySQL database and produces a structured Schema object containing:
  • All tables and their columns with types
  • Foreign key relationships
  • UNIQUE, NOT NULL, and CHECK constraints, plus DEFAULT values
  • Column comments (used as generator hints)
Output is written as schema.yaml (or JSON), which serves as the input for generation.
seedling introspect --db postgres://localhost:5432/mydb --output schema.yaml
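
As a rough illustration of what such a Schema object might hold, the sketch below models the items listed above with Python dataclasses and serializes them to JSON. The class and field names (Column, ForeignKey, Table, Schema) are hypothetical; they are not taken from Seedling's actual schema.yaml layout.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Column:
    name: str
    type: str
    nullable: bool = True
    default: Optional[str] = None
    check: Optional[str] = None
    comment: Optional[str] = None  # may double as a generator hint


@dataclass
class ForeignKey:
    column: str
    ref_table: str
    ref_column: str


@dataclass
class Table:
    name: str
    columns: List[Column] = field(default_factory=list)
    foreign_keys: List[ForeignKey] = field(default_factory=list)
    unique: List[List[str]] = field(default_factory=list)  # one list per constraint


@dataclass
class Schema:
    tables: List[Table] = field(default_factory=list)


# Hypothetical two-table schema: orders.user_id references users.id.
schema = Schema(tables=[
    Table(
        name="users",
        columns=[
            Column("id", "bigserial", nullable=False),
            Column("email", "varchar(255)", nullable=False, comment="email"),
        ],
        unique=[["email"]],
    ),
    Table(
        name="orders",
        columns=[
            Column("id", "bigserial", nullable=False),
            Column("user_id", "bigint", nullable=False),
        ],
        foreign_keys=[ForeignKey("user_id", "users", "id")],
    ),
])

# asdict() recurses into nested dataclasses, so the result is plain JSON data.
schema_json = json.dumps(asdict(schema), indent=2)
```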

Stage 2: Plan

The PlanBuilder performs a topological sort of tables by FK dependency (Kahn’s algorithm). This ensures parent rows exist before child rows referencing them. Each column is automatically assigned a generator based on its type and name:
Column Type                     | Auto-detected Generator
serial / bigserial              | Sequence
varchar(255) with name “email”  | Email
varchar with name “phone”       | Phone
timestamptz                     | Now
FK column                       | FK lookup from parent
varchar                         | Random string
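
The two planning steps can be sketched as follows. This is a minimal illustration, not Seedling's PlanBuilder: the function names topo_sort and pick_generator are hypothetical, and the rule precedence (FK first, exact name match for "email" and "phone", self-references ignored during sorting) is an assumption made for the example.

```python
from collections import deque


def topo_sort(deps):
    """Kahn's algorithm. `deps` maps each table to the set of parent
    tables it references via FK. Every parent must also be a key."""
    # Assumption: a self-referencing FK does not block its own table.
    indegree = {t: len(parents - {t}) for t, parents in deps.items()}
    children = {t: [] for t in deps}
    for table, parents in deps.items():
        for parent in parents:
            if parent != table:
                children[parent].append(table)

    # Start with tables that have no parents; sorting keeps the order stable.
    queue = deque(sorted(t for t, d in indegree.items() if d == 0))
    order = []
    while queue:
        table = queue.popleft()
        order.append(table)
        for child in sorted(children[table]):
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    if len(order) != len(deps):
        # Leftover tables are part of an FK cycle and need separate handling.
        raise ValueError("circular FK dependency detected")
    return order


def pick_generator(col_name, col_type, is_fk=False):
    """Map a column to a generator label following the table above."""
    name, ctype = col_name.lower(), col_type.lower()
    if is_fk:
        return "FKLookup"
    if ctype in ("serial", "bigserial"):
        return "Sequence"
    if ctype == "timestamptz":
        return "Now"
    if ctype.startswith("varchar"):
        if name == "email":
            return "Email"
        if name == "phone":
            return "Phone"
        return "RandomString"
    return None  # types not covered by the table above


deps = {
    "users": set(),
    "products": set(),
    "orders": {"users"},
    "order_items": {"orders", "products"},
}
plan_order = topo_sort(deps)
```

With this input, both parent tables come first, and order_items comes last because it depends on orders and products.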

Stage 3: Generate

The StreamGenerator iterates tables in dependency order:
  1. For each table, generate N rows
  2. Each row’s columns are produced by their assigned generators
  3. FK columns look up values from already-generated parent rows via FKPool
  4. Unique constraints are tracked and enforced by UniqueTracker
  5. Circular FK dependencies are split into multi-pass groups
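
Steps 1 to 4 can be sketched as below. The FKPool and UniqueTracker classes here only borrow the names used above; their methods (add, pick, claim), the retry limit, and the two example tables are assumptions for illustration. Multi-pass handling of circular dependencies is not covered.

```python
import random


class FKPool:
    """Remembers generated parent keys so child rows can reference them."""

    def __init__(self):
        self._keys = {}

    def add(self, table, key):
        self._keys.setdefault(table, []).append(key)

    def pick(self, table, rng):
        return rng.choice(self._keys[table])


class UniqueTracker:
    """Tracks values already used under each unique constraint."""

    def __init__(self):
        self._seen = {}

    def claim(self, constraint, value):
        seen = self._seen.setdefault(constraint, set())
        if value in seen:
            return False
        seen.add(value)
        return True


def generate_users(n, rng, pool, unique, max_retries=100):
    rows = []
    for row_id in range(1, n + 1):
        for _ in range(max_retries):
            email = f"user{rng.randrange(10_000)}@example.com"
            if unique.claim(("users", "email"), email):
                break
        else:
            raise RuntimeError("could not find a unique email")
        rows.append({"id": row_id, "email": email})
        pool.add("users", row_id)  # make the key available to child tables
    return rows


def generate_orders(n, rng, pool):
    # The parent table was generated first, so the pool is already filled.
    return [{"id": i, "user_id": pool.pick("users", rng)} for i in range(1, n + 1)]


rng = random.Random(7)
pool, unique = FKPool(), UniqueTracker()
users = generate_users(50, rng, pool, unique)
orders = generate_orders(200, rng, pool)
```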

Stage 4: Write

Generated rows are streamed to a writer. Supported formats:
Writer          | Description
SqlWriter       | Batched INSERT INTO ... statements
CsvWriter       | One file per table
JsonLinesWriter | One JSON object per row
ParquetWriter   | Tabular output
DbWriter        | Direct batched INSERT into database
CopyWriter      | Postgres COPY protocol (max throughput)
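
To make the idea of batched INSERT output concrete, here is a small sketch of how rows from a stream might be grouped into multi-row statements. It is not SqlWriter's implementation: the function names, the default batch size, and the literal-escaping rules are assumptions, and identifier quoting is left out.

```python
def sql_literal(value):
    """Render a Python value as a SQL literal (simplified)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def batched_inserts(table, columns, rows, batch_size=1000):
    """Yield one INSERT statement per batch of up to `batch_size` rows."""
    header = f"INSERT INTO {table} ({', '.join(columns)}) VALUES\n  "
    batch = []
    for row in rows:  # `rows` can be any iterable, including a generator
        values = ", ".join(sql_literal(row[c]) for c in columns)
        batch.append(f"({values})")
        if len(batch) == batch_size:
            yield header + ",\n  ".join(batch) + ";"
            batch = []
    if batch:  # flush the final partial batch
        yield header + ",\n  ".join(batch) + ";"


rows = ({"id": i, "name": f"O'Brien {i}"} for i in range(1, 6))
statements = list(batched_inserts("users", ["id", "name"], rows, batch_size=2))
```

Because the function is a generator, only one batch is held in memory at a time, which matches the streaming behavior described above.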

Determinism

When a seed is provided (--seed <int>), all generators use a ChaCha8-based deterministic PRNG. Every column gets a derived sub-seed, ensuring:
  • Same schema + seed + count = identical output
  • Parallel generation is deterministic within a run
  • No crypto/rand usage in deterministic mode
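
The sub-seed idea can be illustrated as below. Note the substitutions: Python's standard library has no ChaCha8 generator, so this sketch derives sub-seeds with SHA-256 and feeds them to random.Random (Mersenne Twister) as a stand-in. The derivation scheme and the function names are assumptions, not Seedling's actual method.

```python
import hashlib
import random


def sub_seed(seed, table, column):
    """Derive a per-column seed from the global seed and the column's identity."""
    digest = hashlib.sha256(f"{seed}:{table}.{column}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def column_rng(seed, table, column):
    # Each column gets its own PRNG stream, independent of all other columns.
    return random.Random(sub_seed(seed, table, column))


def sample(seed, table, column, n=3):
    rng = column_rng(seed, table, column)
    return [rng.randrange(1_000_000) for _ in range(n)]


first = sample(42, "users", "email")
second = sample(42, "users", "email")
other_seed = sample(43, "users", "email")
```

Because each stream depends only on the seed and the column's identity, the values a column receives do not depend on the order in which tables or columns are processed, which is one way parallel generation can stay reproducible.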

Architecture Diagram