
Data Generation Architecture

Scope: This section covers the core data generation system: the PDL schema language, generator types, N-gram models, and the portable .phony package format. For cloud platform features (sync, mock API, registry), see Cloud Platform Architecture.

Overview

Phony's data generation architecture is built on the principle of "Data Generation as Code": synthetic data generation is treated with the same rigor as infrastructure management.

┌─────────────────────────────────────────────────────────────────────────┐
│                    DATA GENERATION ARCHITECTURE                          │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│                         PDL Schema                                       │
│                     (schema.pdl.json)                                    │
│                            │                                             │
│    ┌───────────────────────┼───────────────────────┐                    │
│    ▼           ▼           ▼           ▼           ▼                    │
│ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐                      │
│ │ Logic │ │ List  │ │ Model │ │Statis-│ │Linked │                      │
│ │       │ │       │ │(N-gram│ │tical  │ │       │                      │
│ │UUIDs, │ │Codes, │ │Names, │ │Distri-│ │City+  │                      │
│ │Numbers│ │Enums  │ │Text   │ │butions│ │Country│                      │
│ └───┬───┘ └───┬───┘ └───┬───┘ └───┬───┘ └───┬───┘                      │
│     └─────────┴─────────┼─────────┴─────────┘                          │
│                         ▼                                               │
│    ┌────────────────────────────────────────┐                          │
│    │          Template Generator            │                          │
│    │    (Compose, Format, Operations)       │                          │
│    └────────────────────┬───────────────────┘                          │
│                         │                                               │
│    ┌────────────────────┼────────────────────┐                         │
│    ▼                    ▼                    ▼                          │
│ ┌─────────┐      ┌─────────────┐      ┌─────────┐                      │
│ │ Event   │      │ Cross-Table │      │ Privacy │                      │
│ │Sequence │      │ Operations  │      │ Features│                      │
│ │         │      │             │      │         │                      │
│ │Chrono-  │      │Sum, Count,  │      │Diff Priv│                      │
│ │logical  │      │Aggregates   │      │Geo-Anon │                      │
│ └────┬────┘      └──────┬──────┘      └────┬────┘                      │
│      └──────────────────┼──────────────────┘                           │
│                         ▼                                               │
│                 ┌──────────────┐                                        │
│                 │    .phony    │                                        │
│                 │   Package    │                                        │
│                 │  (Portable)  │                                        │
│                 └──────────────┘                                        │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
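
The flow in the diagram can be sketched in a few lines of Python. This is an illustrative sketch only, not Phony's implementation or API: the function names (`logic_uuid`, `list_choice`, `linked_pair`, `generate_row`) and the field names are hypothetical, and the schema is hard-coded rather than read from a PDL file. It shows base generators producing values from one seeded random source, followed by a template step that composes them.

```python
import random
import uuid


def logic_uuid(rng):
    # Logic generator (hypothetical): build a UUID from RNG bits so the seed controls it.
    return str(uuid.UUID(int=rng.getrandbits(128), version=4))


def list_choice(values):
    # List generator (hypothetical): pick one entry from a fixed list of codes or enums.
    def gen(rng):
        return rng.choice(values)
    return gen


def linked_pair(pairs):
    # Linked generator (hypothetical): pick values that are only valid together.
    def gen(rng):
        return rng.choice(pairs)
    return gen


CITY_COUNTRY = [("Paris", "France"), ("Kyoto", "Japan"), ("Lima", "Peru")]
STATUSES = ["active", "suspended", "closed"]


def generate_row(seed):
    # One seeded RNG feeds every generator, so the whole row is reproducible.
    rng = random.Random(seed)
    city, country = linked_pair(CITY_COUNTRY)(rng)
    row = {
        "id": logic_uuid(rng),
        "status": list_choice(STATUSES)(rng),
        "city": city,
        "country": country,
    }
    # Template step: compose already-generated fields into a formatted value.
    row["label"] = "{status}: {city}, {country}".format(**row)
    return row


if __name__ == "__main__":
    print(generate_row(42))
```

Calling `generate_row` twice with the same seed returns identical rows, and the city and country always come from the same pair, which is the property the Linked box in the diagram describes.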

Core Components

| Component | Description | Documentation |
| --- | --- | --- |
| Generator Types | Seven unified generator abstractions (Logic, List, Model, Template, Statistical, Linked, Event Sequence) | Generator Types |
| Advanced Concepts | Consistency, linking, statistical generation, differential privacy, format-preserving transformation | Advanced Concepts |
| N-gram Models | Statistical text generation using Markov chains | N-gram Models |
| PDL Specification | Declarative JSON schema for data generation | PDL Specification |
| Expression Language | Template syntax (PEL) for composition | Expression Language |
| Package Format | Portable, self-contained .phony packages | Package Format |
| Locale System | Multi-layer inheritance for i18n | Locale System |
| Execution Model | OSS vs Cloud runtime differences | Execution Model |
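
To make the N-gram Models entry concrete, here is a minimal character-level Markov chain in Python. It is a sketch under stated assumptions, not Phony's model format or training procedure: the `train` and `generate` functions, the `^` and `$` boundary markers, and the tiny training list are all invented for illustration. The model records which character follows each two-character context and then samples new strings from those observations.

```python
import random
from collections import defaultdict


def train(words, n=2):
    # Map each n-character context to the list of characters observed after it.
    model = defaultdict(list)
    for word in words:
        # "^" pads the start and "$" marks the end (both are assumed markers).
        padded = "^" * n + word.lower() + "$"
        for i in range(len(padded) - n):
            model[padded[i:i + n]].append(padded[i + n])
    return model


def generate(model, rng, n=2, max_len=12):
    # Walk the chain from the start context until the end marker or max_len.
    state = "^" * n
    out = []
    while len(out) < max_len:
        nxt = rng.choice(model[state])
        if nxt == "$":
            break
        out.append(nxt)
        state = state[1:] + nxt
    return "".join(out)


if __name__ == "__main__":
    names = ["anna", "anne", "hanna", "hannah", "johanna"]
    model = train(names)
    rng = random.Random(7)
    print([generate(model, rng) for _ in range(5)])
```

Because sampling draws from a seeded `random.Random`, the same seed and the same training data yield the same generated strings within this Python sketch.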

Key Principles

  1. Declarative - Define WHAT data you need, not HOW to generate it
  2. Portable - The same .phony package runs on the CLI, PHP, Python, and Cloud
  3. Composable - Mix N-gram models + templates + lists + logic seamlessly
  4. Deterministic - The same seed produces the same output across all runtimes
  5. Consistent - The same input value produces the same output across tables and databases
  6. Statistically Accurate - Generated data matches real-world distributions
  7. Privacy-Preserving - Differential privacy and k-anonymity support
  8. Relationship-Aware - Linked generators ensure valid data combinations
  9. Open - OSS core under the MIT license; Cloud adds enterprise features
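
Principles 4 and 5 are easy to confuse, so a short Python sketch may help separate them. Everything here is an assumption made for illustration: `seeded_values`, `consistent_pick`, the secret key, and the replacement list are hypothetical, and a keyed hash (HMAC-SHA256) is just one possible way to get a stable input-to-output mapping, not necessarily the technique Phony uses. Determinism means replaying a seed reproduces a sequence; consistency means a given input value always maps to the same replacement wherever it appears.

```python
import hashlib
import hmac
import random

FAKE_NAMES = ["Avery Holt", "Jordan Pike", "Morgan Reyes", "Casey Lind"]


def seeded_values(seed, count):
    # Deterministic: the same seed replays the same sequence of values.
    rng = random.Random(seed)
    return [rng.randint(0, 999) for _ in range(count)]


def consistent_pick(value, secret, choices):
    # Consistent: a keyed hash of the input selects the replacement, so the
    # same input maps to the same output in any table that uses this key.
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(choices)
    return choices[index]


if __name__ == "__main__":
    key = b"example-secret"  # assumed key, for illustration only
    print(seeded_values(1, 3), seeded_values(1, 3))
    # The same email in a "users" table and an "orders" table gets the same name.
    print(consistent_pick("alice@example.com", key, FAKE_NAMES))
    print(consistent_pick("alice@example.com", key, FAKE_NAMES))
```

Python's `random` module only illustrates the idea. Reproducing the same output in every runtime would require a random number algorithm specified independently of any one language, which this section does not describe.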

Phony Cloud Platform Specification