Synthetic Data Generator

GenAI-powered synthetic dataset and document generation with local LLMs

Synthetic Data Generator screenshot

Full-featured synthetic data generation using local LLMs (Ollama, llama.cpp) via OpenAI-compatible API. Generates structured tabular data (CSV, Excel) with statistical distributions, temporal consistency, and geographic validation. Creates multi-format documents (Word, PDF, Text, Markdown) with domain constraints and correlation preservation. Includes web UI (Gradio), CLI for batch processing, and Docker containerization.

Tech Stack

Python Gradio LangChain Local LLMs Pydantic Docker asyncio