SchemaFlux

Structured data compiler for AI systems.

AI pipelines run on data, and most of that data is messy. SchemaFlux takes heterogeneous inputs, validates them against schemas, and outputs clean structured records your models and evals can actually use.

go get github.com/greynewell/schemaflux

Data in, structure out

Fine-tuning datasets are full of bad labels, truncated inputs, and format errors. RAG pipelines ingest documents with no schema enforcement. SchemaFlux catches these problems at the boundary, before they reach your model or corrupt your training run.
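
To make "catching problems at the boundary" concrete, here is a minimal sketch of the idea using only the Go standard library. It is not SchemaFlux's API: the Record type, its fields, the allowedLabels set, and the Validate function are all hypothetical names chosen for illustration. The sketch rejects records that are truncated or malformed, that are missing required text, or that carry a label outside the expected set.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// Record is a hypothetical fine-tuning example. The field names are
// illustrative only and are not taken from SchemaFlux.
type Record struct {
	Prompt     string `json:"prompt"`
	Completion string `json:"completion"`
	Label      string `json:"label"`
}

// allowedLabels is an assumed label set for this sketch.
var allowedLabels = map[string]bool{"positive": true, "negative": true}

// Validate decodes one raw JSON record and checks it against the
// hand-written schema above. It returns the first problem it finds.
func Validate(raw []byte) (Record, error) {
	var r Record
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields() // reject fields the schema does not declare
	if err := dec.Decode(&r); err != nil {
		// Covers truncated input and other JSON format errors.
		return Record{}, fmt.Errorf("format error: %w", err)
	}
	if strings.TrimSpace(r.Prompt) == "" {
		return Record{}, errors.New("empty prompt")
	}
	if strings.TrimSpace(r.Completion) == "" {
		return Record{}, errors.New("empty completion")
	}
	if !allowedLabels[r.Label] {
		return Record{}, fmt.Errorf("unknown label %q", r.Label)
	}
	return r, nil
}

func main() {
	inputs := []string{
		`{"prompt":"Great product","completion":"Thanks!","label":"positive"}`,
		`{"prompt":"Broke in a day","completion":"Sorry","label":"maybe"}`,
		`{"prompt":"Where is my order","completion":"It is on`,
	}

	var clean []Record
	for i, in := range inputs {
		rec, err := Validate([]byte(in))
		if err != nil {
			fmt.Printf("record %d rejected: %v\n", i, err)
			continue
		}
		clean = append(clean, rec)
	}
	fmt.Printf("%d of %d records accepted\n", len(clean), len(inputs))
}
```

Only records that pass every check reach the clean slice, so anything downstream (a training run, an eval, a RAG index) sees validated data or nothing at all. A real schema layer would typically report every violation per record rather than stopping at the first one.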

SchemaFlux is the data layer of the MIST Stack, which also includes MatchSpec for evaluation, InferMux for inference routing, and TokenTrace for observability. All four packages are written in Go with zero external dependencies and communicate over a shared message protocol. The stack follows the principles of Eval-Driven Development.