From Pipeline Sprawl to Table-Based Data Flows: ETL with Pub/Sub for Tables
Pub/Sub for Tables is neither a replacement for messaging systems like Kafka, nor a substitute for OLTP or OLAP databases.
It's a declarative approach to data flows, applying publish/subscribe semantics to tables rather than messages or events.
Why tables?
Because nearly all data used in analytics, whether it originates from APIs, logs, streams, or files, eventually becomes tabular. Starting from tables, rather than converting to them later, allows metadata to stay intact throughout the lifecycle. This means semantics can be preserved end-to-end, enabling better trust, alignment, and clarity across the organization.
This is the foundation we are building at Tabsdata.
Tabsdata is an open-core system for managing data flows around immutable, versioned tables. It provides:
* Declarative data publishing, transformation, and subscription
* Full lineage and provenance for every table and update
* Free-form Python transformations in a controlled, reproducible environment
* A Rust-based core for safety and performance
* Python bindings for developers
* Both a browser-based UI and a comprehensive scriptable CLI
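To make the model concrete, here is a toy sketch of the idea in plain Python. This is not Tabsdata's actual API; all names here (TableStore, publish, subscribe, lineage) are hypothetical. It only illustrates the core semantics from the list above: tables are immutable, versioned snapshots; subscribers declare a dependency on a table and are notified when a new version is published; and every version carries its provenance.

```python
from dataclasses import dataclass, field

@dataclass
class TableStore:
    # Hypothetical illustration, not Tabsdata's API.
    # name -> list of immutable table versions (oldest first)
    versions: dict = field(default_factory=dict)
    # name -> list of subscriber callbacks
    subscribers: dict = field(default_factory=dict)

    def publish(self, name, rows, provenance=""):
        """Append a new immutable version and notify subscribers."""
        entry = {
            "rows": tuple(rows),  # frozen: old versions are never mutated
            "version": len(self.versions.get(name, [])),
            "provenance": provenance,
        }
        self.versions.setdefault(name, []).append(entry)
        for callback in self.subscribers.get(name, []):
            callback(entry)

    def subscribe(self, name, callback):
        """Register a callback to run on each new version of a table."""
        self.subscribers.setdefault(name, []).append(callback)

    def lineage(self, name):
        """Return (version, provenance) for every update to a table."""
        return [(v["version"], v["provenance"]) for v in self.versions.get(name, [])]

# Usage: a subscriber sees each new version of "orders" as it is published.
store = TableStore()
seen = []
store.subscribe("orders", lambda entry: seen.append(entry["version"]))
store.publish("orders", [("a", 1)], provenance="api-import")
store.publish("orders", [("a", 1), ("b", 2)], provenance="api-import")
```

The real system adds the parts a toy can't: durable storage, free-form Python transformations between published and subscribed tables, and provenance tracked per-table rather than as a string.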
Here is a short overview and demo we shared recently:
https://www.youtube.com/watch?v=qCZIRC9khmA
More details on the project:
https://tabsdata.com
We are still early and would deeply value your feedback, pushback, or suggestions. If you have dealt with pipeline sprawl, loss of trust in data lineage, or the burden of orchestration, please share your thoughts.
What resonates? What does not? What would make something like this more useful in your world?