Data Quality With or Without Apache Spark and Its Ecosystem
A few solutions exist in the open-source community, either as libraries or as complete stand-alone platforms, that can be used to assure a certain level of data quality, especially when data is imported continuously. Organizations may consider adopting one of the available options: Apache Griffin, Deequ, DDQ, and Great Expectations.
In this presentation, we’ll compare these open-source products across several dimensions, such as maturity, documentation, and extensibility, as well as features like data profiling and anomaly detection.