Data Lineage in Context of Interactive Analysis

This presentation focuses on tracking data lineage during interactive data exploration in notebooks. It demonstrates a set of techniques for auditing the data’s journey from the code entered in a notebook, down through execution planning, DataFrames, RDDs, and Hadoop file formats, and back up to the visualizations displayed to the data analyst. We built a custom tool on the Java Instrumentation API, which allowed us to add extra security to certain parts of the Spark driver’s JVM runtime environment.
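To give a rough idea of what a tool built on the Java Instrumentation API looks like, here is a minimal sketch of a Java agent. It is not the tool from the talk: the class names `LineageAgent` and `AuditTransformer` and the `SEEN` set are hypothetical, and the agent only records which Spark classes the JVM loads, without rewriting any bytecode.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class LineageAgent {
    // Hypothetical audit log: names of Spark classes observed at load time.
    public static final Set<String> SEEN = ConcurrentHashMap.newKeySet();

    public static class AuditTransformer implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            // className is in JVM internal form ("org/apache/spark/rdd/RDD")
            // and may be null for some dynamically defined classes.
            if (className != null && className.startsWith("org/apache/spark/")) {
                SEEN.add(className.replace('/', '.'));
            }
            // Returning null tells the JVM to keep the bytecode unchanged.
            // A real tool would return rewritten bytes here to add checks.
            return null;
        }
    }

    // Called by the JVM before main() when started with -javaagent.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new AuditTransformer());
    }
}
```

To run as an agent, the class has to be packaged in a jar whose manifest sets `Premain-Class: LineageAgent`, and the driver JVM has to be started with `-javaagent:` pointing at that jar, for example through `spark-submit --driver-java-options` (the jar path is up to you).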

Watch Data Lineage in Context of Interactive Analysis on YouTube.

See Also