Data Lineage in Context of Interactive Analysis
This presentation focuses on tracking data lineage during interactive data exploration in notebooks. It demonstrates a set of techniques for auditing the data’s journey, from code entered in a notebook, down through the levels of execution planning, DataFrames, RDDs and Hadoop file formats, and back up to the visualizations displayed to the data analyst. We built a custom tool using the Java Instrumentation API, which allowed us to add extra security to certain parts of the Spark driver’s JVM runtime environment.
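The abstract does not show the tool itself. As a rough illustration of the mechanism it names, the sketch below registers a ClassFileTransformer through the Java Instrumentation API and records which classes under a watched package get loaded. The names LineageAgent, AuditTransformer, WATCHED_PREFIX and SEEN are hypothetical, and the choice of Spark SQL classes as the audit target is an assumption, not the presenters' design. The sketch only observes class loading; a real auditing tool would presumably rewrite bytecode to add its checks.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical agent: names and the watched package are illustrative assumptions.
final class LineageAgent {

    // Internal (slash-separated) class-name prefix to audit; an assumption for this sketch.
    static final String WATCHED_PREFIX = "org/apache/spark/sql/";

    // Names of watched classes seen so far; thread-safe because classes load concurrently.
    static final List<String> SEEN = new CopyOnWriteArrayList<>();

    static final class AuditTransformer implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader,
                                String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            // className can be null for some dynamically defined classes.
            if (className != null && className.startsWith(WATCHED_PREFIX)) {
                SEEN.add(className);
            }
            // Returning null tells the JVM to keep the original bytecode.
            return null;
        }
    }

    // Called by the JVM before the application's main method when the agent
    // is loaded with -javaagent.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new AuditTransformer());
    }
}
```

To try it, the class would be packaged in a JAR whose manifest sets Premain-Class to LineageAgent, and the JAR passed to the JVM with the -javaagent option. For a Spark driver, that option can typically be supplied through spark-submit's --driver-java-options.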