Apache Spark is an open-source distributed computing system for fast, large-scale data processing. Designed for big data workloads, it lets users manipulate data in familiar programming languages such as Scala, Java, and Python, which has made it a popular choice for big data applications.
I’ve been an active Apache Spark user since early 2015, and in recent years I’ve become increasingly familiar with its internals.
This tag collects insights, tutorials, and best practices for getting the most out of Apache Spark. Whether you’re a beginner or an experienced user, you’ll find resources here to sharpen your data processing skills and achieve better results.