Data Privacy With Apache Spark

November 1, 2020
Tags: apache-spark, architecture, tokenization, obfuscation

Pseudonymization vs Anonymization

In this talk, we’ll compare different data privacy techniques for protecting personally identifiable information and their effects on statistical usefulness, re-identification risk, data schema, format preservation, and read and write performance.

We’ll cover different offense and defense techniques. You’ll learn what k-anonymity and quasi-identifiers are. We’ll explore suppression, perturbation, obfuscation, encryption, tokenization, and watermarking with elementary code examples, for cases where third-party products cannot be used. We’ll also see which approaches can be adopted to minimize the risk of data exfiltration.
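To make a few of these terms concrete, here is a minimal sketch in plain Python (not Spark, and not taken from the talk itself). It assumes a toy dataset in which `email` is a direct identifier and `zip` and `age` are quasi-identifiers. All records, field names, the key, and the helper functions are hypothetical. It shows tokenization with a keyed hash, generalization of quasi-identifiers, suppression of small groups, and how k can be measured before and after.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical toy records; every field name and value is made up for illustration.
RECORDS = [
    {"email": "ann@example.com", "zip": "10001", "age": 34, "diagnosis": "flu"},
    {"email": "bob@example.com", "zip": "10002", "age": 36, "diagnosis": "cold"},
    {"email": "cat@example.com", "zip": "10003", "age": 38, "diagnosis": "flu"},
    {"email": "dan@example.com", "zip": "20001", "age": 52, "diagnosis": "asthma"},
]

# Assumption: these columns could be joined with outside data to re-identify a person.
QUASI_IDENTIFIERS = ("zip", "age")


def tokenize(value, key):
    """Pseudonymize a direct identifier with a keyed HMAC-SHA256 token.

    The same input and key always yield the same token, so joins still work.
    Truncating to 16 hex characters is an illustrative choice, not a recommendation.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def generalize(record):
    """Coarsen quasi-identifiers: 3-digit ZIP prefix and a 10-year age band."""
    decade = record["age"] // 10 * 10
    return {**record, "zip": record["zip"][:3] + "**", "age": f"{decade}-{decade + 9}"}


def group_sizes(records, quasi_identifiers=QUASI_IDENTIFIERS):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def k_anonymity(records, quasi_identifiers=QUASI_IDENTIFIERS):
    """Return k: the size of the smallest quasi-identifier group (0 if empty)."""
    sizes = group_sizes(records, quasi_identifiers)
    return min(sizes.values()) if sizes else 0


def suppress_small_groups(records, k, quasi_identifiers=QUASI_IDENTIFIERS):
    """Suppression: drop records whose quasi-identifier group has fewer than k rows."""
    sizes = group_sizes(records, quasi_identifiers)
    return [
        r for r in records
        if sizes[tuple(r[q] for q in quasi_identifiers)] >= k
    ]


def protect(records, key, k):
    """Tokenize the direct identifier, generalize, then suppress groups below k."""
    generalized = [
        generalize({**r, "email": tokenize(r["email"], key)}) for r in records
    ]
    return suppress_small_groups(generalized, k)


if __name__ == "__main__":
    secret_key = b"replace-with-a-managed-secret"  # hypothetical key
    print("k before:", k_anonymity(RECORDS))
    safe = protect(RECORDS, secret_key, k=2)
    print("k after:", k_anonymity(safe), "rows kept:", len(safe))
```

In this sketch the raw data has k = 1 because every row is unique on `zip` and `age`. After generalization and suppression, one outlier row is dropped and the remaining rows form a single group. This illustrates the trade-off between re-identification risk and statistical usefulness. Note that the tokenized `email` is pseudonymized rather than anonymized: anyone holding the key can recompute tokens for known addresses.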

Some of the above-mentioned techniques are barely an inconvenience to implement but difficult to support in the long run. We’ll show in which cases Databricks Delta can help make your datasets privacy-ready.

Watch Data Privacy With Apache Spark on YouTube.
