OSS Year 2022 in Review: Projects Launched

It’s probably clear when I took the actual vacation.

Okay, it’s that time of the year, and everyone is checking their GitHub stats. I’ll join the pack with my OSS summary for 2022. Here’s a short recap, with my own thoughts, of the four projects I’ve been driving.

Read more about OSS Year 2022 in Review: Projects Launched

Tech Book Reviews: Go

Paper stack

Nothing like a paper book falling onto your face to remind you that you’re falling asleep and need to turn off the bedside light. I’ve read some books about Go and would like to share some of my opinions. The list is in reverse chronological order of when I read them. Some are good, some are great, and some are just there. Let’s Go.

Read more about Tech Book Reviews: Go

Open Source: Is It the Holy Grail or a Can of Worms?

Photo by Virginia Johnson on Unsplash

Do you ever wonder if you should include a third-party library in your code or not? Sometimes it’s worth it, but mostly it’s not. Here’s a quick way to tell: If the library is doing something you don’t comprehend, or if it’s doing something you could do yourself with little effort, then don’t use it. The only exception to this rule is if the library is doing something that would be very difficult or time-consuming to do yourself. In that case, it might be worth using the library even if you don’t fully understand it.

Read more about Open Source: Is It the Holy Grail or a Can of Worms?

How Golang Generics Empower Concise APIs

Tired Gopher (of Quasilyte) is extracting table into memory

You’ve likely heard and read dozens of stories about generics in Go applied to ordinary slices and maps, but haven’t yet thought about a fun way to apply this feature. Let’s implement a counterpart of pandas.read_html, which maps HTML tables into slices of structs! If it’s achievable even with Rust, why shouldn’t it be with Go?! This essay will show you a thrilling mix of reflection and generics to reach concise external APIs for your libraries.

Read more about How Golang Generics Empower Concise APIs

Reverse-Engineering a Search Language

Simple Abstract Syntax Tree (AST) transformation

Read more about Reverse-Engineering a Search Language

Data Mesh With Terraform and Databricks

Experimental Terraform Exporter

This is about a couple of architectural and organizational approaches to achieving a Distributed Data Mesh, which is essentially a combination of mindset, fully automated infrastructure, continuous integration for data pipelines, dedicated departmental or team collaborative environments, and security enforcement.

Read more about Data Mesh With Terraform and Databricks

Growing a Terraform Provider to Millions of Downloads

Error recovery in Terraform Resources

This will be a story of building and growing a Databricks Terraform Provider over the course of two years, as well as tactics, techniques and procedures that allowed it to achieve millions of installations.

Read more about Growing a Terraform Provider to Millions of Downloads

Data Quality With or Without Apache Spark and Its Ecosystem

Data Quality Pillars

Few solutions exist in the open-source community, either as libraries or as complete stand-alone platforms, that can be used to assure data quality, especially when continuous imports happen. Organizations may consider picking one of the available options: Apache Griffin, Deequ, DDQ, or Great Expectations.

Read more about Data Quality With or Without Apache Spark and Its Ecosystem

Using Terraform to Enable Distributed Data Mesh

In this session we’ll learn about Databricks (Labs) Terraform integration and how it can automate literally every aspect required for a production-grade platform: data security, permissions, continuous deployment and so on. We’ll learn how Scribd offers their internal customers flexibility without acting as gatekeepers. Just about anything they might need in Databricks is a pull request away.

Data Privacy With Apache Spark

Pseudonymization vs Anonymization

In this talk, we’ll compare different techniques for data privacy and the protection of personally identifiable information, along with their effects on statistical usefulness, re-identification risks, data schema, format preservation, and read and write performance.

Read more about Data Privacy With Apache Spark