Open Source: Is It the Holy Grail or a Can of Worms?

Photo by Virginia Johnson on Unsplash

Do you ever wonder whether you should pull a third-party library into your code? Sometimes it’s worth it, but most of the time it isn’t. Here’s a quick way to tell: if the library does something you don’t comprehend, or something you could do yourself with little effort, don’t use it. The only exception to this rule is when the library does something that would be very difficult or time-consuming to do yourself. In that case, it might be worth using the library even if you don’t fully understand it.

Read more about Open Source: Is It the Holy Grail or a Can of Worms?

How Golang Generics Empower Concise APIs

Tired Gopher (by Quasilyte) extracting a table into memory

You’ve likely read dozens of stories about Go generics applied to ordinary slices and maps, but haven’t yet thought about a more fun way to use this feature. Let’s implement a counterpart of pandas.read_html, which maps HTML tables into slices of structs! If it’s achievable even in Rust, why shouldn’t it be in Go?! This essay will show you a thrilling mix of reflection and generics for building concise external APIs for your libraries.
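To give a flavor of the technique, here is a minimal sketch, assuming a hypothetical UnmarshalTable function, a `header` struct tag, and pre-extracted rows of string cells (none of these are the library’s actual API): generics let the caller get back a typed slice, while reflection fills in the struct fields behind the scenes.

```go
package tablesketch

import (
	"fmt"
	"reflect"
)

// UnmarshalTable maps rows of string cells onto a slice of T,
// matching each column header to the struct field carrying the
// corresponding `header` tag.
func UnmarshalTable[T any](headers []string, rows [][]string) ([]T, error) {
	t := reflect.TypeOf((*T)(nil)).Elem()
	if t.Kind() != reflect.Struct {
		return nil, fmt.Errorf("T must be a struct, got %s", t.Kind())
	}

	// Map column index -> struct field index via the `header` tag.
	colToField := map[int]int{}
	for i := 0; i < t.NumField(); i++ {
		tag := t.Field(i).Tag.Get("header")
		for j, h := range headers {
			if h == tag {
				colToField[j] = i
			}
		}
	}

	out := make([]T, 0, len(rows))
	for _, row := range rows {
		v := reflect.New(t).Elem()
		for j, cell := range row {
			if fi, ok := colToField[j]; ok && v.Field(fi).Kind() == reflect.String {
				v.Field(fi).SetString(cell) // sketch: string fields only
			}
		}
		out = append(out, v.Interface().(T))
	}
	return out, nil
}
```

A caller could then write something like `releases, err := UnmarshalTable[Release](headers, cells)` and get a typed `[]Release` back without touching reflection themselves, which is exactly the kind of concise external API the essay is after.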

Read more about How Golang Generics Empower Concise APIs

Reverse-Engineering a Search Language

Data Mesh With Terraform and Databricks

This talk covers a couple of architectural and organizational approaches to achieving a Distributed Data Mesh, which is essentially a combination of mindset, fully automated infrastructure, continuous integration for data pipelines, dedicated departmental or team collaboration environments, and security enforcement.

Growing a Terraform Provider to Millions of Downloads

This will be the story of building and growing the Databricks Terraform Provider over the course of two years, along with the tactics, techniques, and procedures that allowed it to reach millions of installations. This talk will be useful for every Terraform Provider maintainer, as well as for those who are planning to write one.

Data Quality With or Without Apache Spark and Its Ecosystem

A few solutions exist in the open-source community, either as libraries or as complete stand-alone platforms, that can be used to assure a certain level of data quality, especially when continuous imports happen. Organizations may consider picking one of the available options: Apache Griffin, Deequ, DDQ, and Great Expectations. In this presentation, we’ll compare these open-source products across different dimensions, like maturity, documentation, and extensibility, and features like data profiling and anomaly detection.

Read more about Data Quality with or without Apache Spark and its ecosystem

Using Terraform to Enable Distributed Data Mesh

In this session, we’ll learn about the Databricks (Labs) Terraform integration and how it can automate literally every aspect required for a production-grade platform: data security, permissions, continuous deployment, and so on. We’ll learn how Scribd offers its internal customers flexibility without acting as a gatekeeper. Just about anything they might need in Databricks is a pull request away.

Data Privacy With Apache Spark

In this talk, we’ll compare different techniques for data privacy and the protection of personally identifiable information, and their effects on statistical usefulness, re-identification risk, data schema, format preservation, and read & write performance. We’ll cover both offense and defense techniques. You’ll learn what k-anonymity and quasi-identifiers are, and discover the world of suppression, perturbation, obfuscation, encryption, tokenization, and watermarking with elementary code examples, in case no third-party products can be used. We’ll see what approaches might be adopted to minimize the risks of data exfiltration.
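As a taste of those elementary examples, here is a minimal sketch of suppression and generalization (written in Go purely for illustration, with an assumed record layout and generalization rules; the talk itself works with Spark). It suppresses a direct identifier, coarsens the quasi-identifiers, and then measures the k for which the result is k-anonymous.

```go
package privacysketch

import "fmt"

type Record struct {
	Name string // direct identifier: suppressed below
	Zip  string // quasi-identifier: generalized to a prefix
	Age  int    // quasi-identifier: generalized to a 10-year bucket
}

// Generalize suppresses the direct identifier and coarsens the
// quasi-identifiers so individual records become harder to re-identify.
func Generalize(in []Record) []Record {
	out := make([]Record, 0, len(in))
	for _, r := range in {
		zip := r.Zip
		if len(zip) > 3 {
			zip = zip[:3] + "**" // generalization: keep only the ZIP prefix
		}
		out = append(out, Record{
			Name: "*",               // suppression
			Zip:  zip,
			Age:  (r.Age / 10) * 10, // generalization into 10-year buckets
		})
	}
	return out
}

// KOf reports the k for which the dataset is k-anonymous: the size of
// the smallest group of records sharing the same quasi-identifier values.
func KOf(records []Record) int {
	groups := map[string]int{}
	for _, r := range records {
		groups[fmt.Sprintf("%s|%d", r.Zip, r.Age)]++
	}
	k := 0
	for _, n := range groups {
		if k == 0 || n < k {
			k = n
		}
	}
	return k
}
```

The trade-off the talk explores is visible even in this toy: the coarser the generalization, the larger k becomes and the lower the re-identification risk, but the less statistically useful the data remains.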

Read more about Data Privacy with Apache Spark

Data Lineage in Context of Interactive Analysis

This presentation focuses on tracking data lineage for interactive data exploration in notebooks. A set of techniques demonstrates how to audit the data’s journey from code entered in a notebook, down through execution planning, DataFrames, RDDs, and Hadoop’s file formats, and back to the visualizations displayed to the data analyst. A custom tool built with the Java Instrumentation API allowed us to add extra security to certain parts of the Spark driver’s JVM runtime environment.

Read more about Data Lineage in Context of Interactive Analysis