Reflecting on the Year 2023

Introducing the Databricks SDK ecosystem

As we bid adieu to 2023, I’m thrilled to share the incredible journey of open-source work and innovation that unfolded throughout the year. It’s been a year marked by challenges, triumphs, and the relentless pursuit of excellence in tech and development.

Unlocking the Power of Databricks SDKs

In this session, learn best practices for when and how to use the SDK, the command-line interface, or the Terraform integration to work with the Databricks Lakehouse. The session also covers using shell scripts to automate complex tasks and streamline operations to improve scalability.

Strategies for Data Quality With Apache Spark

Data Quality Landscape

In fact, many data teams are guilty of overlooking critical questions like “Are we actually monitoring the data?” after deploying multiple pipelines to production. They might celebrate the success of the first pipeline and feel confident about deploying more. Still, they need to consider the health and robustness of their ETL pipelines for long-term production use. This lack of foresight can lead to significant problems down the line and undermine trust in the data sets the pipelines produce. In the previous post, we scratched the surface of how one can check data quality with Apache Spark. But the real complexity lies in the greater data quality landscape, which involves people and processes, not just the Spark clusters.

Introduction to Data Quality With Apache Spark

High-Quality Spark

What really happens in the data engineering world is that the data team deploys the first pipeline to production, and everyone is happy. They deploy the second, third, fifth, and tenth pipelines to production. But then they start thinking: Hmm, are we actually monitoring the data? Is our ETL pipeline healthy and robust enough for production use, so that other teams can trust the data sets it produces? “Data quality requires a certain level of sophistication within the enterprise to even understand that it’s a problem.” This quote comes from Colleen Graham in the 2006 article Performance Management Driving BI Spending, but it still applies today.

Fingerprinting Process Trees on Linux With Rust

This is how you could imagine fingerprinting process trees

Name fingerprinting is a cybersecurity forensics technique used to identify and track processes running on a computer system by using the process name or other identifiable information. This information could include the process’s file name, file path, command line arguments, and other identifying indicators of compromise.
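
To make the idea concrete, here is a minimal sketch of how such a fingerprint could be computed. It is written in Python for brevity, while the article itself works in Rust, and it is not the article's implementation. The `fingerprint` function, its parameters, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
from typing import Sequence


def fingerprint(name: str, exe_path: str, cmdline: Sequence[str]) -> str:
    """Return a stable SHA-256 hex digest of one process's identifying fields.

    Hypothetical helper: the field selection (name, path, arguments) follows
    the description above, not any particular tool's format.
    """
    digest = hashlib.sha256()
    for field in (name, exe_path, *cmdline):
        data = field.encode("utf-8")
        # Length-prefix every field so that ("ab", "c") and ("a", "bc")
        # cannot collapse into the same byte stream.
        digest.update(len(data).to_bytes(4, "big"))
        digest.update(data)
    return digest.hexdigest()


# Same name and arguments, different binary location: the digests differ.
expected = fingerprint("sshd", "/usr/sbin/sshd", ["sshd", "-D"])
suspicious = fingerprint("sshd", "/tmp/sshd", ["sshd", "-D"])
print(expected == suspicious)  # False
```

Hashing the path together with the name is what lets a lookalike binary in an unusual location stand out, even though its process name matches a legitimate one.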

Exposing Azure Storage on Domain Apex With Let's Encrypt SSL

Simplified Azure CDN Let’s Encrypt flow with Terraform.

Hello, reader. In this article, I will explain how to expose an Azure Storage Account through an apex domain with a free Let’s Encrypt SSL certificate, almost entirely via Terraform.

What Happened if Unit-Tests Unlock Self-Healing in Go?

Gopher with a wrench fixing a test.

Driving unit test coverage is essential but very dull. We need to make it as fun as possible. And for “shippable” OSS products, it’s vital. It differs from the SaaS world, where you can roll out an emergency release for all users. Once a user downloads something and runs it in their environment, it’s done. You cannot effortlessly swap the binary artifact. And if it’s broken, it’s your fault. The best way to prevent this is decent unit-testing coverage. This time we’ll cover something boring and automatable: API calls to a predefined service.
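
As a rough illustration of testing API calls against a predefined service, the sketch below starts a throwaway stub server on a local port and points the client code at it. It is in Python, not Go, and it is not the approach from the article. `StubHandler`, `get_status`, `run_against_stub`, and the `/api/status` endpoint are hypothetical names chosen for this example.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the real service: answers every GET with JSON."""

    def do_GET(self):
        body = json.dumps({"path": self.path, "status": "ok"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep test output quiet


def get_status(base_url: str) -> str:
    """Code under test: calls an assumed /api/status endpoint."""
    # An empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(base_url + "/api/status", timeout=5) as resp:
        return json.loads(resp.read())["status"]


def run_against_stub() -> str:
    # Port 0 asks the OS for any free port, so parallel test runs do not clash.
    server = HTTPServer(("127.0.0.1", 0), StubHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return get_status(f"http://127.0.0.1:{server.server_port}")
    finally:
        server.shutdown()
        server.server_close()
```

Because the client only receives a base URL, the same function can be exercised against the stub in tests and against the real service in production.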

GitHub Dependabot in Action

Experience with Dependabot in a repository. Screenshot processed in GIMP.

I’ve used this awesome tool on 20 open-source projects over the last two years. Here’s my opinion.

OSS Year 2022 in Review: Projects Launched

It’s probably clear when I took the actual vacation.

Okay, it’s that time of the year, and everyone is checking their GitHub stats. I’ll join the pack with my OSS summary for the year 2022. Here’s a short recap with my own thoughts about the four projects I’ve been driving.

Tech Book Reviews: Go

Paper stack

Nothing like a paper book falling onto your face to remind you that you’re falling asleep and need to turn off the bedside light. I’ve read some books about Go and would like to share some of my opinions. The list goes in reverse chronological order of when I read them. Some are good, some are great, and some are just there. Let’s Go.
