Building Databricks Integrations With Go

Applied AI is not that scary and it’s even possible in Go

This talk highlights the transformative impact of the Databricks SDK for Go, enhancing development through seamless deployment, simplified packaging, and enriched user experiences. We will focus on two key examples: the Databricks CLI and Databricks Labs Watchdog, demonstrating the SDK’s potential.

Read more about Building Databricks integrations with Go

Building Robust Python Applications on Top of Databricks

Inversion of Control in Python is not that difficult

Discover how to build robust Python applications on Databricks by leveraging lessons from Terraform, UCX, and the Python SDK. Learn to use Databricks Labs UCX to identify code that is incompatible with Unity Catalog, automate fixes, and migrate extensive Hive Metastore tables while maintaining permissions and updating cluster settings.
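The caption above mentions inversion of control in Python; as a rough illustration of that idea only (not code from the post or from UCX, and all class names here are hypothetical), a constructor-injection sketch might look like this:

```python
from dataclasses import dataclass
from typing import Protocol


class TableCrawler(Protocol):
    """Anything that can list tables; the concrete implementation is injected."""
    def snapshot(self) -> list[str]: ...


@dataclass
class AssessmentJob:
    """Depends on an abstraction, not a concrete crawler (inversion of control)."""
    crawler: TableCrawler

    def run(self) -> int:
        return len(self.crawler.snapshot())


class StaticCrawler:
    """A stub that stands in for a real metastore crawler in tests."""
    def snapshot(self) -> list[str]:
        return ["hive_metastore.default.orders", "hive_metastore.default.users"]


if __name__ == "__main__":
    job = AssessmentJob(crawler=StaticCrawler())
    print(job.run())  # 2
```

Because the job only sees the protocol, the real crawler can be swapped for a stub in unit tests without touching the job’s code.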

Read more about Building robust Python applications on top of Databricks

Reflecting on the Year 2023

Introducing the Databricks SDK ecosystem

As we bid adieu to 2023, I’m thrilled to share the incredible journey of open-source work and innovation that unfolded throughout the year. It’s been a year marked by challenges, triumphs, and the relentless pursuit of excellence in tech and development.

Read more about Reflecting on the Year 2023

Unlocking the Power of Databricks SDKs

Streamlined SDK compiler infrastructure

In this session, learn best practices for when and how to use an SDK, the command-line interface, or Terraform to integrate seamlessly with Databricks and revolutionize how you work with the Databricks Lakehouse. The session also covers using shell scripts to automate complex tasks and streamline operations to improve scalability.
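As one hedged illustration of the SDK option (a minimal sketch assuming the databricks-sdk Python package and an already configured authentication profile; this is not material from the session itself):

```python
from databricks.sdk import WorkspaceClient

# Credentials are resolved via unified authentication
# (environment variables, ~/.databrickscfg, etc.).
w = WorkspaceClient()

# Who am I authenticated as?
me = w.current_user.me()
print(me.user_name)

# Enumerate clusters in the workspace.
for cluster in w.clusters.list():
    print(cluster.cluster_name, cluster.state)
```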

Strategies for Data Quality With Apache Spark

Data Quality Landscape

In fact, many data teams are guilty of overlooking critical questions like “Are we actually monitoring the data?” after deploying multiple pipelines to production. They might celebrate the success of the first pipeline and feel confident about deploying more, yet they still need to consider the health and robustness of their ETL pipelines for long-term production use. This lack of foresight can lead to significant problems down the line and undermine trust in the data sets produced by the pipeline. In the previous post we scratched the surface of how one can check data quality with Apache Spark. But the real complexity lies in the greater data quality landscape, which involves people and processes, not just the Spark clusters.
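For readers who have not seen the previous post, a minimal sketch of what such a check can look like in PySpark (the table and column names are made up purely for illustration):

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical table and columns, purely for illustration.
orders = spark.read.table("sales.orders")

checks = {
    "order_id is never null": orders.filter(F.col("order_id").isNull()).count() == 0,
    "amount is strictly positive": orders.filter(F.col("amount") <= 0).count() == 0,
    "country is within the allowed set": (
        orders.filter(~F.col("country").isin("NL", "DE", "US")).count() == 0
    ),
}

failed = [name for name, passed in checks.items() if not passed]
if failed:
    raise ValueError(f"Data quality checks failed: {failed}")
print("All data quality checks passed")
```

Checks like these catch broken data, but the post’s point is that they are only one corner of the wider landscape of people and processes.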

Read more about Strategies for Data Quality with Apache Spark

Introduction to Data Quality With Apache Spark

High-Quality Spark

What really happens in the data engineering world is that the data team deploys the first pipeline to production, and everyone is happy. They deploy the second, third, fifth, and tenth pipelines to production. But then they start thinking: hmm, are we actually monitoring the data? Is our ETL pipeline healthy and robust enough for production use, so that other teams can trust the data sets it produces? “Data quality requires a certain level of sophistication within the enterprise to even understand that it’s a problem.” This quote comes from Colleen Graham in the 2006 article Performance Management Driving BI Spending, but it still holds true today.

Read more about Introduction to Data Quality with Apache Spark

Fingerprinting Process Trees on Linux With Rust

This is how you could imagine fingerprinting process trees

Name fingerprinting is a cybersecurity forensics technique used to identify and track processes running on a computer system by using the process name or other identifiable information. This information could include the process’s file name, file path, command-line arguments, and other identifying indicators of compromise.
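The post implements this in Rust; purely to illustrate the idea, here is a rough Python sketch that hashes a process’s name, executable path, and arguments from /proc on Linux (the choice of fields and the hashing scheme are illustrative, not the post’s exact approach):

```python
import hashlib
import os


def process_info(pid: int) -> dict[str, str]:
    """Collect identifying fields for one process from /proc (Linux only)."""
    base = f"/proc/{pid}"
    with open(f"{base}/comm") as f:
        name = f.read().strip()
    with open(f"{base}/cmdline", "rb") as f:
        cmdline = f.read().replace(b"\x00", b" ").decode(errors="replace").strip()
    try:
        exe = os.readlink(f"{base}/exe")
    except OSError:
        exe = ""
    return {"name": name, "exe": exe, "cmdline": cmdline}


def fingerprint(pid: int) -> str:
    """Hash the identifying fields into a single stable identifier."""
    info = process_info(pid)
    material = "|".join(info[k] for k in ("name", "exe", "cmdline"))
    return hashlib.sha256(material.encode()).hexdigest()


if __name__ == "__main__":
    print(fingerprint(os.getpid()))
```

Walking the PPid entries in /proc/&lt;pid&gt;/status the same way would extend a per-process fingerprint to the whole process tree.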

Read more about Fingerprinting Process Trees on Linux with Rust

Exposing Azure Storage on Domain Apex With Let's Encrypt SSL

Simplified Azure CDN Let’s Encrypt flow with Terraform

Hello, reader! In this article, I will explain how to expose an Azure Storage Account on a domain apex with a Let’s Encrypt SSL certificate that you can get for free, almost all of it via Terraform.

Read more about Exposing Azure Storage on Domain Apex with Let's Encrypt SSL

What Happens If Unit-Tests Unlock Self-Healing in Go?

Gopher with a wrench fixing a test.

Driving unit test coverage is essential but very dull, so we need to make it as fun as possible. And for “shippable” OSS products, it’s vital. It differs from the SaaS world, where you can roll out an emergency release to all users. Once a user downloads something and runs it in their environment, it’s done: you cannot effortlessly swap the binary artifact, and if it’s broken, it’s your fault. The best way to prevent this is decent unit-testing coverage. This time we’ll cover something boring and automatable: API calls to a predefined service.
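The post does this in Go; as a language-neutral illustration of the general technique of testing API calls against a predefined, stubbed service, a Python sketch might look like this (the client function and endpoint are hypothetical):

```python
import json
import threading
import unittest
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def get_current_user(base_url: str) -> dict:
    """Hypothetical client code under test: calls a fixed API endpoint."""
    with urllib.request.urlopen(f"{base_url}/api/2.0/me") as resp:
        return json.load(resp)


class StubHandler(BaseHTTPRequestHandler):
    """Predefined response that stands in for the real service."""
    def do_GET(self):
        body = json.dumps({"userName": "tester@example.com"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep test output quiet
        pass


class CurrentUserTest(unittest.TestCase):
    def test_get_current_user(self):
        server = HTTPServer(("127.0.0.1", 0), StubHandler)
        threading.Thread(target=server.serve_forever, daemon=True).start()
        try:
            base_url = f"http://127.0.0.1:{server.server_port}"
            self.assertEqual(get_current_user(base_url)["userName"], "tester@example.com")
        finally:
            server.shutdown()


if __name__ == "__main__":
    unittest.main()
```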

Read more about What Happens If Unit-Tests Unlock Self-Healing in Go?

GitHub Dependabot in Action

Experience with Dependabot in a repository. Screenshot processed in GIMP

I’ve used this awesome tool on 20 open-source projects over the last two years. Here’s my opinion.

Read more about GitHub Dependabot in Action