Serge Smertin

Building Databricks Integrations With Go

Applied AI is not that scary and it’s even possible in Go

This talk shows how the Databricks SDK for Go improves development: deployment becomes easier, packaging becomes simpler, and the resulting tools offer a better user experience. We focus on two key examples, the Databricks CLI and Databricks Labs Watchdog, to demonstrate the SDK’s potential.

Jun 13, 2024 1 minute databricks open source go llm

Building Robust Python Applications on Top of Databricks

Inversion of Control in Python is not that difficult

Discover how to build robust Python applications on Databricks by leveraging lessons from Terraform, UCX, and the Python SDK. Learn to use Databricks Labs UCX to identify code that is incompatible with Unity Catalog, automate fixes, and migrate large numbers of Hive Metastore tables while preserving permissions and updating cluster settings.

Jun 12, 2024 1 minute databricks product launch open source python code linting

Reflecting on the Year 2023

Introducing the Databricks SDK ecosystem

As we bid adieu to 2023, I’m thrilled to share the incredible journey of open-source work and innovation that unfolded throughout the year. It’s been a year marked by challenges, triumphs, and the relentless pursuit of excellence in tech and development.

Dec 31, 2023 7 minutes open-source go python java labs

Unlocking the Power of Databricks SDKs

In this session, learn best practices for when and how to use the SDKs, the command-line interface, or the Terraform integration to integrate seamlessly with the Databricks Lakehouse. The session also covers using shell scripts to automate complex tasks and streamline operations at scale.

Jun 26, 2023 1 minute databricks product launch open source go python java code generation compiler

Strategies for Data Quality With Apache Spark

In fact, many data teams are guilty of overlooking critical questions like “Are we actually monitoring the data?” after deploying multiple pipelines to production. They might celebrate the success of the first pipeline and feel confident about deploying more, but they also need to consider whether their ETL pipelines are healthy and robust enough for long-term production use. This lack of foresight can lead to significant problems down the line and undermine trust in the data sets the pipelines produce. In the previous post, we scratched the surface of how to check data quality with Apache Spark. But the real complexity lies in the broader data quality landscape, which involves people and processes, not just Spark clusters.

Apr 24, 2023 10 minutes apache-spark data-quality

Introduction to Data Quality With Apache Spark

What really happens in the data engineering world is that the data team deploys the first pipeline to production, and everyone is happy. They deploy the second, third, fifth, and tenth pipelines to production. But then they start thinking: hmm, are we actually monitoring the data? Is our ETL pipeline healthy and robust enough for other teams to trust the data sets it produces? “Data quality requires a certain level of sophistication within the enterprise to even understand that it’s a problem.” This quote is from Colleen Graham’s 2006 article “Performance Management Driving BI Spending”, but it still applies today.

Apr 20, 2023 8 minutes apache-spark data-quality architecture

Fingerprinting Process Trees on Linux With Rust

This is how you could imagine fingerprinting process trees

Name fingerprinting is a cybersecurity forensics technique used to identify and track processes running on a computer system by their process name or other identifying information. This information could include the process’s file name, file path, command-line arguments, and other indicators of compromise.

Apr 16, 2023 9 minutes cybersecurity rust open-source
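To make the idea of fingerprinting a whole process tree concrete, here is a minimal sketch in Go (the post itself uses Rust). It assumes each process is already described by its name, executable path, and command-line arguments; the `Process` type and `Fingerprint` function are hypothetical and not taken from the post. Each node is hashed together with its children's hashes, Merkle-tree style, so identical trees of commands produce identical fingerprints regardless of PIDs.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// Process is a hypothetical, simplified view of a running process.
// PIDs are deliberately excluded because they change between runs.
type Process struct {
	Name     string
	Path     string
	Args     []string
	Children []*Process
}

// Fingerprint returns a hex-encoded SHA-256 digest of the process and,
// recursively, all of its children. Child fingerprints are sorted so the
// result does not depend on the order in which children were spawned.
func Fingerprint(p *Process) string {
	childPrints := make([]string, 0, len(p.Children))
	for _, c := range p.Children {
		childPrints = append(childPrints, Fingerprint(c))
	}
	sort.Strings(childPrints)

	h := sha256.New()
	// NUL cannot occur inside names, paths, or argv entries on Linux,
	// so it works as a field separator; the counts keep "no arguments"
	// distinct from "one empty argument".
	fmt.Fprintf(h, "%s\x00%s\x00%d\x00", p.Name, p.Path, len(p.Args))
	for _, a := range p.Args {
		fmt.Fprintf(h, "%s\x00", a)
	}
	fmt.Fprintf(h, "%d\x00", len(childPrints))
	for _, cp := range childPrints {
		fmt.Fprintf(h, "%s\x00", cp)
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	curl := &Process{Name: "curl", Path: "/usr/bin/curl", Args: []string{"-s", "http://example.com"}}
	sh := &Process{
		Name:     "sh",
		Path:     "/bin/sh",
		Args:     []string{"-c", "curl -s http://example.com"},
		Children: []*Process{curl},
	}
	fmt.Println(Fingerprint(sh))
}
```

Dropping PIDs and sorting children keeps the fingerprint stable across runs; whether that is the right trade-off depends on what an investigation needs to tell apart.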

Exposing Azure Storage on Domain Apex With Let's Encrypt SSL

Simplified Azure CDN Let’s Encrypt flow with Terraform

Hello, reader! In this article, I explain how to expose an Azure Storage Account on the domain apex with a free Let’s Encrypt SSL certificate, setting up almost everything via Terraform.

Feb 23, 2023 9 minutes azure terraform dns letsencrypt architecture

What Happened if Unit-Tests Unlock Self-Healing in Go?

Driving up unit test coverage is essential but very dull, so we need to make it as fun as possible. For “shippable” OSS products, it is vital. This differs from the SaaS world, where you can roll out an emergency release to all users at once. Once a user downloads something and runs it in their environment, it’s done: you cannot easily swap the binary artifact, and if it’s broken, it’s your fault. The best way to prevent this is decent unit-test coverage. This time, we’ll cover something boring and automatable: API calls to a predefined service.

Feb 23, 2023 7 minutes unit-testing go open-source
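As a rough illustration of unit-testing API calls against a predefined service, here is a minimal Go sketch that uses the standard library's `net/http/httptest` to stand in for the real service. It does not reproduce the post's own tooling; the `Client`, `Item`, `GetItem`, and `newFakeService` names and the `/api/items/{id}` endpoint are all hypothetical. The key design choice is injecting the base URL, so the same client code talks to canned fixtures in tests and to the real service in production.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Item is a hypothetical resource returned by the service under test.
type Item struct {
	ID    string `json:"id"`
	State string `json:"state"`
}

// Client is a hypothetical API client. BaseURL is injected so tests can
// point it at a fake server instead of the real service.
type Client struct {
	BaseURL string
	HTTP    *http.Client
}

// GetItem calls GET {BaseURL}/api/items/{id} and decodes the JSON body.
func (c *Client) GetItem(id string) (*Item, error) {
	resp, err := c.HTTP.Get(c.BaseURL + "/api/items/" + id)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	var item Item
	if err := json.NewDecoder(resp.Body).Decode(&item); err != nil {
		return nil, err
	}
	return &item, nil
}

// newFakeService answers with predefined JSON fixtures keyed by
// "METHOD /path" and returns 404 for any request it does not know.
func newFakeService(fixtures map[string]string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		body, ok := fixtures[r.Method+" "+r.URL.Path]
		if !ok {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, body)
	}))
}

func main() {
	srv := newFakeService(map[string]string{
		"GET /api/items/abc": `{"id": "abc", "state": "RUNNING"}`,
	})
	defer srv.Close()

	c := &Client{BaseURL: srv.URL, HTTP: srv.Client()}
	item, err := c.GetItem("abc")
	if err != nil {
		panic(err)
	}
	fmt.Println(item.ID, item.State) // prints "abc RUNNING"

	// Requests outside the fixtures surface as errors, not silent zero values.
	if _, err := c.GetItem("missing"); err == nil {
		panic("expected an error for an unknown item")
	}
}
```

Because the fixtures are plain request-to-response pairs, this kind of test is easy to generate mechanically, which is what makes API-call coverage a good target for automation.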