Building and managing data pipelines can be complex and time-consuming, particularly when you have to deal with evolving datasets and frequent updates while ensuring reliability at scale. Traditional methods often require extensive manual effort to maintain, such as writing MERGE statements and keeping track of which data was loaded last. This is where declarative programming comes into play. By allowing developers to define what they want instead of how to achieve it, declarative pipelines simplify the pipeline creation and management process.
It’s finally here! The long-awaited feature of for loops is available in Databricks Workflows. This allows you to neatly create multiple tasks based on a list of input values. I finally had time to check it out and have a go at it myself, so let’s see it in action!
Getting certified in tech is a hot topic these days. All major cloud providers offer various tracks and certifications (AWS, Azure, GCP), but new courses with accompanying certification exams for various other topics are popping up left, right, and center - just check out Udemy or Coursera. They’re often touted as essential for practical skills, but are they relevant in real-world scenarios? So before you go maxing out your credit card on every certification under the sun, let’s go over the true value of getting bona fide certified.
Several popular engineering and analytics frameworks (e.g., Delta Lake - and by extension Databricks - and Azure Synapse Analytics) use the same file format under the hood: Apache Parquet. It makes sense, as Parquet is an efficient file format suitable for large-scale analytical queries - exactly what the likes of Databricks are meant to run. However, this does not explain from a technical perspective why Parquet is the de facto choice for these frameworks. Time to figure out what Parquet is, how it works, and tie that to when Parquet shines ☀️
Databricks CTO Matei Zaharia discusses generative AI
NextGenLakehouse have a great newsletter on Substack and their own YouTube channel. They recently had Databricks CTO Matei Zaharia on to discuss the Databricks platform and how Generative AI will make all of our lives that much easier 🙂