Dive into a collaborative space where members like YOU can exchange knowledge, tips, and best practices. Join the conversation today and unlock a wealth of collective wisdom to enhance your experience and drive success.
Why Databricks is the Future of Data Analytics for Gen Z
In the fast-paced world of data analytics, staying ahead of the curve is crucial. For Gen Z, who are digital natives and always on the lookout for the latest tech trends, understanding the diffe...
Lakehouse Federation - Databricks
In the world of data, innovation is constant. And the most recent revolution comes with Lakehouse Federation, a fusion between data lakes and data warehouses, taking data manipulation to a new level. This advancement...
Hey, quick question: can we use it in production? Our application server is SQL Server, and we are planning to use Lakehouse Federation so we can avoid creating and maintaining 100s of workflows. As we have a small dataset, I am not too sure o...
Excited to share my latest publication on arXiv! “Hub Star Modeling 2.0 for Medallion Architecture” https://arxiv.org/abs/2504.08788 This new version builds on the original Hub Star Modeling approach, published last year, and is now tailored for the Meda...
When I first got into managing schemas in Databricks, it took me a while to realize that putting in a little planning up front could save me a ton of headaches later on. I was working with these deeply nested, constantly changing JSON files. At first,...
Great tip @genevive_mdonça! schemaHints help avoid issues with evolving JSON data, making data processing more reliable and easier to maintain. Thanks for sharing.
In Spark, data skew can be the silent killer of performance. One wide partition pulling in 90% of the data? But even with AQE (Adaptive Query Execution) turned on in Databricks, skewness isn't always automatically identified, and here’s why. What Is co...
One solution for [FAILED_READ_FILE.NO_HINT] Error while reading file, when display() or SELECT
I got stuck with the above error when using `spark.read.table().display()` or directly querying the table using %sql. While the display method is just one...
Hi everyone! I’ve just released an open-source tool that generates a semantic layer in Databricks notebooks from a Power BI dataset using the Power BI REST API. I’m not an expert yet, but it gets the job done, and instead of using AtScale, dbt, or the PBI Sem...
Here is how I trained a lightweight Convolutional Neural Network (CNN) to detect pneumonia from chest X-ray images on Azure Databricks. I promise no LLMs, no hype, just real-world deep learning:
1. Built it with TensorFlow & Keras on Databricks
2...
Hey Databricks community, I wanted to take a moment to share some things I’ve learned while working with Databricks in real projects, especially around schema management, Unity Catalog, Auto Loader, and streaming jobs. These are the kinds of small detai...
Hi all, I have data which looks like this: High Corona40% 50cl Pm £13.29, but when saving it as a CSV it is getting converted into High Corona40% 50cl Pm £13.29 wherever we have the pound sign. One thing to note here is while displaying the data i...
Hey folks, ever notice how a query that used to run super fast suddenly starts dragging? We’ve all been there. As data grows, those little inefficiencies in your SQL start showing up, and they show up hard. That’s where something cool comes in: using...
In today’s data-driven world, the success of any business use case relies heavily on trust in the data. This trust is built upon key pillars such as data accuracy, consistency, freshness, and overall quality. When organizations release data into prod...
Data Engineering has come a long way. From the days of manual ETL scripts to the modern world of automated, AI-driven data pipelines, the evolution has been nothing short of fascinating. As a data engineer working across various platforms, I’ve seen ...
Managing complex, embedded workflows efficiently is a key challenge for enterprise architects. As organizations scale their data ecosystems, optimizing resource allocation becomes crucial. Databricks Cluster Pools offer a strategic solution to minimi...
We successfully migrated a client’s MySQL databases to Databricks using a dual approach that maintained 100% data integrity while enabling real-time analytics. After struggling with batch-based updates and analytics delays, we implemented:
- One-time historica...