With managed tables, you can reduce your storage and compute costs thanks to predictive optimization and file list caching. Now is the right time to migrate external tables to managed ones, thanks to the new ALTER ... SET MANAGED functionality (a minimal sketch follows the links below).
Read more:
- h...
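For reference, the in-place conversion mentioned above looks roughly like this; the catalog, schema, and table names are placeholders:

```python
# Hedged sketch: convert a Unity Catalog external table to a managed one
# in place. Run from a Databricks notebook; the table name is hypothetical.
spark.sql("ALTER TABLE main.sales.orders SET MANAGED")
```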
Are you familiar with this scenario: your data team spends 80% of their time fixing infrastructure issues instead of extracting insights. In today’s data-driven world, organisations are drowning in data but starving for actionable insights. Traditiona...
Hi everyone, I’m leading an implementation where we’re comparing events from two real-time streams, a Source and a Target, in Databricks Structured Streaming (Scala). Our goal is to identify and emit “delta” differences between corresponding records ...
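Without the full details of the pipeline, one common shape for this is a time-bounded stream-stream join followed by a payload comparison. A minimal PySpark sketch (the original is in Scala; the topic names, key column, and 10-minute bound are assumptions):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

def read_stream(topic, prefix):
    # Placeholder Kafka source; any streaming source with a key,
    # a payload, and an event time works the same way.
    return (spark.readStream.format("kafka")
            .option("kafka.bootstrap.servers", "broker:9092")
            .option("subscribe", topic)
            .load()
            .selectExpr(f"CAST(key AS STRING) AS {prefix}_id",
                        f"CAST(value AS STRING) AS {prefix}_payload",
                        f"timestamp AS {prefix}_ts")
            .withWatermark(f"{prefix}_ts", "10 minutes"))

source = read_stream("source_topic", "src")
target = read_stream("target_topic", "tgt")

# Join corresponding records on the business key, bounded in event time so
# Spark can expire join state, then keep only rows whose payloads differ.
deltas = (source.join(
              target,
              F.expr("""src_id = tgt_id AND
                        tgt_ts BETWEEN src_ts - INTERVAL 10 MINUTES
                                   AND src_ts + INTERVAL 10 MINUTES"""),
              "inner")
          .where(F.col("src_payload") != F.col("tgt_payload")))

(deltas.writeStream.format("delta")
       .option("checkpointLocation", "/tmp/checkpoints/deltas")  # placeholder
       .toTable("main.monitoring.stream_deltas"))                # placeholder
```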
I have a notebook with a text widget where I want to be able to edit the value of the widget within the notebook and then reference it in SQL code. For example, assuming there is a text widget named Var1 that has input value "Hello", I would want to ...
It seems the only way to use parameters in a SQL code block is dbutils.widgets, and you cannot change those parameters without removing the widget and re-creating it in code.
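A minimal sketch of that workaround (the widget name comes from the question above; spark.sql's args parameter needs a recent runtime):

```python
# Create the text widget with a default value.
dbutils.widgets.text("Var1", "Hello")

# Read the current value and pass it to SQL as a bound parameter
# (spark.sql(..., args=...) requires PySpark 3.4+ / a recent DBR).
value = dbutils.widgets.get("Var1")
display(spark.sql("SELECT :v AS greeting", args={"v": value}))

# There is no API to update a widget's value in place; to change it from
# code, remove the widget and re-create it with the new default
# (preferably in a separate cell).
dbutils.widgets.remove("Var1")
dbutils.widgets.text("Var1", "Goodbye")
```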
Yes, the policy_id alone can specify the entire cluster configuration, since default and fixed values are inherited from the policy. Updating the runtime version for hundreds of jobs, for example, is much easier this way (see the sketch after the links below).
Read more:
- https://databrickster.med...
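As a hedged illustration, a Jobs API cluster spec that leans on a policy might look like this (the policy ID is a placeholder; apply_policy_default_values is the flag that pulls in the policy's defaults):

```python
# Minimal job-cluster spec for the Databricks Jobs API. Everything fixed or
# defaulted by the policy (spark_version, node types, autoscaling, tags...)
# is inherited, so bumping the runtime version in the policy updates every
# job that references it.
new_cluster = {
    "policy_id": "ABCD1234EF567890",      # placeholder policy ID
    "apply_policy_default_values": True,  # inherit the policy's defaults
    # Only fields the policy leaves open need to appear here.
}
```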
Now you can also define relationships directly in Genie, with options like “Many to One”, “One to Many”, “One to One”, and “Many to Many”.
Read more:
- https://databrickster.medium.com/relationship-in-databricks-genie-f8bf59a9b578
- https://www.su...
Real-Time Mode in Spark Streaming
Apache Spark™ Structured Streaming has been the backbone of mission-critical pipelines for years, from ETL to near real-time analytics and machine learning. Now, Databricks has introduced something game-changing: Real...
Over the years working as a data engineer, I’ve started to see my role very differently. In the beginning, most of my focus was on building pipelines—extracting, transforming, and loading data so it could land in the right place. Pipelines were the g...
@Brahmareddy thanks for this! I think you've nailed it on the head there. If the stakeholders trust the data and there's integrity, governance, and a single source of truth, you've got a recipe for a great product! Love this take, @Brahmareddy. Really...
Hey everybody, I've been dying to share this with the community. Over the last few weeks, I've been thinking about how I can do a Data Pull from the Community to highlight some of the cool stuff we all do! Below is a snippet of a visual from the Dat...
I'll aim to have the data for the challenge sorted and ready for next week! I want to strip out some of the columns and figure out where is best to host the data. Potentially I could have it on the Databricks Marketplace or GitHub. All the best, BS
Introduction
One of our ETL pipelines used to take 10 hours to complete. After tuning and scaling in Databricks, it finished in about 1 hour, a 90% reduction in runtime. That's the power of Spark tuning. Databricks, built on Apache Spark, is a po...
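The post is truncated here, so as a hedged sketch only: these are common first-line tuning knobs on Databricks, with illustrative values rather than the settings that produced the 90% speedup above:

```python
# Adaptive Query Execution re-optimizes plans at runtime from shuffle stats.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")

# Size shuffle parallelism to the data; "auto" is Databricks-specific.
spark.conf.set("spark.sql.shuffle.partitions", "auto")

# Let small dimension tables broadcast instead of shuffling the large side.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", str(64 * 1024 * 1024))
```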
This is a fantastic breakdown of Spark optimization techniques, @savlahanish27! Definitely helpful for anyone working on performance tuning in Databricks.
Hey Community! We have built something cool for Data Engineers in Databricks! Raw Files -> Semantic Model -> Data Products, without writing ETL/ELT code.
Demo/Guide: https://youtu.be/wjQYXrBwA-o
Notebook: https://github.com/Intugle/data-tools/blob/main...
Over the past few years working as a data engineer, I’ve seen how quickly companies are moving their platforms to Databricks and AWS. The flexibility and scale these platforms provide are amazing, but one challenge always comes up again and again: ho...
Hi @Brahmareddy, very good insights. I can summarize this as follows (a small sketch of the schema-enforcement point follows the list):
- Schema Management: Define schemas in JSON/YAML, enforce with Delta Lake
- Governance: Use Unity Catalog for access, lineage, and ownership
- Monitoring: Set up Lakeho...
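On the schema-management row, a tiny sketch of what Delta Lake enforcement buys you (the table name and columns are hypothetical):

```python
from pyspark.sql.types import StructType, StructField, LongType, StringType

# Declare the contract once (this could equally be loaded from JSON/YAML).
expected = StructType([
    StructField("id", LongType(), nullable=False),
    StructField("name", StringType(), nullable=True),
])

df = spark.createDataFrame([(1, "alice")], schema=expected)
df.write.format("delta").mode("append").saveAsTable("main.demo.users")

# A later append whose columns don't match the table schema fails with an
# AnalysisException instead of silently corrupting the table; widening the
# schema requires an explicit .option("mergeSchema", "true").
```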
Many enterprises today launch AI agents with high hopes, but more often than not those pilots never reach production. The culprits? Complexity, poor evaluations, ballooning costs, and governance gaps. Why do so many AI agent pilots never make it to pro...
To see all Databricks training and enablement offerings, please visit our Learning Library and Certifications Catalog.
To use your Databricks Academy Labs coupons, please:
1. Head to Databricks Academy
2. Across the top navigation, select Subscriptions C...
In today’s fast-evolving digital landscape, organizations are under immense pressure to modernize their data infrastructure for better scalability, agility, and advanced analytics. One of the most powerful shifts in recent times has been the migratio...