Introduction to Lakeflow
At the Databricks Data + AI Summit 2025, Databricks unveiled Lakeflow, a revolutionary approach to data engineering. While many of us have used Delta Live Tables (DLT) for declarative pipeline management, Lakeflow goes beyond,...
Databricks LLM Evolution and Future Prospects
Databricks has progressed from a big-data compute engine to a full-stack AI powerhouse that designs, trains, and serves state-of-the-art large language models (LLMs). This article explores two key technica...
Thanks, @RiyazAliM, for checking out the blog post! More insights on Databricks LLM and Dolly are on the way in the next one. Stay tuned and keep learning!
Best,
Ayush
As organizations increasingly migrate from legacy platforms—like on-prem SQL Server, Oracle Exadata, Teradata, Informatica, Cloudera, or Netezza—to modern cloud architectures, one critical question often arises: "Are we just lifting and shifting the s...
Enforce schema consistency using declarative contracts on the Databricks Lakehouse.
Industrial AI is transforming how operations are optimized, from forecasting equipment failure to streamlining supply chains. But even the most advanced models are only as...
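The post is truncated above, but to illustrate the idea, here is a minimal sketch of a declarative contract checked before a Delta write; the path, table name, and contract fields are all hypothetical:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.getOrCreate()

# Hypothetical contract: the schema the downstream model expects, declared once.
contract = StructType([
    StructField("sensor_id", StringType()),
    StructField("reading", DoubleType()),
    StructField("event_time", TimestampType()),
])

incoming = spark.read.json("/Volumes/demo/raw/sensor_events")  # hypothetical path

# Fail fast on drift: compare column names and types against the contract.
actual = {(f.name, f.dataType) for f in incoming.schema.fields}
expected = {(f.name, f.dataType) for f in contract.fields}
if actual != expected:
    raise ValueError(f"Schema drift: got {sorted(map(str, actual))}, expected {sorted(map(str, expected))}")

# Delta also enforces the target table's schema on write by default.
incoming.write.format("delta").mode("append").saveAsTable("demo.silver.sensor_events")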
Hello. Our Databricks workspace is on Azure. We are trying to connect to AWS S3 as an external source from Unity Catalog. We have followed all the steps given here; is there anything additional required? https://docs.databricks.com/aws/en/connect/unity-catalog/clou...
Hi @gdschld, what ID have you used here: "sts:ExternalId": "<STORAGE-CREDENTIAL-EXTERNAL-ID>"? I haven't done this for some time and got a bit confused with this STORAGE-CREDENTIAL-EXTERNAL-ID. I used to put the Databricks Account ID there. I found this, it ...
Unity Catalog system tables provide a wealth of metadata and log data related to the operations of Databricks. System tables are organized into separate schemas, each containing one to a few tables owned and updated by Databricks. The storage and the cost of the...
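For example, in a notebook the audit log can be queried directly (system.access.audit is one of the documented system tables; it must be enabled in your account for the query to return rows):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Recent actions from the Unity Catalog audit system table.
recent_audit = spark.sql("""
    SELECT event_time, user_identity.email AS user, action_name
    FROM system.access.audit
    WHERE event_date >= current_date() - INTERVAL 7 DAYS
    ORDER BY event_time DESC
    LIMIT 100
""")
recent_audit.show(truncate=False)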
It's in the Databricks CLI Unity Catalog section: Databricks CLI commands | Databricks Documentation
metastores: commands to manage metastores, which are the top-level container of objects in Unity Catalog: assign, create, current, delete, get, list, summ...
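For reference, the same metastore operations are also exposed in the Databricks Python SDK, so a quick check from a notebook might look like this (a sketch, assuming the databricks-sdk package is installed and authentication is configured):

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# SDK equivalents of `databricks metastores current` and `databricks metastores summary`.
print(w.metastores.current())   # metastore assigned to this workspace
print(w.metastores.summary())   # details of the current metastore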
How AI-powered development accelerated my data engineering workflow
Watch the complete development process (YouTube video): see the entire 30-minute development session. This is a screen recording without voice narration showing the complete development ...
I use the Databricks extension in VS Code for all my work. Is there any way for me to add a cell title from the extension itself? There is no point in adding it in the server version of this notebook because when I sync the local to server, it will overwr...
One needs to use # DBTITLE 1,cell_title in a .py file, placed right after the # COMMAND ---------- cell separator:

# COMMAND ----------

# DBTITLE 1,Title 1
from pyspark.sql import SparkSession   # Spark entry point
from delta.tables import DeltaTable    # programmatic Delta table API
from pyspark.sql.functions import *    # Spark SQL column functions
The Databricks SDK is a library (written in Python, in our case) which lets you control and automate actions on Databricks using the methods available in the WorkspaceClient (more about this below). Why do we need the Databricks SDK?
- Automation: You can d...
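As a minimal sketch of that pattern (listing clusters is just an illustrative call; any WorkspaceClient method works the same way, assuming the databricks-sdk package is installed and authentication is configured):

from databricks.sdk import WorkspaceClient

# WorkspaceClient resolves credentials from environment variables,
# a .databrickscfg profile, or explicit host/token arguments.
w = WorkspaceClient()

# Example automation: print every cluster in the workspace and its state.
for cluster in w.clusters.list():
    print(cluster.cluster_name, cluster.state)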
Missed the Apache Spark 4.0 release? It is not just a version bump; it is a whole new level for big data processing. Some of the highlights that really stood out to me:
1. SQL just got way more powerful: reusable UDFs, scripting, session variables, an...
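For a taste of the session variables (a sketch of the new SQL syntax as I understand it from the Spark 4.0 SQL reference; requires a Spark 4.0 session):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Declare a session variable, update it, then reference it by name.
spark.sql("DECLARE VARIABLE threshold INT DEFAULT 100")
spark.sql("SET VAR threshold = 250")
spark.sql("SELECT threshold AS current_threshold").show()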
Yeah, Spark 4.0 brings powerful enhancements while staying compatible with existing workloads.Thank you for putting this together and highlighting the key updates, @ilir_nuredini.
Just got out of a session on Data Modeling using the Data Vault paradigm. Highly recommended to help think through complex data design. Look out for Data Modeling 101 for Data Lakehouse Demystified by Luan Medeiros.
Why Should You Use Databricks Asset Bundles (DABs)?
Without proper tooling, Data Engineering and Machine Learning projects can quickly become messy. That is why we recommend leveraging DABs to solve these common challenges:
1. Collaboration: Without stru...
In today’s data-driven world, the role of a data engineer is critical in designing and maintaining the infrastructure that allows for the efficient collection, storage, and analysis of large volumes of data. Databricks certifications hold significan...
When I first started handling schema management in Databricks, I realized that a little bit of planning could save me a lot of headaches down the road. Here’s what I’ve learned and some simple tips that helped me manage schema changes effectively. On...
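The post is cut off here, but one concrete habit worth showing is opting in to schema evolution explicitly rather than letting it happen by accident (a sketch with hypothetical table and path names):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

new_batch = spark.read.parquet("/tmp/new_batch")  # hypothetical source with an extra column

# Without mergeSchema, Delta rejects writes whose columns don't match the table,
# surfacing unplanned schema drift instead of silently absorbing it.
(new_batch.write
    .format("delta")
    .mode("append")
    .option("mergeSchema", "true")  # allow the new columns for this write only
    .saveAsTable("demo.silver.orders"))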
Question: I have a role called dev-dataengineer with the following privileges on the catalog dap_catalog_dev: APPLY TAG, CREATE FUNCTION, CREATE MATERIALIZED VIEW, CREATE TABLE, CREATE VOLUME, EXECUTE, READ VOLUME, REFRESH, SELECT, USE SCHEMA, WRITE VOLUME. Despite this, u...
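The question is truncated here, but a first diagnostic step for privilege issues like this is to inspect the effective grants; note that the list above does not include USE CATALOG, which Unity Catalog requires on the catalog itself before any child object can be reached. A quick check from a notebook (the schema name is hypothetical):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Standard Unity Catalog SQL for inspecting effective grants.
spark.sql("SHOW GRANTS ON CATALOG dap_catalog_dev").show(truncate=False)
spark.sql("SHOW GRANTS ON SCHEMA dap_catalog_dev.some_schema").show(truncate=False)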
Managing assets in UC always comes with maintenance overhead. We keep these access controls in Terraform code, and it is hard to see what level of access has been given to different personas in the org. We are building an audit dashboard for it.
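For a dashboard like that, the privilege views under system.information_schema are one possible source (a sketch; check which views your workspace exposes):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# One query behind an access-audit dashboard: distinct grantees per privilege type.
spark.sql("""
    SELECT privilege_type, COUNT(DISTINCT grantee) AS grantees
    FROM system.information_schema.table_privileges
    GROUP BY privilege_type
    ORDER BY grantees DESC
""").show()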