Data Engineering

Forum Posts

by DM0341 • Visitor

4 hours ago

27 Views
1 replies
0 kudos

SQL Stored Procedures - Notebook to always run the CREATE query

I have a stored procedure that is saved as a query file. I can run it and the proc is created. However, I want to take this one step further: I want my notebook to run the query file called sp_Remit.sql so that if there are any changes to the proc between t...

Latest Reply

mynameiskevin
Visitor

an hour ago

0 kudos

Something like this?

    import os

    query_name = "test_query.sql"
    query_path = os.path.abspath(query_name)

    # Read query contents
    with open(query_path, "r") as f:
        query_str = f.read()

    # Run it
    spark.sql(query_str)

You can read the script from the sql ...
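Building on that reply, a slightly more defensive sketch follows. It assumes sp_Remit.sql holds exactly one statement written as CREATE OR REPLACE PROCEDURE, so rerunning the notebook replaces the procedure with whatever is currently in the file. The file is passed whole rather than split on ";" because procedure bodies usually contain semicolons. `run_sql_file` is a hypothetical helper name, and the executor is injected so that in a notebook it is simply `spark.sql`.

```python
from pathlib import Path
from typing import Any, Callable


def run_sql_file(path: str, execute: Callable[[str], Any]) -> Any:
    """Read a .sql file and hand its full text to `execute` as a single statement.

    Assumption: the file contains one statement (for example a
    CREATE OR REPLACE PROCEDURE definition), so no splitting on ';' is done.
    """
    sql_text = Path(path).read_text(encoding="utf-8").strip()
    if not sql_text:
        raise ValueError(f"{path} is empty; nothing to run")
    return execute(sql_text)
```

In a notebook this would be called as `run_sql_file("sp_Remit.sql", spark.sql)`. A relative path is resolved against the current working directory, so if the notebook and the .sql file live in different folders, passing an absolute workspace path is the safer choice.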

by fundat • New Contributor II

2 hours ago

22 Views
1 replies
0 kudos

Course - Introduction to Apache Spark

Hi, In the course Introduction to Apache Spark, under Apache Spark Runtime Architecture (page 6 of 15), it says: "The cluster manager allocates resources and assigns tasks... Workers perform tasks assigned by the driver." Can you help me plea...

Latest Reply

BS_THE_ANALYST
Esteemed Contributor II

2 hours ago

0 kudos

Hi @fundat, perhaps the picture is useful here. Give this blog a read; I think it will answer some of your questions: https://medium.com/@knoldus/understanding-the-working-of-spark-driver-and-executor-4fec0e669399. All the best, BS
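To make the division of labour concrete, here is a toy model, not Spark's actual scheduler: the cluster manager only grants executors with a number of cores, while the driver splits a stage into one task per partition and decides which executor runs each task. All names are hypothetical, and round-robin placement is a deliberate simplification; the real driver-side scheduler also weighs data locality, retries failed tasks, and copes with executors being added or lost.

```python
from dataclasses import dataclass, field
from itertools import cycle
from typing import Dict, List


@dataclass
class Executor:
    executor_id: str
    cores: int
    tasks: List[int] = field(default_factory=list)


def cluster_manager_allocate(num_executors: int, cores_per_executor: int) -> List[Executor]:
    """Cluster manager's role in this model: hand out resources (executors) only."""
    return [Executor(f"exec-{i}", cores_per_executor) for i in range(num_executors)]


def driver_schedule(num_partitions: int, executors: List[Executor]) -> Dict[str, List[int]]:
    """Driver's role in this model: one task per partition, assigned round-robin over core slots."""
    slots = [e for e in executors for _ in range(e.cores)]  # one slot per core
    if not slots:
        raise ValueError("no executor cores allocated")
    for task_id, executor in zip(range(num_partitions), cycle(slots)):
        executor.tasks.append(task_id)
    return {e.executor_id: e.tasks for e in executors}
```

For example, `driver_schedule(6, cluster_manager_allocate(2, 2))` gives `{"exec-0": [0, 1, 4, 5], "exec-1": [2, 3]}`: the cluster manager never sees individual tasks, which is the point the course slide blurs.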

by dhruvs2 • Visitor

5 hours ago

26 Views
1 replies
1 kudos

How to trigger a Databricks job only after multiple other jobs have completed

We have a use case where Job C should start only after both Job A and Job B have successfully completed. In Airflow, we achieve this using an ExternalTaskSensor to set dependencies across different DAGs. Is there a way to configure something similar in...

Latest Reply

BS_THE_ANALYST
Esteemed Contributor II

2 hours ago

1 kudos

Hey @dhruvs2, you could use Lakeflow Jobs for this. You can add a job as a task. Then you can just follow the docs from here: https://docs.databricks.com/aws/en/jobs/; there are loads of great sections and tutorials. To answer your specific question: When con...
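In Lakeflow Jobs this fan-in pattern can be expressed inside one orchestrating job: one "Run Job" task per upstream job and a final task that depends on both. The sketch below builds such a definition as a Python dict in the shape of the Jobs API 2.1 `jobs/create` payload (`run_job_task`, `depends_on`, `run_if`). The job name and the three job IDs are placeholders, and `execution_order` is a hypothetical helper that topologically sorts the tasks to confirm C can only start after A and B.

```python
from typing import Dict, List, Set


def build_fan_in_job(job_a_id: int, job_b_id: int, job_c_id: int) -> dict:
    """Jobs API 2.1-style payload: run C only after both A and B succeed."""
    return {
        "name": "orchestrate_a_b_then_c",  # hypothetical name
        "tasks": [
            {"task_key": "run_job_a", "run_job_task": {"job_id": job_a_id}},
            {"task_key": "run_job_b", "run_job_task": {"job_id": job_b_id}},
            {
                "task_key": "run_job_c",
                "run_job_task": {"job_id": job_c_id},
                # C waits for both upstream tasks; ALL_SUCCESS is also the default.
                "depends_on": [{"task_key": "run_job_a"}, {"task_key": "run_job_b"}],
                "run_if": "ALL_SUCCESS",
            },
        ],
    }


def execution_order(job: dict) -> List[str]:
    """Topologically sort task keys by depends_on (Kahn's algorithm)."""
    deps: Dict[str, Set[str]] = {
        t["task_key"]: {d["task_key"] for d in t.get("depends_on", [])}
        for t in job["tasks"]
    }
    ready = sorted(k for k, d in deps.items() if not d)
    order: List[str] = []
    while ready:
        key = ready.pop(0)
        order.append(key)
        for other, pending in deps.items():
            if key in pending:
                pending.remove(key)
                if not pending:
                    ready.append(other)
    if len(order) != len(deps):
        raise ValueError("depends_on contains a cycle")
    return order
```

`execution_order(build_fan_in_job(1, 2, 3))` returns `["run_job_a", "run_job_b", "run_job_c"]`. Submitting the payload itself is left out; the Jobs API reference lists the exact fields your workspace accepts.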

by crami • New Contributor

2 hours ago

14 Views
0 replies
0 kudos

Declarative Pipeline: Can a pipeline or job be deployed with run_as using an asset bundle?

Hi, I have a very interesting scenario. I am trying to use Declarative Pipelines for the first time. The platform team manages workspace artefacts through DevOps-based deployment (infrastructure as code), meaning I cannot create compute. I have to create compute with ...

by Sakthi0311 • Visitor

8 hours ago

36 Views
2 replies
0 kudos

How to enable Liquid Clustering on an existing Delta Live Table (DLT) and syntax for enabling it

Hi all, I’m working with Delta Live Tables (DLT) and want to enable Liquid Clustering on an existing DLT table that was already created without it. Could someone please clarify: How can I enable Liquid Clustering on an existing DLT table (without recre...

Latest Reply

szymon_dybczak
Esteemed Contributor III

7 hours ago

0 kudos

Hi @Sakthi0311, in SQL you can enable Liquid Clustering (LC) for materialized views and streaming tables with a CLUSTER BY clause in the table definition. If you want to use automatic clustering, use CLUSTER BY AUTO.
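The statement shapes described in that reply look roughly like the ones this small Python helper assembles, so the explicit-column and AUTO variants can be compared side by side. It is a sketch with hypothetical table, column, and helper names; check the current documentation for the exact clause order and for whether an existing table picks up new clustering on the next update or needs a full refresh.

```python
from typing import Optional, Sequence


def clustered_table_ddl(
    name: str,
    query: str,
    cluster_cols: Optional[Sequence[str]] = None,
    kind: str = "STREAMING TABLE",
) -> str:
    """Build CREATE OR REFRESH ... with CLUSTER BY (cols), or CLUSTER BY AUTO if no columns."""
    if kind not in ("STREAMING TABLE", "MATERIALIZED VIEW"):
        raise ValueError(f"unsupported kind: {kind}")
    if cluster_cols:
        clause = "CLUSTER BY (" + ", ".join(cluster_cols) + ")"
    else:
        clause = "CLUSTER BY AUTO"  # let Databricks choose the clustering keys
    return f"CREATE OR REFRESH {kind} {name}\n{clause}\nAS {query}"
```

For example, `clustered_table_ddl("sales_st", "SELECT * FROM STREAM(raw.sales)", ["customer_id", "order_date"])` produces a streaming table definition clustered on those two columns.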

by excavator-matt • New Contributor III

3 weeks ago

439 Views
6 replies
2 kudos

Resolved! How do I use a Databricks Lakeflow Declarative Pipeline on AWS DMS data?

Hi! I am trying to replicate an AWS RDS PostgreSQL database in Databricks. I have successfully managed to enable CDC using AWS DMS, which writes an initial load file and continuous CDC files in Parquet. I have been trying to follow the official guide Repl...

AUTO CDC

AWS DMS

declarative pipelines

LakeFlow

Latest Reply

mmayorga
Databricks Employee

7 hours ago

2 kudos

Hey @excavator-matt, let's remember that the Bronze layer is for raw ingestion only; it provides a baseline for auditing and for applying transformations based on the different use cases you need to serve. Systems and their requirements change...
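For the step after Bronze, what the AUTO CDC API (earlier called APPLY CHANGES) does with DMS output is, conceptually, keep the newest row per key and drop keys whose latest change is a delete (SCD type 1). The pure-Python sketch below models only that logic; it is not the pipeline API. It assumes the DMS `Op` column (I/U/D) in CDC files, its absence in the full-load file unless the S3 endpoint's IncludeOpForFullLoad setting is enabled, and a sequencing column, here a hypothetical `ts` such as a commit timestamp configured on the DMS task.

```python
from typing import Any, Dict, Iterable


def apply_dms_changes(
    rows: Iterable[Dict[str, Any]],
    key: str,
    sequence_col: str,
    op_col: str = "Op",
) -> Dict[Any, Dict[str, Any]]:
    """Collapse DMS-style change rows into the latest state per key (SCD type 1).

    Full-load rows without an Op column are treated as inserts. The newest row
    per key (by sequence_col) wins; if that row has Op == "D", the key is dropped.
    """
    latest: Dict[Any, Dict[str, Any]] = {}
    for row in sorted(rows, key=lambda r: r[sequence_col]):
        latest[row[key]] = row  # later sequence values overwrite earlier ones
    return {
        k: {c: v for c, v in r.items() if c != op_col}  # strip the Op column
        for k, r in latest.items()
        if r.get(op_col, "I") != "D"
    }
```

Sorting by the sequencing column before collapsing is what makes out-of-order files harmless, which is also why AUTO CDC asks for a sequence column.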

by Adam_Borlase • New Contributor III

10 hours ago

37 Views
4 replies
3 kudos

Resolved! Quota Limit Exhausted Error when Creating Data Ingestion with SQL Server Connector (Azure)

Good day all, I am having an issue with our first data ingestion pipeline. I want to connect to our Azure SQL Server with our Unity Connector (I can access the data in Unity Catalog). When I am on Step 3 of the process (Source), when it is scann...

Latest Reply

Adam_Borlase
New Contributor III

8 hours ago

3 kudos

Thank you for all of your assistance!

by ghofigjong • New Contributor

02-27-2023 12:29:55 AM

11120 Views
5 replies
3 kudos

Resolved! How does partition pruning work on a merge into statement?

I have a Delta table that is partitioned by Year, Date and month. I'm trying to merge data into it on all three partition columns plus an extra column (an ID). My merge statement is below: MERGE INTO delta.<path of delta table> oldData using df newData ...

Latest Reply

Umesh_S
New Contributor II

03-30-2023 1:24:57 PM

3 kudos

Isn't the suggested idea only filtering the input DataFrame (resulting in a smaller amount of data to match across the whole Delta table) rather than pruning the Delta table to the relevant partitions to scan?

by yit • Contributor III

9 hours ago

31 Views
3 replies
2 kudos

Does Auto Loader support loading PDF files?

I need to process PDF files that have already been ingested. Based on the documentation, Auto Loader does not support PDFs, or am I missing something? Also, I've found this sparkPDF library in other discussions in the community, but from what I see it's only for bat...

Latest Reply

yit
Contributor III

9 hours ago

2 kudos

Any suggestions on how to handle PDFs? @szymon_dybczak
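One route worth checking: Auto Loader can ingest arbitrary files, PDFs included, through the `binaryFile` format, which yields one row per file with `path`, `modificationTime`, `length` and `content` columns. Parsing the bytes is then a separate step with a PDF library of your choice. The sketch below only builds the reader options and a cheap header check on `content`; both helper names are hypothetical.

```python
from typing import Dict

PDF_MAGIC = b"%PDF-"  # every PDF file starts with this header


def pdf_autoloader_options() -> Dict[str, str]:
    """Options for reading PDFs as raw bytes with Auto Loader's binaryFile format."""
    return {
        "cloudFiles.format": "binaryFile",
        "pathGlobFilter": "*.pdf",  # skip non-PDF files in the landing folder
    }


def looks_like_pdf(content: bytes) -> bool:
    """Sanity check on a `content` value before handing it to a PDF parser."""
    return content.startswith(PDF_MAGIC)
```

Usage would look like `spark.readStream.format("cloudFiles").options(**pdf_autoloader_options()).load("<landing path>")`, followed by whatever parsing step suits your documents (for example inside `foreachBatch`).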

by Filip • New Contributor II

08-22-2024 3:31:47 AM

6752 Views
7 replies
0 kudos

How to Assign a User-Assigned Managed Identity to a DBR Cluster so I Can Use It for Querying ADLSv2?

Hi, I'm trying to figure out if we can switch from Entra ID SPNs to user-assigned managed identities. Everything works except that I can't figure out how to access the lake files from a Python notebook. I've tried the code below and was running it on a ...

Latest Reply

Coffee77
Contributor

9 hours ago

0 kudos

Besides, this only works on dedicated clusters, not on shared ones. Why? No idea at all. In the latter case, IMDS (Instance Metadata Service), which Azure uses to inject the token endpoint inside resources as a unique, secure and valid channel to get tokens ...
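For reference, the open-source ABFS driver (hadoop-azure) exposes managed-identity authentication through `MsiTokenProvider`, which fetches tokens from IMDS; the optional client ID selects a user-assigned identity. The sketch below only builds those per-storage-account Spark settings. Whether a Databricks notebook can reach IMDS at all depends on the compute access mode, as noted above, and Unity Catalog storage credentials backed by an Access Connector are the documented Databricks route, so treat this as an assumption to test rather than a supported recipe. The helper name and the account and ID values are hypothetical.

```python
from typing import Dict, Optional


def abfs_msi_conf(
    storage_account: str,
    client_id: Optional[str] = None,
    tenant_id: Optional[str] = None,
) -> Dict[str, str]:
    """Per-account ABFS settings for managed-identity (IMDS) OAuth."""
    suffix = f"{storage_account}.dfs.core.windows.net"
    conf = {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.MsiTokenProvider",
    }
    if client_id:  # selects which user-assigned identity to use
        conf[f"fs.azure.account.oauth2.client.id.{suffix}"] = client_id
    if tenant_id:  # optional per the hadoop-azure docs
        conf[f"fs.azure.account.oauth2.msi.tenant.{suffix}"] = tenant_id
    return conf
```

On a cluster you would apply it with `for k, v in abfs_msi_conf("<account>", client_id="<uami-client-id>").items(): spark.conf.set(k, v)` and then read an `abfss://` path.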

by pooja_bhumandla • New Contributor II

9 hours ago

11 Views
0 replies
0 kudos

When to Use and When Not to Use Liquid Clustering?

Hi everyone, I’m looking for some practical guidance and experiences around when to choose Liquid Clustering versus sticking with traditional partitioning + Z-ordering. From what I’ve gathered so far: For small tables (<10 TB), Liquid Clustering gives s...

by Phani1 • Valued Contributor II

05-11-2025 10:32:57 PM

2528 Views
1 replies
0 kudos

Integrating Genie with Streamlit

Hi all, What are the best practices to follow when integrating Genie with Streamlit, and are there any limitations? What is the best way to present it at the UI level from the user's perspective? Regards, Phani

Latest Reply

AbhaySingh
Databricks Employee

11 hours ago

0 kudos

This should help you get started. Please let us know if you have any specific questions after you've looked at the links below. https://blog.streamlit.io/best-practices-for-building-genai-apps-with-streamlit/ https://databrickster.medium.com/call-genie-...

by AkhileshVB • New Contributor

06-27-2025 9:23:46 PM

2268 Views
2 replies
1 kudos

Resolved! Syncing Lakebase table to Delta table

I have been exploring Lakebase and I wanted to know if there is a way to sync CDC data from Lakebase tables to Delta tables in the Lakehouse. I know the other direction is possible, and that's what was shown in the demo. Can you tell me how I can sync both the ta...

Latest Reply

Malthe
Contributor II

11 hours ago

1 kudos

Just wanted to mention that the preview of ETL from Lakebase to Delta tables is described here: https://www.databricks.com/blog/how-use-lakebase-transactional-data-layer-databricks-apps

by ChrisLawford_n1 • Contributor

12 hours ago

14 Views
0 replies
0 kudos

Network error on subsequent runs using serverless compute in DLT

Hello, When running on serverless compute in DLT, our notebook first tries to install some Python wheels onto the cluster. We have noticed that, in development, when we run a pipeline many times in a short space of time, the pi...

by julius_bkr • Visitor

13 hours ago

48 Views
3 replies
3 kudos

Hive Metastore End of Life

Hello everyone, Is there a rough date on which the Hive Metastore will be deactivated? In the end, I am asking again a question that was already asked 2 years ago: "Solved: Hive metastore table access control End of Support - Databricks Community - 50487". We...

Latest Reply

Abhishek_Patel
New Contributor

12 hours ago

3 kudos

Hi @julius_bkr, I do not think Databricks has any plans to retire the Hive Metastore (HMS) completely. However, Unity Catalog is the strategic direction and the recommended approach for new deployments and for migrating existing data governance. Databricks is investing heavil...
