Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

DM0341
by Visitor
  • 27 Views
  • 1 replies
  • 0 kudos

SQL Stored Procedures - Notebook to always run the CREATE query

I have a stored procedure that is saved as a query file. I can run it and the proc is created. However, I want to take this one step further: I want my notebook to run the query file called sp_Remit.sql, so if there are any changes to the proc between t...

Latest Reply
mynameiskevin
  • 0 kudos

Something like this?

    import os

    query_name = "test_query.sql"
    query_path = os.path.abspath(query_name)

    # Read query contents
    with open(query_path, "r") as f:
        query_str = f.read()

    # Run it
    spark.sql(query_str)

You can read the script from the sql ...

fundat
by New Contributor II
  • 22 Views
  • 1 replies
  • 0 kudos

Course - Introduction to Apache Spark

Hi, In the course Introduction to Apache Spark, under Apache Spark Runtime Architecture (page 6 of 15), it says: "The cluster manager allocates resources and assigns tasks..." and "...Workers perform tasks assigned by the driver". Can you help me plea...

Latest Reply
BS_THE_ANALYST
Esteemed Contributor II
  • 0 kudos

Hi @fundat, perhaps the picture is useful here. Give this blog a read, I think it will answer some of your questions: https://medium.com/@knoldus/understanding-the-working-of-spark-driver-and-executor-4fec0e669399

All the best, BS

dhruvs2
by Visitor
  • 26 Views
  • 1 replies
  • 1 kudos

How to trigger a Databricks job only after multiple other jobs have completed

We have a use case where Job C should start only after both Job A and Job B have successfully completed. In Airflow, we achieve this using an ExternalTaskSensor to set dependencies across different DAGs. Is there a way to configure something similar in...

Latest Reply
BS_THE_ANALYST
Esteemed Contributor II
  • 1 kudos

Hey @dhruvs2, you could use Lakeflow Jobs for this. You can add a job as a task. Then you can just follow the docs from here: https://docs.databricks.com/aws/en/jobs/ (there are loads of great sections and tutorials). To answer your specific question: When con...
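As a rough sketch of the "job as a task" idea: an orchestrator job can hold two run-job tasks for Job A and Job B, plus a third task for Job C that depends on both. The payload below follows the shape of the Jobs API task settings (`run_job_task`, `depends_on`, `run_if`). The job IDs, task keys, job name and notebook path are hypothetical placeholders, not values from the thread. Note that, unlike an Airflow ExternalTaskSensor, the orchestrator triggers A and B itself rather than waiting on their independent schedules.

```python
import json

# Hypothetical job IDs for the existing Job A and Job B; replace with real ones.
JOB_A_ID = 111
JOB_B_ID = 222


def build_orchestrator_job(job_a_id: int, job_b_id: int, notebook_path: str) -> dict:
    """Build a job definition in which task C runs only after A and B succeed."""
    return {
        "name": "orchestrate_a_b_then_c",  # hypothetical job name
        "tasks": [
            # Each run-job task triggers an existing job by its ID.
            {"task_key": "run_job_a", "run_job_task": {"job_id": job_a_id}},
            {"task_key": "run_job_b", "run_job_task": {"job_id": job_b_id}},
            {
                "task_key": "job_c",
                # C waits for both upstream tasks and requires both to succeed.
                "depends_on": [
                    {"task_key": "run_job_a"},
                    {"task_key": "run_job_b"},
                ],
                "run_if": "ALL_SUCCESS",
                "notebook_task": {"notebook_path": notebook_path},
            },
        ],
    }


payload = build_orchestrator_job(JOB_A_ID, JOB_B_ID, "/Workspace/jobs/job_c")
print(json.dumps(payload, indent=2))
```

If Job C already exists as its own job, the last task could itself be a `run_job_task` pointing at Job C instead of a notebook task.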

Sakthi0311
by Visitor
  • 36 Views
  • 2 replies
  • 0 kudos

How to enable Liquid Clustering on an existing Delta Live Table (DLT) and syntax for enabling it

Hi all, I’m working with Delta Live Tables (DLT) and want to enable Liquid Clustering on an existing DLT table that was already created without it. Could someone please clarify: How can I enable Liquid Clustering on an existing DLT table (without recre...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @Sakthi0311, in SQL you can enable Liquid Clustering (LC) for materialized views and streaming tables. So the syntax looks like the following:

If you want to use automatic clustering, then use CLUSTER BY AUTO.
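The snippet from the reply did not survive in this listing, so here is a minimal sketch of what such a statement could look like, assuming the `CLUSTER BY` clause sits between the dataset name and `AS` in a `CREATE OR REFRESH` statement. The helper only builds the SQL text. The table names, columns and queries are hypothetical, and the exact clause placement should be checked against the current Databricks SQL reference.

```python
from typing import Optional, Sequence


def clustered_dataset_sql(
    kind: str,
    name: str,
    select_sql: str,
    cluster_cols: Optional[Sequence[str]] = None,
) -> str:
    """Return a CREATE OR REFRESH statement with a liquid clustering clause.

    With no cluster_cols, automatic clustering (CLUSTER BY AUTO) is requested.
    """
    if kind not in ("STREAMING TABLE", "MATERIALIZED VIEW"):
        raise ValueError("kind must be 'STREAMING TABLE' or 'MATERIALIZED VIEW'")
    if cluster_cols:
        clause = "CLUSTER BY (" + ", ".join(cluster_cols) + ")"
    else:
        clause = "CLUSTER BY AUTO"
    return f"CREATE OR REFRESH {kind} {name}\n{clause}\nAS {select_sql}"


# Hypothetical dataset names and source queries, for illustration only.
explicit = clustered_dataset_sql(
    "STREAMING TABLE",
    "events_clustered",
    "SELECT * FROM STREAM(raw.events)",
    cluster_cols=["event_date", "customer_id"],
)
automatic = clustered_dataset_sql(
    "MATERIALIZED VIEW",
    "daily_totals",
    "SELECT event_date, count(*) AS n FROM raw.events GROUP BY event_date",
)
print(explicit)
print(automatic)
```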

1 More Replies
excavator-matt
by New Contributor III
  • 439 Views
  • 6 replies
  • 2 kudos

Resolved! How do I use Databricks Lakeflow Declarative Pipeline on AWS DMS data?

Hi! I am trying to replicate an AWS RDS PostgreSQL database in Databricks. I have successfully managed to enable CDC using AWS DMS, which writes an initial load file and continuous CDC files in Parquet. I have been trying to follow the official guide Repl...

Data Engineering
AUTO CDC
AWS DMS
declarative pipelines
LakeFlow
Latest Reply
mmayorga
Databricks Employee
  • 2 kudos

Hey @excavator-matt, let's remember that the Bronze layer is for raw ingestion only; this provides a baseline for auditing and for applying transformations based on the different use cases you need to serve. Systems and their requirements change...

5 More Replies
Adam_Borlase
by New Contributor III
  • 37 Views
  • 4 replies
  • 3 kudos

Resolved! Quota Limit Exhausted Error when Creating Data Ingestion with SQL Server Connector (Azure)

Good day all, I am having an issue with our first data ingestion pipelines. I want to connect to our Azure SQL Server with our Unity connector (I can access the data in Unity Catalog). When I am on Step 3 of the process (Source), when it is scann...

Latest Reply
Adam_Borlase
New Contributor III
  • 3 kudos

Thank you for all of your assistance!

3 More Replies
ghofigjong
by New Contributor
  • 11120 Views
  • 5 replies
  • 3 kudos

Resolved! How does partition pruning work on a merge into statement?

I have a Delta table that is partitioned by Year, Date and month. I'm trying to merge data into it on all three partition columns plus an extra column (an ID). My merge statement is below: MERGE INTO delta.<path of delta table> oldData using df newData ...

Latest Reply
Umesh_S
New Contributor II
  • 3 kudos

Isn't the suggested idea only filtering the input DataFrame (resulting in a smaller amount of data to match across the whole Delta table), rather than pruning the Delta table to the relevant partitions to scan?
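One commonly used way to address this concern is to put literal predicates on the target's partition columns directly into the merge condition, so the target side can be pruned rather than only the source being filtered. The sketch below just builds such a condition string. It reuses the `oldData` and `newData` aliases and the Year, month and ID column names from the question, while the partition values are hypothetical and would normally be collected from the source DataFrame first.

```python
from typing import Dict, Sequence


def sql_literal(value) -> str:
    """Render a Python value as a SQL literal (numbers bare, strings quoted)."""
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def build_merge_condition(
    partition_values: Dict[str, Sequence],
    key_cols: Sequence[str],
    target: str = "oldData",
    source: str = "newData",
) -> str:
    """Build an ON clause with literal partition filters on the target.

    Values within one column are assumed to share a type so they can be sorted.
    """
    parts = []
    # Literal IN-lists on the target's partition columns enable pruning.
    for col, values in partition_values.items():
        literals = ", ".join(sql_literal(v) for v in sorted(set(values)))
        parts.append(f"{target}.{col} IN ({literals})")
    # Regular equality matches on partition columns and the extra key column(s).
    for col in list(partition_values) + list(key_cols):
        parts.append(f"{target}.{col} = {source}.{col}")
    return " AND ".join(parts)


# Hypothetical partition values found in the incoming batch.
condition = build_merge_condition({"Year": [2023], "month": [5, 6, 5]}, ["ID"])
print(condition)
```

The resulting string can be used as the `ON` clause of the `MERGE INTO` statement. Whether pruning actually happens is best confirmed from the query plan or the merge operation metrics.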

4 More Replies
yit
by Contributor III
  • 31 Views
  • 3 replies
  • 2 kudos

Does Auto Loader support loading PDF files?

I need to process PDF files that are already ingested. Based on the documentation, Auto Loader does not support PDFs - or am I missing something? Also, I've found this sparkPDF library in other discussions in the community, but from what I see it's only for bat...

Latest Reply
yit
Contributor III
  • 2 kudos

Any suggestions how to handle PDFs? @szymon_dybczak 
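One possible direction, offered as a sketch rather than a confirmed answer from this thread: Auto Loader lists `binaryFile` among its file formats, so PDFs can be streamed in as raw bytes and parsed in a later step. The helper below assumes an existing `spark` session on Databricks. The function names and the source directory are hypothetical, and the PDF parsing itself is left out.

```python
def pdf_stream_options() -> dict:
    """Reader options for ingesting PDFs as binary files with Auto Loader."""
    return {
        "cloudFiles.format": "binaryFile",  # each file becomes one row of bytes
        "pathGlobFilter": "*.pdf",          # only pick up PDF files
    }


def read_pdf_stream(spark, source_dir: str):
    """Return a streaming DataFrame with one row per PDF file in source_dir.

    binaryFile rows carry path, modificationTime, length and content columns;
    text extraction would be applied to `content` downstream.
    """
    reader = spark.readStream.format("cloudFiles")
    for key, value in pdf_stream_options().items():
        reader = reader.option(key, value)
    return reader.load(source_dir)


# Usage on a cluster (hypothetical path):
# pdf_df = read_pdf_stream(spark, "/Volumes/main/raw/pdfs")
```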

2 More Replies
Filip
by New Contributor II
  • 6752 Views
  • 7 replies
  • 0 kudos

How to Assign User Managed Identity to DBR Cluster so I can use it for querying ADLSv2?

Hi, I'm trying to figure out if we can switch from Entra ID SPNs to User Assigned Managed Identities, and everything works except I can't figure out how to access the lake files from a Python notebook. I've tried the code below and was running it on a ...

Latest Reply
Coffee77
Contributor
  • 0 kudos

Besides, this only works on dedicated clusters, not on shared ones. Why? No idea at all. In the latter case, IMDS (Instance Metadata Service), used by Azure to inject a token endpoint inside resources as a unique, secure and valid channel to get tokens ...

6 More Replies
pooja_bhumandla
by New Contributor II
  • 11 Views
  • 0 replies
  • 0 kudos

When to Use and when Not to Use Liquid Clustering?

Hi everyone, I’m looking for some practical guidance and experiences around when to choose Liquid Clustering versus sticking with traditional partitioning + Z-ordering. From what I’ve gathered so far: For small tables (<10TB), Liquid Clustering gives s...

Phani1
by Valued Contributor II
  • 2528 Views
  • 1 replies
  • 0 kudos

Genie Integrating with streamlit

Hi All, What are the best practices to follow when integrating Genie with Streamlit, and are there any limitations? What is the best way to present it at the UI level from the user's perspective? Regards, Phani

Latest Reply
AbhaySingh
Databricks Employee
  • 0 kudos

This should help you get started. Please let us know if you've any specific question after you've looked at the links below. https://blog.streamlit.io/best-practices-for-building-genai-apps-with-streamlit/ https://databrickster.medium.com/call-genie-...

AkhileshVB
by New Contributor
  • 2268 Views
  • 2 replies
  • 1 kudos

Resolved! Syncing lakebase table to delta table

I have been exploring Lakebase and I wanted to know if there is a way to sync CDC data from Lakebase tables to a Delta table in the Lakehouse. I know the other way is possible, and that's what was shown in the demo. Can you tell me how I can sync both the ta...

Latest Reply
Malthe
Contributor II
  • 1 kudos

Just wanted to mention that the ETL from Lakebase to Delta Tables preview is mentioned here: https://www.databricks.com/blog/how-use-lakebase-transactional-data-layer-databricks-apps

1 More Replies
julius_bkr
by Visitor
  • 48 Views
  • 3 replies
  • 3 kudos

Hive Metastore End of Life

Hello everyone, is there a rough date on which the Hive Metastore will be deactivated? In the end, I am asking again the question that was already asked 2 years ago: Solved: Hive metastore table access control End of Support - Databricks Community - 50487 We...

Latest Reply
Abhishek_Patel
New Contributor
  • 3 kudos

Hi @julius_bkr, I do not think Databricks has any plans to retire HMA completely. However, Unity Catalog is the strategic direction and the recommended approach for new deployments and migrating existing data governance. Databricks is investing heavil...

2 More Replies
