Data Engineering

Forum Posts

by jeremy98 • Honored Contributor

04-16-2025 12:26:54 AM

1752 Views
4 replies
0 kudos

How to fall back the entire job in case of cluster failure?

Hi community, my team and I are using a job that is triggered based on dynamic scheduling, with the schedule defined within some of the job's tasks. However, this job is attached to a cluster that is always running and never terminated. I understand th...

Latest Reply

RiyazAliM
Honored Contributor

04-29-2025 9:33:10 PM

0 kudos

Hey @jeremy98, have you had a chance to experiment with the Databricks serverless offering? Ideally, serverless spin-up times are around 1 minute. It has built-in autoscaling based on the workload, which seems like a good fit for your use case. Check out more info f...

by suja • New Contributor

04-29-2025 7:48:49 PM

1272 Views
1 reply
0 kudos

Exploring parallelism for multiple tables

I am new to Databricks. The app we need to build reads from Hive tables, goes through bronze, silver, and gold layers, and stores the results in relational DB tables. There are multiple Hive tables with no dependencies. What is the best way to achieve parallelism? Do w...

Latest Reply

lingareddy_Alva
Esteemed Contributor

04-29-2025 8:43:57 PM

0 kudos

Hi @suja,

Use Databricks Workflows (Jobs) with Task Parallelism

Instead of using threads within a single notebook, leverage Databricks Jobs to define multiple tasks, each responsible for a table. Tasks can:

1. Run in parallel ...
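A rough sketch of what "one task per table" could look like as a job definition. It builds a Jobs API-style payload in which no task declares `depends_on`, so the tasks are free to run in parallel. The table names, the notebook path, and the `table_name` parameter are all hypothetical, and compute settings are left out on purpose.

```python
import json


def build_parallel_job(job_name, tables, notebook_path):
    """Return a job payload with one independent notebook task per table."""
    tasks = []
    for table in tables:
        tasks.append({
            # Task keys should stay simple (letters, digits, underscores).
            "task_key": f"load_{table}",
            # No "depends_on" entry: nothing forces these tasks to run in order.
            "notebook_task": {
                "notebook_path": notebook_path,
                "base_parameters": {"table_name": table},
            },
        })
    return {"name": job_name, "tasks": tasks}


# Hypothetical table names and notebook path, for illustration only.
job = build_parallel_job(
    "hive_to_gold",
    ["customers", "orders", "payments"],
    "/Workspace/etl/bronze_to_gold",
)
print(json.dumps(job, indent=2))
```

The same notebook is reused for every table and reads its table name from the task parameter, which keeps the job definition short when the list of tables grows.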

by ABINASH • New Contributor

04-28-2025 4:48:49 AM

1038 Views
1 reply
0 kudos

Flattening VARIANT column.

Hi Team, I am facing an issue. I have a JSON file of around 700 KB that contains only 1 record, but after reading the data and flattening the file, the record count is now 620 million. Now, while I am writing the DataFrame into Delta Lake, it is taking ...

Latest Reply

samshifflett46
Databricks Partner

04-29-2025 6:15:01 PM

0 kudos

Hey @ABINASH, since the JSON file flattens to 620 million records, the area to optimize seems to be the structure of the JSON file itself. My initial thought is that the JSON file is extremely nested, which is causing a large amount of redundant...
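One way a single nested record could fan out this far: if flattening explodes several sibling arrays, the row count is the product of their lengths. The sketch below only illustrates that arithmetic. It is an assumption about the cause, since the thread does not show the file's structure, and the field names are made up.

```python
from itertools import product


def flatten_sibling_arrays(record):
    """Cross-join every list-valued field, one output row per combination."""
    scalars = {k: v for k, v in record.items() if not isinstance(v, list)}
    arrays = {k: v for k, v in record.items() if isinstance(v, list)}
    rows = []
    for combo in product(*arrays.values()):
        row = dict(scalars)                 # scalar values repeat on every row
        row.update(zip(arrays.keys(), combo))
        rows.append(row)
    return rows


# One record with three independent arrays of 10, 20, and 30 elements.
record = {"id": 1, "a": list(range(10)), "b": list(range(20)), "c": list(range(30))}
rows = flatten_sibling_arrays(record)
print(len(rows))  # 10 * 20 * 30 = 6000
```

If the arrays are unrelated to each other, writing each one to its own table keyed by `id` avoids the multiplication entirely.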

by sondergaard • New Contributor II

04-29-2025 7:00:22 AM

1821 Views
2 replies
0 kudos

Simba ODBC driver // .Net Core

Hi, I have been looking into the Simba Spark ODBC driver to see if it can simplify our integration with .NET Core. The first results were promising, but when I started to process larger queries I started to notice out-of-memory exceptions in the conta...

Latest Reply

Rjdudley
Honored Contributor

04-29-2025 10:23:49 AM

0 kudos

Something we're considering for a similar purpose (.NET Core service pulling data from Databricks) is the ADO.NET connector from CData: Databricks Driver: ADO.NET Provider | Create & integrate .NET apps

by ashraf1395 • Honored Contributor

04-28-2025 10:56:55 PM

1392 Views
1 reply
0 kudos

Fetching the catalog and schema set in the DLT pipeline configuration

I have a DLT pipeline, and the notebook running in the pipeline has some requirements. I want to get the catalog and schema set by my DLT pipeline. The reason: I have to specify my volume file paths etc., and my volume is on the sa...

Latest Reply

SP_6721
Honored Contributor II

04-29-2025 5:41:54 AM

0 kudos

Hi @ashraf1395, can you try this to get the catalog and schema set by your DLT pipeline in the notebook?

catalog = spark.conf.get("pipelines.catalog")
schema = spark.conf.get("pipelines.schema")
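Once the two values are available, the volume path can be assembled from them. A minimal sketch, assuming the volume sits under that catalog and schema and follows the `/Volumes/<catalog>/<schema>/<volume>/...` layout. The helper name and the example names (`main`, `sales`, `landing`) are hypothetical.

```python
from pathlib import PurePosixPath


def volume_path(catalog, schema, volume, *parts):
    """Build a path of the form /Volumes/<catalog>/<schema>/<volume>/<parts...>."""
    for name in (catalog, schema, volume):
        if not name or "/" in name:
            raise ValueError(f"invalid name component: {name!r}")
    return str(PurePosixPath("/Volumes", catalog, schema, volume, *parts))


# In a pipeline notebook the first two arguments would come from the
# spark.conf.get(...) calls suggested in the reply; literals are used here
# so the sketch runs anywhere.
print(volume_path("main", "sales", "landing", "raw", "orders.json"))
```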

by ankit001mittal • New Contributor III

04-29-2025 2:40:27 AM

961 Views
1 reply
0 kudos

DLT Pipeline Stats on Object level

Hi guys, I want to create a table that stores information about each DLT pipeline at the object/table ID level: how much time was spent waiting for resources, how much time each object took to run, and the number of records...

dlt

system

Latest Reply

RiyazAliM
Honored Contributor

04-29-2025 3:56:34 AM

0 kudos

Hi @ankit001mittal,

The DLT event log helps you gather most of the information you've mentioned above. Below is the documentation for the DLT event log:

https://docs.databricks.com/aws/en/dlt/observability

Let me know if you have any questions.

Best,
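A sketch of the kind of per-object summary that could be derived from event log rows once they are exported. The record shape here is a simplified, hypothetical flattening (the real event log nests these values), and the status names `QUEUED`, `RUNNING`, and `COMPLETED` are assumptions to check against the documentation above.

```python
from collections import defaultdict
from datetime import datetime


def summarize_flows(events):
    """Return wait seconds, run seconds, and output rows for each flow."""
    first_seen = defaultdict(dict)  # flow name -> status -> earliest timestamp
    row_counts = defaultdict(int)
    for event in events:
        ts = datetime.fromisoformat(event["timestamp"])
        seen = first_seen[event["flow_name"]]
        status = event["status"]
        if status not in seen or ts < seen[status]:
            seen[status] = ts
        row_counts[event["flow_name"]] += event.get("num_output_rows") or 0

    summary = {}
    for flow, seen in first_seen.items():
        wait = run = None
        if "QUEUED" in seen and "RUNNING" in seen:
            wait = (seen["RUNNING"] - seen["QUEUED"]).total_seconds()
        if "RUNNING" in seen and "COMPLETED" in seen:
            run = (seen["COMPLETED"] - seen["RUNNING"]).total_seconds()
        summary[flow] = {"wait_s": wait, "run_s": run, "rows": row_counts[flow]}
    return summary


# Hypothetical events for a single table named "orders".
events = [
    {"flow_name": "orders", "status": "QUEUED", "timestamp": "2025-04-29T10:00:00"},
    {"flow_name": "orders", "status": "RUNNING", "timestamp": "2025-04-29T10:00:30"},
    {"flow_name": "orders", "status": "COMPLETED", "timestamp": "2025-04-29T10:02:30",
     "num_output_rows": 1200},
]
summary = summarize_flows(events)
print(summary)
```

The resulting dictionary maps naturally onto one row per object, which is the table layout the question asks for.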

by Ekaterina_Paste • Databricks Partner

02-25-2022 12:46:47 AM

22277 Views
12 replies
2 kudos

Resolved! Can't log in to Databricks Community Edition

I enter my valid login and password here https://community.cloud.databricks.com/login.html but it says "Invalid email address or password"

Latest Reply

Venkat124488
New Contributor II

04-29-2025 1:41:09 AM

2 kudos

The Databricks cluster is terminating every 15 seconds in Community Edition. Could you please help me with this issue?

by cookiebaker • New Contributor III

04-23-2025 12:06:10 AM

4109 Views
7 replies
6 kudos

Resolved! Some DLT pipelines suddenly seem to use a different runtime, 16.1 instead of 15.4, since last night (CET)

Hello, since last night some of our DLT pipelines have suddenly been failing with errors saying that our hive_metastore control table cannot be found. All of our DLT pipelines are set up the same way (serverless), and one Shared Compute on runtime version 15.4. For ...

Latest Reply

cookiebaker
New Contributor III

04-29-2025 12:19:53 AM

6 kudos

@voo-rodrigo Hello, thanks for updating the progress on your end! I've tested as well and confirmed that the DLT can read the hive_metastore via Serverless again.

by BrendanTierney • New Contributor II

08-17-2021 3:47:53 AM

6817 Views
6 replies
3 kudos

Resolved! Community Edition is not allocating Cluster

I've been trying to use the Community Edition for the past 3 days without success. I go to run a notebook and it begins to allocate the cluster, but it never finishes. Sometimes it times out after 15 minutes.

Waiting for cluster to start: Finding i...

Latest Reply

JD2001
New Contributor II

04-28-2025 8:34:00 PM

3 kudos

I am running into the same issue since today. It worked fine till yesterday.

by ZacayDaushin • New Contributor

01-24-2024 11:57:14 PM

3054 Views
3 replies
0 kudos

How to access system.access.table_lineage

I am trying to run a SELECT on system.access.table_lineage, but I can't see the table. What permission do I need?

Latest Reply

Nivethan_Venkat
Databricks MVP

03-10-2025 4:20:17 PM

0 kudos

Hi @ZacayDaushin,

To query a table in the system catalog, you need SELECT permission on that table to run the query and see the results.

Best Regards,
Nivethan V

by smpa01 • Contributor

04-28-2025 11:31:18 AM

1246 Views
1 reply
1 kudos

Resolved! tbl name as parameter marker

I am getting an error here when I do this:

-- this works fine
declare sqlStr = 'select col1 from catalog.schema.tbl LIMIT (?)';
declare arg1 = 500;
EXECUTE IMMEDIATE sqlStr USING arg1;

-- this does not
declare sqlStr = 'select col1 from (?) LIMIT (?)';...

Latest Reply

lingareddy_Alva
Esteemed Contributor

04-28-2025 12:30:13 PM

1 kudos

@smpa01 In SQL EXECUTE IMMEDIATE, you can only parameterize values, not identifiers like table names, column names, or database names. That is, placeholders (?) can only replace constant values, not object names (tables, schemas, columns, etc.).

SELECT...
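A minimal sketch of one possible workaround, assuming the statement text is assembled in Python first: validate the table name, splice it into the SQL string, and keep the `?` marker for the value only. The helper name `build_query` and the validation pattern are illustrative and not taken from the thread.

```python
import re

# Conservative check: one to three dot-separated parts, each a plain identifier.
_IDENT = r"[A-Za-z_][A-Za-z0-9_]*"
_TABLE_NAME = re.compile(rf"{_IDENT}(\.{_IDENT}){{0,2}}")


def build_query(table_name):
    """Splice a validated table name into the SQL text; the limit stays a marker."""
    if not _TABLE_NAME.fullmatch(table_name):
        raise ValueError(f"unsafe table name: {table_name!r}")
    return f"select col1 from {table_name} LIMIT (?)"


sql_str = build_query("catalog.schema.tbl")
print(sql_str)
```

The validation step matters because the table name is now part of the statement text, so anything that is not a plain identifier is rejected before the string is built.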

by p_romm • New Contributor III

02-19-2025 3:26:30 AM

1402 Views
4 replies
0 kudos

Structured Streaming writeStream - Query is no longer active causes task to fail

Hi, I execute readStream/writeStream in a workflow task. The write stream uses the .trigger(availableNow=True) option. After writeStream, I wait for the query to finish with query.awaitTermination(). However, from time to time the pipeline ends with "Query <id> is no ...

Latest Reply

cmathieu
New Contributor III

04-28-2025 9:15:18 AM

0 kudos

@Alberto_Umana this bug was apparently fixed a few months ago, but we're still facing the same issue on our end.

by 397973 • New Contributor III

04-28-2025 6:57:12 AM

1407 Views
1 reply
1 kudos

Resolved! Several unavoidable for loops are slowing this PySpark code. Is it possible to improve it?

Hi. I have a PySpark notebook that takes 25 minutes to run as opposed to one minute in on-prem Linux + Pandas. How can I speed it up? It's not a volume issue. The input is around 30k rows. Output is the same because there's no filtering or aggregation...

Latest Reply

lingareddy_Alva
Esteemed Contributor

04-28-2025 9:10:55 AM

1 kudos

@397973 Spark is optimized for 100s of GB or millions of rows, NOT small in-memory lookups with heavy control flow (unless engineered carefully). That's why Pandas is much faster for your specific case now.

Pre-load and Broadcast All Mappings

Instead of...
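To illustrate the "pre-load the mappings" idea at this data size, here is a plain-Python sketch: each lookup table is loaded into a dictionary once, and the rows are enriched in a single pass with constant-time lookups. The column names and mappings are hypothetical, and this is not the code from the thread.

```python
def enrich(rows, country_names, tier_by_segment):
    """Single pass over the rows; every lookup is a dictionary access."""
    enriched_rows = []
    for row in rows:
        enriched = dict(row)
        enriched["country_name"] = country_names.get(row["country_code"], "unknown")
        enriched["tier"] = tier_by_segment.get(row["segment"], "unknown")
        enriched_rows.append(enriched)
    return enriched_rows


# Mappings are loaded once, up front, instead of being queried per row.
country_names = {"DE": "Germany", "FR": "France"}
tier_by_segment = {"smb": "bronze", "enterprise": "gold"}

rows = [
    {"id": 1, "country_code": "DE", "segment": "smb"},
    {"id": 2, "country_code": "XX", "segment": "enterprise"},
]
result = enrich(rows, country_names, tier_by_segment)
print(result)
```

In Spark, a comparable step is joining the main DataFrame against a small mapping DataFrame wrapped in `pyspark.sql.functions.broadcast`, so the mapping is shipped to the executors once rather than looked up per row.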

by Lo • New Contributor II

04-27-2025 2:45:51 PM

1793 Views
1 reply
0 kudos

SocketTimeoutException when creating execution context in Databricks Community Edition

Hello,

I’m experiencing an issue in Databricks Community Edition. When I try to run a notebook, I get this error:

"Exception when creating execution context: java.net.SocketTimeoutException: connect Timeout"

What I have tried:
- Restarting the cluster
- Ch...

Latest Reply

Advika
Community Manager

04-28-2025 4:17:00 AM

0 kudos

Hello @Lo! There is a similar thread where another user encountered the same issue and shared a solution that worked for them. I suggest reviewing that thread to see if the solution is helpful in your case as well.

by vidya_kothavale • Contributor

04-26-2025 11:48:50 AM

1846 Views
1 reply
1 kudos

Issue reading Vertica table into Databricks - Numeric value out of range

I am trying to read a Vertica table into a Spark DataFrame using JDBC in Databricks.

Here is my sample code:

hostname = ""
username = ""
password = ""
database_port = ""
database_name = ""
qry_col_level = f"""SELECT * FROM analytics_DS.ansh_units_cum_dash""...

Latest Reply

Renu_
Valued Contributor II

04-28-2025 3:21:39 AM

1 kudos

Hi @vidya_kothavale, based on my research and understanding, Databricks and Spark's JDBC connectors currently don’t offer an automatic way to truncate or round high-precision decimal values when loading data. To handle this, you would need to either:...

Databricks Community
