Data Engineering

Forum Posts

by rchauhan • New Contributor II

08-01-2023 4:58:02 PM

23487 Views
3 replies
4 kudos

org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 188.0 failed 4

When I try to read data from SQL Server through a JDBC connection, I get the error below while merging the data into a Databricks table. Can you please help me understand what the issue is related to? org.apache.spark.SparkException: Job aborted due to stage...

Latest Reply

MDV
New Contributor III

03-25-2024 8:38:00 AM

4 kudos

@rchauhan, did you find a solution to the problem, or do you know what settings caused it?

2 More Replies

by SankaraiahNaray • New Contributor II

03-20-2024 4:07:18 AM

3534 Views
4 replies
0 kudos

OPTIMIZE with liquid clustering makes filter slower than without OPTIMIZE

I created a Delta table with 15 million records and I'm running a simple filter query on that table based on one column value, which will return only one record because all the values in that column are unique. The Delta table is not partitioned. Before en...

Latest Reply

-werners-
Esteemed Contributor III

03-25-2024 8:15:12 AM

0 kudos

It seems that for this specific query Liquid Clustering has worse performance. It does not have better performance for all queries. The following are examples of scenarios that benefit from clustering: Tables often filtered by high cardinality columns...

3 More Replies

by mvmiller • New Contributor III

03-21-2024 1:36:10 PM

5954 Views
2 replies
3 kudos

Module not found, despite it being installed on job cluster?

We observed the following error in a notebook which was running from a Databricks workflow: ModuleNotFoundError: No module named '<python package>'. The error message speaks for itself - it obviously couldn't find the Python package. What is peculiar ...

Latest Reply

mvmiller
New Contributor III

03-25-2024 6:03:14 AM

3 kudos

Thanks, @Walter_C. Supposing that your second possible explanation, Cluster Initialization Timing, could be a factor, are there any best practices or recommendations for preventing this from being a recurring issue, down the road?

1 More Replies

by MarinD • New Contributor II

03-25-2024 4:51:09 AM

2893 Views
0 replies
0 kudos

CI/CD Databricks Asset Bundles - DLT pipelines - unity catalog and target schema

Is it possible for the CI/CD Databricks Asset Bundles YAML file to specify the Unity Catalog and target schema as the destination for the DLT pipeline, or is that just not possible today? In case this functionality is not possible today, are there any ...

by Etyr • Contributor

03-18-2024 3:04:21 AM

4181 Views
2 replies
1 kudos

[FinOps] Tagging queries in databricks

Hello, I see that it is possible to tag catalogs/databases/tables, but I did not find a way to tag a query for our FinOps use case. In Azure you can check billing depending on tags. A concrete example: In Azure Machine Learning, I have a schedule that ...

Data Engineering

Delta Live Table - Cannot redefine dataset

Hi, I am new to Delta Live Tables. I am trying to create a Delta Live Table from the Databricks tutorial. I have created a notebook and attached an interactive cluster (DBR 14.3 LTS). I am running the code below. When I ran it for the 1st time it ran succes...

Data Engineering

Delta Live Table

dlt

4348 Views
2 replies
1 kudos

03-22-2024 6:28:58 AM

Latest Reply

Walter_C
Databricks Employee

03-23-2024 7:55:19 AM

1 kudos

The error message "AnalysisException: Cannot redefine dataset 'sales_orders_raw'" indicates that you're trying to create a table that already exists. In Databricks, once a Delta Live Table (DLT) is defined, it cannot be redefined or overwritten. ...

1 More Replies

by Avi759787 • New Contributor

03-24-2024 9:28:55 PM

3106 Views
0 replies
0 kudos

Driver is up but is not responsive, likely due to GC.

I am using an interactive cluster to run a frequent (every 15 min) batch job. After a certain time (for example, 6 hours), the cluster continuously starts showing "Driver is up but is not responsive, likely due to GC." in the event log and all jobs start failing. If the...

by WearBeard • New Contributor

03-24-2024 8:25:16 AM

3411 Views
1 reply
0 kudos

Consume updated data from the Materialized view and send it as append to a streaming table

Hello everyone! I'm using DLT and I'm pretty new to it. I'm trying to take the updates from a materialized view and send them to a streaming table as an append. For example, if I have an MV of 400 records, I want an append to be made to the streaming...

Latest Reply

Priyanka_Biswas
Databricks Employee

03-24-2024 6:58:39 PM

0 kudos

Hi @WearBeard. By default, streaming tables require append-only sources. The encountered error is due to an update or delete operation on the 'streaming_table_test'. To fix this issue, perform a Full Refresh on the 'streaming_table_test' table. You ca...

by Sas • New Contributor II

03-24-2024 2:20:44 PM

2001 Views
0 replies
0 kudos

Not able to create mount point in Databricks

Hi, I am trying to create a mount point in Azure Databricks, but mount point creation is failing with the error message below: DBUtils.FSHandler.mount() got an unexpected keyword argument 'extra_config'. I am using the following code: def setup_mount(storage_account_...
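
The keyword named in the error, `extra_config`, is singular; to my knowledge `dbutils.fs.mount` takes `extra_configs` (plural), which would explain the message. The poster's own code is truncated above, so the sketch below does not reproduce it. It uses a stand-in `mount` function (hypothetical, not the real `dbutils`) only to show how a misspelled keyword produces this kind of `TypeError` and how the plural spelling avoids it. The source URL, mount point, and config key are placeholder values.

```python
def mount(source, mount_point, extra_configs=None):
    """Stand-in for dbutils.fs.mount; it only mimics the keyword names."""
    return {
        "source": source,
        "mount_point": mount_point,
        "extra_configs": dict(extra_configs or {}),
    }


# Placeholder values, not taken from the original post.
SOURCE = "abfss://container@account.dfs.core.windows.net/"
MOUNT_POINT = "/mnt/example"
CONFIGS = {"example.config.key": "example-value"}


def try_mount(**kwargs):
    """Call mount() and return either its result or the TypeError message."""
    try:
        return mount(source=SOURCE, mount_point=MOUNT_POINT, **kwargs)
    except TypeError as exc:
        return str(exc)


bad = try_mount(extra_config=CONFIGS)    # singular keyword: rejected
good = try_mount(extra_configs=CONFIGS)  # plural keyword: accepted

print(bad)
print(good["extra_configs"])
```

If the truncated `setup_mount` function passes `extra_config=...`, renaming that keyword to `extra_configs` is the first thing worth checking.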

by badari_narayan • New Contributor II

03-07-2024 2:42:05 AM

1740 Views
1 reply
0 kudos

Exam got suspended without any reason

Hi Team, my Databricks Certified Associate Developer for Apache Spark 3.0 - Python exam got suspended on 7th March 2024. I was continuously in front of the camera when suddenly the alert appeared, and the support person asked me to show the full table ...

Latest Reply

vinay076
New Contributor III

03-24-2024 12:39:26 AM

0 kudos

Hi @badari_narayan, did your exam get rescheduled? I am also facing the same issue; my exam got suspended.

by felix_counter • New Contributor III

09-29-2023 1:07:17 AM

3094 Views
2 replies
0 kudos

Fail to install package dependency located on private pypi server during .whl installation

Hello, I recently switched from DBR 12.2 LTS to DBR 13.3 LTS and observed the following behavior: My goal is to install a Python library from a .whl file. I am using the UI for this task (Cluster settings -> Libraries -> Install new -> 'Python Whl' as ...

Latest Reply

robbe
New Contributor III

03-23-2024 12:17:17 PM

0 kudos

Hey Felix, I have run into a similar issue recently (my wheel needs a Git HTTPS redirect that's specified in the init script, but I can install it fine from inside a notebook). I wonder whether you found a solution (perhaps moving to a more recent DBR v...

1 More Replies

by mvmiller • New Contributor III

03-21-2024 1:41:01 PM

3375 Views
1 reply
0 kudos

How to ignore Writestream UnknownFieldException error

I have a parquet file that I am trying to write to a delta table: df.writeStream.format("delta").option("checkpointLocation", f"{targetPath}/delta/{tableName}/__checkpoints").trigger(once=True).foreachBatch(processTable).outputMode("append")...

Latest Reply

shan_chandra
Databricks Employee

03-22-2024 3:53:08 PM

0 kudos

@mvmiller - Per the documentation below, the stream will fail with UnknownFieldException; the default schema evolution mode is addNewColumns. So Databricks recommends configuring Auto Loader streams with workflows to restart automatically after s...

by RTabur • New Contributor II

03-22-2024 6:52:02 AM

1964 Views
2 replies
0 kudos

[Bug] Orphan storage location

Hello, I'm not able to re-create an external location after removing its owner from the Databricks account. I'm getting the following error: Input path url 'abfss://foo@bar.dfs.core.windows.net/' overlaps with an existing external location within 'CreateEx...

Latest Reply

PL_db
Databricks Employee

03-22-2024 7:32:07 AM

0 kudos

Your metastore admin can list all external locations. Your metastore admin can then drop the external location.

1 More Replies

by AxelBrsn • New Contributor III

03-19-2024 7:05:40 AM

7793 Views
2 replies
0 kudos

Resolved! Importing python to DLT - Not working with DLT Pipeline

Hello, we are trying to adapt our developments (notebooks with Delta tables) into Delta Live Tables pipelines. We tried to import Python files that are very useful for data transformations (silver data cleaning, for example): From the Cluster (run man...

Data Engineering

Delta Live Table

import

pipeline

python

Latest Reply

AxelBrsn
New Contributor III

03-22-2024 7:04:45 AM

0 kudos

The solution is to import from Python but also add the python file in the Pipeline settings, in the list of source code.

1 More Replies

by data-engineer-d • Contributor

03-21-2024 10:53:17 AM

4286 Views
3 replies
4 kudos

Parametrize the DLT pipeline for dynamic loading of many tables

I am trying to ingest hundreds of tables with CDC, where I want to create a generic/dynamic pipeline which can accept parameters (e.g., table_name, schema, file path) and run the logic on them. However, I am not able to find a way to pass parameters to p...

Data Engineering

Delta Live Tables

Latest Reply

Gilg
Contributor II

03-21-2024 3:32:49 PM

4 kudos

If you have different folders for each of your source tables, you can leverage Python loops to iterate over the folders. To do this, you need to create a create_pipeline function that takes table_name, schema, and path as its parameters. Inside t...
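
The reply is cut off, so what follows is only a rough sketch of the general loop-plus-factory pattern it describes, not the poster's actual code. To keep it runnable outside Databricks, it uses a hypothetical `register_table` decorator as a stand-in for the `dlt` table decorator, and the source list, schemas, and paths are made-up examples.

```python
from typing import Callable, Dict, List

# Stand-in for the pipeline's table registry. In a real DLT notebook the
# decorator would come from the dlt module; this hypothetical version only
# records each generated function under its table name.
REGISTRY: Dict[str, Callable[[], str]] = {}


def register_table(name: str):
    def decorator(fn: Callable[[], str]) -> Callable[[], str]:
        REGISTRY[name] = fn
        return fn
    return decorator


def create_pipeline(table_name: str, schema: str, path: str) -> None:
    """Define one table. The arguments are bound per call, so each
    generated function keeps its own table_name, schema, and path."""
    @register_table(name=f"{table_name}_raw")
    def load() -> str:
        # A real pipeline would read from `path` here; this just describes it.
        return f"read {path} with schema {schema}"


# Hypothetical source list; in practice it might come from listing folders.
SOURCES: List[dict] = [
    {"table_name": "orders", "schema": "id INT, amount DOUBLE", "path": "/data/orders"},
    {"table_name": "customers", "schema": "id INT, name STRING", "path": "/data/customers"},
]

for src in SOURCES:
    create_pipeline(**src)

print(sorted(REGISTRY))
print(REGISTRY["orders_raw"]())
```

Wrapping the definition in a function, rather than defining the table directly inside the `for` loop, avoids Python's late-binding closure behavior, where every generated function would otherwise see the values from the last loop iteration.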

2 More Replies
