Data Engineering

Forum Posts

deng_dev
by New Contributor III
  • 361 Views
  • 1 reply
  • 0 kudos

Getting "Job aborted" exception while saving data to the database

Hi! We have a job that runs every hour. It extracts data from the API and saves it to the Databricks table. Sometimes the job fails with the error "org.apache.spark.SparkException". Here is the full error: An error occurred while calling o7353.saveAsTable. : org.ap...

Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

Do you have any NULL values in your data? Please verify that your data is valid.
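
A quick way to check for the NULLs mentioned above before writing (a minimal sketch; `df` and the target table name are placeholders for the job's actual dataframe and table):

```python
from pyspark.sql.functions import col, count, when

# Count NULLs per column to spot invalid records before saveAsTable.
null_counts = df.select(
    [count(when(col(c).isNull(), c)).alias(c) for c in df.columns]
)
null_counts.show()

# Write only once the data looks valid; the table name is hypothetical.
df.write.mode("append").saveAsTable("main.api_ingest.hourly_data")
```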

532664
by New Contributor III
  • 1841 Views
  • 11 replies
  • 3 kudos

Resolved! Replay (backfill) DLT CDC using Kafka

Hello, we are receiving DB CDC binlogs through Kafka and synchronizing tables in an OLAP system using the apply_changes function in Delta Live Tables (DLT). A month ago, a column was added to our table, but due to a type mismatch, it's being stored incorr...
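
For context, a minimal sketch of the apply_changes pattern the post describes (topic, schema, and column names below are hypothetical, not taken from the original pipeline):

```python
import dlt
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, TimestampType

# Hypothetical payload schema for the binlog messages.
payload_schema = StructType([
    StructField("id", IntegerType()),
    StructField("name", StringType()),
    StructField("op", StringType()),
    StructField("event_ts", TimestampType()),
])

@dlt.view
def cdc_events():
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")  # assumption
           .option("subscribe", "db.binlog.orders")           # assumption
           .load())
    return (raw
            .select(from_json(col("value").cast("string"), payload_schema).alias("r"))
            .select("r.*"))

dlt.create_streaming_table("orders")

dlt.apply_changes(
    target="orders",
    source="cdc_events",
    keys=["id"],
    sequence_by=col("event_ts"),
    apply_as_deletes=(col("op") == "DELETE"),
)
```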

Latest Reply
jcozar
Contributor
  • 3 kudos

Thank you @532664 for your detailed response! That seems to me a very good solution, and it also helps me with my doubts.

10 More Replies
Prashant777
by New Contributor II
  • 3172 Views
  • 4 replies
  • 0 kudos

Error in SQL statement: UnsupportedOperationException: Cannot perform Merge as multiple source rows matched and attempted to modify the same

My code: CREATE OR REPLACE TEMPORARY VIEW preprocessed_source AS SELECT Key_ID, Distributor_ID, Customer_ID, Customer_Name, Channel FROM integr_masterdata.Customer_Master; -- Step 2: Perform the merge operation using the preprocessed source table M...

Latest Reply
Tread
New Contributor II
  • 0 kudos

Hey, as previously stated, you could drop the duplicates in the columns that contain them (code you can find online pretty easily). I have had this problem myself, and it came up when creating a temporary view from a dataframe; the dataframe ...
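
A sketch of that suggestion, assuming Key_ID is the merge key (the target table name here is hypothetical): deduplicate the source on the key before the MERGE so each target row matches at most one source row.

```python
# Keep one source row per merge key before running the MERGE.
dedup_source = (
    spark.table("integr_masterdata.Customer_Master")
         .dropDuplicates(["Key_ID"])
)
dedup_source.createOrReplaceTempView("preprocessed_source")

spark.sql("""
  MERGE INTO target_db.Customer_Master AS t   -- hypothetical target table
  USING preprocessed_source AS s
  ON t.Key_ID = s.Key_ID
  WHEN MATCHED THEN UPDATE SET *
  WHEN NOT MATCHED THEN INSERT *
""")
```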

3 More Replies
sunkam
by New Contributor
  • 2018 Views
  • 4 replies
  • 0 kudos

Unable to read from Azure Blob using SAS token

I have tried all the answers from the internet and Stack Overflow many times. I have already created the config section before this step; it passed, but the step below is not executing.

Latest Reply
aockenden
New Contributor III
  • 0 kudos

We were getting this problem when using directory-scoped SAS tokens. While I know there are a number of potential issues that can cause this problem, one potential explanation is that it turns out there is an undocumented spark setting needed on the ...
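
For reference, the documented way to wire up a SAS token for ABFS access looks like the sketch below (the undocumented setting the reply refers to is not reproduced here; storage account, container, and secret names are placeholders):

```python
storage_account = "mystorageacct"
sas_token = dbutils.secrets.get(scope="my-scope", key="sas-token")

# Tell ABFS to authenticate with a fixed SAS token for this storage account.
spark.conf.set(f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "SAS")
spark.conf.set(
    f"fs.azure.sas.token.provider.type.{storage_account}.dfs.core.windows.net",
    "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider",
)
spark.conf.set(f"fs.azure.sas.fixed.token.{storage_account}.dfs.core.windows.net", sas_token)

df = spark.read.option("header", True).csv(
    f"abfss://mycontainer@{storage_account}.dfs.core.windows.net/path/to/data"
)
```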

3 More Replies
Hemendra_Singh
by New Contributor II
  • 752 Views
  • 1 reply
  • 1 kudos

Resolved! Unity Catalog - external table and managed table

Do the external tables that we create or manage through Unity Catalog support ACID properties and time travel? And in terms of performance, which is faster to query, and why?

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @Hilium, External tables in the Unity Catalog reference an external storage path. They are used when you require direct access to the data outside of Azure Databricks clusters or Databricks SQL warehouses. However, the ACID properties and time-tra...
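
To illustrate the distinction (catalog, schema, table names, and the storage path below are hypothetical): a managed table omits LOCATION, while an external table points at a path registered as an external location.

```python
# Managed table: Unity Catalog owns the storage location.
spark.sql("""
  CREATE TABLE IF NOT EXISTS main.sales.orders_managed (id INT, amount DOUBLE)
""")

# External table: data lives at an explicitly specified external path.
spark.sql("""
  CREATE TABLE IF NOT EXISTS main.sales.orders_external (id INT, amount DOUBLE)
  LOCATION 'abfss://data@mystorageacct.dfs.core.windows.net/orders'
""")
```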

Bilal1
by New Contributor III
  • 16097 Views
  • 6 replies
  • 2 kudos

Resolved! Simply writing a dataframe to a CSV file (non-partitioned)

When writing a dataframe in PySpark to a CSV file, a folder is created and a partitioned CSV file is created. I then have to rename this file in order to distribute it to my end user. Is there any way I can simply write my data to a CSV file, with the name ...

Latest Reply
Bilal1
New Contributor III
  • 2 kudos

Thanks for confirming that that's the only way
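
For readers landing here, the approach the thread converges on, as a rough sketch (paths and file names are placeholders): write a single-partition CSV, then copy the generated part file to the desired name.

```python
tmp_dir = "dbfs:/tmp/export_csv"
final_path = "dbfs:/exports/report.csv"

# Force a single output file, then relocate it under a friendly name.
df.coalesce(1).write.mode("overwrite").option("header", True).csv(tmp_dir)

part_file = [f.path for f in dbutils.fs.ls(tmp_dir) if f.name.startswith("part-")][0]
dbutils.fs.cp(part_file, final_path)
dbutils.fs.rm(tmp_dir, True)
```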

5 More Replies
jorgemarmol
by New Contributor II
  • 1123 Views
  • 4 replies
  • 0 kudos

Delta Live Tables: too much time spent on "Setting up"

Hello community! Recently I have been working with Delta Live Tables for a big project. My team and I have been studying a lot, and finally we have built a good pipeline with CDC that loads 608 entities (and, therefore, 608 delta live tables and 608 mat...

jorgemarmol_0-1688633577282.png
Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

Interesting... DLT probably spends x seconds per table for the setup. If you have time, you could do some tests to see if the table setup scales linearly (1 table, 5 sec for setup; 10 tables, 50 sec; etc.). If you do, please share the outcome.

3 More Replies
gardener
by New Contributor III
  • 527 Views
  • 2 replies
  • 0 kudos

Resolved! Url column issue in UC information_schema.schemata view definition

Hi, I recently observed that, after creating a new catalog (without a managed location) in Unity Catalog, a column named 'url' is included in the definition of the information_schema.schemata view. However, there is no url column in the underlying tab...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @gardener, Based on the Databricks documentation, the information_schema.schemata view should contain the following columns:
  • catalog_name: Catalog containing the schema.
  • schema_name: Name of the schema.
  • schema_owner: User or group (principal) that c...
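
A quick way to compare this with what the view actually exposes in a given workspace ("my_catalog" is a placeholder for the catalog created in the post):

```python
# List the columns the schemata view exposes in this workspace.
spark.table("my_catalog.information_schema.schemata").printSchema()
```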

1 More Replies
N_M
by New Contributor III
  • 1604 Views
  • 5 replies
  • 1 kudos

Resolved! ignoreCorruptFiles behavior with CSV and COPY INTO

Hi, I'm using the COPY INTO command to insert new data (in the form of CSVs) into an already existing table. The SQL query takes care of the conversion of the fields to the target table schema (well, there isn't another way to do that), and schema update is n...

Data Engineering
COPY INTO
ignoreCorruptFiles
Latest Reply
N_M
New Contributor III
  • 1 kudos

I actually found an option that could solve the newline issue I mentioned in my previous post: setting spark.sql.csv.parser.columnPruning.enabled to false with spark.conf.set("spark.sql.csv.parser.columnPruning.enabled", False) will consider malformed r...
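
A sketch of that setting alongside a COPY INTO load (the target table and source path are hypothetical):

```python
# The setting discussed above: disable CSV column pruning before loading.
spark.conf.set("spark.sql.csv.parser.columnPruning.enabled", False)

spark.sql("""
  COPY INTO main.staging.events
  FROM 'abfss://landing@mystorageacct.dfs.core.windows.net/events/'
  FILEFORMAT = CSV
  FORMAT_OPTIONS ('header' = 'true')
""")
```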

4 More Replies
datakilljoy
by New Contributor II
  • 801 Views
  • 1 reply
  • 0 kudos

Best practice for Azure Key Vault secrets in Spark config

Hello, I created a compute in which I refer to the secret inside the Spark config like this: spark.hadoop.fs.azure.account.key.xxxxxxxxxx.dfs.core.windows.net {{secrets/kv-xxxxxxx-xxxx/secret-name}} This, however, gives me the following warning. I've l...

datakilljoy_0-1704724007789.png
Data Engineering
cluster
compute
spark
Latest Reply
datakilljoy
New Contributor II
  • 0 kudos

Extra info: I have used the format following the instructions on this page for Spark configuration: https://learn.microsoft.com/en-us/azure/databricks/connect/storage/azure-storage#:~:text=Use%20the%20following%20format%20to%20set%20the%20cluster%20Spa...
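
If the cluster-level syntax keeps producing the warning, one alternative (a sketch; the scope, key, and storage-account names mirror the placeholders in the post) is to read the secret in the notebook and set the account key at runtime:

```python
# Fetch the secret from the Key Vault-backed scope and apply it for this session only.
account_key = dbutils.secrets.get(scope="kv-xxxxxxx-xxxx", key="secret-name")
spark.conf.set("fs.azure.account.key.xxxxxxxxxx.dfs.core.windows.net", account_key)
```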

merca
by Valued Contributor II
  • 916 Views
  • 1 reply
  • 0 kudos

Resolved! Problems with DLT, Unity catalog and external connection

I have the following code: org = spark.read.table("catalog.dbo.organisation") @dlt.create_table() def organization(): return org. The catalog is an external Azure SQL database (using an external connector). When I validate this in the Delta Live Tables workflow I...

Latest Reply
Sumit671
New Contributor III
  • 0 kudos

Use the preview channel when creating the pipeline instead of the current channel.

leelee3000
by New Contributor III
  • 629 Views
  • 3 replies
  • 2 kudos

Development Feedback Loop

I've noticed that the current development cycle for DLT jobs is quite time-consuming. The process of coding, saving, running in a workflow, and debugging seems arduous, and the feedback loop is slow. Is there a way to run DLT jobs without relying on ...

Latest Reply
Kaniz
Community Manager
  • 2 kudos

Hi @leelee3000, Developing and iterating on Delta Live Tables (DLT) jobs can be time-consuming when relying solely on traditional workflows. Databricks Jobs: Databricks jobs allow you to orchestrate multiple tasks within a Databricks job, creating ...

2 More Replies
prapot
by New Contributor II
  • 5335 Views
  • 2 replies
  • 2 kudos

Resolved! How to write a Spark DataFrame to a CSV file without .CRC in Azure Databricks?

val spark: SparkSession = SparkSession.builder().master("local[3]").appName("SparkByExamples.com").getOrCreate() // Spark Read CSV File val df = spark.read.option("header", true).csv("address.csv") // Write DataFrame to address directory df.write...

Latest Reply
Nw2this
New Contributor II
  • 2 kudos

Will your csv have the name prefix 'part-' or can you name it whatever you like?

1 More Replies
hukel
by Contributor
  • 1169 Views
  • 6 replies
  • 0 kudos

Unsupported datatype 'TimestampNTZType' with liquid clustering

I'm experimenting with liquid clustering and have some questions about compatible types (somewhat similar to Liquid clustering with boolean columns). Table created as CREATE TABLE IF NOT EXISTS <TABLE> (_time DOUBLE, timestamp TIMESTAMP_NT...
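
The CREATE statement in the excerpt is cut off; for context, a minimal liquid-clustering DDL of the kind described looks like the sketch below (table name and clustering column are hypothetical).

```python
spark.sql("""
  CREATE TABLE IF NOT EXISTS main.logs.events (
    _time DOUBLE,
    event_ts TIMESTAMP
  )
  CLUSTER BY (event_ts)
""")
```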

Latest Reply
Wojciech_BUK
Contributor III
  • 0 kudos

Hi, just an educated guess: there is a limitation in the liquid clustering docs: "You can only specify columns with statistics collected for clustering keys." Perhaps it is related to the data types for which you can collect statistics? But I could not find related docs...

5 More Replies
NathanE
by New Contributor II
  • 1728 Views
  • 2 replies
  • 1 kudos

Java 21 support with Databricks JDBC driver

Hello, I was wondering if there was any timeline for Java 21 support with the Databricks JDBC driver (current version is 2.34). One of the required changes is to update the Arrow dependency to version 13.0 (current version is 9.0.0). The current worka...

Data Engineering
driver
java21
JDBC
Latest Reply
Fabich
New Contributor II
  • 1 kudos

Hello @Kaniz, any update on this topic of Java 21? Any timeline? Our clients really want to upgrade to Java 21, and we don't want to disable Arrow for performance reasons.

1 More Replies