Data Engineering

Forum Posts

by upatint07 • New Contributor II

08-26-2024 9:35:45 PM

3909 Views
1 reply
0 kudos

Facing Issue in "import dlt" using Databricks Runtime 14.3 LTS version

I am having trouble importing the dlt library on Databricks Runtime 14.3 LTS. On Runtime 13.1, `import dlt` worked fine, but after updating the runtime to 14.3 LTS it raises an error.

Latest Reply

VZLA
Databricks Employee

12-27-2024 10:20:14 AM

0 kudos

Thanks for your question! Unfortunately, this is actually a known limitation with Spark Connect clusters.

by CURIOUS_DE • Contributor III

08-27-2024 8:14:12 AM

4010 Views
1 reply
1 kudos

A Surprise Findings in Delta Live Table

While DLT has some powerful features, I found myself doing a double-take when I realized it doesn’t natively support hard deletes. Instead, it leans on a delete flag identifier to manage these in the source table. A bit surprising for a tool of its c...

Latest Reply

VZLA
Databricks Employee

12-27-2024 10:13:14 AM

1 kudos

Thanks for your feedback! I believe Delta Live Tables (DLT) does not natively support hard deletes and instead uses a delete flag identifier to manage deletions, a design choice rooted in ensuring compliance with regulations like GDPR and CCPA. Thi...

by dener • New Contributor

08-29-2024 4:49:38 AM

4043 Views
1 reply
0 kudos

Infinity load execution

I am experiencing performance issues when loading a table with 50 million rows into Delta Lake on AWS using Databricks. Despite successfully handling other, larger tables, this specific table/process runs for hours and never finishes. Here's the command...

Latest Reply

VZLA
Databricks Employee

12-27-2024 10:06:50 AM

0 kudos

Thank you for your question! To optimize your Delta Lake write process:

Disable Overhead Options: Avoid overwriteSchema and mergeSchema unless necessary. Use: `df.write.format("delta").mode("overwrite").save(sink)`

Increase Parallelism: Use repartition...

by alexgavrysh • New Contributor

08-30-2024 2:04:04 AM

4109 Views
1 reply
0 kudos

Job scheduled run fail alert

Hello, I have a job that should run every six hours. I need to set up an alert for the case where a run doesn't start (for example, because someone paused the job). How do I configure such an alert using Databricks native alerts? Theoretically, this may be done using s...

Latest Reply

VZLA
Databricks Employee

12-27-2024 10:00:22 AM

0 kudos

Thank you for your question! Here’s a concise workflow to set up an alert for missed job runs in Databricks:

Write a Query: Use system tables to identify jobs that haven’t started on time.

Save the Query: Save this query in Databricks SQL as a named q...
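The "hasn't started on time" check in the first step can be sketched in plain Python. This is only an illustration of the comparison, not the query from the reply: `is_run_overdue`, the 15-minute grace period, and the idea that `last_start` comes from a query against the system tables are all assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_run_overdue(
    last_start: Optional[datetime],
    now: datetime,
    interval: timedelta = timedelta(hours=6),   # schedule from the question
    grace: timedelta = timedelta(minutes=15),   # hypothetical tolerance
) -> bool:
    """Return True when the job should already have started again.

    last_start is the start time of the most recent run (assumed to be
    fetched elsewhere), or None if the job has never run.
    """
    if last_start is None:
        return True
    return now - last_start > interval + grace


if __name__ == "__main__":
    now = datetime(2024, 8, 30, 12, 0, tzinfo=timezone.utc)
    print(is_run_overdue(now - timedelta(hours=7), now))
```

An alert condition would then fire whenever this check is true for the job in question.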

by Thor • New Contributor III

08-30-2024 2:57:32 AM

3863 Views
1 reply
0 kudos

Native code in Databricks clusters

Is it possible to install our own binaries (lib or exec) on Databricks clusters and use JNI to execute them? As far as I could read, Photon is native code, so I guess it must use a similar technique.

Latest Reply

VZLA
Databricks Employee

12-27-2024 9:54:52 AM

0 kudos

Thanks for your question! I believe it should be possible, although Photon itself is not extensible by users. Are you currently facing any issues while installing and using your own libraries, and JNI to execute them?

by ed_carv • New Contributor

09-02-2024 3:35:06 AM

3964 Views
1 reply
1 kudos

Databricks S3 Commit Service

Is Databricks S3 Commit Service enabled by default if Unity Catalog is not enabled and the compute resources run in our AWS account (classic compute plane)? If not, how can it be enabled? This service seems to resolve the limitations with multi-cluste...

Latest Reply

VZLA
Databricks Employee

12-27-2024 9:24:52 AM

1 kudos

No, the Databricks S3 commit service is not guaranteed to be enabled by default in the AWS classic compute plane. The configuration may vary based on your specific workspace setup. How can it be enabled? To enable the Databricks S3 commit service, ...

by David_Billa • New Contributor III

12-24-2024 2:47:25 AM

2057 Views
8 replies
3 kudos

Extract datetime value from the file name

I have the filename below and I want to extract the datetime value and convert it to a datetime data type: This_is_new_file_2024_12_06T11_00_49_AM.csv. Here I want to extract only '2024_12_06T11_00_49' and convert it to a datetime value in a new field. I tried S...

Latest Reply

Walter_C
Databricks Employee

12-24-2024 6:42:38 AM

3 kudos

Unfortunately I am not able to make it work with SQL functions
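For reference, the extraction described in the question can be sketched in plain Python rather than SQL. This is a minimal sketch, not the approach from the thread: the function name `extract_timestamp` is hypothetical, and it assumes the timestamp always has the `YYYY_MM_DDTHH_MM_SS` shape and that the trailing `AM`/`PM` marker is ignored, as the question asks.

```python
import re
from datetime import datetime
from typing import Optional

# Matches e.g. 2024_12_06T11_00_49 anywhere in the file name.
_TIMESTAMP_RE = re.compile(r"\d{4}_\d{2}_\d{2}T\d{2}_\d{2}_\d{2}")


def extract_timestamp(filename: str) -> Optional[datetime]:
    """Return the embedded timestamp, or None if the name has none."""
    match = _TIMESTAMP_RE.search(filename)
    if match is None:
        return None
    # The AM/PM suffix is deliberately not parsed (assumption from the question).
    return datetime.strptime(match.group(0), "%Y_%m_%dT%H_%M_%S")


if __name__ == "__main__":
    print(extract_timestamp("This_is_new_file_2024_12_06T11_00_49_AM.csv"))
```

The same function could in principle be wrapped as a UDF and applied to a file-name column, though that part is not shown here.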

7 More Replies

by peritus • New Contributor II

12-25-2024 12:20:37 PM

5079 Views
5 replies
5 kudos

Synchronize SQLServer tables to Databricks

I'm new to Databricks, and I'm looking to get data from an external database into Databricks and keep it synchronized when changes occur in the source tables. It seems like I may be able to use some form of change data capture and Delta Live Tables. ...

Latest Reply

john533
New Contributor III

12-27-2024 5:26:55 AM

5 kudos

To synchronize data from an external database into Databricks with change data capture (CDC), you can use Delta Live Tables (DLT). Start by configuring a JDBC connection to your source database and use a CDC tool (like Debezium or database-native CDC...

4 More Replies

by GowthamR • New Contributor II

12-26-2024 2:57:54 AM

1015 Views
2 replies
0 kudos

Regarding Unity Catalog Self Assume Capabilities

Hi Team, Good day! Recently, in the Credentials section under Catalog, we have been required to add the self-assume capability to the IAM role. Is this only for the roles associated with Unity Catalog, or for all roles? Thanks, Gowtham

Latest Reply

RiyazAliM
Honored Contributor

12-26-2024 3:43:33 AM

0 kudos

Hi @GowthamR, I believe it's only for the roles associated with UC. I was going through this community post on including self-assume capabilities for AWS IAM roles, and it mentions that this change does not affect storage credentials that are not cr...

1 More Replies

by skanapuram • New Contributor II

12-26-2024 7:12:33 AM

2102 Views
4 replies
0 kudos

Error com.databricks.common.client.DatabricksServiceHttpClientException 403 Invalid access token

Hi, I got this error "com.databricks.common.client.DatabricksServiceHttpClientException: 403: Invalid access token" during the run of a workflow job. It had been working for a while without error. Nothing has changed with regard to code or cluster. And...

Latest Reply

john533
New Contributor III

12-26-2024 10:22:51 PM

0 kudos

When the access token used for authentication is invalid or has expired, the error "com.databricks.common.client.DatabricksServiceHttpClientException: 403: Invalid access token" usually appears. Have you looked at the task cluster's driver logs? It m...

3 More Replies

by theron • New Contributor

12-26-2024 7:51:42 AM

931 Views
1 reply
0 kudos

Liquid Clustering - Implementing with Spark Streaming’s foreachBatch Upsert

Hi there! I’d like to use Liquid Clustering in a Spark Streaming process with foreachBatch(upsert). However, I’m not sure of the correct approach. The Databricks documentation suggests using .clusterBy(key) when writing streaming data. In my case, I'm ...

Latest Reply

RiyazAliM
Honored Contributor

12-26-2024 7:06:35 PM

0 kudos

Hi @theron, if you enabled LC when creating the table, you must have already specified the clustering column, so I don't see a reason to specify .clusterBy(key) again. Let me know if you have any questions. Cheers! If you want to create a brand new table with LC...

by David_Billa • New Contributor III

12-26-2024 9:57:06 AM

1464 Views
1 reply
0 kudos

Create delta table with space in field names per source CSV file

A few fields in the source CSV file have spaces in their names. When I tried to create the table with columns like `SOURCE 1 NAME` as string, I got an 'INVALID_COLUMN_NAME_AS_PATH' error. Runtime version is 10.2...

Latest Reply

RiyazAliM
Honored Contributor

12-26-2024 6:44:28 PM

0 kudos

Hi @David_Billa, before ingesting the CSV data into the Delta table, you could create the Delta table using the table properties as shown below: CREATE TABLE catalog_name.schema_name.table_name (`a` string, `b c` string) TBLPROPERTIES ( 'delta.minReade...
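The property list in the reply is cut off in this listing, so the following is only a sketch of how such a statement could be assembled. It assumes the reply refers to Delta column mapping, whose documented properties are `delta.columnMapping.mode = name` with reader version 2 and writer version 5; the helper names `quote_identifier` and `build_create_table` are hypothetical.

```python
from typing import Dict

# Assumption: the standard Delta column mapping properties, not
# necessarily the exact list from the truncated reply.
COLUMN_MAPPING_PROPERTIES = {
    "delta.minReaderVersion": "2",
    "delta.minWriterVersion": "5",
    "delta.columnMapping.mode": "name",
}


def quote_identifier(name: str) -> str:
    """Backtick-quote a column name, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_create_table(table_name: str, columns: Dict[str, str]) -> str:
    """Build a CREATE TABLE statement for columns that may contain spaces."""
    column_sql = ", ".join(
        f"{quote_identifier(name)} {dtype}" for name, dtype in columns.items()
    )
    property_sql = ", ".join(
        f"'{key}' = '{value}'" for key, value in COLUMN_MAPPING_PROPERTIES.items()
    )
    return f"CREATE TABLE {table_name} ({column_sql}) TBLPROPERTIES ({property_sql})"


if __name__ == "__main__":
    print(build_create_table(
        "catalog_name.schema_name.table_name",
        {"a": "string", "b c": "string"},
    ))
```

The generated string would be passed to a SQL cell or `spark.sql(...)`; that step is left out so the sketch runs without a cluster.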

by lauraxyz • Contributor

12-26-2024 10:45:56 AM

1873 Views
4 replies
0 kudos

%run command: Pass Notebook path as a parameter

Hi team! I have a notebook (notebook A) in the workspace and I'd like to execute it with the %run command from another notebook (notebook B). It works perfectly with the command `%run /workspace/path/to/notebook/A`. Now, I want to specify the above path in a variable, an...

Latest Reply

Walter_C
Databricks Employee

12-26-2024 6:16:49 PM

0 kudos

These global temp views are available to all workloads running against a compute resource, but they do not persist beyond the lifecycle of the cluster or the session that created them. You can refer to https://docs.databricks.com/en/views/index.html#t...
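For context on the original question, one commonly used way to run a notebook whose path is held in a variable is `dbutils.notebook.run(path, timeout_seconds, arguments)`. The sketch below is an illustration under assumptions, not necessarily what the thread recommends: the wrapper `run_notebook` and the 600-second default are hypothetical, and `dbutils` is passed in explicitly because it only exists inside a Databricks notebook.

```python
from typing import Any, Dict, Optional


def run_notebook(
    dbutils: Any,
    notebook_path: str,
    timeout_seconds: int = 600,               # hypothetical default
    arguments: Optional[Dict[str, str]] = None,
) -> str:
    """Run a notebook by path and return the value it exits with.

    Unlike %run, the child notebook runs in its own context, so its
    variables are not visible to the caller.
    """
    return dbutils.notebook.run(notebook_path, timeout_seconds, arguments or {})
```

Inside a notebook this could be called as `run_notebook(dbutils, notebook_path)`. Because the child runs separately, results have to be handed back through something both notebooks can see, which is where a global temp view can help.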

3 More Replies

by jeremy98 • Honored Contributor

12-26-2024 4:41:26 AM

4211 Views
2 replies
0 kudos

How to migrate the data from Postgres to Databricks?

Hello Community, I have a question about migrating data from PostgreSQL to Databricks. My PostgreSQL database receives new data every hour, and I want to synchronize these hourly inserts with the bronze layer in my Databricks catalog. Currently, I’m us...

Latest Reply

jeremy98
Honored Contributor

12-26-2024 11:25:40 AM

0 kudos

Hello Walter, Thank you for your help - you're amazing. I wanted to explain my current challenge in more detail: We have a platform that stores data in PostgreSQL, with a pipeline ingesting millions of rows every hour. We're trying to migrate this data...

1 More Replies

by lauraxyz • Contributor

12-18-2024 3:50:00 PM

1401 Views
6 replies
1 kudos

refresh online table: How to get update_id and check status of a specific update

Hi! I have a workflow job that triggers a refresh of an online table. How can I get the update_id for this specific refresh? Also, is it possible to get the status of this specific update_id? Thanks!

Latest Reply

lauraxyz
Contributor

12-19-2024 3:20:21 PM

1 kudos

Another quick question: an online table has three sync modes: Snapshot, Triggered, and Continuous. When refreshing the online table with w.pipelines.start_update(pipeline_id='{pipeline_id}', full_refresh=True), which sync mode is used by default?
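On the earlier part of this thread (getting the update_id and its status), a minimal sketch follows. It assumes `w` is a Databricks SDK for Python `WorkspaceClient`, that `start_update` returns an object with an `update_id` field, and that `get_update` exposes the state under `.update.state`; the wrapper name `start_refresh_and_get_state` is hypothetical. It does not address which sync mode applies by default.

```python
from typing import Any, Tuple


def start_refresh_and_get_state(w: Any, pipeline_id: str) -> Tuple[str, Any]:
    """Trigger a full refresh and return (update_id, current state).

    w is expected to behave like databricks.sdk.WorkspaceClient; it is
    typed as Any so the sketch has no dependency on the SDK itself.
    """
    started = w.pipelines.start_update(pipeline_id=pipeline_id, full_refresh=True)
    info = w.pipelines.get_update(pipeline_id=pipeline_id, update_id=started.update_id)
    return started.update_id, info.update.state
```

The state read immediately after starting is only a snapshot, so a job that needs the final outcome would call `get_update` again later with the same update_id.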

5 More Replies
