Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

JoseU
by New Contributor
  • 1982 Views
  • 1 reply
  • 0 kudos

Cannot install libraries to cluster

Getting the following error when trying to install libraries to all purpose compute using the Library tab in Cluster details. We had vendor setup the cluster and they have since dropped off. I have switched the owner to an active AD user however stil...

Latest Reply
k1t3k
New Contributor II
  • 0 kudos

Have you managed to find a solution for this?

AmineDE
by New Contributor II
  • 1990 Views
  • 2 replies
  • 1 kudos

[DATATYPE_MISMATCH.DATA_DIFF_TYPES] Cannot resolve "coalesce(VALUE, false)"

Hi All, I got the error below using a compute instance with any of the runtimes, but not with an XSmall SQL warehouse. How could we explain it? Runtime Error in model sics_business_recon_unpivoted (models\Engineering\Business\sics_business_recon_unpivoted....

Latest Reply
SparkJun
Databricks Employee
  • 1 kudos

Can you try to run this on DBSQL and see if that errors out: coalesce(CAST(VALUE AS BOOLEAN), false)? What's the DBR version for your cluster?

1 More Replies
slakshmanan
by New Contributor III
  • 4656 Views
  • 9 replies
  • 1 kudos

How to use the REST API to find long-running queries in Databricks

How to use the REST API to find long-running queries in Databricks from sql/queries/all?

Latest Reply
Srini_ADB
New Contributor II
  • 1 kudos

@Ajay-Pandey Thanks. This API works fine, but it is showing only the current day's queries. How can we get all queries that are currently running?

8 More Replies
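The approach discussed in this thread can be sketched as a small script against the Query History API (`GET /api/2.0/sql/history/queries`). The payload field names (`filter_by`, `statuses`), the response key (`res`), the workspace host, and the token are assumptions to verify against your workspace's API version; the duration filtering itself is plain Python:

```python
import json
import urllib.request

def build_filter(statuses=("RUNNING", "QUEUED")):
    """Payload for the Query History list endpoint; field names assumed
    from the documented API shape."""
    return {"filter_by": {"statuses": list(statuses)}, "max_results": 100}

def long_running(queries, threshold_ms=10 * 60 * 1000):
    """Keep only queries whose reported duration (milliseconds) exceeds
    the threshold."""
    return [q for q in queries if q.get("duration", 0) > threshold_ms]

def fetch_queries(host, token, payload):
    """Call the workspace; host and token are placeholders for your
    workspace URL and a personal access token."""
    req = urllib.request.Request(
        f"{host}/api/2.0/sql/history/queries",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("res", [])
```

Filtering on status rather than day should also sidestep the "current day only" behavior mentioned in the reply, since currently running queries are returned regardless of when they started.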
jgrycz
by New Contributor III
  • 3910 Views
  • 2 replies
  • 2 kudos

Resolved! Can not set Service Principal User role to a service principal

Hi! I'm trying to assign the `Service Principal Users` role to a newly created Service Principal using Terraform. For that I use the following block of code: ```resource "databricks_service_principal_role" "sp_job_runner_user_role" {  service_principal_id = data...

Screenshot 2024-11-13 at 15.30.56.png
Latest Reply
jgrycz
New Contributor III
  • 2 kudos

@Alberto_Umana thanks for the help!

1 More Replies
CliveChan
by New Contributor II
  • 1007 Views
  • 3 replies
  • 0 kudos

Coursera Applied Data Science for Data Analysts - Classroom Setup Failed

I tried to run the lab in the Coursera Applied Data Science for Data Analysts Classroom Setup, and it failed with the following error. Is there any fix for this?

Screenshot 2024-10-07 at 11.14.38 AM.png
Latest Reply
CliveChan
New Contributor II
  • 0 kudos

 

2 More Replies
Vetrivel
by Databricks Partner
  • 2564 Views
  • 2 replies
  • 0 kudos

Cost Optimization for serverless Delta Live Table Implementation

I am currently using serverless Delta Live Tables for our silver layer, specifically leveraging the apply changes API method for SCD Type 2. However, we have observed that the costs are higher than initially anticipated, and I would like to seek your...

Latest Reply
Mounika_Tarigop
Databricks Employee
  • 0 kudos

To optimize DBU consumption and reduce costs while using serverless Delta Live Tables (DLT) for your silver layer, particularly with the apply changes API method for Slowly Changing Dimension (SCD) Type 2, consider the following options:  - Instead o...

1 More Replies
MikeGo
by Contributor II
  • 3265 Views
  • 10 replies
  • 0 kudos

Why do latestOffset and getBatch take such a long time?

Hi team, Kinesis -> delta table raw -> job with trigger=availableNow -> delta table target. The Kinesis -> delta table raw stream runs continuously. The job is daily with trigger=availableNow. The job reads from raw, does some transformation, and runs a MER...

Brad_0-1729034240965.png
Latest Reply
MikeGo
Contributor II
  • 0 kudos

@VZLA , thanks for the input and suggestion. Will create a support ticket. 

9 More Replies
MikeGo
by Contributor II
  • 893 Views
  • 2 replies
  • 0 kudos

Can I have a sequence guarantee when replicating with CDF?

Hi team, I have a delta table src, and somehow I want to replicate it to another table tgt with CDF, sort of (spark .readStream .format("delta") .option("readChangeFeed", "true") .table('src') .writeStream .format("delta") ...

Latest Reply
MikeGo
Contributor II
  • 0 kudos

Thanks. If the replicated table can have the _commit_version in strict sequence, I can take it as a globally ever-incrementing column and consume the delta of it (e.g., in a batch way) with select * from replicated_tgt where _commit_version > ( select la...

1 More Replies
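The watermark-based consumption idea in that last reply can be sketched outside Spark. This is a minimal plain-Python analog in which rows are hypothetical dicts carrying a `_commit_version` column; in practice you would issue the equivalent `SELECT ... WHERE _commit_version > last_seen` against the replicated table:

```python
def consume_new(rows, last_seen_version):
    """Return rows with _commit_version strictly greater than the
    watermark, plus the advanced watermark, mimicking
    SELECT * FROM replicated_tgt WHERE _commit_version > last_seen."""
    fresh = [r for r in rows if r["_commit_version"] > last_seen_version]
    new_watermark = max((r["_commit_version"] for r in fresh),
                        default=last_seen_version)
    return fresh, new_watermark
```

The scheme only works if `_commit_version` is strictly increasing in the replica, which is exactly the sequence guarantee the original question is asking about.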
jorgemarmol
by New Contributor II
  • 7778 Views
  • 10 replies
  • 2 kudos

Delta Live Tables: Too much time to do the "setting up"

Hello community! Recently I have been working with Delta Live Tables for a big project. My team and I have been studying a lot, and finally we have built a good pipeline with CDC that loads 608 entities (and, therefore, 608 delta live tables and 608 mat...

jorgemarmol_0-1688633577282.png
Latest Reply
DataEngineer
New Contributor II
  • 2 kudos

Increase the workers and driver to a higher configuration on the pipeline. It will initially take time for setting up, but once the setup is completed, ingestion will be faster. Here you can save the hour that ingestion took.

9 More Replies
camilo_s
by Databricks Partner
  • 8661 Views
  • 10 replies
  • 9 kudos

Git credentials for service principals running Jobs

I know the documentation for setting up Git credentials for Service Principals: you have to use a PAT from your Git provider, which is inevitably tied to a user and has a lifecycle of its own. Doesn't this kind of defeat the purpose of running a job...

Latest Reply
clarkh
New Contributor II
  • 9 kudos

@nicole_lu_PM Running into a similar issue with a job that needs to run in a service principal context and is connected to GitHub to execute a specific file. Would the workaround be to create a PAT for GitHub under the service principal creds?

9 More Replies
ms_221
by New Contributor II
  • 1643 Views
  • 1 reply
  • 0 kudos

Need to load data from Databricks into a Snowflake table with an ID that automatically increments

I want to load the data from a df (say 3 columns c1, c2, c3) into the Snowflake table, say test1, having columns (c1, c2, c3) and an autoincrement ID column. The df and the Snowflake table (test1) have the same column definitions and the same datatypes. In the target tabl...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

To load data from a DataFrame into a Snowflake table with an autoincrement ID column, you can follow these steps: First, ensure that your Snowflake table (test1) is created with an autoincrement ID column:CREATE OR REPLACE TABLE test1 ( ID INT AU...

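The key step in the reply, after creating the table with an `ID INT AUTOINCREMENT` column, is writing only the non-ID columns so Snowflake fills in the ID itself. A minimal sketch of that column-selection step, in plain Python (the `ID` column name is the one from this thread; anything else is an assumption):

```python
def writable_columns(df_columns, auto_columns=("ID",)):
    """Return the columns to include in the write, omitting autoincrement
    columns so Snowflake populates them (case-insensitive match)."""
    auto = {c.upper() for c in auto_columns}
    return [c for c in df_columns if c.upper() not in auto]
```

With the Spark Snowflake connector you would then write something like `df.select(*writable_columns(df.columns)).write.format("snowflake").option("dbtable", "test1")...`, so only c1, c2, c3 are sent and the ID is generated on the Snowflake side.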
Guigui
by New Contributor II
  • 2640 Views
  • 3 replies
  • 0 kudos

Job start time timezone

It is mentioned in the documentation that job.start_time is a value of time in the UTC timezone, but I wonder if that's always the case, because while the start_time is in the UTC timezone for a scheduled job, it is in the local timezone when it is manually trigge...

Latest Reply
Mounika_Tarigop
Databricks Employee
  • 0 kudos

To determine whether a Databricks job was triggered manually or by schedule, you can use the dynamic value reference {{job.trigger.type}}. T

2 More Replies
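One way to avoid depending on how the trigger type affects the displayed timezone is to normalize the start time yourself. A minimal sketch, assuming the start time is available as epoch milliseconds (the representation the Jobs API uses for run start times):

```python
from datetime import datetime, timezone

def start_time_utc(epoch_ms):
    """Interpret a job start_time given as epoch milliseconds as an
    explicit UTC datetime, so downstream logic never depends on how
    manual vs scheduled runs are displayed."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)
```

Combined with the `{{job.trigger.type}}` dynamic value mentioned in the reply, this lets a task both know how it was launched and work with one unambiguous timestamp.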
RobDineen
by Contributor
  • 2737 Views
  • 4 replies
  • 0 kudos

Resolved! Pyspark to_date not coping with single digit Day or Month

Hi there, I have a simple PySpark to_date function, but it fails due to days or months from 1-9. So is there a nice easy way to get round this at all? Regards, Rob

RobDineen_0-1731324661487.png
Latest Reply
RobDineen
Contributor
  • 0 kudos

Resolved using format_string dff = df.withColumn("DayofMonthFormatted", when(df.DayofMonth.isin([1,2,3,4,5,6,7,8,9]), format_string("0%d", df.DayofMonth)).otherwise(df.DayofMonth))

3 More Replies
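The accepted `format_string` fix zero-pads the day column before parsing. The same idea in plain Python, padding every date field at once (the `/` separator and day-first order are assumptions for illustration):

```python
from datetime import datetime

def pad_date(raw, sep="/"):
    """Zero-pad one-digit day/month fields: '1/5/2024' -> '01/05/2024'.
    zfill(2) leaves the 4-digit year untouched."""
    return sep.join(part.zfill(2) for part in raw.split(sep))

# Python's strptime happens to tolerate one-digit fields, but a strict
# two-digit pattern like Spark's 'dd/MM/yyyy' does not, hence the padding:
parsed = datetime.strptime(pad_date("1/5/2024"), "%d/%m/%Y")
```

In Spark 3's to_date, single-letter pattern fields such as `d/M/yyyy` should also accept one- or two-digit values, which may avoid the padding step entirely.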
Avinash_Narala
by Databricks Partner
  • 3825 Views
  • 2 replies
  • 2 kudos

Fully serverless databricks SaaS

I'm exploring Databricks' fully serverless SaaS option, as shown in the attached image, which promises quick setup and $400 in initial credits. I'm curious about the pros and cons of using this fully serverless setup.Specifically, would this option b...

Latest Reply
gchandra
Databricks Employee
  • 2 kudos

There are: if you have Spark configs, custom jars, or init scripts, they won't work. Please check this page for the long list of limitations: https://docs.databricks.com/en/compute/serverless/limitations.html

1 More Replies
rcostanza
by New Contributor III
  • 2747 Views
  • 4 replies
  • 2 kudos

Resolved! Changing git's author field when committing through Databricks

I have a git folder linked to a Bitbucket repo. Whenever I commit something, the commit uses my Bitbucket username (the unique name) in the author field, making it less readable when I'm reading a list of commits. For example, commits end up like this: commi...

Latest Reply
yermulnik
New Contributor II
  • 2 kudos

Just found us suffering from the same issue since we enforced a GitHub ruleset to require commit emails to match our Org email pattern of `*@ourorgdomain.com`.

3 More Replies