Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

JoseU
by New Contributor
  • 1982 Views
  • 1 reply
  • 0 kudos

Cannot install libraries to cluster

Getting the following error when trying to install libraries to all purpose compute using the Library tab in Cluster details. We had vendor setup the cluster and they have since dropped off. I have switched the owner to an active AD user however stil...

Latest Reply
k1t3k
New Contributor II
  • 0 kudos

Have you managed to find a solution for this?

AmineDE
by New Contributor II
  • 1990 Views
  • 2 replies
  • 1 kudos

[DATATYPE_MISMATCH.DATA_DIFF_TYPES] Cannot resolve "coalesce(VALUE, false)"

Hi All, I got the error below using a compute instance with any of the runtimes, but not with an XSmall SQL warehouse. How could we explain it? Runtime Error in model sics_business_recon_unpivoted (models\Engineering\Business\sics_business_recon_unpivoted....

Latest Reply
SparkJun
Databricks Employee
  • 1 kudos

Can you try to run this on DBSQL and see if that errors out: coalesce(CAST(VALUE AS BOOLEAN), false)? What's the DBR version for your cluster?

1 More Replies
slakshmanan
by New Contributor III
  • 4656 Views
  • 9 replies
  • 1 kudos

How to use the REST API to find long-running queries in Databricks

How to use the REST API to find long-running queries in Databricks from sql/queries/all?

Latest Reply
Srini_ADB
New Contributor II
  • 1 kudos

@Ajay-Pandey Thanks. This API works fine, but it is showing only the current day's queries. How can we get all queries that are currently running?

8 More Replies
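The approach discussed in this thread can be sketched as a small script against the Query History API (`GET /api/2.0/sql/history/queries`). The payload field names (`filter_by`, `statuses`), the response key (`res`), the workspace host, and the token are assumptions to verify against your workspace's API version; the duration filtering itself is plain Python:

```python
import json
import urllib.request

def build_filter(statuses=("RUNNING", "QUEUED")):
    """Payload for the Query History list endpoint; field names assumed
    from the documented API shape."""
    return {"filter_by": {"statuses": list(statuses)}, "max_results": 100}

def long_running(queries, threshold_ms=10 * 60 * 1000):
    """Keep only queries whose reported duration (milliseconds) exceeds
    the threshold."""
    return [q for q in queries if q.get("duration", 0) > threshold_ms]

def fetch_queries(host, token, payload):
    """Call the workspace; host and token are placeholders for your
    workspace URL and a personal access token."""
    req = urllib.request.Request(
        f"{host}/api/2.0/sql/history/queries",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("res", [])
```

Filtering on status rather than day should also sidestep the "current day only" behavior mentioned in the reply, since currently running queries are returned regardless of when they started.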
jgrycz
by New Contributor III
  • 3910 Views
  • 2 replies
  • 2 kudos

Resolved! Can not set Service Principal User role to a service principal

Hi! I'm trying to assign the `Service Principal Users` role to a newly created Service Principal using Terraform. For that I use the following block of code: ```resource "databricks_service_principal_role" "sp_job_runner_user_role" {  service_principal_id = data...

Screenshot 2024-11-13 at 15.30.56.png
Latest Reply
jgrycz
New Contributor III
  • 2 kudos

@Alberto_Umana thanks for the help!

1 More Replies
CliveChan
by New Contributor II
  • 1007 Views
  • 3 replies
  • 0 kudos

Coursera Applied Data Science for Data Analysts - Classroom Setup Failed

I tried to run the lab in the Coursera Applied Data Science for Data Analysts Classroom Setup, and it failed with the following error. Is there any fix for this?

Screenshot 2024-10-07 at 11.14.38 AM.png
Latest Reply
CliveChan
New Contributor II
  • 0 kudos

 

2 More Replies
Vetrivel
by Databricks Partner
  • 2564 Views
  • 2 replies
  • 0 kudos

Cost Optimization for serverless Delta Live Table Implementation

I am currently using serverless Delta Live Tables for our silver layer, specifically leveraging the apply changes API method for SCD Type 2. However, we have observed that the costs are higher than initially anticipated, and I would like to seek your...

Latest Reply
Mounika_Tarigop
Databricks Employee
  • 0 kudos

To optimize DBU consumption and reduce costs while using serverless Delta Live Tables (DLT) for your silver layer, particularly with the apply changes API method for Slowly Changing Dimension (SCD) Type 2, consider the following options:  - Instead o...

1 More Replies
MikeGo
by Contributor II
  • 3265 Views
  • 10 replies
  • 0 kudos

Why do latestOffset and getBatch take such a long time?

Hi team, Kinesis -> delta table raw -> job with trigger=availableNow -> delta table target. The Kinesis -> delta table raw stream runs continuously. The job is daily with trigger=availableNow. The job reads from raw, does some transformation, and runs a MER...

Brad_0-1729034240965.png
Latest Reply
MikeGo
Contributor II
  • 0 kudos

@VZLA , thanks for the input and suggestion. Will create a support ticket. 

9 More Replies
MikeGo
by Contributor II
  • 893 Views
  • 2 replies
  • 0 kudos

Can I have a sequence guarantee when replicating with CDF?

Hi team, I have a delta table src, and somehow I want to replicate it to another table tgt with CDF, sort of (spark .readStream .format("delta") .option("readChangeFeed", "true") .table('src') .writeStream .format("delta") ...

Latest Reply
MikeGo
Contributor II
  • 0 kudos

Thanks. If the replicated table can have the _commit_version in strict sequence, I can take it as a globally ever-incrementing column and consume the delta of it (e.g., in a batch way) with select * from replicated_tgt where _commit_version > ( select la...

1 More Replies
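The watermark-based consumption idea in that last reply can be sketched outside Spark. This is a minimal plain-Python analog in which rows are hypothetical dicts carrying a `_commit_version` column; in practice you would issue the equivalent `SELECT ... WHERE _commit_version > last_seen` against the replicated table:

```python
def consume_new(rows, last_seen_version):
    """Return rows with _commit_version strictly greater than the
    watermark, plus the advanced watermark, mimicking
    SELECT * FROM replicated_tgt WHERE _commit_version > last_seen."""
    fresh = [r for r in rows if r["_commit_version"] > last_seen_version]
    new_watermark = max((r["_commit_version"] for r in fresh),
                        default=last_seen_version)
    return fresh, new_watermark
```

The scheme only works if `_commit_version` is strictly increasing in the replica, which is exactly the sequence guarantee the original question is asking about.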
jorgemarmol
by New Contributor II
  • 7778 Views
  • 10 replies
  • 2 kudos

Delta Live Tables: Too much time to do the "setting up"

Hello community! Recently I have been working with Delta Live Tables for a big project. My team and I have been studying a lot, and finally we have built a good pipeline with CDC that loads 608 entities (and, therefore, 608 delta live tables and 608 mat...

jorgemarmol_0-1688633577282.png
Latest Reply
DataEngineer
New Contributor II
  • 2 kudos

Increase the workers and driver to a higher configuration on the pipeline. It will initially take time for setting up, but once the setup is completed, ingestion will be faster. Here you can save the hour that ingestion took.

9 More Replies
camilo_s
by Databricks Partner
  • 8661 Views
  • 10 replies
  • 9 kudos

Git credentials for service principals running Jobs

I know the documentation for setting up Git credentials for Service Principals: you have to use a PAT from your Git provider, which is inevitably tied to a user and has a lifecycle of its own. Doesn't this kind of defeat the purpose of running a job...

Latest Reply
clarkh
New Contributor II
  • 9 kudos

@nicole_lu_PM Running into a similar issue with a job that needs to run in a service principal context and is connected to GitHub to execute a specific file. Would the workaround be to create a PAT for GitHub under the service principal creds?

9 More Replies
ms_221
by New Contributor II
  • 1643 Views
  • 1 reply
  • 0 kudos

Need to load data from Databricks into a Snowflake table with an ID that automatically increments

I want to load the data from a df (say 3 columns c1, c2, c3) into the Snowflake table, say test1, having columns (c1, c2, c3) and an autoincrement ID column. The df and the Snowflake table (test1) have the same column definitions and the same datatypes. In the target tabl...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

To load data from a DataFrame into a Snowflake table with an autoincrement ID column, you can follow these steps: First, ensure that your Snowflake table (test1) is created with an autoincrement ID column:CREATE OR REPLACE TABLE test1 ( ID INT AU...

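The key step in the reply, after creating the table with an `ID INT AUTOINCREMENT` column, is writing only the non-ID columns so Snowflake fills in the ID itself. A minimal sketch of that column-selection step, in plain Python (the `ID` column name is the one from this thread; anything else is an assumption):

```python
def writable_columns(df_columns, auto_columns=("ID",)):
    """Return the columns to include in the write, omitting autoincrement
    columns so Snowflake populates them (case-insensitive match)."""
    auto = {c.upper() for c in auto_columns}
    return [c for c in df_columns if c.upper() not in auto]
```

With the Spark Snowflake connector you would then write something like `df.select(*writable_columns(df.columns)).write.format("snowflake").option("dbtable", "test1")...`, so only c1, c2, c3 are sent and the ID is generated on the Snowflake side.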
Guigui
by New Contributor II
  • 2640 Views
  • 3 replies
  • 0 kudos

Job start time timezone

It is mentioned in the documentation that job.start_time is a value of time in the UTC timezone, but I wonder if that's always the case, because while the start_time is in the UTC timezone for a scheduled job, it is in the local timezone when it is manually trigge...

Latest Reply
Mounika_Tarigop
Databricks Employee
  • 0 kudos

To determine whether a Databricks job was triggered manually or by schedule, you can use the dynamic value reference {{job.trigger.type}}. T

2 More Replies
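One way to avoid depending on how the trigger type affects the displayed timezone is to normalize the start time yourself. A minimal sketch, assuming the start time is available as epoch milliseconds (the representation the Jobs API uses for run start times):

```python
from datetime import datetime, timezone

def start_time_utc(epoch_ms):
    """Interpret a job start_time given as epoch milliseconds as an
    explicit UTC datetime, so downstream logic never depends on how
    manual vs scheduled runs are displayed."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)
```

Combined with the `{{job.trigger.type}}` dynamic value mentioned in the reply, this lets a task both know how it was launched and work with one unambiguous timestamp.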
RobDineen
by Contributor
  • 2737 Views
  • 4 replies
  • 0 kudos

Resolved! Pyspark to_date not coping with single digit Day or Month

Hi there, I have a simple PySpark to_date function, but it fails due to days or months from 1-9. So is there a nice easy way to get round this at all? Regards, Rob

RobDineen_0-1731324661487.png
Latest Reply
RobDineen
Contributor
  • 0 kudos

Resolved using format_string dff = df.withColumn("DayofMonthFormatted", when(df.DayofMonth.isin([1,2,3,4,5,6,7,8,9]), format_string("0%d", df.DayofMonth)).otherwise(df.DayofMonth))

3 More Replies
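The accepted `format_string` fix zero-pads the day column before parsing. The same idea in plain Python, padding every date field at once (the `/` separator and day-first order are assumptions for illustration):

```python
from datetime import datetime

def pad_date(raw, sep="/"):
    """Zero-pad one-digit day/month fields: '1/5/2024' -> '01/05/2024'.
    zfill(2) leaves the 4-digit year untouched."""
    return sep.join(part.zfill(2) for part in raw.split(sep))

# Python's strptime happens to tolerate one-digit fields, but a strict
# two-digit pattern like Spark's 'dd/MM/yyyy' does not, hence the padding:
parsed = datetime.strptime(pad_date("1/5/2024"), "%d/%m/%Y")
```

In Spark 3's to_date, single-letter pattern fields such as `d/M/yyyy` should also accept one- or two-digit values, which may avoid the padding step entirely.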
Avinash_Narala
by Databricks Partner
  • 3825 Views
  • 2 replies
  • 2 kudos

Fully serverless databricks SaaS

I'm exploring Databricks' fully serverless SaaS option, as shown in the attached image, which promises quick setup and $400 in initial credits. I'm curious about the pros and cons of using this fully serverless setup.Specifically, would this option b...

Latest Reply
gchandra
Databricks Employee
  • 2 kudos

There are: if you have Spark configs, custom jars, or init scripts, they won't work. Please check this page for the long list of limitations: https://docs.databricks.com/en/compute/serverless/limitations.html

1 More Replies
rcostanza
by New Contributor III
  • 2747 Views
  • 4 replies
  • 2 kudos

Resolved! Changing git's author field when committing through Databricks

I have a git folder linked to a Bitbucket repo. Whenever I commit something, the commit uses my Bitbucket username (the unique name) in the author field, making it less readable when I'm reading a list of commits. For example, commits end up like this: commi...

Latest Reply
yermulnik
New Contributor II
  • 2 kudos

Just found us suffering from the same issue since we enforced a GitHub ruleset to require commit emails to match our Org email pattern of `*@ourorgdomain.com`.

3 More Replies