Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

159312
by New Contributor III
  • 3796 Views
  • 1 replies
  • 0 kudos

How to set pipelines.incompatibleViewCheck.enabled = false

I tried to load a static table as a source for a streaming DLT pipeline. I understand this is not optimal, but it provides the best path toward eventually having a full streaming pipeline. When I do, I get the following error: pyspark.sql.utils.Analysis...

Latest Reply
kfoster
Contributor
  • 0 kudos

When you declare a table or view, you can pass something like this: @dlt.table(spark_conf={"pipelines.incompatibleViewCheck.enabled": "false"})

  • 0 kudos
PrebenOlsen
by New Contributor III
  • 3063 Views
  • 1 replies
  • 1 kudos

Resolved! Why does @dlt.table from a table give different results than from a view?

I have some data in silver that I read in as a view using the __apply_changes function on. I create a table based on this, and I then want to create my gold-table, after doing a .groupBy() and .pivot(). The transformations I do in the gold-table aren...

Latest Reply
PrebenOlsen
New Contributor III
  • 1 kudos

I have found a temporary workaround. The .pivot("columnName") call should automatically pick up all the values it can find, but for some reason it does not. I need to specify the values, using .pivot("group_name", "group0", "group1", "group2"...) ...

  • 1 kudos
SatishGunjal
by New Contributor
  • 3524 Views
  • 1 replies
  • 0 kudos

Data frame takes long time to print count of rows

We have a PySpark DataFrame with 50 million records. We can display records from it, but it takes around 10 minutes to print the shape of the DataFrame. We aim to use this data for modelling that will take some numerical features based on the final data fra...

Latest Reply
Hanna08
New Contributor II
  • 0 kudos

Thanks for the detailed explanation. For those who want to have constant technical support for their work processes, I recommend JD Young. Here is only the latest information about the update in the world of information technology solutions and cyber...

  • 0 kudos
Cano
by New Contributor III
  • 2006 Views
  • 1 replies
  • 2 kudos

How to add notebook to my Databricks jdbc url?

How do I add a notebook to the JDBC URL in order to run queries externally? jdbc:databricks://dbc-a1b2345c-d6e7.cloud.databricks.com:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/1234567890123456/1234-567890-reef123;AuthMech=3;...

Latest Reply
ranged_coop
Valued Contributor II
  • 2 kudos

Not sure if it is possible. Alternatively, you could try adding your notebook to a job and then triggering that job via the Jobs API. Please refer to the link below: Jobs API 2.1 | Databricks on AWS
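To make the suggestion above a bit more concrete, here is a minimal sketch of how the Jobs API 2.1 `run-now` call could be prepared in Python. The host name, token placeholder, job ID, and the helper name `build_run_now_request` are all assumptions for illustration and do not come from the thread. The sketch only builds the request; sending it is left commented out.

```python
import json
import urllib.request


def build_run_now_request(host: str, token: str, job_id: int) -> urllib.request.Request:
    """Build (but do not send) a POST request to the Jobs API 2.1 run-now endpoint."""
    url = f"https://{host}/api/2.1/jobs/run-now"
    body = json.dumps({"job_id": job_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical values: replace with your own workspace host, token, and job ID.
req = build_run_now_request(
    host="dbc-example.cloud.databricks.com",
    token="<personal-access-token>",
    job_id=123,
)

# To actually trigger the job, send the request:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

The job itself would need to contain the notebook as a task, so the external caller triggers the job instead of addressing the notebook through JDBC.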

  • 2 kudos
tomnguyen_195
by New Contributor III
  • 3204 Views
  • 2 replies
  • 3 kudos

DLT maintenance job got stuck

Hi all, we recently noticed a huge cost associated with our Databricks account. The main culprit is DLT's pipeline maintenance job, which was auto-scheduled to run but got stuck and cost us thousands of DBUs. Do you know what would be the r...

Latest Reply
tinai_long
New Contributor III
  • 3 kudos

Same question. These maintenance jobs run for the maximum timeout (168 hours) and do not terminate. Example below:

  • 3 kudos
1 More Replies
Sha_1890
by New Contributor III
  • 6560 Views
  • 8 replies
  • 0 kudos

How to execute a series of stored procedures using scala in databricks

I am working on a migration project where the lift-and-shift method is used to migrate a SQL Server DB from on-prem to Azure. There are a lot of stored procedures used for integration on-prem. Now, on-prem, to process the XML file and exec...

Latest Reply
Noopur_Nigam
Databricks Employee
  • 0 kudos

Hi @shafana Roohi Jahubar​ I hope that your queries are answered. Please let me know if you have more doubts.

  • 0 kudos
7 More Replies
TMNGB
by New Contributor II
  • 3356 Views
  • 2 replies
  • 2 kudos

Resolved! Does MERGE statement preserve order? (Slowly Changing Dimensions)

In the case of processing multiple source files - with potentially, one or multiple entity versions per source - being able to use the MERGE statement whilst preserving the order is key to ensure the correct versioning of entity versions (aka, versio...

Latest Reply
Noopur_Nigam
Databricks Employee
  • 2 kudos

Hi @Guilherme Banhudo, I hope that werners' answer helped you. Please let me know if you still have doubts or queries.

  • 2 kudos
1 More Replies
SaiN
by New Contributor II
  • 3148 Views
  • 2 replies
  • 4 kudos

How to get Cost Per Job on a Single Cluster?

How can I get granular cost-per-job information for a single cluster in Azure Databricks? I know we can assign tags to jobs as well as to clusters, but I can only see the cluster tags, not the job tags, in Cost Analysis in the Azure portal. ...

Latest Reply
Prabakar
Databricks Employee
  • 4 kudos

Hi @Sainath Nagare, job tags are propagated to job clusters. If you are using an interactive cluster for your job, you won't be able to see the job tag.

  • 4 kudos
1 More Replies
77796
by New Contributor II
  • 5815 Views
  • 4 replies
  • 0 kudos

Databricks S3A error - java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory not found

We are getting the error below on runtimes 10.x and 11.x when writing to S3 via the saveAsNewAPIHadoopFile function. The same jobs run fine on runtimes 9.x and 7.x. The difference between 9.x and 10.x is the former has hadoop 2.7 bindings with sp...

Latest Reply
77796
New Contributor II
  • 0 kudos

We have resolved this issue by using the s3 scheme instead of s3a, i.e. pairRDD.saveAsNewAPIHadoopFile("s3://bucket/testout.dat",

  • 0 kudos
3 More Replies
zyang
by Contributor II
  • 4746 Views
  • 5 replies
  • 2 kudos

azure databricks notebook cannot load the difference

I am trying to commit and push my change to the branch, but I cannot load the diff. I haven't changed many cells, and no cell exceeds 500 lines in the notebook file. I am wondering why this happens and how to solve it.

Latest Reply
Vidula
Honored Contributor
  • 2 kudos

Hey there @z yang​ Hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.Th...

  • 2 kudos
4 More Replies
OldDogNewTrix
by New Contributor
  • 1520 Views
  • 3 replies
  • 0 kudos
Latest Reply
Vidula
Honored Contributor
  • 0 kudos

Hey there @Jim Carlson​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you...

  • 0 kudos
2 More Replies
Yagao
by New Contributor
  • 1520 Views
  • 2 replies
  • 0 kudos

How to do python within sql query in Databricks ?

How do I run Python within a SQL query in Databricks?

Latest Reply
Vidula
Honored Contributor
  • 0 kudos

Hi @Ya Gao​ Hope you are well. Just wanted to see if you were able to find an answer to your question and would you like to mark an answer as best? It would be really helpful for the other members too.Cheers!

  • 0 kudos
1 More Replies
Mapajr
by New Contributor III
  • 3680 Views
  • 2 replies
  • 3 kudos

Issues pushing repos on Gitlab with Databricks

Our company uses Gitlab enterprise edition and we link our repos up to databricks through this. Randomly we will get errors when trying to push the repo and we have to spend hours debugging trying to figure out what is causing the push error on datab...

Latest Reply
Vidula
Honored Contributor
  • 3 kudos

Hey there @Mark Patrick​ Hope you are well. Just wanted to see if you were able to find an answer to your question and would you like to mark an answer as best? It would be really helpful for the other members too.Cheers!

  • 3 kudos
1 More Replies
ViktorWolf
by New Contributor
  • 2061 Views
  • 3 replies
  • 0 kudos

Why sometimes autoloader lose the checkpoint path and break the streaming?

Why does Auto Loader sometimes lose the checkpoint path and break the stream?

Latest Reply
Vidula
Honored Contributor
  • 0 kudos

Hi @Vittorio Antonacci​ Hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from y...

  • 0 kudos
2 More Replies
al_joe
by Contributor
  • 1213 Views
  • 0 replies
  • 3 kudos

Why is this simple numerical operation not precise?

I was experimenting with a beginner tutorial and saw this strange output ... Why is this so? And why is the behavior not consistent for ALL rows updated by the same statement? 8.8 - 1 = 7.800000000000001 See screenshot ...
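The result in the question looks like ordinary binary floating-point behavior, not something specific to Delta tables. A minimal sketch in plain Python, assuming the column is a double-precision float (the thread does not state the column type), reproduces the same number and shows two common ways around it: exact decimal arithmetic and tolerance-based comparison.

```python
import math
from decimal import Decimal

# 8.8 has no exact binary representation, so the subtraction exposes the error.
float_result = 8.8 - 1
print(float_result)            # 7.800000000000001
print(float_result == 7.8)     # False

# Values that are exactly representable in binary do not show the effect,
# which is why only some rows look "wrong".
print(2.5 - 1 == 1.5)          # True

# Exact decimal arithmetic avoids the issue.
decimal_result = Decimal("8.8") - Decimal("1")
print(decimal_result)          # 7.8

# When floats are kept, compare with a tolerance instead of equality.
print(math.isclose(float_result, 7.8))  # True
```

In SQL, the analogous choice would be a DECIMAL column type instead of DOUBLE when exact decimal results matter.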

