Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

SatishGunjal
by New Contributor
  • 3518 Views
  • 1 replies
  • 0 kudos

Data frame takes long time to print count of rows

We have a PySpark DataFrame with 50 million records. We can display records from it, but it takes around 10 minutes to print the shape of the DataFrame. We aim to use this data for modelling that will take some numerical features based on the final data fra...

Latest Reply
Hanna08
New Contributor II
  • 0 kudos

Thanks for the detailed explanation. For those who want to have constant technical support for their work processes, I recommend JD Young. Here is only the latest information about the update in the world of information technology solutions and cyber...

Cano
by New Contributor III
  • 2002 Views
  • 1 replies
  • 2 kudos

How to add notebook to my Databricks jdbc url?

How do I add a notebook to the JDBC URL in order to run queries externally? jdbc:databricks://dbc-a1b2345c-d6e7.cloud.databricks.com:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/1234567890123456/1234-567890-reef123;AuthMech=3;...

Latest Reply
ranged_coop
Valued Contributor II
  • 2 kudos

Not sure if it is possible. Alternatively, you could try adding your notebook to a job and then triggering that job via the Jobs API. Please refer to the link below: Jobs API 2.1 | Databricks on AWS
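The suggestion above can be sketched in Python. This is only an illustration, not an official client: it builds a request for the Jobs API 2.1 `run-now` endpoint using the standard library. The helper name `build_run_now_request`, the host, the token, and the job ID are all placeholder assumptions, and the notebook is assumed to be already attached to the job.

```python
import json
import urllib.request


def build_run_now_request(host: str, token: str, job_id: int) -> urllib.request.Request:
    """Build (but do not send) a POST request that triggers an existing job.

    host, token, and job_id are placeholders supplied by the caller.
    """
    url = f"https://{host}/api/2.1/jobs/run-now"
    body = json.dumps({"job_id": job_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # personal access token
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical values, for illustration only.
request = build_run_now_request("example.cloud.databricks.com", "my-token", 123)

# To actually trigger the run, the request would be sent with:
#   urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the sketch easy to inspect without a workspace; in practice an HTTP library or the Databricks SDK may be more convenient.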

tomnguyen_195
by New Contributor III
  • 3203 Views
  • 2 replies
  • 3 kudos

DLT maintenance job got stuck

Hi all, we recently noticed a huge cost associated with our Databricks account. The main culprit is a DLT pipeline maintenance job that was auto-scheduled to run but got stuck and cost us thousands of DBUs. Do you know what would be the r...

Latest Reply
tinai_long
New Contributor III
  • 3 kudos

Same question. These maintenance jobs run for the maximum timeout (168 hours) and do not terminate. Example below:

1 More Replies
Sha_1890
by New Contributor III
  • 6557 Views
  • 8 replies
  • 0 kudos

How to execute a series of stored procedures using scala in databricks

I am working on a migration project where the lift-and-shift method is used to migrate a SQL Server DB from on-prem to Azure Cloud. There are a lot of stored procedures used for integration on-prem. Now, on-prem, to process the XML file and exec...

Latest Reply
Noopur_Nigam
Databricks Employee
  • 0 kudos

Hi @shafana Roohi Jahubar​ I hope that your queries are answered. Please let me know if you have more doubts.

7 More Replies
TMNGB
by New Contributor II
  • 3338 Views
  • 2 replies
  • 2 kudos

Resolved! Does MERGE statement preserve order? (Slowly Changing Dimensions)

In the case of processing multiple source files - with potentially, one or multiple entity versions per source - being able to use the MERGE statement whilst preserving the order is key to ensure the correct versioning of entity versions (aka, versio...

Latest Reply
Noopur_Nigam
Databricks Employee
  • 2 kudos

Hi @Guilherme Banhudo I hope that werners' answer helped you. Please let me know if you still have doubts or queries.

1 More Replies
SaiN
by New Contributor II
  • 3147 Views
  • 2 replies
  • 4 kudos

How to get Cost Per Job on a Single Cluster?

How can I get granular cost-per-job information for a single cluster in Azure Databricks? I know we can assign tags to jobs as well as to clusters, but I can only see the cluster tags, not the job tags, in Cost Analysis on the Azure Portal. ...

Latest Reply
Prabakar
Databricks Employee
  • 4 kudos

Hi @Sainath Nagare Job tags are propagated to job clusters. If you are using an interactive cluster for your job, you won't be able to see the job tag.

1 More Replies
77796
by New Contributor II
  • 5806 Views
  • 4 replies
  • 0 kudos

Databricks S3A error - java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory not found

We are getting the error below on runtimes 10.x and 11.x when writing to S3 via the saveAsNewAPIHadoopFile function. The same jobs run fine on runtimes 9.x and 7.x. The difference between 9.x and 10.x is that the former has Hadoop 2.7 bindings with sp...

Latest Reply
77796
New Contributor II
  • 0 kudos

We resolved this issue by using the s3 scheme instead of s3a, i.e. pairRDD.saveAsNewAPIHadoopFile("s3://bucket/testout.dat",
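If many output paths are affected, the scheme swap described above could be applied in one place. The following is a minimal sketch in plain Python; `to_s3_scheme` is a hypothetical helper name, not part of Spark or Hadoop, and the bucket path is the one from the reply. Whether the s3 scheme is the right choice on other runtimes is not established by this thread.

```python
def to_s3_scheme(path: str) -> str:
    """Rewrite an s3a:// path to the s3:// scheme; leave other paths unchanged."""
    prefix = "s3a://"
    if path.startswith(prefix):
        return "s3://" + path[len(prefix):]
    return path


# Path taken from the reply above, written with the s3a scheme.
output_path = to_s3_scheme("s3a://bucket/testout.dat")
print(output_path)
```

The resulting string would then be passed as the first argument to saveAsNewAPIHadoopFile in place of the s3a path.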

3 More Replies
zyang
by Contributor II
  • 4733 Views
  • 5 replies
  • 2 kudos

azure databricks notebook cannot load the difference

I am trying to commit and push my change to the branch, but I cannot load the diff. I haven't changed many cells, and no cell exceeds 500 lines in the notebook file. I am wondering why this happens and how to solve it.

Screenshot 2022-06-26 101907
Latest Reply
Vidula
Honored Contributor
  • 2 kudos

Hey there @z yang Hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Th...

4 More Replies
OldDogNewTrix
by New Contributor
  • 1517 Views
  • 3 replies
  • 0 kudos
Latest Reply
Vidula
Honored Contributor
  • 0 kudos

Hey there @Jim Carlson​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you...

2 More Replies
Yagao
by New Contributor
  • 1517 Views
  • 2 replies
  • 0 kudos

How do I run Python within a SQL query in Databricks?

How do I run Python within a SQL query in Databricks?

Latest Reply
Vidula
Honored Contributor
  • 0 kudos

Hi @Ya Gao Hope you are well. Just wanted to see if you were able to find an answer to your question and would you like to mark an answer as best? It would be really helpful for the other members too. Cheers!

1 More Replies
Mapajr
by New Contributor III
  • 3676 Views
  • 2 replies
  • 3 kudos

Issues pushing repos on Gitlab with Databricks

Our company uses GitLab Enterprise Edition, and we link our repos to Databricks through it. We randomly get errors when trying to push the repo and have to spend hours debugging to figure out what is causing the push error on datab...

Latest Reply
Vidula
Honored Contributor
  • 3 kudos

Hey there @Mark Patrick Hope you are well. Just wanted to see if you were able to find an answer to your question and would you like to mark an answer as best? It would be really helpful for the other members too. Cheers!

1 More Replies
ViktorWolf
by New Contributor
  • 2059 Views
  • 3 replies
  • 0 kudos

Why does Auto Loader sometimes lose the checkpoint path and break the stream?

Why does Auto Loader sometimes lose the checkpoint path and break the stream?

Latest Reply
Vidula
Honored Contributor
  • 0 kudos

Hi @Vittorio Antonacci​ Hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from y...

2 More Replies
al_joe
by Contributor
  • 1210 Views
  • 0 replies
  • 3 kudos

Why is this simple numerical operation not precise?

I was experimenting with a beginner tutorial and saw this strange output ... Why is this so? And why is the behavior not consistent for ALL rows updated by the same statement? 8.8 - 1 = 7.800000000000001 See screenshot ...

20220827_180902_msedge_DE_2.1_-_Managing_Delta_Tables_-_Databricks_-_Pers
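A likely explanation, assuming the column is stored as a double-precision float (the post does not say which type the tutorial table uses): 8.8 has no exact binary representation, so the subtraction exposes a tiny rounding error, while values such as 5.5 are exactly representable and come out clean. That would also explain why only some rows look "wrong". The minimal Python sketch below reproduces the effect and contrasts it with exact decimal arithmetic; the helper names are illustrative only.

```python
import math
from decimal import Decimal


def subtract_one_float(x: float) -> float:
    """Subtract 1 using binary floating point (IEEE 754 double)."""
    return x - 1


def subtract_one_decimal(x: str) -> Decimal:
    """Subtract 1 using exact base-10 arithmetic."""
    return Decimal(x) - 1


print(subtract_one_float(8.8))      # 7.800000000000001 (8.8 is not exact in binary)
print(subtract_one_float(5.5))      # 4.5 (5.5 is exact in binary)
print(subtract_one_decimal("8.8"))  # 7.8

# Floats should be compared with a tolerance rather than with ==.
print(math.isclose(subtract_one_float(8.8), 7.8))  # True
```

Where exact decimal results matter, a fixed-precision decimal column type is the usual choice over a floating-point one.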
RiyazAliM
by Honored Contributor
  • 7395 Views
  • 6 replies
  • 4 kudos

Issue with .dbc in the Advanced Data Engineering course in Databricks Academy

The very first cell of the .dbc notebook, which is a setup cell, fails.

image
Latest Reply
Niha1
New Contributor III
  • 4 kudos

Hi Riyaz, please find the snippet of the error below: "AnalysisException: Path does not exist: dbfs:/user/nniha9188@gmail.com/dbacademy/machine_learning/datasets/airbnb/sf-listings/sf-listings-2019-03-06-clean.parquet" Source: The source for this datas...

5 More Replies
Shakzz
by New Contributor III
  • 5699 Views
  • 2 replies
  • 10 kudos
Latest Reply
Vidula
Honored Contributor
  • 10 kudos

Hey there @Shakti Chand​ Hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from y...

1 More Replies