Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Hi, I am facing an issue where one of my jobs has been taking very long since a certain point in time. Previously it needed less than 1 hour to run a batch job that loads JSON data and does a truncate-and-load into a Delta table, but since June 2nd it has become so slow that...
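For context, a minimal sketch of the truncate-and-load pattern described above; the path and table names are hypothetical, not taken from the post:

```python
# Minimal sketch of the batch pattern described above
# (path and table names are placeholders).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read the incoming JSON batch.
df = spark.read.json("/mnt/raw/customers/")

# Truncate-and-load: "overwrite" atomically replaces the Delta table contents.
(df.write
   .format("delta")
   .mode("overwrite")
   .saveAsTable("analytics.customers"))
```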
Spark 3.4 and Databricks Runtime 13 introduce two new types of timestamps for handling time zone information. TIMESTAMP WITH LOCAL TIME ZONE: this type assumes that the input data is in the session's local time zone and converts it to UTC before processing....
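A short illustration of the two behaviours in Spark 3.4+ (a sketch; the session time zone value is arbitrary):

```python
# TIMESTAMP (local-time-zone semantics) vs. TIMESTAMP_NTZ in Spark 3.4+.
spark.conf.set("spark.sql.session.timeZone", "America/New_York")

# TIMESTAMP: the literal is interpreted in the session time zone
# and normalized to UTC internally.
spark.sql("SELECT TIMESTAMP'2023-06-01 10:00:00' AS ts_ltz").show()

# TIMESTAMP_NTZ: no time zone attached; the wall-clock value is kept as-is.
spark.sql("SELECT TIMESTAMP_NTZ'2023-06-01 10:00:00' AS ts_ntz").show()
```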
I am using Autoloader to load files from a directory. I have set up File Notification with the Event Subscription. I have a backfill interval set to 1 day and have not run the stream for a week. There should only be about 100 new files to pick up an...
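A sketch of the Auto Loader setup described above, assuming file notification mode and the 1-day backfill are configured through the documented cloudFiles options; all paths are placeholders:

```python
# Auto Loader in file notification mode with a periodic listing backfill.
stream = (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "json")
          .option("cloudFiles.useNotifications", "true")   # file notification mode
          .option("cloudFiles.backfillInterval", "1 day")  # periodic directory-listing backfill
          .option("cloudFiles.schemaLocation", "/mnt/schemas/events")
          .load("/mnt/landing/events"))

(stream.writeStream
       .option("checkpointLocation", "/mnt/checkpoints/events")
       .trigger(availableNow=True)   # process the backlog, then stop
       .toTable("bronze.events"))
```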
Hi @nolanlavender008 Thank you for posting your question in our community! We are happy to assist you.To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answ...
Hi, I have a PySpark job that takes about an hour to complete. When looking at the SQL tab in the Spark UI I see this: those processes run for more than 1 minute on a 60-minute process. This is Ganglia for that period (the last snapshot; I will look into a l...
Hi @Alejandro Martinez, hope all is well! Just wanted to check in to see if you were able to resolve your issue, and whether you would be happy to share the solution or mark an answer as best. Otherwise, please let us know if you need more help. We'd love to hear from you...
Hi, since it is not very well explained, I want to know whether the table history is a snapshot of the whole table at that point in time containing all the data, or whether it tracks only some metadata of the table changes. To be more precise: if I have a table in...
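A hedged sketch of how to look at both sides of this: Delta history is operation metadata (version, timestamp, operation, operationMetrics) kept in the transaction log, not a physical copy of the table, while time travel reconstructs old data from files the table still retains. Table name and version number below are placeholders:

```python
# Inspect the history: metadata about each change, not the data itself.
spark.sql("DESCRIBE HISTORY my_schema.my_table").show(truncate=False)

# Read the table as it was at version 5; this works as long as the
# underlying data files have not been removed by VACUUM.
old_df = spark.sql("SELECT * FROM my_schema.my_table VERSION AS OF 5")
old_df.show()
```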
Hi @data engineer, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so...
Hi all, I just wanted to know if there is any option to reduce the time taken while loading a PySpark DataFrame into an Azure Synapse table using Databricks. For example, I have a PySpark DataFrame that has around 40k records and I am trying to load data into the Azure ...
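One common approach worth checking is the built-in "com.databricks.spark.sqldw" connector, which stages data through ADLS and uses PolyBase/COPY for bulk loads rather than row-by-row JDBC inserts. A hedged sketch; every URL and credential below is a placeholder:

```python
# Bulk load a DataFrame into Azure Synapse via the sqldw connector.
(df.write
   .format("com.databricks.spark.sqldw")
   .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>")
   .option("tempDir", "abfss://<container>@<account>.dfs.core.windows.net/tmp")
   .option("forwardSparkAzureStorageCredentials", "true")
   .option("dbTable", "dbo.my_table")
   .mode("append")
   .save())
```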
Hi @Tinendra Kumar, hope all is well! Just wanted to check in to see if you were able to resolve your issue, and whether you would be happy to share the solution or mark an answer as best. Otherwise, please let us know if you need more help. We'd love to hear from you. Tha...
Hi all, I want to plot multiple charts from a pandas DataFrame. However, when I run the code below it says "Command result size exceeds limit: Exceeded 20971520 bytes (current = 20973124)". If I move line 11 and place it at 21 (outside of the functi...
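One common fix for this error (a sketch under the assumption that many figures are being accumulated in a single command result) is to render and close each figure inside the loop rather than keeping them all alive:

```python
import matplotlib.pyplot as plt

for col in df.columns:           # df is assumed to be the pandas DataFrame
    fig, ax = plt.subplots()
    df[col].plot(ax=ax, title=col)   # assumes numeric columns
    plt.show()                   # emit this figure's output now
    plt.close(fig)               # free it so it is not serialized again later
```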
This has been going on for some time now; all errors look like this (note the weird `[0;34m` marks everywhere). How can we fix this? We're not doing anything crazy; this is just the latest runtime with pretty much the simplest possible hello world pro...
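Those `[0;34m` fragments are ANSI terminal color codes being rendered as plain text instead of colors; in an IPython-based notebook, `%colors NoColor` may suppress them at the source. As a workaround for captured output (a sketch, not an official fix), they can also be stripped with a regex:

```python
import re

# Matches ANSI color escape sequences such as "\x1b[0;34m".
ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")

def strip_ansi(text: str) -> str:
    """Remove ANSI color escape sequences from a string."""
    return ANSI_RE.sub("", text)

print(strip_ansi("\x1b[0;34mSome colored error text\x1b[0m"))
```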
Have you tried detaching and reattaching the notebook? Or a cluster restart? Did you also check that you are not importing any specific library? Someone else with the right access might have installed a library with "Install on all clusters" checked.
Hi! I'm doing some tests to get an idea of how much time could be saved when starting a cluster by using a pool, and was wondering if the results I get are what should be expected. We're using AWS Databricks and used i3.xlarge as the instance type (if that matt...
Hi @Paul Pelletier, hope all is well! Just wanted to check in to see if you were able to resolve your issue, and whether you would be happy to share the solution or mark an answer as best. Otherwise, please let us know if you need more help. We'd love to hear from you. Tha...
I am trying to run an incremental data processing job using a Python wheel. The job is scheduled to run, e.g., every hour. For my code to know what data increment to process, I inject it with the {{start_time}} as part of the command line, like so: ["end_dat...
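A hedged sketch of how the wheel's entry point might parse the injected bounds; the flag names and the epoch-milliseconds rendering of {{start_time}} are assumptions based on the snippet above and the documented task parameter variables:

```python
import argparse
from datetime import datetime, timezone

parser = argparse.ArgumentParser()
parser.add_argument("--start_time", required=True)
parser.add_argument("--end_time", required=True)
args = parser.parse_args()

# Assumes {{start_time}} renders as milliseconds since the UNIX epoch.
start = datetime.fromtimestamp(int(args.start_time) / 1000, tz=timezone.utc)
end = datetime.fromtimestamp(int(args.end_time) / 1000, tz=timezone.utc)
print(f"Processing increment from {start} to {end}")
```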
I have an Azure Databricks job that is triggered via ADF using an API call. I want to see why the job has been taking n minutes to complete the tasks. When I look at the job execution results, the job execution time says 15 mins but the individual cells/commands d...
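One way to see where the extra minutes go is to pull the run timings from the Jobs API 2.1, which reports setup, execution, and cleanup durations separately per run. A sketch; host, token, and run_id are placeholders:

```python
import requests

resp = requests.get(
    "https://<workspace-host>/api/2.1/jobs/runs/get",
    headers={"Authorization": "Bearer <token>"},
    params={"run_id": 123456},
)
run = resp.json()
# Durations are reported in milliseconds.
print(run.get("setup_duration"),
      run.get("execution_duration"),
      run.get("cleanup_duration"))
```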
Hey there @DineshKumar, does @Prabakar Ammeappin's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Otherwise, please let us know if you need more help. Cheers!
I have run into a problem, as you can see in the picture below: there are some long delays between jobs. I don't understand what is happening or how to optimize the job. Can anybody help me? Thanks a lot.
Say I am getting a customer record from a website. I want to read the message and then insert/update it into a Snowflake table; depending on whether the record's insert/update is successful, I need to respond back with a success/failure message in, say, 1 sec. ...
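For the Snowflake write itself, a hedged sketch using the Spark Snowflake connector (the "snowflake" format); all connection options are placeholders. Note that a Spark batch write is unlikely to meet a hard 1-second SLA; a direct Snowflake client or a queue-based design is usually a better fit for that latency:

```python
sf_options = {
    "sfUrl": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "<db>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
}

# Append the incoming records to the Snowflake table.
(df.write
   .format("snowflake")
   .options(**sf_options)
   .option("dbtable", "CUSTOMERS")
   .mode("append")
   .save())
```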
Hello, I have a table on Azure Databricks that I keep updating with notebook "A", and I want to plot, in real time, the query result from the table (let's say SELECT COUNT(name), name FROM my_schema.my_table GROUP BY name). I know about Azure Applica...
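Inside a Databricks notebook, one hedged option is to run the aggregation as a stream and let display() refresh the chart as notebook "A" updates the table; this sketch assumes the table is a Delta table readable as a stream:

```python
# Live-updating aggregation over the table notebook "A" keeps writing to.
agg = (spark.readStream
       .table("my_schema.my_table")
       .groupBy("name")
       .count())

# display() on a streaming DataFrame keeps refreshing; switch the result
# to a bar chart in the notebook UI for a live-updating plot.
display(agg)
```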