Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Hi, I am facing an issue where one of my jobs has been taking very long since a certain point in time. Previously it needed less than 1 hour to run a batch job that loads JSON data and does a truncate-and-load into a Delta table, but since June 2nd it has become so slow that...
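For context, a minimal sketch of the truncate-and-load pattern described above; the path and table names are hypothetical, not taken from the post:

```python
# Minimal sketch of the batch pattern described above
# (path and table names are placeholders).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read the incoming JSON batch.
df = spark.read.json("/mnt/raw/customers/")

# Truncate-and-load: "overwrite" atomically replaces the Delta table contents.
(df.write
   .format("delta")
   .mode("overwrite")
   .saveAsTable("analytics.customers"))
```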
Spark 3.4 and Databricks Runtime 13 introduce two new types of timestamps for handling time zone information. TIMESTAMP WITH LOCAL TIME ZONE: this type assumes that the input data is in the session's local time zone and converts it to UTC before processing....
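A short illustration of the two behaviours in Spark 3.4+ (a sketch; the session time zone value is arbitrary):

```python
# TIMESTAMP (local-time-zone semantics) vs. TIMESTAMP_NTZ in Spark 3.4+.
spark.conf.set("spark.sql.session.timeZone", "America/New_York")

# TIMESTAMP: the literal is interpreted in the session time zone
# and normalized to UTC internally.
spark.sql("SELECT TIMESTAMP'2023-06-01 10:00:00' AS ts_ltz").show()

# TIMESTAMP_NTZ: no time zone attached; the wall-clock value is kept as-is.
spark.sql("SELECT TIMESTAMP_NTZ'2023-06-01 10:00:00' AS ts_ntz").show()
```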
I am using Autoloader to load files from a directory. I have set up File Notification with the Event Subscription. I have a backfill interval set to 1 day and have not run the stream for a week. There should only be about 100 new files to pick up an...
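A sketch of the Auto Loader setup described above, assuming file notification mode and the 1-day backfill are configured through the documented cloudFiles options; all paths are placeholders:

```python
# Auto Loader in file notification mode with a periodic listing backfill.
stream = (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "json")
          .option("cloudFiles.useNotifications", "true")   # file notification mode
          .option("cloudFiles.backfillInterval", "1 day")  # periodic directory-listing backfill
          .option("cloudFiles.schemaLocation", "/mnt/schemas/events")
          .load("/mnt/landing/events"))

(stream.writeStream
       .option("checkpointLocation", "/mnt/checkpoints/events")
       .trigger(availableNow=True)   # process the backlog, then stop
       .toTable("bronze.events"))
```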
Hi @nolanlavender008 Thank you for posting your question in our community! We are happy to assist you.To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answ...
Hi, I have a PySpark job that takes about an hour to complete. When looking at the SQL tab in the Spark UI I see this: those processes run for more than 1 minute on a 60-minute process. This is Ganglia for that period (the last snapshot; I will look into a l...
Hi @Alejandro Martinez, hope all is well! Just wanted to check in to see if you were able to resolve your issue, and whether you would be happy to share the solution or mark an answer as best. Otherwise, please let us know if you need more help. We'd love to hear from you...
Hi, since it is not very well explained, I want to know whether the table history is a snapshot of the whole table at that point in time containing all the data, or whether it tracks only some metadata of the table changes. To be more precise: if I have a table in...
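A hedged sketch of how to look at both sides of this: Delta history is operation metadata (version, timestamp, operation, operationMetrics) kept in the transaction log, not a physical copy of the table, while time travel reconstructs old data from files the table still retains. Table name and version number below are placeholders:

```python
# Inspect the history: metadata about each change, not the data itself.
spark.sql("DESCRIBE HISTORY my_schema.my_table").show(truncate=False)

# Read the table as it was at version 5; this works as long as the
# underlying data files have not been removed by VACUUM.
old_df = spark.sql("SELECT * FROM my_schema.my_table VERSION AS OF 5")
old_df.show()
```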
Hi @data engineer, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so...
Hi all, I just wanted to know if there is any option to reduce the time taken while loading a PySpark DataFrame into an Azure Synapse table using Databricks. For example, I have a PySpark DataFrame that has around 40k records and I am trying to load data into the Azure ...
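One common approach worth checking is the built-in "com.databricks.spark.sqldw" connector, which stages data through ADLS and uses PolyBase/COPY for bulk loads rather than row-by-row JDBC inserts. A hedged sketch; every URL and credential below is a placeholder:

```python
# Bulk load a DataFrame into Azure Synapse via the sqldw connector.
(df.write
   .format("com.databricks.spark.sqldw")
   .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>")
   .option("tempDir", "abfss://<container>@<account>.dfs.core.windows.net/tmp")
   .option("forwardSparkAzureStorageCredentials", "true")
   .option("dbTable", "dbo.my_table")
   .mode("append")
   .save())
```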
Hi @Tinendra Kumar, hope all is well! Just wanted to check in to see if you were able to resolve your issue, and whether you would be happy to share the solution or mark an answer as best. Otherwise, please let us know if you need more help. We'd love to hear from you. Tha...
Hi all, I want to plot multiple charts from a pandas DataFrame. However, when I run the code below it says "Command result size exceeds limit: Exceeded 20971520 bytes (current = 20973124)". If I move line 11 and place it at 21 (outside of the functi...
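One common fix for this error (a sketch under the assumption that many figures are being accumulated in a single command result) is to render and close each figure inside the loop rather than keeping them all alive:

```python
import matplotlib.pyplot as plt

for col in df.columns:           # df is assumed to be the pandas DataFrame
    fig, ax = plt.subplots()
    df[col].plot(ax=ax, title=col)   # assumes numeric columns
    plt.show()                   # emit this figure's output now
    plt.close(fig)               # free it so it is not serialized again later
```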
This has been going on for some time now; all errors look like this (note the weird `[0;34m` marks everywhere). How can we fix this? We're not doing anything crazy; this is just the latest runtime with pretty much the simplest possible hello world pro...
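Those `[0;34m` fragments are ANSI terminal color codes being rendered as plain text instead of colors; in an IPython-based notebook, `%colors NoColor` may suppress them at the source. As a workaround for captured output (a sketch, not an official fix), they can also be stripped with a regex:

```python
import re

# Matches ANSI color escape sequences such as "\x1b[0;34m".
ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")

def strip_ansi(text: str) -> str:
    """Remove ANSI color escape sequences from a string."""
    return ANSI_RE.sub("", text)

print(strip_ansi("\x1b[0;34mSome colored error text\x1b[0m"))
```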
Have you tried detaching and reattaching the notebook? Or a cluster restart? Did you also check that you are not importing any specific library? Someone else with the right access might have installed a library with "Install on all clusters" checked.
Hi! I'm doing some tests to get an idea of how much time could be saved when starting a cluster by using a pool, and was wondering if the results I get are what should be expected. We're using AWS Databricks and used i3.xlarge as the instance type (if that matt...
Hi @Paul Pelletier, hope all is well! Just wanted to check in to see if you were able to resolve your issue, and whether you would be happy to share the solution or mark an answer as best. Otherwise, please let us know if you need more help. We'd love to hear from you. Tha...
I am trying to run an incremental data processing job using a Python wheel. The job is scheduled to run, e.g., every hour. For my code to know what data increment to process, I inject it with the {{start_time}} as part of the command line, like so: ["end_dat...
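A hedged sketch of how the wheel's entry point might parse the injected bounds; the flag names and the epoch-milliseconds rendering of {{start_time}} are assumptions based on the snippet above and the documented task parameter variables:

```python
import argparse
from datetime import datetime, timezone

parser = argparse.ArgumentParser()
parser.add_argument("--start_time", required=True)
parser.add_argument("--end_time", required=True)
args = parser.parse_args()

# Assumes {{start_time}} renders as milliseconds since the UNIX epoch.
start = datetime.fromtimestamp(int(args.start_time) / 1000, tz=timezone.utc)
end = datetime.fromtimestamp(int(args.end_time) / 1000, tz=timezone.utc)
print(f"Processing increment from {start} to {end}")
```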
I have an Azure Databricks job that is triggered via ADF using an API call. I want to see why the job has been taking n minutes to complete the tasks. When I look at the job execution results, the job execution time says 15 mins but the individual cells/commands d...
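One way to see where the extra minutes go is to pull the run timings from the Jobs API 2.1, which reports setup, execution, and cleanup durations separately per run. A sketch; host, token, and run_id are placeholders:

```python
import requests

resp = requests.get(
    "https://<workspace-host>/api/2.1/jobs/runs/get",
    headers={"Authorization": "Bearer <token>"},
    params={"run_id": 123456},
)
run = resp.json()
# Durations are reported in milliseconds.
print(run.get("setup_duration"),
      run.get("execution_duration"),
      run.get("cleanup_duration"))
```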
Hey there @DineshKumar, does @Prabakar Ammeappin's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Otherwise, please let us know if you need more help. Cheers!
I have run into a problem, as you can see in the picture below: there are some long delays between jobs. I don't understand what is happening or how to optimize the job. Can anybody help me? Thanks a lot.
Say I am getting a customer record from a website. I want to read the message and then insert/update it into a Snowflake table; depending on whether the record's insert/update is successful, I need to respond back with a success/failure message in, say, 1 sec. ...
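For the Snowflake write itself, a hedged sketch using the Spark Snowflake connector (the "snowflake" format); all connection options are placeholders. Note that a Spark batch write is unlikely to meet a hard 1-second SLA; a direct Snowflake client or a queue-based design is usually a better fit for that latency:

```python
sf_options = {
    "sfUrl": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "<db>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
}

# Append the incoming records to the Snowflake table.
(df.write
   .format("snowflake")
   .options(**sf_options)
   .option("dbtable", "CUSTOMERS")
   .mode("append")
   .save())
```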
Hello, I have a table on Azure Databricks that I keep updating with notebook "A", and I want to plot, in real time, the query result from the table (let's say SELECT COUNT(name), name FROM my_schema.my_table GROUP BY name). I know about Azure Applica...
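Inside a Databricks notebook, one hedged option is to run the aggregation as a stream and let display() refresh the chart as notebook "A" updates the table; this sketch assumes the table is a Delta table readable as a stream:

```python
# Live-updating aggregation over the table notebook "A" keeps writing to.
agg = (spark.readStream
       .table("my_schema.my_table")
       .groupBy("name")
       .count())

# display() on a streaming DataFrame keeps refreshing; switch the result
# to a bar chart in the notebook UI for a live-updating plot.
display(agg)
```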