Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

bergmaal
by New Contributor III
  • 2857 Views
  • 2 replies
  • 1 kudos

Workflows 7 second delay between tasks

When you have a job in Workflows with multiple tasks running after one another, there seems to be a consistent 7 seconds delay between execution of the tasks. Or, more precisely, every task has an approximate 7 second overhead before the code actuall...

Data Engineering
delay
overhead
tasks
Workflows
Latest Reply
JensH
New Contributor III
  • 1 kudos

Hi @bergmaal, I am experiencing the same issue. My Databricks consultant suggested opening a support ticket, as this should not be normal behavior. Did you solve this issue yet? We observed these delays do not seem to occur in workflows that use noteboo...

1 More Replies
NotARobot
by New Contributor III
  • 2388 Views
  • 1 reply
  • 1 kudos

Delta Live Tables UDFs and Versions

Trying to do a url_decode on a column, which works great in development, but running via DLT fails when trying multiple ways. 1. pyspark.sql.functions.url_decode - This is new as of 3.5.0, but isn't supported using whatever version running a DLT pipel...

Latest Reply
NotARobot
New Contributor III
  • 1 kudos

Thanks @Retired_mod. For reference, if anybody finds this, the DLT release docs are here: https://docs.databricks.com/en/release-notes/delta-live-tables/index.html This shows which versions are running for CURRENT and PREVIEW channels. In this case, wa...

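For anyone hitting the version gap described above, a stand-in for `url_decode` on older runtimes can be built from Python's standard library. This is a hedged sketch, not a DLT-recommended fix, and the UDF wiring in the comments assumes a plain PySpark pipeline with made-up column names:

```python
# Fallback for pyspark.sql.functions.url_decode (added in Spark 3.5.0) on
# runtimes/channels that don't ship it yet. Pure-Python core, usable as a UDF.
from urllib.parse import unquote

def url_decode_py(s):
    """Decode percent-encoded text; passes None through untouched."""
    return unquote(s) if s is not None else None

print(url_decode_py("a%20b%3Dc"))  # a b=c

# Hypothetical Spark wiring (column names are made up):
# from pyspark.sql.functions import udf
# df = df.withColumn("decoded", udf(url_decode_py)("encoded_col"))
```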
KrzysztofPrzyso
by New Contributor III
  • 12928 Views
  • 2 replies
  • 0 kudos

Shared job clusters on Azure Data Factory ADF

Hi Databricks Community, if at all possible I would like to use a Shared Jobs Cluster with an external orchestrator like Azure Data Factory (ADF) or Synapse Workspace. The main reasons for using a Shared Job cluster are: reduction of start-up time (<1 min vs 5 min ...

Latest Reply
KrzysztofPrzyso
New Contributor III
  • 0 kudos

Hi Sai Kumar, many thanks for your response. Unfortunately, using analytical clusters is not really an option for me due to the cost differences between job clusters and analytical clusters. Job clusters also offer assurance that the latest deployed versi...

1 More Replies
prasad95
by New Contributor III
  • 11284 Views
  • 3 replies
  • 1 kudos
Data Engineering
Delta Lake
Latest Reply
saikumar246
Databricks Employee
  • 1 kudos

Hi @prasad95, thank you for sharing your concern here. In addition to @Retired_mod's comments, you can follow the steps below to capture Change Data Capture (CDC) events from DynamoDB Streams and write them into a Delta table in Databricks: 1. Connect to DynamoDB Streams and...

2 More Replies
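As a rough illustration of step 1 in the reply above, a DynamoDB Streams change record can be unpacked before merging into a Delta target. This is a hedged sketch: the record shape follows the DynamoDB Streams API (`eventName`, `dynamodb.NewImage`), but the sample values and the MERGE shown in comments are hypothetical:

```python
# Unpack a DynamoDB Streams change record ahead of a Delta MERGE.
import json

def parse_ddb_event(raw: str) -> dict:
    """Extract operation type and new row image from a DynamoDB Streams record."""
    rec = json.loads(raw)
    return {
        "op": rec["eventName"],                       # INSERT / MODIFY / REMOVE
        "image": rec.get("dynamodb", {}).get("NewImage", {}),
    }

sample = json.dumps({
    "eventName": "MODIFY",
    "dynamodb": {"NewImage": {"id": {"S": "42"}, "status": {"S": "shipped"}}},
})
parsed = parse_ddb_event(sample)
print(parsed["op"])  # MODIFY

# In Databricks, rows like these would typically be merged into the target,
# hypothetically via DeltaTable.forName(spark, "target").alias("t").merge(...).
```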
User16752245312
by Databricks Employee
  • 5474 Views
  • 3 replies
  • 2 kudos
Latest Reply
Aria
New Contributor III
  • 2 kudos

We are using Azure. I don't see an option for a deployment name. Secondly, we have already deployed all our workspaces and want to have user-friendly URLs, e.g. via changes to the DNS server or proxy URLs.

2 More Replies
Kaizen
by Valued Contributor
  • 2916 Views
  • 2 replies
  • 0 kudos

Copy Local file using a Shared Cluster

Hi, I am saving some files locally on my cluster and moving them after my job. These are log files of my process, so I can't directly reference a DBFS location. However, the dbutils.fs.cp command does not work on the shared cluster. This does however wo...

Latest Reply
Kaizen
Valued Contributor
  • 0 kudos

For reference, when doing this on a single-user (personal) cluster, the file is stored in /databricks/driver/logs.txt, which can be accessed and copied to DBFS without issue using the dbutils commands.

1 More Replies
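Two workarounds consistent with the observations above, sketched under the assumption that the `file:` scheme or a Unity Catalog volume path is usable on your cluster; both destination paths here are made up:

```python
# Copying a driver-local log file off the cluster after a job.
import shutil

def to_file_uri(local_path: str) -> str:
    """Prefix a driver-local path with the file: scheme so dbutils.fs
    treats it as local rather than as a DBFS path."""
    return local_path if local_path.startswith("file:") else f"file:{local_path}"

local_log = "/databricks/driver/logs.txt"
print(to_file_uri(local_log))  # file:/databricks/driver/logs.txt

# Option 1: dbutils with an explicit file: scheme (availability varies by
# runtime and cluster access mode):
# dbutils.fs.cp(to_file_uri(local_log), "dbfs:/tmp/job_logs/logs.txt")

# Option 2: plain Python copy into a volume path, no dbutils needed:
# shutil.copy(local_log, "/Volumes/main/default/logs/logs.txt")
```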
thethirtyfour
by New Contributor III
  • 6371 Views
  • 2 replies
  • 1 kudos

Resolved! Install R Package "sf"

Hi, I am trying to install the following four dependency packages in order to install "slu-openGIS/postmastr" directly from GitHub: units, sf, tigris, tidycensus. When attempting to install "units", I received the following configuration error: %r install.pack...

Latest Reply
thethirtyfour
New Contributor III
  • 1 kudos

Thank you!

1 More Replies
jcozar
by Contributor
  • 7569 Views
  • 5 replies
  • 2 kudos

Resolved! CDC and raw data

Hi, I am using Debezium server to send data from Postgres to a Kafka topic (in fact, Azure Event Hubs). My question is, what are the best practices and recommendations for saving raw data and then implementing a medallion architecture? For clarification, I wa...

Latest Reply
jcozar
Contributor
  • 2 kudos

Thank you very much @Palash01 ! It has been really helpful!

4 More Replies
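For the Debezium messages discussed above, a common bronze-layer pattern is "persist the raw value untouched, unpack it in silver". A hedged sketch of unpacking the standard Debezium envelope (`payload` with `op`/`before`/`after`); the sample record is invented:

```python
# Unpack a Debezium change-event envelope for the silver layer.
import json

def unpack_debezium(value: str) -> dict:
    """Pull operation and row images out of a Debezium payload envelope."""
    env = json.loads(value)["payload"]
    return {"op": env["op"], "before": env["before"], "after": env["after"]}

raw = json.dumps({"payload": {"op": "u",
                              "before": {"id": 1, "qty": 2},
                              "after": {"id": 1, "qty": 3}}})
print(unpack_debezium(raw)["op"])  # u

# Bronze would typically keep the Kafka value as-is, e.g. (hypothetically):
# spark.readStream.format("kafka")...load().select("value", "timestamp")
```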
rt-slowth
by Contributor
  • 4540 Views
  • 3 replies
  • 0 kudos

Error: "If you expect to delete or update rows to the source table in the future..."

Flow 'user_silver' has FAILED fatally. An error occurred because we detected an update or delete to one or more rows in the source table. Streaming tables may only use append-only streaming sources. If you expect to delete or update rows to the sourc...

Latest Reply
Palash01
Valued Contributor
  • 0 kudos

Hey @rt-slowth, just checking in to see if the provided solution was helpful to you. If yes, please accept it as the Best Solution so that this thread can be considered closed.

2 More Replies
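If the source table genuinely must receive updates or deletes, Delta's streaming reader exposes options to tolerate them. A hedged sketch: the option names below are documented Delta streaming options, but verify the behavior on your runtime, since `skipChangeCommits` silently drops the changed rows rather than propagating them:

```python
# Options for streaming from a Delta source that is not strictly append-only.
stream_options = {
    "skipChangeCommits": "true",   # ignore commits that update/delete rows
    # "ignoreDeletes": "true",     # older option: tolerate partition deletes
}
print(stream_options["skipChangeCommits"])  # true

# Hypothetical wiring (table name is made up):
# df = (spark.readStream.format("delta")
#       .options(**stream_options)
#       .table("user_bronze"))
```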
rt-slowth
by Contributor
  • 12473 Views
  • 5 replies
  • 0 kudos

Questions about the design of bronze, silver, and gold for live streaming pipelines

I'm envisioning a live streaming pipeline. The bronze layer, or data ingestion, is being fetched using the directory listing mode of Auto Loader. I'm not using file notification mode because I detect about 2-300 data changes per hour. I'm thinking about im...

Data Engineering
Delta Live Table
spark
Latest Reply
Palash01
Valued Contributor
  • 0 kudos

Hey @rt-slowth, thank you for sharing the code snippets. The code structure appears to be on the right track, and its dynamic nature is promising. With a few minor adjustments, it should achieve the desired outcome. Also, find the attached code syntax...

4 More Replies
brickster_2018
by Databricks Employee
  • 9408 Views
  • 2 replies
  • 0 kudos

Resolved! How does Delta solve the large number of small file problems?

Delta creates more small files during merge and update operations.

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

Delta addresses the small-file problem using the operations below, available for any Delta table. Optimized writes improve the write operation by adding an additional shuffle step, reducing the number of output files. By defau...

1 More Replies
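The mechanisms named in the answer above map to Delta table properties. A hedged sketch that only builds the SQL statement (the table name is illustrative); in a notebook you would then execute it, plus on-demand `OPTIMIZE`, via `spark.sql`:

```python
# Build an ALTER TABLE statement enabling optimized writes and auto compaction.
optimize_props = {
    "delta.autoOptimize.optimizeWrite": "true",  # extra shuffle, fewer output files
    "delta.autoOptimize.autoCompact": "true",    # compact small files after writes
}
alter_stmt = "ALTER TABLE my_table SET TBLPROPERTIES ({})".format(
    ", ".join(f"'{k}' = '{v}'" for k, v in optimize_props.items())
)
print(alter_stmt)

# Hypothetically, from a notebook:
# spark.sql(alter_stmt)
# spark.sql("OPTIMIZE my_table")   # manual compaction on demand
```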
ranged_coop
by Valued Contributor II
  • 26654 Views
  • 22 replies
  • 28 kudos

How to install Chromium Browser and Chrome Driver on DBX runtime 10.4 and above ?

Hi Team, we are wondering if there is a recommended way to install the Chromium browser and ChromeDriver on Databricks Runtime 10.4 and above? I have been through the site and have come across several links to this effect, but they all seem to be ins...

Latest Reply
Kaizen
Valued Contributor
  • 28 kudos

Look into Playwright instead of Selenium. I went through the same process y'all went through here (ended up writing an init script to install the drivers, etc.). This is all done for you in Playwright. Refer to this post, I hope it helps! https://communit...

21 More Replies
seefoods
by Contributor II
  • 1534 Views
  • 2 replies
  • 0 kudos

cluster metrics collection

Hello @Debayan, how can I collect the metrics provided by cluster metrics for Databricks Runtime 13.1 or later using a bash shell script? Cordially, Aubert EMAKO

Latest Reply
Debayan
Databricks Employee
  • 0 kudos

Hi, cluster metrics is a UI tool and is available in the UI only. For reference: https://docs.databricks.com/en/compute/cluster-metrics.html

1 More Replies
chari
by Contributor
  • 8748 Views
  • 2 replies
  • 0 kudos

writing spark dataframe as CSV to a repo

Hi, I wrote a Spark dataframe as CSV to a repo (synced with GitHub). But when I checked the folder, the file wasn't there. Here is my code: spark_df.write.format('csv').option('header','true').mode('overwrite').save('/Repos/abcd/mno/data') No error mes...

Latest Reply
feiyun0112
Honored Contributor
  • 0 kudos

The folder '/Repos' here is not your repo; Spark wrote to `dbfs:/Repos`. Please check: dbutils.fs.ls('/Repos/abcd/mno/data')

1 More Replies
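The reply above can be made concrete with a small helper showing where an unqualified Spark path actually lands. A hedged sketch; the pandas fallback for writing a single CSV into the repo checkout is shown as a hypothetical comment, with a made-up path:

```python
# Unqualified Spark save() paths resolve to DBFS, not the workspace file tree.
def spark_path_to_dbfs_uri(path: str) -> str:
    """Show the DBFS location an unqualified Spark path resolves to."""
    return path if "://" in path or path.startswith("dbfs:") else f"dbfs:{path}"

print(spark_path_to_dbfs_uri("/Repos/abcd/mno/data"))  # dbfs:/Repos/abcd/mno/data

# To land a small result inside the repo checkout instead (hypothetical path):
# spark_df.toPandas().to_csv("/Workspace/Repos/abcd/mno/data/out.csv", index=False)
```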
