Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

rt-slowth
by Contributor
  • 4046 Views
  • 3 replies
  • 0 kudos

Error: "If you expect to delete or update rows to the source table in the future..."

Flow 'user_silver' has FAILED fatally. An error occurred because we detected an update or delete to one or more rows in the source table. Streaming tables may only use append-only streaming sources. If you expect to delete or update rows to the sourc...
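A minimal sketch of one common workaround, assuming the source is a Delta table named user_bronze (a hypothetical name) and the skipped updates/deletes do not need to be reprocessed; skipChangeCommits is a documented Delta streaming option:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read the Delta source as a stream, skipping transactions that update
# or delete existing rows instead of failing the stream.
stream = (
    spark.readStream
    .option("skipChangeCommits", "true")
    .table("user_bronze")  # hypothetical source table
)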

Latest Reply
Palash01
Valued Contributor
  • 0 kudos

Hey @rt-slowth, just checking in to see whether the provided solution was helpful to you. If so, please accept it as the Best Solution so this thread can be considered closed.

2 More Replies
rt-slowth
by Contributor
  • 11430 Views
  • 5 replies
  • 0 kudos

Questions about the design of bronze, silver, and gold for live streaming pipelines

I'm envisioning a live streaming pipeline. The bronze layer, or data ingestion, is fetched using the directory listing mode of Auto Loader. I'm not using file notification mode because I detect about 200-300 data changes per hour. I'm thinking about im...
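A minimal bronze-layer sketch under those assumptions, using Auto Loader (the Databricks cloudFiles source) in directory listing mode; the landing path, file format, and schema location are hypothetical:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Directory listing mode is used when cloudFiles.useNotifications is false.
bronze = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useNotifications", "false")
    .option("cloudFiles.schemaLocation", "/mnt/landing/_schemas/events")
    .load("/mnt/landing/events")  # hypothetical landing path
)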

Data Engineering
Delta Live Table
spark
Latest Reply
Palash01
Valued Contributor
  • 0 kudos

Hey @rt-slowth, thank you for sharing the code snippets. The code structure appears to be on the right track, and its dynamic nature is promising. With a few minor adjustments, it should achieve the desired outcome. Also, find the attached code syntax...

4 More Replies
brickster_2018
by Databricks Employee
  • 8614 Views
  • 2 replies
  • 0 kudos

Resolved! How does Delta solve the problem of large numbers of small files?

Delta creates more small files during merge and update operations.

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

Delta addresses the problem of large numbers of small files using the operations below, available for any Delta table. Optimized writes help by adding an additional shuffle step to the write operation, reducing the number of output files. By defau...
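A short sketch of enabling those operations; the table properties and the OPTIMIZE command are standard Delta features on Databricks, while the table name is hypothetical:

# Enable optimized writes and auto compaction on an existing Delta table.
spark.sql("""
  ALTER TABLE my_table SET TBLPROPERTIES (
    'delta.autoOptimize.optimizeWrite' = 'true',
    'delta.autoOptimize.autoCompact' = 'true'
  )
""")

# Compact existing small files on demand.
spark.sql("OPTIMIZE my_table")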

1 More Reply
ranged_coop
by Valued Contributor II
  • 23902 Views
  • 22 replies
  • 28 kudos

How to install Chromium Browser and Chrome Driver on DBX Runtime 10.4 and above?

Hi Team, we are wondering if there is a recommended way to install the Chromium browser and ChromeDriver on Databricks Runtime 10.4 and above. I have been through the site and have come across several links to this effect, but they all seem to be ins...

Latest Reply
Kaizen
Valued Contributor
  • 28 kudos

Look into Playwright instead of Selenium. I went through the same process y'all went through here (ended up writing an init script to install the drivers etc.). This is all done for you in Playwright. Refer to this post, I hope it helps! https://communit...
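A minimal Playwright sketch in Python, assuming the package was installed with %pip install playwright and the browser with playwright install chromium; Playwright downloads its own Chromium build, which is what removes the separate driver-install step:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # Playwright manages its own Chromium; no ChromeDriver needed.
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com")
    print(page.title())
    browser.close()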

21 More Replies
seefoods
by New Contributor III
  • 1252 Views
  • 2 replies
  • 0 kudos

Cluster metrics collection

Hello @Debayan, how can I collect the metrics exposed by cluster metrics for Databricks Runtime 13.1 or later using a bash shell script? Cordially, Aubert EMAKO

Latest Reply
Debayan
Databricks Employee
  • 0 kudos

Hi, cluster metrics is a UI tool and is available in the UI only. For reference: https://docs.databricks.com/en/compute/cluster-metrics.html

1 More Reply
chari
by Contributor
  • 8086 Views
  • 2 replies
  • 0 kudos

Writing a Spark DataFrame as CSV to a repo

Hi, I wrote a Spark DataFrame as CSV to a repo (synced with GitHub), but when I checked the folder, the file wasn't there. Here is my code: spark_df.write.format('csv').option('header','true').mode('overwrite').save('/Repos/abcd/mno/data') No error mes...

Latest Reply
feiyun0112
Honored Contributor
  • 0 kudos

The folder '/Repos' is not your repo; it resolves to `dbfs:/Repos`. Please check: dbutils.fs.ls('/Repos/abcd/mno/data')
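A quick check from a notebook, reusing the paths from the original post; /Workspace/Repos as the local-filesystem mount point of the synced repo is an assumption:

# Where the Spark write actually landed (DBFS, not the Git-synced repo):
display(dbutils.fs.ls('/Repos/abcd/mno/data'))

# The Git-synced repo is visible via the local filesystem API instead:
import os
print(os.listdir('/Workspace/Repos/abcd/mno'))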

1 More Reply
Salman1
by New Contributor
  • 1140 Views
  • 0 replies
  • 0 kudos

Cannot find UDF on subsequent job runs on same cluster.

Hello, I am trying to run jobs with a JAR task type using Databricks on AWS on an all-purpose cluster. The issue I'm facing is that the job completes its first run successfully, but any subsequent run fails. I have to restart my cluste...

chari
by Contributor
  • 3835 Views
  • 2 replies
  • 0 kudos

Fatal error when writing a big pandas DataFrame

Hello DB community, I was trying to write a pandas DataFrame containing 100,000 rows as Excel. Moments into the execution I received a fatal error: "The Python kernel is unresponsive." However, I am constrained from increasing the number of clusters or other...

Data Engineering
Databricks
excel
python
Latest Reply
Ayushi_Suthar
Databricks Employee
  • 0 kudos

Hi @chari, thanks for bringing up your concerns; always happy to help. We understand that you are facing the following error while writing a pandas DataFrame containing 100,000 rows to Excel. As per the error >>> Fatal error: The Python kernel ...
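One hedged workaround, swapping the single Excel write for chunked CSV parts to cap driver memory (a different technique than the original Excel write; chunk size and output path are hypothetical):

import pandas as pd

# Assuming `df` already holds the 100,000-row DataFrame from the post.
chunk_size = 20_000
for i, start in enumerate(range(0, len(df), chunk_size)):
    chunk = df.iloc[start:start + chunk_size]
    # Each part stays small enough to avoid exhausting kernel memory.
    chunk.to_csv(f"/dbfs/tmp/report_part_{i}.csv", index=False)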

1 More Reply
Yaacoub
by New Contributor
  • 9660 Views
  • 2 replies
  • 1 kudos

[UDF_MAX_COUNT_EXCEEDED] Exceeded query-wide UDF limit of 5 UDFs

In my project I defined a UDF:

@udf(returnType=IntegerType())
def ends_with_one(value, bit_position):
    if bit_position + len(value) < 0:
        return 0
    else:
        return int(value[bit_position] == '1')

spark.udf.register("ends_with_one"...

Latest Reply
jose_gonzalez
Databricks Employee
  • 1 kudos

Hi @Yaacoub, Just a friendly follow-up. Have you had a chance to review my colleague's reply? Please inform us if it contributes to resolving your query.

1 More Reply
abelian-grape
by New Contributor III
  • 7991 Views
  • 4 replies
  • 0 kudos

Intermittent error, but Databricks job kept running

Hi, I have the following error, but the job kept running. Is that normal?

{
  "message": "The service at /api/2.0/jobs/runs/get?run_id=899157004942769 is temporarily unavailable. Please try again later. [TraceId: -]",
  "error_code": "TEMPORARILY_U...
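TEMPORARILY_UNAVAILABLE is a transient, retryable response, so callers typically poll again with backoff; a minimal sketch (host, token, and run_id are placeholders):

import time
import requests

def get_run(host: str, token: str, run_id: int, max_retries: int = 5) -> dict:
    """Poll /api/2.0/jobs/runs/get, backing off on transient errors."""
    url = f"{host}/api/2.0/jobs/runs/get"
    headers = {"Authorization": f"Bearer {token}"}
    for attempt in range(max_retries):
        resp = requests.get(url, headers=headers, params={"run_id": run_id})
        if resp.status_code == 200:
            return resp.json()
        time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s, ...
    resp.raise_for_status()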

Latest Reply
abelian-grape
New Contributor III
  • 0 kudos

@Ayushi_Suthar also, whenever it happens the job status does not change to "failed"; it keeps running. Is that normal?

3 More Replies
joao_vnb
by New Contributor III
  • 64419 Views
  • 7 replies
  • 11 kudos

Resolved! Automate the Databricks workflow deployment

Hi everyone, do you know if it's possible to automate Databricks workflow deployment through Azure DevOps (like what we do with the deployment of notebooks)?

Latest Reply
asingamaneni
New Contributor II
  • 11 kudos

Did you get a chance to try Brickflows? https://github.com/Nike-Inc/brickflow You can find the documentation here: https://engineering.nike.com/brickflow/v0.11.2/ Brickflow uses Databricks Asset Bundles (DAB) under the hood but provides a Pythonic w...
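For comparison, a minimal sketch using the Databricks SDK for Python, which an Azure DevOps pipeline step could invoke directly; the job name, notebook path, and cluster id are hypothetical, and tools like Brickflow and Asset Bundles wrap similar calls:

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # picks up host/token from the environment

# Create (deploy) a one-task workflow programmatically.
w.jobs.create(
    name="nightly-etl",
    tasks=[
        jobs.Task(
            task_key="main",
            notebook_task=jobs.NotebookTask(notebook_path="/Repos/team/etl/main"),
            existing_cluster_id="0401-123456-abcd123",
        )
    ],
)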

6 More Replies
isaac_gritz
by Databricks Employee
  • 9223 Views
  • 1 reply
  • 2 kudos

Change Data Capture with Databricks

How to leverage Change Data Capture (CDC) from your databases to Databricks. Change Data Capture allows you to ingest and process only changed records from database systems to dramatically reduce data processing costs and enable real-time use cases suc...
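Once changes land in a Delta table, downstream consumers can read only the changed rows via Delta's change data feed; a minimal sketch with a hypothetical table name:

# Enable the change data feed on the source table.
spark.sql("ALTER TABLE orders SET TBLPROPERTIES (delta.enableChangeDataFeed = true)")

# Read only inserts, updates, and deletes since a given table version.
changes = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 1)
    .table("orders")
)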

Latest Reply
prasad95
New Contributor III
  • 2 kudos

Hi @isaac_gritz, can you provide any reference resource for achieving AWS DynamoDB CDC to Delta tables? Thank you.

DatBoi
by Contributor
  • 7430 Views
  • 2 replies
  • 1 kudos

Resolved! What happens to table created with CTAS statement when data in source table has changed

Hey all - I am sure this has been documented / answered before but what happens to a table created with a CTAS statement when data in the source table has changed? Does the sink table reflect the changes? Or is the data stored when the table is defin...

Latest Reply
SergeRielau
Databricks Employee
  • 1 kudos

CREATE TABLE AS (CTAS) is a "one and done" kind of statement. The new table retains no memory of how it came to be, so it is oblivious to changes in the source. Views, as you say, are stored queries; no data is persisted, and therefore the query...
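A small demonstration of that difference, runnable from PySpark; the table names are hypothetical, and src is assumed to have a single integer column:

# CTAS copies the data once; a view stores only the query.
spark.sql("CREATE TABLE snap AS SELECT * FROM src")
spark.sql("CREATE VIEW live AS SELECT * FROM src")

spark.sql("INSERT INTO src VALUES (42)")

spark.sql("SELECT count(*) FROM snap").show()  # unchanged: snapshot at creation
spark.sql("SELECT count(*) FROM live").show()  # reflects the new row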

1 More Reply
