Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Mihai_Cog
by Contributor
  • 6418 Views
  • 6 replies
  • 10 kudos

Resolved! Change Data Feed Databricks

Hello, I am doing some testing with the Change Data Feed feature using Databricks and PySpark (the Delta format, of course), and I don't understand something. I created a table, saved some data inside, enabled the Change Data Feed feature, and applied a merge with a dat...

Latest Reply
Tharun-Kumar
Honored Contributor II
  • 10 kudos

@Mihai_Cog You have to split your merge statement into 2 parts (Update and Insert/Delete): MERGE INTO test t USING src s ON s.Id = t.Id and s.date_field = t.date_field and s.fields <> t.fields WHEN MATCHED THEN UPDATE SET * MERGE INTO test t USING sr...
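A sketch of the split this reply describes, using the table and column names quoted in the thread. The second statement is truncated in the post, so its exact clauses are an assumption; note that `WHEN NOT MATCHED BY SOURCE` requires a recent Databricks Runtime (12.1+).

```python
# Hypothetical reconstruction of the two-step merge (names from the thread).
update_only = """
MERGE INTO test t
USING src s
  ON s.Id = t.Id AND s.date_field = t.date_field AND s.fields <> t.fields
WHEN MATCHED THEN UPDATE SET *
"""

insert_delete = """
MERGE INTO test t
USING src s
  ON s.Id = t.Id AND s.date_field = t.date_field
WHEN NOT MATCHED THEN INSERT *
WHEN NOT MATCHED BY SOURCE THEN DELETE
"""

# In a Databricks notebook, each would be run separately:
# spark.sql(update_only); spark.sql(insert_delete)
```

Running updates and inserts/deletes as separate MERGE statements keeps the change types distinct, which is what makes the Change Data Feed output easier to reason about.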

  • 10 kudos
5 More Replies
felix_counter
by New Contributor III
  • 3253 Views
  • 3 replies
  • 3 kudos

Resolved! Order of delta table after read not as expected

Dear Databricks Community, I am performing three consecutive 'append' writes to a delta table, where the first append creates the table. Each append consists of two rows, which are ordered by column 'id' (see example in the attached screenshot). Whe...

Latest Reply
felix_counter
New Contributor III
  • 3 kudos

Thanks a lot @Lakshay and @Tharun-Kumar for your valued contributions!
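The accepted answer itself is truncated above, but the general Spark fact behind threads like this one is that reads from a distributed table make no row-order guarantee, so rows from consecutive appends can come back interleaved; an explicit sort is required. A minimal pure-Python illustration (the PySpark equivalent would be `df.orderBy("id")`):

```python
# Rows read back from a distributed table can arrive in any order,
# regardless of the order in which they were appended (hypothetical data).
rows = [{"id": 3}, {"id": 4}, {"id": 1}, {"id": 2}]

# An explicit sort restores the expected ordering.
ordered = sorted(rows, key=lambda r: r["id"])
```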

2 More Replies
dzm
by New Contributor
  • 1280 Views
  • 1 replies
  • 0 kudos

Using Libreoffice in Databricks

Hi Community, I'm using Databricks E2 and need to convert pptx files to pdf files. This can be done in either a Python or an R notebook using #Libreoffice. To achieve this I'd have to download LibreOffice; I'm not too sure how to do that. Would I ha...

Data Engineering
pdf
pptx
python
R
Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

I suppose by Libreoffice you mean the sdk, without the frontend? You will have to install the jar as a library on the compute cluster. From that moment on, you can use the classes in your code. If you cannot run the jar from a command line, it might be ...
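For the headless-conversion route (as opposed to the SDK jar the reply discusses), a hypothetical sketch: install LibreOffice on the cluster (e.g. via an init script) and shell out to `soffice`. The flags below are the standard LibreOffice CLI conversion options; the file paths are placeholders.

```python
import subprocess

def libreoffice_cmd(src, outdir):
    # Build the headless LibreOffice conversion command (pptx -> pdf).
    return ["soffice", "--headless", "--convert-to", "pdf", "--outdir", outdir, src]

def pptx_to_pdf(src, outdir):
    # Requires LibreOffice to already be installed on the driver node.
    subprocess.run(libreoffice_cmd(src, outdir), check=True)
```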

The_raj
by New Contributor
  • 3975 Views
  • 1 replies
  • 2 kudos

Error while reading file <file path>. [DEFAULT_FILE_NOT_FOUND]

Hi, I have a workflow with 5 notebooks in it. One of the notebooks is failing with the below error. I have tried refreshing the table but still face the same issue. When I try to run the notebook manually, it works fine. Can someone plea...

Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

Hi @The_raj, the error message you are encountering indicates a failure during the execution of a Spark job on Databricks. Specifically, it seems that Task 736 in Stage 92.0 failed multiple times, and the most recent loss was due to a "DEFAULT_FILE...

samuraidjakk
by New Contributor II
  • 1413 Views
  • 2 replies
  • 1 kudos

Resolved! Lineage from Unity Catalog on GCP

We are in the process of doing a PoC of our pipelines using DLT. Normally we use another tool, and we have created a custom program to extract lineage. We want to try to get/display lineage using Unity Catalog. But we are on GCP, and it see...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @samuraidjakk  Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best? If not, please tell us so we can help you. Thanks!

1 More Replies
Magnus
by Contributor
  • 2364 Views
  • 3 replies
  • 1 kudos

Auto Loader fails when reading json element containing space

I'm using Auto Loader as part of a Delta Live Tables pipeline to ingest json files, and today it failed with this error message: com.databricks.sql.transaction.tahoe.DeltaAnalysisException: Found invalid character(s) among ' ,;{}()\n\t=' in the column ...

Data Engineering
Auto Loader
Delta Live Tables
Latest Reply
Tharun-Kumar
Honored Contributor II
  • 1 kudos

@Magnus You can read the input file using Pandas or Koalas (https://koalas.readthedocs.io/en/latest/index.html), then rename the columns, then convert the Pandas/Koalas dataframe to a Spark dataframe. You can write it back with the correct column name, so ...
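The renaming step can also be sketched without leaving Spark: sanitize the column names and re-alias the dataframe. A minimal helper, assuming the invalid character set quoted in the error message; applying it would look like `df.toDF(*[sanitize(c) for c in df.columns])`:

```python
# Characters Delta rejects in column names, per the error message in the post.
INVALID_CHARS = set(" ,;{}()\n\t=")

def sanitize(name):
    # Replace each invalid character with an underscore.
    return "".join("_" if c in INVALID_CHARS else c for c in name)
```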

2 More Replies
Raghav2
by New Contributor
  • 6615 Views
  • 1 replies
  • 0 kudos

AnalysisException: [COLUMN_ALREADY_EXISTS] The column `<col>` already exists. Consider to choose an

Hey guys, I'm facing this exception while trying to read a public S3 bucket: "AnalysisException: [COLUMN_ALREADY_EXISTS] The column `<column name>` already exists. Consider to choose another name or rename the existing column." Also, the thing is I...

Latest Reply
Lakshay
Esteemed Contributor
  • 0 kudos

You can use dbutils to inspect the file: %fs head <s3 path>
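If `%fs head` shows that the file's header row itself contains duplicate column names (an assumption about the cause, since the post is truncated), one workaround is to read the file without the header and rename the columns, deduplicating with a small helper like this hypothetical one:

```python
def dedupe(names):
    # Append a numeric suffix to repeated column names: id, id_1, id_2, ...
    seen = {}
    out = []
    for n in names:
        if n in seen:
            seen[n] += 1
            out.append(f"{n}_{seen[n]}")
        else:
            seen[n] = 0
            out.append(n)
    return out
```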

zsucic1
by New Contributor III
  • 3352 Views
  • 2 replies
  • 0 kudos

Resolved! Trigger file_arrival of job on Delta Lake table change

Is there a way to avoid having to create an external data location simply to trigger a job when new data arrives in a specific Delta Lake table?

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @zsucic1  Hope you are well. Just wanted to see if you were able to find an answer to your question and would you like to mark an answer as best? It would be really helpful for the other members too. Cheers!

1 More Replies
apiury
by New Contributor III
  • 1877 Views
  • 2 replies
  • 1 kudos

Consume gold data layer from web application

Hello! We are developing a web application in .NET and need to consume data from the gold layer (as if we had a relational database). How can we do it? Export data from the gold layer to SQL Server?

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @apiury  Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your ...

1 More Replies
442027
by New Contributor II
  • 4359 Views
  • 2 replies
  • 3 kudos

Resolved! Delta Log checkpoints not being created?

It is mentioned in the delta protocol that checkpoints for delta tables are created every 10 commits - however when I modify a table after >10 separate operations (producing >10 separate json files in the _delta_log directory), no checkpoint files ar...

Latest Reply
Vinay_M_R
Valued Contributor II
  • 3 kudos

As of a recent update, checkpoints for Delta tables are created every 100 commits; this was changed as an improvement. If you want a checkpoint file for a Delta table every 10 commits, or after any desired number of commits, you can cust...
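The per-table knob this reply is referring to is the `delta.checkpointInterval` table property. Setting it might look like the following (the table name is a placeholder):

```python
# Hypothetical sketch: force a checkpoint every 10 commits on a specific table.
set_checkpoint_interval = """
ALTER TABLE my_table
SET TBLPROPERTIES ('delta.checkpointInterval' = '10')
"""

# In a Databricks notebook: spark.sql(set_checkpoint_interval)
```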

1 More Replies
ariforu
by New Contributor
  • 598 Views
  • 1 replies
  • 0 kudos

Cross region DR setup.

Does anybody have any guidance on best practices for setting up a DR environment in a different region or on a different cloud?

Latest Reply
karthik_p
Esteemed Contributor
  • 0 kudos

@ariforu Cross-region you can set up, but a different cloud does not look to be supported as of now. Did you get a chance to go through this: https://docs.databricks.com/administration-guide/disaster-recovery.html

JohanBringsdal
by New Contributor
  • 609 Views
  • 0 replies
  • 0 kudos

Migrating old solution to new optimal delta lake setup

Hi Databricks community! I have previously worked on a project that could easily be optimized with Databricks. It is currently running on Azure Synapse, but the premise is the same. I'll describe the scenario here: 1. Data owners send a constant flow of ...
