Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by modest2va, New Contributor II
  • 278 Views
  • 6 replies
  • 2 kudos

AnalysisException: "is not a Delta table", but the table is a Delta table

When running a Databricks notebook, an error occurs stating that SOME_TABLE is not a Delta table. However, after executing the DESCRIBE DETAIL command and checking the format, the table is shown as Delta. Without taking any specific actions, re-running t...

Latest Reply
Witold
Contributor III
  • 2 kudos

Another thing you could check is what the underlying data actually looks like. Maybe the writer of the data messed it up.
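A minimal sketch of that check (SOME_TABLE is the placeholder from the original post; display and dbutils assume a Databricks notebook):

```python
# Confirm what actually backs the table: DESCRIBE DETAIL reports the
# storage location, and isDeltaTable checks for a valid _delta_log there.
from delta.tables import DeltaTable

location = spark.sql("DESCRIBE DETAIL SOME_TABLE").first()["location"]
print(DeltaTable.isDeltaTable(spark, location))

# A missing or empty _delta_log means the writer produced plain Parquet
# files rather than a Delta table.
display(dbutils.fs.ls(location + "/_delta_log"))
```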

5 More Replies
by DBUser2, New Contributor II
  • 85 Views
  • 0 replies
  • 0 kudos

Simba ODBC batch queries

I'm using the Simba ODBC driver to connect to Databricks. Since this driver doesn't support transactions, I was trying to run a DELETE and then an INSERT query from within a single execute, but I get an error. Is there an alternate way to perform a batch ...
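One possible direction, as a hedged sketch rather than a confirmed fix: Delta commits each MERGE atomically, so a single MERGE can often stand in for the DELETE-then-INSERT pair. The DSN and table names below are hypothetical.

```python
# Hypothetical pyodbc connection; no client-side transaction support is
# needed because Delta applies the MERGE as one atomic commit.
import pyodbc

conn = pyodbc.connect("DSN=Databricks", autocommit=True)
cursor = conn.cursor()
cursor.execute("""
    MERGE INTO target_table AS t
    USING staging_table AS s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
cursor.close()
conn.close()
```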

by sakuraDev, New Contributor II
  • 127 Views
  • 1 replies
  • 1 kudos

Resolved! Schema is not enforced when using Autoloader

Hi everyone, I am currently trying to enforce the following schema: StructType([StructField("site", StringType(), True), StructField("meter", StringType(), True), StructField("device_time", StringType(), True), StructField("data", St...

Latest Reply
szymon_dybczak
Contributor
  • 1 kudos

Hi @sakuraDev, I'm afraid your assumption is wrong. Here you define the data field as a struct type, and the result is as expected. So once you have this column as a struct type, you can refer to nested objects using dot notation. So if you would like to get e...
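A minimal sketch of that dot notation; the top-level fields come from the post, while the nested "value" field inside data is a hypothetical stand-in:

```python
from pyspark.sql.types import StructType, StructField, StringType

schema = StructType([
    StructField("site", StringType(), True),
    StructField("meter", StringType(), True),
    StructField("device_time", StringType(), True),
    # "data" as a struct: its nested fields become addressable with dots
    StructField("data", StructType([
        StructField("value", StringType(), True),  # hypothetical field
    ]), True),
])

df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .schema(schema)
      .load("/path/to/source"))  # placeholder path

# Dot notation into the nested struct, as described above
df.select("site", "data.value")
```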

by Stellar, New Contributor II
  • 3202 Views
  • 2 replies
  • 1 kudos

Resolved! Databricks CI/CD Azure DevOps

Hi all, I am looking for advice on what would be the best approach when it comes to CI/CD in Databricks and repos in general. What would be the best approach: to have a main branch and branch off of it, or something else? How will changes be propagated from dev to qa an...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Stellar, Setting up a robust CI/CD (Continuous Integration/Continuous Deployment) pipeline for Databricks involves thoughtful planning and adherence to best practices. Let’s break down the key aspects: Development Workflow: Branching Strateg...

1 More Replies
by m_weirath, New Contributor
  • 305 Views
  • 2 replies
  • 0 kudos

DLT-META requires DDL when using cdc_apply_changes

We are setting up new DLT Pipelines using the DLT-Meta package. Everything is going well in bringing our data in from Landing to our Bronze layer when we keep the onboarding JSON fairly vanilla. However, we are hitting an issue when using the cdc_app...

Latest Reply
dbuser17
New Contributor II
  • 0 kudos

Please check these details: https://github.com/databrickslabs/dlt-meta/issues/90

1 More Replies
by dikokob, New Contributor II
  • 3294 Views
  • 4 replies
  • 1 kudos

Databricks Autoloader Checkpoint

Hello Databricks Community, I'm encountering an issue with the Databricks Autoloader where, after running successfully for a period of time, it suddenly stops detecting new files in the source directory. This issue only gets resolved when I reset the ...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 1 kudos

@dikokob That's a weird issue. However, there are two things that I would check first: cloudFiles.maxFileAge, if set to None, that's fine. If it's any other value, that could cause an issue (https://docs.databricks.com/en/ingestion/cloud-ob...
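A minimal sketch of that first check, assuming a JSON Auto Loader source (paths and format are placeholders):

```python
# cloudFiles.maxFileAge defaults to None (no expiry). If it was set to a
# short duration, older files age out of the checkpoint state and are
# silently skipped, matching the symptom described in the post.
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.maxFileAge", "1 year")  # explicit, generous value
      .load("s3://bucket/landing/"))  # placeholder path
```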

3 More Replies
by DBMIVEN, New Contributor II
  • 112 Views
  • 1 replies
  • 0 kudos

DLT streaming table showing more "Written records" than it's actually writing to the table

Hi! I have a DLT setup streaming data from incoming parquet files into bronze, silver, and gold tables. There is a strange bug where, in the graph GUI, the number of written records for the gold streaming table is far greater than the actual data that ...

Latest Reply
DBMIVEN
New Contributor II
  • 0 kudos

Also, after running this for a while I get these errors:

by xecel, New Contributor
  • 204 Views
  • 1 replies
  • 0 kudos

Import error with typing_extensions; issue with pyiceberg and pydantic

Hello All, I am currently working in a Databricks environment where I am trying to use the `pyiceberg` library to interact with Iceberg table metadata directly, with Unity Catalog enabled. However, I'm encountering an issue with package compatibility rel...

Latest Reply
Brahmareddy
Valued Contributor II
  • 0 kudos

Hi @xecel, How are you doing today? As per my understanding, ensure you're using a compatible version of typing_extensions by installing a specific version like 4.4.0 that might work with pyiceberg. Try reinstalling the libraries (pyiceberg and typing...
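A sketch of that pinning approach in a notebook; 4.4.0 comes from the reply and is not verified against pyiceberg's actual requirements:

```python
# Cell 1: pin typing_extensions alongside pyiceberg
%pip install typing_extensions==4.4.0 pyiceberg

# Cell 2: restart the Python process so imports pick up the pinned version
dbutils.library.restartPython()
```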

by abduldjafar, New Contributor
  • 78 Views
  • 0 replies
  • 0 kudos

Merge takes too long

Hi all, I performed a merge process on approximately 19 million rows using two i3.4xlarge workers. However, the process took around 20 minutes to complete. How can I further optimize this process? I have already implemented the OPTIMIZE command and us...
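One common lever, sketched here under assumptions (the table and column names are hypothetical): constrain the MERGE condition with a partition or clustering column so Delta can prune files instead of scanning the whole target.

```python
from delta.tables import DeltaTable

updates_df = spark.table("staging_updates")         # the ~19M-row source
target = DeltaTable.forName(spark, "target_table")  # hypothetical target

(target.alias("t")
 .merge(
     updates_df.alias("s"),
     # the extra event_date predicate lets Delta skip untouched files
     "t.id = s.id AND t.event_date = s.event_date")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())
```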

by ggsmith, New Contributor III
  • 299 Views
  • 1 replies
  • 0 kudos

DLT Streaming Checkpoint Not Found

I am using Delta Live Tables and have my pipeline defined using the code below. My understanding is that a checkpoint is automatically set when using Delta Live Tables. I am using the Unity Catalog and Schema settings in the pipeline as the storage d...

Latest Reply
szymon_dybczak
Contributor
  • 0 kudos

Hi @ggsmith, If you use Delta Live Tables, then checkpoints are stored under the storage location specified in the DLT settings. Each table gets a dedicated directory under storage_location/checkpoints/<dlt_table_name>.
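For example, a quick way to confirm that layout (a sketch; substitute the storage location from your pipeline settings):

```python
# List the per-table checkpoint directories under the DLT storage location
display(dbutils.fs.ls("<storage_location>/checkpoints/"))
```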

by ggsmith, New Contributor III
  • 228 Views
  • 2 replies
  • 0 kudos

Resolved! DLT Streaming Schema and Select

I am reading JSON files written to ADLS from Kafka, using DLT and spark.readStream to create a streaming table for my raw ingest data. My schema is two arrays at the top level: a NewRecord array and an OldRecord array. I pass the schema and I run a select on Ne...

Labels: Data Engineering, dlt, streaming
Latest Reply
ggsmith
New Contributor III
  • 0 kudos

I did a full refresh from the Delta Live Tables pipeline and that fixed it. I guess it was remembering the first run, where I just had the top-level arrays as two columns in the table.
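For reference, a minimal sketch of the shape the post describes; the element fields and paths are hypothetical:

```python
import dlt
from pyspark.sql.functions import explode
from pyspark.sql.types import StructType, StructField, ArrayType, StringType

record = StructType([StructField("id", StringType(), True)])  # hypothetical
schema = StructType([
    StructField("NewRecord", ArrayType(record), True),
    StructField("OldRecord", ArrayType(record), True),
])

@dlt.table
def raw_new_records():
    return (spark.readStream
            .format("cloudFiles")
            .option("cloudFiles.format", "json")
            .schema(schema)
            .load("/path/to/adls/landing")  # placeholder path
            .select(explode("NewRecord").alias("rec"))
            .select("rec.*"))
```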

1 More Replies
by DBUser2, New Contributor II
  • 204 Views
  • 2 replies
  • 0 kudos

How to use transactions when connecting to Databricks using the Simba ODBC driver

I'm connecting to a Databricks instance using the Simba ODBC driver (version 2.8.0.1002), and I am able to perform reads and writes on the Delta tables. But if I want to do some INSERT/UPDATE/DELETE operations within a transaction, I get the below error, an...

Latest Reply
florence023
New Contributor III
  • 0 kudos

@DBUser2 wrote: I'm connecting to a Databricks instance using the Simba ODBC driver (version 2.8.0.1002), and I am able to perform reads and writes on the Delta tables. But if I want to do some INSERT/UPDATE/DELETE operations within a transaction, I get the ...

1 More Replies
by dener, New Contributor
  • 173 Views
  • 0 replies
  • 0 kudos

Infinite load execution

I am experiencing performance issues when loading a table with 50 million rows into Delta Lake on AWS using Databricks. Despite successfully handling other, larger tables, this specific table/process takes hours and doesn't finish. Here's the command...

by ivanychev, Contributor II
  • 1179 Views
  • 3 replies
  • 0 kudos

Resolved! Delta table takes too long to write due to S3 full scan

DBR 14.3, Spark 3.5.0. We use the AWS Glue Metastore. On August 20th some of our pipelines started timing out during writes to a Delta table. We're experiencing many hours of the driver executing post-commit hooks. We write dataframes to Delta with `mode=overw...

Latest Reply
ivanychev
Contributor II
  • 0 kudos

The spark.databricks.delta.catalog.update.enabled=true setting helped, but I still don't understand why the problem started to occur. https://docs.databricks.com/en/archive/external-metastores/external-hive-metastore.html#external-apache-hive-metastore-leg...
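For reference, the setting can be applied per session (it can also go in the cluster's Spark config):

```python
# The workaround from the reply, applied at session level
spark.conf.set("spark.databricks.delta.catalog.update.enabled", "true")
```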

2 More Replies