Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

NathanE
by New Contributor II
  • 1684 Views
  • 1 replies
  • 1 kudos

Time travel on views

Hello, at my company we design an application to analyze data, and we can do so on top of external databases such as Databricks. Our application caches some data in-memory, and to avoid synchronization issues with the data on Databricks, we rely heavil...

Latest Reply
karthik_p
Esteemed Contributor
  • 1 kudos

@NathanE As you said, based on the article below it may not be supported currently: https://docs.databricks.com/en/sql/user/materialized-views.html. But at the same time, it looks like a Materialized View is built on top of a table and is a synchronous operation (when...

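For readers landing on this thread: Delta time travel is expressed on tables (views do not carry their own history), using AS OF clauses on the underlying table. A minimal sketch, assuming a hypothetical Delta table named events:

```sql
-- Query a Delta table as of a specific version or timestamp
SELECT * FROM events VERSION AS OF 5;
SELECT * FROM events TIMESTAMP AS OF '2023-10-01T00:00:00Z';

-- A view can pin a point-in-time read of the underlying table,
-- but the time travel clause still targets the table itself
CREATE OR REPLACE VIEW events_snapshot AS
SELECT * FROM events TIMESTAMP AS OF '2023-10-01T00:00:00Z';
```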
Rubini_MJ
by New Contributor
  • 8603 Views
  • 1 replies
  • 0 kudos

Resolved! Other memory of the driver is high even in a newly spun cluster

Hi Team Experts, I am experiencing high memory consumption in the "other" part of the memory utilization chart in the metrics tab. Right now I am not running any jobs, but still, out of 8 GB of driver memory, 6 GB is almost filled by "other" and only 1.5 GB is t...

Latest Reply
User16539034020
Contributor II
  • 0 kudos

Hello, thanks for contacting Databricks Support. It seems you are concerned about high memory consumption in the "other" category on the driver node of a Spark cluster. As no logs or detailed information were provided, I can only address several potentia...

KiranKondamadug
by New Contributor II
  • 1036 Views
  • 0 replies
  • 0 kudos

Databricks Mosaic's grid_polyfill() is taking longer to explode the index when run using PySpark

Pyspark Configuration: pyspark --packages io.delta:delta-core_2.12:2.4.0,org.apache.hadoop:hadoop-aws:3.3.4,io.delta:delta-storage-s3-dynamodb:2.4.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark...

Data Engineering
Delta Lake
Explode
mosaic
spark
alj_a
by New Contributor III
  • 2030 Views
  • 2 replies
  • 2 kudos

Resolved! Delta Live Table - not reading the changed record from cloud file

Hi, I am trying to ingest data from a cloud file into a Bronze table. DLT works the first time and loads the data into the Bronze table, but when I add a new record and change a field in an existing record, the DLT pipeline succeeds, yet it should be inserted...

Data Engineering
Databricks Delta Live Table
Latest Reply
alj_a
New Contributor III
  • 2 kudos

Thank you Emil. I tried all the suggestions. .read works fine; it picks up the new or changed data. But my problem is that the target is a Bronze table, and in this case my Bronze table has duplicate records. However, let me look at the other options to ...

1 More Replies
NathanSundarara
by Contributor
  • 1537 Views
  • 0 replies
  • 0 kudos

Lakehouse federation bringing data from SQL Server

Did anyone try to bring data in using the newly announced Lakehouse Federation and ingest it using DELTA LIVE TABLES? I'm currently testing using Materialized Views. First I loaded the full data, and now I load the last 3 days daily and recompute using Mate...

Data Engineering
dlt
Lake house federation
rt-slowth
by Contributor
  • 3233 Views
  • 2 replies
  • 1 kudos

CRAS in @dlt

The Delta table created from the DataFrame returned by @dlt.create_table is confirmed to be overwritten when checked with the DESCRIBE HISTORY command. I want this to be handled as a CRAS (CREATE OR REPLACE AS SELECT), but how can I do this in Python...

Latest Reply
siddhathPanchal
Contributor
  • 1 kudos

Hi @rt-slowth You can review this open source code base of Delta to know more about the DeltaTableBuilder's implementation in Python.  https://github.com/delta-io/delta/blob/master/python/delta/tables.py

1 More Replies
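As context for the question above: in Databricks SQL, a CRAS statement (CREATE OR REPLACE TABLE ... AS SELECT) looks like the following. A minimal sketch with hypothetical table and column names:

```sql
-- Replaces the table's contents atomically while preserving its history,
-- which DESCRIBE HISTORY will show as a CREATE OR REPLACE TABLE AS SELECT operation
CREATE OR REPLACE TABLE sales_summary AS
SELECT region, SUM(amount) AS total_amount
FROM sales
GROUP BY region;
```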
dbx_deltaSharin
by New Contributor II
  • 2054 Views
  • 2 replies
  • 1 kudos

Resolved! Open sharing protocol in Databricks notebook

Hello,I utilize an Azure Databricks notebook to access Delta Sharing tables, employing the open sharing protocol. I've successfully uploaded the 'config.share' file to dbfs. Upon executing the commands:  client = delta_sharing.SharingClient(f"/dbfs/p...

Data Engineering
DELTA SHARING
Latest Reply
Manisha_Jena
New Contributor III
  • 1 kudos

Hi @dbx_deltaSharin, When querying the individual partitions, the files are read using an S3 access point location, while the actual S3 name is used when reading the table as a whole. This information is fetched from the table metadata it...

1 More Replies
dowdark
by New Contributor
  • 1866 Views
  • 2 replies
  • 0 kudos

UPDATE or DELETE with pipeline that needs to reprocess in DLT

I'm currently trying to replicate an existing pipeline that uses a standard RDBMS; I have no experience in Databricks at all. I have about 4-5 tables (much like dimensions) with different event types, and I want my pipeline to output a streaming table as the final o...

Latest Reply
Manisha_Jena
New Contributor III
  • 0 kudos

Hi @dowdark, What is the error you get when the pipeline tries to update the rows instead of performing an insert? That should give us more info about the problem. Please raise an SF case with us with this error and its complete stack trace.

1 More Replies
svrdragon
by New Contributor
  • 1643 Views
  • 0 replies
  • 0 kudos

optimizeWrite takes too long

Hi, we have a Spark job that writes data into a Delta table for the last 90 date partitions. We have enabled spark.databricks.delta.autoCompact.enabled and delta.autoOptimize.optimizeWrite. The job takes 50 mins to complete; of that, the logic takes 12 mins and optimizeWri...

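For reference, the two write optimizations named in the post can be set per table or per session. A sketch with a hypothetical table name (property and conf names as commonly documented for Delta on Databricks):

```sql
-- Per-table properties
ALTER TABLE my_schema.my_table SET TBLPROPERTIES (
  'delta.autoOptimize.optimizeWrite' = 'true',
  'delta.autoOptimize.autoCompact'   = 'true'
);

-- Session-level settings
SET spark.databricks.delta.optimizeWrite.enabled = true;
SET spark.databricks.delta.autoCompact.enabled = true;
```

Optimized writes add a shuffle before the write, so on wide partitioned writes the extra time is expected; whether it pays off depends on downstream read patterns.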
pavlos_skev
by New Contributor III
  • 4866 Views
  • 2 replies
  • 0 kudos

Resolved! Invalid configuration value detected for fs.azure.account.key only when trying to save RDD

Hello, we have encountered a weird issue in our (old) setup that looks like a bug in the Unity Catalog. The storage account to which we are trying to persist is configured via External Volumes. We have a pipeline that gets XML data and stores it in an RD...

Latest Reply
pavlos_skev
New Contributor III
  • 0 kudos

I will post what resolved this error for us, in case someone else encounters it in the future. It turns out that this error appears when we were using the below command while the directory 'staging2' already existed. To avo...

1 More Replies
Mohammad_Younus
by New Contributor
  • 4096 Views
  • 0 replies
  • 0 kudos

Merge delta tables with data more than 200 million

Hi everyone, I'm trying to merge two Delta tables that each have more than 200 million records. These tables are properly optimized, but upon running the job, it takes a long time to execute and the memory spills are huge (1 TB-3 TB) rec...

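For context, a Delta MERGE keyed on a unique id typically looks like the sketch below; table names and the partition predicate are hypothetical. Narrowing the target with a partition predicate in the ON clause is a common way to cut the amount of data rewritten (and spilled) on large merges:

```sql
MERGE INTO target t
USING source s
  ON t.id = s.id
  -- Hypothetical partition-pruning predicate: only touch recent partitions
  AND t.event_date >= current_date() - INTERVAL 90 DAYS
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```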
Hubert-Dudek
by Esteemed Contributor III
  • 7959 Views
  • 1 replies
  • 1 kudos

The perfect table

Unlock the Power of #Databricks: The Perfect Table in 8 Simple Steps! 

Latest Reply
jose_gonzalez
Moderator
  • 1 kudos

Hi @Hubert-Dudek, Thank you for sharing this great post

boriste
by New Contributor II
  • 8637 Views
  • 11 replies
  • 10 kudos

Resolved! Upload to Volume inside unity catalog not possible?

I want to upload a simple CSV file to a volume that was created in our Unity Catalog. We are using secure cluster connectivity, and our storage account (metastore) is not publicly accessible. We injected the storage into our VNet. I am getting the fol...

Latest Reply
jeroenvs
New Contributor III
  • 10 kudos

@Ahdri We are running into the same issue. It took a while to figure out that the error message is related to this limitation. Any updates on when we can expect the limitation to be removed? We want to secure access to our storage accounts with a ...

10 More Replies
harish446
by New Contributor
  • 1186 Views
  • 1 replies
  • 0 kudos

Can a not null constraint be applied on a identity column

I had a table creation script as follows, for example:

CREATE TABLE default.test2 (
  id BIGINT GENERATED BY DEFAULT AS IDENTITY(),
  name STRING
) USING delta
LOCATION "/mnt/datalake/xxxx"

What are the possible ways to apply not n...

Data Engineering
data engineering
Databricks
Delta Lake
Delta tables
spark
Latest Reply
Krishnamatta
New Contributor III
  • 0 kudos

Hi Harish, here is the documentation for this issue: https://docs.databricks.com/en/tables/constraints.html

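For reference, a NOT NULL constraint can be declared inline at creation time or added to an existing column. A sketch based on the script in the question (the path and table name come from the post):

```sql
-- Inline, at creation time
CREATE TABLE default.test2 (
  id BIGINT GENERATED BY DEFAULT AS IDENTITY() NOT NULL,
  name STRING
) USING delta
LOCATION '/mnt/datalake/xxxx';

-- Or on an existing table (fails if existing rows contain NULLs)
ALTER TABLE default.test2 ALTER COLUMN id SET NOT NULL;
```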