Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ruoyuqian
by New Contributor II
  • 333 Views
  • 3 replies
  • 0 kudos

Where are materialized views generated by Delta Live Tables stored?

I am trying to compare tables created by DBT in the Catalog vs the materialized views generated by Delta Live Tables, and I noticed that the dbt-generated table has Storage Location information and it points to a physical storage location; however, the mater...

Latest Reply
radothede
Contributor
  • 0 kudos

I'm not sure about that, but I would check the managed storage locations, in this order: 1) schema managed storage location, 2) catalog managed storage location, 3) metastore managed storage location. With reference to the managed storage docs: The managed storage l...

2 More Replies
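A minimal sketch of checking the managed storage locations in the order suggested above, assuming Unity Catalog; my_catalog and my_schema are placeholder names:

# Check managed storage locations in the order suggested in the reply; names are placeholders.
spark.sql("DESCRIBE SCHEMA EXTENDED my_catalog.my_schema").show(truncate=False)  # 1) schema-level managed location, if set
spark.sql("DESCRIBE CATALOG EXTENDED my_catalog").show(truncate=False)           # 2) catalog-level managed location, if set
# 3) The metastore-level root applies only if neither of the above is set.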
TamD
by Contributor
  • 136 Views
  • 4 replies
  • 1 kudos

How do I drop a Delta Live Table?

I'm a newbie and I've just done the "Run your first Delta Live Tables pipeline" tutorial. The tutorial downloads a publicly available CSV baby names file and creates two new Delta Live tables from it. Now I want to be a good dev and clean up the reso...

Latest Reply
TamD
Contributor
  • 1 kudos

Thank you @gchandra. Deleting the pipeline does indeed remove the materialized view definitions from the Catalog. How can I confirm that the underlying S3 storage has also been cleared? Just removing the pointers in the Catalog is not enough, if ...

3 More Replies
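A minimal sketch of checking whether anything remains under the table's managed storage path after the pipeline is deleted, assuming a notebook with dbutils; the path below is a hypothetical placeholder (the real one can be captured with DESCRIBE DETAIL before dropping):

# List what is left under the managed storage path; the path is a placeholder, not a real location.
path = "s3://my-bucket/path/to/managed/table"
try:
    files = dbutils.fs.ls(path)
    print(len(files), "objects still present under", path)
except Exception as err:
    print("Path not accessible (possibly already cleaned up):", err)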
SamAdams
by New Contributor III
  • 61 Views
  • 1 reply
  • 0 kudos

Redacted check constraint condition in Delta Table

Hello! I have a delta table with a check constraint - it's one of many that a config-driven ETL pipeline of mine generates. When someone edits the config file and deploys the change, I'd like for the check constraint to be updated as well if it's dif...

Latest Reply
SamAdams
New Contributor III
  • 0 kudos

Figured this out with the help of @SamDataWalk's post https://community.databricks.com/t5/data-engineering/databricks-bug-with-show-tblproperties-redacted-azure-databricks/m-p/93546. It happens because Databricks thinks certain keywords in the constra...

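A minimal sketch of how a config-driven pipeline might re-apply a changed CHECK constraint; the table, constraint name, and condition below are placeholders:

# Drop and re-add the named CHECK constraint when the config-driven condition changes.
table = "my_catalog.my_schema.my_table"   # placeholder
constraint_name = "valid_amount"          # placeholder
new_condition = "amount >= 0"             # placeholder

spark.sql(f"ALTER TABLE {table} DROP CONSTRAINT IF EXISTS {constraint_name}")
spark.sql(f"ALTER TABLE {table} ADD CONSTRAINT {constraint_name} CHECK ({new_condition})")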
noorbasha534
by New Contributor II
  • 67 Views
  • 1 reply
  • 0 kudos

Databricks as "pure" data streaming software like Confluent

Dears, I was wondering if anyone has leveraged Databricks as "pure" data streaming software in place of Confluent, Flink, Kafka, etc. I see the reference architectures placing Databricks on the data processing side mostly once data is made available by...

Latest Reply
szymon_dybczak
Contributor III
  • 0 kudos

Hi @noorbasha534, It depends on what you're asking for. Kafka is primarily a messaging system, optimized for handling high-throughput, distributed message logs. Databricks can read from Kafka as a data source but doesn't replace Kafka's role in messa...

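A minimal sketch of the point made in the reply, that Databricks reads from Kafka rather than replacing it, using Structured Streaming; broker and topic values are placeholders:

# Consume a Kafka topic with Structured Streaming; Kafka stays the message log.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
       .option("subscribe", "my_topic")                      # placeholder topic
       .option("startingOffsets", "latest")
       .load())
events = raw.selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value", "timestamp")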
AlokThampi
by New Contributor III
  • 226 Views
  • 7 replies
  • 5 kudos

Joining huge delta tables in Databricks

Hello, I am trying to join a few delta tables as per the code below: SELECT <applicable columns> FROM ReportTable G LEFT JOIN EKBETable EKBE ON EKBE.BELNR = G.ORDER_ID LEFT JOIN PurchaseOrder POL ON EKBE.EBELN = POL.PO_NO. The PurchaseOrder table c...

Latest Reply
AlokThampi
New Contributor III
  • 5 kudos

Hello @-werners-, @Mo, I tried the liquid clustering option as suggested but it still doesn't seem to work. I am assuming it to be an issue with the small cluster size that I am using. Or do you suggest any other options? @noorbasha534, the columns th...

6 More Replies
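A minimal sketch of what the liquid-clustering suggestion in this thread amounts to: cluster the large tables on their join keys and re-run OPTIMIZE so the layout takes effect. Assumes a runtime that supports liquid clustering; table and column names are taken from the query in the post:

# Cluster on the join keys, then OPTIMIZE so the new layout is applied.
spark.sql("ALTER TABLE EKBETable CLUSTER BY (BELNR, EBELN)")
spark.sql("ALTER TABLE PurchaseOrder CLUSTER BY (PO_NO)")
spark.sql("OPTIMIZE EKBETable")
spark.sql("OPTIMIZE PurchaseOrder")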
mmenjivar
by New Contributor II
  • 144 Views
  • 0 replies
  • 0 kudos

How to use SQL Streaming tables

We have been testing the usage of Streaming Tables in our pipelines with different results depending on the streaming source. For Streaming Tables reading from read_files everything works as expected. For Streaming Tables reading from read_kafka we have ...

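A minimal sketch of the two streaming-table shapes the post contrasts, assuming a context where STREAMING TABLE DDL is supported (a DLT pipeline or Databricks SQL); paths, brokers, and topics are placeholders, and the read_kafka parameter names should be checked against the current docs:

# Streaming table from files (the case that works as expected per the post); path is a placeholder.
spark.sql("""
CREATE OR REFRESH STREAMING TABLE files_bronze AS
SELECT * FROM STREAM read_files('/Volumes/my_catalog/my_schema/landing/', format => 'json')
""")

# Streaming table from Kafka (the problematic case in the post); broker and topic are placeholders.
spark.sql("""
CREATE OR REFRESH STREAMING TABLE kafka_bronze AS
SELECT * FROM STREAM read_kafka(bootstrapServers => 'broker1:9092', subscribe => 'my_topic')
""")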
noorbasha534
by New Contributor II
  • 300 Views
  • 7 replies
  • 5 kudos

Resolved! Retrieve table/view popularity

Dears, Is there a way to retrieve the popularity score of a Unity Catalog object? I looked at the API documentation but couldn't find one that serves the need. Appreciate any thoughts. Br, Noor.

Latest Reply
noorbasha534
New Contributor II
  • 5 kudos

@filipniziol Hi Filip, Thank you. I did a quick test. In my environment, the table query (indirect) event is getting registered with "getTemporaryTableCredential". However, the view query (direct) event is with "getTable". Thanks for your time again. ...

6 More Replies
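A minimal sketch of approximating popularity from audit events like the ones discussed in this thread, assuming system tables are enabled; the request_params key is an assumption and may differ in your logs:

# Count access events per securable from the audit system table (last 30 days).
popularity = spark.sql("""
    SELECT request_params['full_name_arg'] AS securable, count(*) AS accesses
    FROM system.access.audit
    WHERE service_name = 'unityCatalog'
      AND action_name IN ('getTable', 'getTemporaryTableCredential')
      AND event_date >= current_date() - INTERVAL 30 DAYS
    GROUP BY 1
    ORDER BY accesses DESC
""")
popularity.show(truncate=False)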
dikokob
by New Contributor II
  • 3948 Views
  • 5 replies
  • 1 kudos

Databricks Autoloader Checkpoint

Hello Databricks Community, I'm encountering an issue with the Databricks Autoloader where, after running successfully for a period of time, it suddenly stops detecting new files in the source directory. This issue only gets resolved when I reset the ...

Latest Reply
IslaCarr
New Contributor II
  • 1 kudos

Have you found something?

4 More Replies
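A minimal sketch of an Auto Loader stream with an explicit checkpoint and a periodic backfill, one commonly suggested mitigation when file discovery misses new files; paths and the interval are placeholders:

# Auto Loader with a backfill interval so the source is periodically re-listed.
stream = (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "json")
          .option("cloudFiles.backfillInterval", "1 day")   # placeholder interval
          .load("/Volumes/my_catalog/my_schema/landing/"))  # placeholder path

(stream.writeStream
       .option("checkpointLocation", "/Volumes/my_catalog/my_schema/_checkpoints/landing_bronze")  # placeholder path
       .trigger(availableNow=True)
       .toTable("my_catalog.my_schema.landing_bronze"))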
Brad
by Contributor
  • 78 Views
  • 0 replies
  • 0 kudos

How to disable all cache

Hi, I'm trying to test some SQL perf. I run the below first: spark.conf.set('spark.databricks.io.cache.enabled', False). However, the 2nd run of the same query is still way faster than the first run. Is there a way to make the query start from a clean...

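A minimal sketch of the cache-related knobs worth resetting before re-timing a query; note that a second run can still be faster due to warmed executors and OS-level caching, which these settings do not control:

# Disable the Databricks disk (IO) cache and clear Spark-level caches before re-running.
spark.conf.set("spark.databricks.io.cache.enabled", "false")  # the setting from the post
spark.catalog.clearCache()   # drop cached DataFrames/tables
spark.sql("CLEAR CACHE")     # SQL equivalent of the above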
noorbasha534
by New Contributor II
  • 248 Views
  • 0 replies
  • 0 kudos

Lakehouse Monitoring & Expectations

Dears, Has anyone successfully used the lakehouse monitoring & expectations features together at scale to measure the data quality of tables, for example, to conduct freshness checks, consistency checks, etc.? Appreciate it if you could share the lessons learn...

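A minimal sketch of DLT expectations used as simple freshness/consistency rules, of the kind the post asks about; table names, rule names, and conditions are illustrative only:

# Illustrative expectations inside a DLT pipeline; names and conditions are made up.
import dlt
from pyspark.sql import functions as F

@dlt.table(name="orders_clean")
@dlt.expect("fresh_enough", "event_ts >= current_timestamp() - INTERVAL 1 DAY")  # freshness-style check
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")                    # consistency-style check
def orders_clean():
    return dlt.read_stream("orders_raw").withColumn("checked_at", F.current_timestamp())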
noorbasha534
by New Contributor II
  • 208 Views
  • 3 replies
  • 2 kudos

Resolved! ANALYZE table for stats collection

Hi all, I understand ANALYZE TABLE for stats collection does not interfere with write & update operations on a delta table. Please confirm. I'd like to execute the ANALYZE TABLE command post data loads of delta tables, but at times the loads could be extended...

Latest Reply
noorbasha534
New Contributor II
  • 2 kudos

@filipniziol thanks for your time in replying. Your answer is satisfactory & resolves my queries.

2 More Replies
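A minimal sketch of running ANALYZE TABLE once a load has finished; table and column names are placeholders:

# Collect table-level and column-level statistics after the load completes.
spark.sql("ANALYZE TABLE my_catalog.my_schema.sales COMPUTE STATISTICS")
spark.sql("ANALYZE TABLE my_catalog.my_schema.sales COMPUTE STATISTICS FOR COLUMNS order_id, order_date")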
ggsmith
by New Contributor III
  • 590 Views
  • 5 replies
  • 3 kudos

dlt Streaming Checkpoint Not Found

I am using Delta Live Tables and have my pipeline defined using the code below. My understanding is that a checkpoint is automatically set when using Delta Live Tables. I am using the Unity Catalog and Schema settings in the pipeline as the storage d...

Latest Reply
szymon_dybczak
Contributor III
  • 3 kudos

Hi @ggsmith, If you use Delta Live Tables, then checkpoints are stored under the storage location specified in the DLT settings. Each table gets a dedicated directory under storage_location/checkpoints/<dlt_table_name>.

4 More Replies
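A minimal sketch of a DLT streaming table whose checkpoint is managed by the pipeline itself, under the storage location described in the reply above; the table name and path are placeholders:

# DLT manages the checkpoint under storage_location/checkpoints/<table_name>.
import dlt

@dlt.table(name="events_bronze")
def events_bronze():
    return (spark.readStream
            .format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/Volumes/my_catalog/my_schema/landing/"))  # placeholder path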
Brad
by Contributor
  • 193 Views
  • 3 replies
  • 0 kudos

How to control file size with OPTIMIZE

Hi, I have a delta table under UC, no partition, no liquid clustering. I tried: OPTIMIZE foo; -- OR ALTER TABLE foo SET TBLPROPERTIES(delta.targetFileSize = '128mb'); OPTIMIZE foo; I expected to see some change in the files after the above, but the OP...

Latest Reply
filipniziol
New Contributor III
  • 0 kudos

Hi @Brad, Databricks is a big data processing engine. Instead of testing 3 files, try to test 3000 files. OPTIMIZE isn't merging your small files because there may not be enough files or data for it to act upon. Regarding why DESC DETAIL shows 3 files...

2 More Replies
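A minimal sketch of setting the target file size and then verifying the file layout with DESCRIBE DETAIL; as the reply notes, OPTIMIZE may be a no-op when only a handful of small files exist:

# Set a target file size, run OPTIMIZE, then inspect file count and total size.
spark.sql("ALTER TABLE foo SET TBLPROPERTIES (delta.targetFileSize = '128mb')")
spark.sql("OPTIMIZE foo")
spark.sql("DESCRIBE DETAIL foo").select("numFiles", "sizeInBytes").show()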
sathyafmt
by New Contributor II
  • 320 Views
  • 5 replies
  • 3 kudos

Resolved! Cannot read JSON from /Volumes

I am trying to read in a JSON file with this in the SQL Editor & it fails with None.get: CREATE TEMPORARY VIEW multilineJson USING json OPTIONS (path="/Volumes/my_catalog/my_schema/jsondir/test.json", multiline=true); None.get is all the error it has. Th...

Latest Reply
sathyafmt
New Contributor II
  • 3 kudos

@filipniziol - Yes, I was on a Serverless SQL Warehouse. It works with "CREATE TABLE .. ", thx! I am surprised that the warehouse type is impacting this feature. But I got the SQL from the Databricks documentation - https://docs.databricks.com/en/query/format...

4 More Replies
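A minimal sketch of reading the multiline JSON from the Volume with PySpark and materializing it as a table, an alternative to the temporary-view route that failed in the post; the path is the one quoted there and the target table name is a placeholder:

# Read the multiline JSON directly, then save it as a table.
df = (spark.read
      .option("multiLine", True)
      .json("/Volumes/my_catalog/my_schema/jsondir/test.json"))
df.write.mode("overwrite").saveAsTable("my_catalog.my_schema.multiline_json")  # placeholder table name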