Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
We have our BI facts and dimensions built as Delta tables in our Databricks environment, and they are used for reporting by connecting Power BI reports via the Databricks connection. We now have a need to use this data for another application utilizing SSRS repor...
Here's my use case: I'm migrating out of an old DWH into Databricks. When moving dimension tables into Databricks, I'd like the old SKs (surrogate keys) to be maintained, while creating the SK column as an IDENTITY column, so new dimension values get a...
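A pattern that may fit here (a sketch, not confirmed from the truncated thread; table and column names are hypothetical): declare the SK as GENERATED BY DEFAULT AS IDENTITY, which accepts explicitly inserted values, and use START WITH to place newly generated keys above the migrated range:

# Hypothetical dimension table; GENERATED BY DEFAULT (rather than ALWAYS)
# lets the migration insert the old surrogate keys explicitly, while
# START WITH controls where newly generated keys begin.
spark.sql("""
    CREATE TABLE dim_customer (
        customer_sk BIGINT GENERATED BY DEFAULT AS IDENTITY (START WITH 100001),
        customer_id STRING,
        customer_name STRING
    ) USING DELTA
""")

# Backfill the migrated rows with their original surrogate keys preserved.
spark.sql("""
    INSERT INTO dim_customer (customer_sk, customer_id, customer_name)
    SELECT old_sk, customer_id, customer_name
    FROM legacy_dim_customer
""")

Picking the START WITH value above the maximum legacy key matters, since explicitly inserted values and generated values are not checked against each other for uniqueness.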
I have a streaming pipeline that ingests JSON files from a data lake using Auto Loader. These files are dumped there periodically. Mostly the files contain duplicate data, but there are occasional changes. I am trying to process these files into a dat...
For clarity, here is the final code that avoids duplicates, using @Suteja Kanuri's suggestion:

import dlt

@dlt.table
def currStudents_dedup():
    df = spark.readStream.format("delta").table("live.currStudents_ingest")
    return (
        df.dropDuplicates()
    )
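One caveat worth noting: on an unbounded stream, dropDuplicates keeps every key it has seen in state forever. A variant sketch, assuming a hypothetical event-time column ingest_ts and key column student_id, bounds that state with a watermark:

import dlt

@dlt.table
def currStudents_dedup_watermarked():
    # ingest_ts and student_id are hypothetical column names. Including the
    # watermark column in the dedup keys lets Spark drop state older than
    # the 10-minute watermark instead of retaining it indefinitely.
    return (
        spark.readStream.format("delta").table("live.currStudents_ingest")
            .withWatermark("ingest_ts", "10 minutes")
            .dropDuplicates(["student_id", "ingest_ts"])
    )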
Hi Team, I have a parquet file in an S3 bucket which is a Delta file. I am able to read it, but I am unable to write it as a CSV file. I'm getting the following error when I try to write: A transaction log for Databricks Delta was found at `s3://path/a...
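That error typically means a path holding a Delta transaction log was touched with a non-Delta reader or writer. A minimal sketch (paths are placeholders): read the data as Delta, then write the CSV to a separate location:

# Read the table as Delta (not parquet), since a _delta_log directory is present.
df = spark.read.format("delta").load("s3://path/to/delta-table")

# Write the CSV to a different prefix so it doesn't collide with the Delta log.
(df.write
   .format("csv")
   .option("header", "true")
   .mode("overwrite")
   .save("s3://path/to/csv-output"))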
Hi @yuvesh kotiala Hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Tha...
I have a source Delta table to which I have selectively granted access for a particular pool id (which can be thought of as a dummy user).
From the pool id interface, whenever I run a SELECT on any of the tables, even ones it has access to, the query is faili...
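One common cause of this (an assumption, since the thread is truncated) is that the principal has SELECT on the table but no USAGE on the parent database, which legacy table ACLs also require. A sketch with placeholder names; Unity Catalog uses USE CATALOG / USE SCHEMA instead:

# Placeholder database, table, and principal names throughout.
# Legacy table ACLs require USAGE on the database in addition to
# SELECT on the table itself.
spark.sql("GRANT USAGE ON DATABASE analytics TO `pool-id-user`")
spark.sql("GRANT SELECT ON TABLE analytics.source_table TO `pool-id-user`")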
Hello, I would like to integrate Databricks Delta Live Tables with Event Hubs, but I cannot install com.microsoft.azure:azure-eventhubs-spark_2.12:2.3.17 on a Delta Live Tables cluster. I tried installing it using an init script (by adding it in the JSON cluster settings...
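Because DLT clusters don't accept Maven libraries the usual way, one widely used workaround is to bypass the Event Hubs connector entirely and read through the Kafka-compatible endpoint that Event Hubs exposes (Standard tier and up) with the built-in Kafka source. A sketch; the namespace, topic, and secret scope/key are placeholders:

import dlt

EH_NAMESPACE = "my-namespace"   # placeholder Event Hubs namespace
EH_TOPIC = "my-event-hub"       # placeholder: the event hub name acts as the Kafka topic
CONN_STR = dbutils.secrets.get("my-scope", "eh-connection-string")  # hypothetical scope/key

@dlt.table
def eventhub_raw():
    # Event Hubs' Kafka endpoint authenticates via SASL PLAIN, with the
    # literal username "$ConnectionString" and the connection string as password.
    return (
        spark.readStream.format("kafka")
            .option("kafka.bootstrap.servers", f"{EH_NAMESPACE}.servicebus.windows.net:9093")
            .option("subscribe", EH_TOPIC)
            .option("kafka.security.protocol", "SASL_SSL")
            .option("kafka.sasl.mechanism", "PLAIN")
            .option(
                "kafka.sasl.jaas.config",
                'kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required '
                f'username="$ConnectionString" password="{CONN_STR}";',
            )
            .load()
    )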
Hi All, we are facing one unusual issue while loading data into a Delta table using Spark SQL. We have one Delta table which has around 135 columns and is also PARTITIONED BY. We are trying to load 15 million rows into it, but it's not loading ...
@Kaniz Fatma @Parker Temple I found the root cause: it's serialization. We are using a UDF to derive a column on the DataFrame, and when we try to load data into the Delta table or write data into a parquet file, we face a serialization issue ....
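If the derived column can be expressed with built-in column functions, removing the Python UDF avoids the Python serialization path entirely. A sketch with a hypothetical derivation:

from pyspark.sql import functions as F

# Instead of a Python UDF such as:
#   derive = udf(lambda s: s.strip().upper())
#   df = df.withColumn("clean_code", derive(F.col("code")))
# use built-in column expressions, which execute inside the JVM and
# never serialize rows out to Python workers:
df = df.withColumn("clean_code", F.upper(F.trim(F.col("code"))))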
In my Azure Databricks workspace UI I do not have the tab "Delta Live Tables". The documentation says that there is a tab after clicking on Jobs in the main menu. I just created this Databricks resource in Azure, and from my understanding the DL...
Dear all,
The Spark JDBC driver (SparkJDBC42.jar) is unable to capture certain information from the table structure below:
1. the table-level comment
2. the TBLPROPERTIES key-value pair information
3. the PARTITIONED BY information
However, it captures the co...
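As a workaround, all three items can be pulled over the same connection with plain SQL and parsed client-side; a sketch (the table name is a placeholder):

# DESCRIBE TABLE EXTENDED surfaces the table-level comment and the
# partition columns; TBLPROPERTIES has its own statement.
spark.sql("DESCRIBE TABLE EXTENDED mydb.mytable").show(truncate=False)
spark.sql("SHOW TBLPROPERTIES mydb.mytable").show(truncate=False)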
Can we use Databricks Delta Lake as a data warehouse, where business analysts can explore data according to their needs?
Delta Lake provides the following features, which I think support this idea:
1. support for SQL syntax
2. ACID guarante...
@austiamel47, Yes, you can certainly do this. Delta Lake is designed to be competitive with traditional data warehouses and with some tuning can power low-latency dashboards. https://databricks.com/glossary/data-lakehouse
In my notebook, my code reads and writes data to Delta. My Delta table is partitioned by calendar_date. After the initial load I am able to read the Delta file and look at the data just fine. But after the second load, with data for 6 months, the previous part...
I think you are writing the data in overwrite mode. What happens in Delta is that it doesn't physically delete the data for certain days even when it is written in overwrite mode, for versioning purposes, and you will be able to query only the most recent data. But in format parquet...
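The replaced partitions are still reachable through the table's history; a sketch of verifying that with time travel (the path and version are placeholders):

# Latest state: only the most recently overwritten data is visible.
current = spark.read.format("delta").load("/mnt/delta/my_table")

# Time travel: read the table as of an earlier version (or a timestamp)
# to see the partitions that the overwrite replaced.
previous = (spark.read.format("delta")
                 .option("versionAsOf", 0)
                 .load("/mnt/delta/my_table"))

# Inspect the full version history to pick the version you need.
spark.sql("DESCRIBE HISTORY delta.`/mnt/delta/my_table`").show()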
The team is using Databricks Light for some pipeline development and would like to leverage Delta, but they are running into this error: "Databricks Delta is not enabled on your account". How can we enable Delta for our account?
Databricks Light is the open source Apache Spark runtime and does not come with any type of client for Delta Lake pre-installed. You'll need to manually install open source Delta Lake in order to do any reads or writes. See our docs and release notes ...
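For open source Spark, a typical setup looks like the sketch below (the delta-spark release must match your Spark version; names are illustrative):

# pip install delta-spark   (pick the release matching your Spark version)
from delta import configure_spark_with_delta_pip
from pyspark.sql import SparkSession

builder = (SparkSession.builder
           .appName("delta-on-light")
           .config("spark.sql.extensions",
                   "io.delta.sql.DeltaSparkSessionExtension")
           .config("spark.sql.catalog.spark_catalog",
                   "org.apache.spark.sql.delta.catalog.DeltaCatalog"))

# configure_spark_with_delta_pip attaches the matching Delta Lake
# artifact to the session so delta-format reads and writes work.
spark = configure_spark_with_delta_pip(builder).getOrCreate()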
I am trying to replicate the SQL-database-like feature of maintaining primary keys in the Databricks Delta approach, where the data is written to blob storage such as ADLS2 or AWS S3.
I want an auto-incremented primary key feature using Databricks Del...
Hi @Aditya Deshpande There is no locking mechanism for PKs in Delta. You can use the row_number() window function on the DataFrame, save it using Delta, and do a distinct() before the write.
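A sketch of that approach (table and column names are hypothetical, and incoming_df stands in for the new rows); note that a global row_number() window funnels every row through a single partition, so it suits modest dimension sizes:

from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Start new keys after the current maximum so reloads don't collide.
max_sk = spark.table("dim_product").agg(F.max("product_sk")).first()[0] or 0

w = Window.orderBy("product_code")  # global window: fine for small dimensions
new_rows = (incoming_df.distinct()
            .withColumn("product_sk", F.row_number().over(w) + F.lit(max_sk)))

new_rows.write.format("delta").mode("append").saveAsTable("dim_product")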
Hi Community
I would like to know if there is an option to create an integer sequence which persists even if the cluster is shut down. My goal is to use this integer value as a surrogate key to join different tables or do Slowly Changing Dimensio...
Hi @pascalvanbellen, there is no concept of FKs, PKs, or SKs in Spark. But Databricks Delta's MERGE INTO supports SCD-type scenarios. https://docs.databricks.com/spark/latest/spark-sql/language-manual/merge-into.html#slowly-changing-data-scd-type-2
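Compressed into a sketch, the MERGE-based SCD Type 2 pattern from that link looks roughly like this (tables and columns are placeholders): first expire the current row when a tracked attribute changes, then insert the new versions:

# Step 1: close out current rows whose tracked attribute changed.
spark.sql("""
    MERGE INTO dim_customer t
    USING updates s
    ON t.customer_id = s.customer_id AND t.is_current = true
    WHEN MATCHED AND t.address <> s.address THEN
      UPDATE SET is_current = false, end_date = s.effective_date
""")

# Step 2: insert new versions for changed keys and brand-new keys.
# Unchanged keys still have an open current row, so the filter skips them.
spark.sql("""
    INSERT INTO dim_customer
    SELECT s.customer_id, s.address, s.effective_date AS start_date,
           NULL AS end_date, true AS is_current
    FROM updates s
    LEFT JOIN dim_customer t
      ON t.customer_id = s.customer_id AND t.is_current = true
    WHERE t.customer_id IS NULL OR t.address <> s.address
""")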
...