Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

MattM
by New Contributor III
  • 6560 Views
  • 8 replies
  • 2 kudos

Resolved! Access Databricks Delta table using SSRS without copying data to AzureSQL

We have our BI facts and dimensions built as Delta tables in a Databricks environment, and they are used for reporting by connecting Power BI reports through the Databricks connection. We now have a need to use this data for another application utilizing SSRS repor...

7 More Replies
Rubens
by New Contributor II
  • 2209 Views
  • 1 replies
  • 3 kudos

how to alter a column into an IDENTITY column

Here's my use case: I'm migrating out of an old DWH into Databricks. When moving dimension tables into Databricks, I'd like the old SKs (surrogate keys) to be maintained, while creating the SK column as an IDENTITY column, so new dimension values get a...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Ronen Levi​ Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

Kearon
by New Contributor III
  • 5149 Views
  • 6 replies
  • 0 kudos

Resolved! Databricks Delta Live Table stored as SCD 2 is creating new records when no data changes. How do I stop this?

I have a streaming pipeline that ingests json files from a data lake using autoloader. These files are dumped there periodically. Mostly the files contain duplicate data, but there are occasional changes. I am trying to process these files into a dat...

Latest Reply
Kearon
New Contributor III
  • 0 kudos

For clarity, here is the final code that avoids duplicates, using @Suteja Kanuri's suggestion: import dlt   @dlt.table def currStudents_dedup(): df = spark.readStream.format("delta").table("live.currStudents_ingest") return ( df.drop...
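
The reply's code is cut off in this listing, so the following is not that code. It is a plain-Python sketch (no Spark or DLT) of the general idea behind the question: before applying SCD type 2 logic, drop incoming records whose tracked attributes match the last version already seen for the same key, so that re-delivered duplicates do not open new history rows. The names `filter_unchanged`, `student_id`, `grade`, and the sample records are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, object]


def filter_unchanged(
    records: Iterable[Record], key: str, tracked: Tuple[str, ...]
) -> List[Record]:
    """Keep a record only if its tracked columns differ from the
    previous record seen for the same key (or the key is new)."""
    last_seen: Dict[object, Tuple[object, ...]] = {}
    changed: List[Record] = []
    for rec in records:
        snapshot = tuple(rec[col] for col in tracked)
        if last_seen.get(rec[key]) != snapshot:
            changed.append(rec)
            last_seen[rec[key]] = snapshot
    return changed


# Hypothetical input: the same student arrives in three files,
# but the tracked attribute only changes in the third one.
incoming = [
    {"student_id": 1, "grade": "A", "file": "f1"},
    {"student_id": 1, "grade": "A", "file": "f2"},  # duplicate, dropped
    {"student_id": 1, "grade": "B", "file": "f3"},  # real change, kept
    {"student_id": 2, "grade": "C", "file": "f3"},  # new key, kept
]
kept = filter_unchanged(incoming, "student_id", ("grade",))
kept_files = [rec["file"] for rec in kept]
print(kept_files)
```

Only the records that would actually start a new SCD 2 version survive the filter; columns that change on every delivery, such as a file name or ingestion timestamp, are deliberately left out of `tracked`.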

5 More Replies
uv
by New Contributor II
  • 5770 Views
  • 3 replies
  • 2 kudos

Parquet to csv delta file

Hi Team, I have a Parquet file in an S3 bucket which is a Delta file. I am able to read it, but I am unable to write it as a CSV file. I get the following error when I try to write: A transaction log for Databricks Delta was found at `s3://path/a...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @yuvesh kotiala, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Tha...

2 More Replies
anisha_93
by New Contributor II
  • 4833 Views
  • 2 replies
  • 1 kudos

Error in SQL statement: KeyProviderException: Failure to initialize configuration

I have a source Delta table on which I have selectively granted access to a particular pool ID (which can be thought of as a dummy user). From the pool ID interface, whenever I run a SELECT on any of the tables, even ones it has access to, it is faili...

Latest Reply
alicewong20
New Contributor II
  • 1 kudos

Hello all, I have the same problem. Can anyone help?

1 More Replies
repcak
by New Contributor III
  • 5165 Views
  • 4 replies
  • 3 kudos

Resolved! Delta Live Tables with EventHub

Hello, I would like to integrate Databricks Delta Live Tables with EventHub, but I cannot install com.microsoft.azure:azure-eventhubs-spark_2.12:2.3.17 on the Delta Live Tables cluster. I tried installing it using an init script (by adding it in the JSON cluster settings...

Latest Reply
Atanu
Databricks Employee
  • 3 kudos

I think this has some details: https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-kafka-spark-tutorial @Kacper Mucha, is the issue resolved?

3 More Replies
Development
by New Contributor III
  • 4794 Views
  • 5 replies
  • 5 kudos

Delta Table with 130 columns taking time

Hi All, we are facing an unusual issue while loading data into a Delta table using Spark SQL. We have one Delta table which has around 135 columns and is also PARTITIONED BY. We are trying to load 15 million records into it, but it's not loading ...

Latest Reply
Development
New Contributor III
  • 5 kudos

@Kaniz Fatma @Parker Temple I found the root cause: it is serialization. We are using a UDF to derive a column on the DataFrame, and when we try to load data into the Delta table or write it to a Parquet file, we hit a serialization issue ....

4 More Replies
ChriChri
by New Contributor II
  • 4490 Views
  • 2 replies
  • 4 kudos

Azure Databricks Delta live table tab is missing

In my Azure Databricks workspace UI I do not have the tab "Delta live tables". In the documentation it says that there is a tab after clicking on Jobs in the main menu. I just created this Databricks resource in Azure and from my understanding the DL...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hi @Chr Jon​ How are you doing? Thanks for posting your question. Just checking in to see if one of the answers helped, would you let us know?

1 More Replies
NOOR_BASHASHAIK
by Contributor
  • 871 Views
  • 0 replies
  • 0 kudos

Read metadata through JDBC driver

Dear all, The Spark JDBC driver (SparkJDBC42.jar) is unable to capture certain information from the below table structure: 1. table level comment 2. the TBLPROPERTIES key-value pair information 3. PARTITION BY information However, it captures the co...

austiamel47
by New Contributor
  • 937 Views
  • 1 replies
  • 0 kudos

Databricks delta lake

Can we use Databricks Delta Lake as a kind of data warehouse where business analysts can explore data according to their needs? Delta Lake provides the following features, which I think support this idea: support for SQL syntax; ACID guarante...

Latest Reply
Dan_Z
Databricks Employee
  • 0 kudos

@austiamel47, yes, you can certainly do this. Delta Lake is designed to be competitive with traditional data warehouses and with some tuning can power low-latency dashboards. https://databricks.com/glossary/data-lakehouse

User16826994223
by Honored Contributor III
  • 1821 Views
  • 1 replies
  • 0 kudos

Resolved! Delta adds a new partition making the old partition unreadable

In a notebook, my code reads and writes data to Delta. My Delta table is partitioned by calendar_date. After the initial load I am able to read the Delta files and view the data just fine. But after the second load, with data for 6 months, the previous part...

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

I think you are writing the data in overwrite mode. Delta does not delete the old data for a certain number of days, even when it is written in overwrite mode, because it is kept for versioning; you will be able to query only the most recent data. But in format parque...

User16826992783
by New Contributor II
  • 1541 Views
  • 1 replies
  • 1 kudos

Receiving a "Databricks Delta is not enabled on your account" error

The team is using Databricks Light for some pipeline development and would like to leverage Delta, but they are running into this error: "Databricks Delta is not enabled on your account". How can we enable Delta for our account?

Latest Reply
craig_ng
New Contributor III
  • 1 kudos

Databricks Light is the open source Apache Spark runtime and does not come with any type of client for Delta Lake pre-installed. You'll need to manually install open source Delta Lake in order to do any reads or writes. See our docs and release notes ...

AdityaDeshpande
by New Contributor II
  • 5333 Views
  • 2 replies
  • 0 kudos

How to maintain Primary Key Column in Databricks Delta Multi Cluster environment

I am trying to replicate the SQL DB-like feature of maintaining primary keys in the Databricks Delta approach, where the data is written to blob storage such as ADLS2 or AWS S3. I want an auto-incremented primary key feature using Databricks Del...

Latest Reply
girivaratharaja
New Contributor III
  • 0 kudos

Hi @Aditya Deshpande, there is no locking mechanism for primary keys in Delta. You can use the row_number() function on the DataFrame and save it as Delta, doing a distinct() before the write.
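
The row_number() idea in this reply can be illustrated without Spark. The sketch below is plain Python, not the Spark API: it de-duplicates the incoming business keys (the distinct() step), numbers the new ones sequentially (the row_number() step), and offsets the numbers by the largest key already stored so that existing keys are kept. The function name `assign_surrogate_keys` and the sample data are hypothetical. As the reply says, nothing locks the key range, so two writers running this at the same time could hand out the same keys.

```python
from typing import Dict, Iterable


def assign_surrogate_keys(
    existing: Dict[str, int], incoming: Iterable[str]
) -> Dict[str, int]:
    """Return a new mapping of business key -> surrogate key.

    Existing business keys keep their surrogate key; unseen ones get
    max(existing) + 1, + 2, ... in first-seen order.
    """
    result = dict(existing)
    next_key = max(existing.values(), default=0) + 1
    # dict.fromkeys de-duplicates while preserving first-seen order.
    for business_key in dict.fromkeys(incoming):
        if business_key not in result:
            result[business_key] = next_key
            next_key += 1
    return result


# Hypothetical dimension with two rows already loaded.
dim = {"alice": 1, "bob": 2}
updated = assign_surrogate_keys(dim, ["bob", "carol", "carol", "dave"])
print(updated)
```

Existing rows are never renumbered, which is what lets keys migrated from an older system survive later loads.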

1 More Replies
Pascalvan_Belle
by New Contributor
  • 7730 Views
  • 1 replies
  • 0 kudos

How to create a surrogate key sequence which I can use in SCD cases?

Hi Community, I would like to know if there is an option to create an integer sequence which persists even if the cluster is shut down. My target is to use this integer value as a surrogate key to join different tables or do Slowly changing dimensio...

Latest Reply
girivaratharaja
New Contributor III
  • 0 kudos

Hi @pascalvanbellen, there is no concept of FK, PK, or SK in Spark. But Databricks Delta automatically takes care of SCD-type scenarios. https://docs.databricks.com/spark/latest/spark-sql/language-manual/merge-into.html#slowly-changing-data-scd-type-2 ...
