Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

huytran
by New Contributor III
  • 5252 Views
  • 7 replies
  • 0 kudos

Cannot write data to BigQuery when using Databricks secret

I am following this guide on writing data to a BigQuery table. Right now, I get an error when I try to write data using a Databricks secret instead of the JSON credential file and setting the GOOGLE_APPLICATION_CREDENTIALS environment variable. java....

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

It seems that nothing is being loaded into GOOGLE_APPLICATION_CREDENTIALS. From https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/master/gcs/INSTALL.md: # The JSON keyfile of the service account used for GCS # access when google.clou...
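One commonly used alternative when the environment variable never reaches the connector is to pass the service-account JSON from the Databricks secret directly to the BigQuery connector's `credentials` option, base64-encoded. A minimal sketch, assuming hypothetical secret scope/key, table, and bucket names (the notebook-only calls are shown in comments):

```python
import base64
import json

def encode_gcp_credentials(creds_json: str) -> str:
    """Base64-encode service-account JSON for the BigQuery connector's
    `credentials` option (an alternative to GOOGLE_APPLICATION_CREDENTIALS)."""
    json.loads(creds_json)  # fail fast if the secret is not valid JSON
    return base64.b64encode(creds_json.encode("utf-8")).decode("utf-8")

# In a Databricks notebook (hypothetical scope/key and table names):
#   creds = dbutils.secrets.get(scope="gcp", key="bq-service-account")
#   (df.write.format("bigquery")
#      .option("credentials", encode_gcp_credentials(creds))
#      .option("table", "my_project.my_dataset.my_table")
#      .option("temporaryGcsBucket", "my-temp-bucket")
#      .save())
```

This sidesteps the environment variable entirely, so nothing depends on when the cluster JVM reads its environment.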

6 More Replies
Vetrivel
by Contributor
  • 1919 Views
  • 4 replies
  • 0 kudos

PowerBI performance with Databricks

We have integrated Power BI with Databricks to generate reports. However, Power BI generates over 8,000 lines of code, including numerous OR clauses, which cannot be modified at this time. This results in queries that take more than 4 minutes to execut...

Latest Reply
Vetrivel
Contributor
  • 0 kudos

Attached is the sample query generated by Power BI. Without the OR conditions the query runs within seconds.

3 More Replies
TamD
by Contributor
  • 7521 Views
  • 7 replies
  • 2 kudos

How do I drop a delta live table?

I'm a newbie and I've just done the "Run your first Delta Live Tables pipeline" tutorial. The tutorial downloads a publicly available CSV baby names file and creates two new Delta Live tables from it. Now I want to be a good dev and clean up the reso...

Latest Reply
ImranA
Contributor
  • 2 kudos

@gchandra For example, take a table called "cars": I remove the table from the DLT pipeline and drop it from the catalog. Now, if I change the schema of the table and create the table again using the same name "cars" through the same pipeline, why ...

6 More Replies
furkancelik
by New Contributor II
  • 3837 Views
  • 3 replies
  • 1 kudos

How to use Databricks Unity Catalog as metastore for a local spark session

Hello, I would like to access Databricks Unity Catalog from a Spark session created outside the Databricks environment. Previously, I used the Hive metastore and didn't face any issues connecting this way. Now I've switched the metastore to Unity Cata...

Latest Reply
VZLA
Databricks Employee
  • 1 kudos

@furkancelik Glad it helps. I just found this article, which I believe will clarify many of your doubts. Refer directly to the section "Accessing Databricks UC from the PySpark shell". Note that the "unity" in the configuration strings will be your UC Def...

2 More Replies
cosmicwhoop
by New Contributor
  • 798 Views
  • 1 reply
  • 0 kudos

Delta Live Tables UI - missing EVENTS

I am new to Databricks and my setup uses Microsoft Azure (Premium Tier) + Databricks. I am trying to build Delta Live Tables and don't see events; without them I am finding it hard to understand the reason for job failure. Attached are 2 screenshots: 1) ...

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Hi, if you are looking for the reason for a job failure, you can navigate to the View Details tab -> Logs to find the root cause. The blank screen with no events may appear because you have selected one of the DLT tables. You can navig...

Direo
by Contributor II
  • 4087 Views
  • 1 reply
  • 0 kudos

Liquid Clustering on a Feature Store Table Created with FeatureEngineeringClient

Hello everyone, I'm exploring ways to perform clustering on a feature store table that I've created using the FeatureEngineeringClient in Databricks, and I'm particularly interested in applying liquid clustering to one of the columns. Here's the scenar...

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Hi,

# Set the table name and clustering columns
table_name = "feature_store_table"
clustering_columns = ["column1", "column2"]

# Build the SQL command
sql_command = f"ALTER TABLE {table_name} CLUSTER BY ({', '.join(clustering_columns)})"

# Execute ...

William_Scardua
by Valued Contributor
  • 4048 Views
  • 1 reply
  • 0 kudos

Deleted File Retention Duration in Cluster not work

Hi guys, I tried to set the retention period in my cluster, but it's not working (cluster and notebook screenshots attached). It does not remove the physical files. Do you have any ideas? Thanks.

Attachments: Screenshot 2024-08-19 at 19.51.25.png, Screenshot 2024-08-19 at 19.29.59.png
Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Hi, it seems you are setting the retention period to 0 hours. You first need to set the config spark.databricks.delta.retentionDurationCheck.enabled to false, and then set the retention duration config as you did. You can do a DRY RUN to check i...
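The reply's steps can be sketched as follows; the table name and retention window are hypothetical, and the Spark calls are shown in comments since they only run in a notebook:

```python
def vacuum_command(table: str, retain_hours: int, dry_run: bool = True) -> str:
    """Build a VACUUM statement; run with DRY RUN first to preview deletions."""
    suffix = " DRY RUN" if dry_run else ""
    return f"VACUUM {table} RETAIN {retain_hours} HOURS{suffix}"

# In a notebook, relax the safety check first, then preview, then vacuum:
#   spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")
#   spark.sql(vacuum_command("my_table", 1))                  # lists files to delete
#   spark.sql(vacuum_command("my_table", 1, dry_run=False))   # actually deletes
```

Disabling retentionDurationCheck is needed only because the requested window is shorter than the default 7 days; re-enable it afterwards.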

Avinash_Narala
by Valued Contributor II
  • 4280 Views
  • 1 reply
  • 1 kudos

Resolved! Move delta tables from Dev workspace to Prod Workspace

Hi all, how can I move my Delta tables from the Dev workspace to the Prod workspace? Is there any dynamic logic code in Python to do it?

Latest Reply
Sidhant07
Databricks Employee
  • 1 kudos

To move Delta tables from a Dev workspace to a Prod workspace, you can use a combination of Delta Lake features and Databricks APIs. Here's a high-level approach with some Python code to help you get started. Method 1: using the CLONE command. For smaller...
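The CLONE route can be reduced to generating one DEEP CLONE statement per table, assuming both workspaces see the same Unity Catalog metastore. A minimal sketch with hypothetical catalog and table names (in a notebook each statement would be passed to spark.sql):

```python
def deep_clone_sql(table: str, src_catalog: str, dst_catalog: str) -> str:
    """Build a DEEP CLONE statement copying a Delta table across catalogs."""
    return (f"CREATE OR REPLACE TABLE {dst_catalog}.{table} "
            f"DEEP CLONE {src_catalog}.{table}")

tables = ["sales.orders", "sales.customers"]  # hypothetical table list
statements = [deep_clone_sql(t, "dev", "prod") for t in tables]
# In a notebook: for stmt in statements: spark.sql(stmt)
```

DEEP CLONE copies both data and metadata, so the Prod table is independent of the Dev files after the clone completes.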

narisuna
by New Contributor
  • 3855 Views
  • 1 reply
  • 0 kudos

single node Cluster CPU not fully used

Hello community, I use a cluster (single node: Standard_F64s_v2, DBR 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12)) for a job. In this job I didn't use Spark multiprocessing. Instead I use this single-node cluster as a VM and use Python multipr...

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Hi, yes, there was a maintenance release on 7th August which might have caused this issue. If you are still experiencing this issue, please file a support ticket.

JesseSchouten
by New Contributor
  • 4159 Views
  • 1 reply
  • 0 kudos

DLT issue - slow download speed in DLT clusters

Hi all, I'm encountering some issues within my DLT pipelines. Summarized: it takes a long time to install the cluster libraries and dependencies (using %pip install) due to horribly slow download speeds. These are the symptoms: - From all purpose clust...

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Hi, possible causes and solutions:
- Network configuration: the private connectivity setup might be affecting DLT clusters differently.
- Cluster configuration: ensure DLT clusters are properly sized for the workload. Consider using a larger driver node fo...

ChristianRRL
by Valued Contributor III
  • 4714 Views
  • 2 replies
  • 1 kudos

DLT Dedupping Best Practice in Medallion

Hi there, I have what may be a deceptively simple question, but I suspect it may have a variety of answers: what is the "right" place to handle dedupping in the medallion architecture? In my example, I already have everything properly laid out with data...

Latest Reply
Sidhant07
Databricks Employee
  • 1 kudos

1. Deduplication in the medallion architecture can be handled in the bronze or silver layer.
2. If keeping a complete history of all raw data, including duplicates, in the bronze layer, handle deduplication in the silver layer.
3. If not keeping a complete his...
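The silver-layer deduplication recommended above boils down to keep-latest-per-key. A minimal pure-Python sketch of that semantics, with hypothetical column names (in a real pipeline this would be dropDuplicates or a window rank over the business key):

```python
def dedupe_latest(rows, key, seq):
    """Keep only the most recent record per business key."""
    latest = {}
    for row in rows:
        k = row[key]
        # Replace the stored record whenever a newer sequence value arrives
        if k not in latest or row[seq] > latest[k][seq]:
            latest[k] = row
    return sorted(latest.values(), key=lambda r: r[key])

raw = [  # hypothetical bronze rows, duplicates included
    {"id": 1, "ts": 1, "name": "a"},
    {"id": 1, "ts": 2, "name": "a2"},
    {"id": 2, "ts": 1, "name": "b"},
]
silver = dedupe_latest(raw, "id", "ts")
```

Keeping the duplicates in bronze and applying this step in silver preserves the raw history while giving downstream consumers a clean view.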

1 More Replies
Nurota
by New Contributor II
  • 7486 Views
  • 2 replies
  • 0 kudos

Describe table extended on materialized views - UC, DLT and cluster access modes

We have a daily job with a notebook that loops through all the databases and tables, and optimizes and vacuums them. Since UC DLT tables are materialized views, the "optimize" and "vacuum" commands do not work on them, and they need to be excluded. ...

Tags: Data Engineering, cluster access mode, dlt, materialized views, optimize, Unity Catalog
Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

  The error in scenario 3 is likely due to the fact that the service principal is not an owner of the DLT pipeline that creates the materialized views. Even though the job is running on a shared cluster, the service principal still needs to be an own...

1 More Replies
DavidMooreZA
by New Contributor II
  • 5786 Views
  • 3 replies
  • 0 kudos

Structure Streaming - Table(s) to File(s) - Is it possible?

Hi, I'm trying to do something that's probably considered a no-no. The documentation makes me believe it should be possible, but I'm getting lots of weird errors when trying to make it work. If anyone has managed to get something similar to work, plea...

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Hi, Can you please share what the error stack trace looks like? One possible cause of this error is that the schema of the table you are reading from does not match the schema of the data you are writing to.

2 More Replies
Floody
by New Contributor II
  • 4927 Views
  • 1 reply
  • 0 kudos

Delta Live Tables use case

Hi all, we have the following use case and are wondering if DLT is the correct approach. Landing area with daily dumps of parquet files into our Data Lake container. The daily dump does a full overwrite of the parquet each time, keeping the same file name. T...

Tags: Data Engineering, Delta Live Tables
Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

DLT can be a good fit for your scenario, especially when implementing Slowly Changing Dimension (SCD) Type 2. Here's how you can approach it: 1. Ingestion with Auto Loader: use Auto Loader to ingest the daily parquet files ...
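The Auto Loader plus SCD Type 2 approach above can be sketched as a DLT pipeline fragment. The landing path, table name, key, and sequence column are all hypothetical, and this only runs inside a Delta Live Tables pipeline (where `dlt` and `spark` are provided):

```python
import dlt  # available only inside a DLT pipeline

@dlt.table(comment="Raw daily parquet dumps picked up by Auto Loader")
def landing():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "parquet")
        .load("/mnt/landing/daily/")  # hypothetical landing path
    )

# SCD Type 2 target: change history is maintained automatically
dlt.create_streaming_table("customers_scd2")

dlt.apply_changes(
    target="customers_scd2",
    source="landing",
    keys=["customer_id"],       # hypothetical business key
    sequence_by="ingest_time",  # ordering column for change events
    stored_as_scd_type=2,
)
```

Because the daily dump overwrites the same file name, Auto Loader's file-notification or directory-listing mode will treat each overwrite as a new version to ingest, which is exactly what the SCD2 target needs.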

Manzilla
by New Contributor II
  • 4868 Views
  • 2 replies
  • 0 kudos

Delta Live table - Adding streaming to existing table

Currently, the bronze table ingests JSON files using the @dlt.table decorator on a spark.readStream function. A daily batch job does some transformations on the bronze data and stores results in the silver table. New process: bronze stays the same; a stream has bee...

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

When you use `dlt.apply_changes` to update the silver table, it adds four hidden columns for tracking changes. These columns include `event_time`, `read_version`, `commit_version`, and `is_deleted`. When you run this process for the first time agains...

1 More Replies
