Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

huytran
by New Contributor II
  • 283 Views
  • 7 replies
  • 0 kudos

Cannot write data to BigQuery when using Databricks secret

I am following this guide on writing data to a BigQuery table. Right now, I get an error when I try to write data using a Databricks secret instead of the JSON credential file and setting the GOOGLE_APPLICATION_CREDENTIALS environment variable. java....

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

It seems that nothing is being loaded into GOOGLE_APPLICATION_CREDENTIALS. From https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/master/gcs/INSTALL.md:

# The JSON keyfile of the service account used for GCS
# access when google.clou...
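A minimal sketch of the secret-based approach (the scope, key, table, and bucket names are assumptions): the spark-bigquery-connector accepts the service-account key base64-encoded via its "credentials" option, which avoids GOOGLE_APPLICATION_CREDENTIALS and the keyfile entirely.

```python
import base64

def bigquery_credentials_option(creds_json: str) -> str:
    # Base64-encode the raw service-account JSON key so it can be passed to
    # the spark-bigquery-connector's "credentials" option (no keyfile or
    # GOOGLE_APPLICATION_CREDENTIALS environment variable needed).
    return base64.b64encode(creds_json.encode("utf-8")).decode("utf-8")

# In a Databricks notebook (hypothetical scope/key/table names):
# creds_b64 = bigquery_credentials_option(
#     dbutils.secrets.get(scope="gcp", key="bq-service-account"))
# (df.write.format("bigquery")
#    .option("credentials", creds_b64)
#    .option("table", "my_project.my_dataset.my_table")
#    .option("temporaryGcsBucket", "my-temp-bucket")
#    .mode("append")
#    .save())
```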

6 More Replies
Vetrivel
by Contributor
  • 205 Views
  • 4 replies
  • 0 kudos

PowerBI performance with Databricks

We have integrated Power BI with Databricks to generate reports. However, Power BI generates over 8,000 lines of code, including numerous OR clauses, which cannot be modified at this time. This results in queries that take more than 4 minutes to execut...

Latest Reply
Vetrivel
Contributor
  • 0 kudos

Attached is a sample query generated by Power BI. Without the OR conditions, the query runs within seconds.

3 More Replies
ismaelhenzel
by Contributor
  • 251 Views
  • 1 replies
  • 5 kudos

Delta Live Tables - materialized view does not update incrementally!

I'm very disappointed with this framework. The documentation is inadequate, and it has many limitations. I want to run materialized views with incremental updates, but DLT insists on performing a full recompute. Why is it doing this? Here is the log ...

Latest Reply
lucassvrielink
New Contributor II
  • 5 kudos

I'm dealing with the same problem. It doesn't make any sense to make a feature that should make our DLT jobs faster unusable in every context. Is there an explanation for this? In my view, even if the incremental process would be 1.1x faster it should option ...

TamD
by Contributor
  • 1693 Views
  • 7 replies
  • 2 kudos

How do I drop a delta live table?

I'm a newbie and I've just done the "Run your first Delta Live Tables pipeline" tutorial. The tutorial downloads a publicly available CSV baby names file and creates two new Delta Live tables from it. Now I want to be a good dev and clean up the reso...
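For the simple tutorial cleanup case, one hedged sketch (the table names below are guesses based on the tutorial, so adjust them to your catalog): remove the table definitions from the pipeline's source notebook, update or delete the pipeline so it releases the tables, and then drop them like ordinary Unity Catalog tables.

```sql
-- Hypothetical names; run after the pipeline no longer manages these tables.
DROP TABLE IF EXISTS main.default.baby_names_raw;
DROP TABLE IF EXISTS main.default.baby_names_prepared;
```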

Latest Reply
ImranA
New Contributor III
  • 2 kudos

@gchandra For example, take a table called "cars": if I remove the table from the DLT pipeline and drop the table from the catalog, and I then change the schema of the table and create the table again with the same name "cars" through the same pipeline, why ...

6 More Replies
issa
by New Contributor II
  • 243 Views
  • 7 replies
  • 1 kudos

How to access a bronze DLT table in a silver DLT pipeline

I have a job in Workflows that runs two DLT pipelines, one for Bronze_Transaction and one for Silver_Transaction. The reason for two DLT pipelines is that I want the tables to be created in the bronze catalog and erp schema, and the silver catalog and erp...

Data Engineering
dlt
DLT pipeline
Medallion
Workflows
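One way to sketch the cross-pipeline read (catalog, schema, and table names are assumptions): dlt.read() only resolves tables defined in the same pipeline, so the silver pipeline reads the bronze table by its fully qualified Unity Catalog name instead.

```python
def fq_name(catalog: str, schema: str, table: str) -> str:
    # Build a fully qualified Unity Catalog table name: catalog.schema.table
    return f"{catalog}.{schema}.{table}"

# In the silver pipeline's notebook (sketch; names are hypothetical):
# import dlt
# @dlt.table(name="silver_transaction")
# def silver_transaction():
#     # Read the bronze table produced by the *other* pipeline by its
#     # catalog-qualified name rather than dlt.read().
#     return spark.read.table(fq_name("bronze", "erp", "bronze_transaction"))
```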
Latest Reply
filipniziol
Contributor III
  • 1 kudos

Hi @issa, also, could you share which Databricks Runtime is used to run your DLT pipeline?

6 More Replies
furkancelik
by New Contributor II
  • 176 Views
  • 3 replies
  • 1 kudos

How to use Databricks Unity Catalog as metastore for a local spark session

Hello, I would like to access Databricks Unity Catalog from a Spark session created outside the Databricks environment. Previously, I used the Hive metastore and didn't face any issues connecting this way. Now, I've switched the metastore to Unity Cata...

Latest Reply
VZLA
Databricks Employee
  • 1 kudos

@furkancelik Glad it helps. I just found this article, which I believe will clarify many of your doubts. Please go straight to the section "Accessing Databricks UC from the PySpark shell". Notice the "unity" in the configuration strings will be your UC Def...
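For reference, a hedged sketch of the kind of Spark configuration such setups use (the workspace URL, token placeholder, and connector version are assumptions; "unity" here is the catalog alias the reply refers to):

```
spark.jars.packages             io.unitycatalog:unitycatalog-spark_2.12:0.2.0
spark.sql.catalog.unity         io.unitycatalog.spark.UCSingleCatalog
spark.sql.catalog.unity.uri     https://<workspace-url>/api/2.1/unity-catalog
spark.sql.catalog.unity.token   <personal-access-token>
spark.sql.defaultCatalog        unity
```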

2 More Replies
cosmicwhoop
by New Contributor
  • 167 Views
  • 1 replies
  • 0 kudos

Delta Live Tables UI - missing EVENTS

I am new to Databricks; my setup uses Microsoft Azure (Premium Tier) + Databricks. I am trying to build Delta Live Tables but don't see events, and without them I find it hard to understand the reason for a job failure. Attached are 2 screenshots: 1) ...

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Hi, if you are looking for the reason for a job failure, you can navigate to the View Details tab -> Logs to find the root cause of the failure. The blank screen with no events may appear because you have selected one of the DLT tables. You can navig...

Direo
by Contributor
  • 309 Views
  • 1 replies
  • 0 kudos

Liquid Clustering on a Feature Store Table Created with FeatureEngineeringClient

Hello everyone, I'm exploring ways to perform clustering on a feature store table that I've created using the FeatureEngineeringClient in Databricks, and I'm particularly interested in applying liquid clustering to one of the columns. Here's the scenar...

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Hi,

# Set the table name and clustering columns
table_name = "feature_store_table"
clustering_columns = ["column1", "column2"]

# Build the SQL command
sql_command = f"ALTER TABLE {table_name} CLUSTER BY ({', '.join(clustering_columns)})"

# Execute ...

drag7ter
by New Contributor III
  • 421 Views
  • 1 replies
  • 0 kudos

foreachBatch doesn't work in structured streaming

I'm trying to print out the number of rows in each batch, but it doesn't seem to work properly. I have a single-node compute-optimized cluster and run this code in a notebook: # Logging the row count using a streaming-friendly approach def log_row_count(batch_df, ba...

Capture.PNG
Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Hi, can you try this:

def log_row_count(batch_df, batch_id):
    row_count = batch_df.count()
    print(f"Batch ID {batch_id}: {row_count} rows have been processed")
    LOGGER.info(f"Batch ID {batch_id}: {row_count} rows have been processed")

ptv.w...
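A self-contained sketch of wiring such a logger into the stream (the source table and checkpoint path are assumptions). Note that on some runtimes, print output from inside foreachBatch lands in the driver's log output rather than the notebook cell, so the messages may only be visible under the cluster's driver logs.

```python
def log_row_count(batch_df, batch_id):
    # Inside foreachBatch, batch_df is a normal (non-streaming) DataFrame,
    # so calling .count() on it is allowed.
    row_count = batch_df.count()
    print(f"Batch ID {batch_id}: {row_count} rows have been processed")

# Hooking it up (sketch; source table and checkpoint path are hypothetical):
# (spark.readStream.table("my_catalog.my_schema.events")
#      .writeStream
#      .foreachBatch(log_row_count)
#      .option("checkpointLocation", "/tmp/checkpoints/row_count_demo")
#      .start())
```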

William_Scardua
by Valued Contributor
  • 272 Views
  • 1 replies
  • 0 kudos

Deleted File Retention Duration setting in cluster does not work

Hi guys, I tried to set the retention period in my cluster, but it's not working.
Cluster:
Notebook:
It does not remove the physical files. Do you have any ideas? Thanks

Screenshot 2024-08-19 at 19.51.25.png Screenshot 2024-08-19 at 19.29.59.png
Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Hi, it seems you are setting the retention period to 0 hours. You first need to set the config spark.databricks.delta.retentionDurationCheck.enabled to false, and then set the retention duration config as you did. You can do a DRY RUN to check i...
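Spelled out as SQL, the sequence the reply describes looks roughly like this (the table name is a placeholder):

```sql
-- Delta blocks retention below the default 168 hours unless this check is off.
SET spark.databricks.delta.retentionDurationCheck.enabled = false;

-- Preview which files would be removed, without deleting anything.
VACUUM my_catalog.my_schema.my_table RETAIN 0 HOURS DRY RUN;

-- Then run the actual vacuum.
VACUUM my_catalog.my_schema.my_table RETAIN 0 HOURS;
```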

Avinash_Narala
by Contributor
  • 225 Views
  • 1 replies
  • 0 kudos

Move delta tables from Dev workspace to Prod Workspace

Hi all, how can I move my Delta tables from a Dev workspace to a Prod workspace? Is there any dynamic logic code in Python to do it?

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

To move Delta tables from a Dev workspace to a Prod workspace, you can use a combination of Delta Lake features and Databricks APIs. Here's a high-level approach with some Python code to help you get started:
Method 1: Using the CLONE command
For smaller...
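For the clone-based method, a minimal sketch (the catalog and table names are assumptions; it presumes both workspaces share a Unity Catalog metastore, or that the Prod workspace can read the Dev table's storage location):

```sql
-- Deep clone copies the data files; shallow clone would only reference them.
CREATE OR REPLACE TABLE prod_catalog.sales.orders
DEEP CLONE dev_catalog.sales.orders;
```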

Bepposbeste1993
by New Contributor II
  • 176 Views
  • 3 replies
  • 0 kudos

Resolved! select 1 query not finishing

Hello, I have an issue where even a query like "select 1" does not finish. The SQL warehouse runs infinitely. I have no idea where to look for issues because I can't see any error in the Spark UI. What is interesting is that all-purpose clusters also (...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @Bepposbeste1993, Do you have the case ID raised for this issue? 

2 More Replies
narisuna
by New Contributor
  • 255 Views
  • 1 replies
  • 0 kudos

single node Cluster CPU not fully used

Hello community, I use a cluster (single node: Standard_F64s_v2, DBR 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12)) for a job. In this job I didn't use Spark parallelism; instead I use this single-node cluster as a VM and use Python multipr...

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Hi, yes, there was a maintenance release on 7th August which might have caused this issue. If you are still experiencing it, please file a support ticket.

JesseSchouten
by New Contributor
  • 281 Views
  • 1 replies
  • 0 kudos

DLT issue - slow download speed in DLT clusters

Hi all, I'm encountering some issues with my DLT pipelines. In summary: it takes a long time to install the cluster libraries and dependencies (using %pip installs) due to horribly slow download speeds. These are the symptoms: - From all-purpose clust...

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Hi, possible causes and solutions:
- Network configuration: the private connectivity setup might be affecting DLT clusters differently.
- Cluster configuration: ensure DLT clusters are properly sized for the workload. Consider using a larger driver node fo...

ChristianRRL
by Valued Contributor
  • 2007 Views
  • 2 replies
  • 1 kudos

DLT Dedupping Best Practice in Medallion

Hi there, I have what may be a deceptively simple question, but I suspect it may have a variety of answers: What is the "right" place to handle dedupping in the medallion architecture? In my example, I already have everything properly laid out with data...

Latest Reply
Sidhant07
Databricks Employee
  • 1 kudos

1. Deduplication in the medallion architecture can be handled in the bronze or silver layer.
2. If keeping a complete history of all raw data, including duplicates, in the bronze layer, handle deduplication in the silver layer.
3. If not keeping a complete his...
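As an illustration of handling deduplication in the silver layer, here is a keep-latest-per-key dedup, first in plain Python and then the PySpark equivalent as comments (the table and column names are hypothetical):

```python
def dedup_latest(rows, key="order_id", ts="ingested_at"):
    # Keep only the most recent record per business key.
    latest = {}
    for r in rows:
        if r[key] not in latest or r[ts] > latest[r[key]][ts]:
            latest[r[key]] = r
    return list(latest.values())

# PySpark equivalent inside the silver table definition (sketch):
# from pyspark.sql import functions as F, Window
# w = Window.partitionBy("order_id").orderBy(F.col("ingested_at").desc())
# silver_df = (spark.read.table("bronze.sales.orders_raw")
#                .withColumn("rn", F.row_number().over(w))
#                .where("rn = 1")
#                .drop("rn"))
```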

1 More Replies
