Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

huytran
by New Contributor III
  • 4147 Views
  • 7 replies
  • 0 kudos

Cannot write data to BigQuery when using Databricks secret

I am following this guide on writing data to a BigQuery table. Right now, I get an error when I try to write data using a Databricks secret instead of the JSON credential file and setting the GOOGLE_APPLICATION_CREDENTIALS environment variable. java....

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

It seems that nothing is being loaded into GOOGLE_APPLICATION_CREDENTIALS. From https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/master/gcs/INSTALL.md:

# The JSON keyfile of the service account used for GCS
# access when google.clou...
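To make that concrete, here is a minimal sketch of passing a service-account key from a Databricks secret to the Spark BigQuery connector as a base64-encoded `credentials` option, rather than relying on GOOGLE_APPLICATION_CREDENTIALS. The secret scope/key names, project, and table are placeholders, and the commented calls assume a Databricks notebook session:

```python
import base64

# Hypothetical: fetch the service-account JSON from a Databricks secret.
# creds_json = dbutils.secrets.get(scope="gcp", key="bq-service-account")
creds_json = '{"type": "service_account"}'  # placeholder used for illustration

# The Spark BigQuery connector accepts the key as a base64-encoded string
# via the "credentials" option, avoiding GOOGLE_APPLICATION_CREDENTIALS.
creds_b64 = base64.b64encode(creds_json.encode("utf-8")).decode("utf-8")

# df.write.format("bigquery") \
#     .option("credentials", creds_b64) \
#     .option("parentProject", "my-gcp-project") \
#     .option("table", "my-gcp-project.my_dataset.my_table") \
#     .save()
```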

6 More Replies
Vetrivel
by Contributor
  • 1239 Views
  • 4 replies
  • 0 kudos

PowerBI performance with Databricks

We have integrated PowerBI with Databricks to generate reports. However, PowerBI generates over 8,000 lines of code, including numerous OR clauses, which cannot be modified at this time. This results in queries that take more than 4 minutes to execut...

Latest Reply
Vetrivel
Contributor
  • 0 kudos

Attached is the sample query generated by Power BI. Without the OR conditions the query runs within seconds.

3 More Replies
TamD
by Contributor
  • 5313 Views
  • 7 replies
  • 2 kudos

How do I drop a delta live table?

I'm a newbie and I've just done the "Run your first Delta Live Tables pipeline" tutorial. The tutorial downloads a publicly available CSV baby names file and creates two new Delta Live tables from it. Now I want to be a good dev and clean up the reso...

Latest Reply
ImranA
Contributor
  • 2 kudos

@gchandra For example, take a table called "cars": I remove the table from the DLT pipeline and drop it from the catalog. Now if I change the schema of the table and create the table again with the same name "cars" through the same pipeline, why ...

6 More Replies
furkancelik
by New Contributor II
  • 2432 Views
  • 3 replies
  • 1 kudos

How to use Databricks Unity Catalog as metastore for a local spark session

Hello, I would like to access Databricks Unity Catalog from a Spark session created outside the Databricks environment. Previously, I used the Hive metastore and didn't face any issues connecting in this way. Now, I've switched the metastore to Unity Cata...

Latest Reply
VZLA
Databricks Employee
  • 1 kudos

@furkancelik Glad it helps. I just found this article which I believe will clarify many of your doubts. Please go straight to the section "Accessing Databricks UC from the PySpark shell". Notice that the "unity" in the configuration strings will be your UC Def...
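As a rough sketch of the kind of configuration such an article describes, the snippet below points a local Spark session at a Unity Catalog endpoint. The property keys follow the open-source unitycatalog-spark connector and are assumptions here; the workspace URL and token are placeholders, and "unity" is the catalog name referenced in the reply:

```python
# Hypothetical configuration for attaching a local Spark session to a
# Unity Catalog endpoint; verify the exact keys for your setup.
uc_conf = {
    "spark.sql.catalog.unity": "io.unitycatalog.spark.UCSingleCatalog",
    "spark.sql.catalog.unity.uri": "https://<workspace-url>/api/2.1/unity-catalog",
    "spark.sql.catalog.unity.token": "<personal-access-token>",  # placeholder
}

# from pyspark.sql import SparkSession
# builder = SparkSession.builder.appName("uc-local")
# for key, value in uc_conf.items():
#     builder = builder.config(key, value)
# spark = builder.getOrCreate()
# spark.sql("SHOW SCHEMAS IN unity").show()
```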

2 More Replies
cosmicwhoop
by New Contributor
  • 504 Views
  • 1 reply
  • 0 kudos

Delta Live Tables UI - missing EVENTS

I am new to Databricks and my setup uses Microsoft Azure (Premium Tier) + Databricks. I am trying to build Delta Live Tables and don't see events; without them I am finding it hard to understand the reason for the job failure. Attached are 2 screenshots: 1) ...

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Hi, if you are looking for the reason for the job failure, you can navigate to the View Details tab -> Logs to figure out the root cause of the failure. The blank screen with no events might be because you have selected one of the DLT tables. You can navig...

Direo
by Contributor II
  • 2324 Views
  • 1 reply
  • 0 kudos

Liquid Clustering on a Feature Store Table Created with FeatureEngineeringClient

Hello everyone, I'm exploring ways to perform clustering on a feature store table that I've created using the FeatureEngineeringClient in Databricks, and I'm particularly interested in applying liquid clustering to one of the columns. Here's the scenar...

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Hi,

# Set the table name and clustering columns
table_name = "feature_store_table"
clustering_columns = ["column1", "column2"]

# Build the SQL command
sql_command = f"ALTER TABLE {table_name} CLUSTER BY ({', '.join(clustering_columns)})"

# Execute ...
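A completed version of that snippet might look like the following (the table and column names are placeholders; the final call assumes an active Spark session on Databricks):

```python
# Set the table name and clustering columns (placeholders)
table_name = "feature_store_table"
clustering_columns = ["column1", "column2"]

# Build the ALTER TABLE ... CLUSTER BY command used for liquid clustering
sql_command = f"ALTER TABLE {table_name} CLUSTER BY ({', '.join(clustering_columns)})"

# Execute it on Databricks (requires an active Spark session):
# spark.sql(sql_command)
```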

drag7ter
by Contributor
  • 2590 Views
  • 1 reply
  • 0 kudos

foreachBatch doesn't work in structured streaming

I'm trying to print out the number of rows in each batch, but it doesn't seem to work properly. I have a 1-node compute-optimized cluster and run this code in a notebook:

# Logging the row count using a streaming-friendly approach
def log_row_count(batch_df, ba...

Capture.PNG
Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Hi, can you try this:

def log_row_count(batch_df, batch_id):
    row_count = batch_df.count()
    print(f"Batch ID {batch_id}: {row_count} rows have been processed")
    LOGGER.info(f"Batch ID {batch_id}: {row_count} rows have been processed")

ptv.w...
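Filling this out, a self-contained version of that callback might look like the sketch below (the `ptv` reference from the truncated reply is omitted; the commented writeStream call assumes a streaming DataFrame `df`):

```python
import logging

LOGGER = logging.getLogger("batch_logger")

def log_row_count(batch_df, batch_id):
    # Count the rows in this micro-batch and log the result
    row_count = batch_df.count()
    message = f"Batch ID {batch_id}: {row_count} rows have been processed"
    print(message)
    LOGGER.info(message)
    return row_count

# Attach the callback to a streaming query (requires a streaming DataFrame df):
# df.writeStream.foreachBatch(log_row_count).start()
```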

William_Scardua
by Valued Contributor
  • 2266 Views
  • 1 reply
  • 0 kudos

Deleted File Retention Duration in Cluster not work

Hi guys,
I tried to set the retention period in my cluster, but it's not working.
Cluster:
Notebook:
It does not remove the physical files. Do you have any ideas? Thanks

Screenshot 2024-08-19 at 19.51.25.png Screenshot 2024-08-19 at 19.29.59.png
Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Hi, it seems you are setting the retention period to 0 hours. You first need to set the config spark.databricks.delta.retentionDurationCheck.enabled to false, and then set the retention duration config as you did. You can do a DRY RUN to check i...
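As a sketch of that sequence (the table name is a placeholder; both commented calls assume a Databricks Spark session, and DRY RUN only lists the files that would be deleted):

```python
# The safety check blocks retention periods shorter than the default;
# it must be disabled before a 0-hour VACUUM will run.
check_conf = "spark.databricks.delta.retentionDurationCheck.enabled"
# spark.conf.set(check_conf, "false")

# Preview which files VACUUM would delete before actually deleting them
vacuum_sql = "VACUUM my_table RETAIN 0 HOURS DRY RUN"
# spark.sql(vacuum_sql)
```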

Avinash_Narala
by Valued Contributor II
  • 2362 Views
  • 1 reply
  • 1 kudos

Resolved! Move delta tables from Dev workspace to Prod Workspace

Hi all, how can I move my Delta tables from the Dev workspace to the Prod workspace? Is there any dynamic Python code to do it?

Latest Reply
Sidhant07
Databricks Employee
  • 1 kudos

To move Delta tables from a Dev workspace to a Prod workspace, you can use a combination of Delta Lake features and Databricks APIs. Here's a high-level approach with some Python code to help you get started:

Method 1: Using the CLONE command
For smaller...
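For the CLONE method, a minimal sketch of copying a list of tables with DEEP CLONE might look like this (the catalog, schema, and table names are placeholders; the commented spark.sql calls assume a workspace that can read the Dev tables):

```python
# Hypothetical table list and three-level names for Dev and Prod
tables = ["sales", "customers"]

clone_cmds = [
    f"CREATE OR REPLACE TABLE prod_catalog.default.{t} "
    f"DEEP CLONE dev_catalog.default.{t}"
    for t in tables
]

# for cmd in clone_cmds:
#     spark.sql(cmd)  # requires access to both catalogs
```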

narisuna
by New Contributor
  • 2204 Views
  • 1 reply
  • 0 kudos

single node Cluster CPU not fully used

Hello community, I use a cluster (single node: Standard_F64s_v2 · DBR 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12)) for a job. In this job I don't use Spark multiprocessing. Instead I use this single-node cluster as a VM and use Python multipr...

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Hi, yes, there was a maintenance release on 7th August which might have caused this issue. If you are still experiencing this issue, please file a support ticket.

JesseSchouten
by New Contributor
  • 2420 Views
  • 1 reply
  • 0 kudos

DLT issue - slow download speed in DLT clusters

Hi all, I'm encountering some issues within my DLT pipelines. Summarized: it takes a long time to install the cluster libraries and dependencies (using %pip installs) due to horribly slow download speeds. These are the symptoms:
- From all-purpose clust...

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Hi,

Possible Causes and Solutions:
- Network Configuration: The private connectivity setup might be affecting DLT clusters differently.
- Cluster Configuration: Ensure DLT clusters are properly sized for the workload. Consider using a larger driver node fo...

ChristianRRL
by Valued Contributor II
  • 3795 Views
  • 2 replies
  • 1 kudos

DLT Dedupping Best Practice in Medallion

Hi there, I have what may be a deceptively simple question, but I suspect it may have a variety of answers: what is the "right" place to handle deduplication in the medallion architecture? In my example, I already have everything properly laid out with data...

Latest Reply
Sidhant07
Databricks Employee
  • 1 kudos

1. Deduplication in the medallion architecture can be handled in the bronze or silver layer.
2. If keeping a complete history of all raw data, including duplicates, in the bronze layer, handle deduplication in the silver layer.
3. If not keeping a complete his...
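In PySpark the silver-layer step usually comes down to something like `dropDuplicates(["event_id"])` on the bronze data; the pure-Python sketch below, with hypothetical records and key names, just illustrates the keep-first-occurrence semantics that such a step applies:

```python
# Hypothetical bronze records containing a duplicate key
bronze_records = [
    {"event_id": 1, "value": "a"},
    {"event_id": 1, "value": "a-duplicate"},
    {"event_id": 2, "value": "b"},
]

# Keep the first occurrence per key, mirroring dropDuplicates(["event_id"])
seen = {}
for record in bronze_records:
    seen.setdefault(record["event_id"], record)
silver_records = list(seen.values())
```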

1 More Replies
Nurota
by New Contributor II
  • 5221 Views
  • 2 replies
  • 0 kudos

Describe table extended on materialized views - UC, DLT and cluster access modes

We have a daily job with a notebook that loops through all the databases and tables, and optimizes and vacuums them. Since in UC DLT tables are materialized views, the "optimize" or "vacuum" commands do not work on them, and they need to be excluded. ...

Data Engineering
cluster access mode
dlt
materialized views
optimize
Unity Catalog
Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

The error in scenario 3 is likely because the service principal is not an owner of the DLT pipeline that creates the materialized views. Even though the job is running on a shared cluster, the service principal still needs to be an own...

1 More Replies
DavidMooreZA
by New Contributor II
  • 4022 Views
  • 3 replies
  • 0 kudos

Structure Streaming - Table(s) to File(s) - Is it possible?

Hi, I'm trying to do something that's probably considered a no-no. The documentation makes me believe it should be possible, but I'm getting lots of weird errors when trying to make it work. If anyone has managed to get something similar working, plea...

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Hi, can you please share what the error stack trace looks like? One possible cause of this error is that the schema of the table you are reading from does not match the schema of the sink you are writing to.

2 More Replies
Floody
by New Contributor II
  • 3194 Views
  • 1 reply
  • 0 kudos

Delta Live Tables use case

Hi all, we have the following use case and are wondering if DLT is the correct approach. A landing area receives daily dumps of parquet files into our Data Lake container. The daily dump does a full overwrite of the parquet each time, keeping the same file name. T...

Data Engineering
Delta Live Tables
Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Using DLT for Your Use Case

DLT can be a good fit for your scenario, especially when implementing Slowly Changing Dimension (SCD) Type 2. Here's how you can approach this:

Ingestion with Auto Loader: Use Auto Loader to ingest the daily parquet files ...
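As a rough sketch of the SCD Type 2 step, the DLT `apply_changes` API takes roughly the arguments below. `dlt` is only importable inside a Databricks DLT pipeline, so those calls are shown commented out, and the table and column names are placeholders:

```python
# Hypothetical SCD Type 2 settings for the DLT apply_changes API
scd_config = {
    "target": "silver_customers",      # streaming table to maintain
    "source": "bronze_customers",      # Auto Loader-fed bronze table
    "keys": ["customer_id"],           # business key(s)
    "sequence_by": "ingest_time",      # ordering column for changes
    "stored_as_scd_type": 2,           # keep full history rows
}

# Inside a DLT pipeline:
# import dlt
# dlt.create_streaming_table(scd_config["target"])
# dlt.apply_changes(**scd_config)
```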

