Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

huytran
by New Contributor II
  • 170 Views
  • 7 replies
  • 0 kudos

Cannot write data to BigQuery when using Databricks secret

I am following this guide on writing data to a BigQuery table. Right now, I get an error when I try to write data using a Databricks secret instead of the JSON credential file and setting the GOOGLE_APPLICATION_CREDENTIALS environment variable. java....

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

It seems that nothing is being loaded into GOOGLE_APPLICATION_CREDENTIALS. From https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/master/gcs/INSTALL.md: # The JSON keyfile of the service account used for GCS # access when google.clou...
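As an alternative to the GOOGLE_APPLICATION_CREDENTIALS environment variable, the Spark BigQuery connector also accepts the service-account JSON directly via its `credentials` option, base64-encoded, which pairs naturally with a Databricks secret. A minimal sketch; the secret scope/key, project, table, and bucket names below are illustrative:

```python
import base64

def encode_credentials(creds_json: str) -> str:
    # The connector's "credentials" option expects the service-account
    # JSON encoded as base64.
    return base64.b64encode(creds_json.encode("utf-8")).decode("utf-8")

# On Databricks (scope/key/table names are illustrative):
# creds_json = dbutils.secrets.get(scope="gcp", key="bq-service-account")
# (df.write.format("bigquery")
#    .option("credentials", encode_credentials(creds_json))
#    .option("parentProject", "my-gcp-project")
#    .option("table", "my_dataset.my_table")
#    .option("temporaryGcsBucket", "my-temp-bucket")
#    .mode("append")
#    .save())
```

This avoids writing the key to a file on the cluster at all; the secret value never needs to touch disk.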

6 More Replies
Vetrivel
by Contributor
  • 102 Views
  • 4 replies
  • 0 kudos

PowerBI performance with Databricks

We have integrated Power BI with Databricks to generate reports. However, Power BI generates over 8,000 lines of code, including numerous OR clauses, which cannot be modified at this time. This results in queries that take more than 4 minutes to execut...

Latest Reply
Vetrivel
Contributor
  • 0 kudos

Attached is the sample query generated by Power BI. Without the OR conditions the query runs within seconds.

3 More Replies
AlexSantiago
by New Contributor II
  • 2552 Views
  • 11 replies
  • 2 kudos

spotify API get token - raw_input was called, but this frontend does not support input requests.

Hello everyone, I'm trying to use Spotify's API to analyse my music data, but I'm receiving an error during authentication, specifically when I try to get the token; my code is below. Is it a Databricks bug? pip install spotipy from spotipy.oauth2 import SpotifyO...

Latest Reply
spotifymod
New Contributor
  • 2 kudos

Download the APK file and ensure it’s from a trusted source in spotify app download.

10 More Replies
TamD
by Contributor
  • 1549 Views
  • 7 replies
  • 2 kudos

How do I drop a delta live table?

I'm a newbie and I've just done the "Run your first Delta Live Tables pipeline" tutorial. The tutorial downloads a publicly available CSV baby names file and creates two new Delta Live tables from it. Now I want to be a good dev and clean up the reso...

Latest Reply
ImranA
New Contributor III
  • 2 kudos

@gchandra For example, take a table called "cars": I remove the table from the DLT pipeline and drop the table from the catalog. Now if I change the schema of the table and create the table again with the same table name "cars" through the same pipeline, why ...
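For the cleanup step the tutorial doesn't cover, the usual sequence is: remove the table's definition from the pipeline source, run a pipeline update so DLT stops managing it, then drop it from the catalog. A minimal sketch; the catalog, schema, and table names are illustrative:

```python
def drop_table_sql(catalog: str, schema: str, table: str) -> str:
    # Fully qualified DROP TABLE for the final catalog cleanup step.
    return f"DROP TABLE IF EXISTS {catalog}.{schema}.{table}"

# On Databricks, after removing the table's definition from the DLT
# pipeline source and running a pipeline update:
# spark.sql(drop_table_sql("main", "default", "baby_names_raw"))
```

If the table is later recreated through the same pipeline with a new schema, a full refresh of the pipeline is typically needed so DLT rebuilds it from scratch rather than reusing prior state.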

6 More Replies
issa
by New Contributor II
  • 155 Views
  • 3 replies
  • 1 kudos

How to access bronze dlt in silver dlt

I have a job in Workflows that runs two DLT pipelines, one for Bronze_Transaction and one for Silver_Transaction. The reason for two DLT pipelines is that I want the tables to be created in the bronze catalog and erp schema, and the silver catalog and erp...

Data Engineering
dlt
DLT pipeline
Medallion
Workflows
Latest Reply
issa
New Contributor II
  • 1 kudos

Hi @filipniziol and @ozaaditya, thank you both for your input. I changed the code, since I figured that the SCD should be on the bronze layer and that I should then filter out open rows in silver. However, that doesn't work. My idea was: Bronze layer: impor...
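The "SCD2 in bronze, open rows in silver" idea can be sketched as below. This is a hedged sketch, not the poster's actual code: the source, table, key, and timestamp names are illustrative, and the `dlt` parts only run inside a DLT pipeline, so they are shown as comments around a plain-Python illustration of the silver filter:

```python
def open_rows(rows):
    # Plain-Python illustration of the silver filter: keep SCD2 rows
    # that are still open, i.e. whose __END_AT is not yet set.
    return [r for r in rows if r.get("__END_AT") is None]

# Inside a DLT pipeline (names are illustrative):
# import dlt
# from pyspark.sql import functions as F
#
# dlt.create_streaming_table("bronze_transaction")
# dlt.apply_changes(
#     target="bronze_transaction",
#     source="raw_transaction",
#     keys=["transaction_id"],
#     sequence_by=F.col("ingest_ts"),
#     stored_as_scd_type=2,  # history kept; open rows have __END_AT IS NULL
# )
#
# @dlt.table(name="silver_transaction")
# def silver_transaction():
#     return dlt.read("bronze_transaction").where(F.col("__END_AT").isNull())
```

One wrinkle with two separate pipelines: the silver pipeline has to read the bronze table by its fully qualified catalog name rather than via `dlt.read`, since `dlt.read` only resolves tables defined in the same pipeline.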

2 More Replies
furkancelik
by New Contributor
  • 105 Views
  • 3 replies
  • 1 kudos

How to use Databricks Unity Catalog as metastore for a local spark session

Hello,I would like to access Databricks Unity Catalog from a Spark session created outside the Databricks environment. Previously, I used Hive metastore and didn’t face any issues connecting in this way. Now, I’ve switched the metastore to Unity Cata...

Latest Reply
VZLA
Databricks Employee
  • 1 kudos

@furkancelik Glad it helps. I just found this article, which I believe will clarify many of your doubts. Please go directly to the section "Accessing Databricks UC from the PySpark shell". Note that the "unity" in the configuration strings will be your UC Def...
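For reference, the configuration shape that article describes looks roughly like the sketch below. This is an assumption-heavy sketch: the connector class name (`io.unitycatalog.spark.UCSingleCatalog`) and the `/api/2.1/unity-catalog` endpoint come from the open-source Unity Catalog Spark connector and may differ for your workspace; the catalog name, URL, and token are placeholders:

```python
def uc_spark_confs(catalog_name: str, uc_uri: str, token: str) -> dict:
    # Spark conf keys for attaching an external Spark session to a
    # Unity Catalog endpoint; catalog_name is what you query against.
    return {
        f"spark.sql.catalog.{catalog_name}": "io.unitycatalog.spark.UCSingleCatalog",
        f"spark.sql.catalog.{catalog_name}.uri": uc_uri,
        f"spark.sql.catalog.{catalog_name}.token": token,
        "spark.sql.defaultCatalog": catalog_name,
    }

# Local PySpark session (values are illustrative):
# from pyspark.sql import SparkSession
# builder = SparkSession.builder.appName("uc-local")
# confs = uc_spark_confs("unity",
#                        "https://<workspace-url>/api/2.1/unity-catalog",
#                        "<personal-access-token>")
# for k, v in confs.items():
#     builder = builder.config(k, v)
# spark = builder.getOrCreate()
```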

2 More Replies
cosmicwhoop
by New Contributor
  • 87 Views
  • 1 replies
  • 0 kudos

Delta Live Tables UI - missing EVENTS

I am new to Databricks and my setup uses Microsoft Azure (Premium Tier) + Databricks. I am trying to build Delta Live Tables and don't see events; without them I find it hard to understand the reason for a job failure. Attached are 2 screenshots: 1) ...

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Hi, if you are looking for the reason for a job failure, you can navigate to the View Details tab -> Logs to find the root cause of the failure. The blank screen with no events might be caused by having one of the DLT tables selected. You can navig...

Direo
by Contributor
  • 281 Views
  • 1 replies
  • 0 kudos

Liquid Clustering on a Feature Store Table Created with FeatureEngineeringClient

Hello everyone,I'm exploring ways to perform clustering on a feature store table that I've created using the FeatureEngineeringClient in Databricks, and I'm particularly interested in applying liquid clustering to one of the columns.Here’s the scenar...

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Hi,

# Set the table name and clustering columns
table_name = "feature_store_table"
clustering_columns = ["column1", "column2"]

# Build the SQL command
sql_command = f"ALTER TABLE {table_name} CLUSTER BY ({', '.join(clustering_columns)})"

# Execute ...

drag7ter
by New Contributor III
  • 407 Views
  • 1 replies
  • 0 kudos

foreachBatch doesn't work in structured streaming

I'm trying to print out the number of rows in each batch, but it doesn't seem to work properly. I have a 1-node compute-optimized cluster and run this code in a notebook: # Logging the row count using a streaming-friendly approach def log_row_count(batch_df, ba...

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Hi, can you try this:

def log_row_count(batch_df, batch_id):
    row_count = batch_df.count()
    print(f"Batch ID {batch_id}: {row_count} rows have been processed")
    LOGGER.info(f"Batch ID {batch_id}: {row_count} rows have been processed")

ptv.w...
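A self-contained variant of that callback is sketched below (the `ptv...` tail in the reply is cut off, so the streaming write is shown only as a comment, and the stream source name is illustrative). The key point is that `foreachBatch` hands the function a regular DataFrame per micro-batch, so `count()` and ordinary logging work inside it:

```python
import logging

LOGGER = logging.getLogger("row_counter")

def log_row_count(batch_df, batch_id):
    # batch_df is the micro-batch DataFrame passed in by foreachBatch;
    # count() materializes it, which is fine for modest batch sizes.
    row_count = batch_df.count()
    msg = f"Batch ID {batch_id}: {row_count} rows have been processed"
    print(msg)        # visible in notebook / driver stdout
    LOGGER.info(msg)  # visible in the driver logs
    return msg

# On Databricks (stream source is illustrative):
# query = (df.writeStream
#            .foreachBatch(log_row_count)
#            .outputMode("append")
#            .start())
```

Note that on a running stream the prints land in the driver's stdout log rather than under the notebook cell, which is a common reason the output "seems" missing.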

William_Scardua
by Valued Contributor
  • 256 Views
  • 1 replies
  • 0 kudos

Deleted File Retention Duration in Cluster not work

Hi guys, I tried to set the retention period in my cluster, but it's not working. Cluster: Notebook: It does not remove the physical files. Do you have any ideas? Thanks

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Hi, it seems you are setting the retention period to 0 hours. You first need to set the config spark.databricks.delta.retentionDurationCheck.enabled to false, and then set the retention duration config as you did. You can do a DRY RUN to check i...
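The sequence the reply describes can be sketched as below; the table name is illustrative, and note that disabling the retention check and vacuuming with 0 hours is dangerous on tables with concurrent readers or writers:

```python
def vacuum_sql(table: str, hours: int, dry_run: bool = True) -> str:
    # VACUUM with an explicit retention window; DRY RUN only lists the
    # files that would be deleted, without removing anything.
    suffix = " DRY RUN" if dry_run else ""
    return f"VACUUM {table} RETAIN {hours} HOURS{suffix}"

# On Databricks (table name is illustrative):
# spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")
# spark.sql(vacuum_sql("my_schema.my_table", 0, dry_run=True)).show()   # inspect first
# spark.sql(vacuum_sql("my_schema.my_table", 0, dry_run=False))         # then delete
```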

Avinash_Narala
by Contributor
  • 211 Views
  • 1 replies
  • 0 kudos

Move delta tables from Dev workspace to Prod Workspace

Hi all, how can I move my Delta tables from the Dev workspace to the Prod workspace? Is there any dynamic logic code in Python to do it?

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

To move Delta tables from a Dev workspace to a Prod workspace, you can use a combination of Delta Lake features and Databricks APIs. Here's a high-level approach with some Python code to help you get started:

Method 1: Using the CLONE command
For smaller...
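A minimal sketch of the CLONE approach, assuming both workspaces share the same Unity Catalog metastore (catalog and schema names are illustrative); DEEP CLONE copies both metadata and data files, so the target is independent of the source afterwards:

```python
def deep_clone_sql(source: str, target: str) -> str:
    # DEEP CLONE copies data files and metadata into the target table.
    return f"CREATE OR REPLACE TABLE {target} DEEP CLONE {source}"

# On Databricks (names are illustrative):
# spark.sql(deep_clone_sql("dev_catalog.sales.orders", "prod_catalog.sales.orders"))
#
# To promote a whole schema, loop over its tables:
# for row in spark.sql("SHOW TABLES IN dev_catalog.sales").collect():
#     spark.sql(deep_clone_sql(f"dev_catalog.sales.{row.tableName}",
#                              f"prod_catalog.sales.{row.tableName}"))
```

If the workspaces do not share a metastore, Delta Sharing or an export/import through cloud storage would be needed instead.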

Bepposbeste1993
by New Contributor II
  • 115 Views
  • 3 replies
  • 0 kudos

Resolved! select 1 query not finishing

Hello, I have the issue that even a query like "SELECT 1" is not finishing; the SQL warehouse runs indefinitely. I have no idea where to look for any issues because in the Spark UI I can't see any error. What is interesting is that all-purpose clusters (...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @Bepposbeste1993, Do you have the case ID raised for this issue? 

2 More Replies
narisuna
by New Contributor
  • 236 Views
  • 1 replies
  • 0 kudos

single node Cluster CPU not fully used

Hello community, I use a cluster (Single node: Standard_F64s_v2 · DBR: 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12)) for a job. In this job I didn't use Spark's parallelism. Instead I use this single-node cluster as a VM and use Python multipr...
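For CPU-bound Python work on a single-node cluster, the usual pattern is a process pool sized to the core count (threads won't help because of the GIL). A minimal sketch, with a stand-in workload, for sanity-checking that all cores get used:

```python
import math
import os
from multiprocessing import Pool

def cpu_heavy(n: int) -> float:
    # Stand-in for a CPU-bound task; the real work goes here.
    return sum(math.sqrt(i) for i in range(n))

def run_parallel(inputs, processes=None):
    # One worker process per core by default; separate processes can
    # saturate all cores where Python threads cannot.
    with Pool(processes=processes or os.cpu_count()) as pool:
        return pool.map(cpu_heavy, inputs)

# Example (on a Standard_F64s_v2, os.cpu_count() should report 64):
# results = run_parallel([5_000_000] * os.cpu_count())
```

If cores still sit idle with a pattern like this, the workload may be I/O-bound or serialized on a shared resource rather than CPU-bound.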

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Hi, yes, there was a maintenance release on 7th August which might have caused this issue. If you are still experiencing this issue, please file a support ticket.

JesseSchouten
by New Contributor
  • 261 Views
  • 1 replies
  • 0 kudos

DLT issue - slow download speed in DLT clusters

Hi all, I'm encountering some issues with my DLT pipelines. Summarized: it takes a long time to install the cluster libraries and dependencies (using %pip install) due to horribly slow download speeds. These are the symptoms: - From all-purpose clust...

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Hi, possible causes and solutions:
- Network configuration: the private connectivity setup might be affecting DLT clusters differently.
- Cluster configuration: ensure DLT clusters are properly sized for the workload. Consider using a larger driver node fo...

ChristianRRL
by Valued Contributor
  • 1962 Views
  • 2 replies
  • 1 kudos

DLT Dedupping Best Practice in Medallion

Hi there, I have what may be a deceptively simple question but I suspect may have a variety of answers:What is the "right" place to handle dedupping using the medallion architecture?In my example, I already have everything properly laid out with data...

Latest Reply
Sidhant07
Databricks Employee
  • 1 kudos

1. Deduplication in the medallion architecture can be handled in the bronze or the silver layer.
2. If keeping a complete history of all raw data, including duplicates, in the bronze layer, handle deduplication in the silver layer.
3. If not keeping a complete his...
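The bronze-keeps-everything, silver-deduplicates option can be sketched as below. The plain-Python helper just illustrates the "first occurrence per business key wins" semantics; the DLT equivalent is shown as comments since `dlt` only exists inside a pipeline, and all table/column names are illustrative:

```python
def dedup_by_keys(rows, keys):
    # Keep the first occurrence of each business key, mirroring what
    # dropDuplicates(keys) does on a DataFrame.
    seen, out = set(), []
    for row in rows:
        k = tuple(row[c] for c in keys)
        if k not in seen:
            seen.add(k)
            out.append(row)
    return out

# DLT silver table (names are illustrative):
# import dlt
# @dlt.table(name="silver_events")
# def silver_events():
#     return dlt.read("bronze_events").dropDuplicates(["event_id"])
```

For streaming reads, `dropDuplicates` generally needs a watermark to bound the dedup state; for batch reads it is a straightforward one-liner.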

1 More Replies
