Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

RyanHager
by Contributor
  • 895 Views
  • 2 replies
  • 2 kudos

Resolved! Liquid Clustering and S3 Performance

Are there any performance concerns when using liquid clustering with AWS S3? I believe all the parquet files go in the same folder (prefix, in AWS S3 terms) versus folders per partition when using "partition by". And there is this note on S3 performa...

Latest Reply
iyashk-DB
Databricks Employee

Even though liquid clustering removes Hive-style partition folders, it typically doesn’t cause S3 prefix performance issues on Databricks. Delta tables don’t rely on directory listing for reads; they use the transaction log to locate exact files. In ...
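To illustrate the reply, here is a minimal sketch of defining a liquid-clustered Delta table from a Databricks notebook; the catalog, table, and column names are made up, not taken from the thread. Reads are planned from the Delta transaction log rather than by listing S3 prefixes:

# Hypothetical table; CLUSTER BY enables liquid clustering on the chosen column.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.sales.events (
        event_id BIGINT,
        event_date DATE,
        payload STRING
    )
    CLUSTER BY (event_date)
""")
# File pruning for this query uses the transaction log, not directory listing.
spark.table("main.sales.events").where("event_date = '2025-01-01'").count()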

1 More Replies
EdemSeitkh
by New Contributor III
  • 9697 Views
  • 6 replies
  • 0 kudos

Resolved! Pass catalog/schema/table name as a parameter to sql task

Hi, I am trying to pass a catalog name as a parameter into a query for a SQL task, and it gets pasted with single quotes, which results in an error. Is there a way to pass the raw value, or are there other possible workarounds? Query: INSERT INTO {{ catalog }}.pas.product_snap...

Latest Reply
detom
New Contributor II

This works:
USE CATALOG IDENTIFIER({{ catalog_name }});
USE SCHEMA IDENTIFIER({{ schema_name }});
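For anyone driving this from PySpark instead of a SQL task, a hedged sketch of the same IDENTIFIER() workaround using parameterized SQL (available on Spark 3.4+ / recent Databricks runtimes); the catalog and schema values below are placeholders:

# Placeholder values; IDENTIFIER() turns a string parameter into a resolvable name.
spark.sql("USE CATALOG IDENTIFIER(:cat)", args={"cat": "my_catalog"})
spark.sql("USE SCHEMA IDENTIFIER(:sch)", args={"sch": "my_schema"})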

  • 0 kudos
5 More Replies
Gilad-Shai
by New Contributor III
  • 1512 Views
  • 12 replies
  • 12 kudos

Resolved! Creating Serverless Cluster

Hi everyone, I am trying to create a cluster in Databricks Free Edition, but I keep getting the following error: "Cannot create serverless cluster, please try again later." I have attempted this on different days and at different times, but the issue pe...

Latest Reply
Gilad-Shai
New Contributor III

Thank you all (@Sanjeeb2024, @JAHNAVI, @Manoj12421), it works! It was not a Databricks Free Edition, as @Masood_Joukar said.

11 More Replies
Sainath368
by Contributor
  • 700 Views
  • 4 replies
  • 2 kudos

Migrating from directory-listing to Autoloader Managed File events

Hi everyone, we are currently migrating from a directory-listing-based streaming approach to managed file events in Databricks Auto Loader for processing our data in Structured Streaming. We have a function that handles structured streaming where we ar...

Latest Reply
Raman_Unifeye
Honored Contributor III

Yes, for your setup, Databricks Auto Loader will create a separate event queue for each independent stream running with the cloudFiles.useManagedFileEvents = true option. As you are running 1 stream per table, 1 unique directory per stream and 1 uni...
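For reference, a minimal sketch of one such stream with managed file events enabled (the option name is taken from the thread; the bucket paths and table names are hypothetical):

# One Auto Loader stream per table; a managed event queue is created per stream/location.
(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useManagedFileEvents", "true")
    .load("s3://my-bucket/landing/table_a/")
    .writeStream
    .option("checkpointLocation", "s3://my-bucket/_checkpoints/table_a/")
    .toTable("main.bronze.table_a"))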

3 More Replies
halsgbs
by New Contributor III
  • 539 Views
  • 3 replies
  • 2 kudos

Alerts V2 Parameters

Hi, I'm working on using the Databricks Python SDK to create an alert from a notebook, but it seems that with V1 there is no way to add subscribers, and with V2 there is no option for adding parameters. Is my understanding correct, or am I missing something? A...

Latest Reply
iyashk-DB
Databricks Employee

Alerts V2 (Public Preview) do not support query parameters yet. This is a documented limitation. Legacy alerts (V1) do support parameters and will use the default values defined in the SQL editor. For notifications, both legacy alerts and Alerts V2 a...

2 More Replies
lziolkow2
by Databricks Partner
  • 1753 Views
  • 4 replies
  • 5 kudos

Resolved! Strange DELTA_MULTIPLE_SOURCE_ROW_MATCHING_TARGET_ROW_IN_MERGE error

I use the Databricks 17.3 runtime. I try to run the following code: CREATE OR REPLACE TABLE default.target_table (key1 INT, key2 INT, key3 INT, val STRING) USING DELTA; INSERT INTO target_table(key1, key2, key3, val) VALUES(1, 1, 1, 'a'); CREATE OR REPLACE TABLE de...

Latest Reply
emma_s
Databricks Employee

Hi, you need to put all of the keys in the ON part of the clause rather than in the WHERE condition. This code works: MERGE INTO target_table AS target USING source_table AS source ON target.key1 = source.key1 AND target.key2 = source.key2 AND target...
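Since the reply is cut off, here is a hedged reconstruction of the statement it describes, with all three keys in the ON clause; the column names come from the original post, but the WHEN clauses are assumed, not quoted from the thread:

spark.sql("""
    MERGE INTO target_table AS target
    USING source_table AS source
    ON  target.key1 = source.key1
    AND target.key2 = source.key2
    AND target.key3 = source.key3
    WHEN MATCHED THEN UPDATE SET target.val = source.val
    WHEN NOT MATCHED THEN INSERT (key1, key2, key3, val)
        VALUES (source.key1, source.key2, source.key3, source.val)
""")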

3 More Replies
ganesh_raskar
by Databricks Partner
  • 872 Views
  • 5 replies
  • 0 kudos

Installing Custom Packages on Serverless Compute via Databricks Connect

I have a custom Python package that provides a PySpark DataSource implementation. I'm using Databricks Connect (16.4.10) and need to understand package installation options for serverless compute. Works: traditional compute cluster, custom package pre-i...

Labels: data-engineering, databricks-connect
Latest Reply
Sanjeeb2024
Valued Contributor

Hi @ganesh_raskar - If you can provide which custom package and exact code and error, I can try to replicate at my end and explore the suitable option. 

4 More Replies
Anonymous
by Not applicable
  • 23091 Views
  • 9 replies
  • 17 kudos

Resolved! MetadataChangedException

A Delta Lake table is created with an identity column, and I'm not able to load the data in parallel from four processes. I'm getting the metadata exception error. I don't want to load the data into a temp table. I need to load directly and in parallel into the Delta...

Latest Reply
lprevost
Contributor III

I'm also having the same problem. I'm using Auto Loader to load many files into a Delta table with an identity column. What used to work now dies with this problem, after running for a long time!

8 More Replies
siva_pusarla
by Databricks Partner
  • 1194 Views
  • 6 replies
  • 0 kudos

workspace notebook path not recognized by dbutils.notebook.run() when running from a workflow/job

result = dbutils.notebook.run("/Workspace/YourFolder/NotebookA", timeout_seconds=600, arguments={"param1": "value1"})
print(result)
I was able to execute the above code manually from a notebook. But when I run the same notebook as a job, it fails stat...

Latest Reply
siva-anantha
Databricks Partner

@siva_pusarla: We use the following pattern and it works:
1) Calling notebook - constant location used by the job.
    + src/framework
        + notebook_executor.py
2) Callee notebooks - dynamic
    + src/app/notebooks
    ...
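A minimal sketch of the executor pattern described above, assuming notebook_executor.py sits at a fixed workspace path and receives the callee notebook's absolute path as a job parameter; the parameter and path names here are made up:

# notebook_executor.py (fixed location, e.g. under /Workspace/src/framework/)
callee_path = dbutils.widgets.get("notebook_path")  # hypothetical job parameter
result = dbutils.notebook.run(
    callee_path,                      # absolute workspace path of the callee
    timeout_seconds=600,
    arguments={"param1": "value1"},
)
print(result)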

5 More Replies
Gaurav_784295
by New Contributor III
  • 4089 Views
  • 4 replies
  • 1 kudos

pyspark.sql.utils.AnalysisException: Non-time-based windows are not supported on streaming DataFrames/Datasets

pyspark.sql.utils.AnalysisException: Non-time-based windows are not supported on streaming DataFrames/Datasets. Getting this error while writing; can anyone please tell me how we can resolve it?

Latest Reply
siva-anantha
Databricks Partner

I share the same perspective as @preetmdata on this

3 More Replies
dj4
by New Contributor II
  • 1210 Views
  • 4 replies
  • 2 kudos

Azure Databricks UI consuming way too much memory & laggy

This especially happens when the notebook is large, with many cells. Even if I clear all the outputs, scrolling the notebook is way too laggy. When I start running the code, the memory consumption is 3-4 GB minimum, even if I am not displaying any data/ta...

Latest Reply
siva-anantha
Databricks Partner

@dj4: Are you in a corporate proxy environment? The Databricks browser UI uses WebSockets, and sometimes performance issues happen because of security checks on the traffic.

3 More Replies
jeremy98
by Honored Contributor
  • 4138 Views
  • 12 replies
  • 1 kudos

restarting an always-running cluster doesn't free the memory?

Hello community, I was working on optimising the driver memory, since there is code that is not optimised for Spark, and I was planning temporarily to restart the cluster to free up the memory. That could be a potential solution, since if the cluster i...

Latest Reply
siva-anantha
Databricks Partner

@jeremy98: Please review the cluster's event logs to understand the trend of the GC-related issues (example in the snapshot below). Typically, production jobs are executed using job clusters, and they stop as soon as the work is completed. Could you pleas...

11 More Replies
amekojc
by New Contributor II
  • 411 Views
  • 1 reply
  • 1 kudos

How to hide tab headers when embedding a dashboard

When embedding an AI/BI dashboard, is there a way to hide the tabs and instead use our own UI tabs for navigation? Currently, there are two tab headers: one in the Databricks dashboard and another tab section in our embedding webp...

Latest Reply
mukul1409
Contributor II

Hi @amekojc, at the moment Databricks AI/BI dashboards do not support hiding or disabling the native dashboard tabs when embedding. The embedded dashboard always renders with its own tab headers, and there is no configuration or API to control tab vi...

libpekin
by New Contributor II
  • 559 Views
  • 2 replies
  • 2 kudos

Resolved! Databricks Free Edition - Accessing files in S3

Hello, I'm attempting to read/write files from S3 but got the error below. I am on the Free Edition (serverless by default). I'm using access_key and secret_key. Has anyone done this successfully? Thanks! Directly accessing the underlying Spark driver JVM us...

Latest Reply
libpekin
New Contributor II

Thanks @Sanjeeb2024, I was able to confirm as well.

1 More Replies
espenol
by Databricks Partner
  • 29612 Views
  • 11 replies
  • 13 kudos

input_file_name() not supported in Unity Catalog

Hey, so our notebooks reading a bunch of JSON files from storage typically use input_file_name() when moving from raw to bronze, but after upgrading to Unity Catalog we get an error message: AnalysisException: [UC_COMMAND_NOT_SUPPORTED] input_file_n...

Latest Reply
ramanpreet
New Contributor II

The reason 'input_file_name' is not supported is that this function was only available in older versions of the Databricks Runtime; it was deprecated from Databricks Runtime 13.3 LTS onwards.
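The commonly suggested replacement on Unity Catalog is the _metadata column exposed by file-based sources; a minimal sketch (the bucket path and alias are hypothetical):

from pyspark.sql.functions import col

# _metadata.file_path stands in for input_file_name() on DBR 13.3+ / Unity Catalog.
df = (spark.read.format("json")
      .load("s3://my-bucket/raw/")
      .select("*", col("_metadata.file_path").alias("source_file")))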

10 More Replies