Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Bob-
by New Contributor II
  • 3022 Views
  • 3 replies
  • 4 kudos

Resolved! Upload Screenshot

I am new to the Databricks Free Edition. I am trying to upload a screenshot to be able to put it in a table and run some AI functions against it. It is not letting me upload a .png file. After several attempts I am being told that the root cause is p...

Latest Reply
Sharanya13
Contributor III
  • 4 kudos

@Bob- Can you explain your use case? I'm not sure I understand "I am trying to upload a screenshot to be able to put it in a table and run some AI functions against it." Are you trying to perform OCR?

2 More Replies
Phani1
by Databricks MVP
  • 3491 Views
  • 4 replies
  • 2 kudos

Potential Challenges of Using Iceberg Format (Databricks + Iceberg)

Hi Team, what are the potential challenges of using the Iceberg format instead of Delta for saving data in Databricks? Regards, Phani

Latest Reply
sridharplv
Valued Contributor II
  • 2 kudos

Hi @Phani1, please see the link below, which explains how to maintain Iceberg metadata alongside Delta metadata: https://community.databricks.com/t5/technical-blog/read-delta-tables-with-snowflake-via-unity-catalog/ba-p/115877

3 More Replies
stevewb
by New Contributor III
  • 1435 Views
  • 1 reply
  • 0 kudos

Setting shuffle partitions in Databricks SQL Warehouse

I think it used to be possible to set shuffle partitions in Databricks SQL Warehouse through e.g. SET spark.sql.shuffle.partitions=20000. However, when I run this now, I get the error: [CONFIG_NOT_AVAILABLE] Configuration spark.sql.shuffle.partitions...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @stevewb, it's not available anymore. According to the documentation: "Databricks SQL allows admins to configure Spark properties for data access in the workspace settings menu. See Data access configurations. Other than data access configurations, Da...

Somia
by New Contributor III
  • 3929 Views
  • 7 replies
  • 2 kudos

Resolved! SQL query is not returning _sqldf

Notebooks in my workspace are not returning _sqldf when a SQL query is run. If I run this code, it gives an error in the second cell that _sqldf is not defined. First cell: %sql select * from some_table limit 10. Second cell: %sql select * from _sqldf. Howev...

Latest Reply
Somia
New Contributor III
  • 2 kudos

Changing the notebook to default Python and all-purpose compute fixed the issue. I am able to access _sqldf in a subsequent SQL or Python cell.

6 More Replies
anilsampson
by New Contributor III
  • 3069 Views
  • 2 replies
  • 3 kudos

Resolved! How to get the previous version of a table in Databricks SQL dynamically

Hello, I'm trying to get the previous version of a Delta table using a timestamp, but Databricks SQL does not allow the use of variables. The only thing I can do is use TIMESTAMP AS OF CURRENT_DATE() - 1 if I have refreshed the table today. Please let me know i...

Latest Reply
anilsampson
New Contributor III
  • 3 kudos

Thank you @Vidhi_Khaitan. Is there an upgrade or use case in the works where we can pass parameters via a workflow while triggering a Databricks dashboard?

1 More Replies
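One workaround for the variable limitation described in this thread is to compute the timestamp outside SQL and interpolate it into the statement, e.g. from a Python cell. A minimal sketch (the table name and helper function are hypothetical, not a Databricks API):

```python
from datetime import date, timedelta

def time_travel_query(table: str, days_back: int = 1) -> str:
    """Build a Delta time-travel query for a dynamically computed date.

    Sketch of a workaround: Databricks SQL does not accept a variable
    inside TIMESTAMP AS OF, so compute the date here and interpolate it
    into the statement, then run the string with spark.sql().
    """
    ts = (date.today() - timedelta(days=days_back)).isoformat()
    return f"SELECT * FROM {table} TIMESTAMP AS OF '{ts}'"

# e.g. spark.sql(time_travel_query("my_catalog.my_schema.my_table"))
```

Note that time travel can only reach as far back as the table's retained history.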
Divya_Bhadauria
by New Contributor III
  • 1736 Views
  • 1 reply
  • 0 kudos

Update databricks job parameter with CLI

Use Case: Updating a Databricks job with multiple tasks can be time-consuming and error-prone when changes (such as adding new parameters) need to be applied to each task manually. Possible Solutions: 1. Using the Databricks CLI – jobs reset command. You can ...

Latest Reply
anilsampson
New Contributor III
  • 0 kudos

Hello Divya, could you also try YAML, update your task accordingly, and deploy it as part of an asset bundle? Let me know if you feel both are the same. Regards, Anil.

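The jobs reset command replaces a job's full settings in one call, which pairs well with a small script that edits every task at once instead of by hand. A hedged sketch (field names follow the Jobs API task schema; treat the exact shape as an assumption and adapt for non-notebook task types):

```python
def add_base_parameter(job_settings: dict, key: str, value: str) -> dict:
    """Add one base parameter to every notebook task in a job spec.

    Sketch of the pattern behind `databricks jobs reset`: export the
    job's settings JSON, edit each task in one pass, then push the
    full spec back with the CLI.
    """
    for task in job_settings.get("tasks", []):
        nb = task.get("notebook_task")
        if nb is not None:
            # Create base_parameters if the task has none yet.
            nb.setdefault("base_parameters", {})[key] = value
    return job_settings
```

The edited settings could then be written to a file and applied with the CLI's jobs reset command (check `databricks jobs reset --help` for the exact flags in your CLI version).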
zach
by New Contributor III
  • 1212 Views
  • 1 reply
  • 0 kudos

Get the total amount of S3 storage used per user

In Databricks is it possible to get the total amount of delta lake storage being used in the parquet format per user? Subsequently, what are the best practices on making sure that users saving delta files are not taking up storage unnecessarily, for ...

Latest Reply
Sharanya13
Contributor III
  • 0 kudos

Hi @zach, can you expand on why you need to know the total storage per user? Best practices: if you use Databricks managed tables, optimization is taken care of. https://docs.databricks.com/aws/en/optimizations/predictive-optimization

AbhayAgarwal
by Databricks Partner
  • 3953 Views
  • 1 reply
  • 0 kudos

DB to Snowflake connection error

We are getting the error below after upgrading Databricks Runtime to version 15: while making a connection to Snowflake in a notebook, "Bad request; operation not supported." Has anyone seen this error? Any pointers on how to fix it?

Latest Reply
kamal_ch
Databricks Employee
  • 0 kudos

Hi Abhay, here are some general troubleshooting steps and pointers to help you resolve this issue: ensure that you are using the correct connection configurations for Snowflake. Configuration mismatches can sometimes lead to operation errors. Using ...

pooja_bhumandla
by Databricks Partner
  • 2353 Views
  • 2 replies
  • 2 kudos

Resolved! Small Files Persist After OPTIMIZE with Target File Size Set to 100MB – Seeking Possible Reasons

I'm currently working on optimizing a Delta table in Databricks. As part of this, I've increased the target file size from the default (~33MB) to 100MB using the OPTIMIZE command. However, after running the OPTIMIZE operation, I still observe a large number ...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 2 kudos

Hi pooja_bhumandla, great question! How are you doing today? Even after running the OPTIMIZE command with a higher target file size like 100MB, it's common to still see some small files in your Delta table, especially in partitions with very little dat...

1 More Replies
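The reply's point, that partitions holding little data keep files below the target no matter the setting, can be illustrated with a toy greedy sketch (purely illustrative; the real OPTIMIZE bin-packing is more sophisticated):

```python
def coalesce_files(file_sizes_mb, target_mb=100):
    """Greedy sketch of coalescing small files toward a target size.

    A partition whose total data is below target_mb can only ever
    produce one file smaller than the target, whatever the setting.
    """
    bins = []
    current = 0
    for size in file_sizes_mb:
        current += size
        if current >= target_mb:
            bins.append(current)  # emit a file once the target is reached
            current = 0
    if current:
        bins.append(current)  # leftover data: a file below the target
    return bins
```

With a 100MB target, a partition holding only 20MB of data can never produce anything but one ~20MB file.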
PeSe
by New Contributor
  • 1466 Views
  • 2 replies
  • 1 kudos

How to fast sync large files (> 100GB)

I want to sync large files (>100GB) from my local system to a DBX Volume. I see two options with different problems; do you have suggestions? Option 1 needs to open the file completely -> memory issues: with open(local_file_path, 'rb') as file: ...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 1 kudos

Hi PeSe, how are you doing today? As per my understanding, you're absolutely right to think through both options carefully. Option 1 runs into memory issues because it's trying to read the whole large file into memory at once, which doesn't work well ...

1 More Replies
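For Option 1's memory issue, the usual fix is to stream the file in bounded chunks instead of calling read() on the whole thing; each chunk can then be sent on to the Volume target. A minimal sketch of the chunked read (the 64MB chunk size is an arbitrary assumption):

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB per read; tune to taste

def iter_chunks(path, chunk_size=CHUNK_SIZE):
    """Yield a file's bytes chunk by chunk so peak memory stays at
    one chunk, regardless of the total file size."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk
```

Worth checking too: the databricks-sdk Files API accepts a binary file object for uploads, which may stream for you and avoid manual chunking entirely, depending on the SDK version.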
Sainath368
by Contributor
  • 2438 Views
  • 6 replies
  • 2 kudos

Clarification Needed: COMPUTE STATISTICS vs COMPUTE DELTA STATISTICS on Delta Tables

Hi everyone, I'm trying to understand the difference between two commands in Databricks: ANALYZE TABLE <table_name> COMPUTE STATISTICS and ANALYZE TABLE <table_name> COMPUTE DELTA STATISTICS. Specifically: what exactly does each command do, and how do they...

Latest Reply
Louis_Frolio
Databricks Employee
  • 2 kudos

If you want to know more about query optimization, I suggest you look into Spark's Catalyst Optimizer and Adaptive Query Execution (AQE). You should always run ANALYZE TABLE COMPUTE STATISTICS because this will help Spark's query optimization converge o...

5 More Replies
Sainath368
by Contributor
  • 697 Views
  • 1 reply
  • 0 kudos

COMPUTE STATISTICS- QUERY OPTIMIZER

I ran ANALYZE TABLE <table_name> COMPUTE DELTA STATISTICS on my tables to calculate column-level statistics at the file level for each Parquet file inside the Delta table. Now, I’m wondering which command is better to run next: ANALYZE TABLE <table_n...

Latest Reply
SP_6721
Honored Contributor II
  • 0 kudos

Hi @Sainath368, from what I understand, COMPUTE DELTA STATISTICS generates Delta statistics that are mainly used for data skipping, helping speed up table scans by avoiding unnecessary file reads. However, these stats aren't used by the query optimiz...

Jothia
by New Contributor III
  • 2391 Views
  • 8 replies
  • 3 kudos

Resolved! Unity catalog with pool cluster

We are trying to execute a notebook recently migrated to Unity Catalog through a pool cluster from a Synapse pipeline, but it is giving the error "UC not enabled". Could you please suggest a fix?

Latest Reply
Jothia
New Contributor III
  • 3 kudos

The access mode option is not available when creating a cluster from a pool.

7 More Replies
sachamourier
by Contributor
  • 4459 Views
  • 7 replies
  • 4 kudos

Resolved! All-purpose cluster upsize failures

Hello, for one of my clients, we are using an all-purpose cluster to run some Databricks notebooks. We noticed some Azure quota exceptions in the cluster logs, which we would like to understand better. As you can see attached, the cluster always su...

Latest Reply
sachamourier
Contributor
  • 4 kudos

@szymon_dybczak That makes sense, thank you! However, do you have any idea why it shows "Current usage: 0"? And why does the cluster still manage to reach the max number of nodes even with these "failures"? Sacha

6 More Replies
thari
by New Contributor II
  • 3142 Views
  • 1 reply
  • 1 kudos

[UNITY_CREDENTIAL_SCOPE_MISSING_SCOPE] Missing Credential Scope. Unity Credential Scope id not found

Hi, I'm having a problem while trying to write to a Delta table from an org.apache.spark.sql.util.QueryExecutionListener. Code: val df = SparkSession.active().createDataFrame(batch, MyClass::class.java); df.write().mode("append").format("delta").saveAsTable...

Latest Reply
steyler-db
Databricks Employee
  • 1 kudos

Hello @thari, to answer the issue you are facing: what's actually happening here is that when you try to write to a Delta table managed by Unity Catalog within a QueryExecutionListener's callback, Spark's security context isn't set up correctly. Tha...
