Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Bob-
by New Contributor II
  • 3022 Views
  • 3 replies
  • 4 kudos

Resolved! Upload Screenshot

I am new to the Databricks Free Edition. I am trying to upload a screenshot to be able to put it in a table and run some AI functions against it. It is not letting me upload a .png file. After several attempts I am being told that the root cause is p...

Latest Reply
Sharanya13
Contributor III
  • 4 kudos

@Bob- Can you explain your use case? I'm not sure I understand "I am trying to upload a screenshot to be able to put it in a table and run some AI functions against it." Are you trying to perform OCR?

2 More Replies
Phani1
by Databricks MVP
  • 3491 Views
  • 4 replies
  • 2 kudos

Potential Challenges of Using Iceberg Format (Databricks + Iceberg)

Hi Team, what are the potential challenges of using the Iceberg format instead of Delta for saving data in Databricks? Regards, Phani

Latest Reply
sridharplv
Valued Contributor II
  • 2 kudos

Hi @Phani1, please see the link below, which explains how to maintain Iceberg metadata alongside Delta metadata: https://community.databricks.com/t5/technical-blog/read-delta-tables-with-snowflake-via-unity-catalog/ba-p/115877

3 More Replies
stevewb
by New Contributor III
  • 1435 Views
  • 1 reply
  • 0 kudos

Setting shuffle partitions in Databricks SQL Warehouse

I think it used to be possible to set shuffle partitions in Databricks SQL Warehouse through e.g. SET spark.sql.shuffle.partitions=20000. However, when I run this now, I get the error: [CONFIG_NOT_AVAILABLE] Configuration spark.sql.shuffle.partitions...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @stevewb, it's not available anymore. According to the documentation: "Databricks SQL allows admins to configure Spark properties for data access in the workspace settings menu. See Data access configurations. Other than data access configurations, Da...

Somia
by New Contributor III
  • 3929 Views
  • 7 replies
  • 2 kudos

Resolved! SQL query is not returning _sqldf

Notebooks in my workspace are not returning _sqldf when a SQL query is run. If I run this code, it gives an error in the second cell that _sqldf is not defined. First cell: %sql select * from some_table limit 10. Second cell: %sql select * from _sqldf. Howev...

Latest Reply
Somia
New Contributor III
  • 2 kudos

Changing the notebook to default Python and all-purpose compute fixed the issue. I am able to access _sqldf in a subsequent SQL or Python cell.

6 More Replies
anilsampson
by New Contributor III
  • 3069 Views
  • 2 replies
  • 3 kudos

Resolved! How to get the previous version of a table in Databricks SQL dynamically

Hello, I'm trying to get the previous version of a Delta table using a timestamp, but Databricks SQL does not allow the use of variables. The only thing I can do is use TIMESTAMP AS OF CURRENT_DATE() - 1 if I have refreshed the table today. Please let me know i...

Latest Reply
anilsampson
New Contributor III
  • 3 kudos

Thank you @Vidhi_Khaitan. Is there an upgrade or use case in the works where we can pass parameters via a workflow while triggering a Databricks dashboard?

1 More Replies
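One workaround for the variable limitation described in this thread is to compute the timestamp outside SQL and interpolate it into the statement, e.g. from a Python cell. A minimal sketch (the table name and helper function are hypothetical, not a Databricks API):

```python
from datetime import date, timedelta

def time_travel_query(table: str, days_back: int = 1) -> str:
    """Build a Delta time-travel query for a dynamically computed date.

    Sketch of a workaround: Databricks SQL does not accept a variable
    inside TIMESTAMP AS OF, so compute the date here and interpolate it
    into the statement, then run the string with spark.sql().
    """
    ts = (date.today() - timedelta(days=days_back)).isoformat()
    return f"SELECT * FROM {table} TIMESTAMP AS OF '{ts}'"

# e.g. spark.sql(time_travel_query("my_catalog.my_schema.my_table"))
```

Note that time travel can only reach as far back as the table's retained history.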
Divya_Bhadauria
by New Contributor III
  • 1736 Views
  • 1 reply
  • 0 kudos

Update databricks job parameter with CLI

Use Case: Updating a Databricks job with multiple tasks can be time-consuming and error-prone when changes (such as adding new parameters) need to be applied to each task manually. Possible Solutions: 1. Using the Databricks CLI – jobs reset command. You can ...

Latest Reply
anilsampson
New Contributor III
  • 0 kudos

Hello Divya, could you also try YAML, update your task accordingly, and deploy it as part of an asset bundle? Let me know if you feel both are the same. Regards, Anil.

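The jobs reset command replaces a job's full settings in one call, which pairs well with a small script that edits every task at once instead of by hand. A hedged sketch (field names follow the Jobs API task schema; treat the exact shape as an assumption and adapt for non-notebook task types):

```python
def add_base_parameter(job_settings: dict, key: str, value: str) -> dict:
    """Add one base parameter to every notebook task in a job spec.

    Sketch of the pattern behind `databricks jobs reset`: export the
    job's settings JSON, edit each task in one pass, then push the
    full spec back with the CLI.
    """
    for task in job_settings.get("tasks", []):
        nb = task.get("notebook_task")
        if nb is not None:
            # Create base_parameters if the task has none yet.
            nb.setdefault("base_parameters", {})[key] = value
    return job_settings
```

The edited settings could then be written to a file and applied with the CLI's jobs reset command (check `databricks jobs reset --help` for the exact flags in your CLI version).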
zach
by New Contributor III
  • 1212 Views
  • 1 reply
  • 0 kudos

Get the total amount of S3 storage used per user

In Databricks is it possible to get the total amount of delta lake storage being used in the parquet format per user? Subsequently, what are the best practices on making sure that users saving delta files are not taking up storage unnecessarily, for ...

Latest Reply
Sharanya13
Contributor III
  • 0 kudos

Hi @zach, can you expand on why you need to know the total storage per user? Best practices: if you use Databricks managed tables, optimization is taken care of. https://docs.databricks.com/aws/en/optimizations/predictive-optimization

AbhayAgarwal
by Databricks Partner
  • 3953 Views
  • 1 reply
  • 0 kudos

DB to Snowflake connection error

We are getting the error below after upgrading Databricks Runtime to version 15: while making a connection to Snowflake in a notebook, "Bad request; operation not supported." Has anyone seen this error? Any pointers on how to fix it?

Latest Reply
kamal_ch
Databricks Employee
  • 0 kudos

Hi Abhay, here are some general troubleshooting steps and pointers to help you resolve this issue: ensure that you are using the correct connection configurations for Snowflake. Configuration mismatches can sometimes lead to operation errors. Using ...

pooja_bhumandla
by Databricks Partner
  • 2353 Views
  • 2 replies
  • 2 kudos

Resolved! Small Files Persist After OPTIMIZE with Target File Size Set to 100MB – Seeking Possible Reasons

I'm currently working on optimizing a Delta table in Databricks. As part of this, I've increased the target file size from the default (~33MB) to 100MB using the OPTIMIZE command. However, after running the OPTIMIZE operation, I still observe a large number ...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 2 kudos

Hi pooja_bhumandla, great question! How are you doing today? Even after running the OPTIMIZE command with a higher target file size like 100MB, it's common to still see some small files in your Delta table, especially in partitions with very little dat...

1 More Replies
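The reply's point, that partitions holding little data keep files below the target no matter the setting, can be illustrated with a toy greedy sketch (purely illustrative; the real OPTIMIZE bin-packing is more sophisticated):

```python
def coalesce_files(file_sizes_mb, target_mb=100):
    """Greedy sketch of coalescing small files toward a target size.

    A partition whose total data is below target_mb can only ever
    produce one file smaller than the target, whatever the setting.
    """
    bins = []
    current = 0
    for size in file_sizes_mb:
        current += size
        if current >= target_mb:
            bins.append(current)  # emit a file once the target is reached
            current = 0
    if current:
        bins.append(current)  # leftover data: a file below the target
    return bins
```

With a 100MB target, a partition holding only 20MB of data can never produce anything but one ~20MB file.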
PeSe
by New Contributor
  • 1466 Views
  • 2 replies
  • 1 kudos

How to fast sync large files (> 100GB)

I want to sync large files (>100GB) from my local system to a DBX Volume. I see two options with different problems; do you have suggestions? Option 1 needs to open the file completely -> memory issues: with open(local_file_path, 'rb') as file: ...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 1 kudos

Hi PeSe, how are you doing today? As per my understanding, you're absolutely right to think through both options carefully. Option 1 runs into memory issues because it's trying to read the whole large file into memory at once, which doesn't work well ...

1 More Replies
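For Option 1's memory issue, the usual fix is to stream the file in bounded chunks instead of calling read() on the whole thing; each chunk can then be sent on to the Volume target. A minimal sketch of the chunked read (the 64MB chunk size is an arbitrary assumption):

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB per read; tune to taste

def iter_chunks(path, chunk_size=CHUNK_SIZE):
    """Yield a file's bytes chunk by chunk so peak memory stays at
    one chunk, regardless of the total file size."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk
```

Worth checking too: the databricks-sdk Files API accepts a binary file object for uploads, which may stream for you and avoid manual chunking entirely, depending on the SDK version.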
Sainath368
by Contributor
  • 2438 Views
  • 6 replies
  • 2 kudos

Clarification Needed: COMPUTE STATISTICS vs COMPUTE DELTA STATISTICS on Delta Tables

Hi everyone, I'm trying to understand the difference between two commands in Databricks: ANALYZE TABLE <table_name> COMPUTE STATISTICS and ANALYZE TABLE <table_name> COMPUTE DELTA STATISTICS. Specifically: what exactly does each command do, and how do they...

Latest Reply
Louis_Frolio
Databricks Employee
  • 2 kudos

If you want to know more about query optimization, I suggest you look into Spark's Catalyst Optimizer and Adaptive Query Execution (AQE). You should always run ANALYZE TABLE COMPUTE STATISTICS because this will help Spark's query optimization converge o...

5 More Replies
Sainath368
by Contributor
  • 697 Views
  • 1 reply
  • 0 kudos

COMPUTE STATISTICS- QUERY OPTIMIZER

I ran ANALYZE TABLE <table_name> COMPUTE DELTA STATISTICS on my tables to calculate column-level statistics at the file level for each Parquet file inside the Delta table. Now, I’m wondering which command is better to run next: ANALYZE TABLE <table_n...

Latest Reply
SP_6721
Honored Contributor II
  • 0 kudos

Hi @Sainath368, from what I understand, COMPUTE DELTA STATISTICS generates Delta statistics that are mainly used for data skipping, helping speed up table scans by avoiding unnecessary file reads. However, these stats aren't used by the query optimiz...

Jothia
by New Contributor III
  • 2391 Views
  • 8 replies
  • 3 kudos

Resolved! Unity catalog with pool cluster

We are trying to execute a notebook recently migrated to Unity Catalog through a pool cluster from a Synapse pipeline, but it is giving the error "UC not enabled". Could you please suggest a fix?

Latest Reply
Jothia
New Contributor III
  • 3 kudos

The access mode option is not available when creating a cluster from a pool.

7 More Replies
sachamourier
by Contributor
  • 4459 Views
  • 7 replies
  • 4 kudos

Resolved! All-purpose cluster upsize failures

Hello, for one of my clients, we are using an all-purpose cluster to run some Databricks notebooks. We noticed some Azure quota exceptions in the cluster logs, which we would like to understand better. As you can see attached, the cluster always su...

Latest Reply
sachamourier
Contributor
  • 4 kudos

@szymon_dybczak That makes sense, thank you! However, do you have any idea why it shows "Current usage: 0"? And why does the cluster still manage to reach the max number of nodes even with these "failures"? Sacha

6 More Replies
thari
by New Contributor II
  • 3142 Views
  • 1 reply
  • 1 kudos

[UNITY_CREDENTIAL_SCOPE_MISSING_SCOPE] Missing Credential Scope. Unity Credential Scope id not found

Hi, I'm having a problem while trying to write to a Delta table from an org.apache.spark.sql.util.QueryExecutionListener. Code: val df = SparkSession.active().createDataFrame(batch, MyClass::class.java); df.write().mode("append").format("delta").saveAsTable...

Latest Reply
steyler-db
Databricks Employee
  • 1 kudos

Hello @thari, to answer the issue you are facing: what's actually happening here is that when you try to write to a Delta table managed by Unity Catalog within a QueryExecutionListener's callback, Spark's security context isn't set up correctly. Tha...
