Data Engineering

Forum Posts

by iptkrisna • New Contributor III

06-14-2023 3:04:40 AM

5057 Views
2 replies
1 kudos

Clear Cache From a Notebook, not from a Cluster

Hi, I'm running all my jobs on one big cluster, and I'm concerned about memory. Is there a way to clear the cache created by a notebook at the end of its job, once it's done? That way it would not cause memory problems from one job to another, ...

Latest Reply

Anonymous
Not applicable

06-16-2023 12:21:47 AM

1 kudos

Hi @krisna math, we haven't heard from you since the last response from @Debayan Mukherjee, and I was checking back to see if her suggestions helped you. Otherwise, if you have found a solution, please share it with the community, as it can be helpful to...

by jlgr • New Contributor II

02-22-2023 11:57:44 PM

4384 Views
2 replies
0 kudos

How to disable disk cache in SQL Warehouse (Azure Databricks)?

Hi! I want to disable the disk cache for SQL Warehouse in Azure Databricks, but it seems that is not possible. Is that correct? You can't use this configuration for SQL Warehouse (https://learn.microsoft.com/en-US/azure/databricks/optimizations/disk-cache#-...

Latest Reply

Anonymous
Not applicable

04-24-2023 3:18:27 AM

0 kudos

Hi @jlgr jlgr, hope everything is going great. Just wanted to check in on whether you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we ...

by maartenvr • New Contributor III

02-28-2023 5:06:06 AM

30123 Views
9 replies
2 kudos

Resolved! Unable to clear cache using a pyspark session

Hi all, I am using a persist call on a Spark DataFrame inside an application to speed up computations. The DataFrame is used throughout my application, and at the end of the application I am trying to clear the cache of the whole Spark session by calli...

Latest Reply

maartenvr
New Contributor III

03-13-2023 2:52:53 AM

2 kudos

No solution yet: Hi @Suteja Kanuri, thank you for thinking along and replying! Unfortunately, I have not found a solution yet. I am getting an error that there is no ```.getCache()``` method on a Spark context. Also note that I have tried to do som...

by jerry-xu-sa • New Contributor II

03-06-2023 11:45:02 PM

2959 Views
2 replies
1 kudos

Order of a DataFrame is not preserved after calling cache() and limit()

Here are the simple steps to reproduce it. Note that columns "foo" and "bar" are just redundant columns to make sure the DataFrame doesn't fit into a single partition. // generate a random df val rand = new scala.util.Random val df = (1 to 3000).map(i => (r...

Latest Reply

Anonymous
Not applicable

03-31-2023 5:58:05 PM

1 kudos

Hi @Jerry Xu, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedback wil...

by fury88 • New Contributor II

11-30-2022 9:04:20 AM

2287 Views
1 reply
1 kudos

Does CACHE TABLE/VIEW have a CREATE OR REPLACE option like views do?

I'm trying to cache data/queries that we normally have as temporary views, which get replaced when the code is run based on dynamic Python. What I'd like to know is: will CACHE TABLE get overwritten each time you run it? Is it smart enough to recognize ...

Latest Reply

UmaMahesh1
Honored Contributor III

11-30-2022 9:53:15 AM

1 kudos

Hi @Matt Fury, yes... I guess the cache is overwritten each time you run it, because for me it took nearly the same amount of time to cache 1 million records. However, you can check whether the table is cached or not using the .storageLevel method. E.g. I have...

by pantelis_mare • Contributor III

04-27-2022 12:41:08 AM

2896 Views
3 replies
0 kudos

Spark 3 AQE and cache

Hello everybody, I recently discovered (the hard way) that when a query plan uses cached data, AQE does not kick in. The result is that you lose the super cool feature of dynamic partition coalescing (no more custom shuffle readers in the DAG). Is ther...

Latest Reply

jose_gonzalez
Databricks Employee

08-15-2022 1:45:53 PM

0 kudos

Hi @Pantelis Maroudis, did you check the physical query plan? Did you check the SQL sub-tab within the Spark UI? It will help you understand better what is happening.

by 368545 • New Contributor III

08-24-2022 2:24:36 AM

3255 Views
2 replies
2 kudos

Resolved! Errors on Redash when queries are in cache

We got the following error when running queries on Redash connected to Databricks early today (2022-08-24): ```Error running query: [HY000] [Simba][Hardy] (35) Error from server:error code: '0' error message:'org.apache.spark.sql.catalyst.expressions.U...

Latest Reply

Debayan
Databricks Employee

08-25-2022 9:15:15 AM

2 kudos

This may be related to user permissions, particularly the permission needed to access the table in the database instance. I understand it works fine in the SQL editor, but can we still check the permissions?

by Hila_DG • New Contributor II

01-12-2022 2:40:56 PM

3571 Views
5 replies
4 kudos

Resolved! How to proactively monitor the use of the cache for driver node?

The problem: We have a DataFrame which is based on the query: SELECT * FROM Very_Big_Table. This table returns over 4 GB of data, and when we try to push the data to Power BI we get the error message: ODBC: ERROR [HY000] [Microsoft][Hardy] (35) Error from...

Latest Reply

Anonymous
Not applicable

05-13-2022 5:23:11 AM

4 kudos

Hey @Hila Galapo, hope everything is going well. Just wanted to check in on whether you were able to resolve your issue or need more help. We'd love to hear from you. Thanks!

by shan_chandra • Databricks Employee

03-29-2022 11:48:12 AM

15975 Views
2 replies
1 kudos

Resolved! How to clear all cache without restarting the cluster?

Latest Reply

shan_chandra
Databricks Employee

03-29-2022 11:55:51 AM

1 kudos

%scala def clearAllCaching(tableName: Option[String] = None): Unit = { tableName.map { path => com.databricks.sql.transaction.tahoe.DeltaValidation.invalidateCache(spark, path) } spark.conf.set("com.databricks.sql.io.caching.bucketedRead.enabled", "f...

by MoJaMa • Databricks Employee

06-25-2021 3:37:34 PM

1023 Views
1 reply
0 kudos

What's the DEK cache interval when BYOK is used for notebooks?

Latest Reply

MoJaMa
Databricks Employee

06-25-2021 3:38:57 PM

0 kudos

30 minutes. So, for example, you might see a call in CloudTrail every 30 minutes, though this depends on how the notebooks are being accessed.

by User16752240150 • New Contributor II

06-04-2021 12:04:59 PM

6686 Views
1 reply
1 kudos

When to use cache vs checkpoint?

I've seen .cache() and .checkpoint() used similarly in some workflows I've come across. What's the difference, and when should I use one over the other?

Latest Reply

Srikanth_Gupta_
Databricks Employee

06-25-2021 5:48:48 AM

1 kudos

Caching is more useful than checkpointing when you have plenty of available memory to store your RDDs or DataFrames, even massive ones. Caching will maintain the result of your transformations so that those transformations will not have to be recomp...

by MoJaMa • Databricks Employee

06-18-2021 12:49:41 PM

1286 Views
2 replies
0 kudos

Does Delta refresh DF cache automatically after a delete?

Latest Reply

Srikanth_Gupta_
Databricks Employee

06-18-2021 2:11:13 PM

0 kudos

How about updates and inserts? Does it refresh automatically?

by User16826992666 • Valued Contributor

06-16-2021 4:08:38 PM

1998 Views
2 replies
0 kudos

Do I have to run .cache() on my dataframe before returning aggregations like count?

Latest Reply

sean_owen
Databricks Employee

06-17-2021 11:24:29 AM

0 kudos

You do not have to cache anything to make it work. You would decide that based on whether you want to spend memory/storage to avoid recomputing the DataFrame, like when you may use it in multiple operations afterwards.

by Anonymous • Not applicable

06-07-2021 3:07:19 PM

946 Views
0 replies
0 kudos

SQL notebooks not seeing the latest table schema / content.

When we run the SQL statements "DROP TABLE .... CREATE TABLE" for the same table in multiple places (different notebooks, jobs, ...), some notebooks may not see the most recent schema / content.
