cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

iptkrisna
by New Contributor III
  • 2388 Views
  • 2 replies
  • 1 kudos

Clear Cache From a Notebook, not from a Cluster

Hi, I'm running all my jobs on one big cluster, I'm just concerned is there a solution on how we could clear cache resulted by a notebook in the end of the job when its done? hence it does not causing any memory problem sometime from one to another, ...

  • 2388 Views
  • 2 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @krisna math​ We haven't heard from you since the last response from @Debayan Mukherjee​ ​, and I was checking back to see if her suggestions helped you.Or else, If you have any solution, please share it with the community, as it can be helpful to...

  • 1 kudos
1 More Replies
jlgr
by New Contributor II
  • 3043 Views
  • 4 replies
  • 2 kudos

Resolved! How disable disk cache in SQL Warehouse (Azure Databricks)?

Hi! I want to disable disk cache for SQL Warehouse in Azure Databricks, but it seems that is not possible. Is it correct?You can't use this configuration for SQL Warehouse (https://learn.microsoft.com/en-US/azure/databricks/optimizations/disk-cache#-...

  • 3043 Views
  • 4 replies
  • 2 kudos
Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @jlgr jlgr​ Hope everything is going great.Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we ...

  • 2 kudos
3 More Replies
maartenvr
by New Contributor III
  • 16600 Views
  • 9 replies
  • 2 kudos

Resolved! Unable to clear cache using a pyspark session

Hi all,I am using a persist call on a spark dataframe inside an application to speed-up computations. The dataframe is used throughout my application and at the end of the application I am trying to clear the cache of the whole spark session by calli...

  • 16600 Views
  • 9 replies
  • 2 kudos
Latest Reply
maartenvr
New Contributor III
  • 2 kudos

No solution yet:Hi @Suteja Kanuri​ ,Thank you for thinking along and replying!Unfortunately, I have not found a solution yet.I am getting an error that there exists no ```.getCache()``` method on a spark context. Also note that I have tried to do som...

  • 2 kudos
8 More Replies
jerry-xu-sa
by New Contributor II
  • 1820 Views
  • 2 replies
  • 1 kudos

Order of a dataframe is not perserved after calling cache() and limit()

Here are the simple steps to reproduce it. Note that col "foo" and "bar" are just redundant cols to make sure the dataframe doesn't fit into a single partition. // generate a random df val rand = new scala.util.Random val df = (1 to 3000).map(i => (r...

  • 1820 Views
  • 2 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Jerry Xu​ Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs.Please help us select the best solution by clicking on "Select As Best" if it does.Your feedback wil...

  • 1 kudos
1 More Replies
fury88
by New Contributor II
  • 1274 Views
  • 1 replies
  • 1 kudos

Does CACHE TABLE/VIEW have a create or replace like view?

I'm trying to cache data/queries that we normally have as temporary views that get replaced when the code is run based on dynamic python. What I'd like to know is will CACHE TABLE get overwritten each time you run it? Is it smart enough to recognize ...

  • 1274 Views
  • 1 replies
  • 1 kudos
Latest Reply
UmaMahesh1
Honored Contributor III
  • 1 kudos

Hi @Matt Fury​ Yes...I guess cache overwrites each time you run it because for me it took nearly same amount of time for 1million records to be cached. However, you can check whether the table is cached or not using .storageLevel method. E.g. I have...

  • 1 kudos
pantelis_mare
by Contributor III
  • 1742 Views
  • 3 replies
  • 0 kudos

Spark 3 AQE and cache

Hello everybody,I recently discovered (the hard way) that when a query plan uses cached data, the AQE does not kick-in. Result is that you loose the super cool feature of dynamic partition coalesce (no more custom shuffle readers in the DAG). Is ther...

  • 1742 Views
  • 3 replies
  • 0 kudos
Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

Hi @Pantelis Maroudis​,Did you check the physical query plan? did you check the SQL sub tab with in Spark UI? it will help you to undertand better what is happening.

  • 0 kudos
2 More Replies
368545
by New Contributor III
  • 2281 Views
  • 4 replies
  • 2 kudos

Resolved! Errors on Redash when queries are in cache

We got the following error when running queries on Redash connected toDatabricks early today (2022-08-24):```Error running query: [HY000] [Simba][Hardy] (35) Error from server:error code: '0' error message:'org.apache.spark.sql.catalyst.expressions.U...

  • 2281 Views
  • 4 replies
  • 2 kudos
Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

Hi @Dat Tran​, We haven't heard from you on the last response from @Debayan Mukherjee​, and I was checking back to see if his suggestions helped you. Or else, If you have any solution, please share it with the community as it can be helpful to others...

  • 2 kudos
3 More Replies
Hila_DG
by New Contributor II
  • 2241 Views
  • 5 replies
  • 4 kudos

Resolved! How to proactively monitor the use of the cache for driver node?

The problem:We have a dataframe which is based on the query:SELECT * FROM Very_Big_TableThis table returns over 4 GB of data, and when we try to push the data to Power BI we get the error message:ODBC: ERROR [HY000] [Microsoft][Hardy] (35) Error from...

  • 2241 Views
  • 5 replies
  • 4 kudos
Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hey @Hila Galapo​ Hope everything is going good. Just wanted to check in if you were able to resolve your issue or do you need more help? We'd love to hear from you.Thanks!

  • 4 kudos
4 More Replies
shan_chandra
by Esteemed Contributor
  • 10747 Views
  • 2 replies
  • 0 kudos
  • 10747 Views
  • 2 replies
  • 0 kudos
Latest Reply
shan_chandra
Esteemed Contributor
  • 0 kudos

%scala def clearAllCaching(tableName: Option[String] = None): Unit = { tableName.map { path => com.databricks.sql.transaction.tahoe.DeltaValidation.invalidateCache(spark, path) } spark.conf.set("com.databricks.sql.io.caching.bucketedRead.enabled", "f...

  • 0 kudos
1 More Replies
User16752240150
by New Contributor II
  • 4633 Views
  • 1 replies
  • 0 kudos

When to use cache vs checkpoint?

I've seen .cache() and .checkpoint() used similarly in some workflows I've come across. What's the difference, and when should I use one over the other?

  • 4633 Views
  • 1 replies
  • 0 kudos
Latest Reply
Srikanth_Gupta_
Valued Contributor
  • 0 kudos

Caching is extremely useful than checkpointing when you have lot of available memory to store your RDD or Dataframes if they are massive.Caching will maintain the result of your transformations so that those transformations will not have to be recomp...

  • 0 kudos
User16826992666
by Valued Contributor
  • 1162 Views
  • 2 replies
  • 0 kudos
  • 1162 Views
  • 2 replies
  • 0 kudos
Latest Reply
sean_owen
Honored Contributor II
  • 0 kudos

You do not have to cache anything to make it work. You would decide that based on whether you want to spend memory/storage to avoid recomputing the DataFrame, like when you may use it in multiple operations afterwards.

  • 0 kudos
1 More Replies
Labels