Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

iptkrisna
by New Contributor III
  • 3743 Views
  • 2 replies
  • 1 kudos

Clear Cache From a Notebook, not from a Cluster

Hi, I'm running all my jobs on one big cluster, and I'm wondering whether there is a way to clear the cache created by a notebook at the end of the job once it's done, so that it doesn't cause memory problems from one job to another, ...
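A minimal sketch of one way to do this from the notebook itself, assuming the goal is simply to free whatever the session has cached (the DataFrame name is illustrative):

```python
# Drop every cached table/DataFrame registered in this Spark session.
spark.catalog.clearCache()

# Or release only what this notebook persisted; blocking=True waits
# until the blocks are actually freed before continuing.
# df.unpersist(blocking=True)
```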

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @krisna math, We haven't heard from you since the last response from @Debayan Mukherjee, and I was checking back to see if their suggestions helped you. Otherwise, if you have found a solution, please share it with the community, as it can be helpful to...

1 More Replies
jlgr
by New Contributor II
  • 3853 Views
  • 2 replies
  • 0 kudos

How to disable the disk cache in a SQL Warehouse (Azure Databricks)?

Hi! I want to disable the disk cache for a SQL Warehouse in Azure Databricks, but it seems that this is not possible. Is that correct? You can't use this configuration for a SQL Warehouse (https://learn.microsoft.com/en-US/azure/databricks/optimizations/disk-cache#-...
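For context, on a regular cluster the disk cache can be toggled per session with a documented Spark conf; a SQL Warehouse does not expose this knob, which is what the linked page describes. A minimal sketch for the cluster case:

```python
# Disable the Databricks disk (IO) cache for the current session on a cluster.
# SQL Warehouses manage this setting themselves and do not let you override it.
spark.conf.set("spark.databricks.io.cache.enabled", "false")
```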

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @jlgr, Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we ...

1 More Replies
maartenvr
by New Contributor III
  • 23693 Views
  • 9 replies
  • 2 kudos

Resolved! Unable to clear cache using a pyspark session

Hi all, I am using a persist call on a Spark dataframe inside an application to speed up computations. The dataframe is used throughout my application, and at the end of the application I am trying to clear the cache of the whole Spark session by calli...

Latest Reply
maartenvr
New Contributor III
  • 2 kudos

No solution yet: Hi @Suteja Kanuri, Thank you for thinking along and replying! Unfortunately, I have not found a solution yet. I am getting an error that there exists no `.getCache()` method on a Spark context. Also note that I have tried to do som...
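Indeed, there is no `.getCache()` on a SparkContext. A hedged sketch of the public alternatives (the `_jsc` accessor is a non-public PySpark detail, so treat that part as best-effort):

```python
# Clear everything cached through the SQL/DataFrame API in this session.
spark.catalog.clearCache()

# Inspect what is still persisted at the RDD level. _jsc is PySpark's
# internal handle to the Java SparkContext.
persistent = spark.sparkContext._jsc.getPersistentRDDs()
print(persistent)
```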

8 More Replies
jerry-xu-sa
by New Contributor II
  • 2448 Views
  • 2 replies
  • 1 kudos

Order of a dataframe is not preserved after calling cache() and limit()

Here are the simple steps to reproduce it. Note that cols "foo" and "bar" are just redundant cols to make sure the dataframe doesn't fit into a single partition.
// generate a random df
val rand = new scala.util.Random
val df = (1 to 3000).map(i => (r...
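The original repro is Scala and truncated above; here is a rough PySpark equivalent, a sketch assuming the point is that limit() without an explicit sort is not deterministic once the data spans several partitions:

```python
import random
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# ~3000 rows spread over several partitions, mirroring the Scala repro.
df = spark.createDataFrame(
    [(i, random.random(), random.random()) for i in range(1, 3001)],
    ["id", "foo", "bar"],
).repartition(8)

df.cache()
head_unordered = df.limit(10).collect()              # which rows you get is plan-dependent
head_ordered = df.orderBy("id").limit(10).collect()  # an explicit sort pins the order
```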

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Jerry Xu, Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedback wil...

1 More Replies
fury88
by New Contributor II
  • 1765 Views
  • 1 reply
  • 1 kudos

Does CACHE TABLE/VIEW have a create or replace like view?

I'm trying to cache data/queries that we normally have as temporary views that get replaced when the code is run, based on dynamic Python. What I'd like to know is: will CACHE TABLE get overwritten each time you run it? Is it smart enough to recognize ...

Latest Reply
UmaMahesh1
Honored Contributor III
  • 1 kudos

Hi @Matt Fury, Yes... I guess the cache is overwritten each time you run it, because for me it took nearly the same amount of time for 1 million records to be cached. However, you can check whether the table is cached or not using the .storageLevel method. E.g. I have...
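A small sketch of that check (the table name is illustrative); CACHE TABLE is eager by default, and spark.catalog.isCached plus storageLevel show whether and how the data is held:

```python
spark.sql("CACHE TABLE my_table")              # eager by default; CACHE LAZY TABLE defers it
print(spark.catalog.isCached("my_table"))      # True once the table is cached
print(spark.table("my_table").storageLevel)    # e.g. Disk Memory Deserialized 1x Replicated
spark.sql("UNCACHE TABLE IF EXISTS my_table")  # drop the cache entry explicitly
```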

pantelis_mare
by Contributor III
  • 2371 Views
  • 3 replies
  • 0 kudos

Spark 3 AQE and cache

Hello everybody, I recently discovered (the hard way) that when a query plan uses cached data, AQE does not kick in. The result is that you lose the super cool feature of dynamic partition coalescing (no more custom shuffle readers in the DAG). Is ther...

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Hi @Pantelis Maroudis, Did you check the physical query plan? Did you check the SQL sub-tab within the Spark UI? It will help you to understand better what is happening.
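A quick way to do that comparison from a notebook, as a sketch (the confs are the standard Spark 3 AQE settings; the aggregation is illustrative):

```python
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")

df = spark.range(0, 10_000_000)
agg = df.groupBy((df.id % 100).alias("bucket")).count()
agg.explain()       # expect AdaptiveSparkPlan at the root of the physical plan

df.cache().count()  # materialize the cache
df.groupBy((df.id % 100).alias("bucket")).count().explain()
# With an InMemoryRelation as input, parts of the plan may no longer be adaptive.
```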

2 More Replies
368545
by New Contributor III
  • 2867 Views
  • 2 replies
  • 2 kudos

Resolved! Errors on Redash when queries are in cache

We got the following error when running queries on Redash connected to Databricks early today (2022-08-24): ```Error running query: [HY000] [Simba][Hardy] (35) Error from server: error code: '0' error message: 'org.apache.spark.sql.catalyst.expressions.U...

Latest Reply
Debayan
Databricks Employee
  • 2 kudos

This can be related to user permissions, particularly the permissions necessary to access the table in the database instance. I understand it is working fine in the SQL editor; still, can we check the permissions?
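One way to inspect and fix table permissions from a notebook, as an illustrative sketch (the table and principal names are hypothetical):

```python
# Show who can do what on the table the Redash queries touch.
spark.sql("SHOW GRANTS ON TABLE my_db.my_table").show(truncate=False)

# Grant read access to the principal Redash connects as, if it is missing.
spark.sql("GRANT SELECT ON TABLE my_db.my_table TO `redash_user@example.com`")
```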

1 More Replies
Hila_DG
by New Contributor II
  • 2960 Views
  • 5 replies
  • 4 kudos

Resolved! How to proactively monitor the use of the cache for driver node?

The problem: We have a dataframe which is based on the query SELECT * FROM Very_Big_Table. This table returns over 4 GB of data, and when we try to push the data to Power BI we get the error message: ODBC: ERROR [HY000] [Microsoft][Hardy] (35) Error from...
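On the monitoring side of the question, cache/storage usage can be polled from Spark's REST monitoring API; a sketch, with the driver host as a placeholder:

```python
import requests

base = "http://<driver-host>:4040/api/v1"   # placeholder; use your driver's UI address
app_id = requests.get(f"{base}/applications").json()[0]["id"]

# Each entry is a persisted RDD/DataFrame with its memory and disk footprint.
for rdd in requests.get(f"{base}/applications/{app_id}/storage/rdd").json():
    print(rdd["name"], rdd["memoryUsed"], rdd["diskUsed"])
```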

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hey @Hila Galapo, Hope everything is going good. Just wanted to check in if you were able to resolve your issue or do you need more help? We'd love to hear from you. Thanks!

4 More Replies
shan_chandra
by Databricks Employee
  • 13379 Views
  • 2 replies
  • 0 kudos
Latest Reply
shan_chandra
Databricks Employee
  • 0 kudos

%scala
def clearAllCaching(tableName: Option[String] = None): Unit = {
  tableName.map { path =>
    com.databricks.sql.transaction.tahoe.DeltaValidation.invalidateCache(spark, path)
  }
  spark.conf.set("com.databricks.sql.io.caching.bucketedRead.enabled", "f...

1 More Replies
User16752240150
by New Contributor II
  • 5551 Views
  • 1 reply
  • 0 kudos

When to use cache vs checkpoint?

I've seen .cache() and .checkpoint() used similarly in some workflows I've come across. What's the difference, and when should I use one over the other?

Latest Reply
Srikanth_Gupta_
Valued Contributor
  • 0 kudos

Caching is more useful than checkpointing when you have a lot of available memory to store your RDDs or DataFrames, even if they are massive. Caching will maintain the result of your transformations so that those transformations will not have to be recomp...
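A minimal sketch of the difference (the checkpoint directory is illustrative): cache keeps the data alongside its lineage, while checkpoint writes it out to reliable storage and truncates the lineage:

```python
from pyspark.sql import functions as F

spark.sparkContext.setCheckpointDir("/tmp/checkpoints")  # illustrative path

df = spark.range(0, 1_000_000).withColumn("v", F.rand())

cached = df.cache()        # stored on executors, lineage kept; lost if executors die
cached.count()             # action to materialize the cache

checked = df.checkpoint()  # written to the checkpoint dir; lineage truncated
```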

User16826992666
by Valued Contributor
  • 1618 Views
  • 2 replies
  • 0 kudos
Latest Reply
sean_owen
Databricks Employee
  • 0 kudos

You do not have to cache anything to make it work. You would decide that based on whether you want to spend memory/storage to avoid recomputing the DataFrame, like when you may use it in multiple operations afterwards.
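As an illustrative sketch of that rule of thumb (the path and column names are hypothetical):

```python
events = spark.read.parquet("/data/events")  # hypothetical dataset
events.cache()                               # worth it only because it feeds two actions below

daily_counts = events.groupBy("event_date").count().collect()
error_count = events.filter("level = 'ERROR'").count()

events.unpersist()                           # release the memory when done
```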

1 More Replies