- 1523 Views
- 2 replies
- 1 kudos
Hi, I'm running all my jobs on one big cluster. Is there a way to clear the cache left behind by a notebook at the end of a job, once it's done, so that it doesn't cause memory problems from one job to another, ...
Latest Reply
Hi @krisna math We haven't heard from you since the last response from @Debayan Mukherjee, and I was checking back to see if her suggestions helped you. Otherwise, if you have a solution, please share it with the community, as it can be helpful to...
1 More Replies
by jlgr • New Contributor II
- 2136 Views
- 4 replies
- 2 kudos
Hi! I want to disable the disk cache for a SQL Warehouse in Azure Databricks, but it seems that is not possible. Is that correct? You can't use this configuration for SQL Warehouse (https://learn.microsoft.com/en-US/azure/databricks/optimizations/disk-cache#-...
Latest Reply
Hi @jlgr jlgr Hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we ...
3 More Replies
- 12842 Views
- 9 replies
- 2 kudos
Hi all, I am using a persist call on a Spark dataframe inside an application to speed up computations. The dataframe is used throughout my application, and at the end of the application I am trying to clear the cache of the whole Spark session by calli...
Latest Reply
No solution yet: Hi @Suteja Kanuri, Thank you for thinking along and replying! Unfortunately, I have not found a solution yet. I am getting an error that there is no `.getCache()` method on a Spark context. Also note that I have tried to do som...
8 More Replies
- 1318 Views
- 2 replies
- 1 kudos
Here are the simple steps to reproduce it. Note that the columns "foo" and "bar" are just redundant columns to make sure the dataframe doesn't fit into a single partition.
// generate a random df
val rand = new scala.util.Random
val df = (1 to 3000).map(i => (r...
Latest Reply
Hi @Jerry Xu Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedback wil...
1 More Replies
by fury88 • New Contributor II
- 901 Views
- 1 replies
- 1 kudos
I'm trying to cache data/queries that we normally keep as temporary views, which get replaced when the code is run based on dynamic Python. What I'd like to know is: will CACHE TABLE get overwritten each time you run it? Is it smart enough to recognize ...
Latest Reply
Hi @Matt Fury Yes, I guess the cache is overwritten each time you run it, because for me it took nearly the same amount of time for 1 million records to be cached. However, you can check whether the table is cached using the .storageLevel method. E.g. I have...
- 1239 Views
- 3 replies
- 0 kudos
Hello everybody, I recently discovered (the hard way) that when a query plan uses cached data, AQE does not kick in. The result is that you lose the super cool feature of dynamic partition coalescing (no more custom shuffle readers in the DAG). Is ther...
Latest Reply
Hi @Pantelis Maroudis, Did you check the physical query plan? Did you check the SQL sub-tab within the Spark UI? It will help you understand better what is happening.
2 More Replies
by 368545 • New Contributor III
- 1696 Views
- 4 replies
- 2 kudos
We got the following error when running queries on Redash connected to Databricks earlier today (2022-08-24): ```Error running query: [HY000] [Simba][Hardy] (35) Error from server:error code: '0' error message:'org.apache.spark.sql.catalyst.expressions.U...
Latest Reply
Hi @Dat Tran, We haven't heard from you since the last response from @Debayan Mukherjee, and I was checking back to see if his suggestions helped you. Otherwise, if you have a solution, please share it with the community, as it can be helpful to others...
3 More Replies
- 1628 Views
- 5 replies
- 4 kudos
The problem: We have a dataframe which is based on the query:
SELECT *
FROM Very_Big_Table
This table returns over 4 GB of data, and when we try to push the data to Power BI we get the error message:
ODBC: ERROR [HY000] [Microsoft][Hardy] (35) Error from...
Latest Reply
Hey @Hila Galapo Hope everything is going well. Just wanted to check in to see if you were able to resolve your issue, or do you need more help? We'd love to hear from you. Thanks!
4 More Replies
- 3997 Views
- 1 replies
- 0 kudos
I've seen .cache() and .checkpoint() used similarly in some workflows I've come across. What's the difference, and when should I use one over the other?
Latest Reply
Caching is far more useful than checkpointing when you have a lot of available memory to store your RDDs or DataFrames if they are massive. Caching will maintain the result of your transformations so that those transformations will not have to be recomp...
- 528 Views
- 0 replies
- 0 kudos
When we run the SQL statements "DROP TABLE .... CREATE TABLE" for the same table in multiple places (different notebooks, jobs, ...), some notebooks may not see the most recent schema / content.