Here are the simple steps to reproduce it. Note that columns "foo" and "bar" are just redundant columns to make sure the DataFrame doesn't fit into a single partition.
// generate a random df
val rand = new scala.util.Random
val df = (1 to 3000).map(i => (r...
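The snippet above is truncated; a minimal sketch of a comparable setup, with assumed column names, row count, and string padding, might look like this:

import scala.util.Random
// assumes a SparkSession named `spark` (e.g. in a Databricks notebook)
import spark.implicits._

val rand = new Random
// 3000 rows; "foo" and "bar" hold long random strings purely to inflate row size
val df = (1 to 3000).map { i =>
  (i,
   rand.nextInt(100),
   Random.alphanumeric.take(1000).mkString,
   Random.alphanumeric.take(1000).mkString)
}.toDF("id", "value", "foo", "bar")

println(df.rdd.getNumPartitions)  // inspect how the data is partitioned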
Is there an upper bound on the number that I can assign to delta.dataSkippingNumIndexedCols for computing statistics? Is there any tradeoff benchmark available for increasing this number beyond 32?
@Chhavi Bansal: The delta.dataSkippingNumIndexedCols configuration property controls the maximum number of columns that Delta Lake will collect statistics on for data skipping. By default, this value is set to 32. There is no hard upper bound on this value, but statistics are computed during writes, so indexing more columns increases write latency; the data-skipping benefit should be weighed against slower ingestion.
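For reference, the property is set per table; a minimal sketch, assuming a hypothetical Delta table named events:

// raise the number of columns Delta collects statistics on from the default of 32 to 40
spark.sql("""
  ALTER TABLE events
  SET TBLPROPERTIES ('delta.dataSkippingNumIndexedCols' = '40')
""")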
Solution: I didn't need to add any executor or driver memory. All I had to do in my case was add this option: .option("maxRowsInMemory", 1000). Before, I couldn't even read a 9 MB file; now I can read a 50 MB file without any error. { val df = spark.read .f...
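The snippet above is cut off; a minimal sketch of the same idea, assuming the spark-excel (com.crealytics) connector and a hypothetical file path, where maxRowsInMemory switches the reader to streaming so the whole workbook is not held in memory:

val df = spark.read
  .format("com.crealytics.spark.excel")
  .option("header", "true")
  .option("maxRowsInMemory", 1000)  // stream rows instead of materializing the full workbook
  .load("/mnt/raw/report.xlsx")     // hypothetical path

df.count()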
Can you try without: .set("spark.driver.memory", "4g") .set("spark.executor.memory", "6g")? It clearly shows that there is not 4 GB free on the driver and 6 GB free on the executor (you can also share the cluster hardware details). You also cannot allocate 100% for ...
Standard tiers are allowed to have 1000 saved jobs. Premium tiers have a higher limit of 1500. Some clouds have an Enterprise tier, which has a saved-job limit of 2000. A workspace is limited to 1000 concurrent job runs. A 429 Too Many Requests response is returned when you request a run that cannot start immediately.
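If you trigger runs programmatically and hit that limit, a common pattern is to retry on 429 with backoff. A minimal sketch, assuming the Jobs 2.1 run-now endpoint and a hypothetical workspace URL, token, and job id:

import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

val workspaceUrl = "https://example.cloud.databricks.com"  // hypothetical workspace URL
val token = sys.env("DATABRICKS_TOKEN")                    // hypothetical token source

val client = HttpClient.newHttpClient()
val request = HttpRequest.newBuilder()
  .uri(URI.create(s"$workspaceUrl/api/2.1/jobs/run-now"))
  .header("Authorization", s"Bearer $token")
  .header("Content-Type", "application/json")
  .POST(HttpRequest.BodyPublishers.ofString("""{"job_id": 123}"""))  // hypothetical job id
  .build()

var attempt = 0
var response = client.send(request, HttpResponse.BodyHandlers.ofString())
while (response.statusCode() == 429 && attempt < 5) {
  attempt += 1
  Thread.sleep(1000L * attempt)  // back off before retrying
  response = client.send(request, HttpResponse.BodyHandlers.ofString())
}
println(s"status=${response.statusCode()} body=${response.body()}")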