Data Engineering

Forum Posts

by jerry-xu-sa • New Contributor II

03-06-2023 11:45:02 PM

3966 Views
2 replies
1 kudos

Order of a dataframe is not preserved after calling cache() and limit()

Here are the simple steps to reproduce it. Note that the columns "foo" and "bar" are redundant; they are only there to make sure the dataframe doesn't fit into a single partition.

// generate a random df
val rand = new scala.util.Random
val df = (1 to 3000).map(i => (r...

Latest Reply

Anonymous
Not applicable

03-31-2023 5:58:05 PM

1 kudos

Hi @Jerry Xu, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedback wil...

1 More Replies

by chhavibansal • New Contributor III

01-17-2023 1:22:22 AM

1604 Views
1 reply
0 kudos

What is the upper bound for dataSkippingNumIndexedCols, to keep stats in the Delta log file?

Is there an upper bound on the value that I can assign to delta.dataSkippingNumIndexedCols for computing statistics? Is there a tradeoff benchmark available for increasing this number beyond 32?

Latest Reply

Anonymous
Not applicable

03-08-2023 8:21:43 PM

0 kudos

@Chhavi Bansal: The delta.dataSkippingNumIndexedCols configuration property controls the maximum number of columns on which Delta Lake collects statistics for data skipping. By default, this value is set to 32. There is no hard upper bound on th...

by sarvesh • Contributor III

11-22-2021 9:51:42 PM

42813 Views
18 replies
6 kudos

Resolved! java.lang.OutOfMemoryError: GC overhead limit exceeded. [ solved ]

Solution: I didn't need to add any executor or driver memory. All I had to do in my case was add this: option("maxRowsInMemory", 1000). Before, I couldn't even read a 9 MB file; now I can read a 50 MB file without any error.

{ val df = spark.read .f...

Latest Reply

Hubert-Dudek
Esteemed Contributor III

11-23-2021 3:07:12 AM

6 kudos

Can you try without .set("spark.driver.memory", "4g") and .set("spark.executor.memory", "6g")? It clearly shows that there is no 4 GB free on the driver and 6 GB free on the executor (you can also share the cluster hardware details). You also cannot allocate 100% for ...

17 More Replies

by User16826992666 • Databricks Employee

06-17-2021 2:08:18 PM

3024 Views
1 reply
0 kudos

Resolved! Is there a limit to the number of jobs that can be created in a workspace?

Latest Reply

Ryan_Chynoweth
Databricks Employee

06-17-2021 4:19:27 PM

0 kudos

Standard tiers are allowed to have 1000 saved jobs. Premium tiers have a higher limit at 1500. Some clouds have an enterprise tier which has a saved job limit of 2000. A workspace is limited to 1000 concurrent job runs. A 429 Too Many Requests respon...

Databricks Community
