- 465 Views
- 1 replies
- 0 kudos
Is there an upper bound of number that i can assign to delta.dataSkippingNumIndexedCols for computing statistics. Is there some tradeoff benchmark available for increasing this number beyond 32.
Latest Reply
@Chhavi Bansal: The delta.dataSkippingNumIndexedCols configuration property controls the maximum number of columns that Delta Lake will collect statistics on for data skipping. By default, this value is set to 32. There is no hard upper bound on th...
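The property above can be changed per table. A minimal sketch (not from the original thread; the table name `events` is hypothetical), assuming the property is set via table properties as in the Delta Lake documentation:

```scala
// Sketch: raise the number of columns Delta collects statistics on
// for a hypothetical table named "events".
spark.sql("""
  ALTER TABLE events
  SET TBLPROPERTIES ('delta.dataSkippingNumIndexedCols' = '64')
""")
// Statistics are collected when files are written, so the new setting
// applies to newly written files; existing files keep their old stats
// until they are rewritten.
```

Note the trade-off implied by the question: collecting stats on more columns increases write-time cost and transaction-log size, which is why the default stops at 32.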
- 1554 Views
- 4 replies
- 1 kudos
```scala
var df2 = spark.read
  .format("csv")
  .option("sep", ",")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("src/main/resources/datasets/titanic.csv")
df2.createOrReplaceTempView("titanic")
spark.table("titanic").cache()
```
Latest Reply
Can you share what *newtitanic* is? I think you would have done something similar to `spark.sql("create table newtitanic as select * from titanic")`. Something like this works for me, but the issue is I first make a temp view and then again create a tab...
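The flow described in this reply can be sketched end to end (paths and table names follow the example above; this is an illustration, not the poster's exact code):

```scala
// Register a temp view from a DataFrame, then persist it
// as a table with CREATE TABLE AS SELECT (CTAS).
val df = spark.read
  .format("csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("src/main/resources/datasets/titanic.csv")
df.createOrReplaceTempView("titanic")
spark.sql("CREATE TABLE newtitanic AS SELECT * FROM titanic")
```

If the intermediate temp view is not needed, `df.write.saveAsTable("newtitanic")` achieves the same result in one step.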
by aladda • Honored Contributor II
- 648 Views
- 0 replies
- 0 kudos
It is best to avoid collecting stats on long strings. You typically want to collect stats on columns that are used in filters, WHERE clauses, and joins, and on which you tend to perform aggregations, which are typically numerical values. You can avoid collecting s...
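One way to act on this advice is to combine column ordering with the indexed-column limit: since Delta collects stats only on the first N columns of the schema, placing long string columns after position N excludes them. A minimal sketch, assuming a hypothetical table `events` whose long string columns sit after the fifth column:

```scala
// Sketch: limit stats collection to the first 5 columns, so long
// string columns positioned later in the schema are skipped.
// "events" and the column layout are assumptions for illustration.
spark.sql("""
  ALTER TABLE events
  SET TBLPROPERTIES ('delta.dataSkippingNumIndexedCols' = '5')
""")
```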
- 7239 Views
- 2 replies
- 0 kudos
I have the following Spark dataframe:

```
agent_id  payment_amount
a         1000
b         1100
a         1100
a         1200
b         1200
b         1250
a         10000
b         9000
```

My desired output would be something like:

```
agent_id  95_quantile
a         whatever the 95th quantile is for a...
```
Latest Reply
For those of you who haven't run into this SO thread http://stackoverflow.com/questions/39633614/calculate-quantile-on-grouped-data-in-spark-dataframe, it's pointed out there that one work-around is to use the Hive UDF percentile_approx. Please see th...
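The work-around mentioned above can be sketched for the dataframe in this thread (assuming it is bound to a variable `df` with the columns shown in the question):

```scala
import org.apache.spark.sql.functions.expr

// Approximate the 95th percentile of payment_amount per agent_id
// using the percentile_approx SQL function.
val result = df.groupBy("agent_id")
  .agg(expr("percentile_approx(payment_amount, 0.95)").as("95_quantile"))
result.show()
```

Note that percentile_approx is approximate by design; an optional third accuracy argument trades memory for precision.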