- 1168 Views
- 1 replies
- 0 kudos
Is there an upper bound on the number I can assign to delta.dataSkippingNumIndexedCols for computing statistics? Is there any tradeoff benchmark available for increasing this number beyond 32?
Latest Reply
@Chhavi Bansal : The delta.dataSkippingNumIndexedCols configuration property controls the maximum number of columns that Delta Lake will build statistics on for data skipping. By default, this value is set to 32. There is no hard upper bound on th...
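As a minimal sketch of adjusting this property (the table name is hypothetical), it can be set per table via TBLPROPERTIES:

```sql
-- Hypothetical table; raises the stats-collection limit above the default of 32.
-- Delta collects statistics on the first N columns of the schema.
ALTER TABLE my_table SET TBLPROPERTIES ('delta.dataSkippingNumIndexedCols' = '64');
```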
- 4249 Views
- 4 replies
- 1 kudos
var df2 = spark.read
  .format("csv")
  .option("sep", ",")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("src/main/resources/datasets/titanic.csv")
df2.createOrReplaceTempView("titanic")
spark.table("titanic").cache()
Latest Reply
Can you share what *newtitanic* is? I think you would have done something similar: spark.sql("create table newtitanic as select * from titanic"). Something like this works for me, but the issue is I first make a temp view and then again create a tab...
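If the goal is only to persist the DataFrame as a table, a sketch that skips the temp-view step entirely (assuming df2 is the DataFrame loaded in the snippet above, and reusing the thread's table name) is:

```scala
// Write the DataFrame directly as a managed table, no temp view needed.
// "newtitanic" mirrors the table name used in this thread.
df2.write.saveAsTable("newtitanic")
```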
by aladda • Databricks Employee
- 1550 Views
- 0 replies
- 0 kudos
It is best to avoid collecting stats on long strings. You typically want to collect stats on columns that are used in filters, WHERE clauses, and joins, and on which you tend to perform aggregations - typically numerical values. You can avoid collecting s...
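One hedged way to act on this advice (table and column layout are hypothetical) is to lower the indexed-column count so that trailing long-string columns fall outside the stats range:

```sql
-- Only the first 3 columns of the schema get statistics;
-- long string columns should be placed after them in the column order.
ALTER TABLE events SET TBLPROPERTIES ('delta.dataSkippingNumIndexedCols' = '3');
```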
- 9466 Views
- 2 replies
- 0 kudos
I have the following Spark dataframe:

agent_id | payment_amount
a        | 1000
b        | 1100
a        | 1100
a        | 1200
b        | 1200
b        | 1250
a        | 10000
b        | 9000

My desired output would be something like:

agent_id | 95_quantile
a        | whatever the 95 quantile is for a...
Latest Reply
For those of you who haven't run into this SO thread, http://stackoverflow.com/questions/39633614/calculate-quantile-on-grouped-data-in-spark-dataframe, it's pointed out there that one workaround is to use the Hive UDF "percentile_approx". Please see th...
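The workaround from that thread can be sketched as follows (the table name is hypothetical; the columns come from the question above):

```sql
-- percentile_approx(col, p) returns an approximate p-quantile;
-- grouping by agent_id yields one value per agent.
SELECT agent_id,
       percentile_approx(payment_amount, 0.95) AS `95_quantile`
FROM payments  -- hypothetical table/view over the dataframe above
GROUP BY agent_id;
```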