Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

andrej
by New Contributor II
  • 1286 Views
  • 4 replies
  • 1 kudos

Partition pruning with generated columns

I have a large table that contains a date_time column. The table contains two generated columns, year and month, which are extracted from the date_time values and are used for partitioning. I have the following question. If I run the query SELECT * FROM tab...

Latest Reply
Vidula
Honored Contributor
  • 1 kudos

Hi @Andrej Znidarsic, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. T...

3 More Replies
MartinB
by Contributor III
  • 13076 Views
  • 26 replies
  • 6 kudos

Resolved! Does partition pruning / partition elimination not work for folder partitioned JSON files? (Spark 3.1.2)

Imagine the following setup: I have log files stored as JSON files partitioned by year, month, day and hour in physical folders:
"""
/logs
|-- year=2020
|-- year=2021
`-- year=2022
    |-- month=01
    `-- month=02
        |-- day=01
        |-- day=.....
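For readers unfamiliar with folder partitioning, here is a minimal sketch in plain Python of the idea behind partition pruning on a `key=value` folder layout like the one above. It is only an illustration, not how Spark implements it, and whether Spark actually prunes in the JSON case is exactly what the thread asks. The helper names `partition_values` and `prune` and the `part-0.json` file names are hypothetical; only the folder layout comes from the post.

```python
from pathlib import PurePosixPath


def partition_values(path):
    """Parse Hive-style key=value folder names out of a file path."""
    values = {}
    for part in PurePosixPath(path).parts:
        key, sep, value = part.partition("=")
        if sep:  # only path segments that contain '=' are partition folders
            values[key] = value
    return values


def prune(paths, **wanted):
    """Keep only the paths whose partition folders match every wanted value."""
    return [
        p for p in paths
        if all(partition_values(p).get(k) == v for k, v in wanted.items())
    ]


# Hypothetical file names; only the folder layout comes from the post.
paths = [
    "/logs/year=2021/month=12/day=31/hour=23/part-0.json",
    "/logs/year=2022/month=01/day=01/hour=00/part-0.json",
    "/logs/year=2022/month=02/day=01/hour=00/part-0.json",
]

# A filter on year and month can be answered from the paths alone,
# so the other folders never need to be listed or read.
selected = prune(paths, year="2022", month="02")
print(selected)
```

The point of the sketch is that a filter on partition columns can be decided from the folder names alone, without opening any file.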

Latest Reply
MartinB
Contributor III
  • 6 kudos

@Kaniz Fatma, could you maybe involve a Databricks expert?

25 More Replies
pantelis_mare
by Contributor III
  • 1795 Views
  • 3 replies
  • 1 kudos

Resolved! Dynamic Partition Pruning override

Hello everybody, I have another strange issue, and I would like to confirm whether this is a bug or expected behaviour: I'm joining a large dataset with a dimension table and, as expected, DPP is activated. I was trying to deactivate the feature as it change...

Latest Reply
pantelis_mare
Contributor III
  • 1 kudos

Hello @Kaniz Fatma, thank you for taking the time to answer. The issue in this case was that spark.databricks.optimizer.deltaTableFilesThreshold was activating DPP even if it was formally deactivated by setting all available "enabled" properties to f...

2 More Replies