I see that Delta Lake has an OPTIMIZE command and also table properties for Auto Optimize. What are the differences between these, and when should I use one over the other?
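For context, here is a minimal sketch contrasting the two mechanisms the question names (the table and column names are placeholders): OPTIMIZE is an on-demand command you run against a table, while auto optimize is enabled once via table properties and applies automatically on subsequent writes.

```python
# Placeholder table/column names for illustration.

# On-demand compaction, optionally co-locating data with ZORDER:
spark.sql("OPTIMIZE main.sales.events ZORDER BY (event_date)")

# Auto optimize: set once as table properties; applied on every write.
spark.sql("""
  ALTER TABLE main.sales.events SET TBLPROPERTIES (
    'delta.autoOptimize.optimizeWrite' = 'true',
    'delta.autoOptimize.autoCompact'   = 'true'
  )
""")
```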
I am running jobs on Databricks using the Run Submit API with Airflow. I have noticed that, on rare occasions, the same run is launched more than once at the same time. Why?
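One commonly recommended safeguard (an assumption here, since the root cause isn't stated above) is to pass an `idempotency_token` with each `runs/submit` call, so that a retried submission of the same logical task returns the existing run instead of starting a second one. A sketch with placeholder host, token, and notebook path:

```python
import requests

# Placeholders; not real credentials or paths.
HOST = "https://<workspace>.cloud.databricks.com"
TOKEN = "<api-token>"

payload = {
    "run_name": "airflow-task-run",
    # Submissions sharing a token are deduplicated by the Jobs API, so an
    # Airflow retry of the same task attempt can't spawn a second run.
    "idempotency_token": "dag_id__task_id__2024-01-01",
    "tasks": [{
        "task_key": "main",
        "notebook_task": {"notebook_path": "/Repos/jobs/main"},
        "new_cluster": {
            "spark_version": "14.3.x-scala2.12",
            "node_type_id": "i3.xlarge",
            "num_workers": 2,
        },
    }],
}

resp = requests.post(
    f"{HOST}/api/2.1/jobs/runs/submit",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json()["run_id"])
```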
At this time, Z-order columns must be specified in the asset definition via the pipelines.autoOptimize.zOrderCols table property. This may change in the future with Predictive Optimization.
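For illustration, a minimal Delta Live Tables sketch setting that property (the table name, source, and Z-order columns are placeholders):

```python
import dlt
from pyspark.sql import functions as F

@dlt.table(
    name="events_optimized",  # placeholder name
    table_properties={
        # Columns to Z-order during auto optimization; placeholders here.
        "pipelines.autoOptimize.zOrderCols": "event_date,user_id"
    },
)
def events_optimized():
    return dlt.read_stream("events_raw").withColumn(
        "ingested_at", F.current_timestamp()
    )
```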
Please try partition discovery for external tables. This feature should allow you to run the MSCK REPAIR TABLE command successfully and, more importantly, query external Parquet tables more efficiently.
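As a sketch of the flow (the catalog, schema, table, path, and partition column are all assumptions), you'd register an external partitioned Parquet table and then sync its partition metadata:

```python
# Placeholder names and paths throughout.
spark.sql("""
  CREATE TABLE IF NOT EXISTS main.analytics.ext_events (
    id BIGINT, payload STRING, dt STRING
  )
  USING PARQUET
  PARTITIONED BY (dt)
  LOCATION 's3://my-bucket/ext_events/'
""")

# Discover partitions already present under the table location.
spark.sql("MSCK REPAIR TABLE main.analytics.ext_events")

# Partition pruning should now apply to partition-column filters.
spark.sql(
    "SELECT count(*) FROM main.analytics.ext_events WHERE dt = '2024-01-01'"
).show()
```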
Please make sure you are using a Dedicated (single-user) cluster when authenticating to the file notification service, that is, when authenticating to SQS via an instance profile. This will likely change in the future, so stay posted.
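For reference, a minimal Auto Loader stream in file notification mode (bucket paths and target table are placeholders); this is the read that relies on the cluster's instance profile having the SQS permissions:

```python
# Placeholder paths; run on a Dedicated (single-user) cluster per the note above.
stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    # File notification mode: on AWS, Auto Loader uses SNS + SQS,
    # authenticating via the cluster's instance profile.
    .option("cloudFiles.useNotifications", "true")
    .load("s3://my-bucket/landing/")
)

(stream.writeStream
    .option("checkpointLocation", "s3://my-bucket/_checkpoints/landing")
    .toTable("main.bronze.landing_events"))
```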
Auto Loader's scope is limited to incrementally loading files from storage; there is no built-in functionality to load only the latest file from a group of files. You'd likely want to put this kind of "last updated" logic in a different layer or in ...
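As one way to put that logic in a downstream layer (table names, paths, and the `entity_id` key are assumptions), you could ingest every file with Auto Loader, capture file metadata via the `_metadata` column, and keep only the row from the newest file per key in a batch step:

```python
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Bronze: ingest every file, keeping its path and modification time.
bronze = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .load("s3://my-bucket/drops/")
    .withColumn("source_file", F.col("_metadata.file_path"))
    .withColumn("file_mtime", F.col("_metadata.file_modification_time"))
)

(bronze.writeStream
    .option("checkpointLocation", "s3://my-bucket/_checkpoints/drops")
    .toTable("main.bronze.drops"))

# Silver (batch): per entity, keep only the row from the most recent file.
w = Window.partitionBy("entity_id").orderBy(F.col("file_mtime").desc())
latest = (
    spark.table("main.bronze.drops")
    .withColumn("rn", F.row_number().over(w))
    .filter("rn = 1")
    .drop("rn")
)
latest.write.mode("overwrite").saveAsTable("main.silver.latest_drops")
```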