- 3647 Views
- 2 replies
- 0 kudos
When is it recommended to use Trigger.once mode compared to fixed processing intervals with micro batches?
Latest Reply
Also note that rate-limit configurations such as maxFilesPerTrigger and maxBytesPerTrigger are ignored with Trigger.Once. Streaming queries with significantly lower throughput can switch to Trigger.Once to avoid the continuous execution of a job checking the availab...
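As a sketch of the trade-off described above (paths, checkpoint locations, and the 5-minute interval are placeholder assumptions, not from the thread):

```scala
import org.apache.spark.sql.streaming.Trigger

// Fixed-interval micro-batches: the job runs continuously and polls for new data.
// Rate limits such as maxFilesPerTrigger are honored here.
val continuous = spark.readStream
  .format("delta")
  .option("maxFilesPerTrigger", 1000) // applies with interval triggers
  .load("/data/source")               // placeholder path
  .writeStream
  .format("delta")
  .option("checkpointLocation", "/chk/continuous")
  .trigger(Trigger.ProcessingTime("5 minutes"))
  .start("/data/sink")

// Trigger.Once: process everything available as one batch, then stop.
// maxFilesPerTrigger / maxBytesPerTrigger are ignored, so the batch may be
// large; run the job on a schedule instead of keeping a cluster up.
val once = spark.readStream
  .format("delta")
  .load("/data/source")
  .writeStream
  .format("delta")
  .option("checkpointLocation", "/chk/once")
  .trigger(Trigger.Once())
  .start("/data/sink")
```

The two variants are alternatives for the same pipeline; with Trigger.Once the checkpoint still tracks progress, so each scheduled run picks up only new data.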
- 1533 Views
- 2 replies
- 0 kudos
Is it safe to run VACUUM on a Delta Lake table while data is being added to it at the same time? Will it impact the job result/performance?
Latest Reply
In the vast majority of cases, yes, it is safe to run VACUUM while data is concurrently being appended or updated in the same table. This is because VACUUM deletes data files no longer referenced by a Delta table's transaction log and does not affect...
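A minimal sketch of the point above; the table name and retention window are assumptions for illustration:

```scala
// VACUUM only deletes files that are no longer referenced by the table's
// transaction log and are older than the retention window, so concurrent
// appends/updates (which only add new files) are not affected by it.
spark.sql("VACUUM my_db.events RETAIN 168 HOURS") // 168 hours = default 7-day retention

// Shortening the window below 7 days requires explicitly disabling Delta's
// safety check, and risks breaking long-running readers or time travel:
// spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")
```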
- 1886 Views
- 2 replies
- 0 kudos
If I don't run VACUUM on a Delta Lake table, will that make my read performance slower?
Latest Reply
VACUUM has no effect on read/write performance for that table. Never running VACUUM on a Delta Lake table will not make read/write performance any slower. If you run VACUUM very infrequently, your VACUUM runtimes themselves may be pretty hig...
- 684 Views
- 1 replies
- 1 kudos
Yes, you can create a "Single Node" cluster: https://docs.databricks.com/clusters/single-node.html . It is currently not recommended to use a "Single Node" cluster for streaming workloads.
Latest Reply
Single Node clusters should not be used for production workloads involving streaming queries or complex computations. A Single Node cluster still brings up a Spark environment, so it can run all kinds of workloads, but it is intended for lightweight, single-machine jobs.
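For reference, a Single Node cluster is typically defined by a configuration like the sketch below (a config fragment based on the linked documentation page; field values may vary by Databricks release):

```json
{
  "num_workers": 0,
  "spark_conf": {
    "spark.databricks.cluster.profile": "singleNode",
    "spark.master": "local[*]"
  },
  "custom_tags": {
    "ResourceClass": "SingleNode"
  }
}
```

With zero workers and `spark.master` set to `local[*]`, the driver executes all Spark tasks itself, which is why heavy streaming or shuffle-intensive jobs are a poor fit.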
- 1346 Views
- 1 replies
- 0 kudos
I see a significant performance difference when calling spark.sessionState.catalog.listTables compared to spark.catalog.listTables. Is that expected?
Latest Reply
spark.sessionState.catalog.listTables is a lazier implementation: it does not pull the column details when listing the tables, hence it's faster. By contrast, spark.catalog.listTables pulls the column details as well. If the database has many Delta tabl...
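The difference can be seen by comparing the two calls side by side (database name is a placeholder):

```scala
// Internal catalog call: returns TableIdentifier objects (name + database)
// without resolving per-table metadata such as columns -- typically fast.
val fast = spark.sessionState.catalog.listTables("my_db") // Seq[TableIdentifier]

// Public API: returns org.apache.spark.sql.catalog.Table objects; resolving
// each table's details makes it noticeably slower on large databases.
val slow = spark.catalog.listTables("my_db") // Dataset[Table]
```

Note the internal API is not part of Spark's stable public surface, so it may change between versions.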
- 3004 Views
- 1 replies
- 0 kudos
I wanted to get a list of all the Delta tables in a database. What is the easiest way to get it?
Latest Reply
The code snippet below can be used to list the tables in a database:
val db = "database_name"
spark.sessionState.catalog.listTables(db).map(table=>spark.sessionState.catalog.externalCatalog.getTable(table.database.get,table.table)).filter(x=>x....
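The preview above is cut off at the filter. Assuming it filters on the table's provider (a guess, since the condition is truncated), a complete version might look like this:

```scala
val db = "database_name" // placeholder

// List all tables in the database, fetch each table's catalog metadata,
// and keep only those whose provider is Delta.
val deltaTables = spark.sessionState.catalog
  .listTables(db)
  .map(t => spark.sessionState.catalog.externalCatalog.getTable(t.database.getOrElse(db), t.table))
  .filter(_.provider.exists(_.equalsIgnoreCase("delta")))
  .map(_.identifier.table)

deltaTables.foreach(println)
```

Using `getOrElse(db)` guards against TableIdentifiers whose database field is empty, and the case-insensitive provider check avoids missing tables recorded as "Delta" vs "delta".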