- 5028 Views
- 13 replies
- 2 kudos
Hi, I am very new to Databricks and I am trying to run quick experiments to understand the best practices for me, my colleagues, and the company. I pull the data from Snowflake: `df = spark.read.format("snowflake").options(**options).option('qu...`
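The truncated snippet above is using the Spark Snowflake connector. A minimal sketch of that pattern is below; the option names (`sfUrl`, `sfUser`, etc.) follow the connector's documented interface, while the connection values and any secret scope names are placeholders:

```python
# Sketch of reading from Snowflake into a Spark DataFrame.
# Connection values here are placeholders, not working credentials.

def snowflake_options(user, password, url, database, schema, warehouse):
    """Build the options dict expected by the Spark Snowflake connector."""
    return {
        "sfUrl": url,            # e.g. "<account>.snowflakecomputing.com"
        "sfUser": user,
        "sfPassword": password,
        "sfDatabase": database,
        "sfSchema": schema,
        "sfWarehouse": warehouse,
    }

def read_snowflake(spark, options, query):
    """Run a query in Snowflake and return the result as a DataFrame."""
    return (
        spark.read.format("snowflake")
        .options(**options)
        .option("query", query)   # or .option("dbtable", "<table>")
        .load()
    )
```

In Databricks, the credentials should come from a secret scope (e.g. `dbutils.secrets.get("snowflake", "password")`) rather than being hard-coded in the notebook.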
- 6296 Views
- 7 replies
- 8 kudos
Does anyone have experience with the mspnp/spark-monitoring library ?Is this best practice, or are there better ways to monitor a Databricks Cluster?
Latest Reply
@Michael Galli I don't think you can monitor metrics captured by mspnp/spark-monitoring in Datadog; there is a service called Azure Log Analytics workspace where these logs are available for querying. You can also check out below if you are interest...
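As a hedged sketch of the reply's suggestion: mspnp/spark-monitoring writes to custom Log Analytics tables (per its documentation, e.g. `SparkMetric_CL`), which you can query programmatically with the Azure Monitor query SDK. The workspace ID and metric/column names below are assumptions for illustration:

```python
# Hypothetical sketch: querying the custom log tables that
# mspnp/spark-monitoring sends to an Azure Log Analytics workspace.
# Table and column names (SparkMetric_CL, name_s, value_d) are taken
# from the library's docs; the metric name is a placeholder.

def build_metric_query(metric_name, hours=1):
    """Build a KQL query string for a named Spark metric."""
    return (
        "SparkMetric_CL "
        f"| where name_s contains '{metric_name}' "
        f"| where TimeGenerated > ago({hours}h) "
        "| project TimeGenerated, name_s, value_d"
    )

def run_query(workspace_id, kql):
    """Execute the query with the azure-monitor-query SDK."""
    from azure.identity import DefaultAzureCredential   # pip install azure-identity
    from azure.monitor.query import LogsQueryClient     # pip install azure-monitor-query
    client = LogsQueryClient(DefaultAzureCredential())
    # timespan=None lets the KQL itself bound the time range.
    return client.query_workspace(workspace_id, kql, timespan=None)
```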
- 1108 Views
- 3 replies
- 2 kudos
I already have a trained and saved model that was created outside of MLflow. What is the best way to handle it if I want this model to be added to an MLflow experiment?
Latest Reply
Hi @Trevor Bishop Just wanted to check in if you were able to resolve your issue or do you need more help? We'd love to hear from you. Thanks!
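For the question above, one common approach is to load the already-trained model and log it into a run of an MLflow experiment. A minimal sketch, assuming a scikit-learn model saved with pickle (the paths and experiment name are placeholders):

```python
# Sketch: attaching a model trained outside MLflow to an MLflow
# experiment by logging it in a fresh run. Assumes a pickled
# scikit-learn model; adjust the flavor (mlflow.sklearn, mlflow.pyfunc,
# ...) to match your model type.

import pickle

def log_existing_model(model_path, experiment_name):
    import mlflow
    import mlflow.sklearn

    with open(model_path, "rb") as f:
        model = pickle.load(f)

    mlflow.set_experiment(experiment_name)
    with mlflow.start_run() as run:
        # Log the pre-trained model as a run artifact so it appears in
        # the experiment UI and can later be registered.
        mlflow.sklearn.log_model(model, artifact_path="model")
        mlflow.set_tag("trained_outside_mlflow", "true")
    return run.info.run_id
```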
- 675 Views
- 1 reply
- 0 kudos
Attached to this post we have added an ADLS Gen2 access recommendation for ideal security and governance over your data. The best practice involves leveraging cluster ACLs, cluster configuration, and secret ACLs to handle user access over you...
Latest Reply
Hi @Ryan Chynoweth, thank you for posting this!
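As a sketch of the cluster-configuration part of that recommendation: ADLS Gen2 access via a service principal is typically configured with OAuth Spark properties, with the client secret referenced from a Databricks secret scope so it never appears in plain text. The storage account, tenant ID, and secret scope/key names below are placeholders:

```python
# Sketch of Spark config for service-principal (OAuth) access to
# ADLS Gen2. All identifiers are placeholders; the "{{secrets/...}}"
# syntax makes the cluster resolve the value from a secret scope.

def adls_oauth_conf(storage_account, tenant_id, client_id, secret_scope, secret_key):
    sfx = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{sfx}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{sfx}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{sfx}": client_id,
        # Secret-scope reference, not the literal secret value.
        f"fs.azure.account.oauth2.client.secret.{sfx}":
            f"{{{{secrets/{secret_scope}/{secret_key}}}}}",
        f"fs.azure.account.oauth2.client.endpoint.{sfx}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }
```

These key/value pairs would go in the cluster's Spark config, which (combined with cluster ACLs restricting who can attach) scopes the data access to approved users.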
- 1048 Views
- 1 reply
- 1 kudos
I have a use case where I need to delete the data completely and load new data to the existing Delta table.
Latest Reply
It's recommended to use the overwrite option: overwrite the table data and then run a VACUUM command. To delete the data from a managed Delta table, the DROP TABLE command can be used. If it's an external table, then run a DELETE query on the table and th...
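The "overwrite, then VACUUM" approach from the reply can be sketched as follows; the table name and retention window are placeholders:

```python
# Sketch of replacing a Delta table's contents and reclaiming the old
# data files. Table name and retention are placeholders.

def vacuum_statement(table_name, retention_hours=168):
    """Build the VACUUM SQL (default retention is 7 days = 168 hours)."""
    return f"VACUUM {table_name} RETAIN {retention_hours} HOURS"

def reload_delta_table(spark, new_df, table_name, retention_hours=168):
    # Atomically replace the table contents with the new data.
    new_df.write.format("delta").mode("overwrite").saveAsTable(table_name)
    # Remove data files no longer referenced by the table.
    spark.sql(vacuum_statement(table_name, retention_hours))
```

Note that retaining fewer than the default 168 hours limits time travel on the table, so shorten the window only if you are sure you won't need the old versions.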
- 1173 Views
- 3 replies
- 0 kudos
What is the best practice for a delta pipeline with very high throughput to avoid small files problem and also reduce the need for external OPTIMIZE frequently?
Latest Reply
The general practice in use is to enable only optimized writes and disable auto-compaction, because optimized writes introduce an extra shuffle step, which increases the latency of the write operation. In addition to that, the auto-...
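The session-level settings described in the reply can be sketched as below; whether this trade-off is right depends on your write-latency requirements:

```python
# Sketch of the settings from the reply: optimized writes on,
# auto-compaction off, applied at the Spark session level.

DELTA_WRITE_CONF = {
    "spark.databricks.delta.optimizeWrite.enabled": "true",
    "spark.databricks.delta.autoCompact.enabled": "false",
}

def apply_delta_write_conf(spark, conf=DELTA_WRITE_CONF):
    for key, value in conf.items():
        spark.conf.set(key, value)
```

The same behavior can also be set per table via the `delta.autoOptimize.optimizeWrite` and `delta.autoOptimize.autoCompact` table properties.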
- 772 Views
- 1 reply
- 2 kudos
What is the best practice for generating jobs in an automated fashion?
Latest Reply
There are several approaches here. You can write an automation script that programmatically accesses the Databricks APIs to generate configured jobs. You can also utilize the Databricks Terraform provider. The benefit of the latter approach is that Terr...
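The first approach (scripting against the API) can be sketched with a call to the Jobs API 2.1 `jobs/create` endpoint; the workspace host, token, notebook path, and cluster settings below are placeholders:

```python
# Sketch of creating a job via the Databricks Jobs API 2.1.
# Host, token, notebook path, and cluster settings are placeholders.

def job_payload(name, notebook_path, node_type="Standard_DS3_v2", workers=2):
    """Build a minimal single-task job definition."""
    return {
        "name": name,
        "tasks": [{
            "task_key": "main",
            "notebook_task": {"notebook_path": notebook_path},
            "new_cluster": {
                "spark_version": "13.3.x-scala2.12",
                "node_type_id": node_type,
                "num_workers": workers,
            },
        }],
    }

def create_job(host, token, payload):
    import requests
    resp = requests.post(
        f"{host}/api/2.1/jobs/create",
        headers={"Authorization": f"Bearer {token}"},
        json=payload,
    )
    resp.raise_for_status()
    return resp.json()["job_id"]
```

With the Terraform provider, the same job would instead be declared as a `databricks_job` resource, so its definition lives in version-controlled state.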