Data Engineering

Forum Posts

enri_casca
by New Contributor III
  • 5028 Views
  • 13 replies
  • 2 kudos

Resolved! Couldn't convert string to float when fitting a model

Hi, I am very new to Databricks and I am trying to run quick experiments to understand the best practice for me, my colleagues and the company. I pull the data from Snowflake: df = spark.read \ .format("snowflake") \ .options(**options) \ .option('qu...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

can you check this SO topic?

12 More Replies
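For reference, a minimal sketch of the cast-before-fit pattern discussed in this thread, assuming the Snowflake Spark connector is configured on the cluster and that the failing columns are numeric values stored as strings; the connection options, query, and column names below are hypothetical.

```python
# Minimal sketch: Snowflake numeric columns often arrive in Spark as strings,
# which makes MLlib's fit() fail with "could not convert string to float".
# Casting the feature columns explicitly before assembling usually resolves it.
from pyspark.sql.functions import col
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

options = {                       # hypothetical connection settings
    "sfUrl": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "<db>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
}

# `spark` is the ambient SparkSession in a Databricks notebook.
df = (spark.read
      .format("snowflake")
      .options(**options)
      .option("query", "SELECT * FROM my_table")   # hypothetical query
      .load())

feature_cols = ["feature_a", "feature_b"]          # hypothetical column names
for c in feature_cols + ["label"]:
    df = df.withColumn(c, col(c).cast("double"))   # string -> numeric cast

assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
model = LinearRegression(labelCol="label").fit(assembler.transform(df))
```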
Michael_Galli
by Contributor II
  • 6296 Views
  • 7 replies
  • 8 kudos

Resolved! Monitoring Azure Databricks in an Azure Log Analytics Workspace

Does anyone have experience with the mspnp/spark-monitoring library? Is this best practice, or are there better ways to monitor a Databricks cluster?

Latest Reply
User16764241763
Honored Contributor
  • 8 kudos

@Michael Galli I don't think you can monitor metrics captured by mspnp/spark-monitoring in Datadog; there is a service called Azure Log Analytics workspace where these logs are available for querying. You can also check out below if you are interest...

6 More Replies
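As a follow-up to the reply above, a hedged sketch of querying the custom tables that mspnp/spark-monitoring writes into a Log Analytics workspace, using the azure-monitor-query package; the workspace ID is a placeholder and the table name should be verified against your own workspace schema.

```python
# Hedged sketch: query the SparkLoggingEvent_CL custom table that the
# mspnp/spark-monitoring library sends to Azure Log Analytics.
from datetime import timedelta
from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())

# KQL: hourly count of driver/executor log events over the last day.
query = """
SparkLoggingEvent_CL
| where TimeGenerated > ago(1d)
| summarize events = count() by bin(TimeGenerated, 1h)
"""

response = client.query_workspace(
    workspace_id="<log-analytics-workspace-id>",   # placeholder
    query=query,
    timespan=timedelta(days=1),
)

for table in response.tables:
    for row in table.rows:
        print(row)
```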
User16826992666
by Valued Contributor
  • 1108 Views
  • 3 replies
  • 2 kudos

Resolved! What is the best method for bringing an already trained model into MLflow?

I already have a trained and saved model that was created outside of MLflow. What is the best way to handle it if I want this model to be added to an MLflow experiment?

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Trevor Bishop, just wanted to check in if you were able to resolve your issue or do you need more help? We'd love to hear from you. Thanks!

2 More Replies
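A minimal sketch of one common answer to this question: load the externally trained model and log it into an MLflow run. The pickle path is hypothetical, and mlflow.sklearn.log_model assumes a scikit-learn model (other flavors, or mlflow.pyfunc, work analogously).

```python
# Minimal sketch: register a model that was trained and saved outside MLflow.
import pickle

import mlflow
import mlflow.sklearn

with open("/dbfs/models/pretrained_model.pkl", "rb") as f:   # hypothetical path
    model = pickle.load(f)

with mlflow.start_run(run_name="import-pretrained-model"):
    mlflow.sklearn.log_model(model, "model")                 # log under the run
    mlflow.set_tag("source", "trained outside MLflow")
```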
Ryan_Chynoweth
by Honored Contributor III
  • 675 Views
  • 1 reply
  • 0 kudos

Azure_DAAM

Attached to this post we have added an ADLS Gen2 access recommendation to have the ideal security and governance over your data. The best practice involves leveraging Cluster ACLs, cluster configuration, and secret ACLs to handle user access over you...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Ryan Chynoweth, thank you for posting this!

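Following the recommendation in the post above, a hedged sketch of the usual pattern: authenticate to ADLS Gen2 with a service principal whose credentials live in a secret scope, so access is governed by cluster and secret ACLs rather than keys pasted into notebooks. The storage account, scope, key, and path names are placeholders.

```python
# Hedged sketch: OAuth access to ADLS Gen2 via a service principal and secret scope.
storage_account = "mystorageaccount"   # placeholder
client_id     = dbutils.secrets.get(scope="adls-scope", key="sp-client-id")
client_secret = dbutils.secrets.get(scope="adls-scope", key="sp-client-secret")
tenant_id     = dbutils.secrets.get(scope="adls-scope", key="sp-tenant-id")

prefix = "fs.azure.account"
suffix = f"{storage_account}.dfs.core.windows.net"
spark.conf.set(f"{prefix}.auth.type.{suffix}", "OAuth")
spark.conf.set(f"{prefix}.oauth.provider.type.{suffix}",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"{prefix}.oauth2.client.id.{suffix}", client_id)
spark.conf.set(f"{prefix}.oauth2.client.secret.{suffix}", client_secret)
spark.conf.set(f"{prefix}.oauth2.client.endpoint.{suffix}",
               f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")

df = spark.read.parquet(f"abfss://mycontainer@{suffix}/path/to/data")  # placeholder
```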
User16869510359
by Esteemed Contributor
  • 1048 Views
  • 1 reply
  • 1 kudos

Resolved! What is the best practice for deleting the complete data from a Delta table

I have a use case where I need to delete the data completely and load new data to the existing Delta table. 

Latest Reply
User16869510359
Esteemed Contributor
  • 1 kudos

It's recommended to use the overwrite option. Overwrite the table data and run a VACUUM command. To delete the data from a managed Delta table, the DROP TABLE command can be used. If it's an external table, then run a DELETE query on the table and th...

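A minimal sketch of the "overwrite, then VACUUM" approach from the reply above; the table name is a placeholder, and the 168-hour retention shown is the default threshold.

```python
# Minimal sketch: replace the contents of an existing Delta table, then clean up
# the files that are no longer referenced by the table's latest version.
(new_df.write
    .format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "true")   # only needed if the schema also changes
    .saveAsTable("my_db.my_table"))      # placeholder table name

# Physically remove unreferenced files older than the retention window.
spark.sql("VACUUM my_db.my_table RETAIN 168 HOURS")
```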
User16783853501
by New Contributor II
  • 1173 Views
  • 3 replies
  • 0 kudos

best practice for optimizedWrites and Optimize

What is the best practice for a Delta pipeline with very high throughput to avoid the small files problem and also reduce the need for frequent external OPTIMIZE runs?

Latest Reply
User16869510359
Esteemed Contributor
  • 0 kudos

The general practice in use is to enable only optimize writes and disable auto-compaction. This is because the optimize writes will introduce an extra shuffle step which will increase the latency of the write operation. In addition to that, the auto-...

2 More Replies
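A hedged sketch of the configuration described in the reply above: turn on optimized writes and leave auto-compaction off, either for the session or as table properties, then schedule OPTIMIZE separately. The property names follow the Databricks Delta docs; the table name is a placeholder.

```python
# Session-level settings (apply to writes issued from this session).
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "true")
spark.conf.set("spark.databricks.delta.autoCompact.enabled", "false")

# Or pin the behaviour to one table via table properties.
spark.sql("""
    ALTER TABLE my_db.events SET TBLPROPERTIES (
        'delta.autoOptimize.optimizeWrite' = 'true',
        'delta.autoOptimize.autoCompact'   = 'false'
    )
""")

# Compaction is then run explicitly (e.g. on a schedule) instead of on every write.
spark.sql("OPTIMIZE my_db.events")
```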
Anonymous
by Not applicable
  • 1119 Views
  • 1 reply
  • 0 kudos
Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

If you expect a column to be commonly used in query predicates and if that column has high cardinality (that is, a large number of distinct values), then use ZORDER BY. You can specify multiple columns for ZORDER BY as a comma-separated list. However,...

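A minimal sketch of applying the advice above: Z-order the table on a high-cardinality column that shows up in query predicates, so data skipping can prune files. Table and column names are placeholders.

```python
# Co-locate related values so predicate queries read fewer files.
spark.sql("OPTIMIZE my_db.events ZORDER BY (user_id)")

# Queries that filter on the Z-ordered column benefit from file skipping.
spark.sql("SELECT count(*) FROM my_db.events WHERE user_id = 'u-123'").show()
```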
User16789201666
by Contributor II
  • 772 Views
  • 1 reply
  • 2 kudos

What is the best practice for generating jobs in an automated fashion?

What is the best practice for generating jobs in an automated fashion?

Latest Reply
User16789201666
Contributor II
  • 2 kudos

There are several approaches here. You can write an automation script that programmatically accesses the Databricks APIs to generate configured jobs. You can also utilize the Databricks Terraform provider. The benefit of the latter approach is that Terr...

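A hedged sketch of the first approach mentioned in the reply: create a job programmatically against the Jobs API 2.1 with plain requests (the Terraform provider is the declarative alternative). The workspace URL, secret scope, notebook path, and cluster spec are placeholders.

```python
# Hedged sketch: create a Databricks job via the REST API.
import requests

host = "https://<workspace>.cloud.databricks.com"        # placeholder
token = dbutils.secrets.get(scope="cicd", key="token")   # placeholder secret scope

job_spec = {
    "name": "nightly-etl",
    "tasks": [{
        "task_key": "main",
        "notebook_task": {"notebook_path": "/Repos/etl/nightly"},   # placeholder
        "new_cluster": {
            "spark_version": "13.3.x-scala2.12",
            "node_type_id": "Standard_DS3_v2",
            "num_workers": 2,
        },
    }],
    "schedule": {"quartz_cron_expression": "0 0 2 * * ?", "timezone_id": "UTC"},
}

resp = requests.post(
    f"{host}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json=job_spec,
)
resp.raise_for_status()
print("Created job:", resp.json()["job_id"])
```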