Data Engineering

Forum Posts

enri_casca
by New Contributor III
  • 5028 Views
  • 13 replies
  • 2 kudos

Resolved! Couldn't convert string to float when fitting a model

Hi, I am very new to Databricks and I am trying to run quick experiments to understand the best practice for me, my colleagues and the company. I pull the data from Snowflake: df = spark.read \ .format("snowflake") \ .options(**options) \ .option('qu...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

can you check this SO topic?

12 More Replies
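For reference, a minimal sketch of the cast-before-fit pattern discussed in this thread, assuming the Snowflake Spark connector is configured on the cluster and that the failing columns are numeric values stored as strings; the connection options, query, and column names below are hypothetical.

```python
# Minimal sketch: Snowflake numeric columns often arrive in Spark as strings,
# which makes MLlib's fit() fail with "could not convert string to float".
# Casting the feature columns explicitly before assembling usually resolves it.
from pyspark.sql.functions import col
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

options = {                       # hypothetical connection settings
    "sfUrl": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "<db>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
}

# `spark` is the ambient SparkSession in a Databricks notebook.
df = (spark.read
      .format("snowflake")
      .options(**options)
      .option("query", "SELECT * FROM my_table")   # hypothetical query
      .load())

feature_cols = ["feature_a", "feature_b"]          # hypothetical column names
for c in feature_cols + ["label"]:
    df = df.withColumn(c, col(c).cast("double"))   # string -> numeric cast

assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
model = LinearRegression(labelCol="label").fit(assembler.transform(df))
```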
Michael_Galli
by Contributor II
  • 6296 Views
  • 7 replies
  • 8 kudos

Resolved! Monitoring Azure Databricks in an Azure Log Analytics Workspace

Does anyone have experience with the mspnp/spark-monitoring library? Is this best practice, or are there better ways to monitor a Databricks cluster?

Latest Reply
User16764241763
Honored Contributor
  • 8 kudos

@Michael Galli I don't think you can monitor metrics captured by mspnp/spark-monitoring in Datadog; there is a service called Azure Log Analytics workspace where these logs are available for querying. You can also check out below if you are interest...

6 More Replies
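As a follow-up to the reply above, a hedged sketch of querying the custom tables that mspnp/spark-monitoring writes into a Log Analytics workspace, using the azure-monitor-query package; the workspace ID is a placeholder and the table name should be verified against your own workspace schema.

```python
# Hedged sketch: query the SparkLoggingEvent_CL custom table that the
# mspnp/spark-monitoring library sends to Azure Log Analytics.
from datetime import timedelta
from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())

# KQL: hourly count of driver/executor log events over the last day.
query = """
SparkLoggingEvent_CL
| where TimeGenerated > ago(1d)
| summarize events = count() by bin(TimeGenerated, 1h)
"""

response = client.query_workspace(
    workspace_id="<log-analytics-workspace-id>",   # placeholder
    query=query,
    timespan=timedelta(days=1),
)

for table in response.tables:
    for row in table.rows:
        print(row)
```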
User16826992666
by Valued Contributor
  • 1108 Views
  • 3 replies
  • 2 kudos

Resolved! What is the best method for bringing an already trained model into MLflow?

I already have a trained and saved model that was created outside of MLflow. What is the best way to handle it if I want this model to be added to an MLflow experiment?

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Trevor Bishop, just wanted to check in if you were able to resolve your issue or do you need more help? We'd love to hear from you. Thanks!

2 More Replies
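A minimal sketch of one common answer to this question: load the externally trained model and log it into an MLflow run. The pickle path is hypothetical, and mlflow.sklearn.log_model assumes a scikit-learn model (other flavors, or mlflow.pyfunc, work analogously).

```python
# Minimal sketch: register a model that was trained and saved outside MLflow.
import pickle

import mlflow
import mlflow.sklearn

with open("/dbfs/models/pretrained_model.pkl", "rb") as f:   # hypothetical path
    model = pickle.load(f)

with mlflow.start_run(run_name="import-pretrained-model"):
    mlflow.sklearn.log_model(model, "model")                 # log under the run
    mlflow.set_tag("source", "trained outside MLflow")
```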
Ryan_Chynoweth
by Honored Contributor III
  • 675 Views
  • 1 reply
  • 0 kudos

Azure_DAAM

Attached to this post we have added an ADLS Gen2 access recommendation to have the ideal security and governance over your data. The best practice involves leveraging Cluster ACLs, cluster configuration, and secret ACLs to handle user access over you...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Ryan Chynoweth, thank you for posting this!

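Following the recommendation in the post above, a hedged sketch of the usual pattern: authenticate to ADLS Gen2 with a service principal whose credentials live in a secret scope, so access is governed by cluster and secret ACLs rather than keys pasted into notebooks. The storage account, scope, key, and path names are placeholders.

```python
# Hedged sketch: OAuth access to ADLS Gen2 via a service principal and secret scope.
storage_account = "mystorageaccount"   # placeholder
client_id     = dbutils.secrets.get(scope="adls-scope", key="sp-client-id")
client_secret = dbutils.secrets.get(scope="adls-scope", key="sp-client-secret")
tenant_id     = dbutils.secrets.get(scope="adls-scope", key="sp-tenant-id")

prefix = "fs.azure.account"
suffix = f"{storage_account}.dfs.core.windows.net"
spark.conf.set(f"{prefix}.auth.type.{suffix}", "OAuth")
spark.conf.set(f"{prefix}.oauth.provider.type.{suffix}",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"{prefix}.oauth2.client.id.{suffix}", client_id)
spark.conf.set(f"{prefix}.oauth2.client.secret.{suffix}", client_secret)
spark.conf.set(f"{prefix}.oauth2.client.endpoint.{suffix}",
               f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")

df = spark.read.parquet(f"abfss://mycontainer@{suffix}/path/to/data")  # placeholder
```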
User16869510359
by Esteemed Contributor
  • 1048 Views
  • 1 reply
  • 1 kudos

Resolved! What is the best practice for deleting the complete data from a Delta table

I have a use case where I need to delete the data completely and load new data to the existing Delta table. 

Latest Reply
User16869510359
Esteemed Contributor
  • 1 kudos

It's recommended to use the overwrite option. Overwrite the table data and run a VACUUM command. To delete the data from a managed Delta table, the DROP TABLE command can be used. If it's an external table, then run a DELETE query on the table and th...

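A minimal sketch of the "overwrite, then VACUUM" approach from the reply above; the table name is a placeholder, and the 168-hour retention shown is the default threshold.

```python
# Minimal sketch: replace the contents of an existing Delta table, then clean up
# the files that are no longer referenced by the table's latest version.
(new_df.write
    .format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "true")   # only needed if the schema also changes
    .saveAsTable("my_db.my_table"))      # placeholder table name

# Physically remove unreferenced files older than the retention window.
spark.sql("VACUUM my_db.my_table RETAIN 168 HOURS")
```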
User16783853501
by New Contributor II
  • 1173 Views
  • 3 replies
  • 0 kudos

best practice for optimizedWrites and Optimize

What is the best practice for a Delta pipeline with very high throughput to avoid the small files problem and also reduce the need for frequent external OPTIMIZE runs?

Latest Reply
User16869510359
Esteemed Contributor
  • 0 kudos

The general practice in use is to enable only optimize writes and disable auto-compaction. This is because the optimize writes will introduce an extra shuffle step which will increase the latency of the write operation. In addition to that, the auto-...

2 More Replies
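A hedged sketch of the configuration described in the reply above: turn on optimized writes and leave auto-compaction off, either for the session or as table properties, then schedule OPTIMIZE separately. The property names follow the Databricks Delta docs; the table name is a placeholder.

```python
# Session-level settings (apply to writes issued from this session).
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "true")
spark.conf.set("spark.databricks.delta.autoCompact.enabled", "false")

# Or pin the behaviour to one table via table properties.
spark.sql("""
    ALTER TABLE my_db.events SET TBLPROPERTIES (
        'delta.autoOptimize.optimizeWrite' = 'true',
        'delta.autoOptimize.autoCompact'   = 'false'
    )
""")

# Compaction is then run explicitly (e.g. on a schedule) instead of on every write.
spark.sql("OPTIMIZE my_db.events")
```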
Anonymous
by Not applicable
  • 1119 Views
  • 1 reply
  • 0 kudos
Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

If you expect a column to be commonly used in query predicates and if that column has high cardinality (that is, a large number of distinct values), then use ZORDER BY. You can specify multiple columns for ZORDER BY as a comma-separated list. However,...

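A minimal sketch of applying the advice above: Z-order the table on a high-cardinality column that shows up in query predicates, so data skipping can prune files. Table and column names are placeholders.

```python
# Co-locate related values so predicate queries read fewer files.
spark.sql("OPTIMIZE my_db.events ZORDER BY (user_id)")

# Queries that filter on the Z-ordered column benefit from file skipping.
spark.sql("SELECT count(*) FROM my_db.events WHERE user_id = 'u-123'").show()
```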
User16789201666
by Contributor II
  • 772 Views
  • 1 reply
  • 2 kudos

What is the best practice for generating jobs in an automated fashion?

What is the best practice for generating jobs in an automated fashion?

Latest Reply
User16789201666
Contributor II
  • 2 kudos

There are several approaches here. You can write an automation script that programmatically accesses the Databricks APIs to generate configured jobs. You can also utilize the Databricks Terraform provider. The benefit of the latter approach is that Terr...

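A hedged sketch of the first approach mentioned in the reply: create a job programmatically against the Jobs API 2.1 with plain requests (the Terraform provider is the declarative alternative). The workspace URL, secret scope, notebook path, and cluster spec are placeholders.

```python
# Hedged sketch: create a Databricks job via the REST API.
import requests

host = "https://<workspace>.cloud.databricks.com"        # placeholder
token = dbutils.secrets.get(scope="cicd", key="token")   # placeholder secret scope

job_spec = {
    "name": "nightly-etl",
    "tasks": [{
        "task_key": "main",
        "notebook_task": {"notebook_path": "/Repos/etl/nightly"},   # placeholder
        "new_cluster": {
            "spark_version": "13.3.x-scala2.12",
            "node_type_id": "Standard_DS3_v2",
            "num_workers": 2,
        },
    }],
    "schedule": {"quartz_cron_expression": "0 0 2 * * ?", "timezone_id": "UTC"},
}

resp = requests.post(
    f"{host}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json=job_spec,
)
resp.raise_for_status()
print("Created job:", resp.json()["job_id"])
```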