Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Forum Posts

Vik1
by New Contributor II
  • 2754 Views
  • 4 replies
  • 2 kudos

Resolved! Cluster setup for ML work for Pandas in Spark, and vanilla Python.

My setup:
Worker type: Standard_D32d_v4, 128 GB Memory, 32 Cores, Min Workers: 2, Max Workers: 8
Driver type: Standard_D32ds_v4, 128 GB Memory, 32 Cores
Databricks Runtime Version: 10.2 ML (includes Apache Spark 3.2.0, Scala 2.12)
I ran a Snowflake quer...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hey there @Vivek Ranjan, checking in. If Joseph's answer helped, would you let us know and mark it as the best answer? That would help other members find the solution more quickly. Thanks!

3 More Replies
self-employed
by Contributor
  • 1249 Views
  • 3 replies
  • 3 kudos

Resolved! Is the machine learning part of "Apache Spark™ Tutorial: Getting Started with Apache Spark on Databricks" missing or no longer available?

I am following the Apache Spark™ Tutorial. After finishing the data set part, I tried to continue to the machine learning part, but the page is empty. The section after machine learning loads fine, so I suspect a URL mismatch. The url ...

Latest Reply
self-employed
Contributor
  • 3 kudos

I cleared my cookies and the link works again, so it was a cookie issue.

2 More Replies
Joseph_B
by New Contributor III
  • 847 Views
  • 0 replies
  • 1 kudos

mlflow.org

2021-09 webinar: Automating the ML Lifecycle With Databricks Machine Learning (Post 2 of 2)
Thank you to everyone who joined! You can access the on-demand recording here and the code in this GitHub repo. We're sharing a subset of the questions asked an...

Joseph_B
by New Contributor III
  • 601 Views
  • 0 replies
  • 1 kudos

docs.databricks.com

2021-09 webinar: Automating the ML Lifecycle With Databricks Machine Learning (Post 1 of 2)
Thank you to everyone who joined the Automating the ML Lifecycle With Databricks Machine Learning webinar! You can access the on-demand recording here and the ...

User16752240150
by New Contributor II
  • 2265 Views
  • 1 replies
  • 0 kudos

What's the best way to implement long term data versioning?

I'm a data scientist creating versioned ML models. For compliance reasons, I need to be able to replicate the training data for each model version. I've seen that you can version datasets by using Delta, but the default retention period is around 30 ...

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

As you mentioned, Delta supports time travel, and by default Delta tables retain the commit history for 30 days. Operations on the table history run in parallel but become more expensive as the log grows. Now, in this case - s...
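
Editor's sketch for the retention point above: the reply is truncated, so this is not necessarily what it goes on to recommend. One common way to keep time travel usable for longer is to raise the two Delta table properties `delta.logRetentionDuration` and `delta.deletedFileRetentionDuration`. The helper name `retention_sql`, the table name `ml.training_data`, and the 365-day value are hypothetical; the helper only builds the SQL string, so it runs without a cluster.

```python
def retention_sql(table: str, days: int) -> str:
    """Build an ALTER TABLE statement that extends Delta retention.

    Time travel needs both the transaction log entries and the underlying
    data files, so both properties are raised to the same interval.
    """
    if days <= 0:
        raise ValueError("days must be a positive integer")
    interval = f"interval {days} days"
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES ("
        f"'delta.logRetentionDuration' = '{interval}', "
        f"'delta.deletedFileRetentionDuration' = '{interval}')"
    )


# Hypothetical table name and retention period, for illustration only.
statement = retention_sql("ml.training_data", 365)
print(statement)

# In a Databricks notebook, where `spark` is predefined, this would be run as:
# spark.sql(statement)
```

Longer retention means more stored files and a larger log, so the cost noted in the reply still applies. Running VACUUM with a shorter retention window than the model's lifetime would remove the files that older versions depend on.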

User16752239203
by New Contributor
  • 675 Views
  • 1 replies
  • 0 kudos

How can I use non-Spark libraries like spaCy with Databricks and Spark?

I have an NLP application that I built on my local machine using spaCy and pandas, but now I would like to scale it to a large production dataset and take advantage of Spark's distributed compute. How do I import and utilize a librar...

Latest Reply
sean_owen
Honored Contributor II
  • 0 kudos

It depends on what you mean, but if you're just trying to (say) tokenize and process data with spacy in parallel, then that's trivial. Write a 'pandas UDF' function that expresses how you want to transform data using spacy, in terms of a pandas DataF...
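
Editor's sketch of the pandas UDF approach described in the reply above, under stated assumptions: the reply is truncated, so the details here are illustrative rather than the author's own code. `tokenize_batch` and `make_tokenize_udf` are hypothetical names, and `spacy.blank("en")` (a tokenizer-only pipeline) stands in for whatever spaCy model the application actually uses. The Spark and spaCy imports are deferred so that the batch logic can be tried locally without either library installed.

```python
from typing import Callable, Iterable, List


def tokenize_batch(
    texts: Iterable[object],
    tokenize: Callable[[str], List[str]] = str.split,
) -> List[List[str]]:
    """Tokenize one batch of values; non-string values (e.g. nulls) give []."""
    return [tokenize(t) if isinstance(t, str) else [] for t in texts]


def make_tokenize_udf():
    """Wrap the batch logic in a Spark pandas UDF (needs pyspark, pandas, spacy)."""
    from typing import Iterator

    import pandas as pd
    import spacy
    from pyspark.sql.functions import pandas_udf
    from pyspark.sql.types import ArrayType, StringType

    @pandas_udf(ArrayType(StringType()))
    def tokenize_udf(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
        # Build the pipeline once per task rather than once per batch.
        nlp = spacy.blank("en")
        for texts in batches:
            tokens = tokenize_batch(texts, lambda t: [tok.text for tok in nlp(t)])
            yield pd.Series(tokens)

    return tokenize_udf


# Local check of the batch logic, using the whitespace fallback tokenizer.
print(tokenize_batch(["spaCy on Spark", None]))
```

On a cluster with spaCy installed, usage would look something like `df.withColumn("tokens", make_tokenize_udf()(df["text"]))`, where `df` and the `text` column are placeholders for the production dataset.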

User16826994223
by Honored Contributor III
  • 387 Views
  • 0 replies
  • 0 kudos

Databricks Certified Professional Data Scientist: Does this exam require Databricks-specific or Spark-specific knowledge?

Databricks Certified Professional Data Scientist
Does this exam require Databricks-specific or Spark-specific knowledge? No. Test-takers will be assessed on their understanding of the basics of machine learning and data science, how to complete each ...

Joseph_B
by New Contributor III
  • 1816 Views
  • 1 replies
  • 1 kudos
Latest Reply
Joseph_B
New Contributor III
  • 1 kudos

You can find much more information on the MLflow product page, including a comparison table at the bottom. I'd summarize that comparison as follows: Databricks provides three key things in its managed MLflow service.
Security: MLflow experiments, models, ...

Joseph_B
by New Contributor III
  • 2057 Views
  • 1 replies
  • 0 kudos
Latest Reply
Joseph_B
New Contributor III
  • 0 kudos

You can find the MLflow version in the runtime release notes, along with a list of every other library provided. E.g., for DBR 8.3 ML, you can look at the release notes for AWS, Azure, or GCP.
The MLflow client API (i.e., the API provided by installi...
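
Editor's sketch, as a complement to the release notes rather than something the reply itself suggests: the installed client version can also be checked from inside a notebook or any Python environment using the standard library. The helper name `installed_version` is hypothetical, and the snippet assumes Python 3.8 or later.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Reports the MLflow client version available in the current environment.
print("mlflow:", installed_version("mlflow") or "not installed")
```

This reports only the client library in the current environment; it says nothing about the version of the tracking server the client talks to.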

User16788317466
by New Contributor II
  • 1231 Views
  • 2 replies
  • 0 kudos

How do I efficiently read image data for a deep learning model?

How do I efficiently read image data for a deep learning model?

Latest Reply
Joseph_B
New Contributor III
  • 0 kudos

Our documentation provides nice examples of preparing image data for training and inference.
Training: See docs for AWS, Azure, GCP
Inference: See reference solution for AWS, Azure, GCP

1 More Replies