Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Forum Posts

self-employed
by Contributor
  • 2088 Views
  • 1 replies
  • 3 kudos

Resolved! Is the machine learning part of "Apache Spark™ Tutorial: Getting Started with Apache Spark on Databricks" missing or no longer available?

I am following the Apache Spark™ Tutorial. When I finished the data set part and wanted to continue with the machine learning part, I found that the page was empty. The next section after machine learning is fine, so I guess there must be a URL mismatch. The url ...

Latest Reply
self-employed
Contributor
  • 3 kudos

I cleared my cookies and the link works again, so it was a cookie issue.

  • 3 kudos
Joseph_B
by Databricks Employee
  • 1338 Views
  • 0 replies
  • 1 kudos

mlflow.org

2021-09 webinar: Automating the ML Lifecycle With Databricks Machine Learning (Post 2 of 2)

Thank you to everyone who joined! You can access the on-demand recording here and the code in this GitHub repo. We're sharing a subset of the questions asked an...

Joseph_B
by Databricks Employee
  • 927 Views
  • 0 replies
  • 1 kudos

docs.databricks.com

2021-09 webinar: Automating the ML Lifecycle With Databricks Machine Learning (post 1 of 2)

Thank you to everyone who joined the Automating the ML Lifecycle With Databricks Machine Learning webinar! You can access the on-demand recording here and the ...

User16752240150
by New Contributor II
  • 3858 Views
  • 1 replies
  • 0 kudos

What's the best way to implement long term data versioning?

I'm a data scientist creating versioned ML models. For compliance reasons, I need to be able to replicate the training data for each model version. I've seen that you can version datasets by using Delta, but the default retention period is around 30 ...

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

Delta, as you mentioned, has a time travel feature, and by default Delta tables retain the commit history for 30 days. Operations on the history of the table run in parallel but become more expensive as the log size increases. Now, in this case - s...

  • 0 kudos
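
The time travel and retention behavior mentioned in the reply above can be sketched roughly as follows. This is a minimal illustration, not the reply's own recommendation (which is cut off in the excerpt): the helper names `retention_statement` and `read_version` and the table name used below are hypothetical, while `delta.logRetentionDuration`, `delta.deletedFileRetentionDuration`, and the `versionAsOf` read option are standard Delta Lake settings.

```python
def retention_statement(table: str, days: int) -> str:
    """Return SQL that extends Delta log and deleted-file retention to `days`."""
    interval = f"interval {days} days"
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES ("
        f"'delta.logRetentionDuration' = '{interval}', "
        f"'delta.deletedFileRetentionDuration' = '{interval}')"
    )


def read_version(spark, path: str, version: int):
    """Load the snapshot of the Delta table at `path` as of a commit version.

    `spark` is an existing SparkSession, e.g. the one a notebook provides.
    """
    return spark.read.format("delta").option("versionAsOf", version).load(path)


# Hypothetical table name, for illustration only.
print(retention_statement("training_data", 365))
```

In a notebook, the statement could be passed to `spark.sql(...)`. Longer retention keeps old data files around, so storage cost grows with the window; archiving a copy of the training set per model version is another option worth weighing.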
User16752239203
by Databricks Employee
  • 1251 Views
  • 1 replies
  • 0 kudos

How can I use non-Spark libraries like spaCy with Databricks and Spark?

I have an NLP application that I built on my local machine using spaCy and pandas, but now I would like to scale my application to a large production dataset and take advantage of Spark's distributed compute. How do I import and utilize a librar...

Latest Reply
sean_owen
Databricks Employee
  • 0 kudos

It depends on what you mean, but if you're just trying to (say) tokenize and process data with spaCy in parallel, then that's trivial. Write a 'pandas UDF' function that expresses how you want to transform data using spaCy, in terms of a pandas DataF...

  • 0 kudos
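
The pandas UDF approach described in the reply above might look something like this. It is a sketch under assumptions, not the reply's exact code: the helper names (`spacy_tokenize`, `make_tokenize_fn`, `as_pandas_udf`) are hypothetical, a blank English spaCy pipeline stands in for whatever model the application really uses, and spaCy and PySpark are imported lazily so the pandas part can be tried on its own.

```python
import pandas as pd

_NLP = None  # spaCy pipeline, cached once per Python worker process


def spacy_tokenize(text: str) -> list:
    """Tokenize one string with a blank English spaCy pipeline."""
    global _NLP
    if _NLP is None:
        import spacy  # lazy import: only needed where the tokenizer runs
        _NLP = spacy.blank("en")
    return [token.text for token in _NLP(text)]


def make_tokenize_fn(tokenizer=spacy_tokenize):
    """Build a Series -> Series function that can be wrapped as a pandas UDF."""
    def tokenize(texts: pd.Series) -> pd.Series:
        # Treat missing values as empty strings, then tokenize each row.
        return texts.fillna("").map(tokenizer)
    return tokenize


def as_pandas_udf(fn):
    """Wrap `fn` as a Spark pandas UDF returning array<string> (needs pyspark)."""
    from pyspark.sql.functions import pandas_udf
    return pandas_udf(fn, returnType="array<string>")


# Local check with a plain whitespace tokenizer standing in for spaCy.
print(make_tokenize_fn(str.split)(pd.Series(["hello big world", None])).tolist())
```

On a cluster with spaCy installed, something like `df.withColumn("tokens", as_pandas_udf(make_tokenize_fn())("text"))` would apply it in parallel, where `df` and the `text` column are placeholders for your own data.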
User16826994223
by Honored Contributor III
  • 741 Views
  • 0 replies
  • 0 kudos

Databricks Certified Professional Data Scientist: Does this exam require Databricks-specific or Spark-specific knowledge? No. Test-takers will be asse...

Databricks Certified Professional Data Scientist: Does this exam require Databricks-specific or Spark-specific knowledge? No. Test-takers will be assessed on their understanding of the basics of machine learning and data science, how to complete each ...

Joseph_B
by Databricks Employee
  • 2653 Views
  • 1 replies
  • 1 kudos
Latest Reply
Joseph_B
Databricks Employee
  • 1 kudos

You can find a lot more info on this at this MLflow product page, including a comparison table at the bottom. I'd summarize that comparison as: Databricks provides three key things in its managed MLflow service. Security: MLflow experiments, models, ...

  • 1 kudos
Joseph_B
by Databricks Employee
  • 3272 Views
  • 1 replies
  • 0 kudos
Latest Reply
Joseph_B
Databricks Employee
  • 0 kudos

You can find the MLflow version in the runtime release notes, along with a list of every other library provided. E.g., for DBR 8.3 ML, you can look at the release notes for AWS, Azure, or GCP. The MLflow client API (i.e., the API provided by installi...

  • 0 kudos
User16788317466
by Databricks Employee
  • 1875 Views
  • 2 replies
  • 0 kudos

How do I efficiently read image data for a deep learning model?


Latest Reply
Joseph_B
Databricks Employee
  • 0 kudos

Our documentation provides nice examples of preparing image data for training and inference.
Training: See docs for AWS, Azure, GCP
Inference: See reference solution for AWS, Azure, GCP

  • 0 kudos
1 More Replies