Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Forum Posts

User16826994223
by Databricks Employee
  • 1836 Views
  • 0 replies
  • 0 kudos

Delta Sharing features: Share live data directly - Easily share existing, live data in your Delta Lake without copying it to another system. Support d...

Delta Sharing features: Share live data directly - Easily share existing, live data in your Delta Lake without copying it to another system. Support diverse clients - Data recipients can directly connect to Delta Shares from pandas, Apache Spark™, Rus...
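To make the "diverse clients" point concrete, here is a minimal sketch using the open-source delta-sharing Python connector; the profile path and the share/schema/table names below are hypothetical:

# pip install delta-sharing
import delta_sharing

profile = "/dbfs/FileStore/config.share"   # profile file from the data provider (hypothetical path)

# List the tables exposed through the share
client = delta_sharing.SharingClient(profile)
print(client.list_all_tables())

# Load one shared table straight into a pandas DataFrame
# (URL format: <profile>#<share>.<schema>.<table>; the names here are made up)
df = delta_sharing.load_as_pandas(f"{profile}#my_share.default.trips")
print(df.head())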

User16789201666
by Databricks Employee
  • 1981 Views
  • 1 replies
  • 0 kudos

When would you use the Feature Store?

For example, would you use a feature store on your raw data, or what is the granularity of the features in the store?

Latest Reply
Joseph_B
Databricks Employee
  • 0 kudos

I'll try to answer the broad question first, followed by the specific ones. When would you use the Feature Store? A Feature Store is primarily used to solve 2 challenges. (1) Discoverability and governance of features. Challenge: In a large team or organi...
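To make the discoverability point concrete, here is a minimal sketch of registering a feature table with the Databricks Feature Store client. This is a sketch only: the source table, feature names, and registered table name are made up, and the API surface varies slightly across ML runtime versions.

from pyspark.sql import functions as F
from databricks.feature_store import FeatureStoreClient

# Hypothetical source table: features are computed at the customer grain, not on raw events
customer_features = (
    spark.table("raw.orders")
         .groupBy("customer_id")
         .agg(F.sum("amount").alias("total_spend"),
              F.count("*").alias("num_orders"))
)

fs = FeatureStoreClient()
fs.create_table(
    name="ml.customer_features",        # registered, discoverable name
    primary_keys=["customer_id"],
    df=customer_features,
    description="Aggregated purchase behaviour per customer",
)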

MoJaMa
by Databricks Employee
  • 1132 Views
  • 1 replies
  • 0 kudos
Latest Reply
MoJaMa
Databricks Employee
  • 0 kudos

Yes. Please see:
Blog 1: https://databricks.com/blog/2020/06/03/customer-lifetime-value-part-1-estimating-customer-lifetimes.html
Notebook 1: https://databricks.com/notebooks/CLV_Part_1_Customer_Lifetimes.html
Blog 2: https://databricks.com/blog/2020/06/17/c...

User16826994223
by Databricks Employee
  • 5025 Views
  • 2 replies
  • 0 kudos

Resolved! Can we delete an MLflow experiment?

I am using MLflow and my need of the hour is to delete an experiment and create another experiment with the same name.
client = MlflowClient(tracking_uri=server)
client.delete_experiment(1)
This deletes the experiment, but when I run a new experim...

Latest Reply
User16826994223
Databricks Employee
  • 0 kudos

SQL database: This is more tricky, as there are dependencies that need to be deleted. I am using MySQL, and these commands work for me:
USE mlflow_db;  # the name of your database
DELETE FROM experiment_tags WHERE experiment_id=ANY( SELECT experime...
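For the common case where the goal is simply to reuse the experiment name, here is a sketch of the client-plus-CLI route rather than hand-written SQL. It assumes a reasonably recent MLflow version with the gc command; the tracking URI and names are placeholders.

from mlflow.tracking import MlflowClient

client = MlflowClient(tracking_uri="mysql://user:pwd@host/mlflow_db")  # placeholder URI

# Soft-delete: the experiment is only marked as deleted, so its name stays reserved
client.delete_experiment("1")

# Permanently purge deleted experiments from the backend store, e.g. from a shell:
#   mlflow gc --backend-store-uri mysql://user:pwd@host/mlflow_db
# After the purge, an experiment with the same name can be created again:
client.create_experiment("churn-model")  # hypothetical name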

1 More Replies
User16752240150
by Databricks Employee
  • 4691 Views
  • 1 replies
  • 0 kudos

What's the best way to implement long term data versioning?

I'm a data scientist creating versioned ML models. For compliance reasons, I need to be able to replicate the training data for each model version. I've seen that you can version datasets by using delta, but the default retention period is around 30 ...

Latest Reply
sajith_appukutt
Databricks Employee
  • 0 kudos

Delta, as you mentioned, has a time travel feature, and by default Delta tables retain the commit history for 30 days. Operations on the history of the table are parallel but will become more expensive as the log size increases. Now, in this case - s...
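Here is a sketch of the two levers this touches: pinning a table version when training, and extending the Delta retention windows through table properties. The property names are as documented for Delta Lake; the path and intervals below are examples only.

# Reproduce training data for a given model version by pinning the snapshot it was trained on
train_v42 = (spark.read.format("delta")
                  .option("versionAsOf", 42)         # or .option("timestampAsOf", "2021-06-01")
                  .load("/mnt/lake/training_data"))  # hypothetical path

# Keep the commit log and deleted data files long enough to time-travel for compliance
spark.sql("""
    ALTER TABLE delta.`/mnt/lake/training_data`
    SET TBLPROPERTIES (
        'delta.logRetentionDuration' = 'interval 730 days',
        'delta.deletedFileRetentionDuration' = 'interval 730 days'
    )
""")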

MoJaMa
by Databricks Employee
  • 2125 Views
  • 1 replies
  • 0 kudos
Latest Reply
MoJaMa
Databricks Employee
  • 0 kudos

Yes! You will have to pip install mlflow in your environment as a first step. For more details, see: https://docs.databricks.com/applications/mlflow/access-hosted-tracking-server.html
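A minimal sketch of logging to the Databricks-hosted tracking server from a local environment, assuming the Databricks CLI is already configured with a host and token; the experiment path is a placeholder:

# pip install mlflow databricks-cli   (then run: databricks configure --token)
import mlflow

mlflow.set_tracking_uri("databricks")                              # use the hosted tracking server
mlflow.set_experiment("/Users/someone@example.com/my-experiment")  # placeholder workspace path

with mlflow.start_run():
    mlflow.log_param("alpha", 0.5)
    mlflow.log_metric("rmse", 0.87)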

Anonymous
by Not applicable
  • 1865 Views
  • 1 replies
  • 0 kudos

Resolved! How is Databricks AutoML different than other AutoML products out there?

How does it provide a glass box view?

Latest Reply
Mooune_DBU
Databricks Employee
  • 0 kudos

Depending on which solution you use, "glass box" means that for any interactive work you do via point-and-click, we automatically generate the code behind the scenes and produce notebooks for each experiment that was run under the hood, in addition for a...
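A sketch of what starting such a run looks like with the AutoML Python API on a Databricks ML runtime; the table and column names are made up, and attribute names may differ slightly across runtime versions.

from databricks import automl

# Hypothetical training table with a binary label column "churned"
df = spark.table("ml.customer_training")

summary = automl.classify(df, target_col="churned", timeout_minutes=30)

# "Glass box": each trial is backed by a generated, editable notebook
print(summary.best_trial.notebook_url)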

User16752240150
by Databricks Employee
  • 1637 Views
  • 1 replies
  • 1 kudos

What algorithms does Databricks AutoML use?

AutoML presumably tries a few different algorithms while hyperparameter searching. What model types are considered?

Latest Reply
sean_owen
Databricks Employee
  • 1 kudos

At the moment, it's really just XGBoost and sklearn implementations like random forests, logistic regression, and linear regression, as applicable. More possibilities are coming.

User16752239203
by Databricks Employee
  • 1676 Views
  • 1 replies
  • 0 kudos

How can I use non-Spark libraries like spaCy with Databricks and Spark?

I have an NLP application that I built on my local machine using spaCy and pandas, but now I would like to scale my application to a large production dataset and utilize the benefits of Spark's distributed compute. How do I import and utilize a librar...

Latest Reply
sean_owen
Databricks Employee
  • 0 kudos

It depends on what you mean, but if you're just trying to (say) tokenize and process data with spaCy in parallel, then that's trivial. Write a 'pandas UDF' function that expresses how you want to transform data using spaCy, in terms of a pandas DataF...
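A minimal sketch of that pattern, assuming spaCy and the en_core_web_sm model are installed on every node of the cluster; the column names are made up:

import pandas as pd
import spacy
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import ArrayType, StringType

_nlp = None  # loaded lazily once per executor process, not once per row

def _get_nlp():
    global _nlp
    if _nlp is None:
        _nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])
    return _nlp

@pandas_udf(ArrayType(StringType()))
def tokenize(texts: pd.Series) -> pd.Series:
    nlp = _get_nlp()
    # nlp.pipe processes the whole batch of rows handed to the UDF at once
    return pd.Series([[tok.text for tok in doc] for doc in nlp.pipe(texts)])

df = spark.createDataFrame([("Spark scales spaCy nicely",)], ["text"])
df.withColumn("tokens", tokenize("text")).show(truncate=False)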

Anonymous
by Not applicable
  • 2882 Views
  • 1 replies
  • 0 kudos
Latest Reply
sean_owen
Databricks Employee
  • 0 kudos

I don't think we have a lot of internal docs, just high-level explanations like https://databricks.com/blog/2021/05/27/databricks-announces-the-first-feature-store-integrated-with-delta-lake-and-mlflow.html However, I don't think there's much to it. Th...

Anonymous
by Not applicable
  • 2251 Views
  • 1 replies
  • 0 kudos
Latest Reply
sajith_appukutt
Databricks Employee
  • 0 kudos

The feature store has both online and offline components. The offline feature store is used for feature discovery, model training, and batch inference, and is backed by Delta tables. You could read from and write to the offline store from Databricks clusters that...
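A sketch of reading from the offline store and assembling a training set with the Feature Store client; the table, key, feature, and label names are hypothetical, and the API varies slightly by ML runtime version.

from databricks.feature_store import FeatureStoreClient, FeatureLookup

fs = FeatureStoreClient()

# Read the Delta-backed offline feature table directly as a Spark DataFrame
features_df = fs.read_table("ml.customer_features")

# Or join selected features onto a label DataFrame for training
labels_df = spark.table("ml.churn_labels")   # hypothetical: customer_id + churned
lookups = [FeatureLookup(table_name="ml.customer_features",
                         feature_names=["total_spend", "num_orders"],
                         lookup_key="customer_id")]
training_set = fs.create_training_set(labels_df, feature_lookups=lookups, label="churned")
train_df = training_set.load_df()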

Srikanth_Gupta_
by Databricks Employee
  • 2298 Views
  • 1 replies
  • 1 kudos

What are the best NLP libraries to use with Spark?

Which NLP APIs give the best performance when used with Spark?

Latest Reply
sean_owen
Databricks Employee
  • 1 kudos

By far the most popular and comprehensive library, to my knowledge, for Spark-native distributed NLP, is spark-nlp from John Snow Labs. https://nlp.johnsnowlabs.com/ It is open source (but with commercial support options) and has a whole lot of funct...
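A minimal spark-nlp sketch, assuming the John Snow Labs library is installed on the cluster; the annotators chosen here are just illustrative.

import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, Normalizer
from pyspark.ml import Pipeline

spark = sparknlp.start()   # on Databricks you can reuse the existing session instead

document = DocumentAssembler().setInputCol("text").setOutputCol("document")
tokenizer = Tokenizer().setInputCols(["document"]).setOutputCol("token")
normalizer = Normalizer().setInputCols(["token"]).setOutputCol("normalized")

pipeline = Pipeline(stages=[document, tokenizer, normalizer])

df = spark.createDataFrame([("Spark NLP runs natively on Spark",)], ["text"])
pipeline.fit(df).transform(df).select("normalized.result").show(truncate=False)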

