Machine Learning

by Supreme_Auto_Ci • New Contributor II

04-07-2022 1:19:50 AM

3180 Views
4 replies
4 kudos

Resolved! Describe data science and machine learning?

Latest Reply

rahulroy
New Contributor II

11-23-2023 6:58:03 AM

4 kudos

Data science is a multidisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from data. It encompasses the entire data lifecycle, from data acquisition to data exploration, modeling, and...

by Saeid_H • Contributor

02-01-2023 5:16:12 AM

13561 Views
7 replies
8 kudos

What are the practical advantages of Feature Store compared to Delta Lake?

Could someone explain the practical advantages of using a feature store vs. Delta Lake? Apparently they both work in the same manner, and the feature store does not provide additional value. However, based on the documentation on the Databricks page, ...

Latest Reply

Anonymous
Not applicable

04-08-2023 8:13:23 PM

8 kudos

Hi @Saeid Hedayati Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answer...

by rendorHaevyn • New Contributor III

04-03-2023 8:04:58 PM

2895 Views
4 replies
0 kudos

Resolved! History of code executed on Data Science & Engineering service clusters

I want to be able to view a listing of any or all of the following:
- When Notebooks were attached / detached to and from a DS&E cluster
- When Notebook code was executed on a DS&E cluster
- What Notebook specific cell code was executed on a DS&E cluster
Is th...

Latest Reply

Atanu
Databricks Employee

04-04-2023 11:02:07 AM

0 kudos

From the UI, the best way to check is version control (https://docs.databricks.com/notebooks/notebooks-code.html#version-control). By the way, does this help? https://www.databricks.com/blog/2022/11/02/monitoring-notebook-command-logs-static-analysis-tools.ht...

by anvil • New Contributor II

01-24-2023 1:24:02 PM

1049 Views
1 reply
0 kudos

How much do model size and lag impact distributed inference?

Hello! I was wondering how impactful a model's size or inference lag is in a distributed setting. With tools like pandas Iterator UDFs or mlflow.pyfunc.spark_udf(), we can make it so models are loaded only once per worker, so I would tend to say that m...

Latest Reply

youssefmrini
Databricks Employee

02-28-2023 5:16:48 AM

0 kudos

Your assumption that minimizing inference lag is more important than minimizing the size of the model in a distributed setting is generally correct. In a distributed environment, models are typically loaded once per worker, as you mentioned, which mea...

by anvil • New Contributor II

01-24-2023 1:14:46 PM

3185 Views
3 replies
4 kudos

Are UDFs necessary for applying models from ML libraries at scale?

Hello, I recently finished the "Scalable Machine Learning with Apache Spark" course and saw that scikit-learn models could be applied faster in a distributed manner when used in pandas UDFs or with the mapInPandas() method. Spark MLlib models don't need this k...

Latest Reply

Manoj12421
Valued Contributor II

02-08-2023 11:17:49 AM

4 kudos

MLlib is in maintenance mode, and in most cases a UDF is not used when creating a model.

by jonathan-dufaul • Valued Contributor

01-25-2023 9:24:54 AM

1994 Views
1 reply
0 kudos

How does the data science workflow change in Databricks if you start with a NoSQL database (specifically a document store) instead of a more traditional RDBMS-type source?

I'm sorry if this is a bad question. The tl;dr is: are there any concrete examples of NoSQL data science workflows specifically in Databricks, and if so, what are they? Is it always the case that our end goal is a dataframe? For us we start as a bunch o...

Latest Reply

Nhan_Nguyen
Valued Contributor

01-31-2023 5:18:18 AM

0 kudos

Nice sharing, thanks!

by jdigiovanni • New Contributor

07-21-2022 9:28:35 AM

1646 Views
3 replies
0 kudos

EOFError trying to assign a model using a custom module

I'm in a Data Science Bootcamp, and the final case study includes data preprocessing (done), using a linear regression model on the data, then porting to SQL for visualization. The model build uses custom Python code provided as part of the exercise....

Latest Reply

Vidula
Honored Contributor

09-05-2022 5:54:30 AM

0 kudos

Hi @Joe DiGiovanni Just wanted to check in: were you able to resolve your issue, or do you need more help? We'd love to hear from you. Thanks!

by Dhara • New Contributor III

06-21-2022 7:50:11 AM

21172 Views
9 replies
5 kudos

Access multiple .mdb files using Python

Hi, I want to access multiple .mdb (Microsoft Access) files stored in Azure Data Lake Storage (ADLS) or on the Databricks File System using Python. Could you guide me on how to achieve this? It would be great if you can share some code snippets ...

Latest Reply

User16764241763
Honored Contributor

07-18-2022 7:40:26 AM

5 kudos

@Dhara Mandal Can you please try the below?

# cmd 1
%pip install pandas_access

# cmd 2
import pandas_access as mdb

db_filename = '/dbfs/FileStore/Campaign_Template.mdb'

# Listing the tables.
for tbl in mdb.list_tables(db_filename):
    print(tbl)
...

by User16752240150 • New Contributor II

06-04-2021 11:47:11 AM

3836 Views
1 reply
0 kudos

What's the best way to implement long term data versioning?

I'm a data scientist creating versioned ML models. For compliance reasons, I need to be able to replicate the training data for each model version. I've seen that you can version datasets by using Delta, but the default retention period is around 30 ...

Latest Reply

sajith_appukutt
Honored Contributor II

06-17-2021 10:36:52 PM

0 kudos

Delta, as you mentioned, has a time travel feature, and by default Delta tables retain the commit history for 30 days. Operations on the history of the table are parallel but will become more expensive as the log size increases. Now, in this case - s...
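
For context on the retention window mentioned above, here is a small sketch of how the retention settings could be lengthened. It is one possible approach, not necessarily the one this reply goes on to recommend (the reply is truncated here). `delta.logRetentionDuration` and `delta.deletedFileRetentionDuration` are Delta table properties; the helper name `retention_sql` and the table name `ml.training_data` are hypothetical, and the sketch only builds the SQL statement rather than running it.

```python
def retention_sql(table: str, days: int) -> str:
    """Build an ALTER TABLE statement that sets both Delta retention properties.

    The table name and the choice to use one value for both properties are
    illustrative assumptions, not a recommendation from the thread.
    """
    if days <= 0:
        raise ValueError("days must be positive")
    interval = f"interval {days} days"
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES ("
        f"'delta.logRetentionDuration' = '{interval}', "
        f"'delta.deletedFileRetentionDuration' = '{interval}')"
    )


statement = retention_sql("ml.training_data", 365)
# On a cluster this string would be passed to spark.sql(statement).
```

Longer retention keeps more log entries and data files around, which ties in with the point above that history operations get more expensive as the log grows.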

by User16752239203 • Databricks Employee

06-11-2021 11:55:55 AM

1250 Views
1 reply
0 kudos

How can I use non-Spark libraries like spaCy with Databricks and Spark?

I have an NLP application that I built on my local machine using spaCy and pandas, but now I would like to scale my application to a large production dataset and utilize the benefits of Spark's distributed compute. How do I import and utilize a librar...

Latest Reply

sean_owen
Databricks Employee

06-17-2021 4:23:53 PM

0 kudos

It depends on what you mean, but if you're just trying to (say) tokenize and process data with spacy in parallel, then that's trivial. Write a 'pandas UDF' function that expresses how you want to transform data using spacy, in terms of a pandas DataF...
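
The kind of function described above might look roughly like the following. This is a hedged sketch, not the code from the reply: `tokenize` and `count_tokens` are hypothetical names, and whitespace splitting stands in for spaCy so the example runs without a language model installed. On a cluster, a Series-to-Series function like this is the shape that `pyspark.sql.functions.pandas_udf` wraps.

```python
import pandas as pd


def tokenize(text: str) -> list:
    """Stand-in tokenizer (assumption): with spaCy this would be roughly
    [t.text for t in nlp(text)] for a loaded pipeline `nlp`."""
    return text.split()


def count_tokens(texts: pd.Series) -> pd.Series:
    """Series in, Series out: one token count per input string."""
    return texts.map(lambda t: len(tokenize(t))).astype("int64")


counts = count_tokens(pd.Series(["hello world", "spark is fast", ""]))
```

Keeping the transformation as an ordinary pandas function means it can be tested locally on a small Series before it is handed to Spark.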

by User16826994223 • Honored Contributor III

06-17-2021 1:48:38 AM

739 Views
0 replies
0 kudos

Databricks Certified Professional Data Scientist: Does this exam require Databricks-specific or Spark-specific knowledge? No. Test-takers will be asse...

Databricks Certified Professional Data Scientist: Does this exam require Databricks-specific or Spark-specific knowledge? No. Test-takers will be assessed on their understanding of the basics of machine learning and data science, how to complete each ...
