Topics with Label: Data Science

by Supreme_Auto_Ci • New Contributor II

04-07-2022 1:19:50 AM

1732 Views
5 replies
4 kudos

Resolved! Describe data science and machine learning

Machine Learning

Latest Reply

rahulroy
New Contributor II

11-23-2023 6:58:03 AM

4 kudos

Data science is a multidisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from data. It encompasses the entire data lifecycle, from data acquisition to data exploration, modeling, and...

4 More Replies

by Kaniz • Community Manager

04-03-2023 12:27:26 PM

809 Views
3 replies
3 kudos

Monthly Community Q&A: Ask the Experts!

Monthly Community Q&A: Ask the Experts! We're excited to announce our first monthly Community Q&A session! This is your chance to ask questions, seek advice, and gain insights from our team of Data Science and AI experts. Whether you're curious abou...

Machine Learning

Latest Reply

Anonymous
Not applicable

05-18-2023 4:27:54 AM

3 kudos

Hi @Kaniz Fatma! Thanks for the answer and the nice explanation. In my experience, embedded systems design with IoT also works in a wide range of areas. It only requires an AI gateway system.

2 More Replies

by Saeid_H • Contributor

02-01-2023 5:16:12 AM

3595 Views
7 replies
8 kudos

What are the practical advantages of Feature Store compared to Delta Lake?

Could someone explain the practical advantages of using a feature store vs. Delta Lake? Apparently they both work in the same manner and the feature store does not provide additional value. However, based on the documentation on the Databricks page, ...

Machine Learning

Latest Reply

Anonymous
Not applicable

04-08-2023 8:13:23 PM

8 kudos

Hi @Saeid Hedayati Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answer...

6 More Replies

by rendorHaevyn • New Contributor III

04-03-2023 8:04:58 PM

1172 Views
4 replies
0 kudos

Resolved! History of code executed on Data Science & Engineering service clusters

I want to be able to view a listing of any or all of the following:
When Notebooks were attached / detached to and from a DS&E cluster
When Notebook code was executed on a DS&E cluster
What Notebook specific cell code was executed on a DS&E cluster
Is th...

Machine Learning

Latest Reply

Atanu
Esteemed Contributor

04-04-2023 11:02:07 AM

0 kudos

From the UI, the best way to check is version control: https://docs.databricks.com/notebooks/notebooks-code.html#version-control By the way, does this help? https://www.databricks.com/blog/2022/11/02/monitoring-notebook-command-logs-static-analysis-tools.ht...

3 More Replies

by anvil • New Contributor II

01-24-2023 1:24:02 PM

494 Views
1 reply
0 kudos

How much do model size and lag impact distributed inference?

Hello! I was wondering how impactful a model's size or inference lag is in a distributed setting. With tools like Pandas Iterator UDFs or mlflow.pyfunc.spark_udf() we can make it so models are loaded only once per worker, so I would tend to say that m...

Machine Learning

Latest Reply

youssefmrini
Honored Contributor III

02-28-2023 5:16:48 AM

0 kudos

Your assumption that minimizing inference lag is more important than minimizing the size of the model in a distributed setting is generally correct. In a distributed environment, models are typically loaded once per worker, as you mentioned, which mea...
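
The "loaded once per worker" idea discussed above can be illustrated with a minimal plain-Python sketch. It only mirrors the iterator-of-batches shape that Spark's iterator pandas UDFs use; it does not call Spark or MLflow, and `load_model`, `predict_batches`, and the doubling "model" are hypothetical stand-ins.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical counter so the sketch can show how often the "model" is loaded.
LOAD_CALLS = 0


def load_model() -> Callable[[float], float]:
    """Stand-in for an expensive model load (e.g. reading weights from storage)."""
    global LOAD_CALLS
    LOAD_CALLS += 1
    return lambda x: 2.0 * x  # toy "model": doubles its input


def predict_batches(batches: Iterable[List[float]]) -> Iterator[List[float]]:
    """Load the model once, then reuse it for every batch in the iterator."""
    model = load_model()  # paid once per call, not once per batch or per row
    for batch in batches:
        yield [model(x) for x in batch]


# Three batches are scored, but the stand-in model is loaded a single time.
results = list(predict_batches([[1.0, 2.0], [3.0], [4.0, 5.0]]))
```

With this shape the load cost is amortized over every batch the function sees, so a larger model mostly adds a one-time startup and memory cost, while per-row inference time is paid on every record.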

by anvil • New Contributor II

01-24-2023 1:14:46 PM

1522 Views
3 replies
4 kudos

Are UDFs necessary for applying models from ML libraries at scale?

Hello, I recently finished the "Scalable Machine Learning with Apache Spark" course and saw that SKLearn models could be applied faster in a distributed manner when used in pandas UDFs or with the mapInPandas() method. Spark MLlib models don't need this k...

Machine Learning

Latest Reply

Manoj12421
Valued Contributor II

02-08-2023 11:17:49 AM

4 kudos

MLlib is in maintenance mode, and in most cases a UDF is not used when creating a model.

2 More Replies

by jonathan-dufaul • Valued Contributor

01-25-2023 9:24:54 AM

828 Views
1 reply
0 kudos

How does the data science workflow change in Databricks if you start with a NoSQL database (specifically a document store) instead of a more traditional RDBMS-type source?

I'm sorry if this is a bad question. The tl;dr is: are there any concrete examples of NoSQL data science workflows specifically in Databricks, and if so, what are they? Is it always the case that our end goal is a dataframe? For us we start as a bunch o...

Machine Learning

Latest Reply

Nhan_Nguyen
Valued Contributor

01-31-2023 5:18:18 AM

0 kudos

Nice sharing, thanks!

by jdigiovanni • New Contributor

07-21-2022 9:28:35 AM

797 Views
3 replies
0 kudos

EOFError trying to assign a model using a custom module

I'm in a Data Science Bootcamp, and the final case study includes data preprocessing (done), using a linear regression model on the data, then porting to SQL for visualization. The model build uses custom python code provided as part of the exercise....

Machine Learning

Latest Reply

Vidula
Honored Contributor

09-05-2022 5:54:30 AM

0 kudos

Hi @Joe DiGiovanni Just wanted to check in: were you able to resolve your issue, or do you need more help? We'd love to hear from you. Thanks!

2 More Replies

by Dhara • New Contributor III

06-21-2022 7:50:11 AM

11431 Views
10 replies
5 kudos

Access multiple .mdb files using Python

Hi, I want to access multiple .mdb Access files which are stored in Azure Data Lake Storage (ADLS) or on the Databricks File System using Python. Could you guide me on how to achieve this? It would be great if you can share some code snippets ...

Machine Learning

Latest Reply

User16764241763
Honored Contributor

07-18-2022 7:40:26 AM

5 kudos

@Dhara Mandal Can you please try the below?

    # cmd 1
    %pip install pandas_access

    # cmd 2
    import pandas_access as mdb

    db_filename = '/dbfs/FileStore/Campaign_Template.mdb'

    # Listing the tables.
    for tbl in mdb.list_tables(db_filename):
        print(tbl)
    ...

9 More Replies

by User16752240150 • New Contributor II

06-04-2021 11:47:11 AM

2031 Views
1 reply
0 kudos

What's the best way to implement long term data versioning?

I'm a data scientist creating versioned ML models. For compliance reasons, I need to be able to replicate the training data for each model version. I've seen that you can version datasets by using Delta, but the default retention period is around 30 ...

Machine Learning

Latest Reply

sajith_appukutt
Honored Contributor II

06-17-2021 10:36:52 PM

0 kudos

Delta, as you mentioned, has a time travel feature, and by default Delta tables retain the commit history for 30 days. Operations on the history of the table are parallel but will become more expensive as the log size increases. Now, in this case - s...
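
One option for keeping history longer than the default is to raise the table's retention properties. The sketch below only builds the SQL statement as a string; `delta.logRetentionDuration` and `delta.deletedFileRetentionDuration` are Delta table properties, while the helper name, the table name, and the 365-day value are hypothetical, and this is not necessarily the approach the reply goes on to recommend.

```python
def retention_sql(table: str, days: int) -> str:
    """Build an ALTER TABLE statement that extends Delta history retention."""
    if days <= 0:
        raise ValueError("days must be positive")
    interval = f"interval {days} days"
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES ("
        f"'delta.logRetentionDuration' = '{interval}', "
        f"'delta.deletedFileRetentionDuration' = '{interval}')"
    )


# Hypothetical table name; on a cluster this string would be passed to spark.sql(...).
statement = retention_sql("ml.training_data", 365)
print(statement)
```

Time travel to an old version needs both the log entries and the underlying data files, which is why both properties are raised together. Longer retention also means more storage, so for strict compliance needs a separate snapshot per model version may still be worth considering.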

by User16752239203 • New Contributor

06-11-2021 11:55:55 AM

586 Views
1 reply
0 kudos

How can I use non-Spark libraries like spaCy with Databricks and Spark?

I have an NLP application that I built on my local machine using spaCy and pandas, but now I would like to scale my application to a large production dataset and utilize the benefits of Spark's distributed compute. How do I import and utilize a librar...

Machine Learning

Latest Reply

sean_owen
Honored Contributor II

06-17-2021 4:23:53 PM

0 kudos

It depends on what you mean, but if you're just trying to (say) tokenize and process data with spacy in parallel, then that's trivial. Write a 'pandas UDF' function that expresses how you want to transform data using spacy, in terms of a pandas DataF...
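
A common companion to the pandas UDF approach is making sure the NLP pipeline is built once per Python process rather than once per batch. The following sketch shows that caching pattern in plain Python; `get_pipeline`, `tokenize_batch`, and the whitespace tokenizer are hypothetical stand-ins for loading a real spaCy pipeline, and no Spark or spaCy calls are made.

```python
from functools import lru_cache
from typing import Callable, List

LOADS = 0  # counts how many times the stand-in pipeline is built


@lru_cache(maxsize=1)
def get_pipeline() -> Callable[[str], List[str]]:
    """Build the (stand-in) pipeline once; later calls return the cached object."""
    global LOADS
    LOADS += 1
    return lambda text: text.split()  # placeholder for a real tokenizer


def tokenize_batch(texts: List[str]) -> List[List[str]]:
    """Per-batch transform, the kind of function a pandas UDF would wrap."""
    nlp = get_pipeline()
    return [nlp(t) for t in texts]


# Two batches are processed, but the pipeline is only built for the first one.
first = tokenize_batch(["Spark scales out", "spaCy parses text"])
second = tokenize_batch(["another batch"])
```

In a real job the batch function would receive a pandas Series from Spark, and the cached getter would wrap the actual pipeline load.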

by User16826994223 • Honored Contributor III

06-17-2021 1:48:38 AM

330 Views
0 replies
0 kudos

Databricks Certified Professional Data Scientist: Does this exam require Databricks-specific or Spark-specific knowledge?

Databricks Certified Professional Data Scientist. Does this exam require Databricks-specific or Spark-specific knowledge? No. Test-takers will be assessed on their understanding of the basics of machine learning and data science, how to complete each ...

Machine Learning
