Topics with Label: Data

by jonathan-dufaul • Valued Contributor

11-23-2022 9:00:44 PM

1960 Views
5 replies
5 kudos

Does FeatureStoreClient().score_batch support multidimensional predictions?

I have a pyfunc model that I can use to get predictions. It takes time series data with context information at each date, and produces a string of predictions. For example: The data is set up like below (temp/pressure/output are different than my inpu...

Machine Learning

Latest Reply

EmilAndersson
New Contributor II

09-05-2023 4:49:43 AM

5 kudos

I have the same question. I've decided to look for alternative Feature Stores as this makes it very difficult to use for time series forecasting.

4 More Replies

by vayunandan_tupa • New Contributor

06-20-2023 8:46:09 AM

1451 Views
3 replies
0 kudos

If we fail the Databricks Certified Data Engineer certification exam, how many times can we retake it without paying the fee?

Machine Learning

Latest Reply

dplante
Contributor II

07-23-2023 9:29:52 PM

0 kudos

You can take all the Databricks exams as many times as you want, but you have to pay a fee each time you take the exam.

2 More Replies

by sridhar0109 • New Contributor

02-15-2023 2:55:16 AM

520 Views
2 replies
0 kudos

Tracking changes in data distribution by using PySpark

Hi All, I'm working on creating a data quality dashboard. I've created a few rules like checking nulls in a column, checking the data type of the column, removing duplicates, etc. We follow medallion architecture and are applying these data quality check...
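The excerpt does not show which approach the poster ended up using. One common way to quantify a shift in a numeric column between two loads is the Population Stability Index (PSI), sketched here in plain Python as an assumption rather than as the poster's method. The function name psi, the bin count, and the sample values are illustrative only; in PySpark the per-bin counts would typically come from an aggregation instead of a Python loop.

```python
import math


def psi(expected, actual, bins=4):
    """Population Stability Index between two non-empty numeric samples.

    Bin edges are equal-width and derived from the `expected` (baseline)
    sample only; values outside that range are clamped into the edge bins.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def shares(values):
        counts = [0] * bins
        for v in values:
            i = int((v - lo) / width)
            counts[min(max(i, 0), bins - 1)] += 1
        # A small floor keeps empty bins from producing log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical column values from two loads of the same table.
baseline = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
shifted = [5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]

same_score = psi(baseline, baseline)
drift_score = psi(baseline, shifted)
```

A score of zero means the binned distributions match; larger values mean a larger shift. Any alerting threshold is a choice for the dashboard owner.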

Machine Learning

Latest Reply

Anonymous
Not applicable

04-20-2023 9:50:24 PM

0 kudos

Hi @Sridhar Varanasi, hope all is well! Just wanted to check in to see whether you were able to resolve your issue, and whether you would be happy to share the solution or mark an answer as best. Otherwise, please let us know if you need more help. We'd love to hear from you. T...

1 More Replies

by Santhanalakshmi • New Contributor II

07-13-2022 10:25:17 PM

1526 Views
3 replies
0 kudos

Throwing IndexoutofBound Exception in PySpark

Hello All, I am trying to read the data and group it in order to pass it to the predict function via the @F.pandas_udf method.
#Loading Model
pkl_model = pickle.load(open(filepath,'rb'))
# build schema for output labels
filter_schema=[] ...

Machine Learning

Latest Reply

Vindhya
New Contributor II

04-18-2023 1:30:14 PM

0 kudos

@Santhanalakshmi Manoharan Was this issue resolved? I am also getting the same error; any guidance would be of great help. Appreciate your help.

2 More Replies

by Orianh • Valued Contributor II

03-21-2023 9:59:57 AM

724 Views
2 replies
0 kudos

TF SummaryWriter flush() doesn't send any buffered data to storage.

Hey guys, I'm training a TF model in Databricks, and logging to TensorBoard using SummaryWriter. At the end of each epoch SummaryWriter.flush() is called, which should send any buffered data into storage. But I can't see the TensorBoard files while th...

Machine Learning

Latest Reply

Anonymous
Not applicable

03-31-2023 6:53:05 PM

0 kudos

Hi @orian hindi, hope everything is going great. Just wanted to check in to see whether you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so w...

1 More Replies

by Kaan • New Contributor

02-02-2023 7:16:24 AM

982 Views
1 reply
1 kudos

Resolved! Using Databricks in multi-cloud, and querying data from the same instance.

I'm looking for a good product to use across two clouds at once for data engineering, data modeling and governance. I currently have a GCP platform, but most of my data and future data goes through Azure, and currently is then transferred to GCS/BQ. Cu...

Machine Learning

Latest Reply

Anonymous
Not applicable

03-31-2023 9:02:46 AM

1 kudos

@Karl Andrén: Databricks is a great option for data engineering, data modeling, and governance across multiple clouds. It supports integrations with multiple cloud providers, including Azure, AWS, and GCP, and provides a unified interface to access ...

by Sujitha • Community Manager

12-28-2022 3:52:03 AM

358 Views
0 replies
4 kudos

Hello Databricks Community! We are getting really excited about the upcoming event of the year Data & AI Summit 2023! The world’s largest data, a...

Hello Databricks Community! We are getting really excited about the upcoming event of the year, Data & AI Summit 2023! The world’s largest data, analytics and AI conference returns live, to San Francisco and virtually. Four days (June 26–29, 2023) pack...

Machine Learning

by NhatHoang • Valued Contributor II

12-08-2022 12:52:29 AM

4047 Views
3 replies
15 kudos

Resolved! Do one-hot encoding (OHE) before or after splitting data into train and test DataFrames?

Hi, I wonder whether I should do OHE before or after I split the data to build an ML model. Please give some advice.

Machine Learning

Latest Reply

LandanG
Honored Contributor

12-08-2022 6:27:47 AM

15 kudos

Hi @Nhat Hoang, While not Databricks-specific, here's a good answer: "If you perform the encoding before the split, it will lead to data leakage (train-test contamination). In this sense, you will introduce new data (integers of Label Encoders) and u...
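A minimal sketch of the order of operations the quoted answer points toward: split first, learn the category vocabulary from the training rows only, then apply that fixed vocabulary to the test rows. It uses plain Python rather than any Databricks or Spark API, and the helper names fit_categories and one_hot as well as the sample data are hypothetical.

```python
def fit_categories(values):
    """Learn the category vocabulary from the training values only."""
    return sorted(set(values))


def one_hot(values, categories):
    """Encode values against a fixed vocabulary.

    Categories not present in the vocabulary (for example, values that
    appear only in the test split) are encoded as an all-zero row.
    """
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        if value in index:
            row[index[value]] = 1
        rows.append(row)
    return rows


# Hypothetical categorical column; the split happens BEFORE any encoding.
colors = ["red", "green", "blue", "red", "green", "purple"]
train, test = colors[:4], colors[4:]

categories = fit_categories(train)           # vocabulary comes from train only
train_encoded = one_hot(train, categories)
test_encoded = one_hot(test, categories)     # "purple" was never seen in train
```

Because the vocabulary is fitted on the training rows alone, nothing about the test rows influences the encoding, and a category that appears only in the test split ("purple" here) simply maps to zeros.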