Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Forum Posts

espenol
by New Contributor III
  • 2430 Views
  • 1 replies
  • 0 kudos

Why is mounting storage no longer considered best practice?

As the title describes. I think it's really nice to work with mounted storage, but I've typically had an IaC team take care of setting it up. Now I'm not that lucky. Why is it no longer best practice? Security reasons?

Latest Reply
xiangzhu
Contributor III
  • 0 kudos

I think so. A mount behaves like local storage: other users in the same workspace will also have access to any mounted storage. See: Access Azure Data Lake Storage Gen2 and Blob Storage | Databricks on AWS

jsu999
by New Contributor II
  • 7322 Views
  • 4 replies
  • 0 kudos

How to fix "WARNING mlflow.utils.environment" when running MLflow in Databricks?

I'm running the following Python code from one of the Databricks training materials.
import mlflow
import mlflow.spark
from pyspark.ml.regression import LinearRegression
from pyspark.ml.feature import VectorAssembler
from pyspark.ml import Pipeline
f...

Latest Reply
Fed
New Contributor III
  • 0 kudos

I've encountered the same warning when running this notebook from DA: https://github.com/databricks-academy/scalable-machine-learning-with-apache-spark-english/blob/published/ML%2002%20-%20Linear%20Regression%20I.py
I've managed to get rid of that war...

3 More Replies
Rajib_Kumar_De
by New Contributor II
  • 3593 Views
  • 3 replies
  • 3 kudos

Databricks AutoML (Forecasting) Python SDK for Model Serving

I am using Databricks AutoML (Python SDK) to forecast bed occupancy. (Databricks actually uses MLflow experiments for the AutoML run.) After training over several iterations, I registered the best model in the Databricks Model Registry. Now I am tryi...

Latest Reply
Debayan
Databricks Employee
  • 3 kudos

Hi, this may be a bug if the Python version is 3.9.5 and the error is still about compatibility. Could you please raise a support case so it can be looked into further?

2 More Replies
llvu
by New Contributor III
  • 4676 Views
  • 3 replies
  • 2 kudos

How to solve cluster breakdown due to GC when training a pyspark.ml Random Forest

I am trying to train and optimize a random forest. At first the cluster handles garbage collection fine, but after a couple of hours the cluster breaks down because garbage collection time has gone up significantly. The train_df has a size of 6,365,018 reco...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

Caching is expensive: it tries to save that data to memory, and to disk if there is no more space left in memory. I know that, in theory, it should improve performance, but it can make things worse. I would just put scaled_train_data = pipeline_data.transform(tra...

2 More Replies
820409
by New Contributor II
  • 1923 Views
  • 2 replies
  • 3 kudos

Resolved! Can I change managed MLflow to work with a PostgreSQL server?

We are using managed MLflow, but we want to access the metadata of the models and show it in another application. Is there already a server that I can query? Can I re-create/configure the Databricks workspace to make managed MLflow use a post...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

The ideas I have are:
  • periodically export/import MLflow models and experiments: https://github.com/mlflow/mlflow-export-import#why-use-mlflow-export-import
  • get metadata through the API: https://docs.databricks.com/dev-tools/api/latest/mlflow.html#operation/...
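
For the "get metadata through the API" idea, a minimal sketch using the MLflow Python client rather than raw REST calls might look like the following. It assumes the workspace model registry is reachable with the `databricks` tracking URI and that credentials are already configured in the environment; the function names `summarize_registered_models` and `fetch_from_databricks` are hypothetical helpers, not part of MLflow.

```python
from datetime import datetime, timezone


def summarize_registered_models(client):
    """Return one plain dict per registered model, ready to hand to another app.

    `client` is anything exposing search_registered_models(), e.g. an
    mlflow.tracking.MlflowClient instance.
    """
    rows = []
    for model in client.search_registered_models():
        rows.append({
            "name": model.name,
            # MLflow stores timestamps as milliseconds since the Unix epoch.
            "last_updated": datetime.fromtimestamp(
                model.last_updated_timestamp / 1000, tz=timezone.utc
            ).isoformat(),
            "latest_versions": [
                {"version": v.version, "run_id": v.run_id}
                for v in (model.latest_versions or [])
            ],
        })
    return rows


def fetch_from_databricks():
    """Hypothetical entry point; assumes Databricks credentials are configured."""
    # Imported lazily so the helper above can be used without mlflow installed.
    from mlflow.tracking import MlflowClient

    return summarize_registered_models(MlflowClient(tracking_uri="databricks"))
```

The other application would then read these dicts (for example as JSON) instead of querying a backing database directly.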

1 More Replies
pvm26042000
by New Contributor III
  • 1406 Views
  • 1 replies
  • 2 kudos

MLflow

How do I determine the exact date and time that an MLflow run was executed? Thank you so much!

Latest Reply
Debayan
Databricks Employee
  • 2 kudos

Hi, you can view the job run. Please refer to: https://docs.databricks.com/mlflow/projects.html#step-3-view-the-databricks-job-run
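
Besides viewing the job run, the run's own metadata carries timestamps: MLflow records `start_time` and `end_time` on a run's info as milliseconds since the Unix epoch. A minimal sketch of reading them, assuming the run ID is known (the helper names `ms_to_utc`, `run_times`, and `lookup` are hypothetical):

```python
from datetime import datetime, timezone


def ms_to_utc(ms):
    """Convert epoch milliseconds to an aware UTC datetime (None stays None)."""
    if ms is None:
        return None
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def run_times(run_info):
    """Return (start, end) for an object exposing start_time/end_time in ms."""
    return ms_to_utc(run_info.start_time), ms_to_utc(run_info.end_time)


def lookup(run_id):
    """Fetch a run by ID and return its start and end times in UTC."""
    # Imported lazily so the helpers above work without mlflow installed.
    import mlflow

    return run_times(mlflow.get_run(run_id).info)
```

A run that is still in progress has no end time yet, which is why `None` is passed through unchanged.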

VM
by Contributor
  • 14929 Views
  • 5 replies
  • 6 kudos

Resolved! Security exception while using Feature Store. How can I get this whitelisted?

I was following the Databricks Academy "New Capabilities Overview: Feature Store" module. However, when I try to run the code in the example notebook, I get a security exception as explained below. When I try to run the example notebook "01-Populate a ...

Latest Reply
OMG
New Contributor II
  • 6 kudos

Hi @Daniel Barrundia, please select the "No isolation shared" access mode; it should resolve this problem.

4 More Replies
Sujitha
by Databricks Employee
  • 1225 Views
  • 0 replies
  • 4 kudos

Hello Databricks Community!  We are getting really excited about the upcoming event of the year Data & AI Summit 2023! The world’s largest data, a...

Hello Databricks Community! We are getting really excited about the upcoming event of the year, Data & AI Summit 2023! The world’s largest data, analytics and AI conference returns live, to San Francisco and virtually. Four days (June 26–29, 2023) pack...

Pritam
by New Contributor II
  • 1292 Views
  • 0 replies
  • 1 kudos

Not able to create jobs via the Jobs API in Databricks

I am not able to create jobs via the Jobs API in Databricks. Error=INVALID_PARAMETER_VALUE: Job settings must be specified. I simply copied the JSON file and saved it. Loaded the same JSON file and tried to create the job via the API but got the above erro...
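
The post does not show the JSON or the request, so the cause here is unknown. One possibility worth ruling out, stated purely as an assumption, is that the saved file is a "get job" style export, where the settings are nested under a `settings` key next to `job_id`, whereas `jobs/create` expects the settings fields at the top level of the request body. A minimal sketch that handles both shapes (the helper names `to_create_payload` and `build_create_request` are hypothetical):

```python
import json
import urllib.request


def to_create_payload(exported):
    """Return the body for jobs/create from a flat settings dict or from a
    'get job' style export that nests the settings under "settings"."""
    settings = exported.get("settings", exported)
    if not settings:
        raise ValueError("no job settings found in the loaded JSON")
    return settings


def build_create_request(host, token, exported):
    """Build (but do not send) a POST request to the Jobs API create endpoint."""
    body = json.dumps(to_create_payload(exported)).encode("utf-8")
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.1/jobs/create",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it. It is also worth checking that the body actually reaches the server as JSON, since an empty or unparsed body would leave the settings unspecified as well.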

User16752245312
by Databricks Employee
  • 5145 Views
  • 5 replies
  • 2 kudos

When running structured streaming jobs in production, what are the general best practices to reduce cost?

Consider a basic structured streaming use case of aggregating the data, performing some basic data-cleaning transformations, and merging into a historical aggregate dataset.

Latest Reply
lawrence009
Contributor
  • 2 kudos

I second the recommendations: Auto Loader with a trigger, and batch processing instead of continuous streaming where the use case permits. In addition:
  • test with a small batch first
  • favor fewer, larger workers over more, smaller workers
  • adjust your job cluster over...
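
The "Auto Loader with a trigger" recommendation can be sketched as a scheduled job that processes whatever has arrived and then stops, so the cluster can shut down between runs. This is only an illustration: the function name `run_incremental_batch`, the JSON source format, and the reuse of the checkpoint path as the schema location are assumptions, and it appends to the target table instead of performing the aggregate-and-merge step from the question (that would typically go in a `foreachBatch` function, not shown).

```python
def run_incremental_batch(spark, source_path, checkpoint_path, target_table):
    """Process all files that arrived since the last run, then stop.

    `spark` is an active SparkSession on a Databricks cluster (Auto Loader's
    "cloudFiles" source is Databricks-specific).
    """
    stream = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")  # assumed source file format
        .option("cloudFiles.schemaLocation", checkpoint_path)
        .load(source_path)
    )
    query = (
        stream.writeStream
        .option("checkpointLocation", checkpoint_path)
        # availableNow: consume the current backlog in batches, then terminate.
        .trigger(availableNow=True)
        .toTable(target_table)
    )
    query.awaitTermination()
    return query
```

Because progress is tracked in the checkpoint, each scheduled run picks up only new files, which gives batch-style cost with streaming-style bookkeeping.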

4 More Replies
lawrence009
by Contributor
  • 2089 Views
  • 2 replies
  • 3 kudos

Streaming Source for Feature Store (and outputMode)

To save computing resources and time, can I use a streaming source in batch mode (similar to Auto Loader) to update my feature store as my source table receives row updates or is appended with new rows?

Latest Reply
Meghala
Valued Contributor II
  • 3 kudos

Yes, you can schedule the job to process the data with Auto Loader.

1 More Replies
Aviral-Bhardwaj
by Esteemed Contributor III
  • 11667 Views
  • 2 replies
  • 36 kudos

Delta Lake vs. Data Lake in Databricks

Delta Lake is an open-source storage layer that sits on top of existing data lake storage, such as Azure Data Lake Store or Amazon S3. It provides a more robust and scalable alternative to traditional data lake st...

Latest Reply
Meghala
Valued Contributor II
  • 36 kudos

This is very informative and I learned a lot from it, so thank you @Aviral Bhardwaj sir.

1 More Replies
ptawil
by New Contributor III
  • 2544 Views
  • 2 replies
  • 2 kudos

Model Serving Status Failed

I'm trying to enable serving for my model, but it keeps going from Pending to Failed status. Here are the model event logs.
2022-11-15 15:43:13 ENDPOINT_UPDATED Failed to create model 3 times
2022-11-15 15:43:03 ENDPOINT_UPDATED Failed to create cluster 3 times...

Latest Reply
171499
New Contributor III
  • 2 kudos

Any update on this? I'm running into the same issue

1 More Replies
