Machine Learning

by eyalwir • New Contributor

07-22-2021 7:08:40 PM

459 Views
0 replies
0 kudos

Deep Learning on Spark within AWS EMR

I'd like to use Deep Learning on Spark within AWS EMR. Is there a best practice or 'recommended' DL framework to run on Spark? It looks like Databricks' spark-deep-learning has been replaced by Horovod; should this be the first option to consider? If th...

by User16790091296 • Contributor II

06-25-2021 3:32:44 PM

646 Views
1 replies
0 kudos

We use open source MLflow and want to migrate to managed MLflow on Databricks. Is there documentation for this process? If not, what is the best guidance for us?

Latest Reply

amr
Contributor

06-28-2021 11:14:57 AM

0 kudos

I am not aware of any special requirements for this migration. My suggestion is to try it on a small scale (one notebook) and check the results that show up in the tracking server. If everything looks OK, migrate the rest.

by User16826994223 • Honored Contributor III

06-28-2021 6:37:59 AM

1897 Views
1 replies
1 kudos

Resolved! Can I get detailed metrics of the RocksDB that I have used in one of my streams?

Latest Reply

User16826994223
Honored Contributor III

06-28-2021 6:38:51 AM

1 kudos

If you have configured your Structured Streaming query to use RocksDB as the state store, you can now get better visibility into the performance of RocksDB, with detailed metrics on get/put latencies, compaction latencies, cache hits, and so on. Thes...

by User16826994223 • Honored Contributor III

06-28-2021 6:23:09 AM

294 Views
0 replies
1 kudos

docs.databricks.com

Advantage of using Photon Engine

The following summarizes the advantages of Photon:
- Supports SQL and equivalent DataFrame operations against Delta and Parquet tables.
- Expected to accelerate queries that process a significant amount of data (100GB+) and ...

by User16826994223 • Honored Contributor III

06-28-2021 6:09:12 AM

1035 Views
1 replies
2 kudos

Resolved! How to check if my workspace has the IP access list feature enabled

Latest Reply

User16826994223
Honored Contributor III

06-28-2021 6:11:54 AM

2 kudos

To check if your workspace has the IP access list feature enabled, call the get feature status API (GET /workspace-conf). Pass keys=enableIpAccessLists as arguments to the request. In the response, the enableIpAccessLists field specifies either true o...
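For anyone who wants to script this check, here is a minimal sketch using only the Python standard library. The workspace host, the token, and the helper names are hypothetical, and the `/api/2.0` prefix and the string-valued response are assumptions that the post above does not spell out, so check them against the REST API reference for your workspace.

```python
import json
import urllib.parse
import urllib.request


def feature_status_url(host: str, key: str = "enableIpAccessLists") -> str:
    """Build the GET /workspace-conf URL for a single configuration key."""
    query = urllib.parse.urlencode({"keys": key})
    # Assumption: the endpoint lives under the /api/2.0 prefix.
    return f"{host.rstrip('/')}/api/2.0/workspace-conf?{query}"


def ip_access_lists_enabled(host: str, token: str) -> bool:
    """Call the API and report whether the flag comes back as "true"."""
    request = urllib.request.Request(
        feature_status_url(host),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # Assumption: the flag value is returned as the string "true" or "false".
    return str(payload.get("enableIpAccessLists", "")).lower() == "true"
```

Calling `ip_access_lists_enabled("https://<your-workspace-host>", "<personal-access-token>")` performs the request. `feature_status_url` is kept separate so the URL can be inspected without touching the network.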

by User16826992666 • Valued Contributor

06-25-2021 10:13:24 AM

1656 Views
1 replies
0 kudos

Can multiple users collaborate together on MLflow experiments?

Wondering about best practices for how to handle collaboration between multiple ML practitioners working on a single experiment. Do we have to share the same notebook between people or is it possible to have individual notebooks going but still work ...

Latest Reply

sajith_appukutt
Honored Contributor II

06-25-2021 3:50:57 PM

0 kudos

Yes, multiple users could work on individual notebooks and still use the same experiment via mlflow.set_experiment(). You could also assign different permission levels to experiments from a governance point of view.
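As a rough sketch of that pattern, each collaborator could run something like the following from their own notebook. The helper name, the experiment path, and the example parameters are hypothetical; only `mlflow.set_experiment()` comes from the reply above, and the other calls are standard MLflow tracking functions.

```python
def log_to_shared_experiment(experiment_path: str, params: dict, metrics: dict) -> str:
    """Log one run to a shared experiment and return the run ID."""
    import mlflow  # imported lazily; requires MLflow to be installed

    # Every collaborator points at the same experiment path from their own notebook.
    mlflow.set_experiment(experiment_path)
    with mlflow.start_run() as run:
        mlflow.log_params(params)
        mlflow.log_metrics(metrics)
        return run.info.run_id
```

For example, two people could each call `log_to_shared_experiment("/Shared/my-experiment", {"alpha": 0.5}, {"rmse": 0.8})` from separate notebooks, and both runs would then appear under the same experiment.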

by User16826992666 • Valued Contributor

06-25-2021 10:29:01 AM

1410 Views
1 replies
0 kudos

Resolved! Can I save MLflow artifacts to locations other than the dbfs?

The default location for MLflow artifacts is on DBFS, but I would like to save my models to an alternative location. Is this supported, and if it is, how can I accomplish it?

Latest Reply

sajith_appukutt
Honored Contributor II

06-25-2021 3:46:05 PM

0 kudos

You could mount an S3 bucket in the workspace and save your model using the mount's DBFS path. For example:

    modelpath = "/dbfs/my-s3-bucket/model-%f-%f" % (alpha, l1_ratio)
    mlflow.sklearn.save_model(lr, modelpath)

by MoJaMa • Valued Contributor II

06-24-2021 7:21:46 PM

920 Views
1 replies
2 kudos

Does Databricks AutoML support Time Series Forecasting?

Latest Reply

Mooune_DBU
Valued Contributor

06-25-2021 3:02:15 PM

2 kudos

Not yet, but stay tuned; it's being cooked in the kitchen.

by MoJaMa • Valued Contributor II

06-25-2021 1:02:53 PM

570 Views
1 replies
0 kudos

Is storage for Feature Store in Control Plane? Where does the Delta Table live?

Latest Reply

MoJaMa
Valued Contributor II

06-25-2021 1:03:34 PM

0 kudos

Data is stored in the control plane. Metadata (e.g., feature table descriptions, column types, etc.) is stored in the control plane. The location where the Delta table is stored is determined by the database location. The customer could call CREATE DATA...

by User16826990884 • New Contributor III

06-25-2021 11:59:36 AM

710 Views
1 replies
0 kudos

Rollback cluster changes

Is it possible to roll back changes made to a cluster? The problem I'm trying to solve is recovering from an accidental change made by a user on a cluster that affects interactive and job runs. Cluster policies help, but the policy still provides the ...

Latest Reply

sajith_appukutt
Honored Contributor II

06-25-2021 12:24:54 PM

0 kudos

You could look at automating the cluster creation steps and implementing them with an infrastructure-as-code solution such as the Databricks Terraform provider, which allows rollback.

by User16826990884 • New Contributor III

06-25-2021 12:08:06 PM

920 Views
0 replies
1 kudos

Dev and Prod environments

Do we have general guidance around how other customers manage Dev and Prod environments in Databricks? Is it recommended to have separate workspaces for them? What are the pros and cons of using the same workspace with folder or repo level isolation?

by User16826994223 • Honored Contributor III

06-25-2021 11:22:59 AM

1451 Views
1 replies
0 kudos

Delta Lake MERGE INTO statement error

I'm trying to run a Delta Lake merge:

    MERGE INTO source USING updates ON source.d = updates.sessionId WHEN MATCHED THEN UPDATE * WHEN NOT MATCHED THEN INSERT *

I'm getting an SQL error:

    ParseException: mismatched input 'MERGE' expecting {'(', 'SELECT', 'FR...

Latest Reply

User16826994223
Honored Contributor III

06-25-2021 11:23:35 AM

0 kudos

MERGE SQL support was added in Delta Lake 0.7.0. You also need to upgrade your Apache Spark to 3.0.0 and enable the integration with Apache Spark DataSourceV2 and C
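The reply is cut off, so the exact settings it goes on to name are not visible here. As a sketch only, the two session settings below are the ones the Delta Lake documentation lists for enabling SQL commands on Spark 3.x. The helper name and app name are hypothetical, and the code assumes that pyspark and a matching Delta Lake package are already on the classpath.

```python
# Session settings listed in the Delta Lake docs for SQL support on Spark 3.x.
DELTA_SQL_CONF = {
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
}


def build_delta_session(app_name: str = "delta-merge-demo"):
    """Create a SparkSession with the Delta SQL extension and catalog configured."""
    # Lazy import: requires pyspark plus the Delta Lake JARs at runtime.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in DELTA_SQL_CONF.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

With a session built this way, the MERGE statement would be submitted through `spark.sql(...)`. On a Databricks cluster these settings should not be needed, since Delta support is built into the runtime.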

by User16826992666 • Valued Contributor

06-25-2021 10:47:27 AM

428 Views
0 replies
0 kudos

Should I be saving my SparkML models in MLflow using MLeap?

There are a lot of different ML formats out there, and I am confused about how they should fit together. How should I think about MLflow and MLeap working together?

by User16765131552 • Contributor III

06-25-2021 10:37:47 AM

878 Views
1 replies
0 kudos

Resolved! Set up a model serving REST endpoint?

I am trying to set up a demo with a really simple Spark ML model, and I see this error repeated over and over in the logs in the serving UI: /databricks/chauffeur/model-runner/lib/python3.6/site-packages/urllib3/connectionpool.py:1020: InsecureRequestW...

Latest Reply

User16765131552
Contributor III

06-25-2021 10:38:25 AM

0 kudos

I'm not sure how the containers for each model version work on the endpoints, but it looks like model serving endpoints use a 7.x runtime. So those would be Spark 3.0, not Spark 3.1.

by User16826994223 • Honored Contributor III

06-25-2021 9:38:52 AM

1060 Views
1 replies
0 kudos

Using vacuum with a dry run in Python for a Delta Lake

I can see an example of how to call the vacuum function for a Delta Lake in Python here. How do I run the equivalent of the following in Python?

    %sql VACUUM delta.`dbfs:/mnt/<myfolder>` DRY RUN

Latest Reply

User16826994223
Honored Contributor III

06-25-2021 9:39:11 AM

0 kudos

The dry run for non-SQL code is not yet available in Delta version 0.8. I see that a bug has been opened with the open source Delta project in Git. I hope it gets resolved soon.
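Until the Python API supports it, one possible workaround is to send the same SQL statement through `spark.sql()` from Python. This is only a sketch: the helper names are hypothetical, and it assumes the SQL form of VACUUM with DRY RUN works in your environment, as it does in the `%sql` cell from the question.

```python
def vacuum_dry_run_sql(table_path: str) -> str:
    """Build a VACUUM ... DRY RUN statement for a path-based Delta table."""
    if "`" in table_path:
        raise ValueError("table path must not contain backticks")
    return f"VACUUM delta.`{table_path}` DRY RUN"


def vacuum_dry_run(spark, table_path: str):
    """Run the dry run on an existing SparkSession and return the result DataFrame."""
    return spark.sql(vacuum_dry_run_sql(table_path))
```

In a notebook, `vacuum_dry_run(spark, "dbfs:/mnt/<myfolder>")` would return a DataFrame that you can inspect with `.show()` to see which files the vacuum would delete, without deleting anything.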
