Data Engineering

Forum Posts

by danatsafe • New Contributor

01-17-2023 2:15:33 PM

5774 Views
3 replies
0 kudos

Amazon returns a 403 error code when trying to access an S3 Bucket

Hey! So far I have followed along with the Configure S3 access with instance profiles article to grant my cluster access to an S3 bucket. I have also made sure to disable IAM role passthrough on the cluster. Upon querying the bucket through a noteboo...

Latest Reply

winojoe
New Contributor III

08-18-2023 5:51:04 PM

0 kudos

I had the same issue and I found a solution. For me, the permission problems only exist when the cluster's (compute's) access mode is "Shared No Isolation". When the access mode is either "Shared" or "Single User", the IAM configuration seems to a...

by THIAM_HUATTAN • Valued Contributor

05-15-2023 5:50:55 AM

3881 Views
6 replies
6 kudos

Resolved! Delta Lake’s CDF Feature

https://www.databricks.com/notebooks/delta-lake-cdf.html
I am trying to understand the above article. Could someone explain the questions below to me?
a) From SELECT * FROM table_changes('gold_consensus_eps', 2), why is consensus_eps values of 2.1 and 2....
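For anyone following the question, here is a minimal sketch of two ways to query the change data feed, assuming the table from the article (`gold_consensus_eps`) has CDF enabled. The helper names `table_changes_sql` and `read_changes` are made up for illustration, and `read_changes` expects an existing SparkSession, so it is defined but not called here.

```python
from typing import Optional


def table_changes_sql(table: str, start_version: int,
                      end_version: Optional[int] = None) -> str:
    """Build a table_changes() query string (hypothetical helper)."""
    args = [f"'{table}'", str(start_version)]
    if end_version is not None:
        args.append(str(end_version))
    return f"SELECT * FROM table_changes({', '.join(args)})"


def read_changes(spark, table: str, start_version: int):
    """DataFrame-API counterpart; `spark` is assumed to be a live SparkSession."""
    return (
        spark.read.format("delta")
        .option("readChangeFeed", "true")          # ask Delta for the change feed
        .option("startingVersion", start_version)  # first commit version to include
        .table(table)
    )


# Only the string builder runs without a cluster.
print(table_changes_sql("gold_consensus_eps", 2))
```

Each returned row carries `_change_type`, `_commit_version` and `_commit_timestamp` columns, and an update shows up as a pair of `update_preimage` / `update_postimage` rows, which may be why a value seems to appear more than once in the output.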

Latest Reply

Anonymous
Not applicable

05-23-2023 1:58:26 AM

6 kudos

Hi @THIAM HUAT TAN, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answer...

by Fed • New Contributor III

03-10-2023 12:39:45 PM

1777 Views
1 reply
2 kudos

Resolved! Ray as a cluster library instead of notebook-scoped library

This article rightly suggests installing `ray` with `%pip`, although it fails to mention that installing it as a cluster library won't work. The reason, I think, is that `setup_ray_cluster` will use `sys.executable` (i.e. `/local_disk0/.ephemeral_nfs/en...

Latest Reply

Fed
New Contributor III

03-10-2023 1:03:14 PM

2 kudos

Ugly, but this seems to work for now:

    import sys
    import os
    import shutil
    from ray.util.spark import setup_ray_cluster, shutdown_ray_cluster

    shutil.copy(
        "/local_disk0/.ephemeral_nfs/cluster_libraries/python/bin/ray",
        os.path.dirname(sys.execu...
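Before copying anything, it may help to confirm the symptom. The sketch below only checks whether a `ray` executable sits in the same directory as a given Python interpreter, which is what the workaround above appears to rely on. That `setup_ray_cluster` looks there is the poster's hypothesis, not something verified here, and `ray_cli_next_to` is a hypothetical helper name.

```python
import os
import sys
from typing import Optional


def ray_cli_next_to(python_executable: str) -> Optional[str]:
    """Return the path of an executable `ray` file beside the interpreter, else None."""
    candidate = os.path.join(os.path.dirname(python_executable), "ray")
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    return None


# Prints a path when the CLI is installed beside the running interpreter,
# or None when it is not (the case the post describes for cluster libraries).
print(ray_cli_next_to(sys.executable))
```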

by User16752240150 • New Contributor II

06-04-2021 12:34:03 PM

1387 Views
1 reply
0 kudos

What's the best way to use hyperopt to train a spark.ml model and track automatically with mlflow?

I've read this article, which covers:
- Using CrossValidator or TrainValidationSplit to track hyperparameter tuning (no hyperopt). Only random/grid search
- parallel "single-machine" model training with hyperopt using hyperopt.SparkTrials (not spark.ml)
- "Di...

Latest Reply

sean_owen
Databricks Employee

06-17-2021 5:00:45 PM

0 kudos

It's actually pretty simple: use hyperopt, but use "Trials" not "SparkTrials". You get parallelism from Spark, not from the tuning process.
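A rough sketch of that advice, under some assumptions: `build_estimator` is a hypothetical callable that turns a params dict into an unfitted spark.ml estimator, the evaluator's metric is larger-is-better, and `train_df` / `val_df` are existing Spark DataFrames. MLflow logging is left out. Only `fmin`, `tpe.suggest` and `Trials` come from hyperopt; everything else is illustrative naming.

```python
def make_objective(build_estimator, evaluator, train_df, val_df):
    """Return a hyperopt objective that fits a spark.ml estimator per trial."""
    def objective(params):
        # The fit itself is a distributed Spark job; hyperopt only picks params.
        model = build_estimator(params).fit(train_df)
        metric = evaluator.evaluate(model.transform(val_df))
        # hyperopt minimizes, so negate the metric. "ok" equals hyperopt.STATUS_OK.
        return {"loss": -metric, "status": "ok"}
    return objective


def tune(objective, space, max_evals=20):
    """Run the search on the driver with plain Trials (not SparkTrials)."""
    # Imported here so the sketch loads even where hyperopt is not installed.
    from hyperopt import Trials, fmin, tpe

    return fmin(fn=objective, space=space, algo=tpe.suggest,
                max_evals=max_evals, trials=Trials())
```

With plain `Trials`, the trials run one after another on the driver, and each trial's `fit` is what Spark parallelizes across the cluster.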

Databricks Community