Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

danatsafe
by New Contributor
  • 4701 Views
  • 3 replies
  • 0 kudos

Amazon returns a 403 error code when trying to access an S3 Bucket

Hey! So far I have followed along with the Configure S3 access with instance profiles article to grant my cluster access to an S3 bucket. I have also made sure to disable IAM role passthrough on the cluster. Upon querying the bucket through a noteboo...

Latest Reply
winojoe
New Contributor III
  • 0 kudos

I had the same issue and I found a solution. For me, the permission problems only exist when the cluster's (compute's) access mode is "Shared No Isolation". When the access mode is either "Shared" or "Single User", the IAM configuration seems to a...

2 More Replies
THIAM_HUATTAN
by Valued Contributor
  • 3285 Views
  • 6 replies
  • 6 kudos

Resolved! Delta Lake’s CDF Feature

https://www.databricks.com/notebooks/delta-lake-cdf.html
I am trying to understand the above article. Could someone explain the questions below to me? a) From SELECT * FROM table_changes('gold_consensus_eps', 2), why are the consensus_eps values of 2.1 and 2....

Latest Reply
Anonymous
Not applicable
  • 6 kudos

Hi @THIAM HUAT TAN, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answer...

5 More Replies
Fed
by New Contributor III
  • 1487 Views
  • 1 replies
  • 2 kudos

Resolved! Ray as a cluster library instead of notebook-scoped library

This article rightly suggests installing `ray` with `%pip`, although it fails to mention that installing it as a cluster library won't work. The reason, I think, is that `setup_ray_cluster` will use `sys.executable` (i.e. `/local_disk0/.ephemeral_nfs/en...

Latest Reply
Fed
New Contributor III
  • 2 kudos

Ugly, but this seems to work for now:
import sys
import os
import shutil
from ray.util.spark import setup_ray_cluster, shutdown_ray_cluster

shutil.copy( "/local_disk0/.ephemeral_nfs/cluster_libraries/python/bin/ray", os.path.dirname(sys.execu...

User16752240150
by New Contributor II
  • 1193 Views
  • 1 replies
  • 0 kudos

What's the best way to use hyperopt to train a spark.ml model and track automatically with mlflow?

I've read this article, which covers:
  • Using CrossValidator or TrainValidationSplit to track hyperparameter tuning (no hyperopt). Only random/grid search
  • Parallel "single-machine" model training with hyperopt using hyperopt.SparkTrials (not spark.ml)
  • "Di...

Latest Reply
sean_owen
Honored Contributor II
  • 0 kudos

It's actually pretty simple: use hyperopt, but use "Trials" not "SparkTrials". You get parallelism from Spark, not from the tuning process.
