cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

danatsafe
by New Contributor
  • 5248 Views
  • 3 replies
  • 0 kudos

Amazon returns a 403 error code when trying to access an S3 Bucket

Hey! So far I have followed along with the Configure S3 access with instance profiles article to grant my cluster access to an S3 bucket. I have also made sure to disable IAM role passthrough on the cluster. Upon querying the bucket through a noteboo...

  • 5248 Views
  • 3 replies
  • 0 kudos
Latest Reply
winojoe
New Contributor III
  • 0 kudos

I had the same issue and I found a solutionFor me, the permission problems only exist when the Cluster's (compute's) Access mode is "Shared No Isolation".  When the Access Mode is either "Shared" or "Single User" then the IAM configuration seems to a...

  • 0 kudos
2 More Replies
KiranKondamadug
by New Contributor II
  • 4832 Views
  • 1 replies
  • 2 kudos

Running into delta.exceptions.ConcurrentAppendException even after setting up S3 Multi-Cluster Writes environment via S3 Dynamo DB LogStore

My use-case is to process a dataset worth 100s of partitions in concurrency. The data is partitioned, and they are disjointed. I was facing ConcurrentAppendException due to S3 not supporting the “put-if-absent” consistency guarantee. From Delta Lake ...

  • 4832 Views
  • 1 replies
  • 2 kudos
Latest Reply
Debayan
Databricks Employee
  • 2 kudos

Hi, You can refer to https://docs.databricks.com/optimizations/isolation-level.html#conflict-exceptions and recheck if everything is alright. Please let us know if this helps, also please tag @Debayan​ with your next response which will notify me, Th...

  • 2 kudos
fijoy
by Contributor
  • 2154 Views
  • 3 replies
  • 0 kudos

Is there a utility to convert between "/dbfs" and "dbfs:" path strings?

Is there a built-in utility function, e.g., dbutils, that can convert between path strings that start with "dbfs:" and "/dbfs"?Some operations, e.g, copying from one location in DBFS to another using dbutils.fs.cp() expect the path starting with "/db...

  • 2154 Views
  • 3 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Fijoy Vadakkumpadan​ Thank you for posting your question in our community! We are happy to assist you.To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best a...

  • 0 kudos
2 More Replies
kumarPerry
by New Contributor II
  • 2662 Views
  • 3 replies
  • 0 kudos

Notebook connectivity issue with aws s3 bucket using mounting

When connecting to aws s3 bucket using dbfs, application throws error like org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7864387.0 failed 4 times, most recent failure: Lost task 0.3 in stage 7864387.0 (TID 1709732...

  • 2662 Views
  • 3 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Amrendra Kumar​ Hope everything is going great.Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us s...

  • 0 kudos
2 More Replies
sajith_appukutt
by Honored Contributor II
  • 2239 Views
  • 2 replies
  • 1 kudos
  • 2239 Views
  • 2 replies
  • 1 kudos
Latest Reply
AdrianRojas
New Contributor II
  • 1 kudos

a bit old, but I just faced the same issue, specifying a custom EncryptionMaterialsProvider (as described in the previous post) did the trick for me but I did had to also specify my kms endpoint, just because my region:"fs.s3.cse.kms.endpoint" -> "km...

  • 1 kudos
1 More Replies
Tico23
by Contributor
  • 4513 Views
  • 3 replies
  • 0 kudos

Resolved! AmazonS3 with Autoloader consume "too many" requests or maybe not!

After successfully loading 3 small files (2 KB each) in from AWS S3 using Auto Loader for learning purposes, I got, few hours later, a "AWS Free tier limit alert", although I haven't used the AWS account for a while.Does this streaming service on Da...

Budget_alert
  • 4513 Views
  • 3 replies
  • 0 kudos
Latest Reply
Debayan
Databricks Employee
  • 0 kudos

Hi, ​​Auto Loader incrementally and efficiently processes new data files as they arrive in cloud storage. Auto Loader can load data files from AWS S3 (s3://), Azure Data Lake Storage Gen2 (ADLS Gen2, abfss://), Google Cloud Storage (GCS, gs://), Azur...

  • 0 kudos
2 More Replies
LidorAbo
by New Contributor II
  • 2019 Views
  • 1 replies
  • 0 kudos

Databricks can write to s3 bucket through panda but not from spark

Hey,I have problem with access to s3 bucket using cross account bucket permission, i got the following error:Steps to repreduce:Checking the role that assoicated to ec2 instance:{ "Version": "2012-10-17", "Statement": [ { ...

Access_Denied_S3_Bucket
  • 2019 Views
  • 1 replies
  • 0 kudos
Latest Reply
Nhan_Nguyen
Valued Contributor
  • 0 kudos

Could you try to map s3 bucket location with Databricks File System then write output to this new location instead of directly write to S3 location.

  • 0 kudos
impulsleistung
by New Contributor III
  • 3122 Views
  • 4 replies
  • 6 kudos

mount s3 bucket with specific endpoint

Environment:AZURE-DatabricksLanguage: PythonI can access my s3 bucket via:boto3.client('s3', endpoint_url='https://gateway.storjshare.io', ... )and it also works via:boto3.resource('s3', endpoint_url='https://gateway.storjshare.io', ... )As a next st...

  • 3122 Views
  • 4 replies
  • 6 kudos
Latest Reply
Anonymous
Not applicable
  • 6 kudos

Hi @Kevin Ostheimer​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.Th...

  • 6 kudos
3 More Replies
its-kumar
by New Contributor III
  • 7592 Views
  • 9 replies
  • 4 kudos

Resolved! Error While creating the Databricks workspace on the AWS cloudformation stack

I am getting the following error in databricksAPIFunction resource creation and AWS Stack is failing with the rollback. Resource handler returned message: "Your access has been denied by S3, please make sure your request credentials have permission t...

  • 7592 Views
  • 9 replies
  • 4 kudos
Latest Reply
Vartika
Databricks Employee
  • 4 kudos

Hi @Kumar Shanu​,Thank you for coming back and letting us know.It was really great of you to mark an answer as best and for pointing everyone in the right direction.Have a great Databricks journey ahead! 

  • 4 kudos
8 More Replies
Anonymous
by Not applicable
  • 6223 Views
  • 4 replies
  • 1 kudos

Constructor public com.databricks.backend.daemon.dbutils.FSUtilsParallel is not whitelisted when mounting a s3 bucket

Hello all, I'm experiencing this issueConstructor public com.databricks.backend.daemon.dbutils.FSUtilsParallel is not whitelisted when I'm trying to mount a s3 bucket. %python dbutils.fs.mount("s3a://dd-databricks-staging-storage/data/staging/datalak...

  • 6223 Views
  • 4 replies
  • 1 kudos
Latest Reply
leonids2005
New Contributor II
  • 1 kudos

WE have this problem running cluster with 11.2 and shared access mode. spark.databricks.pyspark.enablePy4JSecurity false - this does not help because it says spark.databricks.pyspark.enablePy4JSecurity is not allowed when choosing access modehere is ...

  • 1 kudos
3 More Replies
antoniok
by New Contributor II
  • 2737 Views
  • 1 replies
  • 3 kudos

dbutils.fs.ls is giving "null uri host This can be caused by unencoded / in the password string"

I'm trying to list number of files in s3 bucket. I've initially used "aws s3 ls <s3://>" to list the files and it worked. However, when trying to do the same using dbutils.fs.ls, I'm getting java.lang.NullPointerException: null uri host. This can be ...

  • 2737 Views
  • 1 replies
  • 3 kudos
Latest Reply
marcus1
New Contributor III
  • 3 kudos

You might be encountering an issue with bucket naming. Which I'm also getting with a bucket named something.[0-9]https://issues.apache.org/jira/browse/HADOOP-17241

  • 3 kudos
Lonnie
by New Contributor
  • 2003 Views
  • 0 replies
  • 0 kudos

Recommended Redshift-2-Delta Migration Path

Hello All!My team is previewing Databricks and are contemplating the steps to take to perform one-time migrations of datasets from Redshift to Delta. Based on our understandings of the tool, here are our initial thoughts:Export data from Redshift-2-S...

  • 2003 Views
  • 0 replies
  • 0 kudos
matt_t
by New Contributor
  • 3362 Views
  • 2 replies
  • 1 kudos

Resolved! S3 sync from bucket to a mounted bucket causing a "[Errno 95] Operation not supported" error for some but not all files

Trying to sync one folder from an external s3 bucket to a folder on a mounted S3 bucket and running some simple code on databricks to accomplish this. Data is a bunch of CSVs and PSVs.The only problem is some of the files are giving this error that t...

  • 3362 Views
  • 2 replies
  • 1 kudos
Latest Reply
Atanu
Databricks Employee
  • 1 kudos

@Matthew Tribby​  does above suggestion work. Please let us know if you need further help on this. Thanks.

  • 1 kudos
1 More Replies
fff_ds
by New Contributor
  • 1223 Views
  • 1 replies
  • 1 kudos

Manual overwrite in s3 console of a collection of parquet files and now we can't read them.

org.apache.spark.SparkException: Job aborted due to stage failure: Task 19 in stage 26.0 failed 4 times, most recent failure: Lost task 19.3 in stage 26.0 (TID 4205, 10.66.225.154, executor 0): com.databricks.sql.io.FileReadException: Error while rea...

  • 1223 Views
  • 1 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hello, @Lili Ehrlich​. Welcome! My name is Piper, and I'm a moderator for Databricks. Thank you for bringing your question to us. Let's give it a while for the community to respond first.Thanks in advance for your patience.

  • 1 kudos
venkyv
by New Contributor II
  • 2154 Views
  • 1 replies
  • 3 kudos

Resolved! Can I use Databricks to join data from S3 and Postgres using SQL?

Hello, I'm very much new to Databricks and I'm finding it hard if it's right solution for our needs.Requirement:We have multiple data sources spread across AWS S3 and Postgres. We need a common SQL endpoint that can be used to write queries to join d...

  • 2154 Views
  • 1 replies
  • 3 kudos
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

Yes you can. You can ETL to data lake storage register your tables to metastore and register your SELECT with JOINS as VIEW or even better create additionally jobs and store your JOINED table. From BI you can connect to databricks sql or to data lake...

  • 3 kudos
Labels