Data Engineering

Forum Posts

by danatsafe • New Contributor

01-17-2023 2:15:33 PM

6475 Views
3 replies
0 kudos

Amazon returns a 403 error code when trying to access an S3 Bucket

Hey! So far I have followed along with the Configure S3 access with instance profiles article to grant my cluster access to an S3 bucket. I have also made sure to disable IAM role passthrough on the cluster. Upon querying the bucket through a noteboo...
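If you hit a 403 in this setup, check whether the role's policy grants bucket-level and object-level actions on the right ARNs. Listing needs the bucket ARN (`arn:aws:s3:::<bucket>`), while reads and writes need the object ARN (`arn:aws:s3:::<bucket>/*`). The sketch below builds one minimal policy document of that shape. The helper name `build_bucket_access_policy` and the bucket name are hypothetical, and the action list is an assumption: compare it with the policy in the instance-profile article you followed.

```python
import json


def build_bucket_access_policy(bucket_name: str) -> dict:
    """Return a minimal IAM policy document granting read/write access to one bucket.

    Sketch only: adapt the action list to what your workload actually needs.
    """
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level action: must target the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Object-level actions: must target the "/*" object ARN.
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:DeleteObject",
                    "s3:PutObjectAcl",
                ],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # "my-example-bucket" is a hypothetical bucket name.
    print(json.dumps(build_bucket_access_policy("my-example-bucket"), indent=2))
```

The latest reply in this thread points to the cluster access mode as the cause, so a correct policy alone may not be enough.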

Latest Reply

winojoe
New Contributor III

08-18-2023 5:51:04 PM

0 kudos

I had the same issue and I found a solution. For me, the permission problems only exist when the Cluster's (compute's) Access mode is "Shared No Isolation". When the Access Mode is either "Shared" or "Single User" then the IAM configuration seems to a...

by KiranKondamadug • New Contributor II

05-31-2023 12:27:05 PM

6029 Views
1 replies
2 kudos

Running into delta.exceptions.ConcurrentAppendException even after setting up S3 Multi-Cluster Writes environment via S3 DynamoDB LogStore

My use case is to process a dataset with hundreds of partitions concurrently. The data is partitioned, and the partitions are disjoint. I was facing ConcurrentAppendException due to S3 not supporting the “put-if-absent” consistency guarantee. From Delta Lake ...
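As we understand it, the S3DynamoDBLogStore guarantees that two clusters cannot both write the same transaction-log entry. ConcurrentAppendException is a separate, logical conflict check that Delta runs no matter which LogStore is configured. The conflict-exceptions page linked in the reply below suggests making the partition separation explicit in each operation's condition. Here is a minimal sketch of building such a predicate for one partition. The helper `partition_predicate` and the column names are hypothetical.

```python
from datetime import date
from typing import Mapping, Union

Scalar = Union[str, int, date]


def _sql_literal(value: Scalar) -> str:
    """Render a Python value as a SQL literal; strings are single-quoted."""
    if isinstance(value, bool):  # bool is a subclass of int, so reject it first
        raise TypeError("booleans are not supported in this sketch")
    if isinstance(value, int):
        return str(value)
    if isinstance(value, date):
        return f"'{value.isoformat()}'"
    # Double embedded single quotes so the literal stays well-formed.
    escaped = str(value).replace("'", "''")
    return f"'{escaped}'"


def partition_predicate(partition: Mapping[str, Scalar]) -> str:
    """Build an explicit predicate such as "day = '2023-05-31' AND region = 'eu'"."""
    if not partition:
        raise ValueError("an empty predicate would cover the whole table")
    # Sort columns so the same partition always yields the same string.
    return " AND ".join(
        f"{column} = {_sql_literal(value)}"
        for column, value in sorted(partition.items())
    )


# Hypothetical usage inside one concurrent writer (df and table_path are assumed):
# df.write.format("delta").mode("overwrite") \
#     .option("replaceWhere", partition_predicate({"day": "2023-05-31"})) \
#     .save(table_path)
```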

Latest Reply

Debayan
Databricks Employee

06-06-2023 12:36:31 AM

2 kudos

Hi, you can refer to https://docs.databricks.com/optimizations/isolation-level.html#conflict-exceptions and recheck whether everything is configured correctly. Please let us know if this helps. Also, please tag @Debayan in your next response, which will notify me, Th...

by fijoy • Contributor

05-27-2023 9:19:07 AM

2364 Views
3 replies
0 kudos

Is there a utility to convert between "/dbfs" and "dbfs:" path strings?

Is there a built-in utility function, e.g., dbutils, that can convert between path strings that start with "dbfs:" and "/dbfs"? Some operations, e.g., copying from one location in DBFS to another using dbutils.fs.cp() expect the path starting with "/db...
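We are not aware of a dbutils function that does this conversion, but it is a simple prefix swap. Local file APIs (`open`, `os`, pandas) use the `/dbfs/...` FUSE form, while Spark readers use the `dbfs:/...` URI form. Here is a minimal sketch. The function names `to_fuse_path` and `to_dbfs_uri` are hypothetical helpers, not part of dbutils.

```python
FUSE_ROOT = "/dbfs"


def to_fuse_path(path: str) -> str:
    """'dbfs:/mnt/data/x.csv' -> '/dbfs/mnt/data/x.csv' (local file API form)."""
    if path == FUSE_ROOT or path.startswith(FUSE_ROOT + "/"):
        return path  # already in FUSE form
    if not path.startswith("dbfs:"):
        raise ValueError(f"expected a dbfs: URI, got {path!r}")
    # Accept both 'dbfs:/x' and 'dbfs:///x' by stripping all leading slashes.
    rest = path[len("dbfs:"):].lstrip("/")
    return f"{FUSE_ROOT}/{rest}" if rest else FUSE_ROOT


def to_dbfs_uri(path: str) -> str:
    """'/dbfs/mnt/data/x.csv' -> 'dbfs:/mnt/data/x.csv' (Spark URI form)."""
    if path.startswith("dbfs:"):
        return path  # already in URI form
    if path == FUSE_ROOT:
        return "dbfs:/"
    if not path.startswith(FUSE_ROOT + "/"):
        raise ValueError(f"expected a /dbfs path, got {path!r}")
    return "dbfs:/" + path[len(FUSE_ROOT) + 1:]
```

Both functions return paths already in the target form unchanged, so you can call them defensively.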

Latest Reply

Anonymous
Not applicable

06-01-2023 1:33:28 AM

0 kudos

Hi @Fijoy Vadakkumpadan, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best a...

by kumarPerry • New Contributor II

04-11-2023 10:46:49 AM

3093 Views
3 replies
0 kudos

Notebook connectivity issue with AWS S3 bucket using a mount

When connecting to an AWS S3 bucket through DBFS, the application throws an error like org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7864387.0 failed 4 times, most recent failure: Lost task 0.3 in stage 7864387.0 (TID 1709732...

Latest Reply

Anonymous
Not applicable

04-15-2023 11:50:12 PM

0 kudos

Hi @Amrendra Kumar, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us s...

by sajith_appukutt • Honored Contributor II

06-09-2021 12:36:37 AM

2485 Views
2 replies
1 kudos

Resolved! How can I configure S3 Client-Side Encryption (CSE-KMS) for my data pipeline?

Latest Reply

AdrianRojas
New Contributor II

04-13-2023 4:35:26 PM

1 kudos

A bit old, but I just faced the same issue. Specifying a custom EncryptionMaterialsProvider (as described in the previous post) did the trick for me, but I also had to specify my KMS endpoint because of my region: "fs.s3.cse.kms.endpoint" -> "km...

by LidorAbo • New Contributor II

01-31-2023 1:09:02 AM

2261 Views
1 replies
0 kudos

Databricks can write to S3 bucket through pandas but not from Spark

Hey, I have a problem accessing an S3 bucket using cross-account bucket permissions. I got the following error: Steps to reproduce: Checking the role that is associated with the EC2 instance: { "Version": "2012-10-17", "Statement": [ { ...

Latest Reply

Nhan_Nguyen
Valued Contributor

01-31-2023 5:17:32 AM

0 kudos

Could you try mounting the S3 bucket location to the Databricks File System (DBFS) and then writing output to this new location, instead of writing directly to the S3 location?

by impulsleistung • New Contributor III

10-23-2022 6:46:52 AM

3447 Views
4 replies
6 kudos

Mount S3 bucket with specific endpoint

Environment: Azure Databricks. Language: Python. I can access my S3 bucket via: boto3.client('s3', endpoint_url='https://gateway.storjshare.io', ... ) and it also works via: boto3.resource('s3', endpoint_url='https://gateway.storjshare.io', ... ) As a next st...
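With boto3, the custom gateway is passed as `endpoint_url`. On the Spark/Hadoop side, the corresponding setting is the S3A connector's `fs.s3a.endpoint`. This sketch builds a config dict that could be passed to `dbutils.fs.mount(..., extra_configs=...)`. It assumes that the mount honours a custom endpoint on your workspace, which we have not verified for Azure Databricks. It also assumes the gateway may need path-style access. The helper name, bucket, mount point, and secret scope/keys are hypothetical.

```python
from typing import Dict


def s3a_endpoint_configs(endpoint: str, access_key: str, secret_key: str) -> Dict[str, str]:
    """Hadoop S3A options pointing the connector at an S3-compatible gateway."""
    return {
        "fs.s3a.endpoint": endpoint,
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
        # Assumption: some S3-compatible gateways expect https://host/bucket
        # rather than https://bucket.host; drop this if yours does not.
        "fs.s3a.path.style.access": "true",
    }


# Hypothetical usage on Databricks (names below are placeholders):
# dbutils.fs.mount(
#     source="s3a://my-bucket",
#     mount_point="/mnt/storj",
#     extra_configs=s3a_endpoint_configs(
#         "https://gateway.storjshare.io",
#         dbutils.secrets.get("my-scope", "access-key"),
#         dbutils.secrets.get("my-scope", "secret-key"),
#     ),
# )
```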

Latest Reply

Anonymous
Not applicable

11-27-2022 6:10:42 AM

6 kudos

Hi @Kevin Ostheimer, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Th...

by its-kumar • New Contributor III

06-06-2022 12:37:55 AM

8356 Views
9 replies
4 kudos

Resolved! Error while creating the Databricks workspace with an AWS CloudFormation stack

I am getting the following error during databricksAPIFunction resource creation, and the AWS stack fails and rolls back. Resource handler returned message: "Your access has been denied by S3, please make sure your request credentials have permission t...

Latest Reply

Vartika
Databricks Employee

11-11-2022 7:46:37 AM

4 kudos

Hi @Kumar Shanu, thank you for coming back and letting us know. It was really great of you to mark an answer as best and to point everyone in the right direction. Have a great Databricks journey ahead!

by Anonymous • Not applicable

04-12-2022 10:27:19 AM

6766 Views
4 replies
1 kudos

Constructor public com.databricks.backend.daemon.dbutils.FSUtilsParallel is not whitelisted when mounting a s3 bucket

Hello all, I'm experiencing this issue: "Constructor public com.databricks.backend.daemon.dbutils.FSUtilsParallel is not whitelisted" when I'm trying to mount an S3 bucket. %python dbutils.fs.mount("s3a://dd-databricks-staging-storage/data/staging/datalak...

Latest Reply

leonids2005
New Contributor II

10-03-2022 10:00:20 AM

1 kudos

We have this problem running a cluster with 11.2 and shared access mode. Setting spark.databricks.pyspark.enablePy4JSecurity false does not help, because it says spark.databricks.pyspark.enablePy4JSecurity is not allowed when choosing access mode. Here is ...

by antoniok • New Contributor II

09-08-2022 5:36:57 AM

3750 Views
1 replies
3 kudos

dbutils.fs.ls is giving "null uri host This can be caused by unencoded / in the password string"

I'm trying to list the files in an S3 bucket. I initially used "aws s3 ls <s3://>" to list the files, and it worked. However, when trying to do the same using dbutils.fs.ls, I'm getting java.lang.NullPointerException: null uri host. This can be ...
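The error text itself names one cause: an unencoded "/" in the password. The excerpt does not show this, but suppose the credentials are embedded in the URI as `s3a://ACCESS_KEY:SECRET_KEY@bucket`. A secret containing "/" then ends the authority section early, and no host is left. Percent-encoding the secret with `urllib.parse.quote(..., safe="")` avoids that. The helper name and values here are hypothetical, and keeping keys out of URIs entirely (instance profiles or secret scopes) is generally preferable.

```python
from urllib.parse import quote


def s3a_uri_with_keys(access_key: str, secret_key: str, bucket: str) -> str:
    """Build an s3a:// URI embedding credentials, with the secret percent-encoded.

    safe="" makes quote() encode "/" as well; otherwise a "/" in the secret
    would be parsed as the start of the path.
    """
    encoded_secret = quote(secret_key, safe="")
    return f"s3a://{access_key}:{encoded_secret}@{bucket}"
```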

Latest Reply

marcus1
New Contributor III

09-27-2022 8:53:32 AM

3 kudos

You might be encountering an issue with bucket naming, which I'm also hitting with a bucket named something.[0-9]: https://issues.apache.org/jira/browse/HADOOP-17241
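HADOOP-17241 concerns bucket names that cannot be parsed as URI hostnames. `java.net.URI` follows the RFC 2396 hostname grammar, in which every label is alphanumeric with inner hyphens. In a dotted name, the rightmost label must also start with a letter. Names that break this rule, such as `something.2022`, leave the URI with a null host. Below is a rough pre-check written as a heuristic mirror of that grammar. It is not Hadoop's own validation, and the function name is hypothetical.

```python
import re

# RFC 2396 hostname grammar, as used by java.net.URI:
#   domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
#   toplabel    = alpha    | alpha    *( alphanum | "-" ) alphanum
_DOMAIN_LABEL = re.compile(r"[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?")
_TOP_LABEL = re.compile(r"[A-Za-z]([A-Za-z0-9-]*[A-Za-z0-9])?")


def is_uri_hostname(bucket: str) -> bool:
    """Heuristic: would this bucket name parse as a URI host under RFC 2396?

    False suggests an s3a://<bucket>/ URI may end up with a null host.
    """
    labels = bucket.split(".")
    if any(not _DOMAIN_LABEL.fullmatch(label) for label in labels):
        return False
    # Only dotted names need an alphabetic rightmost label.
    return len(labels) == 1 or bool(_TOP_LABEL.fullmatch(labels[-1]))
```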

by Lonnie • New Contributor

05-19-2022 1:17:15 PM

2168 Views
0 replies
0 kudos

Recommended Redshift-2-Delta Migration Path

Hello all! My team is previewing Databricks and is contemplating the steps to take to perform one-time migrations of datasets from Redshift to Delta. Based on our understanding of the tool, here are our initial thoughts: Export data from Redshift-2-S...
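A common way to do a one-time bulk export from Redshift is the `UNLOAD` command writing Parquet files. Spark can then read them and rewrite them as Delta. The sketch below only renders the `UNLOAD` statement. Redshift takes the query as a quoted string, so single quotes inside it are doubled. The helper name, S3 prefix, IAM role ARN, and Delta path are all hypothetical placeholders, and this is one possible path rather than a recommended one.

```python
def build_unload_statement(query: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Render a Redshift UNLOAD that writes the result of `query` to S3 as Parquet."""
    # UNLOAD wraps the query in single quotes, so embedded quotes are doubled.
    escaped_query = query.replace("'", "''")
    return (
        f"UNLOAD ('{escaped_query}')\n"
        f"TO '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS PARQUET"
    )


# After running the statement in Redshift, a hypothetical Databricks step:
# spark.read.parquet("s3://my-bucket/export/sales/") \
#     .write.format("delta").save("s3://my-bucket/delta/sales/")
```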

by matt_t • New Contributor

02-17-2022 12:54:21 PM

3803 Views
2 replies
1 kudos

Resolved! S3 sync from bucket to a mounted bucket causing a "[Errno 95] Operation not supported" error for some but not all files

I'm trying to sync one folder from an external S3 bucket to a folder on a mounted S3 bucket, running some simple code on Databricks to accomplish this. The data is a bunch of CSVs and PSVs. The only problem is that some of the files are giving this error that t...

Latest Reply

Atanu
Databricks Employee

03-15-2022 9:38:16 PM

1 kudos

@Matthew Tribby, does the above suggestion work? Please let us know if you need further help on this. Thanks.

by fff_ds • New Contributor

02-15-2022 1:05:44 PM

1339 Views
1 replies
1 kudos

Manually overwrote a collection of Parquet files in the S3 console, and now we can't read them.

org.apache.spark.SparkException: Job aborted due to stage failure: Task 19 in stage 26.0 failed 4 times, most recent failure: Lost task 19.3 in stage 26.0 (TID 4205, 10.66.225.154, executor 0): com.databricks.sql.io.FileReadException: Error while rea...

Latest Reply

Anonymous
Not applicable

02-16-2022 8:24:05 AM

1 kudos

Hello, @Lili Ehrlich. Welcome! My name is Piper, and I'm a moderator for Databricks. Thank you for bringing your question to us. Let's give it a while for the community to respond first. Thanks in advance for your patience.

by venkyv • New Contributor II

01-26-2022 1:51:22 PM

2320 Views
1 replies
3 kudos

Resolved! Can I use Databricks to join data from S3 and Postgres using SQL?

Hello, I'm very new to Databricks and I'm finding it hard to tell whether it's the right solution for our needs. Requirement: We have multiple data sources spread across AWS S3 and Postgres. We need a common SQL endpoint that can be used to write queries to join d...

Latest Reply

Hubert-Dudek
Esteemed Contributor III

01-26-2022 2:46:48 PM

3 kudos

Yes, you can. You can ETL the data into data lake storage, register your tables in the metastore, and register your SELECT with JOINs as a VIEW, or, even better, create additional jobs that store your JOINED table. From BI you can connect to Databricks SQL or to data lake...
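The reply describes two options. A VIEW re-runs the JOIN on every query, whereas a job materializes the joined table ahead of time. The difference can be shown with any SQL engine. The sketch below uses Python's built-in `sqlite3` purely as a stand-in for the Databricks metastore, with made-up tables and rows. In Databricks, the same `CREATE VIEW ... AS SELECT` and `CREATE TABLE ... AS SELECT` statements would target tables registered from the S3 and Postgres sources.

```python
import sqlite3


def build_demo(conn: sqlite3.Connection) -> None:
    """Create two toy source tables, a join view, and a materialized copy."""
    cur = conn.cursor()
    # Stand-ins for a table ingested from S3 and one ingested from Postgres.
    cur.execute("CREATE TABLE s3_orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
    cur.execute("CREATE TABLE pg_customers (customer_id INTEGER, name TEXT)")
    cur.executemany("INSERT INTO s3_orders VALUES (?, ?, ?)",
                    [(1, 10, 25.0), (2, 11, 40.0), (3, 10, 5.0)])
    cur.executemany("INSERT INTO pg_customers VALUES (?, ?)",
                    [(10, "Ada"), (11, "Grace")])
    # Option 1: a view; the JOIN is evaluated each time the view is queried.
    cur.execute("""
        CREATE VIEW customer_orders AS
        SELECT c.name, o.order_id, o.amount
        FROM s3_orders AS o
        JOIN pg_customers AS c ON c.customer_id = o.customer_id
    """)
    # Option 2: what a scheduled job would do, materializing the joined result.
    cur.execute("CREATE TABLE customer_orders_snapshot AS SELECT * FROM customer_orders")
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_demo(conn)
    query = "SELECT name, SUM(amount) FROM customer_orders GROUP BY name ORDER BY name"
    print(conn.execute(query).fetchall())  # [('Ada', 30.0), ('Grace', 40.0)]
```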

by Hubert-Dudek • Esteemed Contributor III

11-15-2021 3:48:22 AM

2443 Views
2 replies
13 kudos

Resolved! Something like AWS Macie to perform scans on Azure Data Lake

Does anyone know an alternative to AWS Macie in Azure? AWS Macie scans S3 buckets for files with sensitive data (personal addresses, credit card numbers, etc.). I would like to use a similar ready-made scanner for Azure Data Lake.
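The question asks for a managed service, so the following is only an illustration of what such a scanner does for one category of sensitive data. It covers credit card numbers: a pattern match followed by a Luhn checksum to reduce false positives. It is not a substitute for a managed classifier. The regex is a simplification that accepts 13 to 19 digits with optional single spaces or hyphens, and the function names are hypothetical.

```python
import re
from typing import List

# 13-19 digits, optionally separated by single spaces or hyphens (simplified).
_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text: str) -> List[str]:
    """Return digit strings in `text` that look like card numbers and pass Luhn."""
    hits = []
    for match in _CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```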

Latest Reply

Hubert-Dudek
Esteemed Contributor III

11-15-2021 4:58:17 AM

13 kudos

Thank you, I checked, and yes, it is definitely the way to go.
