Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

CarterM
by New Contributor III
  • 3574 Views
  • 2 replies
  • 2 kudos

Resolved! Why Spark Streaming from S3 is returning thousands of files when there are only 9?

I am attempting to stream JSON endpoint responses from an S3 bucket into a Spark DLT. I have been very successful in this practice previously, but the difference this time is that I am storing the responses from multiple endpoints in the same S3 buck...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Carter Mooring Thank you so much for coming back to provide a solution to your thread! Happy you were able to figure this out so quickly. I am sure this will help someone in the future with the same issue.

1 More Replies
JasonN
by New Contributor II
  • 1489 Views
  • 2 replies
  • 2 kudos

Resolved! DLT Cluster accessing to S3 bucket without Instance Profile attached

Hi Team, can anyone please help me figure out how to configure a Delta Live Tables cluster to access an AWS S3 bucket without an instance profile defined in the cluster's JSON? The idea is, the user who is running the DLT cluster has Storage Credentials and Extern...

Latest Reply
Vivian_Wilfred
Honored Contributor
  • 2 kudos

Hi @Jason Nam, DLT and Unity Catalog are not integrated yet. The cluster-notebook setup uses UC and can access S3, but DLT jobs cannot. Please check the limitations in this document (7th point): https://docs.databricks.com/release-notes/unity-catalo...

1 More Replies
77796
by New Contributor II
  • 3383 Views
  • 4 replies
  • 0 kudos

Databricks S3A error - java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory not found

We are getting the below error on runtimes 10.x and 11.x when writing to S3 via the saveAsNewAPIHadoopFile function. The same jobs run fine on runtimes 9.x and 7.x. The difference between 9.x and 10.x is that the former has Hadoop 2.7 bindings with sp...

Latest Reply
77796
New Contributor II
  • 0 kudos

We resolved this issue by using the s3 scheme instead of s3a, i.e. pairRDD.saveAsNewAPIHadoopFile("s3://bucket/testout.dat",
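For anyone applying the same workaround across many output paths, the scheme swap can be done in one place before the path reaches saveAsNewAPIHadoopFile. This is only a minimal sketch: the helper name to_s3_scheme is hypothetical and not part of any Databricks or Hadoop API, and it assumes the paths are plain strings.

```python
def to_s3_scheme(uri: str) -> str:
    """Rewrite an s3a:// URI to the s3:// scheme; leave other URIs unchanged.

    Hypothetical helper for illustration only.
    """
    prefix = "s3a://"
    if uri.startswith(prefix):
        return "s3://" + uri[len(prefix):]
    return uri


# Example: the path from the reply above, written with the s3a scheme.
output_path = to_s3_scheme("s3a://bucket/testout.dat")
```

The resulting string would then be passed as the path argument in place of the original s3a URI.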

3 More Replies
MadelynM
by New Contributor III
  • 6248 Views
  • 1 replies
  • 0 kudos

Delta Live Tables + S3 | 5 tips for cloud storage with DLT

You’ve gotten familiar with Delta Live Tables (DLT) via the quickstart and getting started guide. Now it’s time to tackle creating a DLT data pipeline for your cloud storage, with one line of code. Here’s how it’ll look when you're starting: CREATE OR ...

Latest Reply
MadelynM
New Contributor III
  • 0 kudos

Tip #3: Use JSON cluster configurations to access your storage location. Knowledge check: How do I modify DLT settings using JSON? Delta Live Tables settings are expressed as JSON and can be modified in the Delta Live Tables UI [AWS] [Azure] [GCP]. Examp...

tej1
by New Contributor III
  • 2657 Views
  • 6 replies
  • 7 kudos

Resolved! Trouble accessing `_metadata` column using cloudFiles in Delta Live Tables

We are building a Delta Live pipeline where we ingest CSV files in AWS S3 using cloudFiles, and we need to access each file's modification timestamp. As documented here, we tried selecting the `_metadata` column in a task in Delta Live p...

Latest Reply
tej1
New Contributor III
  • 7 kudos

Update: We were able to test the `_metadata` column feature in DLT "preview" mode (which is DBR 11.0). Databricks doesn't recommend running production workloads in "preview" mode, but we are nevertheless glad to be using this feature in DLT.

5 More Replies
Megan05
by New Contributor III
  • 1963 Views
  • 4 replies
  • 1 kudos

Trying to write to S3 bucket but executed code not showing any progress

I am trying to write data from Databricks to an S3 bucket, but when I submit the code, it runs and runs and does not make any progress. I am not getting any errors, and the logs don't seem to recognize that I've submitted anything. The cluster also looks un...

Latest Reply
User16753725469
Contributor II
  • 1 kudos

Can you please check the driver log4j to see what is happening?

3 More Replies
lsoewito
by New Contributor
  • 4475 Views
  • 3 replies
  • 3 kudos

Resolved! How to configure Databricks Connect to 'Assume Role' when accessing file from an AWS S3 bucket?

I have a Databricks cluster configured with an instance profile to assume a role when accessing an AWS S3 bucket. Accessing the bucket from the notebook using the cluster works properly (the instance profile can assume the role to access the bucket). However...

Latest Reply
Kaniz_Fatma
Community Manager
  • 3 kudos

Hi @lsoewito, we haven't heard from you since my last response, and I was checking back to see if my suggestions helped you. If you have found a solution, please share it with the community, as it can be helpful to others. Also, please don't fo...

2 More Replies
vivek_sinha
by Contributor
  • 17080 Views
  • 4 replies
  • 4 kudos

Resolved! PySpark on Jupyterhub K8s || Unable to query data || Class org.apache.hadoop.fs.s3a.S3AFileSystem not found

PySpark version: 2.4.5, Hive version: 1.2, Hadoop version: 2.7, AWS-SDK JAR: 1.7.4, Hadoop-AWS: 2.7.3. When I try to show data, I get "Class org.apache.hadoop.fs.s3a.S3AFileSystem not found", even though I am passing all the information which all are re...

Latest Reply
vivek_sinha
Contributor
  • 4 kudos

Hi @Arvind Ravish, thanks for the response. I have now fixed the issue. The image I was using to launch the Spark executor didn't have the AWS JARs. After making the necessary changes, it started working. Many thanks again for your response.
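As an alternative to baking the JARs into the executor image, Spark can also resolve them from Maven through the spark.jars.packages setting. The sketch below only assembles that setting, using the Hadoop-AWS and AWS SDK versions quoted in the original post. The helper name s3a_packages is hypothetical, and whether this approach is sufficient on a given Kubernetes setup is an assumption that would need checking; the poster's confirmed fix was adding the JARs to the image.

```python
def s3a_packages(hadoop_aws_version: str, aws_sdk_version: str) -> str:
    """Build a comma-separated Maven coordinate list for S3A support.

    Hypothetical helper; the versions passed in should match the Hadoop build in use.
    """
    return ",".join([
        f"org.apache.hadoop:hadoop-aws:{hadoop_aws_version}",
        f"com.amazonaws:aws-java-sdk:{aws_sdk_version}",
    ])


# Settings that could be passed to SparkSession.builder.config(...) or spark-submit --conf.
spark_conf = {
    "spark.jars.packages": s3a_packages("2.7.3", "1.7.4"),
    "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
}
```

Each key/value pair would be applied to the session configuration before the SparkSession is created, since package resolution happens at startup.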

3 More Replies
Vee
by New Contributor
  • 2717 Views
  • 3 replies
  • 0 kudos

Tips for resolving following errors related to AWS S3 read / write

Job aborted due to stage failure: Task 0 in stage 3084.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3084.0 (TID...., ip..., executor 0): org.apache.spark.SparkException: Task failed while writing rows. Job aborted due to stage failure:...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Vetrivel Senthil, are you still facing the problem? Were you able to resolve it by yourself, or do you still need help? Please let us know.

2 More Replies
Constantine
by Contributor III
  • 2515 Views
  • 2 replies
  • 5 kudos

Resolved! Unable to create a partitioned table on s3 data

I write data to S3 like: data.write.format("delta").mode("append").option("mergeSchema", "true").save(s3_location) and create a partitioned table like: CREATE TABLE IF NOT EXISTS demo_table USING DELTA PARTITIONED BY (column_a) LOCATION {s3_location}; whi...
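The post is cut off before the actual error, so the following is only a sketch of how the statement above might be assembled from Python. It assumes the {s3_location} placeholder is filled in by string formatting and that the location must appear as a quoted string literal in the SQL. The helper name create_table_sql and the example bucket path are hypothetical, and the sketch assumes the path contains no single quotes.

```python
def create_table_sql(table: str, partition_col: str, s3_location: str) -> str:
    """Build the CREATE TABLE statement with the location as a quoted literal.

    Hypothetical helper; assumes s3_location contains no single quotes.
    """
    return (
        f"CREATE TABLE IF NOT EXISTS {table} USING DELTA "
        f"PARTITIONED BY ({partition_col}) LOCATION '{s3_location}'"
    )


# Example with a made-up bucket path.
statement = create_table_sql("demo_table", "column_a", "s3://example-bucket/demo")
```

The resulting string could then be passed to spark.sql(statement) in a notebook.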

Latest Reply
Kaniz_Fatma
Community Manager
  • 5 kudos

Hi @John Constantine, did the suggestions provided above by @Hubert Dudek help in your case?

1 More Replies
bonjih
by New Contributor
  • 5599 Views
  • 3 replies
  • 3 kudos

Resolved! AttributeError: module 'dbutils' has no attribute 'fs'

Hi, I am using db in SageMaker to connect EC2 to S3. Following other examples, I get "AttributeError: module 'dbutils' has no attribute 'fs'". I guess I'm missing an import?

Latest Reply
Atanu
Esteemed Contributor
  • 3 kudos

Agree with @Werner Stinckens. You may also try importing dbutils, @ben Hamilton.

2 More Replies
hari
by Contributor
  • 1719 Views
  • 3 replies
  • 3 kudos

Resolved! Multi-cluster write for delta tables with s3 as the datastore

Does Delta currently support multi-cluster writes to a Delta table in S3? I see in the Databricks documentation that Databricks doesn't support writing to the same table from multiple Spark drivers and thus multiple clusters. But S3Guard was also added...

2 More Replies
Constantine
by Contributor III
  • 1575 Views
  • 3 replies
  • 2 kudos

Do we have delta table access logs ?

I have Delta tables on Databricks with AWS S3. Are there any logs or anything else I can use to figure out who is accessing a particular DB or table?

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

The closest thing is the audit logs. Here is a list of log triggers.

2 More Replies
MohitAnchlia
by New Contributor II
  • 841 Views
  • 1 replies
  • 1 kudos

Change AWS storage setting and account

I am seeing some super weird behaviour in Databricks. We initially configured the following:
1. Account X in Account Console -> AWS Account arn:aws:iam::X:role/databricks-s3
2. We set up databricks-s3 as the S3 bucket in Account Console -> AWS Storage
3. W...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @MohitAnchlia! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your question first. Otherwise, I will follow up shortly with a response.
