How to mount AWS EFS via NFS on a Databricks Cluster

stvayers
New Contributor

I'm trying to read ~500 million small JSON files into a Spark Auto Loader pipeline, and I seem to be slowed down massively by S3 request limits, so I want to explore using AWS EFS instead.

I found this blog post: https://www.databricks.com/blog/2019/05/17/nfs-mounting-in-databricks-product.html

I followed the instructions, but it doesn't seem to work. The post also mentions turning on an NFS configuration flag, but I can't find it anywhere in Databricks. Can someone please advise me on whether or not this is still possible?

1 REPLY

Kaniz
Community Manager

Hi @stvayers,

If you’re dealing with ~500 million small JSON files and facing S3 request limits, consider the following steps:

  • EFS Setup:

    1. Create an EFS file system in the same VPC as your Databricks cluster.
    2. Mount the EFS file system on the cluster's EC2 instances (for example, via a cluster init script).
    3. Use the mounted path as the source for your Spark Auto Loader pipeline.
  • Ensure that your EC2 instances have the necessary permissions to mount the EFS file system, and that their security groups allow NFS traffic (TCP port 2049) to the EFS mount targets.
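For step 2, one common pattern is a cluster-scoped init script that installs the NFS client and mounts EFS on every node at startup. Below is a minimal sketch that generates such a script; the EFS DNS name and mount point are placeholders you would replace with your own values, and on Databricks you would save the script (e.g. with dbutils.fs.put) and attach it in the cluster's init-script settings:

```python
# Sketch of an init script that mounts EFS via NFS on each cluster node.
# EFS_DNS and MOUNT_POINT are placeholders -- substitute your file system's
# regional DNS name and the path you want the mount to appear at.

EFS_DNS = "fs-12345678.efs.us-east-1.amazonaws.com"  # placeholder EFS endpoint
MOUNT_POINT = "/efs"

init_script = f"""#!/bin/bash
# Install the NFS client, create the mount point, and mount EFS with
# the NFSv4.1 options AWS recommends for EFS.
apt-get -y install nfs-common
mkdir -p {MOUNT_POINT}
mount -t nfs4 \\
  -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \\
  {EFS_DNS}:/ {MOUNT_POINT}
"""

# On Databricks you would persist this and reference it in the cluster config:
#   dbutils.fs.put("dbfs:/databricks/init/mount-efs.sh", init_script, True)
print(init_script)
```

Once the mount is in place on every node, your pipeline can read the JSON files from the local path (e.g. `file:/efs/...`) instead of issuing per-object S3 requests.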