
How do I configure my interactive compute in databricks to access files from an EFS filesystem?

mtaraviya-QA
New Contributor II

I have an S3 account in which I have full administrator privileges. In that account I have a Databricks workspace and an EFS filesystem set up. I created an interactive compute inside the Databricks workspace with the default config. How do I configure my interactive compute in Databricks to access files from an EFS filesystem?

2 REPLIES

mtaraviya-QA
New Contributor II

*When I said S3 account, I meant AWS account. This question is strictly about EFS. I am trying to use EFS because my attempts to use S3 did not work, and EFS suits my usage requirements more closely.

Louis_Frolio
Databricks Employee

Greetings @mtaraviya-QA, here's how to configure your interactive Databricks compute to access files in AWS EFS.

Prerequisites on AWS networking

  • Ensure the Databricks cluster VPC/subnets can reach EFS mount targets. Place EFS mount targets in subnets reachable from your cluster, and open NFS (TCP 2049) in the relevant security groups.
  • If EFS is in a different VPC or AWS account, set up VPC peering or a transit gateway and routing between VPCs.
  • Enable VPC DNS resolution and hostnames, and test reachability from the cluster network (for example, nslookup fs-xxxx.efs.<region>.amazonaws.com and nc -vz <efs-hostname> 2049; see the notebook-based connectivity sketch below). If cross-VPC DNS is not available, plan to mount using the mount target IP address.
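
If you prefer to run the reachability check from a notebook rather than a shell, a minimal Python sketch along these lines should work; the EFS hostname below is a placeholder, so substitute your own mount target DNS name or IP.

```python
# Minimal DNS + TCP 2049 reachability check, runnable in a notebook cell.
import socket

efs_host = "fs-abcdef0123456789.efs.us-east-1.amazonaws.com"  # assumption: replace with your mount target
port = 2049  # NFS

try:
    ip = socket.gethostbyname(efs_host)          # DNS resolution (may fail in cross-VPC setups)
    print(f"Resolved {efs_host} -> {ip}")
except socket.gaierror as e:
    print(f"DNS resolution failed ({e}); fall back to the mount target IP address")
    ip = None

if ip:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(5)
        result = s.connect_ex((ip, port))        # 0 means the TCP handshake succeeded
        print("TCP 2049 reachable" if result == 0 else f"TCP 2049 blocked (errno {result})")
```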

Configure the Databricks cluster (API-based mount)

Mount EFS via the Clusters API using the experimental cluster_mount_infos field. Do not use init scripts for EFS on typical shared/E2 workspaces.
  • Create or edit your cluster to include cluster_mount_infos; for example (a scripted REST call with the same payload appears after this list):

```json
{
  "cluster_name": "efs-cluster",
  "spark_version": "15.4.x-scala2.12",
  "node_type_id": "i3.xlarge",
  "num_workers": 2,
  "cluster_mount_infos": [
    {
      "network_filesystem_info": {
        "server_address": "fs-abcdef0123456789.efs.us-east-1.amazonaws.com",
        "mount_options": "nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport"
      },
      "local_mount_dir_path": "/mnt/volumes/efs-mount",
      "remote_mount_dir_path": "/"
    }
  ]
}
```
  • If DNS doesn't resolve (common in cross-VPC setups), use the mount target IP for server_address. Optionally pin the cluster's AZ with aws_attributes.zone_id to match the mount target's AZ.
  • Access the mount at the path you specified in local_mount_dir_path (for example, /mnt/volumes/efs-mount). In some environments mounts are presented under /db-mnt/...; if you don't see your path at the root, check under /db-mnt.
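
If you would rather script the cluster creation than paste JSON, a minimal sketch against the Clusters REST API could look like the following. The workspace URL, token handling, and EFS hostname are placeholders for your own values; the payload mirrors the JSON shown above.

```python
# Hedged sketch: create a cluster with an EFS mount via the Clusters REST API.
import os
import requests

host = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.cloud.databricks.com (assumption)
token = os.environ["DATABRICKS_TOKEN"]  # a personal access token (assumption)

payload = {
    "cluster_name": "efs-cluster",
    "spark_version": "15.4.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 2,
    "cluster_mount_infos": [
        {
            "network_filesystem_info": {
                "server_address": "fs-abcdef0123456789.efs.us-east-1.amazonaws.com",
                "mount_options": "nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport",
            },
            "local_mount_dir_path": "/mnt/volumes/efs-mount",
            "remote_mount_dir_path": "/",
        }
    ],
}

resp = requests.post(
    f"{host}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json())  # returns the new cluster_id on success
```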

Terraform example

```hcl
resource "databricks_cluster" "with_efs" {
  # ...
  cluster_mount_info {
    network_filesystem_info {
      server_address = "fs-abcdef0123456789.efs.us-east-1.amazonaws.com"
      mount_options  = "nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport"
    }
    remote_mount_dir_path = "/"
    local_mount_dir_path  = "/mnt/volumes/efs-mount"
  }
}
```

Verify from a notebook

  • Check the mount and list files:

```bash
%sh
mount | grep efs
ls -la /mnt/volumes/efs-mount
```

  • Access files via POSIX paths:

```python
with open("/mnt/volumes/efs-mount/somefile.txt") as f:
    print(f.readline())
```
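
To confirm write access as well (useful before pointing RStudio or other tools at the mount), a small probe like this should suffice; the path assumes the local_mount_dir_path used above and the file name is arbitrary.

```python
# Hedged sketch: confirm the mount is writable from the driver.
from pathlib import Path

mount = Path("/mnt/volumes/efs-mount")          # assumption: matches local_mount_dir_path
probe = mount / "_efs_write_test.txt"

probe.write_text("hello from Databricks\n")     # raises PermissionError if the mount is read-only for this user
print(probe.read_text())
probe.unlink()                                  # clean up the probe file
```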

Important notes and limitations

  • Init scripts are not the supported method for mounting EFS on shared/E2 workspaces; use the Clusters API (cluster_mount_infos) or Terraform.
  • The amazon-efs-utils IAM/TLS mount helper is not supported in this integration. Use NFSv4.1 with standard mount options (as shown above).
  • After cluster edits or restarts, ensure the mount configuration remains in the cluster definition; avoid editing mounts in the UI, as custom properties can be lost. If cluster autoscaling adds nodes, Databricks applies the configured mount during node setup.
  • This applies to classic clusters. For serverless compute, use S3 or Unity Catalog Volumes rather than EFS.

Troubleshooting checklist

  • From a same-subnet test EC2 instance (with the same security group rules), try:

```bash
sudo apt-get update && sudo apt-get install -y nfs-common
sudo mkdir /efs
sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport <efs-host-or-ip>:/ /efs
mount | grep efs
```

    This validates network, security group, and DNS routing independently of Databricks.
  • If DNS fails, use the mount target IP and pin the cluster AZ. Confirm security groups allow TCP 2049 and that routes via the VPC peering connection or transit gateway are correct.
  • If you're using RStudio on Databricks, EFS via cluster_mount_infos is a good way to persist user data; ensure the cluster can write to the mount (for example, chmod a+w /mnt/volumes/efs-mount).
 
Hope this helps, Louis.