
Terraform - Install egg file from S3

Murthy1
Contributor II

I am looking to install Python egg files on all my clusters. The egg file is located in an S3 location. I tried the following code, which didn't work:

resource "databricks_dbfs_file" "app" {
  source = "${S3_Path}/foo.egg"
  path   = "/FileStore/foo.egg"
}

resource "databricks_library" "app" {
  cluster_id = databricks_cluster.this.id
  egg        = databricks_dbfs_file.app.dbfs_path
}

Can someone help me with this?

Thanks in advance!

5 REPLIES

Kaniz
Community Manager

Hi @Murthy1, what error message are you getting?

 

Kaniz
Community Manager

Hi @Murthy1,

Installing Python eggs is deprecated and will be removed in a future release. Use Python wheels or install packages from PyPI instead. However, to install a Python egg file located in an S3 location on all clusters in Databricks, you can follow the steps below:
 
1. Upload the egg file to an S3 bucket.
2. Mount the S3 bucket to Databricks File System (DBFS) using the instructions in the documentation: https://docs.databricks.com/data/data-sources/aws/amazon-s3.html#mount-an-s3-bucket
3. Create a library in Databricks using the mounted S3 bucket as the library source. You can follow the instructions in the documentation: https://docs.databricks.com/libraries/workspace-libraries.html#create-a-workspace-library
4. Install the library on all clusters using the instructions in the documentation: https://docs.databricks.com/libraries/workspace-libraries.html#install-a-workspace-library-on-a-clus...
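
For anyone managing this through Terraform, steps 3 and 4 can be sketched roughly as follows. This is a minimal sketch, not a tested configuration. It assumes the Databricks Terraform provider is already configured, that the bucket is mounted at a hypothetical `/mnt/libs` mount point, and that the egg is named `foo.egg` as in the question. The `databricks_clusters` data source with `for_each` follows the pattern shown in the provider's `databricks_library` documentation.

```hcl
terraform {
  required_providers {
    databricks = {
      source = "databricks/databricks"
    }
  }
}

# Look up the IDs of all clusters in the workspace.
data "databricks_clusters" "all" {}

# Attach the egg to every cluster found above. The mount point /mnt/libs
# is a hypothetical name; replace it with wherever the bucket is mounted.
resource "databricks_library" "egg_all_clusters" {
  for_each   = data.databricks_clusters.all.ids
  cluster_id = each.key
  egg        = "dbfs:/mnt/libs/foo.egg"
}
```

Note that this only covers clusters that exist when Terraform runs; clusters created later are picked up on the next apply.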

Murthy1
Contributor II

Hello @Kaniz,

Thanks for the very quick response. I understand that egg support is deprecated, but for now I have to use the egg files from S3 🙂

Thanks for the links! It would be nice if you had some that show how to do this through Terraform.

The error I get is:

Error: File s3://bucket/folder/package.egg does not exist 

Kaniz
Community Manager

Hi @Murthy1,

The error "File s3://bucket/folder/package.egg does not exist" can have several possible causes:

- The cluster does not have read access to the S3 location of the Egg file.
- The file name of the Egg file does not follow the correct convention.
- The Egg file may have been deleted or moved from the S3 location.

To resolve the issue, ensure the cluster has read access to the S3 location, check if the file name follows the correct convention, and verify if the Egg file is still in the S3 location.
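
One more thing worth checking, going by the provider documentation: the `source` argument of `databricks_dbfs_file` expects a path on the machine running Terraform, so an `s3://` URI there would likely be reported as a missing file, which matches the error above. The Libraries API accepts S3 URIs for eggs directly, provided the cluster can read the object (typically through an instance profile). A minimal sketch under those assumptions, where `cluster_id` and `egg_s3_uri` are hypothetical variable names:

```hcl
variable "cluster_id" {
  type        = string
  description = "ID of an existing cluster whose instance profile can read the bucket"
}

variable "egg_s3_uri" {
  type        = string
  description = "Full S3 URI of the egg, for example s3://bucket/folder/package.egg"
}

# Reference the egg in S3 directly instead of copying it through
# databricks_dbfs_file, whose source must be a local file.
resource "databricks_library" "app" {
  cluster_id = var.cluster_id
  egg        = var.egg_s3_uri
}
```

If the install still fails with this layout, the cluster's read access to the bucket is the first thing to verify.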
 

Anonymous
Not applicable

Hi @Murthy1,

Does @Kaniz's response answer your question? If so, would you mark it as best so that other members can find the solution more quickly?

We'd love to hear from you.

Thanks!
