
How do worker nodes get packages during scale-up?

satycse06
New Contributor

Hi,

We are working with a repository from which we download an artifact/Python package using an index URL in a global init script. Now the logic is changing: we will need to supply credentials to download the package, and those credentials will only be valid for a few hours.

So my first question is: how will a new worker node get the same package/artifact during scale-up? Will it download it by executing the global init script, or will it get it from the driver node? In other words, will it hit the configured index URL or not?

My second question: if the new worker node executes the global init script and downloads the package from the configured index URL again, will it pick up the new credentials from the Azure library where they are refreshed, or will it run with the old credentials that the driver node obtained during the first execution?

Regards,

Satya

1 REPLY

Vidhi_Khaitan
Databricks Employee

Yes, the new worker node will execute the global init script independently when it starts.
It does not get the package from the driver or other existing nodes; it hits the configured index URL directly and downloads the package on its own. Therefore, every new node joining during autoscaling will invoke the index URL to download the artifact/package.
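
For illustration, here is a minimal sketch of the install step each node performs when the init script runs. The package name and index URL are hypothetical placeholders, not your actual repository details:

```python
# Minimal sketch of the install step each node runs on startup
# (package name and index URL are hypothetical placeholders).
import subprocess
import sys

INDEX_URL = "https://artifacts.example.com/simple"  # hypothetical index URL
PACKAGE = "my-internal-package==1.2.3"              # hypothetical package

# The driver and every new worker each execute this independently,
# so every node that joins during autoscaling hits the index URL itself.
subprocess.check_call(
    [sys.executable, "-m", "pip", "install", "--index-url", INDEX_URL, PACKAGE]
)
```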

Regarding the second question, it depends on how your global init script is written.
If you pull the credentials from Databricks secrets every time the script runs, the new node will get fresh credentials. But if the driver fetches the credentials once and writes them to DBFS or to environment variables shared across nodes, then new worker nodes may use stale credentials.
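
For the fresh-credentials pattern, one option (a sketch, not a drop-in script) is to request a short-lived token inside the script itself, so every node that runs it authenticates independently. This assumes an Azure AD client-credentials flow in front of your artifact feed; the tenant/client IDs, scope, feed URL, and package name below are all hypothetical:

```python
# Sketch: fetch a short-lived token on every script run, so each new worker
# gets fresh credentials instead of reusing ones cached by the driver.
# Tenant/client IDs, scope, feed URL, and package are hypothetical placeholders.
import subprocess
import sys

import requests

TENANT_ID = "<tenant-id>"
CLIENT_ID = "<client-id>"
CLIENT_SECRET = "<client-secret>"  # inject via a secret, never hard-code
SCOPE = "<resource-scope>/.default"
FEED = "artifacts.example.com/simple"  # hypothetical feed host/path
PACKAGE = "my-internal-package==1.2.3"  # hypothetical package

# Standard Azure AD client-credentials flow: returns a token that is only
# valid for a limited time, which is fine because we request a new one on
# every run of the script.
resp = requests.post(
    f"https://login.microsoftonline.com/{TENANT_ID}/oauth2/v2.0/token",
    data={
        "grant_type": "client_credentials",
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "scope": SCOPE,
    },
    timeout=30,
)
resp.raise_for_status()
token = resp.json()["access_token"]

# Embed the fresh token in the index URL for this node's install only;
# nothing is written to DBFS, so no other node can pick up a stale copy.
index_url = f"https://user:{token}@{FEED}"
subprocess.check_call(
    [sys.executable, "-m", "pip", "install", "--index-url", index_url, PACKAGE]
)
```

The key design point is that the token never leaves the node that requested it, so a worker added hours later by autoscaling can never pick up an expired credential cached during the driver's first run.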