Community Platform Discussions

How can I increase the hard disk capacity of the master (driver) node?

himanmon
New Contributor III

I'm not sure if this is the right place to post my question. If not, please let me know where I should post my question.

 

I want to download large files from the web to the Databricks master (driver) node. For example, I fetch a file of over 150 GB via an API request.

I know this isn't an elegant approach, but I have to do it for a specific reason.

The problem is that the master node's disk capacity is insufficient. I tried adding an EBS volume in the cluster settings, but it doesn't seem to be the option I want.

I am trying to download a 150 GB file to the /tmp path on the master node's local file system. I would like to know how to increase the available disk space.

 

Accepted Solution

Slash
Contributor

Hi @himanmon,

If you are 100% sure that you can't download this file to a storage account configured with Unity Catalog and you want it directly on the driver node's local storage, why not increase the local disk space by choosing a larger instance type? Select an instance type with more local storage capacity for the driver node and you should be fine.
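Before starting a download of this size, it may help to confirm how much local disk the driver actually has. The following is a minimal sketch using only the Python standard library; the helper name `has_room` and the 10% headroom are assumptions for illustration, not part of any Databricks API.

```python
import shutil


def has_room(path: str, required_bytes: int, headroom: float = 0.10) -> bool:
    """Return True if the filesystem holding `path` has enough free space.

    `headroom` is an assumed safety margin (10% by default) on top of
    the expected file size.
    """
    usage = shutil.disk_usage(path)
    return usage.free >= required_bytes * (1 + headroom)


if __name__ == "__main__":
    target_dir = "/tmp"            # download location mentioned in the question
    required = 150 * 1024**3       # roughly 150 GiB
    usage = shutil.disk_usage(target_dir)
    gib = 1024**3
    print(f"{target_dir}: total={usage.total / gib:.1f} GiB, "
          f"free={usage.free / gib:.1f} GiB")
    print("Enough room for the download:", has_room(target_dir, required))
```

Running this in a notebook cell on the driver after changing the instance type shows whether the new node's local storage is large enough before the download begins.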

himanmon
New Contributor III

I didn't know that instances with more CPU cores and memory also have larger local storage.

I understand from your answer that more expensive instances have more storage, but that is a burden because it increases the cost.

However, I absolutely need to download the file to local storage for some reason, so it's pretty sad if there's no other way.

(The reason I mentioned earlier is that I need to append data to a .txt file line by line, in append mode, using an API.)

Thank you for your answer.
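The line-by-line append pattern described in this reply could look roughly like the sketch below. The API is not named in the thread, so `append_lines` and the list of sample lines are hypothetical stand-ins for whatever actually produces the data.

```python
import os
import tempfile
from typing import Iterable


def append_lines(path: str, lines: Iterable[str]) -> int:
    """Append each item of `lines` to the text file at `path`.

    The file is opened in append mode, so existing content is kept.
    Returns the number of lines written.
    """
    count = 0
    with open(path, "a", encoding="utf-8") as f:
        for line in lines:
            # Normalise so every record ends with exactly one newline.
            f.write(line.rstrip("\n") + "\n")
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical demo: two batches appended to the same local file.
    demo_path = os.path.join(tempfile.gettempdir(), "append_demo.txt")
    append_lines(demo_path, ["first batch, line 1", "first batch, line 2"])
    append_lines(demo_path, ["second batch, line 1"])
    print("Wrote to", demo_path)
```

Because the writer takes any iterable, the lines can come from a generator that reads the API response incrementally, so the data never has to be held in memory all at once.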
