a week ago
what do you mean by offline?
Actually down, or disconnected from the public internet?
Databricks can only reach systems it has network access to, and that can be set up using private endpoints/VPN.
If none of these are possible, Databricks cannot reach it; there is no local Databricks agent/gateway.
But you might have an ETL tool available which has access to the system and can write to cloud storage?
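For example, if such a tool lands the data as Parquet or CSV in a bucket/container the workspace can reach, a notebook can pick it up with a plain Spark read. A minimal sketch, where the path and format are placeholders:

```python
# Minimal sketch: read files that an external ETL tool has landed in cloud
# storage the workspace can reach. The path and format are placeholders.
# `spark` is pre-defined in Databricks notebooks.
landing_path = "s3://my-landing-bucket/exports/orders/"  # hypothetical location

df = (
    spark.read
    .format("parquet")  # or "csv", "json", ... depending on what the tool writes
    .load(landing_path)
)

display(df)  # quick sanity check in the notebook
```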
a week ago
1. The network is connected and accessible.
2. I am currently using the free version for debugging and have found that I cannot connect to the offline HDFS. I have not been able to locate the place to change the configuration. Is this because the free version does not support this feature?
a week ago
1. Is it that the free version of the SQL editor cannot connect to HDFS?
2. Is it also that notebooks cannot be used in the free version? I created a free workspace, and it just keeps spinning.
3. Does the current free version of Databricks not support linking to an offline cluster's HDFS (assuming the network is all set up)?
4. I saw in the free version that it says custom workspace storage locations are not supported. Does this mean I cannot choose other storage spaces and can only use ADLS?
a week ago
Hello @yinan
Good day!!
Thank you for your response.
Here are the answers to your questions:
1. Is it that the free version of the SQL editor cannot connect to HDFS?
The Databricks Free Edition (which replaced the Community Edition) does support using the SQL editor for querying and analyzing data, but it has limitations. The SQL editor relies on a single, small-sized SQL warehouse (limited to a 2X-Small cluster size). While it can access data registered in Unity Catalog or default storage, it cannot directly connect to external on-premises HDFS due to the absence of private networking configurations in the Free Edition.
2. Is it also that notebooks cannot be used in the free version? I created a free workspace, and it just keeps spinning.
No, the Free Edition does support notebooks; they can be created and run on the limited all-purpose serverless compute (restricted to small cluster sizes). I have been seeing the same behavior since this morning, though: the compute just keeps spinning, so it may take a few hours before it works normally.
3. Does the current free version of Databricks not support linking to an offline cluster's HDFS (assuming the network is all set up)?
Correct: the current Free Edition does not support direct linking to an on-premises (offline) cluster's HDFS, even if the network is theoretically set up on your end. This is because the Free Edition operates in a managed, shared workspace without private networking options (e.g., no VNet peering, ExpressRoute, or VPC configurations). On-prem HDFS access would require secure private connectivity (such as a VPN or ExpressRoute) and custom compute setups, which aren't available in the serverless-only Free Edition. But you can try to upload/import the data manually in the Free Edition (see the sketch after this list).
4. I saw in the free version that it says custom workspace storage locations are not supported. Does this mean I cannot choose other storage spaces and can only use ADLS?
Yes, the message about custom workspace storage locations not being supported means you cannot configure or choose alternative storage for the workspace root (e.g., a custom ADLS container or other cloud storage).
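To illustrate the manual route from point 3 (and the Unity Catalog access from point 1), here is a minimal sketch assuming a CSV has been uploaded through the workspace UI into a Unity Catalog volume; the catalog, schema, volume, and table names are placeholders:

```python
# Minimal sketch, assuming a CSV was uploaded via the Free Edition UI into a
# Unity Catalog volume. Catalog/schema/volume/table names are placeholders.
uploaded_path = "/Volumes/workspace/default/landing/sample.csv"  # hypothetical

df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv(uploaded_path)
)

# Save it as a managed table so the SQL editor can query it as well.
df.write.mode("overwrite").saveAsTable("workspace.default.sample_data")

# The same query you could run from the SQL editor:
spark.sql("SELECT COUNT(*) AS row_count FROM workspace.default.sample_data").show()
```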
BUT if you are a student or have a debit/credit card, you can get free access through an Azure subscription (one month on the free trial, one year for students). Register on the Azure portal: Azure offers a 14-day free Databricks trial plus roughly $200 of credit to use within the first month, which means you can mount Azure storage on Databricks and practice there.
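If you go the Azure route, here is a rough sketch of reading ADLS Gen2 directly over abfss; the storage account, container, and secret scope names are placeholders (a dbutils.fs.mount would also work, but direct access is simpler to show):

```python
# Minimal sketch, assuming an Azure Databricks workspace with an ADLS Gen2
# account. Storage account, container, and secret scope/key are placeholders;
# keep the account key in a secret scope rather than in plain text.
storage_account = "mystorageacct"  # hypothetical
container = "raw"                  # hypothetical
account_key = dbutils.secrets.get(scope="my-scope", key="adls-key")  # hypothetical scope/key

spark.conf.set(
    f"fs.azure.account.key.{storage_account}.dfs.core.windows.net",
    account_key,
)

path = f"abfss://{container}@{storage_account}.dfs.core.windows.net/events/"
df = spark.read.format("json").load(path)
display(df)
```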
If you find this answer useful, please mark it as the accepted solution for the question.
Thank you.