07-11-2024 02:00 PM
07-11-2024 07:34 PM
Hi @joseroca99,
Try adding the filesystem scheme to your path, for example: dbfs:/databricks-datasets/wikipedia-datasets/data-001/pageviews/raw/pageviews_by_second
07-11-2024 09:21 PM
Depending on where you found the file with %fs, use the appropriate filesystem prefix.
If it's in DBFS, use dbfs:/YOUR_PATH
If it's on the local file system, try file:/
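To make the prefix advice concrete, here is a minimal sketch of a helper that adds an explicit scheme to a path before handing it to a Spark reader. The function name `to_spark_path` is hypothetical (it is not part of any Databricks API), and the handling of a leading `/dbfs/` assumes the usual convention that `/dbfs/...` is the local mount of DBFS while Spark expects `dbfs:/...`.

```python
def to_spark_path(path: str, scheme: str = "dbfs") -> str:
    """Return `path` with an explicit filesystem scheme (hypothetical helper)."""
    # Leave the path alone if it already carries a scheme such as "dbfs:" or "file:".
    if ":" in path.split("/", 1)[0]:
        return path
    # Assumption: "/dbfs/..." is the local mount form of a DBFS path,
    # so drop that mount prefix when building a "dbfs:/" URI.
    if scheme == "dbfs" and path.startswith("/dbfs/"):
        path = path[len("/dbfs"):]
    return f"{scheme}:/{path.lstrip('/')}"


# In a notebook, the result would be passed to whichever reader you already use, e.g.:
# df = spark.read.text(to_spark_path("/databricks-datasets/..."))
```

This only normalizes the string. It does not check that the file exists or that the cluster can reach it.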
07-12-2024 07:21 AM
I tried putting dbfs: and /dbfs before the path, but it's still not working.
07-12-2024 07:29 AM
Update 1: Apparently the problem only shows up when using display(); using show() or display(df.limit()) works fine. I also started out on the premium pricing tier, so I'm going to see what happens if I use the 14-day free trial pricing tier instead.
Update 2: I tried the dbfs: and /dbfs prefixes, and it's still not working. I also tried a table I got from the marketplace with spark.read.table(), and the problem persists.
07-12-2024 02:56 PM - edited 07-12-2024 03:01 PM
I think there is a networking or permissions problem with the storage account that Databricks created in the managed resource group. By default, when you run a notebook interactively by clicking Run in the notebook, where the results are stored depends on their size.
So in your case, limiting the result set works because small results are stored in the Azure Databricks control plane.
But when you try to display the whole DataFrame without limiting it, Databricks tries to save the result in the workspace storage account. Look at the cluster logs and see whether there are any errors related to the root storage account.
Maybe you have a firewall that prevents Databricks from connecting to the storage account.
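One rough way to probe the firewall idea from a notebook cell is to check whether the cluster can open a TCP connection to the storage endpoint at all. The sketch below uses only the Python standard library. The helper name `can_connect` is hypothetical, and the host in the usage comment is a placeholder: you would need to substitute the name of the storage account in your managed resource group.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened (hypothetical helper)."""
    try:
        # create_connection resolves the host name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Placeholder host; replace <storage-account> with your workspace storage account name:
# print(can_connect("<storage-account>.blob.core.windows.net"))
```

A False result points to a network-level block (DNS, routing, or a network security rule). A True result does not rule out the storage account's own firewall or a permissions issue, since those are typically reported as an authorization error on the request itself, so the cluster logs are still worth checking.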