10-25-2021 02:04 AM
I feel like the answer to this question should be simple, but nonetheless I'm struggling.
I run a Python script that prompts me with the following warning:
On my local machine, I can accept this through my terminal and my machine does not run out of memory. I'm expecting the same to be the case when running it from Databricks. So how do I accept this at runtime within the current command?
I'm missing some kind of terminal to type 'y' into, or I'm missing something else entirely 🙂
Can someone help me out?
10-25-2021 03:05 AM
Typically, Spark (Databricks) is not made for downloading files locally to your laptop.
It is a distributed computing system optimized for parallel writes to some kind of storage (DBFS for Databricks).
I do not know your exact use case, but if you want to download data, it might be a good idea to let Databricks write to a data lake/blob storage etc. (something mounted in DBFS).
From there on, you can download it to your computer if necessary.
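For example, something like this (a rough sketch; df and the /mnt/datalake mount point are placeholders for your own DataFrame and storage mount):

    # Write a Spark DataFrame as CSV to storage mounted in DBFS.
    # 'df' and '/mnt/datalake' are placeholders; adjust to your setup.
    (df.write
       .mode("overwrite")
       .option("header", "true")
       .csv("/mnt/datalake/exports/my_data"))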
There is the possibility to download data from within the notebooks (the display command), but I think there is a hard limit on the amount of data that can be transferred that way.
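For example:

    # Renders the DataFrame in the notebook; the results table offers a
    # download button, but only for a limited number of rows.
    display(df)  # 'df' is a placeholder DataFrame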
10-25-2021 03:11 AM
Thank you for your reply, Werners. Actually, the thing about my local computer was only to see if the code would run on a local instance using VS Code. So it was only for testing purposes.
What I'm trying to do in Databricks is convert an NC file to CSV and store it in my data lake. In this process I receive the warning/exception as shown above. So my 'only' problem is that I don't know how to reply to the warning/exception from within a Databricks notebook at runtime...
I guess an alternative would be to suppress the warning/exception in the first place?
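If the prompt really is a Python warning (rather than an input() call waiting for a 'y', which suppression won't help with), a rough sketch of silencing it while doing the NC-to-CSV conversion, assuming the NC file is NetCDF and that xarray can read it (neither of which the thread confirms):

    import warnings
    import xarray as xr  # assumed NetCDF reader; not mentioned in the thread

    # Silence warnings only inside this block; in real code, prefer a
    # narrower filter (e.g. category= or message=) so other warnings survive.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        ds = xr.open_dataset("/dbfs/mnt/datalake/input/file.nc")  # placeholder path
        ds.to_dataframe().to_csv("/dbfs/mnt/datalake/output/file.csv")  # placeholder path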
10-25-2021 05:26 AM
Ok, the NC file is of a file type not known to Spark.
So that is why you read it in Python, to convert it, I suppose.
The thing is, if you use plain Python, Databricks behaves the same way as your local computer, with the same limits.
The power of Spark is the parallel processing. But Spark does not know this file format, so here we are.
I do not know this NC file format, but what you need is a reader that works on Spark (PySpark if you use Python).
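Lacking a native Spark reader, a common workaround (sketched below under the assumption that xarray can open the file and that it fits in driver memory; none of these names come from the thread) is to read it on the driver with plain Python and hand it to Spark afterwards:

    import xarray as xr  # assumed reader for NetCDF-style NC files

    # Read on the driver, flatten to a pandas DataFrame, then convert to
    # a Spark DataFrame so the write-out goes through Spark.
    pdf = xr.open_dataset("/dbfs/mnt/datalake/input/file.nc").to_dataframe().reset_index()
    sdf = spark.createDataFrame(pdf)  # 'spark' is predefined in Databricks notebooks
    sdf.write.mode("overwrite").option("header", "true").csv("/mnt/datalake/output/file_csv")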
10-26-2021 01:41 PM
Hi @Nickels Köhling ,
In Databricks, you will only be able to see the output in the driver logs. If you go to your driver logs, you will see three windows displaying the output of "stdout", "stderr" and "log4j".
Any print() statements in your code will show up in the "stdout" window of your driver logs.
If you are using a Spark logger, the output will be displayed in the "log4j" window.
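To illustrate where each kind of output lands (a minimal sketch; the logger name is arbitrary, and the JVM-bridge pattern is one common way to get a log4j logger from PySpark):

    # Goes to the "stdout" window of the driver logs.
    print("hello from stdout")

    # Goes to the "log4j" window of the driver logs.
    log4j = spark.sparkContext._jvm.org.apache.log4j
    logger = log4j.LogManager.getLogger("my_notebook")  # arbitrary logger name
    logger.info("hello from log4j")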
If you would like to provide a value dynamically, I recommend using the input widgets from your notebook.
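For instance, a widget could stand in for the interactive 'y' answer (a sketch; the widget name "confirm" is made up):

    # Creates a text widget at the top of the notebook (a no-op if it
    # already exists); arguments are name, default value, and label.
    dbutils.widgets.text("confirm", "n", "Proceed? (y/n)")

    # Read the widget's current value inside the notebook code.
    if dbutils.widgets.get("confirm") == "y":
        print("proceeding...")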