
Real-time output missing when using “Upload and Run File” from VS Code

tinodj
New Contributor

I am running Python files on a Databricks cluster using the VS Code Databricks extension, specifically the “Upload and Run File” command.

I cannot get real-time output in the Debug Console. I have checked the official docs:

https://learn.microsoft.com/en-us/azure/databricks/dev-tools/vscode-ext/tutorial

https://github.com/databricks/databricks-vscode/blob/release-v2.10.3/packages/databricks-vscode/DATA...

but these do not address the issue.

The behavior I see is:

  1. While the script is running, no output is shown in the Debug Console.

  2. When the script finishes, all output appears at once.

  3. If the script fails with an exception, only the error is shown and none of the printed output appears.

This makes it very difficult to test and debug. If this is expected behavior, I would like to know what the recommended or best-practice workflow is for running and testing a standalone Python file on a Databricks cluster with live output.

Below is a minimal example based on Databricks’ own sample script. I added a loop with prints and sleeps to demonstrate the missing streaming output:

 

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
import time

spark = SparkSession.builder.getOrCreate()

# Small sample DataFrame taken from Databricks' sample script.
schema = StructType([
    StructField('CustomerID', IntegerType(), False),
    StructField('FirstName', StringType(), False),
    StructField('LastName', StringType(), False)
])

data = [
    [1000, 'Matthijs', 'Oosterhout-Buntjes'],
    [1001, 'Joost', 'van Brunswijk'],
    [1002, 'Stan', 'Bokenkamp']
]

customers = spark.createDataFrame(data, schema)
customers.show()

# These prints should appear one per second, but in the Debug Console
# they only show up after the script has finished.
for i in range(100):
    print(f'{i}-Output')
    time.sleep(1)

# When the script ends with an exception, only the error is shown in the
# Debug Console and the buffered prints never appear.
time.sleep(10)
raise Exception("Demo failure")

 

 

4 REPLIES

stbjelcevic
Databricks Employee

Hi @tinodj ,

Are you using the Databricks Connect integration in conjunction with the VS Code extension? I think you should be able to get better debugging results with this enabled.
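
For reference, here is a rough sketch of what that looks like with Databricks Connect for Databricks Runtime 13+ (assuming the databricks-connect package is installed and a cluster is configured in your Databricks config profile; this is illustrative, not the extension's exact behavior):

from databricks.connect import DatabricksSession
import time

# Spark API calls are sent to the remote cluster.
spark = DatabricksSession.builder.getOrCreate()
spark.range(3).show()

# Plain Python (loops, print) runs locally, so its output streams to the
# local terminal/Debug Console immediately.
for i in range(3):
    print(f'{i}-Output')
    time.sleep(1)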

tinodj
New Contributor

Hi @stbjelcevic,

Thanks! With Databricks Connect, part of the code runs locally, and that is not always the best way to debug, or am I wrong? For example, when reading files mounted on the cluster, or calling remote services that are reachable only from the cluster. I would like to test the code running entirely on the cluster.

 

stbjelcevic
Databricks Employee

Ahh yes, you are right about that. There isn’t a “run all non-Spark code remotely” mode in Databricks Connect today, but it appears to be a commonly requested enhancement and is currently tracked internally. 

Have you checked the Databricks UI for the job run to see if the printed outputs show up there? I understand you want local development, but I am trying to figure out if your expected print statements/logs are viewable anywhere to begin with.

tinodj
New Contributor

Yes, prints and log messages are viewable in the driver logs as they happen. If the same file is run in the Databricks web UI, they also appear in the output window as they happen. But when the file is run through VS Code, they are unfortunately not visible in the Debug Console (although they do show up in the driver logs), as the example above demonstrates.
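
For example, this is the kind of progress logging I mean; each line lands in the driver's stderr log with a timestamp and can be followed on the cluster's driver logs page while the run is still in progress (a rough sketch; the logger name is just illustrative):

import logging
import sys

# Log to stderr so messages show up in the driver's stderr log as they happen.
logging.basicConfig(
    stream=sys.stderr,
    level=logging.INFO,
    format='%(asctime)s %(levelname)s %(message)s',
)
log = logging.getLogger('upload_and_run_demo')

for i in range(100):
    log.info('%d-Output', i)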