I am running Python files on a Databricks cluster using the VS Code Databricks extension, specifically the “Upload and Run File” command.
I cannot get real-time output in the Debug Console. I have checked the official docs:
https://learn.microsoft.com/en-us/azure/databricks/dev-tools/vscode-ext/tutorial
https://github.com/databricks/databricks-vscode/blob/release-v2.10.3/packages/databricks-vscode/DATA...
but these do not address the issue.
The behavior I see is:
- While the script is running, no output is shown in the Debug Console.
- When the script finishes, all output appears at once.
- If the script fails with an exception, only the error is shown and none of the printed output appears.
This makes testing and debugging very difficult. If this is expected behavior, what is the recommended workflow for running and testing a standalone Python file on a Databricks cluster with live output?
Below is a minimal example based on Databricks’ own sample script. I added a loop with prints and sleeps to demonstrate the missing streaming output:
import time

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

spark = SparkSession.builder.getOrCreate()

schema = StructType([
    StructField('CustomerID', IntegerType(), False),
    StructField('FirstName', StringType(), False),
    StructField('LastName', StringType(), False)
])

data = [
    [1000, 'Matthijs', 'Oosterhout-Buntjes'],
    [1001, 'Joost', 'van Brunswijk'],
    [1002, 'Stan', 'Bokenkamp']
]

customers = spark.createDataFrame(data, schema)
customers.show()

# Added to demonstrate the missing streaming output:
# none of these prints appear until the script has finished.
for i in range(100):
    print(f'{i}-Output')
    time.sleep(1)

time.sleep(10)
raise Exception("Demo failure")
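For reference, here is the same loop with explicit flushing, just as a sketch in case plain Python-side stdout buffering were the cause (the flush calls are standard Python, nothing Databricks-specific):

import sys
import time

for i in range(100):
    # Force the output past any Python-side buffer after each print
    print(f'{i}-Output', flush=True)
    sys.stdout.flush()
    time.sleep(1)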