Hi. I have a PySpark notebook that takes 25 minutes to run, as opposed to one minute on an on-prem Linux box with Pandas. How can I speed it up? It's not a volume issue. The input is around 30k rows. Output is the same because there's no filtering or aggregation...
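For data this small, Spark's overhead (job scheduling, per-task startup, JVM↔Python serialization in UDFs) usually dominates the actual work. A minimal sketch, assuming the workload is a row-wise transformation (the column names and lookup rates below are made up): 30k rows processed in plain Python on the driver, e.g. after `df.toPandas()` or `collect()`, typically finishes in well under a second.

```python
import time

# Simulated 30k-row input; "make"/"price" and the rate table are
# hypothetical stand-ins for whatever the real notebook computes.
rows = [{"make": "honda", "price": i} for i in range(30_000)]
rate = {"honda": 1.0, "toyota": 2.9}

start = time.perf_counter()
# Row-wise transform with no Spark involvement at all.
out = [{**r, "adjusted": r["price"] * rate.get(r["make"], 1.0)} for r in rows]
elapsed = time.perf_counter() - start

print(len(out), f"rows in {elapsed:.3f}s")
```

If the per-row logic must stay in Spark (e.g. for downstream jobs), replacing Python UDFs with built-in column expressions, and using a single-node cluster for tiny inputs, are the usual first steps.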
I'm migrating code from Python on Linux to Databricks PySpark. I have many mappings like this: {"main": {"honda": 1.0, "toyota": 2.9, "BMW": 5.77, "Fiat": 4.5}} I exported using json.dump, saved to S3, and was able to import with sp...
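A minimal round-trip sketch with the stdlib json module, using an in-memory buffer as a stand-in for the S3 object written from on-prem:

```python
import io
import json

# The nested mapping from the post.
mapping = {"main": {"honda": 1.0, "toyota": 2.9, "BMW": 5.77, "Fiat": 4.5}}

# Export (this is what json.dump wrote to the file saved to S3).
buf = io.StringIO()
json.dump(buf.write and mapping, buf) if False else json.dump(mapping, buf)

# Import: read it back as a plain Python dict.
buf.seek(0)
restored = json.load(buf)
print(restored["main"]["toyota"])  # 2.9
```

In a Databricks notebook, reading the object back with plain Python (e.g. `open()` on a `/dbfs/...` path, or boto3 for S3) and `json.load` keeps the mapping as an ordinary dict, which is often simpler for lookup tables than loading it through `spark.read.json`, since the latter turns it into a DataFrame row rather than a dict.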
I don't think it's possible, but I thought I would check. I need to combine notebooks. While developing I might have code in various notebooks, which I read in with "%run". Then, when all looks good, I combine many cells into fewer notebooks. Is there any...
Hi, I just went to run a Databricks PySpark notebook and saw this message: This is a notebook I've run before but never saw this. Is it referring to my cluster? The Databricks infrastructure? My notebook ran normally, just wondering though. Google sea...
I want to install my own Python wheel package on a cluster but can't get it working. I tried two ways. First, I followed these steps: https://docs.databricks.com/en/workflows/jobs/how-to/use-python-wheels-in-workflows.html#:~:text=March%2025%2C%202024,code%...
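Whichever install route is used (cluster library UI, notebook-scoped `%pip install`, or a job dependency), it helps to confirm whether the wheel actually landed in the notebook's Python environment. A small stdlib check; `my_package` is a hypothetical stand-in for your wheel's top-level import name:

```python
import importlib.util

def is_installed(name: str) -> bool:
    """True if `name` is importable in the current Python environment."""
    return importlib.util.find_spec(name) is not None

# Stdlib modules are always importable, so this sanity-checks the helper:
print(is_installed("json"))  # True
# Replace 'my_package' with your wheel's actual top-level package name:
print(is_installed("my_package"))
```

If this returns False after installing, common causes are the wheel being attached to a different cluster than the notebook is using, or the wheel's package name differing from its distribution (file) name.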
Actually, no, I can't even find it. I see it in the browser Workspace, but when I do "%ls" it shows: azure/ eventlogs/ logs/ conf/ hadoop_accessed_config.lst* preload_class.lst*
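That listing (azure/, eventlogs/, logs/, ...) looks like the driver node's local working directory, not the Workspace tree shown in the browser, so workspace items won't appear there. A minimal sketch of inspecting the driver's local filesystem with the stdlib; on Databricks you would use `dbutils.fs.ls(...)` or `%fs ls` for DBFS paths instead, and on recent runtimes workspace files may be reachable under a /Workspace path:

```python
import os

# %ls in a notebook runs against the driver's local disk.
# Plain Python shows the same view:
cwd = os.getcwd()
entries = sorted(os.listdir("."))
print(cwd)
print(entries)
```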
It seems related to the notebook length (number of cells). The notebook that was really slow had about 40-50 cells, which I've done before without issue. Anyway, after starting a new notebook using Chrome, it seems usable again. So without a specific...