Hi. I'm testing out the "Run parameters" you see in Jobs & Pipelines. As far as I know, this value is set manually via "Job parameters" in the right sidebar. Can I set the value within code, though? For example, if I want something dynamically generated depen...
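Not sure if this is what you're after, but one way to pass dynamically generated values is per-run through the Jobs REST API (`POST /api/2.1/jobs/run-now`) rather than the sidebar. A minimal sketch that just builds the request body; the job ID and the `run_date` parameter name are made up:

```python
import json
from datetime import date

def build_run_now_payload(job_id: int, **params) -> str:
    """Build the JSON body for a run-now call with dynamic job parameters."""
    # "job_parameters" overrides the job-level parameters for this run only.
    return json.dumps({"job_id": job_id, "job_parameters": params})

# A dynamically generated value, e.g. a date computed in code:
payload = build_run_now_payload(1234, run_date=date(2024, 1, 15).isoformat())
print(payload)
```

You would POST this with `requests` or the Databricks SDK; inside the notebook, the value comes back via `dbutils.widgets.get("run_date")`.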
I would guess this is unusual, but I want to hear from others before I nag my managers about it. In Databricks (which I access in a web browser) we have a compute cluster specifically for Git; you need to start it to push code or even to change branches. This is separa...
Hi. I have a PySpark notebook that takes 25 minutes to run, as opposed to one minute on-prem with Linux + Pandas. How can I speed it up? It's not a volume issue. The input is around 30k rows. Output is the same because there's no filtering or aggregation...
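At ~30k rows, Spark's overhead (task scheduling, shuffles, Python UDF serialization) can easily dominate the actual work. One common fix is to collect the small table to the driver and do the transformation in plain pandas. A sketch, with a toy frame standing in for the real table (names are made up):

```python
import pandas as pd

# Hypothetical: in Databricks this frame would come from
#   pdf = spark.table("my_catalog.my_schema.my_table").toPandas()
# Here we build a toy frame so the sketch is self-contained.
pdf = pd.DataFrame({"make": ["honda", "toyota", "honda"], "qty": [2, 1, 3]})

# Pure-pandas transformation: runs on the driver, no Spark overhead.
pdf["qty_doubled"] = pdf["qty"] * 2

# If downstream steps need a Spark DataFrame again:
#   sdf = spark.createDataFrame(pdf)
print(pdf["qty_doubled"].tolist())
```

This only makes sense when the data genuinely fits in driver memory; for larger inputs, the usual culprits are row-wise Python UDFs, which are worth replacing with built-in `pyspark.sql.functions`.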
I'm migrating code from Python on Linux to Databricks PySpark. I have many mappings like this:

    {
        "main": {
            "honda": 1.0,
            "toyota": 2.9,
            "BMW": 5.77,
            "Fiat": 4.5,
        },
    }

I exported using json.dump, saved to S3, and was able to import with sp...
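Assuming the truncated import was something like `spark.read.json` (which parses the file as a DataFrame of rows rather than a plain dict), a simpler pattern for small lookup mappings is to read the file back as text and `json.loads` it into a normal Python dict. A sketch using an in-memory copy of the mapping:

```python
import json

# The exported mapping, as json.dump would have written it. In Databricks
# you might read the S3 object via dbutils.fs.head or boto3 instead.
raw = json.dumps({"main": {"honda": 1.0, "toyota": 2.9, "BMW": 5.77, "Fiat": 4.5}})

mappings = json.loads(raw)   # back to a plain nested dict
rates = mappings["main"]

print(rates["BMW"])
```

The resulting dict can then drive a Spark-side lookup, e.g. through `pyspark.sql.functions.create_map` or a broadcast variable, without ever becoming a DataFrame itself.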
I don't think it's possible, but I thought I would check. I need to combine notebooks. While developing, I might have code in various notebooks, which I read in with "%run". Then, when all looks good, I combine many cells into fewer notebooks. Is there any...
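One workaround (not a built-in feature, as far as I know) is to export the notebooks in source format and concatenate them: Databricks `.py` source exports start with `# Databricks notebook source` and separate cells with `# COMMAND ----------`, so merging is just string work. A sketch with toy stand-ins for the exported files:

```python
# Strings Databricks uses in .py source exports:
HEADER = "# Databricks notebook source"
SEPARATOR = "# COMMAND ----------"

# Toy stand-ins for two exported notebooks.
nb_a = f"{HEADER}\nprint('cell A1')\n\n{SEPARATOR}\n\nprint('cell A2')\n"
nb_b = f"{HEADER}\nprint('cell B1')\n"

def combine_notebooks(*sources: str) -> str:
    """Strip each export's header, then join all cells under one header."""
    bodies = [s.replace(HEADER, "", 1).strip() for s in sources]
    return HEADER + "\n" + f"\n\n{SEPARATOR}\n\n".join(bodies) + "\n"

combined = combine_notebooks(nb_a, nb_b)
print(combined.count(SEPARATOR))  # two separators -> three cells
```

The merged file can then be imported back through the workspace UI or the Databricks CLI (`databricks workspace import`).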
Actually, no, I can't even find it. I see it in the browser Workspace, but when I do "%ls" it shows: azure/ eventlogs/ logs/ conf/ hadoop_accessed_config.lst* preload_class.lst*
It seems related to the notebook length (number of cells). The notebook that was really slow had about 40-50 cells, which I've done before without issue. Anyway, after starting a new notebook using Chrome, it seems usable again. So, without a specific...