As part of my batch processing I archive a large number of small files received from the source system each day using the dbutils.fs.mv command. This takes hours, as dbutils.fs.mv moves the files one at a time. How can I speed this up?
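One option, sketched below, is to keep using dbutils.fs.mv but issue the moves concurrently from a thread pool instead of a serial loop. The source and archive directory names are placeholders, not the original paths, and the worker count is just a starting point to tune.

```python
# A minimal sketch of parallelising the archive step with a thread pool, assuming the
# directory paths below are placeholders for your own locations.
from concurrent.futures import ThreadPoolExecutor

src_dir = "dbfs:/mnt/landing/incoming/"      # hypothetical source directory
archive_dir = "dbfs:/mnt/landing/archive/"   # hypothetical archive directory

files = dbutils.fs.ls(src_dir)               # list the small files to move

def move_file(f):
    # dbutils.fs.mv still moves one file per call; the speed-up comes from
    # issuing many of these calls concurrently instead of one after another.
    dbutils.fs.mv(f.path, archive_dir + f.name)

# 32 threads is an arbitrary starting point; tune to your cluster and storage account.
with ThreadPoolExecutor(max_workers=32) as pool:
    list(pool.map(move_file, files))
```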
When running Spark under YARN, each script has its own self-contained set of logs:- In Databricks all I see is a list of jobs and stages that have been run on the cluster:- From a support perspective this is a nightmare. How can notebook logs be grou...
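One workaround, sketched below, is to tag driver-side log output with the notebook that produced it, so lines can at least be grouped per notebook afterwards. The notebookPath() call goes through an internal dbutils context object that is not a documented API and may change between runtime versions.

```python
# A sketch of prefixing log lines with the notebook path so they can be grouped later.
import logging

# Relies on an internal/undocumented dbutils context object (assumption: available on
# your runtime version); it returns the path of the currently running notebook.
notebook_path = (
    dbutils.notebook.entry_point.getDbutils().notebook().getContext().notebookPath().get()
)

logger = logging.getLogger(notebook_path)
logger.setLevel(logging.INFO)

handler = logging.StreamHandler()  # goes to the driver's stdout log
handler.setFormatter(
    logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")  # %(name)s carries the notebook path
)
logger.addHandler(handler)

logger.info("Starting daily load")  # each line is now attributable to this notebook
```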
I am using the Delta format and occasionally get the following error:- "xx.parquet referenced in the transaction log cannot be found. This occurs when data has been manually deleted from the file system rather than using the table `DELETE` statement" FS...
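If the underlying Parquet files really have been removed outside of Delta (for example by an external cleanup job), one commonly suggested repair is FSCK REPAIR TABLE, which drops the entries for missing files from the transaction log. A small sketch, with a placeholder table name; the DRY RUN form only reports what it would remove, so run that first.

```python
# Report which missing-file entries would be removed from the transaction log
# ("my_schema.my_table" is a placeholder table name).
spark.sql("FSCK REPAIR TABLE my_schema.my_table DRY RUN").show(truncate=False)

# Once the dry run output looks right, run the repair itself.
spark.sql("FSCK REPAIR TABLE my_schema.my_table")
```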
I have started getting an error message when running the following optimize command:-
deltaTable.optimize().executeCompaction()
Error:-
java.util.concurrent.ExecutionException: java.lang.IllegalStateException: Number of records changed after Optimi...
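As a first diagnostic step it can help to look at the recent Delta history around the failed OPTIMIZE, to see whether other writes or deletes were committing against the table at the same time. A small sketch, assuming a placeholder table name:

```python
# Inspect the last commits on the table to see which operations ran around the
# time the OPTIMIZE failed ("my_schema.my_table" is a placeholder).
from delta.tables import DeltaTable

deltaTable = DeltaTable.forName(spark, "my_schema.my_table")

(deltaTable.history(20)                      # last 20 commits
    .select("version", "timestamp", "operation", "operationMetrics")
    .show(truncate=False))
```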
I have created a number of workflows in the Databricks UI. I now need to deploy them to a different workspace. How can I do that? Code can be deployed via Git, but the job definitions are stored in the workspace only.
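One way to get the job definitions out of the source workspace is the Jobs REST API. A minimal export sketch; the workspace URL, token and job_id below are placeholders to replace with real values.

```python
# Export a job definition from the source workspace via the Jobs API.
import json
import requests

source_host = "https://<source-workspace>.cloud.databricks.com"  # placeholder
source_token = "<personal-access-token>"                         # placeholder
job_id = 123                                                      # placeholder job id

resp = requests.get(
    f"{source_host}/api/2.1/jobs/get",
    headers={"Authorization": f"Bearer {source_token}"},
    params={"job_id": job_id},
)
resp.raise_for_status()

# The "settings" block is the part that /api/2.1/jobs/create expects in another workspace.
job_settings = resp.json()["settings"]

with open(f"job_{job_id}.json", "w") as f:
    json.dump(job_settings, f, indent=2)
```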
The workflows are 'integrated' with Git, as in they get the code from a Git branch each time they run. I am not seeing an option to store the actual workflow definitions in Git (task name, script path, dependencies, retries, schedule, etc.). I have a w...
The only way I can find to move workflow jobs (schedules) to another workspace is:-
1) Create a job in the Databricks UI (Workflows -> Jobs -> Create Job)
2) Copy the JSON definition ("...", View JSON, Create, Copy)
3) Save the JSON locally or in the Gi...
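The re-creation in the target workspace can at least be scripted against the Jobs REST API from the saved JSON, along the lines of the sketch below. Host, token and file name are placeholders, and this assumes the saved JSON is the job "settings" payload as shown in the View JSON dialog.

```python
# Re-create a job in the target workspace from the saved JSON definition.
import json
import requests

target_host = "https://<target-workspace>.cloud.databricks.com"  # placeholder
target_token = "<personal-access-token>"                         # placeholder

with open("job_123.json") as f:          # the JSON saved from the source workspace
    job_settings = json.load(f)

resp = requests.post(
    f"{target_host}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {target_token}"},
    json=job_settings,                   # the settings block becomes the create payload
)
resp.raise_for_status()
print("Created job id:", resp.json()["job_id"])
```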
I can't see anything in this link related to moving jobs (schedules) between environments. I have Git integration, but workflow jobs are not stored in Git.