Re: Automation for any script changes in databrick...

iyashk-DB · 4 weeks ago

Hi, there are a couple of ways to do this, depending on how tightly you want it wired in. If Bitbucket is just your source repo and you want a row per change, the simplest path is a Bitbucket Pipeline that runs on push. In that pipeline step, pull the commit metadata (`git log -1` for the commit, `git diff --name-only HEAD^ HEAD` for the changed files) and write it straight into a Unity Catalog table through a serverless SQL warehouse, using either the SQL Statement Execution API (REST) or the databricks-sql-connector. That gives you full control over what counts as "a change", and it works the same across all your envs since it's driven from the pipeline, not from inside Databricks.
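
For what it's worth, here's a rough sketch of what that pipeline step could look like in Python with the databricks-sql-connector. The table name `main.devops.script_changes`, its columns, and the three environment variable names are placeholders I made up for illustration, and the `:name` parameter markers assume connector version 3 or later. It writes one row per changed file, so treat it as a starting point rather than a finished script.

```python
import os
import subprocess

# Hypothetical target table; create it up front with matching STRING columns.
TABLE = "main.devops.script_changes"

INSERT_SQL = (
    f"INSERT INTO {TABLE} (commit_sha, author, committed_at, message, file_path) "
    "VALUES (:commit_sha, :author, :committed_at, :message, :file_path)"
)


def git(*args):
    """Run a git command in the current checkout and return its stdout."""
    result = subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def build_rows(log_line, changed_files):
    """Combine one `git log` line and a `git diff --name-only` listing
    into one dict per changed file."""
    # Fields are separated by the ASCII unit separator (0x1f), which is
    # unlikely to appear in an author name or commit subject.
    sha, author, committed_at, message = log_line.rstrip("\n").split("\x1f", 3)
    files = [line for line in changed_files.splitlines() if line.strip()]
    return [
        {
            "commit_sha": sha,
            "author": author,
            "committed_at": committed_at,  # ISO 8601 string, cast later if needed
            "message": message,
            "file_path": path,
        }
        for path in files
    ]


def collect_rows():
    """Read the latest commit and its changed files from the checkout."""
    log_line = git("log", "-1", "--format=%H%x1f%an%x1f%cI%x1f%s")
    changed = git("diff", "--name-only", "HEAD^", "HEAD")
    return build_rows(log_line, changed)


def write_rows(rows):
    """Insert the rows through a SQL warehouse."""
    # Imported here so the git-only helpers work without the connector installed.
    from databricks import sql  # pip install databricks-sql-connector

    with sql.connect(
        server_hostname=os.environ["DATABRICKS_SERVER_HOSTNAME"],
        http_path=os.environ["DATABRICKS_HTTP_PATH"],
        access_token=os.environ["DATABRICKS_TOKEN"],
    ) as connection:
        with connection.cursor() as cursor:
            for row in rows:
                cursor.execute(INSERT_SQL, row)


if __name__ == "__main__":
    write_rows(collect_rows())
```

One thing to watch: `HEAD^` doesn't exist on the very first commit or in a depth-1 clone, so you'd want to handle that case (and decide how to treat merge commits) before relying on it.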

If instead you're syncing Bitbucket into Databricks through Git folders (Repos) and just want to know when someone pulled a change into the workspace, Databricks already logs that for you. Git folder activity lands in the `system.access.audit` system table under `service_name = 'repos'`, so you could just query that instead of building your own capture step. I couldn't confirm whether system tables are switched on by default on every workspace tier though, so check that `system.access.audit` is actually queryable in your workspace before you design around it.
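
If you go the audit-table route, a query along these lines is roughly what I had in mind, and it doubles as the "is it queryable" check. This is a sketch: the column names are what I'd expect from the audit log schema (`event_time`, `event_date`, `user_identity.email`, `action_name`, `request_params`), so verify them against your workspace, and the environment variable names are placeholders.

```python
import os


def repos_audit_query(days=7):
    """Build a query for recent Git folder events in the audit system table."""
    days = int(days)  # only ever interpolate a plain integer into the SQL text
    return f"""
SELECT event_time,
       user_identity.email AS user_email,
       action_name,
       request_params
FROM system.access.audit
WHERE service_name = 'repos'
  AND event_date >= date_sub(current_date(), {days})
ORDER BY event_time DESC
"""


def fetch_repos_events(days=7):
    """Run the query on a SQL warehouse and return the rows."""
    # Imported here so the query builder works without the connector installed.
    from databricks import sql  # pip install databricks-sql-connector

    with sql.connect(
        server_hostname=os.environ["DATABRICKS_SERVER_HOSTNAME"],
        http_path=os.environ["DATABRICKS_HTTP_PATH"],
        access_token=os.environ["DATABRICKS_TOKEN"],
    ) as connection:
        with connection.cursor() as cursor:
            cursor.execute(repos_audit_query(days))
            return cursor.fetchall()


if __name__ == "__main__":
    for event in fetch_repos_events(days=7):
        print(event)
```

If this fails with a permissions or "table not found" error, that tells you system tables aren't available to you yet, and the pipeline approach is the safer bet.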

Either way, land the output in a Delta table in Unity Catalog rather than writing raw files straight to ADLS. UC is sitting on top of that storage anyway, and you get proper schema, permissions, and query access for free.