Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Automation for any script changes in Databricks and Bitbucket

Dolly0503
New Contributor III

Whenever code changes are made and pushed to Bitbucket for a given environment, I want to fetch all of those changes and load them into a table in ADLS or a Unity Catalog table.

Can anyone please suggest approaches? We use Bitbucket as our repository.

1 REPLY

iyashk-DB
Databricks Employee

Hi, there are a couple of ways to do this, depending on how tightly you want it wired in. If Bitbucket is just your source repo and you want a row per change, the simplest path is a Bitbucket Pipeline that runs on push. In that pipeline step, pull the commit metadata (`git log -1` for the latest commit, and the changed files from `git diff --name-only HEAD^ HEAD`) and write it straight into a Unity Catalog table through a serverless SQL warehouse, using either the Databricks SQL Statement Execution REST API or the `databricks-sql-connector` Python package. That gives you full control over what counts as "a change", and it works the same across all your envs because it's driven from the pipeline, not from inside Databricks. One caveat: `HEAD^ HEAD` only covers the most recent commit, so if a single push can carry several commits you'll want to diff a wider range.
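
As a rough illustration of that pipeline step, here is a minimal sketch in Python. It assumes `databricks-sql-connector` version 3 or later (for the `:name` parameter markers) is installed in the pipeline image. The table `main.devops.code_changes`, its all-string columns, the `DATABRICKS_*` environment variable names, and the use of `BITBUCKET_BRANCH` as a stand-in for the environment are all hypothetical choices for the example, so adjust them to your setup.

```python
import os
import subprocess


def git(*args: str) -> str:
    """Run a git command in the current checkout and return its stdout."""
    result = subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def parse_commit(log_line: str, changed_files: str, env: str) -> list[dict]:
    """Build one row per changed file from a git log line and a file listing.

    log_line is expected in the format hash, author, date, subject,
    separated by the ASCII unit separator (0x1f).
    """
    commit_hash, author, committed_at, message = log_line.split("\x1f", 3)
    return [
        {
            "env": env,
            "commit_hash": commit_hash,
            "author": author,
            "committed_at": committed_at,
            "message": message,
            "file_path": path.strip(),
        }
        for path in changed_files.splitlines()
        if path.strip()
    ]


def collect_rows(env: str) -> list[dict]:
    """Read the latest commit and its changed files from the local clone."""
    log_line = git("log", "-1", "--format=%H%x1f%an%x1f%cI%x1f%s")
    # HEAD^ does not exist on the very first commit or in a depth-1 clone.
    changed = git("diff", "--name-only", "HEAD^", "HEAD")
    return parse_commit(log_line, changed, env)


def write_rows(rows: list[dict]) -> None:
    """Insert rows into a (hypothetical) Unity Catalog table via a SQL warehouse."""
    from databricks import sql  # imported lazily; needs databricks-sql-connector

    with sql.connect(
        server_hostname=os.environ["DATABRICKS_HOST"],   # assumed variable name
        http_path=os.environ["DATABRICKS_HTTP_PATH"],    # assumed variable name
        access_token=os.environ["DATABRICKS_TOKEN"],     # assumed variable name
    ) as conn:
        with conn.cursor() as cur:
            for row in rows:
                cur.execute(
                    "INSERT INTO main.devops.code_changes "  # hypothetical table
                    "(env, commit_hash, author, committed_at, message, file_path) "
                    "VALUES (:env, :commit_hash, :author, :committed_at, "
                    ":message, :file_path)",
                    row,
                )


# In the pipeline step you would call something like:
# write_rows(collect_rows(os.environ.get("BITBUCKET_BRANCH", "unknown")))
```

Keeping the parsing in `parse_commit` separate from the git and warehouse calls means you can test the row shape locally without a Databricks connection.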

If instead you're syncing Bitbucket into Databricks through Git folders (Repos) and just want to know when someone pulled a change into the workspace, Databricks already logs that for you. Git folder activity lands in the `system.access.audit` system table under `service_name = 'repos'`, so you could just query that instead of building your own capture step. I couldn't confirm whether system tables are switched on by default on every workspace tier though, so check that `system.access.audit` is actually queryable in your workspace before you design around it.
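
If the audit table is queryable for you, a query along these lines is one way to look at the Git folder events. This is only a sketch: the helper name `repos_audit_query` and the lookback window are my own choices, and I have deliberately not filtered on specific `action_name` values, since you should check which ones actually show up in your workspace.

```python
def repos_audit_query(days: int = 7) -> str:
    """Return a SQL query listing recent Git folder (Repos) audit events."""
    if not isinstance(days, int) or days < 1:
        raise ValueError("days must be a positive integer")
    # days is validated as an int above, so interpolating it is safe here.
    return f"""
SELECT
  event_time,
  workspace_id,
  user_identity.email AS user_email,
  action_name,
  request_params
FROM system.access.audit
WHERE service_name = 'repos'
  AND event_date >= date_sub(current_date(), {days})
ORDER BY event_time DESC
""".strip()
```

In a notebook you could run it with `spark.sql(repos_audit_query(7))`, or paste the generated SQL into the SQL editor.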

Either way, land the output in a Delta table in Unity Catalog rather than writing raw files straight to ADLS. UC is sitting on top of that storage anyway, and you get proper schema, permissions, and query access for free.