Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

tracing the history of a workflow

dev_puli
New Contributor III

Hi!

I use Databricks on Azure, and I find it inconvenient that I cannot see who last modified a workflow or when it was modified. How can I trace the modification history (modified time and user details)? Also, would it be possible to deploy the workflows into higher environments?

Thanks!

3 REPLIES

Kaniz_Fatma
Community Manager

Hi @dev_puli, in Azure Databricks you can trace the history of table modifications, including the user responsible and the timestamp.

 

Here's how you can achieve this:

 

Delta Lake Table History:

  • Each operation that modifies a Delta Lake table creates a new table version.
  • You can retrieve information about these operations, including the user ID, timestamp, and operation type, by running the DESCRIBE HISTORY command on your Delta table.
  • The history information is returned in reverse chronological order.
  • Example SQL commands (a PySpark equivalent is sketched after this list):
    • To get the full history of the table: DESCRIBE HISTORY '/data/events/'
    • To get only the last operation: DESCRIBE HISTORY '/data/events/' LIMIT 1
  • The table history retention is determined by the setting delta.logRetentionDuration, which is 30 days by default.
  • Note that Databricks does not recommend using Delta Lake table history as a long-term backup solution for data archival. It's primarily for auditing, rollback, and time travel purposes.
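
As a minimal sketch of the SQL commands above, run from a Databricks notebook (where a `spark` session is already available) against the same illustrative path '/data/events/':

```python
# Minimal sketch: assumes a Databricks notebook (spark is predefined) and a
# Delta table stored at the illustrative path '/data/events/'.
from pyspark.sql import functions as F

# DESCRIBE HISTORY returns one row per table version, newest first.
history_df = spark.sql("DESCRIBE HISTORY delta.`/data/events/`")

# Keep only the audit-relevant columns: who changed the table, when, and how.
(history_df
    .select("version", "timestamp", "userName", "operation")
    .orderBy(F.col("version").desc())
    .show(truncate=False))
```

The userName and timestamp columns in this output are what answer the "last modified user and modified time" question for a table.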

Comparison and Deployment:

  • To compare performance across different environments, consider the following steps:
    • Development Environment:
      • Develop and test your workflows in a development environment.
      • Use the Delta Lake table history to track changes and understand performance.
    • Higher Environments (Staging, Production):
      • Once satisfied with the development, deploy your workflows to higher environments.
      • Ensure that the same Delta Lake table schema and data are used.
      • Monitor performance and compare it against the development environment.
    • Automated Deployment:
      • Use CI/CD pipelines or automation tools to streamline deployment from development to higher environments.
      • Automate the process of creating tables, applying schema changes, and loading data.
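
For the "Automated Deployment" point, one minimal sketch is to copy a job (workflow) definition between workspaces with the Jobs REST API 2.1. The workspace URLs, tokens, and job ID below are placeholders, not real values; in practice, tools such as Databricks Asset Bundles or the Terraform provider are the more common way to do this inside a CI/CD pipeline.

```python
# Minimal sketch of promoting a workflow (job) definition from a dev workspace
# to a higher environment via the Jobs REST API 2.1.
import requests

DEV_HOST = "https://adb-dev.azuredatabricks.net"    # placeholder: dev workspace URL
PROD_HOST = "https://adb-prod.azuredatabricks.net"  # placeholder: prod workspace URL
DEV_TOKEN = "<dev-pat>"                             # placeholder: access tokens
PROD_TOKEN = "<prod-pat>"
JOB_ID = 12345                                      # placeholder: the dev job to promote

# 1. Read the job definition ("settings") from the dev workspace.
resp = requests.get(
    f"{DEV_HOST}/api/2.1/jobs/get",
    headers={"Authorization": f"Bearer {DEV_TOKEN}"},
    params={"job_id": JOB_ID},
)
resp.raise_for_status()
settings = resp.json()["settings"]

# 2. Recreate the same job definition in the higher environment.
create = requests.post(
    f"{PROD_HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {PROD_TOKEN}"},
    json=settings,
)
create.raise_for_status()
print("Created job", create.json()["job_id"], "in the higher environment")
```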

Workspace Files and Notebooks:

  • You can also programmatically access workspace files (including notebooks) using Python or Scala.
  • Retrieve details such as creation date, modified date, and user information.
  • This can be useful for tracking changes and understanding performance across different versions of notebooks.
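
As a minimal sketch of that last point, the Workspace REST API can list notebooks and files under a folder. The host, token, and path below are placeholders, and which timestamp fields (such as modified_at) are populated can vary by workspace, so treat those as assumptions to verify against your own responses.

```python
# Minimal sketch: list workspace objects and print any metadata returned.
# HOST, TOKEN, and PATH are placeholders.
import requests

HOST = "https://adb-1234567890.azuredatabricks.net"  # placeholder: workspace URL
TOKEN = "<personal-access-token>"                    # placeholder: PAT or AAD token
PATH = "/Users/someone@example.com"                  # placeholder: folder to inspect

resp = requests.get(
    f"{HOST}/api/2.0/workspace/list",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"path": PATH},
)
resp.raise_for_status()

# Each object carries its path and type; timestamp fields such as
# modified_at (epoch milliseconds), where returned, show the last change.
for obj in resp.json().get("objects", []):
    print(obj.get("path"), obj.get("object_type"), obj.get("modified_at"))
```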

Remember to adapt these steps based on your specific use case and requirements. 

 

By leveraging Delta Lake history and workspace file details, you can gain insights into modifications and compare performance effectively.

dev_puli
New Contributor III

Thanks for your response! I have used "DESCRIBE HISTORY table_name" before, as you specified above. I am fairly new to Databricks. Can you provide more insights about workflows? How can I trace the history of a workflow, the way I can trace the history of a table (using a command like DESCRIBE HISTORY) or a notebook (using the Git options)? I am also not sure about the possibilities of deploying workflows from lower environments to higher environments. I had challenges changing the owner of a workflow I created, and I ended up seeking help from another user with admin privileges to change the owner.

dev_puli
New Contributor III

Sorry! I added another issue at the end without mentioning that it was a new issue I had encountered. I had challenges changing the owner of a workflow I created, and I ended up seeking help from another user with admin privileges to change the owner. Is there a way to choose a service principal as the default owner every time a workflow gets created by any user?
