Databricks time travel - how to get ALL changes ever done to a table

oishimbo
New Contributor

Hi time travel gurus,

I am investigating creating a reporting solution with an AsOf functionality. Users will be able to create a report based on the current data or on the data AsOf some time ago. Due to the nature of our data, this AsOf feature is quite important for reproducing reports.

I can create Databricks tables with time travel enabled, thus capturing all changes. However, I plan to import the data into a PowerBI dataset to allow users in PowerBI to set the correct AsOf date while creating reports. In order to do so, I would have to load ALL data changes from Databricks into the PowerBI dataset. For temporal tables in SQL Server this can be done by specifying "FOR SYSTEM_TIME ALL". Is there a similar query option available for Databricks?

I am aware that I have to monitor the size of the underlying data very closely. At this stage I'm simply investigating if Databricks tables with time travel enabled would allow me access to all the changes. 

Thanks for your help 🙂


Kaniz
Community Manager

Hi @oishimbo, Databricks, with its Delta Lake, provides a powerful mechanism for capturing data changes over time. 

 

Here are some key points to consider:

 

Querying with Delta Lake Time Travel:

  • To query a Delta table at a specific point in time, append a TIMESTAMP AS OF (or VERSION AS OF) clause after the table name.
  • The timestamp expression can be:
    • A string that can be cast to a timestamp (e.g., '2018-10-18T22:15:12.013Z').
    • A date string (e.g., '2018-10-18').
    • An expression like current_timestamp() - interval 12 hours.
  • Example query: SELECT * FROM myDeltaTable TIMESTAMP AS OF '2018-10-18'.
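
Putting the bullets above into full statements (myDeltaTable is the illustrative table name from the example; the timestamps and version number are placeholders):

```sql
-- As of an exact timestamp
SELECT * FROM myDeltaTable TIMESTAMP AS OF '2018-10-18T22:15:12.013Z';

-- As of the start of a date
SELECT * FROM myDeltaTable TIMESTAMP AS OF '2018-10-18';

-- As of a relative point in time
SELECT * FROM myDeltaTable TIMESTAMP AS OF current_timestamp() - INTERVAL 12 HOURS;

-- As of a specific numbered version of the table
SELECT * FROM myDeltaTable VERSION AS OF 5;
```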

Considerations:

  • While Delta Lake history is useful for auditing and time travel, it’s not recommended as a long-term backup solution.
  • Databricks suggests using only the past 7 days for time travel operations unless you’ve adjusted data and log retention configurations.
  • Keep a close eye on the size of your underlying data, especially when loading changes into Power BI.
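
If AsOf reporting needs to reach further back than the default window, the retention settings can be raised per table. A sketch, reusing the myDeltaTable name from the example above (the 365-day interval is an illustrative assumption, not a recommendation; longer retention means more storage):

```sql
ALTER TABLE myDeltaTable SET TBLPROPERTIES (
  'delta.logRetentionDuration'         = 'interval 365 days',
  'delta.deletedFileRetentionDuration' = 'interval 365 days'
);
```

Note that running VACUUM with a shorter explicit retention would still remove the data files that time travel depends on, so these settings and your VACUUM schedule need to agree.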

Power BI Integration:

  • Import the data from Databricks into your Power BI dataset.
  • Allow users in Power BI to set the correct AsOf date by leveraging the timestamp expressions in your queries.
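
There is no single Delta SQL clause equivalent to SQL Server's FOR SYSTEM_TIME ALL, but one way to approximate it is to enumerate the versions returned by DESCRIBE HISTORY and union the snapshots. The helper below is only a sketch of that idea (the table name, the _version tag column, and the union-all-snapshots approach are illustrative assumptions, not an official Databricks API):

```python
# Sketch: build one "VERSION AS OF" query per historical version, so that
# every snapshot can be unioned into a single dataset for Power BI.

def as_of_queries(table_name, versions):
    """Return a SELECT ... VERSION AS OF query for each table version,
    tagging every row with the version it came from."""
    return [
        f"SELECT {v} AS _version, * FROM {table_name} VERSION AS OF {v}"
        for v in versions
    ]

# On Databricks you would first collect the versions, e.g.:
#   versions = [r.version for r in spark.sql("DESCRIBE HISTORY myDeltaTable").collect()]
# and then union the per-version snapshots:
#   from functools import reduce
#   df_all = reduce(lambda a, b: a.union(b),
#                   [spark.sql(q) for q in as_of_queries("myDeltaTable", versions)])
```

Users can then filter on the _version (or a corresponding timestamp) column in Power BI to reconstruct the table as of any point. Be aware that this multiplies the data volume by the number of versions, which is exactly why the size monitoring mentioned above matters.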

Remember that Databricks’ time travel feature provides a powerful way to navigate historical data, but it’s essential to balance it with storage considerations. 

Kaniz
Community Manager

Hey there! Thanks a bunch for being part of our awesome community! 🎉 

We love having you around and appreciate all your questions. Take a moment to check out the responses – you'll find some great info. Your input is valuable, so pick the best solution for you. And remember, if you ever need more help, we're here for you!

Keep being awesome! 😊🚀

 
