cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Best practice for creating queries for data transformation?

ConSpooky
New Contributor II

My apologies in advance for sounding like a newbie. This is really just a curiosity question I have as an outsider observing my team clash with our client. Please ask any questions you have, and I will try my best to answer it.

Currently, we are storing our transformation queries in a DynamoDB table. When needed, we pull into Databricks and execute the query. Simple as that. Our client has called this out as โ€œhard codingโ€ (more on that soon)

Our client has come up with an alternative that involves creating JSON config files containing the transformation rules (all tables/attributes required, target table names, Alias names, join keys, etc. etc.). From here, the SQL query is dynamically created. This approach is still โ€œhard codingโ€ since these config files would need to be manually edited anytime there is a change in the rules.

The way I see this: I think storing the transform rules in JSON is more business user friendly, but thatโ€™s about where I see the pros end. It brings in much more complexity to the code and likely will need to be continuously developed to support new queries. Also, I donโ€™t see anyway to prevent โ€œhard codingโ€. The client business leads seem to think there is some magical tool to convert plain English text to complex SQL queries

I just wanted to get some experts thoughts on this. Which solution is better, or is there another approach that should be taken?

3 REPLIES 3

Hubert-Dudek
Esteemed Contributor III

I agree with you, and I had similar experiences in the past. The business user would like to have everything magically automated and invite JSON, which will be edited by non-developer users. Additionally, it is still hard coded as you need to maintain that JSON and then read them by some script which will make ineffective queries that can easily break after the change in JSON

raymond021
New Contributor II

These modifications might be as straightforward as filtering the rows or eliminating a column, or they can be as widespread as using the first row as the table header. Along with basic transformation choices, there are also more complex ones like merge, append, group by, pivot, and unpivot.

Anonymous
Not applicable

Hi @Nick Connorsโ€‹

Hope all is well!

Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. 

We'd love to hear from you.

Thanks!

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโ€™t want to miss the chance to attend and share knowledge.

If there isnโ€™t a group near you, start one and help create a community that brings people together.

Request a New Group