cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Best practice for creating queries for data transformation?

ConSpooky
New Contributor II

My apologies in advance for sounding like a newbie. This is really just a curiosity question I have as an outsider observing my team clash with our client. Please ask any questions you have, and I will try my best to answer it.

Currently, we are storing our transformation queries in a DynamoDB table. When needed, we pull into Databricks and execute the query. Simple as that. Our client has called this out as “hard coding” (more on that soon)

Our client has come up with an alternative that involves creating JSON config files containing the transformation rules (all tables/attributes required, target table names, Alias names, join keys, etc. etc.). From here, the SQL query is dynamically created. This approach is still “hard coding” since these config files would need to be manually edited anytime there is a change in the rules.

The way I see this: I think storing the transform rules in JSON is more business user friendly, but that’s about where I see the pros end. It brings in much more complexity to the code and likely will need to be continuously developed to support new queries. Also, I don’t see anyway to prevent “hard coding”. The client business leads seem to think there is some magical tool to convert plain English text to complex SQL queries

I just wanted to get some experts thoughts on this. Which solution is better, or is there another approach that should be taken?

4 REPLIES 4

Hubert-Dudek
Esteemed Contributor III

I agree with you, and I had similar experiences in the past. The business user would like to have everything magically automated and invite JSON, which will be edited by non-developer users. Additionally, it is still hard coded as you need to maintain that JSON and then read them by some script which will make ineffective queries that can easily break after the change in JSON

raymond021
New Contributor II

These modifications might be as straightforward as filtering the rows or eliminating a column, or they can be as widespread as using the first row as the table header. Along with basic transformation choices, there are also more complex ones like merge, append, group by, pivot, and unpivot.

Kaniz
Community Manager
Community Manager

Hi @Nick Connors​ ​, We haven’t heard from you since the last response from @Hubert Dudek​ and @Raymond Jack​, and I was checking back to see if you have a resolution yet.

If you have any solution, please share it with the community as it can be helpful to others. Otherwise, we will respond with more details and try to help.

Also, Please don't forget to click on the "Select As Best" button whenever the information provided helps resolve your question.

Anonymous
Not applicable

Hi @Nick Connors​

Hope all is well!

Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. 

We'd love to hear from you.

Thanks!

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.