Design Question

Frank — Tue, 27 Sep 2022 06:20:12 GMT

We have an application that takes in raw metrics data as key-value pairs.

We then split the data into four different tables, like below:

`key1, min, max, average`

Those four tables are later used for a dashboard.

What are the design recommendations for this? Should we change the schema?
When data is ingested, there seems to be a 2s delay every time a SQL command runs. In the ingestion endpoint, we have to write to the four tables and also insert into the raw tables, which adds up to about 2*5=10s. That is really long. How can we minimize the ingest time?
What is the recommended data ingestion pattern? We currently use HTTP POST to a server, and the server then writes to a database. But Delta seems to be slow in this case.

Re: Design Question

stefnhuy — Tue, 29 Aug 2023 12:11:59 GMT

Hey,

I can totally relate to the challenges Frank is facing with this application's data processing. It's frustrating to deal with delays, especially when dealing with real-time metrics. I've had a similar experience where optimizing data ingestion was crucial.

Considering the design, using separate tables for 'min', 'max', and 'average' is a good start for dashboard efficiency. However, the 2-second delay per SQL command seems like a bottleneck. Have you thought about batch processing instead of individual inserts? Combining multiple commands into one batch could significantly reduce overhead. If you haven't heard of it before, I suggest you read this article: Cross Platform App Design: Discover The Solid UI Design Guidelines.
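To make the batching idea concrete, here is a minimal sketch. It uses Python's built-in `sqlite3` purely as a stand-in for whatever SQL endpoint you write to, so the table name `raw_metrics` and the helper `ingest_batch` are my own placeholders, not anything from Frank's setup. The point is only that many rows go out in one statement and one commit, so any fixed per-command overhead is paid once per batch instead of once per row.

```python
import sqlite3


def ingest_batch(conn, rows):
    """Insert many (key, value) pairs using one transaction and one commit."""
    with conn:  # commits once when the block exits without error
        conn.executemany(
            "INSERT INTO raw_metrics (key, value) VALUES (?, ?)", rows
        )


# In-memory database used only for illustration (assumption, not Delta).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_metrics (key TEXT, value REAL)")

# Buffer incoming metrics, then flush them together instead of one by one.
batch = [("cpu", 0.5), ("cpu", 0.9), ("mem", 0.3)]
ingest_batch(conn, batch)

row_count = conn.execute("SELECT COUNT(*) FROM raw_metrics").fetchone()[0]
```

In a real service you would probably flush the buffer on a size or time threshold; how large a batch should be depends on your latency requirements.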

Regarding the ingestion pattern, HTTP POST to a server is convenient, but if Delta's slow, exploring other technologies like Apache Kafka might be worth it. It's designed for high-throughput, real-time data streaming.

Changing the schema might help, but first, analyze the read vs. write frequency. If reads are more frequent, consider optimizing the dashboard queries.
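One possible schema change, sketched under my own assumptions: write only to a single raw table at ingest time and derive min, max, and average with one grouped query, rather than issuing a separate write per aggregate. Again `sqlite3` is just a stand-in here, and the table and column names are hypothetical.

```python
import sqlite3

# In-memory database used only for illustration (assumption, not Delta).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_metrics (key TEXT, value REAL)")
conn.executemany(
    "INSERT INTO raw_metrics (key, value) VALUES (?, ?)",
    [("cpu", 1.0), ("cpu", 3.0), ("mem", 2.0)],
)

# One query yields all three aggregates per key for the dashboard.
summary = conn.execute(
    "SELECT key, MIN(value), MAX(value), AVG(value) "
    "FROM raw_metrics GROUP BY key ORDER BY key"
).fetchall()
```

Whether this is a win depends on the read vs. write balance mentioned above: it makes ingestion cheaper but moves work to query time, so a periodically refreshed summary table may suit a read-heavy dashboard better.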

Remember, it's a trial-and-error process. I'd love to hear how others dealt with similar challenges and what worked best for them.
