Design Question
09-26-2022 11:20 PM
We have an application that ingests raw metrics data as key-value pairs. We then split them into four different tables, each shaped like:
`key1, min, max, average`
Those four tables are later used for dashboards.
- What are the design recommendations here? Should we change the schema?
- When data is ingested, there seems to be a ~2 s delay for every SQL command. The ingestion endpoint has to write to the four tables and also insert into the raw table, so one request costs about 2 s × 5 = 10 s, which is really long. How can we minimize the ingest time?
- What is the recommended data ingestion pattern? We currently use an HTTP POST to a server, and the server then writes to the database, but Delta seems to be slow in this case.
- Labels: Delta
08-29-2023 05:10 AM - edited 08-29-2023 05:11 AM
Hey,
I can totally relate to the challenges Frank is facing with this application's data processing. It's frustrating to deal with delays, especially when dealing with real-time metrics. I've had a similar experience where optimizing data ingestion was crucial.
Considering the design, using separate tables for 'min', 'max', and 'average' is a good start for dashboard efficiency. However, the 2-second delay per SQL command seems like a bottleneck. Have you thought about batch processing instead of individual inserts? Combining multiple commands into one batch could significantly reduce the overhead.
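To make the batching idea concrete, here is a minimal sketch (using Python's built-in sqlite3 purely for illustration; the table and column names are made up, not from the original post) of replacing per-row inserts with one `executemany` batch inside a single transaction:

```python
import sqlite3

# Hypothetical schema: one raw table plus a per-statistic table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_metrics (key TEXT, value REAL)")
conn.execute("CREATE TABLE metrics_min (key TEXT, value REAL)")

rows = [("key1", 1.0), ("key2", 2.0), ("key3", 3.0)]

# One transaction and one batched statement per table,
# instead of one round trip per row:
with conn:
    conn.executemany("INSERT INTO raw_metrics VALUES (?, ?)", rows)
    conn.executemany("INSERT INTO metrics_min VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM raw_metrics").fetchone()[0]
print(count)  # 3
```

If each SQL round trip really costs ~2 s, collapsing the five per-request writes into batched statements (or buffering several requests and flushing them together) is usually the single biggest win.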
Regarding the ingestion pattern, HTTP POST to a server is convenient, but if Delta's slow, exploring other technologies like Apache Kafka might be worth it. It's designed for high-throughput, real-time data streaming.
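As a rough sketch of that pattern (the broker address, topic name, and payload shape below are all my assumptions, not from the original post), the endpoint would serialize each metric and hand it to a Kafka producer instead of issuing SQL directly; a downstream job then writes to Delta in large batches:

```python
import json
import time

def to_kafka_record(key, value, ts=None):
    """Serialize one raw metric as a (key, value) pair of byte strings."""
    payload = {"key": key, "value": value,
               "ts": ts if ts is not None else time.time()}
    return key.encode("utf-8"), json.dumps(payload).encode("utf-8")

k, v = to_kafka_record("key1", 42.0, ts=0.0)

# Hedged sketch of the producer side (needs a running broker and the
# confluent-kafka package; "broker:9092" and "raw-metrics" are made up):
# from confluent_kafka import Producer
# producer = Producer({"bootstrap.servers": "broker:9092"})
# producer.produce("raw-metrics", key=k, value=v)
# producer.flush()
```

The point of the indirection is that the HTTP endpoint only enqueues a message, so its latency is decoupled from however slow the Delta writes are.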
Changing the schema might help, but first, analyze the read vs. write frequency. If reads are more frequent, consider optimizing the dashboard queries.
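On the schema point, one option worth testing (again a sketch with made-up table names, shown via sqlite3 for portability) is to keep a single raw table and derive min/max/average through one GROUP BY view, so the write path touches only one table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_metrics (key TEXT, value REAL)")
conn.executemany("INSERT INTO raw_metrics VALUES (?, ?)",
                 [("key1", 1.0), ("key1", 3.0), ("key2", 5.0)])

# One view replaces the min/max/average tables; inserts hit only raw_metrics,
# and the dashboard reads the aggregates from the view.
conn.execute("""CREATE VIEW metric_stats AS
                SELECT key,
                       MIN(value) AS min,
                       MAX(value) AS max,
                       AVG(value) AS average
                FROM raw_metrics
                GROUP BY key""")

stats = {row[0]: row[1:] for row in
         conn.execute("SELECT * FROM metric_stats ORDER BY key")}
print(stats)  # {'key1': (1.0, 3.0, 2.0), 'key2': (5.0, 5.0, 5.0)}
```

Whether a view (compute on read) or materialized tables (compute on write) wins depends exactly on that read-vs-write ratio.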
Remember, it's a trial-and-error process. I'd love to hear how others dealt with similar challenges and what worked best for them.