<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Design Question in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/design-question/m-p/42788#M27421</link>
    <description>&lt;P&gt;Hey,&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I can totally relate to the challenges Frank is facing with this application's data processing. It's frustrating to deal with delays, especially when dealing with real-time metrics. I've had a similar experience where optimizing data ingestion was crucial.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;Considering the design, using separate tables for &lt;STRONG&gt;'min'&lt;/STRONG&gt;, &lt;STRONG&gt;'max'&lt;/STRONG&gt;, and &lt;STRONG&gt;'average'&lt;/STRONG&gt; is a good start for dashboard efficiency. However, the 2-second delay per SQL command seems like a bottleneck. Have you thought about batch processing instead of individual inserts? Combining multiple commands into one batch could significantly reduce overhead. If you haven't heard of it before, I suggest you read this article: &lt;A href="https://stackdiary.com/a-guide-to-consistent-cross-platform-app-design/" target="_self"&gt;Cross Platform App Design: Discover The Solid UI Design Guidelines&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;Regarding the ingestion pattern, HTTP POST to a server is convenient, but if Delta's slow, exploring other technologies like Apache Kafka might be worth it. It's designed for high-throughput, real-time data streaming.&lt;/P&gt;&lt;P&gt;Changing the schema might help, but first, analyze the read vs. write frequency. If reads are more frequent, consider optimizing the dashboard queries.&lt;/P&gt;&lt;P&gt;Remember, it's a trial-and-error process. I'd love to hear how others dealt with similar challenges and what worked best for them.&lt;/P&gt;</description>
    <pubDate>Tue, 29 Aug 2023 12:11:59 GMT</pubDate>
    <dc:creator>stefnhuy</dc:creator>
    <dc:date>2023-08-29T12:11:59Z</dc:date>
    <item>
      <title>Design Question</title>
      <link>https://community.databricks.com/t5/data-engineering/design-question/m-p/30762#M22329</link>
      <description>&lt;P&gt;We have an application that takes in raw metrics data as key-value pairs.&lt;/P&gt;&lt;P&gt;We then split them into four different tables, like below:&lt;/P&gt;&lt;P&gt;`key1, min, max, average`&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Those four tables are later used for a dashboard.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;What are the design recommendations for this? Should we change the schema?&lt;/LI&gt;&lt;LI&gt;When data is ingested, there seems to be a 2s delay every time a SQL command runs. In the ingestion endpoint, we have to write to four tables and also insert into the raw tables, which adds up to about 2*5=10s. That is really long. How can we minimize the ingest time?&lt;/LI&gt;&lt;LI&gt;What is the recommended data ingestion pattern? We currently use HTTP POST to a server, and the server then writes to a database. But Delta seems to be slow in this case.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 27 Sep 2022 06:20:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/design-question/m-p/30762#M22329</guid>
      <dc:creator>Frank</dc:creator>
      <dc:date>2022-09-27T06:20:12Z</dc:date>
    </item>
    <item>
      <title>Re: Design Question</title>
      <link>https://community.databricks.com/t5/data-engineering/design-question/m-p/42788#M27421</link>
      <description>&lt;P&gt;Hey,&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I can totally relate to the challenges Frank is facing with this application's data processing. It's frustrating to deal with delays, especially when dealing with real-time metrics. I've had a similar experience where optimizing data ingestion was crucial.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;Considering the design, using separate tables for &lt;STRONG&gt;'min'&lt;/STRONG&gt;, &lt;STRONG&gt;'max'&lt;/STRONG&gt;, and &lt;STRONG&gt;'average'&lt;/STRONG&gt; is a good start for dashboard efficiency. However, the 2-second delay per SQL command seems like a bottleneck. Have you thought about batch processing instead of individual inserts? Combining multiple commands into one batch could significantly reduce overhead. If you haven't heard of it before, I suggest you read this article: &lt;A href="https://stackdiary.com/a-guide-to-consistent-cross-platform-app-design/" target="_self"&gt;Cross Platform App Design: Discover The Solid UI Design Guidelines&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;Regarding the ingestion pattern, HTTP POST to a server is convenient, but if Delta's slow, exploring other technologies like Apache Kafka might be worth it. It's designed for high-throughput, real-time data streaming.&lt;/P&gt;&lt;P&gt;Changing the schema might help, but first, analyze the read vs. write frequency. If reads are more frequent, consider optimizing the dashboard queries.&lt;/P&gt;&lt;P&gt;Remember, it's a trial-and-error process. I'd love to hear how others dealt with similar challenges and what worked best for them.&lt;/P&gt;</description>
      <pubDate>Tue, 29 Aug 2023 12:11:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/design-question/m-p/42788#M27421</guid>
      <dc:creator>stefnhuy</dc:creator>
      <dc:date>2023-08-29T12:11:59Z</dc:date>
    </item>
  </channel>
</rss>

