Hi everyone,
I’m researching the data‑engineering challenges behind modern streaming apps.
For example, platforms like Kisskh — which manage thousands of daily active users and large volumes of video metadata — often struggle with performance and data‑scalability issues.
Here are some real technical problems that such platforms typically face:
• Sudden traffic spikes during popular releases
• Huge volumes of user‑event logs (searches, watch time, session data; a rough example record is sketched just after this list)
• Slow or inconsistent recommendation performance
• Difficulty tracking playback quality and buffering metrics in real time
• Inefficient data pipelines causing delayed analytics
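To make the scale problem concrete, here's roughly the kind of event record I have in mind. All field names and values are invented for illustration; they aren't taken from any real platform's schema:

```python
# Hypothetical playback-heartbeat event, purely illustrative (every field is an assumption)
sample_event = {
    "user_id": "u_48213",
    "title_id": "t_0917",
    "event_type": "play",          # e.g. "search", "play", "pause", "buffer"
    "watch_seconds": 1260,         # cumulative watch time in the current session
    "buffer_ms": 340.0,            # buffering observed since the last heartbeat
    "bitrate_kbps": 2800,          # current playback bitrate
    "device": "android_tv",
    "event_time": "2024-05-01T19:42:10Z",
}
```

Multiply records like this by every heartbeat from every active session during a popular release, and the log volume and analytics lag add up quickly.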
My question is:
**How would a platform like this redesign its entire data pipeline if it migrated to the Databricks Lakehouse ecosystem?**
More specifically:
1. Can Delta Live Tables or Structured Streaming handle real‑time user‑event data at scale? (I've sketched the kind of ingestion job I'm picturing after this list.)
2. How can Databricks improve recommendation‑model training for rapidly changing user behavior?
3. What monitoring + observability patterns are suitable for a high‑traffic streaming service?
4. Does Databricks have any reference architectures for streaming/OTT-type workloads?
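For question 1, this is roughly the Structured Streaming job I'm imagining: read raw events from a Kafka topic and land them continuously in a bronze Delta table. It's only a sketch under my own assumptions; the topic name, schema fields, checkpoint path, and table name are all made up, not anything Databricks prescribes.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import (StructType, StructField, StringType,
                               LongType, DoubleType, TimestampType)

spark = SparkSession.builder.appName("user-event-ingest").getOrCreate()

# Hypothetical schema for the raw user-event payload (field names are assumptions)
event_schema = StructType([
    StructField("user_id", StringType()),
    StructField("title_id", StringType()),
    StructField("event_type", StringType()),
    StructField("watch_seconds", LongType()),
    StructField("buffer_ms", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Read the raw event stream from Kafka (broker and topic are placeholders)
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "user-events")
       .load())

# Parse the JSON payload out of the Kafka value column
events = (raw
          .select(from_json(col("value").cast("string"), event_schema).alias("e"))
          .select("e.*"))

# Continuously append parsed events to a bronze Delta table
query = (events.writeStream
         .format("delta")
         .option("checkpointLocation", "/mnt/checkpoints/user_events_bronze")
         .trigger(processingTime="1 minute")
         .toTable("bronze.user_events"))
```

What I really want to know is whether a job shaped like this keeps up when the event rate spikes during a big release, and whether DLT would be a better fit than hand-rolled Structured Streaming for that.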
I’m asking this to understand real-world data‑pipeline design best practices; my question is purely about the data engineering, not the content side of the app.
Thanks — would love to hear expert insights.