@szymon_dybczak @BS_THE_ANALYST @Coffee77 @TheOC the use case summary is as eblow
The use case:
A telecom operator wants to minimize unnecessary truck rolls (sending technicians to customer sites), which cost $100–$200 per visit.
Data sources feeding into the data platform:
Network telemetry – SNMP traps, modem/router health (e.g., SNR, packet loss, outages).
IoT device data – ONT, set-top boxes, CPE logs.
CRM & Billing data – open tickets, service type, SLA tiers.
Geospatial/weather feeds – storm events, regional outages.
Technician logs – prior visit outcomes.
All this lands in the Bronze layer as unstructured JSON, CSV, log files, and streaming events.
Why Parquet in Silver Layer?
The Silver layer aggregates and cleans this into a customer/equipment-level service health dataset:
Customer ID, Service ID, Site ID
Last 24h modem health KPIs (uptime, SNR, packet loss)
Outage correlation (area-wide vs. local issue)
Historical technician visits and resolution codes
Predictive probability: "Can this issue be fixed remotely?"
Benefits of Parquet here which i am tyring to achieeve:
Efficient analytics – Technicians need KPIs by device or site; Parquet’s columnar format makes queries 5–10× faster.
Compression – IoT + network telemetry is massive; Parquet reduces footprint dramatically.
Schema evolution – New device types (5G routers, IoT sensors) can be added without breaking downstream integrations.
Reusability – Same Parquet data powers ML models (predicting if a truck roll is necessary) and operational dashboards.
But you have a very valid suggestion i am trying On ingest, also write to a Delta table with minimal transformation. This becomes the query-friendly version of raw.