Part 3 of my series on building an enterprise data platform on Databricks is up. This one covers Gold layer design. The short version: Gold isn't just aggregated Silver. Silver maps to your source system. Gold maps to the business questions your consu...
Part 2 of my series on building an enterprise data platform on Databricks. This one's about Silver. Part 1 covered why we ran two ingestion paths in parallel (GoldenGate CDC + JDBC batch) and kept them as separate bronze tables. If you missed it: http...
Part 1 of a 5-part series on building an enterprise data platform on Databricks. When migrating a large retail conglomerate's SAP HANA platform to Databricks, we needed both historical completeness and near-real-time freshness from day one. That require...
Tips and Techniques for Ingesting Large JSON files with PySpark
Introduction
If you've ever struggled with consuming massive JSON files in PySpark, you know that insufficient data can creep in and silently d...
Apache Spark™ 4.0 introduces a new feature for SQL developers and data engineers: SQL Scripting. It extends the power and flexibility of Spark SQL, enabling users to write procedural code within SQL queries, with t...