Databricks Community

DB_developer · ‎12-08-2022

I have a lot of tables where 80% of the columns are filled with nulls. I understand SQL Server provides a way to handle this kind of data in the table definition (with the SPARSE keyword). Does a data lake provide something similar?

-werners- · ‎12-08-2022

The data lake itself doesn't, but the file format you use to store the data does.

For example, Parquet uses columnar compression, so sparse data will compress pretty well.

CSV, on the other hand: total disaster.
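
To illustrate why a columnar layout copes well with nulls: all values of one column sit next to each other, so long runs of nulls collapse to almost nothing. The snippet below is only a toy run-length encoder in plain Python, not Parquet's actual on-disk encoding; the function names and the sample column are made up for the example.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, run_length) pairs back into the original list."""
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical sparse column: 20 cells, only one of them populated.
column = [None] * 8 + [42] + [None] * 11

encoded = run_length_encode(column)
print(encoded)  # [(None, 8), (42, 1), (None, 11)]
```

Twenty cells shrink to three pairs here. In a row-oriented text file like CSV the nulls of one column are interleaved with every other column's values, so there are no such runs to exploit.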

Håkon_Åmdal · ‎12-12-2022

Unless you compress the entire CSV, which also should be a viable approach.

That said, Delta/Parquet would normally be the better option, where each column is compressed.
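
A quick way to see the effect of compressing the whole CSV is sketched below, using only the Python standard library. The table shape (10,000 rows, 20 columns, about 20% of cells filled) and the helper name `build_sparse_csv` are assumptions picked for the demo, so the exact byte counts will differ for real data.

```python
import csv
import gzip
import io
import random


def build_sparse_csv(rows=10_000, cols=20, fill_ratio=0.2, seed=0):
    """Return CSV text in which roughly `fill_ratio` of the cells hold a value."""
    rng = random.Random(seed)
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([f"c{i}" for i in range(cols)])
    for _ in range(rows):
        # An empty string stands in for a null cell.
        writer.writerow(
            [rng.randint(0, 999) if rng.random() < fill_ratio else "" for _ in range(cols)]
        )
    return buf.getvalue()


raw = build_sparse_csv().encode("utf-8")
packed = gzip.compress(raw)  # compress the entire file as one stream

print(f"raw CSV: {len(raw)} bytes, gzipped: {len(packed)} bytes")
```

Keep in mind that a gzipped CSV has to be decompressed from the start to read anything, whereas a columnar format lets a reader pull only the columns it needs.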

How to optimize storage for sparse data in data lake?
