Databricks Community

Anonymous · 11-03-2021

Is Lakehouse the answer? Here's a good resource that was just published: https://dbricks.co/3q3471X

Dan_Z · 11-04-2021

Lakehouse is definitely the answer. Making sure that your data is OPEN, so that anyone can go in and read it in whatever strange format it may be in (JSON, Parquet, CSV, messaging queues, protobuf, etc.) and wherever it may reside (Redshift, S3, Gen2, Teradata, Kafka, etc.), is truly game-changing. You never have to 'import' your data into the Lakehouse. It stays where it naturally would be, like on S3. So you get all the benefits of Data Lakes, combined with the performance of Data Warehouses (and then some), plus the ability to impose RDBMS-style schemas and do ACID-compliant table operations (JOIN, MERGE, UPDATE, etc.). Add the ability to compose low-latency dashboards with Photon, and you have a comprehensive end-to-end solution. Over the next few years it will just get faster and easier to use for low-code/no-code users.
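To make the idea above a bit more concrete, here is a toy sketch in plain Python (standard library only, not Databricks or Spark code). It reads the same kind of record from two open formats without importing them anywhere, imposes one schema on read, and then does a MERGE-style upsert by key. The `Order` schema, the sample data, and the helper names are all made up for illustration, and nothing here is transactional; in a real Lakehouse the engine and table format would handle that part.

```python
import csv
import io
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """Hypothetical schema imposed on every source at read time."""
    order_id: int
    amount: float


def read_csv_orders(text: str) -> list[Order]:
    # Parse CSV text in place and coerce each row to the Order schema.
    rows = csv.DictReader(io.StringIO(text))
    return [Order(int(r["order_id"]), float(r["amount"])) for r in rows]


def read_json_orders(text: str) -> list[Order]:
    # Parse newline-delimited JSON and coerce each object to the Order schema.
    objects = (json.loads(line) for line in text.splitlines() if line.strip())
    return [Order(int(o["order_id"]), float(o["amount"])) for o in objects]


def merge_orders(target: list[Order], updates: list[Order]) -> list[Order]:
    # MERGE-style upsert: rows in `updates` replace rows in `target` that
    # share an order_id; unmatched rows are inserted. Not transactional.
    merged = {o.order_id: o for o in target}
    merged.update({o.order_id: o for o in updates})
    return [merged[key] for key in sorted(merged)]


# Made-up sample data standing in for files that already live in a data lake.
CSV_SOURCE = "order_id,amount\n1,10.0\n2,25.5\n"
JSON_SOURCE = '{"order_id": 2, "amount": 30.0}\n{"order_id": 3, "amount": 7.25}\n'

orders = merge_orders(read_csv_orders(CSV_SOURCE), read_json_orders(JSON_SOURCE))
for order in orders:
    print(order)
```

Order 2 appears in both sources, so the JSON value replaces the CSV one, while orders 1 and 3 pass through unchanged.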

All that aside, 73% of data goes unused because right now, for most companies, it's HARD WORK to go through and understand all the data, cull it, format it, combine it, run experiments, train models, etc. It requires a very in-demand skill set and time/$$. I believe Lakehouse will make it much easier for smart people, who may not be trained data engineers or data scientists, to go in and work with data to solve problems. Our number one initiative at Databricks is to make Lakehouse simple to work with so that every company can be fully data-driven.

</rant>

brickster_2018 · 11-03-2021

One more reason to use Lakehouse:

https://databricks.com/blog/2021/11/02/databricks-sets-official-data-warehousing-performance-record....

Anonymous · 11-10-2021

@Alexis Lopez - If @Dan Zafar's or @Harikrishnan Kunhumveettil's answer solved the issue, would you be happy to mark it as best so other members can find the solution more easily?

How does 73% of the data go unused for analytics or decision-making?
