How does 73% of the data go unused for analytics or decision-making?

Anonymous
Not applicable

Is Lakehouse the answer? Here's a good resource that was just published: https://dbricks.co/3q3471X

1 ACCEPTED SOLUTION


Dan_Z
Honored Contributor

Lakehouse is definitely the answer. Making sure that your data is OPEN, so that anyone can go in and read it in whatever format it may be in (JSON, Parquet, CSV, messaging queues, protobuf, etc.) and wherever it may reside (Redshift, S3, Gen2, Teradata, Kafka, etc.), is truly game-changing. You never have to 'import' your data into the Lakehouse; it stays where it naturally lives, like on S3. So you get all the benefits of Data Lakes combined with the performance of Data Warehouses (and then some), plus the ability to impose RDBMS-style schemas, run ACID-compliant table operations (JOIN, MERGE, UPDATE, etc.), and compose low-latency dashboards with Photon. Together that makes it a comprehensive end-to-end solution. Over the next few years it will only get faster and easier to use for low-code/no-code users.
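
To make that concrete, here is a minimal PySpark sketch (not from the original post) of the idea: reading JSON and Parquet in place from object storage, combining them, and upserting the result into a Delta table with an ACID-compliant MERGE. The bucket paths, table name, and column names are hypothetical.

# Minimal sketch, assuming hypothetical paths/columns: read open formats
# where they already live (no import step), then upsert into a Delta table.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already available on Databricks

# Read the data in place -- JSON and Parquet straight from object storage.
events = spark.read.json("s3://my-bucket/raw/events/")
users = spark.read.parquet("s3://my-bucket/raw/users/")

# Combine sources with ordinary relational operations.
enriched = events.join(users, on="user_id", how="left")

# ACID-compliant upsert (MERGE) into a Delta table.
target = DeltaTable.forName(spark, "analytics.enriched_events")
(
    target.alias("t")
    .merge(enriched.alias("s"), "t.event_id = s.event_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)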

All that aside, 73% of data goes unused because, right now, for most companies it's HARD WORK to go through and understand all the data, cull it, format it, combine it, run experiments, train models, etc. It requires a very in-demand skill set and time/$$. I believe Lakehouse will make it much easier for smart people who may not be trained data engineers or data scientists to go in and work with data to solve problems. Our #1 initiative at Databricks is to make Lakehouse simple to work with so that every company can be fully data-driven.

</rant>


3 REPLIES

Anonymous
Not applicable

@Alexis Lopez - If @Dan Zafar's or @Harikrishnan Kunhumveettil's answers solved the issue, would you be happy to mark one of their answers as best so other members can find the solution more easily?
