How are Struct type columns stored/accessed (interested in efficiency)?

crowley
New Contributor III

Hello, I've searched around for a while and didn't find a similar question here or elsewhere, so I thought I'd ask.

I'm assessing the storage/access efficiency of Struct type columns in Delta tables. I want to know more about how Databricks stores Struct type fields. Can an SME add some details?

Example question I'm looking at: suppose I add an int field with low cardinality to a Struct column. In a columnar database, I believe such a field would be stored and accessed efficiently. Would it also be stored and accessed efficiently as a field inside a Struct column?
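To make the question concrete, here is a toy, stdlib-only Python model of how a columnar format such as Parquet (the file format Delta tables use for their data files) lays out a struct: each leaf field of the struct becomes its own column, and a low-cardinality field can then be dictionary-encoded and run-length encoded like any top-level column. The struct `s`, its fields `code` and `name`, and the helper functions are hypothetical, and real Parquet encoding (definition/repetition levels for nulls and nesting, the RLE/bit-packing hybrid, pages and row groups) is considerably more involved than this sketch.

```python
from itertools import groupby

# Hypothetical rows with a struct column "s" holding two fields.
rows = [
    {"s": {"code": 1, "name": "a"}},
    {"s": {"code": 1, "name": "b"}},
    {"s": {"code": 2, "name": "c"}},
    {"s": {"code": 2, "name": "d"}},
    {"s": {"code": 2, "name": "e"}},
    {"s": {"code": 1, "name": "f"}},
]

def shred(rows, column):
    """Split a struct column into one flat column per leaf field, keyed by a
    dotted path such as "s.code". Nulls and repeated fields are ignored here."""
    columns = {}
    for row in rows:
        for field, value in row[column].items():
            columns.setdefault(f"{column}.{field}", []).append(value)
    return columns

def dictionary_rle(values):
    """Dictionary-encode values (dictionary in first-seen order), then
    run-length encode the resulting dictionary ids as (id, run_length) pairs."""
    dictionary = list(dict.fromkeys(values))
    index = {value: i for i, value in enumerate(dictionary)}
    ids = [index[value] for value in values]
    runs = [(key, len(list(group))) for key, group in groupby(ids)]
    return dictionary, runs

columns = shred(rows, "s")
print(sorted(columns))                    # ['s.code', 's.name']
print(dictionary_rle(columns["s.code"]))  # ([1, 2], [(0, 2), (1, 3), (0, 1)])
```

The point of the sketch is that once the struct is split into per-field columns, the low-cardinality `s.code` values sit next to each other, so six ints collapse to a two-entry dictionary plus three runs, exactly as they would for a top-level int column.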

Note: I did find a Databricks page that (maybe) describes how Apache Arrow is used in Databricks Runtime 14+ (link below), but it referred to its use in UDFs. I am using Structs in vanilla Delta tables and figured that was significantly different.

https://www.databricks.com/blog/arrow-optimized-python-udfs-apache-sparktm-35#:~:text=In%20Apache%20...   

Accepted Solution

Kaniz_Fatma
Community Manager

Hi @crowley, Let’s delve into the storage and access efficiency of Struct type columns in Delta tables within the context of Databricks.

  1. Structured Data Sources and Efficiency:

  2. Struct Type in Databricks:

  3. Apache Arrow and UDFs:

  4. Predictive Optimization:

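In summary, Struct columns in vanilla Delta tables are stored in Parquet data files, where each field of a struct is written as its own column, so a low-cardinality int field inside a struct can be encoded and read largely like a top-level column. Apache Arrow is relevant to moving data between the JVM and Python (for example in Arrow-optimized Python UDFs) rather than to how Delta stores data on disk, and predictive optimization handles table maintenance such as file compaction. Feel free to explore these capabilities further to enhance your understanding! 🚀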
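Access efficiency also depends on whether a query can skip the struct fields it does not need. The toy sketch below illustrates that idea, nested column pruning, under stated assumptions: column chunks are keyed by hypothetical dotted leaf paths (`id`, `s.code`, `s.name`, `s.payload`), selecting `s.code` reads only that chunk, and selecting the whole struct `s` reads every leaf. It is an illustration, not Spark's implementation. In Spark itself the corresponding optimizer feature is controlled by `spark.sql.optimizer.nestedSchemaPruning.enabled` (on by default since Spark 3.0); inspecting the `ReadSchema` of the file scan in the output of `df.select("s.code").explain()` should be a reasonable way to check whether it applies to a given table.

```python
# Hypothetical column chunks for one data file, keyed by dotted leaf path.
# "s.payload" stands in for a wide field that a query would rather not read.
column_chunks = {
    "id": [10, 11, 12],
    "s.code": [1, 1, 2],
    "s.name": ["a", "b", "c"],
    "s.payload": ["x" * 1000, "y" * 1000, "z" * 1000],
}

def required_leaves(requested, available):
    """Map requested paths (a whole struct like "s" or a field like "s.code")
    to the leaf chunks that must be read, preserving file order."""
    leaves = []
    for path in requested:
        for leaf in available:
            if (leaf == path or leaf.startswith(path + ".")) and leaf not in leaves:
                leaves.append(leaf)
    return leaves

def scan(chunks, requested):
    """Return only the chunks a query needs; every other chunk is never touched."""
    return {leaf: chunks[leaf] for leaf in required_leaves(requested, chunks)}

print(list(scan(column_chunks, ["s.code"])))  # ['s.code']
print(list(scan(column_chunks, ["s"])))       # ['s.code', 's.name', 's.payload']
```

Selecting a single field avoids reading the wide `s.payload` chunk entirely, which is why referencing `s.code` directly in a query is generally cheaper than selecting the whole struct and extracting the field later.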
