Data Engineering


Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by User16790091296 • Databricks Employee

06-24-2021 8:09:20 AM

2270 Views
0 replies
1 kudos

What is the most efficient way to read a partitioned Parquet dataset with PySpark?

I work with Parquet files stored in AWS S3 buckets. They are multiple terabytes in size and partitioned by a numeric column, call it my_partition, that contains integer values between 1 and 200. I read in and perform compute actions on this data in Databricks w...
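The post is cut off, so the exact workload is unknown. As a minimal sketch only, two common ways to avoid scanning all 200 partitions are shown below. It assumes the data uses a Hive-style directory layout (`my_partition=<value>`), which the post does not confirm. The helper names (`partition_paths`, `read_by_filter`, `read_by_paths`) and the bucket path in the usage note are hypothetical. The `spark` argument is the SparkSession that a Databricks notebook already provides.

```python
from typing import Iterable, List


def partition_paths(base_path: str, column: str, values: Iterable[int]) -> List[str]:
    """Build Hive-style partition directory paths, e.g. <base>/my_partition=7.

    Assumes the dataset was written with a Hive-style layout (hypothetical here).
    """
    root = base_path.rstrip("/")
    return [f"{root}/{column}={v}" for v in values]


def read_by_filter(spark, base_path: str, column: str, values: Iterable[int]):
    """Read the dataset root, then filter on the partition column.

    A filter on a partition column lets Spark prune partitions, so only
    the matching directories should need to be scanned.
    """
    df = spark.read.parquet(base_path)
    return df.filter(df[column].isin(list(values)))


def read_by_paths(spark, base_path: str, column: str, values: Iterable[int]):
    """Read only the listed partition directories.

    The basePath option tells Spark where partition discovery starts, so
    the partition column is kept in the resulting DataFrame's schema.
    """
    paths = partition_paths(base_path, column, values)
    return spark.read.option("basePath", base_path).parquet(*paths)
```

In a notebook this might be called as `read_by_filter(spark, "s3://example-bucket/events", "my_partition", range(1, 11))`. Which variant is faster depends on the data and cluster, so it is worth timing both on a small set of partitions before committing to one.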
