Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

irfanaziz
by Contributor II
  • 19678 Views
  • 7 replies
  • 8 kudos

Resolved! How to merge small parquet files into a single parquet file?

I have thousands of Parquet files with the same schema, each containing one or more records. Reading these files with Spark is very slow. Is there a way to merge the files before reading them with Spark? Or is there any ...

Latest Reply
mmore500
New Contributor II
  • 8 kudos

Give [*joinem*](https://github.com/mmore500/joinem) a try, available via PyPI: `python3 -m pip install joinem`. *joinem* provides a CLI for fast, flexible concatenation of tabular data using [polars](https://pola.rs). I/O is *lazily streamed* in order ...
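
For readers who want to see the general idea in code, here is a minimal sketch of compacting many small Parquet files into fewer, larger ones. It is not how *joinem* works internally and it does not use *joinem*'s CLI; it swaps in `pyarrow` instead. The function names `plan_batches` and `merge_parquet_dir`, the `part-00000.parquet` naming, and the 128 MiB default target are all hypothetical choices for illustration, and the sketch assumes every input file shares one schema, as in the question.

```python
from pathlib import Path


def plan_batches(sizes, target_bytes):
    """Group (path, size) pairs, in order, into batches of roughly target_bytes."""
    batches, current, current_size = [], [], 0
    for path, size in sizes:
        # Close the current batch once adding this file would exceed the target.
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        batches.append(current)
    return batches


def merge_parquet_dir(src_dir, dst_dir, target_bytes=128 * 1024 * 1024):
    """Rewrite the *.parquet files in src_dir as fewer files in dst_dir."""
    # Assumption: pyarrow is installed and all inputs have the same schema.
    import pyarrow as pa
    import pyarrow.parquet as pq

    files = sorted(Path(src_dir).glob("*.parquet"))
    # On-disk (compressed) size is used as a rough proxy for batch size.
    sizes = [(f, f.stat().st_size) for f in files]
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)

    written = []
    for i, batch in enumerate(plan_batches(sizes, target_bytes)):
        # Each batch is read fully into memory, concatenated, and written once.
        table = pa.concat_tables([pq.read_table(str(f)) for f in batch])
        target = out / f"part-{i:05d}.parquet"
        pq.write_table(table, str(target))
        written.append(target)
    return written
```

Calling `merge_parquet_dir("small_files", "merged")` (both directory names are placeholders) would write the merged files into `merged/`. Batching by size keeps each output bounded in memory; setting `target_bytes` very high yields a single output file, provided the data fits in memory.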
