Performance issues when loading an Excel file from DBFS using R
04-18-2023 03:34 AM
I have uploaded some small Excel files to DBFS and use read_xlsx() from the "readxl" package to read them into memory in R. I am on a standard cluster (12.1, non ML). The function works, but it is extremely slow: for example, a simple Excel table with 40,000+ records and 5 columns takes 9 minutes to load. On my local R installation on Windows, the same load is instantaneous. "readxl" is widely considered the best package for working with Excel files, and it is among the libraries preinstalled on the cluster. Any idea what might cause this?
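For reference, here is a minimal sketch of how the load could be timed, with a comparison against reading a copy of the same file from the driver's local disk. The file path and the helper name time_read_xlsx are hypothetical placeholders, not my actual setup, and the comparison is only a way to check whether the slowdown depends on where the file is read from.

```r
library(readxl)

# Hypothetical helper: returns the elapsed seconds for one read_xlsx() call.
time_read_xlsx <- function(path) {
  timing <- system.time(df <- read_xlsx(path))
  unname(timing["elapsed"])
}

# Hypothetical path; replace with the actual location of the file on DBFS.
dbfs_path <- "/dbfs/FileStore/tables/example.xlsx"

if (file.exists(dbfs_path)) {
  # Read directly from the DBFS mount.
  t_dbfs <- time_read_xlsx(dbfs_path)

  # Copy the file to the driver's local temp directory and read it from there.
  local_path <- file.path(tempdir(), basename(dbfs_path))
  file.copy(dbfs_path, local_path, overwrite = TRUE)
  t_local <- time_read_xlsx(local_path)

  cat("Elapsed (DBFS):  ", t_dbfs, "seconds\n")
  cat("Elapsed (local): ", t_local, "seconds\n")
}
```

If the local read turns out to be fast, that would suggest the time is spent in file access rather than in the Excel parsing itself.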
Labels: Performance Issues