- To get the size of a specific directory (e.g., /mnt/abc/xyz), you can run the following Scala code:
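// List the entries directly under the directory
val path = "/mnt/abc/xyz"
val filelist = dbutils.fs.ls(path)
// Convert the listing to a DataFrame and expose it to SQL
val df = filelist.toDF()
df.createOrReplaceTempView("adlsSize")
// Sum the sizes (in bytes) and convert to gigabytes
spark.sql("SELECT SUM(size) / (1024 * 1024 * 1024) AS sizeInGB FROM adlsSize").show()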
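This will display the size in gigabytes. Note that dbutils.fs.ls is not recursive and reports directories with a size of 0, so the result covers only the files directly inside the directory, not those in subfolders.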
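To include files in subfolders as well, the listing can be walked recursively. The following is a minimal sketch, not an official utility: `Entry` and `dirSizeBytes` are hypothetical names introduced here, and the listing function is passed in as a parameter so the logic can be tried outside a notebook. The commented lines show how it might be wired to dbutils.fs.ls, assuming a Databricks notebook where `dbutils` is predefined.

```scala
// Stand-in for the fields of a dbutils.fs.ls entry used here (hypothetical type).
final case class Entry(path: String, size: Long, isDir: Boolean)

// Hypothetical helper: total size in bytes of all files under `path`,
// descending into subdirectories. `ls` lists the direct children of a path.
def dirSizeBytes(path: String, ls: String => Seq[Entry]): Long =
  ls(path).map(e => if (e.isDir) dirSizeBytes(e.path, ls) else e.size).sum

// In a Databricks notebook (where `dbutils` is predefined), wire it up like this:
// val lsDbfs = (p: String) => dbutils.fs.ls(p).map(f => Entry(f.path, f.size, f.isDir))
// val sizeInGB = dirSizeBytes("/mnt/abc/xyz", lsDbfs).toDouble / (1024L * 1024 * 1024)
```

Each directory triggers its own listing call on the driver, so this can be slow on very deep or very wide directory trees.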
- If you want to calculate the size of a directory that contains subfolders and files (e.g., XYZ), you can use the Unix du command in a notebook:
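%sh du -sh /dbfs/mnt/abc/xyz
This will give you the total size of everything inside the XYZ directory, including its subfolders. The -s flag prints only that single total; without it, du -h also lists every subdirectory, with the overall total on the last line.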
Remember that when data is read from DBFS, it is divided into input blocks, which are sent to different executors. By default, the block size is 128 MB. If you're dealing with external tables, you can control the size of output files using methods like ...
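As a rough illustration of the 128 MB figure, the number of input blocks for a directory can be estimated by ceiling division of its total size by the block size. This is only a back-of-the-envelope sketch: `estimatedBlocks` is a hypothetical helper, and the real number of blocks also depends on how the data is split across files and on your Spark configuration.

```scala
// Assumed default block size from the text above: 128 MB.
val blockSizeBytes: Long = 128L * 1024 * 1024

// Hypothetical helper: number of blocks needed to cover `totalBytes` (ceiling division).
def estimatedBlocks(totalBytes: Long, blockSize: Long = blockSizeBytes): Long =
  (totalBytes + blockSize - 1) / blockSize

val tenGB = 10L * 1024 * 1024 * 1024
println(estimatedBlocks(tenGB)) // 80
```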
I hope this helps! Let me know if you need further assistance.