Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Get Block Size

William_Scardua
Valued Contributor

Hi guys,

How can I get the block size? Any ideas?

Thank you

1 REPLY

Kaniz_Fatma
Community Manager

Hi @William_Scardua, to get the total directory size in Databricks, you can use the dbutils.fs.ls command. Here are a few options:

  1. To get the size of a specific directory (e.g., /mnt/abc/xyz), you can run the following Scala code:

    // dbutils.fs.ls lists the files and folders directly under the path
    import spark.implicits._  // pre-imported in Databricks Scala notebooks; shown for completeness

    val path = "/mnt/abc/xyz"
    val fileList = dbutils.fs.ls(path)
    val df = fileList.toDF()
    df.createOrReplaceTempView("adlsSize")
    spark.sql("SELECT SUM(size) / (1024 * 1024 * 1024) AS sizeInGB FROM adlsSize").show()
    

    This will display the size in gigabytes. Note that dbutils.fs.ls lists only the direct children of the path; for subfolders, use option 2 or the recursive sketch after this list.

  2. If you want to calculate the size of a directory that contains subfolders and files (e.g., XYZ), you can use the Unix du command in a notebook:

    %sh du -sh /dbfs/mnt/abc/xyz
    

    The -s flag summarizes the output, so this prints a single total for everything inside the XYZ directory.
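
As a pure-Scala alternative to the du command, here is a minimal sketch of a recursive directory walk, using the same example path as option 1. It relies on the isDir and size fields of the FileInfo objects that dbutils.fs.ls returns:

    // Minimal sketch: recursively sum file sizes under a path.
    // dbutils.fs.ls lists one level at a time, so we descend into directories.
    def dirSize(path: String): Long =
      dbutils.fs.ls(path).map { f =>
        if (f.isDir) dirSize(f.path) else f.size
      }.sum

    val totalGB = dirSize("/mnt/abc/xyz") / (1024.0 * 1024 * 1024)
    println(f"Total size: $totalGB%.2f GB")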

Remember that when data is read from DBFS, it is divided into input blocks, which are sent to different executors. By default, the block size is 128 MB. If you're dealing with external tables, you can control the size of output files using methods like ...
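
To answer the original question directly, here is a small sketch for reading the relevant settings from a notebook. spark.sql.files.maxPartitionBytes is the Spark setting behind the 128 MB default mentioned above; dfs.blocksize is the underlying Hadoop block-size setting and may not be defined depending on your storage layer:

    // Max bytes Spark packs into a single input partition when reading files;
    // this is the setting behind the 128 MB default mentioned above.
    println(spark.conf.get("spark.sql.files.maxPartitionBytes"))

    // Block size from the underlying Hadoop configuration, if any
    // (may be null on storage that does not define dfs.blocksize).
    println(spark.sparkContext.hadoopConfiguration.get("dfs.blocksize"))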

I hope this helps! Let me know if you need further assistance. 😊

 
