
Get Block Size

William_Scardua
Valued Contributor

Hi guys,

How can I get the block size? Any ideas?

Thank you

1 REPLY

Kaniz_Fatma
Community Manager

Hi @William_Scardua, to get the total directory size in Databricks, you can use the dbutils.fs.ls command. Here are a couple of options:

  1. To get the size of a specific directory (e.g., /mnt/abc/xyz), you can run the following Scala code:

    val path = "/mnt/abc/xyz"
    val fileList = dbutils.fs.ls(path)  // lists only the top level; it does not recurse
    val df = fileList.toDF()            // toDF() works in notebooks via the pre-imported spark.implicits._
    df.createOrReplaceTempView("adlsSize")
    spark.sql("SELECT SUM(size) / (1024 * 1024 * 1024) AS sizeInGB FROM adlsSize").show()
    

    This will display the size in gigabytes. Note that dbutils.fs.ls is not recursive, so this counts only files directly inside the directory (see the recursive sketch after this list).

  2. If you want the size of a directory tree that contains subfolders and files (e.g., /mnt/abc/xyz), you can use the Unix du command in a notebook:

    %sh du -h /dbfs/mnt/abc/xyz
    

    This prints a human-readable size for each subdirectory, with the grand total for the directory on the last line; use du -sh instead if you want only the total.
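Since dbutils.fs.ls only lists a single level, a recursive Scala variant can total a nested tree without shelling out. This is a minimal sketch, assuming (as dbutils.fs.ls does) that directory entries have a name ending in "/":

    // Recursively sum file sizes under a path; descend into entries
    // whose name ends with "/" (how dbutils.fs.ls marks directories).
    def dirSize(path: String): Long =
      dbutils.fs.ls(path).map { f =>
        if (f.name.endsWith("/")) dirSize(f.path) else f.size
      }.sum

    val totalGB = dirSize("/mnt/abc/xyz") / (1024.0 * 1024 * 1024)
    println(f"Total size: $totalGB%.2f GB")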

Remember that when data is read from DBFS, it is split into input blocks that are distributed across executors; by default, the block size is 128 MB. If you're dealing with external tables, you can control the size of output files using methods like ...
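If what you actually need is that input block size rather than a directory size, you can read (and override) it through the Spark conf. A minimal sketch, assuming file-based Spark SQL sources, where spark.sql.files.maxPartitionBytes holds the 128 MB default:

    // Inspect the maximum input partition (block) size for file reads.
    val current = spark.conf.get("spark.sql.files.maxPartitionBytes")
    println(s"maxPartitionBytes = $current")  // 134217728 bytes = 128 MB by default

    // Optionally lower it, e.g. to 64 MB, to produce more, smaller input partitions.
    spark.conf.set("spark.sql.files.maxPartitionBytes", 64L * 1024 * 1024)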

I hope this helps! Let me know if you need further assistance. 😊

 