Hi @PrasSabb_97245, Delta tables store and process data in Apache Spark™ using the Delta Lake format. Unlike plain Parquet directories, Delta tables are schema-on-write: the schema is defined when the table is created and enforced every time data is written. The table's transaction log records every data file that belongs to the table, which is also what makes metadata questions like "how big is this table?" cheap to answer.
One of the benefits of Delta tables is that they can be stored in various cloud storage services, such as Amazon S3, Azure Data Lake Storage, and Google Cloud Storage. You can use Auto Loader's cloud_files source to incrementally ingest files from these services into your Delta tables. Auto Loader is a Databricks feature that also integrates with Delta Live Tables (DLT), a framework for building reliable and maintainable data pipelines on Delta tables.
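As a concrete illustration, here is a minimal sketch of a DLT pipeline that ingests JSON files with the cloud_files (cloudFiles) source. The bucket path and table name are hypothetical, and the dlt module and implicit spark session are only available inside a Databricks DLT pipeline, so that part is guarded:

```python
# Hedged sketch of a Delta Live Tables pipeline using Auto Loader's
# cloudFiles source. The S3 path and table name below are hypothetical.

SOURCE_PATH = "s3://my-bucket/raw-events/"  # hypothetical landing zone

def cloud_files_options(file_format: str) -> dict:
    """Reader options for the cloudFiles (Auto Loader) source."""
    return {"cloudFiles.format": file_format}

if __name__ == "__main__":
    import dlt  # only importable inside a Databricks DLT pipeline

    @dlt.table(name="raw_events")
    def raw_events():
        # `spark` is provided implicitly in Databricks notebooks/pipelines.
        reader = spark.readStream.format("cloudFiles")
        for key, value in cloud_files_options("json").items():
            reader = reader.option(key, value)
        return reader.load(SOURCE_PATH)
```

Auto Loader tracks which files it has already processed, so reruns only pick up new objects in the source path.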
To get the raw size of a Delta table in S3, you can use one of the following methods:
- Query the table's transaction log. In Scala, the DeltaLog object tracks the files and partitions of your Delta table, and its snapshot.sizeInBytes attribute gives the total size of the table's current data files in bytes. From SQL or Python, running DESCRIBE DETAIL on the table returns the same figure in its sizeInBytes column.
This method works for both Python and Scala notebooks.
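A minimal PySpark sketch of the DESCRIBE DETAIL approach might look like this. The S3 path is hypothetical, and the cluster is assumed to already have credentials for the bucket:

```python
# Hedged sketch: read a Delta table's size from its transaction log metadata
# via DESCRIBE DETAIL. The S3 path used below is hypothetical.

def delta_table_size(spark, table_path: str) -> int:
    """Return sizeInBytes for a Delta table at `table_path`.

    DESCRIBE DETAIL reports the total size of the table's *current*
    data files (not historical versions retained for time travel).
    """
    row = spark.sql(f"DESCRIBE DETAIL delta.`{table_path}`").collect()[0]
    return row["sizeInBytes"]

def format_bytes(num_bytes: float) -> str:
    """Render a byte count as a human-readable string."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if num_bytes < 1024 or unit == "TiB":
            return f"{num_bytes:.2f} {unit}"
        num_bytes /= 1024

if __name__ == "__main__":
    from pyspark.sql import SparkSession  # requires a Spark runtime
    spark = SparkSession.builder.getOrCreate()
    size = delta_table_size(spark, "s3://my-bucket/path/to/table")
    print(format_bytes(size))
```

Note that sizeInBytes covers only files referenced by the latest table version; the S3 prefix may hold more data if old versions have not yet been removed by VACUUM.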
- Use the delta-lake-reader package to read your Delta table from S3 without a Spark cluster. delta-lake-reader is a Python library that reads Delta tables stored in various cloud storage services into an in-memory DataFrame (for example pandas, via PyArrow). Keep in mind that the in-memory size of the resulting DataFrame will generally differ from the table's raw size on disk.
This method works for Python notebooks.
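A sketch of this route follows. The class and method names (DeltaTable, to_pandas) are taken from the delta-lake-reader README and should be checked against the version you install; the S3 path is hypothetical:

```python
# Hedged sketch using the delta-lake-reader package
# (pip install "delta-lake-reader[s3]"). API names follow the package's
# README and may differ between versions; the S3 path is hypothetical.

def dataframe_mem_size(df) -> int:
    """Approximate in-memory size of a pandas DataFrame, in bytes."""
    return int(df.memory_usage(deep=True).sum())

if __name__ == "__main__":
    from deltalake import DeltaTable  # provided by delta-lake-reader

    df = DeltaTable("s3://my-bucket/path/to/table").to_pandas()
    print(f"rows: {len(df)}, in-memory bytes: {dataframe_mem_size(df)}")
```

Because this loads the whole table into memory, it is best suited to small or medium tables; for the on-disk size of a large table, prefer the DESCRIBE DETAIL method above.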
I hope this helps you with your task.