Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Cannot up cast sizeInBytes from string to bigint

brickster
New Contributor II

I am creating a basic Delta table with a SQL CREATE TABLE statement:

CREATE TABLE test_transact (transaction_id string, post_date date)

Running this query throws the error "Cannot up cast sizeInBytes from string to bigint".

Even if I create a DataFrame and save it as a table,

df.write.format("delta").mode("overwrite").saveAsTable("test_transact")

the same error appears, even though no column named sizeInBytes is used.

However, I am able to create a temp view using df.createOrReplaceTempView("test_transact").

My cluster DBR version is 14.3 LTS ML (Spark 3.5.0, Scala 2.12)

Has anyone encountered this issue? I appreciate your help. Here is the screenshot:

7 REPLIES

nickmerritt
New Contributor II

Without being able to see the previous cells in the notebook, it's not possible to pinpoint the exact root cause, but the error indicates that the issue lies with an implicit cast of transaction_id. sizeInBytes is not being referred to as a field; it is a field attribute based on the data type, and it cannot be manipulated by the cited SQL statement or by the df method for creating a table. Per your report, the df method that creates a view is more flexible. For table creation, the verbose error suggests that you handle your casting explicitly at ingestion (which I assume happens in an earlier cell of the same notebook).

https://docs.databricks.com/en/error-messages/error-classes.html#cannot_up_cast_datatype

nickmerritt
New Contributor II

It looks like there's a mismatch in the data type of a field between the source and the target. Perhaps your CREATE statement is implicitly casting transaction_id? I'm just going off similar discussions here in the community, and I cannot speak to the cells before the one in your screenshot, but the verbose error message wants you to explicitly cast something (I suspect transaction_id) prior to the CREATE statement, i.e. earlier in your notebook, at ingestion.

https://docs.databricks.com/en/error-messages/error-classes.html#cannot_up_cast_datatype

Brahmareddy
Honored Contributor

Hi @brickster, How are you doing today?

As I understand it, try dropping any existing table or metadata related to test_transact before creating it, to avoid conflicts. Consider using a different table name, or creating the table in a new schema, to bypass potential metadata issues. Explicitly define the schema of your DataFrame to prevent type inference problems. Restart your cluster to clear any cached sessions or metadata that may be causing the error. Lastly, check the DBR version's compatibility with Delta Lake, and consider trying a different DBR version if the issue persists.
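The suggestion to define the schema explicitly could look roughly like the sketch below. The column names and types come from the CREATE TABLE statement in the question; the sample rows and the helper name `build_transactions_df` are made up for illustration, and `spark` is assumed to be the notebook's existing SparkSession. Whether this removes the error depends on the actual root cause.

```python
import datetime as dt

# Schema as a DDL string, matching the CREATE TABLE statement in the question.
SCHEMA_DDL = "transaction_id STRING, post_date DATE"

# Hypothetical sample rows; values already have the Python types the schema expects.
ROWS = [
    ("t-001", dt.date(2024, 9, 1)),
    ("t-002", dt.date(2024, 9, 2)),
]


def build_transactions_df(spark, rows=ROWS, schema=SCHEMA_DDL):
    """Create a DataFrame with an explicit schema instead of relying on inference."""
    return spark.createDataFrame(rows, schema=schema)
```

In a notebook, `build_transactions_df(spark).printSchema()` should then report the declared column types rather than inferred ones.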

Give it a try and let me know if it works. Have a good day.

Regards,

Brahma

filipniziol
Contributor III

Hi @brickster ,

The error message in the screenshot indicates that there is an issue with casting sizeInBytes from STRING to BIGINT related to the SnapshotState in Delta Lake. This is not caused by the columns you are trying to create in your Delta table but rather relates to internal metadata managed by Delta Lake.

This most likely means that the metadata of the Delta table is corrupt. For example, you created the table and then dropped it, but some leftover files remain in the table location.

You need to clean up the table location before recreating the table, to make sure no older files remain.

Here are the steps:

1. Create the table (this is already done):

%sql
CREATE TABLE test_transact (transaction_id string, post_date date)

2. Check the table location:

%sql
DESCRIBE DETAIL test_transact;

(screenshot: filipniziol_0-1725184349733.png)

The table location shown in this output is what you want to clean up before recreating the table.

3. Drop the table. If the metadata is corrupt, some files will not be removed:

%sql
DROP TABLE IF EXISTS test_transact

4. List the contents of the location (copy the path from step 2):

display(dbutils.fs.ls("dbfs:/user/hive/warehouse/test_transact"))

If the metadata is corrupt, you will still see some files there after DROP TABLE:

(screenshot: filipniziol_1-1725184567975.png)

5. Clean up the table location:

dbutils.fs.rm("dbfs:/user/hive/warehouse/test_transact", recurse=True)

6. Now try to recreate the table and insert the data.
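Steps 4 and 5 could be wrapped in a small helper, sketched below under some assumptions. The function names, the `dry_run` flag, and the list of paths it refuses to delete are illustrative choices, not part of the original answer. The `ls` and `rm` arguments are expected to be `dbutils.fs.ls` and `dbutils.fs.rm` when run in a notebook, and any listing error is treated as "location already gone", which is a simplification (a permissions error would be hidden too).

```python
from typing import Callable, Iterable, List


def leftover_files(ls: Callable[[str], Iterable], location: str) -> List[str]:
    """Return the paths still present under `location`, or [] if listing fails."""
    try:
        # dbutils.fs.ls returns entries with a `.path` attribute; fall back to
        # str() so that plain strings also work.
        return [getattr(entry, "path", str(entry)) for entry in ls(location)]
    except Exception:
        # Assumption: a failed listing means the location no longer exists.
        return []


def clean_table_location(ls, rm, location: str, dry_run: bool = True) -> List[str]:
    """Report leftovers under a dropped table's location and optionally delete them."""
    # Illustrative guard against deleting the warehouse root by mistake.
    if location.rstrip("/") in ("", "dbfs:", "dbfs:/user/hive/warehouse"):
        raise ValueError(f"refusing to delete suspicious location: {location!r}")
    leftovers = leftover_files(ls, location)
    if leftovers and not dry_run:
        rm(location, True)  # second positional argument is `recurse`
    return leftovers
```

In a notebook, after DROP TABLE, this might be called as `clean_table_location(dbutils.fs.ls, dbutils.fs.rm, "dbfs:/user/hive/warehouse/test_transact", dry_run=False)`. With the default `dry_run=True` it only reports what it found, so the listing can be checked before anything is removed. Before dropping the table, the path itself can be read with `spark.sql("DESCRIBE DETAIL test_transact").first()["location"]` instead of copying it by hand.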
