Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Saving Number field as String in Databricks

Manju1202
New Contributor II

Is there any risk in saving a Number field as a String?

Will we lose any functionality/feature if we save it as a String?

Will it have any impact on performance?

3 REPLIES

pvignesh92
Honored Contributor

Hi @Manju Chugani. Yes. In short, storing a column as a string is not recommended if all of its values are expected to be numbers.

Here are some of the reasons:

  1. Storage Space: Storing numbers as strings can take up more storage space than storing them as numbers. A string needs roughly one byte per digit plus length overhead, whereas an integer or long has a compact, fixed-width binary representation.
  2. Performance: Using strings can be slower than using numbers when performing calculations or other operations on the data. Strings must be converted to numbers before any calculation, which adds overhead.
  3. Sorting and Filtering: Sorting and filtering can be slower with strings than with numbers, because strings are compared character by character rather than in a single numeric comparison. The resulting order is also lexicographic, so "10" sorts before "9".
  4. Type Checking: Using strings makes it harder to ensure that the data is of the correct type, because a string column accepts any text. This can lead to errors and inconsistencies in the data.
  5. Data Integrity: Storing values as strings increases the risk of data integrity issues, such as input errors or unexpected formats. This makes the data harder to analyze and can lead to inaccurate results.
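Points 1, 4 and 5 can be illustrated with a small sketch. This is plain Python rather than Spark, and the helper names (`text_bytes`, `long_bytes`, `invalid_numbers`) and sample values are made up for illustration. The byte counts are raw sizes only and ignore the length prefixes, encoding, and compression that a columnar file format applies, so treat them as a rough intuition rather than a measurement of actual table size.

```python
import struct


def text_bytes(n: int) -> int:
    """Raw size of a number stored as UTF-8 text (one byte per ASCII digit)."""
    return len(str(n).encode("utf-8"))


def long_bytes(n: int) -> int:
    """Raw size of the same number stored as a fixed-width 64-bit integer."""
    return len(struct.pack("<q", n))


def invalid_numbers(raw_values):
    """Return the values that cannot be parsed as integers."""
    bad = []
    for raw in raw_values:
        try:
            int(raw)
        except ValueError:
            bad.append(raw)
    return bad


# Storage: text grows with the number of digits, a long is always 8 bytes.
for value in (7, 1234, 9876543210):
    print(value, text_bytes(value), long_bytes(value))

# Type checking / integrity: a string column happily holds non-numeric text.
raw_column = ["42", "17", "12a", "", "1,000"]  # hypothetical sample data
bad_values = invalid_numbers(raw_column)
print(bad_values)  # ['12a', '', '1,000']
```

With a numeric column, values such as `12a` are rejected or flagged at write time instead of being discovered later during analysis.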

Thank you for the response, the info is very helpful.

Do you see any issue with mathematical functions other than performance? Will the result of any mathematical function differ for a string versus a number?

pvignesh92
Honored Contributor

Hi @Manju Chugani, mathematical functions will definitely be a concern. In my experience, we sometimes store dates as strings, and greater-than or less-than comparisons work fine (at least for sortable formats such as yyyy-MM-dd). But with min and max, integers stored as strings can misbehave, because strings are compared character by character rather than by numeric value.

You can try this by storing some integer values as strings in a DataFrame and running sum, min, max, and a few other functions to see the differences.
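As a rough illustration of the experiment suggested above, here is a minimal sketch in plain Python instead of a Spark DataFrame, with made-up sample values. It only demonstrates lexicographic versus numeric ordering; how Spark itself handles `sum` on a string column (for example, through implicit casting) is not shown here and should be checked in your own environment.

```python
# The same four values, once as strings and once as integers.
as_strings = ["9", "10", "100", "25"]  # hypothetical sample data
as_numbers = [int(s) for s in as_strings]

# Strings are compared character by character, so "1..." sorts before "9".
print(sorted(as_strings))  # ['10', '100', '25', '9']
print(min(as_strings))     # 10
print(max(as_strings))     # 9

# Integers are compared by value.
print(sorted(as_numbers))  # [9, 10, 25, 100]
print(min(as_numbers))     # 9
print(max(as_numbers))     # 100

# Arithmetic needs real numbers; the strings must be cast first.
print(sum(as_numbers))     # 144
```

The string version reports "10" as the minimum and "9" as the maximum, which is the kind of misbehavior described above.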
