Data Engineering
Databricks throws the error "SQL DW failed to execute the JDBC query produced by the connector." when pushing a column with a string length greater than 255

bhaumikg
New Contributor II

I am using Databricks to transform the data and then push it into the data lake.

The data gets pushed in if the length of the string field is 255 characters or less, but it throws the following error if it is longer:

"SQL DW failed to execute the JDBC query produced by the connector.

Underlying SQLException(s): - com.microsoft.sqlserver.jdbc.SQLServerException: HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: HadoopSqlException: String or binary data would be truncated. [ErrorCode = 107090] [SQLState = S0001"

I am using the following code to push the data into the data warehouse:

(testdf.write
    .format("com.databricks.spark.sqldw")
    .option("url", sqlDwUrlSmall)
    .option("dbtable", "dbo.testAddress")
    .option("forward_spark_azure_storage_credentials", "True")
    .option("tempdir", tempDir)
    .mode("overwrite")
    .save())

This table has just one column and the length of this field is 4000.

7 REPLIES

jamesferrisjr
New Contributor II

@bhaumikg I think this link explains what's going on: https://kb.informatica.com/solution/23/pages/68/563577.aspx The Hive metastore column PARAM_VALUE in table TABLE_PARAMS probably has a datatype that is 4000 characters in length. Thus, when you attempt to load a record greater than that, you get: "String or binary data would be truncated".
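If you want to verify that, a query along these lines against the metastore's backing database should show the declared width (a sketch assuming a SQL Server-backed metastore; adjust for MySQL or Postgres):

-- Hypothetical verification query against the Hive metastore's backing database
-- (SQL Server assumed); TABLE_PARAMS / PARAM_VALUE are the metastore names
-- mentioned in the linked article.
SELECT CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'TABLE_PARAMS'
  AND COLUMN_NAME = 'PARAM_VALUE';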

leonarbe
New Contributor II
I'm having a similar issue in Azure with MS SQL Server Data Warehouse (DWH), even though I have set the target column to nvarchar(4000) on the DWH side.

This is the code:

spark.conf.set("spark.sql.parquet.writeLegacyFormat", "true")

(dwDF.write.format("com.databricks.spark.sqldw")
    .option("url", sqlDwUrlSmall)
    .option("dbtable", TableName)
    .option("forward_spark_azure_storage_credentials", "True")
    .option("tempdir", tempDir)
    .mode("overwrite")
    .save())

I get the following error:

com.microsoft.sqlserver.jdbc.SQLServerException: HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: HadoopSqlException: String or binary data would be truncated. [ErrorCode = 107090] [SQLState = S0001]

hasitha
New Contributor II

I am having the same issue in this case. Here is my table definition:

CREATE TABLE [dbo].[SampleTable14]
(
    [firstname] [varchar](8000) NULL,
    [lastname] [nvarchar](300) NULL,
    [gender] [nvarchar](300) NULL,
    [location] [nvarchar](300) NULL,
    [subscription_type] [nvarchar](300) NULL
)
WITH
(
    DISTRIBUTION = ROUND_ROBIN,
    CLUSTERED COLUMNSTORE INDEX
)
GO

I also tried increasing the table length to 1000 and the data length to 8000.

ZAIvR
New Contributor II

You have to set the maxStrLength option to a value bigger than the longest string in your source data:

synapseDF.write \
      .format("com.databricks.spark.sqldw") \
      .option("url", connStr ) \
      .mode( "append" ) \
      .option("tempDir", synapse_tempDir ) \
      .option("forwardSparkAzureStorageCredentials", "true") \
      .option("maxStrLength", "1024" ) \
      .option("dbTable", synapse_targetschema + "." + synapse_targettable ) \
      .save() 

Synapse creates a temporary external table while loading data with PolyBase, so even if you create your target table with columns of the appropriate width, you can still get truncation errors from this temp table if you don't set maxStrLength.
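Roughly, that staging table looks something like this (an illustration of the mechanism, not the connector's exact generated DDL; all object names here are made up):

-- Illustrative only: the connector stages data through a PolyBase external table
-- along these lines. Every Spark StringType lands as NVARCHAR(maxStrLength), so
-- with the connector's default (256) any longer value is truncated at this staging
-- step, no matter how wide the final target column is.
CREATE EXTERNAL TABLE [tmp_spark_staging_example]   -- hypothetical name
(
    [firstname] NVARCHAR(1024)   -- width taken from maxStrLength = "1024"
)
WITH
(
    LOCATION = '/tempdir/...',           -- placeholder
    DATA_SOURCE = tempdir_data_source,   -- placeholder
    FILE_FORMAT = parquet_file_format    -- placeholder
);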

ImranShaik
New Contributor II

I tried this and still get the same error. My string length is 1900 and I set the maxStrLength value to 3000, but it is still not working.

RafaelCruz
New Contributor II

Hello guys, I'm facing this error. It is not exactly the same as the one referred to above, but since it involves the same code, I thought you might be able to help. Thanks in advance 🙂 I'm following the tutorial at this link:
https://docs.microsoft.com/en-us/azure/azure-databricks/databricks-extract-load-sql-data-warehouse
com.databricks.spark.sqldw.SqlDWSideException: SQL DW failed to execute the JDBC query produced by the connector.
Underlying SQLException(s):
  - com.microsoft.sqlserver.jdbc.SQLServerException: External file access failed due to internal error: 'Error occurred while accessing HDFS: Java exception raised on call to HdfsBridge_IsDirExist. Java exception message:
HdfsBridge::isDirExist - Unexpected error encountered checking whether directory exists or not: StorageException: This request is not authorized to perform this operation.' [ErrorCode = 105019] [SQLState = S0001]
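For reference, the tutorial configures storage access like this before the write (the account name and key are placeholders); the "This request is not authorized" message seems to point at this storage access step not taking effect:

# From the linked tutorial (placeholders kept): the storage account key must be set
# so that both Spark and the SQL DW instance can reach the tempdir container.
spark.conf.set(
    "fs.azure.account.key.<storage-account-name>.blob.core.windows.net",
    "<storage-account-access-key>")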

bhaumikg
New Contributor II

As suggested by ZAIvR, use append mode and provide maxStrLength while pushing the data. Overwrite may not work with this unless the Databricks team has fixed the issue.
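Putting the suggestions together, the write would look something like this (a sketch reusing the variables from my original post, with maxStrLength sized to the 4000-character column):

# Sketch combining the suggestions in this thread: append mode plus maxStrLength
# sized to the widest string column (4000 in my case). sqlDwUrlSmall and tempDir
# are the same variables defined in the original post.
(testdf.write
    .format("com.databricks.spark.sqldw")
    .option("url", sqlDwUrlSmall)
    .option("dbtable", "dbo.testAddress")
    .option("forward_spark_azure_storage_credentials", "True")
    .option("tempdir", tempDir)
    .option("maxStrLength", "4000")
    .mode("append")
    .save())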
