Databricks throwing error "SQL DW failed to execute the JDBC query produced by the connector." while pushing a column with string length greater than 255

bhaumikg
New Contributor II

I am using Databricks to transform the data and then pushing the data into the data lake.

The data is pushed in successfully if the length of the string field is 255 characters or less, but it throws the following error if it is longer than that.

"SQL DW failed to execute the JDBC query produced by the connector.

Underlying SQLException(s): - com.microsoft.sqlserver.jdbc.SQLServerException: HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: HadoopSqlException: String or binary data would be truncated. [ErrorCode = 107090] [SQLState = S0001]"

I am using the following code to push the data into the data warehouse:

testdf.write
    .format("com.databricks.spark.sqldw")
    .option("url", sqlDwUrlSmall)
    .option("dbtable", "dbo.testAddress")
    .option("forward_spark_azure_storage_credentials", "True")
    .option("tempdir", tempDir)
    .mode("overwrite")
    .save()

This table has just one column and the length of this field is 4000.


jamesferrisjr
New Contributor II

@bhaumikg I think this link explains what's going on: https://kb.informatica.com/solution/23/pages/68/563577.aspx The Hive metastore column PARAM_VALUE in the table TABLE_PARAMS probably has a datatype that is 4000 characters in length, so when you attempt to load a record longer than that you get: "String or binary data would be truncated".

leonarbe
New Contributor II
I'm having a similar issue in Azure with MS SQL Server Data Warehouse (DWH), even though I have set the target column to nvarchar(4000) on the MS SQL Server DWH side.

This is the code:

spark.conf.set("spark.sql.parquet.writeLegacyFormat", "true")

dwDF.write.format("com.databricks.spark.sqldw")
    .option("url", sqlDwUrlSmall)
    .option("dbtable", TableName)
    .option("forward_spark_azure_storage_credentials", "True")
    .option("tempdir", tempDir)
    .mode("overwrite")
    .save()

I get the following error.

- com.microsoft.sqlserver.jdbc.SQLServerException: HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: HadoopSqlException: String or binary data would be truncated. [ErrorCode = 107090] [SQLState = S0001]

hasitha
New Contributor II

I have the same issue in my case.

CREATE TABLE [dbo].[SampleTable14]
(
    [firstname] [varchar](8000) NULL,
    [lastname] [nvarchar](300) NULL,
    [gender] [nvarchar](300) NULL,
    [location] [nvarchar](300) NULL,
    [subscription_type] [nvarchar](300) NULL
)
WITH
(
    DISTRIBUTION = ROUND_ROBIN,
    CLUSTERED COLUMNSTORE INDEX
)
GO

I also tried increasing the table column length to 1000 and the data length to 8000.

ZAIvR
New Contributor II

You have to set the maxStrLength option to a value bigger than the longest string in your source data:

synapseDF.write \
      .format("com.databricks.spark.sqldw") \
      .option("url", connStr ) \
      .mode( "append" ) \
      .option("tempDir", synapse_tempDir ) \
      .option("forwardSparkAzureStorageCredentials", "true") \
      .option("maxStrLength", "1024" ) \
      .option("dbTable", synapse_targetschema + "." + synapse_targettable ) \
      .save() 

Synapse creates a temporary external table while loading data with PolyBase, so even if you create your target table with columns of the appropriate width, you can still get truncation errors from this temp table if you don't set maxStrLength.
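If you don't know the longest string ahead of time, one option is to compute it from the DataFrame before writing. A minimal PySpark sketch, assuming the wide column is called "address" (that column name is illustrative; the DataFrame and connection variables are the ones from the snippet above):

from pyspark.sql import functions as F

# Longest string in the wide column (column name "address" is an assumption).
max_len = synapseDF.agg(F.max(F.length("address")).alias("len")).collect()[0]["len"]

(synapseDF.write
    .format("com.databricks.spark.sqldw")
    .option("url", connStr)
    .option("forwardSparkAzureStorageCredentials", "true")
    .option("tempDir", synapse_tempDir)
    .option("maxStrLength", str(max_len))  # must cover the widest string value being loaded
    .option("dbTable", synapse_targetschema + "." + synapse_targettable)
    .mode("append")
    .save())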

ImranShaik
New Contributor II

I tried this and I still have the same error. My string length is 1900 and I gave maxStrLength a value of 3000, but it is still not working.

RafaelCruz
New Contributor II

Hello guys, I'm facing this error. It is not exactly the same as the one referred to above, but since it comes from the same code, I thought you might be able to help.

Thanks in advance 🙂 I'm following the tutorial at this link:

https://docs.microsoft.com/en-us/azure/azure-databricks/databricks-extract-load-sql-data-warehouse
com.databricks.spark.sqldw.SqlDWSideException: SQL DW failed to execute the JDBC query produced by the connector.
Underlying SQLException(s):
  - com.microsoft.sqlserver.jdbc.SQLServerException: External file access failed due to internal error: 'Error occurred while accessing HDFS: Java exception raised on call to HdfsBridge_IsDirExist. Java exception message: HdfsBridge::isDirExist - Unexpected error encountered checking whether directory exists or not: StorageException: This request is not authorized to perform this operation.' [ErrorCode = 105019] [SQLState = S0001]

bhaumikg
New Contributor II

As suggested by ZAIvR, please use append mode and provide maxStrLength while pushing the data. Overwrite may not work with this unless the Databricks team has fixed the issue.
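For reference, a rough sketch of that applied to the write from the original question (only a sketch; 4000 matches the column width mentioned above, so adjust the value to your own data):

(testdf.write
    .format("com.databricks.spark.sqldw")
    .option("url", sqlDwUrlSmall)
    .option("dbtable", "dbo.testAddress")
    .option("forward_spark_azure_storage_credentials", "True")
    .option("tempdir", tempDir)
    .option("maxStrLength", "4000")  # at least as long as the longest string being loaded
    .mode("append")                  # append instead of overwrite, as suggested by ZAIvR
    .save())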
