Data Engineering

Query fails with 'Error occurred while deserializing arrow data' on Databricks SQL with Channel set to Preview

kilaki
New Contributor II

A query built from inline selects and joins fails on the client with 'Error occurred while deserializing arrow data'. That is, the query succeeds on Databricks, but the client (DBeaver, AtScale) receives an error.

[Screenshot]

The error occurs only on a Databricks SQL warehouse with its channel set to Preview.

[Screenshot]

The same query works fine on a Databricks SQL warehouse with its channel set to Current.

[Screenshot]

The issue can be reproduced with the following query against the samples.tpch.nation table:

SELECT
  SUM(`t_36`.`noq_empty`) `ne36`,
  `t_36`.`nname` `nname`,
  SUM(`t_94`.`c0`) `c94`
FROM
  (
    SELECT
      `t_59`.`noq_empty_gbakc13` `noq_empty`,
      `t_59`.`n_gbakc10` `nname`
    FROM
      (
        SELECT
          SUM(ntn1.n_nationkey) `noq_empty_gbakc13`,
          ntn1.n_name `n_gbakc10`
        FROM
          samples.tpch.nation ntn1
        WHERE
          true
        GROUP BY
          `n_gbakc10`
      ) `t_59`
  ) `t_36`
  JOIN (
    SELECT
      `t_135`.`c0` `c0`,
      `t_135`.`nname` `nname`
    FROM
      (
        SELECT
          `t_134`.`ci` `c0`,
          `t_134`.`nname` `nname`
        FROM
          (
            SELECT
              `t_133`.`ci_gbakc2` `ci`,
              `t_133`.`sono_gbakc3` `nname`
            FROM
              (
                SELECT
                  SUM(ntn10.n_nationkey) `ci_gbakc2`,
                  ntn10.n_name `sono_gbakc3`
                FROM
                  samples.tpch.nation ntn10
                WHERE
                  true
                GROUP BY
                  `sono_gbakc3`
                HAVING
                  SUM(`ntn10`.`n_nationkey`) > 1
              ) `t_133`
          ) `t_134`
        WHERE
          `t_134`.`ci` > 1
      ) `t_135`
    WHERE
      `t_135`.`c0` > 1
  ) `t_94` ON `t_94`.`nname` = `t_36`.`nname`
GROUP BY
  `t_36`.`nname`
HAVING
  SUM(`t_94`.`c0`) > 10
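
The Preview-versus-Current comparison can also be scripted instead of clicking through a SQL client. Below is a minimal sketch, assuming the `databricks-sql-connector` Python package is installed and that the query above is saved to a file. The environment variable names (`DBSQL_SERVER_HOSTNAME`, `DBSQL_TOKEN`, `DBSQL_PREVIEW_HTTP_PATH`, `DBSQL_CURRENT_HTTP_PATH`), the file name `repro.sql`, and the helpers `probe` and `compare_channels` are all hypothetical and only for illustration. Whether the Python connector hits the same deserialization error as DBeaver or AtScale is not confirmed here; the script only reports what each warehouse returns.

```python
import os
from typing import Callable, Tuple


def probe(connect: Callable[[], object], query: str) -> Tuple[str, str]:
    """Run `query` on a fresh connection and fetch every row.

    Returns ("ok", "<n> rows") if the full result set could be fetched,
    or ("error", "<exception type>: <message>") if anything failed.
    Fetching matters: the reported error appears when the client
    deserializes the result, not when the query is executed.
    """
    try:
        conn = connect()
        try:
            cursor = conn.cursor()
            try:
                cursor.execute(query)
                rows = cursor.fetchall()
            finally:
                cursor.close()
        finally:
            conn.close()
    except Exception as exc:  # report any client-side failure
        return ("error", f"{type(exc).__name__}: {exc}")
    return ("ok", f"{len(rows)} rows")


def compare_channels(query_file: str = "repro.sql") -> None:
    """Run the same query against a Preview and a Current warehouse."""
    # Imported here so that `probe` can be used without the connector.
    from databricks import sql  # pip install databricks-sql-connector

    # Hypothetical environment variable names; adjust to your setup.
    host = os.environ["DBSQL_SERVER_HOSTNAME"]  # hostname only, no https://
    token = os.environ["DBSQL_TOKEN"]
    warehouses = {
        "preview": os.environ["DBSQL_PREVIEW_HTTP_PATH"],
        "current": os.environ["DBSQL_CURRENT_HTTP_PATH"],
    }

    with open(query_file, encoding="utf-8") as fh:
        query = fh.read()

    for channel, http_path in warehouses.items():
        status, detail = probe(
            lambda: sql.connect(
                server_hostname=host,
                http_path=http_path,
                access_token=token,
            ),
            query,
        )
        print(f"{channel}: {status} ({detail})")
```

Calling `compare_channels()` prints one line per warehouse, which makes it easy to attach the outcome for both channels to a support ticket.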

Is there going to be a change in how the result-set object is serialized in upcoming releases of DBSQL, or is this a bug?

3 REPLIES

Hubert-Dudek
Esteemed Contributor III

And where can we find samples.tpch.nation, or a notebook that creates it?

kilaki
New Contributor II

It's one of the sample datasets provided by Databricks:

https://docs.databricks.com/dbfs/databricks-datasets.html

franco_patano
Databricks Employee

Opened an ES on this; it looks like an issue with the Preview channel. Thanks for your help!

Franco Patano
Strategic Data and AI Advisor
