
Query fails with 'Error occurred while deserializing arrow data' on Databricks SQL with Channel set to Preview

kilaki
New Contributor II

I noticed that a query built from inline selects and joins fails on the client with 'Error occurred while deserializing arrow data'. That is, the query succeeds on Databricks, but the client (DBeaver, AtScale) receives an error.

(Screenshot: Screen Shot 2023-01-24 at 2.08.54 PM)

The error only occurs with a Databricks SQL setup whose channel is set to Preview.

(Screenshot: Screen Shot 2023-01-24 at 2.11.20 PM)

The same query works fine with a Databricks SQL setup whose channel is set to Current.

(Screenshot: Screen Shot 2023-01-24 at 2.03.21 PM)

The issue can be reproduced with the following query against the samples.tpch.nation table:

SELECT
  SUM(`t_36`.`noq_empty`) `ne36`,
  `t_36`.`nname` `nname`,
  SUM(`t_94`.`c0`) `c94`
FROM
  (
    SELECT
      `t_59`.`noq_empty_gbakc13` `noq_empty`,
      `t_59`.`n_gbakc10` `nname`
    FROM
      (
        SELECT
          SUM(ntn1.n_nationkey) `noq_empty_gbakc13`,
          ntn1.n_name `n_gbakc10`
        FROM
          samples.tpch.nation ntn1
        WHERE
          true
        GROUP BY
          `n_gbakc10`
      ) `t_59`
  ) `t_36`
  JOIN (
    SELECT
      `t_135`.`c0` `c0`,
      `t_135`.`nname` `nname`
    FROM
      (
        SELECT
          `t_134`.`ci` `c0`,
          `t_134`.`nname` `nname`
        FROM
          (
            SELECT
              `t_133`.`ci_gbakc2` `ci`,
              `t_133`.`sono_gbakc3` `nname`
            FROM
              (
                SELECT
                  SUM(ntn10.n_nationkey) `ci_gbakc2`,
                  ntn10.n_name `sono_gbakc3`
                FROM
                  samples.tpch.nation ntn10
                WHERE
                  true
                GROUP BY
                  `sono_gbakc3`
                HAVING
                  SUM(`ntn10`.`n_nationkey`) > 1
              ) `t_133`
          ) `t_134`
        WHERE
          `t_134`.`ci` > 1
      ) `t_135`
    WHERE
      `t_135`.`c0` > 1
  ) `t_94` ON `t_94`.`nname` = `t_36`.`nname`
GROUP BY
  `t_36`.`nname`
HAVING
  SUM(`t_94`.`c0`) > 10

Is the way the result-set object is serialized going to change in upcoming releases of DBSQL, or is this a bug?
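
One possible way to check whether Arrow result serialization is involved, assuming the client connects through the Databricks JDBC driver (as DBeaver typically does): that driver has an `EnableArrow` connection property, and setting it to 0 asks the driver to fetch results without Arrow. The sketch below only builds such a URL; the helper name, host, and warehouse path are hypothetical, and whether this setting avoids the Preview-channel error has not been verified here.

```python
def with_arrow_disabled(jdbc_url: str) -> str:
    """Return jdbc_url with EnableArrow=0 set exactly once."""
    # Databricks JDBC URLs carry options as ';key=value' pairs after the host part.
    head, *options = jdbc_url.rstrip(";").split(";")
    # Drop any existing EnableArrow setting so the new value is unambiguous.
    kept = [
        opt for opt in options
        if opt and not opt.lower().startswith("enablearrow=")
    ]
    return ";".join([head, *kept, "EnableArrow=0"])


# Hypothetical workspace host and warehouse path, for illustration only.
url = (
    "jdbc:databricks://example.cloud.databricks.com:443/default;"
    "transportMode=http;ssl=1;httpPath=/sql/1.0/warehouses/abc123"
)
print(with_arrow_disabled(url))
```

If the query succeeds on the Preview channel with Arrow disabled, that would point at the Arrow serialization path rather than at the query itself.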

3 REPLIES

Hubert-Dudek
Esteemed Contributor III

And where can we find samples.tpch.nation, or a notebook that creates it?

kilaki
New Contributor II

It's one of the sample datasets provided by Databricks

https://docs.databricks.com/dbfs/databricks-datasets.html

franco_patano
New Contributor III

I opened an ES on this; it looks like an issue with the Preview channel. Thanks for your help!

Franco Patano
Strategic Data and AI Advisor