<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic databricks-connect throws an exception when showing a dataframe with json content in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/databricks-connect-throws-an-exception-when-showing-a-dataframe/m-p/34709#M25435</link>
    <description>Forum thread: databricks-connect throws a Py4JJavaError (java.lang.ClassCastException on JSONOptions.lineSeparatorInRead) when calling show() on a DataFrame read from JSON, running from VS Code against an Azure Databricks runtime 10.4 cluster.</description>
    <pubDate>Fri, 12 Aug 2022 15:07:43 GMT</pubDate>
    <dc:creator>KarimSegura</dc:creator>
    <dc:date>2022-08-12T15:07:43Z</dc:date>
    <item>
      <title>databricks-connect throws an exception when showing a dataframe with json content</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-connect-throws-an-exception-when-showing-a-dataframe/m-p/34709#M25435</link>
      <description>&lt;P&gt;I'm facing an issue when I want to show a dataframe with JSON content.&lt;/P&gt;&lt;P&gt;This happens when the script runs via databricks-connect from VS Code.&lt;/P&gt;&lt;P&gt;I would appreciate any help or guidance to get this running as it should.&lt;/P&gt;&lt;P&gt;Thanks in advance.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This is how the cluster is configured:&lt;/P&gt;&lt;P&gt;Cluster: Azure Databricks runtime 10.4&lt;/P&gt;&lt;P&gt;Workers: 2-8 Standard_DS3_v2, 14 GB memory, 4 cores&lt;/P&gt;&lt;P&gt;Driver: Standard_DS3_v2, 14 GB memory, 4 cores&lt;/P&gt;&lt;P&gt;Spark config:&lt;/P&gt;&lt;P&gt;spark.databricks.service.server.enabled true&lt;/P&gt;&lt;P&gt;spark.databricks.service.port 8787&lt;/P&gt;&lt;P&gt;spark.hadoop.datanucleus.connectionPoolingType hikari&lt;/P&gt;&lt;P&gt;spark.databricks.delta.preview.enabled true&lt;/P&gt;&lt;P&gt;On my local computer I installed databricks-connect with pip install "databricks-connect==10.4.*" and configured it as the documentation indicates:&amp;nbsp;&lt;A href="https://docs.microsoft.com/en-us/azure/databricks/dev-tools/databricks-connect#set-up-the-client" target="_blank"&gt;Azure databricks-connect setup&lt;/A&gt;&lt;/P&gt;&lt;P&gt;When I run databricks-connect test, it passes without any failure.&lt;/P&gt;&lt;P&gt;The code I'm trying to run is this:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

# Session and context are created explicitly; they are not predefined in a local script
spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

nested_row = ['{"name":"Yin","address":{"city":"Columbus","state":"Ohio"}}']

# Explicit schema for the nested JSON document
nested_struct = StructType([
    StructField("address", StructType([
        StructField("city", StringType(), True),
        StructField("state", StringType(), True)
    ]), True),
    StructField("name", StringType(), True)
])

nested_rdd = sc.parallelize(nested_row)

df_json = spark.read.json(nested_rdd, nested_struct)

df_json.printSchema()
df_json.show()&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Everything runs fine up to printSchema(), but show() then throws this exception:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;Traceback (most recent call last):
  File "c:\Data\projects\vcode\gbrx-dbconnect\dbc1.py", line 76, in &amp;lt;module&amp;gt;
    df_json.show()
  File "c:\Data\projects\vcode\gbrx-dbconnect\venv\lib\site-packages\pyspark\sql\dataframe.py", line 502, in show
    print(self._jdf.showString(n, 20, vertical))
  File "c:\Data\projects\vcode\gbrx-dbconnect\venv\lib\site-packages\py4j\java_gateway.py", line 1304, in __call__
    return_value = get_return_value(
  File "c:\Data\projects\vcode\gbrx-dbconnect\venv\lib\site-packages\pyspark\sql\utils.py", line 117, in deco
    return f(*a, **kw)
  File "c:\Data\projects\vcode\gbrx-dbconnect\venv\lib\site-packages\py4j\protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o46.showString.
: java.lang.ClassCastException: cannot assign instance of java.lang.String to field org.apache.spark.sql.catalyst.json.JSONOptions.lineSeparatorInRead of type scala.Option in instance of org.apache.spark.sql.catalyst.json.JSONOptions
        at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)
        at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2411)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
        at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2093)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1655)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
        at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2093)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1655)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
        at org.apache.spark.sql.util.ProtoSerializer.$anonfun$deserializeObject$1(ProtoSerializer.scala:7055)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 12 Aug 2022 15:07:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-connect-throws-an-exception-when-showing-a-dataframe/m-p/34709#M25435</guid>
      <dc:creator>KarimSegura</dc:creator>
      <dc:date>2022-08-12T15:07:43Z</dc:date>
    </item>
    <item>
      <title>Re: databricks-connect throws an exception when showing a dataframe with json content</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-connect-throws-an-exception-when-showing-a-dataframe/m-p/34710#M25436</link>
      <description>&lt;P&gt;Your code is correct; please execute it directly on Databricks.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;"Databricks recommends that you use&amp;nbsp;DBX by Databricks Labs for local development instead of Databricks Connect." The main limitation of Databricks Connect is that code is executed directly on clusters rather than on Databricks, plus the fact that it is EOL.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Soon Spark Connect will be available, which will make our lives easier.&lt;/P&gt;</description>
      <pubDate>Fri, 12 Aug 2022 17:13:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-connect-throws-an-exception-when-showing-a-dataframe/m-p/34710#M25436</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2022-08-12T17:13:37Z</dc:date>
    </item>
    <item>
      <title>Re: databricks-connect throws an exception when showing a dataframe with json content</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-connect-throws-an-exception-when-showing-a-dataframe/m-p/34711#M25437</link>
      <description>&lt;P&gt;The code works fine on a Databricks cluster, but it is part of a unit test that runs in a local environment before being pushed to a branch, opened as a PR, and merged into the master branch.&lt;/P&gt;&lt;P&gt;Thanks for the advice on using DBX. I will give DBX another try even though I've tried it before.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I'll keep an eye out for Spark Connect.&lt;/P&gt;&lt;P&gt;Thank you.&lt;/P&gt;</description>
      <pubDate>Fri, 12 Aug 2022 18:41:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-connect-throws-an-exception-when-showing-a-dataframe/m-p/34711#M25437</guid>
      <dc:creator>KarimSegura</dc:creator>
      <dc:date>2022-08-12T18:41:40Z</dc:date>
    </item>
  </channel>
</rss>

