<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Differences between Spark SQL and Databricks in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/differences-between-spark-sql-and-databricks/m-p/66656#M33182</link>
    <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/100535"&gt;@dollyb&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;That's because when you add another dependency on Databricks, Spark doesn't know which data source it should use. By default it uses the built-in&amp;nbsp;&lt;SPAN&gt;com.google.cloud.spark.bigquery.BigQueryRelationProvider.&lt;BR /&gt;&lt;BR /&gt;You can pass the fully qualified provider class name to format() instead, e.g.&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;spark.read.format("com.google.cloud.spark.bigquery.v2.Spark35BigQueryTableProvider")&lt;/LI-CODE&gt;</description>
    <pubDate>Fri, 19 Apr 2024 05:41:26 GMT</pubDate>
    <dc:creator>daniel_sahal</dc:creator>
    <dc:date>2024-04-19T05:41:26Z</dc:date>
    <item>
      <title>Differences between Spark SQL and Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/differences-between-spark-sql-and-databricks/m-p/66287#M33074</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I'm using a local Docker Spark 3.5 runtime to test my Databricks Connect code. However, I've come across a couple of cases where my code works in one environment but not the other.&lt;/P&gt;&lt;P&gt;As a concrete example, in my local environment I read data from BigQuery via spark.read.format("bigquery") with the BigQuery connector 0.36.1. I can't find out which library Databricks is using.&lt;/P&gt;&lt;P&gt;When I fetch a table, the dataset has a subtly different schema, which I don't understand since the table itself is the same.&lt;/P&gt;&lt;P&gt;Spark:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt; |-- event_params: map (nullable = false)
 |    |-- key: string
 |    |-- value: struct (valueContainsNull = true)
 |    |    |-- string_value: string (nullable = true)
 |    |    |-- int_value: long (nullable = true)
 |    |    |-- float_value: double (nullable = true)
 |    |    |-- double_value: double (nullable = true)&lt;/LI-CODE&gt;&lt;P&gt;Databricks:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;|-- event_params: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- key: string (nullable = true)
 |    |    |-- value: struct (nullable = true)
 |    |    |    |-- string_value: string (nullable = true)
 |    |    |    |-- int_value: long (nullable = true)
 |    |    |    |-- float_value: double (nullable = true)
 |    |    |    |-- double_value: double (nullable = true)&lt;/LI-CODE&gt;&lt;P&gt;So in Databricks the map is wrapped in an extra array, which makes no sense to me.&lt;/P&gt;&lt;P&gt;Which library is Databricks using? How should I handle these differences between environments?&lt;/P&gt;&lt;P&gt;When adding my local dependency, I get this:&lt;/P&gt;&lt;P&gt;Multiple sources found for bigquery (com.google.cloud.spark.bigquery.BigQueryRelationProvider, com.google.cloud.spark.bigquery.v2.Spark35BigQueryTableProvider)&lt;/P&gt;</description>
      <pubDate>Mon, 15 Apr 2024 15:57:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/differences-between-spark-sql-and-databricks/m-p/66287#M33074</guid>
      <dc:creator>dollyb</dc:creator>
      <dc:date>2024-04-15T15:57:42Z</dc:date>
    </item>
    <item>
      <title>Re: Differences between Spark SQL and Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/differences-between-spark-sql-and-databricks/m-p/66656#M33182</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/100535"&gt;@dollyb&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;That's because when you add another dependency on Databricks, Spark doesn't know which data source it should use. By default it uses the built-in&amp;nbsp;&lt;SPAN&gt;com.google.cloud.spark.bigquery.BigQueryRelationProvider.&lt;BR /&gt;&lt;BR /&gt;You can pass the fully qualified provider class name to format() instead, e.g.&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;spark.read.format("com.google.cloud.spark.bigquery.v2.Spark35BigQueryTableProvider")&lt;/LI-CODE&gt;</description>
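A minimal sketch, in plain Python rather than Spark's actual DataSource registry, of why the short name "bigquery" is ambiguous while a fully qualified class name is not: two provider classes register the same short name, so a lookup by short name matches both. The registry dict below is illustrative only.

```python
# Two providers (built-in and user-supplied) both register the short name
# "bigquery"; a fully qualified class name selects exactly one of them.
PROVIDERS = {
    "com.google.cloud.spark.bigquery.BigQueryRelationProvider": "bigquery",
    "com.google.cloud.spark.bigquery.v2.Spark35BigQueryTableProvider": "bigquery",
}

def resolve(source_name):
    # A fully qualified name bypasses the short-name lookup entirely.
    if source_name in PROVIDERS:
        return [source_name]
    # A short name may match several registered providers, which is
    # what triggers Spark's "Multiple sources found" error.
    return [fqn for fqn, short in PROVIDERS.items() if short == source_name]

print(len(resolve("bigquery")))  # ambiguous: matches both providers
```

In real PySpark the same idea is simply `spark.read.format("com.google.cloud.spark.bigquery.v2.Spark35BigQueryTableProvider").load(...)`, which names the provider class directly instead of the ambiguous short name.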
      <pubDate>Fri, 19 Apr 2024 05:41:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/differences-between-spark-sql-and-databricks/m-p/66656#M33182</guid>
      <dc:creator>daniel_sahal</dc:creator>
      <dc:date>2024-04-19T05:41:26Z</dc:date>
    </item>
    <item>
      <title>Re: Differences between Spark SQL and Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/differences-between-spark-sql-and-databricks/m-p/75367#M34951</link>
      <description>&lt;P&gt;Thanks, using the fully qualified name works. I've now added a cluster init script that removes the outdated connector version bundled with Databricks.&lt;/P&gt;</description>
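A hedged sketch of what such an init script might look like, assuming the bundled connector jar lives under /databricks/jars; the path and the jar-name pattern are assumptions to verify against your runtime before use.

```shell
#!/bin/bash
# Remove the runtime-bundled BigQuery connector so the user-supplied,
# newer version is the only one registered for the "bigquery" source.
# Path and glob are assumptions; inspect your cluster's jar directory first.
rm -f /databricks/jars/*spark-bigquery*.jar
```

Note that deleting bundled jars is unsupported surgery: a safer alternative, as the accepted answer shows, is to keep both jars and select the desired provider explicitly via its fully qualified class name in format().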
      <pubDate>Fri, 21 Jun 2024 17:54:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/differences-between-spark-sql-and-databricks/m-p/75367#M34951</guid>
      <dc:creator>dollyb</dc:creator>
      <dc:date>2024-06-21T17:54:15Z</dc:date>
    </item>
  </channel>
</rss>

