<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Local pyspark read data using jdbc driver returns column names only in Warehousing &amp; Analytics</title>
    <link>https://community.databricks.com/t5/warehousing-analytics/local-pyspark-read-data-using-jdbc-driver-returns-column-names/m-p/70950#M1355</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I have an Azure sql warehouse serverless instance that I can connect to using databricks-sql-connector. But, when I try to use pyspark and jdbc driver url, I can't read or write.&lt;/P&gt;&lt;P&gt;See my code below&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;&lt;BR /&gt;&lt;SPAN&gt;def &lt;/SPAN&gt;&lt;SPAN&gt;get_jdbc_url&lt;/SPAN&gt;():&lt;BR /&gt;    &lt;SPAN&gt;# Define your Databricks parameters&lt;BR /&gt;&lt;/SPAN&gt;    server_hostname&lt;SPAN&gt;, &lt;/SPAN&gt;http_path&lt;SPAN&gt;, &lt;/SPAN&gt;access_token = get_connection_configs()&lt;BR /&gt;    default_catalog = &lt;SPAN&gt;"researchers"&lt;BR /&gt;&lt;/SPAN&gt;    &lt;SPAN&gt;# Build the Spark JDBC URL&lt;BR /&gt;&lt;/SPAN&gt;    jdbc_url = &lt;SPAN&gt;f"jdbc:databricks://&lt;/SPAN&gt;&lt;SPAN&gt;{&lt;/SPAN&gt;server_hostname&lt;SPAN&gt;}&lt;/SPAN&gt;&lt;SPAN&gt;:443;httpPath=&lt;/SPAN&gt;&lt;SPAN&gt;{&lt;/SPAN&gt;http_path&lt;SPAN&gt;}&lt;/SPAN&gt;&lt;SPAN&gt;;AuthMech=3;UID=token;PWD=&lt;/SPAN&gt;&lt;SPAN&gt;{&lt;/SPAN&gt;access_token&lt;SPAN&gt;}&lt;/SPAN&gt;&lt;SPAN&gt;;ConnCatalog=&lt;/SPAN&gt;&lt;SPAN&gt;{&lt;/SPAN&gt;default_catalog&lt;SPAN&gt;}&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;BR /&gt;&lt;/SPAN&gt;    &lt;SPAN&gt;return &lt;/SPAN&gt;jdbc_url&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;def &lt;/SPAN&gt;&lt;SPAN&gt;get_spark&lt;/SPAN&gt;() -&amp;gt; SparkSession:&lt;BR /&gt;    &lt;SPAN&gt;# Initialize SparkSession&lt;BR /&gt;&lt;/SPAN&gt;    conf = SparkConf()&lt;BR /&gt;    &lt;SPAN&gt;# unfortunately, there is no automatic way for pyspark to download the jar itself&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;    # &lt;/SPAN&gt;&lt;SPAN&gt;TODO set to your own jar path&lt;BR /&gt;&lt;/SPAN&gt;    conf.set(&lt;SPAN&gt;"spark.jars"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;BR /&gt;&lt;/SPAN&gt;             &lt;SPAN&gt;"/home/username/repos/uwcip-research/code-examples/DatabricksJDBC42-2.6.38.1068/DatabricksJDBC42.jar"&lt;/SPAN&gt;)&lt;BR /&gt;    
    conf.set(&lt;SPAN&gt;"spark.driver.extraClassPath"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;BR /&gt;&lt;/SPAN&gt;             &lt;SPAN&gt;"/home/username/repos/uwcip-research/code-examples/DatabricksJDBC42-2.6.38.1068/DatabricksJDBC42.jar"&lt;/SPAN&gt;)&lt;BR /&gt;    conf.set(&lt;SPAN&gt;"spark.sql.execution.arrow.pyspark.enabled"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"true"&lt;/SPAN&gt;)&lt;BR /&gt;&lt;BR /&gt;    spark = SparkSession.builder \&lt;BR /&gt;        .appName(&lt;SPAN&gt;"connecting to databricks"&lt;/SPAN&gt;) \&lt;BR /&gt;        .config(&lt;SPAN&gt;conf&lt;/SPAN&gt;=conf) \&lt;BR /&gt;        .getOrCreate()&lt;BR /&gt;    &lt;SPAN&gt;return &lt;/SPAN&gt;spark&lt;/PRE&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;PRE&gt;&lt;SPAN&gt;def &lt;/SPAN&gt;&lt;SPAN&gt;read_table&lt;/SPAN&gt;():&lt;BR /&gt;    spark = get_spark()&lt;BR /&gt;    jdbc_url = get_jdbc_url()&lt;BR /&gt;    dbtable = &lt;SPAN&gt;"researchers.dev.test_transcripts"&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&lt;BR /&gt;&lt;/SPAN&gt;    df = spark.read \&lt;BR /&gt;        .format(&lt;SPAN&gt;"jdbc"&lt;/SPAN&gt;) \&lt;BR /&gt;        .option(&lt;SPAN&gt;"url"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;jdbc_url) \&lt;BR /&gt;        .option(&lt;SPAN&gt;"dbtable"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;dbtable) \&lt;BR /&gt;        .load()&lt;BR /&gt;&lt;BR /&gt;    top5 = df.head(&lt;SPAN&gt;5&lt;/SPAN&gt;)&lt;BR /&gt;    &lt;SPAN&gt;print&lt;/SPAN&gt;(top5)&lt;BR /&gt;    &lt;SPAN&gt;return&lt;/SPAN&gt;&lt;/PRE&gt;&lt;/DIV&gt;&lt;P&gt;Below are the error and the read results I got when I call read_table(). Note that it simply returned the column names 5 times.
When I call df.printSchema(), it works fine.&lt;/P&gt;&lt;P&gt;&lt;FONT color="#993366"&gt;ERROR StatusLogger Unrecognized conversion specifier [msg] starting at position 54 in conversion pattern.&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#993366"&gt;ERROR StatusLogger Unrecognized format specifier [n]&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#993366"&gt;ERROR StatusLogger Unrecognized conversion specifier [n] starting at position 56 in conversion pattern.&lt;/FONT&gt;&lt;BR /&gt;[Row(video_name='video_name', video_transcript='video_transcript'), Row(video_name='video_name', video_transcript='video_transcript'), Row(video_name='video_name', video_transcript='video_transcript'), Row(video_name='video_name', video_transcript='video_transcript'), Row(video_name='video_name', video_transcript='video_transcript')]&lt;/P&gt;&lt;P&gt;Additionally, probably unrelated logger error&lt;/P&gt;&lt;P&gt;&lt;FONT color="#993366"&gt;ERROR StatusLogger Unable to create Lookup for bundle&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#993366"&gt;java.lang.ClassCastException: class org.apache.logging.log4j.core.lookup.ResourceBundleLookup&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#993366"&gt;at java.base/java.lang.Class.asSubclass(Class.java:3640)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#993366"&gt;at com.databricks.client.jdbc42.internal.apache.logging.log4j.core.lookup.Interpolator.&amp;lt;init&amp;gt;(Interpolator.java:84)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#993366"&gt;at com.databricks.client.jdbc42.internal.apache.logging.log4j.core.lookup.Interpolator.&amp;lt;init&amp;gt;(Interpolator.java:105)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#993366"&gt;at com.databricks.client.jdbc42.internal.apache.logging.log4j.core.config.AbstractConfiguration.&amp;lt;init&amp;gt;(AbstractConfiguration.java:135)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#993366"&gt;at 
com.databricks.client.jdbc42.internal.apache.logging.log4j.core.config.NullConfiguration.&amp;lt;init&amp;gt;(NullConfiguration.java:32)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#993366"&gt;at com.databricks.client.jdbc42.internal.apache.logging.log4j.core.LoggerContext.&amp;lt;clinit&amp;gt;(LoggerContext.java:74)&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#993366"&gt;...&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#993366"&gt;ERROR StatusLogger Unable to create Lookup for ctx&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#993366"&gt;java.lang.ClassCastException: class org.apache.logging.log4j.core.lookup.ContextMapLookup&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#993366"&gt;at java.base/java.lang.Class.asSubclass(Class.java:3640)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#993366"&gt;at com.databricks.client.jdbc42.internal.apache.logging.log4j.core.lookup.Interpolator.&amp;lt;init&amp;gt;(Interpolator.java:84)&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Tue, 28 May 2024 18:13:28 GMT</pubDate>
    <dc:creator>amelia1</dc:creator>
    <dc:date>2024-05-28T18:13:28Z</dc:date>
    <item>
      <title>Local pyspark read data using jdbc driver returns column names only</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/local-pyspark-read-data-using-jdbc-driver-returns-column-names/m-p/70950#M1355</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I have an Azure sql warehouse serverless instance that I can connect to using databricks-sql-connector. But, when I try to use pyspark and jdbc driver url, I can't read or write.&lt;/P&gt;&lt;P&gt;See my code below&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;&lt;BR /&gt;&lt;SPAN&gt;def &lt;/SPAN&gt;&lt;SPAN&gt;get_jdbc_url&lt;/SPAN&gt;():&lt;BR /&gt;    &lt;SPAN&gt;# Define your Databricks parameters&lt;BR /&gt;&lt;/SPAN&gt;    server_hostname&lt;SPAN&gt;, &lt;/SPAN&gt;http_path&lt;SPAN&gt;, &lt;/SPAN&gt;access_token = get_connection_configs()&lt;BR /&gt;    default_catalog = &lt;SPAN&gt;"researchers"&lt;BR /&gt;&lt;/SPAN&gt;    &lt;SPAN&gt;# Build the Spark JDBC URL&lt;BR /&gt;&lt;/SPAN&gt;    jdbc_url = &lt;SPAN&gt;f"jdbc:databricks://&lt;/SPAN&gt;&lt;SPAN&gt;{&lt;/SPAN&gt;server_hostname&lt;SPAN&gt;}&lt;/SPAN&gt;&lt;SPAN&gt;:443;httpPath=&lt;/SPAN&gt;&lt;SPAN&gt;{&lt;/SPAN&gt;http_path&lt;SPAN&gt;}&lt;/SPAN&gt;&lt;SPAN&gt;;AuthMech=3;UID=token;PWD=&lt;/SPAN&gt;&lt;SPAN&gt;{&lt;/SPAN&gt;access_token&lt;SPAN&gt;}&lt;/SPAN&gt;&lt;SPAN&gt;;ConnCatalog=&lt;/SPAN&gt;&lt;SPAN&gt;{&lt;/SPAN&gt;default_catalog&lt;SPAN&gt;}&lt;/SPAN&gt;&lt;SPAN&gt;"&lt;BR /&gt;&lt;/SPAN&gt;    &lt;SPAN&gt;return &lt;/SPAN&gt;jdbc_url&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;def &lt;/SPAN&gt;&lt;SPAN&gt;get_spark&lt;/SPAN&gt;() -&amp;gt; SparkSession:&lt;BR /&gt;    &lt;SPAN&gt;# Initialize SparkSession&lt;BR /&gt;&lt;/SPAN&gt;    conf = SparkConf()&lt;BR /&gt;    &lt;SPAN&gt;# unfortunately, there is no automatic way for pyspark to download the jar itself&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;    # &lt;/SPAN&gt;&lt;SPAN&gt;TODO set to your own jar path&lt;BR /&gt;&lt;/SPAN&gt;    conf.set(&lt;SPAN&gt;"spark.jars"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;BR /&gt;&lt;/SPAN&gt;             &lt;SPAN&gt;"/home/username/repos/uwcip-research/code-examples/DatabricksJDBC42-2.6.38.1068/DatabricksJDBC42.jar"&lt;/SPAN&gt;)&lt;BR /&gt;    
    conf.set(&lt;SPAN&gt;"spark.driver.extraClassPath"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;BR /&gt;&lt;/SPAN&gt;             &lt;SPAN&gt;"/home/username/repos/uwcip-research/code-examples/DatabricksJDBC42-2.6.38.1068/DatabricksJDBC42.jar"&lt;/SPAN&gt;)&lt;BR /&gt;    conf.set(&lt;SPAN&gt;"spark.sql.execution.arrow.pyspark.enabled"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"true"&lt;/SPAN&gt;)&lt;BR /&gt;&lt;BR /&gt;    spark = SparkSession.builder \&lt;BR /&gt;        .appName(&lt;SPAN&gt;"connecting to databricks"&lt;/SPAN&gt;) \&lt;BR /&gt;        .config(&lt;SPAN&gt;conf&lt;/SPAN&gt;=conf) \&lt;BR /&gt;        .getOrCreate()&lt;BR /&gt;    &lt;SPAN&gt;return &lt;/SPAN&gt;spark&lt;/PRE&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;PRE&gt;&lt;SPAN&gt;def &lt;/SPAN&gt;&lt;SPAN&gt;read_table&lt;/SPAN&gt;():&lt;BR /&gt;    spark = get_spark()&lt;BR /&gt;    jdbc_url = get_jdbc_url()&lt;BR /&gt;    dbtable = &lt;SPAN&gt;"researchers.dev.test_transcripts"&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&lt;BR /&gt;&lt;/SPAN&gt;    df = spark.read \&lt;BR /&gt;        .format(&lt;SPAN&gt;"jdbc"&lt;/SPAN&gt;) \&lt;BR /&gt;        .option(&lt;SPAN&gt;"url"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;jdbc_url) \&lt;BR /&gt;        .option(&lt;SPAN&gt;"dbtable"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;dbtable) \&lt;BR /&gt;        .load()&lt;BR /&gt;&lt;BR /&gt;    top5 = df.head(&lt;SPAN&gt;5&lt;/SPAN&gt;)&lt;BR /&gt;    &lt;SPAN&gt;print&lt;/SPAN&gt;(top5)&lt;BR /&gt;    &lt;SPAN&gt;return&lt;/SPAN&gt;&lt;/PRE&gt;&lt;/DIV&gt;&lt;P&gt;Below are the error and the read results I got when I call read_table(). Note that it simply returned the column names 5 times.
When I call df.printSchema(), it works fine.&lt;/P&gt;&lt;P&gt;&lt;FONT color="#993366"&gt;ERROR StatusLogger Unrecognized conversion specifier [msg] starting at position 54 in conversion pattern.&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#993366"&gt;ERROR StatusLogger Unrecognized format specifier [n]&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#993366"&gt;ERROR StatusLogger Unrecognized conversion specifier [n] starting at position 56 in conversion pattern.&lt;/FONT&gt;&lt;BR /&gt;[Row(video_name='video_name', video_transcript='video_transcript'), Row(video_name='video_name', video_transcript='video_transcript'), Row(video_name='video_name', video_transcript='video_transcript'), Row(video_name='video_name', video_transcript='video_transcript'), Row(video_name='video_name', video_transcript='video_transcript')]&lt;/P&gt;&lt;P&gt;Additionally, probably unrelated logger error&lt;/P&gt;&lt;P&gt;&lt;FONT color="#993366"&gt;ERROR StatusLogger Unable to create Lookup for bundle&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#993366"&gt;java.lang.ClassCastException: class org.apache.logging.log4j.core.lookup.ResourceBundleLookup&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#993366"&gt;at java.base/java.lang.Class.asSubclass(Class.java:3640)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#993366"&gt;at com.databricks.client.jdbc42.internal.apache.logging.log4j.core.lookup.Interpolator.&amp;lt;init&amp;gt;(Interpolator.java:84)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#993366"&gt;at com.databricks.client.jdbc42.internal.apache.logging.log4j.core.lookup.Interpolator.&amp;lt;init&amp;gt;(Interpolator.java:105)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#993366"&gt;at com.databricks.client.jdbc42.internal.apache.logging.log4j.core.config.AbstractConfiguration.&amp;lt;init&amp;gt;(AbstractConfiguration.java:135)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#993366"&gt;at 
com.databricks.client.jdbc42.internal.apache.logging.log4j.core.config.NullConfiguration.&amp;lt;init&amp;gt;(NullConfiguration.java:32)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#993366"&gt;at com.databricks.client.jdbc42.internal.apache.logging.log4j.core.LoggerContext.&amp;lt;clinit&amp;gt;(LoggerContext.java:74)&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#993366"&gt;...&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#993366"&gt;ERROR StatusLogger Unable to create Lookup for ctx&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#993366"&gt;java.lang.ClassCastException: class org.apache.logging.log4j.core.lookup.ContextMapLookup&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#993366"&gt;at java.base/java.lang.Class.asSubclass(Class.java:3640)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#993366"&gt;at com.databricks.client.jdbc42.internal.apache.logging.log4j.core.lookup.Interpolator.&amp;lt;init&amp;gt;(Interpolator.java:84)&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 28 May 2024 18:13:28 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/local-pyspark-read-data-using-jdbc-driver-returns-column-names/m-p/70950#M1355</guid>
      <dc:creator>amelia1</dc:creator>
      <dc:date>2024-05-28T18:13:28Z</dc:date>
    </item>
    <item>
      <title>Re: Local pyspark read data using jdbc driver returns column names only</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/local-pyspark-read-data-using-jdbc-driver-returns-column-names/m-p/103816#M1779</link>
      <description>&lt;P&gt;The error does not look specific to the warehouse that you are connecting to.&lt;/P&gt;
&lt;P&gt;The error message "Unrecognized conversion specifier [msg] starting at position 54 in conversion pattern" indicates a problem with the logging configuration in your application: the logging pattern contains a conversion specifier, here [msg], that the logging library loaded at runtime does not recognize.&lt;/P&gt;
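&lt;P&gt;The stack trace in the question names log4j classes under two different packages (org.apache.logging.log4j.core and com.databricks.client.jdbc42.internal.apache.logging.log4j.core), so it may help to check which jars bundle their own copy of log4j-core. The following is only a minimal sketch, not a Databricks or Spark API: the helper name find_log4j_core_copies is hypothetical, and it assumes that every copy of log4j-core, relocated or not, contains a LoggerContext class under a path ending in apache/logging/log4j/core.&lt;/P&gt;

```python
import zipfile

# Assumption: each copy of log4j-core contains a LoggerContext class, and a
# relocated (shaded) copy keeps this path suffix under a different prefix.
MARKER = "apache/logging/log4j/core/LoggerContext.class"
CLASS_FILE = "/LoggerContext.class"


def find_log4j_core_copies(jar_paths):
    """Return {jar path: [package names]} for jars that bundle log4j-core."""
    copies = {}
    for jar_path in jar_paths:
        # A jar is a zip archive, so the standard library can list its entries.
        with zipfile.ZipFile(jar_path) as jar:
            packages = sorted(
                # Turn the entry path into a dotted package name.
                name[: -len(CLASS_FILE)].replace("/", ".")
                for name in jar.namelist()
                if name.endswith(MARKER)
            )
        if packages:
            copies[str(jar_path)] = packages
    return copies
```

&lt;P&gt;You could pass it the Databricks JDBC jar from the question together with the jars of your local Spark installation (their location depends on how Spark was installed). More than one package name in the result would be consistent with several log4j copies being loaded side by side.&lt;/P&gt;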
&lt;P&gt;&lt;A href="https://github.com/aws/aws-lambda-java-libs/issues/225" target="_blank"&gt;https://github.com/aws/aws-lambda-java-libs/issues/225&lt;/A&gt;&amp;nbsp;shows the same message being caused by multiple log4j dependencies.&lt;/P&gt;</description>
      <pubDate>Wed, 01 Jan 2025 11:02:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/local-pyspark-read-data-using-jdbc-driver-returns-column-names/m-p/103816#M1779</guid>
      <dc:creator>NandiniN</dc:creator>
      <dc:date>2025-01-01T11:02:02Z</dc:date>
    </item>
  </channel>
</rss>

