<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Scala Connectivity to Databricks Bronze Layer Raw Data from a Non-Databricks Spark environment in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/scala-connectivity-to-databricks-bronze-layer-raw-data-from-a/m-p/25392#M17650</link>
    <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="requirement"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1292i30E1352A2188CE8A/image-size/large?v=v2&amp;amp;px=999" role="button" title="requirement" alt="requirement" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;We are developing a new Scala/Java program that needs to read and process the raw data stored in the source ADLS (which is a Databricks environment). The reads must run in parallel because the volume of source data is very high (GBs to TBs). What kind of connection is required to read high-volume data in parallel in such cases? &lt;B&gt;JDBC doesn’t seem to be the right choice, as it cannot run multiple threads. We have also tried Delta Sharing, but it’s not working.&lt;/B&gt; Can you please provide pointers to a Scala/Java codebase, design, and connectivity options for this requirement?&lt;/P&gt;&lt;P&gt;Note: this is not an ETL process. After the data is read, the program will curate and enrich the raw data and send it to downstream applications for consumption. We only have Gemfire on our Spark cluster.&lt;/P&gt;&lt;P&gt;Any pointers would be a great help. Thanks in advance.&lt;/P&gt;&lt;P&gt;Thank you,&lt;/P&gt;&lt;P&gt;Badal Panda&lt;/P&gt;</description>
    <pubDate>Fri, 28 Oct 2022 07:55:01 GMT</pubDate>
    <dc:creator>BkP</dc:creator>
    <dc:date>2022-10-28T07:55:01Z</dc:date>
    <item>
      <title>Scala Connectivity to Databricks Bronze Layer Raw Data from a Non-Databricks Spark environment</title>
      <link>https://community.databricks.com/t5/data-engineering/scala-connectivity-to-databricks-bronze-layer-raw-data-from-a/m-p/25392#M17650</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="requirement"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1292i30E1352A2188CE8A/image-size/large?v=v2&amp;amp;px=999" role="button" title="requirement" alt="requirement" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;We are developing a new Scala/Java program that needs to read and process the raw data stored in the source ADLS (which is a Databricks environment). The reads must run in parallel because the volume of source data is very high (GBs to TBs). What kind of connection is required to read high-volume data in parallel in such cases? &lt;B&gt;JDBC doesn’t seem to be the right choice, as it cannot run multiple threads. We have also tried Delta Sharing, but it’s not working.&lt;/B&gt; Can you please provide pointers to a Scala/Java codebase, design, and connectivity options for this requirement?&lt;/P&gt;&lt;P&gt;Note: this is not an ETL process. After the data is read, the program will curate and enrich the raw data and send it to downstream applications for consumption. We only have Gemfire on our Spark cluster.&lt;/P&gt;&lt;P&gt;Any pointers would be a great help. Thanks in advance.&lt;/P&gt;&lt;P&gt;Thank you,&lt;/P&gt;&lt;P&gt;Badal Panda&lt;/P&gt;</description>
      <pubDate>Fri, 28 Oct 2022 07:55:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/scala-connectivity-to-databricks-bronze-layer-raw-data-from-a/m-p/25392#M17650</guid>
      <dc:creator>BkP</dc:creator>
      <dc:date>2022-10-28T07:55:01Z</dc:date>
    </item>
    <item>
      <title>Re: Scala Connectivity to Databricks Bronze Layer Raw Data from a Non-Databricks Spark environment</title>
      <link>https://community.databricks.com/t5/data-engineering/scala-connectivity-to-databricks-bronze-layer-raw-data-from-a/m-p/25393#M17651</link>
      <description>&lt;P&gt;More info:&lt;/P&gt;&lt;P&gt;The source data in ADLS comes from ERPs such as SAP and JDE. The data format is Parquet, and both full-load and delta-load data are available.&lt;/P&gt;</description>
      <pubDate>Fri, 28 Oct 2022 08:06:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/scala-connectivity-to-databricks-bronze-layer-raw-data-from-a/m-p/25393#M17651</guid>
      <dc:creator>BkP</dc:creator>
      <dc:date>2022-10-28T08:06:25Z</dc:date>
    </item>
    <item>
      <title>Re: Scala Connectivity to Databricks Bronze Layer Raw Data from a Non-Databricks Spark environment</title>
      <link>https://community.databricks.com/t5/data-engineering/scala-connectivity-to-databricks-bronze-layer-raw-data-from-a/m-p/25394#M17652</link>
      <description>&lt;P&gt;Hello experts, any advice on this question? I am tagging some folks who have answered my questions before. Please help with this requirement, or tag someone who can.&lt;/P&gt;&lt;P&gt;@Kaniz Fatma, @Vartika Nain, @Bilal Aslam&lt;/P&gt;</description>
      <pubDate>Mon, 31 Oct 2022 19:31:28 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/scala-connectivity-to-databricks-bronze-layer-raw-data-from-a/m-p/25394#M17652</guid>
      <dc:creator>BkP</dc:creator>
      <dc:date>2022-10-31T19:31:28Z</dc:date>
    </item>
  </channel>
</rss>

