<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: RDD Parallelism without delta_log in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/rdd-parallelism-without-delta-log/m-p/102267#M41046</link>
    <description>&lt;P&gt;This is impossible, there's no delta_log:&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Failed to process raw JSON: &lt;A href="https://awsindividual.blob.core.windows.net/awsdashdata1/dash.test/Sync-3/Feed/Company/1014148/CompanyContacts.JSON" target="_blank"&gt;https://awsindividual.blob.core.windows.net/awsdashdata1/dash.test/Sync-3/Feed/Company/1014148/CompanyContacts.JSON&lt;/A&gt; - [DELTA_INVALID_FORMAT] Incompatible format detected. A transaction log for Delta was found at `&lt;A href="https://awsindividual.blob.core.windows.net/awsdashdata1/dash.test/Sync-3/Feed/Company/1014148/CompanyContacts.JSON/_delta_log" target="_blank"&gt;https://awsindividual.blob.core.windows.net/awsdashdata1/dash.test/Sync-3/Feed/Company/1014148/CompanyContacts.JSON/_delta_log&lt;/A&gt;`, but you are trying to read from `&lt;A href="https://awsindividual.blob.core.windows.net/awsdashdata1/dash.test/Sync-3/Feed/Company/1014148/CompanyContacts.JSON" target="_blank"&gt;https://awsindividual.blob.core.windows.net/awsdashdata1/dash.test/Sync-3/Feed/Company/1014148/CompanyContacts.JSON&lt;/A&gt;` using format("text"). You must use 'format("delta")' when reading and writing to a delta table.&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 16 Dec 2024 15:07:41 GMT</pubDate>
    <dc:creator>Wildabeast</dc:creator>
    <dc:date>2024-12-16T15:07:41Z</dc:date>
    <item>
      <title>RDD Parallelism without delta_log</title>
      <link>https://community.databricks.com/t5/data-engineering/rdd-parallelism-without-delta-log/m-p/102147#M40986</link>
      <description>&lt;P&gt;I've set up my script to use a multi-node cluster, but I'm running into an issue when iterating over a list of .json files and writing each one to a SQL table through a JDBC driver. The error says that a _delta_log directory (which I can't see in my blob container) is breaking the reads from my original list of file paths. Where is this _delta_log, and how do I avoid this failure?&lt;/P&gt;</description>
      <pubDate>Sat, 14 Dec 2024 23:27:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/rdd-parallelism-without-delta-log/m-p/102147#M40986</guid>
      <dc:creator>Wildabeast</dc:creator>
      <dc:date>2024-12-14T23:27:13Z</dc:date>
    </item>
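    <!--
A minimal sketch of one way to get the parallelism asked about here without looping over an RDD, assuming each file name maps to exactly one SQL table. The naming rule in the hypothetical `table_for_file` only mirrors the log later in this thread (CompanyContacts.JSON is loaded into dbo.CompanyContacts). `spark.read.json` accepts a list of paths, so each group of files becomes one DataFrame that Spark reads in parallel across executors. `transform` is a hypothetical hook for your own flattening step, since Spark's JDBC writer generally cannot map array or struct columns such as `Contacts` to SQL Server column types.

```python
from collections import defaultdict


def table_for_file(path, schema="dbo"):
    """Hypothetical naming rule: 'CompanyContacts.JSON' maps to 'dbo.CompanyContacts'."""
    name = path.rstrip("/").rsplit("/", 1)[-1]
    return f"{schema}.{name.rsplit('.', 1)[0]}"


def group_by_table(paths):
    """Group file paths by target table so each table is loaded by one Spark job."""
    groups = defaultdict(list)
    for p in paths:
        groups[table_for_file(p)].append(p)
    return dict(groups)


def load_table(spark, paths, read_schema, transform, jdbc_url, table, props):
    """Read every file for one table at once, then append the rows over JDBC."""
    # One read over many paths: Spark plans the files as parallel tasks.
    df = spark.read.schema(read_schema).json(paths)
    # transform is your own (hypothetical) flattening step for nested columns.
    df = transform(df)
    df.write.jdbc(url=jdbc_url, table=table, mode="append", properties=props)
```

On the cluster this would be driven by a loop over `group_by_table(json_files).items()` that calls `load_table` once per table with your own schema, flattening function, URL, and properties. The loop runs on the driver, but each call is a distributed job rather than a one-file-at-a-time read.
    -->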
    <item>
      <title>Re: RDD Parallelism without delta_log</title>
      <link>https://community.databricks.com/t5/data-engineering/rdd-parallelism-without-delta-log/m-p/102149#M40988</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/136437"&gt;@Wildabeast&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;Is your table a Delta table? And are you getting any failures?&lt;/P&gt;</description>
      <pubDate>Sun, 15 Dec 2024 00:51:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/rdd-parallelism-without-delta-log/m-p/102149#M40988</guid>
      <dc:creator>Alberto_Umana</dc:creator>
      <dc:date>2024-12-15T00:51:36Z</dc:date>
    </item>
    <item>
      <title>Re: RDD Parallelism without delta_log</title>
      <link>https://community.databricks.com/t5/data-engineering/rdd-parallelism-without-delta-log/m-p/102165#M41000</link>
      <description>&lt;P&gt;No, it's a SQL Server dbo table (viewed in SSMS).&lt;/P&gt;</description>
      <pubDate>Sun, 15 Dec 2024 19:56:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/rdd-parallelism-without-delta-log/m-p/102165#M41000</guid>
      <dc:creator>Wildabeast</dc:creator>
      <dc:date>2024-12-15T19:56:12Z</dc:date>
    </item>
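    <!--
Because the sink is a SQL Server dbo table and not a Delta table, Delta only matters on the read side; the write side only involves the JDBC connection. A sketch of the connection settings, assuming the Microsoft JDBC Driver for SQL Server (driver class `com.microsoft.sqlserver.jdbc.SQLServerDriver`); the helper names, host, and database below are placeholders.

```python
def sqlserver_jdbc_url(host, database, port=1433, encrypt=True):
    """Build a connection URL for the Microsoft JDBC Driver for SQL Server."""
    settings = {
        "databaseName": database,
        "encrypt": "true" if encrypt else "false",
    }
    suffix = ";".join(f"{key}={value}" for key, value in settings.items())
    return f"jdbc:sqlserver://{host}:{port};{suffix}"


def sqlserver_jdbc_properties(user, password):
    """Connection properties in the dict form that DataFrameWriter.jdbc expects."""
    return {
        "user": user,
        "password": password,
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }
```

These plug into `df.write.jdbc(url, "dbo.CompanyContacts", mode="append", properties=props)`. On Databricks the password would normally come from `dbutils.secrets.get(scope, key)` rather than being written into the script.
    -->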
    <item>
      <title>Re: RDD Parallelism without delta_log</title>
      <link>https://community.databricks.com/t5/data-engineering/rdd-parallelism-without-delta-log/m-p/102255#M41035</link>
      <description>&lt;P&gt;And what error are you getting?&lt;/P&gt;
&lt;P&gt;Could you try one test? Filter out the &lt;CODE&gt;_delta_log&lt;/CODE&gt; directory when listing the files. Here is an example of how to do this in Python:&lt;/P&gt;
&lt;PRE&gt;# List all entries in the directory (dbutils is available in Databricks notebooks)
all_files = dbutils.fs.ls("path/to/your/directory")

# Filter out the _delta_log directory
json_files = [file.path for file in all_files if "_delta_log" not in file.path]

# Now you can iterate over json_files
for file_path in json_files:
    # Your code to process each JSON file goes here
    pass&lt;/PRE&gt;</description>
      <pubDate>Mon, 16 Dec 2024 13:22:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/rdd-parallelism-without-delta-log/m-p/102255#M41035</guid>
      <dc:creator>Alberto_Umana</dc:creator>
      <dc:date>2024-12-16T13:22:09Z</dc:date>
    </item>
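    <!--
One caveat with the filter above: if a path such as CompanyContacts.JSON is actually a folder that contains `_delta_log`, the listing returns the folder itself (Databricks lists directories with a trailing `/`), and the folder's own path does not contain the string `_delta_log`, so it still passes the filter. A sketch of a recursive variant that keeps only real `.json`/`.JSON` files and reports folders whose names end in `.json`. It takes the listing function as a parameter (on Databricks, `dbutils.fs.ls`) and only uses the `.path` attribute of each entry; `Entry` and `FAKE_TREE` are hypothetical stand-ins so the sketch runs outside a cluster.

```python
from collections import namedtuple


def split_listing(entries):
    """Split one directory listing into (json_files, subdirectories), skipping _delta_log."""
    files, dirs = [], []
    for entry in entries:
        if "_delta_log" in entry.path:
            continue  # Delta metadata is never input data
        if entry.path.endswith("/"):
            dirs.append(entry.path)
        elif entry.path.lower().endswith(".json"):
            files.append(entry.path)
    return files, dirs


def list_json_files(root, ls):
    """Walk root with ls; return (json_files, folders_named_like_json)."""
    files, suspicious = [], []
    stack = [root]
    while stack:
        found_files, subdirs = split_listing(ls(stack.pop()))
        files.extend(found_files)
        for path in subdirs:
            if path.rstrip("/").lower().endswith(".json"):
                suspicious.append(path)  # a folder named *.JSON: do not read it as JSON
            else:
                stack.append(path)
    return sorted(files), sorted(suspicious)


# Hypothetical stand-ins for dbutils.fs.ls results (only .path is used).
Entry = namedtuple("Entry", ["path"])
FAKE_TREE = {
    "root/": [Entry("root/1014148/")],
    "root/1014148/": [
        Entry("root/1014148/Company.JSON"),
        Entry("root/1014148/CompanyContacts.JSON/"),
    ],
}

files, suspicious = list_json_files("root/", lambda path: FAKE_TREE[path])
print(files)       # ['root/1014148/Company.JSON']
print(suspicious)  # ['root/1014148/CompanyContacts.JSON/']
```

On a cluster the call would be `list_json_files("path/to/your/directory", dbutils.fs.ls)`; anything in `suspicious` is worth inspecting before it is read as JSON.
    -->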
    <item>
      <title>Re: RDD Parallelism without delta_log</title>
      <link>https://community.databricks.com/t5/data-engineering/rdd-parallelism-without-delta-log/m-p/102259#M41039</link>
      <description>&lt;P&gt;I actually posted the error and my .py file yesterday, but for some reason the post didn't go through. Let me pull the error again while I run your filter suggestion.&lt;/P&gt;</description>
      <pubDate>Mon, 16 Dec 2024 13:56:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/rdd-parallelism-without-delta-log/m-p/102259#M41039</guid>
      <dc:creator>Wildabeast</dc:creator>
      <dc:date>2024-12-16T13:56:27Z</dc:date>
    </item>
    <item>
      <title>Re: RDD Parallelism without delta_log</title>
      <link>https://community.databricks.com/t5/data-engineering/rdd-parallelism-without-delta-log/m-p/102262#M41042</link>
      <description>&lt;P&gt;It's in Azure; we just named it awsindividual because it's a migration.&lt;/P&gt;&lt;P&gt;Error:&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Processing file: &lt;/SPAN&gt;&lt;A class="" href="https://awsindividual.blob.core.windows.net/awsdashdata1/dash.test/Sync-45/Feed/Company/1570546/CompanyContacts.JSON" target="_blank" rel="noopener noreferrer"&gt;https://awsindividual.blob.core.windows.net/awsdashdata1/dash.test/Sync-45/Feed/Company/1570546/CompanyContacts.JSON&lt;/A&gt;&lt;BR /&gt;
&lt;SPAN&gt;File name: CompanyContacts.JSON, Table: dbo.CompanyContacts, Schema: StructType([StructField('Contacts', ArrayType(StructType([StructField('Address', StructType([StructField('AddressLine1', StringType(), True), StructField('AddressLine2', StringType(), True), StructField('City', StringType(), True), StructField('StateProvince', StringType(), True), StructField('Country', StringType(), True), StructField('PostalCode', StringType(), True), StructField('County', StringType(), True)]), True), StructField('BillingAddress', StructType([StructField('AddressLine1', StringType(), True), StructField('AddressLine2', StringType(), True), StructField('City', StringType(), True), StructField('StateProvince', StringType(), True), StructField('Country', StringType(), True), StructField('PostalCode', StringType(), True), StructField('County', StringType(), True)]), True), StructField('ContactID', IntegerType(), True), StructField('CorrespondenceEmail', StringType(), True), StructField('InquiryEmail', StringType(), True), StructField('MailingAddress', StructType([StructField('AddressLine1', StringType(), True), StructField('AddressLine2', StringType(), True), StructField('City', StringType(), True), StructField('StateProvince', StringType(), True), StructField('Country', StringType(), True), StructField('PostalCode', StringType(), True), StructField('County', StringType(), True)]), True), StructField('MainPhone', StructType([StructField('Number', StringType(), True), StructField('Extension', StringType(), True)]), True), StructField('OtherPhones', ArrayType(StringType(), True), True), StructField('Website', StringType(), True)]), True), True), StructField('CompanyID', IntegerType(), True)])&lt;/SPAN&gt;&lt;BR /&gt;
&lt;SPAN&gt;Error reading file &lt;/SPAN&gt;&lt;A class="" href="https://awsindividual.blob.core.windows.net/awsdashdata1/dash.test/Sync-45/Feed/Company/1570546/CompanyContacts.JSON" target="_blank" rel="noopener noreferrer"&gt;https://awsindividual.blob.core.windows.net/awsdashdata1/dash.test/Sync-45/Feed/Company/1570546/CompanyContacts.JSON&lt;/A&gt;&lt;SPAN&gt;: [DELTA_INVALID_FORMAT] Incompatible format detected. A transaction log for Delta was found at `&lt;/SPAN&gt;&lt;A class="" href="https://awsindividual.blob.core.windows.net/awsdashdata1/dash.test/Sync-45/Feed/Company/1570546/CompanyContacts.JSON/_delta_log" target="_blank" rel="noopener noreferrer"&gt;https://awsindividual.blob.core.windows.net/awsdashdata1/dash.test/Sync-45/Feed/Company/1570546/CompanyContacts.JSON/_delta_log&lt;/A&gt;&lt;SPAN&gt;`, but you are trying to read from `&lt;/SPAN&gt;&lt;A class="" href="https://awsindividual.blob.core.windows.net/awsdashdata1/dash.test/Sync-45/Feed/Company/1570546/CompanyContacts.JSON" target="_blank" rel="noopener noreferrer"&gt;https://awsindividual.blob.core.windows.net/awsdashdata1/dash.test/Sync-45/Feed/Company/1570546/CompanyContacts.JSON&lt;/A&gt;&lt;SPAN&gt;` using format("json"). You must use 'format("delta")' when reading and writing to a delta table. To learn more about Delta, see &lt;/SPAN&gt;&lt;A class="" href="https://docs.microsoft.com/azure/databricks/delta/index" target="_blank" rel="noopener noreferrer"&gt;https://docs.microsoft.com/azure/databricks/delta/index&lt;/A&gt;&lt;BR /&gt;
&lt;SPAN&gt;Processing Results: Skipped: &lt;/SPAN&gt;&lt;A class="" href="https://awsindividual.blob.core.windows.net/awsdashdata1/dash.test/Sync-45/Feed/Company/1570546/CompanyContacts.JSON" target="_blank" rel="noopener noreferrer"&gt;https://awsindividual.blob.core.windows.net/awsdashdata1/dash.test/Sync-45/Feed/Company/1570546/CompanyContacts.JSON&lt;/A&gt;&lt;BR /&gt;
&lt;SPAN&gt;Processing Completed!&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 16 Dec 2024 14:21:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/rdd-parallelism-without-delta-log/m-p/102262#M41042</guid>
      <dc:creator>Wildabeast</dc:creator>
      <dc:date>2024-12-16T14:21:40Z</dc:date>
    </item>
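    <!--
On where the `_delta_log` is: the error message itself names it, and it sits inside `CompanyContacts.JSON`, which suggests that, at least as Spark sees it, that path is also a folder holding a Delta table rather than only a single JSON blob. A small sketch that pulls the location out of the message so the script can log it next to each skipped file, assuming the wording stays as shown above ("A transaction log for Delta was found at" followed by a backtick-quoted path). `delta_log_location` and `table_root` are hypothetical helper names.

```python
import re

# Assumes the DELTA_INVALID_FORMAT wording shown in the log above.
_DELTA_LOG_AT = re.compile(r"A transaction log for Delta was found at `([^`]+)`")


def delta_log_location(message):
    """Return the _delta_log path named in the error message, or None if absent."""
    match = _DELTA_LOG_AT.search(message)
    return match.group(1) if match else None


def table_root(delta_log_path):
    """The Delta table lives in the folder that contains _delta_log."""
    return delta_log_path.rstrip("/").rsplit("/_delta_log", 1)[0]
```

With the folder from `table_root`, running DESCRIBE HISTORY on it (for example `spark.sql(f"DESCRIBE HISTORY delta.`{root}`")`) should list the timestamp, user, and operation of each write, which may point at the step that created the table.
    -->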
    <item>
      <title>Re: RDD Parallelism without delta_log</title>
      <link>https://community.databricks.com/t5/data-engineering/rdd-parallelism-without-delta-log/m-p/102267#M41046</link>
      <description>&lt;P&gt;This is impossible; there's no _delta_log there:&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Failed to process raw JSON: &lt;A href="https://awsindividual.blob.core.windows.net/awsdashdata1/dash.test/Sync-3/Feed/Company/1014148/CompanyContacts.JSON" target="_blank"&gt;https://awsindividual.blob.core.windows.net/awsdashdata1/dash.test/Sync-3/Feed/Company/1014148/CompanyContacts.JSON&lt;/A&gt; - [DELTA_INVALID_FORMAT] Incompatible format detected. A transaction log for Delta was found at `&lt;A href="https://awsindividual.blob.core.windows.net/awsdashdata1/dash.test/Sync-3/Feed/Company/1014148/CompanyContacts.JSON/_delta_log" target="_blank"&gt;https://awsindividual.blob.core.windows.net/awsdashdata1/dash.test/Sync-3/Feed/Company/1014148/CompanyContacts.JSON/_delta_log&lt;/A&gt;`, but you are trying to read from `&lt;A href="https://awsindividual.blob.core.windows.net/awsdashdata1/dash.test/Sync-3/Feed/Company/1014148/CompanyContacts.JSON" target="_blank"&gt;https://awsindividual.blob.core.windows.net/awsdashdata1/dash.test/Sync-3/Feed/Company/1014148/CompanyContacts.JSON&lt;/A&gt;` using format("text"). You must use 'format("delta")' when reading and writing to a delta table.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 16 Dec 2024 15:07:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/rdd-parallelism-without-delta-log/m-p/102267#M41046</guid>
      <dc:creator>Wildabeast</dc:creator>
      <dc:date>2024-12-16T15:07:41Z</dc:date>
    </item>
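    <!--
The same check fires for format("text") here and for format("json") in the earlier log, so it does not seem to depend on the reader. A sketch for probing one path to see what is actually stored there, assuming the listing function behaves like `dbutils.fs.ls` (a plain file lists as a single entry for itself, a folder lists its children). `classify_path`, `Entry`, and `FAKE` are hypothetical; on a cluster you would call `classify_path(path, dbutils.fs.ls)` with the path from the error.

```python
from collections import namedtuple


def classify_path(path, ls):
    """Return 'file', 'delta table', or 'directory' for path, based on its listing."""
    entries = ls(path)
    target = path.rstrip("/")
    if len(entries) == 1 and entries[0].path.rstrip("/") == target:
        return "file"  # a plain blob lists as itself
    names = {e.path.rstrip("/").rsplit("/", 1)[-1] for e in entries}
    return "delta table" if "_delta_log" in names else "directory"


# Hypothetical stand-in listings (only .path is used).
Entry = namedtuple("Entry", ["path"])
FAKE = {
    "c/1014148/Company.JSON": [Entry("c/1014148/Company.JSON")],
    "c/1014148/CompanyContacts.JSON": [
        Entry("c/1014148/CompanyContacts.JSON/_delta_log/"),
        Entry("c/1014148/CompanyContacts.JSON/part-00000.snappy.parquet"),
    ],
}

results = {p: classify_path(p, FAKE.__getitem__) for p in FAKE}
```

If the path comes back as a Delta table, the blob view and Spark are likely looking at different things under the same name, and the folder can be inspected or removed before the JSON is read again.
    -->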
    <item>
      <title>Re: RDD Parallelism without delta_log</title>
      <link>https://community.databricks.com/t5/data-engineering/rdd-parallelism-without-delta-log/m-p/102295#M41057</link>
      <description>&lt;P&gt;Could it be that our cluster doesn't have the Delta Lake libraries loaded?&lt;/P&gt;&lt;P&gt;Maven JAR coordinates: io.delta:delta-core_2.12:2.4.0&lt;/P&gt;</description>
      <pubDate>Mon, 16 Dec 2024 19:47:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/rdd-parallelism-without-delta-log/m-p/102295#M41057</guid>
      <dc:creator>Wildabeast</dc:creator>
      <dc:date>2024-12-16T19:47:43Z</dc:date>
    </item>
  </channel>
</rss>

