
RDD Parallelism without delta_log

Wildabeast
New Contributor

I've set up my script to run on a multi-node cluster, but I'm running into an issue when iterating over a list of .json files to write to a SQL table via a JDBC driver. The main error says that a _delta_log file (which I can't see in my blob container) is breaking my original file path list. Where is this _delta_log file, and how do I avoid this failure?

7 REPLIES

Alberto_Umana
Databricks Employee

Hi @Wildabeast,

Is your table a Delta table? And are you getting any failures?

No, it's a SQL Server dbo table (managed through SSMS).

Alberto_Umana
Databricks Employee

And what is the error that you are getting?

Can you run one test? Try filtering out the _delta_log directory when listing the files. Here is an example of how you can do this in Python:

# List all entries in the directory (dbutils is provided by the Databricks notebook)
all_files = dbutils.fs.ls("path/to/your/directory")

# Filter out the _delta_log directory
json_files = [file.path for file in all_files if "_delta_log" not in file.path]

# Now you can iterate over json_files
for file_path in json_files:
    # Replace this line with your code to process each JSON file
    print(f"Processing file: {file_path}")
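
A slightly stricter variant of the same filter, as a sketch only: it matches `_delta_log` as a whole path segment instead of a substring, and it also keeps only files ending in `.json` in any letter case (the file in this thread is named `CompanyContacts.JSON`). The function names `is_under_delta_log` and `select_json_files` are made up for illustration, and the input is assumed to be a plain list of path strings such as the `file.path` values returned by `dbutils.fs.ls`.

```python
def is_under_delta_log(path: str) -> bool:
    """Return True if any segment of the path is exactly '_delta_log'."""
    return "_delta_log" in path.split("/")


def select_json_files(paths):
    """Keep .json files (any letter case) that are not inside a _delta_log directory."""
    return [
        p for p in paths
        if not is_under_delta_log(p) and p.lower().endswith(".json")
    ]


# Hypothetical usage in a Databricks notebook:
# json_files = select_json_files(
#     [f.path for f in dbutils.fs.ls("path/to/your/directory")]
# )
```

Because the filtering is separated from `dbutils`, it can be checked locally with a hand-written list of paths before running it on the cluster.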

I actually submitted the error and my .py file, but for some reason it didn't post yesterday. Let me pull it up while I'm running your filter suggestion.

It's in Azure; we just named it awsindividual because it's a migration.

Error:

Processing file: https://awsindividual.blob.core.windows.net/awsdashdata1/dash.test/Sync-45/Feed/Company/1570546/Comp...
File name: CompanyContacts.JSON, Table: dbo.CompanyContacts, Schema: StructType([StructField('Contacts', ArrayType(StructType([StructField('Address', StructType([StructField('AddressLine1', StringType(), True), StructField('AddressLine2', StringType(), True), StructField('City', StringType(), True), StructField('StateProvince', StringType(), True), StructField('Country', StringType(), True), StructField('PostalCode', StringType(), True), StructField('County', StringType(), True)]), True), StructField('BillingAddress', StructType([StructField('AddressLine1', StringType(), True), StructField('AddressLine2', StringType(), True), StructField('City', StringType(), True), StructField('StateProvince', StringType(), True), StructField('Country', StringType(), True), StructField('PostalCode', StringType(), True), StructField('County', StringType(), True)]), True), StructField('ContactID', IntegerType(), True), StructField('CorrespondenceEmail', StringType(), True), StructField('InquiryEmail', StringType(), True), StructField('MailingAddress', StructType([StructField('AddressLine1', StringType(), True), StructField('AddressLine2', StringType(), True), StructField('City', StringType(), True), StructField('StateProvince', StringType(), True), StructField('Country', StringType(), True), StructField('PostalCode', StringType(), True), StructField('County', StringType(), True)]), True), StructField('MainPhone', StructType([StructField('Number', StringType(), True), StructField('Extension', StringType(), True)]), True), StructField('OtherPhones', ArrayType(StringType(), True), True), StructField('Website', StringType(), True)]), True), True), StructField('CompanyID', IntegerType(), True)])
Error reading file https://awsindividual.blob.core.windows.net/awsdashdata1/dash.test/Sync-45/Feed/Company/1570546/Comp...: [DELTA_INVALID_FORMAT] Incompatible format detected. A transaction log for Delta was found at `https://awsindividual.blob.core.windows.net/awsdashdata1/dash.test/Sync-45/Feed/Company/1570546/Comp...`, but you are trying to read from `https://awsindividual.blob.core.windows.net/awsdashdata1/dash.test/Sync-45/Feed/Company/1570546/Comp...` using format("json"). You must use 'format("delta")' when reading and writing to a delta table. To learn more about Delta, see https://docs.microsoft.com/azure/databricks/delta/index
Processing Results:
Skipped: https://awsindividual.blob.core.windows.net/awsdashdata1/dash.test/Sync-45/Feed/Company/1570546/Comp...
Processing Completed!

Wildabeast
New Contributor

This is impossible, there's no delta_log:

Failed to process raw JSON: https://awsindividual.blob.core.windows.net/awsdashdata1/dash.test/Sync-3/Feed/Company/1014148/Compa... - [DELTA_INVALID_FORMAT] Incompatible format detected. A transaction log for Delta was found at `https://awsindividual.blob.core.windows.net/awsdashdata1/dash.test/Sync-3/Feed/Company/1014148/Compa...`, but you are trying to read from `https://awsindividual.blob.core.windows.net/awsdashdata1/dash.test/Sync-3/Feed/Company/1014148/Compa...` using format("text"). You must use 'format("delta")' when reading and writing to a delta table.
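
One thing that may be worth ruling out, offered as an assumption and not a confirmed diagnosis: the Delta format check may be triggered by a `_delta_log` directory in a parent folder of the file, not only in the folder that holds the file itself. The sketch below lists every `<dir>/_delta_log` location between the file's folder and a chosen root, then reports which ones exist. The names `candidate_delta_log_dirs` and `find_delta_logs` are hypothetical, and `exists` is whatever existence check is available in your environment.

```python
def candidate_delta_log_dirs(file_path: str, root: str) -> list:
    """List '<dir>/_delta_log' for each directory from the file's folder up to root."""
    root = root.rstrip("/")
    if not file_path.startswith(root + "/"):
        raise ValueError("file_path must be located under root")
    current = file_path.rsplit("/", 1)[0]  # directory that contains the file
    candidates = []
    while True:
        candidates.append(current + "/_delta_log")
        if current == root:
            break
        current = current.rsplit("/", 1)[0]  # move one level up
    return candidates


def find_delta_logs(file_path: str, root: str, exists) -> list:
    """Return the candidate _delta_log directories for which exists(path) is True."""
    return [d for d in candidate_delta_log_dirs(file_path, root) if exists(d)]


# Hypothetical existence check for a Databricks notebook (assumes dbutils is defined
# and that dbutils.fs.ls raises an exception for a missing path):
# def exists(path):
#     try:
#         dbutils.fs.ls(path)
#         return True
#     except Exception:
#         return False
```

If this reports a `_delta_log` higher up the tree, that would explain why the directory is not visible next to the JSON files.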

Wildabeast
New Contributor

Could it be that our cluster doesn't have the Delta Lake libraries loaded?

Maven JAR coordinates:

io.delta:delta-core_2.12:2.4.0
