Saturday
I've set up my script to run on a multi-node cluster, but I'm running into an issue when iterating over a list of .json files to sink to a SQL table via a JDBC driver. The main error message says that a delta_log file (that I can't see in my blob container) is causing my original file path list to break. Where is this delta_log file, and how do I avoid this failure?
Saturday
Hi @Wildabeast,
Is your table a Delta table? And are you getting any failures?
Sunday
No, it's an ssms dbo table.
Monday
And what is the error that you are getting?
Can you try one test? Filter out the _delta_log directory when listing the files. Here is an example of how you can do this in Python:
# dbutils is available by default in Databricks notebooks, so no import is needed
# List the entries directly under the directory (dbutils.fs.ls is not recursive)
all_files = dbutils.fs.ls("path/to/your/directory")
# Filter out the _delta_log directory
json_files = [file.path for file in all_files if "_delta_log" not in file.path]
# Now you can iterate over json_files
for file_path in json_files:
    # Your code to process each JSON file goes here
    print(file_path)
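If the JSON files sit in nested folders, a single listing will not reach them. Here is a minimal sketch of a recursive version, not a definitive implementation. The helper name list_json_files is hypothetical, and it assumes the lister you pass in behaves like dbutils.fs.ls (entries with .path and .name, where directory names end with "/").

```python
from typing import Callable, Iterable, List


def list_json_files(root: str, ls: Callable[[str], Iterable]) -> List[str]:
    """Recursively collect .json paths under root, skipping _delta_log directories.

    Assumption: `ls` behaves like dbutils.fs.ls, returning entries that have
    `.path` and `.name`, with directory names ending in "/".
    """
    found = []
    stack = [root]
    while stack:
        current = stack.pop()
        for entry in ls(current):
            if entry.name.endswith("/"):
                # Descend into every directory except the Delta transaction log
                if entry.name.rstrip("/") != "_delta_log":
                    stack.append(entry.path)
            elif entry.name.lower().endswith(".json"):
                found.append(entry.path)
    return sorted(found)
```

In a notebook this would be called as list_json_files("path/to/your/directory", dbutils.fs.ls). Taking the lister as a parameter is only a design choice so the function can be tried outside Databricks with a fake listing.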
Monday
I actually submitted the error and my .py file....for some reason it didn't post yesterday. Let me pull it while I'm running your filter suggestion.
Monday - last edited Monday
It's in Azure; we just named it awsindividual because it's a migration.
Error:
Processing file: https://awsindividual.blob.core.windows.net/awsdashdata1/dash.test/Sync-45/Feed/Company/1570546/Comp... File name: CompanyContacts.JSON, Table: dbo.CompanyContacts, Schema: StructType([StructField('Contacts', ArrayType(StructType([StructField('Address', StructType([StructField('AddressLine1', StringType(), True), StructField('AddressLine2', StringType(), True), StructField('City', StringType(), True), StructField('StateProvince', StringType(), True), StructField('Country', StringType(), True), StructField('PostalCode', StringType(), True), StructField('County', StringType(), True)]), True), StructField('BillingAddress', StructType([StructField('AddressLine1', StringType(), True), StructField('AddressLine2', StringType(), True), StructField('City', StringType(), True), StructField('StateProvince', StringType(), True), StructField('Country', StringType(), True), StructField('PostalCode', StringType(), True), StructField('County', StringType(), True)]), True), StructField('ContactID', IntegerType(), True), StructField('CorrespondenceEmail', StringType(), True), StructField('InquiryEmail', StringType(), True), StructField('MailingAddress', StructType([StructField('AddressLine1', StringType(), True), StructField('AddressLine2', StringType(), True), StructField('City', StringType(), True), StructField('StateProvince', StringType(), True), StructField('Country', StringType(), True), StructField('PostalCode', StringType(), True), StructField('County', StringType(), True)]), True), StructField('MainPhone', StructType([StructField('Number', StringType(), True), StructField('Extension', StringType(), True)]), True), StructField('OtherPhones', ArrayType(StringType(), True), True), StructField('Website', StringType(), True)]), True), True), StructField('CompanyID', IntegerType(), True)]) Error reading file https://awsindividual.blob.core.windows.net/awsdashdata1/dash.test/Sync-45/Feed/Company/1570546/Comp...: [DELTA_INVALID_FORMAT] Incompatible format detected. 
A transaction log for Delta was found at `https://awsindividual.blob.core.windows.net/awsdashdata1/dash.test/Sync-45/Feed/Company/1570546/Comp..., but you are trying to read from `https://awsindividual.blob.core.windows.net/awsdashdata1/dash.test/Sync-45/Feed/Company/1570546/Comp... using format("json"). You must use 'format("delta")' when reading and writing to a delta table. To learn more about Delta, see https://docs.microsoft.com/azure/databricks/delta/index Processing Results: Skipped: https://awsindividual.blob.core.windows.net/awsdashdata1/dash.test/Sync-45/Feed/Company/1570546/Comp... Processing Completed!
Monday
This is impossible, there's no delta_log:
Failed to process raw JSON: https://awsindividual.blob.core.windows.net/awsdashdata1/dash.test/Sync-3/Feed/Company/1014148/Compa... - [DELTA_INVALID_FORMAT] Incompatible format detected. A transaction log for Delta was found at `https://awsindividual.blob.core.windows.net/awsdashdata1/dash.test/Sync-3/Feed/Company/1014148/Compa...`, but you are trying to read from `https://awsindividual.blob.core.windows.net/awsdashdata1/dash.test/Sync-3/Feed/Company/1014148/Compa...` using format("text"). You must use 'format("delta")' when reading and writing to a delta table.
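One thing worth checking is whether a _delta_log directory exists in a parent folder rather than next to the file itself, since the error reports where a transaction log "was found". Below is a rough sketch for walking up from a file path and looking for it. The name find_delta_log and the stop_at parameter are hypothetical, and it assumes the paths use a scheme the lister can read (for example a mount or abfss path rather than the https URL shown above) and that the lister behaves like dbutils.fs.ls.

```python
from typing import Callable, Iterable, Optional


def find_delta_log(file_path: str,
                   ls: Callable[[str], Iterable],
                   stop_at: str) -> Optional[str]:
    """Return the nearest ancestor directory of file_path that contains _delta_log.

    Assumption: `ls` behaves like dbutils.fs.ls (entries expose `.name`,
    directory names end with "/"). The search stops at `stop_at`.
    """
    stop_at = stop_at.rstrip("/")
    # Start at the directory that holds the file
    current = file_path.rstrip("/").rsplit("/", 1)[0]
    while current == stop_at or current.startswith(stop_at + "/"):
        names = {entry.name.rstrip("/") for entry in ls(current)}
        if "_delta_log" in names:
            return current
        if current == stop_at:
            break
        # Move one level up
        current = current.rsplit("/", 1)[0]
    return None
```

In a notebook this could be called as find_delta_log(file_path, dbutils.fs.ls, container_root), where container_root is whatever top-level folder you want the search to stop at. If it returns a folder, that is the location Delta is treating as a table root.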
Monday
Could it be that our cluster doesn't have the delta lake libraries loaded?
maven JAR coordinates:
Maven: io.delta:delta-core_2.12:2.4.0