Databricks Notebook dataframe loading duplicate data in SQL table

Priya_Mani
New Contributor II

Hi, I am trying to load data from a data lake into a SQL table using a "SourceDataFrame.write" operation in a notebook using Apache Spark.

This seems to be loading duplicates at random times. The logs don't give much information, and I am not sure what else to look for. How can I investigate and find the root cause of this? Please let me know what more information I can provide for anyone to help.

Thanks!

4 REPLIES

-werners-
Esteemed Contributor III

Can you elaborate a bit more on this notebook?

And also, which Databricks runtime version?

Hi @Werner Stinckens​, this is an Apache Spark notebook which reads the contents of a file stored in Azure Blob Storage and loads it into an on-prem SQL table.

The Databricks Runtime is 9.1 LTS (includes Apache Spark 3.1.2, Scala 2.12) with a Standard_DS3_v2 worker/driver node type.

The notebook reads the file content using the code below:

val SourceDataFrame = spark
  .read
  .option("header", "false")
  .option("delimiter", "|")
  .schema(SourceSchemaStruct)
  .csv(SourceFilename)
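A first step in investigating could be to check whether the duplicates already exist in the source file before the write runs. This is a minimal sketch, assuming the SourceDataFrame above; "KeyColumn" is a hypothetical name for whatever column(s) should be unique in the target table:

```scala
import org.apache.spark.sql.functions.col

// Sketch: compare total vs. distinct row counts on the assumed key.
// "KeyColumn" is hypothetical -- substitute the real business key.
val totalRows    = SourceDataFrame.count()
val distinctRows = SourceDataFrame.dropDuplicates("KeyColumn").count()

if (totalRows != distinctRows) {
  // Show which keys occur more than once and how often
  SourceDataFrame
    .groupBy("KeyColumn")
    .count()
    .filter(col("count") > 1)
    .show(20, truncate = false)
}
```

If the counts match here but the target table still ends up with duplicates, the problem is more likely on the write side (retries, concurrent runs) than in the source data.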

Then it writes the dataframe into the table in overwrite mode:

SourceDataFrame2
      .write
      .format("jdbc")
      .mode("overwrite")
      .option("driver", driverClass)
      .option("url", jdbcUrl)
      .option("dbtable", TargetTable)
      .option("user", jdbcUsername)
      .option("password", jdbcPassword)
      .save()
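One possible explanation for intermittent duplicates with this pattern (an assumption, not confirmed from the logs): in overwrite mode the JDBC writer drops and recreates the table by default, and each partition inserts its rows in a separate task, so a retried task or a second concurrent run of the notebook can insert the same rows twice. A more defensive sketch of the same write, reusing the connection variables from above:

```scala
import org.apache.spark.sql.SaveMode

SourceDataFrame2
  .dropDuplicates()               // defensive: drop exact duplicate rows before writing
  .coalesce(1)                    // single partition => single insert task (slower, but avoids interleaved partial retries)
  .write
  .format("jdbc")
  .mode(SaveMode.Overwrite)
  .option("driver", driverClass)
  .option("url", jdbcUrl)
  .option("dbtable", TargetTable)
  .option("user", jdbcUsername)
  .option("password", jdbcPassword)
  .option("truncate", "true")     // TRUNCATE the table instead of DROP/CREATE, preserving indexes/constraints
  .save()
```

Whether this actually removes the duplicates depends on the real root cause; checking the job run history for retried or overlapping runs of this notebook would help confirm it.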


Kaniz
Community Manager

Hi @Priya Mani​, we haven't heard from you since the last response from @Werner Stinckens​, and I was checking back to see if you have a resolution yet.

If you have any solution, please share it with the community as it can be helpful to others. Otherwise, we will respond with more details and try to help.

Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.
