Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

JDBC RDBMS Table Overwrite Transaction Incomplete

diegohMoodys
New Contributor

Spark version:  spark-3.4.1-bin-hadoop3

JDBC Driver: mysql-connector-j-8.4.0.jar

Assumptions:

  • I have all the required read/write permissions
  • the dataset isn't large: ~2 million records
  • the job reads flat files and writes to a database
  • it never reads from the database

I run this write under nohup:
```

out.show()  # inspect the computed result

print("Writing to database")

jdbc_url = f"jdbc:mysql://{hostname}/{database}?rewriteBatchedStatements=true"
properties = {
    "user": ****,
    "password": ****,
    # Connector/J 8.x class name; "com.mysql.jdbc.Driver" is the legacy alias
    "driver": "com.mysql.cj.jdbc.Driver",
}
out.write.jdbc(url=jdbc_url, table="story_count_by_entity_id",
               mode="overwrite", properties=properties)
print("DONE")

```
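One thing worth knowing here: with `mode="overwrite"`, Spark's JDBC writer drops and recreates the target table by default; the `truncate` option keeps the existing DDL and truncates instead. A minimal sketch of the same write in `format("jdbc")` style (`hostname`, `database`, and the DataFrame `out` are names from the post; the `truncate` and `batchsize` values are assumptions, not a known fix):

```python
# Sketch under the post's assumptions; the Spark write itself is commented
# out because it needs a live SparkSession and the `out` DataFrame.
def build_jdbc_url(hostname: str, database: str) -> str:
    # rewriteBatchedStatements speeds up batched INSERTs on MySQL,
    # but it also makes per-batch failures harder to spot in the logs.
    return f"jdbc:mysql://{hostname}/{database}?rewriteBatchedStatements=true"

# (out.write.format("jdbc")
#     .option("url", build_jdbc_url(hostname, database))
#     .option("dbtable", "story_count_by_entity_id")
#     .option("driver", "com.mysql.cj.jdbc.Driver")  # Connector/J 8.x class
#     .option("truncate", "true")   # truncate instead of drop/recreate
#     .option("batchsize", 10000)   # assumed value; default is 1000
#     .mode("overwrite")
#     .save())
```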
The write never fully completes, yet the console output suggests the job succeeded:
```
|5011422399|2025-01-16 00:00:00|         1|         2|         5|         16|         48|         224|         562|   9|
|6001131375|2025-01-16 00:00:00|       353|       629|      2163|       9314|      18256|       23679|       53813|5229|
|2707170344|2025-01-16 00:00:00|        11|        23|        48|        293|       1728|        3113|        4169| 106|
|2838891055|2025-01-16 00:00:00|         3|         3|        29|         78|         98|         167|         350|  54|
|3784123049|2025-01-16 00:00:00|         6|        10|       113|        238|        472|        1076|        3119| 163|
+----------+-------------------+----------+----------+----------+-----------+-----------+------------+------------+----+
only showing top 20 rows

Writing to database
DONE
```
But the database shows incomplete data:

(attached screenshot: diegohMoodys_0-1737041259601.png)


Is there some way of raising an error, or of checking whether the Spark transaction failed? Any other suggestions on how to approach this?
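For what it's worth, `out.write.jdbc` only raises if the driver surfaces an error, so a partially applied overwrite can still reach `print("DONE")`. One cheap end-to-end check is to read the row count back through the same JDBC connection and compare it against the source count. `check_rowcount` below is a hypothetical helper, not part of any API; the commented usage reuses names from the post:

```python
def check_rowcount(expected: int, actual: int, table: str) -> None:
    """Raise if the written table does not contain the expected number of rows."""
    if expected != actual:
        raise RuntimeError(
            f"write to {table} incomplete: expected {expected} rows, found {actual}"
        )

# Usage with a live SparkSession (names taken from the post):
# expected = out.count()
# written = spark.read.jdbc(url=jdbc_url, table="story_count_by_entity_id",
#                           properties=properties).count()
# check_rowcount(expected, written, "story_count_by_entity_id")
```

This turns a silent partial write into a hard failure the nohup log will actually show.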

1 REPLY

Alberto_Umana
Databricks Employee

Hi @diegohMoodys,

Can you try running it in debug mode?

spark.sparkContext.setLogLevel("DEBUG")
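DEBUG logging on the whole job is extremely verbose, so one option is to raise the level only around the write and restore it afterwards. `log_level` is a hypothetical helper, not a Spark API; since `setLogLevel` has no matching getter, the caller supplies the level to restore:

```python
from contextlib import contextmanager

@contextmanager
def log_level(sc, level: str, restore: str = "WARN"):
    """Temporarily set the Spark log level, then restore it."""
    sc.setLogLevel(level)
    try:
        yield
    finally:
        sc.setLogLevel(restore)

# Usage with a live SparkSession (names taken from the post):
# with log_level(spark.sparkContext, "DEBUG"):
#     out.write.jdbc(url=jdbc_url, table="story_count_by_entity_id",
#                    mode="overwrite", properties=properties)
```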
