topic Re: Job aborted stage failure java.sql.SQLRecoverableException: IO Error: Connection reset by peer in Data Engineering

Job aborted stage failure java.sql.SQLRecoverableException: IO Error: Connection reset by peer

Sudharsan24 — Tue, 23 Jul 2024 09:09:26 GMT

While ingesting data from Oracle to databricks(writing into ADLS) using jdbc I am getting connection reset by peer error when ingesting a large table which has millions of rows.I am using oracle sql developer and azure databricks.

I tried every way like using partition column (lower and upper bounds), predicates and also incremental loading none of them are working, please help me if anyone knows the solution
jdbc(url=conn, table= pQuery, properties={ "user": user, "password": pwd, "driver": driver, "autoReconnect": "true", #"numPartitions": "50", "numPartitions": "20", "partitionColumn": "PARTITION_KEY", "lowerBound": lbound, "upperBound": ubound, #"fetchSize": "90000000" "fetchSize": "900000" } )

Re: Job aborted stage failure java.sql.SQLRecoverableException: IO Error: Connection reset by peer

Rishabh-Pandey — Tue, 23 Jul 2024 09:19:24 GMT

Try using this code .

import pyspark from pyspark.sql import SparkSession # Initialize Spark session spark = SparkSession.builder.appName("OracleToDatabricks").getOrCreate() # Oracle connection properties conn = "jdbc:oracle:thin:@//<host>:<port>/<service_name>" user = "<username>" pwd = "<password>" driver = "oracle.jdbc.driver.OracleDriver" pQuery = "<table_name>" lbound = 1 ubound = 1000000 batch_size = 10000 properties = { "user": user, "password": pwd, "driver": driver, "autoReconnect": "true", "numPartitions": "20", "partitionColumn": "PARTITION_KEY", "lowerBound": lbound, "upperBound": ubound, "fetchSize": "10000" } for i in range(lbound, ubound, batch_size): lower_bound = i upper_bound = min(i + batch_size - 1, ubound) query = f"(SELECT * FROM {pQuery} WHERE PARTITION_KEY >= {lower_bound} AND PARTITION_KEY <= {upper_bound}) AS TEMP" df = spark.read.jdbc(url=conn, table=query, properties=properties) # Process and write the data to ADLS df.write.mode("append").parquet("path/to/adls")

Re: Job aborted stage failure java.sql.SQLRecoverableException: IO Error: Connection reset by peer

Sudharsan24 — Tue, 23 Jul 2024 11:00:21 GMT

Thanks for the reply, its still failing with same issue