Databricks Community

elementalM · ‎09-13-2022

I'm wondering if you can help me with a Google auth issue related to Structured Streaming and long-running Databricks jobs in general. I get this error after the job has been running for 8+ hours. Any tips on GCP auth issues for long-running jobs?

Caused by: java.net.UnknownHostException: oauth2.googleapis.com

at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)

at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)

at java.net.Socket.connect(Socket.java:607)

at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:288)

at sun.net.NetworkClient.doConnect(NetworkClient.java:175)

at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)

at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)

at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:264)

at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:367)

at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:203)

at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1162)

at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1056)

at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:189)

at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1340)

at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1315)

at sun.net.www.protocol.https.HttpsURLConnectionImpl.getOutputStream(HttpsURLConnectionImpl.java:264)

at shaded.databricks.com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:113)

at shaded.databricks.com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:84)

at shaded.databricks.com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1012)

at shaded.databricks.com.google.api.client.auth.oauth2.TokenRequest.executeUnparsed(TokenRequest.java:322)

at shaded.databricks.com.google.api.client.auth.oauth2.TokenRequest.execute(TokenRequest.java:346)

at shaded.databricks.com.google.cloud.hadoop.util.CredentialFactory$GoogleCredentialWithRetry.executeRefreshToken(CredentialFactory.java:170)

at shaded.databricks.com.google.api.client.auth.oauth2.Credential.refreshToken(Credential.java:494)

at shaded.databricks.com.google.api.client.auth.oauth2.Credential.intercept(Credential.java:217)

at shaded.databricks.com.google.api.client.http.HttpRequest.execute(HttpRequest.java:880)

at shaded.databricks.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:514)

at shaded.databricks.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:455)

at shaded.databricks.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:565)

at shaded.databricks.com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getObject(GoogleCloudStorageImpl.java:2038)

... 49 more

Driver stacktrace:

at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:3029)

at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2976)

at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2970)

at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)

at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)

at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)

at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2970)

at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1390)

at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1390)

at scala.Option.foreach(Option.scala:407)

at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1390)

at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3238)

at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3179)

at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3167)

at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)

at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:1152)

at org.apache.spark.SparkContext.runJobInternal(SparkContext.scala:2651)

at org.apache.spark.SparkContext.runJob(SparkContext.scala:2634)

at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:325)

... 91 more

Caused by: com.databricks.sql.io.FileReadException: Error while reading file gs://em-blue-data/em-core-data/events/message_date=2022-09-13/part-00003-3f2affa0-0bd4-4e91-ab34-f22c57a2982b.c000.snappy.parquet.

at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.logFileNameAndThrow(FileScanRDD.scala:521)

at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.getNext(FileScanRDD.scala:494)

User16741082858 · ‎09-15-2022

Hi @Dwight Branscombe I am wondering... are you using notebook workflows to stream your jobs? If so, take a look at this document here.

Debayan · ‎09-14-2022

Hi, this could be an issue with OAuth2. Could you please check whether these steps were followed? https://developers.google.com/identity/protocols/oauth2/web-server
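The trace above also shows a java.net.UnknownHostException for oauth2.googleapis.com raised while the credential was being refreshed (Credential.refreshToken), so it may be worth ruling out a DNS lookup failure on the cluster before changing any auth settings. Below is a minimal sketch for checking name resolution from a notebook, assuming Python is available on the driver. The helper name resolve_with_retry and its parameters are hypothetical, not part of any Databricks or Google API.

```python
import socket
import time


def resolve_with_retry(host, attempts=3, base_delay=1.0,
                       resolver=socket.getaddrinfo, sleep=time.sleep):
    """Resolve `host` to a sorted list of IP addresses, retrying on DNS errors.

    `resolver` and `sleep` are injectable so the helper can be exercised
    without touching the network (hypothetical design, for illustration).
    """
    last_error = None
    for attempt in range(attempts):
        try:
            # getaddrinfo returns 5-tuples; the last item is the socket
            # address, whose first element is the IP address string.
            infos = resolver(host, 443)
            return sorted({info[4][0] for info in infos})
        except socket.gaierror as err:
            last_error = err
            # Back off exponentially between attempts: 1s, 2s, 4s, ...
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise last_error
```

Calling resolve_with_retry("oauth2.googleapis.com") in a notebook cell on the affected cluster should return one or more addresses. If it raises socket.gaierror, that points at DNS or network configuration rather than at the service account credentials.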

elementalM · ‎09-15-2022

No, not at all. I just followed something along these lines: https://docs.gcp.databricks.com/data/data-sources/google/gcs.html.

It's not clear to me how to use this for Structured Streaming applications, given that the article you reference is geared toward web applications.

Can you elaborate?

Anonymous · ‎09-27-2022

Hi @Dwight Branscombe

Hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!