cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

error on connecting to Snowflake

marchello
New Contributor III

Hi team,

I'm getting weird error in one of my jobs when connecting to Snowflake. All my other jobs (I've got plenty) work fine.

The current one also works fine when I have only one coding step (except installing needed libraries in my very first step).

The error appears when I have more steps in the same job:

1) install needed libraries (this works fine in both cases)

pip install numpy xgboost>=1.3.2 category-encoders feature-engine==0.6.1 snowflake.sqlalchemy eventlet pyarrow==0.17.0 asn1crypto==1.4.0 auditwheel==3.1.1 azure-common==1.1.25 azure-core==1.8.2 azure-storage-blob==12.5.0 boto3==1.15.18 botocore==1.18.18 certifi==2020.4.5.1 cffi==1.14.3 chardet==3.0.4 cryptography==2.9.2 idna==2.10 isodate==0.6.0 jmespath==0.10.0 msrest==0.6.19 oauthlib==3.1.0 oscrypto pycparser==2.20 pycryptodomex==3.9.8 pyelftools==0.26 PyJWT==1.7.1 pyOpenSSL==19.1.0 python-dateutil==2.8.1 pytz==2020.1 requests==2.23.0 requests-oauthlib==1.3.0 s3transfer==0.3.3 six==1.15.0 urllib3==1.25.11 snowflake-connector-python==2.3.10

(my guess is that I can install those libraries once in configuration so that I don't waste time on each job run, but this is rather secondary question here)

2) fetch data from postgres into pandas dataframe

#!/usr/bin/env python3
import psycopg2 as pg
import pandas.io.sql as psql
pass1 = dbutils.secrets.get("monolith", "prod_pass")
conn1 = pg.connect("host=host1 dbname=db1 user=user1 password=" + pass1)
query1 = """ select column1, column2 from table1 """
 
df1 = psql.read_sql(query1, conn1)
 
print("df1: ")
print(df1)

(works fine in both cases)

3) store pandas dataframe into Snowflake table using python connector

from sqlalchemy import create_engine
from snowflake.sqlalchemy import URL
 
engine = create_engine(URL(
    account = 'my_account_name',
    user=dbutils.secrets.get("snowflake", "snowflakeusr"),
    password=dbutils.secrets.get("snowflake", "snowflakepwd"),
    database = 'database1',
    schema = 'schema1',
    warehouse = 'warehouse1',
    role='role1',
    login_timeout=120,
    network_timeout=120
))
 
df1.to_sql('tmp_df1', con=engine, index=False, if_exists='replace')

(also works fine in both cases)

4) loop through snowflake table - this step fails if I have (2) and (3) in the same job, but works fine if (2) and (3) steps are absent

import requests
import eventlet
import time
import snowflake.connector
import json
time.sleep(2)
eventlet.monkey_patch()
 
try:
  ctx = snowflake.connector.connect(
      user=dbutils.secrets.get("snowflake", "snowflakeusr"),
      password=dbutils.secrets.get("snowflake", "snowflakepwd"),
      account="account1",
      warehouse="warehouse1",
      database="database1",
      schema="schema1",
      role="role1",
      login_timeout=120,
      network_timeout=120
      )
  cs = ctx.cursor()
  ...
except Exception as error: 
    raise Exception("Error fetching data from Snowflake: " + str(error))
finally:
    cs.close()
    ctx.close()

it fails on line 10 with error:

    487             try:
--> 488                 cnx.do_handshake()
    489             except OpenSSL.SSL.WantReadError:
...
---> 98     poll_obj = select.poll()
AttributeError: module 'select' has no attribute 'poll'

Unfortunately I can't post full error stack because your forum complains that my post is too long.

Please see full details in pastebin:

https://pastebin.com/VKwZ1ek4

As a workaround, I just run those in two separate jobs:

  • (1), (2), (3) steps
  • (1), (4) steps

But I hope that I can combine those with your help.

Please advise.

1 ACCEPTED SOLUTION

Accepted Solutions

Dan_Z
Honored Contributor
Honored Contributor

@marchello​ I suggest you contact Snowflake to move forward on this one.

View solution in original post

9 REPLIES 9

Kaniz
Community Manager
Community Manager

Hi @ marchello ! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the community have an answer to your question first. Or else I will follow up shortly with a response.

Dan_Z
Honored Contributor
Honored Contributor

Looking at the stack trace, this actually seems like an issue with Snowflake's connector. Or maybe there's a namespace conflict with 'select'? You should see if a different version of the snowflake connector works or contact Snowflake.

marchello
New Contributor III

Will try v2.5.1 and will update.

Anonymous
Not applicable

@marchello​ - How did it go?

marchello
New Contributor III

Hi @Piper Wilson​, thanks for asking.

Upgraded Snowflake connector to v2.5.1 in both of my jobs, but didn't try to combine them yet actually. This is TBD. The process is still in DEV state, so it's even more convenient to schedule them separately as I can set one of parts to run more often. Eventually I will need to combine them, but it's not urgent yet.

Anonymous
Not applicable

@marchello​  Cool! 😎

marchello
New Contributor III

Just tried to combine my jobs with the latest Snowflake connector version. No, it didn't work. Any other suggestions, please?

Anonymous
Not applicable

@marchello​ - I'm passing this on to the team. 👍

Dan_Z
Honored Contributor
Honored Contributor

@marchello​ I suggest you contact Snowflake to move forward on this one.

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.