PYODBC very slow - 30 minutes to write 6000 rows

turagittech — Fri, 02 Sep 2022 01:32:02 GMT

Along withh several other issues I'm encountering, I am finding pandas dataframe to_sql being very slow

I am writing to an Azure SQL database and performance is woeful. This is a test database and it has S3 100DTU and one user, me as it's configuration. It's quick enough for most other tasks, but terrible for this.

I don't see any isssues in the database monitoring to suggest I've exceeded DTU or something.

This is my connection configuration, with appropriate redactions

connection_string = "Driver={ODBC Driver 18 for SQL Server};" \
                    "Server=azureservername.privatelink.database.windows.net;" \
                    "Database=datawarehousename;TrustServerCertificate=yes;" \
                    "Encrypt=yes;UID=dbuser;PWD=<pwd>"
connection_url = URL.create("mssql+pyodbc", query={"odbc_connect": connection_string})
engine = create_engine(connection_url, fast_executemany=True)

This is my dataframe.to_sql write to database

dfoutDB.to_sql("ResourceLogRpt", schema="reporting", con=engine, if_exists='append', chunksize=10000, index=False)

Changing chunksize isn't making a difference

In between reading data I've manipulated URL data to extract directory elements and parameters

Is there a way to verify the fast_executemany=True is doing anything?

Any other thoughts on diagnosing what is causing this to be slow.?

My cluster is 56 Gb and 8 cores DBR 1.4 with Scala 3.2.1, I would have thought enough as all other data frame manipulation is very fast, but if there's an impact that I need to understand I'd appreciate telling me what I misunderstood

Thanks

Peter

Re: PYODBC very slow - 30 minutes to write 6000 rows

Vidula — Sun, 18 Sep 2022 06:07:31 GMT

Hi @Peter McLarty

Does @Debayan Mukherjee response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly?

We'd love to hear from you.

Thanks!

Re: PYODBC very slow - 30 minutes to write 6000 rows

Debayan — Fri, 02 Sep 2022 20:44:15 GMT

Hi, Thanks for reaching out to community.databricks.com.

Could you please run a profiler trace on the SQL and determine what's the issue going on?

Also, please refer https://stackoverflow.com/questions/29638136/how-to-speed-up-bulk-insert-to-ms-sql-server-using-pyodbc/47057189#47057189 to add fast_executemany feature, which will be "off" by default, this helps to speed up bulk actions.

topic Re: PYODBC very slow - 30 minutes to write 6000 rows in Data Engineering

PYODBC very slow - 30 minutes to write 6000 rows

Re: PYODBC very slow - 30 minutes to write 6000 rows

Re: PYODBC very slow - 30 minutes to write 6000 rows