<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: PYODBC very slow -  30 minutes to write 6000 rows in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/pyodbc-very-slow-30-minutes-to-write-6000-rows/m-p/32755#M23888</link>
    <description>&lt;P&gt;Hi @Peter McLarty​&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Does @Debayan Mukherjee​&amp;nbsp; response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;We'd love to hear from you.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Sun, 18 Sep 2022 06:07:31 GMT</pubDate>
    <dc:creator>Vidula</dc:creator>
    <dc:date>2022-09-18T06:07:31Z</dc:date>
    <item>
      <title>PYODBC very slow -  30 minutes to write 6000 rows</title>
      <link>https://community.databricks.com/t5/data-engineering/pyodbc-very-slow-30-minutes-to-write-6000-rows/m-p/32753#M23886</link>
      <description>&lt;P&gt;Along withh several other issues I'm encountering, I am finding pandas dataframe to_sql being very slow&lt;/P&gt;&lt;P&gt;I am writing to an Azure SQL database and performance is woeful. This is a test database and it has S3 100DTU and one user, me as it's configuration. It's quick enough for most other tasks, but terrible for this.&lt;/P&gt;&lt;P&gt;I don't see any isssues in the database monitoring to suggest I've exceeded DTU or something.&lt;/P&gt;&lt;P&gt;This is my connection configuration, with appropriate redactions &lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;connection_string = "Driver={ODBC Driver 18 for SQL Server};" \
                    "Server=azureservername.privatelink.database.windows.net;" \
                    "Database=datawarehousename;TrustServerCertificate=yes;" \
                    "Encrypt=yes;UID=dbuser;PWD=&amp;lt;pwd&amp;gt;"
connection_url = URL.create("mssql+pyodbc", query={"odbc_connect": connection_string})
engine = create_engine(connection_url, fast_executemany=True)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This is my dataframe.to_sql write to database&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;dfoutDB.to_sql("ResourceLogRpt", schema="reporting", con=engine, if_exists='append', chunksize=10000, index=False)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Changing chunksize isn't making a difference&lt;/P&gt;&lt;P&gt;In between reading data I've manipulated URL data to extract directory elements and parameters&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Is there a way to verify the fast_executemany=True is doing anything?&lt;/P&gt;&lt;P&gt;Any other thoughts on diagnosing what is causing this to be slow.?&lt;/P&gt;&lt;P&gt;My cluster is 56 Gb and 8 cores DBR 1.4 with Scala 3.2.1, I would have thought enough as all other data frame manipulation is very fast, but if there's an impact that I need to understand I'd appreciate telling me what I misunderstood&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Peter&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 02 Sep 2022 01:32:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pyodbc-very-slow-30-minutes-to-write-6000-rows/m-p/32753#M23886</guid>
      <dc:creator>turagittech</dc:creator>
      <dc:date>2022-09-02T01:32:02Z</dc:date>
    </item>
    <item>
      <title>Re: PYODBC very slow -  30 minutes to write 6000 rows</title>
      <link>https://community.databricks.com/t5/data-engineering/pyodbc-very-slow-30-minutes-to-write-6000-rows/m-p/32755#M23888</link>
      <description>&lt;P&gt;Hi @Peter McLarty​&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Does @Debayan Mukherjee​&amp;nbsp; response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;We'd love to hear from you.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 18 Sep 2022 06:07:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pyodbc-very-slow-30-minutes-to-write-6000-rows/m-p/32755#M23888</guid>
      <dc:creator>Vidula</dc:creator>
      <dc:date>2022-09-18T06:07:31Z</dc:date>
    </item>
    <item>
      <title>Re: PYODBC very slow -  30 minutes to write 6000 rows</title>
      <link>https://community.databricks.com/t5/data-engineering/pyodbc-very-slow-30-minutes-to-write-6000-rows/m-p/32754#M23887</link>
      <description>&lt;P&gt;Hi, Thanks for reaching out to community.databricks.com. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Could you please run a profiler trace on the SQL and determine what's the issue going on? &lt;/P&gt;&lt;P&gt;Also, please refer &lt;A href="https://stackoverflow.com/questions/29638136/how-to-speed-up-bulk-insert-to-ms-sql-server-using-pyodbc/47057189#47057189" target="test_blank"&gt;https://stackoverflow.com/questions/29638136/how-to-speed-up-bulk-insert-to-ms-sql-server-using-pyodbc/47057189#47057189&lt;/A&gt; to add fast_executemany feature, which will be "off" by default, this helps to speed up bulk actions. &lt;/P&gt;</description>
      <pubDate>Fri, 02 Sep 2022 20:44:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pyodbc-very-slow-30-minutes-to-write-6000-rows/m-p/32754#M23887</guid>
      <dc:creator>Debayan</dc:creator>
      <dc:date>2022-09-02T20:44:15Z</dc:date>
    </item>
  </channel>
</rss>

