<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Data Engineering with Databricks course lesson 4.2 set up error. in DELETE</title>
    <link>https://community.databricks.com/t5/delete/data-engineering-with-databricks-course-lesson-4-2-set-up-error/m-p/13938#M610</link>
    <description>&lt;P&gt;Posting a solution here for anyone else who runs into this issue: the notebook needs to be run on a single-node cluster.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I found the necessary information in &lt;A href="https://github.com/databricks-academy/data-engineering-with-databricks/commit/bdea92bf224272e4c73ff3265b1e35c2adb52d34" alt="https://github.com/databricks-academy/data-engineering-with-databricks/commit/bdea92bf224272e4c73ff3265b1e35c2adb52d34" target="_blank"&gt;this commit&lt;/A&gt;.&lt;/P&gt;</description>
    <pubDate>Wed, 13 Jul 2022 15:35:03 GMT</pubDate>
    <dc:creator>A_GREY_FOX</dc:creator>
    <dc:date>2022-07-13T15:35:03Z</dc:date>
    <item>
      <title>Data Engineering with Databricks course lesson 4.2 set up error.</title>
      <link>https://community.databricks.com/t5/delete/data-engineering-with-databricks-course-lesson-4-2-set-up-error/m-p/13937#M609</link>
      <description>&lt;P&gt;Hello everyone,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;When I run the setup for lesson 4.2 Providing Options for External Sources, I get the error below:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 8491.0 failed 4 times, most recent failure: Lost task 0.3 in stage 8491.0 (TID 81305, 10.33.185.79, executor 23): java.sql.SQLException: [SQLITE_ERROR] SQL error or missing database (no such table: users)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;Py4JJavaError                             Traceback (most recent call last)
&amp;lt;command-176215&amp;gt; in &amp;lt;module&amp;gt;
      2 DA.init()
      3 install_eltwss_datasets(reinstall=False)
----&amp;gt; 4 load_eltwss_external_tables()
      5 DA.conclude_setup()
&amp;nbsp;
&amp;lt;command-180667&amp;gt; in load_eltwss_external_tables()
     30           .option("url", f"jdbc:sqlite:/{DA.username}_ecommerce.db")
     31           .option("dbtable", "users") # The table name in SQLite
---&amp;gt; 32           .mode("overwrite")
     33           .save()
     34     )
&amp;nbsp;
/databricks/spark/python/pyspark/sql/readwriter.py in save(self, path, format, mode, partitionBy, **options)
    823             self.format(format)
    824         if path is None:
--&amp;gt; 825             self._jwrite.save()
    826         else:
    827             self._jwrite.save(path)
&amp;nbsp;
/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1303         answer = self.gateway_client.send_command(command)
   1304         return_value = get_return_value(
-&amp;gt; 1305             answer, self.gateway_client, self.target_id, self.name)
   1306 
   1307         for temp_arg in temp_args:
&amp;nbsp;
/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
    125     def deco(*a, **kw):
    126         try:
--&amp;gt; 127             return f(*a, **kw)
    128         except py4j.protocol.Py4JJavaError as e:
    129             converted = convert_exception(e.java_exception)
&amp;nbsp;
/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
--&amp;gt; 328                     format(target_id, ".", name), value)
    329             else:
    330                 raise Py4JError(&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I think the issue may lie in the functions below, which I believe are called by the setup command.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;def copy_source_dataset(src_path, dst_path, format, name):
    import time
    start = int(time.time())
    print(f"Creating the {name} dataset", end="...")
    
    dbutils.fs.cp(src_path, dst_path, True)
&amp;nbsp;
    total = spark.read.format(format).load(dst_path).count()
    print(f"({int(time.time())-start} seconds / {total:,} records)")
    
def load_eltwss_external_tables():
    copy_source_dataset(f"{DA.paths.datasets}/raw/sales-csv", 
                        f"{DA.paths.working_dir}/sales-csv", "csv", "sales-csv")
&amp;nbsp;
    import time
    start = int(time.time())
    print(f"Creating the users table", end="...")
&amp;nbsp;
    # REFACTORING - Making lesson-specific copy
    dbutils.fs.cp(f"{DA.paths.datasets}/raw/users-historical", 
                  f"{DA.paths.working_dir}/users-historical", True)
&amp;nbsp;
    # &lt;A href="https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html" target="test_blank"&gt;https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html&lt;/A&gt;
    (spark.read
          .format("parquet")
          .load(f"{DA.paths.working_dir}/users-historical")
          .repartition(1)
          .write
          .format("org.apache.spark.sql.jdbc")
          .option("url", f"jdbc:sqlite:/{DA.username}_ecommerce.db")
          .option("dbtable", "users") # The table name in SQLite
          .mode("overwrite")
          .save()
    )
&amp;nbsp;
    total = spark.read.parquet(f"{DA.paths.working_dir}/users-historical").count()
    print(f"({int(time.time())-start} seconds / {total:,} records)")&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I can see that this was raised and seemingly resolved &lt;A href="https://community.databricks.com/s/question/0D53f00001n660QCAQ/the-databricksacademy-question" alt="https://community.databricks.com/s/question/0D53f00001n660QCAQ/the-databricksacademy-question" target="_blank"&gt;here&lt;/A&gt;, but the solution is not shared, and newer posters with the same issue have gone unanswered.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I have raised a support ticket but I'm not getting a response.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;I have tried removing everything related to the course from Databricks and then adding the course repo again, making sure I'm using the latest release.&lt;/LI&gt;&lt;LI&gt;I've tried to fix the code myself, but with no luck, as I thought it might be a typo or similar.&lt;/LI&gt;&lt;LI&gt;My company is still on Databricks Runtime 7.3; could this be related?&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Any guidance would be much appreciated.&lt;/P&gt;</description>
      <pubDate>Tue, 12 Jul 2022 09:45:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/delete/data-engineering-with-databricks-course-lesson-4-2-set-up-error/m-p/13937#M609</guid>
      <dc:creator>A_GREY_FOX</dc:creator>
      <dc:date>2022-07-12T09:45:42Z</dc:date>
    </item>
    <item>
      <title>Re: Data Engineering with Databricks course lesson 4.2 set up error.</title>
      <link>https://community.databricks.com/t5/delete/data-engineering-with-databricks-course-lesson-4-2-set-up-error/m-p/13938#M610</link>
      <description>&lt;P&gt;Posting a solution here for anyone else who runs into this issue: the notebook needs to be run on a single-node cluster.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I found the necessary information in &lt;A href="https://github.com/databricks-academy/data-engineering-with-databricks/commit/bdea92bf224272e4c73ff3265b1e35c2adb52d34" alt="https://github.com/databricks-academy/data-engineering-with-databricks/commit/bdea92bf224272e4c73ff3265b1e35c2adb52d34" target="_blank"&gt;this commit&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Wed, 13 Jul 2022 15:35:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/delete/data-engineering-with-databricks-course-lesson-4-2-set-up-error/m-p/13938#M610</guid>
      <dc:creator>A_GREY_FOX</dc:creator>
      <dc:date>2022-07-13T15:35:03Z</dc:date>
    </item>
  </channel>
</rss>

