<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Read sqlite file from s3 bucket into databricks, creating delta tables in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/read-sqlite-file-from-s3-bucket-into-databricks-creating-delta/m-p/146754#M52699</link>
    <description>&lt;P&gt;&lt;SPAN&gt;I have a sqlite database that I want to read into databricks to create delta tables/dataframes in Python that I can export to power BI and have a live connection. When there is new data added to my sqlite data base, the changes will need to reflect in my Power BI.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;My sqlite file is uploaded to an s3 bucket. I was able to read the file in by creating a temporary local path and then using the sqlite3.connect(local_path) to connect. I then used pandas to read the sql query to create my pandas dataframe, then output my dataframe to my s3 bucket as a csv file. Once this was output to my s3 bucket, I could then manually create a table from that output file using UI in databricks. I do know how to connect to power BI through spark and was able to get my tables in there, so I don't need help on that part.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;However, this does not give me a live connection. How can I read the sqlite file from my s3 bucket into databricks and create delta tables/dataframes? Thank you in advance.&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 03 Feb 2026 18:58:33 GMT</pubDate>
    <dc:creator>lw2</dc:creator>
    <dc:date>2026-02-03T18:58:33Z</dc:date>
    <item>
      <title>Read sqlite file from s3 bucket into databricks, creating delta tables</title>
      <link>https://community.databricks.com/t5/data-engineering/read-sqlite-file-from-s3-bucket-into-databricks-creating-delta/m-p/146754#M52699</link>
      <description>&lt;P&gt;&lt;SPAN&gt;I have a sqlite database that I want to read into databricks to create delta tables/dataframes in Python that I can export to power BI and have a live connection. When there is new data added to my sqlite data base, the changes will need to reflect in my Power BI.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;My sqlite file is uploaded to an s3 bucket. I was able to read the file in by creating a temporary local path and then using the sqlite3.connect(local_path) to connect. I then used pandas to read the sql query to create my pandas dataframe, then output my dataframe to my s3 bucket as a csv file. Once this was output to my s3 bucket, I could then manually create a table from that output file using UI in databricks. I do know how to connect to power BI through spark and was able to get my tables in there, so I don't need help on that part.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;However, this does not give me a live connection. How can I read the sqlite file from my s3 bucket into databricks and create delta tables/dataframes? Thank you in advance.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 03 Feb 2026 18:58:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-sqlite-file-from-s3-bucket-into-databricks-creating-delta/m-p/146754#M52699</guid>
      <dc:creator>lw2</dc:creator>
      <dc:date>2026-02-03T18:58:33Z</dc:date>
    </item>
    <item>
      <title>Re: Read sqlite file from s3 bucket into databricks, creating delta tables</title>
      <link>https://community.databricks.com/t5/data-engineering/read-sqlite-file-from-s3-bucket-into-databricks-creating-delta/m-p/150205#M53305</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/213289"&gt;@lw2&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;The approach you have been using (copy SQLite to local, read with sqlite3, export to CSV, then manually create tables) works as a one-shot load but, as you noticed, it does not give you an easy path to keep things in sync. Below is a streamlined pattern that reads SQLite directly into Delta tables and can be scheduled so your Power BI reports always reflect the latest data.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;OVERVIEW OF THE APPROACH&lt;/P&gt;
&lt;P&gt;Because SQLite is a file-based database (not a server you can connect to over the network), Spark's built-in JDBC reader cannot reach it on S3 directly. The proven pattern is:&lt;/P&gt;
&lt;P&gt;1. Copy the SQLite file from S3 to the driver's local storage.&lt;BR /&gt;2. Use Python's sqlite3 module to query each table.&lt;BR /&gt;3. Convert the results to Spark DataFrames and write them as Delta tables.&lt;BR /&gt;4. Schedule the notebook with a Lakeflow Job so it runs automatically whenever the SQLite file is updated.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;STEP 1: COPY THE SQLITE FILE TO LOCAL STORAGE&lt;/P&gt;
&lt;P&gt;If your S3 bucket is configured as an external location in Unity Catalog, you can use dbutils. Otherwise you can use boto3 or the AWS CLI. Here is the dbutils approach:&lt;/P&gt;
&lt;P&gt;import os&lt;/P&gt;
&lt;P&gt;s3_path = "s3://your-bucket/path/to/database.sqlite"&lt;BR /&gt;local_path = "/tmp/database.sqlite"&lt;/P&gt;
&lt;P&gt;dbutils.fs.cp(s3_path, "file:" + local_path)&lt;/P&gt;
&lt;P&gt;If you prefer boto3:&lt;/P&gt;
&lt;P&gt;import boto3&lt;/P&gt;
&lt;P&gt;s3 = boto3.client("s3")&lt;BR /&gt;s3.download_file("your-bucket", "path/to/database.sqlite", "/tmp/database.sqlite")&lt;/P&gt;
&lt;P&gt;Note: /tmp on the driver node is ephemeral, so this file only lives for the duration of the cluster session. That is fine because we re-download it each run.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;STEP 2: READ SQLITE TABLES AND WRITE TO DELTA&lt;/P&gt;
&lt;P&gt;import sqlite3&lt;BR /&gt;import pandas as pd&lt;BR /&gt;from pyspark.sql import SparkSession&lt;/P&gt;
&lt;P&gt;spark = SparkSession.builder.getOrCreate()&lt;/P&gt;
&lt;P&gt;conn = sqlite3.connect("/tmp/database.sqlite")&lt;/P&gt;
&lt;P&gt;# List all tables in the SQLite database&lt;BR /&gt;tables = pd.read_sql_query(&lt;BR /&gt;"SELECT name FROM sqlite_master WHERE type='table';", conn&lt;BR /&gt;)&lt;/P&gt;
&lt;P&gt;catalog = "your_catalog"&lt;BR /&gt;schema = "your_schema"&lt;/P&gt;
&lt;P&gt;for table_name in tables["name"]:&lt;BR /&gt;pdf = pd.read_sql_query(f"SELECT * FROM [{table_name}]", conn)&lt;BR /&gt;sdf = spark.createDataFrame(pdf)&lt;BR /&gt;sdf.write.mode("overwrite").saveAsTable(f"{catalog}.{schema}.{table_name}")&lt;BR /&gt;print(f"Wrote {len(pdf)} rows to {catalog}.{schema}.{table_name}")&lt;/P&gt;
&lt;P&gt;conn.close()&lt;/P&gt;
&lt;P&gt;Key notes:&lt;BR /&gt;- mode("overwrite") replaces the entire table each run. If you want incremental updates instead, you can use Delta merge (MERGE INTO) with a primary key column as the match condition.&lt;BR /&gt;- saveAsTable creates managed Delta tables in Unity Catalog, which Power BI can connect to directly via the Databricks SQL Connector or Partner Connect.&lt;BR /&gt;- The bracket syntax around table_name in the SQL handles table names that contain spaces or special characters.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;STEP 3 (OPTIONAL): INCREMENTAL MERGE FOR LARGE TABLES&lt;/P&gt;
&lt;P&gt;If your SQLite tables have a reliable primary key and you want to avoid full overwrites, you can use Delta merge:&lt;/P&gt;
&lt;P&gt;from delta.tables import DeltaTable&lt;/P&gt;
&lt;P&gt;for table_name in tables["name"]:&lt;BR /&gt;pdf = pd.read_sql_query(f"SELECT * FROM [{table_name}]", conn)&lt;BR /&gt;updates_df = spark.createDataFrame(pdf)&lt;BR /&gt;full_table = f"{catalog}.{schema}.{table_name}"&lt;/P&gt;
&lt;P&gt;if spark.catalog.tableExists(full_table):&lt;BR /&gt;delta_table = DeltaTable.forName(spark, full_table)&lt;BR /&gt;delta_table.alias("target").merge(&lt;BR /&gt;updates_df.alias("source"),&lt;BR /&gt;"target.id = source.id" # replace 'id' with your primary key&lt;BR /&gt;).whenMatchedUpdateAll().whenNotMatchedInsertAll().execute()&lt;BR /&gt;else:&lt;BR /&gt;updates_df.write.saveAsTable(full_table)&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;STEP 4: SCHEDULE WITH A LAKEFLOW JOB&lt;/P&gt;
&lt;P&gt;To keep your Delta tables current automatically:&lt;/P&gt;
&lt;P&gt;1. Save the code above in a Databricks notebook.&lt;BR /&gt;2. Go to Workflows in the left sidebar and create a new Job.&lt;BR /&gt;3. Add a task pointing to your notebook.&lt;BR /&gt;4. Set a trigger: use a time-based schedule (for example, every hour or daily), or use file arrival triggers if you want it to run whenever the SQLite file is updated in S3.&lt;/P&gt;
&lt;P&gt;This way your Power BI dashboards, connected to the Delta tables via the Databricks SQL Connector, will always show up-to-date data without any manual steps.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;ADDITIONAL TIPS&lt;/P&gt;
&lt;P&gt;- If the SQLite file is very large and individual tables have millions of rows, consider reading the data in chunks with pd.read_sql_query using the chunksize parameter, then writing each chunk with mode("append").&lt;BR /&gt;- If you have not yet set up your S3 bucket as an external location in Unity Catalog, follow the guide here: &lt;A href="https://docs.databricks.com/en/connect/storage/index.html" target="_blank"&gt;https://docs.databricks.com/en/connect/storage/index.html&lt;/A&gt;&lt;BR /&gt;- For the Power BI connection, the Databricks SQL Connector and Partner Connect documentation is here: &lt;A href="https://docs.databricks.com/en/partners/bi/power-bi.html" target="_blank"&gt;https://docs.databricks.com/en/partners/bi/power-bi.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Hope this helps you get a fully automated pipeline running.&lt;/P&gt;
&lt;P&gt;* This reply used an agent system I built to research and draft this response based on the wide set of documentation I have available and previous memory. I personally review the draft for any obvious issues and for monitoring system reliability and update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand new features.&lt;/P&gt;</description>
      <pubDate>Sun, 08 Mar 2026 07:41:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-sqlite-file-from-s3-bucket-into-databricks-creating-delta/m-p/150205#M53305</guid>
      <dc:creator>SteveOstrowski</dc:creator>
      <dc:date>2026-03-08T07:41:35Z</dc:date>
    </item>
  </channel>
</rss>

