<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Importing Azure SQL data into Databricks in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/importing-azure-sql-data-into-databricks/m-p/14789#M9230</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I am looking at building a data warehouse using Databricks. Most of the data will come from Azure SQL, and we now have Azure SQL CDC enabled to capture changes. I would also like to import this data without paying for additional connectors like Fivetran.&lt;/P&gt;&lt;P&gt;1. Would it be reasonable to create one notebook / Delta Live Tables pipeline per source table?&lt;/P&gt;&lt;P&gt;2. The first time the Delta Live Tables pipeline runs, there will be no tables or data available in Databricks, so I am guessing I need a quick check to see whether the table already exists and, if not, pull the entire table from Azure SQL. I was thinking of something like this (although it doesn't seem to work):&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import dlt

db_name = "AdventureWorks"
table_name = "SalesLT_Customer"

# Check whether the target table already exists in the database.
tables_collection = spark.catalog.listTables(db_name)
table_names_in_db = [table.name for table in tables_collection]
table_exists = table_name in table_names_in_db

if not table_exists:
    # First run: define a table that pulls the full source table over JDBC.
    @dlt.table(
        name="SalesLT_Customer",
        comment="Original data for SalesLT.Customer"
    )
    def SalesLT_Customer():
        df = spark.read.format("jdbc") \
            .option("url", "jdbc:sqlserver://sql.database.windows.net;databaseName=database") \
            .option("user", "x") \
            .option("password", "x") \
            .option("dbtable", "SalesLT.Customer") \
            .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
            .load()
        return (df)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;For subsequent runs, I would take the data from the CDC tables in Azure SQL. Maybe something like this?&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;@dlt.table(
    name="CDC_SalesLT_Customer_CT",
    comment="Original CDC data for SalesLT_Customer_CT"
)
def CDC_SalesLT_Customer_CT():
    # Read the SQL Server CDC change table for SalesLT.Customer over JDBC.
    df = spark.read.format("jdbc") \
        .option("url", "jdbc:sqlserver://sql.database.windows.net;databaseName=database") \
        .option("user", "x") \
        .option("password", "x") \
        .option("dbtable", "cdc.SalesLT_Customer_CT") \
        .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
        .load()
    return (df)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Does that seem reasonable so far?&lt;/P&gt;</description>
    <pubDate>Mon, 04 Jul 2022 16:37:16 GMT</pubDate>
    <dc:creator>BearInTheWoods</dc:creator>
    <dc:date>2022-07-04T16:37:16Z</dc:date>
    <item>
      <title>Importing Azure SQL data into Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/importing-azure-sql-data-into-databricks/m-p/14789#M9230</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I am looking at building a data warehouse using Databricks. Most of the data will come from Azure SQL, and we now have Azure SQL CDC enabled to capture changes. I would also like to import this data without paying for additional connectors like Fivetran.&lt;/P&gt;&lt;P&gt;1. Would it be reasonable to create one notebook / Delta Live Tables pipeline per source table?&lt;/P&gt;&lt;P&gt;2. The first time the Delta Live Tables pipeline runs, there will be no tables or data available in Databricks, so I am guessing I need a quick check to see whether the table already exists and, if not, pull the entire table from Azure SQL. I was thinking of something like this (although it doesn't seem to work):&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import dlt

db_name = "AdventureWorks"
table_name = "SalesLT_Customer"

# Check whether the target table already exists in the database.
tables_collection = spark.catalog.listTables(db_name)
table_names_in_db = [table.name for table in tables_collection]
table_exists = table_name in table_names_in_db

if not table_exists:
    # First run: define a table that pulls the full source table over JDBC.
    @dlt.table(
        name="SalesLT_Customer",
        comment="Original data for SalesLT.Customer"
    )
    def SalesLT_Customer():
        df = spark.read.format("jdbc") \
            .option("url", "jdbc:sqlserver://sql.database.windows.net;databaseName=database") \
            .option("user", "x") \
            .option("password", "x") \
            .option("dbtable", "SalesLT.Customer") \
            .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
            .load()
        return (df)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;For subsequent runs, I would take the data from the CDC tables in Azure SQL. Maybe something like this?&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;@dlt.table(
    name="CDC_SalesLT_Customer_CT",
    comment="Original CDC data for SalesLT_Customer_CT"
)
def CDC_SalesLT_Customer_CT():
    # Read the SQL Server CDC change table for SalesLT.Customer over JDBC.
    df = spark.read.format("jdbc") \
        .option("url", "jdbc:sqlserver://sql.database.windows.net;databaseName=database") \
        .option("user", "x") \
        .option("password", "x") \
        .option("dbtable", "cdc.SalesLT_Customer_CT") \
        .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
        .load()
    return (df)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Does that seem reasonable so far?&lt;/P&gt;</description>
      <pubDate>Mon, 04 Jul 2022 16:37:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/importing-azure-sql-data-into-databricks/m-p/14789#M9230</guid>
      <dc:creator>BearInTheWoods</dc:creator>
      <dc:date>2022-07-04T16:37:16Z</dc:date>
    </item>
    <item>
      <title>Re: Importing Azure SQL data into Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/importing-azure-sql-data-into-databricks/m-p/14790#M9231</link>
      <description>&lt;P&gt;@Bear Woods Hi! Were you able to create DLT tables using the CDC feature from sources like SQL tables? I'm in a similar situation. You need to use the apply_changes function together with the &lt;A href="https://docs.databricks.com/workflows/delta-live-tables/delta-live-tables-cdc.html#create-target-fn" alt="https://docs.databricks.com/workflows/delta-live-tables/delta-live-tables-cdc.html#create-target-fn" target="_blank"&gt;create_streaming_live_table()&lt;/A&gt; function, but that requires an intermediate table, which I'm trying to avoid.&lt;/P&gt;</description>
      <pubDate>Thu, 01 Dec 2022 14:27:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/importing-azure-sql-data-into-databricks/m-p/14790#M9231</guid>
      <dc:creator>ravinchi</dc:creator>
      <dc:date>2022-12-01T14:27:59Z</dc:date>
    </item>
  </channel>
</rss>

