
Delta Live Tables with Common Data Model as source

BobBubble2000
New Contributor II

Hi,

I'm investigating whether it's possible to use the Common Data Model (CDM), in particular the CSV and CDM metadata files exported from Dynamics 365, as a Delta Live Tables data source. Can someone point me in the right direction?

Thanks!

4 REPLIES

SparkJun
Databricks Employee

Hi @BobBubble2000 
The Microsoft Azure documentation briefly mentions that the Spark Common Data Model connector natively supports Azure Synapse Analytics. That said, using Azure Databricks as a data consumer to read from Common Data Model folders in Data Lake Storage Gen2 should be achievable as well. First look for a way to install the Spark CDM connector on Azure Databricks, then read the CDM data into a DataFrame. Once that works, you can import the dlt module alongside the Spark CDM connector in your DLT notebook and turn the CDM data into a live table in your DLT pipeline.

Reference:

https://github.com/Azure/spark-cdm-connector/blob/spark3.1/documentation/overview.md

https://learn.microsoft.com/en-us/common-data-model/data-lake
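
To make the suggestion above more concrete, here is a minimal sketch of what the combination could look like, assuming the Spark CDM connector (format "com.microsoft.cdm") can be installed as a library on the DLT pipeline's cluster and a service principal has read access to the CDM folder. The manifest path, entity name, and credentials below are placeholders, not values from this thread:

import dlt

@dlt.table(comment="Account entity read via the Spark CDM connector")
def account_cdm():
    # All option values are placeholders; see the spark-cdm-connector docs linked above for details.
    return (
        spark.read.format("com.microsoft.cdm")
        .option("storage", "<storage-account-name>.dfs.core.windows.net")
        .option("manifestPath", "<container>/model.json")  # or a *.manifest.cdm.json path
        .option("entity", "Account")
        .option("appId", "<client-id>")
        .option("appKey", "<client-secret>")
        .option("tenantId", "<tenant-id>")
        .load()
    )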

 

developer_bi
New Contributor II

Hello, I am currently looking into the same task. Have you been successful in installing the connector in Databricks?

dalebidwell
New Contributor II

I've been looking into this myself for a while. Depending on your requirements, this new technical blog and code from @SergioSchena might help:

How to efficiently ingest Dataverse Common Data Mo... - Databricks Community - 66671

Suryanarayan
New Contributor II

Using Delta Live Tables with Common Data Model (CDM) as a Source in Databricks

I'm investigating the use of Delta Live Tables (DLT) to process Common Data Model (CDM) files exported from Dynamics 365, and I found a solution that works well. Here’s a quick guide on how to set it up:

Prerequisites:

  1. Azure Databricks workspace
  2. Azure Data Lake Storage (ADLS) to store your CDM files

Steps:

  1. Store Data in ADLS: Upload your Dynamics 365 exported CSV and CDM files to ADLS.

  2. Mount ADLS in Databricks: Use the following code to mount your ADLS in Databricks:

            

# Service principal (OAuth) configuration for ADLS Gen2 access
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<client-id>",
    "fs.azure.account.oauth2.client.secret": "<client-secret>",  # prefer dbutils.secrets.get() over a hard-coded secret
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

# Mount the ADLS Gen2 container that holds the exported CDM files
dbutils.fs.mount(
    source="abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
    mount_point="/mnt/datalake",
    extra_configs=configs,
)
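
Once mounted, a quick listing can confirm the exported files are visible (this assumes the exports were uploaded under the CDMExports folder used in the pipeline code below):

display(dbutils.fs.ls("/mnt/datalake/CDMExports"))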

 

 

  3. Create a Delta Live Tables Pipeline: Navigate to the "Jobs" section in Databricks and create a new Delta Live Tables pipeline.

  4. Define Pipeline Configuration: Define the pipeline to read and process the CDM files (a sketch of optional data-quality expectations follows the steps below):

        

import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Read Account data from CDM")
def account():
    return (
        spark.read.csv("/mnt/datalake/CDMExports/Account.csv", header=True, inferSchema=True)
        .withColumnRenamed("AccountID", "account_id")
        .withColumnRenamed("AccountName", "account_name")
    )

@dlt.table(comment="Read Contact data from CDM")
def contact():
    return (
        spark.read.csv("/mnt/datalake/CDMExports/Contact.csv", header=True, inferSchema=True)
        .withColumnRenamed("ContactID", "contact_id")
        .withColumnRenamed("ContactName", "contact_name")
        # assumes Contact.csv carries the parent account's ID as a foreign key
        .withColumnRenamed("AccountID", "account_id")
    )

@dlt.table(comment="Join Account and Contact data")
def account_contact():
    account_df = dlt.read("account")
    contact_df = dlt.read("contact")
    # join on the shared column name to avoid a duplicate account_id column in the result
    return account_df.join(contact_df, "account_id")

 

 

  5. Deploy and Run the Pipeline: Deploy the pipeline from the Databricks UI. This will read the CDM files, apply the transformations, and write the results to Delta tables.
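
If you also want data quality checks, DLT expectations can be layered on top of the joined table. A minimal sketch, where the expectation name and rule are illustrative and not part of the original pipeline:

import dlt

@dlt.table(comment="Joined Account/Contact data with a basic quality check")
@dlt.expect_or_drop("valid_account_id", "account_id IS NOT NULL")
def account_contact_clean():
    # drop joined rows that lack an account_id before downstream consumption
    return dlt.read("account_contact")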

This setup allows you to leverage Databricks and Delta Lake capabilities to process and transform CDM data efficiently.

Hope this helps anyone looking to integrate Dynamics 365 CDM with Delta Live Tables!

 

Surya@the_data-bro
