Data Engineering

Forum Posts

by Pratikmsbsvm • Contributor

06-25-2025 9:08:02 AM

842 Views
2 replies
2 kudos

Resolved! Data Lakehouse architecture with Azure Databricks and Unity Catalog

I am creating a data lakehouse solution on Azure Databricks. Source: SAP, Salesforce, Adobe. Target: Hightouch (external application), Mad Mobile (external application). The data lakehouse also has transactional records, which should be stored in ACID pro...

Latest Reply

KaranamS
Contributor III

06-25-2025 11:44:08 AM

2 kudos

Hi @Pratikmsbsvm, from what I understand, you have a lakehouse on Azure Databricks and would like to share this data with another Databricks account or workspace. If Unity Catalog is enabled on your Azure Databricks account, you can leverage Delta S...

by data_learner1 • New Contributor II

06-18-2025 9:48:01 AM

664 Views
4 replies
1 kudos

Need to track schema changes/column renames/column drops in Databricks Unity Catalog

Hi Team, we are getting data from a third-party vendor into Databricks Unity Catalog. They change the schema frequently and we would like to track those changes. I wanted to know whether I can do this using the audit table in the system catalog. As we only h...

Latest Reply

CURIOUS_DE
Contributor III

06-25-2025 11:02:34 AM

1 kudos

@data_learner1 Unity Catalog logs all data access and metadata operations (including schema changes) in the audit logs, which are stored in system catalog tables such as system.access.audit. You mentioned you only have read access, and likely...
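
If reading the audit log turns out not to be possible with read-only access, one alternative is to snapshot each table's columns on a schedule and diff consecutive snapshots. The sketch below assumes each snapshot is a dict mapping column name to data type (for example collected from the Unity Catalog `information_schema.columns` view). The function name `diff_schemas` is hypothetical. A snapshot diff cannot prove that a column was renamed, so the sketch only reports rename candidates, meaning a dropped column paired with an added column of the same type.

```python
from typing import Dict, List, Tuple

# Assumed snapshot format: column name -> data type, e.g. built from
# SELECT column_name, data_type FROM <catalog>.information_schema.columns
# WHERE table_schema = '<schema>' AND table_name = '<table>'
Schema = Dict[str, str]


def diff_schemas(old: Schema, new: Schema) -> Dict[str, list]:
    """Compare two column snapshots of the same table."""
    added = sorted(set(new) - set(old))
    dropped = sorted(set(old) - set(new))
    type_changed: List[Tuple[str, str, str]] = sorted(
        (col, old[col], new[col])
        for col in set(old) & set(new)
        if old[col] != new[col]
    )
    # Heuristic only: pair each dropped column with the first still-unmatched
    # added column of the same type. Snapshots alone cannot prove a rename.
    rename_candidates: List[Tuple[str, str]] = []
    unmatched_added = list(added)
    for col in dropped:
        for candidate in unmatched_added:
            if old[col] == new[candidate]:
                rename_candidates.append((col, candidate))
                unmatched_added.remove(candidate)
                break
    return {
        "added": added,
        "dropped": dropped,
        "type_changed": type_changed,
        "rename_candidates": rename_candidates,
    }
```

Storing each snapshot with a timestamp (for example in a Delta table you own) turns the diffs into a change history even when the vendor gives no notice.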

by NikosLoutas • New Contributor III

06-25-2025 3:21:52 AM

1801 Views
2 replies
0 kudos

Resolved! Databricks Full Refresh of DLT Pipeline

Hello, I have a question regarding the full refresh of a DLT pipeline where the data source is an external table. When running the pipeline without a full refresh, the streaming pulls the data currently present in the external source ...

Latest Reply

seeyesbee
New Contributor II

06-25-2025 9:58:36 AM

0 kudos

Hi @paolajara, in your point 5 you mentioned using Delta Lake for tracking changes. Could you point me to any official docs or examples that walk through enabling CDC / row tracking on a Delta table? I pull data from SharePoint via its REST endpoint,...
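
For reference, Delta Lake exposes change data feed as the table property `delta.enableChangeDataFeed` and row tracking as `delta.enableRowTracking`. Changes are then read with the `readChangeFeed` reader option. The sketch below assumes an active `spark` session and a hypothetical table name. Check the Delta Lake and Databricks documentation for runtime-version requirements before enabling either property on a production table.

```python
def enable_tracking_sql(table: str, row_tracking: bool = False) -> str:
    """Build the ALTER TABLE statement that turns on change data feed."""
    props = ["'delta.enableChangeDataFeed' = 'true'"]
    if row_tracking:
        props.append("'delta.enableRowTracking' = 'true'")
    return f"ALTER TABLE {table} SET TBLPROPERTIES ({', '.join(props)})"


def read_change_feed(spark, table: str, starting_version: int = 0):
    """Batch-read row-level changes committed since starting_version.

    The result carries the extra columns _change_type, _commit_version and
    _commit_timestamp alongside the table's own columns.
    """
    return (
        spark.read.format("delta")
        .option("readChangeFeed", "true")
        .option("startingVersion", starting_version)
        .table(table)
    )
```

Usage, with a hypothetical table name: run `spark.sql(enable_tracking_sql("main.sharepoint.items"))` once. Later, call `read_change_feed(spark, "main.sharepoint.items", starting_version=N)`, where `N` is a table version at or after the one in which the feed was enabled.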

by Pratikmsbsvm • Contributor

06-23-2025 9:39:01 PM

1153 Views
2 replies
0 kudos

How to build an architecture for batch as well as streaming data pipelines in Databricks

Hello, I am planning to create a data lakehouse using Azure and Databricks. Earlier I planned to do it with Azure, but the use cases look complex. Can someone please help me with suggestions? Source systems: SAP, Salesforce, SAP CAR, Adobe Clickstream. Consume...

Latest Reply

SP_6721
Honored Contributor

06-25-2025 5:10:46 AM

0 kudos

Hi @Pratikmsbsvm, the appropriate approach would be:
Data Ingestion: Ingest data from SAP, SAP CAR, and Salesforce using Azure Data Factory or third-party connectors. For near real-time updates, enable CDC-based ingestion.
Data Lakehouse Storage: Store a...
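
To make the CDC-based ingestion step concrete: whichever tool captures the changes, the lakehouse side has to apply inserts, updates and deletes per key in source order. Delta `MERGE INTO` and DLT `APPLY CHANGES` do this at scale. The plain-Python sketch below models only those semantics. The `ChangeEvent` shape and the sequence-number column are assumptions, not a Databricks API.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional, Tuple


@dataclass
class ChangeEvent:
    op: str                                # "insert", "update" or "delete" (assumed encoding)
    key: Any                               # primary key of the source row
    seq: int                               # ordering column, e.g. a source commit sequence
    row: Optional[Dict[str, Any]] = None   # full row image; None for deletes


# State per key: (last applied sequence, row or None for a tombstone)
State = Dict[Any, Tuple[int, Optional[Dict[str, Any]]]]


def apply_changes(state: State, events: Iterable[ChangeEvent]) -> State:
    """Apply one micro-batch of change events in sequence order (mutates state)."""
    for ev in sorted(events, key=lambda e: e.seq):
        current = state.get(ev.key)
        if current is not None and current[0] >= ev.seq:
            continue  # late or replayed event: an equal or newer change already won
        if ev.op == "delete":
            state[ev.key] = (ev.seq, None)  # tombstone, so late older inserts stay deleted
        elif ev.op in ("insert", "update"):
            state[ev.key] = (ev.seq, ev.row)
        else:
            raise ValueError(f"unknown op: {ev.op}")
    return state


def live_rows(state: State) -> Dict[Any, Dict[str, Any]]:
    """Current table contents: every key whose latest change is not a delete."""
    return {key: row for key, (_, row) in state.items() if row is not None}
```

The same code path serves batch (one large list of events) and streaming (many small micro-batches). That is the reason the per-key sequence check matters: replayed or late events from an earlier batch are ignored rather than overwriting newer data.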

by guizsantos • New Contributor II

05-21-2024 10:32:58 AM

3125 Views
3 replies
3 kudos

Resolved! How to obtain a query profile programmatically?

Hi everyone! Does anyone know if there is a way to obtain the data used to create the graph shown in the "Query profile" section? In particular, I am interested in the rows produced by the intermediate query operations. I can see there is "Download" ...

Latest Reply

artsheiko
Databricks Employee

06-25-2025 7:30:23 AM

3 kudos

@guizsantos, the Query History list API provides metrics (see the include_metrics parameter), and the definition of an executed query can be seen in the query history system table.
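
Here is a sketch of paging through the Query History list API with `include_metrics` enabled, using only the Python standard library. The endpoint path, the `res` / `has_next_page` / `next_page_token` response fields and the metric names (`rows_produced_count`, `total_time_ms`) follow the REST API reference at the time of writing. Treat them as assumptions and verify them against your workspace. The host and token are placeholders. These are per-query metrics, so they may not include the per-operator row counts drawn in the query profile graph.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Iterator, Optional


def build_history_url(host: str, max_results: int = 100, page_token: Optional[str] = None) -> str:
    """URL for GET /api/2.0/sql/history/queries with metrics included."""
    params = {"include_metrics": "true", "max_results": str(max_results)}
    if page_token:
        params["page_token"] = page_token
    return f"{host.rstrip('/')}/api/2.0/sql/history/queries?{urllib.parse.urlencode(params)}"


def iter_queries(host: str, token: str, max_results: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield query records page by page (performs real HTTP calls)."""
    page_token: Optional[str] = None
    while True:
        request = urllib.request.Request(
            build_history_url(host, max_results, page_token),
            headers={"Authorization": f"Bearer {token}"},
        )
        with urllib.request.urlopen(request) as response:
            body = json.load(response)
        yield from body.get("res", [])
        page_token = body.get("next_page_token")
        if not body.get("has_next_page") or not page_token:
            break  # last page, or no token to continue with


def summarize(query: Dict[str, Any]) -> Dict[str, Any]:
    """Pick a few per-query metrics out of one record."""
    metrics = query.get("metrics") or {}
    return {
        "query_id": query.get("query_id"),
        "rows_produced": metrics.get("rows_produced_count"),
        "total_time_ms": metrics.get("total_time_ms"),
    }
```

Usage: `[summarize(q) for q in iter_queries("https://<workspace-host>", "<token>")]`.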

by seefoods • Valued Contributor

06-24-2025 7:55:39 AM

1359 Views
1 replies
1 kudos

Resolved! python task

Hello guys, I have defined an asset bundle that has a rule to run a Python task. This task has some parameters, so how can I interact with them using argparse? Cordially,

Latest Reply

SP_6721
Honored Contributor

06-25-2025 7:06:05 AM

1 kudos

Hi @seefoods, in your asset bundle YAML, define the parameters using the named_parameters field, for example like this:
tasks:
  - task_key: python_task
    python_wheel_task:
      entry_point: main
      named_parameters:
        input_path: "/data/input...
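
On the Python side, named parameters of a Python wheel task are passed to the entry point as command-line arguments of the form `--input_path=...` (check the Jobs documentation for your task type). This means a plain `argparse` parser can read them. The minimal sketch below assumes the wheel exposes a `main` function as the `main` entry point. `output_path` is a hypothetical second parameter.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the task's named parameters; argv=None falls back to sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="Python wheel task entry point")
    parser.add_argument("--input_path", required=True)
    parser.add_argument("--output_path", default=None)  # hypothetical optional parameter
    # parse_known_args ignores extra flags instead of exiting with an error
    args, _unknown = parser.parse_known_args(argv)
    return args


def main() -> None:
    """Function referenced by `entry_point: main` (assumed wheel entry point)."""
    args = parse_args()
    print(f"input_path={args.input_path} output_path={args.output_path}")
```

Accepting an explicit `argv` keeps `parse_args` testable locally, for example `parse_args(["--input_path=/tmp/in"])`, without going through a job run.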

by mkwparth • New Contributor III

06-23-2025 11:51:50 PM

1027 Views
4 replies
1 kudos

Spark Failed to start: Driver unresponsive

Hi everyone, I'm encountering an intermittent issue when launching a Databricks pipeline cluster. Error message:
com.databricks.pipelines.common.errors.deployment.DeploymentException: Failed to launch pipeline cluster xxxx-xxxxxx-ofgxxxxx: Attempt to la...

Latest Reply

Gopichand_G
New Contributor II

06-25-2025 5:13:17 AM

1 kudos

I have personally run into these kinds of issues. In my experience, they usually happen because the driver node is unavailable or unresponsive, for example because you have hit the maximum CPU or memory usage, or maybe your cache utili...

by skooijman • New Contributor II

06-23-2025 7:42:33 AM

1914 Views
4 replies
7 kudos

dbt_project.yml won't load in databricks dbt job

We're running into issues with dbt jobs, which are not running anymore. The errors we receive suggest that the dbt_project.yml file cannot be found, while the profiles.yml can be found. We are running our dbt jobs with Databricks Workflows. We've tri...

Latest Reply

LokmenChouaya
New Contributor II

06-25-2025 5:26:20 AM

7 kudos

Hello, are there any updates regarding this issue? I'm having the same problem in production.
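
While waiting for a fix, one quick check is whether the checked-out repository actually contains `dbt_project.yml` in the project directory the job points to. This is assumed to be the dbt task's `project_directory` setting, resolved relative to the repository root, with the root used when it is empty. Below is a small standard-library sketch. The helper names are hypothetical, and the check can only rule out a path mismatch, not a platform-side problem.

```python
from pathlib import Path
from typing import List


def find_dbt_projects(root: str) -> List[str]:
    """Directories (relative to root) that contain a dbt_project.yml."""
    base = Path(root)
    return sorted(
        str(path.parent.relative_to(base)) for path in base.rglob("dbt_project.yml")
    )


def has_dbt_project(root: str, project_directory: str = "") -> bool:
    """True if <root>/<project_directory>/dbt_project.yml exists as a file."""
    return (Path(root) / project_directory / "dbt_project.yml").is_file()
```

If `has_dbt_project` is false for the configured directory but `find_dbt_projects` lists a different one, the job configuration and the repository layout have drifted apart.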

by Phani1 • Valued Contributor II

02-21-2025 5:42:28 AM

3273 Views
1 replies
0 kudos

Databricks AI (LLM) Functionalities: Data Privacy and Security

Hi Databricks Team, when leveraging Databricks' AI (LLM) functionalities, such as ai_query and ai_assistant, how does Databricks safeguard customer data and ensure privacy, safety, and security?
Regards,
Phani

Latest Reply

Vinay_M_R
Databricks Employee

06-25-2025 5:18:01 AM

0 kudos

Hello @Phani1, Databricks employs a multi-layered security approach to protect customer data when using AI functionalities like ai_query and Databricks Assistant. I am sharing the official documentation below for your reference: https://learn.microsoft.co...

by Marvin_T • New Contributor III

06-07-2023 12:39:34 AM

20521 Views
3 replies
2 kudos

Resolved! Disabling query caching for SQL Warehouse

Hello everybody, I am currently trying to run some performance tests on queries in Databricks on Azure. For my tests, I am using a Classic SQL Warehouse in the SQL Editor. I have created two views that contain the same data but have different structur...

Latest Reply

Marvin_T
New Contributor III

06-07-2023 1:38:14 AM

2 kudos

Now that you say it, they are probably executing the same query plan. And yes, restarting the warehouse does work in theory, but it isn't a nice solution. For now, I guess I will do some restarting and compute averages to get a good comparison.
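
For the averaging approach, a small timing harness keeps the comparison consistent. It times any zero-argument callable, for example a hypothetical function that runs one of the two view queries through the Databricks SQL Connector and fetches all rows. Use `warmup=0` with a warehouse restart between runs to measure cold behavior, or `warmup>=1` to measure steady state. Databricks SQL also documents a `use_cached_result` session parameter (`SET use_cached_result = false`) for bypassing the query result cache. That parameter is separate from other caches such as the disk cache, so verify what it covers for your warehouse type.

```python
import statistics
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BenchmarkResult:
    mean_s: float
    median_s: float
    stdev_s: float
    runs: int


def benchmark(run_query: Callable[[], object], repeats: int = 5, warmup: int = 0) -> BenchmarkResult:
    """Time run_query `repeats` times after `warmup` untimed calls."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    for _ in range(warmup):
        run_query()  # untimed: lets caches warm up when steady-state numbers are wanted
    timings: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return BenchmarkResult(
        mean_s=statistics.mean(timings),
        median_s=statistics.median(timings),
        stdev_s=statistics.stdev(timings) if len(timings) > 1 else 0.0,
        runs=len(timings),
    )
```

Reporting the median next to the mean makes one slow outlier run (for example the first query after a restart) easier to spot.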

by KristiLogos • Contributor

06-24-2025 12:54:36 PM

827 Views
2 replies
0 kudos

Netsuite error - The driver could not open a JDBC connection. Check the URL

I'm trying to connect to Netsuite2 with the JDBC driver I added to my cluster. I'm testing this in my NetSuite sandbox, and I have the code below, but it keeps saying: requirement failed: The driver could not open a JDBC connection. Check the URL: jdbc:...

Latest Reply

TheOC
Contributor III

06-24-2025 2:16:49 PM

0 kudos

Hey @KristiLogos, I had a little search online and found this, which may be useful: https://stackoverflow.com/questions/79236996/pyspark-jdbc-connection-to-netsuite2-com-fails-with-failed-to-login-using-tba In short, it seems that a token-based connection...

by seapen • New Contributor II

06-25-2025 1:30:11 AM

1143 Views
1 replies
0 kudos

[Question]: Get permissions for a schema containing backticks via the API

I am unsure if this is specific to the Java SDK, but I am having issues checking effective permissions on the following schema: databricks_dev.test_schema`
In Scala I have the following example test: test("attempting to access schema with backtick") ...

Latest Reply

seapen
New Contributor II

06-25-2025 1:37:32 AM

0 kudos

Update: Interestingly, if I URL-encode it _twice_, it appears to work, e.g.:
test("attempting to access schema with backtick") {
  val client = new WorkspaceClient()
  client.config().setHost("redacted").setToken("redacted")
  val name = "databricks...
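
To see what the two encodings produce, the standard-library sketch below percent-encodes the schema name once and then twice. A backtick becomes `%60` after one pass and `%2560` after two. One possible explanation for the double-encoded request working is that the name is decoded once more somewhere along the request path than a single encoding accounts for. This is a hypothesis, not confirmed SDK behavior. The path shape follows the Unity Catalog effective-permissions REST endpoint (`/api/2.1/unity-catalog/effective-permissions/{securable_type}/{full_name}`). The helper names are hypothetical.

```python
from urllib.parse import quote


def encode_name(full_name: str, times: int = 1) -> str:
    """Percent-encode a securable name `times` times (safe="" also encodes '/')."""
    encoded = full_name
    for _ in range(times):
        encoded = quote(encoded, safe="")
    return encoded


def effective_permissions_path(securable_type: str, full_name: str, times: int = 1) -> str:
    """REST path for the effective permissions of one securable."""
    return (
        "/api/2.1/unity-catalog/effective-permissions/"
        f"{securable_type}/{encode_name(full_name, times)}"
    )
```

Comparing the path the SDK actually sends (for example from its debug logging) with these two strings should show which layer adds or removes an encoding pass.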

by lezwon • Contributor

06-24-2025 6:54:38 AM

615 Views
2 replies
1 kudos

Resolved! Databricks Serverless: Package import fails from notebook in subfolder after wheel installation

I have a Python package installed via a wheel file in a Databricks serverless environment. The package imports work fine when my notebook is in the root directory, but fail when the notebook is in a subfolder. How can I fix this?
src/
├── datalake_util...

Latest Reply

lezwon
Contributor

06-25-2025 1:20:16 AM

1 kudos

It appears that there is a pre-installed package called datalake_utils available within Databricks. I had to rename my package to something else, and it worked like a charm.
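
A quick way to catch this kind of name clash before renaming is to ask Python where a module name actually resolves. `importlib.util.find_spec` reports the origin without importing the module. The sketch below is a generic check, and the helper names and expected directory are hypothetical.

```python
import importlib.util
import os
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """File or directory a top-level module name resolves to, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    if spec.origin is not None:
        return spec.origin
    # Namespace packages have no single origin file; fall back to their first search path
    locations = list(spec.submodule_search_locations or [])
    return locations[0] if locations else None


def is_shadowed(name: str, expected_dir: str) -> bool:
    """True if `name` resolves somewhere other than under expected_dir."""
    origin = module_origin(name)
    if origin is None:
        return False  # nothing importable under that name at all
    expected = os.path.abspath(expected_dir)
    return os.path.commonpath([os.path.abspath(origin), expected]) != expected
```

Running `module_origin("datalake_utils")` from each notebook location would show whether the import picks up the pre-installed package or the one from your wheel.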

by AxelBrsn • New Contributor III

03-21-2024 2:40:01 AM

4363 Views
5 replies
1 kudos

Why are materialized views created in __databricks_internal?

Hello, I have a question: why are materialized views created in the "__databricks_internal" catalog? We specified the catalog and schemas in the DLT pipeline.

catalog

Delta Live Table

materialized views

Latest Reply

Yogesh_Verma_
Contributor

06-19-2025 9:40:47 AM

1 kudos

Hello, materialized views created by Delta Live Tables (DLT) are stored in the __databricks_internal catalog for a few key reasons:
Separation: This keeps system-generated tables (like materialized views) separate from your own tables and views, so you...

by fostermink • New Contributor II

05-09-2025 4:49:54 PM

1678 Views
6 replies
0 kudos

Spark AWS S3 folder partition pruning doesn't work

Hi, I have a use case where my Spark job runs on AWS EMR and reads from an S3 path: some-bucket/some-path/region=na/days=1. During my read, I pass DataFrame df = sparkSession.read().option("mergeSchema", true).parquet("some-bucket/some-path...

Latest Reply

lingareddy_Alva
Honored Contributor III

05-09-2025 8:07:52 PM

0 kudos

In your case, Spark isn't automatically pruning partitions because:
Missing Partition Discovery: For Spark to perform partition pruning when reading directly from paths (without a metastore table), you need to explicitly tell it about the partition st...
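
Hive-style directories such as `region=na/days=1` only become partition columns when Spark discovers them relative to a base path. Pruning then works by skipping directories whose key/value pairs fail the filter. The plain-Python `parse_partitions` / `prune` pair below models that discovery and pruning step. The two `read_*` helpers show the corresponding Spark reads using the real `basePath` reader option, assuming an existing `spark` session and a hypothetical bucket layout. The `region` string is interpolated into SQL and is assumed to be trusted.

```python
from typing import Dict, List


def parse_partitions(path: str, base_path: str) -> Dict[str, str]:
    """Extract key=value directory segments below base_path."""
    base = base_path.rstrip("/")
    relative = path[len(base):] if path.startswith(base) else path
    values: Dict[str, str] = {}
    for segment in relative.strip("/").split("/"):
        if "=" in segment:
            key, value = segment.split("=", 1)
            values[key] = value
    return values


def prune(paths: List[str], base_path: str, **equals: str) -> List[str]:
    """Keep only files whose partition values match every equality filter."""
    return [
        p for p in paths
        if all(parse_partitions(p, base_path).get(k) == v for k, v in equals.items())
    ]


def read_partition(spark, base_path: str, region: str, days: int):
    """Read one leaf directory but keep region/days as columns via basePath."""
    return (
        spark.read.option("basePath", base_path)
        .parquet(f"{base_path.rstrip('/')}/region={region}/days={days}")
    )


def read_with_filter(spark, base_path: str, region: str, days: int):
    """Read the table root and let the partition filter prune directories."""
    return spark.read.parquet(base_path).where(f"region = '{region}' AND days = {days}")
```

In the physical plan (`df.explain()`), the partition predicates should then appear as partition filters on the file scan.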

Databricks Community
