Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Pratikmsbsvm
by Contributor
  • 842 Views
  • 2 replies
  • 2 kudos

Resolved! Data Lakehouse architecture with Azure Databricks and Unity Catalog

I am creating a data lakehouse solution on Azure Databricks. Source: SAP, Salesforce, Adobe. Target: Hightouch (external application), Mad Mobile (external application). The data lakehouse also has transactional records which should be stored in ACID pro...

Latest Reply
KaranamS
Contributor III
  • 2 kudos

Hi @Pratikmsbsvm, from what I understand, you have a lakehouse on Azure Databricks and would like to share this data with another Databricks account or workspace. If Unity Catalog is enabled on your Azure Databricks account, you can leverage Delta S...
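If the suggestion is Delta Sharing, a minimal Databricks-to-Databricks sketch (run on the provider side) might look like the following; the share, table, and recipient names are placeholders, and the USING ID value has to come from the recipient metastore's sharing identifier:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Create a share and add the table(s) you want to expose (placeholder names).
spark.sql("CREATE SHARE IF NOT EXISTS sales_share")
spark.sql("ALTER SHARE sales_share ADD TABLE lakehouse.gold.orders")

# Register the other Databricks account/workspace as a recipient using its
# metastore sharing identifier, then grant it access to the share.
spark.sql(
    "CREATE RECIPIENT IF NOT EXISTS partner_workspace "
    "USING ID 'azure:westeurope:12345678-aaaa-bbbb-cccc-1234567890ab'"
)
spark.sql("GRANT SELECT ON SHARE sales_share TO RECIPIENT partner_workspace")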

1 More Replies
data_learner1
by New Contributor II
  • 664 Views
  • 4 replies
  • 1 kudos

Need to track the schema changes/column renames/column drops in Databricks Unity Catalog

Hi Team, we are getting data from a third-party vendor into Databricks Unity Catalog. They are making schema changes frequently and we would like to track that. Just wanted to know if I can do this using the audit table in the system catalog. As we only h...

Latest Reply
CURIOUS_DE
Contributor III
  • 1 kudos

@data_learner1 Unity Catalog logs all data access and metadata operations (including schema changes) into the audit logs, which are stored in the system catalog tables, such as system.access.audit. You mentioned you only have read access, and likely...
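As a rough sketch of that kind of lookup (assuming the standard system.access.audit columns; the action_name values here are illustrative and worth verifying against what actually shows up in your workspace's logs):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Recent Unity Catalog metadata operations from the audit system table.
# The action_name filter is an assumption; check which actions your
# vendor's schema changes actually produce before relying on it.
schema_changes = spark.sql("""
    SELECT event_time, user_identity.email AS user_email, action_name, request_params
    FROM system.access.audit
    WHERE service_name = 'unityCatalog'
      AND action_name IN ('createTable', 'updateTable', 'deleteTable')
      AND event_time >= date_sub(current_date(), 7)
    ORDER BY event_time DESC
""")
schema_changes.show(truncate=False)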

3 More Replies
NikosLoutas
by New Contributor III
  • 1801 Views
  • 2 replies
  • 0 kudos

Resolved! Databricks Full Refresh of DLT Pipeline

Hello, I have a question regarding the full refresh of a DLT pipeline, where the data source is an external table. When running the pipeline without a full refresh, the streaming will pull data that is currently present in the external source ...

Latest Reply
seeyesbee
New Contributor II
  • 0 kudos

Hi @paolajara, in your point 5 you mentioned using Delta Lake for tracking changes. Could you point me to any official docs or examples that walk through enabling CDC / row-tracking on a Delta table? I pull data from SharePoint via its REST endpoint,...
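Not the official docs, but a minimal sketch of enabling the Delta change data feed on an existing table and reading the recorded changes (the table name and starting version are placeholders):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Turn on the change data feed for an existing Delta table (placeholder name).
spark.sql(
    "ALTER TABLE main.bronze.sharepoint_items "
    "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"
)

# Read row-level changes; startingVersion must be at or after the table
# version where the change data feed was enabled.
changes = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 10)
    .table("main.bronze.sharepoint_items")
)
changes.show(truncate=False)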

1 More Replies
Pratikmsbsvm
by Contributor
  • 1153 Views
  • 2 replies
  • 0 kudos

How to build an architecture for batch as well as streaming data pipelines in Databricks

Hello, I am planning to create a data lakehouse using Azure and Databricks. Earlier I planned to do it with Azure alone, but the use cases look complex. Can someone please help me with suggestions? Source systems: SAP, Salesforce, SAP CAR, Adobe Clickstream. Consume...

Latest Reply
SP_6721
Honored Contributor
  • 0 kudos

Hi @Pratikmsbsvm,
The appropriate approach would be:
Data ingestion: Ingest data from SAP, SAP CAR, and Salesforce using Azure Data Factory or third-party connectors. For near real-time updates, enable CDC-based ingestion.
Data lakehouse storage: Store a...

1 More Replies
guizsantos
by New Contributor II
  • 3125 Views
  • 3 replies
  • 3 kudos

Resolved! How to obtain a query profile programmatically?

Hi everyone! Does anyone know if there is a way to obtain the data used to create the graph shown in the "Query profile" section? Particularly, I am interested in the rows produced by the intermediary query operations. I can see there is a "Download" ...

Latest Reply
artsheiko
Databricks Employee
  • 3 kudos

@guizsantos, the Query History list API provides metrics (see include_metrics). An executed query's definition can also be seen using the query history system table.
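A small sketch of calling that endpoint directly (host, token, and the result fields printed here are placeholders/assumptions to adapt):

import requests

HOST = "https://<workspace-host>"     # placeholder
TOKEN = "<personal-access-token>"     # placeholder

resp = requests.get(
    f"{HOST}/api/2.0/sql/history/queries",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"include_metrics": "true", "max_results": 25},
)
resp.raise_for_status()

# Each entry carries the query definition plus a metrics object when
# include_metrics is set.
for q in resp.json().get("res", []):
    metrics = q.get("metrics", {})
    print(q.get("query_id"), q.get("status"), metrics.get("rows_produced_count"))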

2 More Replies
seefoods
by Valued Contributor
  • 1359 Views
  • 1 reply
  • 1 kudos

Resolved! Python task

Hello guys, I have defined an asset bundle which has a rule to run a Python task. This task has some parameters, so how can I interact with them using argparse? Cordially,

Latest Reply
SP_6721
Honored Contributor
  • 1 kudos

Hi @seefoods,
In your asset bundle YAML, define the parameters using the named_parameters field, for example like this:
tasks:
  - task_key: python_task
    python_wheel_task:
      entry_point: main
      named_parameters:
        input_path: "/data/input...
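On the Python side, the wheel's entry point can then pick those up with argparse, assuming the named_parameters are handed to it as --key value style arguments (the parameter name matches the hypothetical YAML above):

import argparse

def main():
    # named_parameters from the python_wheel_task arrive on the command line,
    # so a plain argparse parser in the entry point can read them.
    parser = argparse.ArgumentParser()
    parser.add_argument("--input_path", required=True)
    args = parser.parse_args()
    print(f"Reading from {args.input_path}")

if __name__ == "__main__":
    main()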

mkwparth
by New Contributor III
  • 1027 Views
  • 4 replies
  • 1 kudos

Spark Failed to start: Driver unresponsive

Hi everyone, I'm encountering an intermittent issue when launching a Databricks pipeline cluster. Error message: com.databricks.pipelines.common.errors.deployment.DeploymentException: Failed to launch pipeline cluster xxxx-xxxxxx-ofgxxxxx: Attempt to la...

Latest Reply
Gopichand_G
New Contributor II
  • 1 kudos

I have personally witnessed these kinds of issues. As for why these failures happen: usually, as far as I have seen, the driver node might be unavailable or unresponsive because you have hit the maximum CPU or memory usage; maybe your cache utili...

3 More Replies
skooijman
by New Contributor II
  • 1914 Views
  • 4 replies
  • 7 kudos

dbt_project.yml won't load in databricks dbt job

We're running into issues with dbt jobs, which are not running anymore. The errors we receive suggest that the dbt_project.yml file cannot be found, while the profiles.yml can be found. We are running our dbt jobs with Databricks Workflows. We've tri...

Latest Reply
LokmenChouaya
New Contributor II
  • 7 kudos

Hello, are there any updates regarding this issue, please? I'm having the same problem in my prod

3 More Replies
Phani1
by Valued Contributor II
  • 3273 Views
  • 1 reply
  • 0 kudos

Databricks AI (LLM) Functionalities: Data Privacy and Security

Hi Databricks Team, when leveraging Databricks' AI (LLM) functionalities, such as ai_query and ai_assistant, how does Databricks safeguard customer data and ensure privacy, safety, and security? Regards, Phani

Latest Reply
Vinay_M_R
Databricks Employee
  • 0 kudos

Hello @Phani1, Databricks employs a multi-layered security approach to protect customer data when using AI functionalities like ai_query and Databricks Assistant. I am sharing the official documentation below for your reference: https://learn.microsoft.co...

Marvin_T
by New Contributor III
  • 20521 Views
  • 3 replies
  • 2 kudos

Resolved! Disabling query caching for SQL Warehouse

Hello everybody,I am currently trying to run some performance tests on queries in Databricks on Azure. For my tests, I am using a Classic SQL Warehouse in the SQL Editor. I have created two views that contain the same data but have different structur...

Latest Reply
Marvin_T
New Contributor III
  • 2 kudos

They are probably executing the same query plan, now that you say it. And yes, restarting the warehouse does theoretically work, but it isn't a nice solution. I guess I will do some restarting and build averages to have a good comparison for now.
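One sketch of taking the result cache out of such comparisons, assuming the use_cached_result session setting and the databricks-sql-connector (hostname, HTTP path, token, and table names are placeholders):

import time
from databricks import sql  # databricks-sql-connector

with sql.connect(
    server_hostname="<workspace-host>",              # placeholder
    http_path="/sql/1.0/warehouses/<warehouse-id>",  # placeholder
    access_token="<personal-access-token>",          # placeholder
) as conn:
    with conn.cursor() as cur:
        # Ask the warehouse not to serve results from the query result cache
        # for this session, so repeated runs measure real execution time.
        cur.execute("SET use_cached_result = false")
        start = time.time()
        cur.execute("SELECT count(*) FROM my_catalog.my_schema.view_a")
        cur.fetchall()
        print(f"view_a: {time.time() - start:.2f}s")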

2 More Replies
KristiLogos
by Contributor
  • 827 Views
  • 2 replies
  • 0 kudos

Netsuite error - The driver could not open a JDBC connection. Check the URL

I'm trying to connect to Netsuite2 with the JDBC driver I added to my cluster. I'm testing this in my NetSuite sandbox and I have the code below, but it keeps saying: requirement failed: The driver could not open a JDBC connection. Check the URL: jdbc:...

Latest Reply
TheOC
Contributor III
  • 0 kudos

Hey @KristiLogos, I had a little search online and found this, which may be useful: https://stackoverflow.com/questions/79236996/pyspark-jdbc-connection-to-netsuite2-com-fails-with-failed-to-login-using-tba. In short, it seems that a token-based connection...

1 More Replies
seapen
by New Contributor II
  • 1143 Views
  • 1 reply
  • 0 kudos

[Question]: Get permissions for a schema containing backticks via the API

I am unsure if this is specific to the Java SDK, but I am having issues checking effective permissions on the following schema: databricks_dev.test_schema` In Scala I have the following example test: test("attempting to access schema with backtick") ...

Latest Reply
seapen
New Contributor II
  • 0 kudos

Update: Interestingly, if I URL-encode _twice_ it appears to work, e.g.:
test("attempting to access schema with backtick") {
  val client = new WorkspaceClient()
  client.config().setHost("redacted").setToken("redacted")
  val name = "databricks...
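For comparison, the same double-encoding idea in plain Python against the REST API (the endpoint path is an assumption about the Unity Catalog effective-permissions API; host and token are placeholders):

import urllib.parse
import requests

HOST = "https://<workspace-host>"     # placeholder
TOKEN = "<personal-access-token>"     # placeholder

name = "databricks_dev.test_schema`"
# Encode once for the backtick, then again so that after the HTTP layer's own
# decoding the name still reaches the service in escaped form.
encoded = urllib.parse.quote(urllib.parse.quote(name, safe=""), safe="")

resp = requests.get(
    f"{HOST}/api/2.1/unity-catalog/effective-permissions/schema/{encoded}",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
print(resp.status_code, resp.json())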

lezwon
by Contributor
  • 615 Views
  • 2 replies
  • 1 kudos

Resolved! Databricks Serverless: Package import fails from notebook in subfolder after wheel installation

I have a Python package installed via wheel file in a Databricks serverless environment. The package imports work fine when my notebook is in the root directory, but fail when the notebook is in a subfolder. How can I fix this?
src/
├── datalake_util...

Latest Reply
lezwon
Contributor
  • 1 kudos

It appears that there is a pre-installed package called datalake_utils available within Databricks. I had to rename my package to something else, and it worked like a charm.
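A quick way to spot that kind of name collision from a notebook (a small sketch using only the standard library; the module name is the one from this thread):

import importlib.util

# If this resolves to a path outside your own wheel, the name is already taken
# by a pre-installed package and your import is being shadowed.
spec = importlib.util.find_spec("datalake_utils")
print(spec.origin if spec else "datalake_utils not found on sys.path")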

1 More Replies
AxelBrsn
by New Contributor III
  • 4363 Views
  • 5 replies
  • 1 kudos

Why are materialized views created in __databricks_internal?

Hello, I have a question about why materialized views are created in the "__databricks_internal" catalog. We specified the catalog and schemas in the DLT pipeline.

Labels: Data Engineering, catalog, Delta Live Table, materialized views
Latest Reply
Yogesh_Verma_
Contributor
  • 1 kudos

Hello,
Materialized views created by Delta Live Tables (DLT) are stored in the __databricks_internal catalog for a few key reasons:
Separation: This keeps system-generated tables (like materialized views) separate from your own tables and views, so you...

4 More Replies
fostermink
by New Contributor II
  • 1678 Views
  • 6 replies
  • 0 kudos

Spark AWS S3 folder partition pruning doesn't work

Hi, I have a use case where my Spark job is running on AWS EMR, and it is reading from an S3 path: some-bucket/some-path/region=na/days=1. During my read, I pass: DataFrame df = sparkSession.read().option("mergeSchema", true).parquet("some-bucket/some-path...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 0 kudos

In your case, Spark isn't automatically pruning partitions because:
Missing partition discovery: For Spark to perform partition pruning when reading directly from paths (without a metastore table), you need to explicitly tell it about the partition st...
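A minimal PySpark sketch of that idea (bucket and paths are placeholders): point the reader at the dataset root via basePath so region/days are discovered as partition columns, then filter on them so pruning can apply:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Read from the dataset root; basePath tells Spark where the partitioned
# directory structure starts, so region/days become partition columns.
df = (
    spark.read
    .option("mergeSchema", "true")
    .option("basePath", "s3://some-bucket/some-path/")
    .parquet("s3://some-bucket/some-path/")
)

# Filtering on the discovered partition columns lets Spark prune, listing and
# reading only the region=na/days=1 directories.
pruned = df.where((F.col("region") == "na") & (F.col("days") == 1))
pruned.explain(True)  # expect PartitionFilters on region and days in the plan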

5 More Replies
