Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

shubham_007
by Contributor III
  • 3998 Views
  • 6 replies
  • 3 kudos

Resolved! What are powerful data quality tools/libraries to build a data quality framework in Databricks?

Dear Community Experts, I need your expert advice and suggestions on developing a data quality framework. Which powerful data quality tools or libraries are good choices for building a data quality framework in Databricks? Please guide, team. R...

Latest Reply
ChrisBergh-Data
  • 3 kudos

Consider our open-source data quality tool, DataOps Data Quality TestGen. Our goal is to help data teams automatically generate 80% of the data tests they need with just a few clicks, while offering a nice UI for collaborating on the remaining 20% th...
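
For readers wanting a tool-agnostic starting point, here is a minimal sketch of a hand-rolled quality check in PySpark; the table name and rules are hypothetical placeholders, not part of any tool mentioned in the thread.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.table("main.sales.orders")  # hypothetical table

# Each rule evaluates to True when the check passes.
checks = {
    "no_null_order_id": df.filter(F.col("order_id").isNull()).count() == 0,
    "positive_amounts": df.filter(F.col("amount") <= 0).count() == 0,
    "unique_order_id": df.count() == df.select("order_id").distinct().count(),
}
failed = [name for name, passed in checks.items() if not passed]
if failed:
    raise ValueError(f"Data quality checks failed: {failed}")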

5 More Replies
ashish31negi
by New Contributor II
  • 2891 Views
  • 1 reply
  • 0 kudos

How to use Azure OneLake in AWS Databricks Unity Catalog

I'm trying to connect Azure OneLake to AWS Databricks Unity Catalog, but I'm not able to create a storage credential, since Unity Catalog currently allows S3 locations only. In the Hive catalog I'm able to connect to OneLake, but not in Unity Catalog.

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Azure OneLake cannot be directly connected or credentialed in AWS Databricks Unity Catalog at this time, because AWS Databricks Unity Catalog supports only storage credentials for S3 and a select few options (like Cloudflare R2), rather than Azure-ba...

flodoamaral
by New Contributor
  • 3128 Views
  • 1 reply
  • 0 kudos

GitLab Integration

Hello, I'm struggling with GitLab integration in Databricks. I've got jobs that run on a daily basis, pointing directly to .py files in my repo. In order to do so, my GitLab account is linked to Databricks with a PAT expiring within a month. But every o...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The error you are experiencing—"UNAUTHENTICATED: Invalid Git provider Personal Access Token credentials for repository URL"—is a common pain point when integrating GitLab repos with Databricks using Personal Access Tokens (PATs), especially for sched...
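
A common mitigation is to rotate the stored PAT programmatically before it expires. Below is a hedged sketch using the databricks-sdk Git credentials API; the environment variable and the single-credential assumption are illustrative, so verify the method signatures against your SDK version.

import os
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # picks up default auth, e.g. DATABRICKS_HOST/TOKEN

# Assumes one Git credential is configured; adjust if you manage several.
cred = next(iter(w.git_credentials.list()), None)
if cred is not None:
    w.git_credentials.update(
        credential_id=cred.credential_id,
        git_provider="gitLab",
        personal_access_token=os.environ["NEW_GITLAB_PAT"],  # hypothetical env var
    )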

dc-rnc
by Contributor
  • 3151 Views
  • 2 replies
  • 0 kudos

Writing to Delta Table and retrieving back the IDs doesn't work

Hi. I have a workflow in which I write a few rows into a Delta table with auto-generated IDs. Then I need to retrieve them just after they're written to the table to collect those generated IDs, so I read the table and use two columns (one is ...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Your workflow issue—writing to a Delta Table, immediately reading back using a join on client_id and timestamp, but sometimes missing rows—suggests a subtle problem, likely related to timing, consistency, or column precision between your input DataFr...
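
One way to sidestep timestamp-precision joins entirely is to stamp each write with a unique batch key and read back on that key. A minimal sketch, with hypothetical table and column names:

import uuid
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
batch_id = str(uuid.uuid4())

# Tag the rows being written with a unique batch identifier.
rows = (
    spark.createDataFrame([("c1",), ("c2",)], ["client_id"])
    .withColumn("batch_id", F.lit(batch_id))
)
rows.write.format("delta").mode("append").saveAsTable("main.app.events")

# Delta reads are snapshot-consistent, so the committed batch is visible here.
written = spark.table("main.app.events").filter(F.col("batch_id") == batch_id)
generated = [r["id"] for r in written.select("id").collect()]  # "id" = auto-generated column (hypothetical name)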

1 More Reply
MarcoRezende
by New Contributor III
  • 3231 Views
  • 1 reply
  • 0 kudos

Slow performance in REFRESH MATERIALIZED VIEW over CTAS

Hello guys, I have some materialized views created in my Databricks workspace, and after one change to one of them it became 3x slower (9 minutes to 30 minutes). After some debugging I found that the bottleneck process in the execution plan is one call...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Materialized views in Databricks, especially those maintained via DLT (Delta Live Tables) pipelines, often have different execution patterns and optimization strategies compared to running the same SQL in a standard serverless notebook. This can lead...

Mortenfromdk
by New Contributor
  • 3030 Views
  • 1 reply
  • 0 kudos

Best practice for unified cloud cost attribution (Databricks + Azure)?

Hi! I’m working on a FinOps initiative to improve cloud cost visibility and attribution across departments and projects in our data platform. We tag production workflows at the department level and can get a decent view in Azure Cost Analysis by f...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

To achieve unified cloud cost visibility and attribution for Azure and Databricks (including SQL Serverless Warehouses), consider the following best practices and solutions.

Tagging Databricks SQL Warehouses for Attribution
Creating a separate SQL Wa...
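
Once tags flow through, usage can be grouped per department from the billing system table. A minimal sketch, assuming system tables are enabled and that your workflows apply a "department" tag (the tag key is an assumption):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

usage = spark.table("system.billing.usage")
by_dept = (
    usage
    .withColumn("department", F.col("custom_tags")["department"])  # tag key is an assumption
    .groupBy("department", "sku_name")
    .agg(F.sum("usage_quantity").alias("dbus"))
)
by_dept.show()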

MingOnCloud
by New Contributor II
  • 2675 Views
  • 1 reply
  • 0 kudos

Schema Evolution with "schemaTrackingLocation" fails anyway

Hi, I'm trying to understand the usage of "schemaTrackingLocation" with schema evolution. I use these articles as references:
https://docs.delta.io/latest/delta-streaming.html#tracking-non-additive-schema-changes
https://docs.databricks.com/aws/en/error-me...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Here are answers to your detailed questions about using schemaTrackingLocation for dropping columns in Delta Lake streaming, based on your references and operational experience.

Question 1: schemaTrackingLocation Path Requirements
Yes, it is normal...
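
For orientation, a minimal sketch of wiring the option into a Delta stream; paths and table names are hypothetical, and the schema location sits under the checkpoint directory as the docs require:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

source = (
    spark.readStream.format("delta")
    .option("schemaTrackingLocation", "/chk/orders/_schema")  # under the checkpoint dir
    .table("main.app.orders")
)
query = (
    source.writeStream.format("delta")
    .option("checkpointLocation", "/chk/orders")
    .toTable("main.app.orders_copy")  # starts the stream
)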

soumiknow
by Contributor II
  • 2494 Views
  • 1 reply
  • 0 kudos

Data not inserting in 'overwrite' mode - Value has type STRUCT which cannot be inserted into column

We have the following code, which we use to load data into a BigQuery table after reading the parquet files from Azure Data Lake Storage:

df.write.format("bigquery").option( "parentProject", gcp_project_id ).option("table", f"{bq_table_name}").option( "te...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The issue you are facing arises when using mode("overwrite") with Spark to load data into BigQuery—the error indicates BigQuery expects a STRING type for the source column, but it is being supplied a STRUCT type during overwrite operations. Strangely...
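
One workaround consistent with that diagnosis is to serialize the struct to a JSON string so the overwritten table's STRING column receives a matching type. A hedged sketch reusing the variables from the post's snippet; the struct column name is a hypothetical placeholder:

from pyspark.sql import functions as F

# "event_payload" stands in for whichever column arrives as a STRUCT.
df_fixed = df.withColumn("event_payload", F.to_json(F.col("event_payload")))
(
    df_fixed.write.format("bigquery")
    .option("parentProject", gcp_project_id)
    .option("table", bq_table_name)
    .mode("overwrite")
    .save()
)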

jeremy98
by Honored Contributor
  • 2122 Views
  • 1 reply
  • 0 kudos

How to Initialize Sentry in All Notebooks Used in Jobs using __init__.py?

Hi Community, I'm looking to initialize Sentry in all notebooks that are used across multiple jobs. My goal is to capture exceptions with Sentry whenever a job runs a notebook. What’s the recommended approach for initializing Sentry packages in this c...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

To consistently initialize Sentry in all notebooks for reliable exception tracking, experts recommend using a shared initialization approach that minimizes duplication and ensures setup for every job execution. Here’s a structured approach: Recommend...
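
A minimal sketch of such a shared initializer, packaged as a module (e.g., in a wheel or workspace file) that every notebook calls first; the secret scope, key, and package names are hypothetical:

import sentry_sdk

def init_sentry(dbutils):
    """Initialize Sentry once per notebook/job run."""
    dsn = dbutils.secrets.get(scope="observability", key="sentry-dsn")  # hypothetical scope/key
    sentry_sdk.init(dsn=dsn, traces_sample_rate=0.0)

# In each notebook:
#   from my_pkg.monitoring import init_sentry  # hypothetical package
#   init_sentry(dbutils)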

DataP1
by New Contributor
  • 2636 Views
  • 3 replies
  • 0 kudos

Excel File from Databricks Not Auto-Adjusting Columns in Power Automate Email Attachment

Hi community, I've built an automation workflow using Databricks and Power Automate. The process runs a query in Databricks, exports the result to Excel, auto-adjusts the columns based on the header/content, and then Power Automate picks up the file a...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Yes, this is a common challenge when automating Excel file generation—the default export (especially from pandas or Databricks) does not auto-fit column widths, resulting in cramped columns when viewed or emailed. Auto-fitting columns typically requi...
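
A minimal sketch of that post-processing step with openpyxl, approximating auto-fit by sizing each column to its longest value; the file path is a hypothetical placeholder:

from openpyxl import load_workbook

path = "/dbfs/tmp/report.xlsx"  # hypothetical output location
wb = load_workbook(path)
ws = wb.active

for column in ws.columns:
    letter = column[0].column_letter
    longest = max((len(str(c.value)) for c in column if c.value is not None), default=8)
    ws.column_dimensions[letter].width = longest + 2  # small padding

wb.save(path)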

2 More Replies
tt_921
by Visitor
  • 10 Views
  • 0 replies
  • 0 kudos

Databricks CLI binding storage credential to a workspace

In the Databricks documentation it says to run the below for binding a storage credential to a workspace (after already completing step 1 to update the `isolation-mode` to `ISOLATED`):

databricks workspace-bindings update-bindings storage-cre...
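
For comparison, the same binding can be attempted through the Python SDK. This is a sketch under the assumption that your databricks-sdk version exposes the workspace-bindings surface as below (verify the enum and class names); the credential name and workspace ID are placeholders:

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import catalog

w = WorkspaceClient()
w.workspace_bindings.update_bindings(
    securable_type=catalog.UpdateBindingsSecurableType.STORAGE_CREDENTIAL,
    securable_name="my_storage_credential",  # placeholder
    add=[
        catalog.WorkspaceBinding(
            workspace_id=1234567890123456,  # placeholder
            binding_type=catalog.WorkspaceBindingBindingType.BINDING_TYPE_READ_WRITE,
        )
    ],
)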

sanutopia
by New Contributor
  • 2101 Views
  • 1 reply
  • 0 kudos

How to ingest data from SAP Data Services (ECC, IP, MDG, FLP, MRP) to Databricks Lakehouse on GCP ?

Hi Friends, my customer is using Databricks (as a GCP partner product). The ask is to ingest data from sources into the Databricks Lakehouse. Currently the customer has three types of sources: SAP (ECC, HANA), Oracle, and Kafka streams. What are the Databricks native...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Databricks on GCP offers several native ETL services and integration options to ingest data from SAP (ECC, HANA), Oracle, and Kafka Streams into the Lakehouse. Comparing Databricks-native solutions with GCP-native ETL like Data Fusion or Dataflow rev...
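
Of the three sources, the Kafka path is the most directly code-level: Structured Streaming into a Delta table. A minimal sketch with hypothetical broker, topic, and table names:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
    .option("subscribe", "sap-ecc-changes")            # placeholder topic
    .load()
)
(
    raw.selectExpr("CAST(value AS STRING) AS payload")
    .writeStream.format("delta")
    .option("checkpointLocation", "/chk/sap_ecc")
    .toTable("main.raw.sap_ecc_changes")
)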

LeoGriffM
by New Contributor II
  • 2357 Views
  • 2 replies
  • 0 kudos

Zip archive with PowerShell "Error: The zip file may not be valid or may be an unsupported version."

Zip archive "Error: The zip file may not be valid or may be an unsupported version."We are trying to upload a ZIP archive to a Databricks workspace for faster and atomic uploads of artifacts. The expected behaviour is that we can run the following co...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The error message "Error: The zip file may not be valid or may be an unsupported version" when importing a zip archive via the Databricks CLI is a known issue, especially with zip files created using PowerShell's Compress-Archive or [System.IO.Compre...
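
A workaround consistent with that diagnosis is to build the archive with Python's zipfile, which emits plain deflate entries the CLI accepts. A minimal sketch; the directory and archive names are placeholders:

import os
import zipfile

src_dir = "artifacts"      # placeholder directory to package
archive = "artifacts.zip"  # placeholder archive name

with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
    for root, _, files in os.walk(src_dir):
        for name in files:
            path = os.path.join(root, name)
            zf.write(path, arcname=os.path.relpath(path, src_dir))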

1 More Reply
sandy311
by New Contributor III
  • 2291 Views
  • 3 replies
  • 1 kudos

Install python packages on serverless compute in DLT pipelines (using asset bundles)

Has anyone figured out how to install packages on serverless compute using asset bundles, similar to how we handle it for jobs or job tasks? I didn’t see any direct option for this, apart from installing packages manually within a notebook. I tried ins...

Data Engineering
DLT Serverless
Latest Reply
mark_ott
Databricks Employee
  • 1 kudos

Installing Python packages on Databricks serverless compute via asset bundles is possible, but there are some unique limitations and required configuration adjustments compared to traditional jobs or job tasks. The core methods to install packages fo...

2 More Replies
saicharandeepb
by New Contributor III
  • 1898 Views
  • 1 reply
  • 0 kudos

Implementing ADB Autoloader with Managed File Notification Mode for UC Ext Location (public preview)

Hi everyone, I'm planning to implement Azure Databricks Auto Loader using the Databricks-managed file notification mode for an external location registered in Unity Catalog. I understand this feature is currently in public preview, and I’d love to hea...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Yes, Azure Databricks Auto Loader with Databricks-managed file notification mode for external locations in Unity Catalog has been successfully implemented by users, especially since it entered public preview in 2025, and it's designed to make file di...
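
For reference, a minimal sketch of what the setup looks like in code. The option name follows the public-preview docs but should be verified against your runtime; the storage path and target table are hypothetical:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useManagedFileEvents", "true")  # assumption: preview option name
    .load("abfss://landing@mystorage.dfs.core.windows.net/events/")
)
(
    stream.writeStream
    .option("checkpointLocation", "/chk/landing_events")
    .toTable("main.raw.landing_events")
)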

