Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

hf-databricks
by New Contributor II
  • 111 Views
  • 2 replies
  • 0 kudos

Unable to create workspace

Hi Team, we have a challenge creating a workspace in a Databricks account created on top of AWS. Below are the details: Databricks account name: saichaitanya.vaddadhi@healthfirsttech.com's Lakehouse; AWS account ID: 720016114009; Databricks ID: 1ee8765f-b472-4...

Latest Reply
BS_THE_ANALYST
Esteemed Contributor II
  • 0 kudos

@hf-databricks there's a quickstart guide for creating a workspace with AWS: https://docs.databricks.com/aws/en/admin/workspace/quick-start There's a list of requirements. There are more options for creating workspaces; above, I just listed the recommen...

1 More Reply
AniruddhaGI
by New Contributor II
  • 2114 Views
  • 3 replies
  • 1 kudos

Workspace allows DBFS path to install in Databricks 16.4 LTS

Feature: Library installation using requirements.txt on DB Runtime 16.4 LTS. Affected Areas: Workspace isolation, Library Management. Steps to Reproduce: Upload a wheel file to DBFS. Put the requirements.txt file in the Workspace and put the DBFS path in require...
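For readers trying to reproduce this, here is a minimal sketch of the steps described above; the wheel name, paths, and user folder are hypothetical:

```python
# Databricks notebook cell (hypothetical paths throughout).
# 1) Wheel previously uploaded to DBFS:
#      dbfs:/FileStore/wheels/mylib-0.1-py3-none-any.whl
# 2) requirements.txt stored in the Workspace, containing one line that
#    points back at the DBFS copy of the wheel:
#      /dbfs/FileStore/wheels/mylib-0.1-py3-none-any.whl
# 3) Installing from the Workspace file still resolves the DBFS wheel path:
%pip install -r /Workspace/Users/someone@example.com/requirements.txt
```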

Data Engineering
library
Security
Workspace
Latest Reply
AniruddhaGI
New Contributor II
  • 1 kudos

I would like to know if workspace isolation is a priority, and only Databricks 14.3 and lower allow installation via DBFS. Why should requirements.txt allow you to install libraries or packages via a DBFS path? Could someone please explain why th...

2 More Replies
KKo
by Contributor III
  • 32 Views
  • 0 replies
  • 0 kudos

On-Prem MS SQL to Azure Databricks

Hi all, I need to ingest data from on-prem MS SQL tables using Databricks to Azure Cloud. For the ingest, I previously used notebooks and JDBC connectors to read SQL tables and write to Unity Catalog tables. Now, I want to experiment with Databricks connectors f...
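For reference, a minimal sketch of the JDBC pattern described above, before switching to the newer connectors; host, database, secret scope, and table names are placeholders:

```python
# Read an on-prem SQL Server table over JDBC and land it in Unity Catalog.
# Assumes the cluster has network connectivity to the on-prem host.
jdbc_url = (
    "jdbc:sqlserver://onprem-sql.example.com:1433;"
    "databaseName=sales;encrypt=true;trustServerCertificate=true"
)

df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "dbo.orders")
    .option("user", dbutils.secrets.get("onprem-scope", "sql-user"))
    .option("password", dbutils.secrets.get("onprem-scope", "sql-password"))
    .load()
)

df.write.mode("overwrite").saveAsTable("main.bronze.orders")
```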

shubham_007
by Contributor III
  • 4028 Views
  • 6 replies
  • 3 kudos

Resolved! What are powerful data quality tools/libraries to build a data quality framework in Databricks?

Dear Community Experts, I need your expert advice and suggestions on the development of a data quality framework. What powerful data quality tools or libraries are good to go for developing a data quality framework in Databricks? Please guide, team. R...

Latest Reply
ChrisBergh-Data
  • 3 kudos

Consider our open-source data quality tool, DataOps Data Quality TestGen. Our goal is to help data teams automatically generate 80% of the data tests they need with just a few clicks, while offering a nice UI for collaborating on the remaining 20% th...

5 More Replies
ashish31negi
by New Contributor II
  • 2904 Views
  • 1 reply
  • 0 kudos

How to use Azure OneLake in AWS Databricks Unity Catalog

I'm trying to connect Azure OneLake in AWS Databricks Unity Catalog, but I'm not able to create a storage credential, since it currently allows S3 locations only. In the Hive catalog I'm able to connect to OneLake, but not in Unity.

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Azure OneLake cannot be directly connected or credentialed in AWS Databricks Unity Catalog at this time, because AWS Databricks Unity Catalog supports only storage credentials for S3 and a select few options (like Cloudflare R2), rather than Azure-ba...

flodoamaral
by New Contributor
  • 3141 Views
  • 1 reply
  • 0 kudos

GitLab Integration

Hello, I'm struggling with GitLab integration in Databricks. I've got jobs that run on a daily basis, pointing directly to .py files in my repo. To do so, my GitLab account is linked to Databricks with a PAT expiring within a month. But every o...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The error you are experiencing—"UNAUTHENTICATED: Invalid Git provider Personal Access Token credentials for repository URL"—is a common pain point when integrating GitLab repos with Databricks using Personal Access Tokens (PATs), especially for sched...
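One way to take the manual clicking out of monthly PAT expiry is to rotate the stored Git credential from a scheduled job via the Git credentials REST API; a sketch, assuming a hypothetical secret scope holding the tokens:

```python
import requests

HOST = "https://<your-workspace>.azuredatabricks.net"
API_TOKEN = dbutils.secrets.get("ci-scope", "databricks-token")  # hypothetical scope/keys
NEW_GITLAB_PAT = dbutils.secrets.get("ci-scope", "gitlab-pat")   # freshly issued PAT

headers = {"Authorization": f"Bearer {API_TOKEN}"}

# Find the existing Git credential entry (one per user).
creds = requests.get(f"{HOST}/api/2.0/git-credentials", headers=headers).json()
cred_id = creds["credentials"][0]["credential_id"]

# Swap in the new GitLab PAT before the old one expires.
requests.patch(
    f"{HOST}/api/2.0/git-credentials/{cred_id}",
    headers=headers,
    json={
        "git_provider": "gitLab",
        "git_username": "my-gitlab-user",
        "personal_access_token": NEW_GITLAB_PAT,
    },
).raise_for_status()
```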

dc-rnc
by Contributor
  • 3166 Views
  • 2 replies
  • 0 kudos

Writing to Delta Table and retrieving back the IDs doesn't work

Hi. I have a workflow in which I write a few rows into a Delta table with auto-generated IDs. Then, I need to retrieve them back just after they're written into the table to collect those generated IDs, so I read the table and I use two columns (one is ...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Your workflow issue—writing to a Delta Table, immediately reading back using a join on client_id and timestamp, but sometimes missing rows—suggests a subtle problem, likely related to timing, consistency, or column precision between your input DataFr...
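One pattern that sidesteps the timestamp-precision join entirely is to stamp each write with a caller-generated batch key and read back on that key; a sketch with illustrative table and column names:

```python
import uuid
from pyspark.sql import functions as F

# Tag this batch with a unique marker before writing.
batch_id = str(uuid.uuid4())
staged = input_df.withColumn("batch_id", F.lit(batch_id))
staged.write.format("delta").mode("append").saveAsTable("main.app.events")

# Read back exactly this batch, including the auto-generated IDs;
# no join on client_id/timestamp needed.
written = spark.table("main.app.events").where(F.col("batch_id") == batch_id)
generated_ids = [row["id"] for row in written.select("id").collect()]
```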

1 More Reply
MarcoRezende
by New Contributor III
  • 3243 Views
  • 1 reply
  • 0 kudos

Slow performance in REFRESH MATERIALIZED VIEW over CTAS

Hello guys, I have some materialized views created in my Databricks workspace, and after one change in one of them it became 3x slower (9 minutes to 30 minutes). After some debugging I found that the bottleneck process in the execution plan is one call...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Materialized views in Databricks, especially those maintained via DLT (Delta Live Tables) pipelines, often have different execution patterns and optimization strategies compared to running the same SQL in a standard serverless notebook. This can lead...

Mortenfromdk
by New Contributor
  • 3040 Views
  • 1 reply
  • 0 kudos

Best practice for unified cloud cost attribution (Databricks + Azure)?

Hi! I'm working on a FinOps initiative to improve cloud cost visibility and attribution across departments and projects in our data platform. We tag production workflows at the department level and can get a decent view in Azure Cost Analysis by f...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

To achieve unified cloud cost visibility and attribution for Azure and Databricks (including SQL Serverless Warehouses), consider the following best practices and solutions. Tagging Databricks SQL Warehouses for Attribution: creating a separate SQL Wa...
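On the Databricks side, tag-level attribution can be queried from the billing system tables; a sketch assuming system tables are enabled and workloads carry a "department" custom tag:

```python
# Monthly DBU usage per department, read from the billing system table.
usage_by_dept = spark.sql("""
    SELECT
      custom_tags['department']       AS department,
      sku_name,
      DATE_TRUNC('MONTH', usage_date) AS month,
      SUM(usage_quantity)             AS dbus
    FROM system.billing.usage
    WHERE usage_date >= DATE '2025-01-01'
    GROUP BY ALL
""")
display(usage_by_dept)
```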

MingOnCloud
by New Contributor II
  • 2689 Views
  • 1 reply
  • 0 kudos

Schema Evolution with "schemaTrackingLocation" fails anyway

Hi, I'm trying to understand the usage of "schemaTrackingLocation" with schema evolution. I use these articles as references: https://docs.delta.io/latest/delta-streaming.html#tracking-non-additive-schema-changes and https://docs.databricks.com/aws/en/error-me...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Here are answers to your detailed questions about using schemaTrackingLocation for dropping columns in Delta Lake streaming, based on your references and operational experience. Question 1: schemaTrackingLocation path requirements. Yes, it is normal...
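For context, the option under discussion looks like this in a stream; per the linked docs, the tracking location must live under the stream's checkpoint directory (paths here are placeholders):

```python
# Stream from a Delta table while tracking non-additive schema changes
# (e.g. dropped or renamed columns).
checkpoint = "/Volumes/main/app/checkpoints/events"

df = (
    spark.readStream.format("delta")
    .option("schemaTrackingLocation", f"{checkpoint}/_schema_log")
    .table("main.app.events")
)

(
    df.writeStream.format("delta")
    .option("checkpointLocation", checkpoint)
    .toTable("main.app.events_mirror")
)
```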

soumiknow
by Contributor II
  • 2504 Views
  • 1 reply
  • 0 kudos

data not inserting in 'overwrite' mode - Value has type STRUCT which cannot be inserted into column

We have the following code, which we used to load data into a BigQuery table after reading the parquet files from Azure Data Lake Storage: df.write.format("bigquery").option("parentProject", gcp_project_id).option("table", f"{bq_table_name}").option("te...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The issue you are facing arises when using mode("overwrite") with Spark to load data into BigQuery—the error indicates BigQuery expects a STRING type for the source column, but it is being supplied a STRUCT type during overwrite operations. Strangely...
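A common workaround when the overwrite recreates the target with a STRUCT while BigQuery expects STRING is to serialize the struct to JSON before writing; a sketch with an illustrative column name, keeping the connector options visible in the original post (the GCS bucket option is assumed from the truncated snippet):

```python
from pyspark.sql import functions as F

# Serialize the offending STRUCT column to a JSON string so the overwrite
# produces a STRING column matching the existing BigQuery schema.
flattened = df.withColumn("payload", F.to_json(F.col("payload")))

(
    flattened.write.format("bigquery")
    .option("parentProject", gcp_project_id)
    .option("table", bq_table_name)
    .option("temporaryGcsBucket", temp_gcs_bucket)  # assumed option
    .mode("overwrite")
    .save()
)
```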

jeremy98
by Honored Contributor
  • 2131 Views
  • 1 reply
  • 0 kudos

How to Initialize Sentry in All Notebooks Used in Jobs using __init__.py?

Hi Community, I'm looking to initialize Sentry in all notebooks that are used across multiple jobs. My goal is to capture exceptions using Sentry whenever a job runs a notebook. What's the recommended approach for initializing Sentry packages in this c...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

To consistently initialize Sentry in all notebooks for reliable exception tracking, experts recommend using a shared initialization approach that minimizes duplication and ensures setup for every job execution. Here’s a structured approach: Recommend...
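A minimal sketch of that shared-initialization idea: one module kept as a Workspace file and imported at the top of every job notebook; the paths, secret scope, and DSN key are hypothetical:

```python
# /Workspace/Shared/observability/sentry_init.py  (hypothetical location)
import sentry_sdk
from databricks.sdk.runtime import dbutils  # exposes dbutils inside workspace files

def init_sentry(environment: str = "prod") -> None:
    """Initialize Sentry once per Python process; repeated calls are harmless."""
    sentry_sdk.init(
        dsn=dbutils.secrets.get("observability", "sentry-dsn"),
        environment=environment,
    )

# In each job notebook, before any other logic:
#   import sys; sys.path.append("/Workspace/Shared/observability")
#   from sentry_init import init_sentry
#   init_sentry()
```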

DataP1
by New Contributor
  • 2645 Views
  • 3 replies
  • 0 kudos

Excel File from Databricks Not Auto-Adjusting Columns in Power Automate Email Attachment

Hi community, I've built an automation workflow using Databricks and Power Automate. The process runs a query in Databricks, exports the result to Excel, auto-adjusts the columns based on the header/content, and then Power Automate picks up the file a...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Yes, this is a common challenge when automating Excel file generation—the default export (especially from pandas or Databricks) does not auto-fit column widths, resulting in cramped columns when viewed or emailed. Auto-fitting columns typically requi...
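Since .xlsx files carry no auto-fit flag, the widths have to be computed and written at export time; a sketch using openpyxl, with an approximate character-count heuristic (the source DataFrame name is assumed):

```python
import pandas as pd
from openpyxl.utils import get_column_letter

pdf = results_df.toPandas()  # results_df: the Spark query result (assumed name)
path = "/tmp/report.xlsx"

with pd.ExcelWriter(path, engine="openpyxl") as writer:
    pdf.to_excel(writer, index=False, sheet_name="report")
    ws = writer.sheets["report"]
    for idx, col in enumerate(pdf.columns, start=1):
        # Width ~ longest header/cell in characters, plus padding.
        longest = max([len(str(col))] + [len(s) for s in pdf[col].astype(str)])
        ws.column_dimensions[get_column_letter(idx)].width = longest + 2
```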

2 More Replies
tt_921
by Visitor
  • 19 Views
  • 0 replies
  • 0 kudos

Databricks CLI binding storage credential to a workspace

In the documentation from Databricks it says to run the below for binding a storage credential to a workspace (after already completing step 1 to update the `isolation-mode` to be `ISOLATED`): databricks workspace-bindings update-bindings storage-cre...
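For comparison, the same binding expressed against the underlying workspace-bindings REST endpoint; the securable-type path segment and payload shape are my reading of the API reference, so treat them as assumptions, and the names and IDs are placeholders:

```python
import requests

HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<pat-or-oauth-token>"

resp = requests.patch(
    # securable-type spelling assumed from the API reference
    f"{HOST}/api/2.1/unity-catalog/bindings/storage_credential/my_credential",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"add": [{"workspace_id": 1234567890123456,
                   "binding_type": "BINDING_TYPE_READ_WRITE"}]},
)
resp.raise_for_status()
```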

sanutopia
by New Contributor
  • 2109 Views
  • 1 reply
  • 0 kudos

How to ingest data from SAP Data Services (ECC, IP, MDG, FLP, MRP) to Databricks Lakehouse on GCP?

Hi Friends, my customer is using Databricks (as a GCP partner product). The ask is to ingest data from sources into the Databricks Lakehouse. Currently the customer has 3 types of sources: SAP (ECC, HANA), Oracle, and Kafka streams. What are the Databricks native...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Databricks on GCP offers several native ETL services and integration options to ingest data from SAP (ECC, HANA), Oracle, and Kafka Streams into the Lakehouse. Comparing Databricks-native solutions with GCP-native ETL like Data Fusion or Dataflow rev...
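For the Kafka leg specifically, the Databricks-native route is Structured Streaming straight into a Delta table; a sketch with placeholder broker, topic, and target names:

```python
from pyspark.sql import functions as F

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker-1.example.com:9092")
    .option("subscribe", "sap-orders")
    .option("startingOffsets", "earliest")
    .load()
)

# Kafka delivers binary key/value; decode before landing in bronze.
decoded = raw.select(
    F.col("key").cast("string").alias("key"),
    F.col("value").cast("string").alias("value"),
    "topic", "partition", "offset", "timestamp",
)

(
    decoded.writeStream.format("delta")
    .option("checkpointLocation", "/Volumes/main/bronze/checkpoints/sap_orders")
    .toTable("main.bronze.sap_orders")
)
```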

