Data Engineering

Forum Posts

naveenanto
by New Contributor III
  • 175 Views
  • 1 reply
  • 0 kudos

Custom Spark Extension in SQL Warehouse

I understand that only limited Spark configurations are supported in SQL Warehouse, but is it possible to add Spark extensions to SQL Warehouse clusters? Use case: we have a few restricted table properties. We prevent that with Spark extensions installed in...

Data Engineering
sql-warehouse
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @naveenanto, While Databricks SQL warehouses have some limitations when it comes to Spark configurations, you can indeed extend their capabilities by adding custom Spark extensions. Let me provide you with some inform...

JohanS
by New Contributor III
  • 151 Views
  • 1 reply
  • 0 kudos

WorkspaceClient authentication fails when running on a Docker cluster

from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
ValueError: default auth: cannot configure default credentials ...
I'm trying to instantiate a WorkspaceClient in a notebook on a cluster running a Docker image, but authentication fails. T...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @JohanS, It seems you’re encountering an authentication issue when trying to instantiate a WorkspaceClient in a Docker image running Databricks.   Let’s troubleshoot this! The error message you’re seeing, “default auth: cannot configure defau...

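On a custom Docker image, the SDK's notebook-native default credential provider may be unavailable, so a common workaround is to pass the host and a personal access token explicitly instead of relying on the default chain. A minimal sketch, assuming the SDK's standard DATABRICKS_HOST / DATABRICKS_TOKEN environment variables; the fallback host string is a placeholder:

```python
import os

# Sketch: collect explicit credentials rather than relying on the
# default credential chain. DATABRICKS_HOST / DATABRICKS_TOKEN are the
# SDK's standard environment variables; the fallback is a placeholder.
def explicit_auth_kwargs() -> dict:
    return {
        "host": os.environ.get("DATABRICKS_HOST", "https://<workspace-url>"),
        "token": os.environ.get("DATABRICKS_TOKEN", ""),
    }

# Usage (requires the databricks-sdk package):
# from databricks.sdk import WorkspaceClient
# w = WorkspaceClient(**explicit_auth_kwargs())
```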
georgef
by Visitor
  • 16 Views
  • 0 replies
  • 0 kudos

Cannot import relative python paths

Hello, some variations of this question have been asked before, but there doesn't seem to be an answer for the following simple use case. I have the following file structure in a Databricks Asset Bundles project:
src
--dir1
----file1.py
--dir2
----file2...

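For the layout described (src/dir1/file1.py importing from src/dir2), a workaround often used when a bundle task runs a file directly is to put the src root on sys.path before importing. A sketch; the helper name is hypothetical, and the paths mirror the post's structure:

```python
import os
import sys

def add_src_root(current_file: str) -> str:
    """Insert the directory two levels above current_file (the src
    root in the post's layout) at the front of sys.path."""
    src_root = os.path.dirname(os.path.dirname(os.path.abspath(current_file)))
    if src_root not in sys.path:
        sys.path.insert(0, src_root)
    return src_root

# In src/dir1/file1.py one would then call:
# add_src_root(__file__)
# from dir2 import file2
```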
Anske
by New Contributor II
  • 117 Views
  • 5 replies
  • 1 kudos

Resolved! DLT apply_changes applies only deletes and inserts not updates

Hi, I have a DLT pipeline that applies changes from a source table (cdctest_cdc_enriched) to a target table (cdctest) with the following code:
dlt.apply_changes(
    target = "cdctest",
    source = "cdctest_cdc_enriched",
    keys = ["ID"],
    sequence_by...

Data Engineering
Delta Live Tables
Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @Anske, It seems you’re encountering an issue with your Delta Live Tables (DLT) pipeline where updates from the source table are not being correctly applied to the target table. Let’s troubleshoot this together! Pipeline Update Process: Whe...

4 More Replies
Menegat
by Visitor
  • 16 Views
  • 0 replies
  • 0 kudos

VACUUM seems to be deleting Autoloader's log files.

Hello everyone, I have a workflow setup that updates a few Delta tables incrementally with Autoloader three times a day. Additionally, I run a separate workflow that performs VACUUM and OPTIMIZE on these tables once a week. The issue I'm facing is that...

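If the Autoloader checkpoint (or files a downstream stream still needs) lives under the table path, a weekly VACUUM with default retention can remove them. Two common mitigations are keeping the checkpoint location outside the table directory and lengthening the retention window. A sketch with a placeholder table name; 168 hours (7 days) is illustrative, not a recommendation from the thread:

```sql
-- Sketch: lengthen the retention window before vacuuming so files
-- still referenced by in-flight streams are not removed.
VACUUM my_delta_table RETAIN 168 HOURS;
```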
jainshasha
by New Contributor
  • 93 Views
  • 6 replies
  • 0 kudos

Job Cluster in Databricks workflow

Hi, I have configured 20 different workflows in Databricks, each with a job cluster with a different name. All 20 workflows are scheduled to run at the same time, but even with a different job cluster configured in each of them, they run sequentially w...

Latest Reply
Wojciech_BUK
Contributor III
  • 0 kudos

Hi @jainshasha, I tried to replicate your problem, but in my case I was able to run the jobs in parallel (the only difference is that I am running the notebook from the workspace, not from a repo). As you can see, the jobs did not start at exactly the same time, but they ran in par...

5 More Replies
Ameshj
by New Contributor
  • 285 Views
  • 7 replies
  • 0 kudos

Dbfs init script migration

I need help with migrating from DBFS on Databricks to workspace files. I am new to Databricks and am struggling with what is on the links provided. My workspace.yml also has DBFS hard-coded. Included is a full deployment with Great Expectations. This was don...

Data Engineering
Azure Databricks
dbfs
Great expectations
python
Latest Reply
NandiniN
Valued Contributor II
  • 0 kudos

One of the other suggestions is to use Lakehouse Federation. It is also possible that it is a driver issue (we will know from the logs).

6 More Replies
ashraf1395
by Visitor
  • 55 Views
  • 3 replies
  • 2 kudos

Resolved! Optimising Clusters in Databricks on GCP

Hi there everyone, we are trying to get hands-on with the Databricks Lakehouse for a prospective client's project. Our major aim is to compare the Databricks Data Lakehouse and the BigQuery data warehouse in terms of cost and the time to set up and run que...

Latest Reply
Kaniz
Community Manager
  • 2 kudos

Hi @ashraf1395, Comparing Databricks Lakehouse and Google BigQuery is essential to making an informed decision for your project. Let's address your questions: Cluster Configurations for Databricks: Databricks provides flexibility in configuring com...

2 More Replies
tanjil
by New Contributor III
  • 8690 Views
  • 8 replies
  • 6 kudos

Resolved! Downloading sharepoint lists using python

Hello, I am trying to download lists from SharePoint into a pandas DataFrame. However, I cannot retrieve any information successfully. I have attempted many solutions mentioned on Stack Overflow. Below is one of those attempts: # https://pypi.org/project/sha...

Latest Reply
huntaccess
Visitor
  • 6 kudos

The error "<urlopen error [Errno -2] Name or service not known>" suggests that there's an issue with the server URL or network connectivity. Double-check the server URL to ensure it's correct and accessible. Also, verify that your network connection ...

7 More Replies
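Since "urlopen error [Errno -2] Name or service not known" usually points at DNS or a malformed server URL rather than SharePoint itself, it can help to print the exact endpoint being requested. A hypothetical helper for SharePoint's REST list-items route; site_url and list_name are placeholders, not code from the thread:

```python
def list_items_url(site_url: str, list_name: str) -> str:
    """Build the SharePoint REST endpoint that returns a list's items.

    Sketch only: assumes the standard /_api/web/lists/getbytitle(...)
    REST route; authentication is handled separately.
    """
    return f"{site_url.rstrip('/')}/_api/web/lists/getbytitle('{list_name}')/items"
```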
RabahO
by New Contributor III
  • 29 Views
  • 2 replies
  • 0 kudos

Dashboard always display truncated data

Hello, we're working with a serverless SQL cluster to query Delta tables and display some analytics in dashboards. We have some basic GROUP BY queries that generate around 36k rows, and they are executed without the LIMIT keyword. So in the data ...

Latest Reply
mhiltner
New Contributor II
  • 0 kudos

Hey @RabahO This is likely a memory issue.  The current behavior is that Databricks will only attempt to display the first 64000 rows of data. If the first 64000 rows of data are larger than 2187 MB, then it will fail to display anything. In your cas...

1 More Replies
pragarwal
by New Contributor II
  • 33 Views
  • 2 replies
  • 0 kudos

Adding Member to group using account databricks rest api

Hi all, I want to add a member to a group at the Databricks account level using the REST API (https://docs.databricks.com/api/azure/account/accountgroups/patch) as mentioned in this link. I am able to authenticate, but not able to add a member while using the belo...

Latest Reply
pragarwal
New Contributor II
  • 0 kudos

Hi @Kaniz, I have tried the suggested body as well, but the member is still not added to the group. Is there any other method I can use to add a member to the group at the account level? Thanks, Phani.

1 More Replies
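For reference, the account-level Groups API is a SCIM 2.0 endpoint, so the PATCH body for adding a member follows the SCIM PatchOp shape. A sketch that builds such a body; the user id is a placeholder, and this mirrors the SCIM specification rather than a payload confirmed in the thread:

```python
def add_member_payload(user_id: str) -> dict:
    """Build a SCIM 2.0 PATCH body that adds one member to a group.

    Sketch: user_id is the numeric Databricks user id as a string.
    """
    return {
        "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
        "Operations": [
            {
                "op": "add",
                "path": "members",
                "value": [{"value": user_id}],
            }
        ],
    }
```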
smedegaard
by New Contributor III
  • 604 Views
  • 3 replies
  • 0 kudos

DLT run fails with "com.databricks.cdc.spark.DebeziumJDBCMicroBatchProvider not found"

I've created a streaming live table from a foreign catalog. When I run the DLT pipeline, it fails with "com.databricks.cdc.spark.DebeziumJDBCMicroBatchProvider not found". I haven't seen any documentation that suggests I need to install Debezium manuall...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @smedegaard, The error message you’re encountering, “com.databricks.cdc.spark.DebeziumJDBCMicroBatchProvider not found,” indicates that the specified class is not available in your classpath.   To address this issue, follow these steps: Verif...

2 More Replies
Chengzhu
by New Contributor
  • 124 Views
  • 1 reply
  • 0 kudos

Databricks Model Registry Notification

Hi community, currently I am training models on a Databricks cluster and using MLflow to log and register models. My goal is to be notified when a new version of a registered model is created (if the new run achieves some model performance baselin...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Chengzhu, It seems like you’re using MLflow’s Model Registry to manage the lifecycle of your machine learning models. Let’s explore this further. The MLflow Model Registry provides a centralized model store, APIs, and a UI to collaboratively m...

EWhitley
by New Contributor II
  • 254 Views
  • 1 reply
  • 0 kudos

Custom ENUM input as parameter for SQL UDF?

Hello - we're migrating a significant number of queries from T-SQL to Spark SQL. "datediff(unit, start, end)" is different between these two implementations (in a good way). For the purpose of migration, we'd like to stay as consiste...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @EWhitley, You’re on the right track with creating a custom UDF in Python for your migration. To achieve similar behaviour to the T-SQL DATEDIFF function with an enum-like unit parameter, you can follow these steps: Create a Custom UDF: Define...

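Because Spark SQL's datediff(end, start) only counts days while T-SQL's DATEDIFF(unit, start, end) counts boundary crossings for a given unit, one migration aid is a small helper that reproduces the T-SQL semantics for the units you need. A sketch; the function name and the three supported units are illustrative, not from the thread:

```python
from datetime import datetime

_UNITS = {"day", "month", "year"}

def datediff_tsql(unit: str, start: datetime, end: datetime) -> int:
    """Count unit boundaries crossed between start and end,
    mimicking T-SQL DATEDIFF for a handful of units."""
    if unit not in _UNITS:
        raise ValueError(f"unsupported unit: {unit}")
    if unit == "day":
        return (end.date() - start.date()).days
    if unit == "month":
        return (end.year - start.year) * 12 + (end.month - start.month)
    return end.year - start.year  # "year"
```

Registered via spark.udf.register, a helper like this could back a SQL UDF with the same call shape as the T-SQL original.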
YannLevavasseur
by New Contributor
  • 341 Views
  • 1 reply
  • 0 kudos

SQL function refactoring into Databricks environment

Hello all, I'm currently working on importing some SQL functions from an Informix database into Databricks, using an Asset Bundle to deploy Delta Live Tables to Unity Catalog. I'm struggling to import a recursive one; here is the code: CREATE FUNCTION "info...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @YannLevavasseur, It looks like you’re dealing with a recursive SQL function for calculating the weight of articles in a Databricks environment. Handling recursion in SQL can be tricky, especially when translating existing Informix code to Data...
