I have tried multiple ways to set the row group size for Delta tables in a Databricks notebook, and it's not working, whereas I am able to set it properly using Spark. I tried: 1. val blockSize = 1024 * 1024 * 60; spark.sparkContext.hadoopConfiguration.setInt("dfs.bloc...
Hi @dlaxminaresh, Setting row groups for Delta tables in Databricks can be a bit tricky, but let’s explore some options to achieve this.
First, let’s address the approaches you’ve tried:
Setting Block Sizes:
You’ve attempted to set the block size...
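In case it's useful, here is a minimal sketch of the standard way to control the Parquet row-group size from a notebook, using the parquet.block.size Hadoop setting; note that Databricks' optimized Delta writers may override it, so verify the resulting files. The 60 MB value is taken from the question; the output path is a placeholder:

```python
# Sketch: set the Parquet row-group ("block") size before writing a Delta table.
# parquet.block.size is the standard Parquet/Hadoop knob.
block_size = 1024 * 1024 * 60  # 60 MB

spark.sparkContext._jsc.hadoopConfiguration().setInt("parquet.block.size", block_size)

(spark.range(10_000_000)
      .write
      .format("delta")
      .mode("overwrite")
      .save("/tmp/delta/row_group_test"))
```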
I am trying to set up CI/CD with Azure DevOps and 3 workspaces (dev, test, prod) using asset bundles. All 3 workspaces will have their own catalog in Unity Catalog. I can't find a way to change which catalog should be used by the jobs and DLT pipelines ...
Hi @JonathanFlint, Setting up CI/CD with Azure DevOps across multiple workspaces, each with its own catalog in Unity Catalog, is achievable.
Here are some approaches you can consider:
Catalog Switching at Runtime:
At the beginning of your program, issue ...
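For example, a minimal sketch of runtime catalog switching, assuming the catalog name arrives as a job parameter (the widget name "catalog" is hypothetical):

```python
# Sketch: select the Unity Catalog catalog at runtime from a job parameter.
# Each target (dev/test/prod) would pass its own catalog name.
dbutils.widgets.text("catalog", "dev_catalog")   # hypothetical parameter name
catalog = dbutils.widgets.get("catalog")

spark.sql(f"USE CATALOG {catalog}")  # unqualified table names now resolve here
print(spark.sql("SELECT current_catalog()").first()[0])
```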
I understand only a limited set of Spark configurations is supported in SQL Warehouse, but is it possible to add Spark extensions to SQL Warehouse clusters? Use case: we have a few restricted table properties, and we prevent their use with Spark extensions installed in...
Hi @naveenanto, You're right that Databricks SQL Warehouses support only a limited set of Spark configurations, which constrains how far their behavior can be customized with Spark extensions.
Let me provide you with some inform...
Hi there everyone, We are trying to get hands-on with Databricks Lakehouse for a prospective client's project. Our major aim is to compare the Databricks Data Lakehouse and the BigQuery data warehouse in terms of cost and time to set up and run que...
Hi @ashraf1395, Comparing Databricks Lakehouse and Google BigQuery is essential to make an informed decision for your project.
Let’s address your questions:
Cluster Configurations for Databricks:
Databricks provides flexibility in configuring com...
Hello, I am trying to download lists from SharePoint into a pandas DataFrame. However, I cannot get any information successfully. I have attempted many solutions mentioned on Stack Overflow. Below is one of those attempts: # https://pypi.org/project/sha...
The error "<urlopen error [Errno -2] Name or service not known>" suggests that there's an issue with the server URL or network connectivity. Double-check the server URL to ensure it's correct and accessible. Also, verify that your network connection ...
Hello, we're working with a serverless SQL cluster to query Delta tables and display some analytics in dashboards. We have some basic GROUP BY queries that generate around 36k rows, and they are executed without the "limit" keyword. So in the data ...
Hey @RabahO
This is likely a memory issue.
The current behavior is that Databricks will only attempt to display the first 64000 rows of data. If the first 64000 rows of data are larger than 2187 MB, then it will fail to display anything. In your cas...
Hi All, I want to add a member to a group in Databricks at the account level using the REST API (https://docs.databricks.com/api/azure/account/accountgroups/patch) as mentioned in this link. I was able to authenticate but am not able to add a member while using the belo...
Hi @Kaniz, I have tried the suggested body as well, but the member is still not added to the group. Is there any other method I can use to add a member to a group at the account level? Thanks, Phani.
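For reference, here is a sketch of the SCIM PatchOp body that the account-level Groups PATCH endpoint expects; account ID, group ID, user ID, and token are placeholders:

```python
# Sketch: add a member to an account-level group via the SCIM Groups PATCH
# endpoint (Azure accounts host). IDs and token below are placeholders.
import requests

account_id = "<account-id>"
group_id = "<group-id>"
user_id = "<user-id>"    # the numeric SCIM id of the user, not the email
token = "<token>"

url = (f"https://accounts.azuredatabricks.net/api/2.0/accounts/"
       f"{account_id}/scim/v2/Groups/{group_id}")
body = {
    "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
    "Operations": [
        {"op": "add", "path": "members", "value": [{"value": user_id}]}
    ],
}
resp = requests.patch(url, json=body,
                      headers={"Authorization": f"Bearer {token}"})
resp.raise_for_status()
```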
I've created a streaming live table from a foreign catalog. When I run the DLT pipeline it fails with "com.databricks.cdc.spark.DebeziumJDBCMicroBatchProvider not found". I haven't seen any documentation that suggests I need to install Debezium manuall...
Hi @smedegaard, The error message you’re encountering, “com.databricks.cdc.spark.DebeziumJDBCMicroBatchProvider not found,” indicates that the specified class is not available in your classpath.
To address this issue, follow these steps:
Verif...
Hi community, Currently I am training models on a Databricks cluster and using MLflow to log and register models. My goal is to get a notification when a new version of a registered model is created (if the new run achieves some model performance baselin...
Hi @Chengzhu, It seems like you’re using MLflow’s Model Registry to manage the lifecycle of your machine learning models.
Let’s explore this further.
The MLflow Model Registry provides a centralized model store, APIs, and a UI to collaboratively m...
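One way to wire up the notification itself is a Model Registry webhook on the MODEL_VERSION_CREATED event; below is a sketch assuming the databricks-registry-webhooks package is available in your workspace, with the endpoint URL, secret, and model name as placeholders:

```python
# Sketch: fire an HTTP webhook whenever a new version of the model is
# registered (pip install databricks-registry-webhooks).
from databricks_registry_webhooks import HttpUrlSpec, RegistryWebhooksClient

webhook = RegistryWebhooksClient().create_webhook(
    model_name="my_model",                        # hypothetical model name
    events=["MODEL_VERSION_CREATED"],
    http_url_spec=HttpUrlSpec(
        url="https://hooks.example.com/notify",   # hypothetical endpoint
        secret="my-secret",                       # used to sign the payload
    ),
    description="Notify on new registered model versions",
    status="ACTIVE",
)
print(webhook.id)
```

The performance-baseline check could then live behind that endpoint, or run in the training job itself before it registers the new version.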
Hello - We're migrating from T-SQL to Spark SQL, including a significant number of queries. "datediff(unit, start, end)" differs between these two implementations (in a good way). For the purpose of migration, we'd like to stay as consiste...
Hi @EWhitley, You’re on the right track with creating a custom UDF in Python for your migration.
To achieve similar behaviour to the T-SQL DATEDIFF function with an enum-like unit parameter, you can follow these steps:
Create a Custom UDF:
Define...
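As a starting point, here is a sketch of such a UDF; note that it returns elapsed whole units for a few sample units, whereas T-SQL's DATEDIFF counts datepart boundary crossings, so the per-unit logic would need adjusting to match T-SQL exactly:

```python
# Sketch: a T-SQL-style datediff(unit, start, end) as a Python UDF.
from datetime import datetime

from pyspark.sql.functions import udf
from pyspark.sql.types import LongType

def tsql_datediff(unit: str, start: datetime, end: datetime) -> int:
    factors = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}
    if unit not in factors:
        raise ValueError(f"Unsupported unit: {unit}")
    return int((end - start).total_seconds() // factors[unit])

# Register for use from both DataFrame code and Spark SQL, e.g.
#   SELECT tsql_datediff('hour', start_ts, end_ts) FROM events
datediff_udf = udf(tsql_datediff, LongType())
spark.udf.register("tsql_datediff", tsql_datediff, LongType())
```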
Hello all, I'm currently working on importing some SQL functions from an Informix database into Databricks, using an Asset Bundle to deploy a Delta Live Table to Unity Catalog. I'm struggling to import a recursive one; here is the code: CREATE FUNCTION "info...
Hi @YannLevavasseur, It looks like you’re dealing with a recursive SQL function for calculating the weight of articles in a Databricks environment. Handling recursion in SQL can be tricky, especially when translating existing Informix code to Data...
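Since Spark SQL has not traditionally supported recursive CTEs, one common workaround is to unroll the recursion into an iterative DataFrame loop; below is a sketch with hypothetical table and column names (bom, parent_id, child_id, qty) and a bounded depth:

```python
# Sketch: iterative "parts explosion" replacing a recursive SQL function.
from pyspark.sql import functions as F

bom = spark.table("catalog.schema.bom")  # hypothetical bill-of-materials table

# Level 1: every direct parent -> child edge.
frontier = bom.select(
    F.col("parent_id").alias("root_id"),
    F.col("child_id"),
    F.col("qty").alias("total_qty"),
)
result = frontier

for _ in range(20):  # assumed max depth; also guards against cyclic data
    frontier = (
        frontier.alias("f")
        .join(bom.alias("b"), F.col("f.child_id") == F.col("b.parent_id"))
        .select(
            F.col("f.root_id"),
            F.col("b.child_id"),
            (F.col("f.total_qty") * F.col("b.qty")).alias("total_qty"),
        )
    )
    if frontier.isEmpty():  # Spark 3.3+; use limit(1).count() == 0 on older runtimes
        break
    result = result.unionByName(frontier)
```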
I am receiving protobuf data in a JSON attribute, along with a descriptor file. I am using from_protobuf to deserialize the data as below. It works most of the time, but gives an error when there are recursive fields within the protob...
Hi @Sambit_S, Handling recursive fields in Protobuf can indeed be tricky, especially when deserializing data.
Let’s explore some potential solutions to address this issue:
Casting Issue with Recursive Fields: The error you’re encountering might b...
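If the failure comes from the recursion guard in from_protobuf, one option is the recursive.fields.max.depth option (available in recent Spark/Databricks runtimes), which bounds how deeply recursive fields are unrolled; the message name and descriptor path below are placeholders:

```python
# Sketch: bound recursion when deserializing protobuf with from_protobuf.
from pyspark.sql.protobuf.functions import from_protobuf

decoded = df.select(                                  # df: input DataFrame
    from_protobuf(
        df.payload,                                   # binary protobuf column
        "MyMessage",                                  # hypothetical message name
        descFilePath="/dbfs/path/schema.desc",        # hypothetical descriptor
        options={"recursive.fields.max.depth": "3"},  # unroll up to 3 levels
    ).alias("event")
)
```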
Hi, I'm implementing Databricks Asset Bundles; my scripts are in GitHub and my /resource folder has all the .yml files of my Databricks workflows, which point to the main branch:

git_source:
  git_url: https://github.com/xxxx
  git_provider: ...
Hi @Skr7, Let’s break down your requirements:
Dynamically Changing Git Branch for Databricks Asset Bundles (DABs): When deploying and running your DAB, you want the Databricks workflows to point to your feature branch instead of the main branch....
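One pattern is to make the branch a bundle variable and override it per deployment; here is a sketch of the databricks.yml pieces, assuming standard bundle variable syntax (deploy with databricks bundle deploy --var="git_branch=my-feature"); the job name is hypothetical:

```yaml
# Sketch: parameterize the workflow branch with a bundle variable.
variables:
  git_branch:
    description: Branch the workflow notebooks are fetched from
    default: main

resources:
  jobs:
    my_job:
      git_source:
        git_url: https://github.com/xxxx
        git_provider: gitHub
        git_branch: ${var.git_branch}
```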
Why can I use boto3 to retrieve a secret from Secrets Manager on a personal cluster, but get the following error on a shared cluster? NoCredentialsError: Unable to locate credentials
Hi @dbdude and @drii_cavalcanti, The NoCredentialsError you’re encountering when using Boto3 to retrieve a secret from AWS Secrets Manager typically indicates that the AWS SDK is unable to find valid credentials for your API request.
Let’s explor...
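A common workaround on shared access mode, where instance-profile credentials are not exposed to user code, is to pass credentials to boto3 explicitly from Databricks secrets; the scope and key names below are placeholders:

```python
# Sketch: give boto3 explicit credentials from Databricks secrets instead of
# relying on the instance profile. Scope/key/region are placeholders.
import boto3

session = boto3.Session(
    aws_access_key_id=dbutils.secrets.get("aws", "access_key_id"),
    aws_secret_access_key=dbutils.secrets.get("aws", "secret_access_key"),
    region_name="us-east-1",
)
secret = session.client("secretsmanager").get_secret_value(
    SecretId="my/secret/name"
)["SecretString"]
```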
Hi, I have a Databricks job that produces a dashboard after each run. I'm able to download the dashboard as HTML from the view job runs page, but I want to automate the process, so I tried using the Databricks API, but it says {"error_code":"INVALID_...
Hi @Skr7, You cannot automate exporting the dashboard as HTML using the Databricks API. The Databricks API only supports exporting results for notebook task runs, not for job run dashboards.
Here's the relevant excerpt from the provided sources:
Exp...
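For completeness, here is a sketch of what the API does support: exporting a notebook task run (including its dashboard views) via the Jobs runs/export endpoint; host, token, and run ID are placeholders:

```python
# Sketch: export the rendered views of a notebook task run as HTML.
# views_to_export can be CODE, DASHBOARDS, or ALL.
import requests

host = "https://<workspace-host>"
token = "<pat-token>"
run_id = 123456  # the notebook task run id, not the parent job run id

resp = requests.get(
    f"{host}/api/2.0/jobs/runs/export",
    headers={"Authorization": f"Bearer {token}"},
    params={"run_id": run_id, "views_to_export": "DASHBOARDS"},
)
resp.raise_for_status()
for view in resp.json().get("views", []):
    with open(f"{view['name']}.html", "w") as f:
        f.write(view["content"])
```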