Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Brad
by Contributor
  • 183 Views
  • 2 replies
  • 0 kudos

Why is the delta log checkpoint created in different formats?

Hi, I'm using runtime 15.4 LTS or 14.3 LTS. When loading a delta lake table from Kinesis, I found the delta log checkpoint is in mixed formats like: 7616 00000000000003291896.checkpoint.b1c24725-....json 7616 00000000000003291906.checkpoint.873e1b3e-....

Latest Reply
Brad
Contributor

Thanks. We use a job to load data from Kinesis to the delta table. I added spark.databricks.delta.checkpoint.writeFormat parquet and spark.databricks.delta.checkpoint.writeStatsAsStruct true in the job cluster, but the checkpoints still show different formats...
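For reference, a minimal sketch of applying those two settings from a notebook on the job cluster (the config keys are the ones quoted above; using spark.conf.set rather than the cluster Spark config is an assumption):

    # Assumed approach: set the Delta checkpoint options quoted in the reply above.
    # They can also go in the job cluster's Spark config; behavior may still
    # differ across runtime versions.
    spark.conf.set("spark.databricks.delta.checkpoint.writeFormat", "parquet")
    spark.conf.set("spark.databricks.delta.checkpoint.writeStatsAsStruct", "true")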

1 More Replies
billykimber
by New Contributor
  • 97 Views
  • 1 replies
  • 0 kudos

Datamart creation

In a scenario where multiple teams access overlapping but not identical datasets from a shared data lake, is it better to create separate datamarts for each team (despite data redundancy) or to maintain a single datamart and use views for team-specif...

Latest Reply
-werners-
Esteemed Contributor III

IMO there is no single best scenario. It depends on the case, I would say. Both have pros and cons. If the difference between teams is really small, views could be a solution. But on the other hand, if you work on massive data, the views first have to b...
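As a rough illustration of the views option (the schema, table, and filter column below are hypothetical):

    # Sketch only: expose a team-specific slice of a shared table as a view
    # instead of copying the data into a separate datamart. Names are made up.
    spark.sql("""
        CREATE OR REPLACE VIEW analytics.sales_team_a AS
        SELECT *
        FROM lake.sales
        WHERE team = 'team_a'   -- team-specific filter
    """)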

pankajshaw
by New Contributor II
  • 166 Views
  • 2 replies
  • 3 kudos

Duplicates in CSV Export to ADLS

Hello everyone, I'm facing an issue when writing data in CSV format to Azure Data Lake Storage (ADLS). Before writing, there are no duplicates in the DataFrame, and all the records look correct. However, after writing the CSV files to ADLS, I notice d...

Latest Reply
bhanu_gautam
New Contributor II

@Kaniz_Fatma, great explanation.

1 More Replies
L1000
by New Contributor III
  • 186 Views
  • 4 replies
  • 2 kudos

DLT Serverless incremental refresh of materialized view

I have a materialized view that always does a "COMPLETE_RECOMPUTE", but I can't figure out why. I found how I can get the logs:

SELECT * FROM event_log(pipeline_id) WHERE event_type = 'planning_information' ORDER BY timestamp desc;

And for my table...

Latest Reply
L1000
New Contributor III

I split up the materialized view into 3 separate ones:

step1:

@dlt.table(name="step1", table_properties={"delta.enableRowTracking": "true"})
def step1():
    isolate_names = dlt.read("source").select("Name").groupBy("Name").count()
    return isolate_names

st...
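A self-contained version of that first step might look roughly like this (a sketch only: the import, source dataset name, and column are assumptions filled in around the snippet above, and the remaining steps are not shown):

    # Sketch of the pattern described above: a DLT table with row tracking
    # enabled, which downstream steps can build on. Names are placeholders.
    import dlt

    @dlt.table(name="step1", table_properties={"delta.enableRowTracking": "true"})
    def step1():
        # distinct names with their counts, read from an upstream DLT dataset
        return dlt.read("source").select("Name").groupBy("Name").count()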

3 More Replies
RobDineen
by New Contributor II
  • 200 Views
  • 2 replies
  • 2 kudos

Resolved! %SQL delete from temp table driving me mad

Hello there, I have a temp table where I want to remove null / empty values (see below). If there are no rows to delete, then shouldn't it just say zero rows affected?

Latest Reply
daniel_sahal
Esteemed Contributor

@RobDineen This should answer your question: https://community.databricks.com/t5/get-started-discussions/how-to-create-temporary-table-in-databricks/m-p/67774/highlight/true#M2956
Long story short, don't use it.
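If the "temp table" here is actually a temporary view, DELETE won't work on it because a view isn't a Delta table. A common workaround (a sketch with hypothetical names, not something stated in the linked thread) is to redefine the view without the unwanted rows:

    # Sketch only: instead of DELETE, recreate the temporary view filtering out
    # null/empty values. View and column names are placeholders.
    spark.sql("""
        CREATE OR REPLACE TEMP VIEW my_temp_clean AS
        SELECT *
        FROM my_temp
        WHERE some_col IS NOT NULL AND some_col <> ''
    """)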

1 More Replies
sandy311
by New Contributor III
  • 2160 Views
  • 7 replies
  • 3 kudos

Resolved! Databricks asset bundle does not create new job if I change configuration of existing Databricks yaml

When deploying multiple jobs using the `Databricks.yml` file via the asset bundle, the process either overwrites the same job or renames it, instead of creating separate, distinct jobs.

Latest Reply
Ncolin1999
New Contributor II

@filipniziol my requirement is just to deploy notebooks to the Databricks workspace. I don't want to create any job. Can I still use Databricks asset bundles?

6 More Replies
Tamizh035
by New Contributor II
  • 201 Views
  • 2 replies
  • 1 kudos

[INSUFFICIENT_PERMISSIONS] Insufficient privileges:

While reading a csv file using Spark and listing the files under a folder using Databricks utils, I am getting the below error: [INSUFFICIENT_PERMISSIONS] Insufficient privileges: User does not have permission SELECT on any file. SQLSTATE: 42501 File <comma...

Latest Reply
Panda
Valued Contributor

@Tamizh035, is your file in DBFS, an external location, or a local folder? Use dbutils.fs.ls to verify that the path exists and that you have access:

files = dbutils.fs.ls("dbfs:/path_to_your_file/")
display(files)
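If the path itself is readable, another thing worth checking (an assumption based on the wording of the error, not something confirmed in this thread) is whether the user has the ANY FILE privilege in the hive_metastore permission model; an admin would grant it roughly like this, with the principal as a placeholder:

    # Assumption: the "SELECT on any file" error can come from a missing ANY FILE
    # grant when reading files directly. Run as an admin; replace the principal.
    spark.sql("GRANT SELECT ON ANY FILE TO `user@example.com`")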

1 More Replies
Adrianj
by New Contributor III
  • 8342 Views
  • 12 replies
  • 8 kudos

Databricks Bundles - How to select which jobs resources to deploy per target?

Hello, My team and I are experimenting with bundles, we follow the pattern of having one main file Databricks.yml and each job definition specified in a separate yaml for modularization. We wonder if it is possible to select from the main Databricks....

Latest Reply
sergiopolimante
New Contributor II

"This include array can appear only as a top-level mapping." - you can't use include inside targets. You can use sync - exclude to exclude the yml files, but if they are in the include the workflows are going to be created anyway, even if the yml fil...

11 More Replies
Stephanos
by New Contributor
  • 448 Views
  • 1 replies
  • 0 kudos

Sequencing Job Deployments with Databricks Asset Bundles

Hello Databricks Community! I'm working on a project where I need to deploy jobs in a specific sequence using Databricks Asset Bundles. Some of my jobs (let's call them coordination jobs) depend on other jobs (base jobs) and need to look up their job ...

Latest Reply
MohcineRouessi
New Contributor II

Hey Steph, have you found anything here, please? I'm currently stuck here, trying to achieve the same thing.

amelia1
by New Contributor II
  • 1020 Views
  • 2 replies
  • 0 kudos

pyspark read data using jdbc url returns column names only

Hello, I have a remote azure sql warehouse serverless instance that I can access using databricks-sql-connector. I can read/write/update tables no problem. But I'm also trying to read/write/update tables using local pyspark + jdbc drivers. But when I ...

Latest Reply
infodeliberatel
New Contributor II

I added `UseNativeQuery=0` to the URL. It works for me.
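For anyone hitting the same thing, a rough sketch of a local PySpark JDBC read with UseNativeQuery=0 appended to the connection URL, as suggested above; the host, HTTP path, token, driver class, and table name are placeholders/assumptions rather than details from this thread:

    # Sketch: local Spark session reading a table from a Databricks SQL warehouse
    # over JDBC. All connection details below are placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("jdbc-read").getOrCreate()

    jdbc_url = (
        "jdbc:databricks://<workspace-host>:443/default;"
        "transportMode=http;ssl=1;httpPath=<http-path>;"
        "AuthMech=3;UID=token;PWD=<personal-access-token>;"
        "UseNativeQuery=0"  # the option mentioned in the reply above
    )

    df = (
        spark.read.format("jdbc")
        .option("url", jdbc_url)
        .option("driver", "com.databricks.client.jdbc.Driver")
        .option("dbtable", "my_schema.my_table")  # placeholder table
        .load()
    )
    df.show()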

1 More Replies
gilt
by New Contributor
  • 104 Views
  • 1 replies
  • 0 kudos

Auto Loader ignores data with modifiedBefore

Hello, I am trying to ingest CSV data with Auto Loader from an Azure Data Lake. I want to perform batch ingestion by using a scheduled job and the following trigger:  .trigger(availableNow=True) The CSV files are generated by Azure Synapse Link. If m...

Latest Reply
Brahmareddy
Valued Contributor III

Hi @gilt, how are you doing today? As per my understanding, consider adjusting the Auto Loader configuration, since the modifiedBefore option seems to mark the file as processed during the first trigger, even if it's incomplete. This behavior might be e...
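For context, a rough sketch of the setup being discussed (paths, schema and checkpoint locations, the target table, and the timestamp value are all placeholders; modifiedBefore and the availableNow trigger come from the original post):

    # Sketch: Auto Loader reading CSVs in a scheduled batch-style run with
    # availableNow, restricted by modifiedBefore. All locations are placeholders.
    df = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("cloudFiles.schemaLocation", "abfss://<container>@<account>.dfs.core.windows.net/_schemas/my_feed")
        .option("modifiedBefore", "2025-01-01 00:00:00")  # placeholder cutoff timestamp
        .load("abfss://<container>@<account>.dfs.core.windows.net/raw/my_feed/")
    )

    (
        df.writeStream
        .option("checkpointLocation", "abfss://<container>@<account>.dfs.core.windows.net/_checkpoints/my_feed")
        .trigger(availableNow=True)
        .toTable("bronze.my_feed")  # placeholder target table
    )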

Phuonganh
by New Contributor II
  • 1249 Views
  • 2 replies
  • 4 kudos

Databricks SDK for Python: Errors with parameters for Statement Execution

Hi team, I'm using the Databricks SDK for Python to run SQL queries. I created a variable as below: param = [{'name': 'a', 'value': 'x'}, {'name': 'b', 'value': 'y'}] and passed it to the statement as below: _ = w.statement_execution.execute_statement( warehous...

Latest Reply
jessica3
New Contributor II

Has anyone found the solution to this? I am running into the same error
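In case it helps anyone landing here, a sketch of how named parameters are typically passed with the Databricks SDK for Python, using StatementParameterListItem objects instead of plain dicts (the warehouse ID and query are placeholders, and this is a guess at the cause of the error, not a confirmed fix for this thread):

    # Sketch: execute_statement with typed parameter objects rather than dicts.
    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service.sql import StatementParameterListItem

    w = WorkspaceClient()
    resp = w.statement_execution.execute_statement(
        warehouse_id="<warehouse-id>",          # placeholder
        statement="SELECT :a AS a, :b AS b",    # named markers match parameter names
        parameters=[
            StatementParameterListItem(name="a", value="x"),
            StatementParameterListItem(name="b", value="y"),
        ],
    )
    print(resp.status)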

1 More Replies
Rik
by New Contributor III
  • 5029 Views
  • 11 replies
  • 9 kudos

Resolved! File information is not passed to trigger job on file arrival

We are using the UC mechanism for triggering jobs on file arrival, as described here: https://learn.microsoft.com/en-us/azure/databricks/workflows/jobs/file-arrival-triggers.Unfortunately, the trigger doesn't actually pass the file-path that is gener...

Labels: Data Engineering, file arrival, trigger file, Unity Catalog
Latest Reply
artemich
New Contributor II

Same here! Additionally, it would be great to enhance it to support not just the path to a directory, but also a prefix for the file name (or a regex for bonus points). Right now, if you have 10 types of files arriving in the same folder, it would be much c...

10 More Replies
Abel_Martinez
by Contributor
  • 12510 Views
  • 9 replies
  • 10 kudos

Resolved! Why I'm getting connection timeout when connecting to MongoDB using MongoDB Connector for Spark 10.x from Databricks

I'm able to connect to MongoDB using org.mongodb.spark:mongo-spark-connector_2.12:3.0.2 and this code: df = spark.read.format("com.mongodb.spark.sql.DefaultSource").option("uri", jdbcUrl). It works well, but if I install the latest MongoDB Spark Connector ve...

Latest Reply
ravisharma1024
New Contributor II

I was facing the same issue; now it is resolved, thanks to @Abel_Martinez. I am using code like the below:

df = spark.read.format("mongodb") \
    .option('spark.mongodb.read.connection.uri', "mongodb+srv://*****:*****@******/?retryWrites=true&w=majori...
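A slightly fuller sketch of the 10.x connector usage shown above (the URI, database, and collection are placeholders):

    # Sketch: MongoDB Spark Connector 10.x uses the short "mongodb" format name
    # and the spark.mongodb.read.* options. All connection details are placeholders.
    df = (
        spark.read.format("mongodb")
        .option("spark.mongodb.read.connection.uri",
                "mongodb+srv://<user>:<password>@<cluster-host>/?retryWrites=true&w=majority")
        .option("database", "my_db")            # placeholder
        .option("collection", "my_collection")  # placeholder
        .load()
    )
    df.printSchema()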

8 More Replies
RobertWalsh
by New Contributor II
  • 8157 Views
  • 7 replies
  • 2 kudos

Resolved! Hive Table Creation - Parquet does not support Timestamp Datatype?

Good afternoon, attempting to run this statement:

%sql
CREATE EXTERNAL TABLE IF NOT EXISTS dev_user_login (
  event_name STRING,
  datetime TIMESTAMP,
  ip_address STRING,
  acting_user_id STRING
)
PARTITIONED BY (date DATE)
STORED AS PARQUET
...

Latest Reply
source2sea
Contributor

1. Changing to the Spark native catalog approach (not the Hive metastore) works. Syntax is essentially:

CREATE TABLE IF NOT EXISTS dbName.tableName (
  <column names and types>
)
USING parquet
PARTITIONED BY (runAt STRING)
LOCA...
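Filling that pattern in with the columns from the original post, a sketch of the USING PARQUET form might look like this (the database name and LOCATION are placeholders, since the reply above is truncated):

    # Sketch: Spark-native CREATE TABLE ... USING PARQUET, as suggested in the
    # reply above, keeping the TIMESTAMP column from the original post.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS dev.dev_user_login (
            event_name STRING,
            datetime TIMESTAMP,
            ip_address STRING,
            acting_user_id STRING,
            date DATE
        )
        USING PARQUET
        PARTITIONED BY (date)
        LOCATION 'abfss://<container>@<account>.dfs.core.windows.net/dev_user_login'
    """)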

6 More Replies
