Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Lazloo
by New Contributor III
  • 1016 Views
  • 2 replies
  • 0 kudos

Using Spark JARs with databricks-connect >= 13.0

With the newest version of databricks-connect, I cannot configure the extra JARs I want to use. In the older version, I did that via spark = SparkSession.builder.appName('DataFrame').\ config('spark.jars.packages','org.apache.spark:spark-avro_...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Lazloo, In the newer versions of Databricks Connect, configuring additional JARs for your Spark session is still possible. Let’s adapt your previous approach to the latest version. Adding JARs to a Databricks cluster: If you want to add JAR f...
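The full reply is truncated above. As a rough, hedged sketch of what this looks like with databricks-connect >= 13: the session is created through DatabricksSession, and the Avro package is typically attached to the target cluster as a library (Maven coordinate) rather than passed through spark.jars.packages on the builder. The path and catalog names below are made up for illustration.

```python
# Minimal sketch, assuming databricks-connect >= 13 and that
# org.apache.spark:spark-avro_2.12:<spark version> has already been installed
# on the target cluster as a cluster library.
from databricks.connect import DatabricksSession

# Host, token, and cluster id are read from your Databricks config profile / env vars.
spark = DatabricksSession.builder.getOrCreate()

# Hypothetical path; works once the spark-avro library is on the cluster.
df = spark.read.format("avro").load("/Volumes/main/default/raw/events.avro")
df.show()
```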

mannepk85
by New Contributor II
  • 139 Views
  • 2 replies
  • 0 kudos

Get run details of a Databricks job (similar to the data from '/api/2.0/jobs/runs') without using the API

I have a notebook which is attached to a task at the end of a job. This task pulls the status of all other tasks in the job and checks whether they succeeded or failed. Depending on the result, this last task will send a Slack notification (custom...

Latest Reply
Slash
New Contributor II
  • 0 kudos

Hi @mannepk85, you can take a look at the jobs system tables. Note, though, that they are currently in public preview, so use them with caution: https://learn.microsoft.com/en-us/azure/databricks/admin/system-tables/jobs
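As a hedged illustration of what querying those tables can look like (table and column names follow the current public-preview docs and may change; verify them in your workspace first):

```python
# Sketch: recent job run outcomes from the jobs system tables (public preview).
runs = spark.sql("""
    SELECT job_id,
           run_id,
           result_state,
           period_start_time,
           period_end_time
    FROM system.lakeflow.job_run_timeline
    WHERE period_start_time >= current_timestamp() - INTERVAL 7 DAYS
      AND result_state IS NOT NULL
    ORDER BY period_start_time DESC
""")
runs.show(truncate=False)
```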

johnp
by New Contributor III
  • 84 Views
  • 1 reply
  • 0 kudos

Get the external public IP of the Job Compute cluster

We just moved our workflow from an "all purpose compute cluster" to a "job compute cluster". We need to find out the external public IP of the job compute cluster. On the all-purpose compute cluster, we got the IP by attaching a notebook and running the comm...

Latest Reply
johnp
New Contributor III
  • 0 kudos

I found the following IPs in the cluster JSON file:
"driver": {"private_ip": "10.*.*.*", "public_dns": "172.*.*.*", "node_id": "80*****",
The executors configuration is similar:
"executors": [{"private_ip": "10.*.*.*", "public_dns": "172.*.*.*", "node_id": "7...
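The exact command used in the thread is truncated above; one common way to see the egress (public) address a cluster presents to the outside world is to call an external lookup service from a notebook or job task on that cluster. The endpoint below is an arbitrary example, not something prescribed in the thread:

```python
# Sketch: print the public/egress IP as seen from outside the cluster.
import requests

public_ip = requests.get("https://api.ipify.org", timeout=10).text  # any "what is my IP" service works
print(f"Cluster egress IP: {public_ip}")
```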

seefoods
by New Contributor III
  • 96 Views
  • 1 reply
  • 1 kudos

Resolved! use dbutils outside a notebook

Hello everyone, I want to use the dbutils functions outside my notebook, so I will use them in my external JAR. I have added the dbutils library in my build.sbt file: "com.databricks" %% "dbutils-api" % "0.0.6". I have imported the library at the top of my code: import c...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @seefoods, In order to use the dbutils functions, you'll need to initialize the dbutils instance. You can do this by adding the following code at the beginning of your JAR's main function: val dbutils = com.databricks.dbutils_v1.DBUtilsHolder...

Avinash_Narala
by Contributor
  • 102 Views
  • 1 reply
  • 1 kudos

Resolved! Serverless Cluster Issue

Hi, while using a serverless cluster I'm not able to access DBFS files; it says I don't have permission to the file. But while accessing them using an all-purpose cluster I'm able to access them. Why am I facing this issue?

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Avinash_Narala, When you use a Serverless cluster, it’s associated with a Databricks-managed IAM role that accesses AWS resources. However, this role might lack the necessary permissions to access DBFS resources in your account. On the other hand...

Poovarasan
by New Contributor III
  • 445 Views
  • 7 replies
  • 1 kudos

Error while installing the ODBC driver on a shared cluster

I previously used the following script to install and configure the ODBC driver on our shared cluster in Databricks, and it was functioning correctly. However, I am currently experiencing issues where the installation is not working as expected. Plea...

Latest Reply
imsabarinath
New Contributor II
  • 1 kudos

The approach below is working for me... I had to download the packages upfront and place them on a volume, though.
#!/bin/bash
set -euxo pipefail
echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections
sudo ACCEPT_EULA=Y dpkg -i odbci...

Skr7
by New Contributor II
  • 737 Views
  • 2 replies
  • 0 kudos

Databricks Asset Bundles

Hi, I'm implementing Databricks Asset Bundles. My scripts are in GitHub, and my /resource folder has all the .yml files of my Databricks workflows, which point to the main branch: git_source: git_url: https://github.com/xxxx git_provider: ...

Data Engineering
Databricks
Latest Reply
JacekLaskowski
New Contributor III
  • 0 kudos

Why not use Substitutions and Custom variables that can be specified on the command line using --var="<key>=<value>"? With all the features, your databricks.yml would look as follows:
variables:
  git_branch:
    default: main
git_source:
  git_url: https://git...

PB-Data
by New Contributor II
  • 458 Views
  • 2 replies
  • 1 kudos

right semi join

Hi all, I am having an issue running a simple right semi join in my Databricks Community Edition: select * from Y right semi join X on Y.y = X.a; Error: [PARSE_SYNTAX_ERROR] Syntax error at or near 'semi': extra input 'semi'. Not sure what is the issue wi...
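The accepted answer is collapsed in the thread. As a hedged note, Spark SQL's grammar only pairs SEMI (and ANTI) with LEFT, which is why the parser stops at 'semi'; an equivalent rewrite swaps the relations and uses a left semi join:

```python
# Sketch of the workaround: "Y RIGHT SEMI JOIN X" (keep rows of X that match Y)
# can be written as "X LEFT SEMI JOIN Y". Table/column names follow the post.
result = spark.sql("""
    SELECT *
    FROM X LEFT SEMI JOIN Y
      ON Y.y = X.a
""")
result.show()
```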

Latest Reply
PB-Data
New Contributor II
  • 1 kudos

Thanks @Slash 

NCat
by New Contributor III
  • 4938 Views
  • 6 replies
  • 0 kudos

ipywidgets: Uncaught ReferenceError: require is not defined

Hi, when I try to use ipywidgets, it returns the following error. I’m using Databricks with PrivateLink enabled on AWS, and the runtime version is 12.2 LTS. Is there something I need in order to use ipywidgets in my environment?

Latest Reply
jvjvjvjvjv
New Contributor II
  • 0 kudos

I am currently experiencing the same error on Azure Databricks, runtime version 15.3 ML, default notebook editor.

Avinash_Narala
by Contributor
  • 139 Views
  • 1 reply
  • 0 kudos

Resolved! Liquid clustering vs partitioning

Hi, is liquid clustering a replacement for partitioning? Should we still use partitioning when we use liquid clustering? Can we use liquid clustering for all cases and ignore partitioning?

Latest Reply
Slash
New Contributor II
  • 0 kudos

Hi @Avinash_Narala, yeah, you can think of it as a partitioning replacement. According to the documentation (https://learn.microsoft.com/en-us/azure/databricks/delta/clustering): Delta Lake liquid clustering replaces table partitioning and ZORDER to simpli...
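For reference, a minimal sketch of switching from partitioning to liquid clustering (hypothetical catalog, table, and column names; requires a Databricks Runtime recent enough to support CLUSTER BY):

```python
# Create a Delta table that uses liquid clustering instead of PARTITIONED BY / ZORDER.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.default.sales_clustered (
        order_id   BIGINT,
        order_date DATE,
        region     STRING,
        amount     DECIMAL(18, 2)
    )
    CLUSTER BY (order_date, region)
""")

# Clustering keys can be changed later without rewriting the table up front:
spark.sql("ALTER TABLE main.default.sales_clustered CLUSTER BY (region)")
```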

PushkarDeole
by New Contributor II
  • 132 Views
  • 2 replies
  • 0 kudos

State store configuration with applyInPandasWithState for optimal performance

Hello, we are using a stateful pipeline for data processing and analytics. For stateful processing we are using the applyInPandasWithState function; however, the state needs to be persistent across node restarts etc. At this point, we are not sure how the state ca...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @PushkarDeole, To leverage RocksDB as the state store with `applyInPandasWithState` in Databricks, configure your Spark session with the following setting: spark.conf.set("spark.sql.streaming.stateStore.providerClass", "com.databricks.sql.streamin...
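The provider class name is cut off above. Here is a minimal end-to-end sketch, assuming the RocksDB provider class name documented for Databricks runtimes (verify it against your DBR version) and a hypothetical checkpoint path; it is the streaming checkpoint, not the executor-local RocksDB files, that lets the state survive node restarts:

```python
import pandas as pd
from typing import Iterator, Tuple
from pyspark.sql.streaming.state import GroupState

def count_events(key: Tuple[str], pdfs: Iterator[pd.DataFrame], state: GroupState) -> Iterator[pd.DataFrame]:
    # Running count per key, kept in the state store between micro-batches.
    total = state.get[0] if state.exists else 0
    for pdf in pdfs:
        total += len(pdf)
    state.update((total,))
    yield pd.DataFrame({"device_id": [key[0]], "event_count": [total]})

# RocksDB-backed state store (class name per Databricks docs; verify for your runtime).
spark.conf.set(
    "spark.sql.streaming.stateStore.providerClass",
    "com.databricks.sql.streaming.state.RocksDBStateStoreProvider",
)

events = (spark.readStream.format("rate").load()   # toy source for illustration
          .selectExpr("CAST(value % 10 AS STRING) AS device_id", "timestamp"))

query = (events.groupBy("device_id")
         .applyInPandasWithState(count_events,
                                 outputStructType="device_id STRING, event_count LONG",
                                 stateStructType="event_count LONG",
                                 outputMode="update",
                                 timeoutConf="NoTimeout")
         .writeStream
         .outputMode("update")
         .format("console")
         # Durable checkpoint (hypothetical path): this is what makes state restart-safe.
         .option("checkpointLocation", "/Volumes/main/default/checkpoints/device_counts")
         .start())
```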

Monsem
by New Contributor III
  • 5791 Views
  • 10 replies
  • 3 kudos

Resolved! No Course Materials Widget below Lesson

Hello everyone, in my Databricks Partner Academy account there is no course material where it should be, under the lesson video. How can I resolve this problem? Does anyone else face the same problem? I had submitted a ticket to ask the Databricks team bu...

Latest Reply
Medhat_Elassi
New Contributor II
  • 3 kudos

I have the same problem, can't find the course materials, only the slides in the last section.

youcanlearn
by New Contributor III
  • 197 Views
  • 2 replies
  • 2 kudos

Saving failed records with failed expectation name(s)

Hi all, I am using Databricks expectations to manage my data quality, but I want to save the failed records alongside the expectation name(s), one or many, that each record failed. The only way I can figure out is not to use Databricks expectati...

Latest Reply
iakshaykr
New Contributor II
  • 2 kudos

@youcanlearn Have you explored this: https://docs.databricks.com/en/delta-live-tables/expectations.html
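One pattern those docs describe is quarantining invalid records. A hedged sketch of extending it so each quarantined row also carries the names of the rules it failed (source table, rule names, and columns below are made up for illustration):

```python
import dlt
from pyspark.sql import functions as F

# Hypothetical rules; the dict keys double as the "failed expectation" labels.
rules = {
    "valid_id": "id IS NOT NULL",
    "valid_amount": "amount >= 0",
}

@dlt.table
def orders_tagged():
    df = spark.readStream.table("bronze.orders_raw")  # hypothetical source
    failed = F.array_compact(F.array(*[
        F.when(~F.expr(cond), F.lit(name)) for name, cond in rules.items()
    ]))
    return df.withColumn("failed_expectations", failed)

@dlt.table
@dlt.expect_all(rules)  # still surfaces the rules in the pipeline event log / UI
def orders_clean():
    return dlt.read_stream("orders_tagged").where(F.size("failed_expectations") == 0)

@dlt.table
def orders_quarantine():
    # Failed records, each annotated with the expectation name(s) it violated.
    return dlt.read_stream("orders_tagged").where(F.size("failed_expectations") > 0)
```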

erwingm10
by New Contributor
  • 100 Views
  • 1 reply
  • 0 kudos

Get Cluster-Level Metrics

I'm looking for a way to optimize the consumption of the jobs in my company, and the last piece of data I need for this is the cluster-level metric called Active Tasks over time. Do we have any way to get this? Seems easy when it's alr...

Latest Reply
Slash
New Contributor II
  • 0 kudos

Hi @erwingm10, unfortunately there is currently no direct endpoint in the REST API to get cluster metrics. You can extract some Ganglia metrics through custom scripting, but they're not as detailed as the one you are looking for. Look at the links below ...
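As a rough, hedged alternative for the "Active Tasks over time" signal specifically, Spark's own status tracker can be polled from a notebook or lightweight job running on that cluster (sampling interval and duration below are arbitrary):

```python
import time

# Sample the number of currently active tasks across all active stages on this cluster.
tracker = spark.sparkContext.statusTracker()
for _ in range(12):                                   # ~1 minute at a 5-second interval
    infos = [tracker.getStageInfo(s) for s in tracker.getActiveStageIds()]
    active_tasks = sum(i.numActiveTasks for i in infos if i is not None)
    print(f"{time.strftime('%H:%M:%S')} active tasks: {active_tasks}")
    time.sleep(5)
```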

Avinash_Narala
by Contributor
  • 97 Views
  • 1 reply
  • 0 kudos

shared serverless vs dedicated serverless?

Hi all, I went through https://docs.databricks.com/en/admin/system-tables/serverless-billing.html and am wondering: How is serverless compute shared across workloads? Is there an option to set that up? What is the difference between shared serverless and dedicated serve...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Avinash_Narala, Serverless Compute Overview: Serverless compute allows you to run jobs and notebooks without managing infrastructure. It’s designed for simplicity and efficiency. With serverless compute, you focus on implementing your data pr...
