Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

jv_v
by New Contributor III
  • 830 Views
  • 2 replies
  • 2 kudos

Resolved! Azure SCIM Usage and Alternatives for Databricks

Hello Databricks Community, I'm exploring the use of Azure SCIM for our Databricks environment and have a few questions: How is Azure SCIM useful for Databricks? What are the specific benefits or advantages of using SCIM for user and group provisioning...

Latest Reply
Rishabh_Tiwari
Community Manager
  • 2 kudos

Hi @jv_v, thank you for reaching out to our community! We're here to help you. To ensure we provide you with the best support, could you please take a moment to review the responses and choose the one that best answers your question? Your feedback n...
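
As a quick way to see what SCIM provisioning has actually created in a workspace, a minimal sketch using the Databricks SDK (this is an illustration, not the accepted answer from the thread; it assumes the SDK is installed and credentials are configured):

    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()  # reads credentials from env vars or ~/.databrickscfg
    # List the users and groups currently provisioned in the workspace
    for user in w.users.list():
        print(user.user_name)
    for group in w.groups.list():
        print(group.display_name)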

1 More Replies
Erik_L
by Contributor II
  • 586 Views
  • 1 reply
  • 0 kudos

Workflow scheduler cancel unreliable

Workflow parameters: Warning: 4m 30s | Timeout: 6m 50s. The jobs took 20-50 minutes to cancel. This workflow must have high reliability for our requirements. Does anyone know why the scheduler failed this morning at ~5:20 AM PT? After several failures, we'r...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Erik_L, I’m sorry to hear about the issues you’re facing with the Databricks scheduler. There could be several reasons for the scheduler failure at ~5:20 AM PT. If your cluster is running out of resources (CPU, memory), it might cause the schedu...
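
If the UI cancel is the unreliable part, a run can also be cancelled programmatically; a minimal sketch with the Databricks SDK (the run ID is hypothetical, and cancellation is asynchronous, so a run can still take time to terminate):

    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()  # reads credentials from env vars or ~/.databrickscfg
    # Request cancellation; the call returns once the request is accepted,
    # not when the run has actually stopped
    w.jobs.cancel_run(run_id=123456789)  # hypothetical run ID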

JoseU
by New Contributor
  • 742 Views
  • 1 reply
  • 0 kudos

Cannot install libraries to cluster

Getting the following error when trying to install libraries to all-purpose compute using the Library tab in Cluster details. We had a vendor set up the cluster and they have since dropped off. I have switched the owner to an active AD user; however, stil...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @JoseU, ensure that the new owner (active AD user) has the necessary permissions to install libraries on the cluster. This includes being part of the appropriate groups and having the right roles assigned. Double-check the cluster configuration to...

  • 0 kudos
mdsilk77
by New Contributor
  • 427 Views
  • 1 reply
  • 0 kudos

No such file or directory error when accessing Azure Storage Container through Unity Catalog

Hello, I have a Databricks notebook that is attempting to unzip an archive located in an Azure Storage Container. Unity Catalog is set up to provide access to the container, yet I receive the following file-not-found error: FileNotFoundError: [Errno 2] No...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @mdsilk77, ensure that the file path is correctly specified. Sometimes minor typos or incorrect paths can cause this error. Verify that the path abfss://pii@[REDACTED].dfs.core.windows.net/.../20190501-1.zip is accurate. Databricks provides utili...
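
A common cause here: plain Python file APIs such as open() or zipfile only understand local paths, not abfss:// URIs. A minimal sketch of the usual workaround, copying the archive to the driver's local disk first (paths below are hypothetical placeholders, not the redacted ones from the thread):

    import zipfile

    # dbutils is predefined in Databricks notebooks; copy from ABFSS to local disk
    src = "abfss://container@account.dfs.core.windows.net/path/archive.zip"  # hypothetical
    local = "/tmp/archive.zip"
    dbutils.fs.cp(src, f"file:{local}")

    # Now plain Python can read it
    with zipfile.ZipFile(local) as zf:
        zf.extractall("/tmp/unzipped")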

Lazloo
by New Contributor III
  • 1562 Views
  • 2 replies
  • 0 kudos

Using spark jars using databricks-connect>=13.0

With the newest version of databricks-connect, I cannot configure the extra jars I want to use. In the older version, I did that via spark = SparkSession.builder.appName('DataFrame').config('spark.jars.packages', 'org.apache.spark:spark-avro_...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Lazloo, in the newer versions of Databricks Connect, configuring additional JARs for your Spark session is still possible. Let’s adapt your previous approach to the latest version. Adding JARs to a Databricks cluster: if you want to add JAR f...
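
Since databricks-connect 13+ is built on Spark Connect, session-level spark.jars.packages is generally not honored anymore; the package has to be attached to the cluster instead. A minimal sketch using the Databricks SDK (the cluster ID and Maven coordinates below are hypothetical, not from the thread):

    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service.compute import Library, MavenLibrary

    w = WorkspaceClient()  # reads credentials from env vars or ~/.databrickscfg
    # Attach the Maven package to the cluster as a library instead of a session config
    w.libraries.install(
        cluster_id="0123-456789-abcdefgh",  # hypothetical cluster ID
        libraries=[Library(maven=MavenLibrary(
            coordinates="org.apache.spark:spark-avro_2.12:3.5.0"))],
    )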

1 More Replies
mannepk85
by New Contributor II
  • 470 Views
  • 2 replies
  • 0 kudos

Get run details of a Databricks job, with similar data, without using the '/api/2.0/jobs/runs' API

I have a notebook which is attached to a task at the end of a job. This task pulls the status of all other tasks in the job and checks whether they succeeded or failed. Depending on the result, this last task will send a Slack notification (custom...

Latest Reply
szymon_dybczak
Contributor III
  • 0 kudos

Hi @mannepk85, you can take a look at the jobs system table. Note, though, that it is in public preview now, so use it with caution: https://learn.microsoft.com/en-us/azure/databricks/admin/system-tables/jobs
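
As a minimal sketch of that suggestion, run from a notebook task (table and column names follow the Azure Databricks system-tables docs and are in public preview, so they may change):

    # Most recent finished job runs from the jobs system tables (public preview)
    runs = spark.sql("""
        SELECT job_id, run_id, result_state, period_start_time, period_end_time
        FROM system.lakeflow.job_run_timeline
        WHERE result_state IS NOT NULL
        ORDER BY period_end_time DESC
        LIMIT 20
    """)
    runs.show(truncate=False)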

1 More Replies
johnp
by New Contributor III
  • 327 Views
  • 1 reply
  • 0 kudos

Get the external public IP of the Job Compute cluster

We just moved our workflow from an "all purpose compute cluster" to a "job compute cluster". We need to find out the external public IP of the job compute cluster. On the all-purpose compute cluster, we get the IP by attaching a notebook and running the comm...

Latest Reply
johnp
New Contributor III
  • 0 kudos

I found the following IPs in the cluster JSON file: "driver": {"private_ip": "10.*.*.*", "public_dns": "172.*.*.*", "node_id": "80*****", and the executors' configuration is similar: "executors": [{"private_ip": "10.*.*.*", "public_dns": "172.*.*.*", "node_id": "7...
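
For completeness, the usual notebook trick works on a job cluster too: query an external echo service from the driver. A minimal sketch (the ipify endpoint is an assumption, not from the thread; behind PrivateLink or a NAT gateway the returned address is the gateway's egress IP, not the node's):

    import requests

    # Runs on the driver; the service echoes back the caller's public-facing IP
    public_ip = requests.get("https://api.ipify.org", timeout=10).text
    print(f"External public IP: {public_ip}")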

seefoods
by New Contributor III
  • 305 Views
  • 1 reply
  • 1 kudos

Resolved! use dbutils outside a notebook

Hello everyone, I want to use the dbutils functions outside my notebook, so I will use them in my external JAR. I have added the dbutils-api library in my build.sbt file: "com.databricks" %% "dbutils-api" % "0.0.6". I have imported the library at the top of my code: import c...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @seefoods, in order to use the dbutils functions, you'll need to initialize the dbutils instance. You can do this by adding the following code at the beginning of your JAR's main function: val dbutils = com.databricks.dbutils_v1.DBUtilsHolder...

Avinash_Narala
by Contributor
  • 389 Views
  • 1 reply
  • 1 kudos

Resolved! Serverless Cluster Issue

Hi, While using a Serverless cluster I'm not able to access DBFS files; it says I don't have permission to the file. But while accessing them using an All-purpose cluster I'm able to access them. Why am I facing this issue?

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Avinash_Narala, when you use a Serverless cluster, it’s associated with a Databricks-managed IAM role that accesses AWS resources. However, this role might lack the necessary permissions to access DBFS resources in your account. On the other hand...

Poovarasan
by New Contributor III
  • 1209 Views
  • 7 replies
  • 1 kudos

Error while installing ODBC to shared cluster

I previously used the following script to install and configure the ODBC driver on our shared cluster in Databricks, and it was functioning correctly. However, I am currently experiencing issues where the installation is not working as expected. Plea...

Latest Reply
imsabarinath
New Contributor III
  • 1 kudos

The below approach is working for me... I had to download the packages upfront and place them on a volume, though.
#!/bin/bash
set -euxo pipefail
echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections
sudo ACCEPT_EULA=Y dpkg -i odbci...

6 More Replies
Skr7
by New Contributor II
  • 1130 Views
  • 2 replies
  • 0 kudos

Databricks Asset Bundles

Hi, I'm implementing Databricks Asset Bundles. My scripts are in GitHub, and my /resource folder has all the .yml files of my Databricks workflows, which point to the main branch:
git_source:
  git_url: https://github.com/xxxx
  git_provider: ...

Latest Reply
JacekLaskowski
New Contributor III
  • 0 kudos

Why not use substitutions and custom variables that can be specified on the command line using --var="<key>=<value>"? With all the features, your databricks.yml would look as follows:
variables:
  git_branch:
    default: main
git_source:
  git_url: https://git...
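
With that in place, the branch can be overridden per deployment, e.g. databricks bundle deploy --var="git_branch=feature-x" (the branch name here is a hypothetical example).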

1 More Replies
PB-Data
by New Contributor III
  • 714 Views
  • 2 replies
  • 1 kudos

right semi join

Hi All, I am having an issue running a simple right semi join in my Databricks Community Edition: select * from Y right semi join X on Y.y = X.a; Error: [PARSE_SYNTAX_ERROR] Syntax error at or near 'semi': extra input 'semi'. Not sure what is the issue wi...

Latest Reply
PB-Data
New Contributor III
  • 1 kudos

Thanks @szymon_dybczak 
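
For anyone hitting the same error: Spark SQL parses semi joins only as LEFT SEMI (and LEFT ANTI), so the intended right semi join can be expressed by swapping the operands. A minimal sketch using the table and column names from the post:

    # Rows of X that have at least one match in Y, i.e. the intent of
    # "Y right semi join X", written as a left semi join
    spark.sql("SELECT * FROM X LEFT SEMI JOIN Y ON Y.y = X.a").show()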

1 More Replies
NCat
by New Contributor III
  • 5582 Views
  • 6 replies
  • 0 kudos

ipywidgets: Uncaught ReferenceError: require is not defined

Hi, When I try to use ipywidgets, I get the following error. I’m using Databricks with PrivateLink enabled on AWS, and the Runtime version is 12.2 LTS. Is there something I need to do to use ipywidgets in my environment?

Latest Reply
jvjvjvjvjv
New Contributor II
  • 0 kudos

I am currently experiencing the same error on Azure Databricks, Runtime version 15.3 ML, default notebook editor.

5 More Replies
Avinash_Narala
by Contributor
  • 666 Views
  • 1 reply
  • 0 kudos

Resolved! Liquid clustering vs partitioning

Hi, Is liquid clustering a replacement for partitioning? Should we still use partitioning when we use liquid clustering? Can we use liquid clustering in all cases and ignore partitioning?

Latest Reply
szymon_dybczak
Contributor III
  • 0 kudos

Hi @Avinash_Narala, yeah, you can think of it as a partitioning replacement. According to the documentation (https://learn.microsoft.com/en-us/azure/databricks/delta/clustering): Delta Lake liquid clustering replaces table partitioning and ZORDER to simpli...
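
For illustration, liquid clustering is declared with CLUSTER BY in place of PARTITIONED BY at table creation; a minimal sketch with hypothetical table and column names:

    # CLUSTER BY enables Delta liquid clustering (replaces PARTITIONED BY / ZORDER)
    spark.sql("""
        CREATE TABLE main.default.events (
            event_id   BIGINT,
            event_date DATE,
            payload    STRING
        )
        CLUSTER BY (event_date)
    """)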

PushkarDeole
by New Contributor III
  • 416 Views
  • 2 replies
  • 0 kudos

State store configuration with applyInPandasWithState for optimal performance

Hello, We are using a stateful pipeline for data processing and analytics. For the state store, we are using the applyInPandasWithState function; however, the state needs to be persistent across node restarts, etc. At this point, we are not sure how the state ca...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @PushkarDeole, To leverage RocksDB as the state store with `applyInPandasWithState` in Databricks, configure your Spark session with the following setting: spark.conf.set("spark.sql.streaming.stateStore.providerClass", "com.databricks.sql.streamin...
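
For reference, the provider class truncated above is documented as com.databricks.sql.streaming.state.RocksDBStateStoreProvider; a minimal sketch, noting that durability across restarts comes from the stream's checkpoint location (the path below is hypothetical):

    # Use RocksDB as the streaming state store (documented Databricks provider class)
    spark.conf.set(
        "spark.sql.streaming.stateStore.providerClass",
        "com.databricks.sql.streaming.state.RocksDBStateStoreProvider",
    )
    # State survives restarts via the stream's checkpoint, e.g.:
    # df.writeStream.option("checkpointLocation", "/Volumes/main/default/ckpt/app1")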

1 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group