Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

jv_v
by New Contributor III
  • 161 Views
  • 2 replies
  • 1 kudos

Resolved! Azure SCIM Usage and Alternatives for Databricks

Hello Databricks Community, I'm exploring the use of Azure SCIM for our Databricks environment and have a few questions: How is Azure SCIM useful for Databricks? What are the specific benefits or advantages of using SCIM for user and group provisioning...

Latest Reply
Rishabh_Tiwari
Community Manager
  • 1 kudos

Hi @jv_v, Thank you for reaching out to our community! We're here to help you. To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feedback n...

1 More Replies
AlainT
by New Contributor
  • 51 Views
  • 3 replies
  • 1 kudos

[GCP] Failed to migrate a project onto an organization

Hi, After migrating a project to an organization, we are unable to create a workspace without encountering errors. Previously working workspaces are also failing. I have granted admin/owner access to all users who need Databricks. The latest error invo...

Latest Reply
AlainT
New Contributor
  • 1 kudos

Hi @Kaniz_Fatma, I'm still checking all access and all IAM policies, because my question remains: which are "all necessary domains", what does "all necessary IAM roles and permissions are correctly assigned" mean, and how do I test it? Note that I don't create a...

2 More Replies
Patricckk
by Visitor
  • 9 Views
  • 1 reply
  • 0 kudos

Attribute-Based Access Control

Hi, Over here they are explaining attribute-based access controls, which I want to implement in my project, but I can't find the documentation or the option to create rules myself. Is this feature already available? https://www.databricks.com/dataaisummit...

Latest Reply
Slash
New Contributor II
  • 0 kudos

Hi @Patricckk, It's because this feature hasn't been released yet.

Erik_L
by Contributor II
  • 22 Views
  • 1 reply
  • 0 kudos

Workflow scheduler cancel unreliable

Workflow parameters: Warning: 4m 30s | Timeout: 6m 50s. The jobs took 20-50 minutes to cancel. This workflow must have high reliability for our requirements. Does anyone know why the scheduler failed this morning at ~5:20 AM PT? After several failures, we'r...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Erik_L, I’m sorry to hear about the issues you’re facing with the Databricks scheduler. There could be several reasons for the scheduler failure at ~5:20 AM PT. If your cluster is running out of resources (CPU, memory), it might cause the schedu...

ayush25091995
by New Contributor II
  • 37 Views
  • 1 reply
  • 0 kudos

Get history of queries run on a UC-enabled interactive cluster

Hi Team, I want to derive a couple of KPIs, such as most frequent queries, top queries, and query type (select, insert, or update), on a UC-enabled interactive cluster. I know we can do this for a SQL warehouse, but how can we do this for an interactive clust...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @ayush25091995, To get the query history on a Unity Catalog (UC) enabled interactive cluster in Databricks, you can use the system.query_history table. This table provides detailed information about the queries run on the cluster, including the qu...

leungi
by New Contributor III
  • 29 Views
  • 1 reply
  • 0 kudos

Spark Out of Memory Error

Background: Using the R language's {sparklyr} package to fetch data from tables in Unity Catalog, I hit the error below. I tried the following, to no avail: using a memory-optimized cluster (e.g., E4d); using a bigger (RAM) cluster (e.g., E8d); enabling auto-scali...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @leungi, Since the error indicates that the total memory usage during row decode exceeds spark.driver.maxResultSize, you might try increasing this value beyond 4.0 GiB. Repartition your data to increase the number of partitions. This can help dist...
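
The repartitioning suggestion above comes down to simple arithmetic. The sketch below is not from the thread: the helper name `suggest_partition_count`, the 128 MiB target, and the `8g` value are illustrative assumptions. It estimates how many partitions keep each one under a chosen size; the result would then be passed to a repartition call (for example `sdf_repartition` in sparklyr or `DataFrame.repartition` in PySpark). `spark.driver.maxResultSize` is a standard Spark setting that caps the total serialized size of the results an action returns to the driver.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3


def suggest_partition_count(total_bytes: int, target_partition_bytes: int = 128 * MIB) -> int:
    """Return the smallest partition count that keeps each partition at or below the target size.

    Hypothetical helper for illustration; the 128 MiB default is an assumption, not a rule.
    """
    if total_bytes < 0 or target_partition_bytes <= 0:
        raise ValueError("sizes must be non-negative and the target must be positive")
    # Always return at least one partition, even for an empty dataset.
    return max(1, math.ceil(total_bytes / target_partition_bytes))


# Hypothetical example: a 10 GiB dataset split into partitions of at most 128 MiB.
partitions = suggest_partition_count(10 * GIB)

# Illustrative value only; this would normally go into the cluster's Spark configuration.
spark_conf = {"spark.driver.maxResultSize": "8g"}
```

Raising the limit only moves the ceiling; if the full result still has to be collected onto the driver, writing it to a table and reading a filtered or aggregated subset may be the more durable fix.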

vannipart
by New Contributor
  • 28 Views
  • 1 reply
  • 0 kudos

SparkOutOfMemoryError when merging data into a table that already has data

Hello, There is an issue with merging data from a DataFrame into a table 2024 databricks Job aborted due to stage failure: Task 17 in stage 1770.0 failed 4 times, most recent failure: Lost task 17.3 in stage 1770.0 (TID 1669) (1x.xx.xx.xx executor 8):...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @vannipart, You can try increasing the memory allocated to each executor. This can be done by setting the spark.executor.memory configuration. Since your notebook has many DataFrame transformations, ensure that you are caching intermediate DataFr...

JoseU
by Visitor
  • 52 Views
  • 1 reply
  • 0 kudos

Cannot install libraries to cluster

Getting the following error when trying to install libraries to all-purpose compute using the Library tab in Cluster details. We had a vendor set up the cluster, and they have since dropped off. I have switched the owner to an active AD user; however, stil...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @JoseU, Ensure that the new owner (active AD user) has the necessary permissions to install libraries on the cluster. This includes being part of the appropriate groups and having the right roles assigned. Double-check the cluster configuration to...

mdsilk77
by Visitor
  • 39 Views
  • 1 reply
  • 0 kudos

No such file or directory error when accessing Azure Storage Container through Unity Catalog

Hello, I have a Databricks notebook that is attempting to unzip an archive located in an Azure Storage container. Unity Catalog is set up to provide access to the container, yet I receive the following file-not-found error: FileNotFoundError: [Errno 2] No...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @mdsilk77, Ensure that the file path is correctly specified. Sometimes, minor typos or incorrect paths can cause this error. Verify that the path abfss://pii@[REDACTED].dfs.core.windows.net/.../20190501-1.zip is accurate. Databricks provides utili...
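
One possible cause, beyond a mistyped path, is that Python's `zipfile` module opens local filesystem paths, and an `abfss://` URI is not one. The sketch below assumes the archive has first been copied to local disk (on Databricks that copy might be done with `dbutils.fs.cp`, which is not shown because it exists only inside the Databricks runtime). The helper name `extract_archive` is hypothetical, not something from the thread.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_archive(local_zip_path: str, dest_dir: str) -> List[str]:
    """Extract a zip archive that already sits on the local filesystem.

    Hypothetical helper: it rejects URIs such as abfss://... up front, because
    zipfile cannot open them and would fail with a file-not-found error.
    """
    if "://" in local_zip_path:
        raise ValueError("expected a local path, got a URI; copy the file to local disk first")
    zip_path = Path(local_zip_path)
    if not zip_path.is_file():
        raise FileNotFoundError(f"no such archive: {zip_path}")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        # Return the member names so the caller can see what was extracted.
        return archive.namelist()
```

Failing early on a URI makes the real problem visible instead of surfacing it as a generic `[Errno 2]`.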

AndreasB
by New Contributor
  • 37 Views
  • 1 reply
  • 0 kudos

Seeing results of materialized views while running notebooks

Hi! My team is currently trying out Delta Live Tables (DLT) for managing our ETL pipelines. An issue we're encountering is that we have notebooks that transform data using Spark SQL. We include these in a DLT pipeline, and we want to both run the pipe...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @AndreasB, The issue arises because DLT requires the use of the LIVE keyword to track dependencies within the pipeline, but this conflicts with running individual notebooks outside the pipeline context. You can continue using your current workarou...

Linda22
by New Contributor
  • 40 Views
  • 1 reply
  • 0 kudos

Can we execute a single task in isolation from a multi-task Databricks job?

A task may be used to process some data. If we have 10 such tasks in a job and want to process only a couple of datasets through a couple of those tasks, is that possible?

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Linda22, Yes, you can execute a single task in isolation from a multi-task Databricks job. To achieve this, you can use the Databricks Workflows feature, which allows you to manage and orchestrate tasks within a job.
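
The reply does not say how. One way, assuming the workspace exposes Jobs API 2.2, whose `run-now` request accepts an `only` list of task keys, is to trigger the job with just the tasks that are needed. The sketch below only builds the request body; the job ID and task keys are hypothetical, and sending it (with an HTTP client and a bearer token) is left out. The field name is worth checking against the Jobs API reference for your workspace before relying on it.

```python
import json
from typing import List


def build_run_now_payload(job_id: int, task_keys: List[str]) -> str:
    """Build a JSON body that asks a job run to execute only the listed tasks.

    Assumption: the target endpoint is POST /api/2.2/jobs/run-now and it
    honours the `only` field. Hypothetical helper for illustration.
    """
    if not task_keys:
        raise ValueError("provide at least one task key")
    payload = {"job_id": job_id, "only": list(task_keys)}
    return json.dumps(payload, sort_keys=True)


# Hypothetical job ID and task keys.
body = build_run_now_payload(123, ["load_orders", "load_customers"])
```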

vvzadvor
by New Contributor II
  • 77 Views
  • 3 replies
  • 0 kudos

Debugging Python code outside of Notebooks

Hi experts, Does anyone know if there's a way of properly debugging Python code outside of notebooks? We have a complicated Python-based framework for loading files, transforming them according to the business specification and saving the results into ...

Latest Reply
vvzadvor
New Contributor II
  • 0 kudos

I actually managed to get permission to use a PAT to connect to the Databricks development environment. With that, I managed to set up the VS Code extension for Databricks, connect to a cluster, and create a sync location. I can even run spark apps fro...

2 More Replies
Lazloo
by New Contributor III
  • 947 Views
  • 2 replies
  • 0 kudos

Using Spark JARs with databricks-connect>=13.0

With the newest version of databricks-connect, I cannot configure the extra JARs I want to use. In the older version, I did that via spark = SparkSession.builder.appName('DataFrame').\ config('spark.jars.packages','org.apache.spark:spark-avro_...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Lazloo, In the newer versions of Databricks Connect, configuring additional JARs for your Spark session is still possible. Let’s adapt your previous approach to the latest version. Adding JARs to a Databricks cluster: If you want to add JAR f...

1 More Replies
hayden_blair
by New Contributor
  • 45 Views
  • 1 reply
  • 0 kudos

Why Shared Access Mode for Unity Catalog enabled DLT pipeline?

Hello all, I am trying to use an RDD API in a Unity Catalog enabled Delta Live Tables pipeline. I am getting an error because Unity Catalog enabled DLT can only run on "shared access mode" compute, and RDD APIs are not supported on shared access comput...

Latest Reply
Slash
New Contributor II
  • 0 kudos

Hi @hayden_blair, The error you are encountering is related to Py4J security settings in Apache Spark. In Shared access mode, Py4J security is enabled by default for security reasons, which restricts certain methods from being called on the Spark RDD...

mannepk85
by New Contributor
  • 89 Views
  • 2 replies
  • 0 kudos

Get run details of a databricks job that provides similar data without using api '/api/2.0/jobs/runs

I have a notebook that is attached to a task at the end of a job. This task pulls the status of all other tasks in the job and checks whether they succeeded or failed. Depending on the result, this last task will send a Slack notification (custom...

Latest Reply
Slash
New Contributor II
  • 0 kudos

Hi @mannepk85, You can take a look at the jobs system table. Note, though, that it is in public preview now, so use it with caution: https://learn.microsoft.com/en-us/azure/databricks/admin/system-tables/jobs
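
However the task results are obtained (the jobs system table mentioned above or some other source), the final task's check reduces to "did every other task succeed, and if not, which ones failed?". The sketch below shows only that step. `TaskResult`, `summarize_run`, and the `"SUCCEEDED"` state string are illustrative assumptions, not names taken from the thread or guaranteed by any Databricks API; the returned text would be what gets posted to Slack.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TaskResult:
    """Hypothetical record of one task's outcome within a job run."""
    task_key: str
    result_state: str  # assumed values such as "SUCCEEDED" or "FAILED"


def summarize_run(results: Iterable[TaskResult]) -> str:
    """Turn per-task outcomes into a one-line notification message."""
    results = list(results)
    # Anything that is not an explicit success is reported as a problem.
    failed = sorted(r.task_key for r in results if r.result_state != "SUCCEEDED")
    if failed:
        return (
            f"Job run FAILED: {len(failed)} of {len(results)} tasks "
            f"did not succeed ({', '.join(failed)})"
        )
    return f"Job run SUCCEEDED: all {len(results)} tasks succeeded"
```

Treating every non-success state as a failure is a deliberately conservative choice; skipped or cancelled tasks may deserve their own wording.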

1 More Replies