Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Leszek
by Contributor
  • 11331 Views
  • 6 replies
  • 5 kudos

Resolved! Unity Catalog - Azure account console - how to access?

I'm trying to access the account console in Azure, but I can only see the list of workspaces and open them. I couldn't find documentation about the account console for Azure. Do you know how to access it?

Latest Reply
vimalii
New Contributor II
  • 5 kudos

Hello @Leszek. Did it work for you? Did you find the root cause? I still don't understand why I should grant myself extra permissions if I am already a global administrator, owner of the subscription, and owner of the Databricks workspace, but...

5 More Replies
drii_cavalcanti
by New Contributor III
  • 1961 Views
  • 3 replies
  • 0 kudos

Databricks App with DAB

Hi All,
I am trying to deploy a DBX app via DAB, but source_code_path does not seem to be parsed correctly into the app configuration.
- dbx_dash/
-- resources/
---- app.yml
-- src/
---- app.yaml
---- app.py
-- databricks.yml
resources/app.yml:
resources:
  apps:
    m...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Hi Adriana, have you adjusted the root_path in your databricks.yml? Kindly add /Workspace and the entire path to the root_path. Thanks

2 More Replies
p_romm
by New Contributor III
  • 4183 Views
  • 1 replies
  • 0 kudos

INVALID_HANDLE.SESSION_NOT_FOUND

We run several workflows and tasks in parallel using serverless compute. In many different places in the code we started to get errors like the one below. It looks like when one task fails, every other task running at the same moment fails as well. After retry on on...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Hi, the error INVALID_HANDLE.SESSION_NOT_FOUND (https://docs.databricks.com/aws/en/error-messages/invalid-handle-error-class#session_not_found) is a handled error, but the gRPC errors are an area where more improvements are being pushed in eve...

Sega2
by New Contributor III
  • 3746 Views
  • 1 replies
  • 0 kudos

spark.sql makes debugger freeze

I have just created a simple bundle with Databricks and am using Databricks Connect to debug locally. This is my script:
from pyspark.sql import SparkSession, DataFrame
def get_taxis(spark: SparkSession) -> DataFrame:
    return spark.read.table("samp...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Ensure that your Databricks Connect is properly set up and is using the correct version compatible with your cluster’s runtime. For VS Code, any mismatches between the installed databricks-connect Python package version and the cluster runtime could ...
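
One way to check for the kind of mismatch described in this reply is to compare the major.minor of the installed databricks-connect package with the cluster's runtime version string. The sketch below is illustrative only: the helper names are hypothetical, the runtime string format ("15.4.x-scala2.12") is an example value, and the strict major.minor match is a simplifying assumption, so check the Databricks Connect documentation for the exact rule that applies to your release.

```python
from importlib import metadata
from typing import Optional, Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from strings like '15.4.2' or '15.4.x-scala2.12'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def installed_connect_version() -> Optional[str]:
    """Return the installed databricks-connect version, or None if it is not installed."""
    try:
        return metadata.version("databricks-connect")
    except metadata.PackageNotFoundError:
        return None


def versions_compatible(connect_version: str, runtime_version: str) -> bool:
    """Assumption: treat the pair as compatible only when major.minor match exactly."""
    return major_minor(connect_version) == major_minor(runtime_version)
```

For example, `versions_compatible(installed_connect_version() or "0.0", "15.4.x-scala2.12")` would flag a client that does not match a 15.4 cluster, which is a cheap first check before digging into debugger behaviour.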

Vasu_Kumar_T
by Databricks Partner
  • 4779 Views
  • 1 replies
  • 0 kudos

ODI 12C to Databricks equivalent

Hello All, we are planning to convert from ODI 12C to Databricks equivalents. What are the steps involved, and what are the limitations in this case? Thanks, Vasu Kumar T

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Hi, These blogs can help you get an idea on the migration planning and insights. https://www.databricks.com/blog/how-migrate-your-oracle-plsql-code-databricks-lakehouse-platform https://www.databricks.com/blog/databricks-migration-strategy-lessons-le...

alonisser
by Contributor II
  • 1485 Views
  • 2 replies
  • 1 kudos

Very long vacuum on s3

Since we moved from Azure to AWS, a specific job has had extremely long vacuum runs. Is there a specific flag/configuration for S3 storage that is needed to support faster vacuum? How can I research what's going on? Note, it's not ALL jobs, but a sp...

Latest Reply
NandiniN
Databricks Employee
  • 1 kudos

For faster VACUUM performance: (1) avoid over-partitioned directories; (2) avoid concurrent runs while the vacuum command is running; (3) avoid enabling S3 versioning (Delta Lake itself maintains the history); (4) run the “optimize” command periodically; (5) en...
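
As a rough illustration of point (4), a scheduled maintenance task might run OPTIMIZE before VACUUM on the affected table. The sketch below only builds the SQL strings; the helper name, the example table name, and the 168-hour retention value are assumptions made for illustration and are not taken from the thread.

```python
import re
from typing import List

# Accept plain one-, two-, or three-part identifiers such as catalog.schema.table.
_TABLE_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*){0,2}$")


def maintenance_statements(table: str, retain_hours: int = 168) -> List[str]:
    """Return the OPTIMIZE and VACUUM statements for one Delta table, in run order."""
    if not _TABLE_NAME.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    return [
        f"OPTIMIZE {table}",  # compact small files first
        f"VACUUM {table} RETAIN {retain_hours} HOURS",  # then remove unreferenced files
    ]
```

In a notebook or job these could be executed one at a time, for example `for stmt in maintenance_statements("main.sales.orders"): spark.sql(stmt)`, taking care not to overlap with other writers as noted in point (2).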

1 More Replies
biafch
by Contributor
  • 2849 Views
  • 1 replies
  • 0 kudos

spark.sql with CTEs (10 minutes) VS pyspark code + spark.sql (without CTE) (3 seconds), why?

Hello, I have two pieces of code with the exact same outcome. One takes 7-10 minutes to load and the other takes exactly 3 seconds, and I'm just trying to understand why. This takes 7-10 minutes: F_IntakeStepsPerDay = spark.sql(""" WITH BASE AS ( SELECT ...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

The two versions are not an apples-to-apples comparison, and debugging with the Spark UI and the query plan can give a better understanding. Reviewing the code, though, I can see that in the PySpark implementation you explicitly repartition the DataFrame (repartition("JobAppl...

pradeepvatsvk
by New Contributor III
  • 1056 Views
  • 1 replies
  • 0 kudos

Connecting Databricks to react application

Hi team, I want to connect my Unity Catalog tables to a React application. We also need to write some data back to the tables from the React UI. For example, we have some records which will be checked by the business people, and they will approve ...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

A React application cannot directly interface with Unity Catalog for data operations. You can use Databricks APIs or JDBC connections to interact with your Unity Catalog tables. You can also check the REST API and use them in the applications https:/...
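
A minimal sketch of the REST route, assuming the React UI calls a small backend service that holds the Databricks token and forwards SQL to a SQL warehouse through the SQL Statement Execution API (POST /api/2.0/sql/statements). The function name, host, warehouse ID, and table below are hypothetical, parameter values are sent as strings for simplicity, and the request is only built here, not sent.

```python
import json
import urllib.request
from typing import Dict, Optional


def build_statement_request(
    host: str,
    token: str,
    warehouse_id: str,
    statement: str,
    parameters: Optional[Dict[str, object]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request that submits one SQL statement."""
    body = {
        "warehouse_id": warehouse_id,
        "statement": statement,
        "wait_timeout": "30s",  # wait up to 30 seconds for a synchronous result
    }
    if parameters:
        # Named parameters are referenced in the statement as :name.
        body["parameters"] = [
            {"name": name, "value": str(value)} for name, value in parameters.items()
        ]
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/sql/statements",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The backend would send the request with `urllib.request.urlopen(req)` and return the result to the UI. Keeping the token on the server side, rather than in the browser, is the reason for routing the approval write-back through a backend in this sketch.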

minhhung0507
by Valued Contributor
  • 2286 Views
  • 1 replies
  • 0 kudos

Handling Streaming Query Hangs & Delta Upsert Failures in Multi-Table Jobs

Hi Databricks Experts, I'm encountering issues with my streaming jobs in Databricks and need some advice. I’ve implemented a custom streaming query listener to capture job status events and upsert them into a Delta table. However, the solution behaves...

Latest Reply
mmayorga
Databricks Employee
  • 0 kudos

Hello Hung, working with streaming tables is always a challenge. Let's remember that we are working with unbounded data, so it's important to consider a few points: If you are working with Jobs, you can define a job cluster for each task. Consider the co...

mehalrathod
by New Contributor II
  • 2390 Views
  • 2 replies
  • 0 kudos

Overwrite to a table taking 12+ hours

One of our Databricks notebooks (using Python and PySpark) has been running for 12+ hours, specifically on the overwrite command into a table. In the past, this notebook, including the overwrite step, completed within 10 minutes. But suddenly the ov...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 0 kudos

Hi @mehalrathod, this sort of performance regression in Databricks (especially for overwrite) is usually caused by one or more of the following:
Common Causes of Overwrite Slowness
1. Delta Table History or File Explosion
- If the target table is a Delta...

1 More Replies
Linda22
by New Contributor II
  • 7594 Views
  • 7 replies
  • 5 kudos

Can we execute a single task in isolation from a multi task Databricks job

A task may be used to process some data. If we have 10 such tasks in a job and we want to process only a couple of datasets through only a couple of tasks, is that possible?

Latest Reply
slimbnsalah
New Contributor II
  • 5 kudos

Generally available!

6 More Replies
Livingstone
by New Contributor II
  • 5221 Views
  • 5 replies
  • 3 kudos

Install maven package to serverless cluster

My task is to export data from CSV/SQL into Excel format with minimal latency. To achieve this, I used a Serverless cluster. Since PySpark does not support saving in XLSX format, it is necessary to install the Maven package spark-excel_2.12. However, ...

Latest Reply
Louis_Frolio
Databricks Employee
  • 3 kudos

As you stated, you cannot install Maven packages on Databricks serverless clusters due to restricted library management capabilities. However, there are alternative approaches to export data to Excel with minimal latency. Solutions to Export Excel Fi...

4 More Replies
krishnakmr512
by New Contributor
  • 1443 Views
  • 1 replies
  • 1 kudos

Resolved! Missed my certification Exam Reschedule is required

Hi Team, @data_help @helpdesk @Cert-Team @Cert-TeamOPS I missed my scheduled certification exam yesterday due to an emergency. Is there a possibility it can be rescheduled to any time today or tomorrow? I am not able to reschedule opti...

Latest Reply
Cert-Team
Databricks Employee
  • 1 kudos

@krishnakmr512 usually the fastest way to get assistance is filing a ticket with our support team. I was able to reschedule your exam to a future date. Please log into your account and reschedule to a date and time that suits you.

db_eswar
by New Contributor
  • 3097 Views
  • 2 replies
  • 1 kudos

what is iowait, will it impact performance of my job

One job takes more than 7 hours. When I added the configuration below, it took <2:30 mins, but after deployment with the same parameters it again takes 7+ hours. 1) spark.conf.set("spark.sql.shuffle.partitions", 500) --> spark.conf.set("spark.sql.shuffle.parti...

Latest Reply
SP_6721
Honored Contributor II
  • 1 kudos

Hi @db_eswar, high iowait in your Spark jobs is probably caused by storage or disk bottlenecks, not CPU or memory issues. The slowdown you're seeing could be due to a cold cache, slower disks, or increased resource usage. To troubleshoot, you can use t...

1 More Replies