Data Engineering

Forum Posts

by Srikanth_Gupta_ • Valued Contributor

06-15-2021 5:03:02 AM

588 Views
1 replies
0 kudos

How does Spark SQL Catalyst optimizer work?

How does the Catalyst optimizer improve performance, and what is its role?

Latest Reply

Srikanth_Gupta_
Valued Contributor

06-15-2021 5:03:55 AM

0 kudos

The Catalyst optimizer converts an unresolved logical plan into an executable physical plan. A deep dive is available here.
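One way to see the plans Catalyst produces is to print them from a DataFrame. The sketch below assumes PySpark 3.0 or later (where `DataFrame.explain` accepts `mode="extended"`) and an existing session named `spark`, as in a Databricks notebook. The helper names `show_catalyst_plans` and `build_example_query`, and the sample query itself, are made up for illustration.

```python
def show_catalyst_plans(df):
    """Print the logical plans (parsed, analyzed, optimized) and the physical plan."""
    # mode="extended" needs PySpark 3.0+; on older versions df.explain(True) is the equivalent.
    df.explain(mode="extended")


def build_example_query(spark):
    """Hypothetical query: two chained filters over a projection."""
    df = spark.range(1000).withColumnRenamed("id", "n")
    return df.select("n").filter("n % 2 = 0").filter("n > 10")
```

In a notebook, `show_catalyst_plans(build_example_query(spark))` prints the sections `== Parsed Logical Plan ==`, `== Analyzed Logical Plan ==`, `== Optimized Logical Plan ==`, and `== Physical Plan ==`. Comparing the analyzed and optimized sections shows what the optimizer rewrote; for a query like this one, the two filters would typically appear merged into a single predicate.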

by Anonymous • Not applicable

06-14-2021 2:55:19 PM

460 Views
0 replies
0 kudos

How should one debug the cause when attaching to a cluster takes a long time (more than a few minutes)?

by User16826992185 • New Contributor II

06-14-2021 10:45:02 AM

718 Views
0 replies
0 kudos

Will Databricks support self-service Web Application Firewalls (WAF) in the future?

It would be great if Databricks could support WAFs in the near future. Our use case is to prevent data downloads and egress except from a few allowed IPs (e.g. the company VPN). However, we still want to allow workspace login from all IPs.

by User16826994223 • Honored Contributor III

06-03-2021 11:24:45 PM

734 Views
1 replies
0 kudos

Unable to start cluster. Error: Defunct Resource Detected

Hi All, I am getting this error for some jobs. Can you please let me know what could be the reason? Run result unavailable: job failed with an error message - Run result unavailable: job failed with error message Unexpected failure while waiting for the cl...

Latest Reply

User16826994223
Honored Contributor III

06-14-2021 5:57:46 AM

0 kudos

This is a cloud-level issue, so try adding retries to the job. It does not happen on every cluster start: a start may fail once but succeed after a retry. Also, raise a Databricks support ticket; they will provide a permanent solution.
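As a rough illustration of adding retries, the sketch below builds the retry-related fields of a job task definition as a plain Python dict. The field names `max_retries`, `min_retry_interval_millis`, and `retry_on_timeout` are the ones used by the Databricks Jobs API as far as I know, but check them against the API version you use. The task key, notebook path, and default values are hypothetical.

```python
import json


def with_retries(task_settings, max_retries=3, min_retry_interval_ms=60_000):
    """Return a copy of a task settings dict with retry fields added."""
    updated = dict(task_settings)
    updated["max_retries"] = max_retries                         # retry a failed run up to N times
    updated["min_retry_interval_millis"] = min_retry_interval_ms  # wait between attempts
    updated["retry_on_timeout"] = False                          # do not retry runs that timed out
    return updated


# Hypothetical task definition, for illustration only.
task = {
    "task_key": "nightly_load",
    "notebook_task": {"notebook_path": "/Jobs/nightly_load"},
}
print(json.dumps(with_retries(task), indent=2))
```

The resulting dict can be sent as part of a job create or update request, or the same values can be set in the job's retry policy in the UI.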

by jose_gonzalez • Moderator

06-04-2021 11:45:57 AM

631 Views
1 replies
0 kudos

How to solve Hive connectivity issues?

I can see connectivity issues in my driver logs. How can I solve this issue?

Latest Reply

User16826994223
Honored Contributor III

06-14-2021 5:52:37 AM

0 kudos

Can you share more of the error, please? I expect the logs contain more detail, such as whether the connection issue is caused by the JDBC URL, the host name, or the password.

by User15787040559 • New Contributor III

06-07-2021 9:11:09 AM

3708 Views
1 replies
0 kudos

How do I see the Java version being used on the cluster?

Environment Tab in the Spark UI

Latest Reply

User16826994223
Honored Contributor III

06-14-2021 5:48:14 AM

0 kudos

In the Spark UI, open the Environment tab.
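If you prefer to check from a notebook rather than the UI, the following is a minimal sketch. It assumes a classic PySpark session named `spark` whose `sparkContext` exposes the internal Py4J gateway `_jvm`; that attribute is not a public API and may be unavailable on some cluster access modes. The helper name `java_version` is made up for illustration.

```python
def java_version(spark):
    """Return the driver JVM's java.version system property."""
    # _jvm is PySpark's internal Py4J gateway, not a public API, so this may change.
    return spark.sparkContext._jvm.System.getProperty("java.version")
```

Calling `print(java_version(spark))` in a notebook cell should print the Java version of the driver JVM.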

by jose_gonzalez • Moderator

06-10-2021 4:55:07 PM

3361 Views
1 replies
0 kudos

Resolved! How to get the size of my Delta table

I would like to know how to get the total size of my Delta table

Latest Reply

jose_gonzalez
Moderator

06-11-2021 3:39:21 PM

0 kudos

The following KB article shows a step-by-step example of how to get the size of a Delta table: https://kb.databricks.com/sql/find-size-of-table.html
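As a quick alternative to the KB article's approach (this is not necessarily the method it describes), `DESCRIBE DETAIL` on a Delta table returns a `sizeInBytes` column covering the files in the current table version. The sketch below assumes a session named `spark`; the helper names and the table name in the usage note are hypothetical.

```python
def delta_table_size_bytes(spark, table_name):
    """Return the sizeInBytes value reported by DESCRIBE DETAIL for a Delta table."""
    row = spark.sql(f"DESCRIBE DETAIL {table_name}").select("sizeInBytes").first()
    return row["sizeInBytes"]


def human_readable(size_bytes):
    """Format a byte count using binary units (1 KiB = 1024 bytes)."""
    size = float(size_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if size < 1024 or unit == "TiB":
            return f"{size:.1f} {unit}"
        size /= 1024
```

For example, `human_readable(delta_table_size_bytes(spark, "my_db.my_table"))` returns a formatted string for a table of that name. Files kept only for older table versions are not included in this number.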

by jose_gonzalez • Moderator

06-10-2021 4:59:06 PM

10357 Views
1 replies
0 kudos

Resolved! error message rpc response (of 20978566 bytes) exceeds limit of 20971520 bytes

I'm getting the following error message when trying to use display(): Internal error, sorry. Attach your notebook to a different cluster or restart the current cluster. com.databricks.rpc.RPCResponseTooLarge: rpc response (of 20978566 bytes) exceeds limi...

Latest Reply

jose_gonzalez
Moderator

06-11-2021 11:03:05 AM

0 kudos

It seems the error comes from the 20 MB output limit. For more information, please check https://docs.databricks.com/jobs.html#output-size-limits
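The numbers in the error message line up with that limit. A small arithmetic check, using only the two values quoted in the error:

```python
LIMIT_BYTES = 20 * 1024 * 1024   # 20 MiB, the limit quoted in the error
RESPONSE_BYTES = 20_978_566      # response size quoted in the error

overage = RESPONSE_BYTES - LIMIT_BYTES
print(LIMIT_BYTES)  # 20971520
print(overage)      # 7046
```

So the response exceeded the limit by about 7 KB. One common way to stay under it is to display less data, for example `display(df.limit(1000))`, where the row count is an arbitrary choice and `df` stands for whatever DataFrame was being displayed.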

by User16826994223 • Honored Contributor III

06-11-2021 8:04:54 AM

510 Views
0 replies
1 kudos

Can I delete a user without the user ID using the SCIM API, for example by email ID?

I have a case where I don't know the user ID, but I have the emails of the users whose IDs have to be deleted. Is it possible to provide an email ID in the SCIM API to delete the user ID? https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/late...
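One possible approach, sketched under assumptions rather than confirmed in this thread: look the user up by email with a SCIM filter on `userName`, then delete by the ID that comes back. The sketch assumes the workspace SCIM endpoint `/api/2.0/preview/scim/v2/Users`, that the user's `userName` is their email, and a personal access token for authentication. The function names, host, and token are placeholders.

```python
import json
import urllib.parse
import urllib.request

SCIM_USERS_PATH = "/api/2.0/preview/scim/v2/Users"  # assumed workspace SCIM endpoint


def lookup_url(host, email):
    """Build the SCIM URL that filters users by userName (assumed to be the email)."""
    query = urllib.parse.urlencode(
        {"filter": f'userName eq "{email}"'}, quote_via=urllib.parse.quote
    )
    return f"{host}{SCIM_USERS_PATH}?{query}"


def delete_user_by_email(host, token, email):
    """Find the single user whose userName equals `email`, then delete it by SCIM ID."""
    headers = {"Authorization": f"Bearer {token}"}
    req = urllib.request.Request(lookup_url(host, email), headers=headers)
    with urllib.request.urlopen(req) as resp:
        resources = json.load(resp).get("Resources", [])
    if len(resources) != 1:
        # Refuse to guess when the filter matches zero or several users.
        raise LookupError(f"expected exactly one user for {email}, found {len(resources)}")
    user_id = resources[0]["id"]
    delete_req = urllib.request.Request(
        f"{host}{SCIM_USERS_PATH}/{user_id}", headers=headers, method="DELETE"
    )
    with urllib.request.urlopen(delete_req) as resp:
        return resp.status
```

The lookup step is kept separate so it can be run on its own first to confirm which user the filter matches before anything is deleted.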

by Anonymous • Not applicable

06-10-2021 9:12:05 PM

753 Views
1 replies
1 kudos

Resolved! What are the benefits of Databricks? How is it different than Open Source Spark?

Latest Reply

Digan_Parikh
Valued Contributor

06-11-2021 4:43:00 AM

1 kudos

At a high level, see this detailed comparison: https://databricks.com/spark/comparing-databricks-to-apache-spark

by Anonymous • Not applicable

06-10-2021 9:16:12 PM

1146 Views
1 replies
0 kudos

Resolved! How long does a task have to be in the queue before the cluster autoscales?

Latest Reply

Ryan_Chynoweth
Honored Contributor III

06-11-2021 1:46:00 AM

0 kudos

There are two types of autoscaling in Databricks: Standard and Optimized. In both cases, when tasks are submitted, the cluster immediately begins scaling up to execute as many of them in parallel as possible. Scaling down is different. In optimized autoscalin...

by Anonymous • Not applicable

06-10-2021 9:14:10 PM

3461 Views
1 replies
0 kudos

Resolved! Can we set up alerts on cluster metrics / job failures in Databricks?

Latest Reply

User16019159252
New Contributor III

06-11-2021 12:29:00 AM

0 kudos

Yes, you can set up alerts. Email alerts are sent in case of job failure, success, or timeout. You can set alerts up for job start, job success, and job failure (including skipped jobs), providing multiple comma-separated email addresses for each alert type. Y...
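For jobs defined through the API rather than the UI, the same alerts can be expressed in the job settings. The sketch below builds an `email_notifications` fragment as a plain Python dict. The field names `on_start`, `on_success`, `on_failure`, and `no_alert_for_skipped_runs` are the Jobs API names as far as I know; the helper name and the email addresses are hypothetical.

```python
def email_notifications(on_start=(), on_success=(), on_failure=(),
                        no_alert_for_skipped_runs=False):
    """Build the email_notifications fragment of a job settings payload."""
    return {
        "on_start": list(on_start),        # addresses notified when a run starts
        "on_success": list(on_success),    # addresses notified when a run succeeds
        "on_failure": list(on_failure),    # addresses notified when a run fails
        "no_alert_for_skipped_runs": no_alert_for_skipped_runs,
    }


# Hypothetical addresses, for illustration only.
settings_fragment = {
    "email_notifications": email_notifications(
        on_failure=["oncall@example.com", "data-team@example.com"],
    )
}
print(settings_fragment)
```

Each alert type takes a list of addresses, which matches the multiple comma-separated addresses per alert type that the UI accepts.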

by Anonymous • Not applicable

06-10-2021 7:27:19 PM

479 Views
0 replies
0 kudos

Using multiple clouds

Are there recommendations and/or examples of leveraging AWS and Azure with Databricks? If so, are there any best practices to follow? We want to ensure we avoid expensive data transfer across clouds.

by Anonymous • Not applicable

06-10-2021 7:24:51 PM

964 Views
0 replies
0 kudos

Automatically create folder structure

I imported one workspace into another and noticed there were several instances of RESOURCE_DOES_NOT_EXIST errors because of the folder structure of the workspace (despite importing the workspace as well), see example below: Get: https://dbc-9d482d3a-f...

by Anonymous • Not applicable

06-10-2021 2:59:39 PM

615 Views
1 replies
0 kudos

Resolved! What is the frequency of usage log delivery?

Latest Reply

Anonymous
Not applicable

06-10-2021 9:07:00 AM

0 kudos

Hi Brinda, it's daily. https://docs.databricks.com/administration-guide/account-settings/billable-usage-delivery.html#high-level-flow
