Data Engineering

Forum Posts

by sanjay • Valued Contributor II

06-02-2023 12:00:04 AM

3299 Views
2 replies
1 kudos

Resolved! How can I prioritize message in autoloader

Hi, I am using Auto Loader, which picks up data from AWS S3 and stores it in a Delta table. When there is a large number of messages, I would like to process them by priority. Is it possible to prioritize messages in Auto Loader? Regards, Sanjay

Latest Reply

sanjay
Valued Contributor II

06-02-2023 4:47:44 AM

1 kudos

Thank you, Sandeep. Another option is to keep the messages in two different folders in S3. Can Auto Loader read messages from multiple folders?

by pauloquantile • Databricks Partner

05-23-2023 5:17:32 AM

7038 Views
8 replies
0 kudos

Resolved! Disable scheduling of notebooks

Hi, we are wondering whether it is possible to disable the scheduling of notebooks. A client wants to give many analysts access to Databricks, but a concern is that they could set schedules (the fastest is every minute!). Is...

Latest Reply

Anonymous
Not applicable

06-01-2023 1:42:58 AM

0 kudos

Hi @Paulo Rijnberg, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedba...

by deep_thought • Contributor

12-18-2022 9:27:54 PM

31260 Views
16 replies
9 kudos

Resolved! Schedule job to run sequentially after another job

Is there a way to schedule a job to run after some other job is complete? E.g., schedule Job A, then upon its completion run Job B.

Latest Reply

claytonseverson
Databricks Employee

06-01-2023 9:44:41 PM

9 kudos

Here is the User Guide for Jobs-as-Tasks - https://docs.google.com/document/d/1OJsc-g7IwAJjYooCp7T01Rxyt_xFkMPjmAAGdDGPkY4/edit#heading=h.oudvb5fyfd0n
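As a rough illustration of the "Job A, then Job B" ordering asked about in this thread, the sketch below builds a job-settings payload in the shape used by the Jobs API 2.1, where one task runs another job (`run_job_task`) and a second task lists the first under `depends_on`. The job IDs (111, 222), the task keys, and the helper name `build_sequential_job` are hypothetical placeholders; treat this as a sketch under those assumptions, not as the content of the linked guide.

```python
import json


def build_sequential_job(name, first_job_id, second_job_id):
    """Return job settings in which the second task waits for the first.

    The IDs passed in are placeholders for two jobs that already exist.
    """
    return {
        "name": name,
        "tasks": [
            # Task that triggers Job A.
            {"task_key": "run_job_a", "run_job_task": {"job_id": first_job_id}},
            # Task that triggers Job B once the task above has finished.
            {
                "task_key": "run_job_b",
                "depends_on": [{"task_key": "run_job_a"}],
                "run_job_task": {"job_id": second_job_id},
            },
        ],
    }


payload = build_sequential_job("job_a_then_job_b", 111, 222)
print(json.dumps(payload, indent=2))
```

The printed JSON could be used as the request body when creating the wrapper job; how it is submitted (UI, CLI, or REST call) is left open here.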

by vladcrisan • New Contributor II

10-04-2022 11:46:57 PM

6647 Views
5 replies
1 kudos

Can Spark History server be created in Databricks?

We have a Spark pipeline producing more than 3k Spark jobs. After the pipeline finishes and the cluster shuts down, only a subset (<1k) of these can be recovered from the Spark UI. We would like to have access to the full Spark UI after the pipeline t...

Latest Reply

Sandeep
Databricks Employee

06-02-2023 3:29:54 AM

1 kudos

@Vlad Crisan, you can use the Databricks clusters to replay the events. Please follow this KB: https://kb.databricks.com/clusters/replay-cluster-spark-events

Note: Please spin up a cluster with version 10.4 LTS.

by yunna_wei • Databricks Employee

06-02-2023 2:44:46 AM

1927 Views
0 replies
3 kudos

In any Spark application, Spark driver plays a critical role and performs the following functions: 1. Initiating a Spark Session 2. Communicating with...

In any Spark application, the Spark driver plays a critical role and performs the following functions:
1. Initiating a Spark Session
2. Communicating with the cluster manager to request resources (CPU, memory, etc.) from the cluster manager for Spark's exec...

by nav • New Contributor II

03-02-2023 8:00:56 PM

7344 Views
8 replies
0 kudos

R packages not getting installed on cluster when creating cluster from dockerfile

I'm trying to use a Dockerfile to create a cluster that has Robyn (https://facebookexperimental.github.io/Robyn/) and other R libraries installed, but the R libraries fail to install on the cluster. When I run the container in interactive mod...

Latest Reply

workingtogetdbw
New Contributor II

06-01-2023 10:32:40 AM

0 kudos

What, there has been no answer here? @Debayan Mukherjee @Vartika Nain I am running into this same problem. Having to wait 45 minutes for libraries to install is absolutely wild, and I have done everything outside of working w...

by Dave_Nithio • Contributor II

11-01-2022 2:03:11 PM

10583 Views
1 reply
3 kudos

Delta Live Table Schema Error

I'm using Delta Live Tables to load a set of CSV files in a directory. I am pre-defining the schema to avoid issues with schema inference. This works with Auto Loader on a regular Delta table, but fails for Delta Live Tables. Below is an example ...

Latest Reply

shagun
New Contributor III

06-01-2023 6:39:19 AM

3 kudos

I was facing a similar issue loading JSON files through Auto Loader for Delta Live Tables. I was able to fix it with this option: .option("cloudFiles.inferColumnTypes", "True"). From the docs: "For formats that don’t encode data types (JSON and CSV), Auto Load...

by Kannan1206 • New Contributor II

05-20-2023 8:09:03 PM

2835 Views
4 replies
0 kudos

Databricks Certification Exam Got Suspended. Need help in resolving the issue

Hi Team, I took the online exam for Databricks Certified Associate Developer for Apache Spark 3.0 - Python on 21-May-2023 at 6:30. During the exam my session was suspended by the proctor even though I was in my seat and looking at the camera. Again I cou...

Latest Reply

Kannan1206
New Contributor II

06-01-2023 3:41:24 AM

0 kudos

Hi @Vidula Khanna, I got the relevant details from the team and was able to complete the certification as well. Thanks for the help.

by sindh • New Contributor II

05-25-2023 5:32:01 AM

2695 Views
3 replies
0 kudos

session suspended , for the databricks exam , how to restart it.

Session suspended, please enable the launch option.

Latest Reply

Anonymous
Not applicable

06-01-2023 1:41:15 AM

0 kudos

Hi @sindhu goyal, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedback...

by Enzo_Bahrami • New Contributor III

05-25-2023 5:53:30 PM

4809 Views
2 replies
0 kudos

Resolved! Input File Path from Autoloader in Delta Live Tables

Hello everyone! I was wondering if there is any way to get the subdirectories in which the file resides while loading with Auto Loader in DLT. For example: def customer(): return ( spark.readStream.format('cloudfiles') .option('clou...

Latest Reply

Anonymous
Not applicable

06-01-2023 1:37:22 AM

0 kudos

Hi @Parsa Bahraminejad, we haven't heard from you since the last response from @Vigneshraja Palaniraj, and I was checking back to see if her suggestions helped you. Otherwise, if you have any solution, please share it with the community, as it can be...

by ros • New Contributor III

05-31-2023 12:47:59 AM

3697 Views
2 replies
2 kudos

merge vs MERGE INTO

From the 10.4 LTS version onward we have low shuffle merge, so merge is faster. But what about the MERGE INTO statement that we run in a SQL notebook in Databricks? Is there any performance difference when we use the Databricks PySpark ".merge" function vs. Databricks...

Latest Reply

Anonymous
Not applicable

06-01-2023 12:10:35 AM

2 kudos

Hi @Roshan RC, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers you...

by erickeniuk • New Contributor II

05-30-2023 11:47:15 AM

3429 Views
2 replies
1 kudos

Search for Databricks Jobs By Name

The Databricks CLI can list jobs by exact name using “databricks jobs list --name my_job”. Is there a way to search for jobs using this same method, where I could give a partial job name and get all the jobs that match? Ex: “databri...

Latest Reply

Anonymous
Not applicable

05-31-2023 8:17:40 PM

1 kudos

Hi @Eric Keniuk, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers y...
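For the partial-name search asked about in this thread, one possible workaround is to fetch the full job list as JSON and filter it client-side. The sketch below is an illustration under assumptions, not a CLI feature: it assumes each job record carries its name under `settings.name` (the shape used in Jobs API list responses), and the sample records and the helper name `find_jobs_by_partial_name` are hypothetical.

```python
def find_jobs_by_partial_name(jobs, fragment):
    """Return the job records whose name contains `fragment`, ignoring case."""
    needle = fragment.lower()
    return [
        job
        for job in jobs
        if needle in job.get("settings", {}).get("name", "").lower()
    ]


# Hypothetical records standing in for a parsed job listing.
sample_jobs = [
    {"job_id": 1, "settings": {"name": "my_job"}},
    {"job_id": 2, "settings": {"name": "my_job_backfill"}},
    {"job_id": 3, "settings": {"name": "nightly_report"}},
]

matches = find_jobs_by_partial_name(sample_jobs, "my_job")
print([job["job_id"] for job in matches])
```

Filtering on the client keeps the approach independent of which name filters the CLI or API supports, at the cost of listing every job first.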

by Nishant1307056 • New Contributor

05-31-2023 7:05:53 AM

1421 Views
0 replies
0 kudos

I have completed the "Lakehouse Fundamentals" course and assessment and received the certificate instantly. How long will it take for the Ba...

I have completed the "Lakehouse Fundamentals" course and assessment and received the certificate instantly. How long will it take for the badge to be generated, or what is the process to get it?

by vijaykumarbotla • New Contributor III

05-29-2023 9:15:23 AM

6686 Views
5 replies
1 kudos

Resolved! Getting error: AnalysisException: Column Is There a PO#17748 are ambiguous. It's probably because you joined several Datasets together, and some of these Datasets are the same. This column points to one of the Datasets but Spark.

AnalysisException: Column Is There a PO#17748 are ambiguous. It's probably because you joined several Datasets together, and some of these Datasets are the same. This column points to one of the Datasets but Spark is unable to figure out which one. ...

Latest Reply

vijaykumarbotla
New Contributor III

05-31-2023 6:56:24 AM

1 kudos

Hi all, the solution to this problem is very strange: it was caused by the version of the Databricks Runtime. We are using Runtime version 7.0 with Apache Spark 3.0.0. In PRD we are using Runtime version 11.3 LTS with Apache Spark 3.3.0 ver...

by darioAnt • New Contributor II

05-31-2023 1:37:33 AM

2585 Views
1 reply
2 kudos

Filtering delta table by CONCAT of a partition column and a non-partition one

Hi, I know that filtering a Delta table on a partition column is a very powerful, time-saving approach, but what if this column appears inside a CONCAT in the WHERE clause? To explain my case: I have a Delta table with only one partition column, say called co...

Latest Reply

darioAnt
New Contributor II

05-31-2023 6:21:20 AM

2 kudos

I ran a test myself and the answer is no: with a CONCAT filter, Spark SQL does not know I am filtering on a partition column, so it scans the whole table.
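Given that finding, one possible workaround is to split the concatenated key back into its two parts and compare each column directly, so that the partition column appears on its own in the predicate. The sketch below only builds such a WHERE clause as a string. It assumes, hypothetically, that the partition value has a known fixed length, and the column names `part_col` and `other_col` are placeholders, since the real names are not given in the thread.

```python
def build_split_filter(partition_col, other_col, key, partition_len):
    """Build a WHERE clause comparing the two columns separately.

    Assumes the first `partition_len` characters of `key` are the
    partition value and the remainder belongs to the other column.
    """

    def quote(value):
        # Escape single quotes so the value is a valid SQL string literal.
        return "'" + value.replace("'", "''") + "'"

    part_value = key[:partition_len]
    other_value = key[partition_len:]
    return (
        f"{partition_col} = {quote(part_value)} "
        f"AND {other_col} = {quote(other_value)}"
    )


clause = build_split_filter("part_col", "other_col", "AB123", 2)
print(clause)  # part_col = 'AB' AND other_col = '123'
```

If the partition value does not have a fixed length, the split rule would need to change (for example, a separator character), which this sketch does not cover.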
