Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Phani1
by Databricks MVP
  • 4242 Views
  • 3 replies
  • 0 kudos

Performance issue while loading bulk data into PostgreSQL from Databricks.

We are facing a performance issue while loading bulk data into a PostgreSQL database from Databricks. We are using Spark JDBC connections to move the data. However, the rate of transfer is very low, which is causing a performance bottleneck. Is there any better...

Latest Reply
User16502773013
Databricks Employee
  • 0 kudos

Hello @Janga Reddy, @Daniel Sahal, and @Vidula Khanna, To enhance performance in general we need to design for more parallelism; in a Spark JDBC context this is controlled by the number of partitions for the data to be written. The example here shows how t...
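To illustrate the partitioning advice above, here is a minimal sketch of the option map for a parallel Spark JDBC write. The URL, table, and credential values are hypothetical, and numPartitions/batchsize are illustrative defaults that should be tuned to what the Postgres server can absorb.

```python
def jdbc_write_options(url, table, user, password,
                       num_partitions=8, batch_size=10_000):
    """Build the option map for a parallel Spark JDBC write."""
    return {
        "url": url,                          # e.g. jdbc:postgresql://host:5432/mydb
        "dbtable": table,
        "user": user,
        "password": password,
        # Spark opens one JDBC connection per partition, so this caps
        # the write parallelism:
        "numPartitions": str(num_partitions),
        # Rows sent per INSERT batch; larger batches mean fewer round trips:
        "batchsize": str(batch_size),
    }

# On a cluster this would be used roughly as (not executed here):
# (df.repartition(8)
#    .write.format("jdbc")
#    .options(**jdbc_write_options("jdbc:postgresql://db-host:5432/mydb",
#                                  "public.target", "writer", "secret"))
#    .mode("append")
#    .save())
```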

2 More Replies
Avvar2022
by Contributor
  • 4764 Views
  • 2 replies
  • 2 kudos

Resolved! I am new to Databricks. Setting up a workspace for a non-prod environment: separate workspaces for DEV and QA, or just one workspace for non-prod?

What I learned based on learning materials, documents, etc.: for Databricks it is a good practice to set up one non-prod workspace but separate clusters for Dev, QA, SIT, etc. Is it best practice to set up only one non-prod workspace instead of separate ...

Latest Reply
Avvar2022
Contributor
  • 2 kudos

Thank you. This helps.

1 More Replies
Arnold_Souza
by New Contributor III
  • 6098 Views
  • 4 replies
  • 2 kudos

SAT - Security Analysis Tool implementation error

I want to implement SAT in my workspace account. I was able to execute the Terraform that enables the necessary infra. When I try to execute the workflow "SAT Initializer Notebook (one-time)", it fails with the error: AnalysisException: ...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Arnold Souza, Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers ...

3 More Replies
Hubert-Dudek
by Databricks MVP
  • 4795 Views
  • 1 reply
  • 7 kudos

SQL cells in Databricks notebooks can now be run in parallel, which means faster query processing and analysis. This new feature is especially helpful...

SQL cells in Databricks notebooks can now be run in parallel, which means faster query processing and analysis. This new feature is especially helpful for queries that take longer to run or analyze large datasets. With parallel processing, Databricks...

Latest Reply
Rishabh-Pandey
Databricks MVP
  • 7 kudos

Informative!

oleole
by Contributor
  • 16668 Views
  • 1 reply
  • 1 kudos

Resolved! MERGE to update a column of a table using Spark SQL

Coming from an MS SQL background, I'm trying to write a query in Spark SQL that simply updates a column value of table A (the source table) by INNER JOINing a new table B with a filter. The MS SQL query looks like this: UPDATE T SET T.OfferAmount = OSE.EndpointEve...

Latest Reply
oleole
Contributor
  • 1 kudos

Posting the answer to my question:

MERGE INTO TempOffer VIEW
USING OfferSeq OSE
ON VIEW.OfferId = OSE.OfferID AND OSE.OfferId = 1
WHEN MATCHED THEN UPDATE SET VIEW.OfferAmount = OSE.EndpointEventAmountValue;

RyanHager
by Contributor
  • 3721 Views
  • 5 replies
  • 2 kudos

Is there a stream / Kafka topic that we can connect to for monitoring all Databricks jobs/workflows (create/status update/fail/error/complete)?

Currently, we are creating and monitoring jobs using the API. This results in a lot of polling of the API for job status. Is there a Kafka stream we could listen to for job updates and significantly reduce the number of calls to the Databricks Jobs...
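Absent a push-based event feed, one way to cut the API call volume described above is to poll each run with exponential backoff rather than a tight fixed-interval loop. This is a sketch: fetch_run_state is a stand-in for a call like GET /api/2.1/jobs/runs/get, and the terminal state names follow the Jobs API life-cycle states.

```python
import time

def poll_until_done(fetch_run_state, initial=1.0, factor=2.0, cap=60.0,
                    sleep=time.sleep):
    """Poll a job run, doubling the wait each time, until it reaches a
    terminal life-cycle state."""
    delay = initial
    while True:
        state = fetch_run_state()
        if state in ("TERMINATED", "SKIPPED", "INTERNAL_ERROR"):
            return state
        sleep(min(delay, cap))   # wait, but never longer than the cap
        delay *= factor

# Simulated run: sleeps are recorded instead of actually waiting.
states = iter(["PENDING", "RUNNING", "RUNNING", "TERMINATED"])
waits = []
result = poll_until_done(lambda: next(states), sleep=waits.append)
assert result == "TERMINATED"
assert waits == [1.0, 2.0, 4.0]   # backoff: 1s, 2s, 4s between polls
```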

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Ryan Hager, Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we...

4 More Replies
Ramana
by Valued Contributor
  • 4087 Views
  • 3 replies
  • 3 kudos

Resolved! How do we set spark_version in cluster policies to select the latest GPU ML LTS version as defaultValue?

Currently, I use the below two different JSON snippets to choose either the Standard or the ML runtime. Similar to the below, what is the defaultValue for spark_version to select the latest GPU ML LTS runtime version? "spark_version": {  "type": "regex",  "p...

Latest Reply
LandanG
Databricks Employee
  • 3 kudos

Hi @Ramana Kancharana, As of right now these options are only available for non-GPU DBRs.
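Since an auto-selecting defaultValue is not available for GPU runtimes, a regex-based policy fragment can at least restrict spark_version to GPU ML runtimes. This is a sketch: the pattern is an assumption based on runtime names of the form "13.3.x-gpu-ml-scala2.12".

```python
import re

# Hypothetical cluster-policy fragment: constrain spark_version to GPU ML
# runtimes by regex (no defaultValue auto-selection, per the reply above).
policy_fragment = {
    "spark_version": {
        "type": "regex",
        "pattern": r"\d+\.\d+\.x-gpu-ml-scala.*",
    }
}

# The pattern accepts GPU ML runtime names...
assert re.fullmatch(policy_fragment["spark_version"]["pattern"],
                    "13.3.x-gpu-ml-scala2.12")
# ...and rejects plain Standard runtime names.
assert not re.fullmatch(policy_fragment["spark_version"]["pattern"],
                        "13.3.x-scala2.12")
```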

2 More Replies
irfanaziz
by Contributor II
  • 5590 Views
  • 1 reply
  • 3 kudos

TimestampFormat issue

The Databricks notebook failed yesterday due to a timestamp format issue. Error: "SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to parse '2022-08-10 00:00:14.2760000' in the new parser. You can set spark.s...

Latest Reply
searchs
New Contributor II
  • 3 kudos

You must have solved this issue by now, but for the sake of those who encounter it again, here's the solution that worked for me: spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")
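The value in the error has seven fractional-second digits, while microsecond-precision patterns expect at most six, which is why the new parser rejects it. Besides the LEGACY flag, a sketch of an alternative is to trim the fraction before parsing; the helper name here is hypothetical.

```python
from datetime import datetime

raw = "2022-08-10 00:00:14.2760000"   # seven fractional digits

def trim_fraction(ts, digits=6):
    """Truncate the fractional-second part of a timestamp string to at
    most `digits` digits (microsecond precision)."""
    head, _, frac = ts.partition(".")
    return f"{head}.{frac[:digits]}" if frac else head

# After trimming, the value parses with a standard microsecond pattern.
parsed = datetime.strptime(trim_fraction(raw), "%Y-%m-%d %H:%M:%S.%f")
assert parsed.microsecond == 276000
```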

yzhang
by Contributor
  • 4247 Views
  • 5 replies
  • 0 kudos

Cannot find info on whether Databricks supports nested jobs or tasks. For example, I have a 'job_a', which contains a list of tasks, and another...

Cannot find info on whether Databricks supports nested jobs or tasks. For example, I have a 'job_a', which contains a list of tasks, and another 'job_b', which also contains a list of tasks. Now I'd like to have a 'job_all' that will run both 'job_a' and 'job_b...
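Orchestration of this kind can be expressed with the Jobs API 2.1 run_job_task task type, where one job's tasks trigger other jobs. A sketch of what a 'job_all' definition might look like; the job IDs 101 and 102 are hypothetical placeholders for job_a and job_b.

```python
# Hypothetical "job_all" definition: two run_job_task tasks, the second
# depending on the first, so job_a and job_b run in sequence.
job_all = {
    "name": "job_all",
    "tasks": [
        {"task_key": "run_job_a",
         "run_job_task": {"job_id": 101}},      # job_a (hypothetical ID)
        {"task_key": "run_job_b",
         "depends_on": [{"task_key": "run_job_a"}],
         "run_job_task": {"job_id": 102}},      # job_b (hypothetical ID)
    ],
}

# Dropping the depends_on entry would let both child jobs run in parallel.
assert [t["task_key"] for t in job_all["tasks"]] == ["run_job_a", "run_job_b"]
```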

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Yanan Zhang, Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the response and select the one that best answers yo...

4 More Replies
Chris_Shehu
by Valued Contributor III
  • 6090 Views
  • 4 replies
  • 2 kudos

Resolved! No Explicit Deny for User security configurations at the group level?

Currently, when you add new users to the Databricks workspace, they get added to a "Users" group that has full access to the workspace. There should be a way to use group security to explicitly deny access to those same settings. This setting should ov...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@dean james I am not sure why, in your case, you want to deny access to the group once you create it. Anyhow, we can deactivate/activate a user using the "2.0/preview/scim/v2/Users/{id}" REST API endpoint. We can also deactivate users that have no...
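A sketch of the request body for the SCIM endpoint mentioned above, assuming the standard SCIM 2.0 PatchOp shape; the workspace URL and user ID in the comment are hypothetical.

```python
import json

# Hypothetical PATCH body to deactivate a user via
# PATCH {workspace-url}/api/2.0/preview/scim/v2/Users/{id}
payload = {
    "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
    "Operations": [
        {"op": "replace",
         "path": "active",
         "value": False},   # False deactivates; True reactivates
    ],
}

body = json.dumps(payload)          # serialized JSON sent as the request body
assert json.loads(body)["Operations"][0]["value"] is False
```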

3 More Replies
andrew0117
by Contributor
  • 6487 Views
  • 4 replies
  • 0 kudos

Resolved! Can merge() function be applied to dataframe?

If I have two dataframes df_target and df_source, can I do df_target.as("t").merge(df_source.as("s"), "s.id = t.id").whenMatched().updateAll().whenNotMatched().insertAll().execute()? When I tried the code above, I got the error "merge is not a member of the...
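The error arises because the DataFrame API itself has no merge(); in Delta Lake, MERGE is exposed on a DeltaTable (roughly: DeltaTable.forName(spark, "target_table").alias("t").merge(df_source.alias("s"), "s.id = t.id").whenMatchedUpdateAll().whenNotMatchedInsertAll().execute() — table name hypothetical). The upsert semantics can be sketched in plain Python:

```python
def merge_upsert(target, source, key="id"):
    """Mimic MERGE semantics: whenMatched -> update all columns,
    whenNotMatched -> insert the source row."""
    merged = {row[key]: row for row in target}
    for row in source:
        merged[row[key]] = row       # update if matched, insert otherwise
    return sorted(merged.values(), key=lambda r: r[key])

target = [{"id": 1, "v": "old"}, {"id": 2, "v": "keep"}]
source = [{"id": 1, "v": "new"}, {"id": 3, "v": "ins"}]
assert merge_upsert(target, source) == [
    {"id": 1, "v": "new"},           # matched: updated
    {"id": 2, "v": "keep"},          # untouched target row
    {"id": 3, "v": "ins"},           # not matched: inserted
]
```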

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @andrew li, Hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks!

3 More Replies
JJL
by New Contributor II
  • 20586 Views
  • 3 replies
  • 3 kudos

Resolved! Can Spark SQL perform an UPDATE with an INNER JOIN and LIKE with '%' + [column] + '%'?

Hi All, I came from MS SQL and just started learning more about Spark SQL. Here is one part that I'm trying to perform. In MS SQL it can be easily done, but it seems like it can't in Spark. So, I want to make a simple update to the record, if the co...

Latest Reply
oleole
Contributor
  • 3 kudos

@Hubert Dudek Hello, I'm having the same issue with using UPDATE in Spark SQL and came across your answer. When you say "replace source_table_reference with view" in MERGE, do you mean to replace "P" with "VIEW", so that it looks something like below: %sql ME...

2 More Replies
Anonymous
by Not applicable
  • 6396 Views
  • 1 reply
  • 1 kudos

Databricks-connect configured with a service principal token but unable to retrieve information to the local machine

I installed databricks-connect and configured it with a service principal token. I am able to start the cluster when I use spark = SparkSession.builder.getOrCreate(). But when trying to retrieve S3 bucket data to my local machine, or even when I run a test command, ex...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @divya08, Hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks!

Gaurav_Raj
by New Contributor III
  • 4027 Views
  • 3 replies
  • 3 kudos

Resolved! Lakehouse Fundamentals Accreditation Badge not received after the course completion

I completed the Databricks Lakehouse Fundamentals Accreditation course today, but I haven't received my badge yet. I even checked https://credentials.databricks.com/, but it shows no record/credentials. See the screenshot below. Please help me out with...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Gaurav Raj, Thank you for posting your question in our community! We are happy to assist you. Every best answer marked contributes to the growth and success of our community. Regards

2 More Replies
RengarLee
by Contributor
  • 13270 Views
  • 10 replies
  • 3 kudos

Resolved! Databricks writes to Azure Data Explorer suddenly become slower

Now I write to Azure Data Explorer using Spark Streaming. One day, writes suddenly became slower, and a restart had no effect. I have a question about Spark Streaming to Azure Data Explorer. Q1: What should I do to get performance to recover? Figure 1 shows th...

Latest Reply
RengarLee
Contributor
  • 3 kudos

I'm so sorry, I just thought the issue wasn't resolved. Solution: set maxFilesPerTrigger and maxBytesPerTrigger, and enable autoOptimize. Reason: on the first day it processes larger files and then eventually processes smaller files. Detailed reason: B...
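The two options named in the solution cap how much data each streaming micro-batch ingests, which prevents a backlog of large files from producing oversized, slow batches. A minimal sketch of the option map; the values are illustrative and should be tuned to the workload.

```python
# Hypothetical rate-limit options for a Delta streaming source read.
stream_options = {
    "maxFilesPerTrigger": "100",   # at most 100 files per micro-batch
    "maxBytesPerTrigger": "1g",    # soft cap on bytes per micro-batch
}

# On a cluster this would be applied roughly as (not executed here):
# spark.readStream.format("delta").options(**stream_options).load(source_path)
assert int(stream_options["maxFilesPerTrigger"]) == 100
```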

9 More Replies