Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Phani1
by Databricks MVP
  • 4242 Views
  • 3 replies
  • 0 kudos

Performance issue while loading bulk data into PostgreSQL from Databricks.

We are facing a performance issue while loading bulk data into a PostgreSQL database from Databricks. We are using Spark JDBC connections to move the data. However, the rate of transfer is very low, which is causing a performance bottleneck. Is there any better...

Latest Reply
User16502773013
Databricks Employee
  • 0 kudos

Hello @Janga Reddy, @Daniel Sahal, and @Vidula Khanna, To enhance performance in general we need to design for more parallelism; in a Spark JDBC context this is controlled by the number of partitions for the data to be written. The example here shows how t...
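To illustrate the partitioning advice above, here is a minimal sketch of the option map for a parallel Spark JDBC write. The URL, table, and credential values are hypothetical, and numPartitions/batchsize are illustrative defaults that should be tuned to what the Postgres server can absorb.

```python
def jdbc_write_options(url, table, user, password,
                       num_partitions=8, batch_size=10_000):
    """Build the option map for a parallel Spark JDBC write."""
    return {
        "url": url,                          # e.g. jdbc:postgresql://host:5432/mydb
        "dbtable": table,
        "user": user,
        "password": password,
        # Spark opens one JDBC connection per partition, so this caps
        # the write parallelism:
        "numPartitions": str(num_partitions),
        # Rows sent per INSERT batch; larger batches mean fewer round trips:
        "batchsize": str(batch_size),
    }

# On a cluster this would be used roughly as (not executed here):
# (df.repartition(8)
#    .write.format("jdbc")
#    .options(**jdbc_write_options("jdbc:postgresql://db-host:5432/mydb",
#                                  "public.target", "writer", "secret"))
#    .mode("append")
#    .save())
```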

2 More Replies
Avvar2022
by Contributor
  • 4764 Views
  • 2 replies
  • 2 kudos

Resolved! I am new to Databricks. Setting up a workspace for a non-prod environment: separate workspaces for DEV and QA, or just one workspace for non-prod?

What I learned based on learning materials, documents, etc.: for Databricks it is a good practice to set up one non-prod workspace but separate clusters for Dev, QA, SIT, etc. Is it best practice to set up only one non-prod workspace instead of separate ...

Latest Reply
Avvar2022
Contributor
  • 2 kudos

Thank you. This helps.

1 More Replies
Arnold_Souza
by New Contributor III
  • 6098 Views
  • 4 replies
  • 2 kudos

SAT - Security Analysis Tool implementation error

I want to implement SAT in my workspace account. I was able to execute the Terraform that enables the necessary infra. When I try to execute the workflow "SAT Initializer Notebook (one-time)", it fails with the error: AnalysisException: ...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Arnold Souza, Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers ...

3 More Replies
Hubert-Dudek
by Databricks MVP
  • 4795 Views
  • 1 reply
  • 7 kudos

SQL cells in Databricks notebooks can now be run in parallel, which means faster query processing and analysis. This new feature is especially helpful...

SQL cells in Databricks notebooks can now be run in parallel, which means faster query processing and analysis. This new feature is especially helpful for queries that take longer to run or analyze large datasets. With parallel processing, Databricks...

Latest Reply
Rishabh-Pandey
Databricks MVP
  • 7 kudos

Informative!

oleole
by Contributor
  • 16668 Views
  • 1 reply
  • 1 kudos

Resolved! MERGE to update a column of a table using Spark SQL

Coming from an MS SQL background, I'm trying to write a query in Spark SQL that simply updates a column value of table A (the source table) by INNER JOINing a new table B with a filter. The MS SQL query looks like this: UPDATE T SET T.OfferAmount = OSE.EndpointEve...

Latest Reply
oleole
Contributor
  • 1 kudos

Posting the answer to my question:

MERGE INTO TempOffer VIEW
USING OfferSeq OSE
ON VIEW.OfferId = OSE.OfferID AND OSE.OfferId = 1
WHEN MATCHED THEN UPDATE SET VIEW.OfferAmount = OSE.EndpointEventAmountValue;

RyanHager
by Contributor
  • 3721 Views
  • 5 replies
  • 2 kudos

Is there a stream / Kafka topic that we can connect to for monitoring all Databricks jobs/workflows (create/status update/fail/error/complete)?

Currently, we are creating and monitoring jobs using the API. This results in a lot of polling of the API for job status. Is there a Kafka stream we could listen to for job updates and significantly reduce the number of calls to the Databricks Jobs...
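Absent a push-based event feed, one way to cut the API call volume described above is to poll each run with exponential backoff rather than a tight fixed-interval loop. This is a sketch: fetch_run_state is a stand-in for a call like GET /api/2.1/jobs/runs/get, and the terminal state names follow the Jobs API life-cycle states.

```python
import time

def poll_until_done(fetch_run_state, initial=1.0, factor=2.0, cap=60.0,
                    sleep=time.sleep):
    """Poll a job run, doubling the wait each time, until it reaches a
    terminal life-cycle state."""
    delay = initial
    while True:
        state = fetch_run_state()
        if state in ("TERMINATED", "SKIPPED", "INTERNAL_ERROR"):
            return state
        sleep(min(delay, cap))   # wait, but never longer than the cap
        delay *= factor

# Simulated run: sleeps are recorded instead of actually waiting.
states = iter(["PENDING", "RUNNING", "RUNNING", "TERMINATED"])
waits = []
result = poll_until_done(lambda: next(states), sleep=waits.append)
assert result == "TERMINATED"
assert waits == [1.0, 2.0, 4.0]   # backoff: 1s, 2s, 4s between polls
```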

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Ryan Hager, Hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we...

4 More Replies
Ramana
by Valued Contributor
  • 4087 Views
  • 3 replies
  • 3 kudos

Resolved! How do we set spark_version in cluster policies to select the latest GPU ML LTS version as defaultValue?

Currently, I use the below two different JSON snippets to choose either the Standard or the ML runtime. Similar to the below, what is the defaultValue for spark_version to select the latest GPU ML LTS runtime version? "spark_version": {  "type": "regex",  "p...

Latest Reply
LandanG
Databricks Employee
  • 3 kudos

Hi @Ramana Kancharana, As of right now these options are only available for non-GPU DBRs.
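Since an auto-selecting defaultValue is not available for GPU runtimes, a regex-based policy fragment can at least restrict spark_version to GPU ML runtimes. This is a sketch: the pattern is an assumption based on runtime names of the form "13.3.x-gpu-ml-scala2.12".

```python
import re

# Hypothetical cluster-policy fragment: constrain spark_version to GPU ML
# runtimes by regex (no defaultValue auto-selection, per the reply above).
policy_fragment = {
    "spark_version": {
        "type": "regex",
        "pattern": r"\d+\.\d+\.x-gpu-ml-scala.*",
    }
}

# The pattern accepts GPU ML runtime names...
assert re.fullmatch(policy_fragment["spark_version"]["pattern"],
                    "13.3.x-gpu-ml-scala2.12")
# ...and rejects plain Standard runtime names.
assert not re.fullmatch(policy_fragment["spark_version"]["pattern"],
                        "13.3.x-scala2.12")
```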

2 More Replies
irfanaziz
by Contributor II
  • 5590 Views
  • 1 reply
  • 3 kudos

TimestampFormat issue

The Databricks notebook failed yesterday due to a timestamp format issue. Error: "SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to parse '2022-08-10 00:00:14.2760000' in the new parser. You can set spark.s...

Latest Reply
searchs
New Contributor II
  • 3 kudos

You must have solved this issue by now, but for the sake of those who encounter it again, here's the solution that worked for me: spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")
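The value in the error has seven fractional-second digits, while microsecond-precision patterns expect at most six, which is why the new parser rejects it. Besides the LEGACY flag, a sketch of an alternative is to trim the fraction before parsing; the helper name here is hypothetical.

```python
from datetime import datetime

raw = "2022-08-10 00:00:14.2760000"   # seven fractional digits

def trim_fraction(ts, digits=6):
    """Truncate the fractional-second part of a timestamp string to at
    most `digits` digits (microsecond precision)."""
    head, _, frac = ts.partition(".")
    return f"{head}.{frac[:digits]}" if frac else head

# After trimming, the value parses with a standard microsecond pattern.
parsed = datetime.strptime(trim_fraction(raw), "%Y-%m-%d %H:%M:%S.%f")
assert parsed.microsecond == 276000
```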

yzhang
by Contributor
  • 4247 Views
  • 5 replies
  • 0 kudos

Cannot find info on whether Databricks supports nested jobs or tasks. For example, I have a 'job_a', which contains a list of tasks, and another...

Cannot find info on whether Databricks supports nested jobs or tasks. For example, I have a 'job_a', which contains a list of tasks, and another 'job_b', which also contains a list of tasks. Now I'd like to have a 'job_all' that will run both 'job_a' and 'job_b...
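Orchestration of this kind can be expressed with the Jobs API 2.1 run_job_task task type, where one job's tasks trigger other jobs. A sketch of what a 'job_all' definition might look like; the job IDs 101 and 102 are hypothetical placeholders for job_a and job_b.

```python
# Hypothetical "job_all" definition: two run_job_task tasks, the second
# depending on the first, so job_a and job_b run in sequence.
job_all = {
    "name": "job_all",
    "tasks": [
        {"task_key": "run_job_a",
         "run_job_task": {"job_id": 101}},      # job_a (hypothetical ID)
        {"task_key": "run_job_b",
         "depends_on": [{"task_key": "run_job_a"}],
         "run_job_task": {"job_id": 102}},      # job_b (hypothetical ID)
    ],
}

# Dropping the depends_on entry would let both child jobs run in parallel.
assert [t["task_key"] for t in job_all["tasks"]] == ["run_job_a", "run_job_b"]
```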

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Yanan Zhang, Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the response and select the one that best answers yo...

4 More Replies
Chris_Shehu
by Valued Contributor III
  • 6090 Views
  • 4 replies
  • 2 kudos

Resolved! No Explicit Deny for User security configurations at the group level?

Currently, when you add new users to the Databricks workspace, they get added to a "Users" group that has full access to the workspace. There should be a way to use group security to explicitly deny access to those same settings. This setting should ov...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@dean james I am not sure why, in your case, you want to deny access to the group once you create it. Anyhow, we can deactivate/activate a user using the "2.0/preview/scim/v2/Users/{id}" REST API endpoint. We can also deactivate users that have no...
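A sketch of the request body for the SCIM endpoint mentioned above, assuming the standard SCIM 2.0 PatchOp shape; the workspace URL and user ID in the comment are hypothetical.

```python
import json

# Hypothetical PATCH body to deactivate a user via
# PATCH {workspace-url}/api/2.0/preview/scim/v2/Users/{id}
payload = {
    "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
    "Operations": [
        {"op": "replace",
         "path": "active",
         "value": False},   # False deactivates; True reactivates
    ],
}

body = json.dumps(payload)          # serialized JSON sent as the request body
assert json.loads(body)["Operations"][0]["value"] is False
```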

3 More Replies
andrew0117
by Contributor
  • 6487 Views
  • 4 replies
  • 0 kudos

Resolved! Can merge() function be applied to dataframe?

If I have two dataframes df_target and df_source, can I do df_target.as("t").merge(df_source.as("s"), "s.id = t.id").whenMatched().updateAll().whenNotMatched().insertAll().execute()? When I tried the code above, I got the error "merge is not a member of the...
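The error arises because the DataFrame API itself has no merge(); in Delta Lake, MERGE is exposed on a DeltaTable (roughly: DeltaTable.forName(spark, "target_table").alias("t").merge(df_source.alias("s"), "s.id = t.id").whenMatchedUpdateAll().whenNotMatchedInsertAll().execute() — table name hypothetical). The upsert semantics can be sketched in plain Python:

```python
def merge_upsert(target, source, key="id"):
    """Mimic MERGE semantics: whenMatched -> update all columns,
    whenNotMatched -> insert the source row."""
    merged = {row[key]: row for row in target}
    for row in source:
        merged[row[key]] = row       # update if matched, insert otherwise
    return sorted(merged.values(), key=lambda r: r[key])

target = [{"id": 1, "v": "old"}, {"id": 2, "v": "keep"}]
source = [{"id": 1, "v": "new"}, {"id": 3, "v": "ins"}]
assert merge_upsert(target, source) == [
    {"id": 1, "v": "new"},           # matched: updated
    {"id": 2, "v": "keep"},          # untouched target row
    {"id": 3, "v": "ins"},           # not matched: inserted
]
```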

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @andrew li, Hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks!

3 More Replies
JJL
by New Contributor II
  • 20586 Views
  • 3 replies
  • 3 kudos

Resolved! Can Spark SQL perform an UPDATE with an INNER JOIN and LIKE with '%' + [column] + '%'?

Hi All, I came from MS SQL and just started learning more about Spark SQL. Here is one part that I'm trying to perform. In MS SQL it can be easily done, but it seems like it can't in Spark. So, I want to make a simple update to the record, if the co...

Latest Reply
oleole
Contributor
  • 3 kudos

@Hubert Dudek Hello, I'm having the same issue with using UPDATE in Spark SQL and came across your answer. When you say "replace source_table_reference with view" in MERGE, do you mean to replace "P" with "VIEW", so that it looks something like below: %sql ME...

2 More Replies
Anonymous
by Not applicable
  • 6396 Views
  • 1 reply
  • 1 kudos

Databricks-connect configured with a service principal token but unable to retrieve information to the local machine

I installed databricks-connect and configured it with a service principal token. I am able to start the cluster when I use spark = SparkSession.builder.getOrCreate(). But when trying to retrieve S3 bucket data to my local machine, or even when I run a test command, ex...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @divya08, Hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks!

Gaurav_Raj
by New Contributor III
  • 4027 Views
  • 3 replies
  • 3 kudos

Resolved! Lakehouse Fundamentals Accreditation Badge not received after the course completion

I completed the Databricks Lakehouse Fundamentals Accreditation course today, but I haven't received my badge yet. I even checked https://credentials.databricks.com/, but it shows no record/credentials. See the screenshot below. Please help me out with...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Gaurav Raj, Thank you for posting your question in our community! We are happy to assist you. Every best answer marked contributes to the growth and success of our community. Regards

2 More Replies
RengarLee
by Contributor
  • 13270 Views
  • 10 replies
  • 3 kudos

Resolved! Databricks writes to Azure Data Explorer suddenly become slower

Now I write to Azure Data Explorer using Spark Streaming. One day, writes suddenly became slower, and a restart had no effect. I have a question about Spark Streaming to Azure Data Explorer. Q1: What should I do to get performance to recover? Figure 1 shows th...

Latest Reply
RengarLee
Contributor
  • 3 kudos

I'm so sorry, I just thought the issue wasn't resolved. Solution: set maxFilesPerTrigger and maxBytesPerTrigger, and enable autoOptimize. Reason: on the first day it processes larger files and then eventually processes smaller files. Detailed reason: B...
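The two options named in the solution cap how much data each streaming micro-batch ingests, which prevents a backlog of large files from producing oversized, slow batches. A minimal sketch of the option map; the values are illustrative and should be tuned to the workload.

```python
# Hypothetical rate-limit options for a Delta streaming source read.
stream_options = {
    "maxFilesPerTrigger": "100",   # at most 100 files per micro-batch
    "maxBytesPerTrigger": "1g",    # soft cap on bytes per micro-batch
}

# On a cluster this would be applied roughly as (not executed here):
# spark.readStream.format("delta").options(**stream_options).load(source_path)
assert int(stream_options["maxFilesPerTrigger"]) == 100
```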

9 More Replies