Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Arnold_Souza
by New Contributor III
  • 2308 Views
  • 4 replies
  • 2 kudos

SAT - Security Analysis Tool implementation error

I want to implement SAT in my workspace account. I was able to execute the Terraform that enables the necessary infra for it. When I try to execute the workflow "SAT Initializer Notebook (one-time)", it fails with the error: AnalysisException: ...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Arnold Souza, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers ...

3 More Replies
Hubert-Dudek
by Esteemed Contributor III
  • 1885 Views
  • 1 reply
  • 6 kudos

SQL cells in Databricks notebooks can now be run in parallel, which means faster query processing and analysis. This new feature is especially helpful...

SQL cells in Databricks notebooks can now be run in parallel, which means faster query processing and analysis. This new feature is especially helpful for queries that take longer to run or analyze large datasets. With parallel processing, Databricks...
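For context, the parallelism here applies to %sql cells in the notebook UI, so no code is needed to use it. For readers on runtimes without the feature, a comparable effect can be had from a single Python cell; a minimal sketch, assuming an active SparkSession named spark and two illustrative queries against the built-in samples catalog:

    from concurrent.futures import ThreadPoolExecutor

    # Illustrative queries; each spark.sql(...).collect() runs on its own
    # thread, so Spark can schedule the two jobs concurrently.
    queries = [
        "SELECT count(*) FROM samples.nyctaxi.trips",
        "SELECT count(*) FROM samples.tpch.orders",
    ]

    with ThreadPoolExecutor(max_workers=len(queries)) as pool:
        results = list(pool.map(lambda q: spark.sql(q).collect(), queries))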

Latest Reply
Rishabh-Pandey
Honored Contributor II
  • 6 kudos

Informative

oleole
by Contributor
  • 8216 Views
  • 1 reply
  • 1 kudos

Resolved! MERGE to update a column of a table using Spark SQL

Coming from an MS SQL background, I'm trying to write a query in Spark SQL that simply updates a column value of table A (the source table) by INNER JOINing a new table B with a filter. The MS SQL query looks like this: UPDATE T SET T.OfferAmount = OSE.EndpointEve...

Latest Reply
oleole
Contributor
  • 1 kudos

Posting the answer to my question:

    MERGE INTO TempOffer VIEW
    USING OfferSeq OSE
    ON VIEW.OfferId = OSE.OfferID AND OSE.OfferId = 1
    WHEN MATCHED THEN
      UPDATE SET VIEW.OfferAmount = OSE.EndpointEventAmountValue;
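For readers with the same UPDATE-with-JOIN migration, the general shape of that MERGE run from Python looks roughly like the sketch below; the table and column names are the ones from this thread, not a tested schema:

    # UPDATE ... FROM ... INNER JOIN rewritten as a Spark SQL MERGE.
    spark.sql("""
        MERGE INTO TempOffer AS t
        USING OfferSeq AS s
        ON t.OfferId = s.OfferID AND s.OfferId = 1
        WHEN MATCHED THEN
          UPDATE SET t.OfferAmount = s.EndpointEventAmountValue
    """)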

RyanHager
by Contributor
  • 1796 Views
  • 5 replies
  • 2 kudos

Is there a stream / Kafka topic that we can connect to for monitoring all Databricks jobs/workflows (create/status update/fail/error/complete)?

Currently we are creating and monitoring jobs using the API. This results in a lot of polling of the API for job status. Is there a Kafka stream we could listen to for job updates, to significantly reduce the number of calls to the Databricks jobs...
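Pending an official event stream, one way to cut the call volume is to batch status checks through a single runs/list request instead of polling each job individually. A hedged sketch, assuming a workspace URL and PAT in environment variables and the Jobs API 2.1 runs/list endpoint:

    import os
    import requests

    # Placeholder env vars; one list call covers every active run,
    # replacing a per-job polling loop.
    host = os.environ["DATABRICKS_HOST"]
    token = os.environ["DATABRICKS_TOKEN"]

    resp = requests.get(
        f"{host}/api/2.1/jobs/runs/list",
        headers={"Authorization": f"Bearer {token}"},
        params={"active_only": "true", "limit": 25},
    )
    resp.raise_for_status()
    for run in resp.json().get("runs", []):
        print(run["run_id"], run["state"]["life_cycle_state"])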

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Ryan Hager, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we...

4 More Replies
Ramana
by Contributor
  • 1607 Views
  • 3 replies
  • 3 kudos

Resolved! How do we set spark_version in cluster policies to select the latest GPU ML LTS version as defaultValue?

Currently, I use the two JSON snippets below to choose either the Standard or ML runtime. Similar to the below, what is the defaultValue for spark_version to select the latest GPU ML LTS runtime version? "spark_version": {  "type": "regex",  "p...

Latest Reply
LandanG
Honored Contributor
  • 3 kudos

Hi @Ramana Kancharana, as of right now these options are only available for non-GPU DBRs.
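Since the auto:latest-style aliases don't cover GPU runtimes, a regex that pins allowed versions is one workaround. A hedged sketch of a policy fragment; the pattern and defaultValue are illustrative and would need to track the current GPU ML LTS release:

    "spark_version": {
      "type": "regex",
      "pattern": "1[0-9]\\.[0-9]+\\.x-gpu-ml-scala.*",
      "defaultValue": "13.3.x-gpu-ml-scala2.12"
    }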

2 More Replies
irfanaziz
by Contributor II
  • 2710 Views
  • 1 reply
  • 3 kudos

TimestampFormat issue

The Databricks notebook failed yesterday due to a timestamp format issue. Error: "SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to parse '2022-08-10 00:00:14.2760000' in the new parser. You can set spark.s...

Latest Reply
searchs
New Contributor II
  • 3 kudos

You must have solved this issue by now, but for the sake of those that encounter this again, here's the solution that worked for me: spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")
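The same setting can also go through spark.conf, and an alternative is to keep the new parser and spell out the fractional-second format explicitly. A minimal sketch; the column name and the seven-digit fraction pattern are illustrative:

    from pyspark.sql import functions as F

    # Option 1: the legacy parser, as in the reply above.
    spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")

    # Option 2: keep the new parser and give it an explicit pattern.
    df = spark.createDataFrame([("2022-08-10 00:00:14.2760000",)], ["ts_str"])
    df = df.withColumn("ts", F.to_timestamp("ts_str", "yyyy-MM-dd HH:mm:ss.SSSSSSS"))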

yzhang
by New Contributor III
  • 2005 Views
  • 5 replies
  • 0 kudos

Cannot find such info if Databricks supports nested jobs or tasks. For example, I have a 'job_a', which contains a list of tasks, and another ...

Cannot find such info if Databricks supports nested jobs or tasks. For example, I have a 'job_a', which contains a list of tasks, and another 'job_b', which also contains a list of tasks. Now I'd like to have a 'job_all' that will run both 'job_a' and 'job_b...
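One pattern worth checking here: the Jobs API supports a task type that triggers another job (a "Run Job" task), so a wrapper job can chain existing jobs. A hedged JSON sketch with placeholder job IDs:

    {
      "name": "job_all",
      "tasks": [
        { "task_key": "run_job_a", "run_job_task": { "job_id": 111 } },
        { "task_key": "run_job_b",
          "depends_on": [ { "task_key": "run_job_a" } ],
          "run_job_task": { "job_id": 222 } }
      ]
    }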

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Yanan Zhang, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the response and select the one that best answers yo...

4 More Replies
Chris_Shehu
by Valued Contributor III
  • 2296 Views
  • 4 replies
  • 2 kudos

Resolved! No Explicit Deny for User security configurations at the group level?

Currently, when you add new users to the Databricks workspace, they get added to a "Users" group that has full access to the workspace. There should be a way to use group security to explicitly deny access to those same settings. This setting should ov...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@dean james I am not sure why, in your case, you want to deny access to the group once you create it. Anyhow, we can deactivate/activate a user using the "2.0/preview/scim/v2/Users/{id}" REST API endpoint. We can also deactivate users that have no...
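For reference, deactivating a user through that SCIM endpoint might look like the sketch below; the host, token, and user ID are placeholders, and the PatchOp body follows the SCIM 2.0 spec:

    import requests

    host = "https://<workspace-url>"   # placeholder
    token = "<personal-access-token>"  # placeholder
    user_id = "123456"                 # placeholder

    requests.patch(
        f"{host}/api/2.0/preview/scim/v2/Users/{user_id}",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/scim+json",
        },
        json={
            "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
            "Operations": [{"op": "replace", "value": {"active": False}}],
        },
    )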

3 More Replies
andrew0117
by Contributor
  • 2367 Views
  • 4 replies
  • 0 kudos

Resolved! Can merge() function be applied to dataframe?

If I have two dataframes, df_target and df_source, can I do df_target.as("t").merge(df_source.as("s"), "s.id=t.id").whenMatched().updateAll().whenNotMatched.insertAll.execute()? When I tried the code above, I got the error "merge is not a member of the...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @andrew li, hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks!
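For anyone landing here later: merge isn't defined on a plain DataFrame; it lives on DeltaTable. A minimal PySpark sketch, assuming the target is a Delta table registered as target_table (an assumed name) and df_source has a matching id column:

    from delta.tables import DeltaTable

    target = DeltaTable.forName(spark, "target_table")  # assumed table name

    (target.alias("t")
           .merge(df_source.alias("s"), "s.id = t.id")
           .whenMatchedUpdateAll()
           .whenNotMatchedInsertAll()
           .execute())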

3 More Replies
JJL
by New Contributor II
  • 10271 Views
  • 3 replies
  • 3 kudos

Resolved! Can Spark SQL perform UPDATE with INNER JOIN and LIKE with '%' + [column] + '%'?

Hi All, I came from MS SQL and just started learning more about Spark SQL. Here is one part that I'm trying to perform. In MS SQL it can be easily done, but it seems like it can't be in Spark. So, I want to make a simple update to the record, if the co...

Latest Reply
oleole
Contributor
  • 3 kudos

@Hubert Dudek Hello, I'm having the same issue with using UPDATE in Spark SQL and came across your answer. When you say "replace source_table_reference with view" in MERGE, do you mean to replace "P" with "VIEW", looking something like the below: %sql ME...
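On the LIKE-with-wildcards part, the MS SQL '%' + [column] + '%' idiom translates to concat in Spark SQL, and the predicate can sit in the MERGE condition. A rough sketch with placeholder table and column names:

    # Placeholder names, for shape only.
    spark.sql("""
        MERGE INTO target_table AS t
        USING source_table AS s
        ON t.id = s.id AND t.description LIKE concat('%', s.keyword, '%')
        WHEN MATCHED THEN
          UPDATE SET t.flag = 'Y'
    """)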

2 More Replies
Anonymous
by Not applicable
  • 1488 Views
  • 2 replies
  • 2 kudos

Databricks-connect configured with service principal token but unable to retrieve information to local machine

I installed databricks-connect and configured it with a service principal token, and I am able to start the cluster when I use the command spark = SparkSession.builder.getOrCreate(). But when trying to retrieve S3 bucket data to my local machine, or even when I run a test command ex...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @divya08, hope all is well! Just wanted to check in if you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks!

1 More Replies
Gaurav_Raj
by New Contributor III
  • 1573 Views
  • 3 replies
  • 3 kudos

Resolved! Lakehouse Fundamentals Accreditation Badge not received after the course completion

I completed the Databricks Lakehouse Fundamentals Accreditation course today, but I haven't received my badge yet. I even checked https://credentials.databricks.com/, but it shows no record/credentials. See the screenshot below. Please help me out with...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Gaurav Raj, thank you for posting your question in our community! We are happy to assist you. Every best answer marked contributes to the growth and success of our community. Regards

2 More Replies
RengarLee
by Contributor
  • 5553 Views
  • 11 replies
  • 3 kudos

Resolved! Databricks write to Azure Data Explorer writes suddenly become slower

Now, I write to Azure Data Explorer using Spark Streaming. One day, writes suddenly became slower, and a restart had no effect. I have a question about Spark Streaming to Azure Data Explorer. Q1: What should I do to get performance to recover? Figure 1 shows th...

Latest Reply
RengarLee
Contributor
  • 3 kudos

I'm so sorry, I just thought the issue wasn't resolved. Solution: set maxFilesPerTrigger and maxBytesPerTrigger, and enable autoOptimize. Reason: for the first day it processes larger files and then eventually processes smaller files. Detailed reason: B...
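For readers applying the same fix: the two rate caps are options on the Delta streaming source, and auto optimize is a pair of table properties. A hedged sketch with placeholder paths and values:

    # Cap how much each micro-batch pulls from the Delta source
    # (the values are illustrative, not tuned).
    stream = (spark.readStream.format("delta")
              .option("maxFilesPerTrigger", 100)
              .option("maxBytesPerTrigger", "1g")
              .load("/mnt/source/table"))  # placeholder path

    # Auto optimize is set on the table being written.
    spark.sql("""
        ALTER TABLE delta.`/mnt/sink/table`
        SET TBLPROPERTIES (
          'delta.autoOptimize.optimizeWrite' = 'true',
          'delta.autoOptimize.autoCompact' = 'true'
        )
    """)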

10 More Replies
MetaRossiVinli
by Contributor
  • 3169 Views
  • 1 reply
  • 1 kudos

Resolved! Find root path to Repo for .py file import

I want to import a Python function stored in the following file path: `<repo>/lib/lib_helpers.py`. I want to import the function from any file in my repo, for instance from these: `<repo>/notebooks/etl/bronze/dlt_bronze_elt`, `<repo>/workers/job_worker`. It ...

Latest Reply
MetaRossiVinli
Contributor
  • 1 kudos

Ok, I figured it out. If you just make it a Python module by adding an empty `__init__.py`, Databricks will load it on start. Then, you can just import it.
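Concretely, with the layout from the question, that looks something like the sketch below; the helper name is hypothetical:

    # <repo>/lib/__init__.py     <- empty file that makes lib a package
    # <repo>/lib/lib_helpers.py  <- defines the function to share
    #
    # In Databricks Repos the repo root is normally already on sys.path
    # for notebooks, so from any notebook or .py file in the repo:
    from lib.lib_helpers import my_helper  # my_helper is a hypothetical name

    my_helper()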
