Data Engineering

Forum Posts

by Graham • New Contributor III

11-01-2022 11:38:00 AM

4440 Views
4 replies
3 kudos

Resolved! Inline comment next to un-tickmarked SET statement = Syntax error

Running this code in Databricks SQL works great:
SET USE_CACHED_RESULT = FALSE;
-- Result:
-- key value
-- USE_CACHED_RESULT FALSE
If I add an inline comment, however, I get a syntax error:
SET USE_CACHED_RESUL...

Latest Reply

rafal_walisko
New Contributor II

09-09-2024 5:00:03 AM

3 kudos

Hi, I'm getting the same error when trying to execute a statement through the API:
"statement": "SET `USE_CACHED_RESULT` = FALSE; SELECT COUNT(*) FROM TABLE"
Every combination fails:
"status": { "state": "FAILED", "error": { "e...

by peterwishart • New Contributor III

09-19-2022 12:23:25 PM

4029 Views
4 replies
0 kudos

Resolved! Programmatically updating the “run_as_user_name” parameter for jobs

I am trying to write a process that will programmatically update the “run_as_user_name” parameter for all jobs in an Azure Databricks workspace, using PowerShell to interact with the Jobs API. I have been trying to do this with a test job without suc...

Latest Reply

baubleglue
New Contributor II

10-04-2023 7:34:24 AM

0 kudos

The solution you've submitted is a solution for a different topic (permission to run the job; the job still runs as the user in the run_as_user_name field). Here is an example of changing "run_as_user_name".
Docs: https://docs.databricks.com/api/azure/workspace/job...
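The example in the reply above is cut off, so here is a minimal sketch of what such a request body might look like. It assumes the Jobs API 2.1 `jobs/update` endpoint and its `run_as` job setting; the job ID and user name are hypothetical, and the function only builds the JSON payload (it does not call the workspace).

```python
import json


def build_run_as_update(job_id, user_name=None, service_principal_name=None):
    """Build a jobs/update request body that changes who the job runs as.

    Assumption: the workspace exposes Jobs API 2.1, where `run_as` takes
    either `user_name` or `service_principal_name`, but not both.
    """
    if (user_name is None) == (service_principal_name is None):
        raise ValueError("Specify exactly one of user_name or service_principal_name")
    if user_name is not None:
        run_as = {"user_name": user_name}
    else:
        run_as = {"service_principal_name": service_principal_name}
    # Only the fields listed in new_settings are changed by jobs/update.
    return {"job_id": job_id, "new_settings": {"run_as": run_as}}


# Hypothetical job ID and user, for illustration only.
payload = build_run_as_update(123, user_name="someone@example.com")
print(json.dumps(payload, indent=2))
```

The printed JSON could then be sent as the body of a POST to `/api/2.1/jobs/update` from PowerShell or any other HTTP client, once per job ID.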

by elgeo • Valued Contributor II

10-19-2022 4:30:21 AM

33432 Views
9 replies
5 kudos

Resolved! SQL Declare Variable equivalent in databricks

Hello. What would be the equivalent of the below in Databricks?
DECLARE @LastChangeDate as date
SET @LastChangeDate = GetDate()
I already tried the below and it worked. However, I need to know how to set a SQL variable dynamically:
SET da.dbname = test;
SELECT "$...

Latest Reply

srinitechworld
New Contributor II

05-20-2023 4:05:26 AM

5 kudos

Hi, try to control the variables.

by jch • New Contributor III

05-15-2023 2:48:30 PM

7358 Views
4 replies
5 kudos

Resolved! Why does spark.read.csv come back with an error: com.databricks.sql.io.FileReadException: Error while reading file dbfs:/mnt/cntnr/demo/circuits.csv ?

I need help understanding why I can't open a file. In a Databricks notebook, I use this code:
%fs ls /mnt/cntnr/demo
I get back dbfs:/mnt/cntnr/demo/circuits.csv as one of the path values. When I use this code, I get an error:
circuits_df = spark.read....

Latest Reply

jch
New Contributor III

06-21-2023 5:56:15 AM

5 kudos

It turns out my Spark config was wrong:
# Set Spark configuration
configs = {"fs.azure.account.auth.type": "OAuth",
"fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
"fs.azu...
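The config in the reply above is truncated, so the following is only a sketch of how a complete OAuth config dictionary for ADLS Gen2 is commonly assembled. The first two keys come from the post; the remaining keys are the standard ABFS OAuth settings and are an assumption about what the author used. The function names and placeholder credentials are hypothetical.

```python
REQUIRED_KEYS = (
    "fs.azure.account.auth.type",
    "fs.azure.account.oauth.provider.type",
    "fs.azure.account.oauth2.client.id",
    "fs.azure.account.oauth2.client.secret",
    "fs.azure.account.oauth2.client.endpoint",
)


def build_adls_oauth_configs(client_id, client_secret, tenant_id):
    """Return the Spark/Hadoop settings for service-principal (OAuth) access."""
    return {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


def missing_or_empty(configs):
    """List required keys that are absent or blank, to catch a wrong config early."""
    return [key for key in REQUIRED_KEYS if not configs.get(key)]


# Placeholder values; in a notebook these would typically come from a secret scope.
configs = build_adls_oauth_configs("my-client-id", "my-secret", "my-tenant-id")
print(missing_or_empty(configs))
```

Checking the dictionary before mounting makes a misconfiguration show up as a clear list of missing keys rather than as a FileReadException at read time.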

by Ankith • New Contributor

05-11-2023 4:51:55 AM

2833 Views
2 replies
1 kudos

Resolved! How to enable spark.shuffle.compress in spark 3.3.0 or above versions?

When I try to set that, I get the following error. I'd appreciate your comments; thank you in advance.

Latest Reply

Anonymous
Not applicable

05-21-2023 11:57:16 PM

1 kudos

Hi @Ankith Patlolla, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answe...

by Ogi • New Contributor II

04-03-2023 2:30:34 AM

1363 Views
4 replies
1 kudos

Setting right processingTime

How do I set just the right processingTime for readStream to maximize performance? Which factors does it depend on, and is there a way to measure this?

Latest Reply

Ogi
New Contributor II

04-20-2023 3:56:59 AM

1 kudos

Thanks @Ajay Pandey and @Nandini N for your answers. I wanted to know more about what I should do in order to do it properly. Should I change the processing times (1, 5, 10, 30, 60 seconds) and see how each affects the running job in terms of time and CPU/me...
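One way to make the trial-and-error approach described above concrete is sketched below. It is a heuristic of my own, not Databricks guidance: record how long each micro-batch takes (for example from the `durationMs` values in a streaming query's progress, which is an assumption about where the numbers come from), then pick the smallest candidate interval that still covers the slowest batch.

```python
CANDIDATE_INTERVALS_S = (1, 5, 10, 30, 60)


def pick_processing_time(batch_durations_s, candidates=CANDIDATE_INTERVALS_S):
    """Return the smallest candidate interval >= the slowest observed batch.

    If no candidate is long enough, return the largest one; in that case the
    job is falling behind and needs more resources or smaller batches.
    """
    if not batch_durations_s:
        raise ValueError("Need at least one observed batch duration")
    slowest = max(batch_durations_s)
    for interval in sorted(candidates):
        if interval >= slowest:
            return interval
    return max(candidates)


# Hypothetical measurements, in seconds, from a test run.
observed = [2.1, 3.4, 4.8]
print(f"{pick_processing_time(observed)} seconds")
```

An interval shorter than the batch duration gains nothing, because the next trigger cannot start until the current batch finishes, while a much longer interval only adds latency.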

by oteng • New Contributor III

02-16-2023 4:03:10 PM

1841 Views
1 reply
0 kudos

SET configuration in SQL DLT pipeline not working

I'm not able to get the SET command to work when using SQL in a DLT pipeline. I am copying the code from this documentation: https://docs.databricks.com/workflows/delta-live-tables/delta-live-tables-sql-ref.html#sql-spec (relevant code below). When I ru...

Latest Reply

Anonymous
Not applicable

03-10-2023 6:06:52 PM

0 kudos

Hi @Oliver Teng, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks...

by danny_edm • New Contributor

08-19-2022 9:44:18 PM

653 Views
0 replies
0 kudos

collect_set weird result when Photon is enabled

Cluster: DBR 10.4 LTS with Photon
Sample schema:
seq_no (decimal)
type (string)
Sample data:
seq_no type
1 A
1 A
2 A
2 B
2 B
Command: F.size(F.collect_set(F.col("type")).over(Window.partitionBy("seq_no"))...
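For comparison, here is a plain-Python sketch of the result the command above would be expected to produce under the usual collect_set semantics: the number of distinct `type` values within each `seq_no` partition, repeated on every row. The rows are the sample data as read from the flattened post, so treat them as an assumption; the post does not show the actual output.

```python
from collections import defaultdict

# Sample data as read from the post: (seq_no, type)
rows = [(1, "A"), (1, "A"), (2, "A"), (2, "B"), (2, "B")]


def distinct_count_per_partition(rows):
    """Emulate size(collect_set(type)) over a window partitioned by seq_no."""
    distinct_types = defaultdict(set)
    for seq_no, type_ in rows:
        distinct_types[seq_no].add(type_)
    # A window function keeps every input row and appends the aggregate.
    return [(seq_no, type_, len(distinct_types[seq_no])) for seq_no, type_ in rows]


for row in distinct_count_per_partition(rows):
    print(row)
```

Any engine returning a different count for a partition than this reference does would be worth reporting with the exact runtime version.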

by Sam • New Contributor III

09-02-2021 3:39:48 PM

4143 Views
3 replies
6 kudos

Resolved! QuantileDiscretizer not respecting NumBuckets

I have set numBuckets and numBucketsArray for a group of columns to bin them into 5 buckets. Unfortunately, the number of buckets does not seem to be respected across all columns, even though there is variation within them. I have tried setting the relat...

Latest Reply

Sam
New Contributor III

09-13-2021 6:19:17 PM

6 kudos

Thank you. What I did was: apply QuantileDiscretizer to the non-zeros and specify a very small value (bottom 1%) to capture the lower bucket, including zeroes. That fixed the issue! You can define your own splits, which would work as well, but the splits thems...
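A small sketch of why a zero-heavy column can end up with fewer buckets than requested, and why binning the non-zeros separately helps. It uses Python's `statistics.quantiles` as a stand-in for Spark's approximate quantiles, so the exact cut points are an assumption; the point is only that repeated quantile values collapse into a single split. The data is made up for illustration.

```python
from statistics import quantiles


def distinct_bucket_count(values, num_buckets):
    """Count the buckets left after duplicate quantile cut points are merged."""
    cut_points = quantiles(values, n=num_buckets)  # num_buckets - 1 cut points
    return len(set(cut_points)) + 1


# Hypothetical column: 80% zeros, the rest spread over 1..20.
zero_heavy = [0] * 80 + list(range(1, 21))
non_zero = [v for v in zero_heavy if v != 0]

print(distinct_bucket_count(zero_heavy, 5))  # fewer than 5: most cut points are 0
print(distinct_bucket_count(non_zero, 5))    # all 5 buckets survive
```

Binning only the non-zero values and reserving one extra low bucket for the zeros, as described above, avoids the collapsed cut points.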

by sgannavaram • New Contributor III

04-07-2022 5:53:37 AM

3223 Views
1 reply
2 kudos

Resolved! How to pass variables into query string?

I have two variables, StartTimeStmp and EndTimeStmp. I am going to assign the start timestamp based on the last successful job runtime, and EndTimeStmp would be the current system time.
SET StartTimeStmp = '2022-03-24 15:40:00.000';
SET EndTimeStmp = '...

Latest Reply

Hubert-Dudek
Esteemed Contributor III

04-07-2022 6:05:46 AM

2 kudos

@Srinivas Gannavaram, in Python:
spark.sql(f"""
SELECT CI.CORPORATE_ITEM_INTEGRATION_ID, CI.CORPORATE_ITEM_CD
WHERE CI.DW_CREATE_TS < '{my_timestamp_variable}';
""")
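A slightly fuller sketch of the same f-string approach, building the query text from two `datetime` values so the timestamps always match the '2022-03-24 15:40:00.000' format used in the question. The excerpt above shows no FROM clause, so the table name `my_schema.corporate_item` is hypothetical, as are the function names. The resulting string would then be passed to `spark.sql(...)`.

```python
from datetime import datetime


def to_sql_timestamp(ts: datetime) -> str:
    """Format a datetime as 'YYYY-MM-DD HH:MM:SS.mmm' (millisecond precision)."""
    return ts.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]


def build_query(table: str, start: datetime, end: datetime) -> str:
    """Build the query text for rows created in the half-open range [start, end)."""
    return (
        "SELECT CI.CORPORATE_ITEM_INTEGRATION_ID, CI.CORPORATE_ITEM_CD\n"
        f"FROM {table} CI\n"
        f"WHERE CI.DW_CREATE_TS >= '{to_sql_timestamp(start)}'\n"
        f"  AND CI.DW_CREATE_TS < '{to_sql_timestamp(end)}'"
    )


# Start would come from the last successful job run; end is the current time.
start_ts = datetime(2022, 3, 24, 15, 40)
end_ts = datetime.now()
query = build_query("my_schema.corporate_item", start_ts, end_ts)  # hypothetical table
print(query)
```

Formatting from `datetime` objects rather than raw strings keeps arbitrary text out of the timestamp literals; the table name is still interpolated directly, so it should come from trusted code rather than user input.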

by Sam • New Contributor III

12-02-2021 3:53:18 PM

1199 Views
1 reply
4 kudos

collect_set/ collect_list Pushdown

Hello, I've noticed that collect_set and collect_list are not pushed down to the database.
Runtime: DBR 9.1 LTS
Spark 3.1.2
Database: Snowflake
Is there any way to get a distinct set from a group by in a way that will push down the query to the database?

Latest Reply

-werners-
Esteemed Contributor III

12-02-2021 11:29:53 PM

4 kudos

Hm, so collect_set does not get translated to listagg. Can you try the following?
- use a more recent version of dbrx
- use delta lake as spark source
- use the latest version of the snowflake connector
- check if pushdown to snowflake is enabled

by kjoth • Contributor II

11-25-2021 2:55:23 AM

5661 Views
7 replies
12 kudos

Resolved! Databricks cluster Encryption keystore_password

How do I set up this value? Is this any value we can provide or the default value we have to p
#!/bin/bash
keystore_file="/dbfs/<keystore_directory>/jetty_ssl_driver_keystore.jks"
keystore_password="gb1gQqZ9ZIHS"
sasl_secret=$(sha256sum $keystore_file...

Latest Reply

Prabakar
Databricks Employee

11-25-2021 4:07:09 AM

12 kudos

Hi @karthick J, please refer to this notebook: https://docs.microsoft.com/en-us/azure/databricks/_static/notebooks/cluster-encryption-init-script.html
Further, if you will be using the %pip magic command, the post below will be helpful: https://community.dat...

by brickster_2018 • Databricks Employee

06-25-2021 3:51:24 PM

790 Views
0 replies
0 kudos

Is it safe to set ignoreMissingFiles to true on a Streaming workload?
