Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Graham
by New Contributor III
  • 4440 Views
  • 4 replies
  • 3 kudos

Resolved! Inline comment next to un-tickmarked SET statement = Syntax error

Running this code in Databricks SQL works great:
SET USE_CACHED_RESULT = FALSE;
-- Result:
-- key value
-- USE_CACHED_RESULT FALSE
If I add an inline comment, however, I get a syntax error:
SET USE_CACHED_RESUL...

Latest Reply
rafal_walisko
New Contributor II
  • 3 kudos

Hi, I'm getting the same error when trying to execute a statement through the API: "statement": "SET `USE_CACHED_RESULT` = FALSE; SELECT COUNT(*) FROM TABLE". Every combination fails: "status": { "state": "FAILED", "error": { "e...

  • 3 kudos
3 More Replies
peterwishart
by New Contributor III
  • 4029 Views
  • 4 replies
  • 0 kudos

Resolved! Programmatically updating the “run_as_user_name” parameter for jobs

I am trying to write a process that will programmatically update the “run_as_user_name” parameter for all jobs in an Azure Databricks workspace, using PowerShell to interact with the Jobs API. I have been trying to do this with a test job without suc...

Latest Reply
baubleglue
New Contributor II
  • 0 kudos

The solution you've submitted addresses a different topic (permission to run the job; the job still runs as the user in the run_as_user_name field). Here is an example of changing "run_as_user_name". Docs: https://docs.databricks.com/api/azure/workspace/job...

  • 0 kudos
3 More Replies
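The reply above points to the Jobs API docs for changing the run-as identity. As a rough sketch only, and in Python rather than the PowerShell used in the original post: assuming the workspace exposes Jobs API 2.1 and that job settings accept a `run_as` object with a `user_name` field (an assumption to verify against the linked docs), a partial update request could be assembled like this. The host, token, job ID, and user name are hypothetical placeholders, and the snippet builds the request without sending it.

```python
import json
import urllib.request


def build_run_as_update(host: str, token: str, job_id: int, user_name: str) -> urllib.request.Request:
    """Build (but do not send) a partial job update that sets the run-as user."""
    payload = {
        "job_id": job_id,
        # Assumption: job settings accept run_as.user_name (check the Jobs API docs).
        "new_settings": {"run_as": {"user_name": user_name}},
    }
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.1/jobs/update",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical values, for illustration only.
req = build_run_as_update(
    "https://adb-123.azuredatabricks.net", "dapi-EXAMPLE", 42, "someone@example.com"
)
body = json.loads(req.data)
```

Looping this over the job IDs returned by a jobs list call would cover "all jobs in a workspace"; sending each request (for example with `urllib.request.urlopen(req)`) is left out on purpose so the sketch has no side effects.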
elgeo
by Valued Contributor II
  • 33432 Views
  • 9 replies
  • 5 kudos

Resolved! SQL Declare Variable equivalent in databricks

Hello. What would be the equivalent of the below in Databricks?
DECLARE @LastChangeDate as date
SET @LastChangeDate = GetDate()
I already tried the below and it worked. However, I need to know how to set a SQL variable dynamically:
SET da.dbname = test;
SELECT "$...

Latest Reply
srinitechworld
New Contributor II
  • 5 kudos

Hi, try to control the variables.

  • 5 kudos
8 More Replies
jch
by New Contributor III
  • 7358 Views
  • 4 replies
  • 5 kudos

Resolved! Why does spark.read.csv come back with an error: com.databricks.sql.io.FileReadException: Error while reading file dbfs:/mnt/cntnr/demo/circuits.csv ?

I need help understanding why I can't open a file. In a Databricks notebook, I use this code:
%fs ls /mnt/cntnr/demo
I get back dbfs:/mnt/cntnr/demo/circuits.csv as one of the path values. When I use this code, I get an error:
circuits_df = spark.read....

Latest Reply
jch
New Contributor III
  • 5 kudos

It turns out my Spark config was wrong:
#Set Spark configuration
configs = {"fs.azure.account.auth.type": "OAuth",
          "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
          "fs.azu...

  • 5 kudos
3 More Replies
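The config in the reply is cut off, so the author's actual fix is not visible here. For orientation only, the sketch below lists the OAuth keys that the Hadoop ABFS driver documents for service principal access and checks that none is missing or empty. It is not the author's configuration; the `build_oauth_configs` and `missing_keys` helpers and every value passed to them are hypothetical.

```python
# Keys documented for the ABFS driver's OAuth client-credentials flow.
REQUIRED_KEYS = (
    "fs.azure.account.auth.type",
    "fs.azure.account.oauth.provider.type",
    "fs.azure.account.oauth2.client.id",
    "fs.azure.account.oauth2.client.secret",
    "fs.azure.account.oauth2.client.endpoint",
)


def build_oauth_configs(client_id: str, client_secret: str, tenant_id: str) -> dict:
    """Assemble a config dict for a service principal (placeholder values in, dict out)."""
    return {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


def missing_keys(configs: dict) -> list:
    """Return required keys that are absent or empty, in a stable order."""
    return [key for key in REQUIRED_KEYS if not configs.get(key)]


# Hypothetical placeholder values; real secrets belong in a secret scope.
configs = build_oauth_configs("app-id", "app-secret", "tenant-id")
problems = missing_keys(configs)
```

Running a check like `missing_keys` before mounting can turn a late file-read failure into an early, readable one, although a wrong (rather than missing) value would still only surface at read time.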
Ankith
by New Contributor
  • 2833 Views
  • 2 replies
  • 1 kudos

Resolved! How to enable spark.shuffle.compress in spark 3.3.0 or above versions?

When I try to set that, I get the following error. I'd appreciate your comments; thank you in advance.

[Attached image: image.png]
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Ankith Patlolla, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answe...

  • 1 kudos
1 More Replies
Ogi
by New Contributor II
  • 1363 Views
  • 4 replies
  • 1 kudos

Setting right processingTime

How do I set the right processingTime for readStream to maximize performance? Which factors does it depend on, and is there a way to measure this?

Latest Reply
Ogi
New Contributor II
  • 1 kudos

Thanks @Ajay Pandey and @Nandini N for your answers. I wanted to know more about what I should do in order to do it properly. Should I change processing times (1, 5, 10, 30, 60 seconds) and see how that affects the running job in terms of time and CPU/me...

  • 1 kudos
3 More Replies
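One way to approach the measurement question, sketched under assumptions: PySpark's `StreamingQuery.recentProgress` returns progress dictionaries whose `durationMs.triggerExecution` entry reports how long each micro-batch took. Comparing those durations against a candidate interval shows whether batches finish within it. The `summarize_trigger_fit` helper and the sample numbers are made up for illustration and are not measurements from the thread.

```python
from statistics import mean


def summarize_trigger_fit(progress_events, interval_ms):
    """Compare micro-batch durations against a candidate processingTime interval."""
    durations = [
        event["durationMs"]["triggerExecution"]
        for event in progress_events
        if "triggerExecution" in event.get("durationMs", {})
    ]
    if not durations:
        raise ValueError("no progress events with durationMs.triggerExecution")
    overruns = sum(1 for d in durations if d > interval_ms)
    return {
        "batches": len(durations),
        "mean_ms": mean(durations),
        "max_ms": max(durations),
        # Share of batches that took longer than the interval.
        "overrun_ratio": overruns / len(durations),
    }


# Made-up events shaped like entries of query.recentProgress.
sample_events = [
    {"batchId": 0, "durationMs": {"triggerExecution": 4000}},
    {"batchId": 1, "durationMs": {"triggerExecution": 6000}},
    {"batchId": 2, "durationMs": {"triggerExecution": 12000}},
]
summary = summarize_trigger_fit(sample_events, interval_ms=10_000)
```

Repeating this for each candidate interval (the 1, 5, 10, 30, and 60 seconds mentioned above) gives a comparable number per setting; a high overrun ratio suggests the interval is shorter than the job can sustain.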
oteng
by New Contributor III
  • 1841 Views
  • 1 replies
  • 0 kudos

SET configuration in SQL DLT pipeline not working

I'm not able to get the SET command to work when using SQL in a DLT pipeline. I am copying the code from this documentation https://docs.databricks.com/workflows/delta-live-tables/delta-live-tables-sql-ref.html#sql-spec (relevant code below). When I ru...

[Attached image]
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Oliver Teng, hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you. Thanks...

  • 0 kudos
danny_edm
by New Contributor
  • 653 Views
  • 0 replies
  • 0 kudos

collect_set weird result when Photon enabled

Cluster: DBR 10.4 LTS with Photon
Sample schema:
seq_no (decimal)
type (string)
Sample data:
seq_no type
1 A
1 A
2 A
2 B
2 B
Command: F.size(F.collect_set(F.col("type")).over(Window.partitionBy("seq_no"))...
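As a baseline for comparing against the cluster's output, the distinct count per `seq_no` partition can be computed in plain Python. This is a sketch under assumptions: the rows are as read from the post's flattened sample (1 A, 1 A, 2 A, 2 B, 2 B), `seq_no` is written as an int rather than a decimal, and `distinct_count_per_partition` is a hypothetical helper, not a Spark function.

```python
from collections import defaultdict

# Sample rows as read from the post: (seq_no, type).
rows = [(1, "A"), (1, "A"), (2, "A"), (2, "B"), (2, "B")]


def distinct_count_per_partition(rows):
    """Mimic size(collect_set(type)) over a window partitioned by seq_no."""
    groups = defaultdict(set)
    for seq_no, type_ in rows:
        groups[seq_no].add(type_)
    # A window function yields one value per input row, not one per group.
    return [(seq_no, len(groups[seq_no])) for seq_no, _ in rows]


expected = distinct_count_per_partition(rows)
# expected == [(1, 1), (1, 1), (2, 2), (2, 2), (2, 2)]
```

Any per-row result that differs from this baseline on the same input would point at the engine rather than the data.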

Sam
by New Contributor III
  • 4143 Views
  • 3 replies
  • 6 kudos

Resolved! QuantileDiscretizer not respecting NumBuckets

I have set numBuckets and numBucketsArray for a group of columns to bin them into 5 buckets. Unfortunately, the number of buckets does not seem to be respected across all columns even though there is variation within them. I have tried setting the relat...

Latest Reply
Sam
New Contributor III
  • 6 kudos

Thank you. What I did was: apply QuantileDiscretizer to the non-zeros and specify a very small value (bottom 1%) to capture the lower bucket, including zeroes. That fixed the issue! You can define your own splits, which would work as well, but the splits thems...

  • 6 kudos
2 More Replies
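The fix above suggests zero-heavy columns were the cause. A small illustration of why that can happen: when most values are identical, several quantile cut points coincide, and identical cut points cannot delimit separate buckets. The sketch below uses exact nearest-rank quantiles in plain Python, which is an assumption made for clarity; Spark's QuantileDiscretizer relies on approximate quantiles, so its exact splits may differ. Both helper names and the sample data are hypothetical.

```python
def nearest_rank_quantile(sorted_values, numerator, denominator):
    """Exact nearest-rank quantile for the fraction numerator/denominator."""
    n = len(sorted_values)
    # Ceiling division in integers avoids floating-point rounding surprises.
    rank = max(1, -(-(numerator * n) // denominator))
    return sorted_values[rank - 1]


def distinct_bucket_count(values, num_buckets):
    """Count the buckets left once duplicate cut points are collapsed."""
    ordered = sorted(values)
    cuts = [
        nearest_rank_quantile(ordered, i, num_buckets)
        for i in range(1, num_buckets)
    ]
    return len(set(cuts)) + 1


# Hypothetical column: 80% zeros, the rest spread over 1..20.
zero_heavy = [0] * 80 + list(range(1, 21))
uniform = list(range(100))

zero_heavy_buckets = distinct_bucket_count(zero_heavy, 5)  # 2
uniform_buckets = distinct_bucket_count(uniform, 5)        # 5
```

With 80% zeros, all four cut points fall on 0, so only two buckets remain even though five were requested. Discretizing the non-zero values separately, as described in the reply, sidesteps the collapse.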
sgannavaram
by New Contributor III
  • 3223 Views
  • 1 replies
  • 2 kudos

Resolved! How to pass variables into query string?

I have two variables, StartTimeStmp and EndTimeStmp. I am going to assign the start timestamp based on the last successful job runtime, and EndTimeStmp would be the current system time.
SET StartTimeStmp = '2022-03-24 15:40:00.000';
SET EndTimeStmp = '...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

@Srinivas Gannavaram, in Python:
spark.sql(f""" SELECT CI.CORPORATE_ITEM_INTEGRATION_ID , CI.CORPORATE_ITEM_CD WHERE CI.DW_CREATE_TS < '{my_timestamp_variable}' ; """)

  • 2 kudos
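The f-string approach from the reply extends naturally to a start/end window. A minimal sketch, assuming the timestamps are available as Python `datetime` objects: the `build_query` helper and the `some_table` name are hypothetical, and only the `DW_CREATE_TS` column comes from the reply. Formatting `datetime` values, rather than interpolating raw strings, keeps arbitrary text out of the SQL.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def build_query(start_ts: datetime, end_ts: datetime) -> str:
    """Build a SELECT bounded by a half-open [start, end) timestamp window."""
    if end_ts <= start_ts:
        raise ValueError("end_ts must be later than start_ts")
    return (
        "SELECT * FROM some_table "  # hypothetical table name
        f"WHERE DW_CREATE_TS >= '{start_ts.strftime(TS_FORMAT)}' "
        f"AND DW_CREATE_TS < '{end_ts.strftime(TS_FORMAT)}'"
    )


# Start would come from the last successful run; end from the current time.
query = build_query(datetime(2022, 3, 24, 15, 40), datetime(2022, 3, 24, 16, 0))
```

The resulting string can then be passed to `spark.sql(query)` in a notebook.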
Sam
by New Contributor III
  • 1199 Views
  • 1 replies
  • 4 kudos

collect_set/ collect_list Pushdown

Hello, I've noticed that collect_set and collect_list are not pushed down to the database.
Runtime: DBR 9.1 LTS
Spark: 3.1.2
Database: Snowflake
Is there any way to get a distinct set from a group by in a way that will push down the query to the database?

Latest Reply
-werners-
Esteemed Contributor III
  • 4 kudos

Hm, so collect_set does not get translated to listagg. Can you try the following?
  • use a more recent version of DBR
  • use Delta Lake as Spark source
  • use the latest version of the Snowflake connector
  • check if pushdown to Snowflake is enabled

  • 4 kudos
kjoth
by Contributor II
  • 5661 Views
  • 7 replies
  • 12 kudos

Resolved! Databricks cluster Encryption keystore_password

How do I set up this value? Is this any value we can provide, or the default value we have to p...
#!/bin/bash
keystore_file="/dbfs/<keystore_directory>/jetty_ssl_driver_keystore.jks"
keystore_password="gb1gQqZ9ZIHS"
sasl_secret=$(sha256sum $keystore_file...

Latest Reply
Prabakar
Databricks Employee
  • 12 kudos

Hi @karthick J, please refer to this notebook: https://docs.microsoft.com/en-us/azure/databricks/_static/notebooks/cluster-encryption-init-script.html
Further, if you will be using the %pip magic command, the below post will be helpful: https://community.dat...

  • 12 kudos
6 More Replies