Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

M_S
by New Contributor II
  • 1015 Views
  • 2 replies
  • 2 kudos

DataFrame intermittently comes up empty during daily job execution

Hello, I have a daily ETL job that adds new records to a table for the previous day. However, from time to time, it does not produce any output. After investigating, I discovered that one table is sometimes loaded as empty during execution. As a resul...

Latest Reply
M_S
New Contributor II

Thank you very much, @Louis_Frolio, for such a detailed and insightful answer! All tables used in this processing are managed Delta tables loaded through Unity Catalog. I will try running it with spark.databricks.io.cache.enabled set to false just to ...
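For reference, the session-level toggle mentioned here can be applied from a notebook roughly as below; `disable_io_cache` is a hypothetical helper, and `spark` stands for the notebook's ambient SparkSession:

```python
def disable_io_cache(spark):
    """Turn off the Databricks disk (IO) cache for the current session only.

    Useful for ruling out stale cached file data when a managed Delta table
    intermittently reads back empty. `spark` is the notebook's SparkSession.
    """
    spark.conf.set("spark.databricks.io.cache.enabled", "false")
    # Return the effective value so the change can be verified in the notebook.
    return spark.conf.get("spark.databricks.io.cache.enabled")
```

This only affects the current session; it does not change the cluster-wide Spark configuration.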

1 More Replies
5UDO
by New Contributor II
  • 1986 Views
  • 6 replies
  • 4 kudos

Databricks warehouse table optimization

Hi everyone, I just started using Databricks and wanted to evaluate read speeds when using the Databricks warehouse. So I've generated a dataset of 100M records, which contains name, surname, date of birth, phone number, and an address. Dat...

Latest Reply
5UDO
New Contributor II

Hi Brahmareddy and AndrewN, thank you for your answers. I first need to apologize, as I mistakenly wrote that I got 270 ms by hashing the date of birth, surname, and name and then using Z-ordering. I actually achieved around 290 ms with hashing...
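For anyone following along, the Z-ordering step discussed in this thread is issued as a SQL statement; a small sketch of how one might build and run it from a notebook (`zorder_table` is a hypothetical wrapper, `spark` the ambient SparkSession, and the table/column names are illustrative):

```python
def build_zorder_sql(table, cols):
    # Build the OPTIMIZE ... ZORDER BY statement that co-locates rows
    # sharing values in the listed columns into the same data files.
    return f"OPTIMIZE {table} ZORDER BY ({', '.join(cols)})"

def zorder_table(spark, table, cols):
    # Run the statement; on Databricks this rewrites the Delta table's files.
    return spark.sql(build_zorder_sql(table, cols))
```

Z-ordering helps most when queries filter on the listed columns, since data skipping can then prune whole files.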

5 More Replies
jtjohnson
by New Contributor II
  • 1124 Views
  • 4 replies
  • 0 kudos

API Definition File

Hello. We are in the process of connecting Azure APIM to the Databricks REST APIs. Is there an official definition file available for download? Any help would be greatly appreciated.

Latest Reply
jtjohnson
New Contributor II

Thank you for the feedback. The Postman collection would be ideal, but the link is no longer active.

3 More Replies
harika5991
by New Contributor II
  • 912 Views
  • 1 reply
  • 0 kudos

Unable to create a metastore for Unity Catalog as I don't have Account Admin rights

Hello guys, I just started learning Databricks. I created a Databricks workspace via the Azure Portal using the Trial (Premium - 14-Days Free DBUs) plan. The workspace name is `easewithdata-adb`. However, I do not currently see the option to create a Un...

Latest Reply
lingareddy_Alva
Honored Contributor III

Hi @harika5991 You're right about the root cause of your issue. Creating a Unity Catalog metastore requires Account Admin privileges, which are separate from just creating a workspace in Azure. These are options you can try: When you create a Databricks...

Louis_Frolio
by Databricks Employee
  • 4821 Views
  • 4 replies
  • 4 kudos

Resolved! What are your most impactful use cases for schema evolution in Databricks?

Data Engineers, Share Your Experiences with Delta Lake Schema Evolution! We're calling on all data engineers to share their experiences with the powerful schema evolution feature in Delta Lake. This feature allows for seamless adaptation to changin...

Latest Reply
Louis_Frolio
Databricks Employee

Outstanding!

3 More Replies
flashmav
by New Contributor II
  • 701 Views
  • 1 reply
  • 0 kudos

Resolved! ConcurrentDeleteDeleteException in liquid cluster table

I am doing a merge into a table in parallel via 2 jobs. The table is a liquid clustered table with the following properties: delta.enableChangeDataFeed=true, delta.enableDeletionVectors=true, delta.enableRowTracking=true, delta.feature.changeDataFeed=supported...

Latest Reply
Louis_Frolio
Databricks Employee

Hey @flashmav, keep in mind that operations in Delta Lake often occur at the file level rather than the row level. For example, if two sessions attempt to update data in the same file (even if they're not updating the same row), you may encounter a...
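A common mitigation for these file-level write conflicts is to retry the losing MERGE with exponential backoff. Below is a generic, plain-Python sketch (not a Databricks API; in a real job the exception type passed in would be Delta's ConcurrentDeleteDeleteException):

```python
import random
import time

def merge_with_retry(run_merge, conflict_exc, max_attempts=5, base_delay=1.0):
    """Retry a merge that may lose a file-level conflict to a concurrent writer.

    run_merge: zero-argument callable that performs the MERGE.
    conflict_exc: exception type raised on a concurrency conflict.
    Delays grow exponentially with jitter so both writers are unlikely
    to collide again on the retry.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return run_merge()
        except conflict_exc:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            time.sleep(base_delay * (2 ** (attempt - 1)) * random.uniform(0.5, 1.0))
```

Partitioning the two jobs so their MERGE predicates touch disjoint clustering key ranges reduces how often the retry path is needed at all.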

SoniSole
by New Contributor II
  • 8970 Views
  • 6 replies
  • 6 kudos

Issue with Docker Image connection

Hello, I have created and pushed a Docker image to Azure Container Registry. I used that image to start a cluster in Databricks, but the cluster doesn't start, so when I try to run a Databricks job using that cluster, I get this error bel...

Latest Reply
jeremy98
Honored Contributor

We have the same issue right now. What is the problem?

5 More Replies
vgupta
by New Contributor II
  • 9718 Views
  • 6 replies
  • 4 kudos

DLT | Cluster terminated by System-User | INTERNAL_ERROR: Communication lost with driver. Cluster 0312-140502-k9monrjc was not reachable for 120 seconds

Dear Community, hope you are doing well. For the last couple of days I have been seeing very strange issues with my DLT pipeline: every 60-70 minutes it fails in continuous mode with the error INTERNAL_ERROR: Communication lost with driver. Clu...

Latest Reply
Rahiman
New Contributor II

We had a similar error for one of our DLT pipelines. It can sometimes be caused by compute size; we increased the compute size in our DLT pipeline but still saw this error while processing a very large file. We then added the below para...

5 More Replies
aswinvishnu
by New Contributor II
  • 1335 Views
  • 3 replies
  • 1 kudos

Exporting table to GCS bucket using job

Hi all. Use case: I want to send the result of a query to a GCS bucket location in JSON format. Approach: from my Java-based application I create a job, and that job runs a notebook. The notebook will have something like this: query = "SELECT * FR...
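Outside of Spark, the JSON output such a job would land in GCS is typically JSON Lines (one object per row). A minimal stdlib sketch of that shape, with the bucket path and field names purely illustrative:

```python
import json

def rows_to_json_lines(rows):
    # Serialize query-result rows (dicts) to JSON Lines, the same
    # row-per-line format Spark's DataFrameWriter.json() emits per part file.
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows)

# In a real notebook the equivalent would be roughly (path is hypothetical):
#   spark.sql(query).write.mode("overwrite").json("gs://my-bucket/exports/")
```

For the notebook to write directly to `gs://` paths, the cluster needs GCS credentials configured (for example, a service account), which is where the suggestion below about secure access comes in.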

Latest Reply
LorelaiSpence
New Contributor II

Consider using GCS signed URLs or access tokens for secure access.

2 More Replies
Maverick1
by Valued Contributor II
  • 5496 Views
  • 6 replies
  • 6 kudos

How to infer the online feature store table via an MLflow registered model deployed to a SageMaker endpoint?

Can an MLflow registered model automatically infer the online feature store table, if that model is trained and logged via a Databricks Feature Store table and the table is pushed to an online feature store (like AWS RDS)?

Latest Reply
Janifer45
New Contributor II

Thanks for this

5 More Replies
BrianLind
by New Contributor II
  • 785 Views
  • 2 replies
  • 0 kudos

Need access to browse onprem SQL data

Our BI team has started using Databricks and would like to browse our local (on-prem) SQL database servers from within Databricks. I'm not sure if that's even possible. So far, I've set up Databricks Secure Cluster Connectivity (SCC), created a privat...

Latest Reply
Renu_
Valued Contributor II

Hi, based on what you've shared, it seems you've already completed many of the necessary steps. Just a few things to double-check as you move forward: SQL Warehouses used for BI tools need to run in Pro mode, not serverless, since only Pro or Classic ...

1 More Replies
muano_makhokha
by New Contributor II
  • 988 Views
  • 1 reply
  • 1 kudos

Resolved! Row filtering and column masking not working even when the requirements are met

I have been trying to use the row filtering and column masking feature to redact columns and filter rows based on the group a user is in. I have all the necessary permissions and I've used clusters with version 15.4 and higher. When I run the fo...

Latest Reply
Louis_Frolio
Databricks Employee

Here are some things to consider/try: The UnityCatalogServiceException error you are encountering, ABORTED.UC_DBR_TRUST_VERSION_TOO_OLD, generally indicates that the Databricks Runtime (DBR) version you are using no longer supports the operation, s...

meret
by New Contributor II
  • 803 Views
  • 1 reply
  • 0 kudos

Column Default Propagation

Hi, today I found a somewhat strange behavior when it comes to default values in columns. Apparently, column defaults are propagated to a new table when you select the column without any operation on it. This is a bit unexpected for me. Here a short...

Latest Reply
Louis_Frolio
Databricks Employee

The behavior you described regarding the propagation of default column values is expected and is tied to the specific usage of the delta.feature.allowColumnDefaults table property in Delta Lake. Here’s an explanation: Default Propagation Without Tra...

Dinesh6351
by New Contributor II
  • 740 Views
  • 2 replies
  • 3 kudos
Latest Reply
amos
New Contributor III

This error occurs when your Azure account exceeds the regional quota of available cores, preventing cluster creation in Databricks. It means the cluster tried to use more resources than are allowed in your region. 1. Review the configuration o...

1 More Replies
jonhieb
by New Contributor III
  • 2932 Views
  • 6 replies
  • 3 kudos

Resolved! [Databricks Asset Bundles] Triggering Delta Live Tables

I would like to know how to schedule a DLT pipeline using DABs. I'm trying to trigger a Delta Live Tables pipeline using Databricks Asset Bundles. Below is my YAML code: resources: pipelines: data_quality_pipelines: name: data_quality_pipeline...

Latest Reply
Walter_C
Databricks Employee

As of now, Databricks Asset Bundles do not support direct scheduling of DLT pipelines using cron expressions within the bundle configuration. Instead, you can achieve scheduling by creating a Databricks job that triggers the DLT pipeline and then sch...
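Sketched as a bundle fragment, the job-wrapper approach described above might look like this (the job name and cron expression are illustrative; the pipeline reference assumes a pipeline resource named `data_quality_pipelines` as in the question):

```yaml
resources:
  jobs:
    data_quality_schedule:
      name: data_quality_schedule
      schedule:
        quartz_cron_expression: "0 0 6 * * ?"   # daily at 06:00
        timezone_id: "UTC"
      tasks:
        - task_key: run_pipeline
          pipeline_task:
            pipeline_id: ${resources.pipelines.data_quality_pipelines.id}
```

The job carries the schedule; the pipeline itself stays unscheduled and is simply triggered by the job's pipeline task.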

5 More Replies
