Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

Erik
by Valued Contributor II
  • 3879 Views
  • 4 replies
  • 4 kudos

Liquid clustering with structured streaming pyspark

I would like to try out liquid clustering, but all the examples I see seem to be SQL tables created from selecting from other tables. Our gold tables are pyspark tables written directly to a table, e.g. like this: silver_df.writeStream.partitionBy(["...

Latest Reply
-werners-
Esteemed Contributor III

I did not find anything in the docs either. I suppose a pyspark version will come in the future?
3 More Replies
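At the time of this thread, liquid clustering was enabled through SQL DDL rather than a PySpark writer option. A minimal sketch of one workaround, assuming hypothetical table and column names: create the clustered table up front with CLUSTER BY, then let the existing PySpark stream append to it with toTable().

```python
# Hedged sketch (names are made up): define the liquid-clustered table via SQL
# DDL first, then stream into it. No partitionBy() is used; clustering keys
# replace the partition columns.
CREATE_GOLD = """
CREATE TABLE IF NOT EXISTS gold.events (
  event_id STRING,
  event_date DATE,
  payload STRING
)
CLUSTER BY (event_date)
"""

def start_gold_stream(spark, silver_df, checkpoint="/tmp/_gold_chk"):
    """Requires a live SparkSession and a streaming DataFrame."""
    spark.sql(CREATE_GOLD)
    return (
        silver_df.writeStream
        .option("checkpointLocation", checkpoint)
        .toTable("gold.events")  # appends into the liquid-clustered table
    )
```

Newer Databricks runtimes reportedly expose a native clusterBy API on the DataFrame writers, so it is worth checking the current docs for your runtime version.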
Yoshe1101
by New Contributor III
  • 2188 Views
  • 2 replies
  • 1 kudos

Resolved! Cluster terminated. Reason: Npip Tunnel Setup Failure

Hi, I have recently deployed a new Workspace in AWS and am getting the following error when trying to start the cluster: "NPIP tunnel setup failure during launch. Please try again later and contact Databricks if the problem persists. Instance bootstrap f...

Latest Reply
Yoshe1101
New Contributor III

Finally, this error was fixed by changing the DHCP configuration of the VPC.
1 More Reply
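The accepted fix points at the VPC's DHCP options. As a hedged diagnostic sketch (the VPC ID and region below are placeholders, and boto3 with AWS credentials is assumed), you can inspect which DNS servers the VPC's DHCP option set hands out; Databricks guidance is that instances must be able to resolve internal hostnames, which typically means AmazonProvidedDNS:

```python
# Hypothetical sketch: look up the DHCP option set attached to a VPC and
# return its domain-name-servers values.
def get_dhcp_dns(vpc_id="vpc-0123456789abcdef0", region="us-east-1"):
    import boto3  # imported here so the sketch parses without boto3 installed

    ec2 = boto3.client("ec2", region_name=region)
    vpc = ec2.describe_vpcs(VpcIds=[vpc_id])["Vpcs"][0]
    opts = ec2.describe_dhcp_options(
        DhcpOptionsIds=[vpc["DhcpOptionsId"]]
    )["DhcpOptions"][0]
    for cfg in opts["DhcpConfigurations"]:
        if cfg["Key"] == "domain-name-servers":
            return [v["Value"] for v in cfg["Values"]]
    return []
```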
MichaelO
by New Contributor III
  • 2614 Views
  • 4 replies
  • 2 kudos

Resolved! Call python image function in pyspark

I have a function for rotating images written in Python:

from PIL import Image

def rotate_image(image, rotation_angle):
    im = Image.open(image)
    out = im.rotate(rotation_angle, expand=True)
    return out

I now want to use this function as a pyspark ...

3 More Replies
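One common way to use a plain-Python image function from PySpark (not necessarily the resolution this thread landed on) is to operate on a binary column and wrap the function as a UDF. A hedged sketch, assuming images arrive as raw bytes:

```python
import io

from PIL import Image  # Pillow


def rotate_image_bytes(image_bytes: bytes, rotation_angle: int) -> bytes:
    """Rotate an image given as raw bytes; return the result as PNG bytes."""
    im = Image.open(io.BytesIO(image_bytes))
    out = im.rotate(rotation_angle, expand=True)
    buf = io.BytesIO()
    out.save(buf, format="PNG")
    return buf.getvalue()


def make_rotate_udf():
    # Wrap the plain-Python function as a PySpark UDF over a binary column.
    # Requires a Databricks/Spark environment; shown for illustration.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import BinaryType

    return udf(rotate_image_bytes, BinaryType())
```

With this in place, something like `df.withColumn("rotated", make_rotate_udf()("content", lit(90)))` would apply the rotation per row, at the cost of serializing image bytes through Python.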
ottomes
by New Contributor II
  • 1750 Views
  • 3 replies
  • 0 kudos

What is my subscription plan?

I am working as a data engineer and want to check our subscription plan. How can I find it? I am an "Admin" but I cannot "Manage accounts" on the Databricks workspace portal. This subscription information is pretty i...

Data Engineering
calculation
DBU
Premium
pricing
Standard
Latest Reply
ottomes
New Contributor II

Hey, not really. An update: if you try to add an ACL to a secret scope, you can tell which tier you are on, because either you succeed, meaning you are working with Enterprise, or the API responds with the Standard subscrip...
2 More Replies
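The workaround described in the reply can be sketched against the Secrets ACL REST endpoint. This is a hedged illustration, not official tier-detection: the host, token, scope, and principal are placeholders, and secret-scope ACLs are a paid-tier (Premium and above) feature, so Standard workspaces get an error back.

```python
import json
import urllib.error
import urllib.request


def try_put_secret_acl(host, token, scope, principal):
    """Attempt to PUT a secret-scope ACL; the response hints at the tier."""
    body = json.dumps(
        {"scope": scope, "principal": principal, "permission": "READ"}
    ).encode()
    req = urllib.request.Request(
        f"{host}/api/2.0/secrets/acls/put",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    try:
        urllib.request.urlopen(req)
        return "ACL accepted (paid-tier feature available)"
    except urllib.error.HTTPError as err:
        return f"ACL rejected: HTTP {err.code} (expected on Standard tier)"
```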
alonisser
by Contributor
  • 706 Views
  • 2 replies
  • 0 kudos

Trying to vacuum a table that is constantly being "createdOrReplaced"

and it seems that older data (from the "replaced" table) isn't being removed, long after the retention period. I'd be glad for clues on how to handle this.

Latest Reply
alonisser
Contributor

Thanks, I know that, but the table history shows 30 days, while the actual data size, number of files, and all other indicators correlate to 170 days.
1 More Reply
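A detail worth separating here: Delta keeps history metadata (delta.logRetentionDuration) and deleted data files (delta.deletedFileRetentionDuration) under different retention clocks, and files from replaced snapshots only disappear after VACUUM runs past the file-retention window. A hedged sketch, with a placeholder table name; note that shortening retention sacrifices time travel to older snapshots:

```python
# Hypothetical sketch: pin both retention windows explicitly, then VACUUM.
SET_RETENTION = """
ALTER TABLE my_catalog.my_schema.my_table SET TBLPROPERTIES (
  'delta.deletedFileRetentionDuration' = 'interval 30 days',
  'delta.logRetentionDuration' = 'interval 30 days'
)
"""


def vacuum_table(spark, table="my_catalog.my_schema.my_table"):
    """Requires a live SparkSession; removes files older than 30 days."""
    spark.sql(SET_RETENTION)
    spark.sql(f"VACUUM {table} RETAIN 720 HOURS")  # 720 h = 30 days
```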
Rishitha
by New Contributor III
  • 6073 Views
  • 3 replies
  • 2 kudos

Resolved! DLT pipeline

Hi all! I have a question about setting a target schema: how do I set different targets for 2 different tables in the same Delta Live Tables pipeline? We have 2 target schemas in a database, bronze_schema and silver_schema. The pipeline has a streaming ra...

Latest Reply
Rishitha
New Contributor III

Thanks again @btafur Hoping for this feature to release soon!
2 More Replies
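For context on why the feature was still a wish: at the time of this thread, a classic DLT pipeline published every table to the single schema named in its `target` setting, so bronze and silver tables in one pipeline landed together. A sketch of the relevant fragment of the pipeline settings (all values are placeholders):

```python
# Hedged sketch of classic DLT pipeline settings: one target schema for the
# whole pipeline. Splitting targets meant either two pipelines or waiting for
# the later ability to publish to fully qualified schemas.
pipeline_settings = {
    "name": "my_dlt_pipeline",
    "target": "silver_schema",  # single target schema per pipeline
    "libraries": [{"notebook": {"path": "/Repos/me/dlt_notebook"}}],
}
```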
Jayanth746
by New Contributor III
  • 11083 Views
  • 10 replies
  • 4 kudos

Kafka unable to read client.keystore.jks.

Below is the error we received when trying to read the stream:

Caused by: kafkashaded.org.apache.kafka.common.KafkaException: Failed to load SSL keystore /dbfs/FileStore/Certs/client.keystore.jks
Caused by: java.nio.file.NoSuchFileException: /dbfs...

Latest Reply
mwoods
New Contributor III

Ok, scrub that - the problem in my case was that I was using the 14.0 databricks runtime, which appears to have a bug relating to abfss paths here. Switching back to the 13.3 LTS release resolved it for me. So if you're in the same boat finding abfss...
9 More Replies
mwoods
by New Contributor III
  • 4683 Views
  • 3 replies
  • 2 kudos

Resolved! Spark readStream kafka.ssl.keystore.location abfss path

Similar to https://community.databricks.com/t5/data-engineering/kafka-unable-to-read-client-keystore-jks/td-p/23301 - the documentation (https://learn.microsoft.com/en-gb/azure/databricks/structured-streaming/kafka#use-ssl-to-connect-azure-databricks...

Latest Reply
mwoods
New Contributor III

@Kaniz_Fatma - quick update: managed to find the cause. It's neither of the above; it's a bug in the Databricks 14.0 runtime. I had switched back to the 13.3 LTS runtime, and that is what caused the error to disappear. As soon as I try to read directl...
2 More Replies
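Pulling the two Kafka keystore threads together, a hedged sketch of the reader configuration involved (broker, topic, path, and password are placeholders). Per the threads, abfss:// keystore paths failed on DBR 14.0 but a /dbfs/ path on 13.3 LTS worked:

```python
# Hypothetical option set; the keystore lives on DBFS so the driver and
# executors can read it via the local /dbfs fuse mount.
KAFKA_OPTIONS = {
    "kafka.bootstrap.servers": "broker:9093",
    "kafka.security.protocol": "SSL",
    "kafka.ssl.keystore.location": "/dbfs/FileStore/Certs/client.keystore.jks",
    "kafka.ssl.keystore.password": "<keystore-password>",
    "subscribe": "my_topic",
}


def read_kafka_stream(spark):
    """Requires a live SparkSession; returns a streaming DataFrame."""
    reader = spark.readStream.format("kafka")
    for key, value in KAFKA_OPTIONS.items():
        reader = reader.option(key, value)
    return reader.load()
```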
Sahha_Krishna
by New Contributor
  • 779 Views
  • 1 reply
  • 0 kudos

Unable to start Cluster in Databricks because of `BOOTSTRAP_TIMEOUT`

Unable to start the Cluster in AWS-hosted Databricks because of the below reason:

{ "reason": { "code": "BOOTSTRAP_TIMEOUT", "parameters": { "databricks_error_message": "[id: InstanceId(i-0634ee9c2d420edc8), status: INSTANCE_INITIALIZIN...

Latest Reply
Harrison_S
New Contributor III

Hi Sahha, it may be a DNS issue if that wasn't rolled back. Can you check the troubleshooting guide in the documentation and see if these configurations were rolled back as well? https://docs.databricks.com/en/administration-guide/cloud-configurations/aw...
Vaibhav1000
by New Contributor II
  • 648 Views
  • 1 reply
  • 0 kudos

Spark streaming is not able to assume role

Hello, I am trying to assume an IAM role in Spark streaming with the "s3-sqs" format. It is giving a 403 error. The code is provided below:

spark.readStream
  .format("s3-sqs")
  .option("fileFormat", "json")
  .option("roleArn", roleArn)
  .option("compressi...

Latest Reply
Kaniz_Fatma
Community Manager

Hi @Vaibhav1000, The 403 error you're encountering with the "s3-sqs" format could be due to incorrect configuration or insufficient permissions. You can try the following steps to resolve this issue: 1. Check your IAM role permissions: Ensure that t...
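As a hedged sketch of the configuration under discussion (ARNs and the queue URL are placeholders): with the s3-sqs source, a 403 typically means the cluster's instance profile is not allowed to call sts:AssumeRole on the roleArn, or the assumed role lacks SQS/S3 permissions, so both the trust policy and the role's own policy are worth checking.

```python
# Hypothetical s3-sqs reader options; not a confirmed fix for the thread.
S3_SQS_OPTIONS = {
    "fileFormat": "json",
    "queueUrl": "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue",
    "roleArn": "arn:aws:iam::123456789012:role/my-ingest-role",
}


def read_sqs_stream(spark):
    """Requires a Databricks SparkSession; returns a streaming DataFrame."""
    reader = spark.readStream.format("s3-sqs")
    for key, value in S3_SQS_OPTIONS.items():
        reader = reader.option(key, value)
    return reader.load()
```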
rpaschenko
by New Contributor II
  • 1571 Views
  • 2 replies
  • 2 kudos

Databricks Job issue (run was cancelled by Databricks and Spark UI is not available after 10 mins)

Hi! We had an issue on 09/19/2023 - we launched a job and the run started, but after 10 mins it was cancelled for no reason. The Spark UI is not available (which probably means the cluster never started at all) and I don't see any logs either. Could ...

Latest Reply
-werners-
Esteemed Contributor III

Was it a one time only error or a recurring one? For the former, I'd check if your vCPU quota was not exceeded, or perhaps there was a temporary issue with the cloud provider... Could be a lot of things (lots of moving parts under the hood). For the ...
1 More Reply
Shivani_DB
by New Contributor
  • 601 Views
  • 1 reply
  • 0 kudos

Performance Issues experienced when cluster was upgraded from 10.4 LTS to 11.3 LTS

Performance issues experienced when the cluster was upgraded from 10.4 LTS to 11.3 LTS. The notebooks were running fine on the existing cluster; soon after the upgrade they started to fail or exhaust memory, executors, etc. Any suggestions?

Latest Reply
Kaniz_Fatma
Community Manager

Hi @Shivani_DB,  The possible reasons for the issues might be: - Incompatible dependencies with Spark elastic jar - Class not found error: org/sparkproject/guava/cache/CacheLoader Suggestions to resolve the issues: - Check for incompatible dependenci...
kmorton
by New Contributor
  • 956 Views
  • 1 reply
  • 0 kudos

Autoloader start and end date for ingestion

I have been searching for a way to set up backfilling using autoloader with an option to set a "start_date" or "end_date". I am working on ingesting a massive file system but I don't want to ingest everything from the beginning. I have a start date t...

Data Engineering
autoloader
backfill
ETL
ingestion
Latest Reply
Kaniz_Fatma
Community Manager

Hi @kmorton, Databricks Auto Loader does support backfilling to capture any missed files with file notifications. This is achieved by using the cloudFiles.backfillInterval option to schedule regular backfills over your data. However, it does not spec...
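Auto Loader has no literal start_date/end_date option, but the file-source options modifiedAfter and modifiedBefore can bound ingestion by file modification time, which gets close to the backfill window asked for above. A hedged sketch with placeholder paths and timestamps:

```python
# Hypothetical option set: only ingest files last modified inside the window.
AUTOLOADER_OPTIONS = {
    "cloudFiles.format": "json",
    "modifiedAfter": "2023-01-01 00:00:00",
    "modifiedBefore": "2023-06-30 00:00:00",
}


def read_backfill(spark, path="s3://my-bucket/raw/"):
    """Requires a Databricks SparkSession; returns a streaming DataFrame."""
    reader = spark.readStream.format("cloudFiles")
    for key, value in AUTOLOADER_OPTIONS.items():
        reader = reader.option(key, value)
    return reader.load(path)
```

Note the bound applies to the file's modification timestamp, not to any date embedded in its contents or path.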
robbie1
by New Contributor
  • 1992 Views
  • 4 replies
  • 2 kudos

Resolved! Can't login anymore: Invalid email address or password

Since last Friday i cannot access databricks community any more, which is kinda annoying since my Bachelors dissertation is due in a couple of weeks. I always get the message: "Invalid email address or password Note: Emails/usernames are case-sensiti...

Latest Reply
nnaincy
New Contributor III

Hi Team, my Community Edition Databricks credentials are locked. I am working on a very important project; please help me resolve the issue, and please ensure it does not get locked in future as well. Email used for login @Kaniz_Fatma @Sujitha I have sent an email to commu...
3 More Replies
aicd_de
by New Contributor III
  • 1815 Views
  • 4 replies
  • 2 kudos

Unity Catalog - Writing to PNG Files to Cluster and then using dbutils.fs.cp to send to Azure ADLS2

Hi AllLooking to get some help. We are on Unity Catalog in Azure. We have a requirement to use Python to write out PNG files (several) via Matplotlib and then drop those into an ADLS2 Bucket. With Unity Catalog, we can easily use dbutils.fs.cp or fs....

Latest Reply
aicd_de
New Contributor III

Hmm, I read something different - someone else had this error because they used a shared cluster; apparently it does not happen on a single-user cluster. All those settings are already done and I am a full admin.
3 More Replies
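The pattern under discussion can be sketched as follows. This is a hedged illustration, not the thread's confirmed resolution: the abfss destination is a placeholder, dbutils is passed in because it only exists on a Databricks cluster, and (per the last reply) behaviour may differ between shared and single-user access-mode clusters.

```python
# Hypothetical sketch: render a PNG locally with Matplotlib, then copy it to
# ADLS2 with dbutils.fs.cp.
def export_plot(
    dbutils,
    dest="abfss://container@account.dfs.core.windows.net/plots/fig.png",
):
    import matplotlib

    matplotlib.use("Agg")  # headless backend for cluster drivers
    import matplotlib.pyplot as plt

    local_path = "/tmp/fig.png"
    fig, ax = plt.subplots()
    ax.plot([0, 1], [0, 1])
    fig.savefig(local_path)
    plt.close(fig)
    # "file:" tells dbutils the source is the driver's local disk
    dbutils.fs.cp(f"file:{local_path}", dest)
```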