Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

mwoods
by New Contributor III
  • 9158 Views
  • 3 replies
  • 2 kudos

Resolved! Spark readStream kafka.ssl.keystore.location abfss path

Similar to https://community.databricks.com/t5/data-engineering/kafka-unable-to-read-client-keystore-jks/td-p/23301 - the documentation (https://learn.microsoft.com/en-gb/azure/databricks/structured-streaming/kafka#use-ssl-to-connect-azure-databricks...

Latest Reply
mwoods
New Contributor III
  • 2 kudos

@Kaniz_Fatma - quick update: managed to find the cause. It's neither of the above; it's a bug in the Databricks 14.0 runtime. I had switched back to the 13.3 LTS runtime, and that is what caused the error to disappear. As soon as I try to read directl...

2 More Replies
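The workaround discussed in this thread can be sketched without a cluster. This is a minimal sketch with hypothetical broker addresses and paths: the Kafka client opens kafka.ssl.keystore.location with plain file I/O, so staging the JKS file on a driver-local path (e.g. under /dbfs) tends to be safer than pointing it at an abfss:// URI.

```python
# Minimal sketch (hypothetical broker and paths): build the SSL option map
# for a Kafka readStream. The keystore path is a local filesystem path, not
# an abfss:// URI, because the Kafka client reads it with plain file I/O.
def kafka_ssl_options(bootstrap_servers, keystore_path, keystore_password):
    """Return the option map for spark.readStream.format('kafka')."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "kafka.security.protocol": "SSL",
        "kafka.ssl.keystore.type": "JKS",
        "kafka.ssl.keystore.location": keystore_path,
        "kafka.ssl.keystore.password": keystore_password,
    }

opts = kafka_ssl_options(
    "broker1:9093",                           # hypothetical
    "/dbfs/FileStore/client.keystore.jks",    # hypothetical
    "keystore-secret",                        # hypothetical
)
# On a cluster this would be consumed as:
# df = (spark.readStream.format("kafka")
#         .options(**opts)
#         .option("subscribe", "my-topic")
#         .load())
```

As the thread notes, behavior also varies by runtime (13.3 LTS vs 14.0), so verify the option set against the runtime you are on.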
Sahha_Krishna
by New Contributor
  • 1001 Views
  • 1 reply
  • 0 kudos

Unable to start Cluster in Databricks because of `BOOTSTRAP_TIMEOUT`

Unable to start the cluster in AWS-hosted Databricks because of the below reason: { "reason": { "code": "BOOTSTRAP_TIMEOUT", "parameters": { "databricks_error_message": "[id: InstanceId(i-0634ee9c2d420edc8), status: INSTANCE_INITIALIZIN...

Latest Reply
Harrison_S
New Contributor III
  • 0 kudos

Hi Sahha, it may be a DNS issue if that wasn't rolled back. Can you check the troubleshooting guide in the documentation and see whether these configurations were rolled back as well? https://docs.databricks.com/en/administration-guide/cloud-configurations/aw...

Vaibhav1000
by New Contributor II
  • 888 Views
  • 1 reply
  • 0 kudos

Spark streaming is not able to assume role

Hello, I am trying to assume an IAM role in Spark streaming with the "s3-sqs" format. It is giving a 403 error. The code is provided below: spark.readStream.format("s3-sqs").option("fileFormat", "json").option("roleArn", roleArn).option("compressi...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Vaibhav1000, The 403 error you're encountering with the "s3-sqs" format could be due to incorrect configuration or insufficient permissions. You can try the following steps to resolve this issue: 1. Check your IAM role permissions: Ensure that t...

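The permission checks in the reply can be paired with a sketch of the option map itself. This is a hedged sketch: the role ARN, queue URL, and the queueUrl/region option names are assumptions to check against the S3-SQS connector docs; fileFormat and roleArn come from the question's own snippet. A 403 typically means the cluster's instance profile is not allowed to sts:AssumeRole into roleArn, or the assumed role lacks SQS/S3 permissions.

```python
# Minimal sketch (hypothetical ARN and queue URL; queueUrl/region option
# names are assumed, verify against the docs). fileFormat and roleArn match
# the snippet in the question above.
def s3_sqs_options(role_arn, queue_url, region):
    return {
        "fileFormat": "json",
        "roleArn": role_arn,
        "queueUrl": queue_url,
        "region": region,
    }

opts = s3_sqs_options(
    "arn:aws:iam::123456789012:role/streaming-reader",            # hypothetical
    "https://sqs.us-east-1.amazonaws.com/123456789012/events",    # hypothetical
    "us-east-1",
)
# On a cluster:
# df = spark.readStream.format("s3-sqs").options(**opts).schema(schema).load()
```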
rpaschenko
by New Contributor II
  • 2236 Views
  • 2 replies
  • 2 kudos

Databricks Job issue (run was cancelled by Databricks and Spark UI is not available after 10 mins)

Hi! We had an issue on 09/19/2023 - we launched a job and the run was started, but after 10 mins it was cancelled for no reason. The Spark UI is not available (which probably means that the cluster was not started at all) and I don't see any logs either. Could ...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

Was it a one-time error or a recurring one? For the former, I'd check whether your vCPU quota was exceeded, or perhaps there was a temporary issue with the cloud provider... It could be a lot of things (lots of moving parts under the hood). For the ...

1 More Replies
Shivani_DB
by New Contributor
  • 741 Views
  • 1 reply
  • 0 kudos

Performance Issues experienced when cluster was upgraded from 10.4 LTS to 11.3 LTS

Performance issues experienced when the cluster was upgraded from 10.4 LTS to 11.3 LTS. The notebooks were running fine with the existing cluster. Soon after the upgrade, the notebooks started to fail or exhaust executor memory, etc. Any suggestions?

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Shivani_DB,  The possible reasons for the issues might be: - Incompatible dependencies with Spark elastic jar - Class not found error: org/sparkproject/guava/cache/CacheLoader Suggestions to resolve the issues: - Check for incompatible dependenci...

kmorton
by New Contributor
  • 1262 Views
  • 1 reply
  • 0 kudos

Autoloader start and end date for ingestion

I have been searching for a way to set up backfilling using autoloader with an option to set a "start_date" or "end_date". I am working on ingesting a massive file system but I don't want to ingest everything from the beginning. I have a start date t...

Data Engineering
autoloader
backfill
ETL
ingestion
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @kmorton, Databricks Auto Loader does support backfilling to capture any missed files with file notifications. This is achieved by using the cloudFiles.backfillInterval option to schedule regular backfills over your data. However, it does not spec...

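The reply's point about cloudFiles.backfillInterval can be combined with the file-modification-time filters to approximate a start/end date window. A hedged sketch follows: the paths are hypothetical, and the modifiedAfter/modifiedBefore options should be verified against the Auto Loader docs for your runtime version, since Auto Loader has no literal "start_date"/"end_date" option.

```python
# Minimal sketch (hypothetical values): cloudFiles options combining periodic
# backfills (cloudFiles.backfillInterval, per the reply above) with the
# modifiedAfter/modifiedBefore file filters as an approximate date window.
# Option availability depends on the runtime; check the docs before relying
# on this.
def autoloader_options(fmt, start_ts, end_ts):
    return {
        "cloudFiles.format": fmt,
        "cloudFiles.backfillInterval": "1 day",
        "modifiedAfter": start_ts,
        "modifiedBefore": end_ts,
    }

opts = autoloader_options(
    "json",
    "2023-01-01 00:00:00.000000 UTC+0",   # hypothetical window start
    "2023-06-30 00:00:00.000000 UTC+0",   # hypothetical window end
)
# On a cluster:
# df = spark.readStream.format("cloudFiles").options(**opts).load(source_path)
```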
robbie1
by New Contributor
  • 2484 Views
  • 4 replies
  • 2 kudos

Resolved! Can't login anymore: Invalid email address or password

Since last Friday I cannot access Databricks Community any more, which is kinda annoying since my Bachelor's dissertation is due in a couple of weeks. I always get the message: "Invalid email address or password Note: Emails/usernames are case-sensiti...

Latest Reply
nnaincy
New Contributor III
  • 2 kudos

Hi Team, my Community Edition Databricks credentials are locked. I am working on a very important project; please help me resolve the issue, and please ensure it does not get locked in the future as well. Email used for login: @Kaniz_Fatma @Sujitha I have sent an email to commu...

3 More Replies
aicd_de
by New Contributor III
  • 2318 Views
  • 4 replies
  • 2 kudos

Unity Catalog - Writing to PNG Files to Cluster and then using dbutils.fs.cp to send to Azure ADLS2

Hi All, looking to get some help. We are on Unity Catalog in Azure. We have a requirement to use Python to write out PNG files (several) via Matplotlib and then drop those into an ADLS2 bucket. With Unity Catalog, we can easily use dbutils.fs.cp or fs....

Latest Reply
aicd_de
New Contributor III
  • 2 kudos

Hmm, I read something different - someone else had this error because they used a shared cluster; apparently it does not happen on a single-user cluster. All those settings are already done and I am a full admin.

3 More Replies
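The pattern under discussion - write locally, then copy to ADLS2 - can be sketched as follows. This is a hedged sketch with hypothetical paths: matplotlib's fig.savefig and dbutils.fs.cp only run on a Databricks cluster, so here a placeholder PNG header stands in for the saved figure, and the copy call is shown as a comment.

```python
import os
import tempfile

# Sketch of the local-staging pattern (hypothetical paths). On Databricks you
# would call fig.savefig(local_path) and then dbutils.fs.cp to push the file
# to ADLS2; a placeholder PNG signature stands in for the figure here.
local_dir = tempfile.mkdtemp()
local_path = os.path.join(local_dir, "plot.png")
with open(local_path, "wb") as f:
    f.write(b"\x89PNG\r\n\x1a\n")  # on a cluster: fig.savefig(local_path)

# On a cluster (destination URI is hypothetical):
# dbutils.fs.cp(
#     f"file:{local_path}",
#     "abfss://plots@myaccount.dfs.core.windows.net/reports/plot.png",
# )
```

Note that, consistent with the shared-vs-single-user observation in the reply above, local filesystem access is more restricted on shared access mode clusters, so where local_path lives matters.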
njglen
by New Contributor III
  • 2542 Views
  • 4 replies
  • 0 kudos

Resolved! How do you enable verbose logging from within Workspace Settings using Terraform?

I've searched in the Databricks provider and online and couldn't find out whether it is possible to set `Verbose Audit Logs` to `enabled` using Terraform. Can anybody clarify if it is possible?

Latest Reply
qiaochu
New Contributor II
  • 0 kudos

The switch you're looking for is enableVerboseAuditLogs in databricks_workspace_conf:

    resource "databricks_workspace_conf" "this" {
      custom_config = {
        "enableIpAccessLists"    = true
        "enableVerboseAuditLogs" = true
      }
    }

3 More Replies
Ravikumashi
by Contributor
  • 1712 Views
  • 3 replies
  • 0 kudos

Resolved! Issue with Logging Spark Events to LogAnalytics after Upgrading to Databricks 11.3 LTS

We have recently been in the process of upgrading our Databricks clusters to version 11.3 LTS. As part of this upgrade, we have been working on integrating the logging of Spark events to LogAnalytics using the repository available at https://github.c...

Latest Reply
swethaNandan
New Contributor III
  • 0 kudos

Hi Ravikumashi, can you please raise a ticket with us so that we can look deeper into the issue?

2 More Replies
Manjula_Ganesap
by Contributor
  • 3483 Views
  • 4 replies
  • 1 kudos

Resolved! Delta Live Table pipeline failure - Table missing

Hi All, I set up a DLT pipeline to create 58 bronze tables and a subsequent DLT live table that joins the 58 bronze tables created in the first step. The pipeline runs successfully most times. My issue is that the pipeline fails once every 3/4 runs, say...

Latest Reply
Manjula_Ganesap
Contributor
  • 1 kudos

@jose_gonzalez @Kaniz_Fatma - I missed updating the group on the fix. I reached out to Databricks to understand, and it was identified that the threads call I was making was causing the issue. After I removed it, I don't see it happening.

3 More Replies
Manjula_Ganesap
by Contributor
  • 1534 Views
  • 3 replies
  • 1 kudos

Delta Live Table (DLT) Initialization fails frequently

With no change in code, I've noticed that my DLT initialization fails and then an automatic rerun succeeds. Can someone help me understand this behavior? Thank you.

Latest Reply
Manjula_Ganesap
Contributor
  • 1 kudos

@jose_gonzalez - I missed updating the group on the fix. I reached out to Databricks to understand, and it was identified that the threads call I was making was causing the issue. After I removed it, I don't see it happening.

2 More Replies
Kit
by New Contributor III
  • 3988 Views
  • 3 replies
  • 1 kudos

How to use checkpoint with change data feed

I have a scheduled job (running in continuous mode) with the following code: ``` (spark.readStream.option("checkpointLocation", databricks_checkpoint_location).option("readChangeFeed", "true").option("startingVersion", VERSION + 1)...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Kit Yam Tse, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers y...

2 More Replies
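A hedged completion of the truncated snippet in the question (table name, checkpoint path, and version number are hypothetical): checkpointLocation belongs on the writer side of the stream, and once a checkpoint exists the stream resumes from it, so startingVersion only takes effect on a fresh start.

```python
# Minimal sketch (hypothetical names). The read options select the change
# data feed and the first version to read; the write options carry the
# checkpoint. startingVersion is honored only when the checkpoint is empty -
# on restart, the stream resumes from the checkpoint instead.
def cdf_stream_options(checkpoint_location, starting_version):
    read_opts = {
        "readChangeFeed": "true",
        "startingVersion": str(starting_version),
    }
    write_opts = {"checkpointLocation": checkpoint_location}
    return read_opts, write_opts

read_opts, write_opts = cdf_stream_options("/tmp/checkpoints/orders_cdf", 12)
# On a cluster:
# (spark.readStream.options(**read_opts).table("orders")
#    .writeStream.options(**write_opts).toTable("orders_changes"))
```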
editter
by New Contributor II
  • 1894 Views
  • 2 replies
  • 2 kudos

Unable to open a file in dbfs. Trying to move files from Google Bucket to Azure Blob Storage

Background: I am attempting to download the Google Cloud SDK on Databricks. The end goal is to be able to use the SDK to transfer files from a Google Cloud bucket to Azure Blob Storage using Databricks. (If you have any other ideas for this transfer p...

Data Engineering
dbfs
Google Cloud SDK
pyspark
tarfile
Latest Reply
editter
New Contributor II
  • 2 kudos

Thank you for the response! Two questions: 1. How would you create a cluster with the custom requirements for the Google Cloud SDK? Is that still possible for a Unity Catalog-enabled cluster with shared access mode? 2. Is a script action the same as a cl...

1 More Replies
AMadan
by New Contributor II
  • 4124 Views
  • 1 reply
  • 0 kudos

Date difference in Months

Hi Team, I am working on a migration from SQL Server to the Databricks environment. I encountered a challenge where Databricks and SQL Server give different results for the date difference function. Can you please help? -- SQL SERVER SELECT DATEDIFF(MONTH, '2007-0...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

While I was pretty sure it has to do with T-SQL not following ANSI standards, I could not actually tell you what exactly the difference is. So I asked ChatGPT and here we go: The difference between DATEDIFF(month, date1, date2) in T-SQL and ANSI SQL ...

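The difference the reply describes can be demonstrated without a cluster: T-SQL's DATEDIFF(MONTH, ...) counts month boundaries crossed and ignores the day component, while Spark's months_between returns a fractional value based on day-of-month positions. A small sketch of the T-SQL semantics:

```python
from datetime import date

# T-SQL DATEDIFF(MONTH, d1, d2) counts month *boundaries crossed* between the
# two dates; the day component plays no role.
def tsql_datediff_month(d1: date, d2: date) -> int:
    return (d2.year - d1.year) * 12 + (d2.month - d1.month)

# One day apart, but a month boundary is crossed:
assert tsql_datediff_month(date(2007, 5, 31), date(2007, 6, 1)) == 1
# Twenty-nine days apart, but no boundary crossed:
assert tsql_datediff_month(date(2007, 6, 1), date(2007, 6, 30)) == 0

# Spark's months_between('2007-06-01', '2007-05-31') instead returns a small
# fraction (about 0.03), because it works from day-of-month positions. To
# reproduce the T-SQL behavior in Databricks SQL, use the same year/month
# arithmetic as above:
#   (year(d2) - year(d1)) * 12 + (month(d2) - month(d1))
```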
