Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

ADataBricksSP
by New Contributor
  • 763 Views
  • 1 reply
  • 1 kudos

Resolved! Community Edition SignUp option not visible

I would like to do some practical scenario testing through Community Edition. I looked for the Community Edition sign-up option, but it is not visible on the login page, which shows only sign-in. Can anyone help?

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 1 kudos

Hi @ADataBricksSP, please use the URL below to sign up as a new user. Fill in all the required details and click Continue; you will then find the option to sign up for Community Edition: Databricks Community Signup Link 

danatsafe
by New Contributor
  • 3716 Views
  • 3 replies
  • 0 kudos

Amazon returns a 403 error code when trying to access an S3 Bucket

Hey! So far I have followed along with the Configure S3 access with instance profiles article to grant my cluster access to an S3 bucket. I have also made sure to disable IAM role passthrough on the cluster. Upon querying the bucket through a noteboo...

Latest Reply
winojoe
New Contributor III
  • 0 kudos

I had the same issue and I found a solution. For me, the permission problems only exist when the cluster's (compute's) access mode is "Shared No Isolation". When the access mode is either "Shared" or "Single User", the IAM configuration seems to a...

2 More Replies
cmditch
by New Contributor II
  • 1273 Views
  • 2 replies
  • 0 kudos

Intermittent secret resolution error service fault in GCP

Experiencing the error below in GCP when starting a cluster (both manually and in jobs). It's causing our ETL and other production jobs to fail multiple times a week. It's intermittent, but requires manual intervention to retry scheduled jobs. run fai...

Latest Reply
cmditch
New Contributor II
  • 0 kudos

Thanks @Kaniz_Fatma. 1 and 2 are confirmed fine. I would imagine 3 would not result in intermittent failures if it were a config issue, but perhaps it's another network-related issue that would be susceptible to intermittent failure. The link you provid...

1 More Replies
Starki
by New Contributor II
  • 1938 Views
  • 3 replies
  • 0 kudos

StreamingQueryListener onQueryTerminated in Databricks Job

I am defining a StreamingQueryListener that collects metrics on my Spark Structured Streaming tasks and sends them to a Prometheus Pushgateway. When the job is terminated, I want to use onQueryTerminated to clean up the metrics for each job from th...

Data Engineering
onQueryTerminated
StreamingQueryListener
Latest Reply
shan_chandra
Esteemed Contributor
  • 0 kudos

@Starki - Per the documentation, StreamingQueryListener.onQueryTerminated is called when the query is stopped, e.g., by StreamingQuery.stop, and each of these Python observable APIs works asynchronously. https://www.databricks.com/blog/2022/05/27/how-to-moni...

2 More Replies
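The cleanup pattern discussed in this thread can be sketched as follows. This is only a sketch: in a real Databricks job you would subclass pyspark.sql.streaming.StreamingQueryListener and delete the terminated query's series from the Pushgateway (for example via prometheus_client, an assumption here); a stand-in base class and simplified event objects are used so the bookkeeping is self-contained.

```python
# Sketch of the onQueryTerminated cleanup pattern. In Databricks you would
# subclass pyspark.sql.streaming.StreamingQueryListener; a stand-in base
# class is used here so the example runs anywhere. Note that real pyspark
# progress events carry their fields under event.progress.
class StreamingQueryListener:
    """Stand-in for pyspark.sql.streaming.StreamingQueryListener."""
    def onQueryStarted(self, event): pass
    def onQueryProgress(self, event): pass
    def onQueryTerminated(self, event): pass

class PushgatewayCleanupListener(StreamingQueryListener):
    def __init__(self):
        # query id -> last reported metric (stand-in for Pushgateway state)
        self.tracked = {}

    def onQueryProgress(self, event):
        # record/refresh metrics for the running query
        self.tracked[event.id] = event.numInputRows

    def onQueryTerminated(self, event):
        # remove the terminated query's series; with a real Pushgateway you
        # would issue a delete against the gateway here (assumption)
        self.tracked.pop(event.id, None)
```

The key design point is that the listener keeps just enough state (query id to metric mapping) to know which series to delete when onQueryTerminated fires.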
Deloitte_DS
by New Contributor II
  • 2292 Views
  • 2 replies
  • 0 kudos

Unable to install poppler-utils

Hi, I'm trying to install the system-level package poppler-utils for the cluster. I added the following line to the init.sh script: sudo apt-get -f -y install poppler-utils. I got the following error: PDFInfoNotInstalledError: Unable to get page count. Is ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Deloitte_DS , You can use an init script to install system-level packages at the cluster level in Databricks. An init script is a shell script that runs during the startup of each cluster node before the Spark driver or worker JVM starts. You can...

1 More Replies
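As a concrete illustration of the reply above, a cluster-scoped init script installing poppler-utils might look like the sketch below. This is an assumption-laden example, not an official script: the apt package name and non-interactive flags are standard Ubuntu, but verify them against your Databricks Runtime image.

```bash
#!/bin/bash
# Cluster-scoped init script: install poppler-utils on every node at startup.
set -e
sudo apt-get update -y
sudo DEBIAN_FRONTEND=noninteractive apt-get install -y poppler-utils
# Sanity check: pdfinfo is what pdf2image's page-count call shells out to,
# which is where PDFInfoNotInstalledError comes from when it is missing.
command -v pdfinfo
```

Upload the script to a workspace or volume path and attach it to the cluster under Advanced options > Init scripts so it runs before the Spark driver and workers start.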
User16765131552
by Contributor III
  • 1466 Views
  • 2 replies
  • 0 kudos

Resolved! Disable welcome emails

Is it possible to disable the welcome emails that go to users when added to the workspace?

Latest Reply
User16765131552
Contributor III
  • 0 kudos

I have found that it is possible to suppress welcome emails if users are added via the API using flags.

1 More Replies
Ravikumashi
by Contributor
  • 1295 Views
  • 3 replies
  • 1 kudos

maven libraries installation issue on 11.3/12.2 LTS

We've encountered an issue while attempting to install Maven libraries on Databricks clusters running 11.3 LTS. Specifically, we are encountering SSL handshake errors during the installation process. It's worth noting that these same libraries install w...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Ravikumashi, The SSL handshake errors you're encountering during the Maven library installation on Databricks clusters version 11.3 LTS might be due to transient Maven issues or conflicts with existing libraries. There are no specific solutions p...

2 More Replies
Loki
by New Contributor III
  • 2312 Views
  • 4 replies
  • 1 kudos

Resolved! Accessing ADLS Gen 2 Raw Files with UC ?

We are using a service principal to access data from raw files such as JSON and CSV. I saw a video suggesting that it could be done via Unity Catalog as well. Could someone comment on this, please?

Latest Reply
donkyhotes
New Contributor II
  • 1 kudos

@Loki wrote: We are using a service principal to access data from raw files such as JSON and CSV. I saw a video suggesting that it could be done via Unity Catalog as well. Could someone comment on this please? That's great! Service principals are a...

3 More Replies
AndLuffman
by New Contributor II
  • 1728 Views
  • 5 replies
  • 1 kudos

QRY Results incorrect but Exported data is OK

I ran the query "SELECT * FROM fact_Orders". This presented a lot of garbage: the correct column headers, but the contents were extremely random, e.g. blanks in the key column, VAT rates of 12282384234E-45. When I export to CSV, it presents fi...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @AndLuffman, The issue you're experiencing might be related to the limitations of the Databricks interface when dealing with large datasets with many columns. The interface has a limit on the number of rows it can display at once, which can lead t...

4 More Replies
Erik_L
by Contributor II
  • 1273 Views
  • 2 replies
  • 1 kudos

Structured Streaming from TimescaleDB?

I realize that the best practice would be to integrate our service with Kafka as a streaming source for Databricks, but given that the service already stores data into TimescaleDB, how can I stream data from TimescaleDB into DBX? Debezium doesn't wor...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Erik_L, Currently, there is no direct way to stream data from TimescaleDB into Databricks. However, there are a couple of ways you can approach this: 1. Kafka Integration: You can integrate Kafka into your service for consuming data. Kafka i...

1 More Replies
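Beyond Kafka, the other common workaround is incremental high-watermark polling: each poll fetches only rows newer than the last timestamp seen. The sketch below shows that generic pattern; sqlite3 stands in for TimescaleDB here (an assumption for the sake of a self-contained example), and on Databricks you would issue the equivalent filtered query through the Spark JDBC reader and append each batch to a Delta table.

```python
import sqlite3

# High-watermark polling: fetch only rows newer than the last timestamp seen.
# sqlite3 stands in for TimescaleDB; the query shape is the same over JDBC.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (ts INTEGER, value REAL)")
conn.executemany("INSERT INTO metrics VALUES (?, ?)",
                 [(1, 10.0), (2, 20.0), (3, 30.0)])

def poll_new_rows(conn, last_ts):
    """Return rows with ts > last_ts plus the new high watermark."""
    rows = conn.execute(
        "SELECT ts, value FROM metrics WHERE ts > ? ORDER BY ts",
        (last_ts,)).fetchall()
    return rows, (rows[-1][0] if rows else last_ts)

rows, watermark = poll_new_rows(conn, 1)  # picks up only ts=2 and ts=3
```

The trade-off versus Debezium/Kafka is latency and the need for a monotonically increasing column, but it requires no change to the upstream service.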
kg6ka
by New Contributor
  • 1729 Views
  • 2 replies
  • 1 kudos

Is it possible to do without the github token and integration?

Hey guys, I have a question. I have Databricks jobs in a workflow that are linked to my Databricks repo, which contains the necessary scripts for one job or another. That is, the jobs are linked to the Databricks repo. The main code is developed in Gi...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @kg6ka, Based on the provided information, you are trying to push code from Github to Databricks repo using Databricks REST API. However, the error message you are getting indicates that you are missing Git provider credentials.  According to the...

1 More Replies
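The missing Git provider credentials the reply mentions can be registered once through the Databricks Git Credentials API (POST /api/2.0/git-credentials). The sketch below builds such a request with only the standard library; the host, tokens, and username are placeholders, and sending the request is left commented out.

```python
import json
import urllib.request

def build_git_credential_request(host, databricks_token, git_username, git_pat):
    """Build (not send) a POST to the Databricks Git Credentials API."""
    payload = {
        "git_provider": "gitHub",
        "git_username": git_username,
        "personal_access_token": git_pat,
    }
    return urllib.request.Request(
        f"{host}/api/2.0/git-credentials",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {databricks_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_git_credential_request(
    "https://example.cloud.databricks.com", "dapiXXXX", "my-user", "ghp_XXXX")
# urllib.request.urlopen(req)  # uncomment to actually register the credential
```

Once a credential is registered for the workspace user (or service principal) that runs the jobs, Repos API calls that touch the Git provider should stop failing with the missing-credentials error.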
romangehrn
by New Contributor
  • 515 Views
  • 0 replies
  • 0 kudos

speed issue DBR 13+ for R

I got a notebook running on DBR 12.2 with the following R code: install.packages("microbenchmark") install.packages("furrr") library(microbenchmark) library(tidyverse) # example tibble df_test <- tibble(id = 1:100000, street_raw = rep("Bahnhofs...

Data Engineering
DBR 13
performance slow
R
speed error
210573
by New Contributor
  • 2095 Views
  • 4 replies
  • 2 kudos

Unable to stream from google pub/sub

I am trying to run the code below to subscribe to a Pub/Sub topic, but it throws this exception: java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/DataSourceV2. I have tried using all versions of https://mvnrepository.com/artifact/com.google...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 2 kudos

Hi @210573, Databricks now supports Pub/Sub streaming natively, so you can use Pub/Sub streaming for your use case. For more info, visit the official URL below: PUB/SUB with Databricks 

3 More Replies
vonjack
by New Contributor II
  • 1389 Views
  • 3 replies
  • 1 kudos

Resolved! How to unload a Jar for UDF without restart spark context?

In a Scala notebook in Databricks, I created a temporary function with a certain Jar and class name. Then I want to update the Jar. But without restarting the context, I cannot reload the new Jar; the temporary function always reuses the old classes....

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @vonjack, I'm sorry, but based on the information provided, there isn't a direct way to refresh the classes of a temporary function in Databricks without restarting the context.  Databricks does support updating JARs, including replacing a default...

2 More Replies
sparkrookie
by New Contributor II
  • 1334 Views
  • 2 replies
  • 0 kudos

Structured Streaming Delta Table - Reading and writing from same table

Hi, I have a structured streaming job that reads from a Delta table "A" and pushes to another Delta table "B". A schema: group_key, id, timestamp, value. B schema: group_key, watermark_timestamp, derived_value. One requirement is that I need to get the m...

Latest Reply
KarenGalvez
New Contributor III
  • 0 kudos

Navigating the intricacies of structured streaming and Delta table operations on the same platform has been a stimulating yet demanding task. The community at Databricks has been instrumental in clarifying nuances. As I delve deeper, I'm reminded of ...

1 More Replies
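One common way to meet a "max timestamp per group" requirement like the one in this post is to compute the aggregate inside a foreachBatch function and merge it into table B. The sketch below shows just that per-batch bookkeeping with plain Python stand-ins (an assumption for self-containment); in a real job this would be a groupBy().agg(max(...)) over the micro-batch DataFrame followed by a Delta MERGE.

```python
# Per-microbatch bookkeeping: keep the max timestamp seen per group_key.
# Plain dicts and tuples stand in for the Delta tables; in a real job this
# logic lives inside a foreachBatch function doing groupBy/agg plus MERGE.
def update_watermarks(state, batch_rows):
    """state: {group_key: watermark_ts}; batch_rows: (group_key, ts, value)."""
    for group_key, ts, _value in batch_rows:
        if ts > state.get(group_key, float("-inf")):
            state[group_key] = ts
    return state

state = {}
update_watermarks(state, [("a", 1, 10), ("b", 5, 20), ("a", 3, 30)])
```

Keeping the watermark state in table B itself (read-modify-merge per batch) avoids the read-and-write-same-table hazard the thread title alludes to.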