Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Ketna
by New Contributor
  • 1441 Views
  • 2 replies
  • 1 kudos

Resolved! I have included SparkJDBC42.jar in my WAR file, but when I start my application using Tomcat, I get EOFExceptions from log4j classes. What is causing this and how can I resolve it? Please help.

Below is part of the exceptions I am getting: org.apache.catalina.startup.ContextConfig processAnnotationsJar SEVERE: Unable to process Jar entry [com/simba/spark/jdbc42/internal/apache/logging/log4j/core/pattern/ThreadIdPatternConverter.class] from Ja...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Ketna Khalasi, for Log4j-related queries, please go through this post.

1 More Replies
anthony_cros
by New Contributor
  • 3266 Views
  • 2 replies
  • 0 kudos

Resolved! How to publish a notebook in order to share its URL, as a Premium Plan user?

Hi, I'm a Premium Plan user and am trying to share a notebook via URL. The link at https://docs.databricks.com/notebooks/notebooks-manage.html#publish-a-notebook states: "If you’re using Community Edition, you can publish a notebook so that you can sha...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hello @Anthony Cros​ - My name is Piper, and I'm a moderator for Databricks. Welcome and thank you for your question. We will give the members some time to answer your question. If needed, we will circle back around later.

1 More Replies
jpwp
by New Contributor III
  • 4567 Views
  • 2 replies
  • 1 kudos

Resolved! Adding a dependent library to a Job task permanently adds it to the entire cluster?

Why does adding a dependent library to a Job task also permanently add it to the entire cluster? I am using Python wheels, and even when I remove the dependent library from a Job task, the wheel is still part of the cluster configuration. If I then upd...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

If you have configured a library to install on all clusters automatically, or you select an existing terminated cluster that has libraries installed, the job execution does not wait for library installation to complete. If a job requires a specific l...
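For reference, a minimal sketch of declaring a wheel as a task-scoped dependent library through the Jobs 2.1 REST API. The workspace URL, token, cluster ID, wheel path, package name, and entry point below are placeholders, not values from this thread:

```python
import requests

# Hypothetical placeholders; replace with your own workspace details.
HOST = "https://<workspace>.azuredatabricks.net"
TOKEN = "<personal-access-token>"

job_spec = {
    "name": "example-wheel-job",
    "tasks": [{
        "task_key": "main",
        "existing_cluster_id": "<cluster-id>",
        "python_wheel_task": {"package_name": "mylib", "entry_point": "run"},
        # Dependent library scoped to this task: a wheel already uploaded to DBFS.
        "libraries": [{"whl": "dbfs:/FileStore/wheels/mylib-0.1-py3-none-any.whl"}],
    }],
}

resp = requests.post(
    f"{HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_spec,
)
print(resp.json())
```

The library is declared per task here, but as the question notes, once installed on an existing all-purpose cluster it remains part of that cluster's configuration.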

1 More Replies
Chris_Shehu
by Valued Contributor III
  • 5305 Views
  • 2 replies
  • 1 kudos

Resolved! Are there Visio stencils for Databricks?

I need to create some Visio diagrams to show different processes in Databricks. Microsoft offers an Azure pack, but it doesn't include Databricks.

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Christopher Shehu, you can try this. It has the stencils for Databricks as well as Azure Databricks.

1 More Replies
frank26364
by New Contributor III
  • 28211 Views
  • 7 replies
  • 4 kudos

Resolved! Export Databricks results to Blob in a csv file

Hello everyone, I want to export my data from Databricks to the blob. My Databricks commands select some PDFs from my blob, run Form Recognizer, and export the output results to my blob. Here is the code: %pip install azure.storage.blob %pip install...
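A minimal sketch of the kind of CSV export the post describes, assuming the results sit in a Spark DataFrame named df and using the azure.storage.blob package the post already installs. The connection string, container, and blob names are placeholders:

```python
from azure.storage.blob import BlobServiceClient

# Hypothetical placeholders; replace with your own storage details.
conn_str = "<storage-account-connection-string>"
container = "<container-name>"

# Collect the (small) result DataFrame to the driver and serialize it as CSV.
csv_bytes = df.toPandas().to_csv(index=False).encode("utf-8")

service = BlobServiceClient.from_connection_string(conn_str)
blob = service.get_blob_client(container=container, blob="results/output.csv")
blob.upload_blob(csv_bytes, overwrite=True)
```

For large results, writing with Spark to a mounted or abfss:// path avoids collecting everything to the driver.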

Latest Reply
Anonymous
Not applicable
  • 4 kudos

@Francis Bouliane​ - Thank you for sharing the solution.

6 More Replies
frank26364
by New Contributor III
  • 11888 Views
  • 4 replies
  • 0 kudos

Resolved! Command prompt won't let me type the Databricks token

Hi, I am trying to set up the Databricks CLI using the command prompt on my computer. I downloaded the Python 3.9 app and successfully ran the command pip install databricks-cli. When I try to set up the Databricks token, I am able to type my Databricks Ho...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hey there! You're on a roll today! Thanks for letting us know.

3 More Replies
William_Scardua
by Valued Contributor
  • 4319 Views
  • 2 replies
  • 0 kudos

Resolved! how to Intercept Spark Listener with Pyspark ?

Hi guys, is it possible to intercept the Spark listener with PySpark to collect indicators like shuffle, skew ratio, etc.?

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

Interesting question. I know that you can use the SparkListener to collect info, e.g. here. Mind that the class is written in Scala, so my first thought was that it is not possible in Python/PySpark. But SO says it is possible, although with a lot of overhea...

1 More Replies
BorislavBlagoev
by Valued Contributor III
  • 2289 Views
  • 2 replies
  • 4 kudos

Resolved! Converting dataframe to delta.

Is it possible to convert the dataframe to a delta table without saving the dataframe on the storage?

Latest Reply
-werners-
Esteemed Contributor III
  • 4 kudos

No, it will only become a Delta table when you write it out.
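In other words, the DataFrame has to be written in Delta format before a Delta table exists. A minimal sketch; the table name and path are placeholders:

```python
# Write the DataFrame as a managed Delta table...
df.write.format("delta").mode("overwrite").saveAsTable("my_database.my_table")

# ...or as Delta files at a storage path that can later be registered as a table.
df.write.format("delta").mode("overwrite").save("/mnt/datalake/my_table")
```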

1 More Replies
bluetail
by Contributor
  • 13482 Views
  • 6 replies
  • 5 kudos

Resolved! ModuleNotFoundError: No module named 'mlflow' when running a notebook

I am running a notebook on the Coursera platform. My configuration file, Classroom-Setup, looks like this: %python spark.conf.set("com.databricks.training.module-name", "deep-learning") spark.conf.set("com.databricks.training.expected-dbr", "6.4") ...

Latest Reply
User16753724663
Valued Contributor
  • 5 kudos

Hi @Maria Bruevich, from the error description it looks like the mlflow library is not present. You can use an ML cluster, as these types of clusters already have the mlflow library. Please check the document below: https://docs.databricks.com/release-notes/r...
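If switching to an ML runtime cluster isn't an option, a notebook-scoped install is one possible workaround; this is a sketch, not something from the thread:

```python
# Notebook-scoped install (run in its own cell at the top of the notebook):
%pip install mlflow

# Then, in a later cell, the import should resolve:
import mlflow
print(mlflow.__version__)
```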

5 More Replies
DanVartanian
by New Contributor II
  • 4935 Views
  • 4 replies
  • 1 kudos

Resolved! Help trying to calculate a percentage

The image below shows what my source data is (HAVE) and what I'm trying to get to (WANT). I want to be able to calculate the percentage of bad messages (where formattedMessage = false) by source and date. I'm not sure how to achieve this in DatabricksS...

[Attached images: HAVE and WANT]
Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

You could use a window function over source and date with a sum of messageCount. This gives you the total per source/date repeated on every line. Then apply a filter on formattedMessage == false and divide messageCount by the sum above.
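A minimal PySpark sketch of that approach, assuming columns named source, date, formattedMessage (boolean), and messageCount; the names are inferred from the question, not confirmed:

```python
from pyspark.sql import functions as F
from pyspark.sql.window import Window

w = Window.partitionBy("source", "date")

bad_pct = (
    df.withColumn("total", F.sum("messageCount").over(w))     # total messages per source/date on every row
      .filter(F.col("formattedMessage") == F.lit(False))      # keep only the bad rows
      .withColumn("pct_bad", 100 * F.col("messageCount") / F.col("total"))
      .select("source", "date", "pct_bad")
)
```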

3 More Replies
SettlerOfCatan
by New Contributor
  • 2094 Views
  • 0 replies
  • 0 kudos

Access data within the blob storage without downloading

Our customer is using Azure’s blob storage service to save big files so that we can work with them using an Azure online service, like Databricks. We want to read and work with these files with a computing resource obtained by Azure directly, without d...
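A minimal sketch of reading those blobs directly from Databricks without copying them locally, assuming account-key access from a notebook where spark is the ambient SparkSession; the storage account, container, key, and path are placeholders:

```python
# Hypothetical placeholders for the storage account details.
storage_account = "<storage-account>"
container = "<container>"

spark.conf.set(
    f"fs.azure.account.key.{storage_account}.blob.core.windows.net",
    "<account-key>",
)

# Read each blob (e.g. a PDF) as bytes directly from blob storage.
pdfs = (spark.read
        .format("binaryFile")
        .load(f"wasbs://{container}@{storage_account}.blob.core.windows.net/pdfs/"))
```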

blob-storage Azure-ML fileytypes blob
Azure_Data_Eng1
by New Contributor
  • 455 Views
  • 0 replies
  • 0 kudos

data=[['x', 20220118, 'FALSE', 3],['x', 20220118, 'TRUE', 97],['x', 20220119, 'FALSE', 1],['x'...

data=[['x', 20220118, 'FALSE', 3],['x', 20220118, 'TRUE', 97],['x', 20220119, 'FALSE', 1],['x', 20220119, 'TRUE', 49],['Y', 20220118, 'FALSE', 100],['Y', 20220118, 'TRUE', 900],['Y', 20220119, 'FALSE', 200],['Y', 20220119, 'TRUE', 800]] df=spark.creat...
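The excerpt is cut off, but the likely continuation is a spark.createDataFrame call; a sketch with assumed column names (the names are not shown in the post):

```python
data = [['x', 20220118, 'FALSE', 3], ['x', 20220118, 'TRUE', 97],
        ['x', 20220119, 'FALSE', 1], ['x', 20220119, 'TRUE', 49],
        ['Y', 20220118, 'FALSE', 100], ['Y', 20220118, 'TRUE', 900],
        ['Y', 20220119, 'FALSE', 200], ['Y', 20220119, 'TRUE', 800]]

# Column names are assumptions for illustration; the post does not show them.
df = spark.createDataFrame(data, ["id", "date", "flag", "count"])
df.show()
```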

prasadvaze
by Valued Contributor II
  • 4936 Views
  • 8 replies
  • 2 kudos

Resolved! SQL endpoint is unable to connect to external hive metastore ( Azure databricks)

Using Azure Databricks, I have set up a SQL endpoint with connection details that match the global init script. I am able to browse tables from a regular cluster in the Data Engineering module, but I get the error below when trying a query using the SQL endpoint...

Latest Reply
prasadvaze
Valued Contributor II
  • 2 kudos

@Prabakar Ammeappin @Kaniz Fatma Also, I found out that after a Delta table is created in the external metastore (and the table data resides in ADLS), I do not need to provide ADLS connection details in the SQL endpoint settings. I only provided...

7 More Replies
Soma
by Valued Contributor
  • 2947 Views
  • 3 replies
  • 1 kudos

Resolved! AutoLoader with Custom Queue

Hi everyone, can someone help with creating a custom queue for Auto Loader, as described here? The default FlushWithClose event is not getting created when my data is uploaded to blob, as described in the Azure Databricks docs: cloudFiles.queueName - the name of the Azure queue. If...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

You need to set up the notification service for blob/ADLS as described here: https://docs.databricks.com/spark/latest/structured-streaming/auto-loader-gen2.html#cloud-resource-management. setUpNotificationServices will return a queue name which can later be used in au...
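Once the queue exists, a minimal sketch of pointing Auto Loader at it in notification mode. cloudFiles.queueName is the option quoted in the question and the queue name comes from setUpNotificationServices; the file format, connection string, and input path below are placeholders:

```python
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")                      # file format is an assumption
      .option("cloudFiles.useNotifications", "true")            # use the notification queue instead of directory listing
      .option("cloudFiles.queueName", "<existing-queue-name>")  # queue returned by setUpNotificationServices
      .option("cloudFiles.connectionString", "<queue-connection-string>")
      .load("abfss://<container>@<account>.dfs.core.windows.net/input/"))
```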

2 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group