Data Engineering

Forum Posts

by Maverick1 • Valued Contributor II

11-10-2021 8:50:25 PM

7564 Views
14 replies
7 kudos

Resolved! How to deploy a Databricks-managed workspace model to SageMaker from a Databricks notebook

I want to deploy a registered model from Databricks-managed MLflow to SageMaker via a Databricks notebook. As of now, I am not able to run the mlflow sagemaker build-and-push-container command directly. What configurations or steps are needed to ...

Latest Reply

Kaniz
Community Manager

01-25-2022 6:32:31 AM

Hi @Saurabh Verma, yes, that's the right process. Thanks.

by study_community • New Contributor III

01-22-2022 11:51:16 PM

1346 Views
2 replies
3 kudos

Resolved! Error creating a Delta table over an existing Delta schema

I created a Delta table through a cluster over a DBFS location. Schema: create external table tmp_db.delta_data(delta_id int, delta_name varchar(20), delta_variation decimal(10,4), delta_incoming_timestamp timestamp, delta_date date generated always ...

Latest Reply

-werners-
Esteemed Contributor III

01-25-2022 6:28:51 AM

I think VarcharType is only available from Spark 3.1 onward (see https://spark.apache.org/docs/latest/sql-ref-datatypes.html; that page is for Spark 3.2, and 3.1 also has VarcharType). Can you check your Spark version? Also, if the table definition still exists...
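
One way to do that check in code, sketched under the assumption taken from the reply that varchar support starts at Spark 3.1: parse the version string (in a notebook it is available as spark.version) and compare it with that threshold. The helper names parse_spark_version and supports_varchar are hypothetical.

```python
import re

# Threshold from the reply above: VarcharType is assumed to exist from Spark 3.1 on.
MIN_VARCHAR_VERSION = (3, 1)


def parse_spark_version(version: str) -> tuple:
    """Return (major, minor) from strings such as '3.0.1' or '3.2.1-SNAPSHOT'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized Spark version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_varchar(version: str) -> bool:
    """Hypothetical helper: True if varchar(n) columns should be accepted."""
    # Tuples compare element by element, so (3, 0) < (3, 1) <= (3, 2).
    return parse_spark_version(version) >= MIN_VARCHAR_VERSION


# In a Databricks notebook, pass the live version string instead:
# print(supports_varchar(spark.version))
print(supports_varchar("3.0.1"))  # False
print(supports_varchar("3.1.2"))  # True
```

Comparing (major, minor) tuples rather than raw strings avoids lexical surprises such as "3.10" sorting before "3.2".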

by sh_abrishami_ie • New Contributor II

01-12-2022 1:37:42 AM

3436 Views
3 replies
3 kudos

Resolved! Driver is up but is not responsive, likely due to GC.

Hi, I have a problem writing an Excel file to a mounted location. After 10 minutes, I see "Driver is up but is not responsive, likely due to GC" in the log events. I'm using the following script: df.repartition(1).write .format("com.crealytics.spark....

Latest Reply

Kaniz
Community Manager

01-25-2022 5:45:58 AM

Hi @Shokoufeh Abrishami, can you share the error stack trace or the logs?

by Soma • Valued Contributor

01-11-2022 6:30:19 AM

1385 Views
7 replies
0 kudos

Resolved! Queries regarding workspace migration to Premium

We are planning to migrate from a Standard to a Premium workspace. We need to know whether the artifacts below will be maintained: streaming job downtime (need to check), access tokens, DBFS access, production clusters/jobs, cluster IDs, job IDs, and other properties like URL ...

Latest Reply

Soma
Valued Contributor

01-25-2022 4:36:57 AM

Hi @Kaniz Fatma, then I can assume there won't be any impact on the metastore, and all the metadata (table definitions, schemas) will be available after the upgrade.

by Ketna • New Contributor

01-13-2022 11:14:36 AM

921 Views
2 replies
1 kudos

Resolved! I have included SparkJDBC42.jar in my WAR file, but when I start my application using Tomcat, I get EOFExceptions from log4j classes. What is causing this, and how can I resolve it? Please help.

Below is part of the exceptions I am getting: org.apache.catalina.startup.ContextConfig processAnnotationsJar SEVERE: Unable to process Jar entry [com/simba/spark/jdbc42/internal/apache/logging/log4j/core/pattern/ThreadIdPatternConverter.class] from Ja...

Latest Reply

Kaniz
Community Manager

01-24-2022 4:17:17 PM

Hi @Ketna Khalasi, for Log4j-related queries, please go through this post.

by anthony_cros • New Contributor

01-11-2022 8:13:24 AM

2507 Views
2 replies
0 kudos

Resolved! How to publish a notebook in order to share its URL, as a Premium Plan user?

Hi, I'm a Premium Plan user and am trying to share a notebook via URL. The link at https://docs.databricks.com/notebooks/notebooks-manage.html#publish-a-notebook states: "If you’re using Community Edition, you can publish a notebook so that you can sha...

Latest Reply

Anonymous
Not applicable

01-11-2022 4:50:14 PM

Hello @Anthony Cros - My name is Piper, and I'm a moderator for Databricks. Welcome and thank you for your question. We will give the members some time to answer your question. If needed, we will circle back around later.

by jpwp • New Contributor III

01-10-2022 5:07:28 PM

3505 Views
2 replies
1 kudos

Resolved! Adding a dependent library to a Job task permanently adds it to the entire cluster?

Why does adding a dependent library to a Job task also permanently add it to the entire cluster? I am using Python wheels, and even when I remove the dependent library from a Job task, the wheel is still part of the cluster configuration. If I then upd...

Latest Reply

Kaniz
Community Manager

01-24-2022 2:54:05 PM

If you have configured a library to install on all clusters automatically, or you select an existing terminated cluster that has libraries installed, the job execution does not wait for library installation to complete. If a job requires a specific l...

by Chris_Shehu • Valued Contributor III

01-14-2022 5:36:32 AM

3631 Views
2 replies
1 kudos

Resolved! Are there Visio stencils for Databricks?

I need to create some Visio diagrams to show different processes in Databricks. Microsoft offers an Azure stencil pack, but it doesn't include Databricks.

Latest Reply

Kaniz
Community Manager

01-24-2022 2:41:37 PM

Hi @Christopher Shehu, you can try this. It has stencils for Databricks as well as Azure Databricks.

by frank26364 • New Contributor III

12-20-2021 5:38:54 AM

6947 Views
7 replies
4 kudos

Resolved! Export Databricks results to Blob in a CSV file

Hello everyone, I want to export my data from Databricks to Blob Storage. My Databricks commands select some PDFs from my blob, run Form Recognizer, and export the output results to my blob. Here is the code: %pip install azure.storage.blob %pip install...

Latest Reply

Anonymous
Not applicable

01-21-2022 12:01:01 PM

@Francis Bouliane - Thank you for sharing the solution.
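
For readers looking for a starting point, here is a minimal sketch, not necessarily the solution Francis shared. It assumes the results fit in driver memory as a list of dicts and that you use the azure-storage-blob package the question installs: serialize the rows to CSV in memory, then upload them with BlobClient.upload_blob. The helper names rows_to_csv_bytes and upload_csv, the container and blob names, and the column names file_name and page_count are all hypothetical.

```python
import csv
import io


def rows_to_csv_bytes(rows, fieldnames):
    """Serialize a list of dicts to UTF-8 CSV bytes, header row first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


def upload_csv(blob_client, rows, fieldnames):
    """Upload rows as a single CSV blob via any client exposing upload_blob()."""
    data = rows_to_csv_bytes(rows, fieldnames)
    # overwrite=True replaces an existing blob with the same name.
    blob_client.upload_blob(data, overwrite=True)
    return len(data)


# Real usage (requires azure-storage-blob; all names below are placeholders):
# from azure.storage.blob import BlobServiceClient
# service = BlobServiceClient.from_connection_string(conn_str)
# client = service.get_blob_client(container="results", blob="output.csv")
# upload_csv(client, rows, ["file_name", "page_count"])
```

Passing the client in, rather than creating it inside the helper, keeps the CSV logic testable without Azure credentials.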

by frank26364 • New Contributor III

01-20-2022 5:36:35 AM

6392 Views
4 replies
0 kudos

Resolved! Command prompt won't let me type the Databricks token

Hi, I am trying to set up the Databricks CLI using the command prompt on my computer. I downloaded the Python 3.9 app and successfully ran the command pip install databricks-cli. When I try to set up the Databricks token, I am able to type my Databricks Ho...

Latest Reply

Anonymous
Not applicable

01-21-2022 12:03:47 PM

Hey there! You're on a roll today! Thanks for letting us know.

by William_Scardua • Valued Contributor

01-19-2022 12:12:37 PM

3434 Views
2 replies
0 kudos

Resolved! How to intercept the Spark Listener with PySpark?

Hi guys, is it possible to intercept the Spark Listener with PySpark to collect indicators like shuffle, skew ratio, etc.?

Latest Reply

-werners-
Esteemed Contributor III

01-21-2022 7:04:02 AM

Interesting question. I know that you can use the SparkListener to collect info, e.g. here. Mind that the class is written in Scala, so my first thought was that it is not possible in Python/PySpark. But Stack Overflow says it is possible, but with a lot of overhea...

by BorislavBlagoev • Valued Contributor III

01-21-2022 2:56:33 AM

1554 Views
2 replies
2 kudos

Resolved! Converting dataframe to delta.

Is it possible to convert a DataFrame to a Delta table without saving the DataFrame to storage?

Latest Reply

-werners-
Esteemed Contributor III

01-21-2022 6:33:24 AM

No, it only becomes a Delta table when you write it.

by bluetail • Contributor

01-16-2022 7:20:16 AM

11441 Views
6 replies
5 kudos

Resolved! ModuleNotFoundError: No module named 'mlflow' when running a notebook

I am running a notebook on the Coursera platform. My configuration file, Classroom-Setup, looks like this: %python spark.conf.set("com.databricks.training.module-name", "deep-learning") spark.conf.set("com.databricks.training.expected-dbr", "6.4") ...

Latest Reply

User16753724663
Valued Contributor

01-17-2022 4:21:24 AM

Hi @Maria Bruevich, from the error description, it looks like the mlflow library is not present. You can use an ML cluster, as this type of cluster already has the mlflow library. Please check the document below: https://docs.databricks.com/release-notes/r...
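
Before switching clusters, a quick way to confirm this diagnosis is to check whether mlflow is importable in the notebook's Python environment. This is a minimal sketch using only the standard library; module_available is a hypothetical helper name, and the %pip alternative mentioned in the comment is an option, not something the reply prescribes.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be imported here."""
    return importlib.util.find_spec(name) is not None


if module_available("mlflow"):
    import mlflow

    print("mlflow is installed, version", mlflow.__version__)
else:
    # On a non-ML runtime, switch to an ML cluster as suggested above,
    # or install the library into the notebook session (e.g. %pip install mlflow).
    print("mlflow is not installed in this environment")
```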

by DanVartanian • New Contributor II

01-20-2022 4:05:33 AM

3157 Views
4 replies
1 kudos

Resolved! Help trying to calculate a percentage

The image below shows what my source data is (HAVE) and what I'm trying to get to (WANT). I want to be able to calculate the percentage of bad messages (where formattedMessage = false) by source and date. I'm not sure how to achieve this in Databricks. S...

Latest Reply

-werners-
Esteemed Contributor III

01-21-2022 12:28:02 AM

You could use a window function over source and date with a sum of messagecount. This gives you the total per source/date repeated on every row. Then apply a filter on formattedMessage == false and divide messagecount by the sum above.
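
The two steps in that reply can be sketched in plain Python so the arithmetic is easy to follow: first compute the per-(source, date) total that the window sum would repeat on every row, then keep only the formattedMessage == False rows and divide. The sample rows are made up, and the column names are assumptions based on the question and reply (the HAVE table itself is not visible). A PySpark version of the same idea is included as untested comments.

```python
from collections import defaultdict

# Hypothetical sample rows shaped like the question's HAVE table.
rows = [
    {"source": "A", "date": "2022-01-19", "formattedMessage": True, "messagecount": 90},
    {"source": "A", "date": "2022-01-19", "formattedMessage": False, "messagecount": 10},
    {"source": "B", "date": "2022-01-19", "formattedMessage": True, "messagecount": 30},
    {"source": "B", "date": "2022-01-19", "formattedMessage": False, "messagecount": 20},
]


def bad_message_percentages(rows):
    # Step 1: the window sum, i.e. total messagecount per (source, date).
    totals = defaultdict(int)
    for r in rows:
        totals[(r["source"], r["date"])] += r["messagecount"]
    # Step 2: keep formattedMessage == False rows and divide by that total.
    result = []
    for r in rows:
        if not r["formattedMessage"]:
            total = totals[(r["source"], r["date"])]
            result.append((r["source"], r["date"], 100.0 * r["messagecount"] / total))
    return result


print(bad_message_percentages(rows))
# → [('A', '2022-01-19', 10.0), ('B', '2022-01-19', 40.0)]

# PySpark equivalent (untested sketch, assuming a DataFrame `df` with these columns):
# from pyspark.sql import Window, functions as F
# w = Window.partitionBy("source", "date")
# result = (df.withColumn("total", F.sum("messagecount").over(w))
#             .filter(F.col("formattedMessage") == False)
#             .withColumn("pct_bad", 100 * F.col("messagecount") / F.col("total")))
```

The order matters: the total has to be computed before the filter, otherwise the sum would only cover the bad messages and every percentage would come out as 100.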
