Data Engineering

Forum Posts

by User16826994223 • Databricks Employee

06-11-2021 8:04:54 AM

1156 Views
0 replies
1 kudos

Can I delete a user with the SCIM API without the user ID, for example by email ID?

I have a case where I don't know the user's ID, but I have the email of the user I need to delete. Is it possible to pass the email ID to the SCIM API to delete the user? https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/late...
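
One possible approach, sketched under assumptions rather than confirmed by this thread: SCIM deletes are addressed by ID, so the email would first be resolved to an ID with a filtered list call, and the delete issued second. The endpoint path, the `userName` filter, and the `Resources` response key below are taken from the SCIM 2.0 conventions and should be checked against the workspace's API reference; the helper names are hypothetical.

```python
from urllib.parse import quote

# Assumed base path of the SCIM Users endpoint; verify for your workspace.
SCIM_USERS_PATH = "/api/2.0/preview/scim/v2/Users"


def lookup_url(host, email):
    """Build the GET URL that lists users whose userName equals the email."""
    # SCIM filter syntax (RFC 7644). Whether the value must be quoted
    # for this service is an assumption to verify.
    scim_filter = f'userName eq "{email}"'
    return f"{host.rstrip('/')}{SCIM_USERS_PATH}?filter={quote(scim_filter)}"


def extract_user_id(list_response):
    """Pull the single matching user's id out of a SCIM list response."""
    resources = list_response.get("Resources", [])
    if len(resources) != 1:
        # Refuse to guess when the email matches zero or several users.
        raise LookupError(f"expected exactly one user, found {len(resources)}")
    return resources[0]["id"]


def delete_url(host, user_id):
    """Build the DELETE URL for a user id."""
    return f"{host.rstrip('/')}{SCIM_USERS_PATH}/{user_id}"
```

In use, the first URL would be fetched with an authenticated GET, the JSON body passed to `extract_user_id`, and the result used for an authenticated DELETE. Failing on anything other than exactly one match keeps a mistyped email from deleting the wrong account.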

by Anonymous • Not applicable

06-10-2021 9:12:05 PM

2193 Views
1 replies
1 kudos

Resolved! What are the benefits of Databricks? How is it different than Open Source Spark?

Latest Reply

Digan_Parikh
Databricks Employee

06-11-2021 4:43:00 AM

1 kudos

High level: Check this out for a detailed comparison - https://databricks.com/spark/comparing-databricks-to-apache-spark

by Anonymous • Not applicable

06-10-2021 9:16:12 PM

2377 Views
1 replies
0 kudos

Resolved! How long does a task have to be in the queue before the cluster autoscales?

Latest Reply

Ryan_Chynoweth
Databricks Employee

06-11-2021 1:46:00 AM

0 kudos

There are two types of autoscaling in Databricks: standard and optimized. In both, when tasks are submitted the cluster immediately begins scaling up to execute as many of them in parallel as possible. Scaling down is different. In optimized autoscalin...

by Anonymous • Not applicable

06-10-2021 9:14:10 PM

5152 Views
1 replies
0 kudos

Resolved! Can we set up alerts on cluster metrics / job failures in Databricks?

Latest Reply

User16019159252
Databricks Employee

06-11-2021 12:29:00 AM

0 kudos

Yes, you can set up alerts. Email alerts can be sent in case of job failure, success, or timeout. You can set alerts up for job start, job success, and job failure (including skipped jobs), providing multiple comma-separated email addresses for each alert type. Y...
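
As a rough illustration of the per-event address lists described above, the sketch below turns comma-separated strings into an `email_notifications` block of the kind the Jobs API accepts in job settings. The field names (`on_start`, `on_success`, `on_failure`, `no_alert_for_skipped_runs`) are assumed from the Jobs API reference and should be verified for your API version; the helper name is hypothetical.

```python
def build_email_notifications(on_start="", on_success="", on_failure="",
                              alert_on_skipped=True):
    """Turn comma-separated address strings into an email_notifications dict."""

    def split(addresses):
        # Drop surrounding whitespace and empty entries such as "a@x.com,,".
        return [a.strip() for a in addresses.split(",") if a.strip()]

    settings = {}
    for key, raw in (("on_start", on_start),
                     ("on_success", on_success),
                     ("on_failure", on_failure)):
        addresses = split(raw)
        if addresses:
            settings[key] = addresses

    # Assumed semantics: when true, failure emails are not sent for skipped runs.
    settings["no_alert_for_skipped_runs"] = not alert_on_skipped
    return settings


example = build_email_notifications(
    on_failure="oncall@example.com, data-team@example.com")
print(example)
```

The resulting dict would be placed under the `email_notifications` key of the job settings when creating or updating a job.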

by Anonymous • Not applicable

06-10-2021 7:27:19 PM

1176 Views
0 replies
0 kudos

Using multiple clouds

Are there recommendations and/or examples of leveraging AWS and Azure with Databricks? If so, are there any best practices to follow? We want to ensure we avoid expensive data transfer across clouds.

by Anonymous • Not applicable

06-10-2021 7:24:51 PM

2072 Views
0 replies
0 kudos

Automatically create folder structure

I imported one workspace into another and noticed several RESOURCE_DOES_NOT_EXIST errors caused by the folder structure of the workspace (despite importing the workspace as well). See the example below: Get: https://dbc-9d482d3a-f...

by Anonymous • Not applicable

06-10-2021 2:59:39 PM

1677 Views
1 replies
0 kudos

Resolved! What is the frequency of usage log delivery?

Latest Reply

Anonymous
Not applicable

06-10-2021 9:07:00 AM

0 kudos

Hi Brinda, it's daily. https://docs.databricks.com/administration-guide/account-settings/billable-usage-delivery.html#high-level-flow

by User16752241457 • Databricks Employee

06-10-2021 10:57:09 AM

2685 Views
1 replies
0 kudos

Saving display() plots

Is there an easy way to save the plots generated by the display() command?

Latest Reply

User16788317454
Databricks Employee

06-10-2021 12:13:25 PM

0 kudos

Plots generated via the display() command are automatically saved under /FileStore/plots. See the documentation for more info: https://docs.databricks.com/data/filestore.html#filestore. However, perhaps an easier approach to save/revisit plots is to u...

by User16788317454 • Databricks Employee

06-10-2021 10:53:28 AM

1792 Views
1 replies
0 kudos

Resolved! I have a single node XGBoost model written in Python. How can I scale it with Spark?

Latest Reply

j_weaver
Databricks Employee

06-10-2021 11:41:45 AM

0 kudos

If you are talking about distributed training of a single XGBoost model, there is no built-in capability in SparkML. SparkML supports gradient boosted trees, but not XGBoost specifically. However, there are 3rd party packages, such as XGBoost4J that ...

by j_weaver • Databricks Employee

06-10-2021 10:59:37 AM

2085 Views
1 replies
0 kudos

Resolved! How can I scale my neural network with Spark? I'm building a fully connected tensorflow.keras model.

Latest Reply

User16788317454
Databricks Employee

06-10-2021 11:35:04 AM

0 kudos

With Spark, there are a few ways you can scale your model: training, hyperparameter tuning, and inference. If you're looking to train one model across multiple workers, you can leverage Horovod. It's an open source project designed to simplify distributed neur...

by jose_gonzalez • Databricks Employee

06-04-2021 11:54:24 AM

1685 Views
2 replies
0 kudos

Cluster goes unresponsive after installing a library

Right after I install a library on my cluster, the cluster becomes unresponsive and nothing runs. How can I solve this issue?

Latest Reply

jose_gonzalez
Databricks Employee

06-10-2021 11:31:22 AM

0 kudos

It is a standard cluster, and it is happening for all libraries. Is there a way to debug this or show the error messages, if any?

1 More Replies

by j_weaver • Databricks Employee

06-10-2021 10:57:03 AM

1948 Views
1 replies
0 kudos

Resolved! When should I use pandas, PySpark, and Koalas?

Latest Reply

User16752246141
Databricks Employee

06-10-2021 10:59:14 AM

0 kudos

pandas works for single-machine computations, so any pandas code you write on Databricks will run on the driver of the cluster. PySpark and Koalas are both distributed frameworks for when you have large datasets. You can use PySpark and Koalas inte...

by Joseph_B • Databricks Employee

06-09-2021 5:51:24 PM

2120 Views
1 replies
0 kudos

When doing hyperparameter tuning with Hyperopt, when should I use SparkTrials? Does it work with both single-machine ML (like sklearn) and distributed ML (like Apache Spark ML)?

I want to know how to use Hyperopt in different situations:
- Tuning a single-machine algorithm from scikit-learn or single-node TensorFlow
- Tuning a distributed algorithm from Spark ML or distributed TensorFlow / Horovod

Latest Reply

Joseph_B
Databricks Employee

06-09-2021 5:56:20 PM

0 kudos

The right question to ask is indeed: is the algorithm you want to tune single-machine or distributed? If it's a single-machine algorithm like any from scikit-learn, then you can use SparkTrials with Hyperopt to distribute hyperparameter tuning. If it's...

by FrancisLau1897 • New Contributor

08-03-2018 10:35:22 AM

23178 Views
7 replies
0 kudos

Getting "java.lang.ClassNotFoundException: Failed to find data source: xml" error when loading XML

Both of the following commands fail:
    df1 = sqlContext.read.format("xml").load(loadPath)
    df2 = sqlContext.read.format("com.databricks.spark.xml").load(loadPath)
with the following error message:
    java.lang.ClassNotFoundException: Failed to find data sour...

Latest Reply

alvaroagx
New Contributor II

06-09-2021 4:39:44 PM

0 kudos

Hi, if you are getting this error, it is because the com.sun.xml.bind library is now obsolete. You need to create a library from the org.jvnet.jaxb2.maven package using Maven Central and attach it to the cluster. Then you will be able to use xml...

6 More Replies

by Digan_Parikh • Databricks Employee

06-09-2021 1:47:03 PM

2344 Views
0 replies
0 kudos

Widgets - Way to validate config parameters

Yes, you can use the widgets API to validate the input before you pass the values to the rest of your code. For example:
    folder = dbutils.widgets.get("Folder")
    if folder == "":
        raise Exception("Folder missing")
or to get spark se...
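
The same check can be wrapped in a small reusable helper. This is a minimal sketch, not part of the widgets API: `require_param` is a hypothetical name, and it takes the getter as an argument so it can be exercised outside a notebook, where `dbutils` is not defined.

```python
def require_param(get_value, name):
    """Fetch a parameter through get_value and fail fast if it is blank."""
    value = get_value(name)
    if value is None or str(value).strip() == "":
        raise ValueError(f"{name} missing")
    return value


# In a notebook this would be: require_param(dbutils.widgets.get, "Folder")
# Here a plain dict stands in for the widget values.
params = {"Folder": "/mnt/raw", "Env": ""}
folder = require_param(params.get, "Folder")
print(folder)
```

Raising before any downstream code runs means a missing value surfaces as one clear error at the top of the notebook instead of a harder-to-trace failure later.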
