Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Ougagagoubu
by New Contributor
  • 956 Views
  • 0 replies
  • 0 kudos

File bug in DBFS? Cannot remove or create a file (table) in the Apache Spark (TM) SQL for Data Analysts Coursera course from Unit 6.2 onwards.

Hello, as the title suggests, I'm not able to remove a file via the shell (%sh rm -f "path"), nor continue the notebooks from 6.2 onwards (6.3, etc.) inside Databricks. I'm using the Databricks Community Edition. While the error message is clear: "...

hoopla
by New Contributor II
  • 5271 Views
  • 3 replies
  • 1 kudos

Unable to copy multiple files from file:/tmp to dbfs:/tmp

I am downloading multiple files by web scraping, and by default they are stored in /tmp. I can copy a single file by providing the filename and path (%fs cp file:/tmp/2020-12-14_listings.csv.gz dbfs:/tmp), but when I try to copy multiple files I get an ...

Latest Reply
hoopla
New Contributor II
  • 1 kudos

Thanks Deepak. This is what I suspected. Hopefully the wildcard feature will be available in the future. Thanks.
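
For anyone landing here: %fs cp does not expand wildcards, but a small loop over dbutils.fs.ls achieves the same effect. A minimal sketch, assuming a Databricks notebook where dbutils is available (the pattern is hypothetical):

```python
import fnmatch

src_dir = "file:/tmp"
dst_dir = "dbfs:/tmp"
pattern = "*.csv.gz"  # hypothetical; adjust to your files

# Copy every local file whose name matches the pattern into DBFS
for info in dbutils.fs.ls(src_dir):
    if fnmatch.fnmatch(info.name, pattern):
        dbutils.fs.cp(info.path, dst_dir + "/" + info.name)
```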

2 More Replies
User16826992724
by New Contributor III
  • 992 Views
  • 1 replies
  • 2 kudos
Latest Reply
User16826992724
New Contributor III
  • 2 kudos

Just like B-tree indices in the traditional EDW world, Z-order indexing can be used on high-cardinality columns such as primary key columns, and for high-cardinality joins such as fact-to-dimension table joins. Z-order indexes can be created only on the ...
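
A minimal sketch of applying Z-ordering, assuming a Delta table named events (hypothetical) that is frequently filtered or joined on the high-cardinality column event_id:

```python
# Rewrites the table's files so rows with nearby event_id values are
# co-located, letting the Delta reader skip files on selective queries
spark.sql("OPTIMIZE events ZORDER BY (event_id)")
```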

User16826992724
by New Contributor III
  • 829 Views
  • 1 replies
  • 4 kudos
Latest Reply
User16826992724
New Contributor III
  • 4 kudos

There are various methods, like using uuid, monotonically_increasing_id(), using row_number() OVER (ORDER BY NULL) AS SK, or using md5() or sha() hashing functions, etc. A detailed discussion of the various options and their pros/cons can be found in this youtu...
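
A minimal sketch of two of the approaches mentioned above, assuming a hypothetical DataFrame df with hypothetical business key columns:

```python
from pyspark.sql import functions as F

# Option 1: unique but non-contiguous 64-bit IDs, computed fully in parallel
df_sk = df.withColumn("sk", F.monotonically_increasing_id())

# Option 2: deterministic hash of the business key columns (names hypothetical)
df_sk2 = df.withColumn("sk", F.md5(F.concat_ws("||", "col_a", "col_b")))
```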

morganmazouchi
by New Contributor III
  • 5412 Views
  • 7 replies
  • 4 kudos
Latest Reply
Sebastian
Contributor
  • 4 kudos

One way to manage this is to give users only "Can Restart" permission on the cluster, and then use an init script to install libraries at startup, so that users won't install libraries on the fly.
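
A minimal sketch of that init-script approach, assuming DBFS-hosted init scripts; the package and paths below are hypothetical:

```python
# Write an init script that pip-installs pinned libraries at cluster startup,
# then reference dbfs:/databricks/init/install-libs.sh in the cluster config
script = """#!/bin/bash
/databricks/python/bin/pip install requests==2.31.0
"""
dbutils.fs.put("dbfs:/databricks/init/install-libs.sh", script, overwrite=True)
```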

6 More Replies
BeardyMan
by New Contributor III
  • 3987 Views
  • 9 replies
  • 3 kudos

Resolved! MLFlow Serve Logging

When using Azure Databricks and serving a model, we have received requests to capture additional logging. In some instances, they would like to capture input and output or even some of the steps from a pipeline. Is there any way we can extend the lo...

Latest Reply
Dan_Z
Honored Contributor
  • 3 kudos

Another word from a Databricks employee: "You can use the custom model approach, but configuring it is painful. Plus you have to embed every loggable model in the custom model. Another, less intrusive solution would be to have a proxy server do the loggi...
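
A minimal sketch of the custom-model approach referenced above: an mlflow.pyfunc wrapper that records inputs and outputs around the real model. The logging destination here (stdout) is just a placeholder:

```python
import mlflow.pyfunc

class LoggingModel(mlflow.pyfunc.PythonModel):
    """Wraps an existing model and logs every request/response pair."""

    def __init__(self, model):
        self.model = model

    def predict(self, context, model_input):
        print(f"serving input: {model_input}")   # capture the request
        output = self.model.predict(model_input)
        print(f"serving output: {output}")       # capture the response
        return output
```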

8 More Replies
saipujari_spark
by Valued Contributor
  • 1024 Views
  • 1 replies
  • 3 kudos

Delta Optimized Write vs. Repartitioning: which is recommended?

When streaming to a Delta table, both repartitioning on the partition column and optimized write can help to avoid small files. Which is recommended: Delta Optimized Write or repartitioning?

Latest Reply
saipujari_spark
Valued Contributor
  • 3 kudos

Optimized write is recommended over repartitioning for the reasons below.
  • The key part of Optimized Writes is that it is an adaptive shuffle. If you have a streaming ingest use case and input data rates change over time, the adaptive shuffle will a...
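
A minimal sketch of enabling optimized writes on an existing Delta table (the table name events is hypothetical):

```python
# Enable optimized (adaptive-shuffle) writes via a Delta table property
spark.sql("""
    ALTER TABLE events
    SET TBLPROPERTIES (delta.autoOptimize.optimizeWrite = true)
""")
```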

Artem_Yevtushen
by New Contributor III
  • 1090 Views
  • 0 replies
  • 2 kudos

Accelerating row-wise Python UDF functions without using Pandas UDF

Problem: Spark will not automatically parallelize UDF operations on smaller/medium dataframes. As a result, Spark will process the UDF as a single, non-parallelized task. For row-wise op...
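
A minimal sketch of the idea being described: repartition a small DataFrame before applying a row-wise Python UDF so the rows are spread across many tasks. All names here are hypothetical:

```python
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

@F.udf(StringType())
def slow_transform(value):
    return value.upper()  # stand-in for an expensive row-wise computation

# Without the repartition, a small DataFrame may run as a single task
result = df.repartition(64).withColumn("out", slow_transform("col_a"))
```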

Jack
by New Contributor II
  • 1677 Views
  • 1 replies
  • 0 kudos

Resolved! Creating Pandas Data Frame of Features After Applying Variance Reduction

I am building a classification model using the following data frame of 120,000 records (sample of 5 records shown). Using this data, I have built the following model: from sklearn.model_selection import train_test_split from sklearn.feature_extraction....

  • 1677 Views
  • 1 replies
  • 0 kudos
Latest Reply
Dan_Z
Honored Contributor
  • 0 kudos

This is more of a scikit-learn question than a Databricks question. But poking around, I think VT_reduced.get_support() is probably what you are looking for: https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.VarianceThreshold....
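
A minimal sketch of that suggestion, assuming X is the original pandas DataFrame of features (names and threshold are hypothetical):

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

vt = VarianceThreshold(threshold=0.01)
X_reduced = vt.fit_transform(X)

# get_support() returns a boolean mask over the original columns,
# recovering the names of the features that survived the filter
kept = X.columns[vt.get_support()]
X_reduced_df = pd.DataFrame(X_reduced, columns=kept, index=X.index)
```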

Celia
by New Contributor II
  • 1406 Views
  • 2 replies
  • 1 kudos

How to include a third-party Maven package in an MLflow Model Serving job cluster in Azure Databricks

We are trying to use MLflow Model Serving; this service enables real-time model serving behind a REST API interface and launches a single-node cluster that hosts our model. The issue happens when the single-node cluster tries to get the environment...

Latest Reply
BeardyMan
New Contributor III
  • 1 kudos

Unfortunately, we came across this same issue. We were trying to use MLflow Serving to produce an API that could take text input and pass it through some NLP. In this instance we had installed a Maven package on the cluster, so the experiment would run ...

1 More Replies
Anonymous
by Not applicable
  • 1429 Views
  • 3 replies
  • 19 kudos

Resolved! Welcome back! Please introduce yourself to the community. :)

Hello everyone! My name is Piper and I'm one of the community moderators for Databricks. I'd like to take this opportunity to welcome you to the new Databricks community! I'd also like to ask you to introduce yourself in this thread. We are here to h...

Latest Reply
cconnell
Contributor II
  • 19 kudos

I work mostly with health and medical data, on a contract or project basis. I am located in Bedford, MA and Ogunquit, Maine. I formerly worked at Blue Metal / Insight, which is where I got my start on Databricks. Languages: Python, PySpark, Koalas. http...

2 More Replies
manugarri
by New Contributor II
  • 8462 Views
  • 10 replies
  • 1 kudos

Fuzzy text matching in Spark

I have a list of client-provided data: a list of company names. I have to match those names with an internal database of company names. The client list can fit in memory (it's about 10k elements), but the internal dataset is on HDFS and we use Spark ...

Latest Reply
Sonal
New Contributor II
  • 1 kudos

You can use Zingg, a Spark-based open source tool, for this: https://github.com/zinggAI/zingg
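
As a quick built-in alternative to a dedicated tool, a minimal sketch using Spark's levenshtein() with the small client list broadcast; all DataFrame and column names, and the distance threshold, are hypothetical:

```python
from pyspark.sql import functions as F

# Score every internal name against every client name; the client list is
# small (~10k rows), so broadcasting keeps the cross join cheap
matches = (
    internal_df.crossJoin(F.broadcast(client_df))
    .withColumn("dist", F.levenshtein("company_name", "client_name"))
    .filter(F.col("dist") <= 3)
)
```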

9 More Replies
saniafatimi
by New Contributor II
  • 1004 Views
  • 2 replies
  • 1 kudos

Need guidance on migrating Power BI reports to Databricks

Hi all, I want to import an existing database/tables (say AdventureWorks) to Databricks, and after importing the tables, I want to develop reports on top. I need guidance on this. Can someone give me resources that could help me in doing things end to en...

Latest Reply
Chris_Shehu
Valued Contributor III
  • 1 kudos

@sania fatimi There are several different ways to do this, and it's really going to depend on what your current need is. You could, for example, load the data into the Databricks delta lake and use the Databricks Power BI connector to query the data fr...

1 More Replies
User16830818524
by New Contributor II
  • 1327 Views
  • 3 replies
  • 0 kudos

Resolved! Libraries in Databricks Runtimes

Is it possible to easily determine which libraries and which versions are included in a specific DBR version?

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hello. My name is Piper and I'm one of the community moderators. One of the team members sent this information to me. This should be the correct path to check libraries installed with DBRs: https://docs.databricks.com/release-notes/runtime/8.3ml.html?_...
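
As a complement to the release notes, a minimal sketch for inspecting the Python packages actually installed on a running cluster from a notebook cell:

```python
import importlib.metadata as md

# Print every installed distribution with its version, alphabetically
for dist in sorted(md.distributions(), key=lambda d: d.metadata["Name"].lower()):
    print(dist.metadata["Name"], dist.version)
```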

2 More Replies
Rodrigo_Brandet
by New Contributor
  • 2813 Views
  • 3 replies
  • 4 kudos

Resolved! Upload CSV files to Databricks by code (not UI)

Hello everyone. I have a process on Databricks where I need to upload a CSV file manually every day. I would like to know if there is a way to import this data (as pandas in Python, for example) with no need to upload this file manually every day, util...

Latest Reply
-werners-
Esteemed Contributor III
  • 4 kudos

Auto Loader is indeed a valid option, or you can use some kind of ETL tool that fetches the file and puts it somewhere on your cloud provider, like Azure Data Factory or AWS Glue, etc.
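
A minimal sketch of the Auto Loader option, assuming the CSVs land in a cloud storage path; every path and table name below is hypothetical:

```python
# Incrementally pick up any new CSV files that appear under the landing path
stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("header", "true")
    .option("cloudFiles.schemaLocation", "dbfs:/tmp/csv_schema")
    .load("dbfs:/landing/csv/")
)

# availableNow processes everything new, then stops -- handy for a daily job
(stream.writeStream
    .option("checkpointLocation", "dbfs:/tmp/csv_checkpoint")
    .trigger(availableNow=True)
    .toTable("daily_csv_ingest"))
```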

2 More Replies