Data Engineering
Forum Posts

Digan_Parikh
by Valued Contributor
  • 949 Views
  • 1 replies
  • 0 kudos

Resolved! Package cells for Python notebooks

Do we have an analogous concept to package cells for Python notebooks?

Latest Reply
Digan_Parikh
Valued Contributor
  • 0 kudos

You can declare your classes in one cell and use them in the others. It is recommended to keep all your classes in one notebook and use %run in the other to "import" those classes. The one thing you cannot do is literally import a folder/...
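As a minimal sketch of the %run pattern (the notebook name "Utils" and the Greeter class are hypothetical examples, not from the thread):

```python
# Hypothetical "Utils" notebook: keep all shared classes in one place.
class Greeter:
    """A small example class to be shared across notebooks."""
    def __init__(self, name):
        self.name = name

    def greet(self):
        return f"Hello, {self.name}!"

# In a second notebook, a cell containing only:
#   %run ./Utils
# makes Greeter available as if it had been imported:
g = Greeter("Databricks")
print(g.greet())  # -> Hello, Databricks!
```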

  • 0 kudos
User16826987838
by Contributor
  • 620 Views
  • 1 replies
  • 1 kudos
Latest Reply
Digan_Parikh
Valued Contributor
  • 1 kudos

@Rathna Sundaralingam​  Yes, in the visualization editor select the following:
  • Type: Map
  • Under General: Map: USA
  • Key Column: you need a state column here (for example: CA, NY)
  • Target Field: USPS Abbreviation
  • Value Column: your desired value for the heatmap.

  • 1 kudos
Digan_Parikh
by Valued Contributor
  • 649 Views
  • 1 replies
  • 0 kudos

Resolved! %run in R?

Is the %run magic command supported in R notebooks?

Latest Reply
Digan_Parikh
Valued Contributor
  • 0 kudos

The % magic commands are notebook-level commands and are not tied to any language, so R notebooks also support %run.

  • 0 kudos
User16826992666
by Valued Contributor
  • 673 Views
  • 1 replies
  • 0 kudos

Resolved! In Databricks SQL how can I tell if my query is using Photon?

I have turned Photon on in my endpoint, but I don't know if it's actually being used in my queries. Is there some way I can see this other than manually testing queries with Photon turned on and off?

Latest Reply
Digan_Parikh
Valued Contributor
  • 0 kudos

@Trevor Bishop​ If you go to the History tab in DBSQL, click on the specific query and look at the execution details. At the bottom, you will see "Task time in Photon".

  • 0 kudos
Srikanth_Gupta_
by Valued Contributor
  • 722 Views
  • 1 replies
  • 1 kudos
Latest Reply
User15787040559
New Contributor III
  • 1 kudos

Only Delta Sharing will initially be OSS; see here. DLT and Unity Catalog will be Databricks-only.

  • 1 kudos
User16826994223
by Honored Contributor III
  • 2112 Views
  • 2 replies
  • 0 kudos

Resolved! Garbage Collection optimization

I have a case where garbage collection is taking a lot of time, and I want to optimize it for better performance.

Latest Reply
sean_owen
Honored Contributor II
  • 0 kudos

You can also tune the JVM's GC parameters directly if you mean the pauses are too long: set "spark.executor.extraJavaOptions". Doing so requires knowing a thing or two about how to tune for a given performance goal.
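As an illustration only (the flag values below are placeholders, not recommendations, and assume a JDK 8 runtime), GC options go into the cluster's Spark configuration like this:

```
# Hypothetical cluster Spark config entries for GC tuning
spark.executor.extraJavaOptions -XX:+UseG1GC -XX:MaxGCPauseMillis=500 -verbose:gc
spark.driver.extraJavaOptions -XX:+UseG1GC
```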

  • 0 kudos
1 More Replies
User16826994223
by Honored Contributor III
  • 884 Views
  • 1 replies
  • 0 kudos

Resolved! How do I register a UDF in SQL?

Can I get an example of how to create a UDF in Python and use it in SQL?

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

def squared(s):
  return s * s

spark.udf.register("squaredWithPython", squared)

You can optionally set the return type of your UDF. The default return type is StringType.

from pyspark.sql.types import LongType

def squared_typed(s):
  return s * s

spark...
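The snippet above is truncated; a self-contained sketch of the same idea follows (the Spark part assumes a running cluster, as in a Databricks notebook where `spark` already exists, and is skipped gracefully elsewhere):

```python
def squared(s):
    """Pure Python function, registered below as a Spark SQL UDF."""
    return s * s

try:
    # These imports only succeed where pyspark is installed; on Databricks
    # the builder call simply returns the notebook's active session.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import LongType

    spark = SparkSession.builder.getOrCreate()
    # Explicit return type; without it the default would be StringType.
    spark.udf.register("squaredWithPython", squared, LongType())
    spark.sql("SELECT squaredWithPython(id) AS sq FROM range(5)").show()
except Exception:
    pass  # No Spark runtime available; the plain function above still works.
```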

  • 0 kudos
User16826994223
by Honored Contributor III
  • 645 Views
  • 1 replies
  • 1 kudos

Resolved! IDE support in Databricks

Which IDEs are integrated with Databricks today?

Latest Reply
User16826994223
Honored Contributor III
  • 1 kudos

  • Eclipse
  • IntelliJ
  • Jupyter
  • PyCharm
  • SBT
  • sparklyr and RStudio Desktop
  • SparkR and RStudio Desktop
  • Visual Studio Code

  • 1 kudos
User16826994223
by Honored Contributor III
  • 1711 Views
  • 1 replies
  • 0 kudos

Resolved! Cluster terminated. Reason: Spark Startup Failure: Spark was not able to start in time

Cluster terminated. Reason: Spark Startup Failure: Spark was not able to start in time. This issue can be caused by a malfunctioning Hive metastore, invalid Spark configurations, or malfunctioning init scripts. Please refer to the Spark driver logs t...

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

I think the container is unable to talk to the hosting instance or the DBFS storage account. It can be solved by adding a custom route to the subnets for the DBFS account, with the next hop being the public route.

  • 0 kudos
User16826994223
by Honored Contributor III
  • 1696 Views
  • 1 replies
  • 0 kudos

Resolved! Cloud Provider Launch Failure

Cloud Provider Launch Failure: A cloud provider error was encountered while setting up the cluster. See the Azure Databricks guide for more information. Azure error code: AuthorizationFailed/InvalidResourceReference.

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

Possible cause: the VNet or subnets no longer exist. Make sure the VNet and subnets exist.

  • 0 kudos
Srikanth_Gupta_
by Valued Contributor
  • 1348 Views
  • 4 replies
  • 1 kudos
Latest Reply
sean_owen
Honored Contributor II
  • 1 kudos

Note that you will need to install the spark-xml library to make this work: https://github.com/databricks/spark-xml. For example, you can create a Library in the workspace that references com.databricks:spark-xml_2.12:0.12.0 and then attach it to a clu...
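A sketch of what reading XML looks like once the library is attached (the helper name and its parameters are hypothetical; the Maven coordinate is the one from the answer above):

```python
# Maven coordinate from the answer above; attach it to the cluster as a Library.
SPARK_XML_COORDINATE = "com.databricks:spark-xml_2.12:0.12.0"

def read_xml(spark, path, row_tag):
    """Sketch: read an XML file into a DataFrame with spark-xml, assuming
    the library above is attached to the cluster. `row_tag` names the XML
    element that becomes one DataFrame row."""
    return spark.read.format("xml").option("rowTag", row_tag).load(path)
```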

  • 1 kudos
3 More Replies
Anonymous
by Not applicable
  • 1727 Views
  • 3 replies
  • 1 kudos

What is the difference between mlflow projects and mlflow model?

They both seem to package things up. When should one use one over the other?

Latest Reply
sean_owen
Honored Contributor II
  • 1 kudos

One thing I think it's useful to point out for Databricks users is that you would typically not use MLflow Projects to describe execution of a modeling run. You would just use MLflow directly in Databricks and use Databricks notebooks to manage code ...

  • 1 kudos
2 More Replies
User16826994223
by Honored Contributor III
  • 530 Views
  • 1 replies
  • 0 kudos

Do we have a template to create an Azure workspace with VNet injection?

Do we have a template to create an Azure workspace with VNet injection?

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

Yes, we have a community template available to create a workspace: https://docs.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/vnet-inject#--advanced-configuration-using-azure-resource-manager-templates

  • 0 kudos
Anonymous
by Not applicable
  • 1015 Views
  • 1 replies
  • 0 kudos

Resolved! When using MLflow tracking, where does it store the tracked parameters, metrics and artifacts?

I saw the default path for artifacts is DBFS, but I'm not sure if that's where everything else is stored. Can we modify it?

Latest Reply
sean_owen
Honored Contributor II
  • 0 kudos

Artifacts like models, model metadata like the "MLmodel" file, input samples, and other logged artifacts like plots, config, network architectures, are stored as files. While these could be simple local filesystem files when the tracking server is ru...
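To the "Can we modify it?" part: a sketch of pointing an experiment's artifacts at a custom location (the helper name and paths below are hypothetical; `mlflow.create_experiment` does accept an `artifact_location` argument):

```python
def create_experiment_with_artifacts(name, artifact_location):
    """Sketch: create an MLflow experiment whose artifacts go to a custom
    location (e.g. a DBFS or cloud-storage path) instead of the default.
    The import is deferred so this helper is inert without mlflow."""
    import mlflow
    return mlflow.create_experiment(name, artifact_location=artifact_location)

# Hypothetical usage in a Databricks notebook:
#   exp_id = create_experiment_with_artifacts(
#       "/Users/someone@example.com/my-exp", "dbfs:/my-team/mlflow-artifacts")
```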

  • 0 kudos
Anonymous
by Not applicable
  • 670 Views
  • 1 replies
  • 0 kudos
Latest Reply
sean_owen
Honored Contributor II
  • 0 kudos

For me, the main benefit is that it is little or no work to enable. For example, when autologging is enabled for a library like sklearn or Pytorch, a lot of information about a model is captured with no additional steps. Further in Databricks, the tr...
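The "little or no work" point can be sketched in two lines (the wrapper function is a hypothetical convenience; the underlying call is `mlflow.autolog()`):

```python
def enable_autologging():
    """Sketch: turn on MLflow autologging so supported libraries
    (sklearn, TensorFlow, PyTorch Lightning, ...) record params, metrics,
    and models automatically on their fit/train calls. The import is
    deferred so this file also runs where mlflow is not installed."""
    import mlflow
    mlflow.autolog()

# Hypothetical usage: call once at the top of a notebook, then train as usual.
#   enable_autologging()
#   model.fit(X_train, y_train)   # params, metrics, and the model get logged
```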

  • 0 kudos