Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

aladda
by Databricks Employee
  • 2861 Views
  • 1 reply
  • 0 kudos
Latest Reply
aladda
Databricks Employee
  • 0 kudos

For an optimal processing experience, Databricks segregates the Spark application traffic from the management traffic to avoid network contention. Spark application traffic covers communication between the driver and the executors, and between the executors themselves, where ...

brickster_2018
by Databricks Employee
  • 1735 Views
  • 1 reply
  • 0 kudos

Resolved! Does Delta perform listing of data directories?

I never ran VACUUM on the Delta table. Will Delta perform a direct listing on those directories? I am afraid the query performance is going to degrade over time. How about the log directories? I have more than 100k JSON files in the log directory.

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

For both data and logs, Delta does not perform a full directory listing. The transaction log has the details of the data files, so those objects are accessed directly. Within the Delta log directory a listing is performed; however, it's performed using a continu...
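
To illustrate what "the transaction log has the details of the data files" means, here is a minimal Python sketch (not Databricks' implementation) that replays the add/remove actions in the _delta_log JSON commits to recover the set of live data files. The table path is a placeholder, and a real reader would start from the latest Parquet checkpoint rather than replaying every commit.

import json
from pathlib import Path

# Placeholder path to a Delta table root; _delta_log holds the commit files.
log_dir = Path("/path/to/delta-table/_delta_log")

# Each commit is newline-delimited JSON; "add" actions register data files
# and "remove" actions tombstone them, so replaying them yields the live set.
live_files = set()
for commit in sorted(log_dir.glob("*.json")):
    for line in commit.read_text().splitlines():
        action = json.loads(line)
        if "add" in action:
            live_files.add(action["add"]["path"])
        elif "remove" in action:
            live_files.discard(action["remove"]["path"])

print(f"{len(live_files)} data files referenced without listing the data directory")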

User16826992666
by Databricks Employee
  • 4329 Views
  • 1 reply
  • 0 kudos

Resolved! Can I copy my MLflow experiments from one workspace to another?

I would like to move my saved experiments and artifacts to a different Databricks workspace from where I originally created them. Is this possible?

Latest Reply
sajith_appukutt
Databricks Employee
  • 0 kudos

It might be possible with a bit of code via the MLflow client API (there seems to be a way to run list_registered_models and extract info), but I haven't tried it out. If the requirement is to share models between workspaces, one approach could be to h...
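
A rough sketch of that idea, untried here as well: the tracking URIs and profile names below are placeholders, the exact listing call depends on the MLflow version (list_registered_models on older clients, search_registered_models on newer ones), and this copies registry metadata only, not the underlying runs or artifacts.

from mlflow.tracking import MlflowClient

# Placeholder Databricks CLI profiles pointing at the two workspaces.
src = MlflowClient(tracking_uri="databricks://source-profile")
dst = MlflowClient(tracking_uri="databricks://dest-profile")

# Older MLflow clients expose list_registered_models(); newer ones use
# search_registered_models(). Copy name and description across.
for model in src.search_registered_models():
    dst.create_registered_model(model.name, description=model.description)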

MoJaMa
by Databricks Employee
  • 1424 Views
  • 1 reply
  • 0 kudos
Latest Reply
MoJaMa
Databricks Employee
  • 0 kudos

If it's disabled or lost, then it's broken. If the customer cannot recover the key, then it's gone forever. Databricks has no knowledge of the key (us knowing it would render it insecure).

User16790091296
by Databricks Employee
  • 1596 Views
  • 1 reply
  • 0 kudos
Latest Reply
aladda
Databricks Employee
  • 0 kudos

Depends on what you're looking for from a management perspective, but one option is the Account API, which allows deploying/updating/configuring multiple workspaces in a given E2 account. Use this API to programmatically deploy, update, and delete works...
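
For reference, a hedged Python sketch of calling that API; the endpoint shape follows the E2 Account API docs, while the account ID and credentials are placeholders (the API authenticates as an account admin).

import requests

# Placeholders: the E2 account ID and account-admin credentials.
ACCOUNT_ID = "<account-id>"
BASE = f"https://accounts.cloud.databricks.com/api/2.0/accounts/{ACCOUNT_ID}"

# List all workspaces in the account; the same resource supports POST to
# create a workspace and PATCH/DELETE to update or remove one.
resp = requests.get(f"{BASE}/workspaces", auth=("<admin-username>", "<password>"))
resp.raise_for_status()
for ws in resp.json():
    print(ws["workspace_name"], ws["workspace_status"])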

brickster_2018
by Databricks Employee
  • 2462 Views
  • 1 reply
  • 0 kudos
Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

curl -H Metadata:true --noproxy "*" "http://169.254.169.254/metadata/instance?api-version=2020-09-01" | jq '.compute.tagsList[] | select(.name=="Creator") | .value'
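
For readers who prefer Python, a sketch of the same call: it queries the Azure Instance Metadata Service at its link-local address (hence the proxy bypass, matching --noproxy above) and extracts the Creator tag the way the jq filter does.

import requests

# Same request as the curl one-liner: Azure IMDS requires the Metadata
# header, and the link-local address must not go through a proxy.
resp = requests.get(
    "http://169.254.169.254/metadata/instance",
    params={"api-version": "2020-09-01"},
    headers={"Metadata": "true"},
    proxies={"http": None, "https": None},
)
tags = resp.json()["compute"]["tagsList"]
print(next(t["value"] for t in tags if t["name"] == "Creator"))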

aladda
by Databricks Employee
  • 5821 Views
  • 1 reply
  • 1 kudos

Why do Databricks deployments require 2 subnets for each workspace?

Databricks must have access to at least two subnets for each workspace, with each subnet in a different availability zone, per the docs here.

Latest Reply
aladda
Databricks Employee
  • 1 kudos

This is designed for optimal user experience and as a capacity-planning strategy: if instances are not available in one AZ, the other subnet in a different AZ can be used to deploy instances instead.

brickster_2018
by Databricks Employee
  • 4028 Views
  • 1 reply
  • 1 kudos
Latest Reply
brickster_2018
Databricks Employee
  • 1 kudos

Find the DriverDaemon:
%sh jps
Take the heap dump (2413 is the DriverDaemon PID reported by jps):
%sh jmap -dump:live,format=b,file=pbs_worker_DriverDaemon.hprof 2413
Copy it out to download:
%sh cp pbs_worker_DriverDaemon.hprof /dbfs/FileStore/pbs_worker_04-30-2021T15-50-00.hprof

User16826992666
by Databricks Employee
  • 7827 Views
  • 1 reply
  • 0 kudos

Resolved! When using MLflow should I use log_model or save_model?

They seem to have similar functions. What is the recommended pattern here?

Latest Reply
sajith_appukutt
Databricks Employee
  • 0 kudos

mlflow.<model-type>.log_model(model, ...) logs the model to the MLflow tracking server. mlflow.<model-type>.save_model(model, modelpath) saves the model locally, e.g. to a DBFS path. More details at https://docs.databricks.com/applications/mlflow/models...
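
A small sketch of the difference, using sklearn as the <model-type> flavor; the run context, artifact path, and /dbfs destination are placeholders.

import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

# log_model: records the model as an artifact of a run on the tracking
# server, so it shows up in the experiment UI.
with mlflow.start_run():
    mlflow.sklearn.log_model(model, "model")

# save_model: writes the same MLmodel directory straight to a path,
# e.g. a DBFS location, with no run involved.
mlflow.sklearn.save_model(model, "/dbfs/tmp/iris_model")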
