We have a Structured Streaming job configured to read from Event Hubs and persist to the Delta raw/bronze layer via MERGE inside a foreachBatch. Of late, however, the merge step has been taking longer and longer. How can I optimize this pipeline?
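Roughly, the pipeline looks like the sketch below (the table name, key/partition columns, and paths are placeholders, not our real ones):

from delta.tables import DeltaTable

def upsert_to_bronze(batch_df, batch_id):
    # Placeholder bronze table, partitioned by eventDate with key column eventId;
    # the partition column is included in the MERGE condition so Delta can prune files
    bronze = DeltaTable.forName(spark, "bronze.events")
    (bronze.alias("t")
           .merge(batch_df.alias("s"),
                  "t.eventDate = s.eventDate AND t.eventId = s.eventId")
           .whenMatchedUpdateAll()
           .whenNotMatchedInsertAll()
           .execute())

(spark.readStream
      .format("eventhubs")
      .options(**eh_conf)  # Event Hubs connection options (placeholder dict)
      .load()
      .writeStream
      .foreachBatch(upsert_to_bronze)
      .option("checkpointLocation", "/mnt/checkpoints/bronze_events")  # illustrative path
      .start())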
I'm working on setting up tooling to let team members easily register and load models from a central MLflow Model Registry via Databricks Connect. However, after following the instructions in the public docs, I'm hitting this error: raise _NoDbutilsError
mlfl...
G1GC can help in cases where garbage collection is the bottleneck. Check out https://databricks.com/blog/2015/05/28/tuning-java-garbage-collection-for-spark-applications.html
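For example, one way to switch the executors to G1GC (the -XX:+UseG1GC flag itself is standard; further JVM tuning is workload-specific, and on Databricks you would normally put this key in the cluster's Spark config rather than in code):

from pyspark.sql import SparkSession

# Ask the executor JVMs to use the G1 collector
spark = (SparkSession.builder
         .config("spark.executor.extraJavaOptions", "-XX:+UseG1GC")
         .getOrCreate())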
If this is on AWS, consider Nitro-based instance types, which provide encryption in transit between instances automatically. For more details check https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/data-protection.html#encryption-transit
Yes, multiple users can work in their own notebooks and still log to the same experiment via mlflow.set_experiment(). From a governance point of view, you can also assign different permission levels to each experiment.
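For example (the experiment path below is just an illustration), each notebook would do:

import mlflow

# Point this notebook at the shared experiment; runs from every user land under it
mlflow.set_experiment("/Shared/demand-forecasting")

with mlflow.start_run():
    mlflow.log_param("alpha", 0.5)
    mlflow.log_metric("rmse", 0.82)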
You could mount an S3 bucket in the workspace and save your model using the mount's DBFS path. For example:
modelpath = "/dbfs/my-s3-bucket/model-%f-%f" % (alpha, l1_ratio)
mlflow.sklearn.save_model(lr, modelpath)
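The mount itself would be something along these lines (bucket name and mount point are placeholders; if you mount under /mnt, the local path above becomes /dbfs/mnt/...):

# Hypothetical mount; requires credentials or an instance profile with access to the bucket
dbutils.fs.mount("s3a://my-s3-bucket", "/mnt/my-s3-bucket")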