Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Anonymous
by Not applicable
  • 2473 Views
  • 1 replies
  • 0 kudos

Resolved! When using MLflow tracking, where does it store the tracked parameters, metrics and artifacts?

I saw the default path for artifacts is DBFS, but I'm not sure if that's where everything else is stored. Can we modify it?

Latest Reply
sean_owen
Databricks Employee
  • 0 kudos

Artifacts like models, model metadata like the "MLmodel" file, input samples, and other logged artifacts like plots, config, and network architectures are stored as files. While these could be simple local filesystem files when the tracking server is ru...
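For illustration, a minimal sketch (the experiment name and DBFS path below are placeholders, not from this thread) of setting the artifact location explicitly when creating an experiment:

import mlflow

# Hypothetical experiment path and artifact location; adjust to your workspace.
experiment_id = mlflow.create_experiment(
    "/Users/someone@example.com/my-experiment",
    artifact_location="dbfs:/mnt/my-mount/mlflow-artifacts")

with mlflow.start_run(experiment_id=experiment_id):
    mlflow.log_param("alpha", 0.5)      # params and metrics go to the tracking backend
    # artifacts logged in this run land under the artifact_location above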

Anonymous
by Not applicable
  • 1695 Views
  • 1 replies
  • 0 kudos
Latest Reply
sean_owen
Databricks Employee
  • 0 kudos

For me, the main benefit is that it takes little or no work to enable. For example, when autologging is enabled for a library like sklearn or PyTorch, a lot of information about a model is captured with no additional steps. Further, in Databricks, the tr...
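For example, a minimal sketch of autologging with scikit-learn (the toy data and model are just illustrative):

import mlflow
import numpy as np
from sklearn.linear_model import LinearRegression

# One call enables autologging; params, metrics, and the fitted model are
# captured without explicit mlflow.log_* calls.
mlflow.sklearn.autolog()

X, y = np.random.rand(100, 3), np.random.rand(100)
with mlflow.start_run():
    LinearRegression().fit(X, y)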

Anonymous
by Not applicable
  • 2523 Views
  • 1 replies
  • 0 kudos
Latest Reply
sean_owen
Databricks Employee
  • 0 kudos

For the tracking server? Yes, it does produce logs, which you could see if running the tracking server as a standalone service. They are not exposed from the hosted tracking server in Databricks. However, there typically aren't errors or logs of intere...

User16826994223
by Honored Contributor III
  • 6624 Views
  • 1 replies
  • 0 kudos

Resolved! How Azure Databricks manages network security group rules

How Azure Databricks manages network security group rules

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

The NSG rules listed in the following sections represent those that Azure Databricks auto-provisions and manages in your NSG, by virtue of the delegation of your VNet’s host and container subnets to the Microsoft.Databricks/workspaces service. You do...

User16826994223
by Honored Contributor III
  • 4464 Views
  • 0 replies
  • 0 kudos

Virtual network requirements in Azure (VNet injection): The VNet that you deploy your Azure Databricks workspace to must meet the following requirement...

Virtual network requirements in Azure (VNet injection). The VNet that you deploy your Azure Databricks workspace to must meet the following requirements: Region: The VNet must reside in the same region as the Azure Databricks workspace. Subscription: The...

User16826994223
by Honored Contributor III
  • 1791 Views
  • 0 replies
  • 0 kudos

Benefits of using VNet injection in Azure Databricks: Connect Azure Databricks to other Azure services (such as Azure Storage) in a more secure manne...

Benefits of using VNet injection in Azure Databricks: Connect Azure Databricks to other Azure services (such as Azure Storage) in a more secure manner using service endpoints or private endpoints. Connect to on-premises data sources for use with Azure...

sajith_appukutt
by Honored Contributor II
  • 2618 Views
  • 1 replies
  • 0 kudos

Resolved! I'm using the Redshift data source to load data into Spark SQL DataFrames. However, I'm not seeing predicate pushdown for my queries run on Redshift - is that expected?

I was expecting filter operations to be pushed down to Redshift by the optimizer. However, the entire dataset is getting loaded from Redshift.

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

The Spark driver for Redshift pushes the following operators down into Redshift: Filter, Project, Sort, Limit, Aggregation, and Join. However, it does not support expressions operating on dates and timestamps today. If you have a similar requirement, please add a fea...
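To illustrate the difference, a rough sketch (connection options, table, and column names below are placeholders, not from this thread):

from pyspark.sql import functions as F

# Read from Redshift via the spark-redshift data source.
df = (spark.read
      .format("com.databricks.spark.redshift")
      .option("url", "jdbc:redshift://<host>:5439/<db>?user=<user>&password=<pass>")
      .option("dbtable", "public.events")
      .option("tempdir", "s3a://<bucket>/tmp/")
      .load())

# A plain column filter like this can be pushed down to Redshift as a WHERE clause.
df.filter(df.status == "active").count()

# A filter built from a date/timestamp expression is not pushed down today,
# so the filtering happens in Spark after the rows are loaded.
df.filter(F.year(F.col("event_ts")) == 2021).count()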

User16752239289
by Databricks Employee
  • 2913 Views
  • 1 replies
  • 0 kudos

The cluster with the instance profile cannot access the S3 bucket; a 403 permission denied error is thrown

The documentation has been followed to configure the instance profile. The EC2 instance is able to access the S3 bucket when configured with the same instance profile. However, the cluster configured to use the same instance profile fails to access the S3 buc...

Latest Reply
User16752239289
Databricks Employee
  • 0 kudos

I suspect this is because AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY have been added to the Spark environment variables. You can run %sh env | grep -i aws on your cluster and make sure AWS_ACCESS_KEY_ID is not present. If it is, then please remove it e...
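A quick way to check the same thing from a Python notebook cell (just a sketch; nothing below is specific to this cluster):

import os

# List any AWS credential variables visible on the driver. If AWS_ACCESS_KEY_ID /
# AWS_SECRET_ACCESS_KEY show up here, they take precedence over the instance
# profile credentials and can cause the 403.
print([k for k in os.environ if k.upper().startswith("AWS_")])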

sajith_appukutt
by Honored Contributor II
  • 2637 Views
  • 1 replies
  • 0 kudos

Resolved! Re-optimizing in Delta is not splitting large files into smaller files.

I am trying to re-optimize a Delta table with a max file size of 32 MB. But after changing spark.databricks.delta.optimize.maxFileSize and trying to optimize a partition, it doesn't split larger files into smaller ones. How can I get it to work?

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

spark.databricks.delta.optimize.maxFileSize controls the target size for bin-packing files when you run the OPTIMIZE command, but it will not split larger files into smaller ones today. File splitting happens when ZORDER is run, however.
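A minimal sketch, assuming a Delta table named events partitioned by date (the table, partition, and column names are placeholders):

# Target ~32 MB files for bin-packing during OPTIMIZE.
spark.conf.set("spark.databricks.delta.optimize.maxFileSize", str(32 * 1024 * 1024))

# Bin-packing only coalesces small files; it does not split files larger than the target.
spark.sql("OPTIMIZE events WHERE date = '2021-06-01'")

# ZORDER rewrites the selected files, which is what splits oversized files.
spark.sql("OPTIMIZE events WHERE date = '2021-06-01' ZORDER BY (user_id)")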

sajith_appukutt
by Honored Contributor II
  • 1760 Views
  • 1 replies
  • 0 kudos
Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

You could leverage SHOW GRANT, which displays the privileges: SHOW GRANT [<user>] ON [CATALOG | DATABASE <database-name> | TABLE <table-name> | VIEW <view-name> | FUNCTION <function-name> | ANONYMOUS FUNCTION | ANY FILE]. You could use this code snippet ...
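For example, a small sketch (the principal and table names are placeholders):

# Privileges a specific user holds on one table (Table ACLs must be enabled).
spark.sql("SHOW GRANT `someone@example.com` ON TABLE default.my_table").show(truncate=False)

# Privileges for that user across the catalog.
spark.sql("SHOW GRANT `someone@example.com` ON CATALOG").show(truncate=False)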

sajith_appukutt
by Honored Contributor II
  • 1856 Views
  • 1 replies
  • 0 kudos

Resolved! MERGE operation on PI data getting slower. How can I debug?

We have a Structured Streaming job configured to read from Event Hubs and persist to the Delta raw/bronze layer via MERGE inside a foreachBatch. However, of late, the merge process is taking longer. How can I optimize this pipeline?

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

Delta Lake completes a MERGE in two steps:
1. Perform an inner join between the target table and source table to select all files that have matches.
2. Perform an outer join between the selected files in the target and source tables and write out the update...
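One common way to speed up step 1 is to narrow the set of files the inner join has to scan, for example by adding a partition predicate to the match condition. A rough sketch, assuming the bronze table is partitioned by date and the stream only carries recent data (all names and the 7-day window are placeholders, not from this thread):

from delta.tables import DeltaTable

def upsert_to_bronze(micro_batch_df, batch_id):
    bronze = DeltaTable.forName(spark, "bronze.events")
    (bronze.alias("t")
        .merge(
            micro_batch_df.alias("s"),
            # The extra date predicate lets Delta prune partitions/files during the inner-join step.
            "t.id = s.id AND t.date = s.date AND t.date >= date_sub(current_date(), 7)")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

# Wired into the stream via foreachBatch:
# stream_df.writeStream.foreachBatch(upsert_to_bronze).start()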

Anonymous
by Not applicable
  • 1156 Views
  • 0 replies
  • 0 kudos

What is Auto auto-logging?

How is it different from regular autologging? When should I consider enabling Auto autologging? How can I switch the feature on?

Anonymous
by Not applicable
  • 1741 Views
  • 1 replies
  • 1 kudos
Latest Reply
sajith_appukutt
Honored Contributor II
  • 1 kudos

MLflow is an open source framework, and you could pip install mlflow on your laptop, for example. https://mlflow.org/docs/latest/quickstart.html
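For instance, a minimal sketch of a local run after pip install mlflow (the run name, param, and metric values are just illustrative):

import mlflow

with mlflow.start_run(run_name="local-test"):
    mlflow.log_param("alpha", 0.5)
    mlflow.log_metric("rmse", 0.72)

# By default this writes to a local ./mlruns directory; running `mlflow ui`
# starts a local UI to browse it.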

User16826987838
by Contributor
  • 1822 Views
  • 2 replies
  • 0 kudos
Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

def getVaccumSize(table: String): Long = {
  val listFiles = spark.sql(s"VACUUM $table DRY RUN").select("path").collect().map(_(0)).toList
  var sum = 0L
  listFiles.foreach(x => sum += dbutils.fs.ls(x.toString)(0).size)
  sum
}

getVaccumSize("<yo...

