Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Anonymous
by Not applicable
  • 2473 Views
  • 1 replies
  • 0 kudos

Resolved! When using MLflow tracking, where does it store the tracked parameters, metrics and artifacts?

I saw the default path for artifacts is DBFS, but I'm not sure if that's where everything else is stored. Can we modify it?

Latest Reply
sean_owen
Databricks Employee
  • 0 kudos

Artifacts like models, model metadata like the "MLmodel" file, input samples, and other logged artifacts like plots, config, and network architectures are stored as files. While these could be simple local filesystem files when the tracking server is ru...
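For illustration, a minimal sketch (the experiment name and DBFS path below are placeholders, not from this thread) of setting the artifact location explicitly when creating an experiment:

import mlflow

# Hypothetical experiment path and artifact location; adjust to your workspace.
experiment_id = mlflow.create_experiment(
    "/Users/someone@example.com/my-experiment",
    artifact_location="dbfs:/mnt/my-mount/mlflow-artifacts")

with mlflow.start_run(experiment_id=experiment_id):
    mlflow.log_param("alpha", 0.5)      # params and metrics go to the tracking backend
    # artifacts logged in this run land under the artifact_location above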

Anonymous
by Not applicable
  • 1695 Views
  • 1 replies
  • 0 kudos
Latest Reply
sean_owen
Databricks Employee
  • 0 kudos

For me, the main benefit is that it takes little or no work to enable. For example, when autologging is enabled for a library like sklearn or PyTorch, a lot of information about a model is captured with no additional steps. Further, in Databricks, the tr...
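For example, a minimal sketch of autologging with scikit-learn (the toy data and model are just illustrative):

import mlflow
import numpy as np
from sklearn.linear_model import LinearRegression

# One call enables autologging; params, metrics, and the fitted model are
# captured without explicit mlflow.log_* calls.
mlflow.sklearn.autolog()

X, y = np.random.rand(100, 3), np.random.rand(100)
with mlflow.start_run():
    LinearRegression().fit(X, y)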

Anonymous
by Not applicable
  • 2523 Views
  • 1 replies
  • 0 kudos
Latest Reply
sean_owen
Databricks Employee
  • 0 kudos

For the tracking server? Yes, it does produce logs, which you could see if running the tracking server as a standalone service. They are not exposed from the hosted tracking server in Databricks. However, there typically aren't errors or logs of intere...

User16826994223
by Honored Contributor III
  • 6624 Views
  • 1 replies
  • 0 kudos

Resolved! How Azure Databricks manages network security group rules

How Azure Databricks manages network security group rules

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

The NSG rules listed in the following sections represent those that Azure Databricks auto-provisions and manages in your NSG, by virtue of the delegation of your VNet’s host and container subnets to the Microsoft.Databricks/workspaces service. You do...

User16826994223
by Honored Contributor III
  • 4464 Views
  • 0 replies
  • 0 kudos

Virtual network requirements in Azure (VNet injection): The VNet that you deploy your Azure Databricks workspace to must meet the following requirement...

Virtual network requirements in Azure (VNet injection). The VNet that you deploy your Azure Databricks workspace to must meet the following requirements: Region: The VNet must reside in the same region as the Azure Databricks workspace. Subscription: The...

User16826994223
by Honored Contributor III
  • 1791 Views
  • 0 replies
  • 0 kudos

Benefits of using VNet injection in Azure Databricks: Connect Azure Databricks to other Azure services (such as Azure Storage) in a more secure manne...

Benefits of using VNet injection in Azure Databricks: Connect Azure Databricks to other Azure services (such as Azure Storage) in a more secure manner using service endpoints or private endpoints. Connect to on-premises data sources for use with Azure...

sajith_appukutt
by Honored Contributor II
  • 2618 Views
  • 1 replies
  • 0 kudos

Resolved! I'm using the Redshift data source to load data into Spark SQL DataFrames. However, I'm not seeing predicate pushdown for my queries run on Redshift - is that expected?

I was expecting filter operations to be pushed down to Redshift by the optimizer. However, the entire dataset is getting loaded from Redshift.

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

The Spark driver for Redshift pushes the following operators down into Redshift: Filter, Project, Sort, Limit, Aggregation, and Join. However, it does not support expressions operating on dates and timestamps today. If you have a similar requirement, please add a fea...
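To illustrate the difference, a rough sketch (connection options, table, and column names below are placeholders, not from this thread):

from pyspark.sql import functions as F

# Read from Redshift via the spark-redshift data source.
df = (spark.read
      .format("com.databricks.spark.redshift")
      .option("url", "jdbc:redshift://<host>:5439/<db>?user=<user>&password=<pass>")
      .option("dbtable", "public.events")
      .option("tempdir", "s3a://<bucket>/tmp/")
      .load())

# A plain column filter like this can be pushed down to Redshift as a WHERE clause.
df.filter(df.status == "active").count()

# A filter built from a date/timestamp expression is not pushed down today,
# so the filtering happens in Spark after the rows are loaded.
df.filter(F.year(F.col("event_ts")) == 2021).count()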

User16752239289
by Databricks Employee
  • 2913 Views
  • 1 replies
  • 0 kudos

The cluster with the instance profile cannot access the S3 bucket; a 403 permission denied error is thrown

The documentation has been followed to configure the instance profile. The EC2 instance is able to access the S3 bucket when configured with the same instance profile. However, the cluster configured to use the same instance profile fails to access the S3 buc...

Latest Reply
User16752239289
Databricks Employee
  • 0 kudos

I suspect this is because AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY have been added to the Spark environment variables. You can run %sh env | grep -i aws on your cluster and make sure AWS_ACCESS_KEY_ID is not present. If it is, then please remove it e...
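A quick way to check the same thing from a Python notebook cell (just a sketch; nothing below is specific to this cluster):

import os

# List any AWS credential variables visible on the driver. If AWS_ACCESS_KEY_ID /
# AWS_SECRET_ACCESS_KEY show up here, they take precedence over the instance
# profile credentials and can cause the 403.
print([k for k in os.environ if k.upper().startswith("AWS_")])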

sajith_appukutt
by Honored Contributor II
  • 2637 Views
  • 1 replies
  • 0 kudos

Resolved! Re-optimizing in Delta is not splitting large files into smaller files.

I am trying to re-optimize a Delta table with a max file size of 32 MB. But after changing spark.databricks.delta.optimize.maxFileSize and trying to optimize a partition, it doesn't split larger files into smaller ones. How can I get it to work?

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

spark.databricks.delta.optimize.maxFileSize controls the target size for bin-packing files when you run the OPTIMIZE command, but it will not split larger files into smaller ones today. File splitting happens when ZORDER is run, however.
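A minimal sketch, assuming a Delta table named events partitioned by date (the table, partition, and column names are placeholders):

# Target ~32 MB files for bin-packing during OPTIMIZE.
spark.conf.set("spark.databricks.delta.optimize.maxFileSize", str(32 * 1024 * 1024))

# Bin-packing only coalesces small files; it does not split files larger than the target.
spark.sql("OPTIMIZE events WHERE date = '2021-06-01'")

# ZORDER rewrites the selected files, which is what splits oversized files.
spark.sql("OPTIMIZE events WHERE date = '2021-06-01' ZORDER BY (user_id)")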

sajith_appukutt
by Honored Contributor II
  • 1760 Views
  • 1 replies
  • 0 kudos
Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

You could leverage SHOW GRANT, which displays the privileges: SHOW GRANT [<user>] ON [CATALOG | DATABASE <database-name> | TABLE <table-name> | VIEW <view-name> | FUNCTION <function-name> | ANONYMOUS FUNCTION | ANY FILE]. You could use this code snippet ...
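For example, a small sketch (the principal and table names are placeholders):

# Privileges a specific user holds on one table (Table ACLs must be enabled).
spark.sql("SHOW GRANT `someone@example.com` ON TABLE default.my_table").show(truncate=False)

# Privileges for that user across the catalog.
spark.sql("SHOW GRANT `someone@example.com` ON CATALOG").show(truncate=False)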

sajith_appukutt
by Honored Contributor II
  • 1856 Views
  • 1 replies
  • 0 kudos

Resolved! MERGE operation on PI data getting slower. How can I debug?

We have a Structured Streaming job configured to read from Event Hubs and persist to the Delta raw/bronze layer via MERGE inside a foreachBatch. However, of late, the merge process is taking longer. How can I optimize this pipeline?

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

Delta Lake completes a MERGE in two steps:
1. Perform an inner join between the target table and source table to select all files that have matches.
2. Perform an outer join between the selected files in the target and source tables and write out the update...
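One common way to speed up step 1 is to narrow the set of files the inner join has to scan, for example by adding a partition predicate to the match condition. A rough sketch, assuming the bronze table is partitioned by date and the stream only carries recent data (all names and the 7-day window are placeholders, not from this thread):

from delta.tables import DeltaTable

def upsert_to_bronze(micro_batch_df, batch_id):
    bronze = DeltaTable.forName(spark, "bronze.events")
    (bronze.alias("t")
        .merge(
            micro_batch_df.alias("s"),
            # The extra date predicate lets Delta prune partitions/files during the inner-join step.
            "t.id = s.id AND t.date = s.date AND t.date >= date_sub(current_date(), 7)")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

# Wired into the stream via foreachBatch:
# stream_df.writeStream.foreachBatch(upsert_to_bronze).start()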

Anonymous
by Not applicable
  • 1156 Views
  • 0 replies
  • 0 kudos

What is Auto auto-logging?

How is it different from regular autologging? When should I consider enabling Auto autologging? How can I switch the feature on?

Anonymous
by Not applicable
  • 1741 Views
  • 1 replies
  • 1 kudos
Latest Reply
sajith_appukutt
Honored Contributor II
  • 1 kudos

MLflow is an open source framework, and you could pip install mlflow on your laptop, for example. https://mlflow.org/docs/latest/quickstart.html
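For instance, a minimal sketch of a local run after pip install mlflow (the run name, param, and metric values are just illustrative):

import mlflow

with mlflow.start_run(run_name="local-test"):
    mlflow.log_param("alpha", 0.5)
    mlflow.log_metric("rmse", 0.72)

# By default this writes to a local ./mlruns directory; running `mlflow ui`
# starts a local UI to browse it.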

User16826987838
by Contributor
  • 1822 Views
  • 2 replies
  • 0 kudos
Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

def getVaccumSize(table: String): Long = {
  val listFiles = spark.sql(s"VACUUM $table DRY RUN").select("path").collect().map(_(0)).toList
  var sum = 0L
  listFiles.foreach(x => sum += dbutils.fs.ls(x.toString)(0).size)
  sum
}

getVaccumSize("<yo...

