@Jared Webb :
Yes, it is possible to use a stratified sampling strategy for the train/test/validate splits in the AutoML library. In the example below, the column(s) to stratify on are passed to the AutoMLConfig class from the azureml.train.automl package through a stratification_column_names argument, alongside the usual featurization and cross-validation settings. Here is an example code snippet that shows how to use stratified sampling with AutoML:
```python
from azureml.core import Dataset, Experiment, Workspace
from azureml.train.automl import AutoMLConfig

# Connect to the workspace; the experiment, compute, and dataset names are placeholders
workspace = Workspace.from_config()
experiment = Experiment(workspace, 'automl-stratified')
compute_target = workspace.compute_targets['cpu-cluster']
dataset_name = 'my-dataset'

# Load your dataset from a registered dataset in AzureML
dataset = Dataset.get_by_name(workspace, dataset_name)

# Specify the column(s) to stratify on
stratification_columns = ['group']

# Configure AutoML to use stratified sampling
automl_config = AutoMLConfig(
    task='classification',
    primary_metric='accuracy',
    training_data=dataset,
    label_column_name='label',
    featurization='auto',
    n_cross_validations=5,
    stratification_column_names=stratification_columns,
    compute_target=compute_target,
    max_concurrent_iterations=4,
    experiment_timeout_hours=0.5,  # 30 minutes; the timeout is given in hours
    enable_early_stopping=True
)

# Run AutoML
automl_run = experiment.submit(automl_config)
```
In this example, stratification_columns is a list of column names to use for stratification. You can include as many columns as needed to get the desired level of stratification.
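Adding columns is not free, though. Under the usual definition of multi-column stratification, each distinct combination of values forms its own stratum, so every extra column can shrink the smallest stratum. Here is a minimal sketch in plain Python (not AutoML code) that counts strata for one versus two columns. The `stratum_sizes` helper, the `region` column, and the sample rows are made up for illustration:

```python
from collections import Counter


def stratum_sizes(rows, columns):
    """Count rows per combined stratum (one tuple of column values per row)."""
    return Counter(tuple(row[c] for c in columns) for row in rows)


# Hypothetical sample data; 'region' is an illustrative second column.
rows = [
    {"group": "a", "region": "east", "label": 1},
    {"group": "a", "region": "west", "label": 0},
    {"group": "a", "region": "east", "label": 1},
    {"group": "b", "region": "east", "label": 0},
    {"group": "b", "region": "west", "label": 1},
    {"group": "b", "region": "west", "label": 0},
]

one_column = stratum_sizes(rows, ["group"])
two_columns = stratum_sizes(rows, ["group", "region"])

# Number of strata and size of the smallest stratum in each case
print(len(one_column), min(one_column.values()))    # 2 3
print(len(two_columns), min(two_columns.values()))  # 4 1
```

With one column there are two strata of three rows each. With two columns there are four strata, and the smallest holds a single row, which cannot be spread across several folds.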
Note that every column listed in stratification_columns must be present in your dataset and must hold values suitable for stratification, for example categorical values with enough rows in each category. It is also worth confirming that your installed SDK version accepts the stratification_column_names argument. Also, keep in mind that stratification may not always be possible or desirable depending on your use case and data distribution, so consider carefully whether it is appropriate for your specific situation.
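You can run these checks locally before submitting an experiment. The sketch below is plain Python, not part of the AutoML SDK. It assumes the data is available as a list of dictionaries and uses a common rule of thumb that each stratum should have at least as many rows as there are cross-validation folds. The `check_stratification_columns` helper and that threshold are assumptions for illustration:

```python
from collections import Counter


def check_stratification_columns(rows, columns, n_splits):
    """Return a list of problems that would make stratifying `rows` on `columns` unreliable."""
    problems = []
    available = set(rows[0]) if rows else set()
    missing = [c for c in columns if c not in available]
    if missing:
        problems.append(f"missing columns: {missing}")
        return problems

    # Count rows per combined stratum and flag strata smaller than the fold count.
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    for stratum, count in sorted(counts.items(), key=lambda item: repr(item[0])):
        if count < n_splits:
            problems.append(
                f"stratum {stratum} has {count} rows, fewer than {n_splits} folds"
            )
    return problems


# Hypothetical data: five rows in group 'a', two rows in group 'b'.
sample = [{"group": "a"} for _ in range(5)] + [{"group": "b"} for _ in range(2)]

too_small = check_stratification_columns(sample, ["group"], n_splits=5)
not_found = check_stratification_columns(sample, ["segment"], n_splits=5)
print(too_small)
print(not_found)
```

For a registered tabular dataset, the rows could probably be obtained with `dataset.to_pandas_dataframe().to_dict("records")`, although that loads the full dataset into memory.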