Meta Models (Ensembles)
Ensemble Benefits
Meta models aggregate predictions from multiple deployed endpoints (XGBoost, PyTorch, ChemProp, etc.) into a single ensemble prediction. Rather than training on raw features, a meta model calls existing endpoints at inference time and combines their outputs using confidence-weighted voting strategies. Because different model frameworks tend to make different errors, aggregating their predictions reduces overall error and improves robustness.
Why Use a Meta Model?
Different model frameworks capture different aspects of molecular structure:
- XGBoost excels at tabular features (RDKit descriptors)
- PyTorch can learn nonlinear descriptor interactions
- ChemProp operates directly on molecular graphs
By combining predictions across frameworks, you get:
- Lower error — individual model mistakes get averaged out
- Better calibration — ensemble disagreement provides a natural uncertainty signal
- Robustness — no single model failure dominates the prediction
Quick Start
Simulate Ensemble Performance
Before creating a meta model, you can simulate how different aggregation strategies would perform using existing endpoint predictions:
```python
from workbench.api import MetaModel

# Simulate ensemble performance across endpoints
sim = MetaModel.simulate(["logd-xgb-end", "logd-pytorch-end", "logd-chemprop-end"])
sim.report()
```
Example Output
```
=== Individual Model Performance ===
logd-xgb-end:       MAE=0.428  RMSE=0.580  R²=0.817
logd-pytorch-end:   MAE=0.445  RMSE=0.594  R²=0.808
logd-chemprop-end:  MAE=0.412  RMSE=0.558  R²=0.831

=== Ensemble Strategy Comparison ===
Strategy                    MAE    RMSE   R²
simple_mean                 0.391  0.532  0.846
confidence_weighted         0.388  0.528  0.849
inverse_mae_weighted        0.385  0.525  0.850
scaled_conf_weighted        0.383  0.522  0.852  ← Best
calibrated_conf_weighted    0.384  0.523  0.851
drop_worst                  0.398  0.541  0.841
```
The simulator analyzes all aggregation strategies and identifies the one that gives the best performance on held-out cross-fold data.
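To make the inverse-MAE strategy concrete, here is a minimal sketch of how static weights could be derived from the per-model MAE values in the example report above. The exact normalization Workbench uses internally may differ; this just illustrates the "lower error gets higher weight" idea:

```python
# Sketch: derive static inverse-MAE ensemble weights from per-model MAE.
# MAE values taken from the example report above.
mae = {
    "logd-xgb-end": 0.428,
    "logd-pytorch-end": 0.445,
    "logd-chemprop-end": 0.412,
}

# Lower error -> higher weight; normalize so the weights sum to 1.
inv = {name: 1.0 / m for name, m in mae.items()}
total = sum(inv.values())
weights = {name: v / total for name, v in inv.items()}

for name, w in weights.items():
    print(f"{name}: {w:.3f}")
```

With these numbers, the ChemProp endpoint (lowest MAE) ends up with the largest weight, around 0.35.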
Create a Meta Model
Once you're satisfied with the simulation results, create the meta model. It auto-simulates internally to pick the best strategy:
```python
from workbench.api import MetaModel

# Create a meta model (auto-simulates to find the best strategy)
meta = MetaModel.create(
    name="logd-meta",
    endpoints=["logd-xgb-end", "logd-pytorch-end", "logd-chemprop-end"],
    description="Meta model for LogD prediction",
    tags=["meta", "logd", "ensemble"],
)
print(meta.summary())
```
Deploy the Meta Model
Meta models deploy like any other Workbench model:
```python
# Deploy to an AWS Endpoint
endpoint = meta.to_endpoint(tags=["meta", "logd"])

# Run inference — the meta endpoint calls child endpoints internally
results_df = endpoint.inference(my_dataframe)
```
At inference time, the meta endpoint calls all child endpoints in parallel, collects their predictions and confidence scores, and aggregates them using the selected strategy.
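The fan-out step can be sketched with a thread pool. Note this is a simplified stand-in: the `fan_out` helper and the `predict` callables below are hypothetical illustrations, not the actual Workbench endpoint client:

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(endpoints, rows):
    """Call every child endpoint in parallel and collect their results.

    `endpoints` maps endpoint name -> predict callable; each callable
    stands in for a real endpoint's inference API and returns a list of
    (prediction, confidence) tuples, one per input row.
    """
    with ThreadPoolExecutor(max_workers=len(endpoints)) as pool:
        futures = {name: pool.submit(fn, rows) for name, fn in endpoints.items()}
        return {name: fut.result() for name, fut in futures.items()}


# Toy stand-ins for two child endpoints
endpoints = {
    "model-a": lambda rows: [(x * 2.0, 0.9) for x in rows],
    "model-b": lambda rows: [(x * 2.1, 0.7) for x in rows],
}
results = fan_out(endpoints, [1.0, 2.0])
print(results["model-a"])  # [(2.0, 0.9), (4.0, 0.9)]
```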
Aggregation Strategies
The meta model supports five aggregation strategies for combining endpoint predictions:
| Strategy | Description | When to Use |
|---|---|---|
| simple_mean | Equal weight to all endpoints | Baseline; all models perform similarly |
| confidence_weighted | Weight by per-row confidence score | Models have well-calibrated confidence |
| inverse_mae_weighted | Static weights from inverse MAE (lower error = higher weight) | Default; good when per-row confidence isn't reliable |
| scaled_conf_weighted | Inverse-MAE weights × per-row confidence | Best of both worlds; often the top performer |
| calibrated_conf_weighted | Confidence scaled by \|confidence–error correlation\| | Rewards models whose confidence actually predicts accuracy |
How Strategies Work
All strategies produce per-row weights that are normalized to sum to 1. The final prediction is the weighted sum of the child predictions: `ŷ = Σᵢ wᵢ · ŷᵢ`, where `Σᵢ wᵢ = 1`.
For the confidence-based strategies, each row can have different weights — if one model is very confident about a particular compound and another is not, the confident model gets more influence on that row.
Zero-Confidence Fallback
When all child endpoints report zero confidence for a given row (e.g., the compound is outside the training domain for all models), the confidence-weighted strategies fall back to static inverse-MAE weights. This prevents degenerate predictions and ensures every row gets a reasonable ensemble output.
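A minimal sketch of per-row confidence weighting with the zero-confidence fallback. The helper name, the static weights, and the exact formula are illustrative assumptions, not Workbench's actual implementation:

```python
def aggregate_row(preds, confs, static_weights):
    """Combine one row of child predictions (illustrative sketch).

    preds / confs / static_weights are parallel lists, one entry per
    child endpoint. Falls back to static (e.g. inverse-MAE) weights
    when every child reports zero confidence for this row.
    """
    weights = confs if sum(confs) > 0 else static_weights
    total = sum(weights)
    norm = [w / total for w in weights]  # normalize to sum to 1
    return sum(w * p for w, p in zip(norm, preds))


static = [0.35, 0.30, 0.35]  # hypothetical static inverse-MAE weights

# Confident model-1 and model-3 dominate this row
print(aggregate_row([2.0, 2.2, 1.9], [0.9, 0.1, 0.8], static))

# All-zero confidence -> static weights take over
print(aggregate_row([2.0, 2.2, 1.9], [0.0, 0.0, 0.0], static))
```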
Simulation Deep Dive
The simulator provides several methods for detailed analysis:
```python
sim = MetaModel.simulate(["logd-xgb-end", "logd-pytorch-end", "logd-chemprop-end"])

# Full report with all strategies
sim.report()

# Get the best strategy configuration (used internally by create())
config = sim.get_best_strategy_config()
print(config)
# {'aggregation_strategy': 'scaled_conf_weighted',
#  'model_weights': {'logd-xgb-end': 0.34, 'logd-pytorch-end': 0.30, ...},
#  'corr_scale': {'logd-xgb-end': 0.62, 'logd-pytorch-end': 0.55, ...},
#  'endpoints': ['logd-xgb-end', 'logd-pytorch-end', 'logd-chemprop-end'],
#  'target_column': 'logd'}

# Export the best ensemble's predictions to CSV
df = sim.best_ensemble_predictions()
df.to_csv("ensemble_predictions.csv", index=False)
```
Drop-Worst Analysis
The simulator also evaluates whether removing the worst-performing model improves the ensemble. If dropping a model reduces error, the get_best_strategy_config() method returns the reduced endpoint list. This is handled automatically by MetaModel.create().
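The drop-worst check can be sketched as a leave-one-out comparison. This is a simplified stand-in for the simulator's actual logic, using a simple-mean ensemble and a hypothetical helper:

```python
def mean_abs_error(preds, truths):
    return sum(abs(p - t) for p, t in zip(preds, truths)) / len(truths)


def drop_worst_helps(model_preds, truths):
    """Return the subset of models whose simple-mean ensemble has the
    lowest MAE: either all models, or all-but-one if dropping one helps."""
    def ensemble_mae(names):
        preds = [sum(model_preds[n][i] for n in names) / len(names)
                 for i in range(len(truths))]
        return mean_abs_error(preds, truths)

    names = list(model_preds)
    best = (ensemble_mae(names), names)
    for n in names:
        subset = [m for m in names if m != n]
        mae = ensemble_mae(subset)
        if mae < best[0]:
            best = (mae, subset)
    return best[1]


model_preds = {
    "good-a": [1.0, 2.0, 3.0],
    "good-b": [1.1, 2.1, 2.9],
    "bad":    [3.0, 0.5, 5.0],  # consistently off target
}
print(drop_worst_helps(model_preds, [1.0, 2.0, 3.0]))  # ['good-a', 'good-b']
```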
CLI Tool
The meta_model_sim CLI provides quick ensemble analysis from the command line:
```bash
# Simulate ensemble performance
meta_model_sim logd-xgb-end logd-pytorch-end logd-chemprop-end

# Use a specific inference capture
meta_model_sim logd-xgb-end logd-pytorch-end logd-chemprop-end \
    --capture-name full_cross_fold

# Save best ensemble predictions to CSV
meta_model_sim logd-xgb-end logd-pytorch-end logd-chemprop-end \
    --output ensemble_results.csv
```
How It Works Under the Hood
Creation Flow
When you call MetaModel.create(), the following happens:
- Lineage resolution — Backtraces the first endpoint's lineage (endpoint → model → FeatureSet) to automatically resolve the target column, ID column, and feature list
- Simulation — Runs `MetaModelSimulator` to evaluate all aggregation strategies on cross-fold prediction data
- Strategy selection — Picks the best-performing strategy, including checking whether dropping the worst model helps
- Training job — Runs a minimal SageMaker training job that saves the meta configuration (endpoints, weights, strategy) as a model artifact
- Registration — Creates a SageMaker Model Package with the meta inference container
- Metadata — Sets Workbench metadata (model type, framework, features, endpoints)
Inference Flow
When a meta endpoint receives a prediction request:
- Parse input — Reads the incoming CSV/JSON data
- Fan out — Calls all child endpoints in parallel using `fast_inference`
- Aggregate — Combines predictions using the stored aggregation strategy and weights
- Return — Outputs aggregated prediction, prediction_std (ensemble disagreement), and confidence
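The aggregation and return steps can be sketched for a single row as follows. The weighted mean mirrors the confidence-weighted strategies; `prediction_std` is the disagreement across children, and the `confidence` formula here is an illustrative placeholder rather than the endpoint's actual computation:

```python
from statistics import mean, pstdev


def aggregate(child_preds, confidences):
    """Combine one row of child predictions into the meta outputs:
    weighted prediction, prediction_std (disagreement), and confidence."""
    total = sum(confidences)
    prediction = sum(c / total * p for c, p in zip(confidences, child_preds))
    prediction_std = pstdev(child_preds)  # ensemble disagreement
    confidence = mean(confidences)        # placeholder summary confidence
    return prediction, prediction_std, confidence


pred, std, conf = aggregate([2.0, 2.0, 2.0], [0.9, 0.8, 0.7])
print(pred, std)  # identical children -> zero disagreement
```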
API Reference
MetaModel: A Model that aggregates predictions from multiple endpoints.

MetaModels don't train on feature data; they combine predictions from existing endpoints using confidence-weighted voting. This provides ensemble benefits across different model frameworks (XGBoost, PyTorch, ChemProp, etc.).

MetaModel

Bases: Model
Common Usage
```python
# Simulate ensemble performance before creating
sim = MetaModel.simulate(["endpoint-1", "endpoint-2", "endpoint-3"])
sim.report()

# Create a meta model (auto-simulates to find the best strategy;
# the target column is resolved automatically from endpoint lineage)
meta = MetaModel.create(
    name="my-meta-model",
    endpoints=["endpoint-1", "endpoint-2", "endpoint-3"],
)

# Deploy like any other model
endpoint = meta.to_endpoint()
```
Source code in `src/workbench/api/meta_model.py`
create(name, endpoints, description=None, tags=None, capture_name='full_cross_fold') classmethod

Create a new MetaModel from a list of endpoints.

Automatically backtraces endpoint lineage to resolve the target column and ID column, then runs ensemble simulation to find the best aggregation strategy, model weights, and confidence calibration parameters.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Name for the meta model | required |
| endpoints | list[str] | List of endpoint names to aggregate | required |
| description | str | Optional description for the model | None |
| tags | list[str] | Optional list of tags | None |
| capture_name | str | Inference capture name for simulation | 'full_cross_fold' |

Returns:

| Name | Type | Description |
|---|---|---|
| MetaModel | 'MetaModel' | The created meta model |
Source code in src/workbench/api/meta_model.py
simulate(endpoints, capture_name='full_cross_fold') classmethod

Run ensemble simulation to analyze how different strategies perform.

Backtraces the first endpoint's lineage to automatically resolve the ID column from the underlying FeatureSet.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| endpoints | list[str] | List of endpoint names to include in the simulation | required |
| capture_name | str | Inference capture name to load predictions from | 'full_cross_fold' |

Returns:

| Name | Type | Description |
|---|---|---|
| MetaModelSimulator | 'MetaModelSimulator' | Simulator instance for analysis and reporting |
Source code in src/workbench/api/meta_model.py
Questions?

The SuperCowPowers team is happy to answer any questions you may have about AWS and Workbench.
- Support: workbench@supercowpowers.com
- Discord: Join us on Discord
- Website: supercowpowers.com
® Amazon Web Services, AWS, and the Powered by AWS logo are trademarks of Amazon.com, Inc. or its affiliates.