
Meta Models (Ensembles)

Ensemble Benefits

Meta models aggregate predictions from multiple deployed endpoints (XGBoost, PyTorch, ChemProp, etc.) into a single ensemble prediction. Because different model frameworks tend to make different errors, combining their outputs improves both accuracy and robustness.

Rather than training on raw features, a meta model calls its child endpoints at inference time and combines their outputs using confidence-weighted voting strategies.

Why Use a Meta Model?

Different model frameworks capture different aspects of molecular structure:

  • XGBoost excels at tabular features (RDKit descriptors)
  • PyTorch can learn nonlinear descriptor interactions
  • ChemProp operates directly on molecular graphs

By combining predictions across frameworks, you get:

  • Lower error — individual model mistakes get averaged out
  • Better calibration — ensemble disagreement provides a natural uncertainty signal
  • Robustness — no single model failure dominates the prediction
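The error-averaging effect is easy to see with a toy simulation. The sketch below uses only numpy and synthetic data (it is not Workbench code): three predictors with independent noise are averaged, and the ensemble MAE comes out below every individual MAE.

```python
import numpy as np

rng = np.random.default_rng(42)
truth = rng.uniform(-2, 5, size=1000)  # synthetic target values

# Three models whose errors are independent of one another
preds = [truth + rng.normal(0, 0.5, size=1000) for _ in range(3)]

def mae(p):
    return float(np.abs(p - truth).mean())

individual_maes = [mae(p) for p in preds]
ensemble_mae = mae(np.mean(preds, axis=0))

print("Individual MAEs:", [round(m, 3) for m in individual_maes])
print("Ensemble MAE:   ", round(ensemble_mae, 3))
```

Averaging n predictors with independent errors shrinks the noise standard deviation by roughly √n; correlated errors shrink less, which is part of why mixing different frameworks helps.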

Quick Start

Simulate Ensemble Performance

Before creating a meta model, you can simulate how different aggregation strategies would perform using existing endpoint predictions:

from workbench.api import MetaModel

# Simulate ensemble performance across endpoints
sim = MetaModel.simulate(["logd-xgb-end", "logd-pytorch-end", "logd-chemprop-end"])
sim.report()

Example Output

===  Individual Model Performance  ===
logd-xgb-end:       MAE=0.428  RMSE=0.580  R²=0.817
logd-pytorch-end:   MAE=0.445  RMSE=0.594  R²=0.808
logd-chemprop-end:  MAE=0.412  RMSE=0.558  R²=0.831

===  Ensemble Strategy Comparison  ===
Strategy                     MAE     RMSE    R²
simple_mean                 0.391   0.532   0.846
confidence_weighted         0.388   0.528   0.849
inverse_mae_weighted        0.385   0.525   0.850
scaled_conf_weighted        0.383   0.522   0.852  ← Best
calibrated_conf_weighted    0.384   0.523   0.851
drop_worst                  0.398   0.541   0.841

The simulator analyzes all aggregation strategies and identifies the one that gives the best performance on held-out cross-fold data.

Create a Meta Model

Once you're satisfied with the simulation results, create the meta model. It auto-simulates internally to pick the best strategy:

from workbench.api import MetaModel

# Create a meta model (auto-simulates to find best strategy)
meta = MetaModel.create(
    name="logd-meta",
    endpoints=["logd-xgb-end", "logd-pytorch-end", "logd-chemprop-end"],
    description="Meta model for LogD prediction",
    tags=["meta", "logd", "ensemble"],
)
print(meta.summary())

Deploy the Meta Model

Meta models deploy like any other Workbench model:

# Deploy to an AWS Endpoint
endpoint = meta.to_endpoint(tags=["meta", "logd"])

# Run inference — the meta endpoint calls child endpoints internally
results_df = endpoint.inference(my_dataframe)

At inference time, the meta endpoint calls all child endpoints in parallel, collects their predictions and confidence scores, and aggregates them using the selected strategy.

Aggregation Strategies

The meta model supports five aggregation strategies for combining endpoint predictions (the simulator additionally evaluates a drop_worst variant, covered below):

  • simple_mean: equal weight to all endpoints. Use as a baseline when all models perform similarly.
  • confidence_weighted: weight each prediction by its per-row confidence score. Use when models have well-calibrated confidence.
  • inverse_mae_weighted: static weights from inverse MAE (lower error = higher weight). The default; good when per-row confidence isn't reliable.
  • scaled_conf_weighted: inverse-MAE weights multiplied by per-row confidence. Combines both signals; often the top performer.
  • calibrated_conf_weighted: confidence scaled by |confidence–error correlation|. Rewards models whose confidence actually predicts accuracy.
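The static inverse-MAE weights are straightforward to compute by hand. A quick sketch in plain Python using the per-model MAEs from the example output above (this mirrors the idea, not the simulator's actual implementation):

```python
# Per-model MAEs from cross-fold predictions (from the example output)
maes = {"logd-xgb-end": 0.428, "logd-pytorch-end": 0.445, "logd-chemprop-end": 0.412}

# Inverse-MAE weights, normalized to sum to 1
inv = {name: 1.0 / m for name, m in maes.items()}
total = sum(inv.values())
weights = {name: v / total for name, v in inv.items()}

# The lowest-error model (chemprop) gets the largest weight
print({name: round(w, 3) for name, w in weights.items()})
```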

How Strategies Work

All strategies produce per-row weights that are normalized to sum to 1. The final prediction is:

prediction = Σ (weight_i × prediction_i)

For the confidence-based strategies, each row can have different weights — if one model is very confident about a particular compound and another is not, the confident model gets more influence on that row.
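Per-row weighting can be sketched with numpy (synthetic predictions and confidences for three hypothetical endpoints; this illustrates the math, not the deployed aggregation code):

```python
import numpy as np

# Rows are compounds, columns are endpoints
predictions = np.array([[2.1, 2.4, 1.9],
                        [0.5, 0.9, 0.7]])
confidences = np.array([[0.9, 0.2, 0.8],   # model 2 is unsure about row 0
                        [0.1, 0.7, 0.6]])  # model 1 is unsure about row 1

# Normalize confidences per row so the weights sum to 1
weights = confidences / confidences.sum(axis=1, keepdims=True)
ensemble = (weights * predictions).sum(axis=1)
print(ensemble)
```

On row 0 the low-confidence middle model contributes only 0.2/1.9 ≈ 11% of the result, so the two confident models dominate that prediction.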

Zero-Confidence Fallback

When all child endpoints report zero confidence for a given row (e.g., the compound is outside the training domain for all models), the confidence-weighted strategies fall back to static inverse-MAE weights. This prevents degenerate predictions and ensures every row gets a reasonable ensemble output.
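The fallback amounts to a per-row branch. A minimal sketch with a hypothetical helper (not the deployed code):

```python
import numpy as np

def row_weights(conf_row, static_weights):
    """Confidence weights for one row, falling back to the static
    inverse-MAE weights when every model reports zero confidence."""
    if np.all(conf_row == 0):
        return static_weights
    return conf_row / conf_row.sum()

static = np.array([0.36, 0.30, 0.34])  # e.g. static inverse-MAE weights

print(row_weights(np.array([0.0, 0.0, 0.0]), static))  # fallback path
print(row_weights(np.array([0.5, 0.0, 0.5]), static))  # confidence path
```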

Simulation Deep Dive

The simulator provides several methods for detailed analysis:

sim = MetaModel.simulate(["logd-xgb-end", "logd-pytorch-end", "logd-chemprop-end"])

# Full report with all strategies
sim.report()

# Get the best strategy configuration (used internally by create())
config = sim.get_best_strategy_config()
print(config)
# {'aggregation_strategy': 'scaled_conf_weighted',
#  'model_weights': {'logd-xgb-end': 0.34, 'logd-pytorch-end': 0.30, ...},
#  'corr_scale': {'logd-xgb-end': 0.62, 'logd-pytorch-end': 0.55, ...},
#  'endpoints': ['logd-xgb-end', 'logd-pytorch-end', 'logd-chemprop-end'],
#  'target_column': 'logd'}

# Export the best ensemble's predictions to CSV
df = sim.best_ensemble_predictions()
df.to_csv("ensemble_predictions.csv", index=False)

Drop-Worst Analysis

The simulator also evaluates whether removing the worst-performing model improves the ensemble. If dropping a model reduces error, the get_best_strategy_config() method returns the reduced endpoint list. This is handled automatically by MetaModel.create().
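The drop-worst check amounts to scoring the ensemble with and without its highest-error member. A self-contained sketch on synthetic data (simple mean aggregation for brevity; the simulator evaluates its full strategy set):

```python
import numpy as np

def ensemble_mae(pred_list, truth):
    return float(np.abs(np.mean(pred_list, axis=0) - truth).mean())

# Synthetic setup: two decent models and one poor one
rng = np.random.default_rng(0)
truth = rng.uniform(0, 5, size=500)
preds = {
    "good-a": truth + rng.normal(0, 0.3, size=500),
    "good-b": truth + rng.normal(0, 0.3, size=500),
    "poor":   truth + rng.normal(0, 1.5, size=500),
}

full_mae = ensemble_mae(list(preds.values()), truth)
worst = max(preds, key=lambda k: np.abs(preds[k] - truth).mean())
reduced_mae = ensemble_mae([p for k, p in preds.items() if k != worst], truth)

print(f"full={full_mae:.3f}  without '{worst}'={reduced_mae:.3f}")
# Here dropping the poor model lowers the ensemble error,
# so the reduced endpoint list would win.
```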

CLI Tool

The meta_model_sim CLI provides quick ensemble analysis from the command line:

# Simulate ensemble performance
meta_model_sim logd-xgb-end logd-pytorch-end logd-chemprop-end

# Use a specific inference capture
meta_model_sim logd-xgb-end logd-pytorch-end logd-chemprop-end \
    --capture-name full_cross_fold

# Save best ensemble predictions to CSV
meta_model_sim logd-xgb-end logd-pytorch-end logd-chemprop-end \
    --output ensemble_results.csv

How It Works Under the Hood

Creation Flow

When you call MetaModel.create(), the following happens:

  1. Lineage resolution — Backtraces the first endpoint's lineage (endpoint → model → FeatureSet) to automatically resolve the target column, ID column, and feature list
  2. Simulation — Runs MetaModelSimulator to evaluate all aggregation strategies on cross-fold prediction data
  3. Strategy selection — Picks the best-performing strategy, including checking if dropping the worst model helps
  4. Training job — Runs a minimal SageMaker training job that saves the meta configuration (endpoints, weights, strategy) as a model artifact
  5. Registration — Creates a SageMaker Model Package with the meta inference container
  6. Metadata — Sets Workbench metadata (model type, framework, features, endpoints)

Inference Flow

When a meta endpoint receives a prediction request:

  1. Parse input — Reads the incoming CSV/JSON data
  2. Fan out — Calls all child endpoints in parallel using fast_inference
  3. Aggregate — Combines predictions using the stored aggregation strategy and weights
  4. Return — Outputs aggregated prediction, prediction_std (ensemble disagreement), and confidence
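The fan-out step maps naturally onto a thread pool. A hypothetical sketch of the pattern (predict_fn stands in for the real fast_inference call; this is not the deployed handler):

```python
from concurrent.futures import ThreadPoolExecutor

def fan_out(df, endpoints, predict_fn):
    """Call predict_fn(endpoint_name, df) for every child endpoint
    in parallel and collect the results keyed by endpoint name."""
    with ThreadPoolExecutor(max_workers=len(endpoints)) as pool:
        futures = {name: pool.submit(predict_fn, name, df) for name in endpoints}
        return {name: f.result() for name, f in futures.items()}

# Usage with a dummy predictor in place of a real endpoint call
results = fan_out([1, 2], ["ep-a", "ep-b"], lambda name, df: [x * 2 for x in df])
print(results)
```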

API Reference

MetaModel: A Model that aggregates predictions from multiple endpoints.

MetaModels don't train on feature data; they combine predictions from existing endpoints using confidence-weighted voting. This provides ensemble benefits across different model frameworks (XGBoost, PyTorch, ChemProp, etc.).

MetaModel

Bases: Model

MetaModel: A Model that aggregates predictions from endpoints.

Common Usage
# Simulate ensemble performance before creating
sim = MetaModel.simulate(["endpoint-1", "endpoint-2", "endpoint-3"])
sim.report()

# Create a meta model (auto-simulates to find best strategy)
meta = MetaModel.create(
    name="my-meta-model",
    endpoints=["endpoint-1", "endpoint-2", "endpoint-3"],
    target_column="target"
)

# Deploy like any other model
endpoint = meta.to_endpoint()
Source code in src/workbench/api/meta_model.py
class MetaModel(Model):
    """MetaModel: A Model that aggregates predictions from endpoints.

    Common Usage:
        ```python
        # Simulate ensemble performance before creating
        sim = MetaModel.simulate(["endpoint-1", "endpoint-2", "endpoint-3"])
        sim.report()

        # Create a meta model (auto-simulates to find best strategy)
        meta = MetaModel.create(
            name="my-meta-model",
            endpoints=["endpoint-1", "endpoint-2", "endpoint-3"],
            target_column="target"
        )

        # Deploy like any other model
        endpoint = meta.to_endpoint()
        ```
    """

    @classmethod
    def simulate(
        cls,
        endpoints: list[str],
        capture_name: str = "full_cross_fold",
    ) -> "MetaModelSimulator":
        """Run ensemble simulation to analyze how different strategies perform.

        Backtraces the first endpoint's lineage to automatically resolve the
        ID column from the underlying FeatureSet.

        Args:
            endpoints (list[str]): List of endpoint names to include in the simulation
            capture_name (str): Inference capture name to load predictions from
                (default: 'full_cross_fold')

        Returns:
            MetaModelSimulator: Simulator instance for analysis and reporting
        """
        from workbench.utils.meta_model_simulator import MetaModelSimulator

        id_column = cls._resolve_id_column(endpoints[0])
        return MetaModelSimulator(endpoints, id_column=id_column, capture_name=capture_name)

    @classmethod
    def create(
        cls,
        name: str,
        endpoints: list[str],
        description: str = None,
        tags: list[str] = None,
        capture_name: str = "full_cross_fold",
    ) -> "MetaModel":
        """Create a new MetaModel from a list of endpoints.

        Automatically backtraces endpoint lineage to resolve the target column
        and ID column, then runs ensemble simulation to find the best aggregation
        strategy, model weights, and confidence calibration parameters.

        Args:
            name (str): Name for the meta model
            endpoints (list[str]): List of endpoint names to aggregate
            description (str): Optional description for the model
            tags (list[str]): Optional list of tags
            capture_name (str): Inference capture name for simulation
                (default: 'full_cross_fold')

        Returns:
            MetaModel: The created meta model
        """
        from workbench.utils.meta_model_simulator import MetaModelSimulator

        Artifact.is_name_valid(name, delimiter="-", lower_case=False)

        # Validate endpoints and get lineage info from primary endpoint
        feature_list, feature_set_name, id_column, target_column = cls._validate_and_get_lineage(endpoints)

        # Run ensemble simulation to find best strategy
        log.important("Running ensemble simulation to find best strategy...")
        sim = MetaModelSimulator(endpoints, id_column=id_column, capture_name=capture_name)
        sim.report()
        config = sim.get_best_strategy_config()

        # Use the simulator's recommended config
        aggregation_strategy = config["aggregation_strategy"]
        model_weights = config["model_weights"]
        corr_scale = config["corr_scale"]
        final_endpoints = config["endpoints"]  # May differ if drop_worst won

        log.important(f"Best strategy: {aggregation_strategy}")
        log.important(f"Endpoints: {final_endpoints}")
        log.important(f"Model weights: {model_weights}")

        # Delete existing model if it exists
        log.important(f"Trying to delete existing model {name}...")
        ModelCore.managed_delete(name)

        # Run training and register model
        aws_clamp = AWSAccountClamp()
        estimator = cls._run_training(
            name, final_endpoints, target_column, model_weights, aggregation_strategy, corr_scale, aws_clamp
        )
        cls._register_model(name, final_endpoints, description, tags, estimator, aws_clamp)

        # Set metadata and onboard
        cls._set_metadata(name, target_column, feature_list, feature_set_name, final_endpoints)

        log.important(f"MetaModel {name} created successfully!")
        return cls(name)

    @classmethod
    def _resolve_id_column(cls, endpoint_name: str) -> str:
        """Backtrace an endpoint to find the ID column from its underlying FeatureSet.

        Args:
            endpoint_name (str): Endpoint name to backtrace

        Returns:
            str: The ID column name from the FeatureSet
        """
        ep = Endpoint(endpoint_name)
        model = Model(ep.get_input())
        feature_set_name = model.get_input()
        fs = FeatureSet(feature_set_name)
        log.info(f"Resolved id_column='{fs.id_column}' from {endpoint_name} -> {model.name} -> {feature_set_name}")
        return fs.id_column

    @classmethod
    def _validate_and_get_lineage(cls, endpoints: list[str]) -> tuple[list[str], str, str, str]:
        """Validate endpoints exist and get lineage info from primary endpoint.

        Backtraces: endpoint → model → feature_set to resolve features, id_column,
        and target_column automatically.

        Args:
            endpoints (list[str]): List of endpoint names

        Returns:
            tuple: (feature_list, feature_set_name, id_column, target_column)
        """
        log.info("Verifying endpoints...")
        for ep_name in endpoints:
            ep = Endpoint(ep_name)
            if not ep.exists():
                raise ValueError(f"Endpoint '{ep_name}' does not exist")
            log.info(f"  {ep_name}: OK")

        # Use first endpoint as primary - backtrack to get model and feature set
        primary_endpoint = Endpoint(endpoints[0])
        primary_model = Model(primary_endpoint.get_input())
        feature_list = primary_model.features()
        feature_set_name = primary_model.get_input()
        target_column = primary_model.target()
        fs = FeatureSet(feature_set_name)
        id_column = fs.id_column

        log.info(
            f"Primary: {endpoints[0]} -> {primary_model.name} -> {feature_set_name} "
            f"(id_column: {id_column}, target: {target_column})"
        )
        return feature_list, feature_set_name, id_column, target_column

    @classmethod
    def _run_training(
        cls,
        name: str,
        endpoints: list[str],
        target_column: str,
        model_weights: dict[str, float],
        aggregation_strategy: str,
        corr_scale: dict[str, float] | None,
        aws_clamp: AWSAccountClamp,
    ) -> Estimator:
        """Run the minimal training job that saves the meta model config.

        Args:
            name (str): Model name
            endpoints (list[str]): List of endpoint names
            target_column (str): Target column name
            model_weights (dict): Dict mapping endpoint name to weight
            aggregation_strategy (str): Ensemble aggregation strategy name
            corr_scale (dict): Dict mapping endpoint name to |confidence_error_correlation|
            aws_clamp (AWSAccountClamp): AWS account clamp

        Returns:
            Estimator: The fitted estimator
        """
        sm_session = aws_clamp.sagemaker_session()
        cm = ConfigManager()
        workbench_bucket = cm.get_config("WORKBENCH_BUCKET")
        models_s3_path = f"s3://{workbench_bucket}/models"

        # Generate the model script from template
        template_params = {
            "model_type": ModelType.REGRESSOR,
            "model_framework": ModelFramework.META,
            "endpoints": endpoints,
            "target_column": target_column,
            "model_weights": model_weights,
            "aggregation_strategy": aggregation_strategy,
            "corr_scale": corr_scale or {},
            "model_metrics_s3_path": f"{models_s3_path}/{name}/training",
            "aws_region": sm_session.boto_region_name,
        }
        script_path = generate_model_script(template_params)

        # Create estimator
        training_image = ModelImages.get_image_uri(sm_session.boto_region_name, "meta_training")
        log.info(f"Using Meta Training Image: {training_image}")
        estimator = Estimator(
            entry_point=Path(script_path).name,
            source_dir=str(Path(script_path).parent),
            role=aws_clamp.aws_session.get_workbench_execution_role_arn(),
            instance_count=1,
            instance_type="ml.m5.large",
            sagemaker_session=sm_session,
            image_uri=training_image,
        )

        # Run training (no input data needed - just saves config)
        log.important(f"Creating MetaModel {name}...")
        estimator.fit()

        return estimator

    @classmethod
    def _register_model(
        cls,
        name: str,
        endpoints: list[str],
        description: str,
        tags: list[str],
        estimator: Estimator,
        aws_clamp: AWSAccountClamp,
    ):
        """Create model group and register the model.

        Args:
            name (str): Model name
            endpoints (list[str]): List of endpoint names
            description (str): Model description
            tags (list[str]): Model tags
            estimator (Estimator): Fitted estimator
            aws_clamp (AWSAccountClamp): AWS account clamp
        """
        sm_session = aws_clamp.sagemaker_session()
        model_description = description or f"Meta model aggregating: {', '.join(endpoints)}"

        # Create model group
        aws_clamp.sagemaker_client().create_model_package_group(
            ModelPackageGroupName=name,
            ModelPackageGroupDescription=model_description,
            Tags=[{"Key": "workbench_tags", "Value": "::".join(tags or [name])}],
        )

        # Register the model with meta inference image
        inference_image = ModelImages.get_image_uri(sm_session.boto_region_name, "meta_inference")
        log.important(f"Registering model {name} with Inference Image {inference_image}...")
        estimator.create_model(role=aws_clamp.aws_session.get_workbench_execution_role_arn()).register(
            model_package_group_name=name,
            image_uri=inference_image,
            content_types=["text/csv"],
            response_types=["text/csv"],
            inference_instances=supported_instance_types("x86_64"),
            transform_instances=["ml.m5.large", "ml.m5.xlarge"],
            approval_status="Approved",
            description=model_description,
        )

    @classmethod
    def _set_metadata(
        cls, name: str, target_column: str, feature_list: list[str], feature_set_name: str, endpoints: list[str]
    ):
        """Set model metadata and onboard.

        Args:
            name (str): Model name
            target_column (str): Target column name
            feature_list (list[str]): List of feature names
            feature_set_name (str): Name of the input FeatureSet
            endpoints (list[str]): List of endpoint names
        """
        time.sleep(3)
        output_model = ModelCore(name)
        output_model._set_model_type(ModelType.UQ_REGRESSOR)
        output_model._set_model_framework(ModelFramework.META)
        output_model.set_input(feature_set_name, force=True)
        output_model.upsert_workbench_meta({"workbench_model_target": target_column})
        output_model.upsert_workbench_meta({"workbench_model_features": feature_list})
        output_model.upsert_workbench_meta({"endpoints": endpoints})
        output_model.onboard_with_args(ModelType.UQ_REGRESSOR, target_column, feature_list=feature_list)

create(name, endpoints, description=None, tags=None, capture_name='full_cross_fold') classmethod

Create a new MetaModel from a list of endpoints.

Automatically backtraces endpoint lineage to resolve the target column and ID column, then runs ensemble simulation to find the best aggregation strategy, model weights, and confidence calibration parameters.

Parameters:

Name           Type        Description                             Default
name           str         Name for the meta model                 required
endpoints      list[str]   List of endpoint names to aggregate     required
description    str         Optional description for the model      None
tags           list[str]   Optional list of tags                   None
capture_name   str         Inference capture name for simulation   'full_cross_fold'

Returns:

Name        Type         Description
MetaModel   'MetaModel'  The created meta model


simulate(endpoints, capture_name='full_cross_fold') classmethod

Run ensemble simulation to analyze how different strategies perform.

Backtraces the first endpoint's lineage to automatically resolve the ID column from the underlying FeatureSet.

Parameters:

Name           Type        Description                                           Default
endpoints      list[str]   List of endpoint names to include in the simulation   required
capture_name   str         Inference capture name to load predictions from       'full_cross_fold'

Returns:

Name                 Type                  Description
MetaModelSimulator   'MetaModelSimulator'  Simulator instance for analysis and reporting


Questions?

The SuperCowPowers team is happy to answer any questions you may have about AWS and Workbench.

® Amazon Web Services, AWS, the Powered by AWS logo, are trademarks of Amazon.com, Inc. or its affiliates