Meta Models (Ensembles)
Ensemble Benefits
Meta models aggregate predictions from multiple deployed endpoints (XGBoost, PyTorch, ChemProp, etc.) into a single ensemble prediction. Rather than training on raw features, a meta model calls existing endpoints at inference time and combines their outputs using confidence-weighted voting strategies. Because different model frameworks tend to make different errors, aggregating their predictions reduces overall error and improves robustness.
Why Use a Meta Model?
Different model frameworks capture different aspects of molecular structure:
- XGBoost excels at tabular features (RDKit descriptors)
- PyTorch can learn nonlinear descriptor interactions
- ChemProp operates directly on molecular graphs
By combining predictions across frameworks, you get:
- Lower error — individual model mistakes get averaged out
- Better calibration — ensemble disagreement provides a natural uncertainty signal
- Robustness — no single model failure dominates the prediction
Quick Start
Simulate Ensemble Performance
Before creating a meta model, you can simulate how different aggregation strategies would perform using existing endpoint predictions:
```python
from workbench.api import MetaModel

# Simulate ensemble performance across endpoints
sim = MetaModel.simulate(["logd-xgb-end", "logd-pytorch-end", "logd-chemprop-end"])
sim.report()
```
Example Output
```
=== Individual Model Performance ===
logd-xgb-end:       MAE=0.428  RMSE=0.580  R²=0.817
logd-pytorch-end:   MAE=0.445  RMSE=0.594  R²=0.808
logd-chemprop-end:  MAE=0.412  RMSE=0.558  R²=0.831

=== Ensemble Strategy Comparison ===
Strategy                    MAE    RMSE   R²
simple_mean                 0.391  0.532  0.846
confidence_weighted         0.388  0.528  0.849
inverse_mae_weighted        0.385  0.525  0.850
scaled_conf_weighted        0.383  0.522  0.852  ← Best
calibrated_conf_weighted    0.384  0.523  0.851
drop_worst                  0.398  0.541  0.841
```
The simulator analyzes all aggregation strategies and identifies the one that gives the best performance on held-out cross-fold data.
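To make the inverse-MAE strategy concrete, here is a minimal sketch of how static weights could be derived from the per-model MAE values in the example report above. The exact normalization Workbench uses internally may differ; this just illustrates the "lower error gets higher weight" idea:

```python
# Sketch: derive static inverse-MAE ensemble weights from per-model MAE.
# MAE values taken from the example report above.
mae = {
    "logd-xgb-end": 0.428,
    "logd-pytorch-end": 0.445,
    "logd-chemprop-end": 0.412,
}

# Lower error -> higher weight; normalize so the weights sum to 1.
inv = {name: 1.0 / m for name, m in mae.items()}
total = sum(inv.values())
weights = {name: v / total for name, v in inv.items()}

for name, w in weights.items():
    print(f"{name}: {w:.3f}")
```

With these numbers, the ChemProp endpoint (lowest MAE) ends up with the largest weight, around 0.35.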
Create a Meta Model
Once you're satisfied with the simulation results, create the meta model. It auto-simulates internally to pick the best strategy:
```python
from workbench.api import MetaModel

# Create a meta model (auto-simulates to find the best strategy)
meta = MetaModel.create(
    name="logd-meta",
    endpoints=["logd-xgb-end", "logd-pytorch-end", "logd-chemprop-end"],
    description="Meta model for LogD prediction",
    tags=["meta", "logd", "ensemble"],
)
print(meta.summary())
```
Deploy the Meta Model
Meta models deploy like any other Workbench model:
```python
# Deploy to an AWS Endpoint
endpoint = meta.to_endpoint(tags=["meta", "logd"])

# Run inference — the meta endpoint calls child endpoints internally
results_df = endpoint.inference(my_dataframe)
```
At inference time, the meta endpoint calls all child endpoints in parallel, collects their predictions and confidence scores, and aggregates them using the selected strategy.
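The fan-out step can be sketched with a thread pool. Note this is a simplified stand-in: the `fan_out` helper and the `predict` callables below are hypothetical illustrations, not the actual Workbench endpoint client:

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(endpoints, rows):
    """Call every child endpoint in parallel and collect their results.

    `endpoints` maps endpoint name -> predict callable; each callable
    stands in for a real endpoint's inference API and returns a list of
    (prediction, confidence) tuples, one per input row.
    """
    with ThreadPoolExecutor(max_workers=len(endpoints)) as pool:
        futures = {name: pool.submit(fn, rows) for name, fn in endpoints.items()}
        return {name: fut.result() for name, fut in futures.items()}


# Toy stand-ins for two child endpoints
endpoints = {
    "model-a": lambda rows: [(x * 2.0, 0.9) for x in rows],
    "model-b": lambda rows: [(x * 2.1, 0.7) for x in rows],
}
results = fan_out(endpoints, [1.0, 2.0])
print(results["model-a"])  # [(2.0, 0.9), (4.0, 0.9)]
```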
Aggregation Strategies
The meta model supports five aggregation strategies for combining endpoint predictions:
| Strategy | Description | When to Use |
|---|---|---|
| simple_mean | Equal weight to all endpoints | Baseline; all models perform similarly |
| confidence_weighted | Weight by per-row confidence score | Models have well-calibrated confidence |
| inverse_mae_weighted | Static weights from inverse MAE (lower error = higher weight) | Default; good when per-row confidence isn't reliable |
| scaled_conf_weighted | Inverse-MAE weights × per-row confidence | Best of both worlds; often the top performer |
| calibrated_conf_weighted | Confidence scaled by \|confidence–error correlation\| | Rewards models whose confidence actually predicts accuracy |
How Strategies Work
All strategies produce per-row weights that are normalized to sum to 1. The final prediction is the weighted sum of the child predictions: `ŷ = Σᵢ wᵢ · ŷᵢ`, where `Σᵢ wᵢ = 1`.
For the confidence-based strategies, each row can have different weights — if one model is very confident about a particular compound and another is not, the confident model gets more influence on that row.
Zero-Confidence Fallback
When all child endpoints report zero confidence for a given row (e.g., the compound is outside the training domain for all models), the confidence-weighted strategies fall back to static inverse-MAE weights. This prevents degenerate predictions and ensures every row gets a reasonable ensemble output.
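A minimal sketch of per-row confidence weighting with the zero-confidence fallback. The helper name, the static weights, and the exact formula are illustrative assumptions, not Workbench's actual implementation:

```python
def aggregate_row(preds, confs, static_weights):
    """Combine one row of child predictions (illustrative sketch).

    preds / confs / static_weights are parallel lists, one entry per
    child endpoint. Falls back to static (e.g. inverse-MAE) weights
    when every child reports zero confidence for this row.
    """
    weights = confs if sum(confs) > 0 else static_weights
    total = sum(weights)
    norm = [w / total for w in weights]  # normalize to sum to 1
    return sum(w * p for w, p in zip(norm, preds))


static = [0.35, 0.30, 0.35]  # hypothetical static inverse-MAE weights

# Confident model-1 and model-3 dominate this row
print(aggregate_row([2.0, 2.2, 1.9], [0.9, 0.1, 0.8], static))

# All-zero confidence -> static weights take over
print(aggregate_row([2.0, 2.2, 1.9], [0.0, 0.0, 0.0], static))
```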
Simulation Deep Dive
The simulator provides several methods for detailed analysis:
```python
sim = MetaModel.simulate(["logd-xgb-end", "logd-pytorch-end", "logd-chemprop-end"])

# Full report with all strategies
sim.report()

# Get the best strategy configuration (used internally by create())
config = sim.get_best_strategy_config()
print(config)
# {'aggregation_strategy': 'scaled_conf_weighted',
#  'model_weights': {'logd-xgb-end': 0.34, 'logd-pytorch-end': 0.30, ...},
#  'corr_scale': {'logd-xgb-end': 0.62, 'logd-pytorch-end': 0.55, ...},
#  'endpoints': ['logd-xgb-end', 'logd-pytorch-end', 'logd-chemprop-end'],
#  'target_column': 'logd'}

# Export the best ensemble's predictions to CSV
df = sim.best_ensemble_predictions()
df.to_csv("ensemble_predictions.csv", index=False)
```
Drop-Worst Analysis
The simulator also evaluates whether removing the worst-performing model improves the ensemble. If dropping a model reduces error, the get_best_strategy_config() method returns the reduced endpoint list. This is handled automatically by MetaModel.create().
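The drop-worst check can be sketched as a leave-one-out comparison. This is a simplified stand-in for the simulator's actual logic, using a simple-mean ensemble and a hypothetical helper:

```python
def mean_abs_error(preds, truths):
    return sum(abs(p - t) for p, t in zip(preds, truths)) / len(truths)


def drop_worst_helps(model_preds, truths):
    """Return the subset of models whose simple-mean ensemble has the
    lowest MAE: either all models, or all-but-one if dropping one helps."""
    def ensemble_mae(names):
        preds = [sum(model_preds[n][i] for n in names) / len(names)
                 for i in range(len(truths))]
        return mean_abs_error(preds, truths)

    names = list(model_preds)
    best = (ensemble_mae(names), names)
    for n in names:
        subset = [m for m in names if m != n]
        mae = ensemble_mae(subset)
        if mae < best[0]:
            best = (mae, subset)
    return best[1]


model_preds = {
    "good-a": [1.0, 2.0, 3.0],
    "good-b": [1.1, 2.1, 2.9],
    "bad":    [3.0, 0.5, 5.0],  # consistently off target
}
print(drop_worst_helps(model_preds, [1.0, 2.0, 3.0]))  # ['good-a', 'good-b']
```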
CLI Tool
The meta_model_sim CLI provides quick ensemble analysis from the command line:
```bash
# Simulate ensemble performance
meta_model_sim logd-xgb-end logd-pytorch-end logd-chemprop-end

# Use a specific inference capture
meta_model_sim logd-xgb-end logd-pytorch-end logd-chemprop-end \
    --capture-name full_cross_fold

# Save best ensemble predictions to CSV
meta_model_sim logd-xgb-end logd-pytorch-end logd-chemprop-end \
    --output ensemble_results.csv
```
How It Works Under the Hood
Creation Flow
When you call MetaModel.create(), the following happens:
- Lineage resolution — Backtraces the first endpoint's lineage (endpoint → model → FeatureSet) to automatically resolve the target column, ID column, and feature list
- Simulation — Runs `MetaModelSimulator` to evaluate all aggregation strategies on cross-fold prediction data
- Strategy selection — Picks the best-performing strategy, including checking whether dropping the worst model helps
- Training job — Runs a minimal SageMaker training job that saves the meta configuration (endpoints, weights, strategy) as a model artifact
- Registration — Creates a SageMaker Model Package with the meta inference container
- Metadata — Sets Workbench metadata (model type, framework, features, endpoints)
Inference Flow
When a meta endpoint receives a prediction request:
- Parse input — Reads the incoming CSV/JSON data
- Fan out — Calls all child endpoints in parallel using `fast_inference`
- Aggregate — Combines predictions using the stored aggregation strategy and weights
- Return — Outputs aggregated prediction, prediction_std (ensemble disagreement), and confidence
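The aggregation and return steps can be sketched for a single row as follows. The weighted mean mirrors the confidence-weighted strategies; `prediction_std` is the disagreement across children, and the `confidence` formula here is an illustrative placeholder rather than the endpoint's actual computation:

```python
from statistics import mean, pstdev


def aggregate(child_preds, confidences):
    """Combine one row of child predictions into the meta outputs:
    weighted prediction, prediction_std (disagreement), and confidence."""
    total = sum(confidences)
    prediction = sum(c / total * p for c, p in zip(confidences, child_preds))
    prediction_std = pstdev(child_preds)  # ensemble disagreement
    confidence = mean(confidences)        # placeholder summary confidence
    return prediction, prediction_std, confidence


pred, std, conf = aggregate([2.0, 2.0, 2.0], [0.9, 0.8, 0.7])
print(pred, std)  # identical children -> zero disagreement
```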
API Reference
MetaModel: A Model that aggregates predictions from multiple endpoints.

MetaModels don't train on feature data; they combine predictions from existing endpoints using confidence-weighted voting. This provides ensemble benefits across different model frameworks (XGBoost, PyTorch, ChemProp, etc.).

MetaModel

Bases: Model
Common Usage
```python
# Simulate ensemble performance before creating
sim = MetaModel.simulate(["endpoint-1", "endpoint-2", "endpoint-3"])
sim.report()

# Create a meta model (auto-simulates to find the best strategy;
# the target column is resolved automatically from endpoint lineage)
meta = MetaModel.create(
    name="my-meta-model",
    endpoints=["endpoint-1", "endpoint-2", "endpoint-3"],
)

# Deploy like any other model
endpoint = meta.to_endpoint()
```
Source code in `src/workbench/api/meta_model.py`
create(name, endpoints, description=None, tags=None, capture_name='full_cross_fold') classmethod

Create a new MetaModel from a list of endpoints.

Automatically backtraces endpoint lineage to resolve the target column and ID column, then runs ensemble simulation to find the best aggregation strategy, model weights, and confidence calibration parameters.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Name for the meta model | required |
| endpoints | list[str] | List of endpoint names to aggregate | required |
| description | str | Optional description for the model | None |
| tags | list[str] | Optional list of tags | None |
| capture_name | str | Inference capture name for simulation | 'full_cross_fold' |

Returns:

| Name | Type | Description |
|---|---|---|
| MetaModel | 'MetaModel' | The created meta model |
Source code in src/workbench/api/meta_model.py
simulate(endpoints, capture_name='full_cross_fold') classmethod

Run ensemble simulation to analyze how different strategies perform.

Backtraces the first endpoint's lineage to automatically resolve the ID column from the underlying FeatureSet.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| endpoints | list[str] | List of endpoint names to include in the simulation | required |
| capture_name | str | Inference capture name to load predictions from | 'full_cross_fold' |

Returns:

| Name | Type | Description |
|---|---|---|
| MetaModelSimulator | 'MetaModelSimulator' | Simulator instance for analysis and reporting |
Source code in src/workbench/api/meta_model.py
Questions?

The SuperCowPowers team is happy to answer any questions you may have about AWS and Workbench.
- Support: workbench@supercowpowers.com
- Discord: Join us on Discord
- Website: supercowpowers.com
® Amazon Web Services, AWS, and the Powered by AWS logo are trademarks of Amazon.com, Inc. or its affiliates.