Scikit-Learn Models

Workbench supports any scikit-learn estimator: classifiers, regressors, clustering models, and more. Specify the estimator's class name and its import string, and Workbench handles training, deployment, and inference on AWS.
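Because the estimator is identified by two strings, something has to turn them back into a Python class. This page does not describe Workbench's internal mechanism. The sketch below shows one plausible approach, assuming the import string always has the exact form `from <module> import <ClassName>` used in the examples. The helper `resolve_estimator_class` is hypothetical, and it is demonstrated with a standard-library class so it runs without scikit-learn installed.

```python
import importlib
import re

# Hypothetical helper: only the "from <module> import <Name>" form is accepted.
_IMPORT_RE = re.compile(r"^from\s+([\w.]+)\s+import\s+(\w+)$")


def resolve_estimator_class(model_class: str, model_import_str: str) -> type:
    """Return the class named by model_class after checking it against the import string."""
    match = _IMPORT_RE.match(model_import_str.strip())
    if match is None:
        raise ValueError(f"Unsupported import string: {model_import_str!r}")
    module_name, imported_name = match.groups()
    # Catch a class name that disagrees with the name the import string brings in.
    if imported_name != model_class:
        raise ValueError(
            f"model_class {model_class!r} does not match imported name {imported_name!r}"
        )
    module = importlib.import_module(module_name)
    return getattr(module, model_class)


# Standard-library stand-in; with scikit-learn installed you could pass
# ("RandomForestClassifier", "from sklearn.ensemble import RandomForestClassifier").
cls = resolve_estimator_class("OrderedDict", "from collections import OrderedDict")
print(cls.__name__)  # OrderedDict
```

Checking that the two strings agree is a design choice of this sketch: it turns a copy-paste mismatch into an immediate local error instead of a failure later in a remote job.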

Creating a Scikit-Learn Model

Classification (RandomForest)

from workbench.api import FeatureSet

fs = FeatureSet("wine_features")

model = fs.to_model(
    model_class="RandomForestClassifier",
    model_import_str="from sklearn.ensemble import RandomForestClassifier",
    name="wine-rfc-class",
    target_column="wine_class",
    description="Wine RandomForest Classification",
    tags=["wine", "random-forest"],
)

Clustering (KMeans)

from workbench.api import FeatureSet

fs = FeatureSet("abalone_features")

model = fs.to_model(
    model_class="KMeans",
    model_import_str="from sklearn.cluster import KMeans",
    name="abalone-kmeans",
    target_column="class_number_of_rings",
    description="Abalone KMeans Clustering",
    tags=["abalone", "kmeans"],
)

Clustering (DBSCAN)

from workbench.api import FeatureSet

fs = FeatureSet("abalone_features")

model = fs.to_model(
    model_class="DBSCAN",
    model_import_str="from sklearn.cluster import DBSCAN",
    name="abalone-dbscan",
    target_column="class_number_of_rings",
    description="Abalone DBSCAN Clustering",
    tags=["abalone", "dbscan"],
    train_all_data=True,
)

Hyperparameters

Pass scikit-learn constructor arguments directly through the hyperparameters dictionary:

from workbench.api import FeatureSet

fs = FeatureSet("wine_features")

model = fs.to_model(
    model_class="RandomForestClassifier",
    model_import_str="from sklearn.ensemble import RandomForestClassifier",
    name="wine-rfc-tuned",
    target_column="wine_class",
    hyperparameters={
        "n_estimators": 200,
        "max_depth": 10,
        "min_samples_split": 5,
    },
)
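Scikit-learn's estimator conventions call for every constructor parameter to be declared explicitly in `__init__` (no `*args` or `**kwargs`), so a misspelled key in `hyperparameters` can be caught locally before a training job is launched. A minimal sketch, assuming the dictionary is passed straight through as keyword arguments as described above. `unknown_hyperparameters` and `ForestStandIn` are hypothetical; the stand-in only mirrors three of `RandomForestClassifier`'s real parameter names so the example runs without scikit-learn.

```python
import inspect


def unknown_hyperparameters(estimator_cls: type, hyperparameters: dict) -> list[str]:
    """Return hyperparameter names that the estimator's constructor does not accept."""
    # inspect.signature on a class reports its __init__ parameters (without self).
    accepted = set(inspect.signature(estimator_cls).parameters)
    return sorted(name for name in hyperparameters if name not in accepted)


class ForestStandIn:
    """Hypothetical stand-in using a subset of RandomForestClassifier's parameter names."""

    def __init__(self, n_estimators=100, max_depth=None, min_samples_split=2):
        self.n_estimators = n_estimators
        self.max_depth = max_depth
        self.min_samples_split = min_samples_split


params = {"n_estimators": 200, "max_depth": 10, "min_sample_split": 5}  # note the typo
print(unknown_hyperparameters(ForestStandIn, params))  # ['min_sample_split']
```

With scikit-learn installed, you could pass `RandomForestClassifier` itself instead of the stand-in; `RandomForestClassifier().get_params()` also lists the accepted parameter names.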

Supported Estimators

Any scikit-learn class that follows the estimator API works. Common choices:

Task	Estimator	Module
Classification	RandomForestClassifier	`sklearn.ensemble`
Classification	GradientBoostingClassifier	`sklearn.ensemble`
Classification	SVC	`sklearn.svm`
Regression	Ridge	`sklearn.linear_model`
Regression	BayesianRidge	`sklearn.linear_model`
Regression	KNeighborsRegressor	`sklearn.neighbors`
Clustering	KMeans	`sklearn.cluster`
Clustering	DBSCAN	`sklearn.cluster`
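The table lists only the module, while `model_import_str` needs the full `from <module> import <Class>` statement and `model_class` must repeat the same class name. A small hypothetical helper, `estimator_args`, can derive both arguments from one table row so they cannot drift apart. It is a convenience sketch, not part of the Workbench API.

```python
def estimator_args(module: str, class_name: str) -> dict:
    """Build the model_class / model_import_str pair for one row of the table."""
    return {
        "model_class": class_name,
        "model_import_str": f"from {module} import {class_name}",
    }


args = estimator_args("sklearn.neighbors", "KNeighborsRegressor")
print(args["model_import_str"])  # from sklearn.neighbors import KNeighborsRegressor

# Hypothetical usage with Workbench (not run here):
# model = fs.to_model(**args, name="...", target_column="...")
```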

Tip

Use train_all_data=True to train on the full dataset without a holdout split. This is useful for clustering, or when you need as much training data as possible.
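The difference between a holdout split and training on all data can be sketched in plain Python. The 20% holdout fraction, the fixed seed, and the `split_rows` helper are assumptions for illustration only; this page does not document how Workbench actually performs its split.

```python
import random


def split_rows(rows: list, holdout_fraction: float = 0.2,
               train_all_data: bool = False, seed: int = 42):
    """Return (train, holdout). With train_all_data=True every row is used for training."""
    if train_all_data:
        return list(rows), []
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded shuffle so the split is reproducible
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


rows = list(range(10))
train, holdout = split_rows(rows)
print(len(train), len(holdout))  # 8 2
train, holdout = split_rows(rows, train_all_data=True)
print(len(train), len(holdout))  # 10 0
```

The trade-off is that, with no rows held back, nothing remains for out-of-sample evaluation of the trained model.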

Examples

Full code listings: examples/models/random_forest.py, examples/models/knn.py, examples/models/dbscan.py

Questions?

The SuperCowPowers team is happy to answer any questions you may have about AWS® and Workbench.

Support: workbench@supercowpowers.com
Discord: Join us on Discord
Website: supercowpowers.com