Features To Model
API Classes
For most users, the API Classes provide all the general functionality needed to create a full AWS ML Pipeline.
FeaturesToModel: Train/Create a Model from a Feature Set
FeaturesToModel
Bases: Transform
FeaturesToModel: Train/Create a Model from a FeatureSet
Common Usage
from sageworks.api import ModelType
from sageworks.core.transforms.features_to_model.features_to_model import FeaturesToModel
# feature_uuid and model_uuid are the names of your FeatureSet and Model
to_model = FeaturesToModel(feature_uuid, model_uuid, model_type=ModelType.REGRESSOR)  # or ModelType.CLASSIFIER, etc.
to_model.set_output_tags(["abalone", "public", "whatever"])
to_model.transform(target_column="class_number_of_rings",
                   feature_list=["my", "best", "features"])
Source code in src/sageworks/core/transforms/features_to_model/features_to_model.py
__init__(feature_uuid, model_uuid, model_type, model_class=None, model_import_str=None, custom_script=None)
FeaturesToModel Initialization

Args:
    feature_uuid (str): UUID of the FeatureSet to use as input
    model_uuid (str): UUID of the Model to create as output
    model_type (ModelType): ModelType.REGRESSOR or ModelType.CLASSIFIER, etc.
    model_class (str, optional): The class of the model (default None)
    model_import_str (str, optional): The import string for the model (default None)
    custom_script (str, optional): Custom script to use for the model (default None)
create_and_register_model()
Create and Register the Model
post_transform(**kwargs)
Post-Transform: Calling onboard() on the Model
transform_impl(target_column, description=None, feature_list=None, train_all_data=False)
Generic Features to Model: Note you should create a new class and inherit from this one to include specific logic for your Feature Set/Model

Args:
    target_column (str): Column name of the target variable
    description (str): Description of the model (optional)
    feature_list (list[str]): A list of columns for the features (default None, will try to guess)
    train_all_data (bool): Train on ALL (100%) of the data (default False)
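To make the train_all_data flag concrete, here is a minimal sketch of the idea in plain Python. It is not the SageWorks implementation: the split_rows helper, the holdout_fraction value, and the seed are all hypothetical, and this page does not say how rows are held out when train_all_data is False.

```python
import random


def split_rows(rows, train_all_data=False, holdout_fraction=0.2, seed=42):
    """Return (train_rows, holdout_rows).

    holdout_fraction and seed are illustrative assumptions,
    not SageWorks settings.
    """
    if train_all_data:
        # Train on 100% of the data: nothing is held out for evaluation
        return list(rows), []
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


rows = list(range(10))
train, holdout = split_rows(rows)
train_all, holdout_all = split_rows(rows, train_all_data=True)
```

Training on all of the data is a natural fit for unsupervised models such as clustering or projections, where there is no target column to evaluate against a hold-out set.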
Supported Models
Currently SageWorks supports XGBoost (classifier/regressor) and Scikit-Learn models. These models can be created just by passing different parameters to the FeaturesToModel class. The main limitation of the supported models is that they are vanilla versions with default parameters; any customization should be done with Custom Models.
XGBoost
from sageworks.api import ModelType
from sageworks.core.transforms.features_to_model.features_to_model import FeaturesToModel
# XGBoost Regression Model
input_uuid = "abalone_features"
output_uuid = "abalone-regression"
to_model = FeaturesToModel(input_uuid, output_uuid, model_type=ModelType.REGRESSOR)
to_model.set_output_tags(["abalone", "public"])
to_model.transform(target_column="class_number_of_rings", description="Abalone Regression")
# XGBoost Classification Model
input_uuid = "wine_features"
output_uuid = "wine-classification"
to_model = FeaturesToModel(input_uuid, output_uuid, ModelType.CLASSIFIER)
to_model.set_output_tags(["wine", "public"])
to_model.transform(target_column="wine_class", description="Wine Classification")
# Quantile Regression Model (Abalone)
input_uuid = "abalone_features"
output_uuid = "abalone-quantile-reg"
to_model = FeaturesToModel(input_uuid, output_uuid, ModelType.QUANTILE_REGRESSOR)
to_model.set_output_tags(["abalone", "quantiles"])
to_model.transform(target_column="class_number_of_rings", description="Abalone Quantile Regression")
Scikit-Learn
from sageworks.api import ModelType
from sageworks.core.transforms.features_to_model.features_to_model import FeaturesToModel
# Scikit-Learn Kmeans Clustering Model
input_uuid = "wine_features"
output_uuid = "wine-clusters"
to_model = FeaturesToModel(
input_uuid,
output_uuid,
model_class="KMeans", # Clustering algorithm
model_import_str="from sklearn.cluster import KMeans", # Import statement for KMeans
model_type=ModelType.CLUSTERER,
)
to_model.set_output_tags(["wine", "clustering"])
to_model.transform(target_column=None, description="Wine Clustering", train_all_data=True)
# Scikit-Learn HDBSCAN Clustering Model
input_uuid = "wine_features"
output_uuid = "wine-clusters-hdbscan"
to_model = FeaturesToModel(
input_uuid,
output_uuid,
model_class="HDBSCAN", # Density-based clustering algorithm
model_import_str="from sklearn.cluster import HDBSCAN",
model_type=ModelType.CLUSTERER,
)
to_model.set_output_tags(["wine", "density-based clustering"])
to_model.transform(target_column=None, description="Wine Clustering with HDBSCAN", train_all_data=True)
# Scikit-Learn 2D Projection Model using UMAP
input_uuid = "wine_features"
output_uuid = "wine-2d-projection"
to_model = FeaturesToModel(
input_uuid,
output_uuid,
model_class="UMAP",
model_import_str="from umap import UMAP",
model_type=ModelType.PROJECTION,
)
to_model.set_output_tags(["wine", "2d-projection"])
to_model.transform(target_column=None, description="Wine 2D Projection", train_all_data=True)
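The Scikit-Learn examples above pair a model_class name with a model_import_str. As a rough illustration of how such a pair can be turned into a usable class, here is a minimal sketch. It is not how SageWorks does it internally (that is not shown on this page); load_model_class is a hypothetical helper, and a standard-library class stands in for KMeans so the snippet runs without scikit-learn installed.

```python
def load_model_class(model_import_str: str, model_class: str):
    """Run an import statement and return the class it brings into scope."""
    namespace = {}
    # exec runs arbitrary code, so only pass import strings you trust
    exec(model_import_str, namespace)
    if model_class not in namespace:
        raise ValueError(f"{model_import_str!r} does not define {model_class!r}")
    return namespace[model_class]


# Stand-in for ("from sklearn.cluster import KMeans", "KMeans")
cls = load_model_class("from collections import OrderedDict", "OrderedDict")
instance = cls()
```

The check inside the helper shows why the two parameters need to agree: the name in model_class has to be one that the import statement actually defines.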
Custom Models
For custom models we recommend the following steps:
Experimental
SageWorks Custom Models are currently experimental, so have fun but expect issues. Requires sageworks >= 0.8.60. Feel free to submit issues to the SageWorks GitHub.
- Copy the example custom model script into your own directory
    - See: Custom Model Script
- Make a requirements.txt and put it into the same directory
- Train/deploy the existing example
    - This is an important step, don't skip it
    - If the existing model script trains/deploys, you're in great shape for the next step; if it doesn't, now is a good time to debug AWS account/permissions/etc.
- Now customize the model script
- Train/deploy your custom script
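The directory layout from the steps above can be checked before you start a training run. This is a minimal sketch using only the standard library; check_custom_script_dir is a hypothetical helper, not part of SageWorks, and it only confirms that the script exists and that a requirements.txt sits beside it.

```python
import tempfile
from pathlib import Path


def check_custom_script_dir(script_path):
    """Return a list of problems found; an empty list means the layout looks right."""
    script = Path(script_path)
    problems = []
    if not script.is_file():
        problems.append(f"model script not found: {script}")
    if not (script.parent / "requirements.txt").is_file():
        problems.append(f"requirements.txt not found in {script.parent}")
    return problems


# Demo in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "my_custom_script.py"
    script.write_text("# model script placeholder\n")
    before = check_custom_script_dir(script)  # requirements.txt is still missing
    (Path(tmp) / "requirements.txt").write_text("scikit-learn\n")
    after = check_custom_script_dir(script)  # both files are now in place
```

Running a check like this first keeps simple file-layout mistakes separate from the AWS account and permission issues mentioned in the steps.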
Training/Deploying Custom Models
from sageworks.api import ModelType
from sageworks.core.transforms.features_to_model.features_to_model import FeaturesToModel
# Note this directory should also have a requirements.txt in it
my_custom_script = "/full/path/to/my/directory/my_custom_script.py"
input_uuid = "wine_features" # FeatureSet you want to use
output_uuid = "my-custom-model" # change to whatever
target_column = "wine_class" # change to whatever
to_model = FeaturesToModel(input_uuid, output_uuid,
                           model_type=ModelType.CLASSIFIER,
                           custom_script=my_custom_script)
to_model.set_output_tags(["your", "tags"])
to_model.transform(target_column=target_column, description="Custom Model")
Custom Models: Create an Endpoint/Run Inference
from sageworks.api import FeatureSet, Model, Endpoint
model = Model("my-custom-model")
end = model.to_endpoint() # Note: This takes a while
# Now run inference on my custom model :)
end.auto_inference(capture=True)
# Run inference with my own dataframe
fs = FeatureSet("wine_features") # The FeatureSet used for training above
df = fs.pull_dataframe() # Or whatever dataframe
end.inference(df)
Questions?
The SuperCowPowers team is happy to answer any questions you may have about AWS and SageWorks. Please contact us at sageworks@supercowpowers.com or chat us up on Discord.