
Release 0.8.50

Need Help?

The SuperCowPowers team is happy to give any assistance needed when setting up AWS and SageWorks. So please contact us at sageworks@supercowpowers.com or chat us up on Discord.

The SageWorks framework continues to flex to support different real-world use cases when operating a set of production machine learning pipelines.

Note: These release notes cover the changes from 0.8.46 to 0.8.50

General

This is an incremental release on the road map to v0.9.0. Please see the full details of the planned changes here: v0.9.0 Roadmap.

FeatureSet: id_column lock in

We're locking in the id_column when FeatureSets are created. AWS FeatureGroups require an id column, so this is the best place to do it; see API Changes below.

FeatureSet: Robust handling of training column

In the past we haven't supported a training column in the input data. FeatureSets are read-only, so locking in the training rows is 'suboptimal'; in general you might want to use the FeatureSet for several models with different training/holdout sets. Now, if a FeatureSet detects a training column, it will give the following message:

```
Training column detected: Since FeatureSets are read only, SageWorks
creates training views that can be dynamically changed. We'll use
this training column to create a training view.
```
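
Roughly what this looks like in practice: the DataFrame contents and the 'training' column name below are illustrative, and the PandasToFeatures import is omitted here just as in the API Changes examples below.

```
import pandas as pd

# Hypothetical input data that already carries a training flag (1 = train, 0 = holdout)
df_features = pd.DataFrame({
    "my_id": [1, 2, 3, 4],
    "feature_1": [0.1, 0.2, 0.3, 0.4],
    "training": [1, 1, 0, 1],
})

# Same pattern as the API Changes examples below; on transform() SageWorks should
# detect the training column, print the message above, and use it to build a training view
to_features = PandasToFeatures("my_feature_set")
to_features.set_input(df_features, id_column="my_id")
to_features.set_output_tags(["example"])
to_features.transform()
```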

Endpoint: auto_inference()

We're changing the internal logic for the auto_inference() method to include the id_column in its output.
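
From the caller's side this is a small change; here's a hedged sketch (the endpoint name is hypothetical, and "my_id" just echoes the id_column used in the FeatureSet examples below):

```
from sageworks.api import Endpoint

end = Endpoint("my-endpoint")        # hypothetical endpoint name

# auto_inference() runs predictions against the endpoint's evaluation data;
# with this release the returned DataFrame also includes the id_column
df_results = end.auto_inference()
print(df_results.columns)            # expect "my_id" alongside the prediction columns
```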

API Changes

FeatureSet

When creating a FeatureSet, the id_column is now a required argument.

```
ds = DataSource("test_data")
fs = ds.to_features("test_features", id_column="my_id")   # <-- Required
```

```
to_features = PandasToFeatures("my_feature_set")
to_features.set_input(df_features, id_column="my_id")     # <-- Required
to_features.set_output_tags(["blah", "whatever"])
to_features.transform()
```

If your data doesn't have an id column, you can specify "auto":

```
to_features = PandasToFeatures("my_feature_set")
to_features.set_input(df_features, id_column="auto")      # <-- Auto Id (index)
```

For more details see: FeatureSet Class

The new Meta() API will be used inside of the Artifact classes (see Internal Changes...Artifacts... below)
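
For reference, typical Meta() usage looks something like the sketch below (the method names are assumptions based on the Meta docs, not an exhaustive list):

```
from sageworks.api import Meta

meta = Meta()
fs_summary = meta.feature_sets()     # DataFrame summary of FeatureSets (assumed method)
model_summary = meta.models()        # DataFrame summary of Models (assumed method)
```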

Improvements

DFStore

Robust handling of slashes, so now it will 'just work' with various upserts and gets:

```
# These all give you the /ml/shap_values DataFrame
df_store.get("/ml/shap_values")
df_store.get("ml/shap_values")
df_store.get("//ml/shap_values")
```
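
The same normalization applies on the write side; a minimal sketch, assuming upsert() takes a location and a DataFrame (the path and data here are illustrative):

```
import pandas as pd

shap_df = pd.DataFrame({"feature": ["age", "height"], "shap_value": [0.12, 0.03]})

# With or without leading slashes, these land in the same place
df_store.upsert("ml/shap_values", shap_df)
df_store.upsert("/ml/shap_values", shap_df)
```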

Internal Changes

There's a whole new directory structure that helps isolate cloud-platform-specific functionality.

- sageworks/src
    - core/cloud_platform
        - aws
        - azure
        - gcp

- The DFStore now uses AWSDFStore as its concrete implementation class.
- Both CachedMeta and AWSAccountClamp have had a revamp of their singleton logic.

Internal Caching

As part of our v0.9.0 Roadmap, we're continuing to revamp caching and are experimenting with the CachedMeta class inside the Artifact classes. Caching continues to be challenging for the framework: it's an absolute must for Web Interface/UI performance, but it needs to get out of the way for batch jobs and the concurrent building of ML pipelines.
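
Conceptually, the pattern looks something like the sketch below: a shared singleton fronting the Meta() API with cached results, so UI calls stay fast while batch jobs can simply construct their own Meta(). This is an illustrative pattern only, not the actual CachedMeta implementation:

```
from functools import lru_cache
from sageworks.api import Meta

class CachedMetaSketch:
    """Illustrative singleton that caches Meta() lookups (not the real CachedMeta)."""
    _instance = None

    def __new__(cls):
        # Singleton: UI pages share one instance (and therefore one cache)
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.meta = Meta()
        return cls._instance

    @lru_cache(maxsize=None)
    def models(self):
        # Cached for fast Web Interface/UI calls; batch jobs can use Meta() directly
        return self.meta.models()
```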

Specific Code Changes

Code Diff v0.8.46 --> v0.8.50

Who doesn't like looking at code! Also +3 points for getting down this far! Here's a cow joke as a reward:

That feeling like you’ve done this before? .... Deja-moo

Questions?

The SuperCowPowers team is happy to answer any questions you may have about AWS and SageWorks. Please contact us at sageworks@supercowpowers.com or chat us up on Discord.