Skip to content

Release 0.8.72

Need Help?

The SuperCowPowers team is happy to give any assistance needed when setting up AWS and SageWorks. So please contact us at sageworks@supercowpowers.com or on chat us up on Discord

The SageWorks framework continues to flex to support different real world use cases when operating a set of production machine learning pipelines.

Note: These release notes cover the changes from 0.8.71 to 0.8.72

General

This release is an incremental release as part of the road map for v.0.9.0. Please see the full details of the planned changes here: v0.9.0 Roadmap.

Artifact Hashes

For content verification purposes we've added a hash() method to all of the SageWorks Artifact classes (DataSource, FeatureSet, Model, Endpoint, Graph, etc). Also for DataSources and FeatureSets there is a table_hash() method that will compute a total hash of all data in the Athena table.

ds = DataSource("abalone_data")

ds.modified()
Out[2]: datetime.datetime(2024, 11, 17, 19, 45, 58, tzinfo=tzlocal())

ds.hash()
Out[3]: '67a9ebb495af573604794aa9c31eded8'

ds.table_hash()
Out[4]: '622f5ddba9d4cad2cf642d1ea5555de9'

fs = FeatureSet("test_features")

fs.hash()
Out[5]: '1571eee207b72f14bd5065d6c4acdaaf'

# Note: Model/Endpoint hashes will backtrack to model.tar.gz and can be used for validation
model = Model("abalone-regression")
end = Endpoint("abalone-regression-end")

model.get_model_data_url()
Out[6]: 's3://sagemaker-us-west-2-507740646243/abalone-regression-2024-11-18-03-09/output/model.tar.gz'

model.hash()
Out[7]: '00def9381366cdd062413d0b395ba70c'

# Verify endpoint is using expected model
end.hash()
Out[7]: '00def9381366cdd062413d0b395ba70c'

# Realtime endpoint created from the same model
end = Endpoint("abalone-regression-end-rt")
end.hash()
Out[8]: '00def9381366cdd062413d0b395ba70c'

Note: You will get a performance warning when running table_hash() on DataSources and FeatureSets as it typically involves a deeper computation on the table contents of that artifact.

Custom Model Scripts

We're expanding our custom model script to include some additional chemical informatics capabilities. In addition to our molecular descriptors and Morgan fingerprints we also have RDKIT canonicalization and tautomerization.

API Changes

  • get_database() has a deprecation warning, it's replaced with just the database property.

    ds.get_database()
    <replaced by>
    ds.database
    
  • Added the hash() method to Artifacts (see above).

  • Added table_hash() method to DataSources and FeatuerSet (see above).

Internal Changes

There was a small refactor of the cache decorator. We fixed a case where if we blocked on getting a value we also spun up a background thread to get it. This chance will no affect existing code or APIs.

Specific Code Changes

Code Diff v0.8.71 --> v0.8.72

Who doesn't like looking at code! Also +3 points for getting down this far! Here's a cow joke as a reward:

What do you call that feeling like you’ve done this before? Deja-moo

Questions?

The SuperCowPowers team is happy to answer any questions you may have about AWS and SageWorks. Please contact us at sageworks@supercowpowers.com or on chat us up on Discord