
Release 0.8.74

Need Help?

The SuperCowPowers team is happy to give any assistance needed when setting up AWS and Workbench. Please contact us at workbench@supercowpowers.com or chat us up on Discord.

The Workbench framework continues to flex to support different real-world use cases when operating a set of production machine learning pipelines.

Note: These release notes cover the changes from 0.8.71 to 0.8.74

General

This is an incremental release on the roadmap to v0.9.0. Please see the full details of the planned changes here: v0.9.0 Roadmap.

S3 based Plugins

We've added the ability to grab Dashboard plugins directly from an S3 bucket. Please see Dashboard S3 Plugin.
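As a rough sketch of how this might be wired up (the config file location, key name, and bucket below are hypothetical; see the Dashboard S3 Plugin page for the actual setup), the Dashboard can be pointed at an S3 plugin location via the Workbench config:

# Hypothetical sketch: point the Dashboard at an S3 plugin bucket via the
# Workbench config file (the path, key name, and bucket are illustrative)
import json
from pathlib import Path

config_path = Path.home() / ".workbench" / "config.json"   # assumed location
config = json.loads(config_path.read_text()) if config_path.exists() else {}
config["WORKBENCH_PLUGINS"] = "s3://my-company-bucket/workbench-plugins"  # hypothetical key and bucket
config_path.parent.mkdir(parents=True, exist_ok=True)
config_path.write_text(json.dumps(config, indent=2))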

Artifact Hashes

For content verification purposes we've added a hash() method to all of the Workbench Artifact classes (DataSource, FeatureSet, Model, Endpoint, Graph, etc.). In addition, DataSources and FeatureSets have a table_hash() method that computes a total hash of all the data in the Athena table.

ds = DataSource("abalone_data")

ds.modified()
Out[2]: datetime.datetime(2024, 11, 17, 19, 45, 58, tzinfo=tzlocal())

ds.hash()
Out[3]: '67a9ebb495af573604794aa9c31eded8'

ds.table_hash()
Out[4]: '622f5ddba9d4cad2cf642d1ea5555de9'

fs = FeatureSet("test_features")

fs.hash()
Out[5]: '1571eee207b72f14bd5065d6c4acdaaf'

# Note: Model/Endpoint hashes will backtrack to model.tar.gz and can be used for validation
model = Model("abalone-regression")
end = Endpoint("abalone-regression-end")

model.get_model_data_url()
Out[6]: 's3://sagemaker-us-west-2-507740646243/abalone-regression-2024-11-18-03-09/output/model.tar.gz'

model.hash()
Out[7]: '00def9381366cdd062413d0b395ba70c'

# Verify endpoint is using expected model
end.hash()
Out[8]: '00def9381366cdd062413d0b395ba70c'

# Realtime endpoint created from the same model
end = Endpoint("abalone-regression-end-rt")
end.hash()
Out[9]: '00def9381366cdd062413d0b395ba70c'

Note: You will get a performance warning when running hash() or table_hash(), as these operations compute actual content hashes on the underlying data.
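Building on the example above, here's a minimal sketch of using these hashes as a validation check (assuming the standard workbench.api import path):

# Minimal sketch: confirm an Endpoint is serving the expected Model by
# comparing content hashes (both backtrack to the same model.tar.gz)
from workbench.api import Model, Endpoint

model = Model("abalone-regression")
end = Endpoint("abalone-regression-end")

if model.hash() == end.hash():
    print("Endpoint is serving the expected model artifact")
else:
    print("Hash mismatch: the endpoint may be serving a different model")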

Custom Model Scripts

We're expanding our custom model scripts to include additional cheminformatics capabilities. In addition to our molecular descriptors and Morgan fingerprints, we now have RDKit canonicalization and tautomerization.
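For reference, here's what those two operations look like in plain RDKit (an illustrative sketch, not the Workbench model script itself):

# Illustrative RDKit sketch: canonical SMILES and tautomer canonicalization
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

mol = Chem.MolFromSmiles("C1=CC=CC=C1O")  # phenol, written non-canonically

# Canonical SMILES
print(Chem.MolToSmiles(mol))  # 'Oc1ccccc1'

# Canonical tautomer
enumerator = rdMolStandardize.TautomerEnumerator()
canonical_tautomer = enumerator.Canonicalize(mol)
print(Chem.MolToSmiles(canonical_tautomer))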

API Changes

FeatureSet

When creating a FeatureSet, the id_column is now NOT required. The old "auto" functionality is gone; if you don't specify an id_column, an id column named auto_id will be automatically generated.

ds = DataSource("test_data")
fs = ds.to_features("test_features", id_column="auto")
<replaced by>
fs = ds.to_features("test_features")

# We do recommend that you specify an id :)
fs = ds.to_features("test_features", id_column="my_id")
  • get_database() now issues a deprecation warning; it's replaced by the database property.

    ds.get_database()
    <replaced by>
    ds.database
    
  • Added the hash() method to Artifacts (see above).

  • Added the table_hash() method to DataSources and FeatureSets (see above).

Internal Changes

There was a small refactor of the cache decorator. We fixed a case where, if we blocked on getting a value, we also spun up a background thread to fetch that same value. This change will not affect existing code or APIs.
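For illustration only (this is not the actual Workbench code), the fixed behavior looks roughly like this: a cache miss blocks and computes the value once, and a background refresh thread is only used for stale hits:

# Illustrative sketch of the fixed cache-decorator behavior: a miss computes
# the value in the foreground only; a background thread refreshes stale hits.
import threading
import time
from functools import wraps

def cached(ttl=60):
    store = {}  # key -> (value, timestamp)

    def decorator(func):
        @wraps(func)
        def wrapper(*args):
            key = args
            now = time.time()
            if key in store:
                value, ts = store[key]
                if now - ts > ttl:
                    # Stale hit: refresh in the background, return the old value now
                    def refresh():
                        store[key] = (func(*args), time.time())
                    threading.Thread(target=refresh, daemon=True).start()
                return value
            # Cache miss: block and compute once (no extra background thread)
            value = func(*args)
            store[key] = (value, now)
            return value
        return wrapper
    return decorator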

Specific Code Changes

Code Diff v0.8.71 --> v0.8.74

Who doesn't like looking at code? Also, +3 points for getting down this far! Here's a cow joke as a reward:

What do you call that feeling like you’ve done this before? Deja-moo

Questions?

The SuperCowPowers team is happy to answer any questions you may have about AWS and Workbench. Please contact us at workbench@supercowpowers.com or chat us up on Discord.