Release 0.8.74
Need Help?
The SuperCowPowers team is happy to give any assistance needed when setting up AWS and Workbench. So please contact us at workbench@supercowpowers.com or on chat us up on Discord
The Workbench framework continues to flex to support different real world use cases when operating a set of production machine learning pipelines.
Note: These release notes cover the changes from 0.8.71
to 0.8.74
General
This release is an incremental release as part of the road map for v.0.9.0
. Please see the full details of the planned changes here: v0.9.0 Roadmap.
S3 based Plugins
We've added the ability to grab Dashboard plugins directly from an S3 bucket. Please see Dashboard S3 Plugin
Artifact Hashes
For content verification purposes we've added a hash()
method to all of the Workbench Artifact classes (DataSource, FeatureSet, Model, Endpoint, Graph, etc). Also for DataSources and FeatureSets there is a table_hash()
method that will compute a total hash of all data in the Athena table.
ds = DataSource("abalone_data")
ds.modified()
Out[2]: datetime.datetime(2024, 11, 17, 19, 45, 58, tzinfo=tzlocal())
ds.hash()
Out[3]: '67a9ebb495af573604794aa9c31eded8'
ds.table_hash()
Out[4]: '622f5ddba9d4cad2cf642d1ea5555de9'
fs = FeatureSet("test_features")
fs.hash()
Out[5]: '1571eee207b72f14bd5065d6c4acdaaf'
# Note: Model/Endpoint hashes will backtrack to model.tar.gz and can be used for validation
model = Model("abalone-regression")
end = Endpoint("abalone-regression-end")
model.get_model_data_url()
Out[6]: 's3://sagemaker-us-west-2-507740646243/abalone-regression-2024-11-18-03-09/output/model.tar.gz'
model.hash()
Out[7]: '00def9381366cdd062413d0b395ba70c'
# Verify endpoint is using expected model
end.hash()
Out[7]: '00def9381366cdd062413d0b395ba70c'
# Realtime endpoint created from the same model
end = Endpoint("abalone-regression-end-rt")
end.hash()
Out[8]: '00def9381366cdd062413d0b395ba70c'
Note: You will get a performance warning when running hash() or table_hash() as these operations compute actual content hashes on the underlying data.
Custom Model Scripts
We're expanding our custom model script to include some additional chemical informatics capabilities. In addition to our molecular descriptors and Morgan fingerprints we also have RDKIT canonicalization and tautomerization.
API Changes
FeatureSet
When creating a FeatureSet the id_column
is now NOT required. The old "auto" functionality is gone, if you don't specify an id_column
and id column named auto_id
will be automatically generated.
ds = DataSource("test_data")
fs = ds.to_features("test_features", id_column="auto")
<replaced by>
fs = ds.to_features("test_features")
# We do recommend that you specify an id :)
fs = ds.to_features("test_features", id_column="my_id")
-
get_database()
has a deprecation warning, it's replaced with just thedatabase
property. -
Added the
hash()
method to Artifacts (see above). - Added
table_hash()
method to DataSources and FeatuerSet (see above).
Internal Changes
There was a small refactor of the cache decorator. We fixed a case where if we blocked on getting a value we also spun up a background thread to get it. This chance will not affect existing code or APIs.
Specific Code Changes
Who doesn't like looking at code! Also +3 points for getting down this far! Here's a cow joke as a reward:
What do you call that feeling like you’ve done this before? Deja-moo
Questions?
The SuperCowPowers team is happy to answer any questions you may have about AWS and Workbench. Please contact us at workbench@supercowpowers.com or on chat us up on Discord