Release 0.8.23
Need Help?
The SuperCowPowers team is happy to give any assistance needed when setting up AWS and SageWorks. Please contact us at sageworks@supercowpowers.com or chat us up on Discord.
The SageWorks framework continues to flex to support different real-world use cases when operating a set of production machine learning pipelines.
Note: These release notes cover the changes from 0.8.22 to 0.8.23.
General
Mostly bug fixes and minor API changes.
API Changes
- Removing auto_one_hot arg from PandasToFeatures and DataSource.to_features()

  When creating a PandasToFeatures object or using DataSource.to_features() there was an optional argument auto_one_hot. This would try to automatically convert object/string columns to be one-hot encoded. In general this was only useful for 'toy' datasets; for more complex data we need to specify exactly which columns we want converted.

- Adding optional one_hot_columns arg to PandasToFeatures.set_input() and DataSource.to_features()

  When calling either of these FeatureSet creation methods you can now add an optional arg one_hot_columns as a list of columns that you would like to be one-hot encoded.
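To illustrate why naming the columns explicitly matters, here is a minimal sketch in plain pandas. It does not call SageWorks and is not how SageWorks implements the encoding internally; the DataFrame and its column names are made up for the example. It only shows the effect of one-hot encoding a chosen list of columns while leaving other string columns alone.

```python
import pandas as pd

# Hypothetical example data: "color" is categorical, "notes" is free text
df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "color": ["red", "blue", "red"],
        "notes": ["free text a", "free text b", "free text c"],
    }
)

# Explicit list of columns to encode (mirrors the idea of one_hot_columns)
one_hot_columns = ["color"]

# Only the listed columns are expanded; "notes" stays a plain string column
encoded = pd.get_dummies(df, columns=one_hot_columns)
print(list(encoded.columns))
```

An automatic approach would also have expanded the notes column into one indicator column per distinct string, which is rarely what you want on real data.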
Minor Bug Fixes
Our pandas dependency was outdated and causing an issue with an include_groups arg when outlier groups were computed. We've changed the requirements to address this.
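For background on the include_groups arg, the sketch below is a generic pandas illustration, not the SageWorks outlier code. As far as we know, include_groups was added to the groupby apply method in pandas 2.2, so older installs reject it. The example data is made up, and the signature check is only there so the snippet runs on either side of that change.

```python
import inspect

import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"group": ["a", "a", "b"], "value": [1.0, 2.0, 10.0]})
grouped = df.groupby("group")

# Pass include_groups only if the installed pandas accepts it
apply_params = inspect.signature(grouped.apply).parameters
kwargs = {"include_groups": False} if "include_groups" in apply_params else {}

# Per-group mean of "value"; the grouping column is not needed inside the function
group_means = grouped.apply(lambda g: g["value"].mean(), **kwargs)
print(group_means)
```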
Improvements
Ingesting new rows into a FeatureSet can take a LONG time. Calling the FeatureGroup AWS API and waiting on the results is what takes all the time.
There will hopefully be a series of optimizations around this process; the first one is simply increasing the number of workers/processes for the ingestion manager class.
feature_group.ingest(..., max_processes=8)
(has been changed to)
feature_group.ingest(..., max_processes=16, num_workers=4)
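The reason more workers help is that each ingestion call spends most of its time waiting on the network. The following is a rough sketch of that effect using only the Python standard library. It does not call AWS: put_record is a hypothetical stand-in that just sleeps to simulate API latency, and the row data and worker counts are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def put_record(row):
    """Hypothetical stand-in for one network-bound ingestion call."""
    time.sleep(0.01)  # simulated API latency
    return row["id"]


def ingest(rows, num_workers):
    """Send rows through a pool of worker threads, preserving row order."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(put_record, rows))


rows = [{"id": i} for i in range(40)]

# Time the same workload with 1 worker and with 4 workers
timings = {}
for workers in (1, 4):
    start = time.perf_counter()
    ingested = ingest(rows, workers)
    timings[workers] = time.perf_counter() - start

print(timings)
```

Because the simulated work is almost entirely waiting, the run with four workers should finish in roughly a quarter of the time. Real ingestion is also subject to API throttling, so the gains there will not scale indefinitely.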
Questions?
The SuperCowPowers team is happy to answer any questions you may have about AWS and SageWorks. Please contact us at sageworks@supercowpowers.com or chat us up on Discord.