Views
View Examples
Examples of using the Views classes to extend the functionality of SageWorks Artifacts are in the Examples section at the bottom of this page.
Views are a powerful way to filter and agument your DataSources and FeatureSets. With Views you can subset columns, rows, and even add data to existing SageWorks Artifacts. If you want to compute outliers, runs some statistics or engineer some new features, Views are an easy way to change, modify, and add to DataSources and FeatureSets.
If you're looking to read and pull data from a view please see the Views documentation.
View Constructor Classes
These classes provide APIs for creating Views for DataSources and FeatureSets.
- DisplayView: The Display View is leveraged by the web views/components and allows fine tuning of the UI for the SageWorks Dashboard.
- ComputationView: The Computation View controls which columns have descriptive stats, outliers, and correlation calculations. Typically the computation view is a superset of the display view.
- TrainingView: The Training View will add a 'training' column to the data for model training, validation, and testing. Each row will have a 1 or 0 indicated whether is was used in the model training.
- MDQView: The Model Data Quality View computes various data quality metrics that include modeling inference results.
Examples
All of the SageWorks Examples are in the Sageworks Repository under the examples/
directory. For a full code listing of any example please visit our SageWorks Examples
Listing Views
from sageworks.api.data_source import DataSource
# Convert the Data Source to a Feature Set
test_data = DataSource('test_data')
test_data.views()
["display", "training", "computation"]
Getting a Particular View
from sageworks.api.feature_set import FeatureSet
fs = FeatureSet('test_features')
# Grab the columns for the display view
display_view = fs.view("display")
display_view.columns
['id', 'name', 'height', 'weight', 'salary', ...]
# Pull the dataframe for this view
df = display_view.pull_dataframe()
id name height weight salary ...
0 58 Person 58 71.781227 275.088196 162053.140625
View Queries
All SageWorks Views are stored in AWS Athena, so any query that you can make with Athena is accessible through the View Query API.
from sageworks.api.feature_set import FeatureSet
# Grab a FeatureSet View
fs = FeatureSet("abalone_features")
t_view = fs.view("training")
# Make some queries using the Athena backend
df = t_view(f"select * from {t_view.table} where height > .3")
print(df.head())
df = t_view.query("select * from abalone_features where class_number_of_rings < 3")
print(df.head())
Output
sex length diameter height whole_weight shucked_weight viscera_weight shell_weight class_number_of_rings
0 M 0.705 0.565 0.515 2.210 1.1075 0.4865 0.5120 10
1 F 0.455 0.355 1.130 0.594 0.3320 0.1160 0.1335 8
sex length diameter height whole_weight shucked_weight viscera_weight shell_weight class_number_of_rings
0 I 0.075 0.055 0.010 0.002 0.0010 0.0005 0.0015 1
1 I 0.150 0.100 0.025 0.015 0.0045 0.0040 0.0050 2
The SuperCowPowers team is happy to answer any questions you may have about AWS and SageWorks. Please contact us at sageworks@supercowpowers.com or on chat us up on Discord