# DataLoaders Light

## API Classes

For most users, the API Classes provide all the general functionality needed to create a full AWS ML Pipeline.

These DataLoader classes are intended to load smaller datasets into AWS. If you have large data, please see DataLoaders Heavy.

Welcome to the SageWorks DataLoaders Light classes. These classes provide low-level APIs for loading smaller data into AWS services:

- `CSVToDataSource`: Loads local CSV data into a DataSource
- `JSONToDataSource`: Loads local JSON data into a DataSource
- `S3ToDataSourceLight`: Loads S3 data into a DataSource
## CSVToDataSource

Bases: `Transform`

CSVToDataSource: Class to move local CSV files into a SageWorks DataSource.
### Common Usage
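A minimal usage sketch. The file path, DataSource UUID, and tags below are placeholders, and the `set_output_tags()` and `transform()` calls are assumed from the `Transform` base class:

```python
from sageworks.core.transforms.data_loaders.light.csv_to_data_source import CSVToDataSource

# Move a local CSV file into a SageWorks DataSource named "test_data"
# (the file path and tags are placeholders for illustration)
csv_to_data = CSVToDataSource("/full/path/to/test_data.csv", "test_data")
csv_to_data.set_output_tags(["test", "small"])
csv_to_data.transform()
```

Running this requires SageWorks to be configured with AWS credentials; the data lands as Parquet in the SageWorks Data Sources bucket.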
Source code in `src/sageworks/core/transforms/data_loaders/light/csv_to_data_source.py`
### __init__(csv_file_path, data_uuid)

CSVToDataSource: Class to move local CSV files into a SageWorks DataSource.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `csv_file_path` | `str` | The path to the CSV file to be transformed | required |
| `data_uuid` | `str` | The UUID of the SageWorks DataSource to be created | required |

Source code in `src/sageworks/core/transforms/data_loaders/light/csv_to_data_source.py`
### post_transform(**kwargs)

### transform_impl(overwrite=True)

Convert the local CSV file into Parquet format in the SageWorks Data Sources bucket, and store information about the data in the AWS Data Catalog `sageworks` database.

Source code in `src/sageworks/core/transforms/data_loaders/light/csv_to_data_source.py`
## JSONToDataSource

Bases: `Transform`

JSONToDataSource: Class to move local JSON files into a SageWorks DataSource.
### Common Usage
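A minimal usage sketch, mirroring the CSV loader. The file path, DataSource UUID, and tags are placeholders, and `set_output_tags()` and `transform()` are assumed from the `Transform` base class:

```python
from sageworks.core.transforms.data_loaders.light.json_to_data_source import JSONToDataSource

# Move a local JSON file into a SageWorks DataSource named "test_data"
# (the file path and tags are placeholders for illustration)
json_to_data = JSONToDataSource("/full/path/to/test_data.json", "test_data")
json_to_data.set_output_tags(["test", "small"])
json_to_data.transform()
```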
Source code in `src/sageworks/core/transforms/data_loaders/light/json_to_data_source.py`
### __init__(json_file_path, data_uuid)

JSONToDataSource: Class to move local JSON files into a SageWorks DataSource.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `json_file_path` | `str` | The path to the JSON file to be transformed | required |
| `data_uuid` | `str` | The UUID of the SageWorks DataSource to be created | required |

Source code in `src/sageworks/core/transforms/data_loaders/light/json_to_data_source.py`
### post_transform(**kwargs)

### transform_impl(overwrite=True)

Convert the local JSON file into Parquet format in the SageWorks Data Sources bucket, and store information about the data in the AWS Data Catalog `sageworks` database.

Source code in `src/sageworks/core/transforms/data_loaders/light/json_to_data_source.py`
## S3ToDataSourceLight

Bases: `Transform`

S3ToDataSourceLight: Class to move LIGHT S3 files into a SageWorks DataSource.
### Common Usage
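A minimal usage sketch. The S3 path, DataSource UUID, and tags are placeholders, and `set_output_tags()` and `transform()` are assumed from the `Transform` base class; `datatype` defaults to `"csv"` per the constructor signature below:

```python
from sageworks.core.transforms.data_loaders.light.s3_to_data_source_light import S3ToDataSourceLight

# Move a (light) CSV file from S3 into a SageWorks DataSource named "test_data"
# (the bucket path and tags are placeholders for illustration)
s3_to_data = S3ToDataSourceLight("s3://my-bucket/incoming-data/test_data.csv",
                                 "test_data", datatype="csv")
s3_to_data.set_output_tags(["test", "small"])
s3_to_data.transform()
```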
Source code in `src/sageworks/core/transforms/data_loaders/light/s3_to_data_source_light.py`
### __init__(s3_path, data_uuid, datatype='csv')

S3ToDataSourceLight Initialization

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `s3_path` | `str` | The S3 path to the file to be transformed | required |
| `data_uuid` | `str` | The UUID of the SageWorks DataSource to be created | required |
| `datatype` | `str` | The datatype of the file to be transformed (defaults to "csv") | `'csv'` |

Source code in `src/sageworks/core/transforms/data_loaders/light/s3_to_data_source_light.py`
### input_size_mb()

Get the size of the input S3 object in MBytes.

Source code in `src/sageworks/core/transforms/data_loaders/light/s3_to_data_source_light.py`
### post_transform(**kwargs)

### transform_impl(overwrite=True)

Convert the S3 CSV data into Parquet format in the SageWorks Data Sources bucket, and store information about the data in the AWS Data Catalog `sageworks` database.