Qwak's data sources configure connections to your data. Data sources are used to create feature sets.
There are two main types of data sources:
- Batch: Data-at-rest sources, such as Athena, Snowflake, and Redshift.
- Streaming: Data-in-motion sources, such as Kafka and Kinesis.
To connect to a data source:
- Enable network connectivity between the data sources and Qwak's cluster if they are not publicly accessible.
- Grant Qwak access to your data lake components by creating read-only service accounts and/or IAM roles.
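Before working on network connectivity, it can help to confirm that the data source endpoint accepts TCP connections at all. The following is a minimal sketch using only the Python standard library. The function name `is_reachable` and the host and port in the usage comment are hypothetical, and the check only reflects reachability from the machine where it runs, not from Qwak's cluster.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Hypothetical usage (replace with your own endpoint):
# is_reachable("my-warehouse.example.com", 5439)
```

A `False` result from your own machine does not necessarily mean Qwak cannot reach the source, and vice versa; the Test Connection step in the Dashboard remains the authoritative check.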
Data Sources can be defined and registered programmatically via the Qwak SDK and CLI, or created entirely via the Qwak Dashboard.
Qwak provides Python classes to define any Data Source type using the
For example, you can define a CsvSource to read from an S3 based CSV file as follows:
from qwak.feature_store.data_sources import CsvSource
# The S3 anonymous config class is required for public S3 buckets
from qwak.feature_store.data_sources import AnonymousS3Configuration

# Create a CsvSource object to represent a CSV data source
# This example uses a CSV file from a public S3 bucket
csv_source = CsvSource(
    name='credit_risk_data',  # Name of the data source
    description='A dataset of personal credit details',  # Description of the data source
    date_created_column='date_created',  # Column name that represents the creation date
    path='s3://qwak-public/example_data/data_credit_risk.csv',  # S3 path to the CSV file
    filesystem_configuration=AnonymousS3Configuration(),  # Configuration for anonymous access to S3
    quote_character='"',  # Character used for quoting in the CSV file
    escape_character='"'  # Character used for escaping in the CSV file
)
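The example sets both the quote character and the escape character to `"`. Under the common CSV convention, this means a literal quote inside a quoted field is written as two quotes. The sketch below illustrates that convention with Python's standard `csv` module. It assumes Qwak's CSV reader follows the same convention, which the source does not confirm, and the sample data is made up.

```python
import csv
import io

# Made-up sample: the second field contains a comma and embedded quotes,
# so it is wrapped in quotes and each inner quote is doubled.
sample = 'name,comment\nalice,"said ""hello"", then left"\n'

# quotechar='"' with doublequote=True treats "" inside a quoted field
# as one literal quote character.
rows = list(csv.reader(io.StringIO(sample), quotechar='"', doublequote=True))

for row in rows:
    print(row)
```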
Data Sources defined with the Qwak SDK are not registered in the cloud platform until the qwak features register command is run for that object.
- Select Data Sources from the sidebar.
- Click Create New Data Source.
- Select the required data source type from the list.
- Fill in the form (all required fields are marked with an asterisk).
- Test the connection to the data source to verify that it works.
- Click Save.
- The data source is created.
Below is an example of creating a Batch / CSV file based Data Source in the Qwak Dashboard.
To register a Data Source class defined with the SDK, use the Qwak CLI features command as follows:
qwak features register -p data_source.py
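If registration is part of an automated script, the same CLI command can be invoked from Python. This is a minimal sketch that assumes the `qwak` CLI is installed and on the PATH; the helper names `build_register_command` and `register` are hypothetical and not part of the Qwak SDK. It defaults to a dry run that only prints the command.

```python
import shlex
import subprocess
from typing import List


def build_register_command(definition_file: str) -> List[str]:
    """Build the argument list for `qwak features register -p <file>`."""
    return ["qwak", "features", "register", "-p", definition_file]


def register(definition_file: str, dry_run: bool = True) -> None:
    """Print the command, and run it only when dry_run is False."""
    cmd = build_register_command(definition_file)
    print(shlex.join(cmd))
    if not dry_run:
        # check=True raises CalledProcessError if the CLI exits non-zero.
        subprocess.run(cmd, check=True)


# Dry run: prints the command without calling the CLI.
register("data_source.py")
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file paths that contain spaces.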
To delete a data source, run the following qwak command in the terminal:
qwak features delete --data-source <data-source-name>
Deleting Data Sources in use
Before you can delete a Data Source that is linked to one or more Feature Sets, you must either remove those Feature Sets or reassign them to a different Data Source.
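The dependency rule above can be sketched as a simple pre-delete check. The example assumes you keep your own mapping of Feature Set names to the Data Sources they read from; the mapping, the Feature Set names, and both function names are hypothetical, and nothing here queries the Qwak platform. It only builds the delete command shown earlier when no Feature Set still references the Data Source.

```python
from typing import Dict, List, Optional


def blocking_feature_sets(
    data_source: str, feature_set_sources: Dict[str, List[str]]
) -> List[str]:
    """Return the names of feature sets that still reference `data_source`."""
    return sorted(
        name
        for name, sources in feature_set_sources.items()
        if data_source in sources
    )


def delete_command_if_unused(
    data_source: str, feature_set_sources: Dict[str, List[str]]
) -> Optional[List[str]]:
    """Return the CLI delete command, or None if the data source is still in use."""
    if blocking_feature_sets(data_source, feature_set_sources):
        return None
    return ["qwak", "features", "delete", "--data-source", data_source]


# Hypothetical, hand-maintained mapping: feature set -> data sources it uses.
usage = {
    "user_credit_features": ["credit_risk_data"],
    "transaction_features": ["transactions_stream"],
}

print(blocking_feature_sets("credit_risk_data", usage))
print(delete_command_if_unused("credit_risk_data", usage))
print(delete_command_if_unused("unused_source", usage))
```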
Learn more about different types of Data Sources.