Feast
Overview¶
Feast is an Apache-licensed, open-source feature store for machine learning. Feast aims to provide a fast solution to the typical MLOps needs that arise when bringing ML applications to production.
Feast offers a solution to the problem of training/serving skew, provides tools to standardize data engineering workflows (thus avoiding having to "re-invent the features" every time), and ensures reproducible feature sets with point-in-time historical retrievals.
This feature store supports several backends, both as offline store (for historical time-series data) and online store (with the latest features, synced from the former by Feast itself). Besides a few core backends, the Feast project features additional backends contributed by the community.
Feast is built with the cloud in mind: one of its goals is to free MLOps practitioners and data engineers from having to manage their own infrastructure.
In this spirit, starting with version 0.24, the community-contributed Feast online store for Cassandra flexibly supports both Cassandra and Astra DB, as will be explained below.
Prerequisites¶
- You should have an Astra account.
- You should create an Astra database. In the following example, a keyspace called `feastks` is created in the database.
- You should create an Astra token with the role "Database Administrator" (Feast will have to dynamically create and delete tables in the keyspace).
- You should download your Secure Connect Bundle.
- Install Feast, including the dependencies for the Cassandra/Astra DB backend, in your local Python environment:
pip install feast[cassandra]
Keep the token information and the bundle file location ready: you will provide these shortly in the Feast configuration.
Quickstart¶
Note
In this minimal quickstart, modeled after the one found in the Feast documentation, you will be providing the store configuration file by hand.
Alternatively, an interactive command-line procedure to help you set up your store is available by launching feast init REPO_NAME -t cassandra.
All credit for the sample code given here goes to the Feast documentation.
A new feature store is created and configured to use Astra DB as the online store; next, a few sample features are materialized to the database; finally, historical and online feature retrieval is demonstrated.
Install Feast¶
See the last item in the "Prerequisites" section above.
Create a feature repository¶
In a directory of your choice, create a new repository and cd to the corresponding repo-definition directory:
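As a minimal sketch of this step (the repository name feast_quickstart is just an example, and depending on your Feast version the repository definition files may be created directly in the new directory or in a subdirectory; cd to wherever feature_store.yaml ends up):
feast init feast_quickstart
cd feast_quickstart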
As you can see, the new feature store already contains sample data and a sample feature definition. These will be used in this walkthrough, so don't delete them.
Configure Astra DB as online store¶
Locate and open the store configuration file, feature_store.yaml. Replace the online_store portion of the file with something like the following, making sure you use your own values for the full path to the Secure Connect Bundle file, the Client ID and Client Secret from your token, and the keyspace name:
online_store:
    type: cassandra
    secure_bundle_path: /path/to/secure/bundle.zip
    username: Client_ID
    password: Client_Secret
    keyspace: feastks
Settings in 'feature_store.yaml' for usage with Astra DB
If using regular Cassandra as opposed to Astra DB, the "online_store" portion might look like:
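The block below is a minimal sketch of such a configuration: the contact-point IPs, port, and credentials are placeholder values for a hypothetical self-managed cluster, to be replaced with your own (port, username, and password can be omitted if not needed):
online_store:
    type: cassandra
    hosts:
        - 192.168.1.1
        - 192.168.1.2
    port: 9042
    keyspace: feastks
    username: cassandra_user
    password: cassandra_password
Settings in 'feature_store.yaml' for usage with Cassandra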
Additional settings are available when configuring your Cassandra/Astra DB online store: check out the full examples in the Feast documentation.
Register feature definitions and deploy the store¶
With the apply command, features defined in Python modules (in this case, example.py) are scanned and used for actual deployment of the infrastructure.
Run the apply command from within the repository directory.
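For this quickstart repository, that should be the plain Feast CLI invocation:
feast apply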
This is the step that actually accesses the database. After running it, you may want to check directly the presence of a new table in the Astra DB keyspace.
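As an optional check, the short script below is a sketch that lists the tables in the keyspace using the Cassandra Python driver (installed alongside feast[cassandra]); the bundle path and credentials are the same placeholders used in feature_store.yaml:
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

# Placeholder values: use the same bundle path and token credentials
# configured in feature_store.yaml
cluster = Cluster(
    cloud={"secure_connect_bundle": "/path/to/secure/bundle.zip"},
    auth_provider=PlainTextAuthProvider("Client_ID", "Client_Secret"),
)
session = cluster.connect()

# List the tables currently present in the 'feastks' keyspace
rows = session.execute(
    "SELECT table_name FROM system_schema.tables WHERE keyspace_name = 'feastks'"
)
print([row.table_name for row in rows])
cluster.shutdown()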
Generate training data¶
This illustrates the get_historical_features store method, which directly scans the offline source data and performs a point-in-time join to construct the features requested up to a certain provided timestamp.
Create a file generate.py and run it with python generate.py:
from datetime import datetime, timedelta

import pandas as pd

from feast import FeatureStore

# The entity dataframe is the dataframe we want to enrich with feature values
entity_df = pd.DataFrame.from_dict(
    {
        # entity's join key -> entity values
        "driver_id": [1001, 1002, 1003],
        # label name -> label values
        "label_driver_reported_satisfaction": [1, 5, 3],
        # "event_timestamp" (reserved key) -> timestamps
        "event_timestamp": [
            datetime.now() - timedelta(minutes=11),
            datetime.now() - timedelta(minutes=36),
            datetime.now() - timedelta(minutes=73),
        ],
    }
)

store = FeatureStore(repo_path=".")

training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
).to_df()

print("----- Feature schema -----\n")
print(training_df.info())
print()
print("----- Example features -----\n")
print(training_df.head())
Load features in the online store¶
With the materialize-incremental command, Feast is instructed to carry the latest feature values over to the online store, for quick access during feature serving:
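As a sketch, assuming a Unix-like shell where date can supply the current UTC time to use as the end of the materialization window:
CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
feast materialize-incremental $CURRENT_TIME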
At this point, inspection of the Astra DB table will show the presence of newly-inserted rows.
Fetch feature vectors from the online store¶
The get_online_features store method will query the online store and return the requested features, as written by the last "materialize" operation.
Create a fetch_online.py script and run it with python fetch_online.py:
from pprint import pprint

from feast import FeatureStore

store = FeatureStore(repo_path=".")

feature_vector = store.get_online_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
    entity_rows=[
        # {join_key: entity_value}
        {"driver_id": 1004},
        {"driver_id": 1005},
    ],
).to_dict()

pprint(feature_vector)
Next steps¶
Have a look at the feature_store.yaml examples for Cassandra and Astra DB to check the full set of options available.
Head over to the Feast documentation to find out what you can do with your newly-deployed feature store.