docs: Adding blog posts from feast.dev website (#5034)
franciscojavierarceo authored Feb 9, 2025
1 parent efbffa4 commit 48a4285
Showing 24 changed files with 1,342 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/SUMMARY.md
@@ -1,6 +1,7 @@
# Table of contents

* [Introduction](README.md)
* [Blog](blog/README.md)
* [Community & getting help](community.md)
* [Roadmap](roadmap.md)
* [Changelog](https://github.com/feast-dev/feast/blob/master/CHANGELOG.md)
21 changes: 21 additions & 0 deletions docs/blog/README.md
@@ -0,0 +1,21 @@
# Blog Posts

Welcome to the Feast blog! Here you'll find articles about feature store development, new features, and community updates.

## Featured Posts

{% content-ref url="what-is-a-feature-store.md" %}
[what-is-a-feature-store.md](what-is-a-feature-store.md)
{% endcontent-ref %}

{% content-ref url="the-future-of-feast.md" %}
[the-future-of-feast.md](the-future-of-feast.md)
{% endcontent-ref %}

{% content-ref url="feast-supports-vector-database.md" %}
[feast-supports-vector-database.md](feast-supports-vector-database.md)
{% endcontent-ref %}

{% content-ref url="rbac-role-based-access-controls.md" %}
[rbac-role-based-access-controls.md](rbac-role-based-access-controls.md)
{% endcontent-ref %}
80 changes: 80 additions & 0 deletions docs/blog/a-state-of-feast.md
@@ -0,0 +1,80 @@
# A State of Feast

*January 21, 2021* | *Willem Pienaar*

## Introduction

Two years ago we first announced the launch of Feast, an open source feature store for machine learning. Feast is an operational data system that solves some of the key challenges that ML teams encounter while productionizing machine learning systems.

Recognizing that ML and Feast have advanced since we launched, we take a moment today to discuss the past, present and future of Feast. We consider the more significant lessons we learned while building Feast, where we see the project heading, and why teams should consider adopting Feast as part of their operational ML stacks.

## Background

Feast was developed to address the challenges faced while productionizing data for machine learning. In our original [Google Cloud article](https://cloud.google.com/blog/products/ai-machine-learning/introducing-feast-an-open-source-feature-store-for-machine-learning), we highlighted some of these challenges, namely:

1. Features aren't reused.
2. Feature definitions are inconsistent across teams.
3. Getting features into production is hard.
4. Feature values are inconsistent between training and serving.

Whereas an industry to solve data transformations and data-quality problems already existed, our focus for shaping Feast was to overcome operational ML hurdles that exist between data science and ML engineering. Toward that end, our initial aim was to provide:

1. Registry: The registry is a common catalog with which to explore, develop, collaborate on, and publish new feature definitions within and across teams. It is the central interface for all interactions with the feature store.
2. Ingestion: A means for continually ingesting batch and streaming data and storing consistent copies in both an offline and online store. This layer automates most data-management work and ensures that features are always available for serving.
3. Serving: A feature-retrieval interface which provides a temporally consistent view of features for both training and online serving. Serving improves iteration speed by minimizing coupling to data infrastructure, and prevents training-serving skew through consistent data access.

Guided by this design, we co-developed and shipped Feast with our friends over at Google. We then open sourced the project in early 2019, and have since been running Feast in production and at scale. In our follow-up blog post, [Bridging ML Models and Data](https://blog.gojekengineering.com/feast-bridging-ml-models-and-data), we touched on the impact Feast has had at companies like Gojek.

## Feast today

Teams, large and small, are increasingly searching for ways to simplify the productionization and maintenance of their ML systems at scale. Since open sourcing Feast, we've seen both the demand for these tools and the activity around this project soar. Working alongside our open source community, we've released key pieces of our stack throughout the last year, and steadily expanded Feast into a robust feature store. Highlights include:

* Point-in-time correct queries that prevent feature data leakage.
* A query-optimized, table-based data model in the form of feature sets.
* Storage connectors with implementations for Cassandra and Redis Cluster.
* Statistics generation and data validation through TFDV integration.
* Authentication and authorization support for SDKs and APIs.
* Diagnostic tooling through request/response logging, audit logs, and Statsd integration.
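
To make the first highlight concrete: a point-in-time correct lookup pairs each training row only with the latest feature value recorded at or before that row's event timestamp, so no future data leaks into training. The following is a minimal sketch in plain Python, not Feast's implementation; the function name `point_in_time_lookup` and the sample data are assumptions made for illustration.

```python
from bisect import bisect_right
from datetime import datetime


def point_in_time_lookup(feature_rows, event_time):
    """Return the latest feature value observed at or before event_time.

    feature_rows is a list of (timestamp, value) tuples sorted by timestamp.
    Returns None when no value existed yet at event_time.
    """
    timestamps = [ts for ts, _ in feature_rows]
    # bisect_right gives the number of rows with timestamp <= event_time.
    idx = bisect_right(timestamps, event_time)
    if idx == 0:
        return None
    return feature_rows[idx - 1][1]


# Hypothetical feature history for a single driver.
conv_rate_history = [
    (datetime(2021, 1, 1), 0.10),
    (datetime(2021, 1, 5), 0.25),
    (datetime(2021, 1, 9), 0.40),
]

# A label observed on Jan 6 may only see the Jan 5 value, never the Jan 9 one.
value_at_label_time = point_in_time_lookup(conv_rate_history, datetime(2021, 1, 6))
```

A naive "latest value" join would have returned the Jan 9 value here, which is exactly the leakage that point-in-time queries are meant to prevent.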

Feast has grown more rapidly than initially anticipated, with multiple large companies, including Agoda, Gojek, Farfetch, Postmates, and Zulily adopting and/or contributing to the project. We've also been working closely with other open source teams, and we are excited to share that Feast is now a [component in Kubeflow](https://www.kubeflow.org/docs/components/feature-store/). Over the coming months we will be enhancing this integration, making it easier for users to deploy Feast and Kubeflow together.

## Lessons learned

Through frequent engagement with our community and by way of running Feast in production ourselves, we've learned critical lessons:

Feast requires too much infrastructure: Requiring users to provision a large system is a big ask. A minimal Feast deployment requires Kafka, Zookeeper, Postgres, Redis, and multiple Feast services.

Feast lacks composability: Requiring all infrastructural components to be present in order to have a functional system removes all modularity.

Ingestion is too complex: Incorporating a Kafka-based stream-first ingestion layer makes it trivial to keep data consistent across stores, but the complete ingestion flow from source to sink can still fail mysteriously at multiple points.

Our technology choices hinder generalization: Leveraging technologies like BigQuery, Apache Beam on Dataflow, and Apache Kafka has allowed us to move faster in delivering functionality. However, these technologies now impede our ability to generalize to other clouds or deployment environments.

## The future of Feast

> *"Always in motion is the future."*
> – Yoda, The Empire Strikes Back

While feature stores have already become essential systems at large technology companies, we believe their widespread adoption will begin in 2021. We also foresee the release of multiple managed feature stores over the next year, as vendors seek to enter the burgeoning operational ML market.

As we've discussed, feature stores serve both offline and production ML needs, and therefore are primarily built by engineers for engineers. What we need, however, is a feature store that's purpose-built for data-science workflows. Feast will move away from an infrastructure-centric approach toward a more localized experience that does just this: builds on teams' existing data-science workflows.

The lessons we've learned during the preceding two years have crystallized a vision for what Feast should become: a lightweight, modular feature store. One that's easy to pick up, adds value to teams large and small, and can be progressively applied to production use cases that span multiple teams, projects, and cloud environments. We aim to reach this by applying the following design principles:

1. Python-first: First-class support for running a minimal version of Feast entirely from a notebook, with all infrastructural dependencies becoming optional enhancements.
   * Encourages quick evaluation of the software and ensures Feast is user friendly
   * Minimizes the operational burden of running the system in production
   * Simplifies testing, developing, and maintaining Feast

## Next Steps

Our vision for Feast is not only ambitious, but actionable. Our next release, Feast 0.8, is the product of collaborating with both our open source community and our friends over at [Tecton](https://tecton.ai/).

1. Python-first: We are migrating all core logic to Python, starting with training dataset retrieval and job management, providing a more responsive development experience.
2. Modular ingestion: We are shifting to managing batch and streaming ingestion separately, leading to more actionable metrics, logs, and statistics, and a system that is easier to understand and operate.
3. Support for AWS: We are replacing GCP-specific technologies like Beam on Dataflow with Spark and adding native support for running Feast on AWS, our first steps toward cloud-agnosticism.
4. Data-source integrations: We are introducing support for a host of new data sources (Kinesis, Kafka, S3, GCS, BigQuery) and data formats (Parquet, JSON, Avro), ensuring teams can seamlessly integrate Feast into their existing data infrastructure.

## Get involved

We've been inspired by the soaring community interest in and contributions to Feast. If you're curious to learn more about our mission to build a best-in-class feature store, or are looking to build your own: Check out our resources, say hello, and get involved!
65 changes: 65 additions & 0 deletions docs/blog/announcing-feast-0-11.md
@@ -0,0 +1,65 @@
# Announcing Feast 0.11

*June 23, 2021* | *Jay Parthasarthy & Willem Pienaar*

Feast 0.11 is here! This is the first release after the major changes introduced in Feast 0.10. We've focused on two areas in particular:

1. Introducing a new online store, Redis, which supports feature serving at high throughput and low latency.
2. Improving the Feast user experience through reduced boilerplate, smoother workflows, and improved error messages. A key addition here is the introduction of *feature inferencing,* which allows Feast to dynamically discover data schemas in your source data.

Let's get into it!

### Support for Redis as an online store 🗝

Feast 0.11 introduces support for Redis as an online store, allowing teams to easily scale up Feast to support high volumes of online traffic. Using Redis with Feast is as easy as adding a few lines of configuration to your `feature_store.yaml` file:

```yaml
project: fraud
registry: data/registry.db
provider: local
online_store:
    type: redis
    connection_string: localhost:6379
```
Feast is then able to read and write from Redis as its online store.
```bash
$ feast materialize

Materializing 3 feature views to 2021-06-15 18:43:03+00:00 into the redis online store.

user_account_features from 2020-06-16 18:43:04 to 2021-06-15 18:43:13:
100%|███████████████████████| 9944/9944 [00:04<00:00, 20065.15it/s]
user_transaction_count_7d from 2021-06-08 18:43:21 to 2021-06-15 18:43:03:
100%|███████████████████████| 9674/9674 [00:04<00:00, 19943.82it/s]
```

We're also working on making it easier for teams to add their own storage and compute systems through plugin interfaces. Please see this RFC for more details on the proposal.

### Feature Inferencing 🔎

Before 0.11, users had to define each feature individually when defining Feature Views. Now, Feast infers the schema of a Feature View based on upstream data sources, significantly reducing boilerplate.

Before 0.11, each feature had to be declared explicitly:
```python
driver_hourly_stats_view = FeatureView(
    name="driver_hourly_stats",
    entities=["driver_id"],
    ttl=timedelta(days=1),
    features=[
        Feature(name="conv_rate", dtype=ValueType.FLOAT),
    ],
    input=BigQuerySource(table_ref="feast-oss.demo_data.driver_hourly_stats"),
)
```
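
With inference, the `features` list in the definition above becomes unnecessary, because feature names and types can be read from the schema of the upstream source. The plain-Python sketch below only illustrates the idea and is not Feast's implementation; `infer_features`, its parameters, and the sample column types are hypothetical.

```python
def infer_features(source_columns, entity_columns, timestamp_column):
    """Treat every source column that is not an entity key or the event
    timestamp as a feature, keeping the source's own column order."""
    excluded = set(entity_columns) | {timestamp_column}
    return [
        (name, dtype)
        for name, dtype in source_columns.items()
        if name not in excluded
    ]


# Hypothetical schema of the upstream table, as column name -> type name.
driver_hourly_stats_columns = {
    "driver_id": "INT64",
    "event_timestamp": "TIMESTAMP",
    "conv_rate": "FLOAT",
    "acc_rate": "FLOAT",
}

inferred = infer_features(
    driver_hourly_stats_columns,
    entity_columns=["driver_id"],
    timestamp_column="event_timestamp",
)
```

Here `inferred` contains only `conv_rate` and `acc_rate`; the entity key and timestamp column are left out because they identify rows rather than describe them.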

Aside from these additions, a wide variety of small bug fixes and UX improvements made it into this release. [Check out the changelog](https://github.com/feast-dev/feast/blob/master/CHANGELOG.md) for a full list of what's new.

Special thanks and a big shoutout to the community contributors whose changes made it into this release: [MattDelac](https://github.com/MattDelac), [mavysavydav](https://github.com/mavysavydav), [szalai1](https://github.com/szalai1), [rightx2](https://github.com/rightx2)

### Help us design Feast for AWS 🗺️

The 0.12 release will include native support for AWS. We are looking to meet with teams that are considering using Feast to gather feedback and help shape the product as design partners. We often help our design partners out with architecture or design reviews. If this sounds helpful to you, [join us in Slack](http://slack.feastsite.wpenginepowered.com/), or [book a call with Feast maintainers here](https://calendly.com/d/gc29-y88c/feast-chat-w-willem).

### Feast from around the web 📣
50 changes: 50 additions & 0 deletions docs/blog/faster-feature-transformations-in-feast.md
@@ -0,0 +1,50 @@
# Faster Feature Transformations in Feast 🏎️💨

*December 5, 2024* | *Francisco Javier Arceo, Shuchu Han*

*Thank you to [Shuchu Han](https://www.linkedin.com/in/shuchu/), [Ross Briden](https://www.linkedin.com/in/ross-briden/), [Ankit Nadig](https://www.linkedin.com/in/ankit-nadig/), and the folks at Affirm for inspiring this work and creating an initial proof of concept.*

Feature engineering is at the core of building high-performance machine learning models. The Feast team has introduced two major enhancements to [On Demand Feature Views](https://docs.feast.dev/reference/beta-on-demand-feature-views) (ODFVs), pushing the boundaries of efficiency and flexibility for data scientists and engineers. Here's a closer look at these exciting updates:

## 1. Transformations with Native Python

Traditionally, transformations in ODFVs were limited to Pandas-based operations. While powerful, Pandas transformations can be computationally expensive for certain use cases. Feast now introduces Native Python Mode, a feature that allows users to write transformations using pure Python.

Key benefits of Native Python Mode include:

* Blazing Speed: Transformations using Native Python are nearly 10x faster than Pandas for many operations.
* Intuitive Design: This mode supports list-based and singleton (row-level) transformations, making it easier for data scientists to think in terms of individual rows rather than entire datasets.
* Versatility: Users can now switch between batch and singleton transformations effortlessly, catering to both historical and online retrieval scenarios.

Using the cProfile library and snakeviz, we profiled the ODFV transformation in both Pandas and Native Python modes and observed a nearly 10x reduction in runtime.
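
To illustrate the difference between the two transformation shapes, here is a sketch in plain Python that does not use Feast itself. It assumes a list-based transformation receives a dictionary of column lists, while a singleton transformation receives a dictionary of scalar values for one row; the function names and the `conv_rate` input are illustrative.

```python
from typing import Any, Dict


def adjust_conv_rate_batch(inputs: Dict[str, Any]) -> Dict[str, Any]:
    """List-based: each key maps to a list with one entry per row."""
    return {"conv_rate_adjusted": [rate * 2 for rate in inputs["conv_rate"]]}


def adjust_conv_rate_singleton(inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Singleton: each key maps to the value for a single row."""
    return {"conv_rate_adjusted": inputs["conv_rate"] * 2}


batch_result = adjust_conv_rate_batch({"conv_rate": [0.25, 0.5]})
row_result = adjust_conv_rate_singleton({"conv_rate": 0.25})
```

The singleton form reads like ordinary row-level business logic, which is the main ergonomic argument for it in online retrieval.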

## 2. Transformations on Writes

Until now, ODFVs operated solely as transformations on reads, applying logic during online feature retrieval. While this ensured flexibility, it sometimes came at the cost of increased latency during retrieval. Feast now supports transformations on writes, enabling users to apply transformations during data ingestion and store the transformed features in the online store.

Why does this matter?

* Reduced Online Latency: With transformations pre-applied at ingestion, online retrieval becomes a straightforward lookup, significantly improving performance for latency-sensitive applications.
* Operational Flexibility: By toggling the write_to_online_store parameter, users can choose whether transformations should occur at write time (to optimize reads) or at read time (to preserve data freshness).

Here's an example of applying transformations during ingestion:

```python
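import pandas as pd
from feast import Field
from feast.on_demand_feature_view import on_demand_feature_view
from feast.types import Float64

# driver_hourly_stats_view is assumed to be defined elsewhere; the function name is illustrative.
@on_demand_feature_view(
    sources=[driver_hourly_stats_view],
    schema=[Field(name="conv_rate_adjusted", dtype=Float64)],
    write_to_online_store=True,  # transform at write time instead of read time
)
def conv_rate_adjusted_view(features_df: pd.DataFrame) -> pd.DataFrame:
    df = pd.DataFrame()
    df["conv_rate_adjusted"] = features_df["conv_rate"] * 1.1
    return df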
```

With this new capability, data engineers can optimize online retrieval performance without sacrificing the flexibility of on-demand transformations.
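
The trade-off between the two modes can also be sketched without Feast, using a dictionary as a stand-in for the online store. Everything here (the `TinyOnlineStore` class, its methods, and the `transform_on_write` flag) is hypothetical and only mirrors the behavior described above.

```python
class TinyOnlineStore:
    """Toy key-value store that transforms rows either on write or on read."""

    def __init__(self, transform, transform_on_write):
        self.transform = transform
        self.transform_on_write = transform_on_write
        self.rows = {}

    def write(self, key, row):
        # On-write mode pays the transformation cost once, at ingestion.
        self.rows[key] = self.transform(row) if self.transform_on_write else row

    def read(self, key):
        row = self.rows[key]
        # On-read mode pays the transformation cost on every retrieval.
        return row if self.transform_on_write else self.transform(row)


def adjust(row):
    return {"conv_rate_adjusted": row["conv_rate"] * 2}


on_write = TinyOnlineStore(adjust, transform_on_write=True)
on_read = TinyOnlineStore(adjust, transform_on_write=False)
for store in (on_write, on_read):
    store.write("driver_1001", {"conv_rate": 0.25})
```

Both stores return the same value on read; they differ in what is stored and in when the computation happens.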

### The Future of ODFVs and Feature Transformations

These enhancements bring ODFVs closer to the goal of seamless feature engineering at scale. By combining high-speed Python-based transformations with the ability to optimize retrieval latency, Feast empowers teams to build more efficient, responsive, and production-ready feature pipelines.

For more detailed examples and use cases, check out the [documentation for On Demand Feature Views](https://docs.feast.dev/reference/beta-on-demand-feature-views). Whether you're a data scientist prototyping features or an engineer optimizing a production system, the new ODFV capabilities offer the tools you need to succeed.

The future of Feature Transformations in Feast will be to unify feature transformations and feature views to allow for a simpler API. If you have thoughts or interest in giving feedback to the maintainers, feel free to comment directly on [the GitHub Issue](https://github.com/feast-dev/feast/issues/4584) or in [the RFC](https://docs.google.com/document/d/1KXCXcsXq1bU...).