Adds first scenario for feature engineering examples #311

skrawcz · 2023-02-14T20:22:12Z

This example shows how to use the same Hamilton feature definitions in both an offline setting and an online setting.

Assumptions:

The API request can provide the same raw data that training provides.
If you have aggregation features, you need to store the values computed during training and provide them to the online side.
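
A minimal sketch of the two assumptions above, in plain Python rather than Hamilton's own API. The function names, the `featurize` helper, and the sample ages are hypothetical and chosen only for illustration; the point is that one set of feature functions serves both paths, with the aggregates computed once offline and handed to the online side.

```python
from statistics import mean, pstdev


# Feature definitions shared by the offline (training) and online (serving) paths.
# All names here are illustrative assumptions, not taken from the example's code.
def age_zero_mean(age: float, age_mean: float) -> float:
    """Centers age around the training mean."""
    return age - age_mean


def age_zero_mean_unit_variance(age: float, age_mean: float, age_std_dev: float) -> float:
    """Standardizes age using training-time aggregates."""
    return (age - age_mean) / age_std_dev


def featurize(age: float, age_mean: float, age_std_dev: float) -> dict:
    """Computes every feature for one row from raw data plus aggregates."""
    return {
        "age_zero_mean": age_zero_mean(age, age_mean),
        "age_zero_mean_unit_variance": age_zero_mean_unit_variance(
            age, age_mean, age_std_dev
        ),
    }


# Offline: aggregates are computed over the whole training set and stored.
training_ages = [20.0, 30.0, 40.0]
stored_aggregates = {
    "age_mean": mean(training_ages),
    "age_std_dev": pstdev(training_ages),
}
training_features = [featurize(a, **stored_aggregates) for a in training_ages]

# Online: a single request carries the same raw data; aggregates come from storage.
request = {"age": 30.0}
online_features = featurize(request["age"], **stored_aggregates)
```

Because both paths call the same functions with the same aggregates, a row seen in training and the same row arriving in a request produce identical features.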

Changes

adds feature_engineering folder to examples
adds scenario 1

How I tested this

ran this code locally

Notes

Checklist

PR has an informative and human-readable title (this will be pulled into the release notes)
Changes are limited to a single goal (no scope creep)
Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
Any change in functionality is tested
New functions are documented (with a description, list of inputs, and expected output)
Placeholder code is flagged / future TODOs are captured in comments
Project documentation has been updated if adding/changing functionality.

elijahbenizzy

Good start -- I don't think this is going to be clear to most people who haven't really dug into this. A few thoughts:

We can clarify the wording/make it crisper to specify why this is a problem, how it's normally done, and why Hamilton alleviates this
We can give more context about what we're doing here/why it's in an online context
We can root it in tooling that might be familiar to them. While loading fake models/whatnot makes sense, I think it's going to confuse the users. So either load from a model/feature store they're used to, or (more likely) abstract it away and make it very clear that it could be implemented in many different ways.

This stuff is natural to us as we've been building online/batch inference/training tooling for years, but I think this will be extremely complex to most people out there, and fall flat. Hamilton is simple enough and makes this easy enough that this is a good chance to capture market share, but to do so we need to really hammer home a pattern and a motivation.

examples/feature_engineering/README.md

examples/feature_engineering/scenario_1/constants.py

examples/feature_engineering/scenario_1/fastapi_server.py

examples/feature_engineering/scenario_1/README.md

This example shows how one might use Hamilton to compute features in an offline and online fashion. The assumption here is that the request passed into the API has all the raw data required to compute features. This example also shows how one might "override" some values that are required for computing features; in this example, they are `age_mean` and `age_std_dev`. This can be required when computing aggregation features does not make sense at inference time.
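
The "override" idea can be sketched with a toy resolver in plain Python. This is not Hamilton's implementation or API; the `execute` helper, the `NODES` registry, and the numbers are assumptions made up for illustration. Each function's parameter names refer to other functions or to raw inputs, and a supplied override short-circuits the computation of that value.

```python
import inspect
from statistics import mean, pstdev


# Hypothetical feature functions: parameter names refer to other nodes or raw inputs.
def age_mean(ages: list) -> float:
    return mean(ages)


def age_std_dev(ages: list) -> float:
    return pstdev(ages)


def age_zero_mean_unit_variance(age: float, age_mean: float, age_std_dev: float) -> float:
    return (age - age_mean) / age_std_dev


NODES = {f.__name__: f for f in (age_mean, age_std_dev, age_zero_mean_unit_variance)}


def execute(final_var: str, inputs: dict, overrides: dict = None):
    """Toy resolver: overrides win, then raw inputs, then the node's function."""
    overrides = overrides or {}

    def resolve(name: str):
        if name in overrides:
            return overrides[name]  # skip computing this node entirely
        if name in inputs:
            return inputs[name]
        fn = NODES[name]
        kwargs = {p: resolve(p) for p in inspect.signature(fn).parameters}
        return fn(**kwargs)

    return resolve(final_var)


# Offline: the aggregates are computed from the full column of ages.
offline = execute(
    "age_zero_mean_unit_variance",
    inputs={"age": 40.0, "ages": [20.0, 30.0, 40.0]},
)

# Online: only one age arrives, so stored aggregates are passed as overrides.
online = execute(
    "age_zero_mean_unit_variance",
    inputs={"age": 40.0},
    overrides={"age_mean": 30.0, "age_std_dev": 10.0},
)
```

Without the overrides, the online call in this sketch would fail, since a single request has no `ages` column from which to compute the mean or standard deviation.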

skrawcz · 2023-02-20T23:51:52Z

Good start -- I don't think this is going to be clear to most people who haven't really dug into this. A few thoughts:

We can clarify the wording/make it crisper to specify why this is a problem, how it's normally done, and why Hamilton alleviates this

We can give more context about what we're doing here/why it's in an online context

We can root it in tooling that might be familiar to them. While loading fake models/whatnot makes sense, I think it's going to confuse the users. So either load from a model/feature store they're used to, or (more likely) abstract it away and make it very clear that it could be implemented in many different ways.

This stuff is natural to us as we've been building online/batch inference/training tooling for years, but I think this will be extremely complex to most people out there, and fall flat. Hamilton is simple enough and makes this easy enough that this is a good chance to capture market share, but to do so we need to really hammer home a pattern and a motivation.

That's the point of the scenarios: there is no one-size-fits-all approach. The plan is to show the simplest possible thing first, then a scenario where there is a feature store, etc.

Will add more to motivation -- and draw some pictures.

skrawcz changed the title ~~Adds basic scenario 1 for feature engineering~~ Adds examples for feature engineering Feb 14, 2023

skrawcz changed the title ~~Adds examples for feature engineering~~ Adds first scenario for feature engineering examples Feb 19, 2023

skrawcz marked this pull request as ready for review February 19, 2023 22:35

elijahbenizzy reviewed Feb 20, 2023

skrawcz added 2 commits February 20, 2023 15:42

Renames constants to named_model_feature_sets

c7b0aae

I think this makes it clearer what this file is: a lightweight way to register the feature sets that are used by a model.

skrawcz force-pushed the feature_eng_example branch from 36799a9 to ed55ed9 Compare February 21, 2023 00:56

Adds more context to the feature engineering example README

f175938

Helps set the tone and explains what feature engineering is, and adds more context about the scenarios and the task.

skrawcz force-pushed the feature_eng_example branch from ed55ed9 to f175938 Compare February 21, 2023 01:02

skrawcz added 3 commits February 20, 2023 17:21

Updates feature engineering scenario 1 with more docs

f6ee7e1

I expand on the docs and hopefully explain things in a way that is more understandable to a novice.

Adds tags to age_mean and age_std_dev

784ea0f

The tags show functionality that can be used to highlight that these values should be overridden in an online setting.

Adds SVG image to help explain scenario 1

c97cec4

HamiltonRepoMigrationBot mentioned this pull request Feb 26, 2023

Adds first scenario for feature engineering examples DAGWorks-Inc/hamilton#68

Closed

7 tasks
