## Definition
**Feature engineering** is the process of transforming raw data into features that better expose the predictive signal to a model. For decades it was the highest-leverage activity in ML. Deep learning has partially superseded it for unstructured data (text, images), but it remains decisive for tabular, time-series, and other structured data.
## Why It Matters
A model's input is the feature vector, not the raw data. Good features:
- **Highlight predictive structure** the model can exploit.
- **Hide irrelevant noise** that would dilute the signal.
- **Encode domain knowledge** explicitly, reducing data requirements.
- **Match the model's [[Inductive Bias]].** Tree-based models work well with piecewise-constant features; linear models work best with centred, scaled features.
## Common Transformations
### Numeric
- **Scaling** ([[Feature Scaling]]) — standardisation, min-max, robust scaling.
- **Log / Box-Cox transforms** — for right-skewed distributions (income, prices).
- **Binning** — convert continuous to discrete (age → age bracket).
- **Interaction features** — products or ratios of two raw features.
- **Polynomial features** — $x_1^2$, $x_1 x_2$ etc. expand the feature space.
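
The numeric transformations above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the data is made up, and the bracket boundaries and the names `standardise`, `age_bracket`, and `polynomial_degree_2` are assumptions chosen for the example.

```python
import math
from statistics import mean, pstdev

# Made-up, right-skewed sample data (illustrative only).
incomes = [20_000, 35_000, 50_000, 120_000, 1_000_000]
ages = [23, 37, 45, 61, 18]

# Log transform: log1p compresses the long right tail and is defined at 0.
log_income = [math.log1p(x) for x in incomes]

def standardise(values):
    """Centre to mean 0 and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

z_log_income = standardise(log_income)

def age_bracket(age):
    """Binning: map a continuous age to a coarse bracket (hypothetical cut-offs)."""
    if age < 25:
        return "under_25"
    if age < 45:
        return "25_44"
    if age < 65:
        return "45_64"
    return "65_plus"

brackets = [age_bracket(a) for a in ages]

# Interaction feature: a ratio of two raw features.
income_per_year_of_age = [inc / age for inc, age in zip(incomes, ages)]

def polynomial_degree_2(x1, x2):
    """Degree-2 expansion of two features: x1, x2, x1^2, x1*x2, x2^2."""
    return [x1, x2, x1 * x1, x1 * x2, x2 * x2]
```

In a real pipeline the mean and standard deviation would be estimated on the training split only and then reused unchanged on validation and test data.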
### Categorical
- **[[One-Hot Encoding]]** — binary indicator per category.
- **Target encoding** — replace each category with its mean target value (careful with leakage).
- **Embeddings** — learn a low-dimensional vector per category. The bridge to deep learning for tabular data.
- **Ordinal encoding** — assign ordered numbers when the categorical has natural order.
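
A small sketch of the categorical encodings listed above, with target encoding done out-of-fold to show one way of avoiding leakage. The data, the fold assignment by index parity, the fallback to the global mean, and the names `one_hot`, `target_encode_oof`, and `SIZE_ORDER` are all assumptions made for illustration.

```python
from collections import defaultdict

def one_hot(values):
    """Return (sorted categories, rows of 0/1 indicators), one column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

def target_encode_oof(categories, targets, n_folds=2):
    """Out-of-fold target encoding.

    Each row is encoded with the mean target of its category computed on the
    OTHER folds only, so a row's own label never leaks into its feature.
    Categories unseen in the other folds fall back to those folds' global mean.
    """
    n = len(categories)
    encoded = [0.0] * n
    for fold in range(n_folds):
        held_out = [i for i in range(n) if i % n_folds == fold]
        rest = [i for i in range(n) if i % n_folds != fold]
        sums, counts = defaultdict(float), defaultdict(int)
        for i in rest:
            sums[categories[i]] += targets[i]
            counts[categories[i]] += 1
        global_mean = sum(targets[i] for i in rest) / len(rest)
        for i in held_out:
            c = categories[i]
            encoded[i] = sums[c] / counts[c] if counts[c] else global_mean
    return encoded

# Made-up data: city of each customer and whether they bought (1) or not (0).
cities = ["paris", "paris", "rome", "rome", "paris", "oslo"]
bought = [1, 0, 1, 1, 1, 1]

cats, indicators = one_hot(cities)
city_te = target_encode_oof(cities, bought)

# Ordinal encoding: only meaningful when the categories have a natural order.
SIZE_ORDER = {"small": 0, "medium": 1, "large": 2}  # hypothetical ordering
sizes_encoded = [SIZE_ORDER[s] for s in ["medium", "small", "large"]]
```

The first row is a "paris" customer who bought, yet its encoded value comes only from the "paris" row in the other fold. That separation is what keeps the feature from simply memorising the label.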
### Temporal
- **Cyclical encoding**: `sin(2πt/P)`, `cos(2πt/P)` with period `P` (24 for hour-of-day, 365 for day-of-year). Better than treating hour as ordinal, because 23:00 and 00:00 end up adjacent.
- **Lagged features** — value 1, 7, 30 periods ago.
- **Rolling statistics** — moving average, moving std, moving min/max.
- **Time-since-event** — recency features.
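
The temporal features above can be sketched as follows. This is a minimal version under simple assumptions: the series is evenly spaced, missing history is marked with `None`, and the helper names `cyclical`, `lag`, and `rolling_mean` are invented for the example.

```python
import math

def cyclical(t, period):
    """Map t onto the unit circle so that t and t + period coincide."""
    angle = 2 * math.pi * t / period
    return math.sin(angle), math.cos(angle)

def lag(series, k):
    """Value k periods ago; None where no history exists yet."""
    return [series[i - k] if i >= k else None for i in range(len(series))]

def rolling_mean(series, window):
    """Trailing mean over the last `window` values, current one included."""
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)  # not enough history for a full window
        else:
            chunk = series[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out

# Made-up daily sales figures.
sales = [10, 12, 11, 15, 14, 18, 20]

lag_1 = lag(sales, 1)
mean_3 = rolling_mean(sales, 3)

# If the current value is the prediction target, lag the rolling statistic
# by one step so the feature only sees the past.
mean_3_past_only = lag(mean_3, 1)

hour_23 = cyclical(23, 24)
hour_0 = cyclical(0, 24)
hour_12 = cyclical(12, 24)
```

With this encoding, `math.dist(hour_23, hour_0)` is much smaller than `math.dist(hour_12, hour_0)`, whereas an ordinal hour column would put 23 and 0 as far apart as possible.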
### Text (pre-LLM)
- **Bag-of-words.**
- **TF-IDF.**
- **n-grams.**
- **Topic models** (LDA).
(Modern: feed text directly to a model or use pre-trained embeddings.)
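
For reference, bag-of-words, n-grams, and TF-IDF can be sketched in a few lines. This is an illustrative toy, assuming whitespace tokenisation and the plain `tf * log(N / df)` weighting, which is only one of several TF-IDF variants; library implementations typically add smoothing and normalisation. The corpus and the names `tokenize` and `tf_idf` are made up.

```python
import math
from collections import Counter

# Toy corpus (made up).
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]

def tokenize(text, n=1):
    """Lowercase whitespace tokens, joined into word n-grams of length n."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

# Bag-of-words: per-document term counts, word order discarded.
bags = [Counter(tokenize(d)) for d in docs]

# Document frequency: number of documents that contain each term.
df = Counter(term for bag in bags for term in bag)

def tf_idf(bag):
    """Weight each term by tf * log(N / df), an unsmoothed variant."""
    total = sum(bag.values())
    return {t: (c / total) * math.log(len(docs) / df[t]) for t, c in bag.items()}

weights = [tf_idf(bag) for bag in bags]

# Word bigrams of the first document.
bigrams = tokenize(docs[0], n=2)
```

In the first document "the" occurs twice and "cat" once, but "cat" receives the higher weight because it appears in fewer documents.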
## Feature Stores
In production ML, *features* themselves become first-class artefacts:
- Stored, versioned, served at training and inference time.
- Reused across models and teams.
- Tested independently of any model.
Tools: Feast, Tecton, Databricks Feature Store.
## Deep Learning Caveat
For images, audio, and increasingly text, representations learned by neural networks generally outperform hand-engineered features. This is the "Bitter Lesson": over time, hand-crafted engineering effort tends to be displaced by scale plus learning. For *tabular* data, however, which covers most enterprise ML, engineered features often remain decisive, typically in combination with gradient-boosted trees.
## Related
- [[Feature Scaling]]
- [[One-Hot Encoding]]
- [[Feature Selection]]
- [[Dimensionality Reduction]]