## Definition
**One-hot encoding** turns a categorical variable with $k$ possible values into $k$ binary columns, exactly one of which is 1 for each row. It is the default encoding for nominal (unordered) categorical features in models that require numeric input.
## Example
Original column `Color ∈ {Red, Green, Blue}`:
| Color | → | Red | Green | Blue |
|--------|---|-----|-------|------|
| Red | | 1 | 0 | 0 |
| Blue | | 0 | 0 | 1 |
| Green | | 0 | 1 | 0 |
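The table above can be reproduced in a few lines of plain Python. This is a minimal sketch: the helper name `one_hot` and the explicit column order are illustrative assumptions, not part of any library.

```python
def one_hot(values, categories):
    """Return one row of 0/1 indicators per value, one column per category."""
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one column is set per row
        rows.append(row)
    return rows


categories = ["Red", "Green", "Blue"]  # column order matches the table
colors = ["Red", "Blue", "Green"]
encoded = one_hot(colors, categories)

for color, row in zip(colors, encoded):
    print(color, row)
```

A value missing from `categories` raises a `KeyError` here, which is the unseen-category problem listed under Trade-offs.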
## Why Not Just Map to Integers?
Mapping `{Red: 1, Green: 2, Blue: 3}` introduces a fake ordering. A linear model would treat Blue as three times as much of some quantity as Red, with Green exactly halfway between them, which is meaningless for colors.
One-hot avoids the ordinal illusion: each category is its own dimension.
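One way to see the difference is to compare pairwise distances under each encoding. The sketch below reuses the `Color` example; the helper `euclidean` is a hypothetical name introduced only for illustration.

```python
import math

integer_codes = {"Red": 1, "Green": 2, "Blue": 3}
one_hot_codes = {
    "Red": (1, 0, 0),
    "Green": (0, 1, 0),
    "Blue": (0, 0, 1),
}


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


pairs = [("Red", "Green"), ("Green", "Blue"), ("Red", "Blue")]

# Integer codes: Red and Blue end up twice as far apart as Red and Green.
int_dist = {p: abs(integer_codes[p[0]] - integer_codes[p[1]]) for p in pairs}

# One-hot codes: every pair is the same distance apart (sqrt(2)).
oh_dist = {p: euclidean(one_hot_codes[p[0]], one_hot_codes[p[1]]) for p in pairs}

print(int_dist)
print(oh_dist)
```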
## Variants
### Dummy encoding (drop one)
Use $k - 1$ columns; the all-zeros row represents the dropped (reference) category. This avoids perfect multicollinearity: the original $k$ columns sum to 1 in every row, so together they duplicate the intercept column, an exact linear dependency. Standard for linear regression, where collinearity makes the coefficients non-identifiable.
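The linear dependency can be checked numerically. The sketch below assumes NumPy is available and uses a small made-up design matrix with an intercept column; the variable names are illustrative.

```python
import numpy as np

# Six rows, three categories stored as integer labels 0, 1, 2.
labels = np.array([0, 1, 2, 0, 1, 2])
one_hot = np.eye(3)[labels]             # shape (6, 3); every row sums to 1
intercept = np.ones((len(labels), 1))

full = np.hstack([intercept, one_hot])          # intercept + k columns
dummy = np.hstack([intercept, one_hot[:, 1:]])  # intercept + (k - 1) columns

rank_full = np.linalg.matrix_rank(full)    # fewer than the 4 columns: rank-deficient
rank_dummy = np.linalg.matrix_rank(dummy)  # equals the 3 columns: full column rank

print(full.shape[1], rank_full)
print(dummy.shape[1], rank_dummy)
```

With all $k$ columns plus an intercept, the matrix has more columns than its rank, so ordinary least squares has no unique solution. Dropping one column restores full column rank.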
### Drop-first one-hot
Dummy encoding with the first category chosen as the dropped reference. scikit-learn's `OneHotEncoder(drop='first')` does this automatically.
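A short sketch of the scikit-learn call, assuming scikit-learn and NumPy are installed. The input array is the `Color` example from above; `.toarray()` is used because the encoder returns a sparse matrix by default.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# One column of string categories, shape (n_samples, 1).
colors = np.array([["Red"], ["Blue"], ["Green"]])

encoder = OneHotEncoder(drop="first")
# fit_transform returns a sparse matrix by default; densify for display.
encoded = encoder.fit_transform(colors).toarray()

# Categories are sorted, so "Blue" is first and becomes the dropped reference.
print(encoder.categories_[0])
print(encoded)
```

Because the categories are sorted before the first is dropped, the reference category here is `Blue`, not the first value in the data.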
### Effect coding
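Similar to dummy encoding ($k - 1$ columns), but rows in the reference category are encoded as -1 in every column instead of 0. Each coefficient then measures a deviation from the mean of the category means rather than from the reference category; useful in some statistical analyses.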
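A minimal sketch of effect coding in plain Python. The function name `effect_code` and the choice of `Blue` as the reference are assumptions made for illustration.

```python
def effect_code(values, categories, reference):
    """Encode with k - 1 columns; the reference category becomes all -1."""
    columns = [cat for cat in categories if cat != reference]
    rows = []
    for value in values:
        if value == reference:
            rows.append([-1] * len(columns))
        else:
            rows.append([1 if value == col else 0 for col in columns])
    return columns, rows


columns, rows = effect_code(
    ["Red", "Green", "Blue"],
    categories=["Red", "Green", "Blue"],
    reference="Blue",
)

print(columns)
for row in rows:
    print(row)
```

With one row per category, as here, each column sums to zero, which is what centers the coefficients.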
## Trade-offs
**Pros:**
- No fake ordering.
- Each category fully isolated.
- Trivial to implement.
**Cons:**
- **High cardinality blows up.** A `user_id` column with 100k unique values → 100k binary columns. Sparse and memory-heavy.
- **Cold-start problem.** A category unseen at training time has no column of its own. Reserve an "unknown" bucket for it, or encode it as all zeros.
- **Loses information.** Every pair of categories is equidistant in the encoded space, so two semantically similar categories ("XL", "XXL") are as far apart as two completely unrelated ones.
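The "unknown" bucket for unseen categories can be sketched as follows. The class `OneHotWithUnknown` and the `"<unknown>"` label are hypothetical names for illustration, not an existing library API.

```python
class OneHotWithUnknown:
    """One-hot encoder that reserves a final column for unseen categories."""

    def __init__(self, unknown_label="<unknown>"):
        self.unknown_label = unknown_label
        self.columns = []
        self._index = {}

    def fit(self, values):
        # Known categories in sorted order, plus one extra "unknown" column.
        self.columns = sorted(set(values)) + [self.unknown_label]
        self._index = {cat: i for i, cat in enumerate(self.columns)}
        return self

    def transform(self, values):
        unknown_pos = self._index[self.unknown_label]
        rows = []
        for value in values:
            row = [0] * len(self.columns)
            # Unseen values fall back to the "unknown" column.
            row[self._index.get(value, unknown_pos)] = 1
            rows.append(row)
        return rows


encoder = OneHotWithUnknown().fit(["Red", "Green", "Blue"])
rows = encoder.transform(["Green", "Purple"])

print(encoder.columns)
print(rows)
```

scikit-learn's `OneHotEncoder` takes a different route through its `handle_unknown` parameter: with `handle_unknown='ignore'`, an unseen category is encoded as all zeros instead of getting a dedicated column.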
## Alternatives for High Cardinality
- **Target encoding.** Replace each category with the mean target value for that category, computed out-of-fold (on rows outside the fold being encoded) to avoid target leakage.
- **Frequency encoding.** Replace with count or proportion.
- **Embeddings.** Learn a dense vector per category; this is the bridge to deep tabular models.
- **Hashing trick.** Hash the category into a fixed-size bucket space. Some collisions; fixed dimensionality.
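Three of these alternatives are simple enough to sketch in plain Python. All function names below are hypothetical, the data is made up, and the round-robin fold assignment is a simplification (folds are usually assigned at random). The fallback to the global mean for categories with no out-of-fold rows is also an assumption, one of several reasonable choices.

```python
import hashlib
from collections import Counter


def frequency_encode(train_values, values):
    """Replace each category with its proportion in the training data."""
    counts = Counter(train_values)
    total = len(train_values)
    return [counts.get(v, 0) / total for v in values]


def hash_bucket(value, n_buckets):
    """Map a category to a stable bucket index in [0, n_buckets)."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


def hash_encode(values, n_buckets):
    """One-hot over hash buckets: fixed width, collisions possible."""
    rows = []
    for v in values:
        row = [0] * n_buckets
        row[hash_bucket(v, n_buckets)] = 1
        rows.append(row)
    return rows


def target_encode_out_of_fold(categories, targets, n_folds):
    """Encode each row with its category's mean target from the other folds."""
    global_mean = sum(targets) / len(targets)
    encoded = []
    for i, cat in enumerate(categories):
        fold = i % n_folds  # round-robin fold assignment (illustrative only)
        other = [
            t
            for j, (c, t) in enumerate(zip(categories, targets))
            if c == cat and j % n_folds != fold
        ]
        # Fall back to the global mean when no out-of-fold rows share the category.
        encoded.append(sum(other) / len(other) if other else global_mean)
    return encoded


users = ["a", "b", "a", "c", "a", "b"]
clicked = [1, 0, 1, 0, 0, 1]

freq = frequency_encode(users, users)
hashed = hash_encode(users, n_buckets=4)
target = target_encode_out_of_fold(users, clicked, n_folds=3)

print(freq)
print(hashed)
print(target)
```

The hashed width stays at `n_buckets` however many distinct users appear, at the cost of unrelated users sometimes sharing a bucket. The target encoding never uses a row's own label, which is the leakage the out-of-fold computation guards against.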
## With Tree-Based Models
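Many modern gradient-boosting implementations handle categorical features natively: LightGBM, CatBoost (which is built specifically around categorical handling), and recent XGBoost versions when categorical support is enabled. Not every tree implementation does; scikit-learn's random forests, for example, expect numeric input. One-hot still works but is rarely optimal for trees, because each binary column can split off only one category at a time, which becomes inefficient at high cardinality.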
## Related
- [[Feature Engineering]]
- [[Feature Scaling]]
- [[Logistic Regression]]