## Definition
**Logistic regression** is a linear classifier, despite the "regression" in its name. It models the probability of a binary outcome by applying the sigmoid (logistic) function to a linear combination of the features.
## Model
$$
P(y = 1 \mid x) = \sigma(w^\top x + b) = \frac{1}{1 + e^{-(w^\top x + b)}}
$$
Decision rule: predict positive class if $P(y = 1 \mid x) > 0.5$ (or any threshold suited to the use case).
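The model and decision rule above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation; the function names `sigmoid`, `predict_proba`, and `predict` are chosen here for the example and do not come from any particular library.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function, branched on the sign of z to avoid overflow in exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(w, b, x):
    """P(y = 1 | x) = sigmoid(w . x + b) for one feature vector x."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return sigmoid(z)

def predict(w, b, x, threshold=0.5):
    """Return 1 if the predicted probability exceeds the threshold, else 0."""
    return 1 if predict_proba(w, b, x) > threshold else 0
```

With the default threshold of 0.5 the rule is the same as checking whether $w^\top x + b > 0$, which is why the decision boundary is a hyperplane.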
## Loss
Negative log-likelihood (binary cross-entropy), where $\hat p_i = \sigma(w^\top x_i + b)$ is the predicted probability for example $i$:
$$
\mathcal{L}(w, b) = -\frac{1}{n} \sum_{i=1}^n \left[ y_i \log \hat p_i + (1 - y_i) \log(1 - \hat p_i) \right]
$$
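Convex in $(w, b)$ → every local minimum is global, so [[Gradient Descent]] cannot get stuck in a poor one. The minimiser is unique when the features (plus intercept) are linearly independent. On linearly separable data the unregularised loss has no finite minimiser: the weights can grow without bound while the loss keeps shrinking.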
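As a rough sketch of how this loss is minimised, the snippet below runs batch gradient descent using the gradient $\frac{1}{n} \sum_i (\hat p_i - y_i)\, x_i$ for $w$ and $\frac{1}{n} \sum_i (\hat p_i - y_i)$ for $b$. The toy data, the learning rate, the step count, and the names `bce_loss` and `fit` are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def bce_loss(w, b, X, y, eps=1e-12):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
        p = min(max(p, eps), 1.0 - eps)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return -total / len(X)

def fit(X, y, lr=0.5, steps=500):
    """Batch gradient descent starting from zero weights."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Hypothetical one-feature data set that is NOT linearly separable,
# so the loss has a finite minimiser.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 1, 0, 1]
w, b = fit(X, y)
```

At zero weights every prediction is 0.5 and the loss equals $\log 2$; after fitting, `bce_loss(w, b, X, y)` should be lower than that starting value.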
## Multi-class Extension
**Softmax regression** (multinomial logistic):
$$
P(y = c \mid x) = \frac{e^{w_c^\top x}}{\sum_{c'} e^{w_{c'}^\top x}}
$$
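Loss = categorical cross-entropy. Bias terms are omitted in the formula; they can be absorbed by appending a constant 1 to $x$. The output layer of most classification neural networks is softmax regression applied to the features learned by the preceding layers.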
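A small sketch of the softmax computation follows, assuming one weight vector per class stored as the rows of a hypothetical list `W`. Subtracting the maximum score before exponentiating is a common stability trick and does not change the result. The names `softmax`, `softmax_proba`, and `cross_entropy` are illustrative.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)  # shift by the max so exp never overflows
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_proba(W, x):
    """Class probabilities for feature vector x; W[c] is the weight vector of class c."""
    scores = [sum(wj * xj for wj, xj in zip(w_c, x)) for w_c in W]
    return softmax(scores)

def cross_entropy(probs, label):
    """Categorical cross-entropy for one example with integer class label."""
    return -math.log(probs[label])
```

With two classes and scores $(z, 0)$, the first softmax output equals $\sigma(z)$, so binary logistic regression is the two-class special case.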
## Strengths
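- Outputs **calibrated probabilities** when the model is well specified, because cross-entropy is a proper scoring rule; heavy regularisation or class reweighting can distort them.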
- Coefficients are **interpretable** as log-odds contributions.
- Trains fast; scales to very large datasets.
- Convex optimisation → no spurious local minima (every local minimum is global).
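
To make the log-odds reading of the coefficients concrete, here is a short derivation from the model definition. The value $w_j = 0.7$ is an arbitrary illustration, not taken from any fitted model.

$$
\log \frac{P(y = 1 \mid x)}{1 - P(y = 1 \mid x)} = w^\top x + b
$$

Increasing feature $x_j$ by one unit, with the other features held fixed, adds $w_j$ to the log-odds and multiplies the odds by $e^{w_j}$. For example, $w_j = 0.7$ gives $e^{0.7} \approx 2.01$, so the odds roughly double.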
## Weaknesses
- **Linear decision boundary** in the input space — can't capture non-linear patterns without feature engineering or kernel tricks.
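- **Sensitive to feature scaling and outliers:** regularisation penalties and gradient-based solvers depend on the scale of each feature, and extreme feature values can pull the decision boundary.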
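
The linear-boundary limitation can be illustrated with XOR, a standard example that is not linearly separable. The sketch below searches a coarse grid of linear weights (the grid is an arbitrary choice and only a demonstration, not a proof) and compares it against hand-picked weights on an engineered product feature $x_1 x_2$. All names here are hypothetical.

```python
from itertools import product

# XOR truth table: (features, label)
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def n_correct(w, b, featurise):
    """Count XOR points classified correctly by the rule w . f(x) + b > 0."""
    correct = 0
    for x, y in XOR:
        z = sum(wj * fj for wj, fj in zip(w, featurise(x))) + b
        correct += int((1 if z > 0 else 0) == y)
    return correct

def raw(x):
    """Original two features only."""
    return [x[0], x[1]]

def with_product(x):
    """Original features plus the engineered interaction x1 * x2."""
    return [x[0], x[1], x[0] * x[1]]

# Try every (w1, w2, b) on a grid from -5 to 5 in steps of 0.5.
grid = [v / 2 for v in range(-10, 11)]
best_linear = max(
    n_correct([w1, w2], b, raw) for w1, w2, b in product(grid, repeat=3)
)

# Hand-picked weights on the expanded features separate all four points.
expanded = n_correct([1.0, 1.0, -2.0], -0.5, with_product)
```

No setting on the grid gets more than three of the four points right with the raw features, while the single added interaction term is enough to classify all four.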
## Regularised Variants
- **L2-regularised** (default in scikit-learn): adds the same squared-norm penalty as Ridge regression, applied to the cross-entropy loss; shrinks coefficients towards zero.
- **L1-regularised**: sparse coefficients; implicit feature selection.
- **Elastic Net**: a weighted combination of the L1 and L2 penalties.
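
The three penalties can be written as small functions that are added to the cross-entropy loss. This is a sketch under assumed conventions: the strength `lam`, the mixing weight `l1_ratio`, and the factor of one half on the L2 term are illustrative choices, and libraries parameterise these differently. The bias term is usually left unpenalised.

```python
def l2_penalty(w, lam):
    """Squared-norm penalty: (lam / 2) * sum of w_j^2."""
    return 0.5 * lam * sum(wj * wj for wj in w)

def l1_penalty(w, lam):
    """Absolute-value penalty: lam * sum of |w_j|; encourages exact zeros."""
    return lam * sum(abs(wj) for wj in w)

def elastic_net_penalty(w, lam, l1_ratio):
    """Mix of the two: l1_ratio = 1 is pure L1, l1_ratio = 0 is pure L2."""
    return l1_ratio * l1_penalty(w, lam) + (1.0 - l1_ratio) * l2_penalty(w, lam)
```

The regularised objective is then the cross-entropy loss plus one of these terms evaluated on $w$.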
## When to Use
- **Baseline classifier** for any tabular task.
- **High interpretability requirements.**
- **Probability outputs needed** (rather than just labels).
- **Massive datasets** where complex models are too slow.
## Related
- [[Loss Functions]]
- [[L2 Regularization]]
- [[Sigmoid Neuron]]
- [[Linear Regression]]