Logistic Regression - Albert Masoliver's learning site

## Definition

Despite its name, **logistic regression** is a linear classifier, not a regression method. It models the probability of a binary outcome by applying the sigmoid (logistic) function to a linear combination of the features.

## Model

$$
P(y = 1 \mid x) = \sigma(w^\top x + b) = \frac{1}{1 + e^{-(w^\top x + b)}}
$$

Decision rule: predict the positive class if $P(y = 1 \mid x) > 0.5$ (or any other threshold suited to the use case).

## Loss

Negative log-likelihood (binary cross-entropy), where $\hat p_i = \sigma(w^\top x_i + b)$:

$$
\mathcal{L}(w, b) = -\frac{1}{n} \sum_{i=1}^n \left[ y_i \log \hat p_i + (1 - y_i) \log(1 - \hat p_i) \right]
$$

Convex in $(w, b)$ → every local minimum is global, so [[Gradient Descent]] cannot get stuck in a poor one. The minimiser is unique when the loss is strictly convex (for example with L2 regularisation). On linearly separable data the unregularised loss has no finite minimiser, because the weights can grow without bound.

## Multi-class Extension

**Softmax regression** (multinomial logistic), with one weight vector $w_c$ per class and any bias absorbed into $w_c$:

$$
P(y = c \mid x) = \frac{e^{w_c^\top x}}{\sum_{c'} e^{w_{c'}^\top x}}
$$

Loss = categorical cross-entropy. The output layer of most classification neural networks is exactly softmax regression applied to the features learned by the preceding layers.

## Strengths

- Outputs **calibrated probabilities** (with proper training).
- Coefficients are **interpretable** as log-odds contributions: a one-unit increase in feature $x_j$ changes the log-odds by $w_j$.
- Trains fast; scales to very large datasets.
- Convex optimisation → no spurious local minima.

## Weaknesses

- **Linear decision boundary** in the input space: it can't capture non-linear patterns without feature engineering or kernel tricks.
- **Sensitive to feature scaling and outliers.** Scaling matters once a regularisation penalty is applied or a gradient-based solver is used.

## Regularised Variants

- **L2-regularised** (default in scikit-learn): the classification counterpart of Ridge regression; shrinks coefficients towards zero.
- **L1-regularised**: sparse coefficients; implicit feature selection.
- **Elastic Net**: a weighted combination of the L1 and L2 penalties.

## When to Use

- **Baseline classifier** for any tabular task.
- **High interpretability requirements.**
- **Probability outputs needed** (rather than just labels).
- **Massive datasets** where complex models are too slow.

## Related

- [[Loss Functions]]
- [[L2 Regularization]]
- [[Sigmoid Neuron]]
- [[Linear Regression]]
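
## Worked Example

The model, loss and training procedure described in this note can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the function names (`sigmoid`, `bce_loss`, `fit_logistic`), the learning rate, the step count and the toy data (two overlapping Gaussian blobs) are all hypothetical choices made for the example. It uses plain batch gradient descent, relying on the fact that the gradient of the binary cross-entropy with respect to $w$ is $\frac{1}{n} X^\top (\hat p - y)$.

```python
import numpy as np

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(w, b, X, y, eps=1e-12):
    """Mean binary cross-entropy; eps guards against log(0)."""
    p = np.clip(sigmoid(X @ w + b), eps, 1.0 - eps)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def fit_logistic(X, y, lr=0.1, n_steps=500, l2=0.0):
    """Batch gradient descent on the (optionally L2-penalised) loss."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_steps):
        residual = sigmoid(X @ w + b) - y        # p_hat - y, shape (n,)
        w -= lr * (X.T @ residual / n + l2 * w)  # gradient w.r.t. w (+ L2 term)
        b -= lr * residual.mean()                # gradient w.r.t. b
    return w, b

# Assumed toy data: two overlapping Gaussian blobs in 2-D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (100, 2)),
               rng.normal(1.0, 1.0, (100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])

w, b = fit_logistic(X, y)
probs = sigmoid(X @ w + b)            # P(y = 1 | x) for each row
preds = (probs > 0.5).astype(int)     # decision rule with threshold 0.5

print("loss before:", bce_loss(np.zeros(2), 0.0, X, y))
print("loss after: ", bce_loss(w, b, X, y))
print("train accuracy:", (preds == y).mean())
```

With all-zero weights every prediction is 0.5, so the starting loss is $\log 2 \approx 0.693$; training should drive it below that value. Passing a positive `l2` adds the penalty $\frac{\lambda}{2}\lVert w \rVert^2$ to the loss, which is one way to keep the weights finite on linearly separable data.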