## Definition
A **Graph Neural Network (GNN)** is a neural architecture that processes graph-structured data — nodes connected by edges, possibly with attributes on both. Generalises convolution (which exploits spatial grid structure) to arbitrary graphs.
## The Core Idea: Message Passing
Each node's representation is iteratively updated by *aggregating messages from its neighbours*:
$
h_v^{(\ell+1)} = \text{UPDATE}\left( h_v^{(\ell)}, \text{AGGREGATE}\left( \{h_u^{(\ell)} : u \in \mathcal{N}(v)\} \right) \right)
$
- **AGGREGATE** — combine neighbour states (sum, mean, max, or learned).
- **UPDATE** — combine the current state with aggregated neighbour info (typically MLP).
After $K$ layers, each node has integrated information from its $K$-hop neighbourhood.
## Major Variants
### Graph Convolutional Network (GCN) — Kipf & Welling, 2017
Normalised mean aggregation:
$
h^{(\ell+1)} = \sigma\left( \tilde D^{-1/2} \tilde A \tilde D^{-1/2} h^{(\ell)} W^{(\ell)} \right)
$
where $\tilde A = A + I$ (add self-loops) and $\tilde D$ is its degree matrix. Simple and effective.
### GraphSAGE — Hamilton et al., 2017
Sample a fixed-size neighbourhood; aggregate via mean, LSTM, or pooling. Scales to very large graphs.
### Graph Attention Network (GAT) — Velickovic et al., 2018
Attention-weighted aggregation: each neighbour's contribution is weighted by an attention score. Often outperforms uniform aggregation when edges have varying relevance.
### Message Passing Neural Networks (MPNN)
General framework subsuming many specific GNNs (Gilmer et al., 2017).
## Tasks
- **Node classification.** Predict a label per node (citation networks, fraud detection).
- **Link prediction.** Predict whether an edge should exist (recommendation, knowledge graphs).
- **Graph classification.** Predict a label for the entire graph (molecular property prediction).
- **Graph generation.** Generate new graphs (drug discovery).
## Strengths
- **Inductive bias** matched to graph data.
- **Inherently permutation-invariant** — doesn't depend on node ordering.
- **Handles variable-size graphs.**
## Limitations
- **Over-smoothing.** Deep GNNs make all node representations converge — features wash out.
- **Limited receptive field per layer** — many layers needed for long-range dependencies.
- **Expressivity bounded by Weisfeiler-Lehman test** (Xu et al., 2019) — some graphs cannot be distinguished by GNNs.
- **Scalability** to graphs with billions of nodes is non-trivial.
## Notable Applications
- **Molecular property prediction** (DeepMind's AlphaFold builds on GNN-like ideas).
- **Recommendation systems** (Pinterest's PinSage, social-network embeddings).
- **Drug discovery** and protein design.
- **Traffic forecasting** (Google Maps).
- **Knowledge graph completion.**
## Connection to Transformers
A transformer's attention is *graph attention on a fully-connected graph*. The transformer-vs-GNN debate is partially a question of whether explicit graph structure helps when attention can dynamically discover relevance.
## Related
- [[Convolutional Neural Network]]
- [[Transformer Architecture]]
- [[Attention Mechanism]]