Graph Neural Network - Albert Masoliver's learning site

## Definition A **Graph Neural Network (GNN)** is a neural architecture that processes graph-structured data — nodes connected by edges, possibly with attributes on both. Generalises convolution (which exploits spatial grid structure) to arbitrary graphs. ## The Core Idea: Message Passing Each node's representation is iteratively updated by *aggregating messages from its neighbours*: $ h_v^{(\ell+1)} = \text{UPDATE}\left( h_v^{(\ell)}, \text{AGGREGATE}\left( \{h_u^{(\ell)} : u \in \mathcal{N}(v)\} \right) \right) $ - **AGGREGATE** — combine neighbour states (sum, mean, max, or learned). - **UPDATE** — combine the current state with aggregated neighbour info (typically MLP). After $K$ layers, each node has integrated information from its $K$-hop neighbourhood. ## Major Variants ### Graph Convolutional Network (GCN) — Kipf & Welling, 2017 Normalised mean aggregation: $ h^{(\ell+1)} = \sigma\left( \tilde D^{-1/2} \tilde A \tilde D^{-1/2} h^{(\ell)} W^{(\ell)} \right) $ where $\tilde A = A + I$ (add self-loops) and $\tilde D$ is its degree matrix. Simple and effective. ### GraphSAGE — Hamilton et al., 2017 Sample a fixed-size neighbourhood; aggregate via mean, LSTM, or pooling. Scales to very large graphs. ### Graph Attention Network (GAT) — Velickovic et al., 2018 Attention-weighted aggregation: each neighbour's contribution is weighted by an attention score. Often outperforms uniform aggregation when edges have varying relevance. ### Message Passing Neural Networks (MPNN) General framework subsuming many specific GNNs (Gilmer et al., 2017). ## Tasks - **Node classification.** Predict a label per node (citation networks, fraud detection). - **Link prediction.** Predict whether an edge should exist (recommendation, knowledge graphs). - **Graph classification.** Predict a label for the entire graph (molecular property prediction). - **Graph generation.** Generate new graphs (drug discovery). ## Strengths - **Inductive bias** matched to graph data. - **Inherently permutation-invariant** — doesn't depend on node ordering. - **Handles variable-size graphs.** ## Limitations - **Over-smoothing.** Deep GNNs make all node representations converge — features wash out. - **Limited receptive field per layer** — many layers needed for long-range dependencies. - **Expressivity bounded by Weisfeiler-Lehman test** (Xu et al., 2019) — some graphs cannot be distinguished by GNNs. - **Scalability** to graphs with billions of nodes is non-trivial. ## Notable Applications - **Molecular property prediction** (DeepMind's AlphaFold builds on GNN-like ideas). - **Recommendation systems** (Pinterest's PinSage, social-network embeddings). - **Drug discovery** and protein design. - **Traffic forecasting** (Google Maps). - **Knowledge graph completion.** ## Connection to Transformers A transformer's attention is *graph attention on a fully-connected graph*. The transformer-vs-GNN debate is partially a question of whether explicit graph structure helps when attention can dynamically discover relevance. ## Related - [[Convolutional Neural Network]] - [[Transformer Architecture]] - [[Attention Mechanism]]