Hash Join - Albert Masoliver's learning site

# Hash Join: Efficiency with Unordered Tables > [!Note] > `The **Hash Join** operator excels at joining large, **unsorted** tables by building an **in-memory hash table** from the smaller input and **probing** the larger input, eliminating the need for **sorting** while relying on sufficient **memory** to avoid spills.` Hash Join starts with a **build phase**, during which the engine reads the smaller dataset and applies a **hash function** to each join key, populating an in-memory hash table. In the subsequent **probe phase**, it scans the larger dataset, computes the same hash for each row’s join key, and performs lookups against the hash table to identify matching rows. These matches are then **combined** to produce the final joined result. Because this method does not require any pre-ordering of data, it is ideal when **indexes** on join keys are missing or when datasets are received in arbitrary order. The efficiency of Hash Join depends critically on the ratio between the build table’s size and available **memory**; if the hash table grows beyond memory capacity, SQL Server may **spill** intermediate data to disk, incurring additional I/O and potentially degrading performance. In execution plans, this operator appears as **Hash Match**, and the optimizer selects it when cost estimates favor handling large, unordered inputs over Merge Joins or Nested Loops Joins. --- ## ## References - Korotkevitch, D. (2022). _SQL Server advanced troubleshooting and performance tuning: Best practices and techniques_. O’Reilly Media. - Nevarez, B. (2022). _SQL Server query tuning and optimization: Optimize Microsoft SQL Server 2022 queries and applications_. Packt Publishing.