Level 2 · 35 min

Hash Tables: Why O(1) Isn't Free

Hash tables give O(1) average-case lookup, but the constants and the worst case are where production engineers earn their salary. CLRS chapter 11 develops the theory; Cracking the Coding Interview chapter 1 uses HashMaps in nearly every solution. Understand the gap between average and worst.

Hash Function Properties

A good hash function is uniform (output values are evenly distributed) and exhibits the avalanche effect (a 1-bit input change flips half the output bits). CLRS describes the division method (h(k) = k mod m, with m a prime far from a power of 2) and the multiplication method (h(k) = floor(m·(k·A mod 1)) with A irrational, like the golden ratio). Bad hash functions cluster keys into buckets and degrade lookup to O(n). Java''s String.hashCode() uses 31·s[0] + 31^(n-1)·... — the prime 31 was chosen for cheap shift-subtract multiplication and reasonable bit dispersion. For untrusted keys (HTTP headers, JSON keys), use a randomized hash (SipHash) to defeat algorithmic complexity attacks.

Collision Handling

Two strategies: chaining (each bucket holds a linked list/tree of entries) and open addressing (probe other buckets on collision). Java HashMap uses chaining with a linked list, switching to a red-black tree at chain length 8 (treeify threshold) to bound worst-case lookup to O(log n) — Java 8+ guards against hash flooding. Open addressing variants: linear probing (cache-friendly but suffers from primary clustering), quadratic probing (bounded but secondary clustering), double hashing (uniform but two hash function evaluations per probe). CLRS Theorem 11.6: under uniform hashing assumption with load factor alpha, expected number of probes for unsuccessful search is at most 1/(1-alpha). At alpha=0.9 that is 10 probes; at alpha=0.99 it is 100. Resize before you saturate.

Load Factor and Resize

Load factor alpha = n/capacity governs collision probability. Java HashMap resizes at alpha=0.75 (default), doubling capacity to maintain alpha bounds. Resize is O(n) — every entry rehashes — but amortized O(1) per insert by aggregate analysis. Two production pitfalls: (1) pre-size when you know the count: new HashMap'<''>'((int) (expectedSize / 0.75) + 1) avoids 5–6 resizes during bulk load; (2) iteration order is not insertion order in HashMap (use LinkedHashMap if you need deterministic iteration). Worst-case Java HashMap insert is O(log n) since Java 8 due to tree promotion; before Java 8 it was O(n) and exploitable as a DoS vector — the famous 2011 hash flood attack against PHP / Java / Tomcat.

Key Takeaways

Hash tables are O(1) average but O(n) worst case for chaining (O(log n) in modern Java due to tree promotion).
Pre-size to avoid the 5–6 resizes during bulk load. Use new HashMap(expectedSize / 0.75 + 1).
Untrusted keys need randomized hashing (SipHash). Otherwise an attacker can craft collisions and DoS your server.

Code example

// Java — pre-sizing and iteration order pitfalls

// BAD: resizes 5+ times during a 1M-row load
Map<String, Long> counts = new HashMap<'>();
for (Row r : oneMillionRows) counts.merge(r.key, 1L, Long::sum);

// GOOD: pre-sized, single allocation
Map<String, Long> countsFast = new HashMap<'>((int) (1_000_000 / 0.75) + 1);
for (Row r : oneMillionRows) countsFast.merge(r.key, 1L, Long::sum);

// LinkedHashMap with access-order is an O(1) LRU cache base
Map<String, byte[]> lru = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
  @Override
  protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
    return size() > 1024;
  }
};

// Counting collisions to verify hash quality
int[] dist = new int[16];
for (String key : keys) dist[(key.hashCode() '&' 0x7fffffff) % 16]++;
// stddev across dist[] should be small for a healthy hash