Data Structures

On this page

Data structures organize values so that operations on them are cheap. Picking the right structure usually beats micro-optimizing the wrong one.

Overview

Trade-offs are universal: structures fast at lookup are often slow at insertion; ordered structures cost more to maintain than unordered; cache-friendly contiguous layouts beat “asymptotically better” ones at small n.

Linear Structures

  • Array — O(1) index, O(n) insert/delete in middle. Cache friendly.
  • Dynamic array (vector / list) — amortized O(1) append.
  • Linked list — O(1) insert/delete given node; O(n) lookup. Poor cache behavior.
  • Stack — LIFO; push/pop O(1).
  • Queue / Deque — FIFO; ring buffer or linked list.
  • String — usually a specialized array; immutable in many languages.

Trees

  • Binary tree — generic recursive structure.
  • BST — ordered; O(log n) avg, O(n) worst (degenerate).
  • Balanced BST — AVL, Red-Black; guaranteed O(log n).
  • B-tree / B+ tree — high fan-out; backbone of databases & filesystems.
  • Heap — priority queue; insert & extract O(log n).
  • Trie — prefix search on strings.
  • Segment / Fenwick tree — range queries.

Hash-Based

  • Hash map / set — expected O(1) lookup, insert, delete.
  • Collision strategies: chaining, open addressing (linear, quadratic, double).
  • Load factor target: ~0.5–0.75 for open addressing, higher for chaining.
  • Bloom filter — probabilistic set; no false negatives.
  • HyperLogLog — cardinality estimation in O(1) memory.

Graphs

  • Representation: adjacency list (sparse) or adjacency matrix (dense).
  • Directed / undirected; weighted / unweighted; cyclic / acyclic.
  • Union-Find (DSU) — near-O(1) connectivity queries.

Choosing One

  • Need ordered iteration → balanced BST or sorted array.
  • Need fastest lookup by key → hash map.
  • Need range queries → B-tree, segment tree.
  • Need FIFO scheduling → deque or priority queue.
  • Need duplicate detection at scale → Bloom filter + DB.
reference page