Data Structures
Data structures organize values so that operations on them are cheap. Picking the right structure usually beats micro-optimizing the wrong one.
Overview
Trade-offs are universal: structures fast at lookup are often slow at insertion; ordered structures cost more to maintain than unordered; cache-friendly contiguous layouts beat “asymptotically better” ones at small n.
Linear Structures
- Array — O(1) index, O(n) insert/delete in middle. Cache friendly.
- Dynamic array (vector / list) — amortized O(1) append.
- Linked list — O(1) insert/delete given node; O(n) lookup. Poor cache behavior.
- Stack — LIFO; push/pop O(1).
- Queue / Deque — FIFO; ring buffer or linked list.
- String — usually a specialized array; immutable in many languages.
Trees
- Binary tree — generic recursive structure.
- BST — ordered; O(log n) avg, O(n) worst (degenerate).
- Balanced BST — AVL, Red-Black; guaranteed O(log n).
- B-tree / B+ tree — high fan-out; backbone of databases & filesystems.
- Heap — priority queue; insert & extract O(log n).
- Trie — prefix search on strings.
- Segment / Fenwick tree — range queries.
Hash-Based
- Hash map / set — expected O(1) lookup, insert, delete.
- Collision strategies: chaining, open addressing (linear, quadratic, double).
- Load factor target: ~0.5–0.75 for open addressing, higher for chaining.
- Bloom filter — probabilistic set; no false negatives.
- HyperLogLog — cardinality estimation in O(1) memory.
Graphs
- Representation: adjacency list (sparse) or adjacency matrix (dense).
- Directed / undirected; weighted / unweighted; cyclic / acyclic.
- Union-Find (DSU) — near-O(1) connectivity queries.
Choosing One
- Need ordered iteration → balanced BST or sorted array.
- Need fastest lookup by key → hash map.
- Need range queries → B-tree, segment tree.
- Need FIFO scheduling → deque or priority queue.
- Need duplicate detection at scale → Bloom filter + DB.