Data Structures
===============

Priority Queues
---------------

A set of items, each with a numerical "key" giving its priority in the queue
(smaller key = higher priority). Useful when we want to track several items to
process, as with a FIFO queue, but we also want to be able to insert
high-priority items that get handled before elements already in the queue.

Operations
^^^^^^^^^^

- Insert: add a new item *x* with a key *k*
- Find-Min: find the item with the least key value
- Delete-Min: remove the item with the least key value
- Delete: remove a particular item *x* from the queue, given a pointer to it
- Decrease-Key: change the key of item *x* to a smaller value *k*
- Merge/Meld/Join (rarely): combine 2 disjoint priority queues

.. note::
    Some textbooks use a higher key as higher priority, rather than lower.
    Formally, this is the distinction between "max-priority" and "min-priority"
    queues.

Implementations
^^^^^^^^^^^^^^^

.. note::
    In the table below, :math:`O(n)` is assumed to be a tight upper bound, i.e.
    :math:`\Theta(n)`, so I don't have to type ``\Theta`` each time.

+----------------------+----------+------------+----------+------------+--------------+
| Implementation       | Insert   | Delete     | Find-Min | Delete-Min | Decrease-Key |
+======================+==========+============+==========+============+==============+
| Unsorted Linked List | O(1)     | O(1)       | O(n)     | O(n)       | O(1)         |
+----------------------+----------+------------+----------+------------+--------------+
| Sorted Linked List   | O(n)     | O(1)       | O(1)     | O(1)       | O(n)         |
+----------------------+----------+------------+----------+------------+--------------+
| Heap (min-heap*)     | O(log n) | O(log n)   | O(1)     | O(log n)   | O(log n)     |
+----------------------+----------+------------+----------+------------+--------------+
| Fibonacci Heap       | O(1)     | O(log n)** | O(1)     | O(log n)** | O(1)**       |
+----------------------+----------+------------+----------+------------+--------------+

\*: A tree where each node's parent has a smaller value than it.

\**: Time averaged over a longer sequence of operations (amortized).

The best implementation to use depends on how the priority queue is being used in
your algorithm. We only discuss the first 3 implementations in this class.

Amortized Runtimes
------------------

Instead of finding worst-case bounds for individual operations, find the worst
case over *any sequence* of *m* operations and average over them. This can yield
better bounds when the worst case for an individual operation can't happen every
time.

Automatically-expanding Arrays
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

e.g. C++ ``vector``, Python ``list``

Allocate a fixed-size array to store a list; when it fills up, allocate an array
of twice the size and copy the old array into it.

Notice that since you sometimes have to copy on an append, the worst-case append
operation could take linear time. Naively, we might then think that doing *n*
appends would take :math:`1+2+3+...+n = \Theta(n^2)` time. But we don't run the
copy operation on every append, so the time for a single append onto a list of
*n* elements is:

.. math::

    T(n) & = \Theta(n) \text{ if } n \text{ is a power of 2} \\
         & = \Theta(1) \text{ otherwise}

Then, doing *n* appends onto an empty list will take time:

.. math::

    \sum_{i=0}^{n-1} T(i) & = \Theta(n) \cdot \Theta(1) + 1 + 2 + 4 + 8 + ... + 2^{\lfloor \log_2 n \rfloor} \\
    & = \Theta(n) \cdot \Theta(1) + \Theta(n) \\
    & = \Theta(n).

Therefore, the *amortized* worst-case runtime of an append operation is
:math:`\Theta(n)/n = \Theta(1)`. (A code sketch of this appears after the next
example.)

.. note::
    :math:`1+2+4+...+2^{\lfloor \log_2 n \rfloor} \leq 2n = \Theta(n)`.
    Note that it's *not* :math:`\Theta(\log n) \cdot \Theta(n)`, because that
    would assume you copy the full *n*-length list on every append.

Incrementing Binary Counter
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Suppose we have a binary counter stored as an array of *k* bits. Incrementing can
cause a chain of carries that forces us to update all the bits, so an increment
takes :math:`\Theta(k)` time in the worst case. But does this mean incrementing
*n* times takes :math:`\Theta(nk)` time? No! It's :math:`O(nk)` but not
:math:`\Omega(nk)`. The least-significant bit flips on every increment, the next
bit on every other increment, the bit after that half as often again, etc. So an
increment on average only needs :math:`1 + 1/2 + 1/4 + ... \leq 2` bit flips, and
the amortized runtime of increment is :math:`\Theta(1)`.
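
To make the expanding-array analysis above concrete, here is a minimal Python
sketch; the ``DoublingArray`` class and its ``copies`` counter are invented for
this illustration, not a real library API.

.. code-block:: python

    class DoublingArray:
        """Hypothetical sketch of an automatically-expanding array."""

        def __init__(self):
            self.capacity = 1
            self.size = 0
            self.data = [None] * self.capacity
            self.copies = 0  # total elements copied so far (for analysis only)

        def append(self, x):
            if self.size == self.capacity:
                # Full: allocate an array of twice the size and copy over.
                self.capacity *= 2
                new_data = [None] * self.capacity
                for i in range(self.size):
                    new_data[i] = self.data[i]
                    self.copies += 1
                self.data = new_data
            self.data[self.size] = x
            self.size += 1

    arr = DoublingArray()
    n = 10**6
    for i in range(n):
        arr.append(i)
    # Copies total 1 + 2 + 4 + ... <= 2n, so copies-per-append is a constant.
    print(arr.copies / n)  # ~1.05 for n = 10**6, consistent with Theta(1) amortized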
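
Similarly, for the binary counter just described, here is a sketch (the
``increment`` helper is my own, written to return the number of bit flips so we
can check the amortized bound empirically):

.. code-block:: python

    def increment(bits):
        """Increment a binary counter (list of bits, LSB first); return # of flips."""
        flips = 0
        i = 0
        # Carry chain: turn trailing 1s into 0s...
        while i < len(bits) and bits[i] == 1:
            bits[i] = 0
            flips += 1
            i += 1
        # ...then set the first 0 bit to 1 (if the counter hasn't overflowed).
        if i < len(bits):
            bits[i] = 1
            flips += 1
        return flips

    k, n = 32, 10**6
    counter = [0] * k
    total_flips = sum(increment(counter) for _ in range(n))
    print(total_flips / n)  # just under 2, consistent with Theta(1) amortized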
Disjoint-Set Data Structures
----------------------------

aka "union-find"

We want to keep track of items which are partitioned into sets, where we can look
up the set an item is in and merge 2 sets.

.. image:: _static/structures1.png
    :width: 500

Operations:

- Find: Given a pointer to an item, find which set it's in (returning a
  representative element or ID #)
- Union: Given 2 items, merge the sets containing the two items into a single set

The most commonly used data structure for this is the *disjoint-set forest*.

**Idea**: Represent each item as a node in a forest, with a parent pointer
indicating its parent node, if any. Each root node represents a set, and all of
its descendants are members of the same set.

.. image:: _static/structures2.png
    :width: 500

We can then implement the operations as follows:

- Find: follow parent pointers until you reach a root, which is the
  representative element of the set containing the item
- Union: use Find to find the roots of each item, then make one root the parent
  of the other

.. image:: _static/structures3.png
    :width: 500

The runtime of Find(*x*) is linear in the depth of *x* in its tree, since we
follow pointers all the way to the root. So we want to keep the trees shallow,
using 2 heuristics:

- Path compression: when we do a Find, change all nodes visited to point directly
  to the root

  .. image:: _static/structures4.png
      :width: 350

- Union by rank: when we do a Union, make the shallower tree a child of the
  deeper one

With both heuristics, the amortized runtime of Find and Union is
:math:`O(\alpha(n))`, where :math:`\alpha` is the inverse Ackermann function and
:math:`\alpha(n) \leq 4` for all *n* less than the number of atoms in the
universe.
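
As an illustration, here is a minimal Python sketch of a disjoint-set forest with
both heuristics; the class and method names are mine, and this is one common way
to realize the structure rather than a definitive implementation.

.. code-block:: python

    class DisjointSetForest:
        """Sketch of a disjoint-set forest with path compression and union by rank."""

        def __init__(self, n):
            # Each of the n items starts in its own singleton set.
            self.parent = list(range(n))  # parent[i] == i means i is a root
            self.rank = [0] * n           # upper bound on each tree's height

        def find(self, x):
            # Follow parent pointers to the root, then point every visited
            # node directly at the root (path compression).
            root = x
            while self.parent[root] != root:
                root = self.parent[root]
            while self.parent[x] != root:
                self.parent[x], x = root, self.parent[x]
            return root

        def union(self, x, y):
            # Union by rank: make the shallower tree a child of the deeper one.
            rx, ry = self.find(x), self.find(y)
            if rx == ry:
                return
            if self.rank[rx] < self.rank[ry]:
                rx, ry = ry, rx
            self.parent[ry] = rx
            if self.rank[rx] == self.rank[ry]:
                self.rank[rx] += 1

    dsf = DisjointSetForest(6)
    dsf.union(0, 1)
    dsf.union(2, 3)
    dsf.union(1, 3)
    print(dsf.find(0) == dsf.find(2))  # True: {0, 1, 2, 3} are now one set
    print(dsf.find(4) == dsf.find(5))  # False: 4 and 5 are still singletons

Note that once path compression starts shortening trees, ``rank`` is only an
upper bound on a tree's actual height, which is why the heuristic is called union
by *rank* rather than union by height.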