Randomized Routing

This is a fascinating topic that I learnt about and gave a lecture on as a GSI for CSE 572: Randomness & Computation in Fall 2024. Most of this material is taken from Section 4.2 of Randomized Algorithms, by Rajeev Motwani and Prabhakar Raghavan. The Randomized Routing Algorithm is a great demonstration of why deterministic algorithms can sometimes suffer, and how a simple randomization of a deterministic algorithm helps.

The Routing Problem.

We tackle the problem of communication in a network of parallel processors. We model the problem via a graph \(G = (V,E)\) whose vertices represent processors and whose edges represent communication channels. We consider the permutation routing problem, where each node \(v\) sends a packet \(p_v\) to a unique destination \(d(v)\), as specified by a permutation \(d \colon V \to V\). The only two constraints are that (i) packets may travel only along edges, and (ii) each edge can carry only one packet at a time. A routing scheme specifies the path that each packet takes to reach its destination, with the aim of minimizing the total time until all packets arrive.

We focus on the hypercube graph \(\mathcal{H}_n\), whose vertices \(V = \{0,1\}^n\) are all \(n\)-bit strings and whose edges each flip exactly one bit. That is, \((u,v)\) is an edge iff \(u\) and \(v\) differ in exactly one bit, or equivalently, iff \(u = v \oplus e_i\) for some unit bit vector \(e_i\). We focus on the class of routing schemes known as oblivious schemes, where the path of the packet \(p_v\) depends only on the endpoints \(v\) and \(d(v)\) and not on the other packets. Such schemes are easy to implement in practice. The most natural oblivious scheme one can think of is the bit-fixing algorithm: to route \(v = (v_1, \ldots, v_n)\) to \(u = (u_1, \ldots, u_n)\), we scan the bits from left to right and ‘‘fix’’ the incorrect bits. For example, to route \(0011\) to \(1000\), we follow the path \(0011 \to 1011 \to 1001 \to 1000\).
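To make the bit-fixing scheme concrete, here is a minimal Python sketch; the helper name bit_fixing_path is mine, introduced only for illustration. It returns the sequence of nodes visited when routing one bit string to another.

```python
def bit_fixing_path(src: str, dst: str) -> list[str]:
    """Scan the bits left to right, fix each mismatch, and record every node visited."""
    path, cur = [src], list(src)
    for i in range(len(src)):
        if cur[i] != dst[i]:
            cur[i] = dst[i]            # fixing bit i corresponds to traversing one hypercube edge
            path.append("".join(cur))
    return path

print(bit_fixing_path("0011", "1000"))
# ['0011', '1011', '1001', '1000'], matching the example above
```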

Deterministic Routing is Bad.

Note that over the \(n\)-bit hypercube \(\mathcal{H}_n\), any routing path with the bit-fixing scheme has length at most \(n\). Despite this, the bit-fixing scheme can incur a large overhead, as we show below.

Claim. There exists a permutation \(d \colon \{0,1\}^n \to \{0,1\}^n\) such that the bit-fixing scheme requires at least \(2^{n/2}/n\) time steps to pass all packets.

Proof. W.l.o.g., we assume that \(n\) is even. For any bitstring \(x \in \{0,1\}^n\), we can write \(x = (x_1, x_2)\) where \(x_1, x_2 \in \{0,1\}^{n/2}\). Now, consider any permutation \(d \colon \{0,1\}^n \to \{0,1\}^n\) that maps \((x,0)\) to \((0,x)\) for all \(x \in \{0,1\}^{n/2}\). Since bit-fixing corrects bits from left to right, the route of each packet \(p_v\) with \(v\) of the form \((x,0)\) first zeroes out the first half and only then writes \(x\) into the second half; in particular, it must pass through the all-zeros node. Further, there are \(2^{n/2}\) such packets. Since the all-zeros node has only \(n\) outgoing edges, we need at least \(2^{n/2}/n\) time steps for all of these packets to pass through it. \(\square\)
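As a quick sanity check (not part of the argument above), the following snippet reuses the bit_fixing_path sketch from earlier to verify, for \(n = 8\), that every route of the form \((x,0) \to (0,x)\) indeed passes through the all-zeros node.

```python
from itertools import product

n, half = 8, 4
zeros = "0" * n
for bits in product("01", repeat=half):
    x = "".join(bits)
    src, dst = x + "0" * half, "0" * half + x     # the map (x, 0) -> (0, x)
    assert zeros in bit_fixing_path(src, dst)     # reuses the helper sketched earlier
print("all", 2 ** half, "such routes pass through the all-zeros node")
```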

Although we proved this claim only for the bit-fixing scheme, an analogous lower bound holds for every deterministic oblivious scheme (we of course omit the proof here).

Theorem. For any deterministic oblivious routing scheme on \(\mathcal{H}_n\), there exists a permutation \(d \colon \{0,1\}^n \to \{0,1\}^n\) that requires \(\Omega\!\left(2^{n/2} / \sqrt{n}\right)\) time steps.

Randomization to the Rescue!

Although it seems like all hope is lost, a slight randomization of the bit-fixing scheme works wonders. We consider the two-phase randomized bit-fixing scheme, which works as follows (a short code sketch appears after the list):

  1. For each \(v \in \{0,1\}^n\), pick a uniformly random intermediate node \(\pi(v)\).
  2. Route \(v\) to \(\pi(v)\) using the (deterministic) bit-fixing scheme.
  3. Route \(\pi(v)\) to \(d(v)\) using the (deterministic) bit-fixing scheme.
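Here is a minimal Python sketch of the scheme, using an integer variant of the earlier bit-fixing helper; the names bit_fixing_route and two_phase_routes are mine, chosen for illustration.

```python
import random

def bit_fixing_route(u: int, v: int, n: int) -> list[int]:
    """Nodes visited when routing u -> v, fixing bits left to right (bit n-1 is the leftmost)."""
    route, cur = [u], u
    for i in reversed(range(n)):
        if (cur >> i) & 1 != (v >> i) & 1:
            cur ^= 1 << i
            route.append(cur)
    return route

def two_phase_routes(d: dict[int, int], n: int) -> tuple[list[list[int]], list[list[int]]]:
    """Phase-1 routes v -> pi(v) and phase-2 routes pi(v) -> d(v) for a destination map d."""
    pi = {v: random.randrange(1 << n) for v in range(1 << n)}    # independent, uniform intermediates
    phase1 = [bit_fixing_route(v, pi[v], n) for v in range(1 << n)]
    phase2 = [bit_fixing_route(pi[v], d[v], n) for v in range(1 << n)]
    return phase1, phase2
```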

Astonishingly, this leads to the following result.

Theorem. With high probability, all packets are routed in at most \(14n\) time steps.

We first prove two lemmas that will help us establish this result.

Lemma 1. Let \(v,u,v^{\prime}, u^{\prime} \in \{0,1\}^n\). Let \(P\), \(P^{\prime}\) be the directed paths from \(v\) to \(u\) and \(v^{\prime}\) to \(u^{\prime}\) obtained by the bit-fixing scheme. If \(P\) and \(P^{\prime}\) intersect at some point, continue together for a while, and then separate, then they never re-join.

Proof. The idea is that once the two paths diverge, all subsequent nodes on the two routes differ in at least one bit, and thus the paths never intersect again. Let us denote the two paths as

\[P \colon v = w^{(1)}, w^{(2)}, \ldots, w^{(m)} = u\] \[P^{\prime} \colon v^{\prime} = w^{\prime(1)}, w^{\prime(2)}, \ldots, w^{\prime(m^{\prime})} = u^{\prime}\]

Suppose that \(w^{(i)} = w^{\prime(j)} = w\) but \(w^{(i+1)} \neq w^{\prime(j+1)}\). That is, the two paths overlap until \(w\) but separate on the next step. Since each step fixes a particular bit and the two paths diverge, we must have \(w^{(i+1)} = w \oplus e_a\) and \(w^{\prime(j+1)} = w \oplus e_b\) with \(a \neq b\) (in words, the next step fixes different bits in the two strings). Without loss of generality, we assume that \(a < b\). Since bit-fixing scans the bits from left to right, neither path touches the \(a^{\text{th}}\) bit after this point: we have \(w_a^{(l)} = w_a^{(i+1)} = w_a \oplus 1\) for all \(l \geq i+1\), and \(w_a^{\prime(l)} = w_a^{\prime(j+1)} = w_a\) for all \(l \geq j+1\). Thus, the nodes of the two paths after separating always differ in the \(a^{\text{th}}\) bit and hence never coincide again. \(\square\)
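Lemma 1 can also be spot-checked empirically. The sketch below reuses the hypothetical bit_fixing_route helper from earlier: it draws random pairs of bit-fixing paths and asserts that whenever two paths meet and then separate, they share no node afterwards.

```python
import random

def never_rejoin(p: list[int], q: list[int]) -> bool:
    """Check Lemma 1 for one pair of paths: once they separate, they never meet again."""
    for i in range(len(p) - 1):
        for j in range(len(q) - 1):
            if p[i] == q[j] and p[i + 1] != q[j + 1]:     # the paths meet at p[i] and then separate
                if set(p[i + 1:]) & set(q[j + 1:]):       # ...yet share some node afterwards
                    return False
    return True

random.seed(0)
n = 8
for _ in range(1000):
    p = bit_fixing_route(random.randrange(1 << n), random.randrange(1 << n), n)
    q = bit_fixing_route(random.randrange(1 << n), random.randrange(1 << n), n)
    assert never_rejoin(p, q)
print("no pair of bit-fixing paths re-joined after separating")
```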

Lemma 2. Fix any node \(v\). Let the route from \(v\) to \(\pi(v)\) follow the sequence of edges \((e_1, \ldots, e_k)\). Let \(S\) be the set of all other nodes \(v^{\prime} \in \{0,1\}^n\) whose routes pass through at least one edge in \(\{e_1, \ldots, e_k\}\). Then the delay incurred by \(p_v\) is at most \(\lvert S \rvert\).

Proof. If a packet is ready to traverse edge \(e_j\) at time \(t\), we define its lag as \(t - j\) (since it would have traversed this edge at the \(j^{\text{th}}\) step had there been no congestion). The lag of \(p_v\) is initially zero, and the total delay it incurs is its lag when it traverses edge \(e_k\). The idea is that every time our packet \(p_v\) is delayed, we can ‘‘charge’’ the delay to a distinct member of \(S\). Indeed, if \(p_v\) is delayed at some time step, then some other packet must traverse the edge at which \(p_v\) is held up, and this packet is a member of \(S\) by definition. Further, Lemma 1 tells us that once a charged member of \(S\) separates from the route of \(p_v\), its path never intersects that route again, so we never end up charging it twice. Thus, the total delay incurred by \(p_v\) can be at most \(\lvert S \rvert\). \(\square\)

We are now in a position to prove our main theorem. Fix a packet \(i\) and let \(\rho_i\) denote its route in the first phase. Let \(H_{ij}\) denote the indicator random variable that is \(1\) if the routes \(\rho_i\) and \(\rho_j\) share at least one edge, and \(0\) otherwise. From Lemma 2, it follows that the total delay incurred by packet \(i\) is at most \(\sum_j H_{ij}\) (where the sum runs over \(j \neq i\)). One crucial observation is that, for a fixed route \(\rho_i\), each \(H_{ij}\) depends only on the intermediate node \(\pi(j)\), and these are chosen independently at random; hence the \(H_{ij}\)'s are independent. We will use a Chernoff bound to control \(\sum_j H_{ij}\), for which we first upper bound \(\mathbb{E} \left[ \sum_j H_{ij} \right]\).

For any edge \(e\), let \(T(e)\) denote the number of routes that pass through \(e\). Fix any route \(\rho_i = (e_1, \ldots, e_k)\). We clearly have

\[\sum_j H_{ij} \leq \sum_{l=1}^k T(e_l) \implies \mathbb{E} \left[ \sum_j H_{ij} \right] \leq \mathbb{E} \left[ \sum_{l=1}^k T(e_l) \right].\]

Further, we have \(\mathbb{E}[T(e_l)] = \mathbb{E}[T(e_m)]\) for any two edges \(e_l, e_m\) by symmetry. Note further that the expected length of each route \(\rho_j\) is exactly \(n/2\): since \(\pi(j)\) is uniformly random, each of the \(n\) bits needs to be fixed with probability \(1/2\). Hence the expected total length, summed over all packets, is \(2^n \cdot n/2\). The number of (directed) edges in the hypercube is \(2^n \cdot n\), and thus \(\mathbb{E}[T(e)] = 1/2\) for all edges \(e\). Thus, we have

\[\mathbb{E} \left[ \sum_j H_{ij} \right] \leq \frac{k}{2} \leq \frac{n}{2}.\]
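As a quick numerical illustration (not part of the proof), one can sample the intermediate nodes and count, for a fixed packet, how many other routes share an edge with it, reusing the bit_fixing_route sketch from earlier; the observed value is typically around \(n/2\) and far below \(6n\).

```python
import random

def edge_set(route: list[int]) -> set[tuple[int, int]]:
    """Directed edges of a route given as a node sequence."""
    return set(zip(route, route[1:]))

n, N = 10, 1 << 10
pi = [random.randrange(N) for _ in range(N)]                  # uniform intermediate nodes
routes = [bit_fixing_route(v, pi[v], n) for v in range(N)]    # phase-1 routes
E0 = edge_set(routes[0])                                      # fix packet i = 0
overlap = sum(1 for j in range(1, N) if E0 & edge_set(routes[j]))
print(f"sum_j H_0j = {overlap}; the analysis bounds its mean by n/2 = {n / 2}")
```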

Since the \(H_{ij}\)'s are independent indicator variables, we may apply the Chernoff bound \(\Pr[X \geq (1+\delta)\mu] \leq \exp\left(-\frac{\delta^2 \mu}{2+\delta}\right)\) (which remains valid when \(\mu\) is an upper bound on the mean) with \(\mu = n/2\) and \(\delta = 11\), which gives us

\[\Pr \left[ \sum_j H_{ij} \geq 6n \right] = \Pr \left[ \sum_j H_{ij} \geq (1 + 11) \cdot \frac{n}{2} \right] \leq \exp \left( -\frac{11^2}{2 + 11} \cdot \frac{n}{2} \right) = e^{-121n/26} < e^{-4n}.\]

Thus, with probability at least \(1 - e^{-4n}\), we have \(\sum_j H_{ij} \leq 6n\), so that the delay incurred by packet \(i\) is at most \(6n\). Union-bounding over all \(2^n\) packets, the probability that some packet experiences a delay of more than \(6n\) is at most \(2^n \cdot e^{-4n} < 2^{-3n}\). Since the length of each route is at most \(n\), each packet reaches its intermediate node in at most \(7n\) steps with very high probability. The second phase, from the intermediate nodes to the destination nodes, goes through using the same analysis (and can actually be seen as a time-reflected version of what we just did); to keep the two phases from interfering, we may simply have every packet wait until step \(7n\) before beginning its second phase. Thus, putting together both phases, we have with very high probability that each packet reaches its destination in at most \(14n\) steps. \(\square\)
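To tie everything together, here is a hedged end-to-end sketch that simulates both phases synchronously under one simple greedy schedule (at most one packet per directed edge per step, ties broken arbitrarily) and compares the total time against the \(14n\) bound. It reuses the two_phase_routes sketch from earlier; simulate_phase is my own illustrative helper.

```python
import random

def simulate_phase(routes: list[list[int]]) -> int:
    """Greedy synchronous routing: each directed edge carries at most one packet per time step."""
    crossed = [0] * len(routes)              # number of edges each packet has traversed so far
    steps = 0
    while any(crossed[p] < len(r) - 1 for p, r in enumerate(routes)):
        wants: dict[tuple[int, int], list[int]] = {}
        for p, r in enumerate(routes):
            if crossed[p] < len(r) - 1:      # packet p waits at node r[crossed[p]] for its next edge
                wants.setdefault((r[crossed[p]], r[crossed[p] + 1]), []).append(p)
        for waiting in wants.values():
            crossed[waiting[0]] += 1         # exactly one of the waiting packets crosses the edge
        steps += 1
    return steps

random.seed(1)
n = 8
dest = list(range(1 << n))
random.shuffle(dest)                                           # a random permutation d
phase1, phase2 = two_phase_routes({v: dest[v] for v in range(1 << n)}, n)
total = simulate_phase(phase1) + simulate_phase(phase2)        # phase 2 starts after phase 1 ends
print(f"all packets delivered in {total} steps (theorem's bound: {14 * n})")
```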