Posted by: Silvio Capobianco | 2015/03/12

## More Evidence that Regularity Is Not Preserved Up to Infinity

Let $\mathcal{A} = \langle Q, \mathcal{N}, f \rangle$ be a one-dimensional cellular automaton and let $F = F_\mathcal{A}$ be its global transition function. The very construction of $F$ shows that it is continuous in the product topology, as $F(c)$ coincides with $F(e)$ on $U \subseteq \mathbb{Z}$ as soon as $c$ and $e$ coincide on $U + \mathcal{N}$: and also commutes with translations, as it is easy to see how the translate of the update is precisely the update of the translate. We will discuss this in more detail in a future post.

As the image of a continuous function of a compact metric space into itself, the set $X = F(Q^\mathbb{Z})$ is a closed subset of $Q^\mathbf{Z}$ is a subshift, that is, a closed and translation invariant subset of the full shift $Q^\mathbb{Z}$. As such (again, we will go into further details in another post) $X$ is entirely determinated by its language

$\mathcal{L}(x) = \{ w \in Q^\ast \mid \exists c \in X, k \in \mathbb{Z} : c_{[k:k+|w|-1]} = w \}$

that is, the set of words that appear as “chunks” of elements of $X$. Such language, which we call the language of $\mathcal{A}$, is always regular.

To see this, we suppose $\mathcal{N} = \{ k, \ldots, k+m-1 \}$ for some $k \in \mathbb{Z}$, which we always can: and construct the de Bruijn graph of order $m$ on $Q$. This graph has as its nodes the words of length $m-1$ over $Q$, and as its edges the words of length $m$ over $Q$: an edge $w$ boes from the node $x$ to the node $y$ if and only if there exist $a, b \in Q$ such that $xb = ay = w$: that is, each word of length $m$ joins its own longest proper prefix to its own longest proper suffix. If we label each edge $w = w_1 \cdots w_m$ with $f(w_1, \ldots, w_m)$, and consider all states as both initial and final, we obtain a nondeterministic finite automaton that accepts precisely the language of $\mathcal{A}$.

As a composition of cellular automata is clearly a cellular automaton, we may consider the languages of the iterates of $\mathcal{A}$, which are the languages of the subshifts of the form $F^n(Q^\mathbb{Z})$ for $n \geq 0$. Now, an arbitrary intersection of closed sets is closed: similarly, an arbitrary intersection of translation invariant sets is translation  invariant. This means that the set

$\Omega_\mathcal{A} = \bigcap_{n \geq 0} F^n(Q^\mathbb{Z})$

is a subshift: which we call the limit set of $\mathcal{A}$. Similarly, we call the language $\Lambda_\mathcal{A}$ of $\Omega_\mathcal{A}$ the limit language of $\mathcal{A}$.

The first question that comes to the mind, is whether the limit language is regular or not: after all, regularity of intersection is only ensured for finitely many regular languages at once. Indeed, the answer is negative: a very nice counterexample was constructed by Lyman Hurd and published in 1987 on Complex Systems.

The idea of Hurd’s counterexample is to use signals moving either left or right at unitary speed, and ummovable, elastic walls that make signals bounce if they arrive at the same time from opposite directions, otherwise just stopping them. Conflicts are resolved by having the signals stop if they would end up occupying the same spot or crossing each other.

So, let’s set $Q = \{ \ell, r, \Box, W \}$, meaning signal moving to the left, signal moving to the right, empty slot, and wall, respectively; $\mathcal{N} = \{ -2, -1, 0, +1, +2 \}$, which is needed to correctly manage collisions; and $f$ according to the following lookup table:

$\begin{array}{|l|l|l|} \hline \mathrm{Input} & \mathrm{Output} & \mathrm{Condition} \\ \hline \left( \cdot, r, \Box, x, \cdot \right) & r & x \neq \ell \\ \hline \left( \cdot, \cdot, r, \Box, x \right) & \Box & x \neq \ell \\ \hline \left( \cdot, y, \Box, \ell, \cdot \right) & \ell & y \neq r \\ \hline \left( y, \Box, \ell, \cdot, \cdot \right) & \Box & y \neq r \\ \hline \left( \cdot, \cdot, r, W, \ell \right) & \ell & \\ \hline \left( r, W, \ell, \cdot, \cdot \right) & r & \\ \hline \left( \cdot, \cdot, z, \cdot, \cdot \right) & z & \mathrm{otherwise} \\ \hline \end{array}$

Hurd’s next trick relies on the following observation. Suppose $L$ is an arbitrary language, and $R$ a regular language: as a finite intersection of regular languages is regular, if it happens that $L \cap R$ is not regular, the only possible reason is that $L$ was already not regular in the first place! This is precisely what Hurd does by setting

$R = \Box \Box \ell \Box^\ast W \Box^\ast r \Box \Box$

and showing that

$\Lambda_\mathcal{A} \cap R = \{ \Box \Box \ell \Box^n W \Box^n r \Box \Box \mid n \geq 0 \}$

which clearly does not respect the necessary condition given by the pumping lemma for regular languages.

First, every such word belongs not only to $R$ (as it is evident) but to $\Lambda_\mathcal{A}$ too. Indeed, if we extend the word to a configuration by setting all the other cells as blank, we may obtain an infinitely long backward history by observing that:

• A predecessor of $\cdots \Box \Box \ell \Box^n W \Box^n r \Box \Box \cdots$ for $n > 0$ is $\cdots \Box \Box \ell \Box^{n-1} W \Box^{n-1} r \Box \Box \cdots$
• A predecessor of $\cdots \Box \Box \ell W r \Box \Box \cdots$ is $\cdots \Box \Box r W \ell \Box \Box \cdots$
• A predecessor of $\cdots \Box \Box r \Box^n W \Box^n \ell \Box \Box \cdots$ for $n \geq 0$  is $\cdots \Box \Box r \Box^{n+1} W \Box^{n+1} \ell \Box \Box \cdots$

Next, let $\Box \Box \ell \Box^i W \Box^j r \Box \Box$ be an arbitrary word of $R$ which also belongs to $\Lambda_\mathcal{A}$. Can it be that $i > j$? No, it cannot, because by doing $i-1$ backward steps we necessarily arrive at $\Box \Box \ell \Box W r \Box \Box$, which is an orphan. Why so? Because the only ways to have a situation of the form $W r$, is either as unchanged from the beginning, or immediately after a bounce: but after a bounce, the situation is $\ell W r$, not $\Box W r$. Symmetrically, it cannot be $i < j$.

Can more be said? Indeed, Hurd himself provides an upgraded construction, adding new signals moving left and right at double speed: the limit language of the new cellular automaton is not context free. Finally, as a consequence of computational universality of cellular automata (the reader can easily see how to simulate arbitrary Turing machines by suitable one-dimensional cellular automata) Hurd proves that some limit languages are not recursively enumerable: this last proof is, as it can be expected, nonconstructive. On the other hand, the complement of the limit language, as a union of regular languages, is recursively enumerable.

Bibliography:

1. Lyman P. Hurd. Formal Language Characterizations of Cellular Automaton Limit Sets. Complex Systems 1 (1987), 69–80.