Limit theorems for chains with unbounded variable-length memory satisfying the Cramér condition

We obtain exact asymptotics for large and moderate deviations, a strong law of large numbers and a central limit theorem for chains with unbounded variable-length memory.


Introduction
Let A = {0, 1, . . . , d} be an alphabet, a set of d + 1 symbols (characters). Here we consider a class of chains r = (r_i, i ∈ Z) ∈ A^Z which is a special case of the so-called chains with unbounded variable-length memory. These chains began to be actively studied after they were introduced by Jorma Rissanen [1] as an economical and universal way of data compression. A short and simple introduction to these processes can be found, for example, in [2]. They are used for modeling data in computer science [1], in biology [3], [4], in neurobiology [6], [7], and in linguistics [5]. We do not pretend to give anything like a full review of these chains, and we restrict ourselves to some papers known to us through the activity of the research group NeuroMat led by Prof. A. Galves.

If we interpret Z as discrete time, then one can view such a chain as a successive attribution in time of a character from the alphabet A with a probability that depends on the past (the existing sequence of characters) or, more precisely, on a part of the past, a context. Consequently, such a dependence can be represented by a context tree, where each vertex represents a context and is associated with a probability distribution (on A) of the new character. A Markov chain with state space A is a particular case of these chains: its context tree has height 1, since we need to know only the last character in order to determine the distribution of the new character. A question which naturally arises is the existence of a stationary measure on A^Z compatible with the family of transition probabilities determined by a context tree. This question was answered, and a short review can be found in [2], where methods of statistical inference for context trees were also provided.
In [8] a perfect simulation algorithm for such processes was constructed. The success of the algorithm (whether the perfect simulation stops in finite time) depends directly on the existence of a (finite) renewal time, a moment after which the subsequent attributions of characters do not depend on the past. In the same paper a connection between renewal processes and chains with variable-length memory was established. This suggests that large deviation results for this sort of chain can be obtained using the regeneration structure and the recently published results on large deviations for renewal processes [9].
Although a complete definition and description of chains with unbounded variable-length memory requires the notion of a context tree, in this article we restrict ourselves to an alternative description of the chain, because from the very beginning we consider a particular case of such chains. Let us fix an initial configuration r_i(0), −∞ < i ≤ 0, with values in the alphabet A, which in our case is the binary alphabet A := {0, 1}. We now set the rule by which the configuration changes. Recall that at every time step we write exactly one character from the alphabet A at the end of the existing sequence, without changing that sequence.
In order to define the transition rules, let us first fix a number v ∈ N (one of the parameters of the chain). Consider the set of all words of v characters ending with the character 1; the total number of such words is 2^{v−1}. To each such word we assign its own index j, 1 ≤ j ≤ 2^{v−1}. On this set of words we define a family of positive numbers p_{kj} ∈ (0, 1), k ∈ {0} ∪ N, 1 ≤ j ≤ 2^{v−1}. We are now ready to describe the rules (transition probabilities) of adding a character on the right. Suppose we have a configuration r(n) = {r_i(n)}_{−∞<i≤n} at the n-th step, and let m_n ≤ n denote the position of the rightmost character 1. Then, at the next step, the configuration r(n) jumps to the configuration r(n + 1) = {r_i(n + 1)}_{−∞<i≤n+1} by writing on the right the character 1, r_{n+1}(n + 1) = 1, with probability p_{k_n j_n}, where k_n = n − m_n and j_n is the index of the word formed by the sequence r_{m_n−v+1}(n), r_{m_n−v+2}(n), . . . , r_{m_n}(n).
Accordingly, the character 0, r_{n+1}(n + 1) = 0, is written with probability 1 − p_{k_n j_n}. Note that the previous sequence does not change: r_i(n + 1) = r_i(n) for all i ≤ n. Thus, the probability of the attributed character r_{n+1}(n + 1) depends on 1) the distance to the nearest character 1; 2) the word of v − 1 letters standing immediately to the left of this 1.
It is obvious that the random sequence {r_n(n)} is not a Markov chain, because the transition probability from r_n(n) to r_{n+1}(n + 1) may depend, generally speaking, not only on the character r_n(n), but also on the values r_{n−j}(n − j) for arbitrarily large j ≥ 0.
For the process r(n) defined in this way, let R(n) be the number of units (characters 1) added on the right of the initial configuration r(0) in n steps: R(n) := ∑_{i=1}^{n} r_i(n). We are interested in the behaviour of the process R(n) as n → ∞. In the next section we prove the law of large numbers, the central limit theorem and a local limit theorem, and we also establish the large and moderate deviation principles.
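The transition rule is straightforward to simulate. The sketch below is a minimal illustration, not the authors' construction: the callable `p(k, word)` is a stand-in for the family p_{kj}, and the infinite initial configuration is truncated to a finite list ending with a 1.

```python
import random

def simulate_chain(p, v, history, n_steps, seed=0):
    """Append n_steps characters to `history` following the transition rule:
    a 1 is written with probability p(k, word), where k is the distance to
    the nearest 1 and `word` is the v-character word ending at that 1."""
    rng = random.Random(seed)
    r = list(history)
    for _ in range(n_steps):
        m = max(i for i, c in enumerate(r) if c == 1)  # position of nearest 1
        k = len(r) - 1 - m                             # k_n = n - m_n
        word = tuple(r[m - v + 1:m + 1])               # v characters ending with that 1
        r.append(1 if rng.random() < p(k, word) else 0)
    return r

# R(n): the number of 1s added to the right of the initial configuration
history = [1, 1]                       # a finite stand-in for r(0)
r = simulate_chain(lambda k, w: 0.5, v=2, history=history, n_steps=2000)
R_n = sum(r[len(history):])
```

With the constant stand-in p ≡ 1/2 the appended characters are i.i.d. fair bits, so R(n)/n concentrates near 1/2, in line with the law of large numbers stated below.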

Further we suppose that the following condition [A] holds true. The condition [A] consists of two items.

2. There exist constants 1 > δ_1 > δ_2 > 0 such that for all k ∈ {0} ∪ N, 1 ≤ j ≤ 2^{v−1}, the following inequalities hold: δ_2 ≤ p_{kj} ≤ δ_1.

Condition 1 is an obvious condition for the existence of the process and for the realization of the transition probabilities. Note, however, that this condition can be omitted by adding a probability p_∞ ∈ (0, 1) of attributing the character 1 when the sequence consists only of zeros. Condition 2 gives us the possibility of constructing an arithmetic generalized renewal process which satisfies the Cramér moment condition [C_0] and the arithmeticity condition [Z] (see Section 2).
The paper is organized as follows: in Section 2 we introduce our definitions and notation and state the main result, Theorem 2.2 (the law of large numbers, the central limit theorem, a local limit theorem in the regions of normal, moderate and large deviations, and the moderate and large deviation principles for R(n)); in Section 3 we prove Theorem 2.2; in Section 4 auxiliary lemmas are proved.

Main results, definitions, notations
To formulate and prove the main result we need some auxiliary processes which we define in this section.
To any state r(n) we associate the pair Y(n) := (Y_1(n), Y_2(n)), where Y_1(n) := n − m_n (the distance to the nearest unit or, what is the same, the number of zeros written after the nearest unit) and Y_2(n) := (r_{m_n−v+1}(n), . . . , r_{m_n}(n)) (the nearest unit together with the v − 1 letters to its left). Note that the pair Y(n) = (Y_1(n), Y_2(n)) jumps with probability p_{k_n j_n} to a pair with first coordinate 0 (a unit is written, and Y_2(n + 1) is the word of v characters ending with this new unit), and with probability 1 − p_{k_n j_n} to the pair (Y_1(n) + 1, Y_2(n)) (a zero is written). In this way Y(n + 1) is a random function of Y(n); thus, the sequence {Y(n)}, n ≥ 0, is a homogeneous Markov chain with phase space ({0} ∪ N) × W, where W is the set of the 2^{v−1} words of length v ending with 1. Let us single out the state y_0 := (0, (0, . . . , 0, 1)).
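The update of the pair Y(n) can be written without reference to the full configuration, which makes the Markov property explicit. The sketch below is illustrative; the callable `p(k, word)` again stands in for p_{kj}.

```python
import random

def step_Y(state, p, v, rng):
    """One transition of the Markov chain Y(n) = (Y1, Y2): Y1 is the distance
    to the nearest 1, Y2 the v-character word ending at that 1."""
    y1, y2 = state
    if rng.random() < p(y1, y2):        # a 1 is written with probability p_{kj}
        if y1 >= v - 1:                 # the old word is pushed out entirely,
            new_word = (0,) * (v - 1) + (1,)   # so the chain lands in y0
        else:                           # a suffix of the old word survives
            new_word = y2[-(v - 1 - y1):] + (0,) * y1 + (1,)
        return (0, new_word)
    return (y1 + 1, y2)                 # a 0 is written: the distance grows
```

Note that writing a 1 from any state with y1 ≥ v − 1 lands the chain in y0 = (0, (0, . . . , 0, 1)), which is the one-step accessibility of y0 used in the regeneration argument.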
Note that the chain can jump in one step from any state (y_1, y_2) to the chosen state y_0 if the coordinate y_1 is not less than v − 1. Denote by τ_1 the first time the chain visits the state y_0 and by τ_k, k ≥ 2, the time between the (k − 1)-th and the k-th visits to y_0. Since {Y(n)} is a homogeneous Markov chain, the random variables τ_1, . . . , τ_k, . . . are independent and, moreover, the τ_k are identically distributed for k ≥ 2.
Let ζ_k be the number of units added on the right during the times n ∈ {T_{k−1} + 1, . . . , T_k}, where T_0 := 0 and T_k := τ_1 + · · · + τ_k; in other words, ζ_k := R(T_k) − R(T_{k−1}). By construction the random vectors ξ_k := (τ_k, ζ_k), k ∈ N, are independent, and the ξ_k are identically distributed for k ≥ 2. Let ν(n) := max{k ≥ 0 : T_k ≤ n}. Define the generalized renewal process Z(n) := ∑_{k=1}^{ν(n)} ζ_k. Let the random vector ξ := (τ, ζ) have the distribution coinciding with the distribution of the vectors ξ_k = (τ_k, ζ_k) for k ≥ 2.
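The regeneration structure can be read off a trajectory directly: a renewal occurs at step n exactly when the last v written characters form the word 0, . . . , 0, 1, i.e. Y(n) = y_0. The following sketch (run on a toy hard-coded 0/1 sequence; the splitting itself is the point) extracts the blocks (τ_k, ζ_k) and the value Z(n).

```python
def regeneration_blocks(chars, v):
    """Split the appended 0/1 sequence into regeneration blocks.
    Returns (taus, zetas): block lengths tau_k and numbers of 1s zeta_k."""
    taus, zetas = [], []
    last, ones = 0, 0
    for n, c in enumerate(chars):
        ones += c
        # renewal: the last v characters are 0,...,0,1, i.e. Y(n) = y0
        if n + 1 >= v and chars[n - v + 1:n + 1] == [0] * (v - 1) + [1]:
            taus.append(n + 1 - last)   # tau_k = T_k - T_{k-1}
            zetas.append(ones)          # zeta_k = units in the block
            last, ones = n + 1, 0
    return taus, zetas

chars = [0, 1, 0, 0, 1, 1, 0, 1]        # a toy appended sequence
taus, zetas = regeneration_blocks(chars, v=2)
Z_n = sum(zetas)                        # Z(n) at n = len(chars)
```

Each ζ_k ≤ τ_k by construction, and the partial sums of the τ_k recover the renewal times T_k; this is the decomposition behind the comparison of R(n) with Z(n) in Section 3.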
Since ζ_k ≤ τ_k a.s. for k ∈ N, it follows from Lemma 4.1 (see Section 4) that ξ_1 and ξ satisfy Cramér's condition [C_0]: E e^{γ|ξ_1|} < ∞ and E e^{γ|ξ|} < ∞ for some γ > 0. From Lemma 4.2 (see Section 4) we obtain that the vector ξ satisfies the arithmeticity condition [Z]: for any u ∈ Z² the equality f(2πu) = 1 holds, and for any u ∈ R² \ Z² the inequality |f(2πu)| < 1 holds, where f(u) := E e^{i⟨u,ξ⟩} is the characteristic function of ξ.
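Condition [Z] is easy to visualise on a toy integer-valued distribution for ξ = (τ, ζ) (the atoms and probabilities below are invented purely for illustration): the characteristic function has modulus 1 exactly on the lattice 2πZ².

```python
import cmath
import math

# a toy distribution for xi = (tau, zeta); the atoms are hypothetical
dist = {(2, 1): 0.5, (3, 1): 0.25, (3, 2): 0.25}

def f(u1, u2):
    """Characteristic function f(u) = E exp(i<u, xi>)."""
    return sum(p * cmath.exp(1j * (u1 * t + u2 * z))
               for (t, z), p in dist.items())

on_lattice = abs(f(2 * math.pi, -2 * math.pi))   # u = (1, -1) lies in Z^2
off_lattice = abs(f(math.pi, 0.0))               # u = (1/2, 0) does not
```

For the genuine ξ of the paper, the strict inequality off the lattice is exactly what Lemma 4.2 derives from condition [A].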
Let us now state the main result of our work.
3. (local theorem in the regions of normal, moderate and large deviations) There

3 Proof of Theorem 2.2

P r o o f of statements 1) and 2). Since ζ_k ≤ τ_k a.s. for k ∈ N, the inequality Z(n) ≤ R(n) ≤ Z(n) + τ_{ν(n)+1} holds. Using Lemma 4.4 and the Borel–Cantelli lemma, as n → ∞, it is easy to see that the correction term τ_{ν(n)+1} is negligible. Thus statements 1) and 2) follow from inequality (1) and the corresponding results for Z(n) (see [12], Theorem 11.5).

Note that if T_k = n − l, then the random variable Z_k is uniquely determined by the values of the Markov chain Y(m) for m < n − l, while the random variables R(n) − Z_k and τ_{k+1} depend on the values of the chain Y(m) for m > n − l. Therefore, by this inclusion the following equality holds. Applying (2) and (3) we obtain (4). From equality (4) it then follows that, since P(T_0 = n − l) = 0 for 0 ≤ l ≤ [ln² n], Theorem 2.2 of [9] applies. Since the function ψ_1(λ(α), μ(α)) is continuous in a neighborhood of the point α = a, and the function C_H(θ, α) is continuous in a neighborhood of the point (θ, α) = (1, a), for sufficiently small ∆ and n → ∞ the following equality holds true. Applying Lemma 4.6 (see Section 4) and taking into account that 0 ≤ s ≤ l ≤ [ln² n] and |α_0 − a| < ∆, from equality (6) we obtain that the series converges.
✷

P r o o f of statement 5). From Corollary 3.2 (see [10]) it follows that the sequence of random variables Z̃(n) := (Z(n) − an)/κ satisfies the LDP with normalizing function ϕ(n) = κ²n and rate function I(y).
Using this, the sequences R̃(n) and Z̃(n) are exponentially equivalent; therefore, from Theorem 4.2.13 (see [11]) we obtain that R̃(n) and Z̃(n) satisfy the same LDP. ✷

Auxiliary Results
Lemma 4.1. For any k, n ∈ N the following inequality holds: P(τ_k ≥ n) ≤ C e^{−ρn}, where C > 1 and ρ := (ln C)/v.

P r o o f. Due to the Markov property of the process Y(n), it suffices to prove Lemma 4.1 for τ_1 with an arbitrary initial condition. We fix some initial state Y(0) = (y_1, y_2). Since C > 1 and C e^{−ρn} = C^{1−n/v}, for n ≤ v the right-hand side of inequality (17) is not less than 1, so (17) obviously holds.
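The geometric tail bound of Lemma 4.1 can be sanity-checked by Monte Carlo in the smallest case v = 2 with a constant stand-in p_{kj} ≡ 1/2 (an assumption for illustration only): starting from y_0, τ is the waiting time until the written characters end with the word 0, 1.

```python
import random

def sample_tau(p_one, rng):
    """Time of first return to y0 for v = 2: the first step at which the
    appended characters end with 0, 1 (the configuration at y0 ends with 1)."""
    prev, n = 1, 0
    while True:
        n += 1
        c = 1 if rng.random() < p_one else 0
        if prev == 0 and c == 1:
            return n
        prev = c

rng = random.Random(1)
taus = [sample_tau(0.5, rng) for _ in range(5000)]
mean_tau = sum(taus) / len(taus)
tail_40 = sum(t >= 40 for t in taus) / len(taus)
```

For p ≡ 1/2 one can compute P(τ > n) = (n + 1)/2^n exactly (the binary strings of length n avoiding the pattern 0, 1 are precisely those of the form 1...10...0, and there are n + 1 of them), so E τ = 4 and the empirical tail decays geometrically, as the lemma asserts.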
We now prove inequality (17) for n > v. Since it is obvious that, hence we have P_n := P(τ_1 ≥ n | Y(0) = (y_1, y_2)). Since by condition [A] each cofactor on the right-hand side admits the corresponding upper bound, we obtain the claim. ✷

Lemma 4.2. For any u ∈ Z² the equality f(2πu) = 1 holds true, and for any u ∈ R² \ Z² the inequality |f(2πu)| < 1 holds, where f(u) := E e^{i⟨u,ξ⟩} is the characteristic function of ξ.
P r o o f. Since τ and ζ are integer-valued, it is obvious that for u ∈ Z² the equality f(2πu) = 1 holds true. We show that for any u ∈ R² \ Z² the inequality |f(2πu)| < 1 holds true.

From condition [A] it follows that
Thus, if our hypothesis is true, then there should exist k_1 ∈ Z, k_2 ∈ Z, k_3 ∈ Z such that the following equalities hold true. Divide each equality by 2π. Subtracting the 1st equality from the 2nd, we obtain u_2 = k_2 − k_1 ∈ Z; subtracting the 1st equality from the 3rd, we obtain u_1 = k_3 − k_1 ∈ Z. The resulting contradiction completes the proof. ✷

For a vector (λ̃, μ̃) such that ψ(λ̃, μ̃) = 1, we consider the sequence of random vectors (τ̃_k, ζ̃_k), k ∈ N, whose joint distribution is given as follows. Let γ + λ̃ + μ̃ < ρ; then there exists a constant Ĉ > 0 such that for any n ∈ N the following inequality holds true: E e^{γ τ̃_{ν(n)+1}} < Ĉ n.
P r o o f. Since the random variables τ̃_{k+1} and T̃_k are independent, then, due to the fact that by arithmeticity the inequality T̃_k ≥ n holds a.s. when k ≥ n, using Lemma 4.1 and the inequality τ_k ≥ ζ_k a.s., we obtain the desired bound Ĉ n. ✷

From Lemma 4.1 it follows that
Since the function D(α) is continuous in a neighborhood of the point α = a and D(a) = 0, for sufficiently small ∆ > 0 and α with |α − a| ≤ ∆ the following inequality holds. Denote Z_k := ∑_{l=1}^{k} ζ_l.