ON THE OPTIMAL IMPORTANCE PROCESS FOR PIECEWISE DETERMINISTIC MARKOV PROCESS

. In order to assess the reliability of a complex industrial system by simulation, and in reasonable time, variance reduction methods such as importance sampling can be used. We propose an adaptation of this method for a class of multi-component dynamical systems which are modeled by piecewise deterministic Markovian processes (PDMP). We show how to adapt the importance sampling method to PDMP, by introducing a reference measure on the trajectory space. This reference measure makes it possible to identify the admissible importance processes. Then we derive the characteristics of an optimal importance process, and present a convenient and explicit way to build an importance process based on theses characteristics. A simulation study compares our importance sampling method to the crude Monte-Carlo method on a three-component systems. The variance reduction obtained in the simulation study is quite spectacular.


Introduction
For safety and regulatory issues, nuclear or hydraulic industries must assess the reliability of their power generation systems.To do so, they can resort to probabilistic safety assessment.In recent years, dynamic reliability methods have been gaining interest, because they avoid conservative static approximations of the systems and they better capture the dynamics involved in the systems.When dealing with complex industrial systems, this kind of reliability analysis faces two main challenges: the first challenge is related to the modeling of such complex systems, the second one concerns the quantification of the reliability.Indeed as we refine the model the estimation of the reliability requires more efforts and is often challenging.

A model based on a PDMP
Due to the complexity of the systems the reliability analysis is often done through an event tree analysis [4] which requires static approximations of the system, and relies on conservative approximations.With the development of computational capacities, it is now possible to consider more accurate tools for reliability assessment.
Several attempts have been proposed to better model the dynamical processes involved in the systems.In this article, we focus on the option proposed in [15,29], which consists in modeling the system using a piecewise deterministic Markovian process (PDMP) with boundaries.
In many industrial systems, and in particular in power generation systems, failure corresponds to a physical variable of the system (such as temperature, pressure, water level) entering a critical region.The physical variables can enter this region only if a sufficient number of the basic components of the system are damaged.In order to estimate the reliability we need an accurate model of the trajectories of the physical variables.In industrial systems, the physics of the system is often determined by ordinary or partial differential equations which depend on the statuses of the components within the system (on, off or failed).Therefore the dynamics of the physical variables changes whenever the statuses of the components are altered.Such alteration can be caused by automatic control mechanisms within the systems or failures or repairs.It is also possible that the values of physical variables impact the statuses of the components, because the failure and repair rates of the components depend on the physical conditions.In order to deal with this interplay between the physical variables and the statuses of components, we need to model their joint evolution.The vector gathering these variables is called the state of the system.To address the challenge of modeling the trajectory of the state of the system, we model the evolution of the state of the system by a piecewise deterministic Markovian process (PDMP) with boundaries.PDMPs were introduced by M.H.A Davis in [11,12], they benefit from high modeling capacity, as they are meant to represent the largest class of Markovian processes that do not include diffusion.These processes can easily incorporate component aging, failure on demand, and delays before repairs.
For a given system, we denote its state at time t by Z t = (X t , M t ), where X t is the vector of the values of the physical variables, and M t the vector gathering the statuses of all the components in the system.Throughout the paper we call X t the position of the system, and M t the mode of the system.Z = (Z t ) t∈[0,t f ) represents a trajectory of the state of the system up to a final observation time t f .We consider that the trajectories are all initiated in a state z o .
Recall the system fails when the physical variables enter a critical region.We denote by D the corresponding region of the state space, and we denote by D the set of the trajectories of Z that pass through D. In order to estimate the reliability on the observation time t f , we want to estimate the probability of system failure defined by p = P Z ∈ D|Z 0 = z o = P zo Z ∈ D .

Accelerate reliability assessment by using importance sampling
The second challenge is that the reliability of a complex industrial system can rarely be assessed analytically, so reliability analysis often relies on simulations techniques.The company Électricité de France (EDF) has recently developed the PyCATSHOO toolbox [8,10], which allows the simulation and the modeling of dynamic hybrid systems.PyCATSHOO bases its modeling on PDMPs.Thanks to Monte-Carlo simulation, it evaluates dependability criteria, among which is the reliability of the system.The method we present in this article is used to accelerate the reliability assessment within the PyCATSHOO toolbox.
In the context of reliable systems, crude Monte-Carlo techniques perform poorly because the system failure is a rare event.Indeed, with the Monte-Carlo method, when the probability of failure approaches zero, the number of simulations to get a reasonable precision on the relative error increases dramatically, and so does the computational time.To reduce this computational burden, one option is to reduce the number of simulations needed by using a variance reduction method.Among variance reduction techniques [3,24], we may think of multilevel splitting techniques [5,14] and of importance sampling techniques [2,13,15,30].A variance reduction method, inspired from particle filtering can be used on a particular case of PDMP that is a PDMP whithout boundary [28].Unfortunately the industrial systems are often modeled by a PDMP with boundaries, and other variance reduction methods need to be designed for these cases.We choose to focus on the importance sampling technique, because: (1) the importance sampling strategy that we propose can easily be implemented (in particular in the PyCATSHOO toolbox); (2) the results derived in this paper (in particular the reference measure and the expressions of the densities and likelihood ratios) should be useful to study multilevel splitting.
In this paper we present how to adapt the importance sampling technique for PDMP.By doing so we generalize the use of importance sampling, not only for many power generation systems, but also for any phenomenon that can be modeled by a PDMP.As PDMP generalizes numerous kinds of processes (among which are discrete Markov chains, continuous time Markov chains, compound Poisson processes or queuing systems), the scope of our work goes way beyond the study of power generation systems.

Prerequisite for importance sampling on PDMPs
Remember that we want to apply importance sampling to estimate the probability p = P zo Z ∈ D that the system fails.In our case, importance sampling would consist in simulating from a more fragile system, while weighting the simulation outputs by the appropriate likelihood ratio.The issue is that the random variable we are considering is a trajectory of a PDMP, so we need to clarify what is the density (or the likelihood) for a trajectory of PDMP.Namely we need to introduce a reference measure for PDMP trajectories, and to identify its related densities.
In simple cases of dynamical importance sampling, this issue of the reference measure is often eluded, because the reference measure has an obvious form: it is often a product of Lebesgue measures, or a product of discrete measures.But PDMPs are very degenerate processes, their laws involve hybrid random variables which have continuous and discrete parts.In this context, it is important to ensure that we do have a reference measure that is sigma-finite to define properly the densities and the likelihood ratios.
Suppose ζ is a reference measure for P zo Z ∈ ., we denote by f the density of Z with respect to ζ, and we denote by g the density of an importance process Z with respect to ζ.If ζ exists, and f and g satisfy ∀z ∈ D, f (z) = 0 ⇒ g(z) = 0, then we can write: . (1.1) If Z 1 , . . ., Z Nsim is a sample of independent trajectories simulated according to an importance process with density g, then P zo Z ∈ D can be estimated without bias by: g(Z) < ∞ and the conditions above are verified, we have a central limit theorem on pIS : Thus the use of importance sampling on PDMP trajectories requires the following three conditions: (C1) We have a measure ζ on the trajectory space, and the trajectory Z of the system state has density f with respect to ζ (C2) We are able to simulate trajectories according to an importance process Z which has density g with The existence of a reference measure is an important theoretical argument, but it can also be used to characterize the admissible importance processes.Knowing the reference measure ζ tells us what we can modify in the law of Z to obtain an importance process Z with a well-defined likelihood ratio.It is a valuable information to know to which extent we can modify the density f to get the density g, because the variance σ 2 IS depends on the density g.
It is theoretically possible to design an importance sampling strategy with zero variance, indeed, it suffices to use an importance process with a density In practice, however, we cannot reach this zero variance, as we do not know the value of p.The expression of g * rather serves as a guide to build an efficient and explicit density g.Indeed we can try to choose a density g as close as possible from g * in order to get a strong variance reduction.

Our contributions to the literature
Many authors have used importance sampling on particular cases of PDMP sometimes without noting it was PDMPs, see [19][20][21][22].Sometimes, the authors using PDMPs avoid considering automatic control mechanisms which activate and deactivate components depending on the values of physical variables.Such automatic control mechanisms play an important part in power generation systems, and therefore that can not be avoided in our case.Also, the modeling of control mechanisms implies to work with a special kind of PDMPs, which are the PDMPs with boundaries.These PDMP are typically the kind for which the reference measure is complex.In [22,25], importance sampling is used on PDMP while taking into account automatic control mechanisms but the reference measure is not clearly identified, and so far we have not found a proof that likelihood ratios involved in importance sampling on PDMP are always defined.In Section 3 we provide a reference measure for PDMP trajectories.This allows to define the likelihood ratios for PDMP trajectories and to use the importance sampling method, but also to identify the admissible importance processes.Our major contributions are presented in Section 4 where the characteristics of the optimal importance process are identified, and used to propose a convenient way to build the importance process in practice.Note that the characteristics of the optimal importance process are identified for the general case of PDMP, therefore our result can be generalized to any subclass of the PDMP process, like Markov chains, or continuous time Markov chains, or queuing systems.

Optimization of the variance reduction
Finding the optimal importance process is equivalent to solving the following minimization problem: Minimizing a quantity on a density space being difficult, we usually consider a parametric family of importance densities {g α } and look for a parameter α which yields an estimator with the smallest possible variance.Under favorable circumstances the form of the family can be determined by a large deviation analysis [16,17,27], but the large deviation method is difficult to adapt to PDMP with boundaries which are degenerate processes with state spaces with complicated topologies.Therefore we focus on other methods which rather try to minimize an approximation of the distance between the importance density g and the optimal one g * .For instance, if the approximated distance happens to be D(g, g it is equivalent to minimize the variance of the estimator, and if we consider the Kullback-Leibler divergence so that D(g, g * ) = E g * log g * (Z) , we would be using the Cross-Entropy method [13,30].These two options have been compared on a set of standard cases in [7].They yielded similar results, though results obtained with the Cross-Entropy seemed slightly more stable than with the other option.In [31] the Cross-Entropy method was applied to a model equivalent to a PDMP without boundaries and showed good efficiency.Therefore we choose this method to select the parameters of the importance process in our paper.Of course, the efficiency of this procedure strongly depends on the choice of the parametric family of importance densities.
The rest of the paper is organized as follows.Section 2 introduces our model of multi-component system based on a Piecewise deterministic Markovian process.In Section 3, we introduce a reference measure on the space of the PDMP trajectories and study the admissible importance processes.In Section 4 we present an optimal process and a clever way to build the importance process in practice.In Section 5 we apply our adaptation of the importance sampling technique on a three-component system and compare its efficiency with the Monte-Carlo technique.

State space of the system
We consider a system with N c components and d physical variables.Remember we call position the vector X ∈ R d which represents the physical variables of the system, and we call mode the vector M = (M 1 , M 2 , . . ., M Nc ) gathering the statuses of the N c components.The state of the system Z includes the position and the mode: For ease of the presentation, we consider the status of a component can be ON , or OF F , or out-of-order (noted F ), so that the set of modes is M = {ON, OF F, F } Nc , but as long as M stays countable, it is possible to consider more options for the statuses of the components.For instance, one could consider different regimes of activity instead of the simple status ON , or different types of failure instead of the status F .Note that we can also deal with continuous degradations, like the size of a breach in a pipe for instance: the presence of the degradation can be included in the mode and its size in the position.
Generally, there are some components in the system which are programmed to activate or deactivate when the position crosses some thresholds.For instance, it is typically what happens with a safety valve: when the pressure rises above a safety limit, the valves opens.To take into account these automatic control mechanisms, within a mode m the physical variables are restricted to a set Ω m ⊂ R d , which is assumed open.We set E m = {(x, m), x ∈ Ω m }, so that the state space is:

Flow functions
In a given mode m, i.e. a given combination of statuses of components, the evolution of the position is determined by an ordinary differential equation.We denote by φ m x the solution of that equation initiated in x.If we consider a position state Z t at time t, there exists a random time T > 0 such that ∀s ∈ [0, T ), X t+s = φ Mt Xt (s) and M t+s = M t .For an initial state z ∈ E, we can introduce the flow function Φ z with values in E. Regarding the evolution of the trajectory after a state Z t = (X t , M t ), the next states are locally given by the function Φ Zt : In practice an approximation of the function φ m x can be obtained by using a numerical method solving the ordinary differential equations.For instance the PyCATSHOO toolbox can use, among others, the Runge-Kutta methods up to the fourth order.

Jumps
The trajectory of the state can also evolve by jumping.This typically happens because of control mechanisms, failures, repairs, or natural discontinuities in the physical variables.When such a jump is triggered, the current state moves to another one by changing its mode and/or its position.We denote by E the closure of E, and B(E) the Borelian σ-algebra on E. We define T as the time until the next jump after t, such that the next jump occurs at time t + T .The destination of this jump is determined according to a transition kernel K Z t+T − where Z t+T − ∈ E is the departure state of the jump.This kernel is defined by: where B(•) indicates the Borelians of a set.Let ∀z − ∈ E , ν z − be a σ-finite measure on E, such that K z − << ν z − .∀z − ∈ E , we define K z − as the density of K z − with respect to ν z − , so: The kernel density must satisfy K z − (z − ) = 0, so that even if ν z − has a Dirac point in z − , jumping on the departure state is impossible.Note that with this setting, the law of the arrival state of a jump can depend on the departure point z − .For instance, if the physical variables are all continuous, then the reference measure of the transition kernel ν z − could be defined by: In this example, the jump kernel is discrete: and it is generally the case, but one can imagine some cases where the kernel include a continuous part.For instance consider that the physical variables have two dimensions, the first corresponding to pressure on a concrete structure, and the second to the size of a crack in the structure.One can consider that the crack length increase in a jerky way, and that the amplitude of the increase is random and has a continuous law.For a jump triggered from a state z where µ Leb (•) corresponds to the Lebesgue measure, and with K z − (x 1 , x 2 ), m = 0. We think, that the cases of non-discrete jump kernel should be rather rare in the reliability analysis field, but PDMPs are also used in other fields, like finance, where non-discrete jump kernel could be more common and for which the use of importance sampling can be of interest too [26].That is why we keep the most general form of PDMP, which can handle any type of jump kernel.

Jump times
Now, assuming that Z t = z, we present the law of the time until the next jump after t, which is denoted by T .

Jumps at boundaries
For m ∈ M, let ∂Ω m be the boundary of Ω m .The boundary of the set E m is the set ∂E m = {(x, m), x ∈ ∂Ω m }.For z = (x, m) ∈ E, we define t * z = inf{s > 0, Φ z (s) ∈ ∂E m } the time until the flow hits the boundary.We take the convention t * z = +∞ if {s > 0, Φ z (s) / ∈ E m } = ∅.Assume that the system starts in state z = (x, m) ∈ E. When the flow leads the position out of its restricted set Ω m , i.e. the state touches ∂E m , an automatic jump is triggered (see the scheme in 1), and T = t * z .Boundaries can be used to model automatic control mechanisms, or any automatic change in the status of a component.For instance in a dam, if the water level X reaches a given threshold x max the evacuation valve automatically opens to avoid overflow.If M = C, O, F represent respectively the modes where the valve is closed, or opened, or failed, this control system could be modeled by setting Ω C = (0, x max ) and

Spontaneous jumps
The trajectory can also jump to another state when a random failure or a repair occurs (see Fig. 2).The distribution of the random time at which it happens is usually modeled through a state-related intensity function λ : E → R + .For z ∈ E, λ(z) represents the instantaneous probability (also called hazard rate) of having a failure or a repair at state z.If Z t = z and T is the duration until the next jump, ∀s < T we have Z t+s = Φ z (s).To simplify the notations in the following, we introduce the time-related intensity λ z such that λ z (s) = λ(Φ z (s)) and Λ z (s) = s 0 λ Φ z (u) du.If P z (•) is the probability of an event knowing Z t = z, we have: (2.9) 12 , and t * z = 4.
The law of T has a continuous and a discrete part (see Fig. 3).As there is a discontinuity at t * z in the cumulative distribution function of T , its reference measure T must include a Dirac point at t * z and therefore depends on z.We denote µ z the reference measure of T such that: This measure will be useful to define the dominant measure ζ in Section 3. It also allows to reformulate the law of T with an integral form: (2.11)

A link between jump rate and the hazard rates of the possible transitions
Note that the equation (2.9), or (2.11), gives the time of the next jump, but it does not tell whether the transition is a failure, or a repair, or an automatic control mechanism.The type of the transition triggered is determined by the transition Kernel K Z − t+T .For each jump, we consider that there can be a countable number of possible transitions.Each type of transition is indexed by a number in the countable set J. When Z − t+T ∈ ∂E, K Z − t+T can take an arbitrary form, but when Z − t+T ∈ E, the density of the kernel K Z − t+T is linked to the hazard rates of the possible transitions, as shown in the following.If a transition is indexed by j ∈ J, we denote by T j the time between t and the next occurrence of this transition, taking by convention T j = +∞ if the transition does not occur.This way the time of the next jump satisfies: Let λ j : E → R + be its associated state-related intensity function, such that: For instance, if the transition j corresponds to a failure the function λ j is the associated failure rate, and respectively if the transition j corresponds to a repair, the function λ j is the associated repair rate.Knowing Z t = z t = (x, m), and therefore, knowing the path given by φ m x that the positions are following, we make the assumption that the times T j are independent.This assumption is true if the position gathers all the variables affecting the different types of transitions when the system is in mode m.According to the equation (2.12), this conditional independence implies that: Note the equations (2.14) is only valid when the departure state z − is not on a boundary.We denote by B j z − the possible arrival states of a jump when the transition j is triggered and when the departure state is z − ∈ E.
We assume that the different types of transition are exclusive, meaning that for i = j we have Then the probability of triggering the transition i from the departure state z − is K z − (B i z − ), and we have: Similarly the equation (2.15) is also valid only when the departure state z − is not on a boundary.When z − ∈ ∂E there is no link between λ and K z − .

Generate a trajectory
In order to generate a realization of the PDMP, one can follow these steps [11,12,15]: 1. Start at t = 0 with state Z t = z t 2. Generate T the time until the next jump using (2.9) or (2.11), and (2.14) 3. Follow the flow Φ until T using (2.2) 4. Generate Z t+T = z t+T the arrival state of the jump knowing the departure state is Z t+T − = Φ z (T ) using (2.4) 5. Taking t := t + T , repeat steps 1 to 5 until a trajectory of size t f is obtained

Example
As an example of a system, we consider a room heated by three identical heaters.X t represents the temperature of the room at time t.x e is the exterior temperature.β 1 is the rate of the heat transition with the exterior.β 2 is the heating power of each heater.The differential equation giving the evolution of the position (i.e. the temperature of the room) has the following form: The heaters are programmed to maintain the temperature within an interval (x min , x max ) where x e < 0 < x min .Heaters can be on, off, or out-of-order, so M = {ON, OF F, F } 3 .We consider that the three heaters are in passive redundancy in the sense that: when X ≤ x min the second heater activates only if the first one is failed, and the third one activates only if the two other heaters are failed.When a repair of a heater occurs, if X ≤ x min and all other heaters are failed the heater status is set to ON , else the heater status is set to OF F .To handle the programming of the heaters, we set Ω m = (−∞, x max ) when all the heaters are failed m = (F, F, F ) or when at least one is activated, otherwise we set Ω m = (x min , x max ).
Due to the continuity of the temperature, the reference measure for the kernel is ∀B ∈ B(E), ν (x,m) (B) = For the spontaneous jumps that happen outside boundaries, we consider the position is not modified during the jumps and, if the transition j corresponds to the failure of a heater, then, for z − = (x − , m − m) ∈ E, λ j (z − ) = 0.0021 + 0.00015 × x − and, if the transition j corresponds to a repair, then, for z − = (x − , m − m) ∈ E, λ j (z − ) = 0.2.A possible trajectory of the state of this system is depicted in Figure 4.Here the system failure occurs when the temperature of the room falls below zero, so D = {(x, m) ∈ E, x < 0}.

A reference measure for trajectories
We have seen in Section 2.3 that when the position is restricted to a bounded set in some modes, the time to the next jump can be a hybrid random variable.We have to be cautious when considering the density of a trajectory of a PDMP for several reasons: first the reference measure for the times between the jumps is a mixture of Dirac and Lebesgue measures, secondly these hybrid jumps may occur multiple times and in a nested way in the law of the trajectory of PDMP.Indeed, with these mixtures of Dirac and Lebesgue measures involved, the existence of a sigma-finite reference measure on the trajectory space is not obvious, yet it is mandatory to properly define the density of a trajectory.The existence of a reference measure is therefore crucial, because it preconditions the existence of the likelihood ratio needed to apply the importance sampling method.
We begin this Section by introducing a few notations: For a trajectory Z on the observation interval [0, t f ), we denote by N the number of jumps before t f , and by S k the time of the k-th jump with the conventions S 0 = 0, and S N +1 = t f .∀k < N, T k = S k+1 − S k is the duration between two jumps and T N = S N +1 − S N = t f − S N is the remaining duration between the last jump and t f .One can easily verify that the sequence of the (Z S k , S k+1 − S k ) is a Markov chain: it is called the embedded Markov chain of the PDMP [11], see Figure 5.

The law of the trajectories
The main idea in building the law of the trajectory Z is to summarize the trajectory by the truncated embedded Markov chain of the process: the vector Z S0 , T 0 , . . ., Z S N , T N .This vector is also called the skeleton of the trajectory.As the trajectory is piecewise deterministic, we only need to keep the states of the arrivals of the jumps and the durations between the jumps to describe the trajectory.If we have the vector Z S k , T k k≤N then we have enough information to reconstruct the trajectory using (2.2) because we know the flow function Φ.Noting Θ the map that changes Z into Z S k , T k k≤N , the law of Z can be defined as the image law of Z S k , T k k≤N through Θ.We denote by E the set of the trajectories defined on [0, , where Using the Markov structure of the sequence Z S k , T k k≤N , the law of Z S k , T k k≤N can be expressed as an integral of the product of the conditional densities given by (3.1) and (3.2).We define the σ-algebra S on the set of the possible values of Z S k , T k k≤N as the σ-algebra generated by the sets in Definition 3.1.The law of the trajectory is then defined as follows, for B ∈ S where z − j = Φ zj−1 (t j−1 ), and Note that, depending on the set B, n can take different values in the equation (3.3).Implicitly, the equation (3.3), states that: Also note that with our construction, this is a probability law on the space of the trajectories that satisfy (2.2), not on the set of all the trajectories with values in E.

3.2.
The dominant measure and the density Definition 3.2.We define the measure ζ so that where n is the number of jumps in the trajectory z.
The proof of Theorem 3.3 is given in Appendix A. Note that it is always possible to choose the measures ν z − so they are all bounded by the same constant.Indeed the transition kernel is itself bounded by 1, as it is a probability measure.So, to get a measure ζ that is σ-finite, we can simply take the measures ν equal to the transition kernel, so the densities can be properly defined when the observation time t f is finite.

Admissible importance processes
Recall that an admissible importance process is any process whose law is absolutely continuous with respect to ζ (condition C2), and which has a density g with respect to ζ satisfying ∀ z ∈ D, f (z) = 0 ⇒ g(z) = 0 (condition C3).In this Section, we clarify the previous statement, and we identify to which extent we can modify the original process to obtain an admissible importance process.Throughout the rest of paper we denote the elements relative to this importance process with a , except for its density that is denoted by g.
Our first remark is that condition C2 implies that the realizations of the importance process must satisfy equation (2.2).Indeed, the measure ζ involves the transformation Θ which uses the equation (2.2) to rebuild a trajectory from a skeleton.Consequently, the importance process has to piecewisely follow the same flows as the original process.Similarly to the original process the importance process jumps to a new state for each change of flow.To ensure condition C2, the law of the T k has to be dominated by µ Z .This means that the boundaries of the Ω m 's and the set of the possible arrivals of a jump remain unchanged.So the modification of the original process focuses on the timing and nature of changes of modes, i.e. the laws of the jumps.
To generate an importance process, we keep generating trajectories by successively generating the arrival state of a jump (Z S k ) and the time until the next jump (T k ).As there is no requirement for the importance process to be Markovian, we consider that the law of a point of the trajectory Z t depends on the past values of states.As the states follow the flows piecewisely, it is equivalent to say that the law of Z S k can depend on Z S i , T i i<k , and that the law of T k can depend on Z S i , T i i<k and Z S k .For a jump time S k , we denote , and we denote by λ z k (•) the intensity function associated to T k when Z S k = z k .We have: Notice that the intensity function λ zs,s in equation (3.7) does not have to be of the form λ • φ zs , where λ is a positive function on E. This means that at the time S k + t, the intensity does not depend only on the state Z S k +t as it would be the case if Z were a PDMP.So, in the importance process, we consider that the intensity can depend on the arrival state of the last jump and on previous pairs (Z S i , T i ).Therefore the importance process can be seen as a piecewise deterministic process (PDP) which is not necessarily Markovian.
For condition C3 to be satisfied almost everywhere we can impose that almost everywhere for any z k ∈ E, and z − k ∈ E , and t ∈ (0, t * z k ]: Unfortunately with complex systems, the set D can be very hard to manipulate, and we do not always know if So in practice we often only use the following sufficient condition which states that for almost any z k ∈ E, and z − k ∈ E , and t ∈ (0, t * z k ]:

Practical importance processes and notations
We will see in Section 4.2 that we can restrict the search of an efficient importance process within a special class of processes without any loss in efficiency, because an optimal importance process (giving an estimator with zero variance) belongs to this special class.
The processes of this class are defined through the expressions (3.So, to ease the presentation of such jump rates and transition kernels, we slightly modify the state space by adding an active boundary at the boundary of D and we add a coordinate on the mode which indicates if the trajectory has already visited D. The state now becomes Z = X, (M, M D ) where M D = 0 if D has not been visited, and 1 if it has.This way, for any time t we have Z t = (X t , (M t , 1 τ D ≤t )).For instance, with the heatedroom system the set of modes becomes M = {ON, OF F, F } 3 × {0, 1}.The kernel K Z − is unchanged when M − D = M + D , and is null when M − D = M + D , except at the boundary of D where K (0,(F,F,F,0)) 0, (F, F, F, 1) = 1.The three variables that determine the jump rates and kernels of the processes of the special class can now be identified by the current state and the current time.Therefore, we now consider importance processes with jump rate λ z k ,s k (t) and transition kernel K z − k ,s k .Such processes have the following laws of jump times and jump arrivals: Note that the class of processes that can be defined by (4.1) and (4.2) is included in the class of admissible importance processes.Thanks to the new definition of the states, the conditional expectations . This makes it possible to introduce the following important definitions: Definition 4.1.Let U * be the function defined on E × R + by: Definition 4.2.Let U − be the function defined on E × R + by: The quantity U * (z, s) measures the chances of having a system failure before t f knowing the system is in state z at time s, and the quantity U − (z − , s) the chances of having a system failure before t f knowing the system is jumping from the state z − at time s.These quantities play an important role in the latter.

A way to build an optimal importance process
In the importance process, generating the trajectories jump by jump by using (4.1) and (4.2) is not restrictive in term of efficiency, as proved by the following theorem: Theorem 4.3.For all z ∈ E, z − ∈ E , and s ∈ [0, t f ), the jump densities with respect to µ z such that and the kernels K * z -,s having a density with respect to ν z − which satisfies correspond to the jump densities and the transition kernels of an optimal importance process.Note, these optimal densities do integrate to one as U * z, s = Proof.Assume the trajectory z = Θ (z 0 , t 0 ), . . ., (z n , t n ) has been simulated with (4.5) and (4.6).Then its density g with respect to ζ is: So it verifies: where g * (z) is the density for an estimator with zero variance.
Equations (4.5) and (4.6) serve as a guide to build an importance process: one should try to specify densities as close as possible to these equations so as to get an estimator variance as close as possible to the minimal zero variance.

Observations on the optimal process
As we do not know the explicit forms of U * and U − , the construction of an importance process close to the optimal one is delicate.Nonetheless, the equations (4.5) and (4.6) can give us information on how to build an importance process in practice.In this Section, we investigate the properties of the optimal importance process and of the function U * with the aim of building a good and practical importance process.
For instance, we can get the expression of the jump rate of the optimal process.For the time of the k-th jump, by definition of the jump rate and knowing that (Z S k , S k ) = (z, s), we get: Using some properties of U * and (4.7) we can prove the following theorem: Theorem 4.4.The jump rate of the optimal importance process defined by the densities (4.5) and (4.6) verifies: The proof is provided in Appendix B. Note that the expression (4.8) can be easily interpreted.λ * z,s (u) corresponds to the jump rate at the state Z s+u = Φ z (u).U * Φ z (u), s + u is the probability of generating a failing trajectory if Z s+u = Φ z (u) and if there is no jump at time s + u.U − Φ z (u), s + u is the probability of generating a failing trajectory if there is a jump at time s + u and if the departure state is Z s+u − = Φ z (u).So the ratio U -Φ z (u), s + u U * Φ z (u), s + u is the factor multiplying the probability of generating a failing trajectory when there is a jump at time s + u.The expression indicates that, in order to reach the zero variance, one should increase the original jump rate in the same proportion as a jump would increase the probability of getting a failing trajectory.The Theorem 4.4 is noteworthy, because in practice the law of the jump time is specified through the jump rate.Thus it can be used to specify the laws of the jump times of an importance process, as we will do in Section 4. 4.
Also note that with equations (4.8) and (4.6) indicate that, once the region D has been reached, the optimal process does not differ from the original process.Indeed if τ D is the reaching time of the critical region D, then for s ≥ τ D we have for all states z and z − , U * (z, s) = U − (z − , s) = 1 and so for s ≥ τ D we get K * z − ,s = K z − , and for s + u ≥ τ D we get λ * z,s (u) = λ z (u).As it plays an important role in the expression of the optimal process, we look for more information about the function U * .We first notice that: if τ is a stopping time such that t f > τ > s, then (4.9) Using the equation (4.9) we can show the two following properties: Theorem 4.5.U * is kernel invariant on boundaries: Theorem 4.6.If u → U -Φ z (u), s + u and u → λ z (u) are continuous almost everywhere on [0, t * z ), then almost everywhere U * is differentiable along the flow, with: Theorems 4.5 and 4.6 can in fact be seen as foreward Kolmogorov equations on U * .A complete proof for these two properties is in the Appendix B.

A parametric importance process
In order to find an importance process that gives a good variance reduction, we usually restrict the search within a parametric family of importance densities.Then we rely on optimization routines to find the parameters yielding the best variance reduction.Here, we propose to use a parametric approximation of U * (z, s), and then combine it with equations (4.8) and (4.6) to get the form of the importance kernels and of the importance intensities.If we denote U α (z, s) our approximation of U * (z, s), where the parameter α belongs to the set A param , and we set U - α z -, s = E U α (w, s)K z -(w)dν z -(w), then the corresponding importance intensities and kernels are given by: With these settings and notations, condition (C3) can be expressed as: for any z k ∈ E, and z − k ∈ E , and t ∈ (0, t * z k ].It is therefore satisfied if we take U α positive everywhere for instance.
Here we switch the problem of setting a density g close to g * by finding λ and K , to the problem of finding a surface U α on E × R + close to the surface U * .
Note that this way of building a parametric family of importance processes can be applied to any kind of systems, though the shape of U α may have to be adapted from case to case.Indeed, we expect the shape of U * to depend on the configuration of the system and so does the shape of the U α 's.
We could also have plugged the approximations U α and U − α into (4.5),rather than into (4.8),but the option we have chosen is in fact more convenient and computationally more efficient.With equation (4.8), we pass through the intensity, so the density of the T k 's automatically integrates to 1. Conversely if we pass through equation (4.5), we have to renormalize the density so it integrates to 1 before simulating a realization of the T k .As this renormalization requires to compute an integral, it is less advantageous.

Remarks on the parameter optimization
As mentioned in the introduction, we propose to use the cross-entropy method presented in [13] to select the parameters of the importance density as it was done in [31].In the case of PDMP, it is hard to use the adaptive cross-entropy method presented in [13]: the adaptive cross-entropy requires to have a function O : E → R that orders the trajectories which is hard to specify judiciously.This function must order the states in such a way that there exists a threshold c for which and there exists a sequence of thresholds where In order to run the Cross-Entropy algorithm with relatively low sample sizes at each step (from 100 to 1000), it is good to set the function O so that 20 ≤ P(O(Z)>ci) P(O(Z)>ci+1) ≤ 100 [13].The issue is that we find it hard to specify such a function with PDMP.For this reason we used a simplified version of the CE method considering only one threshold c.
The CE algorithm also requires to minimize an approximation of the Kullback-Leiber divergence D(g α , g * ).We simulate a sample used to compute many approximations of D(g α , g * ).The optimization routine uses the sample to compute some approximations of D(g α , g * ) with different values of α.When the sample contains too many trajectories in D, this approximations can be computationally heavy.Conversely, when the sample does not contain enough trajectories in D the approximations are not accurate enough.So we choose to increase the size of the sample gradually until it contains n CE trajectories in D, n CE being a number fixed by the user.This way the objective function to minimize and its gradient are both a sum over n CE terms, and thus they are not too heavy to compute.The CE algorithm we used is presented in Table 1.
However, to our knowledge, there is no guarantee that the minimization routine used in the cross entropy method converges to a global optimum.Therefore, to avoid falling in a local optimum, one should run several times the cross entropy method with different initial values for the vector of parameters.Note that the parametrization must be chosen carefully: indeed the family of the importance densities must contain densities that are close to the zero-variance density g * (z) = 1 D (z)f (z) p to obtain a good variance reduction, otherwise Table 1.CE algorithm.
Initialization: choose α 0 ∈ A param and n CE ∈ N * and set t = 0, and ε > 0 End: Estimate p using the importance density g αt−1 we could even obtain a variance increase.In order to avoid a variance increase, the parametric family should contain the original density f .Indeed, if we specify in the parametric family that for say α = 0 we have g 0 = f then the parameter optimization should not select a parameter worse than α = 0 and in the worst scenario the variance remains unchanged.This is why we advise that the family of the U α functions includes a constant function, so that the original process with jump rate λ z and transition kernel K z is included in the admissible importance processes.
The initial vector of parameters α 0 has a big influence on the convergence of the method.Ideally, it should be chosen to simulate n CE trajectories in D relatively fast, but, in order to avoid an over-biasing situation with a wrong approximation of the Kullback-Leiber divergence at the first step, we recommend to choose α 0 so that g α0 is as close as possible from f .Testing several values of α 0 is therefore necessary, to get a sense of what is a good α 0 .

Simulation study on a test case
In this section we present how we build an importance process for the heated room system presented in Section 2.7.

A parametric family of importance processes
In the heated-room system, the three heaters are identical and are in parallel redundancy, so we expect the probability U * (z, s) = E 1 D (z)|Z s = z to increase with the number of failed heaters in the state z.Therefore, noting b(z) the number of failed heaters in state z, we start by setting where Q is a function of position and time, and H α is a function on integers.We set H α (0) = 1.As we want U α (z, s) to increase with b(z), H α has to be an increasing function.
If T denotes the time until the next jump after a time s, using (4.9) with τ = s + T we get: As the repair rates are larger than the failure rates by one order of magnitude in practice, when there is at least one failed heater, the probability of arriving in a more degraded state Z T is much lower than the probability of having a repair.This last remark can actually be applied to any reliable industrial system (see for instance [10]).Ideally we would like U α to mimic the property of U * so we would like to have which can be reformulated as: where As a repair is much more likely than failure, if the transition from state (φ m x (u), m) to the state (φ m x (u), m + ) indexes a repair K Φz(u) (φ m x (u), m + ) is larger than if it had indexed a failure.So, (5.4) implies that, when b(z) > 1, the value of H α (b(z)) is closer from H α (b(z) − 1) than from H α (b(z) + 1).As H α was supposed increasing, it must be convex.So we propose that 2 ], with α 1 > 0. If, from a Z s = Φ z (u), the transition j corresponds to a failure then we have: and if it corresponds to a repair then we have: The jump rate satisfies: We set the jump kernel such that its density satisfies for u ∈ [0, t * z ): Note that plugging U α into the equations (4.5) and (4.6) imposes some kind of symmetry in the biasing of failure and repair rates.It is especially visible in equations (5.5) and (5.6):On the one hand the failure rate associated to the transition from a state z − to z + is multiplied by a factor exp α 1 2b(z − ) + 1 ], and on the other hand the repair rate corresponding to the reversed transition (from state z + to state z − ) is divided by a factor exp α 1 2b(z − ) − 1 ].The equations (4.5) and (4.6) not only imply that the failures should be enhanced and the repairs inhibited, but it also states that the magnitudes of the distortion should be reciprocal.The square in H α 's formula was introduced to strengthen the failure rates when the number of broken heaters gets larger.The idea was to shorten the duration where several heaters are simultaneously failed in the simulated trajectories.Indeed, as repair is faster than failure, the shorter are the durations with a failed heater the more likely is the trajectory.Increasing the failure rates with the number of broken heaters is a mean to simulate more trajectories in D while maintaining the natural proportion between the likelihoods of the trajectories, which should decrease the variance.
As the failure on demand was likely to play an important role in the system failure, we choose to separate it from spontaneous failure in our parametrisation setting U α ((x min , m), s) = exp[−α 2 b(z) 2 ]H α (x min , s).This allows to better fit U α to U * .Under this assumption, the equation (4.13) implies that for z − = (x min , m), the importance kernel takes this form: . (5.9)

Results
The Monte-Carlo simulations have been carried out using the Python library PyCATSHOO.(The flow functions φ m x were computed using a Runge-Kutta method of order 4 with a discretization step of 0.01.This discretization step is small enough so that reducing it further does not change the estimations.)As the Cross-Entropy method was not yet implemented in PyCATSHOO, we have used a specific Python code for the Cross-Entropy and the importance sampling methods.The system parameters used in the simulation were the following ones: x min = 0.5, x max = 5.5, x e = −1.5, β 1 = 0.1, β 2 = 5, t f = 100.Trajectories were all initiated in the state z 0 = 7.5, (OF F, OF F, OF F ) .The probability of having a system failure before t f was estimated to p = 1.29 × 10 −5 with an intensive Monte-Carlo estimation based on 10 8 runs.
The values of the parameters selected by the cross-entropy method were α 1 0.915 and α 2 1.197, and for the first step, the approximation of the Kullback-Leiber divergence between g * and g α was obtained by simulating from a biased density with parameters (0.5, 0.5).The whole cross-entropy method lasted approximately 9 minutes.Most of the running time was allocated to the optimization within each step of the cross-entropy, because each evaluation of the objective function and of its gradient was costly.In order to optimize the running time of the cross-entropy method, the size of the sample used for the approximations of the Kullback-Leiber divergence were set by simulating until we would get n CE = 100 trajectories with a system failure.The number of n CE = 100 roughly guaranties that the two first digit of of the Kullback-Leiber divergences are identified by their approximations.For each of the three steps needed to select the parameters, samples of respectively 1970, 126, 127 trajectories were used.
A comparison between Monte-Carlo and the associated importance sampling estimates is presented in Table 2, where we display the number N sim of simulations used for each method, the estimates p of the probability, the associated empirical variances σ2 /N sim and confidence intervals IC, and the mean time of a simulation t sim in seconds.For 10 6 simulations the results show that the Monte-Carlo estimator has not converged yet, whereas the importance sampling estimate is very accurate.To compare the two methods we estimate the efficiency of their estimators when they have converged.The efficiency is defined by the ratio of the precision and the computational time:  The efficiency can be interpreted as the contribution of a second of computation to the precision of the estimator.We estimate it by ef f = 1 σ2 tsim .The results indicate that our importance sampling strategy is approximately 7 000 times more efficient than a Monte-Carlo method.
We also verify that the importance sampling estimations are asymptotically normally distributed.The asymptotic normality was not observed for N = 10 3 , but it was observed for larger sample sizes.For instance for N = 10 4 , Figure 6 shows a normalized histogram on 100 estimations pIS that matches the normal density with mean p and with the standard deviation of the 100 estimations.We also recorded the weights of the failing trajectories in the sample of one run of the IS method with N = 10 4 .Figure 7 shows that the weights are close to the value p, suggesting that the importance density is close to the optimal density.Figure 8 is a zoom-in on the largest weight: It shows there is no degenerated preponderant weight such that p, suggesting there is no sign of under-favored region of D in g α .Here we do not need to check the weight degeneracy in all parts of D because, as we now the value of p the can simply check the estimation are unbiased and normally distributed to ensure convergence is reached.Finally, in Figures 9 and 10, we present the graphs of two trajectories obtained respectively with the original process with density f and with the importance process selected by the CE method with density g (α1,α2) .

Discussion
Our work shows that importance sampling is applicable to any PDMP with or without boundaries.We have given the expressions of the intensities and the kernels of the optimal importance process, and we have seen that it depends on a critical function U * .These expressions show that the optimal importance process has a    This trajectory was generated with the importance process with density g (α1,α2) .specific structure.Although we do not have a closed form expression of the function U * , these expressions are important for two reasons: (1) they prove the existence of an optimal bias, which ensures that the importance sampling technique can be very efficient on PDMPs.(2) They can guide the practical design of an efficient and explicit importance process.Indeed, by replacing U * by an approximation in the optimal expressions of the transition rates and kernels, we preserve the structure of the optimal importance process.The presented method therefore helps designing an importance process having the same behavior as the optimal one, and it showed good efficiency on our case study.
This biasing strategy can be applied to any system, but the parametric shape of the approximation of U * may have to be adapted from case to case.The parametric shape presented in this article is suited to any system with similar components in terms of failure rates and repair rates and containing one minimal cut set (A minimal cut set being a group of components that need to fail so that the system can fail).For a system with a different configuration, we expect the shape of the function U * will differ, and the method may require a different parametric approximation for the function U * .
Our approach through the function U * can be applied to any sub-classes of PDMP, like, for instance, Markov chains [18], or continuous time Markov Chain, or queing models.In the particular case of PDMP that is a continuous time Markov Chain, the definition of the function U * is close to the forward committor function used in the transition path theory [23].In the case of a general PDMP, a committor function would be a function (z, s) → E 1 D A (Z)|Z s = z where D A is the set of trajectories that pass through D without passing through a set A ⊂ E first.U * is therefore a commitor function for which A = ∅.It is also interesting to note that, in the Adaptive Multilevel Splitting algorithm, the asymptotic variance is minimized when using the committor function as the score function [1,6], similarly, in the interacting particles system method [14], the function U * also plays a role in the optimal potential function method [9].Approaching the function U * , allows to efficiently estimate rare event for importance sampling, but also for Adaptive Multilevel Splitting algorithm and the interacting particles system method.A method that allows to approximate this function would lead to significant improvement in the reliability assessment field.
We proposed to find a good approximation of U * searching inside a family of parametric functions (U α ) α∈Aparam .In our application on the heated room system, we used the Cross-Entropy method to select an efficient parameter α.We have noticed that the Cross-Entropy method tends to diverge quickly if it is not well initialized: The choices of α 0 and n CE are critical for the convergence of the method.These two parameters must be well tuned because they impact the quality of the first approximations of the Kullback-Leiber divergence within the CE algorithm, and these approximations must be accurate enough to launch the optimization routine on a good track.To choose a high value for n CE is a way to insure that these first approximations are accurate enough, but it is not worth considering in practice, as it greatly slows down the CE algorithm.The only solution is to find right away an α 0 which yields correct approximations of the Kullback-Leiber divergence.This is unfortunately difficult to do, and may be even harder with more complex systems.We believe that the CE method used in this article must be improved or substituted by an other parameter optimization method so that the initialization gets less critical.
With the IS method, depending on the importance process chosen, we can observe some weight degeneracy and therefore slow convergence.Weight degeneracy typically happens when two conditions are met: (1) there is a domain D 1 ⊂ D such that the likelihood ratios within this domain are very big compared to the likelihood ratios in other domains of D, meaning the domain D 1 is under-favored by the importance density g compared to f and (2) though it is unlikely, few realizations of the importance process Z i are drawn in D 1 , which creates unbalanced weights.Some methods allow to reduce the risk of weight degeneracy by using resampling schemes like for instance in the interacting particle system (IPS) method [14], but, even though it is reduced, the risk of weight degeneracy still remains within the IPS method.The IPS method takes in input some potential functions G k (also called score functions).If these functions does not favor the domain D 1 , the convergence is slowed down [9], and we can end up with the same situation.In this method the weights of the simulation outputs are the inverse of the product of resampling weights multiplied by the objective function's evaluations (see equations 2.18 in [14]).The degenerate weights would therefore appear each time some trajectories in D 1 are selected by the resamplings, eventhough the resamplings make it unlikely.Weight degeneracy does not depend on the method used but it rather depend on the choice of the importance process or on the choice the potential functions.Weight degeneracy is that it is the symptom of a slow convergence, therefore the sample size should be increased until the weight degeneracy fades out: the weight degeneracy is a tool to select the sample size for both methods.For a fixed sample size, if the sample contains some trajectories in D 1 , the weight degeneracy can be a criterion to reject the importance density, or the potential function, used.But this last criterion is valid only if the sample contains observations in the under-favored domains in D, which is unlikely by definition.One important point to stress out, is that witnessing no weight degeneracy within the simulations outputs does not guarantee the convergence, we can consider we have converge if the sample size is reasonably large and that we do not witness weight degeneracy in all part of D.
When choosing the importance process, there is a risk of over-biasing.Over-biasing corresponds to the situation where a domain D 2 ⊂ D is over-favored by the importance process, resulting in an under-favoring of an other domains D 1 ⊂ D. In this situation a weight degeneracy exist in D 1 but it is not witnessed because no trajectory within the sample is drawn in D 1 .This situation happens when one type of failing trajectories is over represented in the importance distribution comparatively to other types of failing trajectories.This phenomenon can result in underestimating the probability of the system failure and in underestimating the variance.To avoid it, we must satisfy two points: (1) We must design a parametric importance density that can increase the likelihoods of each type of failing trajectories separately.(2) We need to initiate the Cross-Entropy method with a sample of trajectories that contains all types of failing trajectories.It is therefore preferable to apply this method only on systems of reasonable complexity, for which it is possible to determine the different types of failing trajectories.The parametric functions (U α ) α∈Aparam should be flexible enough to satisfy the two previous points, but one should pay attention to keep the dimension of the vector of parameter α reasonably small, so that we avoid a prohibitive computational effort during the optimization routines in the CE.

Conclusion
We have presented a model for multi-component systems based on PDMPs.In order to speed up reliability assessment on such systems, we have adapted the importance sampling method to trajectories of PDMP.We have given a dominant measure for PDMP trajectories, allowing to properly define the likelihood ratio needed to apply the importance sampling method on such processes.The possible kinds of importance processes were discussed, and the optimal biasing strategy when simulating jump by jump was exhibited.We developed and tested a biasing strategy for a three-component heated-room system.Our importance sampling method has shown good performance, increasing the efficiency of the estimator by a factor 7 000.U − (Φ z (u), s + u) λ z (u) This last equality allows to transform (4.7) into (4.8).

B.3 Equality (4.11)
Let z ∈ E and s ∈ [0, t f ).Remember that equality (4.11) states that if the functions u → U -Φ z (u), s + u and u → λ z (v) are continuous almost everywhere on [0, t * z ), then almost everywhere Proof.We denote by T the time until the next jump after the trajectory has reached Z s = z.For 0 ≤ h < t * z , we define τ = min(h, T ).

Figure 1 .
Figure 1.A jump at boundary.

Figure 4 .
Figure 4.A possible trajectory of the heated-room system (the mode is represented with colors).
the set of the trajectories including n jumps.The sets (Θ −1 (A n )) n∈N form a partition of E. The sets (A n ) n∈N form a partition of the set of the skeletons.We can get the law of Z S k , T k k≤N , by using the dependencies between its coordinates.Thanks to (2.11) and (2.4) we can get the density of T k knowing Z S k with respect to µ Z S k , and the density of Z S k+1 knowing Z S k , T k with respect to ν Z S − k+1

. 5 )
Note that, like in equation(3.3), in the equation (3.5) n can take different values depending on the set B.Theorem 3.3.If ∃C > 0, ∀z ∈ E , ν z (E) < C and t f < ∞, then ζ is a σ-finite measure.By Radon-Nikodym theorem, the density of a trajectory z = Θ (z 0 , t 0 ), . . ., (z n , t n ) with respect to the measure ζ is

,
and the law of Z S k+1 has to be dominated by ν Z − S k 7) and (3.8) but they do not use all the information contained in z k and z − k .The jump rates λ z k (t) depend only on three variables which are: the current state Z S k +t = Φ z k (t), the time t f − (s k + t) left before t f , and the indicator 1 τ D ≤s k +t which tells if the system failure has already happened.The kernels K z − k+1 depend only on three variables, which are: the current departure state z − k+1 = Φ z k (t k ), the time t f − s k+1 left before t f , and the indicator 1 τ D ≤s k+1 .

Figure 8 .
Figure 8. Allocation of the largest weights in the sample (for N = 10 4 ).

Figure 9 .
Figure 9.A trajectory of the coordinates of the state of the system.This trajectory was generated with the original process with density f .

Figure 10 .
Figure 10.A trajectory of the coordinates of the state of the system.This trajectory was generated with the importance process with density g (α1,α2) .

Table 2 .
Comparison between Monte-Carlo and importance sampling estimations.