Analysis of an optimal stopping problem for software rejuvenation in a deteriorating job processing system

Significance Statement

Software faults such as aging-related bugs are known to degrade the performance of service systems that run on software components. Although these bugs have undesirable effects, they are quite difficult to remove completely during software development, because they manifest only after the software has run continuously in a particular execution environment. Software rejuvenation is a method developed to restore service systems that suffer from software aging; it does so by rebooting the execution environment. However, restarting the system interrupts service and introduces additional costs due to data loss, job drops, or the termination of system processes. Software rejuvenation therefore needs to be planned carefully.

In a recent paper published in Reliability Engineering and System Safety, Professor Naoto Miyoshi and his PhD student Fumio Machida considered a condition-based software rejuvenation strategy for a deteriorating job processing system. They analytically derived the optimal policy for deciding when to trigger software rejuvenation, with performance measured in terms of job completion time. The system is modelled as an M/M/1 queue, and the rejuvenation decision is formulated as an optimal stopping problem, an approach that differs from conventional studies of software rejuvenation models.

The authors defined the states of the system by the number of queued jobs. A transition between states occurs either when the current job is completed or when a new job arrives. Jobs arrive according to a Poisson process, and job service times are exponentially distributed with a given service rate.
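The queue dynamics described above can be illustrated with a short simulation. This is only a minimal sketch of a generic M/M/1 queue, not the authors' model or code: the function name `simulate_queue`, its parameters, and the fixed `service_rate` (the paper's deterioration mechanism is not reproduced here) are all assumptions made for illustration.

```python
import random


def simulate_queue(arrival_rate, service_rate, horizon, seed=0):
    """Simulate the number of jobs in a generic M/M/1 queue up to time `horizon`.

    Hypothetical helper for illustration. Returns a list of (time, jobs)
    pairs, one per state transition, starting from an empty system.
    Assumes arrival_rate > 0 and service_rate > 0.
    """
    rng = random.Random(seed)
    t = 0.0
    jobs = 0
    path = [(t, jobs)]
    while True:
        # Arrivals can always occur; a completion needs at least one job.
        total_rate = arrival_rate + (service_rate if jobs > 0 else 0.0)
        # The time to the next event is exponential with the total rate.
        t += rng.expovariate(total_rate)
        if t > horizon:
            break
        # The event is an arrival with probability arrival_rate / total_rate.
        if rng.random() < arrival_rate / total_rate:
            jobs += 1  # a new job arrives
        else:
            jobs -= 1  # the current job is completed
        path.append((t, jobs))
    return path


if __name__ == "__main__":
    for time, jobs in simulate_queue(0.8, 1.0, horizon=10.0)[:5]:
        print(f"t={time:.3f} jobs={jobs}")
```

Each entry in the returned path corresponds to one state transition, so consecutive states always differ by exactly one job.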

Depending on the number of remaining jobs, the system may either execute software rejuvenation immediately or continue operating until the next job is completed. If the rejuvenation action is chosen at a decision point, all jobs in the system are dropped, which incurs a rejuvenation cost proportional to the number of dropped jobs. Conversely, if the waiting action is chosen, the rejuvenation decision is put on hold until the next job is completed, which incurs the cost of a delayed job.

In their analysis, the authors considered the case where the cost of a delayed job is smaller than the cost of a job dropped due to rejuvenation. The decision process ends once the rejuvenation action is taken. The authors therefore formulated their problem as an optimal stopping problem, in which the optimal policy is obtained from the optimality equation. The interrelations among infinitely many states are an obstacle to solving the optimality equation, and the authors took an analytical approach to overcome this.

By proving a series of propositions, Machida and Miyoshi show that the optimal policy is determined by the relation among the cost of a delayed job, the cost of a dropped job, and the deteriorating traffic intensity. The authors noted that when the cost of a delayed job is less than the cost of a dropped job multiplied by one minus the deteriorating traffic intensity, the waiting action is chosen irrespective of the number of queued jobs. If the reverse is true, the rejuvenation action should be chosen. The policy derived by the authors provides a reasonable and easy guide for determining the software rejuvenation trigger time for a deteriorating job processing system.
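The threshold rule stated above can be written as a few lines of code. The following is a minimal sketch based only on the description in this article, not on the authors' implementation: the function name `choose_action` and its parameter names are hypothetical, and treating the equality case as "rejuvenate" is an assumption, since the summary does not say how ties are handled.

```python
def choose_action(delay_cost, drop_cost, traffic_intensity):
    """Return "wait" or "rejuvenate" according to the threshold rule.

    Hypothetical helper for illustration:
      delay_cost        -- cost of a delayed job
      drop_cost         -- cost of a job dropped by rejuvenation
      traffic_intensity -- the deteriorating traffic intensity
    """
    if delay_cost < 0 or drop_cost < 0 or traffic_intensity < 0:
        raise ValueError("costs and traffic intensity must be non-negative")
    # Wait when the delay cost is below drop_cost * (1 - traffic_intensity).
    threshold = drop_cost * (1.0 - traffic_intensity)
    # Assumption: the tie case (delay_cost == threshold) maps to "rejuvenate".
    return "wait" if delay_cost < threshold else "rejuvenate"


if __name__ == "__main__":
    print(choose_action(delay_cost=1.0, drop_cost=10.0, traffic_intensity=0.5))
    print(choose_action(delay_cost=6.0, drop_cost=10.0, traffic_intensity=0.5))
```

Note that the rule does not take the queue length as an input, which reflects the statement that the choice holds irrespective of the number of queued jobs.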

About The Author

Fumio Machida received the B.S. and M.S. degrees from Tokyo Institute of Technology in 2001 and 2003, respectively. He is a Ph.D. candidate at the Department of Mathematical and Computing Science in Tokyo Institute of Technology. He is a principal researcher at NEC Laboratories. He was a visiting scholar in the Department of Electrical and Computer Engineering at Duke University in 2010. He was a recipient of the young scientists’ prize of Japan in 2014.

His research interests include modeling and analysis for system dependability, software aging and rejuvenation, and virtualization of systems and networks.  He is a senior member of the IEEE and the IEEE Computer Society.

About The Author

Naoto Miyoshi received the B. Eng., M. Eng. and Dr. Eng. degrees from Kyoto University in 1989, 1991 and 1997, respectively.  He was an Assistant Professor in the Department of Applied Mathematics and Physics, Kyoto University from June 1994 to October 1998.  In November 1998, he joined the Department of Mathematical and Computing Science, Tokyo Institute of Technology as an Associate Professor.  Since October 2012, he has been a Professor at the same department.

His research interests include theory of stochastic models and its application to analysis of computer and communication systems.  He received the Best Paper Award from the Operations Research Society of Japan (ORSJ) in 2005. He is a fellow of the ORSJ and a member of the IEEE, INFORMS Applied Probability Society, and ISCIE.

Reference

Fumio Machida, Naoto Miyoshi. Analysis of an optimal stopping problem for software rejuvenation in a deteriorating job processing system. Reliability Engineering & System Safety, Volume 168, December 2017, Pages 128-135
