Proof of Concept: Doctor: Failure Detection and Notification for NFV
<iframe src="https://www.youtube.com/embed/XdIO6JEuVKg?modestbranding=1&rel=0" width="970" height="546" frameborder="0" scrolling="auto" allowfullscreen></iframe>
Doctor: Failure Detection and Notification for NFV - DOCOMO
The Doctor project in OPNFV provides fast detection of resource failures and fast notification of those failures to application managers. Resources include compute, storage and network resources in a resource pool or cloud. Failure detection and notification also cover hypervisors and Virtual Machines (VMs).
Because of their stringent high-availability requirements, telecom nodes are often deployed in an Active-Standby (ACT-SBY) redundant configuration. When such a node is virtualized, the manager of the node application needs to be notified of faults affecting the ACT node application so that it can switch to the SBY node application immediately.
In the Doctor project, we have developed this failure event collection and immediate notification feature in OpenStack. Our solution uses an external resource monitor to watch the resource pool. As soon as the monitor detects a failure in any of the resource pool elements, it notifies Ceilometer. Ceilometer, with its modified notification function, immediately notifies the node application manager, which then switches to the SBY node application. This reduces downtime to nearly zero for telecom node applications that host thousands of mobile subscribers’ connections.
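The flow described above (monitor, notification hop, application manager, ACT-SBY switchover) can be illustrated with a minimal in-process sketch. This is not the Doctor implementation and does not use any OpenStack or Ceilometer API: every class, function and host name below (`ResourceMonitor`, `Notifier`, `AppManager`, `host-1`, `host-2`) is hypothetical and chosen only to show the order of the steps.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class FailureEvent:
    resource_id: str  # hypothetical identifier of the failed resource
    kind: str         # e.g. "compute", "storage" or "network"


class Notifier:
    """Stand-in for the notification hop (Ceilometer in the text)."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[FailureEvent], None]] = []

    def subscribe(self, callback: Callable[[FailureEvent], None]) -> None:
        self._subscribers.append(callback)

    def notify(self, event: FailureEvent) -> None:
        # Forward the event to every subscriber right away, with no batching.
        for callback in self._subscribers:
            callback(event)


class AppManager:
    """Stand-in for the node application manager."""

    def __init__(self, placement: Dict[str, str]) -> None:
        # Maps a role ("ACT" or "SBY") to the resource hosting that role.
        self.placement = dict(placement)

    def on_failure(self, event: FailureEvent) -> None:
        # Switch over only when the failed resource hosts the ACT instance.
        if event.resource_id == self.placement["ACT"]:
            self.placement["ACT"], self.placement["SBY"] = (
                self.placement["SBY"],
                self.placement["ACT"],
            )


class ResourceMonitor:
    """Stand-in for the external resource monitor."""

    def __init__(self, notifier: Notifier, probe: Callable[[str], bool]) -> None:
        self.notifier = notifier
        self.probe = probe  # returns True while the resource is healthy

    def poll(self, resource_ids: Iterable[str]) -> None:
        for resource_id in resource_ids:
            if not self.probe(resource_id):
                self.notifier.notify(FailureEvent(resource_id, "compute"))


def demo(failed_hosts: set) -> Dict[str, str]:
    notifier = Notifier()
    manager = AppManager({"ACT": "host-1", "SBY": "host-2"})
    notifier.subscribe(manager.on_failure)
    monitor = ResourceMonitor(notifier, probe=lambda rid: rid not in failed_hosts)
    monitor.poll(["host-1", "host-2"])
    return manager.placement
```

Calling `demo({"host-1"})` simulates a failure of the resource hosting the ACT instance, after which the manager's placement has the two roles swapped. A failure of the SBY host alone leaves the placement unchanged in this sketch.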
OpenStack was not originally designed to notify an external entity about failures in the resource pool. Without our solution, the delay in notifying the application manager is on the order of several minutes. With such a long notification delay in a telecom scenario, thousands of mobile subscribers are disconnected from their cellular network. Their phones then try to reconnect simultaneously, imposing an extremely high processing load on the affected telecom node applications. With our solution, by contrast, failure notification completes within 1 second. In the demo, we show how well Doctor performs in failure detection and notification, and how it can therefore meet the high-availability requirements of telecom node applications in NFV-based virtualized network systems.
Get in touch
Ryota Mibu, Engineer, NEC