Lifting the lid on NFV: this is what’s really happening
How telecoms set out on a journey to specify the next set of network technologies, but ended up completely re-thinking the way networks are built and run
Back in 2012, the circulation of a white paper advocating a virtualization framework for network operators and authored by a tight group of top telco executives caused an instant ripple of excitement. Something like this was by then overdue. People in telecoms had been talking quietly of the wonders of SDN and the potential for virtualisation for several years. And they’d watched as the likes of Google and Amazon Web Services specified hundreds (nay thousands) of white box commodity servers to run at so-called Web scale with huge attendant cost-savings. The software-driven, white box approach was clearly working well for them, but the problem was that it was difficult to see how it might be grafted over to telecoms. Wouldn’t telcos be in danger of throwing the service quality baby out with the high cost-per-bit bathwater?
Most of all, Google, AWS & Co were running relatively closed systems. Easy for them to come up with a new approach, test it, specify the components, roll out a secret trial and then go for it by packing a huge data centre with white box servers.
Telecoms was different. You were dealing with end-to-end services. You had to interwork whatever you came up with with other telcos’ (and your own) networks and with vendors’ end-devices. You had to get broad agreement and standards fully agreed before you could move forward. It was difficult and it always took time.
Nevertheless the existential threat was clear. Unless the telecoms industry could get itself onto the same ‘Moore’s curve’ as Google, so that infrastructure commoditization and software-driven automation could drag the cost per network bit down far enough to at least stay in touch with bandwidth prices, then the future looked bleak. Eventually, the Webscalers would take the lot.
So where do we go from here?
The top line objective of ‘network functions virtualisation’ was, as laid out in the white paper, not technological but commercial: “to significantly reduce the capital and operational costs associated with deploying communications services” by exploiting virtualisation and software to drive higher levels of automation.
In addition, the founders threw in 'agility'. Once everything was being driven by software, it was reasoned, then operators could be much more nimble when it came to developing services. They would be able to differentiate their offerings, appeal to new constituencies and claw back ground they felt they had lost to so-called ‘over the top’ players.
So costs, both operational and capital, plus agility (the ability to change tack and launch new services) were to be the three legs of the stool upon which the whole proposition rested.
Any new technology development, certainly one for which the term ‘transformational’ is not an exaggeration, will almost immediately gather a host of Yaysayers and, as its implications become clearer, Naysayers.
One of the prime NFV movers, Don Clarke, explained to TelecomTV why the group decided to underhype the implications of its establishment. Don knew that many in the industry would be keen to climb on board. But he also knew that this would lead to over-expectation which would likely come back to bite the group at a later stage.
Avoiding the 'Peak of Inflated Expectations': why the group kept things low-key
<iframe src="https://www.youtube.com/embed/XWM4f_a5KzI?modestbranding=1&rel=0" width="970" height="546" frameborder="0" scrolling="auto" allowfullscreen></iframe>
It’s worth remembering at this point just how much attention the NFV idea attracted almost immediately. According to Don Clarke, ETSI had originally ‘dimensioned’ the ISG (the industry specification group formed to orchestrate the agenda outlined in the white paper) for 20 or so companies. Instead the group grew from seven founding operators to 37 operators, 245 member companies and 1,245 delegates subscribed to the mailing list, all in less than a year (there are now more than 290 companies).
The predominance of vendor numbers has led to a mistaken view, held in some quarters, that NFV is a vendor club of the old school - coming up with new classes of product to be championed and sold to telcos.
In fact the reverse is truer. Telcos very much set and have maintained the pace, but what has emerged isn’t one side leading the other, but a new set of relationships built around joint endeavour. As the authors of the software which runs the Internet might have it… you had to put the code first.
This is one important aspect of what NFV has developed into - a different way of advancing and refining the underlying technology where the old demarcation lines between what’s understood to be the roles of vendors and operators have been, if not obliterated, certainly redefined.
Breaking things into bits
To get NFV to work properly it pretty soon became apparent that there had to be a ‘decomposition’ of network functions. This is where you take what might currently be a large chunk of code, lovingly stitched together and running something important (like the IP Multimedia Subsystem - IMS) and break it up into smaller modules, each with its own API and each acting as a small but perfectly formed program in its own right. Then you put a string of microservices together to form your network services.
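The idea can be illustrated with a toy Python sketch (every name here is made up for illustration; no real IMS or vendor code looks like this): one ‘decomposed’ function is exposed as a tiny service with its own API, and any other module talks to it over that API rather than being compiled into the same monolith.

```python
# Toy illustration of "decomposition": one network function
# (here, a made-up subscriber lookup) exposed as a tiny service
# with its own HTTP API, instead of living inside a monolith.
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

SUBSCRIBERS = {"alice": {"status": "registered"}}  # stand-in data

class LookupHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        name = self.path.lstrip("/")
        body = json.dumps(SUBSCRIBERS.get(name, {})).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

def start_service():
    # Port 0 = pick any free port; the "service" runs in a thread
    server = HTTPServer(("127.0.0.1", 0), LookupHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

server = start_service()
port = server.server_address[1]
# Any other "microservice" composes this one via its API:
reply = json.loads(urlopen(f"http://127.0.0.1:{port}/alice").read())
print(reply)  # {'status': 'registered'}
server.shutdown()
```

The point of the sketch is the boundary, not the HTTP: the lookup module can now be tested, upgraded or restarted on its own, and other services only ever see its API.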
There are several reasons why this ‘decomposing’, ‘modularising’ or going ‘cloud native’, as it is also called, is a very good idea.
How cloud native is different from ‘trad’ vendor approaches
<iframe src="https://www.youtube.com/embed/HdyM-ZyEv1w?modestbranding=1&rel=0" width="970" height="546" frameborder="0" scrolling="auto" allowfullscreen></iframe>
Reuse of code is just a good idea in its own right. Instead of starting from scratch for each application, you get to redeploy modules that you’ve already tested and installed. Reuse also grants extra agility, since existing modules can easily be recombined into new services (with just a module change here and there). The promise is that new service introductions which might before have taken months of ‘waterfall development’ time (writing, testing, validating and so on) could theoretically be implemented in weeks or even days.
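A toy sketch of that reuse argument (all the module names below are hypothetical, invented purely for illustration): once small, tested modules exist, a ‘new service’ is mostly a recombination of them plus a little glue code.

```python
# Toy sketch: a "new service" assembled from existing modules.

# Existing, already-tested "modules" (hypothetical stand-ins):
def authenticate(user):
    return user in {"alice", "bob"}             # stand-in auth check

def route(destination):
    return f"path-to-{destination}"             # stand-in routing

def bill(user, units):
    return {"user": user, "charge": units * 2}  # stand-in billing

# The new service reuses all three modules; only the glue below
# is written from scratch:
def video_call_service(user, destination, minutes):
    if not authenticate(user):
        raise PermissionError(user)
    return {"route": route(destination), **bill(user, minutes)}

result = video_call_service("alice", "bob", minutes=3)
print(result)  # {'route': 'path-to-bob', 'user': 'alice', 'charge': 6}
```

Launching a second service that, say, bills differently would mean swapping one module, not rewriting the stack - which is the agility claim in miniature.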
Protecting legacy infrastructure
But decomposition hasn’t proved to be an unalloyed good for all the players. Vendors and many service providers have always pointed out that, whatever happens with NFV, neither side can simply write off the huge investment already made in legacy infrastructure.
So an essential part of the ‘art’ of NFV integration must be about getting old and new infrastructure interconnected and working side by side. But that is far easier said than done. To engineer the desired resiliency in transforming a software base into VNFs might mean reworking the legacy software - a time-consuming and expensive task.
Companies may have spent many years, perhaps more than a decade, writing their particular pieces of software and embedding all the features that their customers wanted. To start from scratch and write it all again in small modules would not only be hugely costly, but might take years to accomplish. Years that may not be available.
So an initial NFV approach was to copy what had happened in the IT environment by using a ‘hypervisor’ approach - essentially taking the shortcut of virtualizing the code’s original server and running the original application within that.
This approach might have worked in the early days of IT virtualisation, when people were consolidating all those old servers and applications. But today, NFV advocates claim, that shortcut can only be had at the cost of low speed and an inability to cope gracefully with infrastructure failures.
As Patrick points out, once it became apparent in the industry that NFV was the destination, network equipment vendors had two pressing objectives. They needed to protect their existing investments while at the same time supporting SDN and NFV as much as they could. For the most part the existing vendors had monolithic code bases (big chunks of code that ran inside a black box). The idea of starting from scratch and rewriting all their code in modules didn’t appeal greatly - after all, the white paper had promised them that part of NFV’s ‘win-win’ was that existing vendors would be able to keep the code they’d spent years developing.
But as time and experimentation ground on, it became apparent that monolithic code and decomposed modules just didn’t mix properly.
Interoperability deemed absolutely key
On their NFV journey telcos wanted, most of all, to avoid the vendor lock-in which had bedeviled them for several decades. So it was made clear to the vendors that they had to work together and ‘play nice’ with each other. Interoperability would be the major marker of success for them.
<iframe src="https://www.youtube.com/embed/rElEZAshwKM?modestbranding=1&rel=0" width="970" height="546" frameborder="0" scrolling="auto" allowfullscreen></iframe>
So, where are we now?
There has been much working together, latterly through open source groups, and by all reports huge progress is being made. But the scope of the transformation has broadened, and the number of open source groups - and the amount of collaboration they make necessary - has grown as well.
It’s all taking time, and that’s led to an upsurge in Naysaying. Early estimates of how long it might take before we saw the first major NFV rollouts have proved to be just that… optimistic. While there’s no doubt that the development and large-scale adoption of NFV by network operators is taking longer than its more optimistic proponents had hoped, it’s also the case that the more realistic proponents - the original seven authors and founding members behind the ETSI White Paper on NFV, published back in October 2012 - were fully aware that what they were proposing wasn’t just a new technology generation designed to help do the old things better, but a once-in-a-lifetime switch of approach. As such they were cagey about how long it might take. It would ‘probably’, they said - nobody could be sure - result in a difficult period of transition, with difficult moments along the way.
And it has.
What impediments are we now facing?
For several of the last five years there were persistent doubts about NFV’s ability to generate the crucial “five nines” of reliability in the network. The concern cropped up over and over again in our interviews and never seemed to resolve itself - in fact the crux of the problem seemed to relate to disaggregation, or rather the lack of it. Here’s why monolithic code can’t cut it.
One of the big advantages of ‘cloud native’ is the speed with which any one of the small modules can (to borrow a PC metaphor) switch itself off and then back on again in the unlikely event that it falls over. A big old chunk of legacy code, on the other hand, running as a virtual application on a bare metal server, takes considerably longer to get itself back into place should something go wrong.
It turns out, therefore, that cloud native software made up of microservices running in containers and properly distributed across the cloud can back up and self-repair so fast that service is resumed before the rest of the system is even aware that something has happened.
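The ‘off and back on again’ pattern can be sketched in a few lines of Python - a deliberately minimal stand-in for what real orchestrators such as Kubernetes do at scale: a supervisor watches a small worker and restarts it the instant it fails, so work resumes before anything downstream notices much of a gap.

```python
# Toy sketch of the self-repair model: a supervisor restarts a
# failed worker immediately, so work resumes after a brief gap.
import threading
import time

work_done = []                 # stands in for "packets handled"
crashed = threading.Event()    # lets us simulate exactly one crash

def worker():
    try:
        while True:
            work_done.append(1)
            if len(work_done) == 5 and not crashed.is_set():
                crashed.set()
                raise RuntimeError("simulated module failure")
            time.sleep(0.005)
    except RuntimeError:
        pass                   # worker dies; the supervisor will notice

def supervise(duration):
    restarts = 0
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    deadline = time.time() + duration
    while time.time() < deadline:
        if not t.is_alive():                       # failure detected...
            t = threading.Thread(target=worker, daemon=True)
            t.start()                              # ...restart at once
            restarts += 1
        time.sleep(0.01)
    return restarts

restarts = supervise(0.3)
print("restarts:", restarts, "work units:", len(work_done))
```

The worker ‘crashes’ once, the supervisor restarts it within one polling interval, and the work count keeps climbing - which is the claim in miniature: failure becomes a routine, fast-recovery event rather than an outage.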
According to Martin Taylor, CTO of Metaswitch, “if NFV is designed and distributed properly, no single fault of any kind, at any layer in the stack, including the loss of an entire data centre, will stop the service from running.”
<iframe src="https://www.youtube.com/embed/0aje-THBBP4?modestbranding=1&rel=0" width="970" height="546" frameborder="0" scrolling="auto" allowfullscreen></iframe>
<iframe src="https://www.youtube.com/embed/ZCoOUIDRHQ0?modestbranding=1&rel=0" width="970" height="546" frameborder="0" scrolling="auto" allowfullscreen></iframe>