Huawei launches AI Data Platform to address AI inference bottlenecks
Mar 2, 2026
As AI adoption accelerates, the industry's shift from model training to inference at scale is exposing bottlenecks in storage infrastructure. At the ongoing MWC Barcelona 2026, Yuan Yuan, President of Huawei Data Storage Product Line, launched the company’s AI Data Platform, designed to address the storage challenges of AI industrialisation, particularly around inference.
“Inference is where AI meets the real world,” said Yuan. “It is where models provide services, execute tasks and deliver actual value. That is why inference is the core of AI industrialisation.” He was delivering a speech titled AI Data Platform: Bridging Models and Business Value.
From model training to inference at scale
Over the last few years, the industry has focused on training large language models (LLMs), but as projects move from pilots to real-world deployments, value depends on how models perform in live environments, and delivering that performance is the job of inference.
The growing popularity of real-time applications such as chatbots, which require immediate results, is driving the transition from training to inference. Industries such as financial services, healthcare and retail need inference systems that deliver extremely low latency and high reliability, while emerging use cases such as industrial robots and autonomous driving depend on efficient reasoning, which inference makes possible.
According to McKinsey, inference is likely to become the dominant workload in AI data centres by 2030, representing over 50% of all AI compute. This demands a transformation of traditional storage solutions, which were not designed to meet the capacity and scale requirements of inference. Traffic to AI services can surge suddenly, for instance, and the storage infrastructure must absorb these spikes without degrading performance.
Inference also presents a fundamentally different set of challenges from training. While training workloads tolerate batch-processing delays, inference demands millisecond-level latency, high concurrency and consistent accuracy, straining storage throughput. Inference must operate with “zero errors, high availability and low latency,” placing unprecedented pressure on storage and data systems.
“Despite rapid AI advancement, most AI models have not been thoroughly integrated into carriers' core services. This is largely due to more focus on training than inference, which is the key to model adoption. Improved data processing is needed to address challenges like AI hallucinations, slow response, and throughput constraints in inference,” according to a Huawei press release.
While some providers are focusing on integrated AI data platforms that bring together compute, storage and orchestration, others are focusing on cloud-native architectures for multi-vendor environments. Huawei’s storage-centric approach is designed to address these performance bottlenecks and ease the transition from training to inference.
Three pillars: Knowledge, KV Cache and Memory
Huawei’s AI Data Platform is built around three pillars: knowledge, KV cache and memory, designed to improve inference efficiency, accuracy and continuity. The three pillars are supported by Huawei’s Unified Cache Manager (UCM).
The first pillar, knowledge generation and retrieval, converts multimodal data, including text, images and video, into high-precision knowledge using multimodal lossless parsing and token-level encoding. This grounds searches in factual data rather than hallucinated content, with Huawei claiming a retrieval accuracy of over 95%.
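Huawei has not published implementation details, but the retrieval pattern described here resembles standard vector search over parsed knowledge chunks. The Python sketch below is a minimal, hypothetical illustration of that pattern only; the hashing-based embed function and the sample chunks are stand-ins, not part of Huawei’s platform.

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy stand-in for a real embedding model: hash words into a unit vector."""
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# Knowledge base: parsed chunks (in practice, multimodal data converted to tokens).
chunks = [
    "OceanStor A800 is a high-performance storage array.",
    "A KV cache stores attention keys and values for reuse.",
    "A cache manager coordinates multi-level caching tiers.",
]
index = np.stack([embed(c) for c in chunks])

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (cosine similarity)."""
    scores = index @ embed(query)
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

print(retrieve("how does KV caching work?"))
```

Grounding an answer in retrieved chunks, rather than in the model’s parametric memory alone, is what reduces hallucinations; the accuracy figure then depends on how faithfully parsing and encoding preserve the source data.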
The second pillar focuses on the KV cache, a core mechanism in LLM inference that determines its latency and throughput. “Three-level KV cache eliminates repeated computing using query and can achieve 90% lower time to first token,” said Yuan. This is powered by UCM, which uses historical data to reduce redundant computing. With the KV cache held in inference storage, the system does not have to recompute from scratch, returning results faster. This is crucial with the emergence of agentic AI, which requires models to respond in real time.
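To see why caching cuts time to first token, consider the toy Python sketch below, which caches the (simulated) prefill result for a prompt so that repeated requests skip the expensive pass. It illustrates the general prefix-caching idea only; the function names and timings are hypothetical and bear no relation to UCM’s actual design.

```python
import time

# Hypothetical prefix cache; a three-level design would span HBM, DRAM and
# shared storage rather than a single in-process dict.
prefix_cache: dict[str, str] = {}

def prefill(prompt: str) -> str:
    """Stand-in for the expensive prefill pass that builds attention K/V tensors."""
    time.sleep(0.5)  # simulate compute cost proportional to prompt length
    return f"<KV tensors for {len(prompt)} chars>"

def first_token(prompt: str) -> str:
    """Serve a request, reusing cached K/V when the prompt was seen before."""
    kv = prefix_cache.get(prompt)
    if kv is None:                 # cache miss: pay the full prefill cost
        kv = prefill(prompt)
        prefix_cache[prompt] = kv  # persist so later requests skip prefill
    return f"first token (attending over {kv})"

long_prompt = "You are a helpful assistant. " * 50  # shared system prompt

start = time.perf_counter()
first_token(long_prompt)   # cold request: full prefill
cold = time.perf_counter() - start

start = time.perf_counter()
first_token(long_prompt)   # warm request: cache hit
warm = time.perf_counter() - start

print(f"time to first token: cold {cold:.2f}s, warm {warm:.4f}s")
```

The same principle applies per conversation turn: an agent that re-sends a growing transcript only pays prefill for the new tokens when the earlier context’s K/V entries are already cached.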
The third pillar addresses memory, enabling AI systems to capture, retain and continuously refine contextual information, allowing them to improve over time and reason more coherently across sessions. Huawei states that this approach can deliver more than 30% higher inference accuracy and up to tenfold improvements in agent task efficiency through a PB-scale shared memory architecture.
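The capture-retain-refine loop can be pictured with a minimal agent-memory sketch. Everything below, from the class name to the keyword-match recall, is a hypothetical simplification; a production system would distill and retrieve memories with models over PB-scale shared storage, not an in-process list.

```python
from collections import deque

class AgentMemory:
    """Minimal sketch: capture events, retain a window, distill long-term memory."""

    def __init__(self, window: int = 100):
        self.recent: deque[str] = deque(maxlen=window)  # short-term context
        self.distilled: list[str] = []                  # long-term "experience"

    def capture(self, event: str) -> None:
        self.recent.append(event)

    def refine(self) -> None:
        """Periodically solidify recent events into durable memory."""
        if self.recent:
            self.distilled.append("summary: " + "; ".join(self.recent))
            self.recent.clear()

    def recall(self, keyword: str) -> list[str]:
        return [m for m in self.distilled if keyword in m]

mem = AgentMemory()
mem.capture("user prefers low-latency inference")
mem.capture("deployment uses OceanStor Dorado")
mem.refine()
print(mem.recall("latency"))
```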
Significantly, Huawei emphasises the synergy between these elements: long-term experiential memory can solidify into structured knowledge; frequently accessed knowledge can, in turn, be pre-warmed into the KV cache; and unified memory and cache integration accelerates retrieval and reasoning. Huawei believes this sets the foundation for scalable, agent-centric AI.
Rethinking storage for the AI era
The platform is designed to support both integrated and decoupled deployment models. The integrated option uses OceanStor A800 to combine the three key technologies with compute and applications, delivering improved performance and scalability. The decoupled model, meanwhile, leverages OceanStor Dorado with AI Data Engine to upgrade existing infrastructure while protecting prior investments.
“Huawei will continue to deepen technological innovation and build a strong data foundation for carriers around the world. With the AI Data Platform serving as a bridge, we will transform model capabilities into real business value, accelerate the evolution of intelligent technologies, and work with all parties to embrace the future of intelligent computing,” said Yuan.
As AI scales from experimentation to industrial deployment, storage systems need to evolve from passive data repositories into computational, intelligent parts of the system. The competitive edge will no longer be defined by model size or the number of GPUs, but by how effectively organisations manage knowledge grounding, memory continuity and inference acceleration.