The Development History of RDMA

RDMA (Remote Direct Memory Access) is a high-speed network interconnect technology. Its core capability is to let one machine read and write another machine's memory directly, bypassing the operating system kernel and keeping the CPU out of the data path, which significantly reduces communication latency and CPU load. It has become a core enabling technology for high-performance computing (HPC), artificial intelligence, high-frequency trading in finance, and similar scenarios. Over more than two decades, RDMA has evolved from a niche, dedicated technology into a mainstream general-purpose solution, and Chinese vendors have achieved technological breakthroughs of their own. Its development can be divided into five key stages.

I. Late 1990s – 2000: Technology Germination

In the 1990s, as high-performance computing developed rapidly, the traditional TCP/IP protocol stack became a major bottleneck because of its repeated data copies and heavy CPU usage. Enterprises including IBM and HP explored remote memory access that does not involve the CPU, giving rise to the concept of RDMA. In 1999, the InfiniBand Trade Association (IBTA) was founded. In 2000, it released the InfiniBand specification with RDMA as a core feature, which marked the official launch of RDMA technology. At this stage, RDMA depended on the InfiniBand architecture and was used only in high-end HPC scenarios.

II. 2001 – 2010: Architecture Deepening and Route Expansion

During this period, the InfiniBand architecture was continuously improved. It achieved ultra-low latency (≤ 1 microsecond) and high throughput, which made it the preferred solution for high-end HPC. However, its dedicated hardware led to high deployment costs and a closed ecosystem. To break this technical monopoly, the industry explored running RDMA over Ethernet. The RDMA Consortium was established in 2002. In 2007, the IETF standardized iWARP, which runs RDMA over TCP/IP on conventional Ethernet and can therefore also be used across wide-area networks. Meanwhile, the OpenFabrics Alliance (OFA) built out the software ecosystem and released the OpenFabrics Enterprise Distribution (OFED) software stack, lowering the barrier to adopting RDMA.

III. 2011 – 2014: Ethernet Integration and Protocol Finalization

The core breakthrough of this stage was the RoCE (RDMA over Converged Ethernet) family of protocols, which integrated RDMA deeply with Ethernet. RoCEv1, released in 2010, runs at Ethernet Layer 2 and relies on PFC (Priority Flow Control) to keep the network lossless, but its traffic cannot cross subnets. RoCEv2, released in 2014, moved to Layer 3 by carrying RDMA traffic over UDP/IP, which supports routing across subnets and improved congestion control. It struck a balance between performance and cost and became the mainstream protocol for data centers. At this point, three distinct technical routes had formed: InfiniBand, RoCE and iWARP.
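The difference between the two RoCE versions is easiest to see in how each one wraps an RDMA packet. The following Python sketch is only an illustration. It lists the headers of each version using the standard minimum header sizes (no VLAN tag, IPv4 without options, no extended transport headers, all of which are assumptions of this sketch). The names `LAYERS`, `overhead_bytes`, `is_routable` and `efficiency` are hypothetical helpers, not part of any RDMA library.

```python
# Per-packet framing of RoCEv1 and RoCEv2, as (layer name, size in bytes).
# Sizes are the standard minimum header sizes; optional fields are ignored.
ROCE_V1_ETHERTYPE = 0x8915   # EtherType that identifies RoCEv1 frames
ROCE_V2_UDP_PORT = 4791      # UDP destination port that identifies RoCEv2

LAYERS = {
    "RoCEv1": [
        ("Ethernet", 14),  # MAC addresses + EtherType
        ("GRH", 40),       # InfiniBand Global Route Header
        ("BTH", 12),       # InfiniBand Base Transport Header
        ("ICRC", 4),       # InfiniBand invariant CRC
        ("FCS", 4),        # Ethernet frame check sequence
    ],
    "RoCEv2": [
        ("Ethernet", 14),
        ("IPv4", 20),      # replaces the GRH and makes the packet routable
        ("UDP", 8),
        ("BTH", 12),
        ("ICRC", 4),
        ("FCS", 4),
    ],
}


def overhead_bytes(version: str) -> int:
    """Total framing bytes added around the RDMA payload."""
    return sum(size for _, size in LAYERS[version])


def is_routable(version: str) -> bool:
    """A packet can cross subnets only if it carries an IP header."""
    return any(name in ("IPv4", "IPv6") for name, _ in LAYERS[version])


def efficiency(version: str, payload: int) -> float:
    """Fraction of the bytes on the wire that are RDMA payload."""
    return payload / (payload + overhead_bytes(version))


for version in LAYERS:
    print(version, overhead_bytes(version), "bytes of overhead, routable:",
          is_routable(version))
# RoCEv1 74 bytes of overhead, routable: False
# RoCEv2 62 bytes of overhead, routable: True
```

Under these assumptions the per-packet overhead of the two versions is similar. What changes is the IP/UDP encapsulation, which lets ordinary Layer 3 routers forward RoCEv2 traffic.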

IV. 2015 – 2025: Scenario Expansion and Ecosystem Improvement

With the rise of cloud computing and artificial intelligence, RDMA expanded from HPC into general data centers. It now supports core scenarios such as gradient synchronization in GPU clusters and the separation of compute and storage in intelligent computing centers, as well as high-frequency trading in finance and distributed databases. On the technology side, InfiniBand bandwidth was upgraded to 400G and 800G, and RoCEv2 was continuously optimized. On the ecosystem side, Chinese manufacturers such as Huawei and ZTE broke overseas technological monopolies and improved software compatibility. The […] issued in 2025 further promoted the popularization of RDMA technology.

V. 2025 – Present: Independent Control and Large-Scale Implementation

Previously, core RDMA technologies were monopolized by overseas enterprises. Since 2025, Chinese enterprises have accelerated independent R&D of RDMA technologies. In 2026, the Sugon scaleX ten-thousand-card super cluster was launched. It adopts the independently developed scaleFabric native RDMA technology, with fully self-developed core IP and chips, and delivers significant advantages in performance and cost. Huawei, ZTE and other enterprises have also made breakthroughs in RoCEv2 and in domestic native InfiniBand-like routes. The continued maturing of the domestic industrial ecosystem has advanced the independent and controllable development of domestic computing power.

VI. Development Summary and Future Trends

The main line of RDMA's development runs from closed and dedicated to open and general-purpose, from overseas monopoly to independent domestic control, and from a single scenario to the integration of many. In the future, RDMA is expected to keep improving in performance, expand into edge computing and other emerging fields, further strengthen the independent domestic ecosystem, and provide strong support for the construction of Digital China and the computing power internet.

VII. Recommended Products for RDMA Networking

Building on the core advantages of RDMA technology (low latency, high throughput and lossless transmission), UNIPOE recommends a full lineup of high-performance switches. They are compatible with the mainstream RoCEv2 protocol and suit diverse application scenarios such as intelligent computing centers, cloud data centers and high-frequency trading. These products cover end-to-end RDMA networking across the 400G core backbone, 100G aggregation and 25G/10G access layers, supporting large-scale deployment of the RDMA ecosystem.

1. AES8001 (32*400G QSFP-DD + 2*10GE SFP+)

Designed for the core backbone layer of high-end intelligent computing centers and supercomputer clusters, this model targets 400G-level high-throughput RDMA transmission. It supports the complete RoCEv2 protocol stack and implements PFC and ECN lossless flow control in hardware, which suppresses packet loss and congestion jitter during RDMA transmission. It meets the ultra-high bandwidth and ultra-low latency demands of large AI model training and parallel computing on supercomputers. The 32 high-density 400G QSFP-DD ports support RDMA interconnection of large-scale GPU clusters, while the 2*10GE SFP+ ports connect flexibly to management networks and low-speed service nodes. It can also be used with the new generation of domestic native RDMA networking architectures, serving as the core networking device for ten-thousand-card super clusters and large-scale computing centers.

2. AES6201 (32*100G QSFP28)

Positioned at the aggregation layer of data centers and the core layer of small and medium-sized intelligent computing centers, this is a benchmark device for mainstream 100G RDMA networking. Its 100G ports run at full wire speed and efficiently carry RDMA traffic for distributed databases, cloud deployments that separate compute and storage, and small and medium-sized AI training clusters, and it is compatible with both iWARP and RoCE. The device optimizes RDMA packet forwarding and greatly reduces CPU scheduling overhead, in keeping with RDMA's core feature of bypassing the kernel for efficient transmission. Its high-density 100G port design supports horizontal expansion of multi-node server clusters, making it suitable for mid-to-high-end HPC scenarios and for data center networks in high-frequency trading that need a balance of performance and cost-effectiveness.

3. AES6102/AES6104 (48*25G SFP28 + 8*100G QSFP28)

Tailored for the access layer of data centers and for RDMA networking at edge computing nodes, this model fits the classic hierarchical architecture of 25G server access with 100G uplink aggregation. The 48*25G SFP28 access ports connect RDMA-enabled server nodes in bulk, meeting the low-latency interconnection requirements of large numbers of end devices. The 8*100G uplink ports connect to core-layer 100G/400G devices (800G of uplink bandwidth against 1,200G of access bandwidth, a 1.5:1 ratio) and support end-to-end lossless RDMA transmission. With buffer visualization and tuning and intelligent congestion control, it suits RDMA services for edge computing power and for small and medium-sized distributed storage clusters. It addresses the high latency and high packet loss of traditional networks and supports the large-scale rollout of RDMA in general data centers.

4. AES6006 (48*10G Base-T + 6*100G QSFP28)

Focused on flexible access and cost-effective RDMA networking, this model is suited to general-purpose data centers and campus computing centers. The 48*10G Base-T electrical ports connect directly to ordinary servers and terminals, so RDMA can be deployed without optical modules, which greatly reduces cabling and hardware retrofit costs. The 6*100G QSFP28 ports provide ample uplink bandwidth for concurrent RDMA communication among many terminals. Compatible with the full range of RDMA protocols and supporting rapid deployment of lossless networks, it is suitable for cost-sensitive, large-scale RDMA networking and helps bring RDMA from high-end scenarios to general commercial use.
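When choosing between the two access-layer models, a quick check is the ratio of access bandwidth to uplink bandwidth. The Python sketch below computes it from the port counts in the product headings. It assumes that every access port is driven at line rate and that all 100G ports are used as uplinks; `ACCESS_MODELS` and `oversubscription` are illustrative names, not vendor tooling.

```python
from fractions import Fraction

# (access ports, Gbit/s per access port, uplink ports, Gbit/s per uplink port),
# taken from the port counts in the product headings.
ACCESS_MODELS = {
    "AES6102/AES6104": (48, 25, 8, 100),
    "AES6006": (48, 10, 6, 100),
}


def oversubscription(access_ports: int, access_gbps: int,
                     uplink_ports: int, uplink_gbps: int) -> Fraction:
    """Ratio of total access bandwidth to total uplink bandwidth.

    A ratio of 1 or less means the uplinks can carry all access traffic
    even when every access port sends at line rate.
    """
    access_total = access_ports * access_gbps
    uplink_total = uplink_ports * uplink_gbps
    return Fraction(access_total, uplink_total)


for model, ports in ACCESS_MODELS.items():
    ratio = oversubscription(*ports)
    print(f"{model}: {ratio.numerator}:{ratio.denominator} ({float(ratio):.2f})")
# AES6102/AES6104: 3:2 (1.50)
# AES6006: 4:5 (0.80)
```

A ratio above 1 is common at the access layer, but for RDMA workloads it means congestion control (PFC and ECN) has to absorb bursts when many servers send toward the core at the same time.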
