RDMA Development Roadmap
Supporting RDMA in Corundum, in the form of RoCEv2, requires implementing many separate pieces of functionality, several of which require architectural changes to Corundum. Development is therefore split into phases so that the work can be built and integrated incrementally.
Subsystem pages:
- Subsystem: Ingress and egress pipelines
- Subsystem: RDMA header processing
- Subsystem: RDMA memory management
- Subsystem: RDMA QP state management
- Subsystem: RDMA transport
Phase 1
Phase 1 of the development process includes all of the foundational components upon which RDMA support will be built, including all significant architectural changes.
- Variable-length descriptor support (see RFC: Variable length descriptors)
- Unified DMA address space (including on-card DRAM and/or on-package HBM)
- Shared interface datapath (including support for PFC)
- Application section
- Scheduler section
- Ingress and egress pipeline components (self-contained checksum offload and flow steering)
- Generic PCIe interface/Intel device support
- Segmentation offloading (LSO)
Phase 2
Phase 2 of the development process includes initial support for RoCEv2, with all state stored in on-FPGA memory. Restricting storage to on-FPGA SRAM limits scalability, but it simplifies the implementation, and a self-contained design can be useful for performance reasons (no cache misses if there is no cache) or in embedded applications (limited or no DRAM).
- Queue pair state storage (transport state information)
- Protection domain, memory region, and address translation
- RoCEv2 packet parsing and deparsing
- ICRC computation and verification
- RoCEv2 transport implementation
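As an illustration of the packet parsing and deparsing item above, the following is a minimal software sketch of the 12-byte InfiniBand Base Transport Header (BTH) that follows the UDP header in a RoCEv2 packet. It is a Python model for reasoning about field layout, not Corundum code; the names `BTH`, `parse_bth`, and `build_bth` are hypothetical, and the sketch assumes the reserved bits are zero (on RoCEv2 the reserved byte before the destination QP also carries the FECN/BECN bits, which are ignored here).

```python
import struct
from dataclasses import dataclass

ROCEV2_UDP_DPORT = 4791  # IANA-assigned UDP destination port for RoCEv2
BTH_LEN = 12             # the BTH is always 12 bytes


@dataclass
class BTH:
    """Hypothetical container for the BTH fields (illustration only)."""
    opcode: int            # 8 bits: transport type and operation
    solicited_event: bool  # SE bit
    mig_req: bool          # M bit
    pad_count: int         # 2 bits: payload pad bytes (0..3)
    tver: int              # 4 bits: transport header version
    pkey: int              # 16 bits: partition key
    dest_qp: int           # 24 bits: destination queue pair number
    ack_req: bool          # A bit
    psn: int               # 24 bits: packet sequence number


def parse_bth(data: bytes) -> BTH:
    """Parse the first 12 bytes of `data` as a BTH (network byte order)."""
    if len(data) < BTH_LEN:
        raise ValueError("truncated BTH")
    opcode, flags, pkey, qp_word, psn_word = struct.unpack(
        "!BBHII", data[:BTH_LEN])
    return BTH(
        opcode=opcode,
        solicited_event=bool(flags & 0x80),
        mig_req=bool(flags & 0x40),
        pad_count=(flags >> 4) & 0x3,
        tver=flags & 0xF,
        pkey=pkey,
        dest_qp=qp_word & 0xFFFFFF,    # upper 8 bits are reserved/FECN/BECN
        ack_req=bool(psn_word & 0x80000000),
        psn=psn_word & 0xFFFFFF,       # 7 reserved bits sit between A and PSN
    )


def build_bth(h: BTH) -> bytes:
    """Serialize a BTH; reserved bits are written as zero (an assumption)."""
    flags = ((int(h.solicited_event) << 7) | (int(h.mig_req) << 6)
             | ((h.pad_count & 0x3) << 4) | (h.tver & 0xF))
    qp_word = h.dest_qp & 0xFFFFFF
    psn_word = (int(h.ack_req) << 31) | (h.psn & 0xFFFFFF)
    return struct.pack("!BBHII", h.opcode, flags, h.pkey, qp_word, psn_word)
```

With reserved bits zero, `parse_bth(build_bth(h)) == h` holds for any in-range field values. A hardware parser would extract the same fields from the datapath in a fixed number of cycles; the sketch only pins down the bit positions.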
Phase 3
Phase 3 of the development process includes more scalable implementations of the phase 2 components, storing state in on-host or on-card DRAM and caching state in on-FPGA memory. These changes will enable support for a large number of queues (QPs/CQs/SRQs) as well as more and larger memory regions.
- DRAM-backed caching of queue pair state information
- DRAM-backed caching of memory region information
- DRAM-backed caching of address translation information
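The caching arrangement described above can be sketched in software as a small cache in front of a larger backing store. The model below is a hedged illustration only: it assumes a direct-mapped, write-back policy indexed by queue pair number, which is one possible design and not necessarily the one Corundum will use. The class name `QPStateCache` and the `psn` field are hypothetical; the `backing` dictionary stands in for host or on-card DRAM, and the `lines` array stands in for on-FPGA memory.

```python
class QPStateCache:
    """Direct-mapped, write-back cache model for per-QP state (sketch)."""

    def __init__(self, backing: dict, num_lines: int = 4):
        self.backing = backing            # stands in for host or on-card DRAM
        self.num_lines = num_lines        # stands in for on-FPGA SRAM capacity
        self.lines = [None] * num_lines   # each line: [qpn, state dict, dirty]
        self.hits = 0
        self.misses = 0

    def _lookup(self, qpn: int) -> list:
        idx = qpn % self.num_lines        # direct-mapped: one candidate line
        line = self.lines[idx]
        if line is not None and line[0] == qpn:
            self.hits += 1
            return line
        self.misses += 1
        # Evict the current occupant, writing it back only if modified.
        if line is not None and line[2]:
            self.backing[line[0]] = dict(line[1])
        # Fill the line from the backing store (empty state if unknown QP).
        line = [qpn, dict(self.backing.get(qpn, {})), False]
        self.lines[idx] = line
        return line

    def read(self, qpn: int) -> dict:
        """Return a copy of the state for `qpn`, filling the cache on a miss."""
        return dict(self._lookup(qpn)[1])

    def write(self, qpn: int, **fields) -> None:
        """Update fields in the cached state and mark the line dirty."""
        line = self._lookup(qpn)
        line[1].update(fields)
        line[2] = True

    def flush(self) -> None:
        """Write all dirty lines back to the backing store."""
        for line in self.lines:
            if line is not None and line[2]:
                self.backing[line[0]] = dict(line[1])
                line[2] = False
```

In this model, two QPs whose numbers map to the same line (for example 1 and 5 with four lines) evict each other, and each eviction of a modified entry costs a write to the backing store. That is the scalability trade-off relative to phase 2: the number of QPs is bounded by DRAM rather than SRAM, at the price of miss latency on the state lookup.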