Introduction
The advancement of the East Data, West Computing project has created a new development landscape for data centers and infrastructure. The traditional data center architecture, which integrates computing and storage, faces multiple challenges. The new storage-compute separation architecture stands out for its flexible structure, fine-grained resource utilization, and green, low-carbon energy efficiency. It must now face both opportunities and challenges.
Text
The continuous development of digitalization is an important driving force behind the progress of IT infrastructure such as computing and storage. The cloud and internet industries have built the largest IT infrastructure platforms in China, and they store and process the largest share of data. It is estimated that by 2025 China will have 300 EFLOPS of computing power and a data volume of 48.6 ZB [1]. At the same time, the continued advancement of China's East Data, West Computing project places higher requirements on data centers to be green, intensive, and independently controllable.
The traditional big data solution that integrates computing and storage is represented by server-based hyper-converged systems, which manage server resources in a unified manner. However, when computing and storage requirements grow at different rates, such systems suffer from inflexible expansion and low utilization. Storage-compute separation divides storage and computing resources into independent modules, which offers significant advantages in sharing storage resources efficiently. It has already been applied in multiple scenarios, where it gives storage systems the benefits of data sharing and flexible scaling.
In the cloud and internet industries, the storage domain mainly adopts an integrated approach in which distributed storage services are deployed on servers. This approach faces the following challenges:
1. The data retention period does not match the server refresh cycle. The large volumes of data generated by emerging businesses must be retained according to their lifecycle policies (for example, 8 to 10 years), whereas the replacement cycle of a server-based storage system is determined by the processor upgrade cycle (for example, 3 to 5 years) [2]. The large gap between the two wastes system resources and increases the risk of data loss during migration. For example, server components in the storage domain are retired along with CPU upgrades, which forces data migration [3].
2. Reliability, performance, and resource utilization are hard to achieve together. Distributed storage systems can be divided into performance-oriented and capacity-oriented types. Performance-oriented storage runs key services such as databases. It usually keeps three replicas in combination with a redundant array of independent disks (RAID), so space utilization is only about 30%, which wastes a great deal of storage resources. Capacity-oriented systems use erasure coding (EC) to improve utilization, but EC consumes substantial system resources during computation and its reconstruction efficiency is low, which introduces risk (as shown in Figure 1).
Figure 1 Distributed storage resource utilization
3. New distributed applications require simple and efficient shared storage. New distributed applications, represented by serverless applications, are constantly emerging. Applications are expanding from stateless to stateful, and shared data access is increasing. At the same time, applications such as artificial intelligence require large amounts of heterogeneous computing power to work together, which creates demand for shared memory access. These applications care most about high bandwidth and low latency, and need only lightweight shared storage without complex enterprise features.
4. The data center tax lowers the efficiency of data-intensive applications. The "data center tax" (datacenter tax) that CPU-centric server architectures and applications pay to obtain data keeps increasing. For example, when the CPU handles storage I/O requests, 30% of its computing power is consumed [4].
To sum up, cloud and internet storage must satisfy the requirements of both resource utilization and reliability, which calls for a new storage-compute separation architecture built on new software and hardware technologies.
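The utilization gap described in challenge 2 can be checked with simple arithmetic. The sketch below is illustrative only: the function names and the specific layouts (RAID with 9 data disks and 1 parity disk, EC 4+2, EC 22+2) are assumptions chosen for the example, not configurations given in this article. It compares the usable fraction of raw capacity for replication layered on RAID with that of erasure coding.

```python
from fractions import Fraction


def replica_utilization(replicas: int, raid_data: int = 1, raid_parity: int = 0) -> Fraction:
    """Usable fraction of raw capacity for N-way replication on top of RAID.

    raid_data / raid_parity describe a hypothetical RAID group layout;
    the defaults (1, 0) mean "no RAID underneath".
    """
    raid_efficiency = Fraction(raid_data, raid_data + raid_parity)
    return raid_efficiency / replicas


def ec_utilization(data_shards: int, parity_shards: int) -> Fraction:
    """Usable fraction of raw capacity for a k+m erasure code."""
    return Fraction(data_shards, data_shards + parity_shards)


if __name__ == "__main__":
    # Assumed layouts, for illustration only.
    layouts = {
        "3 replicas on RAID 9+1": replica_utilization(3, raid_data=9, raid_parity=1),
        "3 replicas, no RAID": replica_utilization(3),
        "EC 4+2": ec_utilization(4, 2),
        "EC 22+2 (large ratio)": ec_utilization(22, 2),
    }
    for name, fraction in layouts.items():
        print(f"{name}: {float(fraction):.1%} usable")
```

Under these assumed layouts, three replicas on a 9+1 RAID group leave exactly 3/10 of the raw capacity usable, which is in line with the roughly 30% figure quoted above, while a wider erasure code leaves a much larger share usable at the cost of more computation during encoding and reconstruction.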
As data centers face these challenges in capacity utilization and storage efficiency, the rapid development of dedicated data processors and new networks provides a technical basis for rebuilding data center infrastructure.
First, to replace the server's local disks, many manufacturers have introduced high-performance EBOF (Ethernet Bunch of Flash) disk enclosures, which focus on new data access standards such as NVMe over Fabrics (NoF) to provide high-performance storage.
Second, a growing number of DPU and IPU chips have emerged in the industry to take over work from general-purpose processors and improve the energy efficiency of computing power. At the same time, network-storage collaboration based on programmable switches and programmable NICs is also an active research topic, for example NetCache [5] and KV-Direct [6].
Finally, data access network standards also continue to improve. For example, the CXL protocol has strengthened its memory pooling capabilities.
With the development of new hardware technologies such as RDMA, CXL, and NVMe SSDs, a new storage-compute separation architecture needs to be built so that cloud and internet storage-domain services can meet the requirements of both resource utilization and reliability. The new architecture differs from the traditional one in two ways. First, storage and computing are completely decoupled to form separate hardware resource pools. Second, work is divided at a finer granularity: data-processing tasks that CPUs handle poorly are taken over by dedicated accelerators, which yields the best overall energy efficiency (as shown on the right side of Figure 2).
Figure 2 Comparison between the traditional storage-compute separation architecture and the new storage-compute separation architecture
The new architecture has the following features:
1. Diskless servers. The new storage-compute separation architecture moves the server's local disks out to a remote storage pool, leaving a diskless server, and also extends local memory through a remote memory pool. This achieves true decoupling of storage and computing, greatly improves storage resource utilization, and reduces data migration.
2. Diversified network protocols. The network protocol between computing and storage expands from today's IP or Fibre Channel to a combination of CXL, NoF, and IP. CXL reduces network latency to the sub-microsecond level, which enables pooling of memory-type media, and NoF accelerates SSD pooling. The high-throughput network built from this combination of protocols meets the access requirements of the various pools.
3. Dedicated data processors. Data storage tasks are no longer handled by the general-purpose processor; they are offloaded to a dedicated data processor. In addition, specific data operations such as erasure coding can be further accelerated by dedicated hardware accelerators.
4. Storage systems with extremely high storage density. The separated storage system is an important component of the new architecture. As the foundation for persistent data, it merges the two levels of space management that exist today (system level and disk level) and uses large-ratio erasure coding to reduce the proportion of redundancy overhead. In addition, scenario-based data reduction accelerated by dedicated chips can free up more usable space.
The new storage-compute separation architecture aims to solve the major problems and challenges of the traditional architecture. Resources are fully decoupled and pooled, then reorganized and integrated into three simplified layers: the storage module, the bus network, and the computing power module.
1. Storage module:
Cloud and internet services fall into three typical application scenarios (as shown in Figure 3). In the first scenario, for virtualization services, the local disks of storage-domain servers in the data center are moved out directly to the storage module. In the second scenario, services that process extremely hot data, such as big data services, are given large memory and a key-value interface to accelerate data processing. The third scenario covers new business scenarios such as containers: the storage module directly provides file semantics for distributed applications such as Ceph, and it supports tiering warm data down to colder HDD-based storage modules such as EBOD to improve storage efficiency.
Figure 3 Three typical application scenarios of the storage module
In the new storage-compute separation architecture, storage modules mainly take the form of new disk enclosures such as EBOF and EBOM. Traditional storage capabilities such as EC and compression sink into these enclosures, forming a "the disk is the storage" large-disk technology that provides standard services such as block and file storage over high-speed shared networks such as NoF.
In terms of internal structure, the media layer can consist of standard hard disks or of granular large disks integrated with wafer-level technology, and integrating disk and enclosure minimizes cost. On this basis, the storage module needs a pooling subsystem that pools local media using reliable redundancy technologies such as RAID and EC, and further raises usable capacity with technologies such as deduplication and compression. To support high-throughput data scheduling in the new architecture, a more efficient data path is required, and fast data access paths are usually built on technologies such as hardware pass-through. Compared with traditional arrays, this avoids the inefficient interleaving of user data and control data (such as metadata), drops the complex feature processing (such as replication) of traditional storage arrays, and shortens the I/O processing path, which delivers high throughput and low latency.
As a new, intensive, and compact form of storage with extreme capacity density, storage modules accelerate the move to diskless servers and effectively support the evolution of traditional data center architectures toward a new storage-compute separation architecture with a minimal number of layers.
2. Computing power module: Moore's Law is slowing, and further growth in computing power in the next stage depends on dedicated processors. Once dedicated processors are introduced, pooling computing power becomes the inevitable choice; otherwise, if heterogeneous computing cards are installed in every server, power consumption is high and resource utilization is very low. Dedicated data processors such as DPUs offer distinct advantages, including low cost, low power consumption, and plug-and-play deployment, while ensuring normal business operation and service quality.
3. High-throughput data bus: Over the past 10 years, 10-gigabit IP networks have enabled HDD pooling, and access protocols that support block and file sharing have been developed on top of IP networks. Today, for hot data processing, NVMe/RoCE is driving SSD pooling. In addition, NVMe has developed rapidly and has begun to absorb siloed ("chimney") protocols. Next, for extremely hot data processing, memory networks such as CXL will drive memory resource pooling (as shown in Figure 4).
Figure 4 Timeline of network technology development
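The pairing of data temperature, media pool, and network described above can be written down as a small lookup table. This is a minimal sketch under stated assumptions: the `Tier` and `Pool` types, the `select_pool` helper, and the access-rate thresholds are all hypothetical and exist only to illustrate the mapping; the article does not define numeric boundaries between hot and extremely hot data.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    EXTREMELY_HOT = "extremely hot"
    HOT = "hot"
    WARM = "warm"


@dataclass(frozen=True)
class Pool:
    media: str   # pooled medium
    fabric: str  # network that makes pooling of this medium practical


# Mapping taken from the text: IP pools HDDs, NVMe/RoCE pools SSDs,
# and memory networks such as CXL pool memory.
POOL_BY_TIER = {
    Tier.EXTREMELY_HOT: Pool(media="memory", fabric="CXL"),
    Tier.HOT: Pool(media="SSD", fabric="NVMe/RoCE"),
    Tier.WARM: Pool(media="HDD", fabric="IP"),
}

# Hypothetical thresholds (accesses per hour), chosen only for the example.
EXTREMELY_HOT_THRESHOLD = 10_000
HOT_THRESHOLD = 100


def classify(accesses_per_hour: float) -> Tier:
    """Assign a data temperature from an access rate (assumed heuristic)."""
    if accesses_per_hour >= EXTREMELY_HOT_THRESHOLD:
        return Tier.EXTREMELY_HOT
    if accesses_per_hour >= HOT_THRESHOLD:
        return Tier.HOT
    return Tier.WARM


def select_pool(accesses_per_hour: float) -> Pool:
    """Pick the media pool and fabric for data with the given access rate."""
    return POOL_BY_TIER[classify(accesses_per_hour)]


if __name__ == "__main__":
    for rate in (50_000, 500, 1):
        pool = select_pool(rate)
        print(f"{rate} accesses/hour -> {pool.media} pool over {pool.fabric}")
```

A real placement policy would likely weigh latency targets, capacity, and cost rather than a single access-rate threshold; the table only captures the pairing of media and network that the timeline describes.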
The new storage-compute separation architecture changes how hardware resources are combined and gives rise to a series of key technologies, such as scenario-based data reduction and high-throughput hyper-converged networks.
1. Scenario-based data reduction: In the new storage-compute separation architecture, data reduction capabilities sink into the storage module. Splitting reduction into front-end and back-end tasks lowers the impact on performance and improves the reduction ratio. In addition, different reduction techniques can be applied according to the data characteristics of each scenario.
2. High-throughput hyper-converged network: Depending on the deployment scenario and on diverse business requirements for network agility and adaptability, a combination of CXL Fabric, NoF, and IP can be selected for networking among modules. Two key technologies need to be considered. First, the network connection mode can be direct or pooled. In direct mode, NIC resources are used exclusively by one device; in pooled mode, NIC resources are pooled and shared by multiple devices, which is more economical. Second, cross-rack communication usually relies on RDMA. The number of connections that traditional RDMA supports is limited, so the scalability problem of large-scale interconnection must be solved. For example, connectionless techniques can decouple connection state from network applications and support connections on the scale of tens of thousands.
3. Network-storage collaboration: Smart NICs and DPUs are the points where data enters and leaves the server. Making full use of their hardware to offload NoF, compression, and other acceleration functions, and coordinating task scheduling between the host and the DPU, reduces the host's data processing overhead and improves I/O efficiency. Programmable switches are the data exchange hubs between servers and storage and occupy a special position in the system. Their programmability, combined with the centralized position and high performance of the switch, enables efficient collaborative data processing.
4. Disk-storage collaboration: Deep collaboration between media and controller chips yields the best end-to-end TCO and efficiency. Taking redundancy design as an example, the new storage module integrates the media granules directly and builds only a single large-ratio EC pool at the enclosure level, assisted by offload acceleration on dedicated chips. This simplifies the original multi-layer redundancy design (within the disk, within the enclosure, and so on) and effectively improves resource utilization.
Finally, the new storage module is built on dedicated chips. In addition to traditional I/O interfaces, it provides bypass interfaces that accelerate metadata access by bypassing the heavy I/O stack, and it improves parallel access through remote memory access.
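The idea of scenario-based data reduction from item 1 above can be illustrated with a small software sketch. It assumes fixed-size block deduplication followed by optional per-block compression; the scenario names, block sizes, and policy table are hypothetical, and a real storage module would run reduction on dedicated chips rather than in Python.

```python
import hashlib
import zlib

# Hypothetical per-scenario policies: (block size in bytes, compress after dedup?).
SCENARIO_POLICY = {
    "virtual_machine_images": (4096, True),   # many repeated blocks, compressible
    "pre_compressed_media": (65536, False),   # compressing again rarely helps
}


def reduce_blocks(data: bytes, block_size: int = 4096, compress: bool = True):
    """Deduplicate fixed-size blocks, then optionally compress each unique block.

    Returns (store, recipe): store maps a block digest to its stored bytes,
    and recipe lists the digests in original order so data can be rebuilt.
    """
    store: dict[str, bytes] = {}
    recipe: list[str] = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:  # keep only the first copy of each block
            store[digest] = zlib.compress(block) if compress else block
        recipe.append(digest)
    return store, recipe


def restore(store: dict, recipe: list, compressed: bool = True) -> bytes:
    """Rebuild the original data from the block store and recipe."""
    return b"".join(
        zlib.decompress(store[d]) if compressed else store[d] for d in recipe
    )


def reduction_ratio(data: bytes, store: dict) -> float:
    """Logical size divided by stored size (higher is better)."""
    stored = sum(len(block) for block in store.values())
    return len(data) / stored if stored else 1.0


if __name__ == "__main__":
    block_size, compress = SCENARIO_POLICY["virtual_machine_images"]
    sample = b"A" * block_size * 8 + bytes(range(256)) * 16
    store, recipe = reduce_blocks(sample, block_size, compress)
    assert restore(store, recipe, compress) == sample
    print(f"{len(recipe)} blocks, {len(store)} unique, "
          f"ratio {reduction_ratio(sample, store):.1f}x")
```

Choosing the policy per scenario is the point of the sketch: highly repetitive data benefits from small blocks and compression, while data that is already compressed gains little from a second pass and can skip it.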
Under the national drive for East Data, West Computing and for energy conservation and emission reduction, the new storage-compute separation architecture is set to become a focus of attention. Building such a system also faces many technical challenges, which experts in various fields need to explore and solve together.
The data access interfaces and standards between computing and storage mainly follow a "master-slave" request-response model and mainly carry block storage semantics. However, with the rapid development of memory-type disks and of the heterogeneous computing power of smart NICs, these interfaces fall short in expressing memory-access semantics and storage-compute collaboration semantics.
How to integrate with the existing ecosystem and unlock the potential of infrastructure built on the new architecture still needs further exploration. For example, how to make full use of new data processors and global shared storage systems, and how to design a more efficient application service framework, are long-term and demanding tasks.
References
[1] David Reinsel, Wu Lianfeng, John F. Gantz, John Rydning. IDC: China will have the largest data circle in the world in 2025[OL]. http://www.d1net.com/uploadfile/2019/0214/20190214023650515.pdf, January 2019.
[2] Bozman J S, Broderick K. IDC: Server Refresh: Meeting the Changing Needs of Enterprise IT with Hardware/Software Optimization[OL]. https://www.oracle.com/us/corporate/analystreports/corporate/idc-server-refresh-359223.pdf, July 2010.
[3] Zhang T, Zuck A, Porter D E, et al. Flash Drive Lifespan is a Problem[C]//Proc. of the 16th Workshop on Hot Topics in Operating Systems (HotOS 2017), 2017:42-49.
[4] Kanev S, Darago J, Hazelwood K, et al. Profiling a warehouse-scale computer[C]//The 42nd International Symposium on Computer Architecture (ISCA 2015), 2015:158-169.
[5] Jin X, Li X, Zhang H, et al. NetCache: Balancing Key-Value Stores with Fast In-Network Caching[C]//The 26th ACM Symposium on Operating Systems Principles (SOSP 2017), 2017:121-136.
[6] Li B, Ruan Z, Xiao W, et al. KV-Direct: High-Performance In-Memory Key-Value Store with Programmable NIC[C]//The 26th ACM Symposium on Operating Systems Principles (SOSP 2017), 2017:137-152.