Source: Architect Technology Alliance (WeChat public account)
Data center architecture is a complex integration of modern facility, IT, and network systems. These components work together to design, build, and support critical business applications; the systems are highly interconnected and require careful planning and synchronization in both design and operation. Data center architecture covers the design and layout of physical infrastructure (such as power distribution and cooling systems) and IT infrastructure (including network, storage, server, and cloud data center architecture). It involves detailed planning of physical space, power supply and cooling systems, network connections, security measures, and software to ensure the best possible performance, reliability, and scalability of IT resources and services. The ultimate goal is to create an efficient, flexible, and secure environment for hosting the critical IT infrastructure of modern enterprises and organizations.
Components of the data center architecture
Server: servers can be divided into different types according to physical form and size, including rack servers, blade servers, and tower servers.
Storage system: data centers use storage technologies such as storage area networks (SAN), network-attached storage (NAS), and direct-attached storage (DAS) to store and manage data.
Network Device: switches, routers, firewalls, and load balancers provide efficient data communication and security between internal and external networks in the data center.
Power infrastructure: uninterruptible power supply (UPS) systems, backup generators, and power distribution units (PDUs) provide stable and reliable power for data center equipment.
Cooling system: computer room air conditioning (CRAC) units, liquid cooling systems, and hot/cold aisle containment maintain the optimal temperature and humidity levels for hardware to run normally.
Cabinet: racks and cabinets used in the data center include open racks (two-column and four-column racks), closed racks, wall-mounted racks, and network cabinets.
Wiring: structured cabling systems, including twisted-pair cables (for Ethernet, such as Cat5e and Cat6), fiber-optic cables (single-mode and multi-mode), and coaxial cables.
Security system: physical security measures such as biometric access control, surveillance cameras, and security personnel, together with network security solutions such as firewalls, intrusion detection/prevention systems (IDS/IPS), and encryption, protect data centers from unauthorized access and threats.
Management software: data center infrastructure management (DCIM) software helps monitor, manage, and optimize the performance and energy efficiency of data center components.
01. Network architecture of the data center
The network architecture of a data center refers to the design and layout of the interconnection nodes and paths that facilitate communication and data exchange within the data center. It includes the physical and logical layout of network devices such as switches, routers, and cabling to achieve efficient data transmission between servers, storage systems, firewalls, and load balancers. An appropriate network architecture provides high-speed, low-latency, reliable connections, as well as scalability, security, and fault tolerance.
For decades, the three-tier architecture has been the standard model for data center networks. However, another topology, the leaf-spine architecture, has emerged and gained prominence in modern data center environments. It is particularly common in high-performance computing (HPC) settings and has become a leading choice for cloud service providers (CSPs).
The following is a comparison of the two different data center network architectures:
▋ three-tier data center network architecture
The three-tier data center network architecture is a traditional network topology that has been widely used in many older data centers. It is often called the "core-aggregation-access" model. Redundancy is a key part of the design: multiple paths from the access layer to the core help the network achieve high availability and efficient resource allocation.
Access layer: the lowest layer in the three-tier architecture, it serves as the entry point for servers, storage systems, and other devices into the network, providing connectivity through switches and cabling. Access layer switches are usually deployed in a top-of-rack (ToR) configuration and enforce policies such as security settings and VLAN (virtual local area network) assignment.
Aggregation layer: also known as the distribution layer, it aggregates data traffic from the access layer's ToR switches and passes it to the core layer for routing to its final destination. This layer enhances the resilience and availability of the data center network through redundant switches, eliminating single points of failure, and controls network traffic through load balancing, quality of service (QoS), packet filtering, queuing, and inter-VLAN routing policies.
Core layer: also called the backbone, this is the high-capacity central part of the network, designed specifically for redundancy and resilience. It interconnects the aggregation layer switches and connects them to external networks. The core layer operates at Layer 3, using high-end switches, high-speed cabling, and routing protocols with short convergence times; speed, minimal latency, and connectivity are its priorities.
Server virtualization generates large volumes of east-west (server-to-server) traffic, which the traditional three-tier architecture struggles to handle efficiently because of the multi-hop latency between layers; it also suffers from wasted bandwidth, large failure domains, and difficulty scaling to very large networks.
Data center traffic can be divided into the following types:
north-south traffic: the traffic from the client outside the data center to the data center server, or the traffic from the data center server to the Internet.
East-west traffic: traffic between servers in the data center.
Cross-data-center traffic: traffic between different data centers, such as disaster recovery between data centers and communication between private and public clouds.
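The three traffic categories above can be captured in a small classification helper. This is an illustrative sketch only, not a standard API; the site labels such as `dc1` and `internet` are hypothetical.

```python
def classify_flow(src_site: str, dst_site: str, dc_sites: set) -> str:
    """Classify a flow by where its two endpoints sit."""
    src_in = src_site in dc_sites
    dst_in = dst_site in dc_sites
    if src_in and dst_in:
        # Both endpoints inside data centers: same site means east-west
        # (server-to-server); different sites means cross-data-center.
        return "east-west" if src_site == dst_site else "cross-dc"
    if src_in or dst_in:
        return "north-south"  # one endpoint outside the data center
    return "external"         # neither endpoint is in a data center

sites = {"dc1", "dc2"}
print(classify_flow("dc1", "dc1", sites))       # east-west
print(classify_flow("internet", "dc1", sites))  # north-south
print(classify_flow("dc1", "dc2", sites))       # cross-dc
```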
In traditional data centers, services are typically deployed in dedicated silos: a service runs on one or more dedicated physical servers, physically isolated from other systems. East-west traffic in traditional data centers is therefore relatively low, and north-south traffic accounts for about 80% of total data center traffic.
In cloud data centers, the service architecture has gradually shifted from monolithic to tiered Web-App-DB designs, and distributed technology has become mainstream for enterprise applications. Service components are typically distributed across multiple virtual machines or containers; a service no longer runs on one or a few physical servers but on many servers working together, resulting in rapid east-west traffic growth.
In addition, the emergence of big data services makes distributed computing the standard configuration of cloud data centers. Big data services can be distributed across hundreds of servers in a data center for parallel computing, which greatly increases east-west traffic.
The traditional three-tier network architecture is designed for traditional data centers with north-south traffic dominating, and is not suitable for cloud data centers with large east-west traffic.
Some east-west traffic (such as Layer 2 and Layer 3 traffic across pods) must be forwarded through aggregation and core layer devices, traversing many extra nodes. Traditional networks are typically built with an oversubscription ratio between 1:10 and 1:3 to improve device utilization, so performance degrades significantly each time traffic passes through another hop. xSTP technology on Layer 2 networks aggravates this deterioration.
Therefore, when a large amount of east-west traffic runs over the traditional three-tier architecture, devices connected to the same switch ports compete for bandwidth, resulting in poor response times for end users.
▋ Leaf-spine architecture
The leaf-spine architecture, commonly known as a Clos design, is a two-tier network topology widely used in data centers and enterprise IT environments. Compared with the traditional three-tier architecture, it brings many advantages to data center infrastructure, such as scalability, reduced latency, and improved performance.
Leaf layer: the top-of-rack switches at the access layer, used to connect servers and storage devices within the rack. They form a full mesh by connecting to every spine switch, ensuring that all forwarding paths are available and that all nodes are equidistant in hop count.
Spine layer: spine switches constitute the backbone of the data center network, interconnecting all leaf switches and routing traffic between them. Spine switches do not connect to each other directly, because the full-mesh fabric eliminates the need for dedicated spine-to-spine links. Instead, east-west traffic between servers on different leaf switches is routed through the spine layer, achieving fully non-blocking data transmission.
Compared with the traditional three-tier architecture, the leaf-spine architecture offers better scalability, lower latency, predictable performance, and optimized east-west traffic efficiency. It also provides fault tolerance through its dense interconnection, eliminates network loop problems, and simplifies data center network management.
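As a rough illustration of the full-mesh wiring described above, the sketch below enumerates leaf-to-spine links and notes why any two servers on different leaves are always exactly two switch hops apart. The switch names and counts are hypothetical.

```python
from itertools import product

def leaf_spine_links(num_leaves: int, num_spines: int):
    """Full mesh: every leaf switch connects to every spine switch;
    spine switches never connect to each other."""
    return [(f"leaf{i}", f"spine{j}")
            for i, j in product(range(num_leaves), range(num_spines))]

links = leaf_spine_links(4, 2)
print(len(links))  # 4 leaves x 2 spines = 8 links

# A packet between servers on different leaves always travels
# leaf -> spine -> leaf: 2 switch hops, regardless of rack placement.
```

Adding a rack means adding one leaf switch and cabling it to every spine; adding a spine multiplies the east-west bandwidth between all existing leaves, which is the scalability property the text describes.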
However, the leaf-spine fabric is not perfect. The performance and functional requirements on leaf switches are higher than for access devices in the traditional architecture: leaf nodes act as gateways of various kinds (Layer 2/Layer 3, VLAN/VXLAN, VXLAN/NVGRE, FC/IP, etc.), demanding strong chip processing capability, and no commercial chip currently supports interworking among all of these protocols. Because there is no relevant standard, forwarding between spine and leaf nodes uses proprietary encapsulation to accommodate the various network types, which creates interoperability difficulties going forward. In addition:
Independent L2 domains limit the deployment of applications that depend on a shared L2 domain: applications that must sit on one Layer 2 network can only be placed within a single rack. Independent L2 domains also limit server migration: after a server migrates to a different rack, its gateway and IP address must change.
The number of subnets increases significantly. Each subnet corresponds to a route in the data center, and with one subnet per rack, the total number of routes across the data center grows greatly. Distributing this routing information to every leaf switch is itself a complex problem.
Before designing a leaf-spine network architecture, several important factors must be determined: for example, the convergence (oversubscription) ratio, the ratio of leaf switches to spine switches, the uplinks from the leaf layer to the spine layer, and whether to build on Layer 2 or Layer 3.
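The convergence (oversubscription) ratio mentioned above is simple arithmetic: the server-facing bandwidth of a leaf switch divided by its spine-facing uplink bandwidth. A minimal sketch, using hypothetical port counts and speeds:

```python
def oversubscription(downlink_count: int, downlink_gbps: float,
                     uplink_count: int, uplink_gbps: float) -> float:
    """Ratio of southbound (server-facing) bandwidth to
    northbound (spine-facing) bandwidth on one leaf switch."""
    return (downlink_count * downlink_gbps) / (uplink_count * uplink_gbps)

# Hypothetical leaf: 48 x 10G server ports, 4 x 40G uplinks.
# 480 Gbps down / 160 Gbps up = 3:1 oversubscription.
ratio = oversubscription(48, 10, 4, 40)
print(f"{ratio:.0f}:1")  # 3:1
```

A ratio of 1:1 would make the fabric fully non-blocking; designs commonly accept some oversubscription (such as 3:1) to trade peak east-west bandwidth for cost.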
02. Storage architecture of the data center
A data center storage architecture is the design and organization of a storage system. It determines how data is physically stored and accessed in a data center. It defines the types of physical storage devices, such as hard disk drives (HDDs), solid-state drives (SSDs), and tape drives, as well as how they are configured, such as direct-attached storage (DAS), network-attached storage (NAS), and storage area networks (SAN). The storage architecture also covers how servers access stored data, whether directly or over the network. The following are the main types of data center storage architectures:
▋ Direct-attached storage (DAS)
Direct-attached storage (DAS) is a digital storage system used in data centers, characterized by a direct physical connection to the servers it supports, with no network in between. The server communicates with the storage devices using protocols such as SATA, SCSI, or SAS, while a RAID controller manages data striping, mirroring, and disk management.
DAS provides a single server with high efficiency, simplicity, and high performance, but compared with network storage solutions such as NAS and SAN, it has limitations in scalability and accessibility.
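The striping and mirroring that a DAS RAID controller performs can be illustrated with a toy model. This is a conceptual sketch only: real controllers operate on fixed-size blocks in hardware, and the chunk size here is hypothetical.

```python
def stripe(data: bytes, num_disks: int, chunk: int = 4):
    """RAID 0-style striping: deal fixed-size chunks of data
    round-robin across the disks (no redundancy)."""
    disks = [bytearray() for _ in range(num_disks)]
    for i in range(0, len(data), chunk):
        disks[(i // chunk) % num_disks] += data[i:i + chunk]
    return [bytes(d) for d in disks]

def mirror(data: bytes, num_disks: int = 2):
    """RAID 1-style mirroring: an identical copy on every disk."""
    return [data] * num_disks

print(stripe(b"ABCDEFGHIJKLMNOP", 2))  # [b'ABCDIJKL', b'EFGHMNOP']
print(mirror(b"ABCD"))                 # [b'ABCD', b'ABCD']
```

Striping spreads I/O across spindles for throughput, while mirroring trades capacity for redundancy; production arrays combine both (and parity schemes such as RAID 5/6).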
▋ Network-attached storage (NAS)
Network-attached storage (NAS) is a dedicated file-level storage device that provides data access to multiple users and client devices over TCP/IP Ethernet on a local area network (LAN). These systems are designed to simplify data storage, retrieval, and management without the need for an intermediate application server.
NAS provides the advantages of easy access, sharing, and management. However, due to its dependence on shared network bandwidth and physical limitations, NAS faces scalability and performance limitations.
▋ storage area network (SAN)
A storage area network (SAN) is a dedicated high-speed network, typically using Fibre Channel protocols, that connects servers to shared storage devices. These systems provide block-level access to storage in the data center, enabling servers to interact with storage devices as if they were directly attached, and they simplify backup and maintenance by offloading those tasks from the host servers. SANs offer high performance and scalability, but they come with high costs and complex management requirements that demand specialized IT expertise.
▋ next-generation storage solutions and technologies
A variety of innovative next-generation solutions and technologies are emerging in the data center storage field to meet the growing demand for efficiency, scalability, and performance. These include:
All-flash arrays: high-speed storage systems that use solid-state drives (SSDs) instead of traditional spinning hard disk drives (HDDs), delivering superior performance and lower latency. In addition, adoption of storage protocols designed specifically for SSDs, such as NVMe (Non-Volatile Memory Express) and NVMe-oF (NVMe over Fabrics), is growing, further improving all-flash array performance in the data center by reducing latency and increasing throughput.
Scale-out file systems: a storage architecture that expands capacity and performance horizontally by adding more nodes, supporting flexibility and ease of expansion.
Object storage platforms: storage solutions designed to manage large volumes of unstructured data, using a flat namespace and unique identifiers for data retrieval.
Hyper-converged infrastructure (HCI): an integrated system that integrates storage, computing, and network into a framework to simplify management and enhance scalability.
Software defined storage (SDS): a method of software management and abstraction of underlying storage resources, providing flexibility and efficiency through policy-based management. SDS technology has been adopted by many ultra-large companies such as Meta Platforms(Facebook), Google and Amazon.
Heat-assisted magnetic recording (HAMR): a data storage technology that uses localized heating to increase magnetic recording density, enabling higher-capacity hard disk drives (HDDs) to meet the growing storage needs of modern data centers.
03. Server architecture of the data center
The server architecture of a data center refers to the design and organization of servers and related components to process, store, and manage data effectively. It can generally be divided into the following categories: form factor (physical structure), system resources, and supporting infrastructure:
Form factor (physical structure)
Rack server: the most common server type in the data center, designed to be installed in a standard 19-inch rack and usually 1U to 4U in height.
blade Server: these servers are designed to maximize density and minimize physical space. Multiple blade servers are installed in one chassis to share public resources such as power supply, cooling, and network.
Tower server: although uncommon in large data centers, tower servers are still used in small-scale deployments or where rack space is not a constraint. They resemble desktop computer towers and operate as standalone units.
System resources
CPU (central processing unit): the CPU is the brain of the server, responsible for executing instructions and processing data. It performs arithmetic, logic, and input/output operations.
Memory: RAM (random access memory) is the server's main memory, providing fast access to data and instructions. It temporarily stores the data and programs currently in use.
Storage: devices such as hard disk drives (HDDs) or solid-state drives (SSDs) store data and files persistently. They hold operating systems, applications, databases, and user data.
Network: NICs (network interface cards) connect the server to the network, enabling communication with other devices. They handle the sending and receiving of data packets.
GPU (graphics processing unit): GPUs are specialized processors designed for parallel processing and graphics rendering. They excel at compute-intensive tasks, especially artificial intelligence, machine learning, and scientific simulation. Not all servers require a GPU, however.
Supporting infrastructure
Power supply system: power supply units (PSUs) provide stable and reliable power for all server components, converting AC power from the outlet into the appropriate DC voltages the server requires.
Cooling system: servers generate substantial heat, and the cooling system keeps components within a safe temperature range. Cooling options include fans, heat sinks, liquid cooling, and server room air conditioning.
Motherboard: the main printed circuit board that connects all server components together, providing the necessary interfaces, buses, and slots for the CPU, RAM, storage, and other peripherals.
04. Cloud data center architecture
Cloud data center architecture is the design and organization of computing, storage, network, and database resources in remote data centers to deliver cloud computing services. The architecture is built on virtualization technology and allows physical resources to be shared and utilized efficiently, providing scalable, reliable, and flexible cloud-based applications and services. The following is a breakdown of the main components of a cloud data center architecture:
Compute: cloud compute services provide virtual machines (VMs), containers, and serverless resources for running applications and workloads. They allow users to provision and scale computing capacity on demand without managing physical hardware. Major examples include Amazon EC2, Microsoft Azure Virtual Machines, and Google Compute Engine.
Storage: Cloud storage provides scalable and persistent storage solutions for various data types, such as files, objects, and backups. These services provide high availability, automatic replication, and data encryption to ensure data integrity and security. Examples of popular cloud storage services include Amazon S3, Microsoft Azure Blob storage, and Google Cloud Storage.
Network: cloud networking services enable users to create, configure, and manage virtual networks, subnets, and network security rules. They provide connectivity between cloud resources, on-premises networks, and the Internet for secure and efficient data transmission. Key examples include Amazon Virtual Private Cloud (VPC), Microsoft Azure Virtual Network, and Google Cloud Virtual Private Cloud (VPC).
Database: cloud database services provide managed, scalable database solutions for storing, retrieving, and managing structured and unstructured data. They support a variety of database engines, including relational databases (such as MySQL and PostgreSQL), NoSQL databases (such as MongoDB), and data warehouses. These managed services handle provisioning, scaling, backup, and security tasks, letting developers focus on application development. Well-known examples include Amazon RDS, Microsoft Azure Cosmos DB, and Google Cloud SQL.
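The virtual networks and subnets described above come down to CIDR arithmetic, which Python's standard `ipaddress` module can illustrate. The 10.0.0.0/16 VPC range below is a hypothetical example, not tied to any particular provider.

```python
import ipaddress

# Hypothetical VPC address range, carved into equal /24 subnets
# (for example, one per availability zone or rack).
vpc = ipaddress.ip_network("10.0.0.0/16")
subnets = list(vpc.subnets(new_prefix=24))

print(len(subnets))              # 256 subnets of size /24
print(subnets[0])                # 10.0.0.0/24
print(subnets[0].num_addresses)  # 256 addresses per subnet
```

Cloud providers typically reserve a few addresses in each subnet for gateways and DNS, so the usable host count is slightly lower than `num_addresses`.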
05. Physical data center design
the physical architecture and design of a data center are essential to ensure optimal performance, security, and reliability. The following are the key elements of physical data center architecture design:
site selection
location: data centers are usually built in areas with low risk of natural disasters, away from areas prone to earthquakes, floods and hurricanes.
Climate: cooler locations can reduce the cost of cooling data centers by using ambient air, while warmer climates require more energy-saving cooling solutions.
Traffic: the location must be convenient for staff to enter and close to the main roads and airports for transportation equipment and emergency response.
Power supply: reliable and cost-effective energy is crucial. The existence of multiple high-voltage transmission lines and substations is very important for power transmission.
Network connectivity: proximity to major fiber routes and the availability of multiple service providers enable better connectivity.
Architecture and structure
Building materials: data centers are typically built of durable, fire-resistant materials such as concrete, steel, and specialized wall panels.
Structure: Although single-layer data centers are more common, multi-layer data centers are increasingly built in areas with limited land availability or high real estate costs.
Ceiling height: high ceilings (usually between 12 and 18 feet) are necessary to accommodate raised floors, overhead cable trays, and air-conditioning ducts while leaving sufficient clearance for equipment and maintenance.
Load-bearing capacity: data centers require high floor load capacity to support the weight of heavy server racks, cooling systems, and uninterruptible power supply (UPS) systems. Floor loading capacity is typically between 150 and 300 pounds per square foot.
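The floor-loading figure above is easy to sanity-check: a rack's weight spread over its footprint gives pounds per square foot. A minimal sketch with a hypothetical 2,000 lb loaded rack; real structural calculations also account for point loads and adjacent equipment.

```python
def floor_load_psf(rack_weight_lb: float, footprint_sqft: float) -> float:
    """Distributed floor load in pounds per square foot
    for one rack over its allotted footprint."""
    return rack_weight_lb / footprint_sqft

# Hypothetical fully loaded rack: 2,000 lb over a 2 ft x 4 ft footprint
# (the rack itself plus a share of the aisle clearance).
load = floor_load_psf(2000, 2 * 4)
print(load)  # 250.0 lb per square foot
```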
Internal layout: the internal architecture of the data center, including pillars and partitions, plays a vital role in the overall design and function of the facility. These factors will affect the space utilization rate, airflow related to the cooling system, power distribution and equipment transportation for maintenance.
▋ data center function positioning
the design and construction of a data center are based on various architectural factors, such as size, purpose, ownership, and location. Common data center types include:
Enterprise data center: owned and operated by individual companies to support their specific business needs and applications. They are usually purpose-built, customized to meet the requirements of a single organization.
Colocation data center: provides shared infrastructure in which multiple customers rent space, power, and cooling for their IT equipment in a managed facility.
Hyperscale data center: large-scale centralized facilities designed to support the needs of hyperscale cloud service providers (CSPs) and Internet companies.
Edge data center: small facilities using a distributed data center architecture, located closer to end users or data sources, aiming to reduce latency and improve application performance by processing data near its source.
Containerized data center: these data centers, also known as miniature data centers, are modular portable facilities installed in containers, providing flexibility and rapid deployment.
AI data center: dedicated facilities optimized for AI workloads, featuring high-performance computing, GPUs (graphics processing units), and liquid cooling systems.
Source: Data Center O & M Management