A Brief Look at the Deployment Architecture and Business Flow of the GaussDB Dorado Storage-Compute Separation Solution
Bi Xu Shuo (毕须说)  2024-09-03 09:36  Published in China

Recently, many users have shown interest in the GaussDB Dorado storage-compute separation solution. The deployment architecture of this solution is shown in the figure below.

[Figure: deployment architecture of the GaussDB Dorado storage-compute separation solution]


In the production center, a one-primary, two-standby cluster of three database nodes is connected to one OceanStor Dorado enterprise storage array. LUNs on the Dorado storage are mapped to the three database nodes, sized as needed according to the application system's actual database capacity. Each node thus gets its own LUN space in place of several local disks: the external storage LUN is attached to the operating system, the corresponding block device (drive letter) appears, a file system is created on it, and it is handed to the database. In addition, one LUN is mapped to and shared by all three nodes as the shared redo log volume. The same-city disaster recovery center uses a similar deployment: three redundant nodes, with storage LUNs mapped to the three database nodes as data space and a log volume mapped to and shared by the three nodes. The LUNs backing the production and same-city log volumes are kept in sync through storage-level synchronous replication over an FC (Fibre Channel) or RoCE link.
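To make the "map a LUN, create a file system, use it for the database" step more concrete, here is a minimal sketch of preparing one data LUN on a database node. It assumes standard Linux tooling (multipath, mkfs.xfs, mount); the device alias and mount point are invented for illustration, and the vendor's own multipathing tools are not shown.

```python
#!/usr/bin/env python3
"""Minimal sketch: prepare a Dorado LUN already mapped to this node for GaussDB data.

Assumptions (hypothetical, not from the article): the multipath alias
'dorado_data_lun01', the mount point '/gaussdb/data', and standard Linux
tools (multipath, mkfs.xfs, mount). Run as root and adapt to your environment.
"""
import subprocess

DEVICE = "/dev/mapper/dorado_data_lun01"   # hypothetical multipath alias for the mapped LUN
MOUNT_POINT = "/gaussdb/data"              # hypothetical data directory for the DN

def run(cmd):
    """Echo and run a command, stopping on the first failure."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Confirm the LUN is visible as a multipath block device on this node.
run(["multipath", "-ll"])

# 2. Create a file system on the LUN (destroys existing data; run once on a new LUN).
run(["mkfs.xfs", DEVICE])

# 3. Mount it so the database can use it as its data space.
run(["mkdir", "-p", MOUNT_POINT])
run(["mount", DEVICE, MOUNT_POINT])
```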


The business flow of this database architecture is as follows (a conceptual sketch of the shared-log flow follows the list):


1. After the disaster recovery relationship between the two clusters is configured, the initial full data build between the primary and standby clusters is performed, with the databases replicated over the IP network;


2. The primary DN of the primary cluster generates WAL (Write-Ahead Logging, XLog) records and writes them to the shared log volume;


3. The written logs are synchronized to the shared log volume of the peer disaster recovery cluster through Dorado storage replication, ensuring RPO = 0;


4. The standby DN nodes of the primary cluster read WAL records from the shared log volume to complete replay;


5. The three DN nodes of the standby cluster read WAL records from the shared log volume to complete replay (RTO < 120s).
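The following toy Python model (not GaussDB code; the file name and record format are invented) illustrates steps 2, 4, and 5: the primary appends WAL records to a single shared log, and each standby replays by reading that same log rather than receiving a stream over IP.

```python
"""Toy model of the shared-log-volume flow: one writer, file-based replay."""
import json, os

SHARED_LOG = "shared_redo.log"   # stands in for the log LUN shared by all DN nodes

def primary_write(records):
    """Step 2: the primary DN appends WAL records to the shared log volume."""
    with open(SHARED_LOG, "a") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
        f.flush()
        os.fsync(f.fileno())     # durable on the shared volume before commit returns

def standby_replay(applied_offset, data_store):
    """Steps 4-5: a standby DN reads new WAL from the shared log and replays it
    into its own data space, returning the new replay offset."""
    with open(SHARED_LOG) as f:
        f.seek(applied_offset)
        while True:
            line = f.readline()
            if not line:
                break
            rec = json.loads(line)
            data_store[rec["key"]] = rec["value"]   # "replay" = apply the change
        return f.tell()

# Usage: the primary commits two changes, then a standby catches up from offset 0.
primary_write([{"key": "a", "value": 1}, {"key": "b", "value": 2}])
standby = {}
offset = standby_replay(0, standby)
print(standby, offset)
```

The point of the model is only that the standby's input is the shared log itself, not a network stream from the primary.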


As can be seen from the preceding process, the GaussDB Dorado storage-compute separation solution changes the database architecture and I/O path: the log volume is shared between primary and standby instances, and likewise between the primary and standby clusters; replication changes from database-level replication over an IP network to storage-level synchronous replication over an FC network, with better network quality; and standby nodes replay from the shared log into their own data space instead of receiving and replaying logs over a primary-standby IP connection.


This solution enables smooth migration from, and in-place replacement of, the Oracle storage architecture. Its core values are as follows:


1. The reliability of the overall architecture is not degraded, and the classic database-plus-storage architecture is retained. The database keeps its proven database logic, SQL engine, and storage engine, while the storage system handles disk and RAID management, space management, and sub-health handling such as fast isolation of slow disks; all of this is time-tested and more reliable.


2. RPO = 0 is achieved between the production and same-city standby clusters. Replication runs over a low-latency, lossless FC or RoCE network, making it both faster and more reliable.


3. Computing resources and storage are decoupled and managed separately in a fine-grained way. If capacity runs short, you only need to expand the storage resources rather than add database nodes, which saves cost (see the sketch below).
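As a rough illustration of point 3, the following sketch shows how capacity might be grown on the storage side only, assuming the LUN has already been enlarged on the Dorado array. The rescan, multipath resize, and XFS grow steps use standard Linux tools; the map name and mount point are hypothetical, and the exact commands may differ on your distribution.

```python
"""Sketch: grow capacity by expanding storage only, after the LUN was enlarged on the array."""
import glob, subprocess

MULTIPATH_MAP = "dorado_data_lun01"   # hypothetical multipath alias
MOUNT_POINT = "/gaussdb/data"         # hypothetical mount point of the data LUN

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Ask the SCSI layer to re-read the (now larger) LUN size on every path.
for rescan in glob.glob("/sys/class/scsi_device/*/device/rescan"):
    with open(rescan, "w") as f:
        f.write("1")

# 2. Let multipath pick up the new size of the map.
run(["multipathd", "resize", "map", MULTIPATH_MAP])

# 3. Grow the mounted XFS file system to fill the enlarged device.
run(["xfs_growfs", MOUNT_POINT])
```

The database nodes themselves are untouched; only the storage side and the file system change size.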

 

4. The production and same-city sites are deployed as separate primary and standby clusters rather than as one stretched cluster, so they are isolated from each other. If the primary cluster in the production center fails or is being upgraded, the standby cluster can take over the business unaffected. Some database experts recommend stretching a single cluster across the production and same-city sites, but this is actually risky: network jitter or software bugs could bring down both sites at once, so failure isolation is lost. This resembles an Oracle cluster stretched across production and same-city sites, which in practice is rarely deployed.


Source: the "Bi Xu Shuo" (毕须说) WeChat official account
