Can Server NVMe Disks Improve the Performance of a Multi-Replica Database Architecture?
毕须说  2024-07-27 15:49   Published in China

Recently, I spoke with a user in the financial sector. The customer said that, to improve performance, every local disk in their servers is an NVMe SSD running in pass-through mode, which gives low latency. I then took a close look at the architecture of their distributed database, shown in the following figure.

1.jpg

It adopts a one-master, multi-slave scheme, with three replicas in the production center and three replicas in a second site in the same city. As the figure shows, the CN (compute proxy) node sends requests to the database node, and the database node writes to the replicas. If replication to the same-city site is synchronous, the latency seen by the CN node is the sum of the latencies of steps 1, 3, and 4 in the figure.

The latency of step 1 is comparable to that of IP SAN, about 0.6 ms. (IP SAN, introduced more than 10 years ago, has worse latency than FC SAN and is considered less reliable. For fear of network bit errors it is not used in database scenarios, only for latency-insensitive workloads such as virtual machines.) Step 3 is the synchronous replication across the same-city network, which takes more than 1 ms over 50 kilometers. Today's cloud networks are very complex, and after passing through many network devices the latency may be higher still.

By contrast, a write to a local NVMe SSD takes about 80 us (for comparison, a write to an external storage LUN takes about 0.1~0.2 ms). Against the more than 1.6 ms contributed by the two network segments, the NVMe disk's latency is practically negligible.
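The arithmetic behind this comparison can be sketched in a few lines of Python. This is only a rough model under stated assumptions: the constant and function names are hypothetical, step 4 is assumed to be the disk write, the three steps are assumed to run strictly one after another, and the same-city figure uses 1.0 ms as a lower bound for "more than 1 ms".

```python
# Latency figures quoted in the text, in milliseconds.
# All names are illustrative; the serial (sum) model is an assumption.
NETWORK_CN_TO_DB_MS = 0.6     # step 1: CN node to database node
NETWORK_SAME_CITY_MS = 1.0    # step 3: lower bound for ~50 km synchronous replication
LOCAL_NVME_WRITE_MS = 0.08    # 80 us write to a local NVMe SSD
EXTERNAL_LUN_WRITE_MS = 0.2   # upper end of the 0.1~0.2 ms range for an external LUN


def total_write_latency_ms(disk_write_ms: float) -> float:
    """Sum the two network segments and the disk write (simple serial model)."""
    return NETWORK_CN_TO_DB_MS + NETWORK_SAME_CITY_MS + disk_write_ms


def disk_share(disk_write_ms: float) -> float:
    """Fraction of the end-to-end write latency spent on the disk write."""
    return disk_write_ms / total_write_latency_ms(disk_write_ms)


if __name__ == "__main__":
    for label, disk_ms in [("local NVMe", LOCAL_NVME_WRITE_MS),
                           ("external LUN", EXTERNAL_LUN_WRITE_MS)]:
        total = total_write_latency_ms(disk_ms)
        print(f"{label}: total {total:.2f} ms, disk share {disk_share(disk_ms):.1%}")
```

With these inputs the NVMe write accounts for less than 5% of the total, and replacing an external LUN at 0.2 ms with local NVMe saves only about 0.12 ms out of roughly 1.8 ms. Real deployments may overlap the local write with replication, which would make the disk's contribution smaller still.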

Some database experts often say that local NVMe disks offer excellent latency. However, because of the master-slave database architecture and the much longer network transmission latency, NVMe disks cannot deliver their core value. In addition, synchronous replication runs over a TCP/IP network, and same-city traffic generally uses layer-3 routing in the cloud network. If it passes through many devices, latency will be high, and if jitter or bit errors occur, performance can drop significantly. This may well be the main cause of high latency. Databases therefore place high demands on the network, yet when database latency is high, the blame falls on the network.

In fact, the write process of a distributed database is essentially the same as that of distributed storage. Distributed storage supports cross-node writes and uses RoCE to achieve low-latency, lossless transmission, whereas distributed databases currently use TCP/IP networks. At present, distributed storage mainly carries virtual machine workloads and is rarely used for transactional databases. A distributed database running over poor-quality links will see higher I/O latency. Does everyone have to lower their quality requirements for the database?

Source: 毕须说