thinking and architecture changes after the use of distributed database forerunners, from integration of storage and computing to separation of storage and computing
毕须说  2024-08-23 18:31   published in China

Recently, I communicated with an organization. The distributed database is divided into nearly 100 shards, including 3 replicas in the production center, 3 replicas in the same city, and 2 replicas in different places, totaling 8 replicas, A single server is configured with 4 3.2TB NVMe SSDs, and the local disk is made of soft RAID10 (except for 2 copies), with 16 copies. This group of servers is dedicated to a single shard. Due to the limitation of a single shard in a database, a single shard generally has 1TB of valid data, and 1TB/4/3.2TB is 7.8%. Divided by 3 local replicas, the utilization rate is 2.5%! The utilization rate is surprisingly low. (A single server has 4 disks. If a single disk costs 8000 or 32000, the actual effective data volume is 1TB, equivalent to 32000/TB, which is much more expensive than the current high-end flash memory storage). The number of database nodes is close to 500 servers, and the peak business transaction value is close to more than 3,000 TPS. A single sharded node shares less than 40TPS, and the CPU utilization rate is extremely low.

The database can flexibly configure any local and local replicas to be synchronized and returned to the upper-layer proxy node. Because there are only four local disks on the primary node and cross-city IP network replication, the performance is greatly affected. In particular, the pressure of batch running at night is greater, and other secondary node disks cannot bear the pressure, right there & ldquo; Look & rdquo;, usually the CPU utilization is very low. If the local disk has a slow disk or a timeout disk, the database node will quickly kick off the server node and switch the service to a new primary node. The system generates an alarm and the next recovery operation is very complicated. First of all, this server is faulty. You need to take a new idle machine, reinstall the system and database software, and connect the system to the Slave node. Second, in order not to occupy the production network bandwidth, this database restores data from the backup system of the previous day, and then tracks incremental data from the primary node until it is consistent. How long does it take to rebuild the replica? In addition, flow control does not affect normal business and batch running. Is it worth killing the entire server because of a slow disk failure? More than 10 years ago, the disk kicking of the storage system was strict. Simple and rough disk kicking would bring huge recovery work, let alone CPU kicking.

Customers said that based on these problems, in order to improve the utilization rate of resources, save costs and improve reliability, in addition to the key production system that has been launched, other systems fully adopt the G database + external enterprise storage solution, class A systems such as payment systems have been launched. The database + external storage solution is adopted. 1. More disks are attached. The RAID2.0 pool balancing is integrated in a unified manner. The upper limit performance is higher, and no bottlenecks occur. 2. The utilization rate of external storage is as high as 80%. It is allocated on demand and decoupled from CPU. It has high reliability and can reduce replicas. It does not need to heap so many SSD disks and server resources, thus saving costs, 8 The number of replicas is reduced to 2+1+1, and the number of software licenses is also reduced. 3. In terms of reliability, professional storage for disk-level faults can be quickly isolated in seconds, which does not cause database node switching. Subsequent complex aftermath operations are unnecessary, there is no need to get up at 3 a.m. and report to the leaders how to do high-risk operations. 4. If the server fails due to memory and other factors, map the external storage LUN to the new server for incremental data, and do not need to rebuild the basic data of the replica from the backup.

Take a closer look, some manufacturers take it for granted that servers are cheap, regardless of the utilization rate, and do not think about how the software architecture can improve the utilization rate and efficiency to reduce costs, so they constantly pile up replica resources, take it for granted that the cost is no problem. Multiple replicas have become the top-level feature of distributed architecture. How can Software Engineering and algorithms degenerate to be proud of writing multiple replicas! Can the current economic downturn be so extravagant?

Due to the limited I/O capability and capacity of the local disk, and the great impact on network replication performance, the CPU resources of the heap server are continuously increased, and the number of servers with one master and multiple slave servers is continuously increased, this logic is also very strange, CPU is actually more expensive, SSD pull CPU into the water to play, the cost is higher. In fact, the computing power of Luan Peng server is very strong now. It has tested 1 master and 2 slave servers, and its performance can reach 18000 TPS. As a result, there are so many shards. If it is balanced, the pressure of a single node is less than 40TPS. Why? The hardware is actually Diamond Diamond, while the software system feels that it is & ldquo; Fire Stick & rdquo;.🤭🤭

Adhering to the right innovation, unrealistic disorderly innovation and fake innovation will only bring huge risks and losses to financial institutions, but it is still necessary to be realistic.

 

Source: Bi xunshuo

Replies(
Sort By   
Reply
Reply
Post
Post title
Industry classification
Scene classification
Post source
Send Language Version
Note: Please review the accuracy of the translation. You can edit the translation content on the corresponding language site.
Contribute
Article title
Article category
Send Language Version
Note: Please review the accuracy of the translation. You can edit the translation content on the corresponding language site.