Should domestic databases use external enterprise storage or local disks? Is it really worth overturning the architecture and sacrificing reliability?
毕须说, 2024-07-27 18:55, published in China

Recently I spoke with a customer in the financial sector who said that distributed databases are now required to use local disks instead of centralized storage. Some top-tier customers have been talked into this by certain Internet vendors: distributed database software running on local server disks in a master-slave architecture with 3, 6, or even more than 10 replicas. Is it really necessary to spend that much on so many copies? How did software engineering and algorithms degenerate to the point of boasting about how many replicas they write? Customers also worry about the reliability and I/O hangs of so many local disks. Is piling up replica resources really reliable? This is the 21st century; the era of "more people, more strength" is long gone.
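To make the cost question concrete, here is a rough sketch comparing the raw disk capacity needed for the same usable capacity under N-way replication and under a RAID 6 parity layout. The 100 TB figure, the 8+2 RAID 6 layout, and the function names are illustrative assumptions, not numbers from any specific product; real arrays also reserve hot-spare space, and real databases may compress data.

```python
# Rough raw-capacity comparison (illustrative assumptions only):
#  - N-way replication keeps N full copies of every byte.
#  - A D+P parity layout (e.g. RAID 6 as 8+2) stores P parity units per D data units.

def raw_for_replicas(usable_tb: float, replicas: int) -> float:
    """Raw capacity needed when every byte is stored `replicas` times."""
    return usable_tb * replicas


def raw_for_parity(usable_tb: float, data_units: int, parity_units: int) -> float:
    """Raw capacity needed for a D+P parity layout."""
    return usable_tb * (data_units + parity_units) / data_units


if __name__ == "__main__":
    usable = 100.0  # hypothetical usable database capacity in TB
    for n in (3, 6, 10):
        print(f"{n} replicas: {raw_for_replicas(usable, n):.0f} TB raw")
    one_array = raw_for_parity(usable, 8, 2)
    print(f"RAID 6 (8+2), one array: {one_array:.0f} TB raw")
    # Hypothetical second array kept in sync by same-city synchronous replication.
    print(f"RAID 6 (8+2), two arrays: {2 * one_array:.0f} TB raw")
```

Under these assumptions, 3, 6, and 10 replicas need 300, 600, and 1,000 TB of raw disk, while RAID 6 needs 125 TB on one array, or 250 TB with a second array for same-city synchronous replication.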

Many customers now ask: having built a distributed architecture, why would we attach centralized storage to it? Their first impression is that centralized storage is an outdated, backward technology. Is that true? In fact, global analyst firms such as IDC do not define a category called "centralized storage"; they define the Enterprise Storage System (ESS). Gartner called the same category the External Storage System (also ESS) and now names it Primary Storage.

Since 2010, the rise in China of distributed storage built on open-source software-defined storage (SDS) has pinned a big label on traditional enterprise storage: "centralized storage". This is actually an incorrect concept. Look inside an enterprise storage architecture: its hardware and software scale out horizontally to 2, 4, 8, 16, or even 32 controller nodes, and it can also scale up vertically by adding disk enclosures. The software slices LUN space and runs on every controller node (rather than writing multiple replicas). Slices are as fine as 64 MB and disk-level RAID granularity is as fine as 4 MB, so the workload is load-balanced across all controllers. Isn't this a distributed architecture? In the storage field it is called an active-active (A-A) balanced architecture. Storage vendors began developing this architecture for large numbers of users some 20 years ago, and today it is very mature and stable.
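A toy model of the slicing idea: a LUN is cut into fixed-size slices and each slice is owned by one controller, so I/O to different parts of the LUN lands on different controllers. The 64 MB slice size comes from the paragraph above (treated here as 64 MiB); the placement function `owner_controller` is a hypothetical stand-in, since vendors do not publish their actual placement algorithms.

```python
from collections import Counter

SLICE_SIZE = 64 * 1024 * 1024  # 64 MB slice granularity from the text, taken as 64 MiB


def slice_index(offset: int) -> int:
    """Return the index of the slice that contains a byte offset in the LUN."""
    return offset // SLICE_SIZE


def owner_controller(lun_id: int, slice_idx: int, controllers: int) -> int:
    """Hypothetical placement: consecutive slices rotate across controllers,
    with a per-LUN starting offset so different LUNs do not all start on controller 0."""
    return (lun_id * 7919 + slice_idx) % controllers


def slices_per_controller(lun_id: int, lun_bytes: int, controllers: int) -> Counter:
    """Count how many slices of one LUN each controller owns."""
    n_slices = -(-lun_bytes // SLICE_SIZE)  # ceiling division
    return Counter(owner_controller(lun_id, i, controllers) for i in range(n_slices))


if __name__ == "__main__":
    tib = 1024 ** 4
    counts = slices_per_controller(lun_id=1, lun_bytes=tib, controllers=8)
    print(sorted(counts.items()))  # each of the 8 controllers owns 2048 slices
    print(slice_index(200 * 1024 * 1024))  # offset 200 MiB falls in slice 3
```

A real array also has to migrate slices when controllers are added or fail and must keep each slice protected by RAID underneath; this sketch only shows why fine-grained slicing spreads load evenly.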

The distributed architecture everyone talks about today works on the same technical principle: slices are evenly distributed across nodes. Only the name has changed; the technology is the same. Look closely, though, and what has been built in a hurry is a sliced, multi-replica master-slave architecture that does not achieve active-active balance across every node. Isn't that hanging out a sheep's head while selling dog meat?

Therefore, we recommend analyzing the various architecture names carefully. In fact, over the past two years some leading customers have recognized the problems of running databases on large numbers of local server disks: low reliability, disk hang timeouts, low utilization because of multiple replicas, and capacity growth that forces the purchase of more server CPUs, which actually drives costs up. Worst of all are the complexity of operations and maintenance, avalanche-like fault propagation, and sleepless nights of troubleshooting. Bank G has returned to a distributed database plus external enterprise storage, that is, a storage-compute separation architecture. Far from being outdated, this so-called centralized architecture is ahead:

1) The dual-cluster database isolates the primary cluster from the secondary cluster, so a software bug that stalls one cluster does not affect the other. Each cluster can also be upgraded separately without affecting the other.

2) It inherits the high reliability of enterprise storage. The storage system manages all disks centrally and wears them evenly across the pool. Disks will inevitably fail: outright failure, sub-health, slow disks, firmware bugs. When I/O hangs and times out, the storage system can isolate these faults quickly without affecting services, whereas a general-purpose OS does not handle them well. It can also predict disk health, identify disks that are about to reach end of life, and replace them proactively. After a disk fails, the storage system rebuilds and repairs the data automatically, without heavy reads and writes across database nodes; a rebuild on the storage side takes a few hours. Most importantly, if anything goes wrong with a disk, there is one vendor accountable for diagnosing it. This avoids the absurd situation in which a server disk hangs, its logs cannot be exported, and the disk manufacturer cannot analyze the failure.

3) Synchronous replication by the storage system guarantees an RPO of 0 within the same city. The storage replication network is Fibre Channel (FC), which has lower latency and higher reliability than an IP network and does not suffer the congestion and packet loss that lead to hangs.

4) Compute and storage resources are decoupled. If database capacity runs short, you simply add disks, without adding CPUs to one primary and several replicas. This avoids wasted resources and reduces cost.
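The capacity argument in point 4 can be sketched as simple arithmetic. Assume, hypothetically, a local-disk cluster where every copy of the data must live on database servers that each hold a fixed amount of disk, versus an external array where only disks are added. The per-server and per-disk capacities, the copy count, and the 8+2 parity layout are illustrative assumptions.

```python
import math


def servers_to_add_local(extra_tb: float, tb_per_server: float, copies: int) -> int:
    """Local-disk design: each copy of the new data needs servers with enough disk,
    and every added server also brings CPUs and memory whether or not they are needed."""
    return math.ceil(extra_tb / tb_per_server) * copies


def disks_to_add_external(extra_tb: float, tb_per_disk: float,
                          data_units: int, parity_units: int) -> int:
    """External-storage design: add only disks to the array pool. Raw capacity
    includes the D+P parity overhead; no new database servers are required."""
    raw_tb = extra_tb * (data_units + parity_units) / data_units
    return math.ceil(raw_tb / tb_per_disk)


if __name__ == "__main__":
    extra = 20.0  # hypothetical data growth in TB
    print("local disks:",
          servers_to_add_local(extra, tb_per_server=10.0, copies=3), "servers")
    print("external storage:",
          disks_to_add_external(extra, tb_per_disk=4.0, data_units=8, parity_units=2),
          "disks")
```

With these numbers, 20 TB of growth means six more servers (and their CPUs) in the local-disk design versus seven 4 TB disks in the array. The comparison deliberately ignores array controller headroom, which in practice also has limits.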

In fact, storage was separated from compute back in the 1990s, and the classic IOE architecture (IBM, Oracle, EMC) has proven itself over more than 30 years of practice. This architecture fits human nature: databases, operating systems, compute servers, networking, and storage each have specialized companies, most of them Fortune 500 firms, that divide the stack into layers and each do professional work in their own field. Today some people use a converged architecture that hosts core databases on unreliable local disks, leaving the reliability of large numbers of disks unattended. Isn't that irresponsible to customers?

Therefore, it is inappropriate to reject this architecture. Separating database storage from compute follows two basic pieces of common sense: professionals should do professional work, and resource management should be fine-grained rather than extensive. It also reduces cost and improves efficiency, and it is the general trend both now and in the future.

Source: 毕须说 (Bi Xushuo)
