Is a distributed database really just a master-slave architecture inside? How can it balance load? With master-slave replication, Slave replay can push RTO to hours. Can you tolerate that?
毕须说 · 2024-07-30 16:25 · published in China

Recently I talked with a financial customer. The customer said that today's domestic distributed databases are essentially sharded: the data is split into database shards (also called units). A single shard is generally 500 GB to 1 TB, sometimes even 10 TB, and is deployed as one master with multiple replicas using master-replica replication; MySQL-based products such as TDSQL and GoldenDB ship the binlog to the Slave nodes. Yet 90% of databases hold only a few TB in total, which is really small. Why deploy multiple shards at all?

In fact, a single shard with six master-slave copies might be enough. Is it really necessary to deploy dozens of servers and scale out by sharding? The hardware reliability is low, and more replicas do not automatically mean better reliability. Resource utilization is low, operation and maintenance are complex, and the servers take up a lot of data center space and consume a lot of power, so the cost is high. A single shard holds 1 TB, which is actually a very large shard. In essence, one master node handles transactions while the other five nodes only serve queries, so the load is not truly balanced across the six nodes. Isn't this a primary/secondary architecture? Why call it a distributed architecture?

In addition, vendors generally recommend deploying production as one master with multiple replicas, with at least one same-city replica kept in sync. The master replicates to the slave replicas through the binlog, and each slave must replay the SQL logical log. One customer told me that the biggest problem found so far is that replay time can be very long and cannot be controlled. Especially during nightly batch settlement, replay can lag by more than an hour. If the primary server fails at that moment, the secondary must finish replaying the logical log before it can take over, so the RTO may reach several hours and the service may be down for several hours. Can that much downtime be tolerated?
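Why replay lag turns into hours of RTO can be seen with simple arithmetic. The sketch below is a simplified model, not a measurement of any product: it assumes the primary produces binlog at a constant rate during the batch window, the replica replays at a slower constant rate, and no new writes arrive after the primary fails. All rates (`gen_rate`, `apply_rate`) and the `ReplayModel` name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReplayModel:
    # Hypothetical rates, in MB of binlog per second.
    gen_rate: float    # binlog produced by the primary during the batch window
    apply_rate: float  # binlog the replica can replay per second

    def __post_init__(self) -> None:
        if self.apply_rate <= 0:
            raise ValueError("apply_rate must be positive")

    def backlog_mb(self, window_s: float) -> float:
        """Unreplayed binlog on the replica after window_s seconds of load."""
        # The backlog only grows when the primary writes faster than the replica replays.
        return max(0.0, (self.gen_rate - self.apply_rate) * window_s)

    def catch_up_s(self, window_s: float) -> float:
        """Seconds of replay still needed before takeover if the primary fails
        at the end of the window. This is only the replay part of the RTO;
        failure detection and switchover come on top of it."""
        return self.backlog_mb(window_s) / self.apply_rate


# Hypothetical nightly batch: 3 hours, 60 MB/s generated, 40 MB/s replayed.
model = ReplayModel(gen_rate=60.0, apply_rate=40.0)
window = 3 * 3600
print(f"backlog: {model.backlog_mb(window):.0f} MB")
print(f"replay before takeover: {model.catch_up_s(window) / 3600:.1f} h")
```

With these assumed numbers the replica is 216,000 MB behind and needs 1.5 hours of replay before it can serve, which matches the order of magnitude the customer described. The backlog grows linearly with the length of the batch window, which is why the lag "cannot be controlled".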

Moreover, the production cluster and the same-city cluster replicate synchronously over an IP network. If there is packet loss, congestion, bit-error jitter, or a transient disconnection, who is responsible for isolating the fault quickly? After all, an IP network is definitely of lower quality than an FC network. How does the database cope? Will business performance drop sharply, or will the service even hang? In fact, Oracle ADG's Maximum Performance mode, which delivers the best performance, uses asynchronous replication rather than synchronous replication.
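The trade-off between synchronous and asynchronous replication shows up directly in commit latency. The sketch below is an illustrative model under stated assumptions: the replica round trip starts after the local log flush (a simplification), a lost link is modeled as `None`, and the `ack_timeout_ms` limit and the choice to degrade to asynchronous mode after it expires are hypothetical parameters, not any specific product's behavior. MySQL semi-synchronous replication, for instance, does revert to asynchronous replication when no replica acknowledges within its timeout.

```python
from typing import Optional, Tuple


def commit_latency_ms(local_ms: float, ack_rtt_ms: Optional[float], mode: str,
                      ack_timeout_ms: float = 50.0) -> Tuple[float, str]:
    """Return (latency seen by the client, how the commit was acknowledged).

    local_ms       -- time to persist the log on the primary
    ack_rtt_ms     -- round trip to the same-city replica; None models a lost link
    mode           -- "sync" waits for the replica ack, "async" does not
    ack_timeout_ms -- hypothetical wait limit before degrading to async
    """
    if mode not in ("sync", "async"):
        raise ValueError(f"unknown mode: {mode}")
    if mode == "async":
        # Return right after the local write; the replica may lag behind.
        return local_ms, "local-only"
    if ack_rtt_ms is not None and ack_rtt_ms <= ack_timeout_ms:
        # Healthy network: every commit pays the replica round trip.
        return local_ms + ack_rtt_ms, "replica-acked"
    # Jitter or disconnection: the commit stalls until the timeout, then degrades.
    return local_ms + ack_timeout_ms, "degraded-to-async"


for rtt in (1.0, 8.0, None):
    print(commit_latency_ms(2.0, rtt, "sync"))
print(commit_latency_ms(2.0, None, "async"))
```

Every synchronous commit inherits the network's jitter, and a bad link either stalls commits until some timeout fires or silently drops the protection that synchronous replication was supposed to give.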

In fact, the core logic of a distributed architecture is to slice the data and balance the slices across every node. In storage systems this is called an active-active (A-A) balanced architecture: LUN space is cut into 64 MB slices that are distributed round-robin across the controller nodes, and because each small 64 MB slice is served without replicas, every controller handles reads and writes at the same time. On the disk side, data is balanced across all SSDs in 4 MB RAID chunks. This fine-grained slicing is what a truly distributed, balanced architecture looks like.
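The round-robin placement described above can be sketched as simple address arithmetic. This is a minimal illustration, assuming a plain modulo mapping; the function names are hypothetical, and real arrays use more elaborate mappings (for example, the SSD layer stripes physical space rather than raw LUN offsets). Only the 64 MB and 4 MB granularities come from the text.

```python
from collections import Counter

SLICE_BYTES = 64 * 2**20  # 64 MB LUN slice, as in the text
CHUNK_BYTES = 4 * 2**20   # 4 MB RAID chunk, as in the text


def controller_for(offset: int, n_controllers: int) -> int:
    """Controller that owns the 64 MB slice containing this LUN byte offset."""
    return (offset // SLICE_BYTES) % n_controllers


def ssd_for(offset: int, n_ssds: int) -> int:
    """SSD that holds the 4 MB chunk containing this byte offset (simplified)."""
    return (offset // CHUNK_BYTES) % n_ssds


def slices_per_controller(lun_bytes: int, n_controllers: int) -> Counter:
    """How many slices of a LUN each controller owns under round-robin placement."""
    n_slices = -(-lun_bytes // SLICE_BYTES)  # ceiling division
    return Counter(i % n_controllers for i in range(n_slices))


# A 1 TB LUN on 4 controllers: each controller owns an equal share of slices,
# so each one serves reads and writes for a quarter of the address space.
print(slices_per_controller(2**40, 4))
print(controller_for(5 * SLICE_BYTES + 123, 4), ssd_for(25 * CHUNK_BYTES, 24))
```

Because consecutive slices land on different controllers, any hot region of the LUN is spread over all controllers instead of landing on one master.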

Storage vendors started working on this 20 years ago, and on this planet only two or three vendors (such as EMC and Huawei) have done the A-A architecture well; balancing A-A across both the software and the hardware architecture is extremely difficult. Yet today's database shards are around 1 TB. With such coarse granularity, and each shard still one master with multiple replicas, how can load balancing be achieved? If shards were as small as a few MB, the metadata would grow enormously, although reliability would improve. It is hard to accept a so-called distributed database whose internal architecture is really master-slave with multiple replicas. What do you think of this so-called distributed architecture? Shouldn't it be called a distributed master-slave multi-replica architecture? 😅😅
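The granularity argument can be quantified. The sketch below assumes a hypothetical 10 TB dataset on 6 nodes, equal load per shard, and round-robin placement where only the owning node serves a shard (the idealized A-A case; a one-master-N-replica shard is worse, since all its writes land on the master). It reports how many shards the metadata must track and how far the busiest node sits above the average load.

```python
TiB = 2**40
MiB = 2**20


def placement_stats(total_bytes: int, shard_bytes: int, n_nodes: int) -> tuple:
    """Return (number of shards, busiest-node load / average load).

    Assumes equal load per shard and round-robin placement, so some nodes
    end up with one extra shard when the count does not divide evenly.
    """
    n_shards = -(-total_bytes // shard_bytes)  # ceiling division
    busiest = -(-n_shards // n_nodes)          # shards on the most loaded node
    avg = n_shards / n_nodes
    return n_shards, busiest / avg


for label, size in (("1 TiB", TiB), ("64 MiB", 64 * MiB), ("4 MiB", 4 * MiB)):
    n, ratio = placement_stats(10 * TiB, size, 6)
    print(f"{label:>7}: {n:>9} shards to track, busiest node at {ratio:.4f}x average")
```

With 1 TB shards, 10 shards over 6 nodes leave the busiest node 20% above average; with 64 MB slices the imbalance all but disappears, but the metadata must track 163,840 entries instead of 10. That is the trade-off storage A-A designs had to engineer around.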


Source: 毕须说 (Bi Xushuo)
