It Takes 3 Months to Upgrade the Firmware of 300 Disks: If Tens of Thousands of Local SSDs in a Distributed Database Hit a Firmware Bug, How Do You Upgrade Them?
毕须说 (Bi Xu Shuo) · 2024-07-30 16:33 · Published in China

Recently, I visited a customer in the financial industry, and we discussed the large number of local disks their database uses. The customer said they had a few dozen servers holding about 300 SSDs. After the service went live, the disk manufacturer located a firmware (FW) bug that required a firmware upgrade. Everything had to be done by hand: find a maintenance window for the downtime, restart each server, upgrade the disk firmware, then power-cycle the server. The whole job took 3 months!

Even the customer's own team dreads such extremely complicated, high-risk operations.

Now imagine a bank whose database nodes already span thousands of servers and tens of thousands of local disks, sourced from many different server vendors, disk manufacturers, models, and firmware versions. Some of the SSDs may come from little-known domestic manufacturers with perhaps only a few dozen R&D staff, whose design and manufacturing quality may not be especially high, so disk bugs are only to be expected. If the firmware needs upgrading, how do you manually upgrade tens of thousands of local disks across all those servers? The added complexity for data center O&M would be enormous!

In fact, a storage system is itself a machine dedicated to managing disks, commonly called a “disk machine”, and it is responsible for the overall reliability of the storage. The storage vendor controls the quality of incoming components: before an SSD firmware version is adopted, it goes through extensive quality evaluation and testing, and during production the drives undergo extensive burn-in testing to weed out faulty and slow disks and to identify sub-healthy ones, so that problems are caught before the system ships to the customer's site. The SSD firmware is also integrated into the storage system software, so if a problem turns up, the system upgrades the SSDs automatically, with no manual operation.
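One way to picture such an automatic upgrade is as a rolling upgrade: the system flashes one disk at a time, and only when the rest of that disk's redundancy group is healthy, so data stays available throughout. The Python below is a minimal sketch under that assumption. The `Disk` model, the redundancy-group check, and `flash_firmware` are hypothetical illustrations, not any storage vendor's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    disk_id: str
    group: str      # hypothetical redundancy group (e.g. a RAID set) the disk belongs to
    firmware: str
    online: bool = True


def flash_firmware(disk: Disk, image_version: str) -> None:
    """Hypothetical stand-in for a vendor flashing routine; in this
    simulation it only records the new version on the disk."""
    disk.firmware = image_version


def rolling_firmware_upgrade(disks: list[Disk], target: str) -> list[str]:
    """Upgrade every online disk not yet on `target`, one at a time,
    skipping disks whose redundancy group is already degraded.
    Returns the IDs of the upgraded disks, in order."""
    upgraded = []
    for disk in disks:
        if disk.firmware == target:
            continue  # already on the target version
        if not disk.online:
            continue  # failed/offline disk: leave it for replacement
        peers = [d for d in disks if d.group == disk.group and d is not disk]
        if not all(p.online for p in peers):
            continue  # group already degraded; taking another disk offline is unsafe
        disk.online = False            # isolate the disk; its peers keep serving I/O
        flash_firmware(disk, target)
        if disk.firmware != target:    # verify before returning it to service
            raise RuntimeError(f"firmware verification failed on {disk.disk_id}")
        disk.online = True             # rejoin the group (a real system would also resync)
        upgraded.append(disk.disk_id)
    return upgraded


fleet = [
    Disk("d0", "raid-a", "1.0"),
    Disk("d1", "raid-a", "1.0"),
    Disk("d2", "raid-b", "1.1"),
    Disk("d3", "raid-b", "1.0"),
    Disk("d4", "raid-b", "1.0", online=False),  # failed disk: raid-b is degraded
]
print(rolling_firmware_upgrade(fleet, "1.1"))  # ['d0', 'd1']
```

In this sketch `d3` is deliberately left on the old version because its group already has a failed member; the upgrade would pick it up on a later pass once `d4` is replaced. The point is that the safety checks live in software rather than in a human operator's change window.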

However, when a large number of servers each carry their own local disks, the uncontrollable risks of those disks are greatly magnified. Customers must evaluate the disk firmware themselves: how do they judge whether a version's quality meets their requirements? The upgrades must also be carried out by hand. Complex work that professional vendors used to handle is pushed onto the user to do manually, and it is a massive, complicated, high-risk operation at that. Is that a responsible approach? The logic that fits human nature is to leave the complexity to the vendor and the simplicity to the customer.

Source: 毕须说 (Bi Xu Shuo)

