[vSAN distributed storage data recovery] VMware vSphere virtualization: VMDK file shrinks to 1 KB
海境超备  2024-08-09 10:50   published in China

Case description
Know your enemy and know yourself, and you can win every battle. The same holds for data recovery: a detailed understanding of how the data was lost makes the recovery easier. After talking with the customer in detail, we established the following sequence of events.
The VMware vSphere environment manages multiple clusters, and the faulty cluster uses vSAN distributed storage. The failure began when the flash drive of one disk group on one storage host failed. The disk indicator reported an error, but the data was still usable, so the drive was replaced with a new SSD. The existing disk group did not recognize the new drive, and the disk group failed. The maintenance staff then selected the disk group's two HDDs together with the new SSD, created a new disk group from them, and rejoined it to the vSAN distributed storage cluster. After 2 hours of synchronization the cluster was accessible again, but one of the virtual machines that had used the disk group could not be started. On inspection, the VMDK file of that virtual machine was found to have shrunk to 1 KB.
Solution
1. Case assessment
On-site analysis by our engineers showed that the storage policy of the faulty virtual machine differed from that of the healthy virtual machines. The faulty virtual machine used a RAID 0 layout and was not provisioned, while the other virtual machines used a RAID 1 policy with 100% provisioning. When the disk group was rebuilt, every virtual machine with data in that disk group was affected. Under the RAID 1, 100% provisioning policy, the affected objects were automatically degraded and then automatically restored under the same policy, so those virtual machines returned to a normal state on their own. Virtual machines that did not use this policy could not be restored automatically, which resulted in data loss.
vSAN is similar to VMware's traditional VMFS file system. vSAN can be thought of as one large partition in which each folder has a VMFS-like structure, and the VMFS layer underneath can only be reached through vSAN. Within that VMFS layer, however, the storage of virtual disk VMDK files is handled in a specific way.
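The difference between the two policies can be illustrated with a deliberately simplified model. It assumes that a RAID 0 object needs every one of its components, and that a RAID 1 object stays readable as long as one full replica is intact. Witness components and vote quorum are ignored, and the disk group names are hypothetical.

```python
from typing import Iterable, Set


def raid0_accessible(component_groups: Iterable[str], failed: Set[str]) -> bool:
    """A RAID 0 object is readable only if no component sits on a failed disk group."""
    return all(group not in failed for group in component_groups)


def raid1_accessible(replicas: Iterable[Iterable[str]], failed: Set[str]) -> bool:
    """Simplified RAID 1: readable if at least one replica is fully intact.

    Real vSAN also requires a quorum of votes (witness components); that is
    intentionally left out of this sketch.
    """
    return any(raid0_accessible(replica, failed) for replica in replicas)


if __name__ == "__main__":
    failed_groups = {"host1-dg1"}  # hypothetical name for the rebuilt disk group

    # Faulty VM: one copy, striped across two disk groups.
    print(raid0_accessible(["host1-dg1", "host2-dg1"], failed_groups))

    # Healthy VMs: two mirrored copies on different disk groups.
    print(raid1_accessible([["host1-dg1"], ["host3-dg1"]], failed_groups))
```

In this model the striped object becomes unreadable as soon as any one of its disk groups is lost, while the mirrored object survives, which matches the behaviour observed in this case.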

VMDK1.jpg

When a virtual machine is created on vSAN, it is given a VMDK file. When the system generates the VMDK file, it also generates a virtual object and associates the file with that object's UUID. When vSphere is accessed through the web client, the VMDK file appears in the virtual machine directory with its normal size.
If the same VMDK is accessed over SFTP, however, its size is shown as 1 KB, because with this kind of external access the system does not automatically associate the VMDK file with the virtual object.
Similarly, if the virtual object fails or becomes inaccessible and the association is broken, the VMDK file associated with it is shown as 1 KB even in the vSphere web client. (Any file smaller than 1 KB is also displayed as 1 KB.)

VMDK2.jpg

We can download the 1 KB VMDK file and open it in a text editor to see the UUID of the virtual object it is associated with. In the figure, the red box marks the ID of the associated virtual object:

VMDK3.jpg
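Reading the ID out of the 1 KB file can also be scripted. The sketch below assumes the descriptor references the object on an extent line of the form RW <sectors> VMFS "vsan://<uuid>"; that layout, the function name, and the sample UUID are assumptions for illustration, so compare them against the real file before relying on the result.

```python
import re
from typing import Optional

# Assumed reference format inside the descriptor: vsan://<8-4-4-4-12 hex UUID>
VSAN_UUID_RE = re.compile(
    r"vsan://([0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})"
)


def find_vsan_object_uuid(descriptor_text: str) -> Optional[str]:
    """Return the first vSAN object UUID referenced in the descriptor, or None."""
    match = VSAN_UUID_RE.search(descriptor_text)
    return match.group(1).lower() if match else None


# Illustrative descriptor content; the UUID below is made up.
SAMPLE_DESCRIPTOR = """# Disk DescriptorFile
version=4
CID=fffffffe
parentCID=ffffffff
createType="vmfs"

# Extent description
RW 209715200 VMFS "vsan://0a1b2c3d-4e5f-6789-abcd-ef0123456789"
"""

if __name__ == "__main__":
    print(find_vsan_object_uuid(SAMPLE_DESCRIPTOR))
```

For a real file, read the downloaded descriptor as text and pass its contents to find_vsan_object_uuid.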

In the cluster, select the monitoring view to check the status of the virtual objects:

VMDK4.jpg

Then look up the physical storage location of the virtual object by its ID. As shown in the figure, the virtual object is a RAID 0 made up of multiple components, and the components are reported as missing:

VMDK5.jpg

2. Recovery plan
The data is recovered as follows:
1) Record the host, cache disk, and physical disk of each component under the virtual object.
2) Parse the space allocated to the virtual object on the physical disks.
3) Parse from the cache disk the addresses of space that has been allocated but not yet written.
4) Use tools to manually extract these sector addresses and combine them into a complete component.
5) Reassemble the RAID 0 from all extracted components to access all data in the virtual object.
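Steps 4 and 5 of the list above can be sketched roughly as follows. This is a minimal illustration, not the tooling used in this case: it assumes 512-byte sectors, that the extent list for each component is already known in logical order, and that the components are supplied in their stripe order. The stripe size is a parameter; vSAN is generally documented as striping in 1 MB units, but this should be verified against the actual object layout. All function names are hypothetical.

```python
import io
from typing import BinaryIO, Iterable, List, Tuple

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def extract_component(disk: BinaryIO, extents: Iterable[Tuple[int, int]]) -> bytes:
    """Concatenate (start_sector, sector_count) extents in the order given."""
    out = bytearray()
    for start, count in extents:
        disk.seek(start * SECTOR_SIZE)
        chunk = disk.read(count * SECTOR_SIZE)
        if len(chunk) != count * SECTOR_SIZE:
            raise ValueError(f"short read at sector {start}")
        out += chunk
    return bytes(out)


def destripe_raid0(components: List[bytes], stripe_size: int) -> bytes:
    """Interleave the components stripe by stripe to rebuild the object."""
    if not components:
        return b""
    out = bytearray()
    longest = max(len(component) for component in components)
    offset = 0
    while offset < longest:
        for component in components:
            out += component[offset:offset + stripe_size]
        offset += stripe_size
    return bytes(out)


if __name__ == "__main__":
    # Tiny stand-in for a disk image: four sectors filled with 0, 1, 2, 3.
    disk = io.BytesIO(b"".join(bytes([i]) * SECTOR_SIZE for i in range(4)))
    component = extract_component(disk, [(2, 1), (0, 1)])
    print(len(component))

    # Two components with a 4-byte stripe, for readability only.
    print(destripe_raid0([b"AAAACCCC", b"BBBBDDDD"], 4))
```

On real disks the component data would be streamed to files rather than held in memory, and data still sitting on the cache disk (step 3) would have to be merged in before de-striping.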
Step 1: identify the virtual object that corresponds to the faulty VMDK file.
Step 2: using the virtual object ID obtained, view the structure of the virtual object in vSphere monitoring.
In some extreme cases, the virtual object no longer appears in vSphere monitoring. If it cannot be found there, the underlying sectors of the hard disks must be analyzed manually. The structure is shown below. First parse the space occupied by the vSAN distributed storage partitions on the HDDs and the SSD. Within that space, the lost virtual object ID can be located with a hex editor, and the specified virtual object or VMDK virtual disk file can then be extracted.

vSAN2.jpg
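The manual search for an object ID in a hex editor can be approximated by a small scanner over a raw disk image. How vSAN stores the ID on disk is not described here, so the sketch simply looks for two guessed encodings, the ASCII text form and the raw 16-byte form. Both encodings and the function names are assumptions.

```python
import io
import uuid
from typing import BinaryIO, Iterator, List, Tuple


def uuid_patterns(object_uuid: str) -> List[bytes]:
    """Candidate on-disk encodings of the ID (both are guesses)."""
    parsed = uuid.UUID(object_uuid)
    return [str(parsed).encode("ascii"), parsed.bytes]


def scan_for_patterns(
    stream: BinaryIO, patterns: List[bytes], chunk_size: int = 1 << 20
) -> Iterator[Tuple[int, bytes]]:
    """Yield (absolute_offset, pattern) for every match, reading in chunks."""
    overlap = max(len(p) for p in patterns) - 1
    tail = b""
    base = 0  # absolute offset of the first byte of the current window
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        window = tail + chunk
        for pattern in patterns:
            pos = window.find(pattern)
            while pos != -1:
                # Matches lying entirely inside the carried-over tail were
                # already reported for the previous window.
                if pos + len(pattern) > len(tail):
                    yield base + pos, pattern
                pos = window.find(pattern, pos + 1)
        new_tail = window[-overlap:] if overlap else b""
        base += len(window) - len(new_tail)
        tail = new_tail


if __name__ == "__main__":
    # Made-up ID embedded in a small in-memory "image".
    pats = uuid_patterns("0a1b2c3d-4e5f-6789-abcd-ef0123456789")
    image = b"\x00" * 100 + pats[0] + b"\x00" * 50 + pats[1] + b"\x00" * 10
    for offset, pattern in sorted(scan_for_patterns(io.BytesIO(image), pats, 64)):
        print(offset, len(pattern))
```

Hits are yielded grouped by pattern within each window, so sort them if offset order matters. Each hit only marks a candidate location that still has to be interpreted by hand.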

Step 3: based on the obtained virtual object ID, extract from the hard disks the component members attached to that ID. Once the component data that vSAN distributed across the disk group has been extracted for the entire virtual object, reassemble the RAID to recover the lost data.

VMDK6.jpg

Case Summary
vSAN is a scalable distributed storage architecture built into the vSphere kernel and based on the VMware ESXi virtualization platform. The vSAN storage layer is built from flash devices and hard disks installed in the hosts of a vSphere cluster. These devices are controlled and managed by vSAN, which presents them as a unified shared storage layer for the cluster.

VMDK (Virtual Machine Disk) is a virtual disk format developed by VMware and is one of the standard formats for storing virtual machine hard disks. In a virtualized environment, a VMDK file acts as a disk drive and holds the operating system, applications, and data of the virtual machine. The format consists of one or more data files and a description file. The data files hold the actual contents of the virtual disk, while the description file holds the disk's configuration information, such as the adapter type and virtual hardware version, together with information about the data files. VMDK files can be migrated and shared among different VMware virtualization platforms and can run in a variety of operating system and hardware environments.

For data loss cases, the 海境超备 R&D team studies the design of the servers and systems involved, compares fault types carefully, works through difficult recoveries, and documents what made each recovery succeed. The team has repaired tens of thousands of difficult data center cases involving server databases, virtualization platforms, and distributed storage, and has mastered the core techniques of ransomware recovery. Recovered data is not retained, and it is returned with its structure intact so that it can be used directly.
