Fundamentals of high-performance storage: a detailed look at memory
百川归海-Edward  2024-08-05 16:58   published in China

This article is reprinted from the Data storage Zhang public account.

 

In the previous sections, we briefly introduced memory. In this section, we look at it in detail. "Memory" here means DRAM (Dynamic Random Access Memory), which is widely used in computer systems such as personal computers, workstations and servers. Memory is actually a relatively simple computer component. Figure 1 shows photographs of common memory modules. Different types of computers use memory modules of different forms; the ones shown here are commonly used in servers. Each module carries a number of integrated circuits, which fall into two types. One is the memory particle (the DRAM chip itself), which is the entity that actually stores data, much like a warehouse where goods are stored. The other is the SPD (Serial Presence Detect) chip.

641.png

Figure 1 Memory physical diagram

Although a computer may have multiple memory modules, and each module has multiple memory particles, the memory controller does a lot of abstraction for us, so the operating system usually sees one continuous linear address space. The operating system adds further abstraction of its own: the application layer usually sees only virtual addresses, not real physical addresses.

As we learned earlier, hard disks are accessed in units of sectors or pages. Memory is accessed at a much finer granularity: it can be addressed byte by byte. When we introduced mechanical hard disks, we also learned that their random access performance is much worse than their sequential access performance. For memory, by contrast, the gap between random and sequential access is small. In both respects, memory behaves differently from a hard disk.

Let's first introduce the SPD. This integrated circuit is itself a small storage chip that holds the characteristic parameters of the memory module, such as its type, operating voltage, operating frequency and speed. Take the memory type as an example. A field named Basic memory type describes it: the value is 7 for DDR memory, 8 for DDR2 memory and 11 for DDR3 memory. It is this information that lets the CPU establish normal communication with the memory.
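As a small illustration, the type values mentioned above can be kept in a lookup table. This is only a sketch: the names `BASIC_MEMORY_TYPE` and `describe_memory_type` are made up for this example, and only the three values given in the text are included.

```python
# Values of the "Basic memory type" field quoted in the text.
# The names below are hypothetical, chosen only for this illustration.
BASIC_MEMORY_TYPE = {7: "DDR", 8: "DDR2", 11: "DDR3"}


def describe_memory_type(value: int) -> str:
    """Translate a Basic memory type value into a readable name."""
    return BASIC_MEMORY_TYPE.get(value, "unknown (not covered in this article)")


print(describe_memory_type(11))  # DDR3
```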

Note also that, unlike an SSD, a memory module has no controller chip on board. The memory controller was traditionally part of the north bridge chip. The north bridge used to be a separate chip, but most of its functions, including the memory controller, have since been integrated into the CPU.

The other kind of integrated circuit is the memory particle. We will leave its internals for later and first look at the memory module as a whole. In Figure 1, the upper and lower modules carry different numbers of memory particles: the upper module has 8 and the lower one has 9. This is because they are different types of module. The upper one is an ordinary memory module, and the lower one is a memory module with ECC.

What is ECC? ECC is short for Error Correction Code. In short, a memory module with ECC can correct memory errors. The cost is one additional memory particle. Consumer memory modules usually do not have ECC; it is mostly found on enterprise-level modules. The reason is simple: memory modules with ECC are much more expensive.

Is ECC really necessary? Let's look at Kingston's statistics on the failure rates of ECC and non-ECC memory, shown in Figure 2. The figure tells us two useful things. First, the failure rate of non-ECC memory is about 1%, which is still very high for enterprise-level applications. Second, ECC does effectively reduce the memory failure rate.

641.png

Figure 2 Memory failure rate

How does ECC reduce the memory failure rate? To answer that, we need to look at how ECC works. ECC implements error correction with a method called Hamming code. Next, let's see how Hamming code corrects errors.

Before introducing Hamming code, let's introduce the parity check. A parity check verifies data based on the number of "1" bits in a binary value. There are two variants: odd check (odd parity) and even check (even parity). Odd check uses the check bit to ensure that the data contains an odd number of "1" bits; even check uses the check bit to ensure that it contains an even number of "1" bits.

Taking odd check as an example, let's briefly look at how it is implemented. The method is very simple: one extra bit, called the check bit, is appended to the original data, as shown in Figure 3. In any binary number, the count of "1" bits is either odd or even. With the extra check bit, we can guarantee that the total count of "1" bits is odd. Take the number 17 as an example. Its binary form (10001) is shown in Figure 3 and contains two "1" bits. To implement odd check, the check bit is set to 1. If the count of "1" bits in the original data were already odd, the check bit would be set to 0. Even check works the same way, except that the total count of "1" bits must be even. For 17 again, with even check the check bit is set to 0.

641.png

Figure 3 Schematic diagram of parity check
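The check bit calculation described above can be sketched in a few lines of Python. The function name `parity_bit` is our own choice for this illustration, not something taken from a library.

```python
def parity_bit(value: int, odd: bool = True) -> int:
    """Return the check bit that makes the total number of 1 bits odd (or even)."""
    ones = bin(value).count("1")  # count the 1 bits in the original data
    if odd:
        # Odd check: add a 1 only when the data holds an even number of 1s.
        return 0 if ones % 2 == 1 else 1
    # Even check: add a 1 only when the data holds an odd number of 1s.
    return ones % 2


print(parity_bit(17, odd=True))   # 17 = 0b10001 has two 1 bits -> 1
print(parity_bit(17, odd=False))  # -> 0
```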

The principle of the parity check also reveals its weaknesses. First, parity can only detect whether the data contains an error; it cannot correct it. The reason is simple: the check is based only on the count of "1" bits, so it cannot tell which bit has changed. Second, if two bits fail at the same time, the count of "1" bits keeps the same parity and the error goes undetected.
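Both weaknesses are easy to reproduce. The sketch below uses even check on the number 17 from the earlier example; `even_parity_ok` is a hypothetical helper name, and the bit positions that are flipped were picked arbitrarily.

```python
def even_parity_ok(bits: list) -> bool:
    """True when the word (data bits plus check bit) holds an even number of 1s."""
    return sum(bits) % 2 == 0


# 17 = 10001, followed by its even-check bit 0.
word = [1, 0, 0, 0, 1, 0]

one_flip = word.copy()
one_flip[2] ^= 1  # a single bit is corrupted

two_flips = word.copy()
two_flips[1] ^= 1  # two bits are corrupted at the same time
two_flips[3] ^= 1

print(even_parity_ok(word))       # True: no error
print(even_parity_ok(one_flip))   # False: error detected, position unknown
print(even_parity_ok(two_flips))  # True: the double error goes unnoticed
```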

With parity check understood, let's look at how Hamming code is implemented. Hamming code can not only detect errors but also correct them, and it does so by using multiple check bits. The number of check bits k is determined by the following formula, where n is the number of data bits and k is the number of check bits.

$2^k - 1 \ge n + k$

To make the relationship between the number of check bits k and the number of data bits n clearer, here are a few examples. The number of check bits given here is the minimum, that is, the value with the lowest error correction cost: the fewer check bits, the lower the extra transmission and storage overhead. It may be easier to read the table the other way round, as the maximum number of data bits that a given number of check bits can protect.

Table 1 Relationship between the number of Hamming code data bits and the number of check bits

截图20240805164704.png

Take a 3-bit check code as an example. The left side of the formula evaluates to 7 (2^3 - 1 = 7). With k = 3 check bits, at most 4 data bits can be supported, so a 3-bit check code can correct data of 2 to 4 bits. Likewise, 4 check bits support up to 11 data bits. Mainstream computer memory today is 64 bits wide, so at least 7 extra bits are needed to store the check data, which is why an ECC memory module carries an extra memory particle.
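These numbers can be checked with a short script. It assumes the formula is 2^k - 1 >= n + k, which is consistent with the worked values above (7 on the left side for k = 3, at most 4 data bits). The function names are ours.

```python
def min_check_bits(n: int) -> int:
    """Smallest k satisfying 2**k - 1 >= n + k for n data bits."""
    k = 1
    while 2**k - 1 < n + k:
        k += 1
    return k


def max_data_bits(k: int) -> int:
    """Largest n that k check bits can protect under the same inequality."""
    return 2**k - 1 - k


for k in range(2, 8):
    print(f"k={k}: up to {max_data_bits(k)} data bits")

print(min_check_bits(64))  # 7 check bits for 64-bit memory
```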

We have covered how many check bits are needed. But how are the check bits arranged relative to the original data? Are they appended to the end of the data, as with parity? This is governed by the Hamming code encoding rules. There are two rules:

1) Determine the data locations, including the locations of the check bits and of the original data. The check bits occupy positions 2^(i-1) for i = 1, 2, ..., k, and the original data is filled into the remaining positions in its original order.

This rule may not be easy to grasp, so let's take 4-bit data (the number 5, binary 0101) as an example. For 4 data bits, the formula tells us that 3 check bits are needed, for a total of 7 bits. The rule settles how the 4 bits of original data and the 3 check bits are placed in the 7 empty positions shown below.

641.png

Figure 4 Hamming code data location

We number the positions from right to left as 1, 2, 3, ..., 7. According to the rule, the check bits occupy positions 2^(i-1), that is, positions 1, 2 and 4. The original data is filled into the remaining positions (3, 5, 6 and 7) in its original relative order, as shown in Figure 5.

642.png

Figure 5 Hamming code data location example
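The placement rule can be sketched as follows. Positions are numbered from 1 starting at the right, a position holds a check bit when its number is a power of two, and the data bits fill the remaining positions from left to right. The helper names are hypothetical.

```python
def is_check_position(pos: int) -> bool:
    """A position holds a check bit when its number is a power of two (1, 2, 4, ...)."""
    return (pos & (pos - 1)) == 0


def place_data(data_bits: str, total: int) -> dict:
    """Map data bits (written left to right) onto the non-check positions."""
    # Walk positions from the leftmost (highest number) to the rightmost.
    data_positions = [p for p in range(total, 0, -1) if not is_check_position(p)]
    return dict(zip(data_positions, (int(b) for b in data_bits)))


print([p for p in range(1, 8) if is_check_position(p)])  # [1, 2, 4]
print(place_data("0101", 7))  # {7: 0, 6: 1, 5: 0, 3: 1}
```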

2) Determine the check bit values. With the locations fixed, the next step is to determine the value of each check bit. Each value is computed in much the same way as a parity bit; the core question is which bits take part in each calculation. Let's look at the specific rules.

We convert each position number into binary, as shown in Figure 6. To calculate the check bit at position 1, we group together all positions whose binary number has a 1 in the first bit (counting from the right). So check bit 1 covers positions 1, 3, 5 and 7. With even check, its value should be 1.

Similarly, for the check bit at position 2, we group together all positions whose binary number has a 1 in the second bit. So check bit 2 covers positions 2, 3, 6 and 7, and its value is 0. The third check bit sits at position 4 (0100), so it covers all positions with a 1 in the third bit, namely 4, 5, 6 and 7, and its value is 1.

641.png

Figure 6 Hamming code check bit calculation rules

After working through these examples, the grouping and calculation rules of Hamming code should be clear. The calculation finally gives the 7-bit value 0101101.
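Putting the two rules together, a minimal encoder might look like the sketch below. It follows the conventions of this article (positions numbered from the right, even check) and assumes the check bit count comes from 2^k - 1 >= n + k; the function name `hamming_encode` is our own.

```python
def hamming_encode(data_bits: str) -> str:
    """Encode data bits (written left to right) into a Hamming code word."""
    n = len(data_bits)
    k = 1
    while 2**k - 1 < n + k:  # smallest number of check bits
        k += 1
    total = n + k

    word = [0] * (total + 1)  # index 0 is unused; positions run 1..total
    data = iter(int(b) for b in data_bits)
    for pos in range(total, 0, -1):
        if pos & (pos - 1):  # not a power of two, so this is a data position
            word[pos] = next(data)

    for i in range(k):
        p = 1 << i  # check bit positions 1, 2, 4, ...
        # Group: every position whose binary number has bit i set.
        ones = sum(word[pos] for pos in range(1, total + 1) if pos & p)
        word[p] = ones % 2  # even check: make the group's count of 1s even

    return "".join(str(word[pos]) for pos in range(total, 0, -1))


print(hamming_encode("0101"))  # 0101101
```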

The check bit values are now fixed. If a data error occurs, how does Hamming code correct it? Again we use a concrete example. Suppose the fifth bit of the data above is corrupted. We are certain to notice when we run the even check on each group. As shown in Figure 7, under the original grouping the fifth bit belongs to two different groups. In both groups the number of 1s is no longer even, so we can conclude that some bit must be wrong.

641.png

Figure 7 Hamming code error correction algorithm

But which bit is wrong? The method is also very simple. For each of the 3 groups, we compute a bit by the even check method: 1 if the group contains an odd number of 1s, 0 otherwise. These are the values shown in the blue squares in Figure 7. Reading them from bottom to top gives a binary number, which is the position of the faulty bit. In this example it is 101 (5).
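The locate-and-correct step can be sketched in the same style. The code assumes a single-bit error, even check, and positions numbered from the right, as in the example; `hamming_syndrome` and `hamming_correct` are names chosen for this illustration.

```python
def hamming_syndrome(codeword: str) -> int:
    """Return the position of a single-bit error, or 0 if every group passes."""
    total = len(codeword)
    # codeword[0] is the highest position; position 1 is the rightmost character.
    bits = {pos: int(codeword[total - pos]) for pos in range(1, total + 1)}
    syndrome = 0
    p = 1
    while p <= total:
        ones = sum(bits[pos] for pos in bits if pos & p)
        if ones % 2:  # this group fails the even check
            syndrome |= p
        p <<= 1
    return syndrome


def hamming_correct(codeword: str) -> str:
    """Flip the bit that the syndrome points at (single-bit errors only)."""
    pos = hamming_syndrome(codeword)
    if pos == 0:
        return codeword
    if pos > len(codeword):
        raise ValueError("syndrome points outside the code word")
    i = len(codeword) - pos
    flipped = "1" if codeword[i] == "0" else "0"
    return codeword[:i] + flipped + codeword[i + 1:]


received = "0111101"  # 0101101 with the fifth bit (from the right) flipped
print(hamming_syndrome(received))  # 5
print(hamming_correct(received))   # 0101101
```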

Next, we continue with the physical structure of memory modules. As we saw earlier, a memory module is made up of multiple memory particles; the one shown in Figure 8 has eight. Each memory particle in turn consists of multiple banks, and each bank can be addressed independently.

641.png

Figure 8 Internal structure of memory particles

Each bank contains a storage matrix together with row decoding and column decoding units. The decoders locate a cell in the array so that data can be read and written. Going one level deeper into the storage matrix, as shown in Figure 9, we find a large number of DRAM units, each of which stores one bit of data.

641.png

Figure 9 Rank local magnification
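To make the idea of bank, row and column decoding a little more concrete, here is a toy sketch that splits an address into the three fields. The field widths (3 bank bits, 16 row bits, 10 column bits) and the field order are assumptions made purely for illustration; real devices and memory controllers use their own layouts.

```python
# Assumed field widths, for illustration only; not taken from any datasheet.
BANK_BITS, ROW_BITS, COL_BITS = 3, 16, 10


def split_address(addr: int) -> tuple:
    """Split an address into (bank, row, column) under the assumed layout."""
    col = addr & ((1 << COL_BITS) - 1)
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)
    bank = (addr >> (COL_BITS + ROW_BITS)) & ((1 << BANK_BITS) - 1)
    return bank, row, col


# Build an address from known fields, then recover them.
addr = (5 << (COL_BITS + ROW_BITS)) | (1234 << COL_BITS) | 77
print(split_address(addr))  # (5, 1234, 77)
```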

How do DRAM units store data? The principle is very simple. The right side of Figure 9 is an enlarged view of a DRAM unit; each unit consists of one transistor and one capacitor. A DRAM unit represents "0" and "1" by whether the capacitor holds charge: if the charge reaches a certain threshold, the unit reads as "1", otherwise as "0". Because the capacitor leaks charge, DRAM must be refreshed periodically to avoid data errors caused by leakage.

So far, we have covered the most important aspects of DRAM, and I hope you found it useful. This account will continue to publish more material on memory.
