Source: Data Store public account
The author rarely buys things from physical stores; most of his daily necessities are purchased through e-commerce platforms (shopping websites). I suspect readers of this book could not say how many times they browse and buy goods through e-commerce apps in a year. Since everyone is familiar with e-commerce systems, this section takes one as its case study to talk further about data and data storage.
The video, audio, Word documents, and other files introduced above are mostly personal data, while the data of an e-commerce system is mostly enterprise data. The division is not strict, but this section introduces data and data storage mainly from the perspective of enterprise data.
A concrete example makes this more vivid. Suppose we search for the keyword “file system” on a certain platform because we want to read books in this field. When we enter the keyword and click “Search”, a list of related products is displayed, as shown in Figure 1-5.
Figure 1-5 E-commerce search interface
This seemingly simple process is actually very complicated. It involves hundreds of data transfers between the mobile phone and the data center, and it touches multiple subsystems of the e-commerce system. Figure 1-6 shows an extremely simplified e-commerce system. When we search for the keyword “file system” on a mobile phone, the mobile app (short for application, that is, the software running on the phone) sends the query request to the e-commerce data center over HTTPS (step 1). At the computer level, the query request sent by the app is itself data.
Figure 1-6 End-to-end process of a product search
After a user's request arrives at the data center, it passes through a device called a load balancer, which routes the request to a node in the core business cluster for processing (step 2). The core business software queries the database for information matching the keyword (step 3). It then assembles the information returned by the database into a page and sends that page back to the mobile phone (step 4). This page is the product list we see on the phone.
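Steps 2 to 4 can be sketched in a few lines of Python. This is a toy model, not how any real platform is built: the round-robin policy, the node names, and the in-memory `PRODUCTS` list standing in for the database are all assumptions made for illustration.

```python
from itertools import cycle

# Hypothetical stand-in for the product database queried in step 3.
PRODUCTS = [
    {"name": "File System Internals", "price": 59.0},
    {"name": "Database Basics", "price": 45.0},
    {"name": "Inside the File System", "price": 72.5},
]


def handle_search(keyword):
    """Core business logic: query the 'database' and assemble the page."""
    # Step 3: look up products whose name contains the keyword.
    matches = [p for p in PRODUCTS if keyword.lower() in p["name"].lower()]
    # Step 4: assemble the page; images are only placeholders at this point.
    return [f"{p['name']} | {p['price']:.2f} | <image placeholder>" for p in matches]


class LoadBalancer:
    """Hands each request to the next node in turn (round-robin, an assumed policy)."""

    def __init__(self, nodes):
        self._nodes = cycle(nodes)

    def route(self, keyword):
        node = next(self._nodes)  # step 2: choose a node in the business cluster
        # In this toy model every node runs the same handler in-process.
        return node, handle_search(keyword)


balancer = LoadBalancer(["node-a", "node-b"])
first_node, page = balancer.route("file system")
second_node, _ = balancer.route("file system")
print(first_node, second_node)
print(page)
```

Two consecutive requests land on different nodes, yet both produce the same page, because every node runs the same business logic against the same database.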
The page returned in step 4 is a static page that references many resources, such as book covers and scripts. In the static page these resources are only placeholders. To display image resources such as book covers properly, the mobile app must also request each specific resource, as shown in Figure 1-5.
Image transmission involves steps 5 to 7, or steps 8 to 10. The difference is that one path obtains image resources from the cache system, while the other obtains them from the file system or object storage system. Obtaining an image from the cache is much faster than obtaining it from the file system or object storage. Therefore, to improve access speed, enterprises put the images of some hot products into the cache. Steps 5 to 7, or 8 to 10, are executed repeatedly, depending on the number of resources referenced by the first request. (Note: the actual situation is much more complicated, because DNS resolution and CDN caching are also involved in the full link. They are omitted here to keep the explanation easy to follow.)
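The two image paths can be illustrated with a small sketch. The two dictionaries below are hypothetical stand-ins for the cache cluster and the object storage system, and filling the cache on a miss is an assumed policy; the text only says that enterprises place hot product images in the cache.

```python
# Hypothetical stores: one dict plays the cache cluster, the other object storage.
cache = {}
object_storage = {
    "cover_fs_book.jpg": b"<jpeg bytes of the book cover>",
}


def fetch_image(file_name):
    """Return (image_bytes, source), trying the cache before persistent storage."""
    if file_name in cache:
        return cache[file_name], "cache"          # fast path
    data = object_storage[file_name]              # slower path
    cache[file_name] = data                       # keep a copy for later requests
    return data, "object storage"


first = fetch_image("cover_fs_book.jpg")
second = fetch_image("cover_fs_book.jpg")
print(first[1], "->", second[1])
```

The first request is served from object storage and the second from the cache, which is the effect the enterprise is after for frequently viewed products.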
The query request above involves four different types of storage systems: relational databases that store structured data, object storage and file systems that store unstructured data, and cache clusters that cache data. The concepts of structured and unstructured data appear here for the first time. We do not need to delve into them yet; they are described later.
Of these four storage systems, the database, object storage, and file system are software systems that can persist data. Persistence means storing data on devices such as hard disks or SSDs, which do not lose data even when the system is powered down. The cache system is usually not persistent: its data is lost when the system is powered off, so the data must be reloaded from a persistent device.
In addition to the storage systems above, which are directly related to the business, two other important services in an e-commerce system usually rely on storage systems. One is the operations and maintenance monitoring system, which holds the operating status and configuration data of the storage devices; the other is the log system, which holds the logs of business operations.
So far we have briefly introduced the main storage systems involved in e-commerce systems. Next, we introduce the main data types. The most important data is product data. Taking the product information we can see on the page as an example, it includes the store name, product name, price, inventory, and the storage path of the product photos.
If you look closely, you will find that every search result follows the same structure, so this data can be organized in the form of tables. Figure 1-7 shows the correspondence between the displayed content and the data table. This type of data has a clear structure, and we usually call it structured data.
Figure 1-7 Schematic diagram of structured data
It should be noted that the volume of structured data in an e-commerce system is very large and its relationships are very complex. Figure 1-7 is only a simple example; a real system may contain hundreds of tables, with associations between them. For example, in addition to the table of product information, there are usually tables for account information, order information, payment records, and much more.
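A minimal sketch of such tables, using Python's built-in `sqlite3` module, is given here. The table names, column names, and sample rows are hypothetical; the product columns simply mirror the fields listed earlier (store name, product name, price, inventory, photo path).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database for the demo
conn.executescript("""
CREATE TABLE product (
    id         INTEGER PRIMARY KEY,
    store_name TEXT,
    name       TEXT,
    price      REAL,
    inventory  INTEGER,
    photo_path TEXT      -- only the path lives here, not the image itself
);
CREATE TABLE orders (
    id         INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES product(id),  -- association between tables
    account    TEXT,
    quantity   INTEGER
);
""")
conn.execute(
    "INSERT INTO product VALUES (1, 'Tech Books', 'File System Internals', 59.0, 120, '/img/fs.jpg')"
)
conn.execute("INSERT INTO orders VALUES (1, 1, 'alice', 2)")

# Follow the association from an order to the product it refers to.
rows = conn.execute("""
SELECT o.account, p.name, p.price * o.quantity
FROM orders AS o JOIN product AS p ON p.id = o.product_id
""").fetchall()
print(rows)
```

Every row of `product` has the same columns, which is what makes the data structured, and the `product_id` column is a small example of the associations between tables.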
The images in Figure 1-7 are not stored in the database but in other storage systems, such as file systems or object storage, or temporarily in cache clusters. Taking the cache system as an example, image data is stored in the cache as a key-value pair, where the “key” is the file name and the “value” is the content of the image file.
Figure 1-8 File storage diagram
The data in an image file usually exists as a whole, and the picture can only be viewed by opening the file with dedicated software. If we open the file with a binary viewer, we see only a pile of numbers, as shown in Figure 1-9. Image files come in many formats, each of which defines how to describe the color of every point (a pixel, in professional terms) in the image. As you can see, data such as images and text exists as a whole and is not as well structured as database data. Such data is called unstructured data.
Figure 1-9 Data of image file content
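To see the same effect without a binary viewer, the sketch below builds a tiny image in memory and prints its bytes as hexadecimal numbers. The binary PPM format (`P6`) is chosen here only because its layout is simple; it is an assumption for illustration and not the format shown in the figure.

```python
def hex_dump(data, width=16):
    """Format bytes as rows of two-digit hexadecimal numbers, with offsets."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        rows.append(f"{offset:04x}  " + " ".join(f"{b:02x}" for b in chunk))
    return "\n".join(rows)


# A 2x2 binary PPM image: a short text header, then 3 bytes (R, G, B) per pixel.
header = b"P6\n2 2\n255\n"
pixels = bytes([
    255, 0, 0,      # red
    0, 255, 0,      # green
    0, 0, 255,      # blue
    255, 255, 255,  # white
])
image = header + pixels

dump = hex_dump(image)
print(dump)
```

Apart from the short header, nothing in the dump marks where one pixel ends and the next begins; only software that knows the format can interpret the numbers as colors.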
In addition to structured and unstructured data, there is a kind of data between the two, called semi-structured data. For example, the product information above can be stored not only in an SQL database but also as JSON strings in files. Because it is stored as a string, it does not appear to have a clear format, yet there is a clear structure inside the string.
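A short sketch with Python's standard `json` module shows the idea. The field names and values are hypothetical, and the `tags` field is added only to show that a record can carry fields a fixed table schema would not have.

```python
import json

# Hypothetical product record; "tags" has no counterpart in a fixed table schema.
product = {
    "store_name": "Tech Books",
    "name": "File System Internals",
    "price": 59.0,
    "inventory": 120,
    "photo_path": "/img/fs.jpg",
    "tags": ["storage", "operating systems"],
}

text = json.dumps(product)     # from the outside, just one string
restored = json.loads(text)    # parsing recovers the internal structure

print(type(text).__name__)
print(restored["name"], restored["tags"])
```

The stored form is a plain string, but parsing it recovers named fields and nested values, which is why such data sits between structured and unstructured data.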
This section has described the common storage systems and data types of enterprise applications. The material is fairly specialized, so it does not matter if some of it is unclear for now. We will analyze each of the concepts involved step by step.