With the advent of cloud computing and the AI era, data processing technology is being used more and more widely. The National Supercomputing Center in Jinan (hereinafter "Jinan Supercomputing") has partnered closely with Huawei to equip the AI era with a "new engine". The addition of Huawei OceanStor all-flash storage makes the supercomputing crown shine even brighter.
As AI models develop rapidly, large models demand powerful computing and storage
In the course of humanity's exploration of artificial intelligence, large models emerged and opened a new path to general artificial intelligence: "big computing power and big parameters". AI models such as ChatGPT, the Wenxin model, GLM-130B, and the UED model have appeared in quick succession and are widely used in fields such as assisted diagnosis, weather forecasting, AI-generated content, and financial risk control. Along the way, AI has advanced on the strength of its infrastructure, moving from single machines to clusters to supercomputing. AI models and supercomputing in particular are ever more closely linked: backed by powerful supercomputing resources, applications built on large AI models reach the market in less and less time. To train large models with tens or even hundreds of billions of parameters, supercomputing centers need not only strong computing power but also fast, reliable storage to meet the higher storage demands of the AI era.
Looking back at the factors that have driven HPC storage growth over the past 15 years, we find that they are closely tied to the rapid development of AI, which has given rise to HPC solutions based on the data analytics ecosystem. In recent years, to seize the commanding heights of large-model technology, AI supercomputing centers have been upgraded and built around the world, such as Perlmutter and Dojo in the United States, Leonardo in Europe, and supercomputing centers in China. Jinan Supercomputing, as China's benchmark all-flash supercomputing center, has a first-mover advantage in AI computing.
The progression from computing to data to integration with AI is an inevitable course for the HPC market. The evolution from traditional HPC modeling and simulation applications to new HPDA/AI/ML/DL applications is mainly characterized by a shift from compute-intensive workloads to data-intensive workloads. This shift helps researchers, engineers, and business data analysts obtain results faster, and analyze and summarize them, on the best-performing HPC infrastructure.
Large AI models will bring many challenges to traditional supercomputing centers and greatly raise the industry's barrier to entry. Training large models is extremely costly, with expenses covering GPU computing power, servers, storage, labor, and more. Take ChatGPT as an example: it handles 13 million unique visitors per day, and the total cost of training GPT-3, with its 175 billion parameters, is estimated at as much as $12 million. Supercomputing centers therefore need to keep increasing their investment in computing and storage infrastructure to support the rapid development of AI models.
Jinan Supercomputing is developing rapidly, building "**" to lead the development of global science and technology
Against the backdrop of the national policy of becoming a science and technology powerhouse, China began to vigorously develop and promote HPC. By 2023, the Ministry of Science and Technology had approved the establishment of 14 national supercomputing centers, including the National Supercomputing Center in Jinan and the National Supercomputing Center in Tianjin.
At present, Jinan Supercomputing is leading Shandong Province's "Super Computing Science Project", developing and building a new generation of supercomputers with world-leading computing power. Rooted in Shandong, it is building a supercomputing internet that covers the whole country and reaches the whole world, together with clusters of large scientific facilities for "exascale supercomputing, artificial intelligence, and industrial internet", forming an internationally first-class "supercomputing brain". It strives to be at the forefront of advancing national basic science and major technological research, and of helping Shandong Province take the lead in building "new momentum".
Facing challenges, Jinan Supercomputing actively pursues HPC storage architecture transformation
New applications such as large AI models have entered the field of supercomputing research. They are driving reform, innovation, and development at Jinan Supercomputing, which is moving from a pure computing service to a service that combines computing with multi-dimensional data processing.
In its upcoming AI supercomputing reform, Jinan Supercomputing faces the following changes and challenges:
First, the surge in data volume drives up transmission and storage costs. There are two main reasons for the surge: the raw data involved in computing is both large and highly varied, and data expands excessively as it passes through multiple stages of processing and computing. With the explosive growth of emerging fields such as large AI models, model scale keeps increasing. Model input is changing from a single modality such as text to multiple modalities, and the data volume increases by 1000 times. Supercomputing must process more data, yet it can take several weeks to transmit this data over the network and several months to copy data at the TB-to-PB scale. These are problems that supercomputing centers cannot avoid during the transformation.
Second, overall computing efficiency is low because jobs compete for storage resources. Most supercomputing scenarios today run many tasks concurrently. Some jobs require high bandwidth, while others require high IOPS. When these tasks run at the same time, they contend for storage resources, and overall computing efficiency drops.
Third, AI model training can be expected to read large numbers of small training files. Storage must sustain high performance under concurrent access from thousands of accelerator cards and keep training-data reads off the critical path, which requires high IOPS.
Finally, as Jinan Supercomputing integrates with traditional data center services, it also faces challenges in data management and data silos. When providing diverse services such as AI computing, virtualization, and disaster recovery, it serves tens of thousands of users and reads and writes data in many forms (such as file storage, virtualized block storage, and AI vector storage). This makes management harder and leaves data prone to fragmenting into isolated silos. How to get data flowing so that it can drive cross-domain innovation is an urgent problem for Jinan Supercomputing.
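To put rough numbers on the data-volume and small-file challenges described above, the following minimal Python sketch estimates how long a bulk transfer takes and how many read IOPS a training cluster might need. Every input is an illustrative assumption rather than a figure from Jinan Supercomputing or Huawei: the 10 Gb/s link, the 70% sustained link efficiency, the 1000 accelerators, the 500 files per second per accelerator, and the simplification that each small file costs one read operation. The function names are hypothetical.

```python
"""Back-of-the-envelope sizing for bulk transfers and small-file reads.

All numeric inputs are illustrative assumptions, not measured values.
"""

PIB = 2**50               # bytes in one pebibyte
GBIT = 10**9              # bits in one gigabit
SECONDS_PER_DAY = 86_400


def transfer_days(data_bytes, link_bits_per_s, efficiency=0.7):
    """Days needed to move data_bytes over a link.

    efficiency is the assumed fraction of nominal link speed that is
    actually sustained (protocol overhead, contention, retries).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = data_bytes * 8 / (link_bits_per_s * efficiency)
    return seconds / SECONDS_PER_DAY


def required_read_iops(accelerators, files_per_second_each):
    """Aggregate read IOPS, assuming one read operation per small file."""
    return accelerators * files_per_second_each


if __name__ == "__main__":
    days = transfer_days(1 * PIB, 10 * GBIT)
    print(f"1 PiB over an assumed 10 Gb/s link: {days:.1f} days")

    iops = required_read_iops(accelerators=1000, files_per_second_each=500)
    print(f"Assumed 1000 accelerators x 500 files/s: {iops:,.0f} read IOPS")
```

Under these assumptions, moving a single pebibyte takes roughly two weeks, which is consistent with the "several weeks" cited above. The IOPS figure scales linearly with the number of accelerators, which is why small-file reads can end up on the critical path of large training jobs.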
Jinan Supercomputing and Huawei OceanStor all-flash storage build a benchmark new supercomputing center to prepare for the AI computing transformation
Huawei OceanStor all-flash storage delivers ultra-high performance and ultra-high throughput to serve multiple platforms and job types, reducing costs and increasing efficiency for Jinan Supercomputing. To accommodate the different storage requirements of the HPC, AI, and cloud platform clusters at Jinan Supercomputing, the platform plans a total storage capacity of 220 PiB, covering high-performance storage systems, block storage, and NAS storage, so that it can meet the storage requirements of a wide range of applications. Of this, the all-flash storage system that backs the high-performance file storage system has a capacity of 15 PiB.
With TBps-level bandwidth and tens of millions of IOPS, Huawei OceanStor all-flash storage can fully meet the high-bandwidth and high-IOPS requirements of the integrated supercomputing scenario, improving business efficiency and accelerating business innovation. The overall bandwidth exceeds 1000 Gb/s, which effectively addresses the loss of overall efficiency caused by resource contention. In addition, a data image compression algorithm further improves storage space utilization and greatly reduces the cost of storing massive data, reducing costs and increasing efficiency for the sustainable development of supercomputing centers.
Huawei OceanStor all-flash storage is green and energy-saving, using technology to support the "carbon peak" goal. Whether driven by national policy or by supercomputing centers' own need to cut costs and raise efficiency, green energy conservation is gradually being applied to every aspect of production and operation. At the data center level, Huawei OceanStor all-flash storage is an ideal solution that can greatly reduce data center space and energy costs. The resulting high resource utilization and cost-effectiveness are important factors driving the growth of all-flash system shipments.
Huawei works with customers on data flow solutions to solve their data storage problems. Huawei OceanStor all-flash storage has been fully adapted and can efficiently support supercomputing workloads. Looking ahead, Jinan Supercomputing and Huawei have established a storage innovation center to conduct in-depth research on data services and data security and to incubate the industry. In the supercomputing field, the two parties contribute directions and solutions for technological innovation and are jointly building a world-leading demonstration site and industrial base for intelligent data and storage. Relying on Huawei OceanStor storage, Jinan Supercomputing will have the largest-capacity, highest-performance all-flash array cluster in the HPC field in China. This can solve the block and file interchange problem Jinan Supercomputing faces, open up data circulation, promote cross-domain data innovation, and continue to help Jinan Supercomputing explore innovative services such as future AI models. Huawei provides APIs and carries out joint custom development with Jinan Supercomputing to deliver data flow as a visualized service based on data flow tasks and policies. The two parties also jointly develop customizations based on tagging customer data attributes to manage the security of data flow.
AI and supercomputing, as the representative technologies of machine intelligence and computing power, are two of the brightest waves in the tide of computer science, and their integration is certain to release huge energy. Building on this momentum, the cooperation between Huawei and Jinan Supercomputing will lay the foundation for accelerating AI innovation in the future. Jinan Supercomputing's choice of Huawei OceanStor all-flash storage is inseparable from its efficient, energy-saving end-to-end supercomputing storage solutions and its capacity for technological innovation.
Huawei OceanStor all-flash storage is agile, efficient, available, and secure, which is highly consistent with the "New Infrastructure" strategy that China is promoting. In addition, its high density and low power consumption can significantly reduce the PUE of the supercomputing center, enable Jinan Supercomputing to complete the transformation and upgrade of its HPC storage architecture, accelerate the innovation and development of the industrial ecosystem, and achieve "supercomputing speed", helping Jinan Supercomputing become a leader among global supercomputing centers.