This article is reprinted from the Andy730 public account
Title: How AI Is Challenging Infrastructure
Expert: David McIntyre, SNIA Board of Directors Member, Samsung
Current State of Enterprise AI Application
Host: Could you give a high-level overview of the current state of AI adoption within enterprise IT, and in IT more generally?
David McIntyre: In my entire career, I have never seen a trend as dynamic and fast-moving as this one. About half a year ago, the AI wave was already sweeping in with tremendous momentum. Companies and solution providers across the spectrum are scrambling to build innovative applications that take advantage of AI, from the cloud to edge computing, and from on-premises large language models (LLMs) to smaller models.
We are cresting the wave shown in the Gartner Hype Cycle, but I think we are now turning AI applications into reality, especially for enterprises. Solutions like ChatGPT have shown great potential across both consumer and enterprise markets. For example, I can have ChatGPT write a book for me as long as I tell it what style I want. These advances are very exciting, and I look forward to the opportunity to apply these technologies in the enterprise.
I think we are pushing through the excitement and the frenzy. This is an exciting moment, because AI has shown great potential across data centers, data storage, networks, and the entire IT infrastructure.
Architecture and Development Trend of AI Deployment
Host: Let's focus on some practical issues around AI deployment. When it comes to AI, there are many cutting-edge architectures and specifications. Can you talk about these architectures and share your views on where AI development is heading? What is the driving force behind these advanced architectures?
David McIntyre: We need to consider how to optimize the compute, network, storage, and memory resources in the infrastructure to better support the massive amounts of data, which serves as the foundation of AI applications. Take LLMs as an example. Large companies such as Meta, Google, or AWS now have their own LLMs. They see very strong value in deploying AI solutions across their platforms.
These LLMs contain billions or even trillions of parameters that must be trained and then used for inference, so that the collected data can be deeply analyzed and some intelligence put behind it. So, how do we process this huge amount of data? First, we need to know how to manage and compute on the data. The algorithm tells us how to process the data, but the next question is where to perform the computing.
That is where data center computing and computational storage technologies come into play. Instead of transferring a humongous amount of data to a central processor or processor cluster, it is better to deploy distributed compute resources where the data is generated. This is critical whether in the cloud or at the edge. Even within a data center, compute resources or accelerators can be deployed where the data is collected. That helps relieve the compute-to-memory bottleneck, because many AI applications are now limited by memory capacity.
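To make the "move compute to the data" idea concrete, here is a minimal sketch in Python contrasting the two patterns described above: shipping every raw record to a central host versus pushing a simple reduction down to where the data lives. The node names and functions are invented for illustration and do not represent any specific computational storage product.

```python
# Illustrative sketch: "move data to compute" vs. "move compute to data".
# Node names and functions are hypothetical.

# Simulated records sitting on three storage/edge nodes.
NODES = {
    "node-a": [3, 7, 42, 9],
    "node-b": [15, 2, 8],
    "node-c": [27, 4, 31, 6, 12],
}

def centralised_sum(nodes: dict) -> int:
    """Pattern 1: pull every raw record across the network, then compute."""
    all_records = []
    for records in nodes.values():
        all_records.extend(records)        # the full data set crosses the network
    return sum(all_records)

def pushed_down_sum(nodes: dict) -> int:
    """Pattern 2: run the reduction next to the data, move only partial results."""
    partials = [sum(records) for records in nodes.values()]  # computed "on the node"
    return sum(partials)                   # only one small value per node is moved

if __name__ == "__main__":
    assert centralised_sum(NODES) == pushed_down_sum(NODES)
    print("Both patterns agree; only the data movement differs.")
```

The result is identical either way; the difference is how much data has to traverse the network and land in host memory before the answer appears.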
In terms of storage, the storage system must scale well and support a variety of file types and data types. Ten years ago, storage was seen as the laggard of the stack, but it has since made great progress and can now work closely with memory resources to serve as the last level of cache.
As for the network, congestion can be alleviated if we deploy compute resources where the data is created. It is just like the traffic jam I hit on my way to the venue: if data cannot move smoothly, it cannot be computed on.
From an infrastructure challenge standpoint, these are potential solutions that need to be considered in AI development.
Challenges of AI to Infrastructure
Host: There are many exciting research projects in the AI field. You have just mentioned network and data issues. From an infrastructure point of view, what are the challenges seen in AI? What pressure or help does it bring to our existing infrastructure?
David McIntyre: AI poses great challenges to our existing infrastructure. Specifically, both cloud-to-edge deployments and on-premises data centers require collecting data at the edge when running AI applications, and deploying distributed compute or computational storage resources at the edge. But the question is, what do you do with that data? It is often necessary to move it back to a central system for deep-dive analytics. For example, real-time video analytics may be performed at the edge for security monitoring or safety analysis. However, to support in-depth analytics for different industries such as retail and healthcare, the challenge becomes how to effectively manage and coordinate the data and the compute output.
Another challenge is how to marry these new infrastructure resources to the application layer, which usually requires software upgrades. Data center operators expect all systems to run consistently and reliably, without interruption. Introducing new technologies, whether new software versions, new server deployments, or any other infrastructure components, requires very detailed planning. Operators do not want these new technologies to disrupt day-to-day operations. Therefore, it is critical to bring in new guidelines for software deployment.
The computational storage API model is a software model that connects computational storage resources to hosts. Before deployment, these models need to be rigorously tested by end customers and application developers. Data center operators do not automatically roll out such new technologies across the entire data center. Instead, they pick a portion of the data center for isolated testing to ensure the new hardware or software runs securely, deterministically, consistently, and with high quality. Only after that do they open the gates and deploy the new technology across the entire data center.
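As an illustration of what a host-side computational storage workflow can look like, the sketch below mocks the general flow: attach a device, let the data live on it, launch a device-side function, and return only the result to the host. The class and method names are invented for this example; they are not the SNIA Computational Storage API, and a real deployment would go through the standardized interfaces and the staged isolation testing described above.

```python
# Hypothetical sketch of a host-to-computational-storage flow.
# Class and method names are invented; this is not the SNIA API.

class MockComputationalStorageDevice:
    """Stands in for a drive or accelerator that can run functions near the data."""

    def __init__(self, name: str):
        self.name = name
        self._buffers = {}

    def stage(self, key: str, data: bytes) -> None:
        # On a real device the data would already reside on the media.
        self._buffers[key] = data

    def run_filter(self, key: str, needle: bytes) -> int:
        # Device-side function: count occurrences without shipping raw data to the host.
        return self._buffers[key].count(needle)

def host_side_flow() -> None:
    device = MockComputationalStorageDevice("csd-0")          # 1. discover/attach
    device.stage("log-blob", b"error ok error warn error")    # 2. data lives on the device
    matches = device.run_filter("log-blob", b"error")         # 3. offload the compute
    print(f"{device.name}: {matches} matches returned to host")  # 4. only results move

if __name__ == "__main__":
    host_side_flow()
```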
Power consumption is another elephant in the room. As the volume of AI data grows and the demand for computing on that data grows with it, power consumption also rises rapidly. Scaling from a 50 MW data center to a future gigawatt-class data center is an urgent problem to solve. Importantly, large and hyperscale data center operators and OEMs are also incorporating sustainability into their businesses.
AI in Sustainability
Host: Sustainability is a major challenge worldwide, and it will only grow. There are regulatory aspects to it, and there is also the fact that as more data centers come online, they are starting to compete with households for power. This has become a huge challenge in Europe and the United States, so it is important that these discussions are happening. From an AI perspective, can we leverage AI to deal with this sustainability challenge? Is anyone using AI for sustainability-related research and development?
David McIntyre: Indeed, there are many opportunities to use AI to advance sustainability. When we evaluate the carbon footprint of our data centers, the relevant data is already being collected to manage the infrastructure and its operations and to continuously monitor that footprint. With the vast amount of data collected across manufacturing plants and data centers, AI analytics tools can put that data to work. I am not suggesting these are manual processes; we are well beyond that stage, and there are well-established processes for managing the carbon footprint. However, by applying sophisticated AI prediction algorithms, we can proactively predict trends instead of passively reacting to the data. In this way, using tools such as AI and LLMs, we can predict, adjust, optimize, and correct course before an event actually occurs.
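As a toy illustration of "predict rather than react," the sketch below fits a simple linear trend to recent power-draw telemetry and warns about a forecast threshold crossing before it happens. The telemetry values, the 55 MW threshold, and the forecast window are made up for this example; a production system would use real facility telemetry and far more sophisticated models.

```python
# Toy sketch: forecast facility power draw from recent telemetry and warn early.
# The telemetry values and the 55 MW threshold are invented for illustration.

import numpy as np

power_mw = np.array([48.2, 48.9, 49.7, 50.6, 51.4, 52.3])  # last six hours, in MW
THRESHOLD_MW = 55.0
HOURS_AHEAD = 4

# Fit a straight-line trend to the recent samples.
hours = np.arange(len(power_mw))
slope, intercept = np.polyfit(hours, power_mw, deg=1)

# Project the trend forward and check when the threshold would be crossed.
future_hours = np.arange(len(power_mw), len(power_mw) + HOURS_AHEAD)
forecast = slope * future_hours + intercept

for h, mw in zip(future_hours, forecast):
    if mw >= THRESHOLD_MW:
        print(f"Predicted to exceed {THRESHOLD_MW} MW at hour {h} ({mw:.1f} MW): act now.")
        break
else:
    print("No threshold crossing predicted in the forecast window.")
```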
AI and Security
Host: What comes to mind is a company that predicted rising natural gas prices and used AI to analyze demand and consumption data. I think that is a good application scenario. Another important issue is regulation and security, especially in the current situation. How can security be factored into AI?
David McIntyre: Specific to security, we focus on several strategies and mechanisms. The first step is to detect, and then to correct and isolate, any potential security breach. This includes security protection at the edge as well as security measures across the entire infrastructure. As experts in the data domain, we draw valuable expertise from security specialists who can guide us on global regulatory requirements. In general, SNIA always puts security front and center when publishing its various deployment guides, architectures, recommended reference designs, and specifications.
Who Is Leading the Way?
Host: From an overall standpoint, is there any particular area, region, or even business that is doing better on security than others? Which groups are leading the way? From your own perspective, which enterprises may be in the leading position on security, and what direction might that be going?
David McIntyre: I have been following the latest developments in the security field. As far as I know, Samsung is showing great momentum in driving the security of its SSDs and system development. My experience with the Memory Solutions Lab of Samsung Device Solutions America tells me that security is a key consideration when they develop new architectures and solutions. Among these topics, ransomware is a hot one and has attracted everyone's attention. I have regular discussions with Eric Hibbard, a security expert at SNIA, about how to detect and proactively isolate threats before they spread. He has shared the latest approaches and common regulations across many security areas. Samsung is also studying how to provide early warning of ransomware before it spreads.
Of course, other companies are leading the way as well, and AI shows great potential here too. If we can predict ransomware events by detecting abnormal read/write patterns in SSDs, or across the compute, memory, and storage infrastructure of an entire data center, that will be a real AI success story. This is not just about compelling consumer applications; it is a challenge enterprises must face by applying AI to predict and prevent ransomware and other security threats.
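As a hedged sketch of the kind of anomaly detection alluded to here, the example below trains an IsolationForest on synthetic per-interval I/O counters (write rate, read/write ratio, overwrite fraction) and flags intervals that resemble bulk-encryption behavior. The features and data are fabricated for illustration only; a real detector would use actual drive or telemetry counters and validated models.

```python
# Sketch: flag suspicious drive I/O intervals with an IsolationForest.
# Features and synthetic data are invented for illustration only.

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Baseline intervals: [writes_per_s, read_write_ratio, overwrite_fraction]
normal = np.column_stack([
    rng.normal(200, 30, 500),     # moderate write rate
    rng.normal(3.0, 0.5, 500),    # reads dominate writes
    rng.normal(0.05, 0.02, 500),  # few in-place overwrites
])

# A few intervals that resemble bulk encryption: heavy writes, reads roughly
# equal to writes, and a high fraction of files rewritten in place.
suspicious = np.array([
    [1500, 1.0, 0.80],
    [1800, 0.9, 0.90],
    [1650, 1.1, 0.85],
])

model = IsolationForest(n_estimators=200, contamination=0.01, random_state=0)
model.fit(normal)

labels = model.predict(np.vstack([normal[:5], suspicious]))  # 1 = normal, -1 = anomaly
print(labels)  # expected: mostly 1s for the baseline rows, -1 for the suspicious ones
```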
AI Development Trend in the Next Five Years
Host: How do you see AI evolving over the next 12 months to five years? What are the things or trends on the horizon that you are excited about?
David McIntyre: We spent most of our time today talking about the underlying infrastructure. I think this part will improve gradually thanks to the joint efforts of experts from SNIA and other organizations, customers, and software developers. Why do we need to improve this infrastructure? Because there are remarkable end applications and solutions that can address major problems facing humanity. For example, in cancer research, it would be an epoch-making achievement if we could find a cure for cancer in our lifetime.
I think the work in genomics and genome sequencing is very promising. We have introduced AI across many algorithm families and significantly improved their performance. It would be wonderful if we could make breakthroughs in this area, or even in food distribution. A large share of the world's population still lives in poverty. How can we balance food distribution, deliver food to nations in need, and reduce waste? This is an urgent issue to resolve.
In addition, there is pollution. We need to achieve the goal of zero emissions by 2030. Although that reaches beyond this five-year time frame, I think sustainability is not just empty talk. We have done a lot of meaningful work at Samsung, but it is still critical to achieve the global goals we have set.
Therefore, I think these three areas are very important. As an industry, or whatever we call ourselves, this is our responsibility as human beings. I think this is an excellent opportunity to apply technology to the problems facing the planet today.