By Patrick Ortiz | Trace3 Innovation Research Associate
Nvidia’s rise to the second-most valuable company in the world has fallen right in line with our team’s prediction of the need for new infrastructure to ensure the success of AI in the enterprise. If we follow that same line of thinking but go beyond the chip, where does that take us? It’s clear that, with increasing investments across the stack, AI workloads demand a different approach to infrastructure. So, what will it take for IT teams to enable their businesses to meet both present and future demands from these workloads?
Earlier this year, the Trace3 Innovation team identified this rise in AI Infrastructure as a 2024 Top Theme to follow. That call came not only from the investments in new infrastructure solutions but also from the inevitable shift in the market as a whole to support the growing demands of Generative AI workloads. With that said, our team has identified four critical infrastructure decisions that IT leaders must make to best prepare their organizations for long-term success.
High-volume and complex AI workloads demand significant performance and scalability on the compute end of the infrastructure stack. With a plethora of options ranging from on-prem to cloud-based setups, the compute space in the stack has been growing to meet the demand brought on by GenAI use cases. Cloud solutions, in particular, offer flexible and scalable choices for an enterprise dipping its toes into AI workload management.
A common challenge is the unpredictable demand for compute resources during production. This is where new serverless, on-demand GPU services have emerged to handle model packaging and serving optimizations, provide faster cold-start times, and use customizable autoscaling to adjust as needed and minimize spend. Optimization and model compression tools have also become more popular, boosting performance on existing compute and edge devices while providing another avenue for cost savings. These tools are usually tied into some of the larger DSML platforms and specialty CSPs as a feature offering.
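To make the compression piece concrete, below is a minimal sketch of one common technique, post-training dynamic quantization, using PyTorch's built-in utilities. The model, layer sizes, and data type are illustrative stand-ins, not a recommendation for any particular platform or vendor.

```python
import torch
import torch.nn as nn

# Illustrative model standing in for a larger production network
model = nn.Sequential(
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Post-training dynamic quantization: weights of the listed layer types are
# stored in int8 and dequantized on the fly at inference time, shrinking the
# model and often speeding up serving on CPUs and edge devices.
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

example_input = torch.randn(1, 512)
with torch.no_grad():
    output = quantized_model(example_input)
print(output.shape)
```

The same idea, storing weights at lower precision and converting back only when needed, is what many commercial optimization tools automate at larger scale alongside related techniques such as pruning and distillation.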
AI systems rely heavily on the seamless and rapid transfer of vast amounts of data between distributed computing resources. The workloads brought on by these systems require high-throughput, low-latency networks to handle the massive datasets and complex computations involved. Without the necessary high-performance networking components, coupling hardware to run your models or tying AI clusters together becomes impossible. In fact, most AI workloads benefit from customized protocols (e.g., RDMA for low-latency data transfer) and hardware (e.g., NVLink for GPU interconnects) designed with these needs in mind.
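As a rough illustration of how these interconnects get exercised in practice, here is a minimal sketch of a collective operation using PyTorch's NCCL backend, which rides on NVLink and RDMA-capable fabrics when they are present. It assumes a multi-GPU host and a launcher such as torchrun; the tensor sizes and launch details are illustrative only.

```python
import os
import torch
import torch.distributed as dist

def main():
    # NCCL is the collective library most frameworks use to move data across
    # NVLink and RDMA-capable networks; rank and world size come from the
    # launcher (e.g., torchrun), which sets the corresponding env variables.
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Each rank contributes a tensor; all_reduce sums them across every GPU,
    # putting the interconnect between devices directly on the critical path.
    tensor = torch.ones(1024, device="cuda") * rank
    dist.all_reduce(tensor, op=dist.ReduceOp.SUM)
    print(f"rank {rank}: first element after all_reduce = {tensor[0].item()}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with something like `torchrun --nproc_per_node=4 allreduce_demo.py`, the throughput of that all-reduce is bounded by the interconnect, which is exactly why the underlying network fabric matters so much for training and serving at scale.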
On the cloud front, major cloud providers may not necessarily be optimized to meet the needs of the more novel GenAI workloads. This is largely due to inherent limitations of multi-tenant environments that can lead to performance variability and congestion. Additionally, the cost of provisioning dedicated interconnects can leave an organization blocked from achieving its intended goals with a specific workload. To combat these roadblocks, AI-optimized clouds, both private and public, offer economical solutions for running AI networks, especially at the edge, making deployment and integration of AI applications far more seamless than what might be found in more traditional setups.
In traditional AI and ML model development, structured data has always been key. But with the recent uptick in new use cases coming out of the introduction of GenAI, unstructured data is taking the spotlight. Investment has followed, underscoring the need for new storage options capable of working with these types of data sets. Vector databases, which excel at searching unstructured data, are gaining interest. On top of this, multimodal and graph databases are becoming essential for supporting models in production, especially for backing model memory through retrieval-augmented generation (RAG).
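For a sense of what a vector store does under the hood, below is a minimal sketch of embedding indexing and nearest-neighbor retrieval using FAISS as a stand-in for a dedicated vector database. The embedding dimension, corpus size, and random vectors are purely illustrative.

```python
import numpy as np
import faiss  # stand-in here for a purpose-built vector database

dim = 384          # illustrative embedding dimension
num_docs = 10_000  # illustrative corpus size

# Document embeddings would normally come from an embedding model; random
# vectors are used here only to show the indexing and search flow.
doc_embeddings = np.random.random((num_docs, dim)).astype("float32")

index = faiss.IndexFlatL2(dim)  # exact nearest-neighbor search on L2 distance
index.add(doc_embeddings)

# At query time, embed the user's question and retrieve the top-k closest
# documents to ground the LLM's response (the core of the RAG pattern).
query_embedding = np.random.random((1, dim)).astype("float32")
distances, doc_ids = index.search(query_embedding, k=5)
print(doc_ids)
```

In a real RAG pipeline, the random vectors would be replaced by model-generated embeddings, and a dedicated vector database would add persistence, metadata filtering, and horizontal scaling on top of this core search operation.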
Operationalizing AI models is more than just setting up the cloud or on-prem infrastructure. To make sure the AI/ML production process is repeatable and sustainable, organizations need to consider specialized tools and processes to streamline the deployment, monitoring, and maintenance of large language models. A lot of the investment in this space has leaned heavily on the introduction of LLM operational tools to handle some of the nuanced challenges of GenAI applications. This includes improvements in data processing and management, development tools for fine-tuning and prompt engineering, and updated monitoring, security, and governance options. Organizations deploying traditional AI models today should begin to take note of these investments even if they are not yet mature enough to deploy GenAI workloads. These investments will rapidly come into play once an overall strategy has been selected.
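As one small example of the kind of operational plumbing these LLM tools formalize, here is a sketch of wrapping a model call with latency and token logging. The `call_model` function is a hypothetical stand-in for whatever inference endpoint or client an organization actually uses, and the token counts are crude estimates rather than tokenizer-accurate numbers.

```python
import time
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("llm_ops")

def call_model(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM endpoint or provider client."""
    return f"(model response to: {prompt[:40]}...)"

def monitored_completion(prompt: str) -> str:
    # Capture latency and rough token counts for every request so that cost,
    # performance, and drift can be tracked over time.
    start = time.perf_counter()
    response = call_model(prompt)
    latency_ms = (time.perf_counter() - start) * 1000

    logger.info(
        "prompt_tokens=%d response_tokens=%d latency_ms=%.1f",
        len(prompt.split()),   # crude estimate; real setups use a tokenizer
        len(response.split()),
        latency_ms,
    )
    return response

print(monitored_completion("Summarize our Q2 infrastructure spend by workload."))
```

Commercial LLMOps platforms layer evaluation, guardrails, governance, and cost attribution on top of exactly this kind of per-request telemetry.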
AI Infrastructure has been shifting as new requirements and new applications of both traditional ML and newer LLM models become more commonplace in the enterprise. Organizations looking to build a robust and scalable AI Infrastructure may need to address some or all of the key considerations discussed here within their own implementations. The most robust AI infrastructure strategy will likely extend beyond the cloud to the edge as use cases build. To meet this need, expect a continued drive for multi-cloud networking (MCN) as well as a surge of additional emerging startups in the domains discussed. Of course, as a Trace3 client you’ll gain insider access to our opinions on these emerging solutions and stay on top of this theme as new updates to the stack arise.