The market for AI tools continues to move at a blistering pace. Every couple of weeks, the AI world goes crazy for a new solution that claims to have unlocked the next level of automation. Tools like Microsoft’s Copilot and Power Automate suite, AI software engineers like Devin AI, and task management systems like BabyAGI have fired executives’ imagination. Usually the talk is of AI agents, and the value proposition resonates with most people: a “text to action” paradigm where AI just gets stuff done for us. On paper the opportunities are endless: an agent tirelessly fine-tuning every aspect of your IT infrastructure to optimize cost and performance, or monitoring your digital perimeter while dynamically updating security policies and countermeasures to stay ahead of malicious actors 24/7. As these imagined scenarios increasingly become top of mind for managers, autonomous AI agents are emerging as the holy grail of automation. However, much of this excitement has proven premature to date, and a confusing market landscape makes it difficult to separate the noise from true technological disruption. To demystify the concept, this blog covers what AI agents are and what they are not, who is going to use them and where the automation lies, how to achieve maturity as an enterprise, and how to minimize risks.
Since the launch of ChatGPT two years ago, Large Language Models and Gen AI tools have established themselves in many everyday tasks, from assisting with research and facilitating customer service to writing code. They are far more adaptable and can tackle more complex tasks than previous systems, such as deterministic chatbots. Nevertheless, these tools have their limitations: they remain unreliable and require human supervision, which explains why organizations have yet to realize true performance gains from their Gen AI initiatives. One thing current AI models lack is agency: acting autonomously with minimal supervision, adapting and executing goals in complex environments.
To close this gap, organizations have shifted their focus towards AI agents for the next evolution of AI automation. AI agents are autonomous or semiautonomous software entities. They usually involve a large language model (LLM) that acts as the agent, capable of perceiving and interacting with its environment through different tools to perform tasks or make decisions.
Figure 1: The agency gap of current AI solutions (Gartner)
More specifically, an agent receives input from its environment through human prompts, sensors (cameras, microphones, etc.), tools, or documents and scripts from a company database. Within the agent, an LLM then processes this information in combination with its memory of experiences and interactions to select an action that achieves the stated objectives. This could include performing specific research or tasks, generating content, or interacting with other agents and enterprise tools. Agents can also learn from past interactions, whether through feedback-based reinforcement learning or through other supervised or unsupervised learning methods.
Figure 2: General process flow of an AI agent
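To make the flow above more concrete, here is a minimal sketch of the perceive, decide, act loop. It is an illustration under stated assumptions, not a reference implementation: the Agent class, the search and summarize tools, and the keyword rule standing in for the LLM are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Agent:
    """Toy agent: perceive input, pick a tool, act, remember."""
    tools: Dict[str, Callable[[str], str]]
    memory: List[str] = field(default_factory=list)

    def decide(self, observation: str) -> str:
        # Stand-in for the LLM (assumption): a keyword rule picks the tool.
        # A real agent would prompt a model with the observation and memory.
        if "report" in observation.lower():
            return "summarize"
        return "search"

    def step(self, observation: str) -> str:
        tool_name = self.decide(observation)
        result = self.tools[tool_name](observation)
        # Store the interaction so later steps can draw on it.
        self.memory.append(f"{tool_name}: {observation} -> {result}")
        return result


def search(query: str) -> str:
    # Hypothetical tool: pretends to query an external source.
    return f"search results for '{query}'"


def summarize(text: str) -> str:
    # Hypothetical tool: pretends to generate content.
    return f"summary of '{text}'"


agent = Agent(tools={"search": search, "summarize": summarize})
first = agent.step("Find cloud cost benchmarks")
second = agent.step("Write a report on cloud costs")
```

The point of the sketch is the shape of the loop: input comes in, a decision component selects an action, a tool executes it, and the outcome is written to memory for later use.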
All of this makes agents different from tools used up to this point. RPA for example is programmed to automate repetitive, rule-based tasks. It follows a predetermined script to perform specific functions. This contrasts with an AI agent that possesses a degree of agency, enabling it to operate autonomously and make decisions based on input it receives, as well as past experiences.
Agents also differ from copilots. Copilots are designed to enhance features within a piece of software and offer guidance without taking full control from the user. Typical examples are summarizing meetings or drafting emails, where the copilot serves as an interface for the user. Agents, on the other hand, make decisions with minimal human input.
Within single agents, slight differences exist in regard to their characteristics. There are goal-based agents that proactively pursue a specific goal, utility-based agents that aim to maximize the quality of an action, and hierarchical agents in which decisions are made on different levels. However, a more important distinction needs to be made between stateful and stateless agents. Stateful agents maintain the state of the conversation or process they are in, and remember past information and the context of previous interactions. Stateless agents do not retain any information from previous interactions and are not context-aware, meaning responses are less personalized.
On an architectural level, a significant difference exists between single-agent systems and multi-agent systems (MAS). The latter consist of multiple agents that collaborate, communicate, and coordinate their actions to solve complex problems that a single agent cannot handle efficiently. In such a system, each agent acts independently and takes on a specific role or area of expertise. Central to this is a shared memory that serves as a repository to allow for seamless communication and coordination across all agents. It is a hub where information, plans and goals are exchanged, ensuring efficient collaboration. It’s important to note that MAS architectures exist on a spectrum, from predefined sets of agents with centralized control and deterministic behavior and outcomes, to highly dynamic, constantly evolving collectives of agents with distributed control, adaptive behavior, and emergent outcomes at the system level.
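The stateful versus stateless distinction can be shown in a few lines. This is a rough sketch with hypothetical class names; the echo-style replies merely stand in for what a real model would generate.

```python
from typing import List


class StatelessAgent:
    def reply(self, message: str) -> str:
        # No history is kept: every message is handled in isolation.
        return f"You said: {message}"


class StatefulAgent:
    def __init__(self) -> None:
        self.history: List[str] = []

    def reply(self, message: str) -> str:
        # The previous turn is available as context for the answer.
        previous = self.history[-1] if self.history else None
        self.history.append(message)
        if previous is None:
            return f"You said: {message}"
        return f"You said: {message} (earlier you said: {previous})"


stateless, stateful = StatelessAgent(), StatefulAgent()
for text in ["My order is late", "What are my options?"]:
    stateless_answer = stateless.reply(text)
    stateful_answer = stateful.reply(text)
```

After the second message, only the stateful agent can connect "What are my options?" to the late order, which is what makes more personalized responses possible.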
Figure 3: Process flow of a mature MAS in the enterprise
Mature multi-agent systems show distinct characteristics:
Specialization: Defined roles for each agent to focus on specific tasks and domains, where the agents are also capable of dynamic process optimization.
Collaboration & distribution: Structured collaboration and task distribution where each agent contributes unique capabilities and shares information to achieve a common goal.
Decentralized decision making: Multiple intelligent agents working in a dynamic environment, collaborating and negotiating with each other, without a centralized decision-maker.
Real-time adaptation: Agents continuously monitor and adjust operations based on feedback, changing conditions, or increasing complexity.
The use cases that can be addressed depend heavily on the sophistication of the underlying agent system. By definition, single-agent systems have significant limitations. As there is only one agent, there is no modularity, no specialization across different domains, and no collaboration possible. This brings several disadvantages:
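The shared-memory idea behind these characteristics can be sketched as a simple "blackboard": specialized agents read from and write to a common store, and each one acts only when the information it needs is present. Everything here (the researcher, writer, and reviewer roles, the run helper, the dictionary keys) is a hypothetical illustration, and the plain loop that gives each agent a turn is a scheduling convenience rather than a central decision-maker.

```python
from typing import Callable, Dict, List

SharedMemory = Dict[str, str]


def researcher(memory: SharedMemory) -> None:
    # Acts once a goal is posted and no findings exist yet.
    if "goal" in memory and "findings" not in memory:
        memory["findings"] = f"facts about {memory['goal']}"


def writer(memory: SharedMemory) -> None:
    # Acts once findings are available.
    if "findings" in memory and "draft" not in memory:
        memory["draft"] = f"draft based on {memory['findings']}"


def reviewer(memory: SharedMemory) -> None:
    # Acts once a draft is available.
    if "draft" in memory and "approved" not in memory:
        memory["approved"] = "yes"


def run(agents: List[Callable[[SharedMemory], None]], goal: str,
        max_rounds: int = 5) -> SharedMemory:
    memory: SharedMemory = {"goal": goal}
    for _ in range(max_rounds):
        before = dict(memory)
        for agent in agents:
            agent(memory)
        if memory == before:  # nobody had anything left to contribute
            break
    return memory


# Order is deliberately scrambled: the shared memory, not the call
# order, determines which agent contributes next.
result = run([reviewer, writer, researcher], goal="supplier risk")
```

Adding or removing a role means adding or removing one function from the list, which hints at why modular multi-agent designs are easier to extend than a single monolithic agent.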
Single Agent System
The inability to cross-verify information with other agents and a lack of specialization mean an increased risk of hallucinations, making single agents unsuitable for highly sensitive or customer-facing use cases. Moreover, fixed context window sizes and no distribution of processing loads across agents severely limit their ability to handle longer documents or extended conversations (e.g. legal document analysis, lengthy customer support interactions). And finally, sequential task processing results in slower response times for applications requiring simultaneous handling of multiple queries. This leads to longer wait times for customers and reduced productivity, limiting single agents to low-complexity use cases.
Even the most advanced standalone agents still struggle with multi-step tasks that require navigating different contexts and managing dependencies. This means a single agent excels in more specific and narrow applications that are less sensitive. Examples are use cases on an individual productivity level, rather than on an enterprise-wide level.
Data analysis and reporting: Analyzing, augmenting, collating, assessing and summarizing information for improved decision making or comprehension.
Creative elements: Composing, optimizing, generating multiformat or multielement assets.
Virtual assistants: Scheduling meetings, managing emails and setting reminders.
Basic customer support: Handling common customer inquiries, providing support and resolving common issues.
Multi Agent System
A multi-agent system solves many of the limitations that a single-agent system faces. Thanks to its modular architecture with multiple specialized agents working in tandem, it opens the door for more advanced use cases. The decentralized nature allows for parallel processing and distributed computing, enabling MAS to tackle much more complex problems and workflows in changing environments.
The following are examples of MAS use cases.
Enhanced efficiency & resource allocation: MAS can automate complex workflows by handling intricate processes that require coordination among multiple entities, thus reducing human error and increasing productivity. Agents can also optimize resource allocation by dynamically distributing resources based on real time needs and constraints, leading to better asset utilization, reduced waste, and lowered operational costs. An example could be a cloud computing environment where agents manage server workloads to ensure efficient resource use, maintaining performance and cost-effectiveness.
Data driven decisions: MAS can support decision-making through real-time data analysis. Agents tap into and extract data from multiple enterprise-wide streams (structured, unstructured, time series etc.) to perform analyses for quick decision making and responsiveness to changing conditions. Agents can also leverage predictive analytics to forecast future trends and outcomes.
Personalized customer service: MAS provide 24/7 customer support by handling inquiries, resolving issues, and guiding customers through different processes. This could involve finding products, tracking orders, or providing tailored recommendations by analyzing customer data.
Resilient supply chains: MAS can monitor supply chain activities, detect anomalies, and implement contingency plans to minimize the impact of disruptions. Agents can also optimize inventory levels and logistics operations by forecasting demand and coordinating deliveries.
Risk Mitigation: MAS can monitor business operations for potential risks and anomalies in real time. This has applications in manufacturing or finance, but also in cybersecurity where autonomous agents investigate inbound alerts and triage or take remedial action.
Overall, thanks to its multidisciplinary, collaborative approach and the ability to handle diverse data sources, a MAS allows for use cases and workflows that span the entire enterprise, rather than just boosting individual productivity. Organizations operating in dynamic environments are able to add or remove agents without disrupting the overall architecture, allowing for flexibility and scalability. This also means that the sophistication and maturity of MAS can vary greatly.
Central to any Gen AI initiative is data management and preparation. Ideally, an organization has solid data practices that ensure proprietary data is readily available for fine-tuning and an overall more customized deployment. In some instances, additional datasets need to be acquired, licensed, or synthetically generated. Either way, ensuring data quality and suitability for GenAI/Agent-based applications is vital. This requires data engineering, data cleaning, and data transformation.
Beyond data, there are several key capabilities that form the foundation for the evolution towards agent-based and multi-agent systems. These get more complex with increasing sophistication of the underlying system.
Model selection, prompt engineering, and retrieval: Selection of a suitable LLM based on specific tasks and fine-tuning with proprietary data. Effective prompt engineering guides the model’s behavior, and retrieval mechanisms extract relevant information from various sources to enrich the LLM knowledge base.
Orchestration via LLM: One or multiple LLMs will be introduced as orchestrators, coordinating the actions of other models and components. They assign tasks, manage communication, and integrate outputs from different models, creating a cohesive workflow.
Grounding and evaluation: Grounding and evaluation mechanisms are employed to ensure that the generated outputs are reliable, factually accurate and in line with ethical and safety guidelines. This involves verifying information against reliable sources and assessing the potential impact of responses.
Observability and LLMOps: Achieving formal end-to-end observability provides insight into system behavior and enables proactive adjustments. LLMOps practices are implemented to streamline the deployment, management, and monitoring of the entire GenAI infrastructure.
However, to achieve the pinnacle of maturity of agent-based systems, advanced techniques and frameworks have to be introduced that enhance the reasoning and planning abilities of models. This will enable the sophisticated decision-making and problem-solving capabilities that can unlock the most complex enterprise use cases.
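Two of the capabilities above, orchestration and grounding, can be sketched together. In this simplified illustration an orchestrator routes a question to a specialist worker and only returns the answer if it is backed by a known source. The SOURCES list, the worker functions, and the exact-match grounding check are all assumptions made for the example; a real system would use an LLM for routing and a far more tolerant verification step.

```python
from typing import Callable, Dict, List

# Hypothetical knowledge base standing in for trusted enterprise sources.
SOURCES: List[str] = [
    "Invoices are payable within 30 days.",
    "Refunds are issued within 14 days of a return.",
]


def billing_worker(question: str) -> str:
    return "Invoices are payable within 30 days."


def returns_worker(question: str) -> str:
    # Deliberately wrong, to show the grounding check rejecting an answer.
    return "Refunds are issued within 60 days of a return."


WORKERS: Dict[str, Callable[[str], str]] = {
    "invoice": billing_worker,
    "refund": returns_worker,
}


def is_grounded(answer: str) -> bool:
    # Simplification (assumption): an answer counts as grounded only
    # if it matches a trusted source word for word.
    return answer in SOURCES


def orchestrate(question: str) -> str:
    # Keyword routing stands in for an LLM-based orchestrator.
    for keyword, worker in WORKERS.items():
        if keyword in question.lower():
            answer = worker(question)
            return answer if is_grounded(answer) else "escalate to human"
    return "escalate to human"


invoice_answer = orchestrate("When is my invoice due?")
refund_answer = orchestrate("How long does a refund take?")
```

The design choice worth noting is the fallback: when no worker fits or the answer cannot be verified, the orchestrator hands off to a person instead of guessing.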
Enhanced reasoning: Advanced frameworks such as Tree-of-Thought (ToT) or Graph-of-Thought (GoT) are leveraged to help the LLM break down complex problems into smaller, manageable steps and explore solutions. This enhances reasoning, planning, and problem-solving abilities of the Gen-AI system.
Information retrieval and integration: In order to access relevant data from various data sources across the enterprise, advanced retrieval techniques are utilized. This information is then seamlessly integrated into the LLM’s reasoning process, providing it with up to date and contextually relevant information.
Active information gathering: Frameworks like DSPy, which grew out of the Demonstrate-Search-Predict (DSP) approach, guide the LLM in deciding when to search for external information, predicting relevant queries, and incorporating the retrieved information into its reasoning process.
Action: Frameworks such as ReAct enable the LLM to actively interact with its environment, making decisions and taking actions based on the information it gathers.
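The ReAct pattern mentioned above interleaves reasoning ("thought"), tool use ("action"), and tool results ("observation") until the model decides it can answer. The sketch below shows only the control loop. The scripted_policy function is a hard-coded stand-in for the LLM, and lookup_price is a hypothetical tool, so this should be read as an illustration of the loop's shape rather than of any particular framework.

```python
from typing import Callable, Dict, List, Tuple


def lookup_price(item: str) -> str:
    # Hypothetical tool backed by a tiny in-memory price list.
    prices = {"laptop": "1200", "monitor": "300"}
    return prices.get(item, "unknown")


TOOLS: Dict[str, Callable[[str], str]] = {"lookup_price": lookup_price}


def scripted_policy(question: str, trace: List[str]) -> Tuple[str, str, str]:
    """Stand-in for the LLM. Returns (thought, action, argument)."""
    observations = [line for line in trace if line.startswith("Observation:")]
    if not observations:
        return ("I need the laptop price.", "lookup_price", "laptop")
    if len(observations) == 1:
        return ("I also need the monitor price.", "lookup_price", "monitor")
    total = sum(int(o.split(": ")[1]) for o in observations)
    return ("I can answer now.", "finish", str(total))


def react(question: str, max_steps: int = 5) -> Tuple[str, List[str]]:
    trace: List[str] = []
    for _ in range(max_steps):
        thought, action, argument = scripted_policy(question, trace)
        trace.append(f"Thought: {thought}")
        if action == "finish":
            return argument, trace
        # Execute the chosen tool and feed the result back into the trace.
        observation = TOOLS[action](argument)
        trace.append(f"Action: {action}[{argument}]")
        trace.append(f"Observation: {observation}")
    return "no answer", trace


answer, trace = react("What do a laptop and a monitor cost together?")
```

The max_steps cap is a small but important safeguard: it keeps an agent that never reaches a conclusion from looping indefinitely.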
Mitigating the Risks of Agents and Future Outlook
While a successful AI agent deployment allows organizations to reap the benefits of unprecedented levels of automation, it also comes with significant risks. These risks mostly stem from AI agents interacting with different tools, external LLMs, or external agents to carry out automated tasks. These workflows and data paths create new attack surfaces, such as APIs and uncontrolled data exchange, that will have to be protected. Another risk factor is the output or action of agents that are completely autonomous. Without a human in the loop, agents that deal with highly sensitive data or are customer-facing can cause significant harm. Robust security measures, including automatic redaction and safe de-identification of sensitive data, audit logs, real-time monitoring, strong encryption of all data at rest and in transit, and explainability across all output results, help ensure enterprise data remains private, secure, and compliant. When starting out, agents will need temporary structures and guidance until they learn or develop their own capabilities and become more proficient.
Until now, adoption at scale on an enterprise level has faced difficulties because of data quality, employee distrust, cost of implementation, and a confusing market landscape. One of the easiest entry points to Agentic AI is presented by process and workflow automation suites like Microsoft’s Power Automate or solutions like EMA that abstract away the complexities of dealing with multiple frameworks and vendors. EMA offers a library of prebuilt agents and provides a generative workflow engine to create personas that automate company-specific tasks.
As new use cases are unlocked, deployment costs decrease, and long-tail use cases become economically viable, AI agents are expected to boost levels of automation across enterprise processes, employee experiences, and customer interfaces. The AI agent evolution will require strong data practices, AI risk management policies, as well as new frameworks and reference architectures to ensure a successful deployment.
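As a small illustration of two of the safeguards listed above, automatic redaction and audit logging, the sketch below masks email addresses and card-like numbers before an agent passes text to an external tool, and records how many redactions were made. The regular expressions are deliberately simple and the guarded_send helper is hypothetical; production-grade de-identification needs far broader coverage than this.

```python
import re
from typing import Dict, List, Tuple

# Simplistic patterns for illustration only (assumption): real systems
# need much more thorough detection of sensitive data.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
CARD = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")

audit_log: List[Dict[str, object]] = []


def redact(text: str) -> Tuple[str, int]:
    """Mask sensitive values and report how many were replaced."""
    text, n_email = EMAIL.subn("[EMAIL]", text)
    text, n_card = CARD.subn("[CARD]", text)
    return text, n_email + n_card


def guarded_send(agent_name: str, payload: str) -> str:
    # Redact before anything leaves the agent, and keep an audit trail.
    safe, hits = redact(payload)
    audit_log.append({"agent": agent_name, "redactions": hits})
    return safe


outgoing = guarded_send(
    "support-agent",
    "Contact jane@example.com about card 4111 1111 1111 1111 today",
)
```

Placing the redaction step inside the only function that sends data outward means an agent cannot bypass it by accident, and the audit log gives reviewers a record of what was filtered.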