BYOM: Harness the Power of Generative AI in the Enterprise (Part I)

By Lars Hegenberg | Trace3 Innovation

With the rapid proliferation of Generative AI (Gen AI) and an increasingly confusing market landscape filled with “Copilots”, organizations are looking for ways to strengthen control and governance over the use of Gen AI and the data shared with these tools. This has shifted the focus toward safer deployment options that reduce dependencies and avoid sharing data with third parties, while maximizing performance for custom business use cases. Enter “Bring Your Own Model (BYOM)”, an approach in which organizations deploy a dedicated instance of a large language model in their own environment. This two-part blog delves into the technical life cycle of BYOM, from pre-deployment to post-deployment, including the strategic choices that need to be made along the way, such as data and model types, infrastructure, security, and version control.

Generative AI – Strategic Asset or Liability?

Following rapid advances in Gen AI, new large language models (LLMs), applications, and services launch every day. Never before has AI been so easily accessible to anyone, and never before has it had such wide applicability. And while Gen AI can make the lives of many employees a lot easier, organizations are scrambling to get a grip on the use of Gen AI tools and to tackle the data security & privacy concerns that come with it.

Gen AI features are now built into most applications to support enterprise functions such as software engineering, sales, customer support, and marketing. Yet the most common way of interacting with the technology is still through chat interfaces such as OpenAI’s ChatGPT, and accessing tools like these without any security controls in place comes with some innate problems. Relying on cloud-based services to deploy and run inference with these models requires organizations to transfer their data to centralized third-party model providers. This can lead to data exposure, data breaches, and unauthorized access. One of many relevant examples is the 2023 Samsung incident, in which engineers entered confidential source code and meeting notes into ChatGPT, reportedly causing three separate data leaks within one month. Stories like these have led organizations to seek alternative ways to deploy LLMs that let them capitalize on Gen AI capabilities while protecting the confidentiality of sensitive data.

Beyond security, there are also strategic business considerations when choosing a deployment method. Public LLMs are trained on massive datasets, which gives them extremely broad knowledge and a wide range of use cases. However, their performance on specialized, company-specific use cases is limited. Closed-source/proprietary LLMs such as OpenAI’s ChatGPT are something of a “black box”: it is unclear what data they were trained on, and their model weights and detailed algorithms are not disclosed. This makes them very difficult to predict or interpret, and, like other LLMs, they are prone to hallucinations, where a model generates output that is plausible but not grounded in its training data or in real-world facts. Relying on such output in a business context can not only harm performance but can also be unethical and have legal implications.

Emerging architectures such as Retrieval Augmented Generation (RAG) can address some of these accuracy concerns by grounding the model in relevant data and context retrieved from a company’s (vector) database. Nevertheless, RAG may still be insufficient for highly specialized use cases and downstream tasks that require domain knowledge or specific model behavior, or that are latency-sensitive. It also comes with its own security risks. For example, the raw data used to create vector embeddings can, in some cases, be reconstructed from the embeddings stored in a vector database, making data leakage possible.

Figure 1: Example RAG architecture
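To make the retrieval step of RAG more concrete, here is a minimal Python sketch of the retrieve-then-prompt flow. It is an illustration under simplifying assumptions, not a production design: `embed` is a hypothetical stand-in that turns text into a normalized bag-of-words vector, whereas a real system would use a learned embedding model and a dedicated vector database. `ToyVectorStore`, `build_prompt`, the document snippets, and the prompt template are all made up for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> dict[str, float]:
    """Hypothetical stand-in for an embedding model: an L2-normalized bag of words."""
    tokens = [word.strip(".,!?:;").lower() for word in text.split()]
    counts = Counter(token for token in tokens if token)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {token: c / norm for token, c in counts.items()} if norm else {}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors have unit length, so their dot product is the cosine similarity
    return sum(weight * b.get(token, 0.0) for token, weight in a.items())


class ToyVectorStore:
    """In-memory stand-in for a vector database: stores (text, embedding) pairs."""

    def __init__(self) -> None:
        self.items: list[tuple[str, dict[str, float]]] = []

    def add(self, text: str) -> None:
        self.items.append((text, embed(text)))

    def search(self, query: str, k: int = 2) -> list[str]:
        # Rank stored chunks by similarity to the query and return the top k
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(question: str, store: ToyVectorStore, k: int = 2) -> str:
    # Ground the model by placing retrieved company context ahead of the question
    context = "\n".join(f"- {doc}" for doc in store.search(question, k))
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {question}"


store = ToyVectorStore()
store.add("Refunds are processed within 14 days of receiving the returned item.")
store.add("The VPN must be used when accessing internal systems remotely.")
store.add("Expense reports are due by the fifth business day of each month.")
print(build_prompt("How many days do refunds take?", store, k=1))
```

The resulting prompt contains only the refund policy snippet, which would then be passed to the LLM (not shown) for generation. Note that the stored embeddings are derived directly from the raw text, which is why a vector store deserves the same protection as the source data.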

BYOM: Unleashing Gen AI's Potential, Responsibly

One possible way to address most of these concerns is to bring the model to your data by deploying a dedicated instance of an LLM in your own environment, also known as “Bring Your Own Model (BYOM)”. The major advantage here is the data privacy guarantee: user and proprietary data remain within the organization’s infrastructure, which reduces dependencies, lowers the chance of exposure to external entities, and mitigates third-party risk. These privacy guarantees also facilitate compliance with data protection and privacy regulations and can reduce the need for larger investments in security tooling.

Besides security and privacy, deploying an LLM locally can create a strategic competitive advantage. This requires, however, that the model was either built and trained from scratch by the organization or that an existing model was fine-tuned on company-specific data. Fine-tuning or retraining models on proprietary data, domain-specific knowledge, and unique business processes narrows & deepens model capabilities for the desired use cases while providing greater control over model behavior. One industry example of such a highly specialized model for narrow use cases is Klarna’s customer service chatbot, which, according to Klarna, does the equivalent work of 700 full-time agents by automating tasks such as multilingual customer support, refunds & returns, and financial assistance. Beyond increased relevance & accuracy, model outputs also become much harder for competitors using generic models to replicate, offering a distinct competitive advantage. Finally, a local deployment can enable more seamless integration with an organization’s existing IT infrastructure, software systems, and workflows while providing maximum control over model versioning. This, in turn, improves efficiency, reduces friction and disruptions, and enhances the overall user experience for employees and customers.

Pre-deployment: Model Options

When opting for the deployment of a dedicated instance of an LLM, choosing the right foundation model is key to business success. Organizations can choose between three approaches: building a model in-house from scratch, or deploying an existing model, either proprietary (closed-source) or open-source.

Option 1: Train your own model

Building your own model means the entire tech stack is built and managed in-house. This includes model development, model training (including the data the model is trained on), inference, maintenance, and all of the associated hardware requirements. It is a way to maximize performance and control while increasing the flexibility to customize the model for different downstream tasks. It also eliminates lock-in to traditional model providers, and data security & privacy risks are reduced because data never leaves the organization’s environment.

The biggest challenges of building your own model are the availability of high-quality training data and the extreme resource intensity. Building and training models requires massive computing and data center resources, along with significant human resources, not just for development but also for ongoing maintenance. McKinsey estimates one-time development costs of roughly $5M to $200M and annual recurring costs of roughly $1M to $5M. Hence, training your own large language model from scratch is out of reach for the majority of organizations.
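As a rough illustration of what these ranges imply, the sketch below adds McKinsey’s one-time and recurring cost estimates over a multi-year horizon. The three-year horizon is an assumption chosen for the example, and the calculation deliberately ignores financing, depreciation, and cost changes over time.

```python
def total_cost_range(years: int,
                     one_time: tuple[float, float] = (5e6, 200e6),
                     annual: tuple[float, float] = (1e6, 5e6)) -> tuple[float, float]:
    """Return the (low, high) cumulative cost in USD after `years` years.

    Defaults are the McKinsey ranges cited above: roughly $5M to $200M one-time
    and $1M to $5M per year recurring.
    """
    low = one_time[0] + years * annual[0]
    high = one_time[1] + years * annual[1]
    return low, high


low, high = total_cost_range(years=3)  # 3-year horizon is an illustrative assumption
print(f"3-year cost: ${low / 1e6:.0f}M to ${high / 1e6:.0f}M")
```

Under these assumptions, three years of ownership come to roughly $8M to $215M, and the upper end is dominated by the one-time build.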

Pre-trained models: Open-Source vs Proprietary

Organizations can also opt for existing models, either open-source or proprietary. Examples of open-source model providers include Meta, Mistral AI, and Hugging Face, whereas the most prominent proprietary models are OpenAI’s GPT models (which power ChatGPT), Google’s Gemini, and Anthropic’s Claude. While proprietary models can also be accessed via API or a web interface, this blog focuses on the deployment of dedicated instances to address data privacy concerns. A private deployment is usually paired with additional fine-tuning on company data to make the model more relevant to the organization and its use cases.

Option 2: Proprietary Models

When opting for a proprietary model, a dedicated instance of an LLM is deployed into controlled, ring-fenced cloud infrastructure. Although this is not a local deployment, data is not shared with third parties or used to further train the model. Organizations pay per time period for an allocation of compute infrastructure that is reserved for serving their requests. This solves most data privacy concerns while giving organizations more control over the model itself: they can fine-tune it and freeze or snapshot a specific version, protecting it from potential disruptions caused by upgrades or version changes. An example is deploying a dedicated instance of OpenAI’s GPT models (the models behind ChatGPT) through Azure OpenAI Service in your own Azure environment.
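Whether paying for reserved capacity makes financial sense depends largely on usage volume. The sketch below computes the monthly token volume at which a flat, time-based fee for dedicated capacity costs the same as pay-per-token API pricing. All prices are hypothetical placeholders, not actual vendor rates, and the calculation assumes the reserved capacity is large enough to serve the volume in question.

```python
def break_even_tokens(monthly_reserved_cost: float, price_per_million_tokens: float) -> float:
    """Monthly token volume at which reserved capacity and pay-per-token pricing cost the same."""
    if price_per_million_tokens <= 0:
        raise ValueError("price_per_million_tokens must be positive")
    return monthly_reserved_cost / price_per_million_tokens * 1_000_000


# Hypothetical figures for illustration only; substitute your own vendor quotes.
HOURLY_RESERVED_RATE = 50.0      # USD per hour for a dedicated deployment (assumed)
HOURS_PER_MONTH = 730            # average number of hours in a month
PRICE_PER_MILLION_TOKENS = 10.0  # USD, blended input/output API price (assumed)

monthly_cost = HOURLY_RESERVED_RATE * HOURS_PER_MONTH
tokens = break_even_tokens(monthly_cost, PRICE_PER_MILLION_TOKENS)
print(f"Reserved: ${monthly_cost:,.0f}/month; break-even at {tokens / 1e6:,.0f}M tokens/month")
```

Below the break-even volume, pay-per-token access is cheaper; above it, the dedicated deployment also pays off financially, on top of its privacy and version-control benefits.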

Option 3: Open-Source Models

An open-source LLM is a model whose key components are made publicly available. What exactly is released varies by model provider, but it can include the source code, model weights, and training data. Open-source models come in different sizes and range from general-purpose models with broad knowledge to domain-specific models trained for particular industries or even individual use cases. They are widely available through hubs such as Hugging Face, and organizations are free to choose where to deploy them.

Open-source and proprietary models mainly differ along key factors like time-to-market, cost structure, flexibility & transparency, as well as security & governance.

Figure 2: Open-source vs. proprietary models

Pre-deployment requirements

Across all of the deployment options listed above, several key requirements must be met to ensure a successful deployment.

Data (Management)

When training or fine-tuning a model, one of the main prerequisites is collecting high-quality data relevant to the specific use case. Following the “garbage-in, garbage-out” principle, without high-quality training data the accuracy and relevance of an LLM’s outputs are severely limited. This requires a strong data management foundation for both unstructured and semi-structured data. To minimize inaccuracies, biases, and noise in the data, investing time and effort in data curation and pre-processing is essential. Organizations should also address ethical concerns and data privacy, for example by anonymizing sensitive information in datasets.
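A minimal sketch of some of the curation and anonymization steps described above: it normalizes whitespace, drops empty or very short records as likely noise, removes exact duplicates, and masks email addresses and phone-number-like strings. The length threshold and regular expressions are illustrative assumptions. Regex-based redaction only catches simple patterns (and can produce false positives such as dates), so production pipelines usually rely on dedicated PII-detection tooling plus human review; bias detection is not covered here at all.

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def anonymize(text: str) -> str:
    """Mask email addresses and phone-number-like strings."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def curate(records: list[str], min_words: int = 3) -> list[str]:
    """Normalize, filter, anonymize, and deduplicate raw text records."""
    seen: set[str] = set()
    cleaned: list[str] = []
    for record in records:
        text = " ".join(record.split())    # collapse runs of whitespace
        if len(text.split()) < min_words:  # drop empty or very short (noisy) records
            continue
        text = anonymize(text)
        key = text.lower()
        if key in seen:                    # drop exact duplicates (case-insensitive)
            continue
        seen.add(key)
        cleaned.append(text)
    return cleaned


raw = [
    "Customer   asked about a refund for order 1042.",
    "customer asked about a refund for order 1042.",
    "ok",
    "Contact jane.doe@example.com or +1 (555) 123-4567 for escalations.",
]
for line in curate(raw):
    print(line)
```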

Model Training/Fine-tuning

To successfully fine-tune and customize a model, it is important to choose the right model and fine-tuning approach. When selecting a pre-trained model, considerations should include size, task suitability, and accessibility. Fine-tuning approaches range from retraining the entire model to more parameter-efficient methods that update only specific layers or a small number of added parameters, offering a faster and more resource-friendly option. While infrastructure will be covered extensively in Part 2 of this blog series, organizations must already engage in capacity planning at this stage to address computing resource requirements based on model size and training method. Additional MLOps tools designed for fine-tuning may be necessary to streamline this process. To consistently minimize security concerns, pre-deployment activities such as training and fine-tuning should also take place only in an organization’s own controlled environment.
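For capacity planning, a back-of-the-envelope memory estimate shows why the fine-tuning method matters. The sketch below compares GPU memory for full fine-tuning against LoRA (Low-Rank Adaptation), one widely used parameter-efficient method, named here as an example rather than a recommendation. It assumes mixed-precision training with the Adam optimizer at about 16 bytes per trainable parameter (16-bit weights and gradients plus 32-bit master weights and two optimizer moments) and 2 bytes per frozen 16-bit parameter. It ignores activations, batch size, and framework overhead, so real requirements will be higher. The 7-billion-parameter model shape is hypothetical.

```python
BYTES_PER_TRAINABLE_PARAM = 16  # fp16 weight + fp16 grad + fp32 master weight + two fp32 Adam moments
BYTES_PER_FROZEN_PARAM = 2      # frozen weights kept in 16-bit precision


def lora_param_count(num_layers: int, matrices_per_layer: int,
                     d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds A (rank x d_in) and B (d_out x rank) to each adapted weight matrix."""
    return num_layers * matrices_per_layer * rank * (d_in + d_out)


def full_finetune_gb(total_params: int) -> float:
    # Every parameter is trainable and carries gradient and optimizer state
    return total_params * BYTES_PER_TRAINABLE_PARAM / 1e9


def lora_finetune_gb(total_params: int, adapter_params: int) -> float:
    # Base weights are frozen; only the small adapters carry gradient and optimizer state
    frozen = total_params * BYTES_PER_FROZEN_PARAM
    trainable = adapter_params * BYTES_PER_TRAINABLE_PARAM
    return (frozen + trainable) / 1e9


# Hypothetical 7B-parameter model: 32 layers, hidden size 4096,
# with LoRA (rank 8) applied to two 4096x4096 attention projections per layer.
TOTAL_PARAMS = 7_000_000_000
adapters = lora_param_count(num_layers=32, matrices_per_layer=2, d_in=4096, d_out=4096, rank=8)

print(f"LoRA trainable params: {adapters:,}")
print(f"Full fine-tuning: ~{full_finetune_gb(TOTAL_PARAMS):.0f} GB")
print(f"LoRA fine-tuning: ~{lora_finetune_gb(TOTAL_PARAMS, adapters):.1f} GB")
```

Under these assumptions, LoRA trains about 4.2 million parameters instead of 7 billion and cuts the estimated memory from about 112 GB to about 14 GB, which is one reason parameter-efficient methods are the faster, more resource-friendly option.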

Testing and Evaluation

Before deploying a model, extensive testing should be carried out to assess the quality of its outputs. While standard metrics may not fully capture the intricacies of language understanding and generation, the model should be evaluated for accuracy and effectiveness as well as generalizability and robustness, including its performance on unseen data samples and on benchmarks it was not trained on.
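A minimal sketch of evaluating on unseen data: it first checks that no evaluation prompt also appears in the fine-tuning set (a simple guard against data contamination), then scores exact-match accuracy. `fake_generate` is a hypothetical placeholder for whatever inference call your deployment exposes, and the question-answer pairs are made up. Exact match is only one narrow metric; open-ended generation typically also needs semantic or human evaluation.

```python
from typing import Callable


def normalize(text: str) -> str:
    # Case- and whitespace-insensitive comparison
    return " ".join(text.lower().split())


def evaluate(generate: Callable[[str], str],
             eval_set: list[tuple[str, str]],
             train_prompts: set[str]) -> float:
    """Exact-match accuracy on held-out (prompt, expected_answer) pairs."""
    if not eval_set:
        raise ValueError("eval_set must not be empty")
    # Guard against contamination: held-out prompts must not appear in the training data
    seen_in_training = {normalize(p) for p in train_prompts}
    leaked = [p for p, _ in eval_set if normalize(p) in seen_in_training]
    if leaked:
        raise ValueError(f"{len(leaked)} evaluation prompt(s) also appear in the training data")
    correct = sum(normalize(generate(p)) == normalize(a) for p, a in eval_set)
    return correct / len(eval_set)


def fake_generate(prompt: str) -> str:
    # Hypothetical stand-in for the deployed model's inference call
    canned = {"What is the refund window?": "14 days"}
    return canned.get(prompt, "I don't know")


held_out = [
    ("What is the refund window?", "14 days"),
    ("Who approves travel expenses?", "Your line manager"),
]
print(evaluate(fake_generate, held_out, train_prompts={"How do I reset my password?"}))
```

Raising an error on overlap, rather than silently dropping leaked prompts, makes contamination visible so that the training and evaluation splits can be fixed at the source.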

Check out Part 2 of this blog series, which covers the strategic infrastructure choices involved in deploying a large language model, as well as post-deployment considerations for ensuring a successful end-to-end implementation.


Lars is an Innovation Researcher on Trace3's Innovation Team, where he is focused on demystifying emerging trends & technologies across the enterprise IT space. By vetting innovative solutions and combining insights from leading research and the world's most successful venture capital firms, Lars helps IT leaders navigate an ever-changing technology landscape.

 
