Semantic Layer: A Journey into AI Innovation

Written by Patrick Ortiz | September 26, 2024

By Patrick Ortiz | Trace3 Innovation Research Associate

The semantic layer is making a strong comeback in modern data management, and for good reason. Acting as a translation layer between technical data structures and everyday business language, it has quickly become crucial in AI readiness strategies. With this pivotal shift in use cases, the Innovation team has decided to bring the semantic layer back into the spotlight as a relevant part of the enterprise AI journey.

What is a Semantic Layer?

Before we dive deeper into this resurgence of semantic layers, let’s take a step back and orient ourselves to the basics of what this technology offers. At a high level, a semantic layer is designed to bridge the gap between complex data sets and business language, acting as a Rosetta Stone in modern data management. Its goal is to tie in the context of commonly understood business terms while abstracting away the querying and data relationships required to pull together the necessary data sets. Imagine a platform in which you can search for a specific KPI or metric and pull the required data without having to handle the data wrangling and massaging normally needed to get the same result. A semantic layer ensures that humans, and now Generative AI systems, can easily find, understand, and use relevant data without getting lost in translation. This goal, however, isn’t new by any means.
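To make the idea concrete, here is a minimal sketch of a business term being translated into a query. It is an illustration only, not how any particular product works: the `Metric` class, the `compile_query` function, and the table and column names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """A business term mapped to the SQL needed to compute it (hypothetical)."""
    name: str
    expression: str  # aggregate SQL expression behind the business term
    table: str       # source table the expression runs against


# Hypothetical definitions; table and column names are invented for illustration.
METRICS = {
    "monthly_revenue": Metric("monthly_revenue", "SUM(amount)", "sales.orders"),
    "active_customers": Metric(
        "active_customers", "COUNT(DISTINCT customer_id)", "sales.orders"
    ),
}


def compile_query(metric_name: str, group_by: list[str]) -> str:
    """Translate a business-level request into a SQL string."""
    metric = METRICS[metric_name]
    select_cols = ", ".join([*group_by, f"{metric.expression} AS {metric.name}"])
    sql = f"SELECT {select_cols} FROM {metric.table}"
    if group_by:
        sql += " GROUP BY " + ", ".join(group_by)
    return sql


print(compile_query("monthly_revenue", ["region"]))
```

The user asks for "monthly_revenue by region" and never has to know which table or aggregation sits behind the term.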

The journey of the semantic layer began in the 90s. At that time, it was all about the need to model and deliver data in end-user terms. This trend led to a shift towards a new data-driven culture in the enterprise, placing the goal of self-service with IT-led governance at the forefront of solutions in the data management market. It quickly became clear that the technical expertise required for data management (i.e., database schemas, query languages, and data relationships) had to be abstracted for users not familiar with those concepts. Thus, semantic layers were created to handle that very problem, giving non-technical users access to and an understanding of the data in their organizations.

Now back to the present, as we step into the era of Generative AI in the enterprise, the value proposition has grown to include Large Language Models (LLMs). This step forward in technology has led to advanced capabilities being made more accessible to non-technical users. It’s at this point in the journey where we find the true resurgence in semantic layer technology.

Rebirth in the Era of Generative AI – Why are Semantic Layers Coming Back?

Now that we have an idea of why semantic layers were created and what they do, it’s a whole lot easier to draw the connection between this solution and its resurgence in Generative AI use cases. The need to enable LLMs with organizational data has been clear from the get-go. Over the past year, many enterprises have focused on unstructured data, especially given LLMs’ ability to process files and images like never before. Investments were quickly aimed at vector databases and data labeling solutions tackling unstructured data challenges. What has commonly been overlooked, however, is the value that enterprises’ structured data could provide, if only it could be made accessible and meaningful to the LLMs driving these cutting-edge AI applications.

Structured data is just as essential in various LLM use cases. Think about automating financial reporting, generating CRM insights, or even optimizing product recommendations. The challenge, though, is that structured data often lacks the necessary format, labeling, and schema for easy use. Meaningful semantics are hard to come by in these scenarios, and unfortunately for use cases involving LLMs, semantics are a crucial requirement to operationalize the data. To add to the problem, most models in use are trained on internet data, meaning much of the domain-specific language used in business doesn’t carry over well into these pretrained models. What’s the fix? We need to rethink how we use structured data with LLMs. The answer lies in adding a layer between the data and its point of use: a technology that can translate your data into a language the model can understand.

Growing Roots in an Enterprise Strategy – How do Semantic Layers fit with Generative AI Applications?

With this tech making its way into enterprise strategies, you may be asking yourself what this kind of solution looks like in action. Well, we are all aware of the concerns related to the hallucinations that occur in many commonly used LLMs. The concern is incredibly relevant to enterprises that are excited about their AI journey but still hope to keep answers accurate and consistent across the organization. Here is where semantic layers find ample opportunity to deliver: they offer a method of increasing accuracy and handling context to improve the quality of responses.

The ideal fit for this solution would be in Gen AI applications connected to a structured data store. The semantic layer shines when acting as a gateway between these two components, providing a check and balance between the analysis and the result. In these scenarios context is key. Supplying that context has been a critical challenge that solutions have been attempting to solve for some time, the most prominent approach being retrieval-augmented generation (RAG) architectures that rely on vector databases for support. But something many people miss is the need to constrain the context. In this way semantic layers can be seen as a highly specialized knowledge graph, one that constrains output to the metrics, attributes, and filters that are backed by company-specific business definitions.

So, what happens when an LLM makes a request with a semantic layer in place? It’s forced to stick to the business terms defined in both natural language and SQL code. If it gets something wrong, you will almost certainly be met with an error rather than misleading information, which goes a long way toward curbing hallucinations. In this way semantic layers have begun to show usefulness outside their usual roles in migrating data, governing data centrally, or connecting data silos. They’re proving to be incredibly valuable for supporting AI readiness, especially for organizations still in the data management phase of their strategies.
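The "error rather than misleading information" behavior can be sketched in a few lines. This is a simplified illustration under stated assumptions: the request format, the `validate_request` function, the `SemanticLayerError` class, and the metric and dimension names are all hypothetical, and real products enforce these rules in their own ways.

```python
# Hypothetical business vocabulary the semantic layer has defined.
ALLOWED_METRICS = {"monthly_revenue", "active_customers"}
ALLOWED_DIMENSIONS = {"region", "product_line", "order_month"}


class SemanticLayerError(ValueError):
    """Raised when a request uses a term the layer has not defined."""


def validate_request(request: dict) -> dict:
    """Accept a structured request (e.g. produced by an LLM) only if every
    term in it is backed by a business definition; otherwise raise."""
    metric = request.get("metric")
    if metric not in ALLOWED_METRICS:
        raise SemanticLayerError(f"Unknown metric: {metric!r}")

    dimensions = request.get("dimensions", [])
    unknown = [d for d in dimensions if d not in ALLOWED_DIMENSIONS]
    if unknown:
        raise SemanticLayerError(f"Unknown dimensions: {unknown}")

    return {"metric": metric, "dimensions": list(dimensions)}


# A request that stays within the defined terms passes through unchanged.
validate_request({"metric": "monthly_revenue", "dimensions": ["region"]})

# A request for an undefined metric fails loudly instead of returning a guess.
try:
    validate_request({"metric": "profit_margin", "dimensions": ["region"]})
except SemanticLayerError as err:
    print(f"Rejected: {err}")
```

The design choice worth noting is that an invented term produces a visible failure the application can handle, rather than a plausible-looking but unfounded number.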

Spawning New Solutions and Approaches – How are Semantic Layers coming to market?

Given the advantages of semantic layers in supporting AI efforts, the question quickly shifts to how this technology integrates into your business. Understanding where it fits into your strategy should ultimately begin with the implementation approach that best aligns with your business’s needs.

When it comes to implementation, vendors can usually be understood along two vectors. First, how complex is the implementation of the solution: will it take a strong set of IT-related skills to put in place, or can you rely on business users to set it up with minimal supervision? Second, how much governance do you want over the semantics defined: can different business teams set up siloed definitions, or should the IT team take the lead on a centralized set of definitions? These are the kinds of questions to ask yourself as you examine the solutions and their individual approaches to implementation.

Currently, four approaches in implementation have carved space in the market. These approaches include:

Direct Access: Using A&BI tools with built-in semantic capabilities to access data directly.
Standalone Data Layer: Placing a standalone layer between the data warehouse and end-user access.
Integrated Transformation: Working with built-in capabilities within current data transformation tools.
Centralized Governance: Leveraging semantic capabilities within existing data warehouses or data lakes.

It’s best to begin looking at these approaches from the two sides of the spectrum. First off, with centralized governance, an IT team is empowered to build semantic definitions directly into a data warehouse or data lake. This means that for any team pulling data from the repository, the semantics will remain the same no matter what. On the other side of the spectrum, we have a direct access implementation targeting the business users themselves. In this approach business users are empowered to define their own semantics within A&BI tools they currently use on their respective teams. The difference here lies in the fact that, as opposed to a centralized implementation, differing teams can end up with differing semantic definitions since they’re defining them on an individual basis.
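The trade-off between the two ends of the spectrum can be illustrated with a small sketch. Everything here is hypothetical (the `find_conflicts` function, the team names, and the SQL expressions); it simply shows how definitions written independently by each team can drift apart, which a single centralized definition avoids by construction.

```python
def find_conflicts(
    team_definitions: dict[str, dict[str, str]]
) -> dict[str, dict[str, str]]:
    """Return the metrics that different teams define with different expressions."""
    by_metric: dict[str, dict[str, str]] = {}
    for team, definitions in team_definitions.items():
        for metric, expression in definitions.items():
            by_metric.setdefault(metric, {})[team] = expression
    # Keep only metrics whose expressions differ between teams.
    return {
        metric: per_team
        for metric, per_team in by_metric.items()
        if len(set(per_team.values())) > 1
    }


# Hypothetical team-level definitions, as might arise under direct access.
definitions = {
    "sales": {"revenue": "SUM(amount)", "orders": "COUNT(order_id)"},
    "finance": {"revenue": "SUM(amount - refunds)", "orders": "COUNT(order_id)"},
}

print(find_conflicts(definitions))
```

In this example both teams agree on "orders" but disagree on "revenue", so two reports using the same business term would show different numbers.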

Between these two approaches exist some middle-ground solutions. The standalone data layer is the most popular. It sits as a layer between data repositories and BI tools, connecting to various data sources and ingesting the information in a few different ways depending on the solution. This setup allows a user to go directly into the tool’s interface to define semantics across multiple data repositories. When you need to access the data, it can either be fed through A&BI tools or pulled directly from the semantic layer’s interface. Sitting just below this approach on the spectrum is the implementation of semantic capabilities as an integrated transformation. Here, semantics are built into the data as it’s processed in a transformation stage between repositories, making sure that all data passing through your pipelines maintains consistent definitions.

Each approach has its strengths, depending on what you need from a semantic-driven solution. As the technology’s journey continues through the AI era, the market landscape will likely continue to shift to meet the needs of new use cases.

Expanding Adoption of a Semantic-Driven Future – Where are Semantic Layers Headed?

We’ve seen how semantic layers have evolved, and continue to evolve, in their approaches over time, mirroring the growing complexity of data management needs. The market continues to show signs of renewed activity alongside the growing momentum in Generative AI use cases. For instance, established players have secured new rounds of funding, and emerging solutions like Illumex and Cube are attracting fresh investments, likely in the wake of Generative AI use. Market activity has also included acquisitions of some major players in the semantic layer market. For example, dbt Labs acquired Transform Data in February 2023. The acquisition underscores the industry’s increasing recognition of the need for a semantic layer in the data stack, particularly one that links semantics to the transformation stage.

With these changes in the landscape ongoing, what has been and will likely remain a constant for the time being is the divergence in approaches. For a forward-thinking enterprise, it’s prime time to determine a rationale for bringing one of these semantic layer solutions into your organization’s data stack, especially if you’re currently building out AI readiness strategies. When determining your rationale for adopting a semantic layer solution, consider the following:

Do your business users struggle with big data manipulation for reporting or analysis?
What is the current state of technical skills related to data management within your organization?
Are your data engineers struggling to keep up with business demands?
Are your business users proficient with querying, data manipulation, or text-to-SQL tools?

No matter what it comes down to, the main idea to keep in mind is that with a semantic layer, the underlying goal will always be to enable data to better fit business needs for the organization. Fostering better synergy between your data and business teams means the skillsets and knowledge of both groups are leveraged to their fullest potential. This would not only open your structured data to GenAI applications, but also enable your data to ‘talk’ the language of business.

Patrick Ortiz is an Innovation Research Associate here at Trace3. With a background in science and engineering and a drive for understanding the latest trends across the enterprise IT space, he continues to bring in a forward outlook and deliver on content to help our teams and clients understand the ever-changing landscape of IT solutions. When not researching, Patrick can be found exploring some of the best foodie locations in whichever city he’s visiting next.
