The Evolution of the LLM Toolstack - Thoughts on Capturing Value in the GenAI Tech Stack as a Startup
Note: This article was written from a VC perspective, with a certain bias towards outlier companies - so take this into account, as there are many ways to build an amazing company in the LLM toolstack.
The current state of the LLM toolstack shows a pattern that we have seen before: With every shift in technology, value migrates through the layers in the tech stack.
In the early days, value sits firmly at the bottom layer of the stack: The core technology.
The current discussion mostly centers on foundation models, context windows, multimodality, latency, and “AGI” (Artificial General Intelligence) as the holy grail of technologies.
Yet the most interesting promise of generative AI or “GenAI” is to make technology more accessible to everyone: software that can become any tool, in the hands of anyone (from sales assistant to website designer or financial analyst).
As of today, adoption of GenAI in the enterprise segment has been slow, as b2venture Partner Andreas Goeldi has pointed out in several of his recent posts. At the application layer, AI agents are mostly not useful beyond a set of fairly basic tasks (as a recent report from Sifted also pointed out), and the underpinning tooling layer is highly fragmented.
This raises the question: How will the middle layer in the GenAI stack evolve and shift value from the foundation models towards the application layer?
To understand where the space is headed, it’s helpful to look at the typical path of these markets when technology shifts occur.
Previous step changes that increased the accessibility of a technology for non-technical users include the introduction of computers with a graphical interface, the rise of the “modern data stack” and business intelligence platforms over the last decade, and the embrace of cloud computing through early SaaS companies.
Most of these stacks go through three stages:
- Pseudo end-to-end solutions that are too thin to be useful (players such as H2O)
- Best-of-breed stack that lets customers put together their own solution, often with considerable complexity
- Consolidation into true end-to-end solutions: This is typically driven by large players expanding into different parts of the stack (in the modern data stack, Snowflake’s expansion into adjacent areas is one example)
We believe that the GenAI tooling landscape is already at stage 2, and founders have to figure out how to move from best-of-breed solutions to end-to-end services.
Let’s break down the different stages in this market development:
Stage 1: Pseudo end-to-end solutions
The immediate, zero-order effect in the LLM technology shift was largely triggered by the release of ChatGPT, which brought GenAI and the power of LLMs to an everyday audience.
LLMs took the market by storm, and to close the gap between the horizontal model and the verticalized needs of users, a wave of start-ups started offering verticalized solutions for different workflows.
Most adoption of GenAI at this stage was fairly superficial, both conceptually and technologically. The classic example is the “ChatGPT, but for X” wrapper application that countless startups have been pitching.
The value here remains firmly in the foundation model layer, and the LLM toolstack mainly consists of prompt engineering: well-structured prompts make the horizontal model useful for vertical end-user cases.
Stage 2: Best-of-breed stack
At this stage, the developer journey from raw LLMs to business-layer applications is broken up into sub-steps, and best-of-breed companies are currently being built and scaled to success.
We’ve broken down the stack in the following way (many companies cover more than one area, so examples are sorted by their main focus based on an outside-in assessment, and the list is certainly not exhaustive):
- Training and Tuning to monitor and improve LLM outputs and performance, often via an AI “playground” approach (e.g., SuperAnnotate, Entry Point)
- Observability to predict and explain model behavior and outcomes (e.g., Fiddler, Neptune, Patronus AI)
- Security to protect data pre- and post-deployment and enable data privacy (e.g., Lakera, Calypso)
- Governance to manage workflows within the company and ensure LLM compliance (e.g., Credo AI, Calvin Risk)
- LLM Orchestration to integrate different (open- and closed-source) models, route requests (e.g., AI21 Labs, OpenZoo), and reduce inference cost (e.g., Martian), as rising OpenAI bills raise awareness of ROI for companies scaling their initial use cases
- Retrieval Augmented Generation (RAG) for specialized contextual knowledge, including vector databases (e.g., LlamaIndex and LangChain; OpenAI is currently integrating this into its own stack)
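To make the RAG category more concrete, here is a minimal, illustrative sketch of the retrieve-then-prompt pattern. The bag-of-words similarity below is a deliberately simple stand-in for the learned embeddings and vector databases that production systems use; the documents and helper names are hypothetical.

```python
import re
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; real RAG stacks use learned vector embeddings.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    # Rank documents by similarity to the query and return the top k.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Prepend the retrieved context so the model answers from company-specific knowledge.
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The quarterly sales report is published every January, April, July, and October.",
]
print(build_prompt("What is your refund policy for returns?", docs))
```

In a real deployment, the assembled prompt would then be sent to an LLM; the retrieval step is what grounds the horizontal model in the enterprise’s own data.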
From an end-user perspective, this stage is not ideal. Putting together a GenAI stack requires stitching together various best-of-breed solutions from multiple independent vendors. Getting to enterprise-level reliability with such a stack is no easy feat, and we see that as one reason for the slow adoption by corporations and larger enterprises.
If you are a startup working in the LLM stack at this stage, you will need a truly unique idea to establish a “right to win” in your category, and a perspective on where the market is moving. Open-source solutions for the aforementioned categories have become very sophisticated, and closed-source companies need a true advantage to make an argument for themselves.
Stage 3: True end-to-end solutions
This is where the value moves up to the application layer and makes the full power of GenAI available for non-technical people - not just as a point solution for workflows, as in stage 1, but integrated with the broader business context.
What role do foundation model providers play in stage 3?
A look back at Microsoft’s history makes the incumbent advantage over startups very clear: its simple web stack tools leveraged synergies with other Windows offerings and thus won in the enterprise and large parts of the SME segment. Today, LLM providers are moving up from the bottom layer to capture value in higher layers: OpenAI is building more and more of the developer stack for its models (recently announcing further fine-tuning capabilities).
We are starting to see a world where OpenAI offers the monolith for all customers wanting enterprise security, plus a “wild west” of open source, plus some enterprise alternatives. The open-source LAMP (Linux/Apache) stack was developed in parallel to the Microsoft stack and still co-exists with it today. A similar outcome in the GenAI era could be a duality of OpenAI with another LLM provider like Google Gemini. SAP, Oracle, and IBM have emerged as enterprise alternatives with full stacks of developer solutions - in the GenAI world, players like Aleph Alpha could become the SAP of this space.
This can be an attractive exit path for startups that build winning solutions with the right timing in stage 2: Think of MySQL’s acquisition by Oracle (via Sun Microsystems). Microsoft also famously pushed into the web application market through a wave of acquisitions. Founders who can predict the next waves of innovation in GenAI are still in a good position to build winning toolstack solutions.
Where will start-ups capture value in stage 3?
Even with LLM providers expanding their toolstack, startups’ best-of-breed solutions can claim their “right to win” if their products continue to outperform on a specific value proposition. A role model for the best-of-breed winner’s approach is Snowflake’s track record in the data stack: The company is still a household name, as its product for building data applications remains superior to incumbent solutions.
OpenAI’s in-house RAG offering, for instance, only had very basic features in its last release. Start-ups like Contextual AI and Crux are using superior RAG solutions to capture the enterprise segment. With Google recently announcing Gemini 1.5 Pro’s 2-million-token context window, RAG is arguably becoming less necessary for closed-source models, but there is still a solid use case for open-source models. Reinforcement learning from human feedback, or RLHF, is another interesting case, with players such as Adaptive ML.
The second option is building an LLM-agnostic abstraction layer. In the world of databases, Oracle is favored for supporting different platforms, including IBM systems and Windows. In GenAI, there’s a cohort of start-ups packaging LLMs from different providers and facilitating their embedding in the enterprise context, such as Malted AI and Intel spin-out Articul8. Start-ups like Arcee AI focus on highly regulated industries with low fault tolerance, such as Healthcare, FinTech, and InsurTech, which is a smart way to expand from specialized LLM security and monitoring solutions to the broader LLM toolstack.
These types of solutions also address a people-centric problem that many enterprises are facing: traditional software engineers lack the underlying skills needed for building GenAI solutions. There’s an entire cohort of coding co-pilots such as Devin and start-ups visualizing the development process (similar to Microsoft bringing visual applications to the web app development stack).
Last but not least, there’s a case for start-ups to extend beyond the LLM toolstack and capture value in the application layer. This requires finding the right niche that the monoliths are not yet playing in and establishing a superior solution that makes GenAI usable in the end-user application - for example, GenAI avatars (Anam.ai), music generation (Suno, Beatoven.ai), or video generation (Adobe Premiere Pro, HeyGen, Synthesia). There are some nuances here in how deeply players cover the layers in the stack: Some only leverage existing models (Adobe), some cover the full stack with proprietary LLMs (Suno, Stability AI), and some take a hybrid approach, leveraging external LLMs and adding their own models for specific cases (as our portfolio company Text Cortex does).
Conclusion
We are at the very beginning of understanding how the potential of LLMs will play out in the broader world of enterprise and for end-users, and we see a multitude of opportunities and challenges in making this revolutionary technology more accessible.
If you are a founder building in the space, do not get nervous about a pivot (or two) while finding the right direction for your product - these are bound to happen, and we are excited to work with founders to figure it out together. At b2venture we believe in “show, don’t tell” and always appreciate founders who ground their conversations in their product instead of a pitch deck (not that it hurts to have a good one).
Beyond the LLM toolstack, there’s much more to be said about the application layer and what truly GenAI-first vertical solutions could look like (for example, an AI-first CRM), and applications that are genuinely built on GenAI have the potential to change the underlying economics and usage patterns fundamentally. We would love to hear from founders who take truly novel approaches with their solutions: marisa.krummrich@b2venture.vc.
The Author
Marisa Krummrich
Investment Manager
Marisa is an Investment Manager on the b2venture Fund team and focuses on horizontal AI, AI tooling, and vertical enterprise applications that utilize AI as the key enabler.