November 19, 2024

Replacing the taxing labor in technical workflows

Tobi Coker

Astasia Myers

Ask anyone in IT or engineering which parts of the job they dislike most, and there is a good chance you will hear about managing open tickets, being on call, and fixing bugs in the product.

For good reason. No one in IT wants to hear from management that they can't get into Zoom for a critical meeting. No developer wants to be woken up at 2 a.m. on a Saturday to deal with a mission-critical bug that could bring down their entire system. Earlier this year, the world saw firsthand the debilitating effects of a software bug pushed into production when a faulty CrowdStrike update crashed Windows machines worldwide, causing ~$5.4B in damage across airlines, hospitals, financial markets, and other industries.

Due to the critical nature of these workflows, several billion dollars' worth of software products have been built to address these needs. ServiceNow began as an internal ticketing platform. Atlassian’s Jira emerged as the developer-focused version of this. PagerDuty streamlined and automated parts of incident response. BrowserStack and Tricentis were born out of a need to test and ensure the quality of products before they are put into production.

Despite the product advancements these tools have ushered into the market, full automation is still elusive. Further, the manual nature of these workflows creates inefficiencies and frustration for technical teams.

Most of these tools still require significant input and guidance from humans in the loop, broadly due to:

Complexity. Problem identification and resolution often require experience from tenured operators capable of using prior business-specific knowledge to solve edge cases. It also requires an understanding of numerous systems.
Judgment and prioritization. Humans have the context to prioritize and re-prioritize tasks.
Creativity. Problem-solving often requires creative and adaptable thinking to conceive unconventional solutions.

Given this, why do we believe now is the time to tackle full autopilot for these complicated workflows? Because of technical progress in multiple domains, including:

NLP for contextual understanding. NLP models can interpret and process natural language with very high accuracy. Recent releases like GPT-4o signal a shift towards nuanced understanding, emergent behavior, and cognitive thinking at inference. Chain-of-Thought (CoT) extends the capabilities of NLP models to more nuanced, multi-step problem-solving.
Synthetic data generation and reinforcement learning. Synthetic data generation can help train and fine-tune AI models for real-world scenarios and edge cases, increasing accuracy and efficacy. Reinforcement learning can help improve models' learning and behavior adaptation, enabling better accuracy. Moreover, Reinforcement Learning from Human Feedback (RLHF) builds on traditional RL but incorporates human input to shape the reward function and guide the learning process. RLHF helps the agent learn more complex or nuanced behaviors that might be difficult to capture solely with RL.
Multimodal capabilities and agent orchestration. Multimodal data processing and integration are crucial in incident response and QA testing, where both workflows require integrating different types of data (logs, alerts, activities, code, documentation, and visual content). Agent orchestration can connect across different systems and now “do the work.”

These advancements mean that AI-powered autopilot workflows are not only efficient but also context-aware, adaptive, and reliable.

We believe the recent advancements in AI and agents represent an opportunity to achieve an autopilot in three specific technical domains (ITSM, incident response, and QA), and that these autopilots will set up development teams for a bright future.

ITSM

The ITSM software market is ~$48.3B. As of this writing, ServiceNow is a $190B company and Atlassian is $50B. At Atlassian’s investor day, the company stated Jira Service Management is at approximately $600M in annual revenue.

AI can improve ITSM in several ways. AI and Retrieval-Augmented Generation (RAG) can significantly enhance ticket deflection through a conversational experience, accelerating time to first response. Semantic awareness can allow for more nuanced knowledge retrieval by understanding natural language queries better than traditional keyword search. The improved contextual relevance can enable individuals to troubleshoot their own issues. The knowledge base can also be dynamically constructed so it is more up-to-date.

Moreover, AI agents are a huge unlock. AI agents can leverage employee data to provide personalized responses that reflect the user’s previous interactions and preferences. With AI memory, AI agents can maintain the context of conversations over multi-turn dialogues. Most importantly, a multi-agent system can automate resolutions by taking actions on behalf of the employee. This can decrease Mean Time to Resolution (MTTR) and improve Customer Satisfaction (CSAT).

For complex issues or requests that require human input, the AI will quickly route them to the right specialist with detailed context so the support team can dive directly into problem-solving rather than basic triage.
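As a rough illustration of the deflect-or-route decision described above, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `KNOWLEDGE_BASE` entries, the `triage` function, and the 0.3 threshold are hypothetical, and the `embed` function is a simple bag-of-words stand-in for the embedding model a real RAG system would use.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical knowledge base: article title -> keywords/body text.
KNOWLEDGE_BASE = {
    "Reset your VPN password": "vpn password reset expired login remote access",
    "Fix Zoom audio issues": "zoom audio microphone speaker meeting sound",
    "Request a new laptop": "laptop hardware request replacement device",
}

def embed(text: str) -> Counter:
    """Stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

@dataclass
class TriageResult:
    action: str              # "deflect" (self-serve answer) or "route" (human)
    article: Optional[str]   # best-matching article when deflecting
    score: float             # similarity of the best match

def triage(query: str, threshold: float = 0.3) -> TriageResult:
    """Deflect to a knowledge-base article if the match is strong enough,
    otherwise route the request to a human specialist."""
    q = embed(query)
    best_title, best_score = None, 0.0
    for title, body in KNOWLEDGE_BASE.items():
        score = cosine(q, embed(title + " " + body))
        if score > best_score:
            best_title, best_score = title, score
    if best_score >= threshold:
        return TriageResult("deflect", best_title, best_score)
    return TriageResult("route", None, best_score)

if __name__ == "__main__":
    print(triage("zoom audio not working in meeting"))
    print(triage("The finance dashboard shows wrong numbers"))
```

In this sketch, the first query matches the Zoom article and is deflected, while the second finds no match and is routed to a person. A production system would swap in real embeddings and attach the conversation context to the routed ticket.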

An AI-powered autopilot ITSM tool can help transform this massive market from a reactive, manual process into a more seamless, proactive experience. An autopilot AI system could work continuously in the background, constantly monitoring each user’s software, hardware, and network, recognizing anomalies in real time, predicting potential issues, and taking preemptive action.
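To make "recognizing anomalies in real time" a little more concrete, the following Python sketch flags readings that deviate sharply from a recent baseline using a rolling z-score. This is one simple statistical technique, not a description of how any particular product works; the class name, window size, and threshold are hypothetical choices.

```python
from collections import deque
from statistics import mean, stdev

class RollingAnomalyDetector:
    """Flags values that sit far from the mean of a recent window."""

    def __init__(self, window: int = 20, z_threshold: float = 3.0,
                 min_samples: int = 5):
        self.values = deque(maxlen=window)  # recent "normal" readings
        self.z_threshold = z_threshold
        self.min_samples = min_samples

    def observe(self, value: float) -> bool:
        """Return True if `value` looks anomalous against the window."""
        is_anomaly = False
        if len(self.values) >= self.min_samples:
            mu = mean(self.values)
            sigma = stdev(self.values)
            if sigma > 0 and abs(value - mu) / sigma > self.z_threshold:
                is_anomaly = True
        # Only normal readings update the baseline, so one spike does not
        # inflate the window and hide the next one.
        if not is_anomaly:
            self.values.append(value)
        return is_anomaly

if __name__ == "__main__":
    detector = RollingAnomalyDetector()
    # Hypothetical network latency readings in milliseconds.
    latencies = [100, 102, 98, 101, 99, 100, 103, 97, 400, 101]
    for ms in latencies:
        if detector.observe(ms):
            print(f"anomaly: {ms} ms")
```

Here only the 400 ms reading is flagged. An autopilot system would presumably layer prediction and automated remediation on top of a detection signal like this one.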

In this AI-powered ITSM world, the “ticketing system” as we know it fades into the background. Employees would rarely open tickets manually because issues are preempted, resolved automatically, or communicated through user-friendly channels. Every employee experiences uninterrupted workflows, whether they’re in the office, working remotely, or traveling.

While there are many benefits for companies and employees that adopt AI-enabled ITSM products, companies building in this space also have new advantages. Historically, ITSM products had long deployment cycles of 6-12+ months, and building customized workflows for a specific business took significant time. Simply put, it was hard to convince customers to migrate. Now with AI, there is an opportunity to shorten implementation and migration by translating workflows from legacy systems more easily and by using natural language to create workflows, achieving higher workflow coverage faster.

Companies building in this space include Atomic Work, Console, Espressive, Fixify, Moveworks, Ravenna, Risotto, Serval, and XOps.

Incident response

The broader incident response platform market is around $38B, comprising AIOps at $5B, runbooks and IT process automation at $11B, and alerting and incident coordination at $22B. PagerDuty generated $430M in revenue in FY2024.

Incident response and the DevOps process are tightly coupled. Incident response is a structured approach to managing and addressing unplanned events in the developer environment. DevOps includes the monitoring of code repositories, development environments, CI/CD pipelines, and production servers. Over time, this can be managed autonomously by AI agents.

Imagine a DevOps environment where AI-powered incident response continuously monitors these environments. The AI tool analyzes every line of code, every commit, and every deployment in real time. If a new bug, configuration error, or potential security vulnerability is detected, the AI flags it instantly, preventing it from escalating into an incident. This reimagines observability, making it preemptive and event-driven. Importantly, AI-enabled incident response can not only highlight issues early but also identify the root cause, dynamically generate a remediation runbook and dashboards, and, in a future state, take action to solve the issue.

In this world, the development pipeline is more than a series of tools and processes—it’s a continuously running, self-healing system. The autopilot AI ensures that environments are secure, issues are resolved or rolled back in real time, and deployments are predictable and reliable. Code quality is maintained effortlessly, and production stability is no longer a constant source of anxiety for developers and DevOps engineers.

Buyers are interested in AI-enabled root cause analysis and incident remediation for numerous reasons.

1) Skillsets: In the past, infrastructure engineers (DevOps, SRE, and Ops) managed production issues. Now more than ever before, developers are responsible for their own DevOps and incident response (“you build it, you run it”). They are incentivized to be good at shipping code to production, not necessarily at fixing issues in production. In turn, they may not have the depth of experience or pattern matching to quickly resolve an issue.

2) Lots of context to understand: Today's infrastructure systems are incredibly complex with microservices and multiple vendors. Most developers don't have a full understanding of the entire environment. Companies often have more than five observability vendors. Individuals have to dig through logs, code, and observability data to understand the service, what’s wrong, and why. This is often performed under time pressure, adding stress to the process.

3) Incidents are often unique: Each time an incident happens, engineers have to go to a runbook they’ve probably never used before or come up with a solution themselves under pressure. Incidents are also snowflakes: each one can be different, and the most painful incidents are often ones that haven’t been seen before, so no runbook exists. This requires incident responders to be creative and can lead to longer remediation times, which damage the company's brand and customer experience and cost revenue.

With AI incident response products, teams are able to decrease MTTR, uplevel engineers, loop in fewer senior engineers, reduce organizational risk, minimize toil, and cut costs.

Exciting businesses taking on this challenge are 100x, Cleric, Deductive, Flip.ai, Resolve, and Traversal.

QA

We estimate QA testing is a $10.5B market, including outsourced and domestic in-house QA teams. QA testing is the 7th most outsourced role in the U.S. (~54K), representing over $5B in value. Tricentis makes over $300M annually in low-margin services revenue.

QA testing involves several manual processes, including test planning, manual exploratory testing, test generation, bug identification/logging, and verification/regression testing. This is done across website, mobile, and application testing environments. These are all tasks that LLMs and agents will eventually excel at.

There is a technical unlock enabling an AI QA autopilot. AI web agents can traverse websites and understand the publicly accessible code in the DOM. Vision models and VLMs can inspect websites for inconsistencies. AI models can generate test scripts for automation frameworks like Puppeteer by translating natural language into code. AI can identify whether test failures are due to changes in user behavior, issues within the test code, or bugs in the application code. This helps reduce the time it takes to build the workflows, and improves the accuracy of the tests created and the maintenance of the tests themselves.

A world with autopilot QA would mean that after a developer finishes coding a feature, an auto-QA engineer would surface potential bugs or exploits (potentially even fixing low-level ones) and provide a near-instant report for a human to review in a pull request. It would shift QA left, consolidate QA teams, and move outsourced work to a few big companies that employ agents and LLMs for 80 to 90% of the labor.

Compelling approaches to tackling this problem, in various parts of the market, include Heal.dev, Meticulous, Momentic, OctoMind, QA.tech, Ranger, Robin, and Spur.

Challenges

We foresee several challenges to attacking these three markets, including:

System uniqueness. IT, internal ticketing, and production environments are complex and heterogeneous. Each system has unique failure points and dependencies, making it challenging for an AI to recognize them and account for them in its own model of the environment.
Automation vs. decision making. Any tool will have to solve the false positive and false negative problem. The system has to be precise because the stakes for its decisions are very high. With too many false alerts, the system will quickly erode trust with end users. Starting with a human-in-the-loop design for high-risk actions (similar to an FDE/Palantir model) could solve this.
Infrastructure inertia/rip and replace. These systems (ServiceNow, Atlassian, PagerDuty/Datadog, Tricentis/BrowserStack) are all deeply ingrained in a customer’s environment. There are years of technical debt and alignment with these tools. The inertia to maintain the status quo and skepticism of AI tooling will be powerful. We believe the strategies around this are to target a fast-growing set of startups (as future customers) or sit on top of existing systems (system of intelligence), with the long-term goal of supplanting the system in a customer’s workflow.
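The human-in-the-loop idea for high-risk actions, and the false positive/negative trade-off, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the action names in `HIGH_RISK_ACTIONS`, the `decide` function, and the 0.9 confidence threshold are hypothetical and not drawn from any specific product.

```python
from dataclasses import dataclass

# Hypothetical set of actions that should always require human sign-off.
HIGH_RISK_ACTIONS = {"rollback_deploy", "restart_database", "revoke_access"}

@dataclass
class ProposedAction:
    name: str
    confidence: float  # the agent's confidence in [0, 1]

def decide(action: ProposedAction, auto_threshold: float = 0.9) -> str:
    """Return 'auto_execute' or 'needs_approval' for a proposed action."""
    if action.name in HIGH_RISK_ACTIONS:
        return "needs_approval"  # high-risk actions always go to a human
    if action.confidence >= auto_threshold:
        return "auto_execute"    # low-risk and confident: act directly
    return "needs_approval"      # low-risk but uncertain: ask first

def precision_recall(tp: int, fp: int, fn: int) -> tuple:
    """Precision falls as false alerts (fp) rise; recall falls as
    real issues are missed (fn)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

if __name__ == "__main__":
    print(decide(ProposedAction("rollback_deploy", 0.99)))
    print(decide(ProposedAction("clear_cache", 0.95)))
    print(precision_recall(tp=8, fp=2, fn=2))
```

Tracking precision and recall over time is one way a team could decide when to widen the set of actions that run without approval.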

What we’re looking for

Felicis’ investments in companies like Poolside, Semgrep, and Sourcegraph in adjacent markets reflect the kind of opportunity that excites us. We look for:

A novel technical architecture encompassing the latest technical advancements and a way to ingest or generate proprietary data to train and refine their models.
Experience solving these problems at scale or a sophisticated knowledge of the end user and their problems.
Founding teams with the right mix of Zuckerberg’s early product sense (and vision) and Slootman’s scalability.

We'd love to hear from you if you are building in these markets. Please reach out to us at astasia@felicis.com and tobi@felicis.com.