Gen AI quality lifecycle

Overview

The rise of AI is transforming knowledge bases and CMS platforms. Our team is adapting our knowledge base for AI agents and establishing end-to-end quality checkpoints.

This initiative presented a unique challenge: navigating uncharted territory without established best practices. Our team had to forge its own path to achieve a successful outcome.

The rise of AI-powered chat has transformed customer support, making it crucial for teams to understand AI agent performance and gain actionable insights for improvement.

As we deepened our understanding of issues and their root causes, the team established a quality lifecycle and its main building blocks to help our business partners improve product quality over time.

Challenges

How do we fix these problems and improve the quality of generative AI? Along the way, we encountered a few challenges.

Lack of actionable insights

The sheer volume of sessions and data makes it difficult to identify key areas for improvement.

HITL quality eval is not trustworthy

Human-in-the-loop (HITL) quality evaluations by agents were perceived as having a high error rate and being too slow to be useful.

No visibility into RAG

The lack of transparency in retrieval-augmented generation (RAG) makes it impossible to understand underlying issues and implement effective solutions.

Complex process for knowledge updates

The complex and time-consuming content creation process needs streamlining.

CrowdCompute

Improve the accuracy and velocity of the human evaluation tool for Gen AI quality assessment.

We redesigned the human quality evaluation tool's UI with the following considerations (see the sketch after this list):

  • Accuracy: Clear metric definitions, guidelines, and detailed evidence for each metric.
  • Velocity: Side-by-side view and streamlined selection.
  • Focus: Elimination of unnecessary survey questions, concentrating on key quality-driving metrics.
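As an illustration, here is a minimal sketch of how metric definitions, guidelines, and rater evidence could be modeled; the EvalMetric and HumanRating names and the example metrics are assumptions, not the tool's real schema.

```python
# Minimal sketch (assumed schema): each metric carries the definition and
# guideline shown to raters, and each rating records the cited evidence.
from dataclasses import dataclass

@dataclass
class EvalMetric:
    name: str          # e.g. "factuality"
    definition: str    # shown to raters as the metric definition
    guideline: str     # rating guideline shown alongside the response
    scale: tuple = (1, 5)

@dataclass
class HumanRating:
    session_id: str
    metric: str
    score: int
    evidence: str      # rater-cited evidence supporting the score

# Focused metric set: only the key quality-driving metrics are surveyed.
METRICS = [
    EvalMetric(
        name="factuality",
        definition="Is every claim in the response supported by the retrieved knowledge?",
        guideline="Score 5 only if all claims are grounded; cite the unsupported claim otherwise.",
    ),
    EvalMetric(
        name="completeness",
        definition="Does the response fully answer the user's question?",
        guideline="Score 3 if the main question is answered but follow-up details are missing.",
    ),
]
```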

Content Health

Monitor content performance and gain actionable insights with these key features (a small aggregation sketch follows the list):

  • Performance trend visualization on key metrics.
  • Actionable insights derived from performance data, human evaluations, and autorater feedback.
  • Flexible views tailored to specific business needs, including dedicated dashboards for Gen AI performance, knowledge source performance, and human evaluation results.
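To make the second bullet concrete, here is a minimal sketch of how human evaluations and autorater feedback might be blended into a weekly trend; the record fields and the 70/30 blend weights are illustrative assumptions, not the dashboard's actual logic.

```python
# Minimal sketch (assumed data shape): blend human and autorater scores per week.
from collections import defaultdict
from statistics import mean

def weekly_trend(records, metric="factuality"):
    """records: iterable of dicts like
    {"week": "2024-W18", "source": "human" or "autorater", "metric": "factuality", "score": 0.82}
    Returns {week: blended_score}, weighting human ratings above autorater scores.
    """
    by_week = defaultdict(lambda: {"human": [], "autorater": []})
    for r in records:
        if r["metric"] == metric:
            by_week[r["week"]][r["source"]].append(r["score"])

    trend = {}
    for week, scores in sorted(by_week.items()):
        human = mean(scores["human"]) if scores["human"] else None
        auto = mean(scores["autorater"]) if scores["autorater"] else None
        if human is not None and auto is not None:
            trend[week] = 0.7 * human + 0.3 * auto   # assumed blend weights
        else:
            trend[week] = human if human is not None else auto
    return trend
```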

Diagnostic Console

The diagnostic console centralizes quality information, enabling root cause analysis of inaccurate generative AI responses. Key features (a diagnostic sketch follows the list):

  • Comprehensive quality insights: View human and autorater evaluations to quickly assess response quality.
  • Deep RAG diagnostics: Understand retrieved content via the Diagnose tab.
  • Flexible experimentation: Test hypotheses and optimize performance by adjusting content, ranking, or retrieved knowledge in the Experiments tab.
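As a rough illustration of the kind of check the Diagnose tab could surface, this sketch flags answer sentences with little lexical overlap with any retrieved passage; the function name, inputs, and overlap heuristic are assumptions, not the console's real API.

```python
# Minimal sketch (assumed heuristic): flag answer sentences that lack
# overlapping support in the retrieved passages.
def diagnose_response(answer_sentences, retrieved_passages, min_overlap=3):
    """Return answer sentences with too little word overlap with any retrieved passage."""
    unsupported = []
    passage_words = [set(p.lower().split()) for p in retrieved_passages]
    for sentence in answer_sentences:
        words = set(sentence.lower().split())
        best = max((len(words & pw) for pw in passage_words), default=0)
        if best < min_overlap:
            unsupported.append(sentence)
    return unsupported

# An experiment could then adjust the content, ranking, or retrieved knowledge
# and re-run the same check to see whether the unsupported sentences disappear.
```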

Composer

Empower your team to quickly address knowledge gaps by creating and publishing AI-only Q&A content. These Q&A pairs demonstrably improved generative answer factuality across all key products in 2024.
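For illustration, here is a minimal sketch of what an AI-only Q&A entry and its publish step might look like; the QAPair schema and the publish_to_knowledge_base helper are hypothetical, not Composer's actual interface.

```python
# Minimal sketch (hypothetical schema): an AI-only Q&A pair added to the
# retrieval index so the AI agent can ground its answers on it.
from dataclasses import dataclass

@dataclass
class QAPair:
    question: str
    answer: str
    product: str
    ai_only: bool = True   # served to the AI agent, not shown on help pages

def publish_to_knowledge_base(pair: QAPair, index: list) -> None:
    """Hypothetical publish step: validate the pair and add it to the retrieval index."""
    assert pair.question and pair.answer, "question and answer are required"
    index.append(pair)

index = []
publish_to_knowledge_base(
    QAPair(
        question="How do I reset my device to factory settings?",
        answer="Open Settings > System > Reset, then choose 'Factory reset' and confirm.",
        product="example-device",
    ),
    index,
)
```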