Case Study: Generative AI Chatbot for Annuity Customer Service Representatives

Author: Tony Ojeda

This case study explores the development and implementation of a Generative AI chatbot designed to assist customer service representatives (CSRs) at a nationwide annuities provider. The chatbot aimed to improve CSR efficiency and effectiveness when addressing complex customer inquiries regarding annuity products.

The solution used the OpenAI API with GPT-4 as the large language model, Pinecone as the vector database, Amazon RDS as the relational database, and both Weights & Biases and HumanLoop for activity logging. The model and version were selected through rounds of testing, while many of the other tools were chosen by the client to integrate with their existing systems and processes.

Client Problem and Requirements

The annuities provider faced challenges related to call handle times, hold times, and CSR workload, leading to customer dissatisfaction and employee burnout. The complexity of annuity products often required CSRs to consult various resources and internal help lines, resulting in longer call durations and increased hold times for customers.

To address these challenges, the client outlined specific requirements for the AI chatbot solution:

Performance KPIs:

Reduction in call handle time
Reduction in hold time
Decrease in customer hold instances
Decrease in internal helpline calls
Improved quality scores
Increased CSR job satisfaction
Reduced CSR turnover rate

Business Needs:

High accuracy and consistency in responses
Low latency for real-time interactions
Alignment with existing internal systems
Transparent process for ongoing maintenance and monitoring
Comprehensive reporting dashboard
User-friendly interface for CSRs

Our Approach

To meet the client’s unique needs, we developed a Generative AI chatbot solution composed of three core components: a pre-processing pipeline, the chatbot application itself, and a logging and reporting dashboard.

Pre-processing Pipeline:

Document Processing: The client’s annuity product information existed in various formats, including PDFs and images. We implemented Optical Character Recognition (OCR) to convert images into machine-readable text and then employed a format detection step to handle both PDFs and text documents efficiently.
Chunking Strategy: To improve consistency and preserve context, we split documents along their predefined sections. Longer sections were broken down further into smaller chunks, and shorter sections were combined to keep processing efficient. This approach kept related information together, making it easier to retrieve relevant context and generate accurate responses.
Information Extraction: Each document chunk was processed by GPT with a custom prompt designed to extract key information and associated metadata (e.g., document type, information category). This structured data was then stored in both a Pinecone vector database and a relational database for efficient access and querying.
Automation and Maintenance: The entire pre-processing pipeline was automated with an AWS Lambda function, which detected new documents, processed them, and moved them to a separate directory upon completion. To keep the data accurate and maintainable, we implemented a versioning step that checked the databases for existing information and replaced it with the newly extracted data as needed.
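
The chunking strategy described above can be sketched roughly as follows. This is a minimal illustration rather than the production pipeline: the Chunk class, the chunk_sections and split_long helpers, and the character thresholds are all assumptions made for the example, and sections are assumed to arrive as (title, text) pairs after OCR and format detection.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Chunk:
    titles: List[str]  # section titles covered by this chunk
    text: str


def split_long(text: str, max_chars: int) -> List[str]:
    """Greedily pack paragraphs into pieces of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole rather than
    being cut mid-sentence.
    """
    pieces, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            pieces.append(current)
            current = para
        else:
            current = candidate
    if current:
        pieces.append(current)
    return pieces


def chunk_sections(
    sections: List[Tuple[str, str]],
    min_chars: int = 200,   # hypothetical threshold for a "short" section
    max_chars: int = 1200,  # hypothetical threshold for a "long" section
) -> List[Chunk]:
    """Split long sections, merge adjacent short ones, keep the rest as-is."""
    chunks: List[Chunk] = []
    buf_titles: List[str] = []
    buf_text = ""

    def flush() -> None:
        # Emit any buffered short sections as one combined chunk.
        nonlocal buf_titles, buf_text
        if buf_text:
            chunks.append(Chunk(buf_titles, buf_text))
        buf_titles, buf_text = [], ""

    for title, text in sections:
        text = text.strip()
        if len(text) > max_chars:
            flush()
            for piece in split_long(text, max_chars):
                chunks.append(Chunk([title], piece))
        elif len(text) < min_chars:
            candidate = f"{buf_text}\n\n{text}" if buf_text else text
            if len(candidate) > max_chars:
                flush()
                candidate = text
            buf_titles.append(title)
            buf_text = candidate
        else:
            flush()
            chunks.append(Chunk([title], text))
    flush()
    return chunks
```

Keeping the section titles on each chunk is one way to carry the metadata that later steps (information extraction and retrieval) could filter on.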

Chatbot Application:

User Interface: We initially used a Streamlit app for rapid prototyping and hypothesis testing. Once the chatbot’s performance was validated, we developed a customized, scalable web application with a user-friendly interface tailored to the client’s needs.
Product Specificity: To give accurate and relevant responses, the chatbot needed to know which annuity product was being discussed. Users entered a contract number (automatic population was planned for a future release), which was used to look up the relevant customer contract and product information via a web service. All subsequent searches and responses were then filtered to that specific product.
Question Routing & Hallucination Mitigation: To maintain consistency and avoid AI “hallucinations” (fabricated information), we implemented a question routing system. User questions were first sent to GPT with a prompt designed to categorize the question’s intent. This categorization acted as a guardrail, preventing the LLM from using unrelated information to answer questions outside the application’s scope. In such cases, the chatbot provided predefined responses guiding the user back to relevant topics.
Contextual Response Generation: For questions within the scope of the application, the system retrieved the most relevant chunks of information from the Pinecone database based on semantic similarity. These chunks, along with the user’s question, were then injected into a custom prompt for the LLM, allowing it to generate a response based on the provided context.
Calculation Functionality: The chatbot could also perform predefined calculations based on customer information. To ensure accuracy, calculation requests entered a separate “state” within the application. In this state, the calculators were exposed to the LLM as tools: Python functions that it could call to perform the calculation. The LLM prompted the user for the required input values, then called the function and returned the result.
Feedback System: We integrated a feedback mechanism with thumbs up/down ratings and text input options for users to provide detailed feedback. This feedback loop helped to continuously improve the chatbot’s performance and user experience.
Additional Features: We also included quick response options for users to request simplified explanations or view the context behind the chatbot’s responses. Additionally, the chatbot proactively suggested potential follow-up questions based on the conversation, allowing users to explore topics further with a simple “yes” response.
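
The question routing and contextual response steps described above might fit together roughly as in the sketch below. It is illustrative only: the intent labels, the predefined replies, and the handle_question and build_prompt helpers are hypothetical, and the three callables (classify, retrieve, generate) stand in for the LLM intent prompt, the product-filtered vector search, and the LLM completion call. The separate calculation state is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical intent labels; the real categories are not listed in this case study.
IN_SCOPE_INTENTS = {"product_question"}

OUT_OF_SCOPE_REPLY = (
    "I can only help with questions about this annuity product. "
    "Try asking about its features, fees, or withdrawal rules."
)
NO_CONTEXT_REPLY = "I could not find that in the documents for this product."


@dataclass
class Reply:
    intent: str
    text: str
    context: List[str]  # chunks shown to the LLM, kept for "view context" and logging


def build_prompt(question: str, chunks: List[str]) -> str:
    """Inject the retrieved chunks and the user's question into one prompt."""
    numbered = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{numbered}\n\nQuestion: {question}"
    )


def handle_question(
    question: str,
    product_id: str,
    classify: Callable[[str], str],
    retrieve: Callable[[str, str, int], List[str]],
    generate: Callable[[str], str],
    top_k: int = 3,
) -> Reply:
    # Guardrail: classify intent first, and never retrieve or generate
    # for questions outside the application's scope.
    intent = classify(question)
    if intent not in IN_SCOPE_INTENTS:
        return Reply(intent, OUT_OF_SCOPE_REPLY, [])

    # Retrieval is restricted to the product tied to the contract number.
    chunks = retrieve(question, product_id, top_k)
    if not chunks:
        return Reply(intent, NO_CONTEXT_REPLY, [])

    return Reply(intent, generate(build_prompt(question, chunks)), chunks)
```

Passing the three steps in as callables keeps the routing logic testable without network access; in a real deployment they would wrap the model and vector store clients.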

Logging and Reporting Dashboard:

Transparency and Traceability: To ensure transparency and provide oversight, we implemented comprehensive logging across various platforms. Every step of each interaction was logged, including conversation IDs, contract numbers, employee IDs, user questions, question routing results, retrieved context, prompts used, LLM responses, token usage, and feedback ratings. This granular logging enabled detailed analysis and auditing of the chatbot’s interactions.
Reporting and Evaluation: We developed a Tableau dashboard to provide the client with both high-level metrics and automated evaluation capabilities. Automated scripts periodically evaluated batches of conversations, assessing metrics such as hallucination rate, adherence to instructions, feedback ratings, and other relevant performance indicators. These metrics were then displayed on the dashboard for easy review and analysis.
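
As a rough illustration of the logging and batch evaluation described above, the sketch below defines one log record per interaction and a summary function of the kind a periodic script could feed into a dashboard. The field names follow the items listed above, but the InteractionLog class, the to_json_line and summarize helpers, and the hallucination_flag field (assumed to be filled in later by an automated evaluator) are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


@dataclass
class InteractionLog:
    conversation_id: str
    contract_number: str
    employee_id: str
    question: str
    route: str                    # question routing result
    retrieved_context: List[str]
    prompt: str
    response: str
    total_tokens: int
    feedback: Optional[int] = None             # +1 thumbs up, -1 thumbs down, None if unrated
    hallucination_flag: Optional[bool] = None  # assumed output of an automated evaluator


def to_json_line(record: InteractionLog) -> str:
    """Serialize one interaction as a single JSON line for a log sink."""
    return json.dumps(asdict(record), sort_keys=True)


def summarize(batch: List[InteractionLog]) -> Dict[str, Optional[float]]:
    """Aggregate a batch of interactions into dashboard-style metrics."""
    rated = [r for r in batch if r.feedback is not None]
    evaluated = [r for r in batch if r.hallucination_flag is not None]
    return {
        "interactions": float(len(batch)),
        # Share of rated interactions that received a thumbs up.
        "thumbs_up_rate": (
            sum(r.feedback == 1 for r in rated) / len(rated) if rated else None
        ),
        # Share of evaluated interactions flagged by the evaluator.
        "hallucination_rate": (
            sum(bool(r.hallucination_flag) for r in evaluated) / len(evaluated)
            if evaluated
            else None
        ),
        "avg_tokens": (
            sum(r.total_tokens for r in batch) / len(batch) if batch else None
        ),
    }
```

Rates are computed only over interactions that actually have a rating or an evaluation, so unrated conversations do not dilute the metrics.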

Additional Considerations:

Integration with Client Environment and Systems: The solution was developed and implemented within the client’s environment, leveraging their preferred tools and platforms, and was integrated with various internal systems for authentication, data augmentation, and security purposes.
LLM Selection, Optimization, and Refinement: We tested various LLMs and versions to identify the model that performed best for this project. Through rounds of user testing, we refined the chatbot’s tone, level of detail, and overall communication style to align with user expectations and preferences.
Scalability and Future Development: The solution was built with scalability in mind, incorporating features such as A/B testing and conversation states, laying the foundation for expansion to larger user groups and additional use cases.

Conclusion

The Generative AI-powered chatbot successfully addressed the client’s unique requirements, improving key performance indicators and enhancing the overall customer service experience. The pre-processing pipeline ensured accurate and consistent information retrieval, while the chatbot application provided a user-friendly interface with features designed to reduce hallucinations and improve the quality of responses. The logging and reporting system offered the client the necessary transparency and oversight to monitor the chatbot’s performance and make data-driven decisions for future improvements.
