Testing strategy

Testing your assistant is critical to the success of your application and to how the application reflects on your brand.

Types of testing

The different types of testing include the following:

  • Functional testing

    • Verify that the conversation flow is correct, as designed and described in the journey diagram. E.g. when a user selects an option, make sure that the next prompt is the one designed for that option

    • Test that integrations work

  • NLU testing

    • For intent-based NLU services, test that known user utterances (those used as sample utterances) are correctly classified, e.g. through automated testing and a confusion matrix (see the sketch after this list)

    • When LLMs are used as intent classifiers/extractors, test the prompts/examples

  • Coverage testing

    • Get user (and stakeholder) feedback on the extent of functionality and use case coverage. E.g. perhaps an outcome that is currently sent to a human to resolve could be handled automatically

    • Generate possible user utterances, using LLMs or input from target users, and test them for coverage and correct intent recognition

  • Usability testing

    • Get user feedback on ease of use, how intuitive and clear the assistant is, the wording, and so on. Key questions: is it easy to use, is it useful (USERindex), and how likely are users to use the assistant again and to recommend it?
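
As an illustration of the automated NLU testing described above, the following is a minimal sketch that scores an intent classifier against known sample utterances and prints a confusion matrix. The classify_intent function, the utterances, and the intent names are placeholders to adapt to your own assistant; the sketch assumes scikit-learn is available.

    # Minimal sketch of automated intent-classification testing with a
    # confusion matrix. classify_intent() is a placeholder stand-in for
    # your NLU service or LLM-based classifier; replace it with a real call.
    from sklearn.metrics import classification_report, confusion_matrix


    def classify_intent(utterance: str) -> str:
        """Placeholder keyword classifier; swap in your NLU/LLM service."""
        text = utterance.lower()
        if "cancel" in text:
            return "cancel_order"
        if "parcel" in text or "order" in text:
            return "track_order"
        return "handover_to_agent"


    # Known sample utterances paired with the intents they should map to.
    test_set = [
        ("I want to cancel my order", "cancel_order"),
        ("where is my parcel", "track_order"),
        ("let me talk to a person", "handover_to_agent"),
    ]

    expected = [intent for _, intent in test_set]
    predicted = [classify_intent(utterance) for utterance, _ in test_set]

    # Rows are expected intents, columns are predicted intents; off-diagonal
    # counts reveal which intents the classifier confuses with each other.
    labels = sorted(set(expected) | set(predicted))
    print(confusion_matrix(expected, predicted, labels=labels))
    print(classification_report(expected, predicted, labels=labels))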

When to test

The motto should be: "Test early and often".

The different types of testing apply throughout the development and live stages of your assistant. The following sections describe the types of testing to complete at each stage of the development lifecycle.

Prototype testing

Once a prototype is available, testing can start. Some types of testing to undertake are:

  • Functional testing to ensure that the conversation flow is correctly implemented and that integrations work as expected (see the sketch after this list)

  • Usability testing to get user feedback on the interaction with the assistant

  • Coverage testing to discover additional utterances and intents that make the NLU interactions more robust
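
As a concrete example of the functional testing mentioned above, here is a minimal pytest-style sketch that checks one step of the designed flow. The AssistantClient class, the "Track my order" option, and the expected prompt text are all hypothetical; wire the client to your assistant's real API and assert the prompts from your own journey diagram.

    # Minimal functional-test sketch, pytest style. AssistantClient is a
    # hypothetical wrapper around your assistant's messaging API; its canned
    # replies stand in for real API calls and should be replaced.


    class AssistantClient:
        """Hypothetical client; replace the canned replies with real calls."""

        def start_conversation(self) -> str:
            return "Hi! How can I help? You can say 'Track my order'."

        def send(self, message: str) -> str:
            # Canned reply standing in for the assistant's real response.
            return "Sure. What is your order number?"


    def test_option_leads_to_designed_prompt() -> None:
        # Per the journey diagram, selecting the "Track my order" option
        # should be followed by the prompt asking for an order number.
        client = AssistantClient()
        client.start_conversation()
        reply = client.send("Track my order")
        assert "order number" in reply.lower()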

Testing in development

Functional, usability, NLU, and coverage testing can all be performed on sprint deliverables.

Once a beta version is available, end-to-end testing of the experience becomes feasible and critical.

Testing a live deployment

Once an assistant is live, continued testing is needed.

This includes recurring automated testing to ensure continued quality, performance, and security, especially following updates or other changes to the system or its integrations.
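
One way to implement such recurring testing is a regression script that replays saved conversation transcripts and flags replies that drift from the expected ones, run on a schedule via CI or cron. The transcript format and the AssistantClient below are assumptions for the sketch, not a prescribed format.

    # Sketch of a recurring regression check that replays stored transcripts.
    # Transcript files are assumed to be JSON lists of
    # {"user": ..., "expected": ...} turns; adapt this to your own format.
    import json
    from pathlib import Path


    class AssistantClient:
        """Hypothetical client, as in the earlier sketches."""

        def start_conversation(self) -> None:
            raise NotImplementedError("Connect to your assistant's API")

        def send(self, message: str) -> str:
            raise NotImplementedError("Connect to your assistant's API")


    def replay(transcripts_dir: str = "transcripts") -> list[str]:
        failures = []
        for path in Path(transcripts_dir).glob("*.json"):
            client = AssistantClient()
            client.start_conversation()
            for turn in json.loads(path.read_text()):
                reply = client.send(turn["user"])
                if turn["expected"] not in reply:
                    failures.append(f"{path.name}: expected {turn['expected']!r}")
        return failures


    if __name__ == "__main__":
        problems = replay()
        for problem in problems:
            print("REGRESSION:", problem)
        # Non-zero exit code lets CI or cron flag the run as failed.
        raise SystemExit(1 if problems else 0)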

Consider using analytics, including the Analyze functionality in the OD platform, to gather data and improve the robustness of the assistant based on the insights gained.
