Test and tweak your AI Agent

Your AI Agent is almost ready to launch! Before you publish it, let’s make sure it’s optimised for the best user experience.

Introduction

While building out your AI Agent, you have set up a number of components along the way that all contribute to its success.

In this section, we will dive deeper into the different tests you can run within the OpenDialog platform and the tools that can help you debug issues or improve the quality of your AI Agent.

There are a few types of tests that you will want to run before launching your AI Agent:

  • Functional testing: This type of testing can identify issues with logic, flow, or executed actions that might disrupt user interactions. You can perform functional tests in OpenDialog through Preview and the associated visualiser.

  • Conversation design testing: This type of testing will help you identify gaps in the conversational flow while you are building out your AI Agent. In OpenDialog, you can do this at every step, using the conversational player functionality in the action menu of the Conversation feature.

  • Classification testing: This type of testing identifies issues with how your semantic classifier classifies user utterances. OpenDialog's language services section provides a testing panel for each language service, so you can test it in isolation from any scenario.

  • Response Generation testing: This type of testing identifies issues with the response that is generated by the LLM Action. OpenDialog's LLM Action feature provides a testing panel to test the LLM Action directly.

See it in action

Follow along to debug your AI Agent, step by step

Step-by-step guide

Before you start testing

In order to gain systematic improvements through quality assurance testing, there are a few things we recommend you prepare upfront.

  1. Prepare your test cases

    Prepare a number of conversational flows a user could go through using your AI Agent, covering a wide range of possibilities. If your AI Agent is mainly based on free-flow conversations, source a large set of questions a user could ask. If you have access to real-world data of what users are asking on other channels, you can use this as a basis for your testing.

  2. Set up a test sheet to document your findings

    In order to document your testing and share it with other testers, we recommend setting up a test sheet. This will allow you to keep track of the issues you find, prioritise them, and address them in order of priority.

Example structure of a test sheet:

Example of the setup of an AI Agent test sheet
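If you prefer to generate the skeleton of your test sheet programmatically rather than building it by hand, the minimal sketch below (plain Python, with made-up columns and test cases, not tied to any OpenDialog API) writes a starter CSV that you can import into your spreadsheet tool of choice:

```python
import csv

# Hypothetical columns for an AI Agent test sheet; adjust them to your own process.
COLUMNS = [
    "Test ID", "Test case / user utterance", "Expected behaviour",
    "Actual behaviour", "Status", "Priority", "Notes",
]

# Example test cases; replace these with flows and questions from your own scenario.
test_cases = [
    ("TC-001", "Hi, I'd like to change my delivery address",
     "Matches the (hypothetical) update_address intent and asks for the new address"),
    ("TC-002", "What are your opening hours?",
     "RAG-backed answer quoting the opening hours from the knowledge source"),
    ("TC-003", "asdfgh",  # deliberately unintelligible input
     "Graceful no-match message inviting the user to rephrase"),
]

with open("ai_agent_test_sheet.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(COLUMNS)
    for test_id, utterance, expected in test_cases:
        # Leave the outcome columns blank; testers fill them in as they go.
        writer.writerow([test_id, utterance, expected, "", "Not run", "", ""])

print(f"Created ai_agent_test_sheet.csv with {len(test_cases)} test cases")
```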

Test your AI Agent

First and foremost, you will want to test your AI Agent as a whole in an experience close to that of your future end-users. You can do this in OpenDialog through our Preview feature.

Preview your AI Agent and visualise its interactions in real time

Armed with your test cases, you can now start interacting with your AI Agent through Preview: type user questions into the input field that reads 'Enter your message', or use the interaction options presented to you in the webchat frame.

Identify issues & define how to remedy them

Depending on the response you receive, a few different issues can be the cause of the conversational flow not proceeding as expected.

Before being able to remedy the issue, you will need to diagnose what is going on. To do so, the best place to start is the Conversation visualiser in Preview.

Here is a quick overview of some of the primary issues you could encounter:

No-match

No classified intent was matched, and therefore the NoMatch intent is triggered

Although you will have tailored your conversation design to handle this error gracefully, as shown in the above example, you do want to understand why this has happened.

This usually indicates that there is an issue around the interpretation of the user utterance to the appropriate intent.

This can either be an issue with your conversation design (the conversation engine could not find potential intents to service) or with your semantic classifier (the interpreter could not match the intent based on its classification).

Checking the conversation engine selection will give you a first view into what has triggered the no-match response.

No available intents

Next, you will want to check the conversation flow to make sure intents were available for the conversation engine to select from.

A few reasons why no intents might be available at a certain point of the conversation are:

  • The selected behaviours of your conversational components mean no starting or open components are available. Check the behaviours set on turns, scenes, and conversations to remedy this.

  • Conditions on conversational components do not allow the conversation engine to consider any of the relevant components.

  • The app-user cadence has not been respected: while looking for a user intent, only app intents were made available to the conversation engine (or vice versa).

No match due to semantic classification

When a no-match occurs because the interpreter could not match the user utterance to one of the intents from your semantic classifier, you will need to investigate further by testing the user utterance and its variations against your semantic classifier in isolation.

Once you have identified the issue with the semantic classification, you can start tweaking the instructions for your different intents, adding an intent if needed, and testing the user utterances again in the classifier's test panel, following the steps above.
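If you note down the classifier's test panel results for each utterance variation, a small script can summarise where the classification struggles. The sketch below is plain Python with hypothetical intent names; the expected and observed intents are values you record manually from the test panel, not fetched from any API:

```python
from collections import defaultdict

# (utterance, expected intent, intent observed in the classifier's test panel)
# All intent names below are hypothetical examples.
results = [
    ("I want to change where my parcel is delivered", "update_address", "update_address"),
    ("Can you send it somewhere else?",               "update_address", "track_order"),
    ("Where is my order right now?",                  "track_order",    "track_order"),
    ("Cancel the whole thing please",                 "cancel_order",   "no_match"),
]

per_intent = defaultdict(lambda: {"pass": 0, "fail": 0})
for utterance, expected, observed in results:
    outcome = "pass" if expected == observed else "fail"
    per_intent[expected][outcome] += 1
    if outcome == "fail":
        print(f"FAIL: '{utterance}' expected {expected}, got {observed}")

# A per-intent summary highlights which intents need better instructions or more examples.
for intent, counts in per_intent.items():
    total = counts["pass"] + counts["fail"]
    print(f"{intent}: {counts['pass']}/{total} correct")
```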

No messages found with passing conditions

A conversation that has a failing message due to conditions

In this case, when looking for a response message to serve, no messages were available for the conversation engine to serve to the user. This usually means that all of the candidate messages or app intents have conditions that were not met.

The first place to check is the conversational visualiser (see above for how to check the conversation engine reasoning), which will tell you which conditions passed and which didn't.

If all conditions passed, and a response intent did get matched, the next place to check is the message editor.

Check the conditions on all the messages related to the matched app intent and make sure all potential use cases are covered.

For example, if a message has the condition 'attribute is set to true', you should equally have one for the opposite use case ('attribute is false'), or a catch-all message with no conditions at all.
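To make the idea of condition coverage concrete, here is a minimal, purely illustrative sketch in plain Python (it mimics the selection logic in spirit only, it is not OpenDialog's engine): without a catch-all message, any attribute value you did not anticipate leaves the engine with nothing to serve.

```python
# Hypothetical candidate messages, each with an optional condition on a user attribute.
messages = [
    {"text": "Great, let's continue with your order.",
     "condition": lambda attrs: attrs.get("has_order") is True},
    {"text": "It looks like you don't have an order yet.",
     "condition": lambda attrs: attrs.get("has_order") is False},
    # Catch-all with no condition; remove it to reproduce the coverage gap.
    {"text": "Sorry, I couldn't find the right answer for you.",
     "condition": None},
]

def select_message(attrs):
    """Return the first message whose condition passes (None means it always passes)."""
    for message in messages:
        condition = message["condition"]
        if condition is None or condition(attrs):
            return message["text"]
    return None  # no message with passing conditions -> nothing to serve

print(select_message({"has_order": True}))  # served by the first message
print(select_message({}))                   # attribute never set: only the catch-all saves this case
```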

Wrong/unexpected response

The response that was generated by the application in the response message might be wrong or not what you expected.

This can be due to a number of reasons, but mostly comes down to the following:

  • Misclassification: the user utterance did not get classified as expected, and therefore the related response for the wrong intent is served. This can be solved by following the steps mentioned in the No-match section of this documentation.

  • Erroneous response generation: the response generated through the RAG service and the LLM Action is erroneous. To pinpoint the exact issue and remedy it, you will need to follow these steps:

    • Test the RAG service in isolation

    • Test the LLM Action

  • Wrong attribute: the attribute referenced in the message that is being served is not the correct one. You can verify this by checking the message in the message editor and making sure the referenced attribute is the same one set by the LLM Action.

If you can see that the wrong response comes from your RAG service, you can remedy this by reviewing the document you used and checking whether it contains the correct information. If it does not, update the sources you use for this topic and re-vectorise.

If the document does contain the correct information but it is not returned correctly in the response, go to the RAG service settings to adjust specific settings such as the model used, Top K (how many document chunks are retrieved as context), temperature (how much variation is allowed in the generated response), etc. For more information on these settings, you can read the more in-depth documentation about RAG service settings.

These are just a few of the many things you can test using the extensive test tooling available for each component of OpenDialog, helping you avoid issues along the way.
