Test and tweak your AI Agent

Your AI Agent is almost ready to launch! Before you publish it, let’s make sure it’s optimised for the best user experience.

Introduction

While building out your AI Agent, you have set up a number of components along the way that all contribute to its success.

In this section, we will dive deeper into the different tests you can run within the OpenDialog platform and the tools that can help you debug issues or improve the quality of your AI Agent.

There are a few types of tests that you will want to run before launching your AI Agent:

  • Functional testing: This type of testing can identify issues with logic, flow or executed actions that might disrupt user interactions. You can perform functional tests in OpenDialog through Preview and the associated visualiser.

  • Conversation design testing: This type of testing will help you identify gaps in the conversational flow while you are building out your AI Agent. In OpenDialog, you can do this at every step, using the Conversational Player functionality in the action menu of the Conversation feature.

  • Classification testing: This type of testing identifies issues with how your semantic classifier classifies user utterances. OpenDialog's Language Services section provides a testing panel for each language service, so you can test it in isolation from any scenario.

  • Response Generation testing: This type of testing identifies issues with the response that is generated by the LLM Action. OpenDialog's LLM Action feature provides a testing panel to test the LLM Action directly.

See it in action

Video coming soon

Step-by-step guide

Before you start testing

In order to gain systematic improvements through quality assurance testing, there are a few things we recommend you prepare upfront.

  1. Prepare your test cases

    Prepare a number of conversational flows a user could go through using your AI Agent, covering a wide range of possibilities. If your AI Agent is mainly based on free-flow conversations, source a large set of questions a user could ask. If you have access to real-world data of what users are asking on other channels, you can use this as a basis for your testing.

  2. Set up a test sheet to document your findings

    In order to document your testing, and to share your findings with other testers, we recommend setting up a test sheet. This will allow you to keep track of the issues you find, prioritise them, and address them in order of importance.

Example structure of a test sheet:
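One possible structure, which you can adapt to your project (the columns below are only a suggestion, not a required format):

  • Test ID: a unique reference for the test case

  • Test case / flow: the conversational flow or question being tested

  • User utterance(s): the exact message(s) you typed

  • Expected response: what the AI Agent should say or do

  • Actual response: what the AI Agent actually said or did

  • Status: pass, fail, or needs review

  • Priority: how urgently the issue should be addressed

  • Notes: suspected cause, owner, links to screenshots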

Test your AI Agent

First and foremost, you will want to test your AI Agent as a whole in an experience close to that of your future end-users. You can do this in OpenDialog through our Preview feature.

Navigate to your scenario's preview

  • If you are not already in your scenario section, hover over 'Scenarios' in the left-hand navigation panel, and select your scenario

  • Hover over 'Test' in the left-hand navigation panel and select Preview

  • View your AI Agent loading in the bottom left-hand corner of the central panel of your screen.

Armed with your test cases, you can now start interacting with your AI Agent through the Preview, by typing user questions into the input field that reads 'Enter your message' or by using the interaction options presented to you in the webchat frame.

Identify issues & define how to remedy them

If the conversational flow does not proceed as expected, a few different issues could be at the root of the problem, depending on the response you receive.

Before you can remedy the issue, you will need to diagnose what is going on. To do so, the best place to start is the Conversation visualiser in Preview.

Here is a quick overview of some of the primary issues you could encounter:

No-match

Although you will have tailored your conversation design to handle this error gracefully, you still want to understand why it has happened.

This usually indicates an issue with how the user utterance was interpreted and matched to the appropriate intent.

This can be an issue either with your conversation design (the conversation engine could not find potential intents to service) or with your semantic classifier (the interpreter could not match the utterance to an intent based on its classification).

Check the conversation engine reasoning

  • Click on the 'Considered Path' tile on the right-hand side of the central panel, right next to the webchat window.

  • View the conversation visualiser

  • Using the arrows positioned on either side of the central panel in the middle of the screen, or the links at the bottom, browse through the different steps the conversation engine has taken to arrive at a no-match response

  • Check the available, selected and rejected intents in the right-hand panel for each step.

Checking the conversation engine's selection will give you a first view of what triggered the no-match response.

No available intents

Next, you will want to check the conversation flow to make sure intents were available for the conversation engine to select from.

Check the conversation flow

  • Using the navigation menu on the left-hand side of your screen, hover over 'Design' and select 'Conversation'

  • View the conversation design view

  • Click on the Play ▶️ button in the action menu on the bottom center of your screen

  • The Conversational Player modal will open up in the right-hand panel

  • Walk through the conversational flow you want to test by clicking on the available intents presented at the bottom of your screen

  • If you come to a point where 'No intents are available', this is an indication that your conversational flow is broken at this point and that the conversation engine has no intents to choose from

A few reasons why no intents might be available at a certain point of the conversation are:

  • Selected behaviours of your conversational components (no starting or open components are available). Check the behaviour on turns, scenes and conversations to remedy this.

  • Conditions on conversational components do not allow the conversation engine to consider any of the relevant components.

  • The app-user cadence has not been respected: while the conversation engine was looking for a user intent, only app intents were made available to it (or vice versa). See the sketch after this list.
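The short sketch below is purely illustrative (it is not OpenDialog's internal code, and the intent names are hypothetical); it simply shows how behaviours, conditions and the app-user cadence combine to leave the conversation engine with nothing to select.

```python
# Illustrative sketch only: not OpenDialog internals; names are hypothetical.
# The engine can only select intents that belong to an open component,
# whose conditions pass, and that match the side of the app-user cadence
# it is currently expecting.

def available_intents(candidates, expecting, context):
    return [
        intent for intent in candidates
        if intent["speaker"] == expecting                        # cadence respected
        and intent["component_open"]                             # behaviours allow entry
        and all(cond(context) for cond in intent["conditions"])  # conditions pass
    ]

candidates = [
    # An app intent in an open component with no conditions...
    {"name": "intent.app.Welcome", "speaker": "app",
     "component_open": True, "conditions": []},
]

# ...but the engine is currently waiting for a *user* intent,
# so nothing is available and a no-match follows.
print(available_intents(candidates, expecting="user", context={}))  # -> []
```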

No match due to semantic classification

When a no-match is due to the fact that the interpreter could not match the user utterance to one of the intents from your semantic classifier, you will need to test that user utterance and its variations against your semantic classifier in isolation.

Test semantic classification

  • Using the left-hand navigation menu, hover over 'Language Services' and select the language service you are using for your scenario

  • View the semantic classifier and its list of intents and sub-intents

  • In the right-hand panel, paste the user utterance that caused a no-match into the input field and hit 'Enter'

  • View the detailed results from the semantic classifier in the right-hand panel

  • Click on the Inspect link in the right-hand panel to view the exact returned response from the LLM for further information

Once you have identified the issue with the semantic classification, you can start tweaking the instructions for your different intents, add an intent if needed, and test the user utterances again in the classifier's test panel, following the steps above. A few illustrative variations are sketched below.
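The utterances below are hypothetical examples; use phrasings from your own domain. The point is to cover rephrasings, synonyms, typos and indirect ways of asking.

```python
# Hypothetical example utterances: replace with phrasings from your own domain.
variations = [
    "Can I change my delivery address?",            # the original utterance
    "I need to update where my order is shipped",   # rephrasing / synonyms
    "change adress on my order",                    # common typo
    "Is it too late to send it somewhere else?",    # indirect ask
]
```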

No messages found with passing conditions

In this case, when looking for a response message to serve, the conversation engine found no messages available to serve to the user. This is usually because all of the available messages or app intents have conditions that were not met.

The first place to check is the conversational visualiser (see above for how to check the conversation engine reasoning), which will tell you which conditions passed and which didn't.

If all conditions passed, and a response intent did get matched, the next place to check is the message editor.

Check message conditions

  • Using the navigation menu on the left-hand side of your screen, hover over 'Design' and select 'Messages'

  • Using the filter buttons in the top right corner of your screen, filter down to the turn that contains the app intent that did not serve the correct message

  • At the bottom of each message tile, you can view the conditions for each message template by clicking on the arrow next to Conditions.

Check the conditions on all the messages related to the matched app intent and make sure all potential use cases are covered.

For example, if you have a condition on a message such as 'if attribute is set to true', then you should equally have one for the opposite use case ('if attribute is false'), or a catch-all message with no conditions at all. A small sketch of this kind of coverage follows below.
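In the sketch below, the attribute, message names and condition logic are hypothetical and do not use OpenDialog's condition syntax; the point is that every value the attribute can take, including 'not set at all', leaves at least one message with passing conditions.

```python
# Hypothetical sketch of condition coverage; not OpenDialog's condition syntax.
messages = [
    {"name": "OrderConfirmed", "condition": lambda ctx: ctx.get("order_placed") is True},
    {"name": "OrderNotPlaced", "condition": lambda ctx: ctx.get("order_placed") is False},
    {"name": "Fallback",       "condition": lambda ctx: True},  # catch-all: no conditions
]

def messages_with_passing_conditions(ctx):
    return [m["name"] for m in messages if m["condition"](ctx)]

print(messages_with_passing_conditions({"order_placed": True}))  # ['OrderConfirmed', 'Fallback']
print(messages_with_passing_conditions({}))                      # ['Fallback']: only the catch-all passes

# Without the catch-all, an unset attribute would leave the conversation
# engine with no message to serve.
```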

Wrong/unexpected response

The response that was generated by the application in the response message might be wrong or not what you had expected.

This can be due to a number of reasons, but mostly comes down to the following:

  • Misclassification: the user utterance did not get classified as expected, and therefore the related response for the wrong intent is served. This can be solved by following the steps mentioned in the No-match section of this documentation.

  • Erroneous response generation: the response generated through the RAG service and the LLM Action is erroneous. To pinpoint the exact issue and remedy it, you will need to follow these steps:

    • Test the RAG service in isolation

    • Test the LLM Action

  • Wrong attribute: the attribute referenced in the message that is being served is not the correct attribute. You can verify this by checking the message in the message editor and making sure the referenced attribute is the same one as the one set by the LLM Action.

Test the RAG service

  • Using the left-hand navigation menu, hover over 'Language Services' and select the RAG service you want to review

  • View the RAG service and its list of topics

  • Click into the topic the LLM Action generating the erroneous response is referencing

  • In the right-hand panel, paste the user utterance that caused the response into the input field and hit the 'Run test' button

  • View the detailed results from the RAG service in the right-hand panel

  • Click on the Inspect link in the right-hand panel to view the exact returned response for further information

If you can see the wrong response coming from your RAG service, you can now remedy this by reviewing the document you used to see if it contains the correct information. If it does not, update the sources you use for this topic and re-vectorise.

If the document does contain the correct information but it is not returned correctly in the generated answer, go to the RAG service settings to adjust specific settings such as the model used, Top K or temperature. For more information on these settings, you can read the more in-depth documentation about RAG service settings. The sketch below illustrates what these settings typically control.
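This is a generic sketch of the standard retrieval-augmented generation pattern, not OpenDialog's implementation; refer to the RAG service settings documentation for the exact behaviour of each setting.

```python
# Generic illustration of the standard RAG pattern; not OpenDialog code.
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    score: float  # similarity between the chunk and the user utterance

def retrieve(chunks: list[Chunk], top_k: int) -> list[Chunk]:
    """Top K controls how many of the best-matching document chunks are
    passed to the LLM as context: too low and the answer may miss relevant
    facts; too high and irrelevant text can dilute the prompt."""
    return sorted(chunks, key=lambda c: c.score, reverse=True)[:top_k]

def build_prompt(question: str, context: list[Chunk]) -> str:
    joined = "\n\n".join(chunk.text for chunk in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"

# Temperature is applied at the model call itself: values close to 0 keep
# answers deterministic and tied to the retrieved context, while higher
# values make them more varied but more likely to drift from the sources.
# The model setting swaps out which LLM generates the final answer.
```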

Test the LLM Action

  • Using the left-hand navigation menu, hover over 'Scenarios' and select the scenario for your AI Agent

  • Hover over Integrate, and select LLM Actions

  • View the LLM Actions from the LLM Actions overview, and select the one that is run on the intent that yielded the wrong response

  • In the tabs in the middle of the screen, select 'Prompt Configuration'

  • Review the system prompt in the system prompt section to detect if anything in the prompt might be yielding the wrong result

  • In the right-hand panel, paste the user utterance that caused the wrong response in the input field, and hit the 'Run test' button

  • View the detailed results from the LLM Action in the right-hand panel

  • Click on the Inspect link in the right-hand panel to view the exact returned response for further information and to detect any unexpected responses or behaviour

These are just a few of the many things you can test using the extensive test tooling available for each component of OpenDialog, helping you avoid issues along the way.
