A Process Handling AI Agent
OpenDialog Pattern for managing multiple steps in a process through an AI Agent
Conversational AI Agents typically need to support three modes of interaction:
Handling a wide range of questions across a number of topics.
Guiding the user through a specific process with multiple steps while executing appropriate actions in backend systems to support that process.
A combination of question-answering coupled with stepping through a process.
The Quick Start AI Agent and the Start from Scratch AI Agent are both focussed on handling questions on a wide range of topics. Here we delve into the second mode, handling processes, and look at how it can be combined with question-answering.
Template available
To get started with the Process Handling Agent in OpenDialog visit the "Create a new scenario" page and select the Process AI Agent template.
What's special about processes?
Handling processes needs specific attention because they have a well-defined series of steps (in a specific sequence) and a well-defined end goal. For example, if we need to collect a few different pieces of information in order to finalise an appointment booking for a user, we cannot achieve that final goal until all the pieces of information are collected (just as would happen in the real world).
As such, while we can allow the user to ask questions and go "off-topic" during a process, we also need to design a conversation that drives back towards the key goal (or enables the user to abandon the process).
In addition, we need to carefully consider what types of questions we want to be able to handle while in a process. While we could cover any range of topics, realistically we want to focus on issues that are relevant to a process. Just as in a real-life conversation you would not veer wildly off-topic if you are trying to book an appointment, for example, we similarly need to bring focus to our processes. Our AI Agent needs to be empowered to say no to some things so that it can focus on the process at hand.
The Process Handling Agent illustrates the OpenDialog conversation patterns we use to support such processes in OpenDialog and can be a great starting point for your process handling AI Agents.
Representing Process Steps as Scenes in OpenDialog
A process could be anything from an appointment booking to an insurance claims process to the triage of a customer support issue. It typically consists of a series of steps, with each step achieving a specific sub-goal of the overall process. For example, in appointment booking, we might first want to pick who to book an appointment with, then examine their availability and finally pick a specific date.
A user will move from one step to the next, completing the process, although they can also move backwards and repeat steps if they change their mind.
In OpenDialog we suggest representing the overall Goal as a Conversation while each step of the Process is a Scene in that Conversation.
So an Appointment Booking Conversation for a doctor would look something like this:
Each scene has a clear goal. At each scene (or step) the user can complete the goal and move forward, or they can abandon the process and return to the Topic Conversation.
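To make the structure concrete, here is a minimal Python sketch of the Conversation-as-goal / Scene-as-step idea. The scene names and goals mirror the appointment booking example; this is an illustration only, not OpenDialog's actual schema or API.

```python
# Illustrative model of a process: one Conversation (the overall goal)
# made up of Scenes (the steps). Not OpenDialog's real data model.
from dataclasses import dataclass, field

@dataclass
class Scene:
    name: str   # human-friendly, e.g. "Step 1 - Select Doctor"
    goal: str   # the sub-goal this step achieves

@dataclass
class Conversation:
    name: str
    scenes: list = field(default_factory=list)

booking = Conversation("Appointment Booking")
booking.scenes += [
    Scene("Step 0 - Welcome", "Confirm the user wants to book"),
    Scene("Step 1 - Select Doctor", "Pick who to book an appointment with"),
    Scene("Step 2 - Check Availability", "Examine the doctor's availability"),
    Scene("Step 3 - Pick Date", "Choose a specific date"),
]
print([s.name for s in booking.scenes])
```

Each scene owns exactly one sub-goal, which is what makes it easy to re-enter, reorder or replace individual steps later.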
Name your scenes in easy-to-understand, human-friendly ways so that you can easily track what is going on!
It is worth noting that in the Topic Conversation we have a specific Topic, called Book An Appointment, which detects when the user asks to book an appointment and directs them to the Appointment Booking Conversation.
How "big" or "small" should a sub-goal be?
Sub-goals can involve asking for just one piece of information, or multiple elements. For example, you can imagine a single scene asking for both the person to book an appointment with and the preferred date for the appointment.
There is no specifically wrong or right answer. These are design choices based on what you are trying to achieve, how you are collecting the information, which backend integrations you are calling at the same time (i.e. which API actions) and how you design your LLM Actions (prompts) to collect information. If in doubt, we recommend starting with smaller, discrete steps for more control and then considering whether they should be combined.
A Sample Process Scene
Every scene in the Appointment Booking Conversation (or every step in our Process) follows the same pattern. Let's examine that pattern here:
Starting Turn: Step Introduction
There is one starting turn called Step Introduction. The purpose of this turn is to set up the scene for our user and explain the sub-goal. It will present the user with an appropriate welcome to the step and explain what we are trying to achieve.
The Step Introduction turn starts with an APP intent. There is no direct response within the turn. This means that the Conversation Engine will send the message associated with the APP intent to the user, then set the context to the overall scene, waiting for input from the user and looking for turns with the Open behavior to handle user utterances.
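The flow just described can be sketched as follows. This is a hedged illustration of the selection logic, not the real OpenDialog Conversation Engine; the `send` helper, the dictionary shape and the turn names are all assumptions made for the example.

```python
# Illustrative sketch: a Step Introduction turn sends its APP message,
# then context widens to the scene and only Open turns are considered
# for the next user utterance. Not OpenDialog's actual engine code.
sent = []

def send(message):
    """Stand-in for delivering a message to the user."""
    sent.append(message)

scene = {
    "starting_turn": {"app_intent": {"message": "Let's book your appointment."}},
    "turns": [
        {"name": "Continue or Complete Step", "behavior": "open"},
        {"name": "Internal Routing"},  # no Open behavior: skipped for user input
    ],
}

def step_introduction(scene):
    # Send the starting APP intent's message to the user...
    send(scene["starting_turn"]["app_intent"]["message"])
    # ...then wait at scene scope, matching utterances against Open turns.
    return [t for t in scene["turns"] if t.get("behavior") == "open"]

open_turns = step_introduction(scene)
print([t["name"] for t in open_turns])  # ['Continue or Complete Step']
```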
Hold on - how do we even get to Step Introduction turn?
Great question! Remember that the Topic Conversation has a Book An Appointment topic. A USER intent from there transitions us to the Appointment Booking Conversation. The conversation engine will then look for a starting scene (our first step!) within the Appointment Booking Conversation, with a starting turn (our Step Introduction turn), and since there is an APP intent there it will use it to formulate a response to the user.
This approach gives you several benefits. Since you are directing the user from one conversation to the "top" of another conversation without "hard-coding" a specific scene or turn, you completely isolate the two. If you want to introduce a different starting scene with different behavior you can just concentrate on the Appointment Booking Conversation. You can also have multiple starting scenes with different conditions based on the type of user or the amount of information you already have about what needs to happen in the process.
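The idea of multiple starting scenes gated by conditions can be sketched like this. The condition (whether a preferred doctor is already known) and the scene names are hypothetical, purely to illustrate the selection described above:

```python
# Illustrative only: picking a starting scene based on what we already
# know about the user. The "preferred_doctor" attribute is a made-up
# condition for this sketch, not an OpenDialog construct.
def pick_starting_scene(user):
    if user.get("preferred_doctor"):
        # Doctor already known: skip straight to availability.
        return "Step 2 - Check Availability"
    # Otherwise begin at the top of the process.
    return "Step 0 - Welcome"

print(pick_starting_scene({"preferred_doctor": "Dr. Smith"}))
print(pick_starting_scene({}))
```

Because the Topic Conversation only points at the "top" of the Appointment Booking Conversation, adding a condition like this never requires touching the Topic Conversation itself.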
Ok, with the introduction to the step out of the way let's turn our attention to the other turns.
Every other turn in the scene will start with a USER intent, since each represents a potential response to that introduction turn with the APP intent.
We will start with the Continue or Complete Step turn.
Continue or Complete Step
Imagine the following dialog:
This dialog got the user from the Topic conversation and dropped them into the appointment booking process. At this point the user can confirm that they are happy to proceed, or they might change their mind. While we have buttons to help the user they can, of course, just express that using natural language.
The purpose of the Continue or Complete Step turn is to define what happens if the user confirms (or, in general, says something or performs an action that leads to the conclusion that this step has been achieved).
For example, consider the following interaction:
The "Continue or Complete" turn has the "Confirm" incoming intent that led us from "Step 0 - Welcome" to "Step 1 - Select Doctor" and the "WelcomeToDoctorChoice" outgoing intent.
There are two reasoning steps that led us to this outcome: we first acknowledge the user's choice and we then move the conversation to the next step. By separating them we can adjust them to the user and context with fine-grained control.
Confirm user choice
The conversation engine interpreted "sure buddy let's go ahead" as a Confirm intent (using a Semantic Classifier). That Confirm intent had an APP response within the turn as shown below. Since the user confirmed, we then provide a response, which is "Great, let's get started".
Move conversation context to next step
Now, turn your attention to the Virtual Intent defined on the ConfirmResponse intent.
We are defining a Virtual Intent called GoToNextStep. We use this virtual intent to separate the confirmation turn and confirmation text ("Great, let's get started") from the turn that decides a. where to take the user next and b. what message to display after the user has changed context. This leads to more flexible, natural conversation while keeping us in control!
The GoToNextStep intent is in an Open turn in our Step 0 scene (the "User Routing" turn) and provides a transition to the next step - in this case the Step 1 Scene. In Step 1 a starting turn with a starting APP intent presents us with the doctor choice.
This same pattern allows us to move forwards, backwards or in any direction we need in the process. We can, for example, also re-enter Step 1 from, say, Step 2 if the user chooses to go back a step, but with different connecting text - e.g. something like "Let's get you back to doctor choice" - without having to change Step 1. Virtual intents enable us to simulate user requests, keeping the USER-APP or APP-USER cadence, and flexibly connect different contexts while re-using the same conversation elements.
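The routing just described can be sketched as a small lookup from (current scene, virtual intent) to next scene. The route table, intent names and transcript format are illustrative assumptions, not OpenDialog configuration:

```python
# Hedged sketch of the virtual-intent pattern: after the APP response we
# inject a simulated USER intent and let routing decide where to go,
# instead of hard-coding a scene transition.
ROUTES = {
    # "User Routing" Open turn in Step 0: virtual intent -> next scene
    ("Step 0 - Welcome", "GoToNextStep"): "Step 1 - Select Doctor",
    # Re-entering Step 1 from Step 2 with different connecting text
    ("Step 2 - Check Availability", "GoBackToDoctorChoice"): "Step 1 - Select Doctor",
}

def handle_confirm(current_scene):
    transcript = ["APP: Great, let's get started"]  # connecting text, Step 0 context
    virtual = "GoToNextStep"                        # injected USER intent
    next_scene = ROUTES[(current_scene, virtual)]
    transcript.append(f"(virtual USER intent: {virtual})")
    transcript.append(f"-> context moves to {next_scene}")
    return next_scene, transcript

scene, log = handle_confirm("Step 0 - Welcome")
print(scene)  # Step 1 - Select Doctor
```

Note how re-entering "Step 1 - Select Doctor" from Step 2 reuses the same target scene with a different virtual intent, which is exactly what lets the connecting text vary without changing Step 1.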
What is a Virtual Intent?
A Virtual Intent is a USER intent that we explicitly inject in the conversation following an APP intent. It enables us to emulate the user and move the conversation forward as if the user had actually (and not virtually) said something that would be interpreted as that intent.
Virtual intents are useful for a number of reasons:
They enable us to keep a conversation cadence of USER-APP and connect multiple APP intents in a series (up to five) without forcibly chaining the APP intents. We are simulating a conversation by injecting USER intents, which means we have much more flexibility around the outcome. Rather than hard-coding an outcome we are asking the conversation engine to figure out what it would do if the user said something. This is especially useful when we are changing conversation context (i.e. moving from one scene to another).
Since we are connecting APP intents together we can have connecting text (such as "Great, let's get started") that is relevant to the outgoing context - in this case the Step 0 Scene - but not relevant to the incoming next context - the Step 1 Scene. This supports reuse while leaving the conversation fluid and natural.
They enable us to reuse sections of a conversation or pick specific behavior by essentially telling the conversation engine "even if the user said intent X, treat it as if they said intent Y".
Virtual intents are supported in the Analyse view of a conversation and in the conversation player in Conversation Design so you can always see what the impact of this intent injection will be.
Support Intents
We now turn our attention to the three supporting intents in the scene: NoMatch, Question and TalkToHuman.
No Match
This is a local No Match (intent.core.TurnNoMatch) that will be selected if the conversation engine does not find an intent that matches the interpreter output. We can use this turn to recover the conversation, asking the user to rephrase, etc. The output can, of course, be driven by an LLM action and be contextually and conversationally relevant, but it recognises that we were not confident enough to treat the utterance as something else, giving us significant control.
Question
The Question intent will match to general questions and statements based on the definition of our semantic classifier. We can then connect it to an appropriate LLM action to answer the question. We can also decide to refocus the user on the goal through another virtual intent.
The configuration above shows that after we've provided a response to the question we then inject an "AskAboutDoctor" intent.
This intent is in our Routing turn and transitions us back to the top of the same scene we are in, which will, in turn, re-present the doctor choice widget. This enables us to reuse the exact same design and makes it more scalable moving forward, as we can easily make different choices or introduce other steps by intervening at just the right point.
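The answer-then-refocus behaviour can be sketched as below. The `answer_fn` stand-in plays the role of the LLM action, and the re-presented doctor prompt is a made-up message; both are assumptions for illustration only:

```python
# Illustrative sketch of the Question turn: answer the off-topic
# question, then inject the "AskAboutDoctor" virtual intent so the
# scene re-presents its goal. Not real OpenDialog configuration.
def handle_question(question, answer_fn):
    steps = [f"APP: {answer_fn(question)}"]  # e.g. an LLM action in OpenDialog
    # Refocus: the virtual intent routes back to the top of the same scene.
    steps.append("(virtual USER intent: AskAboutDoctor)")
    steps.append("APP: Which doctor would you like to see?")  # widget re-presented
    return steps

log = handle_question(
    "Do you take insurance?",
    lambda q: "Yes, we accept most major insurers.",
)
print(log)
```

The same shape keeps the agent helpful on relevant questions while always driving back to the step's sub-goal.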
TalkToHuman
Finally, the TalkToHuman intent specifically responds to a request to talk to a human agent and hands over the conversation or provides instructions on how to do so, as appropriate.