What does it mean to build a conversational application with OpenDialog?
So far we have covered the underlying concepts behind OpenDialog, walked through creating a scenario, explored a scenario's individual components, and looked at conversational flows.
Here we take a much more practical look to get you started.
The aim of OpenDialog is to empower you to quickly create sophisticated conversational experiences.
This primarily happens in two ways. Firstly, by helping you define and leverage both conversational and wider application context. Secondly, by providing a proactive conversational engine that uses that context definition as much as possible.
In addition, we allow you to weave in interpreters and define actions as required, as a cohesive and coherent part of the overall application.
In this section, we are going to provide a high-level, quick-fire overview of these concepts and in subsequent sections, we will dive into the specific details of each.
The journey of helping you manage context starts with attributes. Attributes are how you describe anything that is relevant to the conversational application and its environment.
Attributes are captured within different contexts. Contexts are like buckets that help us separate and relate different types of attributes. For example, the user context stores attributes related to the user, and the conversation context stores attributes related to the conversation. We will be looking at each in more detail later, but for now just keep in mind that anything describable is an attribute, and each attribute will be stored in a context.
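To make the "bucket" idea concrete, here is a minimal Python sketch of contexts holding attributes. It is an illustration only: the Context class, its set and get methods, and the attribute names first_name and current_scene are hypothetical and are not OpenDialog's own API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Context:
    """A named bucket of attributes (hypothetical, for illustration only)."""

    name: str
    attributes: Dict[str, Any] = field(default_factory=dict)

    def set(self, attribute: str, value: Any) -> None:
        # Store or overwrite an attribute in this context.
        self.attributes[attribute] = value

    def get(self, attribute: str, default: Any = None) -> Any:
        # Read an attribute, falling back to a default when it is absent.
        return self.attributes.get(attribute, default)


# Two separate buckets: one for the user, one for the conversation.
user = Context("user")
conversation = Context("conversation")

user.set("first_name", "Ada")                      # assumed attribute name
conversation.set("current_scene", "welcome")       # assumed attribute name
```

Because each attribute lives in exactly one context, a lookup such as user.get("first_name") never collides with an attribute of the same name stored in the conversation context.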
The next bit is conversational context and flow. The OpenDialog Designer enables you to define both conversational context and flow through a single conceptual framework. The highest-level abstraction is the scenario. A scenario can contain multiple conversations, a conversation contains scenes, and a scene contains turns, with each turn having at least one request intent and possibly response intents.
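The nesting described above can be sketched as a set of plain data classes. This is an assumption-laden illustration rather than OpenDialog's data model: the class fields, the intent names, and the shoe_shop example are all made up for this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Turn:
    name: str
    request_intents: List[str]
    response_intents: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # A turn needs at least one request intent; response intents are optional.
        if not self.request_intents:
            raise ValueError("a turn needs at least one request intent")


@dataclass
class Scene:
    name: str
    turns: List[Turn] = field(default_factory=list)


@dataclass
class Conversation:
    name: str
    scenes: List[Scene] = field(default_factory=list)


@dataclass
class Scenario:
    name: str
    conversations: List[Conversation] = field(default_factory=list)


# Hypothetical example: scenario -> conversation -> scene -> turn.
ask_size = Turn(
    name="ask_size",
    request_intents=["intent.app.orderShoes"],
    response_intents=["intent.app.confirmOrder"],
)
scenario = Scenario(
    "shoe_shop",
    [Conversation("ordering", [Scene("collect_order", [ask_size])])],
)
```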
The conversation engine then uses that definition to decide:
The current conversational state
The possible follow-up states
The conversational state will let us know which (if any) Scenario is currently active, and then within that Scenario which (if any) Conversation is currently ongoing, and then within the Conversation which (if any) Scene is selected and finally within the Scene which (if any) Turn is selected. Each turn has at least one Request Intent and may have Response Intents. Based on the Request Intent the conversation engine will look to select an appropriate response intent or it can perform a Transition to another Conversation or Scene, and attempt to find an appropriate intent there.
Yes, that is quite a journey but it provides us with a very flexible way of capturing context to a useful degree of detail! The OpenDialog Designer makes it very easy to capture the different stages.
The current conversational state then determines the possible next states.
At the highest level, a user is only ever in one of two states. They are either in an ongoing conversation or they are not in an ongoing conversation.
If no ongoing conversation is selected then when an utterance event takes place (an utterance event is just a fancy way to say that someone said something) the job of the conversation engine is to determine what (if any) Turn the user could be placed in (and by consequence what the ongoing conversation is).
To do this the Conversation Engine is going to examine all active scenarios, look at all the conversations that have been given the STARTING behaviour, find all the Scenes that have a STARTING behaviour, then find Turns with a STARTING behaviour.
From those Turns, it is going to use the interpreter associated with each turn to attempt to interpret the user utterance. The interpreter with the highest confidence score will win, and the user will be positioned in that conversational state.
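The selection step can be sketched as follows. This is a simplified model, assuming the starting Turns have already been collected and that each interpreter returns a confidence between 0 and 1. The StartingTurn class, the keyword_interpreter helper, and the rule that a score of 0 means "no match" are assumptions made for this example, not OpenDialog behaviour.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# An interpreter maps an utterance to a confidence score in [0, 1].
Interpreter = Callable[[str], float]


@dataclass
class StartingTurn:
    # (scenario, conversation, scene, turn) names locating the turn.
    path: Tuple[str, str, str, str]
    interpreter: Interpreter


def keyword_interpreter(*keywords: str) -> Interpreter:
    """Toy interpreter: the fraction of keywords present in the utterance."""

    def interpret(utterance: str) -> float:
        words = utterance.lower().split()
        hits = sum(1 for keyword in keywords if keyword in words)
        return hits / len(keywords)

    return interpret


def select_starting_turn(
    turns: List[StartingTurn], utterance: str
) -> Optional[StartingTurn]:
    """Return the turn whose interpreter is most confident, or None."""
    best: Optional[StartingTurn] = None
    best_score = 0.0
    for turn in turns:
        score = turn.interpreter(utterance)
        # Strictly greater: a score of 0 never wins, and ties keep the first turn.
        if score > best_score:
            best, best_score = turn, score
    return best


candidates = [
    StartingTurn(("shop", "welcome", "intro", "greet"), keyword_interpreter("hello")),
    StartingTurn(
        ("shop", "ordering", "collect_order", "ask_size"),
        keyword_interpreter("order", "shoes"),
    ),
]
selected = select_starting_turn(candidates, "I would like to order shoes")
```

Here selected is the ask_size turn, so the user would be positioned in the shop scenario, the ordering conversation, and the collect_order scene.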
At that point, the Conversation Engine will consider which Response Intents are available, or whether any transitions are defined, and move the conversational state appropriately.
If the user is in an ongoing conversation and we have an incoming utterance we will attempt to match that utterance against all the possible turns within a given scene.
If the Conversation Engine does not match any of the possible intents, it creates its own intent, called a No-Match intent, and then attempts to match that. It starts with a TurnNoMatch, which means it was looking for a Turn but did not manage to find one, and it now looks for a Turn that can handle a TurnNoMatch. It then escalates to a SceneNoMatch, then a ConversationNoMatch, and finally the globalNoMatch. This cascading failure allows us to capture "no matches" at the appropriate level and attempt to recover. We will be looking at more specific examples further on in the documentation.
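The escalation order can be illustrated with a short sketch. It assumes that each no-match level may or may not have a handling Turn defined, and it simply returns the first handler found. The resolve_no_match function and the handler names are hypothetical; only the four level names come from the text above.

```python
from typing import Dict, List, Optional

# Escalation order, from the most specific level to the most general one.
ESCALATION: List[str] = [
    "TurnNoMatch",
    "SceneNoMatch",
    "ConversationNoMatch",
    "globalNoMatch",
]


def resolve_no_match(handlers: Dict[str, str]) -> Optional[str]:
    """Return the handling turn for the first level that defines one.

    `handlers` maps a no-match level to the name of a turn that can
    handle it (hypothetical structure, for illustration only).
    """
    for level in ESCALATION:
        if level in handlers:
            return handlers[level]
    return None


# No turn handles TurnNoMatch here, so the scene-level handler is used.
handlers = {
    "SceneNoMatch": "scene_fallback_turn",
    "globalNoMatch": "global_fallback_turn",
}
chosen = resolve_no_match(handlers)
```

Defining a handler at a lower level lets the application recover close to where the misunderstanding happened, while the global handler acts as a last resort.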
A key aspect of conversational behaviour is the interpretation of user utterances. The way OpenDialog handles this is by using interpreters. Interpreters are assigned to individual user-related intents. When a user utterance is provided all the relevant interpreters based on conversational context will be queried and the interpreter with the highest confidence score will be matched.
Interpreters can also generate expected attributes. These are pieces of information that we expect to find within a user utterance. If an interpreter finds an expected attribute, it will store it in the context we specified, which enables us to use it later in the conversation.
For example, for the phrase "I would like to order size 7 shoes", we can have an expected attribute of shoe_size that we can direct the conversation engine to store in the User context. We can then use conditions throughout the conversation to determine whether we have this piece of information or not.
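As a rough sketch of this example, the toy interpreter below extracts shoe_size with a regular expression and returns it alongside a confidence score. A real interpreter would typically rely on a language-understanding service; the regular expression, the scoring rule, and the use of a plain dictionary as the User context are assumptions made purely for illustration.

```python
import re
from typing import Any, Dict, Tuple


def shoe_order_interpreter(utterance: str) -> Tuple[float, Dict[str, Any]]:
    """Toy interpreter: returns (confidence, expected attributes found)."""
    attributes: Dict[str, Any] = {}

    # Look for "size <number>", e.g. "size 7" or "size 7.5".
    match = re.search(r"\bsize\s+(\d+(?:\.\d+)?)\b", utterance, re.IGNORECASE)
    if match:
        attributes["shoe_size"] = float(match.group(1))

    # Made-up scoring: half for mentioning an order, half for finding a size.
    confidence = 0.0
    if "order" in utterance.lower():
        confidence += 0.5
    if match:
        confidence += 0.5
    return confidence, attributes


# A plain dict stands in for the User context in this sketch.
user_context: Dict[str, Any] = {}

confidence, found = shoe_order_interpreter("I would like to order size 7 shoes")
user_context.update(found)

# A condition later in the conversation can check for the attribute.
has_shoe_size = "shoe_size" in user_context
```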
The last pieces of the puzzle are actions. Actions enable us to interact with external services or simply perform some specific type of computation that we can then feed back into the context.
Actions take attributes as their inputs and provide attributes as their outputs. This gives us a consistent way of doing something and then providing that information back to our conversational application so that it can use it in its interactions with the user.
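The attributes-in, attributes-out shape of an action might look like the sketch below. The check_stock action, the run_action helper, the shoe_in_stock attribute, and the hard-coded stock list are all hypothetical stand-ins; a real action would usually call an external service.

```python
from typing import Any, Callable, Dict, List

Attributes = Dict[str, Any]
Action = Callable[[Attributes], Attributes]


def check_stock(inputs: Attributes) -> Attributes:
    """Hypothetical action: is the requested shoe size in stock?"""
    # Stand-in for a call to an external inventory service.
    sizes_in_stock = {6.0, 7.0, 8.0}
    return {"shoe_in_stock": inputs["shoe_size"] in sizes_in_stock}


def run_action(
    action: Action, context: Attributes, input_names: List[str]
) -> Attributes:
    """Read input attributes from a context, run the action, store the outputs."""
    inputs = {name: context[name] for name in input_names}
    outputs = action(inputs)
    # Feed the output attributes back into the context.
    context.update(outputs)
    return outputs


# A plain dict stands in for the User context in this sketch.
user_context: Attributes = {"shoe_size": 7.0}
result = run_action(check_stock, user_context, ["shoe_size"])
```

After the call, user_context holds both the original shoe_size and the new shoe_in_stock attribute, so later conditions and messages can use the result.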