Hello, my name is Mikhail Burtsev and I'm the founder and leader of the DeepPavlov project at the Moscow Institute of Physics and Technology. Today I will tell you about DeepPavlov Agent, which is an open-source framework for multi-skill conversational AI. So let's start with a question: why is multi-skill so important? It is important because customer experience spans multiple domains, like surveys, promotions, campaigns, customer service, technical support, and many others. And usually, to address every single domain, you need a specific skill. This is why we need to build multi-skill digital assistants, and why we need multiple conversational skills in our system. You can see this if you take a look at
e-commerce assistants, which are modern, complex dialogue systems. For example, here is the case of AliMe Assist, the assistant at AliExpress. You can see that it's a hybrid system with many different skills: there is an assistance service with a slot-filling engine, a customer service with a knowledge-graph engine, and a chatting service with a chat engine. So it's a combination of business rules, scripted scenarios, and specific skills addressing different customer needs. So what is the traditional way to build conversational systems right now? The dominant approach is the so-called modular dialogue system. How does it work? We have a user, the user sends some prompt to the system, and this prompt is converted into textual form and fed into the natural language understanding module, which performs basically three functions: domain detection, intent detection, and entity detection in the user input. After this pre-processing, we have a formal description of the user input, which is also called a semantic frame: it contains the intent, here request_movie, and the entities, in this request genre=comedy and date=weekend. Then all this information goes to the dialogue manager. The task of the dialogue manager is first to update the current dialogue state, to integrate this new information into the previous history of the dialogue, and then, given the updated dialogue state, to select the action to perform on the side of the system. So it consists of the dialogue state and the policy or script, which decides what action should be selected given the current dialogue state. In our example, the selected action is request_location, but this action is in some internal system representation, and we need to convert it into a natural language prompt. Here we have the last module of our system, natural language generation, which creates the surface form of our request to the user. So from the action request_location we get the output in natural language: "Where are you?" This is basically how current systems are built. In the NLU part we mostly use neural networks and deep learning models; in the dialogue manager part we have some neural networks and a lot of rules and scripted dialogues; and for natural language generation we mostly have either retrieval models with slot filling, or templates.
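To make this concrete, here is a minimal toy sketch in Python, not taken from any real system, of the modular pipeline just described: NLU produces a semantic frame, the dialogue manager updates the state and selects an action, and NLG turns the action into a surface form. All function, intent, and slot names are illustrative.

```python
# Toy modular dialogue system: NLU -> dialogue manager -> NLG.
# Everything here is illustrative; real modules would be ML models and scripts.

def nlu(utterance: str) -> dict:
    """Toy NLU: in a real system this is domain/intent/entity detection."""
    # "Find me a comedy for the weekend" -> semantic frame
    return {"intent": "request_movie",
            "entities": {"genre": "comedy", "date": "weekend"}}

def dialogue_manager(state: dict, frame: dict) -> str:
    """Update the dialogue state and select the next system action."""
    state.update(frame["entities"])
    state["intent"] = frame["intent"]
    # Policy/script: if we still don't know where the user is, ask for it.
    if "location" not in state:
        return "request_location"
    return "inform_results"

def nlg(action: str) -> str:
    """Template-based generation of the surface form."""
    templates = {"request_location": "Where are you?",
                 "inform_results": "Here is what I found."}
    return templates[action]

state: dict = {}
frame = nlu("Find me a comedy for the weekend")
action = dialogue_manager(state, frame)
print(nlg(action))  # -> "Where are you?"
```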
Okay, so what does the AI assistant lifecycle look like? How do we build our digital assistants, our dialogue systems, with this modular technology? Usually we start with some MVP, a minimal viable product: for NLU we have some features and some models pre-trained for the domain, and on the side of the dialogue manager we have a few scripts. It's a very nice and clean architecture, we understand how it works, and it covers the most important aspects of the interaction between the system and the user. Then we deploy this MVP into production, and the system starts to interact with users. Here we understand that we need to increase the coverage of the system, because users ask the same questions differently due to language variability, so we need to add more features and make the natural language understanding part of our system more complex. We also want to cover more functions, so we add more scripts on the side of our dialogue manager. And we continue with more features and more scripts, more features, more scripts, until we reach the so-called mature AI assistant stage, which is actually a mess of features and scripts. This is a solution that has already reached the maximum complexity it can have, because of all these interdependent components. So now you are in a position where you cannot grow your product anymore. What we want to do with our framework, with DeepPavlov Agent, is to break this picture: we want to go beyond this complexity ceiling of the current technology. In our vision, the AI assistant lifecycle starts with the same simple, clean, nice MVP. Then what you do is test it and add it to the already deployed system as just one of the conversational skills. And if you want to add more functionality to your system, you just create another conversational skill and add it to your agent. This allows you to decompose complexity between the agent, which is basically a skill orchestration framework, and the conversational skills.
This gives you a very nice microservice architecture, which can scale to a much more complex, mature AI assistant. It also gives you many nice features. For example, you can have default skills, which you don't need to develop yourself; you just plug in your own skills alongside them. And, as I've said, it's a very scalable architecture, because every skill is deployed as a microservice.
It's also very handy because, when you create a new product or want to create a new skill similar to one you already have, you can just reuse the old one and extend it for the new function or to integrate your product. And, which is also important in our global development culture right now, complex solutions are usually built by distributed teams.
This architecture of skill orchestration and the modular structure of your conversational agent allow you to distribute the maintenance and development of separate skills to different teams, which makes your work and the coordination between skills much more organized and much more efficient. So this is our vision: we want to have conversational skills and a skill orchestration level. What are we doing right now to implement this vision? We have started with the DeepPavlov library. The DeepPavlov library is an open-source library for building
NLP pipelines and conversational skills for conversational AI. You can have specific
NLP models like named entity recognition, coreference resolution, intent recognition, insult detection, question answering, dialogue policy, dialogue history, language models, and so on. Then, with our framework, you can combine these different components into conversational skills for specific domains and specific tasks. For example, you can have a task-oriented skill like restaurant booking, a factoid skill which answers factoid questions, and a chit-chat skill. And then we have the DeepPavlov Agent framework, which orchestrates these skills.
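As a rough illustration, this is how running a single DeepPavlov component such as NER typically looks. The exact config name and output format depend on the library version, so treat this as a sketch rather than exact API documentation.

```python
# Build and run one DeepPavlov NLP pipeline (NER) from a bundled config.
# Config names and output shapes vary across DeepPavlov versions.
from deeppavlov import build_model, configs

ner = build_model(configs.ner.ner_ontonotes_bert, download=True)
tokens, tags = ner(["DeepPavlov was created at MIPT in Moscow"])
print(list(zip(tokens[0], tags[0])))  # token/tag pairs for the first sentence
```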
As an example of this architecture, I would like to present the
architecture of the DREAM socialbot, which was built with DeepPavlov and the DeepPavlov Agent framework for the Alexa Prize challenge. Our team, as a university team, participated in the Alexa Prize challenge last year; we were selected as one of 10 teams out of about 350 applications to develop a solution that is hosted inside Amazon Alexa and can be invoked with the "Alexa, let's chat" command. We used our DeepPavlov Agent to build this DREAM socialbot, so let's take a look at how it works. First we have the user input, and this user input goes to the so-called annotators. This is the first stage of processing; the annotators are also implemented as DeepPavlov
NLP pipelines and run as
microservices, and they are used to extract information from the user input. After that, the annotated user input goes into the dialogue state, and the dialogue state is like a shared memory between all the
microservices, such as annotators, skills, and so on. The annotated user input and the current dialogue state are used by the skill selector to decide which skills are most relevant to the current dialogue state. For the competition we developed about 25 different skills, for example for weather, movies, books, general chat, and so on. Then, as I've said, only a subset of these skills is selected; these skills are executed and produce response candidates. Every skill outputs a response candidate and a confidence in its own response, and all these candidates go to the candidate annotators. We need these annotators to make sure that no candidate might be harmful to our users, so we perform toxicity detection, dialogue termination detection, and blacklist filtering of the response candidates. After that we have annotated response candidates and perform the final selection: the response selector picks the final output of our system, and this response can also be post-annotated and presented to the user. So, as you see, it's a multi-skill system, and all the elements of our pipeline run asynchronously. All the annotators and skills run asynchronously as
microservices, with only two points of synchronization: the skill selector and the response selector. And our dialogue state, as I've said, serves as a shared memory.
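Here is a highly simplified, self-contained sketch of that orchestration loop. It is not the actual dp-agent implementation; all names and the shape of the shared state are illustrative, but it shows the idea: annotators run on the input, a skill selector picks a subset of skills, the selected skills run asynchronously and return (response, confidence) candidates, candidate annotators filter them, and a response selector picks the final reply.

```python
# Illustrative orchestration loop for a multi-skill agent (not dp-agent code).
import asyncio

async def annotate(utterance: str) -> dict:
    # Real annotators: NER, intent detection, toxicity, etc., as microservices.
    return {"text": utterance, "intents": ["chit-chat"], "toxic": False}

def skill_selector(state: dict, annotations: dict) -> list:
    # Route to a small subset of the available skills based on annotations.
    return [chitchat_skill, weather_skill] if "chit-chat" in annotations["intents"] else [weather_skill]

async def chitchat_skill(state: dict) -> tuple:
    return ("Nice to meet you! What do you like to talk about?", 0.8)

async def weather_skill(state: dict) -> tuple:
    return ("I can tell you the weather if you give me a city.", 0.3)

def candidate_annotators(candidates: list) -> list:
    # Toxicity / blacklist filtering of response candidates would happen here.
    return [c for c in candidates if "badword" not in c[0]]

def response_selector(candidates: list) -> str:
    # Pick the candidate with the highest confidence.
    return max(candidates, key=lambda c: c[1])[0]

async def agent_turn(state: dict, utterance: str) -> str:
    annotations = await annotate(utterance)
    state.setdefault("history", []).append(annotations)   # shared dialogue state
    skills = skill_selector(state, annotations)
    candidates = await asyncio.gather(*(skill(state) for skill in skills))
    return response_selector(candidate_annotators(list(candidates)))

print(asyncio.run(agent_turn({}, "Hi there!")))
```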
So what do we have right now, and what do we want to build? We have what we can call the DeepPavlov ecosystem. It's an
ecosystem of our products. On the left-hand side here you can see our DeepPavlov library, which is a library for creating
NLP pipelines; you can also include third-party
NLP models like Hugging Face Transformers or NVIDIA NeMo as components of your
NLP pipelines. You can then deploy all of these NLP pipelines as microservices in the cloud; for your AI assistant, for example, these can be annotators or some NLU components.
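For example, a pipeline deployed as a REST microservice can be queried like this. The launch command in the comment and the endpoint and payload shape follow DeepPavlov's riseapi defaults as far as I recall, so please verify them against your version's documentation.

```python
# Query a DeepPavlov pipeline exposed as a REST microservice.
# Assumed launch command (riseapi mode, port 5000, with model download):
#
#   python -m deeppavlov riseapi ner_ontonotes_bert -p 5000 -d
#
import requests

resp = requests.post(
    "http://localhost:5000/model",           # assumed default riseapi route
    json={"x": ["DeepPavlov is developed at MIPT in Moscow"]},
)
print(resp.json())  # e.g. tokens and tags for an NER pipeline
```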
Then we have what we call DeepPavlov Dream. Why do we call it DeepPavlov Dream? Because we want to open source all the skills we developed for the competition and provide a default distribution for a conversational agent that can be used by others. Then you don't need to develop your own chit-chat skill or basic skills like weather and so on; for your solution you can just reuse our skills and then add your own task-oriented skills. So, as I've said, DeepPavlov Dream is like a repository of different skills and skill templates, and you can also use third-party skills here, because our
architecture allows you to integrate other skills via an
API. So you can use RASA or AIML skills in your pipeline as part of DeepPavlov Dream.
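As a sketch of what such an integration might look like, here is a minimal HTTP wrapper around an external skill. The /respond route and the request/response format shown here are assumptions made for illustration only; the real contract is defined by your agent configuration.

```python
# Minimal sketch: wrap an external skill (e.g. an AIML or RASA bot) as an HTTP
# microservice that the agent can call. Payload format is assumed, not canonical.
from flask import Flask, request, jsonify

app = Flask(__name__)

def my_external_skill(last_utterance: str) -> tuple:
    # Call into RASA, an AIML interpreter, or any other engine here.
    return ("Hello from a wrapped third-party skill!", 0.5)

@app.route("/respond", methods=["POST"])
def respond():
    dialogs = request.json["dialogs"]          # assumed request shape
    results = []
    for dialog in dialogs:
        text, confidence = my_external_skill(dialog["utterances"][-1]["text"])
        results.append({"text": text, "confidence": confidence})
    return jsonify(results)                     # one candidate per dialog

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```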
You then run all these skills as microservices, and in the center here we have DeepPavlov Agent, which orchestrates all these components, annotators and skills, to produce the final conversational experience of your AI assistant. As I've said, we want to create something that can be seen as an analog of an open-source operating system. But here we have not an operating system but a conversational or dialogue system: we have applications, and these applications are conversational skills; we have services, like annotators; and we have the
user experience. So you can add your applications to make the
user experience good. We also want to create an open hub for the exchange of skills, to make life easier for every developer of complex conversational assistants: if we exchange general-purpose skills, it will make development of a solution much easier and prototyping of complex conversational agents much faster. Now, if we look at the stack of conversational AI technologies, at the bottom of this stack we have ML platforms like PyTorch and
TensorFlow. Then we have
NLP frameworks, which integrate ML models into
NLP pipelines; here we have spaCy, Transformers, NVIDIA NeMo, or even Stanford
NLP. Our DeepPavlov library belongs to this
NLP frameworks level. But not only to this level: it also covers part of the conversational skills level, because you can create an
NLP pipeline in DeepPavlov which exactly solves the problem of a conversational skill. On this conversational skill level we have Rasa, Pandorabots, or Almond. These are
frameworks for creating separate conversational skills. Then, on the highest level, the level of multi-skill orchestration, I think we provide a unique open-source solution right now: our DeepPavlov Agent, which is a framework for conversational skill orchestration. It allows you to deploy your skills, manage them, and orchestrate them to provide a very nice
user experience. So, on this picture you can see the whole line of frameworks we are building to create a full-stack open-source solution for conversational AI. Thank you for your attention. That's all, and I would be very glad to answer your questions. Thank you.

Hi, once again. Hey, thanks for joining us today and giving this amazing talk. We have some questions from the audience. Shall we get right to it? Yeah. All right. Well, the first question is actually from my co-emcee, AJ, and he would like to know: how do you compare DeepPavlov.ai to other conversational AI, like those built with Microsoft's language understanding service, for example? Sorry, can you repeat the question? Yeah. How would you compare DeepPavlov.ai to other conversational AI, like those built with Microsoft's Language Understanding service, LUIS, for example? Okay. So the major difference here is that our project is fully open source. With Microsoft LUIS you are not as flexible as with our framework. Of course, this flexibility comes at the cost of mastering all the components, so you need better knowledge of the components to build something similar to what you can build with LUIS. Yeah, so it's a steeper learning curve, but you're more flexible. Yeah. All right. The next question is from an audience member called Nick, and he would like to know: what would be the steps to integrate DeepPavlov Agent as a chatbot for a customer support service? Also, can annotators support localization? So right now, DeepPavlov Agent can be integrated via an
API: you can use it as a REST service. It also has many components, like annotators and skills, and these skills can use some
data from your internal databases; in this case, you also need to provide your own connectors to push the
data from your
database into your conversational skills. We have wrappers for Telegram and for Amazon Alexa, but if it's for your company and you are not using Telegram, for example, and you're using your own website widget or chat interface built by your own company, then you need to use the REST API, I think.
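For illustration, calling a running agent from your own widget or backend might look roughly like this. The port and the user_id/payload body follow the agent's demo setup as I remember it, so check them against your actual deployment.

```python
# Send one dialogue turn to a running DeepPavlov Agent over its HTTP API.
# Endpoint and payload keys are assumptions based on the demo configuration.
import requests

reply = requests.post(
    "http://localhost:4242",                       # assumed agent endpoint
    json={"user_id": "customer-42", "payload": "I need help with my order"},
).json()
print(reply)  # the agent's selected response for this dialogue turn
```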
Okay, next question: AI chatbots generally give funny or weird answers in extended conversations, even the BERT-based ones. How long do you think it will take before we can expect coherent-sounding and consistent responses? I think that here we should talk about two types of conversational skills. In DeepPavlov, you have a multi-skill
architecture where different skills can be implemented with different underlying technologies. Usually, if you have something like GPT-3 or a BERT-based answer, you have very weak control over what you get as an answer. But in other cases, you can use AIML or other approaches which allow more strictly defined templates for your answers; you control what your bot says. Maybe it will not cover a very broad range of topics, but in some narrow domain you will have very sensible responses. In DeepPavlov, you can integrate both of these, rule-based approaches and neural generation approaches, and then decide, given the current state of the dialogue, what is most appropriate for your user, thus trying to combine them: on the one hand, rule-based and script-based skills where you control the dialogue flow; on the other hand, neural generation where you can have much funnier and much more variable responses, but they might lose coherence after some number of conversational turns. Okay, thanks a lot. Another question is from my co-MC Sergi: since you are running a research lab, what do you think about OpenAI? Does it help to accelerate research? I think that it plays a role on both sides, accelerating research and also hyping research. They started by creating OpenAI Gym, which is a very nice set of tasks for reinforcement learning; many people use it, and that's very good. But then they released, or rather did not release, GPT-2 and started a discussion, promoting their results without open-sourcing the solution, and this seems a bit strange. So it's a bit of a strange organization, because on one hand it has the goal of producing open research, but on the other hand it has very strict rules about not publishing something and using it to make money. So it's not clear to the
community how to understand the goals of OpenAI right now. Okay, I think that's all the time we have for this Q&A session. But if people have questions or want to go deeper into DeepPavlov, you will be in your speaker room, right? So people can join you there on Spatial.chat. And now I would like to thank you for your time and invite you to go to your speaker room.