DeepPavlov Agent: Open-source Framework for Multiskill Conversational AI


DeepPavlov Agent is a framework designed to facilitate the development of scalable and production-ready multi-skill virtual assistants, complex dialogue systems, and chatbots. Key features of DeepPavlov Agent include (1) scalability and reliability under high load thanks to its microservice architecture; (2) ease of adding and orchestrating conversational skills; (3) shared dialogue state memory and NLP annotations accessible to all skills.


DeepPavlov DREAM is a socialbot platform with a modular design: the main components, such as annotators, skills, and selectors, run as independent services. These components are configured and deployed using Docker containers. This allows developers to focus on application development rather than on the details of manual low-level infrastructure configuration.

Mikhail Burtsev
27 min
02 Jul, 2021

Video Summary and Transcription

DeepPavlov Agent is an open-source framework for multi-skill conversational AI, addressing the need for specific skills in different domains. Its microservice architecture allows for scalability and skill reuse. The DeepPavlov Library enables the creation of NLP pipelines for different skills. DeepPavlov Dream serves as a repository for skills and templates, while the DeepPavlov Agent orchestrates all components for a seamless conversational experience. DeepPavlov offers more flexibility and customization than Microsoft's LUIS service.


1. Introduction to DeepPavlov Agent

Short description:

Hello, my name is Mikhail Burtsev, and I'm the founder and leader of the DeepPavlov project at the Moscow Institute of Physics and Technology. Today, I will tell you about DeepPavlov Agent, an open-source framework for multi-skill conversational AI. Multi-skill is important because customer experience spans multiple domains, and each domain needs a specific skill. Traditional conversational systems use a modular dialog system, where user prompts are converted to textual form and processed by a natural language understanding module. The Dialog Manager updates the dialogue state and performs actions based on the current state. Current systems rely on neural nets, deep learning models, and rules for dialogue management and natural language generation. The AI assistant lifecycle begins with a Minimal Viable Product, featuring pre-trained models for NLU and scripts for the Dialog Manager.

Hello, my name is Mikhail Burtsev, and I'm the founder and leader of the DeepPavlov project at the Moscow Institute of Physics and Technology. Today, I will tell you about DeepPavlov Agent, which is an open-source framework for multi-skill conversational AI.

So let's start with a question: why is multi-skill so important? It is important because customer experience spans multiple domains, like surveys, promotions, campaigns, customer service, technical support, and many others. And usually, to address every single domain, you need a specific skill. This is why we need to build multi-skill digital assistants, and why we need multiple conversational skills in our system.

You can see this if you take a look at e-commerce assistants, which are modern, complex dialogue systems. For example, here is the case of AliMe Assist, the assistant at AliExpress. You can see that it's a hybrid system with many different skills: an assistant service with a slot-filling engine, a customer service with a knowledge-graph engine, and a chatting service with a chat engine. So it's a combination of business rules, scripted scenarios, and specific skills addressing different customer needs.

So what is the traditional way to build conversational systems right now? The dominant approach is the so-called modular dialog system. How does it work? The user sends some prompt to the system, and this prompt is converted to textual form and fed into a natural language understanding module, which performs basically three functions: domain detection, intent detection, and entity detection in the user input. After this preprocessing, we have a formal description of the user input, also called a semantic frame, containing the intent, here request movie, and the entities, in this request genre comedy and date weekend. All this information then goes to the Dialog Manager. The task of the Dialog Manager is first to update the current dialogue state, integrating the new information into the previous history of the dialogue, and then, with this updated dialogue state, to select the action the system should perform.
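To make the semantic frame concrete, here is a minimal Python sketch of what an NLU module could emit for the movie request above. The class and field names are illustrative assumptions, not a fixed standard.

```python
from dataclasses import dataclass, field

@dataclass
class SemanticFrame:
    """Formal description of one user utterance, produced by the NLU module."""
    domain: str                                   # e.g. "movies"
    intent: str                                   # e.g. "request_movie"
    entities: dict = field(default_factory=dict)  # detected slots

# Hypothetical NLU output for a request like "find me a comedy for the weekend"
frame = SemanticFrame(
    domain="movies",
    intent="request_movie",
    entities={"genre": "comedy", "date": "weekend"},
)
```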

The Dialog Manager consists of the dialogue state and the policy, or script, which decides what action should be selected given the current dialogue state. In our example, the selected action is request location. But this action is in some internal system representation, and we need to convert it into a natural language prompt. Here we have the last module of our system, natural language generation, which creates the surface form of our request to the user: for the action request location, the natural language output is "Where are you?". This is basically how current systems are built. In the NLU part, we have a lot of neural nets and deep learning models; in the dialogue manager part, we have some neural networks and a lot of rules and scripted dialogues; and for natural language generation, we mostly have either retrieval models with some slot filling, or templates.
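As a toy illustration of this Dialog Manager plus NLG split, the sketch below keeps a dialogue state, applies a hand-written policy to choose an action, and renders the action with a template. Real systems replace each piece with learned models; everything here is a simplified assumption.

```python
# Toy dialogue manager: a scripted policy over the dialogue state,
# followed by template-based natural language generation.
state = {
    "intent": "request_movie",
    "slots": {"genre": "comedy", "date": "weekend"},
    "location": None,  # not yet known, so the policy should ask for it
}

def policy(state: dict) -> str:
    """Select the next system action given the current dialogue state."""
    if state["location"] is None:
        return "request_location"
    return "offer_movie"

TEMPLATES = {
    "request_location": "Where are you?",
    "offer_movie": "Here is a {genre} playing near you this {date}.",
}

def generate(action: str, state: dict) -> str:
    """Surface realization: map an internal action to a natural language prompt."""
    return TEMPLATES[action].format(**state["slots"])

action = policy(state)          # -> "request_location"
print(generate(action, state))  # -> "Where are you?"
```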

Okay, so what is the AI assistant lifecycle? How are we building our digital assistants, our dialogue systems, with this modular technology? Usually we start with an MVP, a Minimal Viable Product. For NLU, we have some features and some models pre-trained for the domain; on the side of the Dialog Manager, we have a few scripts. It's a very nice and clear architecture, and we understand how it works.

2. Advantages of Decomposing Complexity

Short description:

We want to go beyond the complexity ceiling of the current technology by decomposing complexity between the agent and conversational skills. Our microservice architecture allows for scalability and the reuse of existing skills. With the DeepPavlov Library, we can build NLP pipelines and combine different components into conversational skills for specific domains and tasks.

It covers the most important aspects of the interaction between the system and the user. Then we deploy this MVP into production, and the system starts to interact with users. Here, we realize we need to increase the coverage of the system, because users ask the same questions differently due to language variability, so we add more features and make the natural language understanding part of our system more complex.

We also want to cover more functions, so we add more scripts on the side of our dialogue manager. And then we continue: more features and more scripts, more features, more scripts, until we reach the so-called mature AI assistant stage, which is actually a mess of features and scripts. At this point, the solution has already approached the maximum complexity it can sustain, because of all these interdependent components. So now you are in a position where you cannot grow your product anymore.

What we want to do with our framework, with DeepPavlov Agent, is to break this picture. We want to go beyond this complexity ceiling of the current technology. In our vision, the AI assistant lifecycle starts with the same simple, clear, and nice MVP. Then you test it and add it to the already deployed system as just one of the conversational skills. And if you want to add more functionality to your system, you just create a new conversational skill and add it to your agent. This allows you to decompose complexity between the agent, which is basically a skill orchestration framework, and the conversational skills. And it gives you a very nice microservice architecture that can scale into a much more complex, mature AI assistant.

It also gives you many nice features. For example, you can have default skills; you don't need to develop them yourself, you just plug in your own skills alongside them. As I've said, it's a very scalable architecture, because every skill is deployed as a microservice. It's also very handy when you create a new product or want to create new skills similar to ones you already have: you can just reuse the old skill and extend it for the new function, or integrate it into your product. What is also important in our global development culture right now is that complex solutions are usually built by distributed teams, and this skill orchestration architecture and the modular structure of your conversational agent allow you to distribute the maintenance and development of separate skills to separate teams. This makes your work and the coordination between skills much more organized and efficient.

So this is our vision: we want to have conversational skills and a skill orchestration level. What are we doing right now to implement this vision? We have started with the DeepPavlov Library, an open-source library for building NLP pipelines and conversational skills for conversational AI. You can have specific NLP models like named entity recognition, coreference resolution, intent recognition, slot filling, question answering, dialogue policy, dialogue history, language models, and so on. Then, with our framework, you can combine these different components into conversational skills for specific domains and specific tasks.
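For instance, loading one of these pre-trained components takes a few lines. A minimal sketch, assuming a local `deeppavlov` installation; the exact config name and output format can vary between library versions, so treat this as illustrative.

```python
# pip install deeppavlov
from deeppavlov import build_model

# Build a pre-trained named entity recognition pipeline by config name
# (config names are version-dependent; check the DeepPavlov docs).
ner = build_model("ner_ontonotes_bert", download=True)

tokens, tags = ner(["DeepPavlov was created at MIPT in Moscow"])
print(list(zip(tokens[0], tags[0])))  # per-token BIO entity tags
```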

3. DeepPavlov Agent Framework and the DREAM Socialbot

Short description:

We have task-oriented skills, factoid skills, and chit-chat skills, all orchestrated by the DeepPavlov Agent framework. An example of this architecture is the DREAM socialbot, built for the Alexa Prize competition. User input is processed by annotators, which extract information and update the dialog state. A skill selector determines the most relevant skills, and a subset of these skills is executed to produce response candidates. Candidate annotators ensure user safety, and a response selector performs the final selection. This multi-skill system runs asynchronously, with the DeepPavlov Library serving as the foundation for creating NLP pipelines.

We can have task-oriented skills, like restaurant booking; factoid skills, which allow you to answer factoid questions; and chit-chat skills. And then we have the DeepPavlov Agent framework, which orchestrates these skills.

As an example of this architecture, I would like to present the architecture of the DREAM socialbot, which was built with DeepPavlov and the DeepPavlov Agent framework for the Alexa Prize Challenge. As a university team, we participated in the Alexa Prize Challenge last year and were selected as one of the 10 teams, out of 350 applications, to develop a solution hosted inside Amazon Alexa, which users can invoke with Alexa's "let's chat" command. We used our DeepPavlov Agent to build this DREAM socialbot. So, let's take a look at how it works.

First, we have the user input, which goes to the so-called annotators, the first stage of processing. The annotators are also implemented as DeepPavlov NLP pipelines, run as microservices, and are used to extract information from the user input. After that, the annotated user input goes to the dialog state, which is like a shared memory between all microservices, such as annotators, skills, and so on. The annotated user input and the current dialog state are then used by a skill selector to decide which skills are most relevant to the current dialog state. For the competition, we developed about 25 different skills, for example for weather, movies, books, general chat, and so on.
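A minimal sketch of what one such annotator microservice could look like, assuming a simple JSON contract; the endpoint name and payload shape here are hypothetical, and the real DeepPavlov Agent wire format is richer.

```python
# Hypothetical annotator microservice: receives the latest utterance and
# returns annotations that the agent merges into the shared dialog state.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Utterance(BaseModel):
    text: str

@app.post("/annotate")
def annotate(utt: Utterance) -> dict:
    # A real annotator would run a DeepPavlov NLP pipeline on utt.text here.
    return {"sentiment": "neutral", "entities": [], "toxic": False}

# Run with: uvicorn annotator:app --port 8010
```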

Then, as I've said, only a subset of these skills is selected; the selected skills are executed and produce response candidates. Every skill outputs a response candidate and a confidence in its own response. All these candidates go to candidate annotators. We need these annotators to be sure that no candidate might be harmful to our users, so we perform toxicity detection, dialog-termination detection, and blacklist filtering of the response candidates. After that, we have annotated response candidates, and a response selector performs the final selection of the system's output (a toy sketch of this selection step follows below). The selected response can also be post-annotated before being presented to the user. As you see, it's a multi-skill system, and all the elements of our pipeline run asynchronously: the annotators and skills run asynchronously as microservices, with only two points of synchronization, the skill selector and the response selector, while the dialogue state, as I've said, serves as shared memory.

So what we have right now, what we are building, is what we can call the DeepPavlov ecosystem, an ecosystem of our products. On the left-hand side here, you can see our DeepPavlov Library, which is a library for creating NLP pipelines. You can also include third-party NLP models, like Hugging Face Transformers or NVIDIA NeMo, as components of your NLP pipelines. Then you can deploy all these NLP pipelines as microservices in the cloud; for your AI assistant, these can be annotators or other NLP components. And then we have what we call DeepPavlov Dream.
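Here is the promised toy sketch of that selection step: drop unsafe candidates, then pick the most confident remaining response. The field names are illustrative assumptions, not the agent's actual schema.

```python
# Toy response selector over annotated response candidates.
def select_response(candidates: list[dict]) -> dict:
    safe = [
        c for c in candidates
        if not c["annotations"].get("toxic", False)
        and not c["annotations"].get("blacklisted", False)
    ]
    if not safe:  # every candidate was filtered out
        return {"skill": "fallback",
                "text": "Let's talk about something else.",
                "confidence": 0.0}
    return max(safe, key=lambda c: c["confidence"])

candidates = [
    {"skill": "weather_skill", "text": "It will be sunny tomorrow.",
     "confidence": 0.93, "annotations": {"toxic": False}},
    {"skill": "chitchat", "text": "I love talking about movies!",
     "confidence": 0.60, "annotations": {"toxic": False}},
]
print(select_response(candidates)["text"])  # -> "It will be sunny tomorrow."
```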

4. DeepPavlov Dream and the DeepPavlov Agent

Short description:

We want to open-source our skills and provide them as a default distribution for conversational agents. You can reuse our skills and add your own task-oriented skills. DeepPavlov Dream is a repository for different skills and templates, allowing the integration of third-party skills. The DeepPavlov Agent orchestrates all components to produce the final conversational experience. We aim to create an open hub for exchanging skills, making development easier and faster. Our DeepPavlov Library integrates ML models into NLP pipelines and solves conversational skill problems. With the DeepPavlov Agent, we provide a unique solution for multi-skill orchestration.

Why do we call this DeepPavlov Dream? Because we want to open-source all the skills we developed for the competition and provide them as a default distribution for conversational agents to be used by others. Then you don't need to develop your own chit-chat skill or basic skills like weather and so on for your solution; you can just reuse our skills and add your own task-oriented skills.

As I've said, DeepPavlov Dream is like a repository for different skills and templates. You can also use third-party skills here, because our architecture allows you to integrate other skills via API; for example, you can use Rasa or AIML skills in your pipeline as part of DeepPavlov Dream. You then run all these skills as microservices. And in the center here we have the DeepPavlov Agent, which orchestrates all these components, annotators and skills, to produce the final conversational experience of your AI assistant.
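As a sketch of such an API integration, the adapter below exposes a third-party bot as a skill that returns a response candidate with a confidence. It assumes Rasa's standard REST channel; the URL, payload, and fixed confidence prior are illustrative assumptions.

```python
# Hypothetical adapter: wrap an external Rasa bot as an agent skill
# that returns a (response, confidence) candidate.
import requests
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class SkillRequest(BaseModel):
    user_utterance: str

@app.post("/respond")
def respond(req: SkillRequest) -> dict:
    # Rasa's REST channel endpoint; adjust host/port for your deployment.
    r = requests.post(
        "http://localhost:5005/webhooks/rest/webhook",
        json={"sender": "agent", "message": req.user_utterance},
        timeout=3,
    )
    messages = r.json()
    text = messages[0].get("text", "") if messages else ""
    # Third-party bots rarely expose a confidence, so use a fixed prior.
    return {"response": text, "confidence": 0.5 if text else 0.0}
```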

As I've said, we want to create something that can be seen as an analogue of an open-source operating system, but here it is not an operating system, it is a conversational or dialog system. We have applications, which are conversational skills; we have services, such as annotators; and we have the user experience, so you can add your applications to make the user experience good. We also want to create an open hub for exchanging skills, to make life easier for every developer of complex conversational assistants: if we exchange general-purpose skills, it will make developing and prototyping complex conversational agents much easier and faster.

Now, if we look at the stack of conversational AI technologies, at the bottom of this stack we have ML platforms like PyTorch and TensorFlow. Then we have NLP frameworks, which integrate ML models into NLP pipelines; here we have spaCy, Transformers, NVIDIA NeMo, and even Stanford NLP, and our DeepPavlov Library belongs to this NLP framework level. But not only to this level: it also covers part of the conversational skills level, because you can create an NLP pipeline in DeepPavlov that exactly solves the problem of a conversational skill.

On this conversational skill level, we have Rasa, Pandorabots, or Almond, which are frameworks for creating separate conversational skills. Then, at the highest level, the level of multi-skill orchestration, I think we currently provide a unique solution in the open-source domain: our DeepPavlov Agent, a framework for conversational skill orchestration. It allows you to deploy, manage, and orchestrate your skills to provide a very nice user experience. In this picture, you can see the whole line of frameworks we are building to create a full-stack open-source solution for conversational AI. So thank you for your attention. That's all, and I would be very glad to answer your questions. Thank you. Hi, once again. Hey, thanks for joining us today and giving this amazing talk.

5. Comparison with Microsoft LUIS

Short description:

DeepPavlov is fully open source, providing more flexibility than Microsoft's Language Understanding service (LUIS). While there is a steeper learning curve, our framework allows for greater customization and control over the components.

We have some questions from the audience. Shall we get right to it? Yep. All right. Well, the first question is actually from my co-emcee, AJ, and he would like to know: how do you compare DeepPavlov to other conversational AI, like those built with Microsoft's Language Understanding service, for example? Sorry, can you repeat the question? How would you compare DeepPavlov to other conversational AI, like those built with Microsoft's Language Understanding service, LUIS, for example? Okay. The major difference is that our project is fully open source. With Microsoft LUIS, you are not as flexible as with our framework, and of course, this flexibility comes at the cost of mastering all the components: you need better knowledge of the components to build something similar to what you can build with LUIS. Yes, it is a steeper learning curve, but you are more flexible. Yes. Yes, all right.
