Let AI Be Your Docs

Unleash the power of AI in your documentation with this hands-on workshop. In this interactive session, I'll guide you through the process of creating a dynamic documentation site, supercharged with the intelligence of AI (ChatGPT).

Imagine a world where you don't have to sift through pages of documentation to find that elusive line of code. With this AI-powered solution, you'll get precise answers, succinct summaries, and relevant links for deeper exploration, all at your fingertips.

This workshop isn't just about learning; it's about doing. You'll get your hands dirty with some of the most sought-after technologies in the market today: Next.js 13.4 (app router), Tailwind CSS, shadcn-ui (Radix-ui), OpenAI, LangChain, and MongoDB Vector Search.

88 min
14 Dec, 2023

AI Generated Video Summary

AI is a revolutionary change that helps businesses solve real problems and increase productivity. The workshop covers the demand for intelligent apps, limitations of LLMs, and how to overcome them. It explores the tech stack, integrating GPT, and optimizing the user experience. MongoDB Atlas Search and Vector Search are used to store embeddings and enable semantic search. Prompt engineering allows customization of AI responses.

1. Introduction to AI and its Impact on Applications

Short description:

Welcome everyone. AI is not a fad, it's a revolutionary change that helps businesses solve real problems and increase productivity. In this workshop, we'll build AI into a React application, discuss the demand for intelligent apps, practical use cases, limitations of LLMs, and how to overcome them. We'll also cover the tech stack, integrating GPT, and optimizing the user experience. The workshop will start with an overview, followed by hands-on activities, including setting up the application, using MongoDB's Atlas Search, creating embeddings and a search index, and implementing RAG. We'll conclude with a Q&A session. AI is crucial for modern applications, driving user engagement, efficiency, and profitability. It's used in various industries, such as retail, healthcare, finance, and manufacturing. AI has evolved from analytics to batch machine learning, enabling predictions and informed business decisions based on historic data. Let's explore the power of AI in this workshop.

Welcome everyone. So what we're going to do first off is let me go ahead and share my screen. We are going to get started with a little bit of an introduction before we get into the hands-on. So AI is, or artificial intelligence, it's just a fad. Right? Well, actually, I don't think it's a fad. It's actually going to be here to stay and it's a revolutionary change. It helps businesses solve real problems and it's helping employees and individuals become more productive. Let's talk about why AI matters now more than ever and how AI can take your React applications to the next level.

So through this workshop we're going to build AI into a React application that only answers questions from the documentation that we provide, our custom data. So I'm Jesse Hall, a senior developer advocate at MongoDB. You might know me from my YouTube channel, codeSTACKr. So what we're going to do today, if you didn't get a chance to watch my talk at React Day Berlin, then this first part is going to be an overview of that talk, just so that we're on the same page. So we're going to talk about the demand for intelligent apps, practical use cases, limitations of LLMs, overcoming these limitations, the tech that we're going to use to build this app, and then how to integrate GPT, make it smarter, and optimize the user experience.

And so the introduction is going to be an overview. I personally hate slides, but I have to talk for a little bit just to make sure that we're on the same page, and then we're going to get to the hands-on. So the prerequisites, we're going to make sure everyone's up to speed locally. That shouldn't take more than 15 minutes. We're going to get the application set up. We're going to understand how we're going to use MongoDB's Atlas Search, which includes Vector Search, and that's how we're going to be actually making this app work. We're going to create our embeddings. We're going to create a search index, and then implement RAG. We're going to find out what RAG means in a bit, and then I'll leave some time at the end for Q&A. All right. So let's get right into it.

There's a huge demand for building intelligence into our applications in order to make these modern highly engaging applications and to make differentiating experiences for each of our users. And so you could use this for fraud detection, for chat bots, personalized recommendations, and so many other use cases. To compete and win, we need to make our applications smarter and surface insights faster. Smarter apps use AI-powered models to take action autonomously for the user, and the results are twofold. Firstly, our apps drive competitive advantage by deepening user engagement and satisfaction as they interact with your application. And secondly, our apps can unlock higher efficiency and profitability by making intelligent decisions faster on fresher, more accurate data.

And so, going back, almost every application going forward is going to use AI in some capacity. AI is not going to wait for anyone. So we have to stay competitive and build intelligence into our applications in order to gain those rich insights from our data. And so, AI is being used both to power the user-facing aspect, and the fresh data and insights that we gain from that are going to help power more efficient business decision models as well. And so there are many different use cases, and here are just a few. Retail, healthcare, finance, manufacturing. Now, these are very different use cases, but they're all unified in their critical need to work with the freshest data in order to achieve their objectives in real time. They all consist of AI-powered apps that drive the user-facing experience and then predictive insights that make use of that fresh data to automate and drive more efficient business processes.

But how did we get to this stage of AI? Well, let's look at AI through the ages. First, we start out with analytics. In the early days of computing, applications primarily relied on analytics to make sense of data. This involved analyzing large data sets and extracting insights that could inform business decisions. And as computing power increased, it became easier to analyze these large data sets in less time. And so, this is where batch AI came in. So, as computing power continued to increase, the focus shifted towards machine learning. And traditional batch machine learning involves training models on historic data and using this to make predictions or inferences about future events. And we could possibly see how users might interact in the future. So the more data that we feed the model over time, the better it gets. And the more we can tune it, the more accurate the future predictions become. And so, as you can imagine, this is really powerful because if we can predict what's going to happen tomorrow, we can make really good business decisions today.

2. AI Evolution and Real-time Applications

Short description:

Batch AI analyzes historic data to make predictions, but it can't react to real-time events. Real-time AI trains models on live data for quick decision-making. Generative AI creates new content, transforming how we interact with technology.

So, batch AI, as the name implies, is usually run offline on a schedule. And so, it's analyzing historic data to make predictions about the future. But that's where the problem is with batch AI. It's working on historic data. It can't react to events that are happening quickly in real-time. Now, although, I mean, it's good for some industries, such as maybe finance and healthcare, where we can look at the history. But we need data on things that are happening now. So we can make those real-time decisions. And so, this is where real-time AI comes in.

Real-time AI represents a significant step forward from traditional AI. This approach involves training models on live data and using them to make predictions or inferences in real time. And this is particularly useful in fraud detection, for instance, where decisions need to be made quickly based on what's happening right now. And I mean, what good is fraud detection if the person defrauding you has already gotten away with it, right?

And then, this finally brings us to generative AI. This represents the cutting edge. This approach involves training models to generate new content. And this could be images, text, music, video. It's not simply making predictions anymore. It's actually creating the future. And so, a fun fact about this slide here, all of these images were created using DALL-E. And so, over the years, we've seen AI evolve from analytics to real-time machine learning and now to generative AI. And these are not incremental changes. They're transformative. And they shape how we interact with technology every single day.

3. Introduction to RAG and Vectors

Short description:

GPTs are powerful but have limitations. They rely on a static knowledge base and can't access real-time proprietary data. To overcome this, we'll explore Retrieval Augmented Generation (RAG) and use vectors to represent complex data. Vectors enable semantic search and can be created using encoders. RAG leverages vectors to augment GPT models with real-time contextually relevant data.

So, let's zoom in a bit. We have something called generative pre-trained transformers, GPT. These large language models perform a variety of tasks from natural language processing to content generation and even some elements of common sense reasoning. These are the brains that are making our applications smarter.

But there's a catch to this. GPTs are incredible, but they are not perfect. One of the key limitations is their static knowledge base. And they only know what they are trained on. So, there are some integrations with some models now that can search the internet for newer information. But how do we know that the information that they're finding on the internet is accurate? They can also hallucinate very confidently, I might add. And so, how do we minimize this? They can't access or learn from real-time proprietary data, your specific data. And that's a big limitation. So, the need for real-time proprietary domain-specific data is why we can't rely on the LLMs as they are. This is especially true in the business context where up-to-date information can be a game-changer.

And so, what is the solution? How do we make these models adaptable, real-time, and more aligned with our specific needs? Well, this brings us to the focus of the workshop today. It's not merely about leveraging the power of GPTs in React, but it's about taking your React applications to the next level by making them intelligent and context-aware. So, we're going to explore how to augment React apps with smart capabilities using these large language models and boost those capabilities even further with Retrieval Augmented Generation, or RAG. So, we're not just integrating AI into React, we're optimizing it and making it smarter and more context-aware.

So, what's involved in Retrieval Augmented Generation? First up is vectors. What are vectors? We have to understand what these are. These are the building blocks that allow us to represent complex, multidimensional data in a format that is easier to manipulate and understand. So, in the simplest explanation, a vector is a numerical representation of data, and it's basically an array of numbers. These numbers are coordinates in an n-dimensional space, where n represents the length. So, however many numbers we have in this array is how many dimensions we have. We also may hear vectors referred to as vector embeddings or just embeddings.

Here's a real-life example of vectors in use. When you go to a store and you ask one of the workers where to find something, they might say, go to aisle 30, bay 15. That is a two-dimensional vector. You'll also notice in the stores that similar items are placed next to each other for ease of searching and finding. So, the light bulbs, for instance, are not scattered all around the store, they're strategically placed so that you can find them easily. Another example is games. Games use 2D and 3D coordinates to know where objects are in the game's world. With these coordinates, we can compute the proximity between objects to detect collisions, for instance. So, the same math is used to compute the similarity between vectors during vector search. If you're a Stargate fan, the gate address is made up of at least seven dimensions that are like vectors. To locate Stargates in other galaxies, you can add an eighth or a ninth dimension, just like you would add the area code or the country code to a phone number. And so, this shows how adding dimensions significantly increases the size of that virtual space in which our data is organized.
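To make the "same math" point concrete, here is a minimal sketch of two common ways to compare vectors: Euclidean distance (the store and game proximity idea) and cosine similarity (a measure often used for embeddings). The store coordinates and function names are made up for illustration; they are not taken from the workshop code.

```typescript
// A vector is an array of numbers; its length is its number of dimensions.
type Vector = number[];

function assertSameDimensions(a: Vector, b: Vector): void {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same number of dimensions");
  }
}

// Straight-line distance: smaller means the two points are closer together.
function euclideanDistance(a: Vector, b: Vector): number {
  assertSameDimensions(a, b);
  const sumOfSquares = a.reduce((sum, value, i) => {
    const diff = value - (b[i] ?? 0);
    return sum + diff * diff;
  }, 0);
  return Math.sqrt(sumOfSquares);
}

// Cosine similarity: 1 means same direction, 0 means unrelated (orthogonal).
function cosineSimilarity(a: Vector, b: Vector): number {
  assertSameDimensions(a, b);
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) {
    throw new Error("Cosine similarity is undefined for a zero vector");
  }
  return dot / Math.sqrt(normA * normB);
}

// Hypothetical [aisle, bay] locations in a store.
const lightBulbs: Vector = [30, 15];
const moreLightBulbs: Vector = [30, 16];
const gardenHoses: Vector = [2, 40];

// Similar items sit close together, unrelated items sit far apart.
const nearby = euclideanDistance(lightBulbs, moreLightBulbs); // 1
const farAway = euclideanDistance(lightBulbs, gardenHoses);
```

Real embeddings work the same way, only with hundreds or thousands of dimensions instead of two.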

So, again, what makes vectors so special? They enable semantic search. In simpler terms, they let us find information that is contextually relevant, not just a keyword match. And this data is not limited to just text. It can also be images, video, audio. These can all be converted into vectors. So, how do we go about creating these vectors? This is done through an encoder. The encoder defines how the information is organized in that virtual space. Now, there are different types of encoders that can organize these vectors in different ways depending on our use case. There are specific encoders for text, for audio, images, video, and so on. And many of the most popular encoders can be accessed through Hugging Face, OpenAI, and many others. And so, how do we tie all of this back to retrieval augmented generation? Well, RAG leverages vectors to pull in real-time, contextually relevant data to augment the capabilities of the LLM. Vector search capabilities can augment the performance and accuracy of GPT models by providing a memory or a ground truth to reduce hallucinations, providing up-to-date information, and allowing us to access our proprietary private data. So, first, we take our custom data, whatever it is, and generate our embeddings using an embedding model.

4. Storing Embeddings and Tech Stack

Short description:

We store embeddings in a vector database. User queries are sent to an LLM to vectorize and search for relevant information. React app with RAG and vector embeddings is adaptable and context-aware. Tech stack includes Next.js, OpenAI, LangChain, Vercel AI SDK, and MongoDB Vector Search.

And then, we store those embeddings in a vector database. So, again, this data, it could be documents. It could be documentation, blog articles, videos, images, PDFs, anything that we have. And through here, we create those embeddings. And now, you don't have to use LangChain to facilitate all of this, but it is very helpful and we're going to talk more about that later.

So, those embeddings are created. They're stored now in our vector database. And now, we're able to accept user queries to find relevant information within our custom data. Now, to do this, we send the user's natural language query to an LLM to vectorize the query as well. And then, we use the vector search to find information that is closely related, semantically related to the user's query and we return those results. And we can do anything that we want with these results once we have them. We can... It looks like someone's waiting in the waiting room. One second. Here we go. So, we can summarize and answer their questions based on our custom data. We can respond with links to specific documentation pages and so on.
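The flow described here (embed the documents, embed the query, return the closest matches) can be sketched entirely in memory. This is only an illustration under loud assumptions: `toyEmbed` is a made-up stand-in that counts a few vocabulary words, whereas the workshop uses a real embedding model and a vector database.

```typescript
type Doc = { text: string; embedding: number[] };

// Hypothetical stand-in for an embedding model: one dimension per vocabulary word.
const VOCABULARY = ["cluster", "index", "password", "search", "vector"];

function toyEmbed(text: string): number[] {
  const words = text.toLowerCase().split(/\W+/);
  return VOCABULARY.map((term) => words.filter((word) => word === term).length);
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  // Treat a zero vector as "no similarity" instead of dividing by zero.
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Vectorize the query the same way as the documents, then rank by similarity.
function search(query: string, docs: Doc[], k: number): Doc[] {
  const queryVector = toyEmbed(query);
  return docs
    .map((doc) => ({ doc, score: cosine(queryVector, doc.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((result) => result.doc);
}

// Step 1: embed the custom data once and store it.
const docs: Doc[] = [
  "Create a vector search index on the embedding field.",
  "Reset the password for a database user.",
  "Choose a region when you create a cluster.",
].map((text) => ({ text, embedding: toyEmbed(text) }));

// Step 2: embed the user's question and find the closest document.
const results = search("How do I set up a vector search index?", docs, 1);
```

The retrieved text is what would then be handed to the LLM as context, so that the answer is grounded in the custom data rather than in whatever the model was trained on.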

And so, imagine your React app has an intelligent chatbot. And with RAG and vector embeddings, this chatbot can pull in real-time data, let's say the latest product inventory, so it knows what's in stock and what's not in stock. And it can offer the customer some other products during their interaction. With RAG and vector embeddings, your React app isn't just smart, it's adaptable, real-time, and incredibly context-aware. And so, how are we going to go about this? Let's take a look at the tech stack that we're going to use.

The first thing that we're going to use is Next.js. And of course, we're going to use the app router. Next.js and Vercel just make building AI apps and working with AI technology so easy. Next, OpenAI. They have been spearheading advancements in language models like GPT-3.5 Turbo, GPT-4, and so on. And so, there are many other language models out there, but today we're going to focus on OpenAI. We're going to use them for the embedding and for generating the responses. LangChain is another crucial part of our tech stack. It helps us with data preprocessing, routing data to the proper storage, and making the AI part of our app more efficient and just easier to write. And then there is the Vercel AI SDK. This is an open source library designed specifically for building conversational streaming UIs. So, any time you have a chatbot, it removes so much boilerplate code that you would have to write otherwise. And last, but definitely not least, we're going to store all of our vector embeddings in MongoDB, and we're going to use MongoDB Vector Search to find the similarity between vectors. And this is a game changer for AI applications, enabling us to provide a more contextual and meaningful user experience by storing our vector embeddings directly in our application database instead of bolting on another external service. And it's not just vector search; MongoDB Atlas itself brings a new level of power to generative AI applications, and we'll take a look at that. And each of the technologies in this stack was chosen for a specific reason, and when these are combined, they enable us to build smarter, more powerful React applications.
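As a rough idea of what "finding the similarity between vectors" looks like on the database side, here is a sketch that builds a MongoDB aggregation pipeline using the `$vectorSearch` stage. The index name `vector_index` and the field names `embedding` and `text` are hypothetical placeholders, and the workshop app may go through LangChain rather than writing the pipeline by hand.

```typescript
type VectorSearchStage = {
  $vectorSearch: {
    index: string; // name of the Atlas Vector Search index (hypothetical here)
    path: string; // document field that stores the embedding (hypothetical here)
    queryVector: number[]; // the embedded user query
    numCandidates: number; // how many nearest neighbors to consider
    limit: number; // how many documents to return
  };
};

type ProjectStage = { $project: Record<string, unknown> };

// Builds the pipeline only; running it would need the MongoDB driver and a live cluster.
function buildVectorSearchPipeline(
  queryVector: number[],
  limit = 3
): [VectorSearchStage, ProjectStage] {
  return [
    {
      $vectorSearch: {
        index: "vector_index",
        path: "embedding",
        queryVector,
        // Considering more candidates than we return tends to improve recall.
        numCandidates: limit * 20,
        limit,
      },
    },
    // Return the text plus the similarity score instead of the raw vector.
    { $project: { _id: 0, text: 1, score: { $meta: "vectorSearchScore" } } },
  ];
}

const pipeline = buildVectorSearchPipeline([0.12, -0.03, 0.88]);
```

With the official Node.js driver, a pipeline like this would be passed to `collection.aggregate(pipeline)`; the three-number query vector above is only a toy, since real embeddings are much longer.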

Okay, so it's time for me to stop talking for a bit. There's another person in the waiting room. Let me admit them. And this is where we're going to start out. So, everyone, go to mdb.link/vs-demo locally. So this is the workshop page, and let's enter the workshop. Now, this is a link that you can share with anyone that you'd like. It's actually designed to be a self-paced workshop, but it's better when you have someone walking you through it. So some of the things in here, we're going to kind of skip over, because I've already talked about them, but we're going to use this as our guide for this workshop. So, some prerequisites.

5. Setting Up MongoDB Atlas Account

Short description:

We need a MongoDB Atlas account, an OpenAI account, Node.js 18+, and basic Git knowledge. MongoDB Atlas is the cloud-hosted version with additional capabilities like app services, serverless functions, and Atlas Search. It's deployed on major clouds, offers a free plan for testing and prototyping.

We are going to need a MongoDB Atlas account, an OpenAI account, and Node.js version 18 or later locally. You also just need some basic Git knowledge. And then, let's move on to the next part.

So, what is... The first thing that we're going to do is make sure that we're all up to speed and we have an Atlas account. So, I'm sure most of you are probably familiar with MongoDB, the open source database. MongoDB Atlas is the cloud-hosted version, and it adds a ton of capabilities that the local version does not have. There are app services for a fully managed backend for your web applications. There are serverless functions and triggers and a whole bunch of other stuff included with that. There's Atlas Search. There's Data Federation. You can store stuff in S3 or Google Cloud Storage and access it in MongoDB. But what we're going to use today is Atlas Vector Search. And that is going to allow us to find the similarities between our vectors at query time. And so, where is MongoDB Atlas deployed? It's on all of the major clouds. AWS, Azure, Google Cloud, and many, many regions all around the world. And how much does it cost? Well, today, we're going to, of course, use the free plan. There is a very generous free cluster. It's free forever. It does not require a credit card. So that's what we're going to use today. It is perfect for testing and for prototyping. I would never recommend it for production, but it is completely free.

6. Creating Atlas Account and Cluster

Short description:

To get started, create your Atlas account and follow the guide provided. Verify your email address and log in. Once you have your Atlas account, click the green check under reactions. Next, create a cluster by selecting the M0 cluster and choosing a provider and region. Create a username and password, whitelist your IP address, and ensure it's secure. Now, let's move on to the OpenAI part.

All right, so the first thing we're going to do is create your Atlas account if you don't already have one. So just follow this guide here. There's a link here to the website. You can log in with Google, which makes it easier to fill out the information. Verify your email address. Last time I went through this workshop, it took about a minute for that verification email to come in, and some people were a little quick on the draw and hit resend too quickly, and it took us a little bit. So just be a little bit patient on that email. And I'll kind of wait here for that step.

Now at the bottom of Zoom in the menu bar, you're going to see something that it says, where is it? Reactions. Now under reactions, there's a whole bunch of things you can do. You can wave and thumbs up and all that sort of stuff. There's a green check, and I'm going to put it on mine. When you're good at this stage and you have your Atlas account, hit that green check under reactions. That way I know that we're all ready to go.

After we get this, we're going to work on creating a cluster. Some green check marks, good. I'm just going to go to this next phase here. If you're ready to go to this next phase, feel free. Creating a cluster is pretty simple. As soon as you get that email and you get logged in, just kind of go through the wizard that pops up. Create your new deployment. Be sure to select the M0 cluster. Choose whatever provider you're comfortable with, and a region that's closest to you would be great. For the name, you can leave the default. It doesn't matter what you name it. And then hit create. Any provider, it doesn't matter which one you choose, whichever one you're comfortable with. AWS, Google Cloud, Azure, you can choose any provider. And then the region, whatever is closest to you. So it really doesn't matter which one you choose. As long as you've got the M0 cluster selected, it's all free. Some providers have more regions in different areas. So you might go through those just to find a region that's closest to you. It'll just be faster. After you've chosen your provider and your region, you'll be prompted to create a user. You can create a username and a password, whatever you'd like to choose there. Just save it because we'll need that later. Then you'll need to whitelist your IP address. That should be done by default. You should see your IP address in there, whitelisted. There is an option to allow access from anywhere. We don't recommend this, especially in production, because this will open up your database to anyone in the world. So ideally, it will just be opened up to your specific IP address. That makes it super secure because not only do they have to be on your network, they also have to know your username and password to access it.

All right. I'm going to move on now to the OpenAI part, unless anyone is still having issues. I think there's a way for me to clear the reactions, is there not? There used to be a way in Zoom to clear the reactions. All right. Just go ahead and remove your check marks for me, please, so we can move on to the next part.

7. Setting Up OpenAI and Application

Short description:

The next part is getting OpenAI set up. OpenAI is the creator of amazing large language models like GPT-3, GPT-3.5, GPT-4, GPT-4 Vision, and more. Setting up a brand new account will provide enough credits to complete this workshop. API calls are affordable, and if you've exhausted your credits, this workshop won't cost more than five cents. Create an account on OpenAI.com and follow the steps. If you encounter any issues, I'll share an API key. Once signed up, create a new secret API key and save it securely. We'll then move on to setting up the application. We'll create a documentation application with a chatbot using Next.js, Tailwind, OpenAI, LangChain, the Vercel AI SDK, and MongoDB. Clone the GitHub repo, navigate to the directory, and run npm install. If you encounter any permission issues, ensure you have write permissions and are in the correct directory on your Mac.

All right. The next part is getting OpenAI set up. All right. So what is OpenAI? I'm pretty sure most of us are familiar with them. They are the creators of some of the amazing large language models like GPT-3, 3.5, 4, 4 Vision, and so on.

Now, how much does OpenAI cost? Well, if you set up a brand new account, you're going to receive credits, and those credits will be enough to complete this workshop. The API calls are a fraction of a cent. If you've already exhausted your credits, this workshop is not going to cost more than five cents. So you can look at the pricing details here.

So we'll go through here and create an account. So you go to OpenAI.com. The last time I ran this workshop, we had a bit of an issue here, because a lot of people already had accounts and had already exhausted their credits. So you can create a second account, but you have to have a second email address, and you have to have a second phone number, because they've added phone number validation, and you only get credits if both are unique. So if we're going through here and creating accounts and we have any issues, then I'll share an API key with you so that you can use that, because I don't want you to have to put a credit card in and all that stuff for five cents. So let me know if you're creating a brand new account and you've got credits. Great. If you can't get to that spot, ideally I would love you to be able to go through this so that you can see the steps. If not, then I'll share an API key with you. So once you've signed up, you go to API keys, you'll create a new secret key, and then you'll want to copy and save this somewhere safe, because you'll never see it again. But it's okay. You can always delete it and create a new one, so it's not a big deal. Green check marks, whenever you get to this phase and you've got your API key. We'll move on to the application setup.

So what are we going to do here? We're going to create a documentation application with a chatbot that only answers questions from the information that we've provided it. So the base of this application uses a starter kit from Vercel that includes Next.js, Tailwind, OpenAI, LangChain, and the Vercel AI SDK. And then to that I've added the MongoDB package. So you can see that starter here, but you don't need to go there. It's just for information. The actual application is on the next page. So this is where the GitHub repo is. So you can git clone this, cd into that directory, and then do an npm install. However you like to normally do this, I'm going to go through these same steps. I'm going to clone the repo and do an npm install. So over here in the GitHub repo, this is the way I like to do it. I just go here and grab the HTTPS URL, and then where is my VS Code? I've got too many windows open. There it is. In VS Code, I go over to source control, clone a repo, and then choose where I want to put it. I'll just put it on my desktop and select open. I've got the repo here. Open up the terminal and run npm install. Oh, yes, sure, I would love to update right now in the middle of a workshop. Green check marks again. Whenever you've completed this, looks like many of you have completed it before me. That's great. All right, npm install. Permission denied when cloning. Okay, so that's going to be a local issue. With permission denied, where are you cloning it to? Assuming you're probably on a Mac. Just be sure that you have permission to write to whatever directory you're trying to clone into. Do that, and be sure you're in your user directory on Mac.

8. Configuring Application and API Keys

Short description:

Create a blank folder and clone the repository. Configure the application by renaming the .env.example file to .env and adding your OpenAI API key and MongoDB Atlas connection string. Retrieve the Atlas connection string from the connect button under drivers. Copy the string with your user ID and password, and paste it into the .env file. Save and rename the file to .env. If there are any issues, the API key will be shared.

Create a blank folder there or something and clone into there. We'll give this another couple of minutes and see if we can help.

Let's go ahead and configure the application. If you could clear your check marks. So there's a .env.example file that you'll find in that repo. We need to rename that to .env and add in some keys here. So you've got your OpenAI API key that you just got and your MongoDB Atlas connection string.

So to get the Atlas connection string, that is going to be under connect and then drivers, and then this is what it looks like. So let me see. Let's go. I'll walk you through that on my page. So here I'm under overview. You can find it from overview. You can also find it under databases. Either way. So let's go to overview, and this connect button is what you're looking for. So connect. These all basically do the same thing. But if we go to drivers, then this string right here is what you want. Everyone's string is going to be a little bit different. It contains your user ID and then a place for your password. So let me copy this and then let's go back over to VS Code. And then the .env. So I'm going to paste that right here. And some strange formatting. What's going on in my VS Code? It's weird. All right. So MongoDB is my username. And then my password, I think I set it to MongoDB as well. And then, you know, yours is going to be different with a different cluster, etc. But just be sure that you have your username there and your password there. And that is good to go. I'm not worried about you knowing this because you're not on my network. So you won't have the same IP address. So you won't be able to access it. And then the OpenAI API key. Let me grab that. And put that here as well. And then save the file. Actually, we need to rename it. So let me go over here and rename that and just make it .env. There we go. Now the formatting looks better. That's what it was, because it was a .example file. And then, again, like I said, if anyone has issues on the next stage, then I'll share this API key with you. Let me see. Go back. Make sure. Okay. So .env.example.
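A small guard like the following can catch a forgotten key before the app starts. This is a sketch, not part of the workshop repo, and the variable names `OPENAI_API_KEY` and `MONGODB_ATLAS_URI` are assumptions; use whatever names appear in the repo's .env.example file.

```typescript
// Assumed variable names; check the repo's .env.example for the real ones.
const REQUIRED_KEYS = ["OPENAI_API_KEY", "MONGODB_ATLAS_URI"] as const;

type Env = Record<string, string | undefined>;

// Returns the names of required variables that are missing or blank.
function missingEnvKeys(env: Env): string[] {
  return REQUIRED_KEYS.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}

// Example with a made-up, incomplete configuration.
const missing = missingEnvKeys({ OPENAI_API_KEY: "sk-example" });
// missing is ["MONGODB_ATLAS_URI"]
```

In a Node.js or Next.js app the real call would be `missingEnvKeys(process.env)`, ideally at startup so a missing key fails fast with a clear message.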

9. Testing the App and Setting Up OpenAI API Key

Short description:

Ensure your connection string is correctly entered in the .env file. Test the app by running 'npm run dev' and open it on localhost:3000. Verify the connection to OpenAI and the default LLM. Ask a question like you would in ChatGPT. If you encounter any errors, check the console in VS Code. If you exceed your quota, enter the provided OpenAI API key in the .env file. Restart the server and try again. The main goal is to understand how to obtain and set up the OpenAI API key.

Yeah. Into your connection string. And then rename it to .env. Okay. Let me know if anyone is having any issues with that. Finding your connection string, et cetera, should be good.

Next phase, we're going to test the app. This is where we'll find out if there are any issues. Oh, yeah, one other thing. I'll go back here. In the connection string that you got from MongoDB, be sure that you replace the brackets as well. Do not leave the brackets in. It should just be the username, a colon, and then your password. So double-check that. All right. Let's move on to testing the app. Pretty simple: npm run dev. So let's go back to VS Code, to the terminal, and run it. You should be able to open it up on localhost:3000. At this point, we should have a connection to OpenAI, to the default LLM, and you should be able to ask questions basically just like in ChatGPT. So this is what you should see at the bottom: Say something. I'm gonna zoom in a bit, and then let's just say hi and see what happens. "Hello, how can I assist you today?" So we're getting a connection here to ChatGPT. This is where we're probably gonna run into an error somewhere, so let me know who is having problems. If I go back into VS Code and check my console, this is where you'll see any errors, any issues. If the reply doesn't come back, go back to your terminal, and you should see some sort of an error there. "You exceeded your quota." Yep, insufficient quota. Okay, so everyone's quota has run out. All right. So let's see, where is that key? Okay, so enter that as your OpenAI API key in your .env file. If you're having a quota issue, you can use that, and it should work. And yes, if you're changing an environment variable, go ahead and kill the server and restart it, and then try again. Awesome, glad it's working for everybody. The main point was being able to see, in OpenAI, where to get the key and how to set it up and all of that, so at least you were able to go through that.
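A quick way to catch the "brackets left in" mistake is to scan the connection string for leftover placeholders before starting the app. The following is a minimal sketch, assuming the placeholder is wrapped in angle brackets (as in `<password>`); the helper name `findPlaceholders` and the example URIs are made up for illustration and are not part of the workshop repo.

```typescript
// Hypothetical helper (not from the workshop repo): returns every
// angle-bracket placeholder still present in a connection string.
function findPlaceholders(uri: string): string[] {
  // Match anything wrapped in angle brackets, e.g. <password>.
  return uri.match(/<[^<>]*>/g) ?? [];
}

// Made-up example values; your cluster host and credentials will differ.
const templateUri =
  "mongodb+srv://myUser:<password>@cluster0.example.mongodb.net/";
const filledUri =
  "mongodb+srv://myUser:myPassword@cluster0.example.mongodb.net/";

// An empty result means no placeholders were left behind.
const leftoverInTemplate = findPlaceholders(templateUri);
const leftoverInFilled = findPlaceholders(filledUri);
```

If the returned array is not empty, the string still contains a placeholder and the connection will most likely fail to authenticate.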

10. Overview of MongoDB Atlas Search and Vector Search

Short description:

MongoDB Atlas Search is a full-text search engine built directly into MongoDB Atlas. It provides features like keyword searching, scoring, language support, autocomplete, and highlighting. Like Elasticsearch and Solr, Atlas Search is built on Lucene, but it runs directly on the database and is integrated into the MongoDB Query API, which eliminates the need for syncing. It also supports vector search, which allows for semantic search based on the meaning of the data. We'll be using cosine similarity, which works well for textual search. To create vectors, we'll vectorize our documents and create embeddings. Let's explore the package.json file in our cloned repo to see the dependencies.

Some people were even having issues the last time I ran this workshop: they were trying to add their credit cards, and there was a $5 minimum or something like that. I've never run into that before, so it's kind of strange. I definitely didn't want to put all of you through that for a few cents. I'll be sure that MongoDB reimburses me. It doesn't look like anyone's having any issues. If you are, please let me know. What I'm gonna do for now is move on to the next step, which is more of an explanation, and which I've already sort of gone through. So let me just give you a brief overview of MongoDB Atlas Search. As a whole, it is a full-text search engine, similar to Elasticsearch or other third-party search engines that you can bolt on to your existing database, but this is built right into MongoDB Atlas. It's directly on your database; it's not an extra add-on. It helps us with full-text search: keyword searching, scoring, a ton of language support, autocomplete, highlighting, all the kinds of things that you would expect in a search engine.

And then why right there in Atlas? Why is it a great thing? Well, again, we talked about Elasticsearch; Solr is another one. Those are built on Lucene, and normally you would have Elasticsearch or something similar on the side, bolted on to your existing database, and you'd have to worry about syncing back and forth, CI/CD pipelines, etc. Atlas Search is also built on Lucene, but it's directly on the database, so you don't have to worry about any of the back-and-forth syncing. It's much faster as well, because there isn't that hop in between. And it integrates very well into the MongoDB Query API: it's just a $search operator to perform those search operations. So how does that work? You can do simple text search, full-text search. And of course you have to create indexes to tokenize the text and make these searches faster. That is the basis of what we're gonna use today, but what is built on top of that is vector search. Vector search is not just a keyword search, it's a semantic search, so it allows us to see meaning in data. There's an example here: if you search for how to make a cake, a keyword search would look for those words, how to make a cake. A semantic search would actually relate that to multiple different things. It could be how to bake a cake or how to make a pie, because those things are similar to each other. Even though those aren't the exact words that you typed in, it still could return those results because they are so similar. So that's the difference between keyword search and semantic search. It's all about meaning.
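To make the $search operator a little more concrete, here is a rough sketch of what a keyword search stage can look like inside an aggregation pipeline. The index name "default", the field path "text", and the limit of 5 are assumptions chosen for illustration; in a real app the pipeline would be passed to a collection's aggregate method.

```typescript
// A keyword (full-text) search stage. The index name and field path
// below are illustrative assumptions, not values from the workshop.
const keywordSearchStage = {
  $search: {
    index: "default",
    text: {
      query: "how to make a cake", // the words the user typed
      path: "text",                // the document field to search in
    },
  },
};

// The stage is the first step of an aggregation pipeline; here we also
// cap the number of results with an ordinary $limit stage.
const keywordSearchPipeline = [keywordSearchStage, { $limit: 5 }];
```

A stage like this matches on the words themselves, which is exactly the limitation that semantic (vector) search is meant to address.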

And so, again, I'm kinda skipping over a lot of this because I've already covered it, and this part is meant to be self-paced, for anyone going through it without me here talking. So this goes into vectors: why do we need vectors? This is a graph showing that we can find vectors that are close to each other. There are actually different types of vector searches. Euclidean search is one type, and it finds the vectors that are closest to the search vector that you're looking for. So for example, one, two, three, four, and five are the vectors that we have, the embeddings that we've created in our database. The little search icon here is the user's query. The user queries something, and the closest things are one and four, so that's what gets returned in a vector search using the Euclidean search method. The cosine search method is a little different because it compares direction rather than straight-line distance, so it doesn't include anything negative or behind the query. One and three are in this top right area, which means they're more similar to each other than four is. Even though four is physically closer, four is not in the same category, so a cosine search would actually return one and three. So there are different types of searches and different ways to find vectors, and it all depends on your use case. For today, we're gonna be using cosine because it works well for the type of textual search that we're doing. So how do we create vectors? We kind of talked about this: we have to vectorize and create those embeddings. And that is what we're gonna do right now. We are going to look in the repo that we cloned, and let's take a look at the package.json. We can see the different dependencies that we have here.
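The difference between the two search methods is easier to see with numbers. Below is a small sketch, using made-up two-dimensional vectors, that computes both Euclidean distance and cosine similarity. The function and variable names are this example's own; real embeddings have far more dimensions, but the arithmetic is the same.

```typescript
// Dot product of two equal-length vectors.
function dotProduct(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

// Straight-line distance between two points: smaller means closer.
function euclideanDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Cosine of the angle between two vectors: 1 means same direction.
function cosineSimilarity(a: number[], b: number[]): number {
  const denom = Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b));
  if (denom === 0) {
    throw new Error("cosine similarity is undefined for a zero vector");
  }
  return dotProduct(a, b) / denom;
}

// Made-up vectors: one points the same way as the query but is far
// away, the other is nearby but points in a different direction.
const queryVector = [2, 2];
const sameDirectionButFar = [4, 4];
const nearbyButDifferentDirection = [2, 1];
```

With these values, Euclidean distance ranks the nearby vector first, while cosine similarity ranks the same-direction vector first, which mirrors the "four is physically closer, but one and three are more similar" point from the graph.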

11. Setting Up Embeddings and MongoDB Atlas

Short description:

We have dev, build, start, and embed scripts. The create-embeddings.mjs Node app lives in the root directory. It uses the LangChain text splitter to split documents into chunks. The LangChain MongoDB Atlas vector search method and LangChain OpenAI embeddings method are utilized. The MongoDB client is set up to connect to Atlas and look for a database named docs and a collection named embeddings. The demo uses fake docs from a fake technology company. The documents are read, their names logged, and then split into chunks using the recursive character text splitter from LangChain. The resulting text is used to create the embeddings through the MongoDB Atlas vector search and OpenAI embeddings call.

We kind of talked about those, but the scripts, we have a dev script, a build script, a start script, and an embed script. So what this is gonna do is it's gonna run a node app that I've created called create-embeddings.mjs. So if we go here in the root directory, we'll find that file.

So this file is basically just a Node.js app. We are bringing in the file system module from Node.js. We're using the LangChain text splitter because we need to split our documents into chunks. We can't create an embedding on an entire document because it's just too big, so we're gonna split our documents into chunks. We're going to use the LangChain MongoDB Atlas vector search method, which is a direct integration between LangChain and MongoDB. We're also gonna use the LangChain OpenAI embeddings method. And we'll bring in the MongoDB client, and dotenv to bring in our secrets.

So the first thing we do here is set up our MongoDB client. This is going to allow us to connect to Atlas using the Atlas connection string that we've already put in our .env file. We're going to look for a database named docs and a collection named embeddings. Now, those are not in any of your databases so far; we don't have anything there yet. But they are going to be created for us automatically when we run this file. So we're going to have a database named docs and a collection named embeddings, and we are then going to use that collection a little bit further down.

So in the project here we have a folder called underscore workshop assets, and in there there are some fake docs for a fake technology company. We're gonna use those for this demo. So we're gonna grab those files, console log the file names, and loop through the files. Then we're gonna grab the contents of each file and console log that we're about to vectorize that file. All right, we're gonna send that through our recursive character text splitter from LangChain. What that does is, again, take a long document and split it into small chunks that we can grab context from. And this is something that is use case specific. You can define the chunk size and the chunk overlap. The more the chunks overlap, the more context you're gonna have. Too much overlap could be a bad thing. Chunks that are too small could be a bad thing. Chunks that are too large could be a bad thing. It all depends on the use case. There's no magic number here; you have to kind of play around with these depending on your use case. Then we're gonna grab the output, or actually the text from those documents, to create the embeddings. This is going to use the MongoDB Atlas vector search. We're gonna create an OpenAI embeddings call to OpenAI, and we're gonna tell it which collection, index name, text key, and embedding key to use. We're gonna look at what all this means in just a bit.
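To get a feel for what chunk size and chunk overlap do, here is a deliberately simplified sketch. It is not LangChain's recursive character text splitter (which also tries to break on separators such as paragraphs and sentences); it just slides a fixed-size window over the text. The function name and parameters are this example's own.

```typescript
// Simplified stand-in for a text splitter: fixed-size character windows
// where each chunk repeats the last `chunkOverlap` characters of the
// previous one.
function splitWithOverlap(
  text: string,
  chunkSize: number,
  chunkOverlap: number
): string[] {
  if (chunkSize <= 0) {
    throw new Error("chunkSize must be positive");
  }
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be at least 0 and less than chunkSize");
  }
  const chunks: string[] = [];
  const step = chunkSize - chunkOverlap; // how far the window advances
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) {
      break; // this chunk already reached the end of the text
    }
  }
  return chunks;
}

// Ten characters, windows of four, overlapping by two.
const demoChunks = splitWithOverlap("abcdefghij", 4, 2);
```

Raising the overlap produces more chunks that share more text with their neighbours, which is the trade-off mentioned above: more shared context, but also more embeddings to create and store.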

12. Running the Embedding Script and Troubleshooting

Short description:

This script takes our documents, cuts them into chunks, sends them to OpenAI, and stores the returned embeddings in MongoDB. Run the script once using 'npm run embed' in the terminal. If you encounter an error, it's likely due to an authentication issue. It's recommended to use your own OpenAI API key. If the script is not working for anyone, it may be due to hitting a limit. Check for any MongoDB connection issues, such as a missing hostname, domain name, or TLD in the URI. Verify your MongoDB database and connection settings. If necessary, try a different connection string.

And then we just console log that we're closing the connection. So basically, in a nutshell, this takes our documents, cuts them up into chunks, and sends them to OpenAI. OpenAI sends back the embeddings, and we store them in MongoDB. I guess I could have started with that. That's a much easier explanation.

Okay, so what we're going to do is run this script. In your terminal, you should be able to run it. Now, only run this once, because this is not a fully tested app. If you run it more than once, you're just gonna create a bunch of extra data in your database that you don't need. So npm run embed is what you want to run. Let me go do that: npm run embed. All right, so it grabbed all of our documents. It's vectorizing, looping through each one. It looks like it worked for Tomas. Normally it doesn't take this long. I'm wondering if all of us using the same API key might be a bad thing. Is it not working for anyone else? Yeah, I got an issue. It threw an error. Okay: error, bad auth, auth failed. Well, at least we're all getting the same error. Let's see what this means. But it worked for Tomas, and it worked for Kira. Are you all using my OpenAI key or your own? I'm curious. Using mine, okay. Well, it worked for you, that's great. It might just be a waiting game because we might have hit some limit. Okay, let's see, so what happened here? Server selection timed out. So why did it time out? My error here looks like a MongoDB connection issue. "Mongo URI must include hostname, domain name, and TLD." Yeah, this looks like a MongoDB error that I'm having here. Let me double check. First of all, I'm gonna go to MongoDB, go to Database, and see if anything happened. Looks like we hit something; we got some activity. I'm gonna go to Browse Collections and see if anything got written. Yeah, I have no data. So nothing got written, but we did try to make a connection. So let me check Database Access: MongoDB, MongoDB, I'm pretty sure I have the right password. And let me make sure of my IP address. I think my IP address changed.
Or did it not? No, that's fine. All right, so let me go back to Databases, Connect, and I'm gonna grab this simpler connection string and see if it's any different, see if it works.

13. Troubleshooting Database Connection

Short description:

Go to your .env file and replace the password. Check the IP address and database connection. Delete and re-add the user under Database Access. Wait for updates to deploy and try again. Allow access from anywhere if necessary. Check the database for the collections and verify the information. The database contains split documents with embeddings from OpenAI. Each document has an ID, text field, embedding field, and location information. Ensure others have resolved their issues before proceeding.

Go over to my .env file and replace that, and make sure I put in my password. Let's save that, go back to the terminal, and try again. Yeah, see, I'm getting the response from OpenAI, but I'm not connecting to my database. That's where I'm getting the timeout. That's odd. The joys of live coding: there's always an issue. I'm gonna double check something on my end here. Yeah, I got that same error. Bonus points to whoever solves it first. Yeah, I checked my IP address, and it is my current one, so that should be good. All right, I'm assuming others are still having issues as well. What I'm gonna do is go to Database Access, delete the user, and add the user back: MongoDB, MongoDB. I want this user to have read and write access to any database. Add. So that should be good. And then Network Access: let's just delete that entry and add my current IP address. Yeah, that's the same one. Now wait for these updates to deploy; it shouldn't take more than a few seconds. And if I go to my database, see, I'm trying to make a connection. Just to make sure, Browse Collections: yep, there's still nothing there. Okay, this should all still be correct. Still not working. Are others still having the same issue? I'm curious. Yeah, I could add 0.0.0.0, but it shouldn't be necessary, because my IP address is on there, so it should allow me to do this right now. Yeah, same issue. All right, I'll just go back and give it a shot: allow access from anywhere. If that works, then I'm gonna be upset, because that shouldn't matter. I'll wait till this says active. What it's doing here is this: whenever you set up one of these clusters, it's actually a three-node cluster. There are three servers running with failover; one is the primary, and two are the backups. So it's propagating this change to all three of those servers. All right, active.
Let's go back, and this should all be good. Let's see, npm run embed. Come on. Okay, it worked. Gene, go ahead and give it a shot. That should not have caused this issue. I'm upset. Okay, so now, if I go to Database, under Browse Collections, we should see our information. Awesome, so I have a docs database and an embeddings collection, and in here, I have all of the split documents with their embeddings that came from OpenAI. So I have a few fields here. We have an underscore id, which is a default id that gets added to every document in a MongoDB collection. We have the text field, which is the original text that we split into chunks and sent to OpenAI, and then we have the embedding field, which is the array of 1,536 numbers that came back to us from OpenAI. If I expand that, we'll see this is what a vector embedding looks like: it's 1,536 numbers in an array. And then loc is the location of where this specific chunk came from in the file, so we have the lines from and to. All right, let me give this one more minute. If anyone else is still having issues, let us know.
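The shape of the stored documents described above can be written down as a type, which also makes it easy to sanity-check a document. This is a sketch based only on the fields mentioned in the session; the exact nesting of loc (lines with from and to), the string type for the id, and the sample values are assumptions for illustration.

```typescript
// Assumed shape of one stored chunk, based on the fields described:
// _id, text, embedding, and loc with the source line range.
interface EmbeddedChunk {
  _id: string; // an ObjectId in a real collection; a string here for simplicity
  text: string; // the original chunk of text that was embedded
  embedding: number[]; // the vector returned by the embedding model
  loc: { lines: { from: number; to: number } };
}

// The embedding model used in the workshop returns 1,536 numbers.
const EXPECTED_DIMENSIONS = 1536;

// Basic sanity check for a stored chunk.
function isValidChunk(chunk: EmbeddedChunk): boolean {
  return (
    chunk.text.length > 0 &&
    chunk.embedding.length === EXPECTED_DIMENSIONS &&
    chunk.embedding.every((n) => isFinite(n)) &&
    chunk.loc.lines.from <= chunk.loc.lines.to
  );
}

// Helper to build a placeholder vector of a given length.
function zeros(n: number): number[] {
  const out: number[] = [];
  for (let i = 0; i < n; i++) {
    out.push(0);
  }
  return out;
}

// Made-up sample document; the real text and numbers come from your data.
const sampleChunk: EmbeddedChunk = {
  _id: "example-id",
  text: "Example chunk of documentation text.",
  embedding: zeros(EXPECTED_DIMENSIONS),
  loc: { lines: { from: 1, to: 3 } },
};
```

A check like this is handy after running the embed script: if the embedding length does not match the dimensions configured in the search index, queries against that index will not behave as expected.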

14. Creating the Vector Search Index in MongoDB

Short description:

We successfully created the vector search index in MongoDB. To do this, go to the database deployments tab and click on Atlas Search. Choose the JSON editor and enter the necessary configuration for the index. Specify the mappings and fields, including the name and dimensions of the embeddings. Select the similarity type as cosine and the search type as knnVector. Finally, define the path to the embeddings and create the index. Keep in mind that the UI may have changed, so adapt accordingly.

It works now, awesome. All right, well, I'm glad it's working for everyone. Let's go back up here. So we ran that, we vectorized, awesome, awesome, looks like it's working. Okay, and then we checked it out, we looked at the embeddings, we saw what it actually did, the results, awesome.

All right, so the next thing that we need to do is create our vector search index in MongoDB. This is the key thing that is going to allow us to search those vectors in MongoDB. On your database deployments tab, you should see at the bottom right here, Atlas Search and then Create an Index. There are a few different ways to do this, but this is kind of the easiest way. You'll click Create Index, and then you'll click Create Search Index on the next page. Then choose the JSON Editor and click Next, and then you'll need this. You can copy it from this page here; this is the JSON configuration for this index. I'm gonna walk through this myself as well, and I'll explain it. So under Databases, bottom right, Create an Index, and then click the button Create Search Index. I'm going to choose Atlas Vector Search. So the UI has changed again; I'm gonna have to update my workshop. All right, Atlas Vector Search, JSON Editor, that's what you wanna choose. Next. For the index name, I think I have it set to default, because that's what it used to be; it's different now. So change your index name to default. If it's something else, it's not an issue; we can change it somewhere else, and I'll show you where. But I'm gonna name mine default. On the left side, we're gonna choose the docs database and then the embeddings collection. That's where we want to create this index. And then I'm just gonna highlight this whole thing here and hit Paste. So what we're looking at is our mappings. Someone is asking me to show the path to the JSON Editor again. Yeah, so we went to Databases, Create an Index, and then chose Vector Search, JSON Editor. I'll go back in just a second. In this JSON schema that we're putting in here, we have our mappings, we're telling it that it's going to be dynamic, and then this is the important part: the fields.
All right, so remember, in this embeddings collection we just created a bunch of documents, and they have a field called embedding. That is where our vector embeddings live, and so we're telling the index where that is as well. This right here could be whatever you called it. It could be document embeddings, or whatever you called that field. That's what you would put here. And then we're telling it the dimensions. We used the OpenAI encoder, and it has 1,536 dimensions, so depending on the encoder that you used, you might need to change this. For similarity, we're going to use cosine. Remember, we talked about how there are different types: there's Euclidean, there's cosine, there are several others. And then for type, it's knnVector. kNN stands for k-nearest neighbors, and that is the type of vector search that we're going to use. So I got all that, selected that, and chose the type. "Please define the type property in your index structure." Let's see. I thought I did. All right. The joys of when they change the user interface without telling you. Type is vector, and path, I guess the path is going to be embeddings. All right, this is going to be totally off the cuff, different than what it was the last time I did this.
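The index configuration pasted into the JSON Editor is not reproduced in the session itself, so here is a sketch reconstructed from the description: dynamic mappings, a field named embedding, 1,536 dimensions, cosine similarity, and the knnVector type. It is written as a TypeScript object and serialized to JSON; treat it as an approximation to compare against the workshop walkthrough, not as the definitive configuration.

```typescript
// Index definition reconstructed from the spoken description. The field
// key "embedding" must match the name of the field that holds the
// vectors in your documents.
const vectorIndexDefinition = {
  mappings: {
    dynamic: true,
    fields: {
      embedding: {
        type: "knnVector",
        dimensions: 1536, // must match the embedding model's output size
        similarity: "cosine",
      },
    },
  },
};

// The JSON text you would paste into the editor.
const vectorIndexJson: string = JSON.stringify(vectorIndexDefinition, null, 2);
```

If you used a different encoder or a different field name, the dimensions value and the field key are the two places to change.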

15. Creating the Search Index

Short description:

We're going to create a search index using the vector search and vector embeddings types. Let's give it a try and see if it works. If anyone needs assistance, I'll provide the necessary information. Once the search index is created, we can proceed with the workshop. Any issues so far?

Embeddings, number of dimensions, 1536. Similarity, we're going to use cosine. All right, so if you go back and choose the conventional JSON Editor, you'll be able to click Next, yep. Yep, I think you're right. I think either way that you do it, it should work just fine. They've changed this, and I'll have to give them some feedback on it. Okay, type is going to be vector search, and then type again is vector embeddings. All right, let's give this a shot and see if it works. Let me copy this, and I'm going to paste it in the chat, in case anybody needs it. And let me double check: embeddings, that's what the field was supposed to be called, right? No, the field has no S at the end: embedding. I got that wrong. Let me go back. The path is embedding. All right, Next. Create the search index, and close that. It shouldn't take more than maybe 30 seconds or so for this index to be created. While it's doing that, let me check in with everybody. Again, I'll run through this: Create Search Index. The last time I did this workshop, there was not a third option; there were just these two options. And so with the JSON Editor, what's in the workshop walkthrough would probably still work, I would assume. The new editor is where I had to fill it out a little differently. Anyone having any issues at this point?

16. Updating Vector Search and Injecting Functionality

Short description:

We're updating the chat route and creating a new vector search route. In the vector search route, we're using the MongoDB Atlas vector search method to create embeddings for the user query and compare them with the vectors in the database. The closest results are returned based on maximal marginal relevance. In the chat route, we're injecting the vector search functionality. The last message from the user is retrieved from the array of messages.

Anyone having any issues at this point? All right, looks like our search index is created. So let's go back and check out what's going on here. In the default starter app that we got from Next.js, there is a route under api slash vector search slash route.ts. And this is what we used when we ran npm run dev and we were talking directly to the OpenAI LLM. This is how it's set up. It's using OpenAI embeddings, and it is using the text-embedding-ada-002 model. Again, as we talked about before, when the user submits a query, an embedding has to be created for that as well, so that we have a vector for the user's query, we have our vector embeddings, and we can compare the two using vector search. So it sends that off, does all of that, and then returns the answer. But what we want to do is inject some extra stuff before we send it to the LLM. And that is where RAG comes into play. That is how we augment the capabilities: we give it extra information, extra context from our proprietary data. So we're gonna create a new route. Actually, we're gonna update this route and then create a new route. So, under API, I think we're up here. Yeah, so the first one, under Add a Vector Search Route, we can go ahead and copy this text block, api vector search route.ts. We go into VS Code. Under app, api, we currently have the generic chat route. We're gonna create a new route, so I'm gonna say vector search slash route.ts. And it put it in the wrong spot, didn't it? Yeah, I want the vector search folder to be in the api folder. So we have a chat route, and we have a vector search route. And then I'm just gonna paste in that code block that I copied. So again, this vector search route is going to use the MongoDB Atlas vector search method from LangChain, and it is going to create the embedding for the user query.
It's going to compare the vector from the user's query with the vectors that are already in the database, and it's gonna return the closest results. And how we tell it what the closest results should be is right here. We're using something called maximal marginal relevance. What we're able to tell it is how many results it should find, the closest ones, but then return only a certain number of those, the top portion of those results. So again, we're able to play around with these numbers, depending on the use case, to make our queries more accurate. Here we're saying fetch 20 results, but only return a small part of that. So we're going to fetch 20, and we're gonna return 2. It's gonna return the top 2, basically, out of that larger sample set. So we're gonna save that. And then go back into here, and now we're gonna modify the chat route. This is where we're going to implement the magic. This is where that all happens. So go ahead and copy this code block here. I'll go back into VS Code, and let's open up the default chat route. This is what came with the starter from Vercel. In here is where we would normally just interact with OpenAI, and we want to inject some extra stuff. I'm gonna highlight all of this and just paste in that new code block. What we've added to it is this vector search right here. So we're gonna call the new vector search route that we just created from within this route. We are grabbing the last message from the user. As the messages come in and out of this chat, they're stored in an array. The last message is gonna be at the end of that array.
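For readers curious about what maximal marginal relevance does with the fetched results, here is a self-contained sketch of the general technique. It is not LangChain's implementation: the function name, the parameter names (k for how many to return, lambda for the relevance versus diversity balance), and the tiny two-dimensional vectors are all this example's own. The candidates array plays the role of the larger fetched set (the "fetch 20" step), and k is how many are finally returned.

```typescript
interface Candidate {
  id: string;
  vector: number[];
}

// Cosine similarity between two equal-length, non-zero vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Greedy maximal marginal relevance: repeatedly pick the candidate that
// is most relevant to the query while being least similar to what has
// already been picked. lambda = 1 means pure relevance; lower values
// push harder for diversity.
function mmrSelect(
  query: number[],
  candidates: Candidate[],
  k: number,
  lambda: number
): Candidate[] {
  const remaining = candidates.slice();
  const selected: Candidate[] = [];
  while (selected.length < k && remaining.length > 0) {
    let bestIndex = 0;
    let bestScore = -Infinity;
    for (let i = 0; i < remaining.length; i++) {
      const relevance = cosine(query, remaining[i].vector);
      // Highest similarity to anything already selected (0 if none yet).
      let redundancy = selected.length === 0 ? 0 : -Infinity;
      for (const s of selected) {
        redundancy = Math.max(redundancy, cosine(remaining[i].vector, s.vector));
      }
      const score = lambda * relevance - (1 - lambda) * redundancy;
      if (score > bestScore) {
        bestScore = score;
        bestIndex = i;
      }
    }
    selected.push(remaining.splice(bestIndex, 1)[0]);
  }
  return selected;
}

// Made-up data: "a" and "b" are near-duplicates, "c" is different.
const mmrQuery = [1, 0];
const fetchedCandidates: Candidate[] = [
  { id: "a", vector: [1, 0.1] },
  { id: "b", vector: [1, 0.11] },
  { id: "c", vector: [0.7, -0.7] },
];
const diverseTopTwo = mmrSelect(mmrQuery, fetchedCandidates, 2, 0.5);
const relevantTopTwo = mmrSelect(mmrQuery, fetchedCandidates, 2, 1);
```

With a lambda of 0.5 the near-duplicate "b" is skipped in favour of "c", while a lambda of 1 simply returns the two most similar vectors. That is the kind of tuning referred to above when playing around with the numbers for a given use case.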

17. Modifying the Template and Testing

Short description:

The last message is gonna be at the end of that array. We're gonna send that as context to vector search to return our results. Here's the good stuff, this template. Instead of just sending the user's query, we're gonna send this whole thing. We inject our vector search results as context. We've modified it so that we now have vector search capabilities. Let's run npm run dev again. It should only answer questions from our specific documentation. We got an error, it looks like an index error. Let's delete this index and create a new search index using the regular JSON editor.

The last message is gonna be at the end of that array. So we're gonna just pluck out that last message, which we know was from the user, and we're gonna send that to vector search to return our results, the nearest results. Once we get that, here's the good stuff: this template. This entire thing here is the template that we're going to actually send to OpenAI. Instead of just sending the user's query, we're gonna send this whole thing. So here's where we can modify this to be whatever we want it to be. Here I'm saying: you're an enthusiastic Fancy Widget representative who loves to help people. Fancy Widget is the name of the made-up JavaScript library that I had ChatGPT help me create docs for. Given the following sections from the Fancy Widget documentation, answer the question using only that information, output in markdown format. If you're unsure what the answer is, if it's not explicitly written in the documentation, say, sorry, I don't know how to help you with that. And then for context, this is where we inject our vector search results. So this is the context that we're giving the LLM. And then we're also including the user's original question. That then gets put back into our array of messages. The rest of this is unmodified; this is what came with the Vercel starter kit. So we're chatting with OpenAI. This is where we can specify the model that we want to use. And streaming is set to true so that we don't have to wait on the whole response to come back; it streams in to the user. All right, so we saved that. We've modified it so that we now have vector search capabilities. Let me just pop over here real quick to make sure. Yep, it's time to test it out. So go ahead and run npm run dev again. And this time it should only answer questions from our specific documentation.
And if we go over into the folders here, under workshop assets, there's a questions.txt file with some sample questions that should work with the bot; any other questions should not work with the bot. Also, again, those fake documents are in here if you want to take a look at them. They're all AI generated. So let's run npm run dev. And then let me just copy this top question here, go back to localhost:3000, and refresh. All right, let me ask that question. I'm gonna ask it: what is the last change made to the Fancy Widget library? It should know the answer to this. I don't know. Did I get an error too? Somebody said they got an error. All right. Yes, I got an error as well. The joys of live coding. Okay, so this is gonna be a search index issue. Syntax error, unexpected token, yeah. Lucene vector index, unknown error, cluster time. Okay. Did it work for anyone? This looks like an index error. So let me go back here, and I'm gonna do it the other way. I'm gonna go back and delete this index, and then I'm gonna create a new search index using the regular JSON Editor.
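The prompt template step from this section, where the last user message is replaced by a template containing the vector search results and the original question, could be sketched roughly as follows. The function name, the message type, and the exact wording of the template are this example's own paraphrase of what was described, not the code from the workshop repo.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Replaces the last message with a template that wraps the user's
// question together with the retrieved documentation sections.
function buildAugmentedMessages(
  messages: ChatMessage[],
  contextSections: string
): ChatMessage[] {
  if (messages.length === 0) {
    throw new Error("expected at least one message");
  }
  const lastMessage = messages[messages.length - 1];
  const question = lastMessage.content;
  // Paraphrase of the spoken template; adjust the wording to your product.
  const template = [
    "You are an enthusiastic Fancy Widget representative who loves to help people.",
    "Given the following sections from the Fancy Widget documentation,",
    "answer the question using only that information, in markdown format.",
    "If you are unsure and the answer is not explicitly written in the documentation,",
    'say "Sorry, I don\'t know how to help you with that."',
    "",
    "Context sections:",
    contextSections,
    "",
    "Question:",
    question,
  ].join("\n");
  const earlier = messages.slice(0, messages.length - 1);
  return earlier.concat([{ role: lastMessage.role, content: template }]);
}

// Made-up example input; in the app the context comes from vector search.
const exampleMessages: ChatMessage[] = [
  { role: "user", content: "How do I install the Fancy Widget library?" },
];
const exampleContext = "(vector search results would go here)";
const augmentedMessages = buildAugmentedMessages(exampleMessages, exampleContext);
```

Because the model only sees the template, the "answer using only that information" instruction is what keeps replies limited to the documentation, which is why a plain "hi" gets the fallback answer.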

18. Troubleshooting Search Index Creation

Short description:

The issue was in the search index creation. The new Atlas Vector Search JSON editor did not work, but the regular Atlas Search JSON editor works. You can have multiple indexes and embeddings. Set up an index for each type of search. The text key specifies the field for the original text, and the embedding key is where the vector embeddings are stored. The text-embedding-ada-002 model is used to create embeddings for user queries. Everything should be working now.

Default, docs, embeddings, and then... okay, it's still deleting the other one. Go back. Yeah, it's still deleting; I have to wait for this to delete. Okay, there we go. All right, let's create this with the regular JSON editor: default, put that in there, pick embeddings, then next and create. Give this just a second to create the index, and I'll try it again. Apologies. When they change things and your workshop doesn't get updated, things break.

All right, so it looks like we should be good to go there. Back in VS Code, I'm just going to kill the server and run it again. Go back and refresh. Still spinning. All right, should be good. Let me make sure... yep, no errors so far. Let's ask the question. There we go, okay. That was the issue. We got the answer, and it came from the documentation. If I were to say anything else, like "hi," it's going to say, "I don't know how to help with that," because the answer to "hi" is not in the docs. If I ask it another question, for instance, how do you install the Fancy Widget library, it should respond with some markdown, which it does. It tells me to npm install fancy widget with --save, which it got from the docs. So it should all be working now.

Again, the issue was in the search index creation. With this new Atlas Vector Search JSON editor, obviously I need to do a little research into what's needed, because what I created did not work. Under the regular Atlas Search JSON editor, the information in the walkthrough works to create your search index. So if you create the search index that way, it will work.

Now, I named this index default. That's another thing you have to be careful about. Let me go back into VS Code. If we go to our API route, the vector search route, you'll find the index name here. You can have multiple indexes, and you can have multiple embeddings from different places and different things; you have to set up an index for each different type of search. For this one, the default name is default, so I just left it. If yours is named anything else, you'd want to update that in this code as well.

The text key tells the vector search which field in the collection's documents holds the original text, so that it knows what to return. The embedding key is the field where the vector embeddings live. And text-embedding-ada-002 is the model that we used to create our original embeddings for our documentation. It's also the model being used right here to create embeddings for the user query. Okay, so everything should be working now. Let me know in the chat if anyone is still having issues. "Works for me now." Okay, awesome.
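
To make the relationship between those settings and the search index concrete, here is a small sketch. The `VectorSearchSettings` interface and `buildIndexDefinition` helper are hypothetical, the field names `text` and `embedding` are assumptions, and the `cosine` similarity is an assumed choice. The object it produces follows the `knnVector` field mapping used by the regular Atlas Search JSON editor as I understand it; the walkthrough's exact definition is not shown in the session, so treat this as an approximation.

```typescript
// Hypothetical settings object mirroring the options described above.
interface VectorSearchSettings {
  indexName: string;      // must match the index name shown in Atlas
  textKey: string;        // document field holding the original text
  embeddingKey: string;   // document field holding the vector
  embeddingModel: string; // model used for both documents and queries
}

const settings: VectorSearchSettings = {
  indexName: "default",
  textKey: "text",           // assumed field name
  embeddingKey: "embedding", // assumed field name
  embeddingModel: "text-embedding-ada-002",
};

// Build an Atlas Search style index definition with a knnVector field.
// The similarity function is an assumption, not taken from the workshop.
function buildIndexDefinition(s: VectorSearchSettings, dimensions: number) {
  return {
    mappings: {
      dynamic: true,
      fields: {
        [s.embeddingKey]: {
          type: "knnVector",
          dimensions,
          similarity: "cosine",
        },
      },
    },
  };
}

// JSON that could be pasted into the editor, for 1,536-dimension embeddings.
const indexJson = JSON.stringify(buildIndexDefinition(settings, 1536), null, 2);
```

Deriving the index field from `embeddingKey` keeps the two in step, which is the same kind of mismatch the index name warning above is about.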

19. Hands-on Learning and Prompt Engineering

Short description:

I'm glad it's working for everyone. I love hands-on learning and being able to see how things actually work. We create embeddings from OpenAI, compare them to existing embeddings, find the top results, and have the flexibility to do anything we want. This is where prompt engineering comes in. We intercept the user's query, inject context, and customize the returned results.

Again, sorry about the issues. I'll have to go back to the product team and get them to let me know when they change things next time. Okay, nice. I'm glad it's working for everyone. Again, I love hands-on; that's the way I love to learn. I don't learn much from slides. Being able to see how it actually works, you really can follow the whole flow: we created the embeddings with OpenAI and stored them somewhere. We had to create an embedding for the user query. We compare that to our existing embeddings. We find the top results. And then we can do anything we want. Let me go back to the regular chat route. Here is really where all of the secret sauce comes in, and it comes down to prompt engineering. That's what this is. You can tell it to do whatever you want. You're intercepting the user's query, you're injecting some extra context, and then you're having it return that context however you want. You can do whatever you want here with how that works.


20. Embedding, Querying, and Local Setup

Short description:

We convert user queries into embeddings to compare them in a multidimensional virtual space. Traditional keyword search is limited compared to vector search. The dimension value depends on the encoder used. The same embedding model should be used for embedding and querying. You can ask relevant questions based on the prompt. You can run an LLM and MongoDB locally, but the vector search functionality requires Atlas in the cloud.

There's a question: if we already have embeddings and the corresponding text, why convert the asked question to an embedding to retrieve the corresponding text? Can't it be a lookup? Oh, okay, yeah. So traditional search would just be keyword search, keyword for keyword. What we've done here is translate the text into lists of numbers, into embeddings. Let's say, for instance, we have a graph. You've got cats and dogs; they're up here. Humans are down here. And then you've got plants that are up here, food, et cetera. The person who's searching for plants is going to find all the plants up here and nothing else. So we convert their query into an embedding so that we can compare the embeddings to each other. We're putting all of our things into this virtual space that is multidimensional, 1,536 dimensions, so we also have to put the user's query into that same space to find what's closest to it. Traditional keyword search would just be a straight lookup if they're searching for a word. You can actually do some sort of similarity even through keyword search, but it's not as extensive as vector search. That's why we have to convert their query as well, so that everything is in the same space and can be compared.
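
The "same space" idea can be illustrated with a toy example. This sketch uses made-up three-dimensional vectors instead of real 1,536-dimension embeddings, and a brute-force cosine similarity scan instead of whatever Atlas does internally; the `StoredDoc` type, the sample values, and the function names are all hypothetical.

```typescript
type StoredDoc = { text: string; embedding: number[] };

// Cosine similarity: close to 1 means the vectors point the same way.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same number of dimensions");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every stored document against the query and keep the k closest.
function nearest(query: number[], stored: StoredDoc[], k: number): StoredDoc[] {
  return stored
    .map((doc) => ({ doc, score: cosineSimilarity(query, doc.embedding) }))
    .sort((left, right) => right.score - left.score)
    .slice(0, k)
    .map((entry) => entry.doc);
}

// Toy "space": animals cluster in one region, plants in another.
const toyDocs: StoredDoc[] = [
  { text: "cats", embedding: [0.9, 0.8, 0.1] },
  { text: "dogs", embedding: [0.8, 0.9, 0.1] },
  { text: "ferns", embedding: [0.1, 0.1, 0.9] },
];

// A query that has been embedded near the plant region.
const plantQuery = [0.0, 0.2, 1.0];
const closest = nearest(plantQuery, toyDocs, 1);
```

A plain lookup could only match the literal word; here the query never contains "ferns", yet it lands closest to it because both sit in the same region of the space.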

Let's see, let's go back. That is the end of this, so let's move on to some more questions. Why 1,536 as a dimension value and not another value? Okay, that's perfect. That is not an arbitrary number that I just came up with. It's the value that OpenAI came up with for their specific encoder. So it depends on your use case. If you're encoding audio, video, images, or text for search, there are tons of different encoders out there. OpenAI has a bunch, and Hugging Face has a whole bunch as well. You can find an encoder that is specific to your data type and use that, and the dimension value is going to be given to you by your encoder. There are encoders in the hundreds and encoders in the thousands. It just depends on the encoder you use. And you have to specify that number to vector search so that it knows how many dimensions it's searching across.

Question two: should we use the same LLM model for indexing and querying? Yeah, so for embedding and querying, they do need to use the same model, because you're embedding your proprietary information, your data, with a certain model into a certain virtual space. The query that comes in has to go into that same virtual space, so you have to use the same embedding model on both sides for that to work. And then of course in your index, you have to tell it what to expect so that it knows how to do the search.
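
One way to guard against mixing models is a small check before any search runs. The sketch below is an assumption-laden illustration: `EmbeddingSpace`, `spaceFor`, and `assertSameSpace` are hypothetical names, and the lookup table lists only two encoders whose output sizes I am confident of (1,536 for OpenAI's text-embedding-ada-002 and 384 for the all-MiniLM-L6-v2 sentence-transformers model hosted on Hugging Face).

```typescript
// Hypothetical record of how the stored embeddings were produced.
interface EmbeddingSpace {
  model: string;
  dimensions: number;
}

// Output sizes are fixed by the encoder, not chosen by the developer.
const KNOWN_DIMENSIONS: Record<string, number> = {
  "text-embedding-ada-002": 1536, // OpenAI
  "all-MiniLM-L6-v2": 384,        // sentence-transformers on Hugging Face
};

// Look up the space an encoder writes into, or fail loudly if unknown.
function spaceFor(model: string): EmbeddingSpace {
  const dimensions = KNOWN_DIMENSIONS[model];
  if (dimensions === undefined) {
    throw new Error(`unknown encoder: ${model}`);
  }
  return { model, dimensions };
}

// Refuse to search when the query was embedded differently from the documents.
function assertSameSpace(
  indexed: EmbeddingSpace,
  queryModel: string,
  queryVector: number[],
): void {
  if (queryModel !== indexed.model) {
    throw new Error(
      `query embedded with ${queryModel}, documents with ${indexed.model}`,
    );
  }
  if (queryVector.length !== indexed.dimensions) {
    throw new Error(
      `query has ${queryVector.length} dimensions, index expects ${indexed.dimensions}`,
    );
  }
}
```

Two vectors of the same length from different encoders would still pass a length-only check, which is why the model name is compared as well.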

Okay, so then, should I be able to ask some different questions, since we're just comparing text using similarity? Yeah, you should be able to ask any question that's relevant to the information we've given it. That's because of the way I set up the prompt, this prompt engineering. I've specifically told it not to answer any questions that it can't answer from our documentation. If I left that out, it would still act like regular ChatGPT and try to answer your question even if it couldn't find the answer, which could be a hallucination. So in my prompt, I specifically told it not to answer any question unless it applies to this documentation.

Okay, so how can I use this as a doc? I'm not sure, can you clarify your question? Next: could I also set this up locally, self-hosted, without OpenAI? Yes, it's totally possible to do that. You can run your own LLM locally; there are tons of self-hosted LLMs. You could run MongoDB locally. The vector search functionality specifically, though, is a feature of Atlas. For development, there is a dev server that you can run locally, but for production vector search you would have to use Atlas in the cloud. So for development, all of that can be done locally, yes.

21. Using AI in Documentation

Short description:

If you want to see an example of using AI in documentation, visit the MongoDB Docs page. It provides a fully functional version where you can ask questions and get detailed answers. The AI provides a basic summary and instructions, as well as links for further reading. This is how we have implemented AI in our documentation.

Okay, Alfie, if you could rephrase your question or expand on that: how can I use this as a doc? For a more in-depth look at an example of this, you can go to the MongoDB Docs page, where we have the AI actually working. Let's see, MongoDB... A much more fully functional version of this is running there. So if I go to resources... deployment, adaptation... I just want the regular docs. There we go. Let me zoom in a bit. At the top here, you can ask, how do you deploy a free cluster? There are some suggestions, but you can ask it whatever you want. Here it's replying to me: here's how you do it, da, da, da. It gives you some context, and then it also gives you further reading: here are the links to dive deeper into the docs. So you can ask it anything about Atlas or about MongoDB in general, and it will return a basic summary, some basic instructions, and then links for a deeper dive. This is how we've implemented it.

22. Q&A on Vector Search and Customization

Short description:

I'm available for questions and appreciate your feedback. There may be alternatives to locally hosted vector search, but I can't think of any off the top of my head. You can create your own question and answer by providing additional context through embeddings. This allows you to search for specific information that is not in the existing knowledge base. In the code, you can customize the LLM's behavior by injecting extra information and summarizing the results. Prompt engineering is an iterative process to fine-tune the responses. I'm curious about the use cases you have in mind. The vector search returns the text based on the embeddings, allowing for more context-specific results.

Okay, cool. All right, let's... I'm free to stick around for some time for some questions if anyone has any questions. And again, I really appreciate everyone working through the hiccups here and there. And I hope you learned something and really also looking forward to your feedback on how we can improve this and what you thought about the workshop. So open for any questions. Feel free to unmute if you want, or use the chat.

We're gonna go ahead and get started. So we'll start with a few questions, and we'll see if we can get you to speak up. Is there an alternative to vector search that could be self-hosted? Okay. Probably, but off the top of my head, I can't think of any that are locally hosted. There are tons of vector databases popping up everywhere, and they all have vector search capabilities. Again, those vector databases are a bolt-on, because you have to do the whole CI/CD thing and sync back and forth, et cetera. But locally hosted, that's where I'm drawing a blank. I'm sure there are some, but I can't think of any that are locally hosted for production.

I appreciate that, everyone. Can I create my own question and answer? Say I wanted to narrow it down to one sport, basic tennis questions. Yeah, for sure. You have to start with some information for it to use. The idea here is to give the LLM more context. When you're using ChatGPT, it just pulls from its existing knowledge base. The idea here is to give it extra stuff. For instance, let's say you have a company that has work orders and you need to search through that information for some reason. You want to ask it questions about these work orders: which ones are still active, the information that's in them, who the clients are, et cetera. You would feed that information as embeddings to vector search and ask it questions about that proprietary information. That way you're not relying on ChatGPT by itself, because it doesn't know anything about your work orders. For sports specifically, ChatGPT probably already knows a good bit about tennis and other sports. But if you have some specific information about those sports that you want to make sure it references, that is what you would want to embed and have it do a vector search on. I hope that answers your question.

More about how to do it in code. Okay. In the code, this is really the part that you want to customize. Again, we're intercepting the user's question, adding some extra context, and telling the LLM what to do with it. In our implementation on the MongoDB docs, we actually tell it something very similar, but we also tell it to return the references, the embedded documents that it referenced, as links, so that we can add those at the bottom for the user to dive deeper. We tell it to take all of the information that comes back and return a short summary instead of giving back an entire document, which is just too much context. So in the code, this is where you inject that extra information, separate from the earlier step where we created the embeddings. And yeah, prompt engineering is a thing. This is another part you have to iterate on, because sometimes it doesn't return exactly what you wanted, and you have to come back in here and tweak it a bit until it's exactly what you're looking for. It's the same thing in ChatGPT: you ask a question and get a terrible response, but if you add more context, you get a better response. Any specific use cases that any of you might be looking into for this? I'm always curious how this might be used.
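
As an illustration of that kind of customization, here is a hedged sketch of a prompt that asks for a short summary plus source links. It is not the MongoDB docs implementation, which is not shown in the session; the `RetrievedDoc` type, its `url` field, the `buildSummaryPrompt` name, and the instruction wording are all assumptions made for the example.

```typescript
// Hypothetical shape: each retrieved section carries a link back to its page.
type RetrievedDoc = { text: string; url: string };

// Ask for a short summary and a list of the sources that were used.
function buildSummaryPrompt(question: string, retrieved: RetrievedDoc[]): string {
  const sections = retrieved
    .map((doc, index) => `[${index + 1}] Source: ${doc.url}\n${doc.text}`)
    .join("\n\n");

  return [
    "Answer the question using only the numbered sections below.",
    "Reply with a short summary rather than quoting whole sections.",
    'Finish with a "Further reading" list containing the Source link of each section you used.',
    "If the answer is not in the sections, say that you don't know.",
    "",
    "Sections:",
    sections,
    "",
    `Question: ${question}`,
  ].join("\n");
}

// Example input with placeholder URLs.
const examplePrompt = buildSummaryPrompt("How do I install the library?", [
  { text: "Install the package with npm.", url: "https://example.com/docs/install" },
  { text: "Version history and recent changes.", url: "https://example.com/docs/changelog" },
]);
```

Putting the link next to each section gives the model something concrete to cite; whether it cites reliably is exactly the part that needs iteration.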

Yeah, so in this, it's actually returning the text, and that's what's being sent to the LLM. Right here, this vector search value, which came from our vector search route, is the text field from our database, with the actual markdown in it. But the vector search itself uses the embeddings: it finds the documents in the database that are the closest matches, and then it returns the actual text. And then this current message context is the message that the user actually typed in, in text format as well.

23. Q&A and Resources

Short description:

I'll stick around for a few more minutes to answer any questions. You'll receive a feedback email soon. Check out the MongoDB documentation and developer center for resources on vector search and AI. We have articles, videos, and tutorials covering various topics and frameworks. Feel free to explore and learn more. Thank you all for your participation, and I hope to see you again next year in Berlin!

All right, well, I'm gonna stick around for another few minutes in case anybody has any questions, but you're free to leave at any time. You're gonna receive a feedback email shortly. Again, I really appreciate your feedback. Thank you for sticking around and I hope I helped you learn something.

Do you have any documentation link or workshop to go further with customizing questions? Yeah, you're welcome. So the documentation on the MongoDB website for vector search is great. We also have a lot of resources here under... Why is this taking so long? Oh, I had to escape that. Okay, so if we go over to the Developer Center, it's actually mongodb.com/developer. If we go down to all technologies and then AI specifically, we've got a bunch of articles and videos on a bunch of different things. Some of them are Python related, some are JavaScript and Node.js related, and there are a bunch of different use cases and tutorials here that could get you started. MongoDB also has integrations with tons of frameworks in the space. LangChain is one of them. This new one, Rivet, looks pretty cool. So yeah, there's a lot of cool stuff going on in this space, and you can learn a lot here from the Developer Center as well.

If there are no more questions, I'll go ahead and let everyone go. I appreciate it again. And hopefully we'll see you next year, possibly in Berlin.