AI on Demand: Serverless AI


In this workshop, we discuss the merits of serverless architecture and how it can be applied to the AI space. We'll explore options around building serverless RAG applications for a more lambda-esque approach to AI. Next, we'll get hands on and build a sample CRUD app that allows you to store information and query it using an LLM with Workers AI, Vectorize, D1, and Cloudflare Workers.

163 min
14 Feb, 2024

Comments

  • Volodymyr Huzar
    Maersk
    It was a nice workshop, but it's sad that it can't be reproduced without the special Cloudflare account that was only available during the online session.

Video Summary and Transcription

The workshop explores the intersection of serverless and AI, discussing the basics, benefits, scalability, and challenges of serverless. It also delves into AI architecture components, vector databases, and the use of context in querying. The workshop demonstrates the building process using HonoJS and LangChain, setting up Cloudflare Workers and Wrangler, and loading data into a vector database. It also covers the creation of a chatbot with Cloudflare Workers AI and the use of API tokens and environment variables. The workshop concludes with information on pricing and recommendations for further questions.


1. Introduction to Serverless and AI

Short description:

Welcome to the workshop! I'm Nathan Visidor from Cloudflare, and today we'll be exploring the intersection of serverless and AI. We'll cover the basics of serverless and AI, discuss how they can work together, and have a hands-on exercise. The workshop is scheduled to last three hours, and we'll take breaks. To participate, you'll need a thinking cap, an editor for JavaScript coding, Node installed, and a free-tier Cloudflare account. Fill out the form I provide to access the shared account. Let's get started!

Welcome, welcome, everyone. And thanks for joining us. And in case no one has told you yet, happy Valentine's Day, if that's your thing and you celebrate it wherever you are. We're happy to have you today, and we're going to have a little bit of fun with the Valentine's Day theme here. You can see right away that our little robot friend is giving us a little bit of love off the bat.

So yeah, again, thanks for joining. If you're looking for the workshop on AI on demand, then you are in the right spot, so let's get this party started. Quick little intro as to who I am, what I'm about, and why you should even pay attention to me in the first place. My name is Nathan Visidor. I'm one of the engineers here at Cloudflare who works in our AI space; I work on the vector DB that we have, and if you don't know what that is, we'll get into it a little more in just a few minutes. I've been at Cloudflare for a little over four years now, in a variety of roles: most recently AI, our serverless offerings before that, and before that a more traditional back-end role dealing with things like Kafka clusters that process a couple of trillion messages every day, alert notification services, those kinds of internal tooling things. We're happy to have you.

And yeah, let's talk a little bit about what to expect here, to kind of set the stage and give you the core syllabus, so to say. Basically, to kick things off we're going to go over some slides. I definitely want to make this interactive, and we'll get into that in just a minute; I don't want to be talking at you, we'll make this more of a dialogue. We'll go over some basic concepts to set the stage for the more hands-on portion of all this, and once we've done that, we'll get into the live exercise, where you all can build something of your own to test this stuff out in the real world. Here's what our agenda looks like. First we're going to talk about what serverless is. A fair few of you are probably already familiar with the concepts there, but it's a quick little primer refresher for people who aren't as familiar or haven't used it themselves. Then we'll chat about AI, which I imagine a few more people are less familiar with, and we'll do a little pulse check here in a second to see what that looks like. And then we'll see what it looks like to bridge those concepts: how serverless and AI can work together. It's not an easy thing to make happen, but I think by the end of this we'll... oh, hey, Christina, worldwide viewers here... yeah, we'll chat about how we're able to marry these two concepts into something that works together. And then we'll get down to business and actually get hands-on. So my hope is that the takeaway today will be, one, if you haven't learned what the building blocks of an AI application architecture are, you'll come away with that. But more importantly, the crux of this talk is how we can apply the concepts of serverless to a traditional AI architecture, and semantic search specifically. If you're unfamiliar with semantic search, we'll cover that in the AI section of our primers. But that is what I hope you take away from the next three hours. And maybe that's a good segue: this workshop is scheduled to last three hours. That's a long time, we're going to be here for quite a while, so for your sake and for mine I'm going to be cognizant of time, and we've got a couple of built-in breaks to make sure we're able to stretch, use the facilities, maybe grab snacks, stay hydrated, and keep healthy and fed and all that as well. Here are a few things that we're going to need to make this workshop a success. We definitely want you to have your thinking cap on, so hopefully you're in a learning spirit to pick up what we're throwing down. For the live portion we're going to want an editor that we can use to do the live coding. We are going to be coding in JavaScript. If you don't fully understand it, that's okay; these concepts aren't exclusive to JavaScript at all.
It's just what makes things a little bit easier. We will also need Node installed, because we are going to be doing JavaScript coding, and a Cloudflare account. I see a question: what kind of account do we need? It's a great question. All you need is a free-tier account. And there's a form that I'm going to give you in just a second, too, that will give me the information I need to add you to a shared account; we need some special privileges to make this work. So if you go set up an account and fill out this Google form with the email address that you used to set it up, then I can add you to the shared account that we're all going to use for this exercise, and you'll have the privileges you need to actually make this work. Let me copy and paste that to chat as well, because that'll probably be easier for everybody to follow along with, but the QR code is there if you're able to scan it too. You can work on that in the background; it doesn't have to happen right now, we've got a decent amount to cover before we get there. But if you can have all these things ready by the time we get to the interactive portion, it will really help speed things along. And I realized I sent that as a direct message. Let me try that again. There we go. Excellent.

2. Understanding Serverless and its Benefits

Short description:

Let's kick off with a poll to understand everyone's background. It seems that most people are comfortable with JavaScript, which is great for what we're doing. Not many are currently using Serverless, but that's expected. People are positive towards AI. Now, let's dive into Serverless. It's a controversial term, but from the customer's perspective, it refers to infrastructure-less deployments that are highly distributed. It's often based on microservices and function as a service. The benefits include ease of deployment and scalability. AWS Lambda is a popular serverless platform.

Cool. So let us actually get started here. And again, I want to encourage you: this is a three-hour-long class, so you are 100% welcome to ask questions. I'm not as familiar with Zoom, but I assume there's a raise-hand feature or something like that. I want to make this interactive; we're having a dialogue here. And maybe that's a good segue to kick off a quick poll. I'd like to know a little bit about y'all's backgrounds, so I'm launching a poll. I'm not exactly sure how this surfaces on your side, but you should be able to see some questions that give a general feel for where your current knowledge is. Let's see where people lie here. All right, let's see what people had to say. Boy, we lucked out: a lot of people are comfortable with JavaScript, so that's excellent for what we're trying to do. Again, nothing here is exclusive to JavaScript; it's just the stack we're going to be working with today. The nice thing about some of the APIs we're using is that they're pretty language agnostic, especially in the AI space. Python seems to be one of the de facto standards, at least for prototyping, so there are definitely options there. But it looks like y'all are pretty comfortable with JavaScript, so love to see that. Yeah, this next one is going to be the interesting one, I feel like, because I realize this is a DevOps conference, and serverless is trying to abstract away a lot of that ops-iness, but not in a bad way at all. It looks like most people aren't currently using serverless, and that's honestly what I expected, especially at this conference. I'm not an advocate one way or the other; it's a right-tool-for-the-right-job kind of situation. People are pretty positive towards AI. I wouldn't blame you at all if you weren't; there are definitely conversations to be had on both sides, but there's a time and a place. This is an AI workshop, so I kind of figured people would be a little more positive towards it. I'm not going to advocate one way or another, though, I'll let you decide that yourself. And I think that's fine by me, we'll call that good. Now we know what the commonality is and what people's backgrounds are, and it'll set the stage a little bit for where we're going to go from here.

Let's get into serverless. So yeah, what is serverless? It turns out this is a pretty controversial thing to come up with a definition for. I asked a couple of coworkers, showed them these slides, and they had their own opinions. I guess it also depends a little bit on whether you're looking from the point of view of the platform or the customer. But at least in my eyes, this definition seems to fit: it's basically infrastructure-less deployments of whatever application you're trying to get out, almost always in a highly distributed fashion. I put an asterisk on infrastructure-less and gave it the old "sure, whatever you say" Jennifer Lawrence GIF here, because infrastructure-less is a lie; it's always somebody else's computer at the end of the day that you're running on. But it's infrastructure-less from the point of view of the customer. It's almost always some micro runtime that lives on a platform as a service, which runs on whatever network that platform owns. I highlighted the "micro" here especially, one because it's kind of fun to say and sounds buzzworthy, but also because a lot of the time you're running microservices on these serverless deployments. So you're really doing function as a service; that's what you most often hear when you're targeting a serverless-style deployment. Why would you ever want to do something like that? That's a good question, especially the abstracted-away part. Well, I think one of the definite benefits is the ease of deployment. We'll actually look at a diagram of what a traditional deployment, especially an AI deployment, looks like in a little bit, and we'll see it's pretty complicated. Serverless in general removes a lot of the cognitive burden there, usually by making a deployment a one-terminal-command operation to get things out. Scalability is kind of built into the definition of serverless, especially the distributed-network part. And yeah, sorry, I'm really bad at butchering names here, but someone in chat has it exactly right: function as a service, Lambda. AWS Lambda is a very popular serverless platform to run on top of, but function as a service is definitely what you're looking at there.
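To make the function-as-a-service idea concrete, here is roughly what an entire deployable unit can look like on a platform like Cloudflare Workers: one exported handler, no server process, no listener to manage. A minimal sketch, not the workshop code:

```ts
// A complete serverless "application": a single function invoked per request.
// The platform owns the HTTP listener, the routing to the edge, and the scaling.
export default {
  async fetch(request: Request): Promise<Response> {
    const { pathname } = new URL(request.url);
    if (pathname === "/hello") {
      return Response.json({ message: "Hello from the edge!" });
    }
    return new Response("Not found", { status: 404 });
  },
};
```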

3. Serverless Scalability, Latency, and Challenges

Short description:

Serverless is scalable, stateless, and usage-based, making it cheaper for infrequent usage. It also provides low latency and can be complex to set up, with some challenges and gotchas.

But yeah, again, very scalable, built into the nature of the networks that you're running on top of. And the fact that things are meant to be stateless, at least out of the box, is both a benefit and a gotcha. It's often cheaper, which goes in tandem with the fact that it's usage based. Usage-based billing is really one of the big selling points of all this. Instead of having something running 24/7 that you're only hitting infrequently, with serverless, since every individual execution is its own unit, you get billed on the total number of executions, which can end up being a lot cheaper, especially for smaller hobby projects or people with kind of atypical workloads.

The other interesting additive bonus here is that things are low latency, because you're usually running on an edge network, or at least in a distributed fashion. You're executing things closer to where the end user is, which means that your latency ends up being a lot lower than having to round-trip to, say, a full data center somewhere else that you're trying to hit. Yeah, I see chat saying AWS Lambda is not very easy. Yeah, the ecosystems around them can be kind of complicated. The actual invocation itself is usually pretty simple, but man, I've worked in AWS quite a bit myself, and trying to get things set up can be pretty cumbersome. You're not wrong. And as I was mentioning, it's not a silver bullet; there are definitely gotchas with serverless, and it's always worth talking about those when we're looking at what we're trying to do here.

4. Serverless Use Cases and AI Architecture

Short description:

Serverless use cases include scalability, handling unpredictable traffic, and avoiding server management. Cold starts and the need for a different mental model are challenges. Shared memory and resource limitations are considerations. Expense and self-hosting difficulty are factors. Now moving on to AI, let's explore its architecture and components.

Cool. Anybody else want to weigh in with their experiences, or give pros or cons? You're totally welcome to as well; again, definitely be interactive here. Let me see, where's my gotchas slide... oh, there it is. Okay, it's after the use cases, so I do go over those in a second.

Cool. So, use cases for serverless: stateless apps. Again, it doesn't have to be stateless, but that's kind of the default base case of most serverless execution environments. So if you don't care about what things look like between runs, or you actually want a blank slate between runs, it's definitely not a bad idea to look at a serverless deployment architecture. Applications that require scalability: again, this comes for free, really because of the above, and because we're running in some sort of isolated environment. For Lambda, that's usually Docker containers; for Cloudflare, it's V8 isolates; but some sort of isolated runtime. Those individual executions really help with scalability. Unpredictable traffic can be nice, too. If you're DevOps folks, I'm sure you're familiar with this, but having to go in and set up HPAs or load balancers and autoscalers for traffic that follows some sort of sinusoidal curve can be... annoying is a nice way to put it. With serverless, you don't have to worry about that. You're usage based, but you're also infinitely scalable, up to the limits of the provider, and you don't have to worry about all that stuff.

And then some people just hate having to do server management. We're chatting specifically here about AI and ML, and a lot of the ML folks that I've interacted with in the past tend to be more research focused and aren't as well versed in architecting servers and all that kind of stuff. With a serverless setup, again, you don't have to worry about what we just talked about, setting up HPAs and load balancers and very complicated topologies basically just to handle what is essentially a function invocation; all of that is done for us. Cold starts are a thing, though. Especially if the little isolated environment these run in is something as heavy as a Docker container, you do eventually have to load your stuff into memory and make it hot. In a distributed network where you're running on a bunch of different nodes, a cold start would be okay if it only happened once, but if you're running in, say, 300 different locations, that's 300 different cold starts, one for every single one of those nodes out there. So it's something you have to start thinking about a little bit more: how do you keep things lightweight and deal with cold starts. It does also require a different mental model. With how you'd typically think of an application, sometimes stateful stuff just happens, and when you're forced to think without state, it definitely requires navigating some of those architectural decisions a little differently. Yes, the timeout on AWS API Gateway can be quite short; I've definitely hit that before with Lambdas, and as I talked about with cold starts, that can be especially annoying, so that's a good shout right there. I think we've hit on this a couple of times already, but no shared memory or state persistence. There's an asterisk here too: this is the base default case, and you can layer stuff onto the Lambdas or serverless execution runtimes to add these on externally, but then you're no longer isolated to just your app; you have a dependency on some external system, or, most likely, a vendor-specific runtime. So we can work around these things, but it's not always the cleanest in the world. Resource limitations fall into the above a little bit, but usually they're not going to just allow you to run infinitely on a serverless platform. If you run your serverless runtime 24/7, you've essentially just built a regular server that is running somewhere else, so most of the time there are limitations around that to make sure you're not hampering the network. It's something to consider as well. And yes, expense is something to think about too, because since it's all resource-usage based, if you run a long time it can get very expensive. Self-hosting is hard as well. It's a little bit harder to do this in house, because you typically want to do it in a highly distributed fashion, like we were talking about earlier.
And that requires... there are, I think, a couple of FaaS solutions out there for self-hosting, but it usually requires you to also have a presence in a bunch of different data centers or something like that, which means it's actually more maintenance, because now you're the one managing a distributed network of metal and nodes. Maybe not the most fun in the world.

We already kind of did this earlier; we saw it was about 50-50. But I do like to see that. Cool. Oh, yeah, again, I want this to be interactive. Any final thoughts anyone has on serverless before we hop to AI? If not, we can just move along. But this is your party, so we'll wait here for maybe 30 seconds to see if anyone has any final thoughts. All right, I think we're good. Then we'll move on to AI. More Jennifer Lawrence, that's fun. I don't know if you all have seen Hot Ones, but she's eating the wings and it's a great episode. But anyway, yeah, AI: it's the buzzword these days. But what is it actually? What actually comprises an AI architecture, and what are the components that go into some of this stuff? That is what we are about to explore.

5. AI Architecture Components

Short description:

AI architecture comprises models, embeddings, and vector databases. Models recognize patterns and make predictions based on trained parameters. Embeddings capture characteristics and properties, allowing for similarity search. Word2Vec is a popular embedding model that maps related terms into a vector space. Groupings of related topics can be identified within the space.


We already looked at this as well, so we don't need to worry about this bit. So yeah, one of the big components in AI is the model that you're interested in running against. Models are very fundamental in that they are the bulk of the compute logic, and as you can see here, they recognize patterns and make predictions. Typically a model is some published set of parameters that have been trained against some massive data set: a precomputed list of parameters that have been deemed worth looking at and characterizing, plus a ton of seed data used to build up a memory, a footprint, of what things look like. We'll use that later on to actually do our similarity search and things like that. So again: recognize patterns, then use the historically trained data to make predictions. Embeddings are another important piece of the puzzle here. I have a very technical definition up top, which is a lot of word vomit. But we mentioned that one of the things models try to do is capture characteristics and properties so that we can use them as a metric for similarity later on. Each embedding has a value for each of the properties or characteristics that the model treats as important. I think the simplified version distills it down a little more: it's basically just a huge number array, and the length of that array is the dimensionality. Dimensionality comes back to our characteristics and properties; it's the total number of those, and the values indicate the relevance of that particular data point for each one. We use this when we're querying and inserting data to map out what the state of the world looks like inside the mind of the model, the ML space. But I think this visual example helps a little bit. One of the popular embedding models out there, one of the original ones, is called Word2Vec, and it basically translates words to a vector representation of said words. So here is an example: this is basically a 2D mapping of those vector arrays into a space that we can actually see. And what the model has done when it generates these is that related terms (it's going to do that hover thing for me) are grouped together, and we start to see groups forming within our space of related topics. Sometimes it may not get it 100% right, or a word may be a little vague and fall into multiple buckets, like "shower" being green here; the color indicates where we think something lies.
But yeah, the idea is that we start to look for groupings in these points, and we can use those (when we get into the quantization world, they're called centroids) as, again, similar concepts, so that later inputs can be mapped into that same space.
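To make "embedding" a little less abstract, here's a hedged sketch of generating one with a Workers AI text-embedding model, since that's the stack we'll build on later. The model name and the exact response shape are assumptions; check the current Workers AI docs for the real values:

```ts
// Sketch: turn a piece of text into its embedding (a fixed-length array of floats).
// Assumes a Workers AI binding on env.AI and a bge-style embedding model.
interface Env {
  AI: { run(model: string, input: unknown): Promise<{ data: number[][] }> };
}

export async function embed(env: Env, text: string): Promise<number[]> {
  const result = await env.AI.run("@cf/baai/bge-base-en-v1.5", { text: [text] });
  // One vector per input string; the vector's length is the dimensionality
  // (768 for this particular model), one float per learned characteristic.
  return result.data[0];
}
```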

6. Vector Databases

Short description:

A vector database stores embeddings and enables similarity search. It is highly optimized for this use case. Plugins like pgVector allow the use of traditional databases for vector storage. However, the default choice is a specialized vector database.

Cool. So, we have models, we have embeddings. The next important piece of the puzzle is a vector database. And a vector database takes the embeddings that come from what we just looked at, which are, again, these like long, long arrays of floats that are, are, are points representing similarities. And then stores those in its data store so that we can later use those to, to query against, essentially.

And these are pretty highly optimized for our use case here. In a typical SQL database, your comparison operators cover a large range of things, equality checks and so on. With vector databases, the most important piece of the puzzle is similarity search, so that's really the main thing a lot of them implement: the ability to return (you may have heard "top K" before, if you're familiar with this at all) the top results from your storage engine.

So, you may also be asking: can I use a Postgres database for this? Or a MySQL database? And the answer is yes, with an asterisk, by another asterisk. By default, the answer is probably no; it becomes very computationally expensive to just retrieve, say, a JSON array of the floats and then do the calculation locally. But there are plugins for Postgres, like pgvector. I'm not aware of a MySQL equivalent, but I'm sure there is one out there. Essentially, there are plugins that let you use even some of the more traditional-style RDBMSs for this type of thing, if you're self-hosting or something along those lines.
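To show how different the query surface is from a SQL WHERE clause, here's a hedged sketch of a top-K similarity query against a Vectorize index binding. The binding name and option names are assumptions; pgvector would express the same idea as an ORDER BY on a distance operator:

```ts
// Sketch: nearest-neighbour lookup instead of equality filters.
// Assumes a Vectorize index bound to the Worker as env.VECTORIZE.
interface VectorMatch {
  id: string;
  score: number; // similarity under the index's configured distance metric
  metadata?: Record<string, unknown>;
}

interface Env {
  VECTORIZE: {
    query(
      vector: number[],
      opts: { topK: number; returnMetadata?: boolean }
    ): Promise<{ matches: VectorMatch[] }>;
  };
}

export async function similar(env: Env, queryVector: number[]) {
  const { matches } = await env.VECTORIZE.query(queryVector, {
    topK: 5,
    returnMetadata: true,
  });
  return matches; // the 5 stored vectors closest to queryVector
}
```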

7. Dimension Weights and Video Search

Short description:

Dimensions in models can have different weights. Similarity search can be used for images and videos. Vector databases are optimized for distance computation. Distance metrics like Euclidean and cosine distances can be used to calculate similarity.

But the, the main optimization here is, again, see here, I think Jan, Jan's kind of hinting at it. Do all dimensions have the same importance? Which dimensions have different weights? They, they do have, they all have the same importance for the most, actually, no, I'm sorry, it depends on your model, depends on your model. There, there are weighted models where, where you can, you know, say the, like, this, this is more important than this. Most of the time, you're not going to have to worry about that. Most of the time, you're not, you're going to use a model that has already been provided unless you're doing a self-training, which you totally, you totally can do, too. But it, it depends on your, your use case there. But you could, you could have weighted, a weighted version of vectors as well.

Yes, can similarity search be used for images and videos? Yes, it can. Just like there's Word2Vec, there's also an Img2Vec transformer out there that will generate embeddings for images. For videos, I'm sure something exists; I don't know one off the top of my head, but if it can be byte encoded or represented in something like that, there's almost always some sort of way to do an embedding for it. The real takeaway here with these vector databases is that they're optimized for doing distance computation across the vector space. When you're setting up an index in your vector database, you typically tell it a metric that you're interested in using to compute distance. Most of the time that's a Euclidean distance or a cosine distance or something along those lines; there are quite a few out there, so it's not limited to just those. But you can use those distance metrics to calculate how far away from an expected value you are, and use that as a baseline for what you're returning, and then further use that information.
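If it helps to see what one of those metrics actually computes, here's a tiny standalone cosine-similarity function; it's the same math the database runs, just without the indexing that makes it fast over millions of vectors:

```ts
// Cosine similarity: ~1 means "pointing the same way", ~0 means unrelated,
// ~-1 means opposite. A naive full scan like this is exactly what vector DBs
// avoid by building approximate-nearest-neighbour indexes.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// e.g. cosineSimilarity(queryEmbedding, storedEmbedding) -> 0.87 (pretty similar)
```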

8. Adding Context and Querying

Short description:

A similarity search finds a matching vector in the vector database. Active learning constantly augments models with new data. Retrieval Augmented Generation (RAG) adds context to models. Tokenization breaks down input data into smaller pieces called tokens. Embedding generator converts tokens into numeric floats representing vectors. Embeddings are stored in the vector database for similarity search. The process involves adding information to the store and querying for relevant embeddings. RAG combines query information with context to generate responses. The process can be complex, but visuals help with understanding.

That's why, you know, this is, I think this picture kind of fits here. You're, you're passing in, you have an original thing here, that was in your vector database, and you're passing, you're looking for, you know, you're doing a similarity search in like a query style operation, it's going to find that match and know that that, that that vector is the one that we're, that we're interested in, in using. And it'll probably find a couple other similar ones too. If you're, if you're, if your context size is, is large.

So, I had to include this one. You have to say it right, right? The circle of l-AI-fe. The constant operation and flow of information through these different components is basically how active learning, or even just retraining of old models, works. We have our original model, or maybe we're creating it for the first time and just seeding it with original data, but we have a model, and we're using the new incoming data to continually augment it. We look at the new data coming in, and there's a step, I think it's called attention, where you ask: is this new information worth keeping, or worth using as future training information? And you can use this cycle, from model, to input, to stored data, to relevancy judgments about whether it's worth keeping (which can be a manual step), to further improve your model. So you can basically refine over time and feed new information into this process through the normal operation of these different pieces of the puzzle. And again, that's what's called active learning.

And now, we're going to be building a RAG application here in just a little while, but maybe it's worth hitting on what RAG actually is. RAG stands for Retrieval Augmented Generation, and it looks a lot like what we just talked about in that circle-of-life thing. If we're using a base model to do our initial embedding generation and so on, that works until we have some new information. You may have tried to use ChatGPT and had it say, "Oh, hey, sorry, my cutoff date was..." I don't know what it is these days, but we'll say July 17th, 2023, "so I don't have any new information since then." Sometimes what's currently in your model is not enough, and you want to give it more context, so that it has information it didn't previously have. This is the real selling point of Retrieval Augmented Generation, and there are two big pieces: text generation, which isn't unique to RAG, that happens with any model; and information retrieval, which is really where RAG comes into play. Sometimes this retrieved information is called context. So when we think about where we could use something that has contextual information relevant to what we want, it often gets used for question or prompt-based answering; recommendation engines, which can be tailored to a specific store (I'm sure Amazon uses a lot of this stuff for their setup); document summarization, if you're using the RAG to fetch information from documents and then using those as context sources for summarizing; and really anything that needs access to local information that may not already be present inside the original model. So, I know we're getting into the weeds here a little bit, but I promised I was going to show you what embeddings and this pipeline look like. Let's say we want to add some information to an existing model, and it doesn't know what people typically buy each other on Valentine's Day. What I've done here is feed it an input: "traditional gifts on Valentine's Day include," and go on and on about flowers and stuffed animals and whatever else you like. We feed that input into our LLM, our model; the one here is just an example, it doesn't have to be this one. Now, there's a step in the process called tokenization (the tokenizer lives inside the LLM) where we break the input down into tokens, which are this kind of nebulous concept in AI and ML. Essentially it's trying to segment your input data into smaller pieces that can then be used to generate the embeddings. It depends a little bit on the model you use as to how the tokenizer works, but it's trying to find what pieces of information can be broken down into an embedding that can later be stored. In this case, I went to OpenAI and typed this in; they've got an online tokenizer, and this would be the token count for what this looks like.
And I think the general guidance they give is that roughly four characters is supposed to be one token, but we can see here it's pretty all over the place. Again, you don't have to worry about this too much; it all happens a little bit behind the scenes, but I thought it was worth mentioning since we are getting into the weeds here a little bit. Out of those tokens that get generated, we pass them into our embedding generator, and this is what actually converts the tokens into the numeric floats that represent the vectors. And again, the vectors represent the relativeness of each particular point inside of this dimensionality set, which is trained against all of the parameters and features that were put together for the model itself. These floats are typically between 0 and 1, although they don't have to be. It helps a little bit for some of the distance metrics if they are, so sometimes you may want to normalize them; a lot of models will actually normalize them for you, and sometimes you may have to do it yourself if you care about a distance metric that needs normalized values, but again, you're not going to have to worry about that part too much, it all kind of happens for you. And these embeddings are stored in the vector database, which we just talked about. Eventually this becomes a context store, and each of the rows in here has this float array of n dimensions. So again, the whole process we're working through here is: we want to take this information and store it in the vector database so that we can do similarity search on it later. This is the process of adding information to our store. Querying uses a fairly similar setup, where we generate our embedding for the query, and then that embedding goes to the vector database and finds the top K relevant entries, depending on what we've set for that value, based on our dimensionality space here.
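Here's the insert half of that pipeline sketched in code on the stack we'll use later (the query half is picked up in the next section). The binding names, the model name, and the metadata shape are assumptions for illustration, not the repo's exact code:

```ts
// Sketch: "add this text to the store". Tokenization happens inside the
// embedding model, so all we hand it is raw text.
interface Env {
  AI: { run(model: string, input: unknown): Promise<{ data: number[][] }> };
  VECTORIZE: {
    upsert(
      vectors: { id: string; values: number[]; metadata?: Record<string, string> }[]
    ): Promise<unknown>;
  };
}

export async function addToStore(env: Env, id: string, text: string): Promise<void> {
  // 1. Text -> tokens -> embedding (all inside the model call).
  const embeddings = await env.AI.run("@cf/baai/bge-base-en-v1.5", { text: [text] });

  // 2. Store the vector, keeping the original text as metadata so we can
  //    hand it back to the LLM as context at query time.
  await env.VECTORIZE.upsert([{ id, values: embeddings.data[0], metadata: { text } }]);
}
```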

9. Using Context in LLM

Short description:

The LLM combines query information with context to generate a response. It can be challenging to follow, but visuals can help. Feel free to ask questions. We will take a break soon.

And we use that, again, as context into the LLM that does the augmentation: basically, it combines the query information that comes in with the context that's come out of the store here, and then we can do the actual generation inside of our LLM and get the response out of that.

Questions, comments? I know there's a lot going on here, and this can be a little bit tough to follow, so hopefully these visuals are helping. Again, a lot of these concepts may not be ones you're as familiar with, so don't be shy if it's at all confusing, and we can always circle back later. Cool. And we are going to do a break here in just a little bit, so don't worry.
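As a rough sketch of that flow end to end — embed the question, pull the closest context out of the store, and hand both to the model — here's what the glue code can look like. The model names, bindings, and prompt wording are assumptions, not the workshop's exact implementation:

```ts
// Sketch: the "query" half of a RAG app — retrieve context, then generate.
interface Env {
  AI: { run(model: string, input: unknown): Promise<any> };
  VECTORIZE: {
    query(
      vector: number[],
      opts: { topK: number; returnMetadata?: boolean }
    ): Promise<{ matches: { metadata?: { text?: string } }[] }>;
  };
}

export async function ask(env: Env, question: string): Promise<string> {
  // 1. Embed the question and find the most similar stored passages.
  const { data } = await env.AI.run("@cf/baai/bge-base-en-v1.5", { text: [question] });
  const { matches } = await env.VECTORIZE.query(data[0], { topK: 3, returnMetadata: true });
  const context = matches.map((m) => m.metadata?.text ?? "").join("\n");

  // 2. Generate an answer, with the retrieved context folded into the prompt.
  const result = await env.AI.run("@cf/meta/llama-2-7b-chat-int8", {
    messages: [
      { role: "system", content: `Use this context if it is relevant:\n${context}` },
      { role: "user", content: question },
    ],
  });
  return result.response;
}
```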

10. Serverless AI and Vector Databases

Short description:

Serverless AI is becoming a new trend, with companies offering serverless AI due to the advantages of the serverless architecture. Traditional deployments can be complex and expensive, requiring multiple components and constant running servers. However, by focusing on the query portion of the loop and using off-the-shelf models, we can make the query piece serverless. The insert and query operations are crucial, and with the deployment of LLM and vector DB in a serverless manner, we can achieve success. Serverless models work well for relatively static applications, and although they can be heavy to load, keeping the model hot across the network can overcome this challenge. Additionally, there are various offerings for serverless vector databases, such as Vectorize and Pinecone, that constantly update their offerings.

Cool. Now, on to serverless AI. AI is the new hotness; serverless is not the new hotness anymore these days, it's an old hotness. But together, I feel like they're becoming the new hotness again. I've seen quite a few different companies put out serverless AI offerings recently because of the advantages of the serverless architecture that we talked about a little bit earlier.

But maybe I think it makes sense to talk about what a traditional deployment looks like and then how easy it can be with serverless, you know, if you're using a hosted offering. So, there's a really, actually, a really nice open source vector database out there called Milvus. And Milvus gives you this nice diagram where they kind of go into what it looks like to actually host some of the stuff. Milvus does a lot of very, very nice things for you, like switch between CPU based and GPU based computation of those vectors, and then also does some highly optimized metadata querying. So, you can do a nice little hybrid search and stuff like that.

But as we can see, it's a pretty complicated setup here. We've got load balancers with proxies to different message queues, multiple nodes to store the metadata in something like an etcd container, a bunch of nodes that have to run to make this HA, and then object storage to actually store the raw files themselves. Again, a fairly complicated setup; maybe not insurmountable by any stretch of the imagination, but look at all these servers here. You're running a lot just to get even a Hello World up, at least in an enterprise-grade or highly available fashion for this type of setup. Again, it's a great product, I'm not dissing it at all, but it kind of shows what things look like from a traditional point of view.

And so, again, we kind of hit on this already, but there's a large number of components and everything is always running. You can imagine that something with that many pieces of the puzzle can get pretty expensive; we're running 24/7 here. And we still have to think about distribution if that's important to us, because we only get a little bit of that from the HA setup they propose there, and it still runs in centralized data centers, which aren't as close to users as edge locations. And again, it doesn't matter if you're using a provider; there's always going to be something running with a traditional architecture.

So, let's think about what it looks like to make this serverless. We talked a little bit about the circle; again, this is back to the circle-of-life thing. We have training, estimation, querying, labeling, and it all just kind of flows into each other over time. That's the active learning loop we talked about. But for a vast majority of things, if we're not doing active learning and we're okay with passive learning and a RAG, then we spend almost 95% of our time just in this query portion of the loop. We don't have to worry about the training and the estimation and labeling all that much, especially if you're using an off-the-shelf model: they handle the training and the labeling, and even the estimation, for us. We're just concerned about the querying. So if we can make that query piece serverless, we are going to look pretty good.

So, there are two little operations that are important to us: the insert and the query. Again, this is in the context of a retrieval augmented generation application, but these are the pieces that are really interesting here, because this is what 95%, probably even 99%, of our operations are going to be. Every once in a while we'll update the model, but we don't have to do that much. So we look at what we have: we talked about tokenizers and embeddings and our LLM and our vector DB, and these are really the only components needed to support these two operations, the query and the insert. So if we can get that LLM and that vector DB deployed in a serverless way, we'll be in business.

When it comes to serverless models, well, if we want something that runs serverless, we're not really going to want to be in an active learning loop here; we're going to want something that's relatively static. That's usually okay, especially for off-the-shelf stuff, which isn't changing very frequently at all, and you can flesh that out with the RAG aspects, because you can still add to your context while keeping your model static. One of the challenges is that these models are pretty heavy, pretty hefty most of the time, so they're heavy to load to get the stuff available to us. And again, as I talked about, we're not really able to do active learning, but we can approximate that with RAG. So really the major con here is that things are heavy to load. But if we can keep that model hot across the network, which is feasible especially when we're using off-the-shelf models that we don't have to custom train and run custom copies of for everybody, then we can make a pretty good case for making things serverless. When it comes to serverless vector databases, there are actually quite a fair few offerings of those as well. These are always changing, because we're often adding new information to the RAG context so that it can get all the information it needs to make informed decisions later on. And there are a number of serverless offerings already. Vectorize is one; this is what I work on day to day. Pinecone is another player in the space that just released a serverless vector database.

11. Building with HonoJS and Linkchain

Short description:

We have the necessary tools and components to move forward with building. Let's take a break before getting hands-on. We'll be using HonoJS, a framework optimized for serverless environments, and LangChain, an ecosystem that abstracts away the complexities of building LLM-based apps. We'll be using a repo called Bard of Love as the base for our operations, where we'll be building an app that generates love poems. Let's clone the repo and get started.

Again, all usage based. And LanceDB is an open source version of all this. It runs on, it's even cloud platform agnostic, which is kind of cool. It supports GCS, AWS, S3. It's like remote object stores and has a couple custom file formats inside of it that make it nice to play with. But with those two pieces, I think we're able to move forward here because now we have all the cases covered and all the components that we need to actually build. Yeah, going back to the query, well, it's on a different page. But yeah, going back to the query operation, we have the ability to basically flow through that entire process now if we have those tools available to us.

So we're about to get into it. But first, I do think it's good to take a break; I'll probably do two. So yeah, let's pause for 10 minutes, take care of business, and when we come back, we'll start to get hands-on. Also, if you haven't filled out that form with your Cloudflare account email, if you could do that, that would be amazing; I'll get you all added to the shared account during this break as well. It should be in the chat history if you're looking for it. But yeah, sounds good. Stretch out and get ready for the good stuff. Now that we know what we're doing a little bit more, let's actually do it.

So time to get physical. Real quick, just a quick little primer on the two main tools that we'll be using to make this happen. This isn't super-duper important, and there are a couple of different ways to do this, but one of the things we've started using, and one I like a lot when we're working on serverless apps, is a little framework called HonoJS. It's basically optimized for these kinds of microservice, serverless-environment setups. And also nice: it doesn't lock you into a specific vendor. Let's see here, getting started... yeah. You can see here (maybe not see, because it's not zoomed in, I'm sure) that it's got support for Cloudflare things, Fastly, Vercel, AWS Lambda, Supabase. So it's basically platform agnostic as to where you actually run these things. We'll be using Hono as our routing framework, and it'll do some middleware stuff for us. There are other ways to do it, but I've come to like it a lot recently. I will also be using LangChain.

I'm not sure if you're familiar with this; I imagine quite a few of you aren't. LangChain is this ecosystem, almost a full ecosystem at this point, that essentially abstracts a lot of the thought away from having to build any sort of LLM-based app, not even just serverless ones. The chaining is where it gets its name from: the ability to combine or chain different operations within a single run. But it's also grown to become this thing that abstracts away... let's see if I have it on here. Yeah, I do. "The Legos of AI" is what one of my coworkers calls it. It's a lot like Hono in that it allows you to do platform-agnostic versions of RAGs and any LLM operation. It's kind of nice in that it has bindings for things like the Cloudflare stuff that we'll be using today, but you can also use it on a bunch of other services too, with very minimal changes to code. So nothing we're doing here is trying to vendor lock you in or anything like that; we're using tools and frameworks to make sure we have, in true serverless fashion, the ability to jump around without too many hoops to jump through.
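Just to give a flavour of Hono before we clone anything, here's what a couple of routes can look like; the route paths and bindings here are made up for illustration, not the actual Bard of Love routes:

```ts
// Minimal Hono app: the same code can target Workers, Lambda, Deno, etc.
import { Hono } from "hono";

// Bindings get injected by the platform (declared in wrangler.toml on Workers).
type Bindings = { AI: unknown; VECTORIZE: unknown };

const app = new Hono<{ Bindings: Bindings }>();

app.get("/", (c) => c.text("Bard of Love is alive"));

app.post("/api/ask", async (c) => {
  const { question } = await c.req.json<{ question: string }>();
  // ...embed the question, query the vector index, call the LLM...
  return c.json({ answer: `You asked: ${question}` });
});

export default app;
```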

So, I have for us a repo that does some of the bootstrapping code. I don't want to get, you know, caught up in like having to worry about things like CSS and stuff like that. And I would encourage you to hit this URL, do a clone of this repo, and this is what we'll be using as kind of the base of a lot of our operations here. You may also see, it's called the Bard of Love. We're trying to play into the Valentine's Day a little bit and we'll be building a app that either, you know, it gives us, we feed it some original love poems and tell it to kind of give those back to us if it can. But it's also able to make up its own love poems for us if it needs to. So, we can, and we'll get into exactly how that works as a part of this exercise here.

But I'm going to follow along with you, actually. We'll make this interactive, so I'll do the same thing that you do. A git clone; I already have it right here. So, hopping into bard-of-love, we need to get into that directory. I use VS Code; whatever editor is your favorite, make it so. You don't have to trust me, but I encourage you to. Let me just get my desktops arranged properly here. No, I want serverless AI to be after. There we go. Zoom in. If you take a peek at this project, we can see there are a couple of things already set up for us here. I think we had about half and half that had used Cloudflare Workers before.

12. Setting Up Cloudflare Workers and Wrangler

Short description:

Cloudflare Workers is a serverless ecosystem that runs node isolates, providing a faster alternative to traditional serverless platforms. Cloudflare Workers uses V8 isolates and supports JS. The project is set up with prettier and TypeScript is optional. We'll be using Hono and LangChain as external tools. An AOL instant messenger style chat app has been set up. We'll be using a shared account in Cloudflare for the project. To start making changes via the CLI, we need to get a service token and log in with Wrangler. We'll be using the Wrangler.toml configuration file for the project.

So, Cloudflare Workers is the serverless ecosystem that Cloudflare has for running these types of apps. It's a little bit different than traditional serverless platforms in that it runs V8 isolates; I think some other platforms do now these days too. But instead of actual Docker containers, this runs a V8 isolate, which is a very thin shell over cgroups and the JS runtime, to make this hopefully even a little bit faster. One of the pieces of the puzzle with Cloudflare Workers is that, at the end of the day, it's all JS. We can see here how we're set up, and the readme gives you a bit of an intro into the technologies that are used and what we're actually doing here.

We've got things like Prettier set up for us. And if we take a look at the package, I do have TypeScript set up on this too. You're not required to use TypeScript; I use the types to make things a little nicer, but pure JavaScript will totally work here too, so don't feel like you have to use it. It can just be nice to have, in my opinion, when you can get away with a little bit of typing. We see our Hono here and LangChain, which we mentioned before, the two external tools that get this up and going. Also, in the assets folder, again, you don't have to worry too much about CSS and stuff like that, so I've got something set up for us here. Actually, I can't take full credit for this: my coworker had the brilliant idea of getting this AOL Instant Messenger-style chat app going, so kudos to him for getting a lot of this work set up. It's kind of interesting: he actually asked ChatGPT to give him code to render an AOL Instant Messenger-style dialogue, and it did. So we're using AI to do an AI workshop. How about that? All right.

So what I did over break is add you all to a special account in Cloudflare that has the necessary permissions to do this project. So if you head over now to dash.cloudflare.com, I'm going to log in myself. Fun. Good old corporate VPNs making this go, so let me get this real quick. Cool. I'm logged in. Under accounts, you now should see devops.js AI workshop shared. This is the account that we'll use to make all this happen. And it's got, basically, the permissions needed for the AI offering and the Cloudflare Workers ecosystem to allow us to get this stuff done. If you're unfamiliar with the Cloudflare dash, this is what it looks like; you all just have to click on that account, the devops workshop shared. We actually won't even need to be in this dash all that much. But for future reference, over here on the left, this Workers and Pages section is where the actual workers, our different applications, are going to live. I can see I've already done this once, but let's do this again together.

First and foremost, we need to get a service token for Cloudflare to allow us to start making changes via the CLI. It's a little bit easier to do it here than in the dash. It's always nice to use an IDE and stuff like that when you can. Sorry, I should have said this first as well: you need to first run npm i, if you haven't already, to install all of the packages necessary. Once you've done that, Wrangler will be installed as a dev dependency, and you can use npx to start interacting with the Wrangler package you've installed as part of this here.

First thing we're going to do is npx wrangler login. This will get us a key for that account. I hit that. It should bring up a window in whatever your default browser is. Basically, it'll give you a whole bunch of permission grants and scopes that will allow the Wrangler CLI to make changes for us. You're fine to just accept that. You can read through them all if you want, but we're going to end up getting rid of this account anyway, so it's all good either way. Once you do that, you'll get logged in, and you can triple check that with npx wrangler whoami; that'll get you a window like this, and we should have these scopes here, which will allow us to do the AI changes and all that good stuff. If you're following along, we should be logged in with Wrangler. We should be good to go. When we're doing a deployment in Workers, there's this wrangler.toml, which is our configuration file for the project. We're all using a shared account for this, so we have the potential to step on people's toes. Apologies.

13. Logging in and Setting Up Wrangler

Short description:

To log in, fill out the form and create a Cloudflare account. Wrangler is a Cloudflare tool for interacting with workers and Cloudflare Pages. Fill out the form using the email you used to sign up. You'll get access to a privileged Cloudflare account. Update the names in the configuration file to be unique. Save the file and move to the next step in the terminal. If you encounter login issues, it's difficult to troubleshoot at the moment.

How do I log in? Oh, yeah, sure. Let me put links here. I'll put this back, because we want bash. So, this is how you log in. You'll make an account, but if you haven't done the form to give me your email address, then you won't see this account listed yet. At this point, you can just show your account ID, or, well, you probably don't want to show your email. Just let me know you filled out the form again and I'll check.

But, yeah, what is Wrangler? Wrangler is the CLI that interfaces with Cloudflare; it's a Cloudflare-specific tool. The password should be whatever you used for the account that you created. So, you need to make an account if you haven't already; just click that sign up button. And then here I need to keep this window up. Send link. Again, once you have that sign up complete, just put the email that you used in here, and I'll get you added. But, yes, Wrangler is a Cloudflare tool that helps interact with Workers, and also this thing called Cloudflare Pages, which is kind of like a static asset version of Workers. Think of it like the AWS CLI, essentially, but for the Workers-specific implementation here. Cool.

Maybe we'll give people just a second to catch up, since they're signing up. Yeah. Cool, Karthik. If you haven't already, make sure you fill out this form; it should look like the DevOps JS Workshop Google form. And make sure you put the email that you used to sign up right here, so I can add you to the account that we're using. And, oh, I see two new responses. Perfect. All right. So, let me add you. And, yeah, you all are getting access to a privileged Cloudflare account, if you haven't already. So maybe I'll give you a little bit of time to poke around afterwards, if you want to see what you have permission to do, because it's a pretty highly privileged Cloudflare account for this stuff. Just adding you to the correct groups here. I think that should... good. Thank you all for being good sports here. It's a little bit challenging to do these types of things virtually. But, yes, I got you both added, and it looks like that was successful. So, now we have more people in here. Again, let me just copy and paste the commands so that we don't lose those anywhere. Oh, I don't see those; formatting, but we'll find out. Oh, well, you get the idea. And now I think we should be good to move along here.

This wrangler.toml, again, is our configuration file. And since we're all using a shared account, we do want the names of things to be different. You can see here I used my first initial plus last name plus bard of love for the worker name. If y'all could do something unique, your name would be great; maybe the same thing, first initial, last name. You're going to want to just change this value to something unique. And then, whatever you use for that, I would use the same value here in this vectorize index. So, I'll do Nathan demo, just to follow along. Again, I know we're not focusing on the code at all yet; this is just the initial setup. When we actually get this running, then we'll start to step through the code a little bit more. If y'all have done that, perfect. Just save that file. No need to do anything yet.

The next step is, we're going back to our terminal. And... "Unencoded"? I am a little unsure of what that means. It's a conflict error code, I believe. Oh, yeah, yeah, on the login? As long as you're able to see this, then you should be fine. You may get a little bar at the bottom that says something; it's just because we're using a shared account that's a little bit unique here. I see someone's already gone ahead and deployed their zone. I love that. Cool. It doesn't get past the login screen, though? Well, that's a bit unfortunate. It's a little bit tough to triage that right now.

14. Setting Up Vector Database and Deploying App

Short description:

We changed our wrangler.toml to give us a custom name and created a vectorize index binding. We created a vector database using the name from our wrangler.toml. We can deploy our app now, but it won't work yet. We included love poems in the repo for us.

It's a little bit tough to triage that right now. Okay. Interesting. I don't actually see Owen; I have 20 people in there. I'm just taking a real quick peek at this for Owen. Let's see here. I don't see you in this account. Let me try this one more time. So, now I have you in the account, Owen. It may be worth seeing if you can log in now. We'll see how that goes. Otherwise, I'm sorry, I may just have to ask you to follow along; I don't want to hold things up too much. Ah, okay. There we go. Gotcha. Well, if you want to use a different email, you're welcome to. Just ping me. Cool. Okay. So, yeah, back to what we're trying to do here. What we did was change our wrangler.toml to give us a custom name. You'll also notice here that we're creating a vectorize index binding. Again, we talked about vector databases; Vectorize is the vector database that we're going to be using here. And this is something that we need to create outside of our code, back here on the terminal. If you look at that README, you'll see there is a create vector database step. Again, same thing: we want to use the name that we used in our wrangler.toml. So, for here, I'm going to change this to Nathan Demo. And when we're creating a vector database, we mentioned that we are storing these float arrays with a dimensionality and a distance metric. Cloudflare knows about a couple of off-the-shelf models, especially some of the open source ones, so it can use what it already knows about an existing model's dimensionality and distance metric of choice with just a preset. There is an alternative form of this command where you can specify the dimensionality and the distance metric manually. But, yeah, if we could each use our own DB, that would be great, I think, just to give you a chance to really play with it. So, I would change the name of the DB to whatever you had in your wrangler.toml, copy that command, and using this preset should be fine: BGE base. If you want to read more about that, there's a Cloudflare page.

Okay. So, let's do that. Make the poems index. If I do that (you want to make sure you put that in the shared account) we successfully created one, which is great. It gives us the information that we need if we want to start using this in our wrangler.toml, but we already have that information; we did that ahead of time, so it should match this name here. Cool. So, now we have a vector database, and we have our scaffolding to actually start creating our RAG app. If we wanted to, I believe we can actually deploy this even right now, and it will show us something. It won't work, but I think it will at least show us a scaffold here. To do that, I type in npx wrangler deploy. It's going to get all this going. These are, again, the bindings, as they're called: basically just the links to our resources. We want Cloudflare AI, so we need this AI one that's already there, and then our vectorize index that we created. And now we have a URL that we can hit. So, if I copy that and go to it, I'm curious... oh, it's not going to work. Okay. Well, I got a little ahead of myself here. I thought it would work, but we haven't set stuff up in the code yet. We'll get it working. All right. So, we talked about the love poems that we want to use to seed some local data for us so we can use them. I already included those in the repo for us.

15. Loading Data into Vector Database

Short description:

We have NDJSON data containing poems by famous authors. We want to use this data to seed our vector database. To do that, we create an embeddings object using Cloudflare Workers AI embeddings from LangChain and set the configuration object with the binding names. We also create a store using the Cloudflare Vectorize store. We ensure that the dimensionality of the embeddings matches the vector store. The names of these variables may differ depending on the platform.

If you go to assets/poems.ndjson (ND stands for newline delimited, a special format), you can see in here there are lines of JSON that have a title, which is the poem title, the author of the poem, and then the text that the poem consists of. We've got some heavy hitters in here, like E.E. Cummings, Maya Angelou, Shakespeare's in here, and stuff like that. This is basically the data that we'll use as the seed here. We won't have to really touch this at all. Not yet, at least. If you want to, on your own time, you can add and remove stuff here. It will change things a little bit, but it will allow you to give it a little bit more context. What we want to do is use that NDJSON as data in our application, in the vector database. To do that, I made an endpoint called load-em-up, where we load the poem data into the database so it can be used as context. So, this is what we'll look at first: getting our stuff into Vectorize, into our vector DB.

So, to do that, we're going to use a couple of functions from LangChain. We're going to create the embeddings object. So, embeddings. And this is going to use a new Cloudflare Workers AI embeddings function, which we can see; we get an autocomplete there. This comes again from LangChain, and you can see up here that LangChain has a Cloudflare-specific package. You could replace this; there are also ones for other vector databases like Pinecone and so on, if you really wanted to go that route. But for our purposes here, we're going to use the Cloudflare AI embeddings. And that takes a configuration object, which is a binding. Again, bindings in Workers, if you're unfamiliar, are the links to the global variables that get injected; they come from the wrangler.toml. We set our binding name for Vectorize as the vectorize index; our binding name for AI is just AI. So we'll come here, and this becomes c.env.AI. Yep. Which I haven't typed yet; I probably should have done that, but that's all good. And then, it's worth noting what we created our index with. Again, we created that with this preset here, this BGE base one. I'm going to put that in a constant variable, because we'll reference it a couple of times. So, say const embedding model is going to be that same thing we used to create the index, because the other parameter in this embeddings object is a model name that we need. We'll give that the embedding model, and now we have our embedding generator ready to rock and roll. We also want to store these embeddings somewhere. So, we're going to create a store, which is going to be a new Cloudflare Vectorize store. Again, this comes from LangChain, already kind of set up for us. Remember, embeddings come from the model, so Workers AI handles the model side of things, and now Vectorize is our vector store. So... oh, you get a "vectorize not entitled" there. I'd be curious to see if other people get that too. I'm hoping I got the permissions right. Let me make sure that you're in this account. Okay, I'm checking an email real quick to make sure that you are in here, and I'll make sure that you're added to this stuff again. I don't think you were... oh, let's see. If you can try that again, Cicel, that would be amazing. Oh, shoot, you're going to have to check your email, Cicel, to actually confirm the invite. Sorry about that. I could have direct added you, but I forgot to check that box. But you should be good to go. Yes, here's our embeddings model. Again, same thing, we could just copy and paste that from what we have here, because we just want to make sure that this matches the value of the index: we want whatever the AI uses to generate embeddings to match the dimensionality of the vector store, and that's really what this is doing. Oh, yeah, there is an @ at the front; good shout. That is a Cloudflare convention. The names of these will be a little bit different depending on where you go, but the idea is the same.
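Pulled together, the embeddings-plus-store setup might look like the sketch below. The option names (`binding`, `model`, `index`) reflect my reading of the `@langchain/cloudflare` package, and the preset string is the BGE base model identifier as I recall it, so double-check both against the package and the Cloudflare docs:

```ts
import {
  CloudflareWorkersAIEmbeddings,
  CloudflareVectorizeStore,
} from "@langchain/cloudflare";

// Must match the preset used when the Vectorize index was created,
// so the embedding dimensionality lines up with the index.
const EMBEDDING_MODEL = "@cf/baai/bge-base-en-v1.5";

// env holds the bindings declared in wrangler.toml: AI for Workers AI,
// VECTORIZE_INDEX for the Vectorize index (binding names are assumptions here).
function makeStore(env: { AI: any; VECTORIZE_INDEX: any }) {
  const embeddings = new CloudflareWorkersAIEmbeddings({
    binding: env.AI,
    model: EMBEDDING_MODEL,
  });
  return new CloudflareVectorizeStore(embeddings, {
    index: env.VECTORIZE_INDEX,
  });
}
```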

16. Loading and Parsing Poem Data

Short description:

We have a Vectorize store and an embedding generator from the model. We need to specify the index parameter for the Vectorize store and the location to ingest the embeddings from. We load the raw text from the local poems file into a variable called raw poem data. We split the lines of the poem data on new lines and parse each line as valid JSON. The format of the JSON is title, author, and text.

Yeah, you want your dimensionality to be the same between what the model generates and what you store. Cool. We're going back here. I will also, maybe it's easier, I'll link to, I'll link you all to it. I was going to save this for the end. But if you want to, if you get behind or you need a little bit of help, there is a completed version of this exercise located here. But I would encourage you to follow along. Just to kind of see things as they get built. Cool.

So, again, now we have a Vectorized store, but we need to tell it where our actual index is. So, there's a parameter. There should be a parameter on this that is called index. And we type in the name of this Vectorized binding. So, we can copy that here. And it should be all set. So, now we have our embedding generator from the model. We also have our Vectorized store. We're just going to store what comes out of those embeddings in this Vectorized piece of the puzzle. Now, we need to actually pull in the data and stick it in there.

Sometimes it can be nice to have console logs so you're able to make sure you get to certain spots. Let's just say here... oh, thank you. You're right. Yes, I was wondering why TypeScript wasn't giving me hints. You're so right, thank you. We also need to tell it where to ingest the embeddings from. Good shout. Cool. So, yeah, I like the console log; it just kind of helps us track where we are. Say: loading poems from the seed file. And now we need to actually do our loading here. A small caveat: because of the format, things get interesting in the way we have to read things here. Typically, you may want to load this stuff from a remote data store like R2 or S3 or something along those lines; it would make this a little bit simpler. But because I didn't want to have to involve getting access to R2 and stuff like that, and I want to keep things as simple as possible, we're loading something from a local spot. What we're going to do is load our file in directly here. I think just calling it raw poem data will work, loading from the assets poems file. We're just going to load our actual raw text from that file into this variable. I'll let you all copy that. And once that is finished, then we can actually start iterating over it. I like reduce or mapping functions if I can get away with them. So, I call that raw poem data. That's a string; we know that. Split. And we're going to split these on new lines because it's newline-delimited JSON. We're going to take each of those lines and say: each line is valid JSON, so we can just go ahead and parse that. So, parsed is JSON.parse(line). This is just an implementation detail of what I had to do: because we're doing newline-delimited JSON in here, I couldn't use newlines to split the lines of the actual poem, so I just made those pipes instead of \n's. What we want to do is convert those pipes back to the original \n's so that they get loaded correctly. We know that parsed, again, is JSON. The format here is title, author, text.

17. Loading and Chunking Poems

Short description:

We replace the pipes in parsed text with new lines and load the poems into an array. We consider the size of the poems when storing them in the vector database to improve search relevancy. Storing the entire poem as an embedding may decrease the likelihood of a successful similarity search. Instead, we can chunk and split the poem to store smaller, more representative sections. This approach balances the storage of context and the accuracy of search results.

So, parsed.text is going to be parsed.text.replaceAll, and we're replacing all the pipes with new lines. Now we have that set up correctly, and we can return parsed. This should load all of our poems into an array. And I kind of like doing types for this stuff, so I'm just going to tell it ahead of time: this type is an object of title, string, author, string, and text, string. And poems is an array of those. Now we have a type for poems. Perfect. And it's always good, again, just to track that we actually got what we wanted. So, say: loaded, poems.length.
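As a sketch, that parsing step might look like this. How the raw file gets imported as a string depends on the project's bundling rules, so treat the import line as an assumption:

```ts
// Exact import mechanism for non-JS assets depends on the wrangler/bundler setup.
import rawPoemData from "../assets/poems.ndjson";

type Poem = { title: string; author: string; text: string };

// One poem per NDJSON line; pipes stand in for the newlines inside each poem,
// so restore them after parsing.
const poems: Poem[] = (rawPoemData as string)
  .split("\n")
  .filter((line) => line.trim().length > 0)
  .map((line) => {
    const parsed = JSON.parse(line) as Poem;
    parsed.text = parsed.text.replaceAll("|", "\n");
    return parsed;
  });

console.log(`Loaded ${poems.length} poems`);
```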

Hopefully all that's making sense so far. Now we're getting to something a little bit more specific to AI and how this works. If we look at these poems, let me turn this back, we can see the actual data in here. Since we are importing the poems, they probably shouldn't be placed there. Yeah, you're right, Jan, the location is not the greatest; it should probably just be in source. You're welcome to make that change if you want to. I just had this working and didn't really want to mess with it too much. But you're right, it's probably not where it should be if we're loading it in the actual source code or compiled code. Cool. We can see these things are actually pretty long. We have a couple that are very short. This is Melody Godfrey; I like her little self-worth poem here. "One of the things I own and my worth is my favorite possession." That's a nice little short one. But we also have some that are quite a bit longer, like Maya Angelou's Phenomenal Woman and stuff like that.

So, one of the things to consider when you're storing this information is how big a piece of information you want to actually store in your vector database, to increase the relevancy of these searches. We could put the entire poem in an embedding and store it as a single row in the vector database. But that means a similarity search against the whole poem becomes a lot less likely to hit, because it's bigger, the search space is increased, and the overall characteristics may not be as accurately represented as they would be with a chunked version. So, there's this concept in storing vectors called text splitting, or chunking and splitting; it goes by a couple of different names. There's a whole science to this as well, trying to get this information right: making sure that you've got an accurate enough chunk that you actually store context that's long enough to then use somewhere, but not so long that your match likelihood goes down.

18. Chunking Poems and Adding to Store

Short description:

We ensure that the chunks we store in the vector database have accurate context but are not too long to decrease the match likelihood. We use a chunk size of 200 and a cursive character text splitter to split the poems into documents. These documents are then processed by the tokenizer and embeddings. We preserve the poem title and author as metadata. There is a slight constraint with the metadata format, requiring us to delete a line of code. Finally, we add the documents to our store and obtain the corresponding IDs.

This chunk size is definitely a value to play with. I've already found one that works fairly well, a chunk size of 200, so let's go with that for now. And then I'll explain a little bit more after this is all finished and we have code to go off of.

We're going to create a splitter. This recursive character splitter... I have a cheat sheet over here, because I don't remember exactly where the imports and stuff are. Spelling "recursive" right would help. I know it comes from LangChain, in the text splitter module. There we go, now it gives us what we need. RecursiveCharacterTextSplitter is one of the text splitters available out there. This one is specific to text. Someone asked if all this stuff works with images and videos as well. It definitely does, but you would use a different way of chunking data in something like an image splitter versus a text one here. For the purposes of the RAG app we're going to go with this, but it's worth looking into the ways to do this sort of splitting and chunking for images and videos and stuff like that. But anyway, yeah, we'll get this going.

We take some parameters here. I'm going to give this a chunk size; I found 200 works pretty well. The nice thing is that you also give it an overlap, so you're not just drawing hard lines in the sand as to how this works: you give it the ability to overlap and pick up some context from previous spots as well. Yeah, we'll go with a splitter that looks like this. And now we're going to use that to start chunking up our poems into what are called documents. Basically, the individual documents get fed into the tokenizer, and then that gets fed into the embeddings; that's the flow there. So, we're going to loop over these: const poem in poems. We mentioned we're creating documents, so docs is going to be (that's an asynchronous operation) the result of using that splitter to split via this method called create documents. Again, that basically creates a bunch of subsections of our original input. Those subsections are docs. The input into this is going to be our poem text. We can see that it takes the text here; this is what we're trying to split and create documents for. In this case it's going to be poem.text. We can give it some metadata as well so we don't lose a ton of information in doing this; we want to make sure that each of these documents still knows where it came from. So, what we'll do is give it one called title, and the title is poem.title, and we'll do the same thing for author, essentially: it's going to be our author. Cool. It's going to create a bunch of subsections of our text, but we'll preserve the information there. We can do a bunch of logs if we want to; it's always kind of interesting, I guess. So, yeah, we want to capture what we actually just processed: poem.title, created, and then our docs.length docs. Is it plural? It's docs. So, now we know what that looks like. Cool. And there's one slight hiccup with the way things currently work in all this with Vectorize and a couple of other databases.
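Here is a sketch of that chunking loop. The overlap value is an illustrative choice, and the splitter's import path has moved around between LangChain versions, so verify it against the project's package.json:

```ts
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";

type Poem = { title: string; author: string; text: string };
declare const poems: Poem[]; // produced by the parsing step above

const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 200,   // small chunks keep similarity matches tight
  chunkOverlap: 20, // a little overlap so context isn't cut hard at chunk edges
});

for (const poem of poems) {
  // createDocuments takes parallel arrays of texts and per-text metadata,
  // so the title and author ride along with every chunk.
  const docs = await splitter.createDocuments(
    [poem.text],
    [{ title: poem.title, author: poem.author }]
  );
  console.log(`${poem.title}: created ${docs.length} docs`);
}
```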

19. Metadata Format and Adding to Store

Short description:

The format for metadata used in the vector insertion has a constrained depth. Using the doc splitter from LangChain violates this constraint. However, this information is not critical. We loop through the docs and delete the field doc.metadata.loc. It represents the lines of the source covered by each created document. Currently, this is an incompatibility that needs fixing. After that, we add the documents to our store and obtain the index IDs. The response is not crucial for our purpose.

The format for metadata that you use when you're inserting your vector is kind of constrained. It is a JSON object, but the depth of that object is restricted. And unfortunately, if you use the doc splitter from LangChain, it violates that depth constraint. The good news is this is not information we need; you'll have to trust me on this. All we want to do is loop through the docs and then delete doc.metadata.loc, this line of code. Essentially, what this stands for in the splitter is: if it's a multi-line string, which lines of your string are covered by the individual document that was processed by the create documents request. And, yeah, right now that's an incompatibility, something we're going to have to fix. But just know that it's going to have to be a thing for now. Cool. And now we're going to actually add these to our store. When we add something, we get back IDs for it, so we'll capture those. We'll call those index IDs, and we're going to do await store.addDocuments with the docs that we just created. So, yeah, this is the Vectorize store from earlier and the documents we just created. And it's not a bad idea to console.log this as well, so, in a template literal, we can say "inserted" and log these index IDs that we got back from the index. And, yeah, the response here doesn't really matter too much. We're not actually using it; we're more interested in actually loading. It's a doer versus a getter here. Cool. Any questions, comments, concerns so far?
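In code, that cleanup plus the insert might look roughly like this, assuming the `docs` from the splitter and the `store` from earlier:

```ts
// docs: chunks from the splitter above; store: the CloudflareVectorizeStore from earlier.

// Vectorize restricts metadata depth, and LangChain's splitter adds a nested `loc`
// object (which lines of the source each chunk covers). We don't need it, so strip it.
for (const doc of docs) {
  delete doc.metadata.loc;
}

// addDocuments embeds each chunk via Workers AI and writes it to Vectorize,
// returning the IDs of the inserted vectors.
const indexIds = await store.addDocuments(docs);
console.log(`Inserted ${indexIds.length} vectors`, indexIds);
```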

20. Using Vector Database for Querying

Short description:

We have a bunch of documents in our vector database and all the context we need for a query prompt. Let's take a five-minute break before proceeding. We can test the ability to use the vector database by performing a semantic similarity search using the 'search' endpoint. We'll pull in a query from the query string and get the embeddings from the store. Then we'll use the 'similarity search' method on the store to perform the query and limit the results to five. After deploying with Wrangler, we can access the search endpoint and call the 'load them up' endpoint to load the data. If there are any issues, we can use 'npx wrangler tail' for debugging. Let's check the format of the poems and redeploy if necessary.

So, we now have a bunch of documents inside of our vector database and all the context we need to actually use that in a query prompt. I know we're all interested in getting to the actual prompt, but how about we do another five minute break, because we're at another hour mark. Yeah.

Now we have the ability to load data. Let's just do a quick test to make sure we can actually use it and that we get back what we're expecting here. So there's a quick little endpoint here called search. Also a GET endpoint. This is actually going to make use of the vector database to do just a regular semantic similarity search. What this looks like: we're going to pull in a query. We'll pull it from the actual query string, too. So c.req.query, named something like q. How do I get the bindings from the wrangler.toml? That should happen automatically when you do a wrangler deploy or something like that. Now, your types may not be present, so if you're talking about types specifically, you do have to do a little bit of extra work to add those as globals. Or, actually, with Hono, you can see the first parameter of the generic version of Hono is the env, so you can give it there. We haven't done that yet. But if it's types specifically you're worried about, that should be fine. If you can't find them at all after a wrangler deploy, that probably means something else is up. Hopefully that clarifies things.
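On that typing question, one common way to get typed bindings in Hono is to pass them through the generic slot. The type shapes below are assumptions, and the binding names must match your wrangler.toml:

```ts
import { Hono } from "hono";

// Binding names must match wrangler.toml. Concrete types (Ai, VectorizeIndex) come
// from @cloudflare/workers-types if you have them configured; `any` also works.
type Bindings = {
  AI: any;
  VECTORIZE_INDEX: any;
};

// Hono's generic takes the env shape, so c.env.AI etc. are typed inside handlers.
const app = new Hono<{ Bindings: Bindings }>();
```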

In the meantime, let's proceed forward. So we have query. And we also want to do the same thing we just did; in fact, let's just copy and paste: we're going to get our embeddings and our store, so that we can actually use Vectorize to do the query here. Then it's probably just a simple matter of doing const results equals await store, and we can see that store has methods on it, but the one that we're interested in here is similarity search. We want that to just be whatever query is, and we'll limit it to five for now. Go ahead and respond with JSON results. Cool. This is it. Super simple. We copied and pasted most of this. All we're doing here is pulling in the query variable, and then (this is really the only new piece of the puzzle) using the similarity search method on the store. So, let's go ahead and actually do a deploy now. Again, that command is npx wrangler deploy. If the main page wasn't working before, it probably still won't, but that's okay; that's not what we're interested in here. How do I get this bar to go away? It's messing with my Chrome stuff. What we actually want to do is go to the search endpoint we created, if we go back here. Oh, I'm sorry, I got a little ahead of myself: we actually have to call load-em-up first to load these things. We did all the work to do it; let's see if it actually works. We made that a GET endpoint, so we can just call load-em-up. And it looks like we got an internal server error here. So, that's fine; there's a way to do some debugging if we need to. Ignore that command, that's just a Cloudflare thing. What we want to do is npx wrangler tail to see why this failed. Refresh that. We can see: cannot read properties of undefined. So, we loaded 10 poems; this piece worked, but we never made it here. Let's do a console.log of poems to see what that actually looks like and make sure it's in the format we expect. I'm going to deploy that again. This is part of the fun, right? Always live demos, live coding, trying to figure things out on the fly here. That looks pretty good to me. We have our stuff. Console.log poems.
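Putting that together, a minimal sketch of the /search route, reusing the `app`, the store setup, and the embedding model constant from the earlier sketches (the route name and binding names are assumptions):

```ts
app.get("/search", async (c) => {
  // The query text comes straight off the query string, e.g. /search?q=goose
  const query = c.req.query("q") ?? "";

  const embeddings = new CloudflareWorkersAIEmbeddings({
    binding: c.env.AI,
    model: EMBEDDING_MODEL,
  });
  const store = new CloudflareVectorizeStore(embeddings, {
    index: c.env.VECTORIZE_INDEX,
  });

  // Embed the query and return the five closest chunks from Vectorize.
  const results = await store.similaritySearch(query, 5);
  return c.json(results);
});
```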

21. Creating a Chatbot with Cloudflare Workers AI

Short description:

We fixed the issue with document creation and successfully loaded poems into the VectorDB. Searching for 'goose' returned the poem with the word 'goose' and other related results. Our API validates that we can load and view the poems. We're now ready to create a chatbot using the Cloudflare workers AI package, specifically the chat component.

We did our \n conversion. That looks good. I believe we're failing at this document creation time. So, maybe we also do a... oh, we've got just the individual poem here. That doesn't look too bad. Do a quick deploy and tail. For it to work the first time, I would have been amazed. Oh, interesting. So, it looks like poems is... that's an index; I think we get zero out of this. So, what do I change here? Thank you. Yep, that's the problem. Oh, I mean, for const: not in, but of. Yeah, you're exactly right. Man, thank you for that spot. This is why it's nice to have a true pairing exercise here.

Cool. This ought to work now. But let's triple check that. We see it spinning for a while; that's usually a good sign. That means it's trying to do a thing. We come back to our logs. It did take a little while to show up here, but let's see what we get. Yeah, look at that. Now we can see that we have indeed inserted a bunch of records with the text for our poems. This is great. Appreciate the catch again on that little bug there with the of versus in. Love to see that. So, now we'll actually try this search thing. Now that we have stuff in the vector DB, if we search, we pass a q. I happen to know that geese is in here, I'm pretty sure. Or maybe it's goose; it kind of messes with the results a little bit. But let's do goose. We'll search this for goose. And just like that, we do see that we get back, right there, the poem that has the goose in it. And a couple of other ones; I'm sure there's a couple of other ones with birds and stuff, but that doesn't really matter too much. Anyway, validation that we can indeed load the poems, and that they are in the order that we kind of expected them to be in. That gander, goose, small birds, this all kind of makes sense; we're seeing stuff like that. There's a bit of a theme here. So, yeah, I think it's looking pretty good. We have an API that does actually validate that we're able to see the stuff.

Let's move on to the actual fun part, which is making a silly little chatbot; that's kind of the idea here. In the style of AOL Instant Messenger, although we're talking to a cupid-esque character; his name is Bernard the Bard in this case. So let's go ahead and knock this out. First things first, we want to actually read what came in. So we'll say, for now, body is equal to c.req.json. This is actually an async operation, so let's await that. And this will give us the input that we can then use to feed into our process here. We also need, same as before, access to the store and embeddings so we can do our search, so we'll just copy and paste those again; they should look exactly the same. Now we want to set up the actual chatbot a little bit more. So let's go ahead and do that. For chat, this is going to be ChatCloudflareWorkersAI. This also comes from the LangChain Cloudflare package, already pre-configured to do what we want it to do in regards to chat stuff. There are a ton of other models available that do things like image recognition and even image generation and stuff like that, but we're going to be working with chat for this case.
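The top of that chat handler might look like this sketch. The body shape (userMessage, model, messages) is what the front end described below sends; the constructor option names are my reading of `@langchain/cloudflare`, and the route name is an assumption:

```ts
import { ChatCloudflareWorkersAI } from "@langchain/cloudflare";

app.post("/chat", async (c) => {
  // Body sent by the Vue front end: { userMessage, model, messages }.
  const body = await c.req.json();

  // Same embeddings + store setup as before, so we can retrieve poem chunks.
  const embeddings = new CloudflareWorkersAIEmbeddings({
    binding: c.env.AI,
    model: EMBEDDING_MODEL,
  });
  const store = new CloudflareVectorizeStore(embeddings, {
    index: c.env.VECTORIZE_INDEX,
  });

  // The chat model; account ID and API token get filled in from secrets later.
  const chat = new ChatCloudflareWorkersAI({
    model: body.model,
    cloudflareAccountId: "", // TODO
    cloudflareApiToken: "",  // TODO
  });

  // ...retriever, prompt, and chain come next.
  return c.text("todo");
});
```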

22. Setting Up Chatbot with AI Model

Short description:

We set up the model, account ID, and account token. We use the store to fetch data for chatting. We feed the user message and conversation context into the model to get the desired response. We define the chatbot's rules and system message. Our bot is ready to chat!

And we want to say this model is going to be, now in this case, we actually allow people to pass in the model that they want to use. We could hard code this too. But I already have the front end set up to send this, so we can just pass it, take it from the request that we've parsed. And now we need to set the account ID and account token. This is going to have us going back to Cloudflare account ID, Cloudflare API token. We'll come back to this. Yeah, we'll just put a big old to do, because we do need to fill these out. We'll worry about that in a sec.

Okay, so now we have our store, but we want to actually use it to fetch things out of it for the sake of chatting. Again, LangChain has done a really good job of building some of this stuff out for us. There's this concept in there called retrievers, which is basically tailor-made for looking things up in vector databases. Those are generic over whatever implements the VectorStore API, so we can just call asRetriever on our store, and we'll get something of type VectorStoreRetriever, which we can then use to do our lookups.

Now this requires a little bit of knowledge of what's going on on the front end. I'll have to open it on the side when this is small. This is the Vue app. Don't worry if you've never used Vue before; you don't have to actually touch this at all. The idea is, we can see here a send prompt message. It sends us a user message, the model that we just used a second ago, and then the messages so far, which is just an array of previously existing messages. This user message is what we're going to use to do our search. Obviously, we want to respond to what the user just typed to us. So we'll capture that locally: user message is body.userMessage. Perfect. And then we also want the historical context of the conversation. We have all of the messages in this messages list here. We'll do a little bit of a map on that to get the full message set so far. So messages, message objects, is body.messages.map. And for each of these, we look at message.role. There's a special role that LangChain assigns to the chatbot response, and that's assistant. So we'll say, if it came from the assistant, then label that as an AI message; otherwise, we'll just use the message itself. And we also want that array to include the content of each message. Yeah, okay. So we have our message that comes from the user, and we have our context here. We want to now feed that into our model to actually send us back what we're interested in hearing about. So this is where we actually get into programming the AI and the prompt, with prompt-based engineering. If you have never peeked behind the covers before, most chatbots have a set of programmed parameters where they are set up in human-readable format to be like: this is your name, you're never supposed to talk about how you're going to destroy the world, all those kinds of things. That's kind of what we're doing for this setup here. This is a chatbot... let's see, let me look at my notes again for where this comes from. Chat... wrong, not going to give it to me for free this time. LangChain chat prompt template. Okay, so the thing we're looking for here is the chat prompt template, and again, we can see this is tailor-made for chat prompting essentially, and it has very good documentation on what that looks like. But here's again where we're saying what our sort of system looks like for the chatbot. So let's go ahead and make it happen: ChatPromptTemplate.fromMessages. And you give it both the system message, using this SystemMessagePromptTemplate, which comes from the same spot actually, and the human message. SystemMessagePromptTemplate has a fromTemplate method that we call, and this is where we're defining what our system looks like. I'm actually going to copy and paste this from my GitHub repo because I already defined this a little bit. But these are the rules. Again, you're defining the rules for your chatbot. So, for my case, let me just copy this. Yeah. This is where I get to tell the AI who it is and what its rules are. In this case, I'm telling it that today's Valentine's Day, that it's spreading the love with everybody, what its name is, and the fact that it's supposed to use the poems that we've already seeded here.
It tends to be a little bit wordy, so I like to try to limit it to just a few sentences. And yeah, it uses those message objects that we were just talking about as the context for that. And the same for the prompt template: we also want to tell it what the human has said. So that's what this HumanMessagePromptTemplate from user message maps back to; the body, which maps to the message that comes from our front end here. And just like that, we have defined our bot, really. Thanks for stopping in, Raiz. Appreciate you. Cool.
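Continuing inside the chat handler sketch from above, the retriever, the message mapping, and the prompt might look like this. The system rules here are a paraphrase of what was described, not the repo's literal prompt, the template variable names are assumptions, and folding the conversation history directly into the system text is a simplification:

```ts
import {
  ChatPromptTemplate,
  SystemMessagePromptTemplate,
  HumanMessagePromptTemplate,
} from "@langchain/core/prompts";

// The vector store doubles as a retriever for the chain.
const retriever = store.asRetriever();

const userMessage: string = body.userMessage;

// Prior turns; LangChain labels the bot's own replies with the "assistant" role.
const messageObjects = body.messages.map((m: { role: string; content: string }) =>
  m.role === "assistant" ? `AI: ${m.content}` : `Human: ${m.content}`
);

// Paraphrased rules for Bernard the Bard: it's Valentine's Day, quote the seeded
// poems rather than inventing new ones, and keep answers to a few sentences.
const prompt = ChatPromptTemplate.fromMessages([
  SystemMessagePromptTemplate.fromTemplate(
    `You are Bernard the Bard, spreading love on Valentine's Day.
Use at least two lines from the poems below in every response and do not make up your own.
Keep replies to a few sentences.

Poems:
{poems}

Conversation so far:
${messageObjects.join("\n")}`
  ),
  HumanMessagePromptTemplate.fromTemplate("{user_message}"),
]);
```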

23. Completing the Lane Chain

Short description:

We set up our prompt and import necessary dependencies. We define the poems and user message as inputs. We format the documents as a string and use the string output parser. We stream the user message and return the result using the Hono function. Our chain is almost complete.

We're super close. We're in the home stretch here, I promise. So, now we have our prompt set up. We want to actually use that in the chain. LangChain is really built around what are called runnable sequences. I would have figured that would be an export that's available... but apparently not. RunnableSequence, from core runnables. Okay, so import from LangChain. No Cloudflare account ID? Yes, yeah, that will happen. This is the to-do we have here; we need to fill this out to actually get that, and it takes us out of the coding a little bit to do that, so I figured we'd circle back on it. That's kind of expected for now, at least. It's a good shout, though. So, RunnableSequence. If you're trying to run the already-completed one, it means that you need to go to your README. I'll tell you what, if we don't finish this in five minutes, we'll just dump the completed one, and I'll show you exactly how to make it work. But until then, we'll try to actually get through this, because it's a little bit cumbersome: if you haven't set up a Cloudflare API token before, you kind of have to go through some dash pages to do it. But we'll get there. Cool. So, we have our imports now. We want RunnableSequence. Runnable sequences are the core of the chain; it's where the chain comes from. It's running a series of operations and an LLM.

Now we have our prompt. That's one of our inputs into this chain. So, the first thing we're going to do is say that the poems in this come from the retriever, the vector store retriever, and that we're going to pipe that into a function that does some special formatting for us, which also comes from LangChain out of the box. It's just going to take the documents that we created earlier and bring them together into a string, basically saying that they were originally string types, so it's safe to read them as strings. That comes from LangChain's document utilities, and it's formatDocumentsAsString. Cool, we've got that. And, let's see, we also want to pass in our user message. These are the inputs into this prompt here. You can see that there's a template variable for poems and a template variable for user message; we're defining what those are here so that we can actually use them. We want to just pass things through as the user message, though, so we do that. And again, we're piping that into the prompt so that we can get it filled out, and also into the chat. StringOutputParser is what we need next, also from LangChain, because we're going to read this as a string; so, parse this as a string: StringOutputParser. All right. So, now we can stream. Oh, wait. We're going to stream this so we can get an as-they're-typing style response back. We could avoid that and just do it all at once if we wanted to, but it's kind of fun to get the realtime message style going, I think. So, we're going to stream the user message, and we're going to return streamText; that's the Hono function. And that takes c, which originally came into our function and contains all the context. And this is an anonymous function. Now, let's see if it takes. Or, wait.
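And the chain itself plus the streaming response, as a sketch. The import paths are my best recollection of the LangChain and Hono layouts at the time, so check them against the versions in package.json; retriever, prompt, chat, userMessage, and c come from the earlier sketches:

```ts
import { RunnableSequence, RunnablePassthrough } from "@langchain/core/runnables";
import { StringOutputParser } from "@langchain/core/output_parsers";
import { formatDocumentsAsString } from "langchain/util/document";
import { streamText } from "hono/streaming";

const chain = RunnableSequence.from([
  {
    // {poems}: retrieve chunks related to the user's message and squash them into one string.
    poems: retriever.pipe(formatDocumentsAsString),
    // {user_message}: pass the user's text straight through.
    user_message: new RunnablePassthrough(),
  },
  prompt,                    // fill in the chat prompt template
  chat,                      // run it through the Workers AI chat model
  new StringOutputParser(),  // read the result back as a plain string
]);

// Stream tokens back to the browser as they arrive, AIM "is typing" style.
const aiStream = await chain.stream(userMessage);
return streamText(c, async (stream) => {
  for await (const token of aiStream) {
    await stream.write(token);
  }
});
```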

24. Setting Up API Tokens and Environment Variables

Short description:

We set up the necessary tokens for Cloudflare and workers AI. The tokens allow us to read and retrieve data from the store and enable the AI model to function properly. We created an API token with the required permissions and set the account ID as an environment variable. If you are following along, these variables were previously empty strings, but now they are filled out. Each participant needs to create their own API token.

Oh, yeah. So, we want to do "for each token". Well, this is a little bit of an overloaded term; it's not the same tokens we were talking about earlier. This is a token in our text stream from Hono, going out of the AI space back into response space. Essentially, it allows us to write partial data to the front end. That's what we stream out, right? And that should get us there. Again, there's a lot going on here; I definitely understand if it's hard to follow. But we have our AI embeddings and our store, so we have everything to get the actual embeddings from whatever was sent to us. We have the ability to retrieve existing stuff from the store, which is good. We have the chat model to give us our stuff back, our parameters for the actual bot itself, along with the message that the user sent us, and then just some stream stuff to send things back and forth. This should be all we need. And, you know, I imagine it probably won't work the first time. So, since we're nearing time here, let's go ahead: I'm just going to stash everything I have locally and move to the completed branch, and we'll show you what it takes to get that stuff running. Let me move back my wrangler.toml to the new stuff, though. Demo. And demo. Cool. So, we had to-dos in our source to fill out (where was this? oh, yeah, up here) the API tokens for Cloudflare. And we need that for Workers AI to do its thing. There is a little spot in the readme that does tell you how to set these values, but it doesn't tell you where to actually get them from. So, let me walk you through that process, and we'll do it real quick for mine as well. I have everything set up here. I'm going to run this first command: wrangler secret put. Secrets are protected environment variables, and you just give it the name of the variable you want. This will give you a space to enter a secret value. The account ID comes from a couple of different places. The easiest way is probably to just grab it from the URL. Well, actually, if you're on the Workers and Pages page already, you can just copy this right here, but it's the same value that's in the URL for the account too, so either way works. I just hit the copy value there, paste it here. Success. That's good. Now, for the API token, it's a little more complicated. Let's copy this again and get this set up to start running. I need to make an API token that allows reading of Workers AI for this application. So, I have to go up here in this upper bar, click on the little user dude, my profile. Over here on the left is API tokens. And we can see we'll have a couple already from our Wrangler logins (this is the little auth screen that we agreed to earlier), but we want to make a new token that we're able to use for this project. So, click on create token. I don't know if any of these templates have everything we want here. I think, honestly, we probably just need the Workers AI one; this is probably all we need. So, let's go ahead and just use that template. I'm going to rename this. You don't even have to rename it if you don't want to, but I will, just for clarity's sake. I'll rename it to DevOps JS workshop Workers AI. Okay. And we can see that the permission that gives us is the ability to read from Workers AI, which I believe is all we need for this token. We can limit it to just an account; that's probably not a bad idea.
Just say it's only valid for this shared account here. You don't have to worry about IP filtering; you can if you're hyper paranoid, but it's not required. And then you can define a TTL for how long that token lives. I'll just set it to Saturday because it's not going to live too long. If I click continue, it gives me a little overview here, and I click create token, which gives me this page. So, I'm going to copy that value, go back to my terminal that has this set up, and, boom, I have now set the token and the account ID as necessary. And if you are following along, these were empty strings. If you're on the completed branch, these are filled out now, because it does actually add those as environment variables to the account context that comes in. Do each of us have to create an API token? Yes.
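Once those secrets exist, they show up on the worker's env like any other binding, so the earlier TODOs can be filled in roughly like this. The secret names are whatever you passed to wrangler secret put; I'm assuming CLOUDFLARE_ACCOUNT_ID and CLOUDFLARE_API_TOKEN here:

```ts
// Secrets set with `npx wrangler secret put` arrive on c.env like the other bindings.
const chat = new ChatCloudflareWorkersAI({
  model: body.model,
  cloudflareAccountId: c.env.CLOUDFLARE_ACCOUNT_ID,
  cloudflareApiToken: c.env.CLOUDFLARE_API_TOKEN,
});
```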

25. Creating API Tokens and Deploying

Short description:

Each participant creates their own API token, since tokens are user-specific. Deploying with npm run deploy, rather than npx wrangler deploy, also uploads the assets folder, and with that the chat app works. Serverless AI is not a silver bullet, but it is great for rapid prototyping of RAG apps; the same setup can power things like a developer docs assistant.

This is user account specific, not kind of global account specific. I mean, I could share mine with you, but you would be using my account. So, it's probably not a good idea. It's definitely something you want to create on your own. And, again, if you need me to walk through that again, I can. That was API tokens over here. Create token workers AI. And then you could do a little filtering there, if you wanted to, but even just rolling with the defaults there is enough. We can see that that does exist here.

So, if everything went well, we should be able to deploy this. Fingers crossed. Oh. I think the tail... sorry, let me check this. Oh, shoot. Sorry. Maybe that's the whole reason that wasn't working. I keep telling you to run npx wrangler deploy, but this project does something special with Vue: instead of running npx wrangler deploy, if you just run npm run deploy, it also passes an assets folder, which loads the data. Oh, man. I'm sorry, I'd forgotten that was a thing for this project. And now, if I load this, sure enough, we have our little chat. And we could say, what animals remind you of love? And it's responding to us. This follows the goose and the gander that we were looking at earlier. But, tada, we have a working chat app. This is the exact same code that we were just looking at.

Sorry. The missing piece was the npm run deploy, not wrangler deploy. I hope you enjoyed this. I know we ran up on time there; that was a full three hours. I am happy to stay on if people want to get things 100% working. I'll stick around for an extra ten minutes or so, but I know we are at time as well. It was a pleasure, and I really appreciate you all sticking around and hanging out; hopefully there were some interesting takeaways. Actually, I have a small outro here. Congrats on making it through. Again, I know that we all know this already, but serverless AI is not a silver bullet. It is pretty good, though, and pretty easy for rapid prototyping, especially things like RAGs, where you're able to have more static models and you can use existing solutions for vector databases and stuff like that. What we made was a really cool poet; it's able to give us love poems and quote them in its own little quirky way. But imagine if this was also using developer docs or something like that; it would be super, super useful. In fact, we have a little product called Cloudflare Cursor that uses almost exactly the same setup. Instead of seeding love poems, we seed the Cloudflare Workers docs, and you can ask it, you know, how do I set up a worker that redirects to HTTPS? And it will use the docs and the context it has to give you an actual response for something like that. So, we had a fun little demo of this. Very, very powerful if you apply it to other applications and stuff like that. I'll put these slides up, and I have a couple of further reading and further learning links as well. But thank you so much. And, again, if you have to drop, I totally understand. Otherwise, I'll hang out and we'll get this working for folks. You wanted me to re-show the creation of an API token? Yes. If you go to... sorry, click this little user dropdown up here and go to my profile. That will give you this screen. And over here on the left is API tokens, where you can do create token.

26. Using Workers AI Template and Pricing

Short description:

You can use the Workers AI beta template without making any changes. The value you get on the next screen is what you paste in as the Cloudflare API token. The cost is currently free for Workers paid plan users, but pricing will change once these products go GA, and it depends on usage and data set size. Feel free to ask questions in the Cloudflare Discord channels for Vectorize and Workers AI. We appreciate your participation and hope you enjoyed the workshop. Happy Valentine's Day!

And just using the Workers AI (beta) template will get you there. Honestly, you don't even have to change this if you don't want to. You can just go to Continue to Summary and you'll be fine. The value that you get on the next screen is what you want to paste in when this command asks you for the Cloudflare API token. Cheers, y'all. And, again, I appreciate you all joining. It's been great. I'll monitor questions for the next couple of minutes. But have a lovely Thanksgiving, or not Thanksgiving, Valentine's Day.

Yeah, okay. I was saying that in this case, yes, it's restricted to the poems, but that's only because we told it to be in our configuration, in the parameters for the bot. We said: use at least two lines from a poem in every single response, and don't make up your own. So it's smart enough in this case to know that it has to use something from the list we've given it. But it doesn't have to be that way. You can definitely give it a much more open-ended operating spectrum. A lot of that is up to you, and it's kind of interesting: prompt engineering is its own little microcosm of stuff to consider.
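To make that concrete, here is a rough sketch of how that kind of constraint can be expressed as a system prompt for the Workers AI binding. This is illustrative rather than the workshop's exact code: the binding name AI, the model, and the hard-coded excerpts (which in the real app would come back from the Vectorize query) are all assumptions.

```js
// Sketch of a constrained chat Worker. The AI binding name, model, and excerpts are
// assumptions; in the real app the excerpts would be the matches returned by Vectorize.
export default {
  async fetch(request, env) {
    const poemExcerpts = [
      "How do I love thee? Let me count the ways.",
      "Shall I compare thee to a summer's day?",
    ];

    // The "don't make things up, quote the poems" rule lives entirely in this prompt.
    const systemPrompt = [
      "You are a romantic poet-bot.",
      "Use at least two lines from the poems provided below in every response.",
      "Do not make up your own poems.",
      "Poems:",
      ...poemExcerpts,
    ].join("\n");

    const answer = await env.AI.run("@cf/meta/llama-2-7b-chat-int8", {
      messages: [
        { role: "system", content: systemPrompt },
        { role: "user", content: "What animals remind you of love?" },
      ],
    });

    return Response.json(answer);
  },
};
```

Loosening or rewriting that system prompt is exactly the prompt-engineering knob being described: the retrieval side stays the same, and only the instructions wrapped around the context change.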

Frank, you have a good question about the cost. At the moment, everything is free, with an asterisk: it's free if you have a Workers paid plan. Vectorize itself is only available to paid Workers users, which is why we had to go through the process of adding people to a specific account to make this happen. That's while these products are in beta. We do plan to change the pricing once they go GA, and all of that pricing information is already public. As we mentioned with serverless, it's all usage based. Vectorize pricing depends on how much you store and how often you query your data. In this case it would probably be not even a penny, because we're not hitting it very hard. But based on what your traffic looks like, you can see you get 30 million for free once this goes GA. Unless you're hitting things pretty hard, it may not cost you much at all. A lot of that is usage based, and there are some example usage calculations on this page. For hobbyists and users like that, it also depends on your data set size, but it could theoretically be very, very little. I think I covered what I said; I kind of repeated what I had originally missed there. But if there's anything specific you want me to recap, I can cover that. I do have about two more minutes before I have to drop for other obligations, but I still see people chatting, so I'm not about to leave yet.

Yeah, Forrester, you're welcome. We have a Discord. If you haven't checked out the Cloudflare Discord yet, that's where a lot of us hang out, and there's a specific channel for Vectorize and one for Workers AI as well. You're totally welcome to ask any questions you have. This goes for anybody, not just Forrester. If you have any questions you think of after this and want clarification on, hop on the Cloudflare Discord, look for the Vectorize or Workers AI channel, and just drop a line in there; we'll get back to you. We're all pretty good about monitoring that stuff. But cheers, y'all. I think that was really, really fun. I'm amazed that we were actually able to fill the three hours that we had, and we managed to do it successfully. So, yeah, I've also got some free time. Is there a docs-without-prompt template? Yeah, we have a lot of templates and tutorials for this, Vizual. I'm probably saying that wrong, I apologize. A lot of templates are out there for this already, and my source code is out there as well. With a totally free account this won't work, because you do have to have a Workers paid plan to use Vectorize. But that's the only thing you need to pay for; there isn't a separate billing tier for any of the other AI stuff on top of Workers, for the time being at least. All right, y'all. Cheers and have a lovely Valentine's Day. It's been fun hanging out with y'all. Peace.
