1. Introduction to the Speaker's Journey
With AI and WebGPU, it's an exciting time to be a developer. A lot is going to change, including the way we make apps. I have been making stuff with computers since I was a kid, combining programming and design. But using different tools for each was creatively stifling. I always wanted to do design and development at the same time, but it seemed impossible.
So, we will get to this question in a few minutes. But with AI and WebGPU, it's just an exciting time to be a developer. It looks like a lot is going to change. I think we're going to be making different kinds of apps. Are we even going to call them apps anymore? I'm not sure. But a lot is going to change, and let's explore some of those possibilities today.
Now, who am I? My name is Aria Minaei. Like most of us here in this room, I have been making stuff with computers since I was a kid. This is what the programming side of it looked like in the very beginning, and this is how I did design stuff. I'm actually not that old; I just had old, cracked versions of these. But I was making little apps, little sites, little games, and so it always required both programming and design. So more programming tools and more design tools sort of piled up. And it was always weird, because when you're making a game or an app, in a day you're making hundreds, if not thousands, of these micro design and development decisions. And a lot of those decisions just don't fit within, let's say, VS Code or 3D Studio MAX. They span the whole spectrum of making an application. So it was always weird that I had to either be in VS Code, for example, or at the time, say, Dreamweaver, or in Photoshop. And I had all these little micro ideas, and I felt like in the span of switching from one app to the other, a bunch of them would die. It was just creatively stifling. So it was always weird. And I always felt like, I want to do design and development at the same time, but every day I'm waking up and Morpheus is basically offering me one of two pills: take the blue pill or the red pill, do either design or development. There's no good or bad pill here, I'm stretching a metaphor right now. But you have to pick one. And I was always like, can I take them both? And he was like, that's not how it works.
2. The Violet Pill and Pure Blue
I always wanted to take design and programming together, seamlessly. Flash was a design and development environment, but it wasn't the right fit. Many attempts have been made to create a seamless design and development environment, but it's a hard nut to crack. That's why I started with pure blue, a powerful programming environment.
And so I always wanted to take them both, like a violet pill, where you can just design and program in the same environment seamlessly. I looked for this environment, and one of them was this. Anybody remember this? Yeah? Okay. It's a young audience here, just very few hands.
This is Flash, and some people love Flash. I love Flash. Yeah! Give it up for Flash! We have like five people over 30 here. Okay, so for those of you who don't remember, Flash was a design and development environment, and I loved it, and a lot of people loved it. They were making awesome stuff with it. Was it that violet pill, though? Not really. You could do programming and design in the same operating system window, but the programming was limited, and the design tool wasn't as expressive as, say, Photoshop or Max and stuff like that. So it wasn't really a violet pill. It was more like, you know, a blue and red lollipop. It was tasty. It was really good, but not the right stuff.
3. Pure Blue and Violet
If the CPU and GPU can handle it, and the display can actually show the visuals, the programming tool should let you build it. Let's start with some pure blue and then add little bits of red to turn it into violet. We added a sequencing tool for web animation and native devices, followed by a 3D composition tool. The New York Times used Theatre.js to reconstruct shots of the World Cup, allowing readers to follow the games through 3D visualizations on the same day. Other examples include a recruiting page by Planet and Wildlife and a well-crafted scrolly by Studio Freight. Adding AI to the mix is the next step; the non-AI scene is built with Three.js, React Three Fiber, and React, with editing capabilities from Theatre.js.
If the CPU and GPU can handle it, if the display can actually display the visuals, then the programming tool should be able to do it. So, let's start with some pure blue, and then add little bits of red to it. It slowly turns violet.
So, the first bit of violet that we added was a sequencing tool. This is for people who make animation on the web or, you know, also on some kinds of native devices. Then we added a 3D composition tool, and as we go on, we just add more and more of this violet stuff. Now, it's kind of... it's really crazy how much people can achieve with just that little bit of violet.
So, for example, here's The New York Times. They have been covering, for example, the World Cup. They had some guy in Qatar taking, I don't know, hundreds, thousands of shots of the game and beaming them to New York, and then they were reconstructing the whole thing in New York using Theatre.js to put it all together, because you have designers, developers, and journalists all working together. You don't want to hand things off between them. It's a newsroom; it has to work really fast. So they used Theatre.js, and because of that, you could follow the World Cup and these 3D visualizations of the games on that same day. This one is another example, by Planet and Wildlife, probably the most hardcore recruiting page ever. And this one is just probably the most well-crafted scrolly out there, by Studio Freight. Big fan of them. So, yeah. The project is called Theatre.js. You can check it out on GitHub. We're just adding more and more violet to it as we go.
All right. So, all of that stuff required no AI. So now let's see what happens when we add a bit of AI to the mix. All right. So, here's my non-AI scene. It's made with Three.js, React Three Fiber if some of you know that, and React, basically. And I can sort of edit things using Theatre.js.
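For reference, this is roughly what such a setup looks like. This is a minimal sketch, assuming the @theatre/core, @theatre/studio, and @theatre/r3f packages; the project name, object names, and the box mesh are made up for illustration, not taken from the demo scene.

```tsx
import { Canvas } from '@react-three/fiber'
import { getProject } from '@theatre/core'
import studio from '@theatre/studio'
import { SheetProvider, editable as e } from '@theatre/r3f'

// Open the Theatre.js studio UI (in development) so objects can be edited visually.
studio.initialize()

const sheet = getProject('Demo Project').sheet('Scene')

export function Scene() {
  return (
    <Canvas>
      <SheetProvider sheet={sheet}>
        {/* "editable" objects expose their transforms and props to the Theatre.js editor */}
        <e.pointLight theatreKey="Key Light" position={[2, 3, 2]} />
        <e.mesh theatreKey="Box">
          <boxGeometry />
          <meshStandardMaterial color="orange" />
        </e.mesh>
      </SheetProvider>
    </Canvas>
  )
}
```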
4. AI in the Creative Process
Adding AI to the creative process can save time and enhance the workflow. Waiting on a chatbot-style AI is fine when you want a considered answer, but a co-pilot AI has to keep up with you as you work. The process involves changing variables by hand, requesting specific effects like studio lighting or film grain by voice, and a small amount of code that initializes the GPU and the LLM.
Like, I can change the variables and stuff like that. Now, that's one way to do it. Now, we have added a bit of AI here. So I can say something like, can we have some studio lighting here? It's a little loud and I don't have my microphone, so it might not actually work. But we'll see. And my internet is not good, so it takes a while; normally it should take less than a second. So give it a moment. Come on, GPT. I might actually not have internet. Let's see. Well, they said demos are hard. Oh, there you go. Okay. Now, this takes a second normally. By the way, right now we just saved about 10 minutes for an experienced user and maybe 15 to 20 minutes for someone who is just starting out.
Can we add some film grain here? Now here's the thing. This is the first thing I want to mention about making things with AI. If the AI acts like a chatbot, you can wait for a chatbot. You can wait for GPT-4 to finish its thoughts. Sometimes I choose GPT-4 over GPT-3 because I like the quality of the answer. But if the AI is a co-pilot, if you want it to help you in your creative work as you're doing it, you just don't want to wait for things. All right.
So how does this actually work? Well, this is the code, basically. It's pseudocode, but I think it's pretty obvious what's going on. You get a handle to your GPU. You initialize your LLM. You have to warm it up. We haven't done that.
5. The Infinite Loop and App Replacement
The process is an infinite loop: you wait for a voice command from the user, generate a prompt from that command, and apply the result to the app. You then hot-replace the app and repeat.
That's why it took a while, because we're using a cloud offering; I'll tell you later why. Then you have this infinite loop. You wait for a voice command. The user says something. You take that command and you generate a prompt. And the prompt is basically something like: hey, here's my app and here is a command, please apply that command to my app. Right? So I just asked it to give me studio lights and it adds studio lights to the app. Now we get that new app, we hot-replace it, and we rinse and repeat. And that's how the whole thing basically works.
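In TypeScript-flavored pseudocode, the loop looks roughly like this. Every function name here is a placeholder standing in for the real thing; none of it is an actual API from the talk or from a library.

```ts
// Rough pseudocode of the edit loop described above. All functions are placeholders.
const gpu = await getGPUHandle()   // get a handle to the GPU
const llm = await initLLM(gpu)     // initialize the LLM
await llm.warmUp()                 // warm it up (skipped in the demo, hence the slow first response)

let appSource = readCurrentAppSource()

while (true) {
  const command = await listenForVoiceCommand()   // e.g. "can we have some studio lighting?"

  const prompt =
    `Here is my app:\n${appSource}\n\n` +
    `Here is a command: "${command}".\n` +
    `Please apply that command to my app and return the new app.`

  appSource = await llm.complete(prompt)   // the model returns the edited app
  hotReplace(appSource)                    // hot-swap the running app, rinse and repeat
}
```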
6. Challenges in Production App Development
Setting up the first iteration of this process takes about half an hour to an hour. However, a production app requires more preparation, including handling errors, internet problems, and avoiding mode switching, which can hinder creativity.
Now, of course, this is just the first iteration. If you want to set something like this up, it takes, I don't know, maybe half an hour or one hour. And that's still a long way, let's say, from a production app.
A production app requires that, for example, you be ready for the LLM to hallucinate, like give you an app that you just cannot run. So what you do there is, for example, you could retry, or if you have a large LLM, you can just feed the error back to it so that it can give it another go.
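A sketch of that guard, continuing the placeholder names from the pseudocode above (nothing here is a real API; the retry count and prompt wording are arbitrary):

```ts
// Guarding against a hallucinated edit: try to run it, and on failure
// either retry or feed the error back to the model for another go.
type LLM = { complete(prompt: string): Promise<string> }

async function applyEditSafely(llm: LLM, appSource: string, command: string): Promise<string> {
  let prompt = buildPrompt(appSource, command)

  for (let attempt = 0; attempt < 3; attempt++) {
    const candidate = await llm.complete(prompt)
    try {
      hotReplace(candidate)   // throws if the generated app doesn't run
      return candidate
    } catch (error) {
      // Give the model another chance, this time including the error it caused.
      prompt = buildPrompt(appSource, `${command}\nYour previous attempt failed with: ${error}`)
    }
  }
  return appSource            // fall back to the last working version
}
```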
Another thing is, as we just saw, if there's an internet problem, or if, for example, your LLM is just too large and it takes a while to give you an answer, then what that does to the user is force them to switch modes. In one mode I'm just making stuff with my hands, editing things. In the other mode, I'm talking to some agent. This mode switching is also creatively stifling; it just kills little ideas. You don't want to do that. You want something that works really fast.
7. AI Writing Code and Different Commands
Alright, let's try something else: a widescreen mode that toggles when I press the space bar. If the voice model understands me reliably, I don't need to be at the keyboard at all; hand tracking and pose tracking are more intuitive inputs. And for a command like this, the AI doesn't just change parameters, it actually writes code.
Alright. So, all of this was very February 2023: there's an LLM editing an app, what else is new? So let's try something else. Can we have it so that when I press the space bar, we go into widescreen mode, and when I press it again, we go back? Everybody, cross your fingers for me, please? Thank you. I want to do this with the local LLM, but I just want to show you how things work in production.
Oh, it's a white screen. Can we have it so that when I press the space bar, we go anamorphic, and when I press it again, we go back? That didn't work. All right. Could you all just imagine for me that it had worked? Wow! Everybody say, wow! So actually, this is what's happening here. Let me just explain. There is this voice model called Whisper, right? And it understands everything that you tell it almost better than a human being; in a programming context, it's actually better than a human being. We didn't use it here because we wanted to use a local model, just to make things go a little faster. But I didn't realize that all the noise here makes it misunderstand me. If it didn't misunderstand me, which is actually pretty easy to get to, then you're not even bound to the keyboard anymore. If I'm sure that the LLM is going to understand me, then I don't need to be here always correcting it; I just said anamorphic and it took anamorphic. If that doesn't happen, then I don't have to be at the keyboard anymore. I can walk away and do things with basically a remote control device. This is just for show; people are not going to be carrying little game controllers. Things like hand tracking and pose tracking are going to make much more sense, and those models have gotten really good in the past couple of months, the past six weeks even. So, two things: you can basically walk away from the keyboard and trackpad, and I don't think these will even be the primary modes of input after a while. And also, you can have the AI write code for you.
So, here's a question. When I asked the AI to go into widescreen mode when I press the space bar and to go back when I press it again, had that worked, how would that command be different from the previous commands I gave it? Anybody have a guess? The difference is that this time the AI would actually write code. It would write an algorithm.
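Had it worked, the generated code might have looked something like this hypothetical React hook. None of this comes from the actual demo; it's just an illustration of the kind of algorithm the model would have to write.

```tsx
// Hypothetical output for "when I press the space bar, go into widescreen
// mode, and when I press it again, go back".
import { useEffect, useState } from 'react'

export function useWidescreenToggle(): boolean {
  const [widescreen, setWidescreen] = useState(false)

  useEffect(() => {
    const onKeyDown = (event: KeyboardEvent) => {
      if (event.code === 'Space') setWidescreen((on) => !on)
    }
    window.addEventListener('keydown', onKeyDown)
    return () => window.removeEventListener('keydown', onKeyDown)
  }, [])

  // The scene can drive its camera aspect or letterboxing from this flag.
  return widescreen
}
```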
8. The Power of User Editing and the Role of React
It would change the algorithm of the application. That actually works. The user can add behaviors to your application, or remove things, and entirely change the nature of your application. However, we still want to give some of that power to the user, so we have to find a sweet spot between too much power and no power at all. And for me, that's where React comes in. The React model says that you can divide your application into small building blocks, and these building blocks are either self-contained or their requirements are explicit. It's just an organizational structure that happens to fit well with an LLM editing our application. Let's have a look at a simple React tree and discuss why an LLM would be well suited to edit this application.
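Here is a simple tree of the shape discussed below. The component names (ChildA, ChildB, ContextB) are just illustrative placeholders, but the structure is standard React:

```tsx
import { createContext, useContext, Component } from 'react'
import type { ReactNode } from 'react'

const ContextB = createContext('default')

function ChildA() {
  // No access to ContextB: it sits outside the provider below.
  return <p>Child A</p>
}

function ChildB() {
  const value = useContext(ContextB)   // explicit dependency on ContextB
  return <p>Child B sees: {value}</p>
}

class ErrorBoundary extends Component<{ children: ReactNode }, { failed: boolean }> {
  state = { failed: false }
  static getDerivedStateFromError() {
    return { failed: true }
  }
  render() {
    return this.state.failed
      ? <p>Child B crashed; Child A is unaffected.</p>
      : this.props.children
  }
}

function App() {
  return (
    <>
      <ChildA />
      <ContextB.Provider value="some value">
        <ErrorBoundary>
          <ChildB />
        </ErrorBoundary>
      </ContextB.Provider>
    </>
  )
}
```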
9. React Components and WebGPU
React components can be more fine-grained and smaller, thanks to an LLM's ability to track dependencies between them. WebGPU enables running AI models locally on your machine, which matters for privacy, latency, and cost.
First of all, does ChildA have access to ContextB? Of course not. We can just look at it and tell; it's explicitly encoded in how we represent the React application. Also, if ChildB breaks, like it throws an error, and it has an error boundary, it's not going to affect ChildA, right? It's its own thing. Now, that happens to work really well for an LLM. If it's just editing the source of one of these components and the component breaks, we can basically just retry the LLM run, or we could even feed the error back to the LLM and let it have another go at it. So, React happens to fit really well with the LLM model. What that actually means, in my opinion, is that we're just going to be writing more fine-grained, smaller components. Because do you remember, at one point we used to separate our components into presentational and logical ones? We wanted to have the smallest unit of React component possible in order to make the whole thing manageable. But that didn't work out, because there was just way too much dependency to track between components. A component could have, like, 10 props, and then you had to take those props and pass them to the next component, and it was too much to keep in mind for a human programmer. But guess what, that's not a problem for an LLM. An LLM can easily keep track of hundreds of these dependencies. So what this means, I think, is that we're just going to have tiny, tiny React components, and the LLM is just going to edit them for us.
Now let's talk about WebGPU, starting with some definitions. What is WebAssembly? WebAssembly allows you to run untrusted code, code that doesn't come from someone you know, safely and fast on your CPU. Well, WebGPU does the same thing with the other processor on your computer. And that is really good, because AI models happen to really love GPUs. So that means we can run an AI model locally on our machine. Why would we want to do that? Well, we're basically deciding in favor of privacy, latency, and cost. With privacy, of course, if there's a medical application or something, everybody knows why privacy matters there. But in the case of a creative tool like Theatre.js, for example, I think creative tools are tools for thought, and it actually helps to be able to create in private for a while; that can be liberating. So I think privacy matters even in creative tools. Latency, of course, also matters. As we just saw here, if it bores the audience members, it's going to bore the creator. So you don't want chatbot-style wait times; you want things to happen really fast. So you would put an LLM on the local machine.
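For reference, getting that handle to the GPU in the browser takes just a few lines of the standard WebGPU API (the type definitions come from the @webgpu/types package if you're in TypeScript):

```ts
// Minimal WebGPU setup: ask the browser for an adapter, then a device.
// An in-browser model runtime would compile its shaders and allocate buffers on this device.
async function initGPU(): Promise<GPUDevice> {
  if (!('gpu' in navigator)) {
    throw new Error('WebGPU is not available in this browser')
  }
  const adapter = await navigator.gpu.requestAdapter()
  if (!adapter) {
    throw new Error('No suitable GPU adapter found')
  }
  return adapter.requestDevice()
}
```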
10. Running LLMs on WebGPU
Can we run LLMs using WebGPU? You can, but the models that run fast locally are not trained well enough yet to do what we just saw. However, models like StarCoder and the Replit code model are being trained right now. Once the WebGPU pipeline is optimized, a code-editing instruct model could edit a scene right inside the browser.
And of course, cost also matters, but I'm going to skip over it because we don't have that much time.
Now, can we actually run these types of LLMs using WebGPU? Well, you actually can. These are two examples. You can just try them in your latest version of Chrome, for example; they work pretty well. Can they actually do what I was showing you there? Actually, no, they can't. Not yet. They're not there yet. The models are not trained enough at the moment, the models that you can actually run fast on your machine. That's why we used an online, cloud-based model. But they're getting there. There's the StarCoder model and the Replit code model. These are being trained right now; they're just getting better and better, basically. And they're pretty small, like 15 billion parameters, 7 billion parameters. They work really fast and they run on a local machine. They also have a lot of low-hanging WebGPU optimization fruit. Right now, if you try to run them, it takes a bit of work and it's also still slow, but that's because the WebGPU pipeline is just not optimized yet. So, once it does get optimized, you can actually run a code-editing instruct model that edits a scene, or your Notion page or something like that, basically running inside the browser.
11. AI's Impact on Normal Apps
Now you can either wait for these models to mature, or you can start developing right now. AI affects both creative apps and normal apps. Some people are shocked by AI's coding capabilities while others are oblivious. To get a feel for AI's impact, try an experiment with the Uber app and GPT-4: use the Uber API to create a chatbot that can order rides, then add the Lyft API for more options and functionality.
Now you can either wait for these models to mature, or you can just know where things are heading and start developing right now. So in the case of Theatre.js, we're... yeah, I'm going to cut into the Q&A time a little bit. All right. So you can either see where things are heading and develop your application right now, use a cloud-based model, and then later switch to a local model if it actually makes sense for your application.
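One way to keep that later switch cheap is to hide the model behind a small interface, so the rest of the app doesn't care whether completions come from a cloud service or a local WebGPU-backed model. This is only a sketch; the endpoint and class names are made up for illustration.

```ts
// A thin abstraction over "where completions come from".
interface CompletionProvider {
  complete(prompt: string): Promise<string>
}

class CloudProvider implements CompletionProvider {
  async complete(prompt: string): Promise<string> {
    // Hypothetical backend endpoint that forwards to a hosted model.
    const res = await fetch('/api/complete', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ prompt }),
    })
    const data = await res.json()
    return data.text
  }
}

class LocalProvider implements CompletionProvider {
  async complete(prompt: string): Promise<string> {
    // Plug in an in-browser, WebGPU-backed model here once it's fast enough.
    throw new Error('Local model not wired up yet')
  }
}

// Swapping providers later is then a one-line change.
export const llm: CompletionProvider = new CloudProvider()
```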
Now, until now, we've talked about how AI affects a creative application like Theatre.js or Blender, or even a productivity application like Notion. But how does it affect a normal app like Uber? There's a lot of fear, uncertainty, and doubt going around. I think some people are in a state of shock and awe at how well AI can code, and some people are entirely oblivious. Maybe both of those descriptions even apply to the same people.
12. The Future of App Development
Are all apps going to go away? I don't think so. Games are not going to go away. I don't think creative apps are going to stop being apps. But getting a ride? For sure. You can build that yourself, again, as a developer at home; it takes a day. Now, is that cause for fear? I don't know how your psychology works. Fear is good for some people. To me, it's just exciting when you really look at it, because now we can serve users in a much bigger way.