1. The Story of Vacation Tracker
Hello! I'll tell you a story about Vacation Tracker, a serverless startup using Node.js. It all started with a simple lambda function, and now we have many lambda functions. In 2016, we decided to solve our own problem of tracking leave and remaining days. In 2018, we received requests for a private beta and decided to build a system connected to Slack and calendars. Many startups, companies, and organizations now use our system.
Hello! I'll tell you a story about a serverless startup. The serverless part is definitely not the most important part of our startup, but on the other hand, it's a really cool story for someone who is a programmer working with Node.js and other technologies.
So I'll tell you a story about Vacation Tracker. As I said, at the moment we are a 100% serverless startup using Node.js, but everything started with a simple Lambda function. Then we added another and another and another. And yeah, that escalated quickly. So now we have a lot of Lambda functions, and I'll try to walk you through our story from the first Lambda function to the current state in production. We started more than three years ago.
So first, in 2016, we decided to build a leave tracking system. Actually, I'm lying. We decided to solve our own problem, because our other company, CloudHorizon, had more than 10 people at that moment, and it was really hard to track who's off, how many PTO days they have remaining for this year, and things like this. We tried to do an internal hackathon and, as with every hackathon, we didn't build anything. So in 2017, we tried again to solve our own problem. We tried to find some other leave tracking tools, but most of them were complex HR systems and things like this. So we decided to build something in-house, and we built some kind of proof of concept with Slack. And as always, we decided not to continue at that moment, but we published the landing page.
In 2018, we got a lot of requests through our landing page. More than a hundred people were on the waiting list for a private beta, so we finally decided to build something. The idea was really simple. We wanted a system that would track leave requests and the number of remaining days. We wanted to use some kind of single sign-on so we don't need to remember more passwords. I hate passwords. We wanted something connected to our Slack so we can see the info when we need the info. For example, when someone is not working, we want to see that that person is on vacation and things like this. And finally, we wanted to connect our calendar so we can subscribe to events and see who will not work next month and things like this.
As I said, we were solving our own problems and we didn't know if anyone else would use our system, but a few months after we released the beta version, we saw that there were many startups that wanted to use our system. And then we saw some small companies signing up, and then some schools and universities, and then nonprofits, and then teams from many enterprises. And then we saw some government organizations, and some other organizations such as churches and many others that we never thought would use a system like this. So there was a real problem and we decided to continue with that idea. And today we have many customers from many large and famous companies and also many cool startups and smaller companies and organizations.
2. The Product and Serverless Architecture
The number of unique users in our system on December 1st was this. Here's the product: Web Dashboard for quotas and locations, Slack integration for leave requests, and Microsoft Teams integration with embedded dashboard. Our first architecture was a simple serverless bot, version 0.1. Let me introduce myself: I'm Slobodan Stojanovic, CTO of CloudHorizon and Vacation Tracker. We chose serverless because of auto-scaling, auto-failover, and cost-effectiveness. It was fast to build a prototype using serverless.
The number of unique users in our system on December 1st was this. Not all of these users are using Vacation Tracker but all of these users went through Vacation Tracker at some point. It's a nice number, but it was a real number that I took from our database. I decided to leave this number from December 1st because I'm pretty sure I will not be able to get this nice number again.
So here's the product. We have Web Dashboard where you can do many things like see the quotas and set up locations and many other things. For Slack users, we also have a nice integration where you can just click on one button or do a slash vacation command and request or approve the leave. For Microsoft Teams, we have the same thing plus some nice cool things such as embedded dashboard inside Microsoft Teams so you don't need to log into a separate system and things like this. But let's talk about interesting things and that's architecture not the system itself.
So our first architecture was a simple serverless bot. This was version 0.1 because this was a beta and we didn't know if anyone would use our system. So the bot looked something like this. It was really funny because there was no calendar or anything like that inside Slack, so we built everything with buttons, but it seems that that was better than Excel spreadsheets and many other things that people are using to track their leaves. And whenever you click on this button or anything like that, it triggers some serverless bot in the background.
So why serverless? That's the obvious first question. Well, this talk is about scaling a serverless startup from 1 to 101 (not 1,000 yet) Lambda functions in production, so that's why. I'm just kidding, of course, but now that I stopped the flow, let me introduce myself. I'm Slobodan Stojanovic. I'm CTO of CloudHorizon and also CTO of this product, Vacation Tracker. I'm also co-author of the book Serverless Applications with Node.js, which I wrote with my friend Aleksandar Simovic. And I'm also an AWS Serverless Hero. I'm writing a lot about serverless and you can see more articles on my website. There are links to many other websites where I write about serverless.
But let's go back to the most important question: why serverless? As you probably know, serverless is an acronym for something like slow, expensive, vendor lock-in, and I'm obviously kidding, that's not true. I actually really love serverless, and we decided to build everything with serverless because at the moment we started this, we were a small team (we are still a small team), and our team was not really experienced with DevOps. So we decided to use something that has auto-scaling and auto-failover and things like this. We tried to go as cheap as we can because we bootstrapped our startup, so serverless fits that nicely because it's cheap. And it was really fast to build a prototype using serverless. It took us a few days to build that chatbot with that fake calendar and everything.
3. Building a More Complicated Serverless Application
And it was production ready, basically. Our first version wasn't 100% serverless. We decided to add new features. We faced the first problem with new endpoints. Our MongoDB database wasn't serverless. We tried to build a more complicated serverless application by introducing infrastructure as code.
And it was production ready, basically. And of course, serverless also gives us a good starting point for security. Of course, you still need to think about your data and everything, but Lambda functions are secure by themselves, and it's really easy to secure something that is active for, let's say, 100 milliseconds or something like that.
Of course, our first version wasn't 100% serverless. We started as a serverless chatbot. So Slack sends some request to some kind of API gateway (it's actually called Amazon API Gateway), which triggers some Lambda function with some business logic. And that was our prototype. But then we decided to build some backend for that, because we wanted people to be able to store the data and this application to do something meaningful. And then we built some kind of Angular application, and the developer that we assigned to this project didn't know serverless. So he decided to use a Node.js and Express.js server and MongoDB. So half of our product was serverless and the rest of it was basically a standard traditional application.
There were some clear benefits of this. Deployments were quick and independent, and it was easy to understand and maintain this kind of application because, as you saw, there were just a few components in the system. It was easy to onboard new people because it's easy to explain how the application works, and everything was really cheap. The cost for the first year was actually zero dollars per month for AWS. We had some credits for MongoDB and our server, but the serverless part was actually zero dollars per month for a long time, because they charge you by the number of requests, and in the first year we didn't have enough requests to start paying anything to AWS.
Then we decided to add some new features because people asked for them, and our developer learned serverless and tried to use serverless for new features. So all new features went through that API Gateway, and then, for example, for billing we added a new Lambda function connected to Stripe, and we added some new endpoints and things like this. But with new endpoints we faced the first problem. Our Angular application needs to know where that endpoint is. Is it in the Express.js application or inside some Lambda function? So we tried to do something like this: instead of going directly to the Express.js server, we used API Gateway as a gateway for everything. And, yeah, that was a bit messy. The downsides: every part of the system required independent deployment. It was hard to manage everything because we had more and more Lambda functions, it was hard to scale this kind of system, and we had a clear bottleneck. Our MongoDB database wasn't serverless, so everything else was scaling except that part.
At that moment we had around a hundred paying teams, so we decided to improve our architecture because it seemed that people wanted this kind of product. So we tried to build a more complicated serverless application. The first step was introducing infrastructure as code. We use CloudFormation, and we tried to have different services inside that CloudFormation. So for example, when you do something in one service, that service can send something to the, let's say, API of the other service, but that API doesn't need to be API Gateway or a RESTful API. Sometimes that's some kind of interface for notifications that can be sent in the background. For example, this part of the system is still the same.
4. Handling Slack Messages and Architecture Migration
We receive Slack messages through our API gateway, triggering Lambda functions based on user actions. We use Amazon EventBridge for notifications and have business logic handling the data. With 150 Lambda functions, we migrated from old Express services to new serverless services. We adopted hexagonal architecture, made changes in CloudFormation, switched to serverless services, added TypeScript, and replaced MongoDB with DynamoDB.
Whenever we receive a Slack message or something like that, that goes to our API Gateway. It triggers some Lambda functions depending on the action the user takes, a slash command or a click on a button and things like these. Then we use something called Amazon EventBridge to send notifications in the background. And we respond back to Slack and tell Slack that the message is received. And then we have business logic somewhere in the background doing something with that data.
At that moment, we had 150 Lambda functions, which was a lot. And we started doing our first migrations. We had old services on Express, then we built new serverless services, and we somehow migrated the users from one to the other. And one of the key parts of this was finding a good architecture. We picked hexagonal architecture, but we'll talk about that a bit later. Things we changed: everything was inside CloudFormation. We replaced the Node.js server with serverless services, still with Node.js. We started adding TypeScript and we replaced MongoDB with DynamoDB. Not for everything, but for most of the things.
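The flow just described (receive the Slack request, hand the work to an event bus, acknowledge to Slack right away) can be sketched roughly as follows. This is a minimal illustration, not Vacation Tracker's code: the `EventBus` interface, the `SlackCommandReceived` event name, and the local `HttpEvent` and `HttpResponse` types are assumptions made so the example has no dependencies. A real handler would use the API Gateway event shape and publish through the EventBridge client of the AWS SDK.

```typescript
// Minimal local types (assumptions) standing in for the API Gateway event shape.
interface HttpEvent { body: string | null }
interface HttpResponse { statusCode: number; body: string }

// Hypothetical port for the event bus (Amazon EventBridge in the talk).
interface EventBus {
  publish(detailType: string, detail: Record<string, string>): Promise<void>;
}

// Slack sends slash commands as a form-encoded body,
// for example "command=%2Fvacation&user_id=U1&text=next+week".
function parseFormBody(body: string): Record<string, string> {
  const decodeField = (s: string): string => decodeURIComponent(s.replace(/\+/g, " "));
  const fields: Record<string, string> = {};
  for (const pair of body.split("&")) {
    if (pair === "") continue;
    const eq = pair.indexOf("=");
    const rawKey = eq === -1 ? pair : pair.slice(0, eq);
    const rawValue = eq === -1 ? "" : pair.slice(eq + 1);
    fields[decodeField(rawKey)] = decodeField(rawValue);
  }
  return fields;
}

// Builds the Lambda-style handler. The bus is injected so the business logic
// behind it can run in the background, separately from the Slack response.
function createSlackHandler(bus: EventBus) {
  return async (event: HttpEvent): Promise<HttpResponse> => {
    const fields = parseFormBody(event.body ?? "");
    if (fields["command"] !== "/vacation") {
      return { statusCode: 400, body: "Unsupported command" };
    }
    // Hand the work off to the background, then acknowledge immediately.
    await bus.publish("SlackCommandReceived", {
      userId: fields["user_id"] ?? "",
      text: fields["text"] ?? "",
    });
    return { statusCode: 200, body: "Got it, working on your request..." };
  };
}
```

Injecting the bus keeps the handler thin: it only translates the incoming request into an event, which is the part that stays the same when the logic behind the bus changes.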
5. Evolution to Event-Driven Architecture
Benefits: easier deployment, almost 100% uptime, and cost-effectiveness. Downsides: storing state, wasting time on non-business logic, onboarding difficulties, and developer dislike of YAML and configuration. Evolved to event-driven architecture using CQRS to store events. Used AWS AppSync with Managed GraphQL for mutations, event storage, background logic, real-time subscriptions, and fast queries.
Benefits: it was easier to deploy our application. It was still auto-scalable. So far we have almost 100% uptime out of the box; we didn't do anything to get that. We were down for, I think, 30 minutes in total since 2018, and it was still really cheap.
Downsides: we were storing state, not events. We were wasting a lot of time not focusing on our business logic, but on some other parts of the system. It was hard to onboard new developers because of many new services. And I realized that developers don't like YAML and configuration. At that moment we had around 600 paying teams.
So we decided to evolve our architecture one more time, and we decided to use event-driven architecture. So we were back to the drawing board, and we tried to find another good architecture that would work with hexagonal architecture and help us solve our problem. And we decided to use Command Query Responsibility Segregation, or CQRS.
Why CQRS? As I said, we were storing state, but we wanted to store events. Why events? Because in Vacation Tracker a lot of things are happening all the time. For example, someone can add you to a location, assign some leave policy, add some leave days for you. You can request a leave. Someone can change your working week. Many things are happening every moment. And the key question is always how to calculate remaining PTO days for the current year, or some other days for the current year, with the data that we have. And of course, storing events helps us a lot.
We decided to remove part of our code. So we decided to use AWS AppSync with Managed GraphQL. AppSync is basically a Managed GraphQL service. So now whenever a dashboard or some other application is writing something, changing something in our application, it sends a mutation. We store that mutation to some event storage table that triggers some background logic. We do some business logic. We use real-time subscriptions to let the frontend know that the business logic is done. And we also store some kind of state, current state in some read-only tables because we want users to be able to run queries really fast using GraphQL from the frontend.
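To make the "store events, not state" idea concrete, here is a small sketch of how a remaining-days balance can be derived by replaying events. The event names (`DaysGranted`, `LeaveApproved`, `LeaveCancelled`), the in-memory `EventStore`, and the user `ana` are illustrative assumptions, not Vacation Tracker's actual schema. In the architecture described above, the events would live in an event storage table and the computed result would be written to a read-only table for fast queries.

```typescript
// Hypothetical event types; the real system has many more (locations, policies, ...).
type LeaveEvent =
  | { type: "DaysGranted"; userId: string; year: number; days: number }
  | { type: "LeaveApproved"; userId: string; year: number; days: number }
  | { type: "LeaveCancelled"; userId: string; year: number; days: number };

// Write side: commands only append events, nothing is updated in place.
class EventStore {
  private readonly events: LeaveEvent[] = [];

  append(event: LeaveEvent): void {
    this.events.push(event);
  }

  all(): readonly LeaveEvent[] {
    return this.events;
  }
}

// Read side: replay the events to get the current balance for one user and year.
function remainingDays(events: readonly LeaveEvent[], userId: string, year: number): number {
  let remaining = 0;
  for (const e of events) {
    if (e.userId !== userId || e.year !== year) continue;
    switch (e.type) {
      case "DaysGranted":
        remaining += e.days;
        break;
      case "LeaveApproved":
        remaining -= e.days;
        break;
      case "LeaveCancelled":
        remaining += e.days; // cancelled days go back to the balance
        break;
    }
  }
  return remaining;
}

const store = new EventStore();
store.append({ type: "DaysGranted", userId: "ana", year: 2021, days: 20 });
store.append({ type: "LeaveApproved", userId: "ana", year: 2021, days: 5 });
store.append({ type: "LeaveCancelled", userId: "ana", year: 2021, days: 2 });

const balance = remainingDays(store.all(), "ana", 2021); // 20 - 5 + 2 = 17
```

Because the balance is computed from the full history, a new rule only changes the replay function; the stored events stay untouched.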
6. Challenges of Onboarding and Testing
At the moment, we have 112 Lambda functions. We have a fully managed GraphQL server that works really well. The system is faster, with less code and better control. We now have a mono repo, which allows code sharing between front end and backend. Onboarding new developers is a challenge, but serverless allows us to assign a new environment and AWS account to each developer. We start with a small part of the system and gradually introduce new features. Testing is important, and we use the hexagonal architecture to isolate business logic from adapters. We test locally and use Lambda event adapters in production.
At the moment, we have 112 Lambda functions. As you can see, we removed some of the Lambda functions, and there are some clear benefits. We have a fully managed GraphQL server that works really well. The system is actually faster. We have less code, we have better control, and we of course have all the benefits from the old architecture.
And now we have a mono repo, which seems like a big benefit because we share some code between the front end and the backend. There are still many services to learn, and there are Velocity templates. Now, instead of some functions, we have Velocity templates, which I will not even try to explain. These are like an alien language where you transform the request to something that AppSync understands. And that helps us to improve the speed of our system at the moment. But of course there are many challenges in this process.
The first obvious challenge is how to onboard new developers. Our current team has just four developers, all of them full stack, and one is actually new; one developer just started. We have one marketing person, one customer support person, one product manager, and we have some freelance support, mostly for marketing and design. The good thing with serverless is that we can assign a new environment and a new AWS account to each developer. So the first day you join Vacation Tracker, you'll get a copy of basically everything inside an AWS account that belongs only to you. And as the system is really complex (this is the same diagram that we saw previously but with slightly more details), we don't start with everything immediately. We start with just one small thing. For example, a new developer starts working with our online dashboard, which is basically a React app, and then slowly, with new features that they're adding or changing inside the dashboard, they start learning the backend and how everything works in the backend, and then they continue using and learning different parts of our application. And after about three months, they know basically most parts of the system. Not details, of course, but they know how everything works.
7. Testing, Debugging, and Monitoring
Then we have a real EventBridge repository and some other repositories that have their own unit and integration tests. We use MongoDB and DynamoDB repositories with unit and integration tests. Debugging and monitoring are challenges. The total cost from 2018 was $7,000, but we had some AWS credits. The most expensive bug was with DynamoDB, and fixing it reduced costs by hundreds of dollars per month. We are happy with Serverless and have a team of superhero developers. Evolve your architecture with your product and consider onboarding new team members.
Then we have a real EventBridge repository and some other repositories that have their own unit and integration tests, and we are testing them against the real AWS resources. And of course we have some helper functions and services. For example, the event parser has its own unit tests, not integration tests, because it's not integrated with anything. And for migrations we do something like this. We use that MongoDB repository, or part of it, with its unit and integration tests. We try to build another repository with the same interface, for example, for DynamoDB. And then when we have the same interface, we can just switch the dependency in production. Of course, besides that you need to transfer the data, but that's another topic.
So testing an integration looks something like this. For example, if I want to test the DynamoDB repository and I want to do integration tests, I can create a database before all tests and destroy it after all tests by just running some simple commands and waiting around 10 to 20 seconds to create a new table. And then at the end, we just want to destroy that table so we don't leave any trash in our AWS accounts. This account is just for testing, but anyway, we just want to remove everything in the end.
Another big challenge we had was debugging and monitoring. And now we can run different types of queries, since we track all the errors in our system and things like this, and we have different dashboards that help us to monitor our application and get alerts and things like these. Of course, the cost is another big challenge with serverless, but the total cost from 2018 was $7,000. And of course we had some AWS credits, so we paid maybe $2,000 so far. I actually calculated the most expensive bug. As you can see, it's half of the bill that we had from the beginning. And the bug was with DynamoDB. As you can see, when we fixed the bug, the number of requests dropped from thousands of requests every minute to basically zero. And that decreased the costs a lot, by hundreds of dollars per month. And of course we had many other smaller challenges, but they're not related to serverless that much. They're overall challenges that you have with building a startup anyway. We are very happy with serverless so far. And now with serverless we have a team of really superhero developers that can develop many things really fast.
So that's it. Let's go through a quick summary, because this was a bit longer than I expected. You should evolve your architecture with your product. Something that worked in the beginning may not work now, when your product is much bigger and different. You need to pick a good architecture because it helps you to keep your migration and onboarding costs low or reasonable. You need to think about onboarding new team members because that's a really important part of every system.
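The "same interface, switch the dependency" idea can be sketched like this. Everything here is an illustrative assumption: the `LeaveRequestRepository` port, the `LeaveRequest` shape, and the two in-memory classes, which only stand in for the MongoDB and DynamoDB adapters mentioned in the talk. The point is that the business logic and the shared contract check depend only on the interface.

```typescript
interface LeaveRequest { id: string; userId: string; days: number }

// The port: business logic depends only on this interface.
interface LeaveRequestRepository {
  save(request: LeaveRequest): Promise<void>;
  findByUser(userId: string): Promise<LeaveRequest[]>;
}

// Stand-in for the old adapter (MongoDB in the talk); array-backed here.
class ArrayBackedRepository implements LeaveRequestRepository {
  private rows: LeaveRequest[] = [];

  async save(request: LeaveRequest): Promise<void> {
    this.rows = this.rows.filter((r) => r.id !== request.id).concat([request]);
  }

  async findByUser(userId: string): Promise<LeaveRequest[]> {
    return this.rows.filter((r) => r.userId === userId);
  }
}

// Stand-in for the new adapter (DynamoDB in the talk); keyed by id.
class MapBackedRepository implements LeaveRequestRepository {
  private readonly items = new Map<string, LeaveRequest>();

  async save(request: LeaveRequest): Promise<void> {
    this.items.set(request.id, request);
  }

  async findByUser(userId: string): Promise<LeaveRequest[]> {
    return Array.from(this.items.values()).filter((r) => r.userId === userId);
  }
}

// Business logic that does not know which adapter it is talking to.
async function totalRequestedDays(repo: LeaveRequestRepository, userId: string): Promise<number> {
  const requests = await repo.findByUser(userId);
  return requests.reduce((sum, r) => sum + r.days, 0);
}

// Shared contract check: the same assertions run against every adapter.
async function checkRepositoryContract(repo: LeaveRequestRepository): Promise<void> {
  await repo.save({ id: "r1", userId: "ana", days: 3 });
  await repo.save({ id: "r2", userId: "ana", days: 2 });
  await repo.save({ id: "r1", userId: "ana", days: 4 }); // same id overwrites
  const total = await totalRequestedDays(repo, "ana");
  if (total !== 6) throw new Error(`expected 6 days, got ${total}`);
}
```

Once both adapters pass the same contract check, swapping one for the other is a one-line change where the repository is constructed. The data transfer remains a separate task.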
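The create-before, destroy-after pattern can be sketched as a small helper. The `TableAdmin` interface and the `withTemporaryTable` name are hypothetical. A real adapter would call the DynamoDB create-table and delete-table operations and wait until the table is active, and a test runner's before-all and after-all hooks could play the same role as this wrapper.

```typescript
// Hypothetical port over table administration, so the sketch has no AWS dependency.
interface TableAdmin {
  createTable(name: string): Promise<void>;
  deleteTable(name: string): Promise<void>;
}

// Creates a uniquely named table, runs the tests against it, and always
// deletes it afterwards, even when the tests throw.
async function withTemporaryTable<T>(
  admin: TableAdmin,
  prefix: string,
  run: (tableName: string) => Promise<T>,
): Promise<T> {
  const suffix = `${Date.now()}-${Math.floor(Math.random() * 1_000_000)}`;
  const tableName = `${prefix}-${suffix}`;
  await admin.createTable(tableName);
  try {
    return await run(tableName);
  } finally {
    await admin.deleteTable(tableName); // leave no trash in the test account
  }
}
```

The unique suffix is there so that parallel test runs in the same account do not collide on a table name.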
8. Onboarding, Architecture, Testing, and Monitoring
Onboarding new team members is important. Hexagonal architecture and CQRS are good fits for serverless. Testing and monitoring are crucial. Subscribe to my website for more on serverless.
Hexagonal architecture is a really nice fit for serverless apps because it helps you to test everything. CQRS is also a nice fit for serverless, but it's also an excellent fit for our product. Find something that works for you.
You should test your integrations and your application in general, of course, but testing is not always enough. You need to monitor your application and track the errors so you can react fast if something breaks. And yeah, that's it. Thank you very much. I'm writing a lot about serverless. I'm doing some free workshops and things like these. If you want to follow more about these architectures and what we are doing with serverless, you can subscribe on my website. Thank you very much.
Serverless Adoption and Vendor Lock-in
61% are not using serverless yet. That's about 40% of people using serverless at some point, which is really good. Over the past few years, the percentage has risen from zero to 40%. If we do this question again next year, I'll be happy to see 50 to 60% of people trying serverless. It's not the solution for everything, but it can be used for many different use cases. Let's address a question from the audience about vendor lock-in. I'm not afraid of vendor lock-in because it's just a switching cost when you decide to use something new.
Hey, good to see you. Good to see you. So what do you think? 61% are not using serverless yet. What's going wrong? Oh, that's fine. That's fine. That's something that I expected. And it's okay. I think that 40% of people using serverless at some point, at least for a small portion of their app, is really good. It's still quite new, and there are a lot of new things that you need to learn to be able to use serverless in your application. So, yeah, I think that's a good percentage. And over the past few years, I saw that percentage rising from zero point something to 40%. So, yeah, that's great. So if we do this question again next year, what's the percentage you'll be expecting then? I don't know, a bit more than 40%. I'll be happy if I see, let's say, 50 to 60% of people who at least tried serverless. So, yeah. At least dipping their toes in the water, right? Yeah. All right, so. It's not the solution for everything, but so far I think you can use serverless for really many different use cases, and every day they're covering more and more things. So, yeah. Yeah, super nice.
So let's go into our questions from the audience. The first question I have is from House of Hala Handebo. He's starting his question with awesome talk, and then like a confetti emoji. So that's good. Some nice feedback from House of Hala Handebo. And his question is, what are your general impressions about vendor lock-in when we use specific serverless providers? So this is a topic that's come up a few times yesterday already, vendor lock-in. How do you feel about that? So yeah, this is one of the most important questions related to serverless, and I'm really not afraid of vendor lock-in, because for me, that's not vendor lock-in, that's basically a switching cost. Whenever you decide to use something, let's say Node.js or PHP or Ruby on Rails. I saw that someone mentioned that we are a small startup not using Ruby on Rails, and yeah. So basically whenever you decide to use something, you have some switching costs. If you need to migrate to something else in the future, you'll need to dedicate some time and you'll need to pay some amount of money to migrate to the other thing.
Benefits, Migrations, and Onboarding
Serverless gives us big benefits, especially for startups. Migrating services is a business risk, but we've successfully migrated from MongoDB to DynamoDB. Every technology decision has switching costs. Onboarding people with zero experience is doable. We start with simple tasks and gradually introduce more services. We're even offering a workshop to learn serverless applications with TypeScript.
And it's the same with serverless. So far, serverless is giving us really big benefits, and especially for startups, the amount of time that you need to build something is more important than some other things. So I'm really okay if I need to migrate some services off serverless to something else sometime in the future, but I don't see why I would need to do that. I'll need to pay some money and I'll need to spend some time doing that, but it's okay, it's a business risk that I'm okay to take. We already migrated a lot of different things. For example, we used MongoDB and migrated to DynamoDB, so that was a big change. I don't think migrating from other AWS resources to something that is not on AWS is a lot more complex than that.
So, yeah. Funny insight that you mentioned, that even a programming language is kind of a lock-in. Never really thought of that. Also frameworks and many other things. It really depends. For example, whenever you commit to anything, basically, whenever you decide to do something with your product, you have some switching costs related to that. Even onboarding people is some kind of lock-in: when you use different services, you need to change your onboarding procedures and many other things. And yeah, it's difficult. Yeah, never really thought of it like that, but every decision you make technology-wise is kind of a lock-in. Cool.
Next question is from William RJ Ribeiro. I can imagine it is hard to find people that already have experience with serverless. How do you onboard new people that have zero experience? Yeah, so we're doing that right now. We started a few weeks ago, and our new team member, Ivan, shipped his first feature to production in his second week at Vacation Tracker. So I guess that's not that bad. So for example, he has experience with React. So we started with front end tasks and React, and then he slowly started taking some tasks that have some small backend features and things like these. And we're trying to onboard people to a few simpler services that they can understand easily. For example, you have an API, you have some Lambda function, which is basically some kind of a handler, and then on the other side you have some database call or something like that. And then when they learn to do that, we slowly add more services and try to show them the portion of the big picture of our architecture that they're working on. And over time we are able to onboard people with zero experience with serverless to Vacation Tracker without big problems. It was harder for me to onboard people to some non-serverless projects than now to serverless projects. It takes some time and there is a learning curve, but it's doable; it's not that much different from other things. So, basically you're saying that you could even teach me. Sure. Yeah, we can do that after this talk. We need an after-party. Actually, I have a better idea. If anyone wants to learn serverless, we are doing a workshop next week as a part of this conference, so you can learn how to write serverless applications with TypeScript.
Testing Lambda Functions in Production
You can test lambda functions in a production environment by using the same function in a staging environment and promoting it to production without redeployment. Serverless allows for inexpensive new environments that can be duplicated for each developer. Setting up a staging environment and duplicating and anonymizing data can be challenging but not related to serverless. Overall, serverless makes it easier and cheaper to have environments that work and scale the same way as production.
So yeah, that's one of the options. Super nice also. So William, that's included in your ticket price, like we mentioned in the opening. So be sure to check out Slobodan's workshop.
We have time for one question, and that question is from Perry. Great panel discussion yesterday that you were in, and today's talk also. Is it possible to test Lambda functions in a production environment? Oh yeah, it is. One good thing with Lambda functions is that you can use exactly the same Lambda function in, for example, a staging environment, test that function, and just label that function with a production label or something like that. That same code will be promoted to production without redeployment. I'm not using that at the moment, but you can do even that, so yeah, sure.
The good thing with serverless is that new environments are not that expensive, and most of the time they're actually $0, so we have one environment per developer, and they are exactly the same as production. They don't have the same data, but we can fill that database with the same data if we want. But yeah, of course you can use it in production. It's like anything else. You just don't worry about the server itself. Someone else will do that part of the work, but everything else is completely doable the same way that you did for non-serverless applications. So you can basically duplicate your production environment to make it a staging environment per developer with low cost. With almost zero cost, yeah. That's nice. That's nice. Usually, setting up a staging environment (we had a great talk yesterday also about this), duplicating data, and usually anonymizing data also, of course, is difficult. Yeah, that's still difficult, but that's not related to serverless. That's related to your data. But having everything that works and scales the same way is quite... I will not say easy, but it's easier than it was. And definitely cheaper. Cool.
So that's all the time we have for this Q&A session. So if you want to discuss serverless more with Slobodan, Slobodan is going to be on spatial chat in his speaker room. So be sure to go there. We had some more questions in the Q&A channel, but we don't have the time to go into that now. I'll do my best to answer that in the spatial chat and of course in the Discord after that session. Thanks. All right. It's been a blast having you again, Slobodan. Hope to see you again soon. Bye. Bye.