Build your JS Pipeline in Incremental Fashion with GitLab


Introducing CI/CD to your project can be a challenging process. At GitLab, iteration is one of our key values, and in the spirit of iteration we are happy to share how GitLab can help you gradually bring your project to CI/CD heaven.


Hello everyone. My name is Ilya Klimov. I'm a senior frontend engineer at GitLab, on the Manage:Import team, and I love fast things. So I drive a lot of Celis car. I try to use my one-gigabit internet connection wherever possible. And I also love GitLab for fast build times. While the first two are obviously out of scope for this conference, I will be happy to share my knowledge about the third.

At GitLab we try to support you through the entire DevOps cycle, starting from Create, where you write your source code, manage issues, plan epics, and so on, and ending with protecting you from malicious activity and monitoring the health of your production, staging, and other environments. But covering the entire DevOps cycle would take forever, so let's focus on just three stages: Verify, Package, and Release. That is basically what continuous integration is about. Continuous delivery comes right after that: delivering things, once released, to your actual running environment.

So what's the problem here? It usually starts pretty simple. At the Verify stage, usually in a Node.js environment (we are at a JavaScript conference, and even frontend engineers work in a Node.js environment), you run some of your favorite tools to check code quality and run your tests. For example, at GitLab we use ESLint for linting, where we also maintain our own linting rules, and Jest for running tests. A long time ago we used Karma for this, but frankly speaking, I'm very happy those times are gone. We will probably introduce more tooling later. At this step, the main idea is to make sure that everything goes well and that your code behaves as expected. After that, obviously, we need to package your code to deliver it to production.
I have been in software development for a long time, more than 10 years, and I remember when you had to invent your own delivery tools. Right now Docker is the standard way of doing things, and I'm pretty happy to have that. We will speak a lot today about making things standard, either for the entire JavaScript community or just for your company, because every company obviously has its own approach.

The last stage is Release, and here things are not as stable as at the Package stage. For this talk I will focus a bit on Kubernetes, which I think is pretty standard for running Docker containers. I realize your pipeline, or your future pipeline, may not use it. You may pick another way of running code, such as running Docker containers on bare metal. But for now, let's start with this one.

The problem here is that even the tools are hard. Writing good tests is a very complex thing, which is probably worth another talk. Making sure that your code runs properly when your development environment and your continuous integration environment have different versions of Node.js can be tricky and may lead to unpredictable errors. I once spent half a day debugging an unknown crash, literally a segfault, which came down to a difference of one in the third part of the Node.js version number. I never wish to do that again.

As you can see, as we add more and more tools to our pipeline, even just for these three steps, the complexity grows very quickly. Still, you think: hey, let's start with a very simple step, let's do some verification in our continuous integration pipeline. And here comes trouble. One day your boss comes and says, hey, your pipeline is pretty slow. And welcome to one of the two biggest problems in programming. Yes, cache invalidation, but now in a DevOps world.
So you start learning all these fancy things: how to cache, for example, node_modules between your build steps, so you can avoid running npm install or yarn install on each step. How to deliver artifacts, the things that should be persisted across pipelines, for example test results and coverage results. If you do visual integration testing, it might be screenshots of the failures. And registries, which can differ: a Docker registry for your Docker containers, or, if you are releasing something to npm, either a private or the public npm registry.

But one day you need to extend the very first stage. You have a tiny, fancy step, and the boss comes to you and says: you know, automated testing is not enough. I want a quality assurance person to be able to go through the changes on some branch and check that everything goes smoothly. In GitLab we call this review apps, a kind of manual approval. And hello, all this complexity, Docker, Kubernetes, Helm, whatever, is arriving already at the first stage. Let's see how GitLab can help you with it.

Let's start with the first step, which is Auto DevOps. Frankly speaking, the power of GitLab CI was the main reason I joined GitLab. I was a great fan of it for maybe three years before I joined. I'm super happy about the Auto DevOps feature, and even happier that, as a frontend engineer, I contributed to it.

Let's assume you import some real-world repository. I took the Vue RealWorld example application, just because my main stack is Vue.js, and enabled "Default to Auto DevOps pipeline" in the settings of the new project. And the magic happens. You get a pipeline that automatically has steps such as build, code quality, and so on. What's happening?
If you have been in JavaScript development long enough, you have probably deployed an application to Heroku, and you know all that magic: you just do a git push and everything works. Well, we are standing a bit on the shoulders of those giants. We use Heroku buildpacks to understand how to build your application, and to provide you with tons of different things. We can automatically test that you are not leaking secrets. Maybe you forgot to remove your Amazon Web Services key from the Git repository; we will let you know. Maybe you have code smells and quality problems, or maybe your frontend code is not that safe. We have tons of reports that will automatically appear in your pipeline. Just note that what you get depends greatly on the tier you have in GitLab, whether standalone or SaaS. The more you pay, obviously, the more you get.

For example, as I said, I contributed to the Auto Code Quality tool. It automatically detects ESLint and runs several ESLint-specific checks, such as ESLint security rules and security rules related to React. My contribution was the proper selection of the ESLint version when you already have an ESLint configuration in your repository. And obviously we will run your own checks too.

But the default is not enough for everyone. Zero configuration is cool. Everyone does it: every build tool, every tool that wants to succeed. Auto DevOps lets you get a quick taste of what continuous integration means for you. So the next step is obviously customization. I don't want to go too deep into that, because it would be like reading 10 pages of documentation, but it is extremely simple. As long as you know how to configure your application with environment variables, you know how to set up Auto DevOps. You can configure every build step, and you can disable certain steps in the Auto DevOps pipeline.
You can enable or disable review apps, if you want them built for each repository. If you want review apps, you will need a Kubernetes cluster integration, by the way. And this is pretty simple.

But one day that will not be enough. You will want to get your hands dirty and contribute something that we probably did not take into account in our Auto DevOps pipeline. Remember GitLab's motto: everyone can contribute. We will be happy if you open a merge request when you find any issues or potential improvements.

I don't want to dwell on the learning here. You need to learn the new YAML configuration, and you can read the docs. They are pretty awesome, trust me. But sometimes every one of us needs a source of inspiration. I suggest you do just three things. First of all, take a look at the Auto DevOps configuration you started with. It shows you how we invoke certain things. It is all open source, it is right there in your project, and it gives you a head start. If you just want to change something minor: copy, paste, tune, and you're awesome.

Then look at two projects that live entirely on GitLab. The first is GitLab UI, our UI library. We have a very tiny pipeline there: running tests, releasing to npm, and doing visual integration testing by comparing screenshots. I always use its YAML file as a source of inspiration. For example, have you ever thought that every time a contributor, internal or external, updates your Yarn or npm lock file, they could point a resolution at some malicious code, a malicious version, or even entirely third-party code? Do you really want to check this manually with your eyes? Obviously not.
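The lock file concern above can be sketched as a small check that a CI job might run. This is only an illustration under stated assumptions: the `TRUSTED_HOSTS` allow-list, the function name, and the sample data are hypothetical, and the check only looks at where each package is downloaded from. It does not verify integrity hashes against the registry, which a dedicated tool would also need to do.

```python
from urllib.parse import urlparse

# Hypothetical allow-list: the only hosts packages are expected to come from.
TRUSTED_HOSTS = {"registry.npmjs.org"}


def find_untrusted_packages(lockfile: dict) -> list[str]:
    """Return lock file entries whose "resolved" URL is not on a trusted host."""
    suspicious = []
    # Newer npm lock files keep one entry per installed path under "packages".
    for name, entry in lockfile.get("packages", {}).items():
        resolved = entry.get("resolved")
        if resolved is None:
            continue  # the root project entry has no download URL
        if urlparse(resolved).hostname not in TRUSTED_HOSTS:
            suspicious.append(name)
    return sorted(suspicious)


# Made-up lock file contents, shaped like a parsed package-lock.json.
example_lockfile = {
    "packages": {
        "": {"name": "my-app", "version": "1.0.0"},
        "node_modules/left-pad": {
            "version": "1.3.0",
            "resolved": "https://registry.npmjs.org/left-pad/-/left-pad-1.3.0.tgz",
        },
        "node_modules/lodash": {
            "version": "4.17.21",
            "resolved": "https://evil.example.com/lodash-4.17.21.tgz",
        },
    }
}

print(find_untrusted_packages(example_lockfile))
```

In a real pipeline the dictionary would come from `json.load` on the committed lock file, and the job would fail when the returned list is not empty.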
For example, just a few days ago, when I looked at the GitLab UI pipeline file again while preparing for this talk, I discovered the untamper-my-lockfile project, which deals with exactly this: it checks against the npm registry and makes sure that the lock file is telling you the truth and was not altered in some specific and probably malicious way.

And if you want to look at something really, really insane, check our main GitLab repository, where the configuration is kept in separate files. It is really close to insanity. If you didn't know, approximately 60% of the time, the GitLab CI SaaS runners are building GitLab. So we really want to talk about speed. Just think how much money we could save if we made our pipelines run faster, and how much happier developers are when they get feedback faster.

So let's speak about the DAG. This is our GitLab pipeline. It is not even complete; the test stage is much bigger. Usually it runs stage by stage. This is the philosophy everyone has: we split our jobs into stages, so the first stage runs first and waits for all its jobs to complete, then the second, and so on. This is our flow: prepare, build-images, fixtures, test. Could we theoretically make it run faster? Yes, of course.

Let's zoom in. If you take a look here, our Jest job, which runs the tests, needs the frontend-fixtures job, whose name is cut off here. But trust me, the second word is "fixtures". This job generates mock data that is consumed by our tests. So these two jobs are obviously in the right place and have a hard dependency on each other.

But let's take a look at these jobs: ESLint and graphql-verify. For ESLint we need just the code. There is no reason to wait for anything. So maybe we could move it to an earlier stage? I don't like that, because it ruins the entire idea of stages. ESLint obviously belongs in the test stage. So what to do?
Here a feature comes to the rescue. It is still quite new, not shiny new, but new: the Directed Acyclic Graph. Once you have got your hands dirty in the previous step and know what the job names are, it lets you specify: hey, this job, eslint-as-if-foss, does not need anything, so it can run immediately, as soon as possible. And this job needs our frontend-fixtures job to be completed, so please wait for that and then start the job as soon as possible.

So part of our pipeline now looks like this. It's pretty fun to look at, and pretty fun to see how fast we can go with it. As you can see, there are a lot of dependencies between jobs. For example, we cannot calculate coverage before all our Jest tests have completed, but we want to run it as early as possible. We want to run tests only after we know which tests we want to run; we pay a lot of attention to not running tests for things that definitely did not change. And there are a lot of other approaches.

If you still think, hey, I don't need this, it looks like overkill for my small project: well, we still use it in GitLab UI. And look, it's pretty simple and still shiny. For example, the review job, which lets people check things on a separate URL, and the deploy job run as soon as our Storybook is built. Yes, we could probably do a better job of fitting the words on one line, but hey, we are constantly improving. Likewise, as soon as the Docker image is built, we can run update screenshots, a manual job that automatically updates the screenshots; the visual check, which verifies that our screenshots look the same; and container scanning, to make sure that nothing malicious is running.

This is awesome, and it feels like a significant improvement, just because it is so easy, at least for me, to understand the approach of needs.
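To make the difference concrete, here is a minimal sketch that compares when jobs finish under strict stage ordering and under "needs"-style dependencies. It is a toy model, not how GitLab schedules jobs: the job names, durations, and dependency lists are hypothetical, and it assumes a free runner is always available.

```python
from functools import lru_cache

# Hypothetical durations in minutes; the job names only loosely follow the talk.
DURATION = {"build-image": 10, "frontend-fixtures": 8, "eslint": 3, "jest": 12, "coverage": 2}

# Classic layout: every job waits for the whole previous stage.
STAGES = [["build-image"], ["frontend-fixtures"], ["eslint", "jest"], ["coverage"]]

# DAG layout: every job lists only the jobs it really needs.
NEEDS = {
    "build-image": [],
    "frontend-fixtures": [],
    "eslint": [],  # needs nothing, so it can start immediately
    "jest": ["frontend-fixtures"],
    "coverage": ["jest"],
}


def staged_finish_times(stages, duration):
    """Each stage starts only after every job of the previous stage has finished."""
    finish, stage_start = {}, 0
    for jobs in stages:
        for job in jobs:
            finish[job] = stage_start + duration[job]
        stage_start = max(finish[job] for job in jobs)
    return finish


def dag_finish_times(needs, duration):
    """Each job starts as soon as all the jobs it needs have finished."""
    @lru_cache(maxsize=None)
    def finish(job):
        start = max((finish(dep) for dep in needs[job]), default=0)
        return start + duration[job]

    return {job: finish(job) for job in needs}


staged = staged_finish_times(STAGES, DURATION)
dag = dag_finish_times(NEEDS, DURATION)
for job in DURATION:
    print(f"{job}: staged finishes at {staged[job]} min, DAG finishes at {dag[job]} min")
```

With these made-up numbers, the lint job finishes after 3 minutes instead of 21, and the last job after 22 minutes instead of 32, simply because nothing waits for work it does not depend on.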
A long, long time ago, before I worked at GitLab, I used to split my jobs across many, many stages just so I could put things in an order that made sense to me. Look at your own configuration: maybe you also have a test-one, test-two, test-three, test-four stage. Get rid of it. With a DAG you can make things smooth and quite discoverable, because it's a graph, and every software engineer loves a graph, I believe. If there is one thing you should try in GitLab, it is the DAG. Ten out of ten, I recommend it.

So what to do next? You have probably optimized things and you're happy with the results, and there is one more thing to do. What it is depends on what you want. You could go with a fully custom pipeline, which is what we currently do at GitLab, because sometimes we have unusual tasks and unusual requirements, and you might want to be a true DevOps engineer. But there is another option I suggest you consider. If you are running multiple projects in your company (my previous company was an outsourcing one, so we created many quite similar projects), look in our docs for how you can create a specific Auto DevOps template for your company, assuming you are running a standalone GitLab instance. If you do this, you can save a lot of time for other people in your company.

Just remember that pipeline improvement is a constant process and not a one-time task. Every so often, take a look at your pipeline and find the things that are slow. If you have any questions or suggestions, always feel free to reach me on Twitter, @xanf_ua, and never stop improving your flows. Thank you.

Drop the mic, Ilya. What a great talk. Thank you so much for joining us. It's really excellent. Let's welcome Ilya to answer the question that he asked all of us before the talk.
So, what is the average duration of your frontend build pipeline? Looking at these results, I actually voted 10 to 30 minutes because of those extra two minutes, right? Many times it takes 12 or 13 minutes, so I fell into the second bucket, not the cutting-edge one. So do these numbers surprise you, Ilya? Under 10 minutes, most people.

Yes, because, you know, Node.js is usually not that fast at starting up. Having a pipeline under 10 minutes means either you have quite a small project, which is great, and I personally am very fond of startups, or your pipeline is super optimized, because unfortunately not everything in our JavaScript world is as fast as it could be. So yes, I'm a bit surprised. But since we are seeing live results, I can see people slowly increasing the second one, 10 to 30 minutes. And I'm really happy that not so many people have joined the over-30-minutes club. Hello, GitLab. We have pipelines from 40 minutes to six hours, depending on your luck and the scale. You know that story: OK, I'll grab a coffee while the pipeline is running. For GitLab, that would be a lot of coffee.

I see that now under 10 minutes and 10 to 30 minutes are head to head. They're both at 40 percent. And there are folks that are still not even doing CI. So I guess these numbers are good. Good enough. At least we know the large majority of folks are actually doing CI. Remind me, what is the timeout on running a build? I remember that used to always be an issue with really long builds. I never remember how long it takes to time out.

So, OK, questions from our watchers. Alexa asks: what happens in the background when a pipeline is triggered, especially with GitLab pipelines?

Yeah, for sure. It is quite simple and quite complicated at the same time.
In GitLab you have the .gitlab-ci.yml file, which may or may not specify certain requirements for the build machine. For certain steps you might need them. It is not common for frontend, but hey, our webpack sometimes likes to consume a lot of memory. So you can have different machines with runners installed that provide different capabilities: different CPUs, different amounts of available memory, or whatever.

So GitLab says: here is the pipeline. Depending on certain conditions, such as who triggered the job and why it was triggered, it calculates which jobs will run in this pipeline. After that, all the runners ping GitLab, like in McDonald's: hey, I'm free, could you give me a job? The runner pulls the job description and does exactly what is described there. So each runner, which might be the same one or a different one depending on how many you have, pulls your repository, clones it, and then follows the job description exactly. The pipeline is the big thing; it consists of jobs.

One more thing people sometimes do not realize: even if you are on the free tier and you have run out of our limit, which I believe is 2000 CI minutes per month for the free tier, you can either buy minutes or set up your own runner on any virtual or physical machine you have and register it. You will still be able to build your project on your own hardware, or on whatever you love, Amazon, Google Cloud, Azure, while remaining on the free tier. So you can always take control of your hardware.

Why do I say it is complicated, when this sounds quite easy? Because under the hood there is a lot of hidden logic, like retries and detecting that jobs are stuck, because everyone loves infinite loops.
Retrying a job if a runner fails, having shared caches so that a cache created on one runner can be safely pulled from another. It is a big, big part written in Go and in Ruby, and there is a lot of dark code, at least from my perspective as a frontend engineer, tying this together.

Got it. That's a great answer. Okay, a question from Vrida: is it worth running a frontend in Kubernetes? Since there's no CDN, you have to scale the infrastructure all the time as the workloads grow, and you can also hit Node I/O limitations and overhead, and so on. So taking into account cost plus time, and the simple setup of S3 and CloudFront, what would you do? Would you set up a frontend on Kubernetes?

Oh, it's a good question, and it really depends on what you're optimizing: cost, I mean money, or speed. It is very hard to answer without a precise problem. What I mentioned in my talk, and I would like to stress again, is that I used this setup with Kubernetes and so on just because it scales well in terms of infrastructure, from tiny projects to huge setups like GitLab. But if you decide to drop Kubernetes because your project is quite small, that does not mean GitLab cannot help you. You can do everything in your pipeline. For some of my pet projects I just change the code and don't even use Docker for running it: I copy the resulting build files to the server and run them. It is quite messy, and I would never show it in my resume as a good infrastructure setup, but come on, we are speaking about building infrastructure step by step. So feel free to optimize what you need at the current stage. If you have spare money, you can go straight to the things I call bloody enterprise, but if not, feel free to go as dirty as you want. We will not prevent you from doing that.

Yeah, cost is not an issue. I see that you've started to do stand-up here as well.
So Luis asks: in which versions of GitLab Enterprise is the DAG visualizer enabled?

It should be everywhere, but I'm not sure. I need to quickly check the GitLab docs. It might be behind a feature flag... no, I believe it is already available. It was introduced in GitLab 12.2 and the feature flag was removed in GitLab 12.10. Math with time is hard, but that was quite long ago, approximately 10 months. So it should be available in any reasonably fresh GitLab.

Okay, very cool. Just one more question. Now is the time, folks: we still have Ilya here with us, so drop your questions in the DevOps Talk Q&A, and don't forget that Ilya will be around in his speaker room and in the spatial chat, so you can definitely chat with him there. But one more question: what would you say, Ilya, is the hardest part of GitLab pipelines, if you had to think about it?

The hardest part of GitLab pipelines is flaky tests, if we speak about frontend. Because of the asynchronous nature of frontend code, it is so easy to write a test that does not wait for something but just relies on something in your CI environment being slow enough, or fast enough. That means you change something in the CI configuration and suddenly you see a lot of failing tests, and you think, hey, I probably messed up my configuration, but you did not. Unfortunately, I do not have a perfect solution for making sure that your tests are solid, and it is not actually a pipeline question. But for me this is the darkest part of anything related to frontend pipelines in GitLab. Because when it fails every time, you know what to do. When it fails only sometimes, everyone is annoyed, and no one is happy when it's like: oh, it failed, let's retry it and pray for the best.

I hear that. Folks, now is the time. Any last questions for Ilya? We'll wait a couple of seconds. It takes folks time to type. Many folks are saying thanks, awesome talk, cool talk, thank you, very useful.
So you should be happy, Ilya. You had a really great talk, so congrats on a really excellent session. I think we're going to wrap it up. Thank you so much for being with us and sharing your insights on GitLab. Ilya will be around in the speaker room, on the spatial chat, and on Discord, so feel free to reach out and ask any questions that didn't make it live into the session. Thank you so much.

Thank you. Thank you.
32 min
01 Jul, 2021
