Deploy with Speed and Confidence Using Contract Testing and Pact


It’s almost 2021 and we still rely on integrated environments and large end-to-end test suites to release complex, distributed applications called "software". In this talk, Matt breaks down the arguments for such nonsense and shows a better, faster, safer alternative.



Transcription


Well, thanks everyone for coming to my talk, Deploy with Speed and Confidence Using Contract Testing and Pact. My name is Matt Fellows. I'm a core maintainer of Pact, and I'm also the co-founder of Pactflow, which is a continuous delivery platform for microservices. And you know, if I wasn't working in IT, I'd probably be working in sports and fitness to get away from it all. If you want to contact me after this talk, you can follow me on my handles below.

So the agenda for today: we're going to be talking about how to release software, particularly distributed systems. We're going to start by talking about the problem with end-to-end integration tests. We're then going to talk about how Pact works and the principles of contract testing. And then we're going to finish off with a bit of a demo.

So the old way, or the current way a lot of companies test their microservices, is to do what we call end-to-end integrated tests. What that requires is for you to stand up your entire platform, something like this, and use a functional API testing tool like Postman or whatnot to drive requests through the entire system. So for example, you'd push a request through the platform, maybe not through a user interface but with Postman, and it's going to pass through all the layers of the system: microservice A, microservice B, microservice C. The request is going to make its way through all real systems, it's going to pass over real networks, it's going to talk to real databases, send emails, whatever it is your actual application does.

Now this is great if the system works and the tests pass; it does give you some level of confidence that your system's working as expected. But this kind of testing comes at a huge cost.

The first issue is that they're slow. The tests need to pass through real layers and do real things, and that can be slow, of course. But also, oftentimes they can't be run in parallel, and the reason for that is the stateful nature of these types of interactions.

The second issue with these types of tests is that they're fragile, and they can be non-deterministic. This property of flakiness is very present in these types of tests, so even if they do pass, they may take multiple runs to get there. For example, you need every version of every service in the system to be lined up, and if any of those change, the tests could break. If you've got the wrong version of data, the wrong version of the tests, the wrong configuration for your environment, or somebody's just tampered with the environment in advance, it's possible that your tests will fail.

They're also very costly to maintain. And when you do have an issue, finding the actual problem and the source of the issue can be really costly too. For example, if you have a failure caused by microservice B here, well, it may not be visible from the outside why that test actually failed. So you might need to go digging through your log platforms, like Splunk or Sumo Logic, and trace correlation IDs through the system to actually find out what the problem is. And then you need to find the code version for that particular service, go to the repository and go digging. It's basically like finding a production issue, so it can be quite costly just to find the bug itself. And oftentimes it fails just because of those flaky reasons I mentioned earlier.

Similarly, it's difficult to achieve full coverage this way.
So what I mean by that is, you've got multiple systems here, and you've got a lot of different potential scenarios that can play out. Running your tests this way, it's very possible that you're not going to get all the tests you want because, A, they take so long to run, and B, they're costly to maintain. And also, you just literally cannot run that many tests in the amount of time you have, because the combinations spiral out of control.

Because you test everything together this way, well, then you really have to release everything together this way, because you don't have confidence that if you deploy just a single component, things will continue to work at the end of it. So you now need to deploy things together, and doing that means you've got teams coupled with each other at release time. That means teams are waiting on other teams to get things done, and we know from Agile theory that's not very good.

And so these types of tests don't scale well. They tend to get worse and worse over time. Let's say you increase the number of teams and components over time in a linear fashion. What you see is a nonlinear relationship with the number of environments, or the complexity of the environments, that you need to manage. You see that the build time, or the complexity of those builds, goes up, and the failure rate of those builds starts to go up. You also see the risk associated with change moving up exponentially. And of course, we now have developers idle, we have a lot of queues, we have cards on the wall referencing other teams whose work needs to be done before we can get ours done. This all has a huge cost associated with it.

Of course, good tests have the exact opposite properties of what we just talked about: they're fast, they're isolated, they're easy to reason about, and they're easy to maintain. Could mocks come to the rescue here? Well, as you've probably all done, it's very simple to write two separate unit tests on either side of the service boundary. We could write a unit test for the consumer, mocking out the provider, and we could write a unit test for the provider, simulating the consumer. And these are great: they run really fast, they're easy to fix, and they make bugs easy to find. Of course, they may not represent what actually happens in production. It's quite possible for you to put an assumption in there that's not valid, and because it's a unit test, we're not checking that assumption later on. So we get all those great properties, but the new problem is that the tests aren't actually representing reality, and it's hard to keep both sides in sync. And this is where we can talk about contracts.

So you're probably familiar with specification-first design, where an API producer specifies or creates a contract using Swagger or something else, and publishes that document to all of its consumers. There are a number of great properties that come with this. But one of the downsides is that when you change and modify that version of the contract or the specification, it's easy to accidentally break a consumer, because you don't know what parts of the API they're using. And it requires a lot of diligence to ensure that you don't push out backwards-incompatible changes.

This is where we can talk about consumer-driven contracts, which invert that relationship. Consumers specify what they need of the provider, write those needs in their own form of a contract, and give it to the provider.
Each consumer has its own potential subset of the API. The API then just needs to implement the superset of all those contracts, and it can get its job done. This has some really interesting side effects, or consequences. The first one is that you'll know when you break a consumer, because the consumers are telling you what they use. You get a form of documentation, because the consumers are giving you exactly what they're using every time they push a build. And you can test things independently.

So this is where the tool Pact comes in. It combines the idea of fast, isolated mocks and unit tests with contracts, to ensure that they don't drift. It's an open-source, what we call consumer-driven, contract testing tool, and it's designed to help you test your microservices and distributed systems independently. Its main use cases are things like JavaScript web applications and native mobile apps talking to RESTful services, with JSON or XML, or talking over message queues, so think of Kafka or SNS and those kinds of things. Its goals are to reduce, or remove entirely, the need for those end-to-end integrated tests, and to reduce reliance on those complicated test environments.

Pact's benefits: because you've got focus on a single integration point, you're only looking at one thing at a time. You don't need to deploy, and because you don't need to deploy, you don't need a test environment to do this form of testing. You get fast and reliable feedback because of it. The bug is always going to be found on your machine; you don't need to go digging through logs. This means those tests run really fast and they scale linearly. And last, because you're testing things independently, you can now release them independently.

Okay. Let's quickly talk about how Pact works and then we'll show it in action. So we have a website that's talking to a product API. We call the website a consumer, and we call the API the provider. And the sum of the messages passed between them, we call that the contract. It's a consumer-driven contract testing framework, so the first thing we're going to do is write a test from the consumer side first, to define the expectations of the provider. What Pact will do is mock out the provider; we never make the two talk to each other. Pact will simulate the provider API, and the consumer can say: given I make a request to get 1234 from the product endpoint, I expect to get back some response. We do this for all the things the consumer needs of the provider. At the end of that session, we're going to record those interactions into what we call a pact file, or a contract. We're going to share that with a tool like the Pact Broker or Pactflow, which will help us exchange and version the contract across our ecosystem. And then finally, on the provider side, we're going to pull down the contracts from Pactflow and replay them against the provider. Pact is now going to simulate the consumer: it's going to replay the requests and check the responses. And if they match what the consumer expects, we now have symmetry on both sides of this interaction. We have two fast mocks, and we've got a contract that's ensuring that those two mocks are actually valid.

Okay. So we're now going to get into our demo. We're going to use a React product catalog website that talks to an Express JS backend as our example.
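Before jumping into the demo, it may help to see roughly what the recorded contract looks like. The pact file is a JSON document; the sketch below shows a trimmed, illustrative version of its shape as a JavaScript object. The field names follow the Pact specification, but the participant names and values here are invented for illustration, not taken from the demo.

```js
// Rough shape of a recorded pact file (trimmed; values are illustrative).
// The real artifact is JSON written by the consumer test and shared via the broker.
const examplePact = {
  consumer: { name: "ProductWebsite" },
  provider: { name: "ProductAPI" },
  interactions: [
    {
      description: "a request for product 1234",
      providerState: "a product with ID 1234 exists",
      request: { method: "GET", path: "/product/1234" },
      response: {
        status: 200,
        headers: { "Content-Type": "application/json" },
        body: { id: "1234", type: "food", name: "pizza" },
      },
    },
  ],
  metadata: { pactSpecification: { version: "2.0.0" } },
};
```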
We're going to test the /product/{id} endpoint and show how consumers can help drive the API design. We're going to look at the provider workflow, and we're also going to look at how we can gate releases with a tool called can-i-deploy.

So let's look at the application. Here it is. It's a very uninteresting website, I'm sorry, it's not the greatest React website of all time. But you can see here the home page just lists the products, and we can drill down into an individual product to get the data for it. These pages hit two different endpoints, and we're going to test the endpoint for this page.

Okay, looking at our code for this, we'll jump into the consumer test. If we look over here at our product page, this is our React component. We can see here that to populate this page, we need to get some data from an endpoint. Rather than loading this some other way, we're going to talk to an API endpoint, and we're going to use a class to do that. This API class has a method called getProduct, and that's what's going to get the product data for us. So from a Pact point of view, we can test this method. This is what we care about; this is the target of our Pact test. We don't need to test anything to do with React to do this form of testing. You can see here it's just going to hit the /product/{id} endpoint, it's going to send some headers, and it's going to convert the data that comes back into a Product class. The Product class looks like this.

On the provider side, we have a similar thing. We've got a product definition over here, we have our routes that deal with the different endpoints, and we have a Pact test over here as well. We're not going to get too much into the Pact test here because it's a lot of config, but basically to run the Pact test on this side, we're just going to stand up the provider, tell Pact how to find the pact files, and replay them against the provider.

Lastly, we're going to share the contract with a Pact Broker, in this case a hosted Pactflow, and it's going to show us the current state of the integration over time. You can see here that the current version of the consumer is on master and has been deployed to production, and the provider has also been deployed to production as well. We can drill into the pact and see the various interactions that are supported by this contract. In this case, getting a product with ID 10, we get the ID, the type, and the name back in the body. But most importantly, we're going to use this to show how we can make changes to the system and then promote them through environments.

Okay, so let's look at the consumer Pact test first, because this is where we start. So this is our Pact test here. We're going to follow the standard arrange, act, and assert model, just to see how this all works together. First up, we need to tell Pact, in our unit test, what our code is about to do. As I said, Pact is a mocking tool; it's going to validate what we actually do. So: given that a product with ID 10 exists, when we make a call to get that product using the GET verb at this path, we expect to get back an HTTP 200 with some headers and a body that looks like line 19. You can see the expected product is ID 10, type credit card, name 28 Degrees. The like matcher here basically says: we don't care about the values, we just care that the keys exist and they're of the same type.
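As a rough illustration of what Matt is describing (not the demo's actual code), a consumer test in pact-js might look something like the sketch below. It covers the arrange step just described, plus the act and assert steps he walks through next. The import path for the API client, its constructor signature, and the exact matcher values are assumptions made for this sketch.

```js
// Consumer-side Pact test sketch (pact-js v9-style API, Jest-style globals).
const path = require("path");
const { Pact, Matchers } = require("@pact-foundation/pact");
const { like } = Matchers;
const { API } = require("./api"); // the API client class described above (assumed path/export)

const provider = new Pact({
  consumer: "ProductWebsite",
  provider: "ProductAPI",
  dir: path.resolve(process.cwd(), "pacts"), // where the pact file gets written
});

describe("getting a product by ID", () => {
  beforeAll(() => provider.setup());
  afterEach(() => provider.verify());   // check all declared interactions were exercised
  afterAll(() => provider.finalize());  // write the pact file

  test("product with ID 10 exists", async () => {
    // Arrange: tell the Pact mock provider what our code is about to do.
    // (The demo also declares request headers; omitted here for brevity.)
    await provider.addInteraction({
      state: "a product with ID 10 exists",
      uponReceiving: "a request to get product 10",
      withRequest: { method: "GET", path: "/product/10" },
      willRespondWith: {
        status: 200,
        headers: { "Content-Type": "application/json" },
        // like(): match on keys and types, not exact values
        body: like({ id: "10", type: "CREDIT_CARD", name: "28 Degrees" }),
      },
    });

    // Act: point the API client at the Pact mock service instead of the real API
    const api = new API(provider.mockService.baseUrl); // constructor signature assumed
    const product = await api.getProduct("10");

    // Assert: plain unit-test assertions about our own code
    // (exact assertions depend on the Product class)
    expect(product.id).toEqual("10");
    expect(product.name).toEqual("28 Degrees");
  });
});
```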
So later on, when the provider verifies this, we're not going to fail if different IDs come back, or even if different products come back. We're not going to get into matchers today either, but we have a flexible matching library we can use to make this much simpler.

The second thing is to actually call the API. So we configure our API client, and rather than talk to the real product API, we're going to talk to the Pact mock service. Then on line 42, we actually call the method. That method is now going to call a real HTTP endpoint, but in this case it's going to talk to Pact instead of the real thing. And then on line 45, all we need to do is write our unit test assertions for this call. So let's pretend Pact doesn't exist: what should we assert in this unit test to make sure our code did what we thought it did?

So this test is already passing, as you saw before, and it's already been published to Pactflow. What we want to do is add a new field. What happens if we evolve this API? This is a product website, so it would be nice to actually display a price for the product, right? So let's add price into the mix. We'll add this new expectation on the product here, we'll add it to our Product class as well, and then we'll have price available to display in the React component.

What I'm going to do is check out a new branch of my code and shut down those processes. Let's create a new branch: feat/add-price. What I'm going to do here is simulate a CI process as if we're doing continuous deployment. We're in a branch, so this will be a pull request flow. And what I can do is run a fake CI. It's going to run the tests, it's going to publish the contract to Pactflow, and we're then going to run a tool called can-i-deploy and ask: is it safe to release this change? The answer will be no, because this is a new field. The provider has never verified the contract; in fact, the provider hasn't implemented it yet. So we're going to be told it's not safe to release this yet.

So back to my terminal. You can see the tests passed up there, we've published the pacts, and we've asked: can we deploy? And we can't deploy, so the build has bailed out with a non-zero exit code. You can see, though, that in our code base there is a pact file that's been created, and it's now got the price property captured in that contract file. We're not going to talk too much about the contract file for now; just know that it exists, and that's what we use to mediate this whole process.

So now that we've added the price property, let's have a look at Pactflow and see what it sees. Cool. We can see a new contract has been created at the top here, called feat/add-price. It's yet to be verified, so this version of the consumer can't be deployed anywhere yet, because no provider has implemented its features. So let's go ahead and do that to the provider now.

Now, I've got some stashed changes to avoid the demo gods. What we're going to do is add the price to the product class, and add it to the repository as well, so there's data. So there we go, we can see the price has been added. What we can do now, in theory, is push this into master. So let's see what it looks like when we commit and push this change. The provider should run the test by pulling the contracts down from Pactflow, and it'll share the results of that verification back to Pactflow to say whether it passed or failed.
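As a rough sketch of that provider-side verification step (again, not the demo's actual code), the pact-js Verifier is typically pointed at the locally running provider and at the broker. The option names below come from pact-js, but the URLs, environment variables, and state handler contents are placeholders for this illustration.

```js
// Provider verification sketch: start the Express app locally, then let Pact
// fetch the contracts from the broker, replay them, and publish the results.
const { Verifier } = require("@pact-foundation/pact");

const opts = {
  provider: "ProductAPI",
  providerBaseUrl: "http://localhost:8080",              // the locally running provider
  pactBrokerUrl: process.env.PACT_BROKER_BASE_URL,        // e.g. your hosted Pactflow URL
  pactBrokerToken: process.env.PACT_BROKER_TOKEN,
  publishVerificationResult: process.env.CI === "true",   // share results back to the broker
  providerVersion: process.env.GIT_COMMIT,                 // version the results are recorded against
  // Provider states let us seed or stub data so replayed requests can succeed
  stateHandlers: {
    "a product with ID 10 exists": async () => {
      // insert or stub the product in the repository here
    },
  },
};

new Verifier(opts)
  .verifyProvider()
  .then(() => console.log("Pact verification complete"))
  .catch((err) => {
    console.error("Pact verification failed", err);
    process.exit(1); // a non-zero exit fails the build
  });
```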
It will then run the check saying: can I deploy this version of the provider to production? The answer should be yes, because it already supports the current version of the consumer, and it's also got the new functionality for this new branch. So we deploy it to production.

Okay. So you can see the provider's run, a whole bunch of assertions have been made by the Pact framework, it's published the results back to the broker, and it has run a can-i-deploy check. can-i-deploy says yes: this version of the provider satisfies the current production version of the consumer in the environment it would be deploying to. And so it deploys to production.

So if I go back to Pactflow and refresh this page, you'll see that this new feature branch has now been verified by the provider, and the version of the provider that satisfies this contract is the same one that satisfies the production contract of the consumer. So basically it's now safe to merge this change into the mainline and push that to production. So I'm going to pretend I've merged this: I'm going to check back out master, and I'm going to run the CI process as if I've just merged this into master. What should happen is the consumer will run its tests and publish the contract. The contract hasn't changed, so any verifications that have happened before are still valid. It will run a can-i-deploy check: can I move this change to production? And because there is a version of the provider that satisfies its needs, it's safe to do so. And then we deploy it. And there we go, we are now in production. So if I refresh this, the current production version of the contract now has the price property in it, as you can see here.

What happens if we remove a field from the provider? Will we catch a bug? Well, for example, if I go to the provider and comment out the ID field, I can now run npm t locally, without pushing any change, and find out whether I'm going to break any of my consumers. There you go: we've correctly found that a consumer is using this property, and if we removed it, it would fail. But if I remove the version property, what's going to happen? Well, as we know, no consumers are currently using this property, so it's actually safe to remove it. There we go. That's an interesting property of Pact: it's able to pick up changes at the attribute level to work out compatibility.

Okay. So today we talked about the cost of end-to-end integrated tests. We saw that they have a high cost of maintenance and that they scale poorly. We saw how contract testing can help with integration testing, by combining fast, isolated unit tests with contracts to prevent drift. And we saw how Pact works and how we can gate releases using can-i-deploy. So I hope that talk was useful. I really appreciate you coming along, and feel free to contact me on my details there if you'd like to talk further. Thank you very much.

Hi Matt. Hello everyone, thanks for having me. Is the poll result surprising to you, or did you expect this? That sounds about normal. Not everyone's suffering so badly that they need to buy bulk coffee and buy themselves a coffee machine like I do at home.
But most people find integration testing, or at least end-to-end integration testing, challenging enough that they have to spend a bit of time on it, and they generally need something to get through that pain. Because of the flaky tests, because it takes time to manage, because of chasing down the issues, there's always an excuse to not want to look into those tests, write those tests and maintain those tests. So it's not entirely surprising. But it's also good to hear that not everyone is in so much pain that they need to have caffeine hooked into their veins just to be able to get through the day. Yeah, but flaky tests are always a good excuse to get another cup of coffee.

There are a few questions. Manuel Zambrano asked: can we use Pactflow with different consumer and provider technologies? For example, a React frontend and a Python backend serving the API. Yeah, so that's a good question. It's probably the main reason people choose Pact over alternative contract testing frameworks. It's worth giving a quick shout-out to Spring Cloud Contract. Not that you're using Java here, because this is a testing conference for JavaScript, but if you do use Java, Spring Cloud Contract is a decent choice. It's from the folks at Spring. They do have support for other languages, but the way it works is you still write Groovy scripts, so I guess technically you're probably still writing JVM stuff. One of the benefits of Pact is that it uses a specification that enables you to work across languages. Really early on in its design, the early maintainers recognized that this would be a challenge, that polyglot architectures were a thing, and that contract testing needed to support polyglot environments from the beginning. So there's actually a specification that governs the way matching happens in a language-independent way, and that means that different consumers can be written for different providers and work in this environment. Because of the way the matching and the verification happen, you can transcend languages. So yes, it's entirely possible to have a JavaScript front end, an iOS front end, a Swift front end talking to a Java API back end, or a Ruby back end, or a .NET back end. You can mix and match languages as you please.

Amazing. Steve, I'm probably pronouncing that wrong, I'm sorry. Someone asked: how different, or related, is contract testing to applying a JSON schema to every single request you perform using, for example, Postman? Yeah, that's a great question. We actually get that question a lot, and I've got some articles I can point people to afterwards that talk about the difference between schema testing and contract testing. The difference goes more than skin deep. One of the first things with schema testing, for starters, is that you're normally only looking at the bodies. You're not looking at the rest of the request, the other HTTP things. So the verb, the path, the query string, the headers, as an example, usually don't get covered, but they're obviously a very important part of the contract. Obviously you can cover those too, though. The second thing is, even if you're using JSON schema on one side of the contract, what's guaranteeing you that it is exactly what the provider expects on the other side? So you do need to make sure there's a way of pinning the schema you use on both sides of the contract and ensuring those schemas are compatible.
So for example, let's say you've got version one of the schema on your consumer side, and then the provider updates its schema. You need to make sure that those schemas are now in sync. So essentially you could do that, but you just need to make sure that the schemas are always the same. And then the second challenge with that is, if you think about something like contract testing with Pact, the way it works is it actually uses an intermediary, something like the Pact Broker or Pactflow. And what that does is let you version and tag your contracts, just like you would with code in Git. So let's say you've got code, right? You're about to deploy this into production. Version two of your code is in Git, as well as version one, and you need to migrate production from version one to version two. So you need a process to make sure that you can smoothly transition from one to two. And so again, if you're going to use your own schema testing tool, you're going to need to come up with your own process for evolving that schema from version one to version two when you've got multiple components in the system. Again, it's not that you can't do that, but you're going to need to build a lot of things that the contract testing frameworks have in them already. What I will say, though, is that it's a strategy people seem to be employing, and you can probably get 70% of the way there. And that might be good enough for most people's use cases.

Thank you so much for this detailed answer. Richard Ford show is asking: is contract testing mainly for unit tests, or for end-to-end tests like with Cypress? I guess the question asks a little bit about where contract testing would stand in the testing pyramid. Yeah, so it's a good question. I'd say there are two questions there. One is where it fits in the pyramid, and the other is whether you can use end-to-end testing tools. Which, as a community, I think we need to fix: end-to-end testing means too many things. We can't have a situation where Cypress means end-to-end test and it's really just testing the UI layer, and then we also refer to end-to-end tests as passing through the entire platform. So I'll put that challenge out there for someone to try and fix.

But let's talk about the first one: where contract testing fits in the pyramid. It sort of fits in two places. On the consumer side, it's much closer to a unit test. Basically, you should be picking a single function, as you saw in the talk, and writing a unit test for that function. So that's usually pretty straightforward. On the provider side, you've got a few more options about how far up and down the pyramid you go. Usually it sits in the middle. So we would normally overlay the contract test onto the middle of that pyramid, the service test or integration test layer, where basically you run up your provider as a bit of a black box, stub out any third-party dependencies, start the service up, and then Pact talks to that service over HTTP, passing through probably multiple layers of your application. But usually those tests still run very quickly. So they're closer to a unit test, but they're not really a unit test because they've got to pass through a few layers. That's usually where they fit in the pyramid. And in addition to that, in order to remove end-to-end tests, we supplement those contract tests.
Well, if you think about it, we've got end-to-end integration tests we want to get rid of. Contract testing removes from those end-to-end tests the ones that check the collaboration and the communication: can they talk to each other? All the contract bits of that go away, so we shrink the end-to-end suite a bit. We then look at what's left and go: well, actually, usually what you find is that a lot of those end-to-end tests are the consumer not trusting the provider. The consumer is writing end-to-end tests to check the behaviour of the provider. So we take those tests out and go: nope, they don't belong there, they belong in the provider's code base. It's the provider's job to make sure their code works, not the consumer's job. And then you whittle it down to just a few tests, and that's when you can ask: well, what's the value in having these few tests left behind? Maybe we can turn those into synthetic tests in production, or we can put them somewhere else, or we can remove them altogether because they're not adding value and we can move faster.

Now the second part of that question, I think, is around whether you could integrate it with something like Cypress. The short answer is yes, you could integrate it with Cypress, and there are some challenges with doing that. We're currently working on making it less problematic to do that form of testing with Cypress. But the short of it is that typically with Cypress tests you're going to have a lot of overlap, and you don't really want to be capturing too many overlapping scenarios in your contract tests, because they need to be replayed against the provider, and that can become a burden on the provider-side testing if you add too many. But again, we're doing some work to try to optimize that process, to make it easier to use Cypress to write tests and to verify on the provider side. And I should mention, at Pactflow we are literally writing tests using Cypress now for some of our user interfaces, and we're experimenting with generating contracts through that as well. I suspect in the next couple of months we'll probably release a plugin for Cypress, as an example that could be replicated for others.

Okay, cool. I think we have time for one more question. Auto Gibbon asks: how can the can-i-deploy workflow be integrated into a continuous delivery suite? Is there a short answer to this question? If the answer is longer, then maybe we can move it to the speaker room. I'm sorry for interrupting you. The short answer. Yeah, no worries. The short answer is yes, it's pretty common and very simple. We have some CLI tools you can use, and usually it's just a matter of adding the CLI tool into that step, wherever you want to put that step. As you saw, it's basically just taking the version of the current piece of software and the environment you want to deploy to, and issuing that call. It's going to return an exit code of zero or one, zero being success and one being a failure. And that's usually all you need to do: a non-zero exit code should bail out your pipeline.

Okay, awesome. So there were a few more questions, and Matt is going to go into the speaker room on Spatial Chat, so if there are any unanswered questions, make sure to join. And Matt, again, thank you so much for joining us today. It was a pleasure to have you here. Yeah, thanks. Thanks for having me, Anna. And thanks, everybody else, for coming along.
