Bringing the Power of AI into your Editor with GitHub Copilot


GitHub Copilot is an AI pair programmer tool that puts the collective knowledge of millions of developers right at your fingertips, directly in your IDE.

During the talk, Krzysztof, a core team member behind GitHub Copilot, will demo how Copilot works and discuss the design principles used while creating the project.

He will also dive into some of the project's technical details: how Copilot communicates with its AI, what "queries" it uses, how it processes responses, and how it integrates with various IDEs to create Copilot's characteristic UX.



Hello, everyone. First of all, I'm really glad that you are all here. The last two years were difficult, of course, so thank you all for being here. My name is Krzysztof, and I have some fancy title at GitHub; it doesn't really matter. I've been working for the last year and a half on a project called GitHub Copilot. Hopefully, many of you have heard about GitHub Copilot; it has been a hyped topic on the internet and social media. But if you haven't, let me quickly introduce you to it. So what is it? Basically, it's a software development productivity tool. You can think of it as a more powerful autocomplete. However, unlike traditional autocomplete systems, it's not powered by semantic information or static code analysis or anything like that. Instead, it uses the Codex AI model developed by OpenAI, which has been trained on the collective knowledge of us all, software developers, in the form of billions of lines of code publicly available on the internet. Thanks to that, GitHub Copilot is not limited to suggesting a single word, like a variable name or a function name, as in traditional autocomplete systems. Instead, it can suggest whole multi-line blocks of code that try to adapt to the current context, to figure out what your next step is and what you're planning to do. Those suggestions adapt to your coding style, use other files and other functions from your project, and much more. Because of that, we had to come up with a somewhat different user experience for those suggestions, so we are not using the traditional autocomplete widget with a list of functions. Instead, we designed the user experience around inline virtual text that's displayed directly in the editor, and I will talk a bit more about that in a moment. However, if you haven't seen GitHub Copilot yet, I do have a quick video to demo it, and I'll be talking while the video is playing.
So yeah, as you can see, this grey text, those are suggestions from GitHub Copilot. They are updated as you type in your editor, so you don't need to take any additional action to get them. You just type code as you normally would, and we try to suggest something that's hopefully helpful. In this particular demo there are only single-line completions, but we can also suggest multi-line completions in some cases. You probably want to ask me: yes, this looks cool in a demo, but is it actually useful? Does it actually produce value for users and improve your productivity? As mentioned in the introduction, Copilot has been in technical preview for almost a year. We released it in late June last year, so it's literally one year right now, and it has been used by thousands and thousands of software developers around the world. I cannot, of course, share any specific numbers on that. However, there is one number that I can share, and it is this: for users that have Copilot enabled on a file, we see that 35 per cent of the newly written code in that file has been suggested by GitHub Copilot. Of course, software development is not just typing code, but imagine you type code for a couple of hours every day; if that process is 35 per cent more productive, that means you get maybe two hours of your time back. That's really amazing. And this number has been growing steadily over the last year. We didn't start with this number; we started much lower, and it has grown with the improvements we've made to the project. We are sure we can bring this number way higher, and our expectations for the next couple of years are high.
So that was the marketing, and now let's actually go into technical details, because I'm a software developer; I'm not really trying to sell you a product, I'm here to talk about what we do. Also, as a note, I'm not a data scientist, I'm not an AI guy. I'm working on this project because I'm a developer tooling expert: I have a lot of experience with Visual Studio Code extensions and with other developer tools, and that's my role in the team. So I'm not going to go too deep into the AI or how it works; I just don't really understand that, and that's fine. I imagine most of us are not AI experts and data scientists; we are just software engineers trying to build useful stuff for our customers. When I joined the team, and when Copilot started as a project, we were using a really simple architecture. We had a Visual Studio Code extension written in TypeScript, and we communicated directly with the Codex AI hosted by OpenAI. You can do that in your projects right now: you need to sign up with OpenAI for access to the preview or beta programme, whatever it's called, and then you can just send an HTTP request to it and it will respond. This is actually the architecture we started our technical preview with around a year ago. There were some additional calls for authentication, but that's the boring stuff; in principle, it was as simple as that. Then Copilot became a fairly successful product. There was a lot of hype on the internet about it, and the most common question people were asking was: hey, can I use Copilot in IntelliJ? Can I use Copilot in Neovim? Can I use Copilot in Visual Studio? Okay, yeah, no one asked that. So, as we scaled the project, we had to make changes to our architecture. Nowadays, Copilot supports multiple editors.
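The "just send an HTTP request" architecture can be sketched roughly as follows. This is an illustration only, not the real Copilot client: the endpoint URL and parameter names are assumptions modelled on a generic completions-style API.

```typescript
// Sketch of the early architecture: the editor extension builds a JSON
// payload and POSTs it straight to the model's completion endpoint.

interface CompletionRequest {
  prompt: string;       // file content up to the cursor
  max_tokens: number;   // cap on the length of the suggestion
  temperature: number;  // sampling randomness (lower = more deterministic)
  stop: string[];       // stop sequences, e.g. a blank line
}

function buildCompletionRequest(textBeforeCursor: string): CompletionRequest {
  return {
    prompt: textBeforeCursor,
    max_tokens: 64,
    temperature: 0.2,
    stop: ["\n\n"],
  };
}

// In an extension this would then be sent with a plain HTTP call,
// e.g. (hypothetical endpoint):
// await fetch("https://api.example.com/v1/completions", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(buildCompletionRequest(prefix)),
// });
```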
Visual Studio Code is still our main editor, and I believe the main part of our user base is still using Visual Studio Code; that's fairly natural, I would say, given how popular it is. But we also have really good support for all the IDEs in the IntelliJ suite, so PyCharm, Rider, all those different IDEs based on IntelliJ are supported. We have Neovim support, and yes, we also have Visual Studio support. All those editors use different programming languages and different ecosystems to build their extensions. VS Code is Node and JavaScript/TypeScript. IntelliJ is JVM languages: Java, Kotlin, or some other fancy languages. In the case of Visual Studio, it's C#; sorry that I need to mention that. We didn't want to rewrite our code to target all those platforms, because our code contains quite a lot of logic; we do some stuff. Then we realised that this is the same problem that language tooling vendors often have. For example, if you develop Rust language tooling, you want it running in all the editors, and those vendors solve the problem with the concept of language servers: a process that can be spawned by the editor and contains all the logic, so that editors become thin clients that just interact with the language server. In our case, this is called the Agent, because that's fancy. The Agent contains quite a lot of the common logic. It's written in TypeScript and runs on Node; for IntelliJ, I believe we actually compile it to a native distribution, we do something fancy with Vercel's pkg tool. It contains all the code around creating the input that we send to the AI, plus telemetry, settings synchronisation, and all that stuff. Then we introduced the Proxy, which is another place for common code, but it's a web service in the cloud, as close to the AI as possible.
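The thin-client/agent split described above usually means the editor plugin talks to the agent process over stdio with length-prefixed JSON messages, in the style of the Language Server Protocol. A minimal sketch of that framing; the `getCompletions` method name is hypothetical, not the real Agent API:

```typescript
// LSP-style message framing: each JSON-RPC payload is prefixed with a
// Content-Length header so the receiving process knows where one
// message ends and the next begins on the shared stdio stream.

function frameMessage(method: string, params: unknown, id: number): string {
  const body = JSON.stringify({ jsonrpc: "2.0", id, method, params });
  return `Content-Length: ${Buffer.byteLength(body, "utf8")}\r\n\r\n${body}`;
}

// A thin editor client would write this to the agent's stdin:
const msg = frameMessage("getCompletions", { uri: "file:///demo.ts", line: 3 }, 1);
```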
And then we have the AI, on Azure this time; we changed our back-end hosting, we are hosting the AI on Azure, and I believe Microsoft recently announced at Build an AI platform that's in preview right now, which allows anyone to use the big language models from OpenAI that are hosted on Azure. This is really important because it allows us to scale across the whole planet. OpenAI initially hosted their models in a single location; Azure is distributed globally, so we can host the AI in any data centre we want. You can probably ask why we have two places for common code, the Agent and the Proxy. This is a really good question. Basically, some of the logic is more connected with your editor, and then it's fairly natural that it runs on your local machine, while some of the logic should live as close to the model in the cloud as possible. For example, things like AI safety and responsibility features we want to have in the cloud, not on your local machine, for various reasons. And in the Agent, anything that's related to your editor: again, it's more natural that it lives on your local machine, so we don't need to synchronise everything to the cloud, because that would be complex. I mentioned that I'm not a data scientist or an AI expert, but let me talk briefly about Codex. Codex is the name of the AI model that we are using. And what is Codex? Let me actually check, I have a note. Codex is a natural language processing artificial intelligence model based on GPT-3, developed by OpenAI. Yes, thanks, Wikipedia, that's really useful. So let's go slowly here. First of all, it is a model that operates on natural language, which is English or any other language that it understands, including programming languages. What that means in principle is that we just pass our context, our input, into it as a string.
We don't do any special encoding, we don't use an abstract syntax tree to represent your code; we just take the file, push it in, and hope for the best. Secondly, this is an artificial intelligence model, which in principle means it's a probabilistic system. We send something into it and it tries to figure out the most probable next word in the text. An important thing: most probable doesn't mean it's always correct. It's an assumption the model makes. And it's also probabilistic at scale: when we send the same prompt a couple of times, we may get different results back. This is really funny; it required a big change of mindset from me as someone working with this system. You know, I'm a software developer, I'm used to functions, unit tests, same input, always the same output, yay. Here we cannot do that, which is really interesting. Also, it's a really large-scale model; it has been trained on billions of lines of code. This means that we cannot easily introspect it. We cannot easily understand why it comes up with a given suggestion; we can only manipulate the input and observe the result. I also mentioned in this very useful definition that it's based on GPT-3. GPT-3 is a large language model that has been trained on the internet. That's why Codex understands not only programming language code but also English, or any other language, more or less, so it can understand comments and things like that. Basically, Codex is a teenager that had a really, really huge amount of time, read the whole internet and a lot of code, and kind of like you tell it something and it responds to you sometimes, and you don't know why. So how do you get this teenager, this model, to do something useful for you? This brings us to the process called prompt crafting. And prompt crafting is a really fancy name for preparing the string that we send to the model.
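The probabilistic behaviour mentioned a moment ago, the same prompt producing different outputs, comes from the model sampling its next token from a probability distribution rather than always taking the single best one. A toy illustration of that mechanism, with the usual temperature knob (this is generic sampling maths, not Codex internals):

```typescript
// Softmax with temperature: converts raw scores (logits) for candidate
// next tokens into a probability distribution. Low temperature sharpens
// the distribution (near-deterministic); high temperature flattens it.
function softmax(logits: number[], temperature: number): number[] {
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled);
  const exps = scaled.map((l) => Math.exp(l - max)); // subtract max for numeric stability
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Pick a token index given a uniform random draw r in [0, 1).
function sample(probs: number[], r: number): number {
  let acc = 0;
  for (let i = 0; i < probs.length; i++) {
    acc += probs[i];
    if (r < acc) return i;
  }
  return probs.length - 1;
}

const logits = [2.0, 1.0, 0.5]; // toy scores for three candidate tokens
const cold = softmax(logits, 0.1); // top token dominates: repeated runs agree
const hot = softmax(logits, 10);   // nearly uniform: runs can differ
```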
Initially we started with a fairly simple approach: we take the cursor position in your file, we take the file content from the top of the file up to your cursor position, we send it to the model, and it works. Surprisingly, it works well enough. However, we were sure that we could do better. In principle, when working with these large language models, the more context we can send them, the better the results will be. Of course, the context needs to be useful; it's not like we can send some random crap. It's best if the context comes from your files. But yes, the more relevant things we send, the better the result. So we started with fairly trivial improvements. We added path and language markers, which are just comments on top of the file saying: this is language javascript, and the name of your file is blah tests for my fancy JS nation demo.js. The lesson here is that even such simple improvements make a fairly noticeable improvement to the output that we observed. Why, in this particular example? Using the language marker means the model is less likely to be confused about which language it's dealing with; all C-family languages look the same, and you really don't want to see C# suggestions in your JavaScript code, so that's why we put the language marker in. So that's step one. However, we still had two big problems in our initial implementation. First, we only looked at the code above the cursor position, and in the vast majority of programming languages you can put your functions in any order, so there can be useful stuff below your cursor. So we introduced something called sibling function detection.
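The path-and-language-marker step described above can be sketched as a small prompt builder. The exact marker format and file name here are illustrative assumptions; the talk only says the markers are comments at the top of the prompt:

```typescript
// Prepend comment markers naming the language and file path, then
// append the file content up to the cursor. The markers give the model
// an explicit hint about which language it is completing.

function craftPrompt(
  language: string,
  filePath: string,
  textBeforeCursor: string
): string {
  const header = `// Language: ${language}\n// Path: ${filePath}\n`;
  return header + textBeforeCursor;
}

const prompt = craftPrompt(
  "javascript",
  "demo.js", // hypothetical file name
  "function add(a, b) {\n  return "
);
```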
It is basically a process where we parse the file and look for functions at the same level as the function you're currently editing. If you're in a class, we look at other class members; if you're at the top level of the file, of the module, we look at the top-level functions. Then we reorder things so that all the functions we hope are useful end up above your cursor position, the point from which we send the request. We do that in memory; we don't change your file, it all happens in memory while processing the prompt. Also, a pro tip here: don't have 1700 top-level functions in your file. Don't ask me why I know that, but yes, don't do that. The last interesting part is using context from other files of your project. Again, in programming languages we can create as many files as we want, and we often create many of them in a project. So we try to figure out what code exists in the other open tabs in your editor that looks similar to what you're writing right now. And this is an important part: we don't look at your hard drive or anything like that directly; we only have access to the files that are open in your editor as tabs. We assume that's a good signal for us that we can look into those files, because if you have them open in the IDE, that suggests you're okay with tooling working on them. Okay. The last-but-one thing I want to talk about, and I have 28 seconds so this will be fast, is observing results. As I've mentioned, the model is probabilistic: its responses are probabilistic, and with the same prompt it returns different responses. That means we can only observe results at scale. We cannot just unit test the model, and we cannot unit test improvements to prompt crafting, because we don't know if they will be successful across all cases.
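The "similar code in other open tabs" idea can be illustrated with a very simple similarity measure. This is a hedged sketch: scoring tabs by token-set Jaccard similarity is my illustrative stand-in, not the actual ranking Copilot uses.

```typescript
// Score each open tab against the text around the cursor and pick the
// most similar one as a candidate source of extra prompt context.

function tokens(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/\W+/).filter((t) => t.length > 0));
}

// Jaccard similarity: |intersection| / |union| of the two token sets.
function jaccard(a: Set<string>, b: Set<string>): number {
  let overlap = 0;
  for (const t of a) if (b.has(t)) overlap++;
  const union = a.size + b.size - overlap;
  return union === 0 ? 0 : overlap / union;
}

function bestMatchingTab(current: string, openTabs: string[]): number {
  const cur = tokens(current);
  let best = -1;
  let bestScore = 0;
  openTabs.forEach((tab, i) => {
    const score = jaccard(cur, tokens(tab));
    if (score > bestScore) {
      bestScore = score;
      best = i;
    }
  });
  return best; // index of the most similar open tab, or -1 if none match
}
```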
The problem with the space of developer tooling in general is that code can look very different; there are tons of use cases all around, and we need to make sure we are as helpful to everybody as possible. So we have two parts to observing results. One is an offline evaluation system, where we clone a couple of thousand Python repositories from the internet. We try to find functions that are well tested, with good test coverage, and well documented. We remove the bodies of those functions, regenerate them with Copilot, and check whether the unit tests pass again, which is really interesting. The second way of observing results is doing experiments on people. That's why we have quite an advanced telemetry system in Copilot. If you're interested in the telemetry we do, my colleagues from the team actually published, I believe last week or so, a white paper about how we use telemetry and how it fits with measuring user satisfaction. We also ran some user satisfaction tests, where people answered surveys about how they feel. The interesting thing here, and this was mind-blowing for me initially, is that it doesn't really matter whether the suggestions are super accurate. What it seems is that users are really satisfied even if Copilot gives them a suggestion that is not 100 per cent accurate, as long as it's a starting point that lets them think about the next step, that gives them a scaffold for what they're doing, which is really fascinating. I wanted to talk a bit about user experience design, but we don't have time for that. Thank you all for watching. I believe it's now time for Q&A. We're going to go right into the audience questions. First question, from Heno.
What are the copyright restrictions for using GitHub Copilot on company-owned code? This is a really great question; it comes up fairly often. First of all, I'm not a lawyer, so don't take anything I say as legal advice. Secondly, it really depends on your company, so I would recommend actually talking with someone from your company. However, our answer is that, in principle, Copilot is not copying code. What Copilot does is create new suggestions that are unique, personalised for the particular context of what you're writing. Copilot is not a copying machine; it's not a pattern-matching machine that just takes existing snippets of code from a database or whatever. It tries to figure out new code for your problems. In very rare cases Copilot can copy code, something like 0.1 per cent of cases where it copies code directly from memory, and this happens only for very well-known snippets. Also, we are working on a solution to that problem that should be available soon-ish. Soon-ish? Like a Tesla timeline? No, like soon-ish. Okay, cool. The next question is from Kathleen. Does Copilot plan on becoming a paid product? Yes. Currently, Copilot is in a technical preview state where people can use it for free. We are using this period to improve the product massively. However, it was announced by Satya Nadella, CEO of Microsoft, during his Build keynote, that Copilot will reach general availability this summer, and it will become a paid product then. However, it will be freely available to students and to verified open source contributors, whatever that means. So if I do some open source work, I can also use it for company work? Yes. Cool. All right. And again, this is not legal advice; ask your company. If your company allows it. Yes. Next question, from Allison. Copilot can suggest what to write, but can it also suggest what to remove? Not right now.
So right now, suggesting new code is the main user experience that we provide. Of course, we are looking into various improvements to AI-driven development tooling in general. I think I've mentioned that I work at this thing called GitHub Next; we are the team at GitHub that tries to figure out what the next 10 years of developer tools look like, and applications of AI are something we definitely research all the time. Okay. A question from Anonymous. Does Copilot also learn from me as an individual, and will it suggest personalised code fitting my coding style? Yes, exactly. Copilot, as I've mentioned, looks at your currently open file and at the other tabs open in your editor, which can be your own personal code. It's not in any way connected to your company; it's not even connected to your repository. It just looks at the code that's in the editor, in the IDE, to figure out the suggestions. So a follow-up question from me, then: if I want to improve the suggestions, I should just open more tabs with my code? Yes, that is one way of improving the suggestions, indeed. Nice. A good question from Anonymous: is it cheating if developers use Copilot? Anonymous is a junior developer and thinks it's awesome, but is worried they will be less of a developer if they utilise it. Is it cheating if a mathematician uses a calculator? I don't think so. I also don't think that using other IDE features is cheating in general. I know there are some people who say you shouldn't even use traditional autocomplete, because you'll become a better developer if you memorise all this stuff. That's not my position. I have been developing tools that help developers and lower the barrier to entry for my whole professional life. So I don't think it's cheating; I think that's our goal, to improve your life and your productivity and your happiness.
The world is going that way anyway, so you would only be left behind while your colleagues use it; it would be a waste of your time. Last question, from Mark Tang. Do you... sorry, I need to read it again. Do you feed back any changes the user makes to code suggestions into the AI? So if I change the suggestion, does the AI know I changed it, and does it improve? So, yes, we currently collect such data in our telemetry: we check whether the suggestion stayed the same after 30 seconds of being in your file, or whether you changed it. However, we are not using that data to feed the AI directly. So your private code, if you're developing some private code, won't ever be a source of suggestions for other people. What we do is only observe what the percentage of change was, which is string distance and things like that, to observe the results. As I mentioned in the talk, we need to observe at scale, so we need this data to see if we are doing useful stuff. All right. Well, thanks a lot. That's all the time we have for Q&A. We're going to go into a little coffee break now. Thanks a lot, Krzysztof. Krzysztof is going to the speaker lounge, where you can continue the conversation. And I invite you all to sign up for the beta. Yes, definitely. Please do sign up for the technical preview; it's free to use. Initially there was a long waiting period when you signed up for the technical preview, but that's not the case right now; we are letting people in really quickly nowadays. So please do sign up. All right. Thanks a lot. Thank you. Thank you.
29 min
16 Jun, 2022
