https://twitter.com/ridiculous_fish/status/14527512360594513...
Property-based testing tools (tools which generate random inputs and check that stated properties hold for all of them; a minimal Hypothesis sketch follows this list):
* QuickCheck: https://hackage.haskell.org/package/QuickCheck
* Hypothesis: https://hypothesis.readthedocs.io/en/latest/
* JUnit QuickCheck: https://github.com/pholser/junit-quickcheck
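To illustrate the style these tools share, here's a minimal sketch using Hypothesis (`pip install hypothesis`; the encode/decode pair is a made-up example of a property worth checking, not anything from the linked docs):

from hypothesis import given, strategies as st

# Made-up functions under test: any pair that should round-trip works here.
def encode(s: str) -> bytes:
    return s.encode("utf-8")

def decode(b: bytes) -> str:
    return b.decode("utf-8")

@given(st.text())  # Hypothesis generates many arbitrary strings, including nasty ones
def test_roundtrip(s):
    assert decode(encode(s)) == s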
Fuzz testing tools (tools which mutate the inputs to a program in order to find interesting / failing states in that program). Generally paired with code coverage; a minimal Python sketch follows this list:
* American Fuzzy Lop (AFL): https://github.com/google/AFL
* JQF: https://github.com/rohanpadhye/JQF
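For a feel of the coverage-guided approach in Python terms (my own sketch; AFL and JQF play the same role for native binaries and the JVM), a minimal harness using Google's atheris (`pip install atheris`):

import sys
import atheris

@atheris.instrument_func  # gives the fuzzer coverage feedback for this function
def parse_age(data: bytes) -> int:
    # Toy target with a bug: an input like "-5" passes the digit check
    # but violates the assertion, which the fuzzer will find quickly.
    text = data.decode("utf-8", errors="ignore").strip()
    age = int(text) if text.lstrip("-").isdigit() else 0
    assert age >= 0, "negative age slipped through validation"
    return age

atheris.Setup(sys.argv, parse_age)
atheris.Fuzz()  # mutates inputs until it finds a crashing one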
Mutation / fault-based test tools (tools which review your existing unit coverage by making small changes to your _production_ code and reporting any mutants that none of your tests catch):
* PITest: https://pitest.org/
# sample call: https://en.wikipedia.org/w/api.php?action=query&format=json&list=geosearch&gscoord=37.7891838%7C-122.4033522&gsradius=10000&gslimit=100
Then I defined a variable, base_url = "https://en.wikipedia.org/w/api.php?"
Then, like magic, Copilot suggested all the remaining keys that would go in the query params. It even knew which params were to be kept as-is, and which ones would come from my previous code:
action = "query" # action=query
format = "json" # or xml
lat = str(latitude.value) # 37.7891838
lon = str(longitude.value) # -122.4033522
gscoord = lat + "%7C" + lon
...
api_path = base_url + "action=" + action + "&format=" + format + ... + "&gscoord=" + gscoord
As a guy who gets easily distracted while programming, Copilot saves me a lot of time and keeps me engaged with my work. I can only imagine what it'll look like 10 years from now.
> Copilot regurgitating Quake code, including sweary comments (twitter.com/mitsuhiko)
I added an issue. https://github.com/github/feedback/discussions/6847
Anyone else install it in neovim?
https://plugins.jetbrains.com/plugin/17718-github-copilot/re...
I filed a PR because it was a bit frustrating to go through the entire setup and then find out I needed to be granted access.
https://plugins.jetbrains.com/plugin/17718-github-copilot/ve...
EDIT: Hmm, it installed but it refuses to run.
EDIT2: Looks like you can force the plugin to work by editing the plugin.xml contained in github-copilot-intellij-1.0.1.jar within the plugin archive. Just remove the line that includes Rider as incompatible. The same should work for CLion.
api_path = base_url + urllib.parse.urlencode({
'action': action,
'format': letThisBeVariable,
...
'gscoord': str(latitude.value) + '|' + str(longitude.value)
})
see: https://docs.python.org/3/library/urllib.parse.html#urllib.p...
Mantra: when inserting data into a context (like a URL), escape the data for that context.
https://docs.python-requests.org/en/latest/user/quickstart/#...
When I was playing around with the Fastai language-modeling courses a couple of years ago, I used the Python tokenize module to feed my model, and with excellent parser libraries like Lark[0] out there it wouldn't take long to build real quality parsers.
Of course I could be totally wrong and they might just be dumping pure text in, shudder.
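For reference, getting syntax-aware tokens out of the stdlib is only a few lines (a sketch; how you'd feed the tokens to a model is elided):

import io
import tokenize

source = "def add(a, b):\n    return a + b\n"

# Yields TokenInfo tuples (type, string, start, end, line) for real Python
# syntax, instead of treating the source as a flat character stream.
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    print(tokenize.tok_name[tok.type], repr(tok.string))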
This could have been prevented very simply: GitHub could have avoided training Copilot on GPL code.
What they can still do is offer a new model excluding GPL code for people who care about it.
[1] - https://www.gnu.org/licenses/gpl-faq.en.html#SourceCodeInDoc...
Looks like for VSCode the shortcut on Linux is Alt-], see: https://github.com/github/copilot-docs/blob/main/docs/visual...
But for neovim, it doesn't mention anything about it in the docs: https://github.com/github/copilot.vim/blob/release/doc/copil...
And nothing happens when pressing Alt-].
Although our own work shows Copilot is pretty good at adding security flaws on its own:
https://arxiv.org/abs/2007.02220
First, as much as I don't like the idea of Copilot, it seems to be good for boilerplate code. However, the fact that boilerplate code exists is not because of some natural limitation of code; it exists because our programming languages are subpar at making good abstractions.
Here's an example: in Go, there is a lot of `if err != nil { return err }` error-handling boilerplate. Rust decided to make a better abstraction and shortened it to the `?` operator.
(I could have gotten details wrong, but I think the point still stands.)
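To make the abstraction point concrete in Python (a rough analogy of my own, not the actual Go or Rust syntax; `read_file` and `parse` are made-up helpers, and exceptions play the role Rust's `?` plays):

import json

# Made-up helpers that report failure through return values, Go-style.
def read_file(path):
    try:
        with open(path) as f:
            return f.read(), None
    except OSError as e:
        return None, e

def parse(text):
    try:
        return json.loads(text), None
    except ValueError as e:
        return None, e

# Boilerplate style: every call site repeats the same check,
# roughly the shape of Go's `if err != nil { return err }`.
def load_config(path):
    text, err = read_file(path)
    if err is not None:
        return None, err
    cfg, err = parse(text)
    if err is not None:
        return None, err
    return cfg, None

# With a propagating abstraction (exceptions here, `?` in Rust),
# only the happy path is left to write:
def load_config_terse(path):
    with open(path) as f:
        return json.load(f)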
So I think a better way to solve the problem that Copilot solves is with better programming languages that help us have better abstractions.
Second, I personally think the legal justifications for Copilot are dubious at best and downright deceptive at worst, to say nothing of the ramifications. I wrote a whitepaper about the ramifications that also refutes the justifications. [1]
(Note: the whitepaper was written quickly, to hit a deadline, so it's not the best. Intro blog post at [2].)
I'm also working on licenses to clarify the legal arguments against Copilot. [3]
I also hope that one of them [4] is a better license than the AGPL, without the virality and applicable to more cases.
Edit: Do NOT use any of those licenses yet! I have not had a lawyer check and fix them. I plan to do so soon.
[1]: https://gavinhoward.com/uploads/copilot.pdf
[2]: https://gavinhoward.com/2021/10/my-whitepaper-about-github-c...
>>> import requests, pprint
>>>
>>>
>>> url = "https://en.wikipedia.org/w/api.php"
>>> resp = requests.get(
... url,
... params=dict(
... action="query",
... list="geosearch",
... format="json",
... gsradius=10000,
... gscoord=f"{latitude.value}|{longitude.value}"
... )
... )
>>>
>>> pprint.pprint(resp.json())
{'batchcomplete': '',
'query': {'geosearch': [{'dist': 26.2,
'lat': 37.7868194444444,
'lon': -122.399905555556,
'ns': 0,
...
I typed the following prompt:
def search_wikipedia(lat, lon):
"""
use "requests" to do a geosearch on Wikipedia and pretty-print the resulting JSON
"""
And it completed it with:
r = requests.get('https://en.wikipedia.org/w/api.php?action=query&list=geosearch&gsradius=10000&gscoord={0}|{1}&gslimit=20&format=json'.format(lat, lon))
pprint.pprint(r.json())
It was a pretty thorough study:
> Our study is the largest to date in the literature: we generated 31,000 test suites for five systems consisting of up to 724,000 lines of source code. We measured the statement coverage, decision coverage, and modified condition coverage of these suites and used mutation testing to evaluate their fault detection effectiveness. We found that there is a low to moderate correlation between coverage and effectiveness when the number of test cases in the suite is controlled for.
Given their data, their conclusion seems pretty plausible:
> Our results suggest that coverage, while useful for identifying under-tested parts of a program, should not be used as a quality target because it is not a good indicator of test suite effectiveness.
That's certainly how I approach testing: I value having a thorough test suite, but I do not treat coverage as a target or use it as a requirement for other people working on the same project.
[1]: https://neverworkintheory.org/2021/09/24/coverage-is-not-str...
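A toy illustration of the asymmetry (my own, not from the study): the first test reaches 100% line coverage of `apply_discount` but asserts nothing, so a mutant that flips the arithmetic survives; only the second test kills it.

def apply_discount(price, rate):
    # A mutation tool might flip `-` to `+` here.
    return price - price * rate

def test_covers_everything_checks_nothing():
    apply_discount(100.0, 0.5)  # executes every line: 100% coverage, zero assertions

def test_actually_checks_the_result():
    assert apply_discount(100.0, 0.5) == 50.0  # this one kills the mutant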
I think that could be crucial.
If I read a computer science book, and from that produce a unique piece of code which was not present in the book, I have created a new work which I hold copyright over.
If I train a machine learning algorithm on a computer science book, and that ML algorithm produces some output, that output does not have a new copyright.
In essence, there must be originality for a work to gain a new copyright, and originality likely requires a human author. See this Wikipedia page: https://en.wikipedia.org/wiki/Threshold_of_originality#Mecha...
Similarly, if copilot synthesizes a bunch of MIT code and produces a suggestion, that may be MIT still, while if a human does the exact same reading and writing, if it is an original enough derivative, it may be free of the original MIT license.
Where did it come from then? And what license did the original have?
> and is in hundreds of repositories - many with permissive licenses like WTFPL and many including the same comments.
If the original was GPL or proprietary, then all of these copies with different licenses are violating the license of the original. Just because it exists everywhere does not mean Copilot can use it without violating the original license.
> It's not really a large amount of material, either.
No, but I would argue that it is enough for copyright because it is original.
> GitHub claims they haven't found any "recitations" that appeared fewer than 10 times in the training data.
Key word is "claim". We can test that claim. Or rather, you can: if you have access to Copilot, try the test I suggested at https://news.ycombinator.com/item?id=28018816 . Let me know the result. Even better, try it with:
// Computes the index of them item.
map_index(
because what's in that function is definitely copyrightable.
> With the exceptions mentioned above, what you get back from asking for more code won't just be more and more of a particular work. Realistically I think you'd be able to get significantly more from Google Books.
That can only be tested with time. Or with the test I gave above.
I think that with time, more and more examples will appear until it is clear that Copilot is a problem.
Nevertheless, courts in the US and UK recently ruled that an AI cannot be an inventor (South Africa notably went the other way). If an AI cannot be an inventor, why can it hold copyright? And if it can't hold copyright, I argue its output is infringing.
Again, only time will tell which of us is correct according to the courts, but I intend to demonstrate to them that I am.
> The three parts take roughly the same portion of time, and when I'm writing tests
I want to push back on that bit, and I have some strong feelings about it. At my current dayjob, writing tests (if it was even done for all code) would easily take anywhere between 50% and 75% of the total development time.
I wish things were easy enough for writing test code not to be a total slog, but sadly there are too many factors in place:
- what should the test class be annotated with and which bits of the Spring context (Java) will get started with it
- i can't test the DB because the tests don't have a local one with 100% automated migrations, nor an in memory one because of the need to use Oracle, so i need to prevent it from ever being called
- that said, the logic that i need to test involves at least 5 to 10 different service calls, which then use another 5 to 20 DB mappers (myBatis) and possibly dozens of different DB calls
- and when i finally figure out what i want to test, the logic for mocking will definitely fail the first time due to Mockito idiosyncrasies
- after that's been resolved, i'll probably need to stub out a whole bunch of fake DB calls, that will return deeply nested data structures
- of course, i still need all of this to make sense, since the DB is full of EAV and OTLT patterns (https://tonyandrews.blogspot.com/2004/10/otlt-and-eav-two-big-design-mistakes.html) as opposed to proper foreign keys (instead you end up with something like target_table and target_table_row_id, except named way worse and not containing a table name but some enum that's stored in the app, so you can't just figure out how everything works without looking through both)
- and once i've finally mocked all of the service calls, DB calls and data initialization, there's also validation logic that does its own service calls which may or may not be the same, thus doubling the work
- of course, the validators are initialized based on reflection and target types, such as EntityValidator being injected, however actually being one of ~100 supported subclasses, which may or may not be the ones you expect due to years of cruft, you can't just do ctrl+click to open the definition, since that opens the superclass not the subclass
- and once all of that works, you have to hope that the 95% of the test code that vaguely corresponds to what the application would actually be doing won't fail at any number of points, just so you can do one assertion
I'm not quite sure how things can get that bad or how people can architect systems to be coupled like that in the first place, but at the conclusion of my quasi-rant i'd like to suggest that many of the systems out there definitely aren't easily testable, or testable at all. That said, it's nice that at least your workflow works out like that!
Sadly it's easier said than done, since it's not an easy thing to fix for an existing system. We've spent quite some time improving things to ease the pain of writing tests; it was getting better, but it will never reach the level it could have if we had been aware of this problem in the first place - there are tens of thousands of tests and we cannot rewrite them all.
I'm not too familiar with your tech stack. But there are two things you mentioned that are especially tricky to handle for testing: DB and service calls.
For the DB, there are typically two ways to handle it: use a real DB, or mock it.
A real DB makes people more confident, and you don't need to mock too many things. The problem is that it can be slow and not parallelizable, or worse, like in your case, there's no idempotent environment at all. We had automated migrations, but the tests ran against the SQL Server on the same machine, so they were not parallelizable and took more than a day to run on a single machine. On CI there are tens of machines but it still takes hours to finish. In the end, we generalized things a little bit and used SQLite for testing in a parallel manner. (Many people advise against this because it's different from production, but the tradeoff really saved us.) A more ideal approach is to have SQL sandboxing like Ecto (written in Elixir). Another ideal approach is an in-memory lib that is close to the DB; for example, the ORM Entity Framework has an in-memory implementation, which is extremely handy because it's written in C# itself.
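In Python terms the SQLite trick is nearly free (a generic sketch, not our actual .NET stack): each test opens its own throwaway in-memory database, so tests parallelize trivially.

import sqlite3

def make_test_db():
    # A private, throwaway in-memory database per test: fast, isolated,
    # and safe to run in parallel.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    return conn

def test_insert_and_read_back():
    conn = make_test_db()
    conn.execute("INSERT INTO users (email) VALUES (?)", ("a@example.com",))
    assert conn.execute("SELECT email FROM users").fetchone() == ("a@example.com",)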
If there's no way to leverage a real DB, you have to mock it. One thing that might help is to use the Inversion of Control pattern for DB access; there are many doctrines - DDD repositories, Hexagonal, Clean Architecture - but they're essentially similar on this point. This way you'll have a clean layer to mock, and you can hide patterns like EAV under those modules (a minimal sketch follows). As you lean on those modules enough, they will evolve, and helpers will emerge that simplify the mocking process. Given your description, the best bet I would say is to evolve in this direction if there's no hope of using real DBs, since you can tuck as much domain logic as possible into the "core" without touching any of the infrastructure, so that the infrastructure tests can stay simple and generic.
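A minimal Python sketch of that layering (the names are my own illustration): domain code depends on a narrow repository interface, and tests swap in a hand-rolled in-memory fake instead of mocking DB machinery.

from typing import Optional, Protocol

class UserRepository(Protocol):
    # The clean seam: domain code sees this, never SQL, mappers, or EAV tables.
    def find_email(self, user_id: int) -> Optional[str]: ...

class InMemoryUserRepository:
    # Test double: no database and no mocking framework needed.
    def __init__(self, emails: dict):
        self._emails = emails

    def find_email(self, user_id: int) -> Optional[str]:
        return self._emails.get(user_id)

def notification_address(repo: UserRepository, user_id: int) -> str:
    # Domain logic under test, oblivious to how storage actually works.
    email = repo.find_email(user_id)
    return email if email is not None else "unknown@example.com"

def test_falls_back_when_user_missing():
    repo = InMemoryUserRepository({1: "a@example.com"})
    assert notification_address(repo, 2) == "unknown@example.com"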
For service calls, the obvious thing is to mock those calls. The not-so-obvious thing is to have well-defined service boundaries in the first place. I cannot stress this enough. When people fail to do this, they spend a lot of time mocking services while feeling they've tested nothing, because most things are mocked. Microservices got too much hype over the years, but very few people pay enough attention to how to define service boundaries. The ideal microservice should be mostly independent, only occasionally calling others. DDD strategic design is a great tool for designing good service boundaries (while DDD tactical design is yet another hype, just like how people care more about Jira than real Agile, making good things toxic). We were still struggling with this, because refactoring across microservices is substantially harder than refactoring code within a service, but we do try to avoid more mistakes by carefully designing bounded contexts across the system.
With that said, when the service boundaries are well-defined, and if you have things like SQL sandboxing, it's a breeze to test things because most of the data you're testing against is in the same service's DB, and there are very few service calls need to be mocked.
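And for the service-call side, a minimal sketch with the stdlib (`BillingClient` and `checkout` are made up): inject the client at the boundary and stub exactly one well-defined call.

from unittest import mock

class BillingClient:
    # Made-up thin wrapper around a remote billing service.
    def charge(self, order_id: int) -> str:
        raise NotImplementedError("real HTTP call in production")

def checkout(client: BillingClient, order_id: int) -> str:
    # One well-defined remote touchpoint, injected rather than hard-coded.
    return "charged:" + client.charge(order_id)

def test_checkout_charges_once():
    client = mock.create_autospec(BillingClient, instance=True)
    client.charge.return_value = "r-42"
    assert checkout(client, 7) == "charged:r-42"
    client.charge.assert_called_once_with(7)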
No, it's on you to not assume you know everything about my thought process before I show you otherwise.
Could I have communicated better? Yes. But I didn't assume you knew everything about my thought process; I just thought spelling it out wasn't necessary until you assumed that you knew my argument better than I did.
> You seem to coming at this as if the law is a purely mechanistic thing that can quickly resolve disputes, overlooking how these things play out in the real world, like Oracle v google going on for a decade or the even longer litigation involving SCO and IBM.
Once again, you are assuming. Yes, I know law is not mechanistic. Yes, I know going to court would take a long time.
Going to court is not the only thing I am doing. I also created new licenses, which I would not have if I only cared about what happened in court.
Going to court would be to attempt to argue for and enforce my viewpoint (indirectly). It would be a last-ditch attempt.
The first thing I am doing is creating new licenses specifically meant to "poison the well" for machine learning on code in general and Copilot in particular. [1]
With those licenses, I hope to make companies nervous about using Copilot on anything that might include code under my licenses. This hesitation may only apply to such code, but the FAQs for those licenses ([2] is an example) are also designed to make lawyers nervous about the GPL and other licenses.
If I succeed in making the hesitation big enough, then Copilot as a paid service would be dead, and hopefully enough companies will prohibit the use of Copilot, as is already being done. [3]
Going to court, then, would only happen if I found someone infringing.
This will be especially helped by the fact that the vast majority of the code under those licenses will be in a language I'm building right now. If there's open source code in the language, then I can search that code for infringements caused by Copilot.
> I mean, what makes you so sure the court is going to give you a quick judgment on the infringement, or that it's going to agree with you about the size of code fragment that that is sufficient to infringe?
Do you think I would be stupid enough to pick an example to bring before court that would not be obviously infringing?
Winning in court is not just about being right, it's also about picking your battles, and I would be very choosy.
> Surely you can can agree that sufficiently small code fragments won't meet this threshold because they're too basic or obvious.
Yes, and as I said above, I won't use any of those.
> Because your whole argument here rests upon that assumption, it comes off as a wish fulfillment scenario where Copilot disappears because nobody likes the risk calculus;
You realize that this is the entire basis for the cybersecurity industry? The entire point is to make it economically infeasible for bad guys to do bad things in cyberspace; it's to skew the "risk[/reward] calculus" in favor of the good guys so much that the bad guys just stop operating.
Making the risk calculus riskier for your opponent is how wars and legal cases are fought too, but such tactics are not confined to the warroom or courtroom. That's why my opening salvo is licenses to sow doubt, to change the perception of the risk calculus. Battles like this are won by "winning minds," which in this case means convincing enough people to be nervous about it.
> your stated goal of 'making Copilot a dead product' seems more emotional than rational.
This is something where you are partially right. There is a lot of emotion behind it, not because I'm an emotional person (I'm actually on the spectrum and less emotional than the average person), but because I objectively considered the ramifications of what GitHub is doing with Copilot, realized how bad those ramifications were, and that lit a fire under me.
I wrote about the ramifications and refuted the dubious legal justifications in a whitepaper [4] for the FSF call for papers [5]. (Intro blog post at [6].)
But if you will read through the paper, you will find that there is rationality in my thoughts. I just happen to think this is a fight worth taking. Thus, the emotion.
> In reality it will take you a long time to get a result, and if enough people find Copilot useful (which I suspect they will), legal departments will adapt to that risk calculus and just figure out the cost of blowing or buying you off in the event that their developers carelessly infringe.
"Buying me off" would include checking that Copilot didn't output my code, and if it did, to follow the license. I'm not sure they would like the added work to use something that is supposed to save work on the easiest part of programming. But even if they did, I would be satisfied.
And that points to another part of my "thought process": the reason that I think I've got a chance is because I think the "reward" side of the risk/reward calculus is not very high with Copilot because it is the easiest part of programming.
Almost everything in programming is harder than writing boilerplate, and as I said in another comment [7], I think there are still better ways of reducing boilerplate. In fact, the language I am working on is designed to help with that. So my perception, which I acknowledge could be wrong, is that the reward for using Copilot is not high, which means I may not have to raise the risk level much for people to change their minds about it.
But the most important point would be to make legal departments and courts recognize that copyright still has teeth, or rather, argue well enough to convince people of that fact, despite what GitHub is saying.
> If it sufficiently improves industrial productivity, it will become established while you're trying to litigate and afterwards people will just avoid crossing the threshold of infringement.
This would be a win in my book too. I am going to be the first person to write boilerplate code in my language, which means that anyone who writes in this language will be "copying" me. I don't care about the boilerplate, though; they can copy that as much as they want.
> Honestly, this exchange makes me glad that I don't publish software and thus don't care about license conditions on a day to day basis.
I feel you on that. The only reason I do is because I feel like my future customers deserve the blueprints to the software they are using the same way the buyers of a building deserve to get the building's blueprints from the architect. If I did not have that opinion, I would probably not publish either.
[1]: https://gavinhoward.com/2021/07/poisoning-github-copilot-and...
[2]: https://yzena.com/yzena-network-license/#frequently-asked-qu...
[3]: https://news.ycombinator.com/item?id=27714418
[4]: https://gavinhoward.com/uploads/copilot.pdf
[5]: https://news.ycombinator.com/item?id=27998109
[6]: https://gavinhoward.com/2021/10/my-whitepaper-about-github-c...
[7]: https://news.ycombinator.com/item?id=29019777
Edit: Clarification and fix typo.
Yep: https://github.com/search?p=1&q=evil+floating+point+bit+leve...
> Quite sure the FSF would be perfectly fine with that.
I believe the person republishing GCC code under MIT would be liable.
Also, I'm not recommending that you use code you know has been incorrectly licensed - just that in cases where certain "folk code" is seemingly widely available under permissive terms, Copilot isn't doing much that an honest human wouldn't.
A better example against Copilot would be trying to get it to regurgitate some code that has a simple known origin and is always under a non-permissive license.
TL;DR: the subtlety of its wrongness destroys my ability to follow my train of thought. I always had to step out of my train of thought and evaluate the correctness of the suggestion instead of just writing.
For simple things, intellisense, autocomplete and snippets are far more effective.
For anything more complex, I already know what I want to write.
For exploratory stuff I RTFM
Copilot was ineffective at every level for me.