zlacker

[parent] [thread] 28 comments
1. c7DJTL+(OP)[view] [source] 2021-10-27 20:23:14
Bit of a dodgy way to form query parameters though. Other than for a quick script.
replies(8): >>relati+c3 >>Matthi+h5 >>jtsisk+86 >>lijogd+s7 >>sillys+a9 >>pugets+rb >>ljm+Bj >>hannia+eB
2. relati+c3[view] [source] 2021-10-27 20:41:51
>>c7DJTL+(OP)
How so?
replies(6): >>foepys+q4 >>Jon_Lo+P4 >>fireba+U4 >>dastbe+W4 >>cmckn+65 >>gen220+g5
◧◩
3. foepys+q4[view] [source] [discussion] 2021-10-27 20:48:40
>>relati+c3
It has no escape logic. Okay for scripts, as GP stated, very bad for production code.
◧◩
4. Jon_Lo+P4[view] [source] [discussion] 2021-10-27 20:50:12
>>relati+c3
code lacks context sensitive escaping

  api_path = base_url + urllib.parse.urlencode({
    'action': action,
    'format': letThisBeVariable,
    ...
    'gscoord': str(latitude.value) + '|' + str(longitude.value)
  })
see: https://docs.python.org/3/library/urllib.parse.html#urllib.p...

Mantra: when inserting data into a context (like an url) escape the data for that context.

◧◩
5. fireba+U4[view] [source] [discussion] 2021-10-27 20:50:42
>>relati+c3
Concatenating strings for example. As shown, it's the query string equivalent of sql injection.

Use something like URLBuilder, or URIParams, or whatever your platform supports. Don't use string concatenation ever, if at all possible, and if not possible (wtf?), then at least escape strings.

◧◩
6. dastbe+W4[view] [source] [discussion] 2021-10-27 20:50:49
>>relati+c3
the "nice" way of doing this would would be to create a list of your stringified arguments, mapped urlencoding over them, and then join them with the parameter separator. this ends up being resilient to someone adding something that ends up being incorrect, and makes explicit in the code what you're trying to do.
◧◩
7. cmckn+65[view] [source] [discussion] 2021-10-27 20:51:39
>>relati+c3
I usually try to avoid working with URLs as bare strings like this, both for readability and correctness (URL encoding is tricky). With ‘requests’ you can do something like pass a dictionary of your query params and it takes care of forming the actual request URL.

https://docs.python-requests.org/en/latest/user/quickstart/#...

◧◩
8. gen220+g5[view] [source] [discussion] 2021-10-27 20:52:21
>>relati+c3
It's much safer (i.e. fewer degrees of freedom for bugs to appear) to use f-strings in situations like this one.

One correlated but ancillary benefit, is that there are fewer variables to simulate the state for in your brain, while you're reading the code. You don't have to wonder if a variable is going to change on you, in-between when it is initialized and when it is used.

It's safer still to use a library (e.g. urllib3) that does encoding for you (allowing you to omit magic strings like `"%7C"` from the logic of this function alltogther).

Like GP said, very handy for one-off scripts or areas of your codebase where quality is "less important". I may be pedantic, but I wouldn't give this a pass on code review.

9. Matthi+h5[view] [source] 2021-10-27 20:52:24
>>c7DJTL+(OP)
I'm not against "copying" code. I just looked up "python build url query" The first link describes the `urllib.parse. urlencode` function which takes a dict.

So I would build the query like so:

    from urllib.parse import urlencode
    urlencode({
        "action": "query",
        "format": "json",
        ...
        "gscoord": f"{str(latitude.value)}|{str(longitude.value)}",
    })
I think this is orders of magnitude clearer code. But that's a parameter that's subjective that CoPilot can't adjust for (although it can be better).
replies(2): >>lambda+k9 >>e0a74c+Ci
10. jtsisk+86[view] [source] 2021-10-27 20:56:44
>>c7DJTL+(OP)
Even for a quick script this worries me about copilot; if it suggests this, then more people use it and think this is right, commit it, and then copilot suggests this more - that’s a bad feedback loop. At least in StackOverflow you get someone commenting why it’s bad and showing how to use a dictionary instead
replies(1): >>omegal+zM
11. lijogd+s7[view] [source] 2021-10-27 21:02:44
>>c7DJTL+(OP)
How so? I'd prefer a proper structured library, is that what you mean? If so, the Copilot code actually seems not dodgy - because the author _started_ with `base = ...` , indicating that they were string formatting the params.

Or did you mean something else?

12. sillys+a9[view] [source] 2021-10-27 21:11:51
>>c7DJTL+(OP)
Speaking as a former pentester, this is a fine way to form query params in this specific case, if lat and long are floats.

They're the only data you can control, and unless they're strings, it's useless for exploitation. Even denormal floats / INF / NAN won't help achieve an objective.

I broadly agree with you, but people are pummeling Copilot for writing code that I saw hundreds of times. Yes, sometimes I was able to exploit some of that code. But the details matter.

replies(2): >>boredt+jn >>thrash+En
◧◩
13. lambda+k9[view] [source] [discussion] 2021-10-27 21:12:41
>>Matthi+h5
This. Code should be optimized for reading, I think this kind of code is OK for exploratory stuff, but needs to be rewritten later.
replies(1): >>snicke+sh
14. pugets+rb[view] [source] 2021-10-27 21:25:26
>>c7DJTL+(OP)
It probably does suck! I’m not very experienced, and I was just whipping up something quick to test if my random MSFS2020 mod idea could work.
◧◩◪
15. snicke+sh[view] [source] [discussion] 2021-10-27 22:08:24
>>lambda+k9
Well. Code should be optimized first for correctness, and simple string concatenation will not work for URL params.
replies(1): >>odonne+Xn
◧◩
16. e0a74c+Ci[view] [source] [discussion] 2021-10-27 22:18:09
>>Matthi+h5
I'm surprised no one has suggested using `requests` considering how easy, safe and readable it is:

    >>> import requests, pprint
    >>> 
    >>> 
    >>> url = "https://en.wikipedia.org/w/api.php"
    >>> resp = requests.get(
    ...     url, 
    ...     params=dict(
    ...         action="query",
    ...         list="geosearch",
    ...         format="json",
    ...         gsradius=10000,
    ...         gscoord=f"{latitude.value}|{longitude.value}"
    ...     )
    ... )
    >>> 
    >>> pprint.pprint(resp.json())
    {'batchcomplete': '',
     'query': {'geosearch': [{'dist': 26.2,
                              'lat': 37.7868194444444,
                              'lon': -122.399905555556,
                              'ns': 0,
    ...
replies(1): >>thamer+xn
17. ljm+Bj[view] [source] 2021-10-27 22:27:27
>>c7DJTL+(OP)
Copilot may be the first of many computer dictionaries and thesauruses.

Oxford English Dictionary, for example, is a human version of defining language and a thesaurus is a completion engine.

Human language didn't suffer by having a dictionary and thesaurus. Computer language doesn't suffer either.

◧◩
18. boredt+jn[view] [source] [discussion] 2021-10-27 22:59:12
>>sillys+a9
If the example code is everything that Copilot generated, there's no guarantee that lat or long are floats and that seems to be an implementation detail left to the user.

Isn't that a pretty big risk though? Specifically, that people will use co-pilot recommendations "as-is" and give little thought to the actual workings of the recommendation?

After all, if you have to intimately understand the code it's recommending are you really saving that much time over vetting a Googled solution yourself?

◧◩◪
19. thamer+xn[view] [source] [discussion] 2021-10-27 23:00:27
>>e0a74c+Ci
For what it's worth, Copilot can do it.

I typed the following prompt:

    def search_wikipedia(lat, lon):
        """
        use "requests" to do a geosearch on Wikipedia and pretty-print the resulting JSON
        """
And it completed it with:

    r = requests.get('https://en.wikipedia.org/w/api.php?action=query&list=geosearch&gsradius=10000&gscoord={0}|{1}&gslimit=20&format=json'.format(lat, lon))
    pprint.pprint(r.json())
replies(3): >>odonne+Tn >>esjeon+px >>grenoi+og1
◧◩
20. thrash+En[view] [source] [discussion] 2021-10-27 23:01:27
>>sillys+a9
But I would still never not escape the params because you don’t know how that code will change one day or where it will end up, and chances are that you won’t remember to fix it later if you don’t fix it now.

We just had a major failure at work recently because someone decided to not decode URL params and their code worked fine for years because it never mattered… until it did.

Just do it right. It’s so easy. Why risk yourself a ton of headache in the future to save you a few seconds?

◧◩◪◨
21. odonne+Tn[view] [source] [discussion] 2021-10-27 23:03:07
>>thamer+xn
That doesn't exactly do what the guy above you was talking about, though.
◧◩◪◨
22. odonne+Xn[view] [source] [discussion] 2021-10-27 23:03:42
>>snicke+sh
It'll certainly work, just seems sloppy.
replies(1): >>bennyg+Pt
◧◩◪◨⬒
23. bennyg+Pt[view] [source] [discussion] 2021-10-27 23:38:01
>>odonne+Xn
Plenty of edge cases there (e.g. url encoding), but I don't want to preach to the choir and rabbit hole on this minor detail.
◧◩◪◨
24. esjeon+px[view] [source] [discussion] 2021-10-28 00:01:54
>>thamer+xn
It's like a junior dev who doesn't quit unnecessary code golfing. Somehow the AI is more comfortable with string-based URL manipulation, which is a straight anti-pattern.
replies(1): >>disgru+2I1
25. hannia+eB[view] [source] 2021-10-28 00:40:19
>>c7DJTL+(OP)
As a noob, what's the issue with this?
replies(1): >>lolind+sJ
◧◩
26. lolind+sJ[view] [source] [discussion] 2021-10-28 01:57:33
>>hannia+eB
It's harder to read than other methods, and it doesn't encode the URL parameters, which means it potentially produces an invalid URL, and in some cases could lead to security problems (similar to SQL injection).
◧◩
27. omegal+zM[view] [source] [discussion] 2021-10-28 02:28:08
>>jtsisk+86
I think they only pick starred repos not truly in the wild code. That's not a guarantee of good code but still a decent check.
◧◩◪◨
28. grenoi+og1[view] [source] [discussion] 2021-10-28 07:49:27
>>thamer+xn
That's what the rest of the thread is complaining about, it's still slapping the strings in there with basic formatting. No different than the top level approach.
◧◩◪◨⬒
29. disgru+2I1[view] [source] [discussion] 2021-10-28 12:22:40
>>esjeon+px
Presumably because that's what it's seen in the training data. Remember, it doesn't care about what the code does, it's just doing a search for similar looking code.
[go to top]