zlacker

Spinning Up in Deep RL

submitted by samroh+(OP) on 2020-08-17 05:24:18 | 205 points 50 comments

3. ipsum2+a5[view] [source] [discussion] 2020-08-17 06:36:50
>>_5659+B3
Ben Barry used to do a lot of the design, looks like he left to start his own firm now: https://nonlinear.co/openai
9. kakadz+N7[view] [source] 2020-08-17 07:10:45
>>samroh+(OP)
If you want to play around with Spinning Up in a Docker container, make sure you git clone the repository first and then run pip install -e on it. For whatever reason, installing it directly with pip doesn't work, at least the last time I tried. Here's a Dockerfile and docker-compose.yaml I created some time ago: https://github.com/joosephook/spinningup-dockerfile
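For reference, the clone-then-editable-install workflow described above would look roughly like this (a sketch assuming the official openai/spinningup repository; adapt paths to your setup):

```shell
# Clone the repository first; the editable install (-e) is what the
# project's own docs recommend, rather than installing straight from pip.
git clone https://github.com/openai/spinningup.git
cd spinningup
pip install -e .
```

Inside a Dockerfile the same steps would appear as RUN instructions after installing git and Python.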
11. svalor+2a[view] [source] 2020-08-17 07:33:44
>>samroh+(OP)
If you are interested in RL but would rather start learning the concepts on simpler algorithms and keep the "deep" part for later, I maintain a library with most of the same design goals:

https://github.com/Svalorzen/AI-Toolbox

Each algorithm is extensively commented and self-contained (aside from general utilities), and the interfaces are as consistent as I could make them. One of my goals is specifically to help people try out simple algorithms so they can inspect and understand what is happening before moving on to more powerful but less transparent ones.

I'd be happy to receive feedback on accessibility, presentation, docs or even more algorithms that you'd like to see implemented (or even general questions on how things work).

12. janhen+ha[view] [source] 2020-08-17 07:36:13
>>samroh+(OP)
I enormously appreciate the resources OpenAI provides for starting out in DRL, such as this one. However, OpenAI has (purposely?) glossed over the brittleness of its algorithms with respect to parameter choices and code-level optimizations [1] in the past. As a researcher myself, I would be more than surprised to hear that OpenAI did not explore this behaviour itself. Instead, my guess is that these "inconveniences" would hurt the marketing of OpenAI and its algorithms. Such omissions do far more harm to a proper understanding of DRL and its applications than a nice UI does good, imo.

[1]https://gradientscience.org/policy_gradients_pt1/

15. 317070+Tc[view] [source] [discussion] 2020-08-17 08:04:14
>>Davidz+p7
All the blogs posted by e.g. this user [0] were generated by GPT-3. [1] Some of those reached the top of HN.

That comment indeed looks a lot like it was generated. It correlates a bunch of words, but it did not understand that the link between UI and AI is tenuous. It is probably one of the few comments where this is so glaringly obvious; there are likely many more generated comments around that went unnoticed.

This comment is not generated, as the links below are dated after the GPT-3 dataset was scraped.

[0] https://news.ycombinator.com/submitted?id=adolos

[1] https://adolos.substack.com/p/what-i-would-do-with-gpt-3-if-...

16. Gnarly+Nd[view] [source] 2020-08-17 08:14:55
>>samroh+(OP)
Plug for the RL specialization out of the University of Alberta, hosted on coursera: https://www.coursera.org/specializations/reinforcement-learn... All courses in the specialization are free to audit.

For those unaware, the University of Alberta is Rich Sutton's home institution, and he approves of and promotes the course.

34. unoti+GZ[view] [source] [discussion] 2020-08-17 15:19:43
>>plants+UG
> Are there any resources this comprehensive for any other field of study? ... I was specifically interested in biotech

I recommend the FastAI course on deep learning. Several of their lectures relate to things their students have done in biotech and medicine. The main lecturer, Jeremy Howard, has worked for years at the crossroads of medical technology and AI, and routinely discusses this.

The full fastai course is here[1] and free. Here is a blog post and associated video[2] as an example of fastai incorporating biotech into their work. In this example they use AI to upsample the resolution and quality of microscope images.

[1] https://www.fast.ai/

[2] https://www.fast.ai/2019/05/03/decrappify/

37. arthur+f21[view] [source] [discussion] 2020-08-17 15:39:10
>>Davidz+p7
https://www.youtube.com/watch?v=TxHITqC5rxE
38. blahbl+P31[view] [source] [discussion] 2020-08-17 15:50:23
>>mement+921
I would highly recommend Sergey Levine's course:

http://rail.eecs.berkeley.edu/deeprlcourse/

For a more mathematical treatment, there's a beautiful book by Puterman:

https://www.amazon.com/Markov-Decision-Processes-Stochastic-...

39. ampdep+f81[view] [source] [discussion] 2020-08-17 16:14:57
>>mement+921
I recommend Sutton and Barto

http://www.incompleteideas.net/book/the-book-2nd.html

41. dang+Hm1[view] [source] [discussion] 2020-08-17 17:34:33
>>shawnz+Jy
https://news.ycombinator.com/item?id=24165171
42. dang+Om1[view] [source] [discussion] 2020-08-17 17:34:44
>>317070+Tc
That story is bogus. See https://news.ycombinator.com/item?id=24165040 and https://news.ycombinator.com/item?id=24063832.

In a way the real story is that people are so eager to believe it that it didn't matter that it was untrue. Like Voltaire's God, if it didn't exist it was necessary to invent it.

44. dang+Aq1[view] [source] [discussion] 2020-08-17 17:50:16
>>_5659+B3
I got an email asking what HN's policy is on GPT-3 generated comments. I think it's covered by the fact that we don't allow bots on HN (https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...). The purpose of the threads is human conversation.

Obviously there's an infatuation right now with GPT-3. That's normal. If people keep posting these without disclosing them, I imagine there will be two consequences. One (good for HN) is readers scrutinizing comments more closely and raising the bar for what counts as a high-quality comment. The other (bad for HN) is readers accusing each other of posting generated comments.

Accusing other commenters of being bots is not a new phenomenon ("this sounds like it was written by a Markov chain" has long been an internet swipe) but if it gets bigger, we might have to figure something out. But first we should wait for the original wave of novelty to die down.

47. cbHXBY+Ey1[view] [source] 2020-08-17 18:33:05
>>samroh+(OP)
There was a discussion on r/datascience this weekend about whether anyone uses RL. Almost no one does.

https://www.reddit.com/r/datascience/comments/iav3lv/how_oft...

50. flooo+eu3[view] [source] 2020-08-18 12:11:47
>>samroh+(OP)
RL, including contextual bandits, is becoming more popular for personalization, i.e. adapting some system to the preferences of (groups of) individuals.

Plug/Source: I did a lit. review on this topic https://doi.org/10.3233/DS-200028
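To make the personalization use case above concrete: a contextual bandit picks an action (arm) based on side information (the context, e.g. a user segment) and updates its estimates from the observed reward. The sketch below is an invented toy example, not taken from the linked review: the class, the segment/arm setup, and the reward probabilities are all hypothetical, using simple epsilon-greedy learning with one value table per context.

```python
import random

class EpsilonGreedyBandit:
    """Toy contextual bandit: per-context epsilon-greedy value estimates."""

    def __init__(self, n_arms, epsilon=0.1, seed=0):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {}   # (context, arm) -> number of pulls
        self.values = {}   # (context, arm) -> running mean reward

    def select(self, context):
        # Explore with probability epsilon, otherwise exploit the best-known arm.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_arms)
        means = [self.values.get((context, a), 0.0) for a in range(self.n_arms)]
        return max(range(self.n_arms), key=means.__getitem__)

    def update(self, context, arm, reward):
        # Incremental running-mean update for this (context, arm) pair.
        key = (context, arm)
        n = self.counts.get(key, 0) + 1
        self.counts[key] = n
        mean = self.values.get(key, 0.0)
        self.values[key] = mean + (reward - mean) / n

# Simulated personalization loop: segment 0 prefers item 1, segment 1 prefers item 2.
best = {0: 1, 1: 2}
bandit = EpsilonGreedyBandit(n_arms=3, epsilon=0.1)
env_rng = random.Random(1)
for _ in range(5000):
    ctx = env_rng.randrange(2)
    arm = bandit.select(ctx)
    # Preferred arm pays off ~73% of the time, others ~10%.
    reward = 1 if (arm == best[ctx] and env_rng.random() < 0.7) or env_rng.random() < 0.1 else 0
    bandit.update(ctx, arm, reward)
```

After enough interactions, the greedy choice per segment converges to that segment's preferred item, which is the "adapting to (groups of) individuals" behaviour described above.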
