https://www.metaculus.com/questions/3479/date-weakly-general...
For example, before Google tweaked its image search, the results had some interesting ideas about what constitutes a professional hairstyle, and about searches for "men" and "women" only returning light-skinned people: https://www.theguardian.com/technology/2016/apr/08/does-goog...
Does that reflect reality? No.
(I suspect there are also mostly unstated but very real concerns about these being used as child pornography, revenge porn, "show my ex brutally murdered" etc. generators.)
You're telling me those are all the most non-professional hairstyles available? That this is a reasonable assessment? That fairly standard, well-kept, work-appropriate curly black hair is roughly equivalent to the pink, three-foot-wide hairstyle worn by one of the only white people in the "unprofessional" results?
Each and every one of them is less workplace appropriate than, say, http://www.7thavenuecostumes.com/pictures/750x950/P_CC_70594... ?
A good example of this is the PULSE paper [0][1]; you may remember it as the "white Obama". It became a huge debate, and it was fairly easy to show that the largest factor was dataset bias. The outrage did lead to FFHQ being fixed, but it also sparked a heated debate with LeCun (data-centric bias) and Timnit Gebru (model-centric bias) at the center. PULSE is still remembered for this bias, though, not for how its authors responded to it. I should also note that there is human bias in this case, since we have a priori knowledge of what the upsampled image should look like (humans are pretty good at this when the small image is already recognizable, but it is a difficult metric to calculate mathematically).
It is fairly easy to find adversarial examples where generative models produce biased results. It is FAR harder to fix them. Since this is known by the community but not by the public (and some community members focus on finding these holes rather than fixing them), it creates outrage. It's probably best for them to limit their release.
[0] https://arxiv.org/abs/2003.03808
[1] https://cdn.vox-cdn.com/thumbor/MXX-mZqWLQZW8Fdx1ilcFEHR8Wk=...
# whois appspot.com
[Querying whois.verisign-grs.com]
[Redirected to whois.markmonitor.com]
[Querying whois.markmonitor.com]
[whois.markmonitor.com]
Domain Name: appspot.com
Registry Domain ID: 145702338_DOMAIN_COM-VRSN
Registrar WHOIS Server: whois.markmonitor.com
Registrar URL: http://www.markmonitor.com
Updated Date: 2022-02-06T09:29:56+0000
Creation Date: 2005-03-10T02:27:55+0000
Registrar Registration Expiration Date: 2023-03-10T00:00:00+0000
Registrar: MarkMonitor, Inc.
Registrar IANA ID: 292
Registrar Abuse Contact Email: abusecomplaints@markmonitor.com
Registrar Abuse Contact Phone: +1.2086851750
Domain Status: clientUpdateProhibited (https://www.icann.org/epp#clientUpdateProhibited)
Domain Status: clientTransferProhibited (https://www.icann.org/epp#clientTransferProhibited)
Domain Status: clientDeleteProhibited (https://www.icann.org/epp#clientDeleteProhibited)
Domain Status: serverUpdateProhibited (https://www.icann.org/epp#serverUpdateProhibited)
Domain Status: serverTransferProhibited (https://www.icann.org/epp#serverTransferProhibited)
Domain Status: serverDeleteProhibited (https://www.icann.org/epp#serverDeleteProhibited)
Registrant Organization: Google LLC
Registrant State/Province: CA
Registrant Country: US
Registrant Email: Select Request Email Form at https://domains.markmonitor.com/whois/appspot.com
Admin Organization: Google LLC
Admin State/Province: CA
Admin Country: US
Admin Email: Select Request Email Form at https://domains.markmonitor.com/whois/appspot.com
Tech Organization: Google LLC
Tech State/Province: CA
Tech Country: US
Tech Email: Select Request Email Form at https://domains.markmonitor.com/whois/appspot.com
Name Server: ns4.google.com
Name Server: ns3.google.com
Name Server: ns2.google.com
Name Server: ns1.google.com

It's often not worth it to decentralize the computation of the trained model, but it's not hard to get donated cycles, and groups are working on it. Don't fret because Google isn't releasing the API/code. They released the paper, and that's all you need.
https://twitter.com/joeyliaw/status/1528856081476116480?s=21...
One quote:
> “On the other hand, generative methods can be leveraged for malicious purposes, including harassment and misinformation spread [20], and raise many concerns regarding social and cultural exclusion and bias [67, 62, 68]”
There are two possible ways of interpreting "gender stereotypes in professions": biased or correct.
https://www.abc.net.au/news/2018-05-21/the-most-gendered-top...
https://www.statista.com/statistics/1019841/female-physician...
This is common in the research PA. People don't want to deal with broccoli man [1].
> We investigated sex differences in 473,260 adolescents’ aspirations to work in things-oriented (e.g., mechanic), people-oriented (e.g., nurse), and STEM (e.g., mathematician) careers across 80 countries and economic regions using the 2018 Programme for International Student Assessment (PISA). We analyzed student career aspirations in combination with student achievement in mathematics, reading, and science, as well as parental occupations and family wealth. In each country and region, more boys than girls aspired to a things-oriented or STEM occupation and more girls than boys to a people-oriented occupation. These sex differences were larger in countries with a higher level of women's empowerment. We explain this counter-intuitive finding through the indirect effect of wealth. Women's empowerment is associated with relatively high levels of national wealth and this wealth allows more students to aspire to occupations they are intrinsically interested in.
Source: https://psyarxiv.com/zhvre/ (HN discussion: https://news.ycombinator.com/item?id=29040132)
There is a Google Colab workbook that you can try and run for free :)
This is the image-text pair dataset behind it: https://laion.ai/laion-400-open-dataset/
It is also available via Hugging Face transformers.
However, the paper mentions T5-XXL is 4.6B, which doesn't fit any of the checkpoints above, so I'm confused.
For example, the discovery that language models get far better at answering complex questions if asked to show their working step by step with chain-of-thought reasoning, as on page 19 of the PaLM paper [1]. The explanations of novel jokes on page 38 of the same paper are also worth checking out. While it is, as you say, all statistics, if it's indistinguishable from valid reasoning, then perhaps it doesn't matter.
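For the curious, chain-of-thought prompting is mostly just prompt construction. A minimal sketch, assuming a hypothetical generate(prompt) wrapper around whatever large model you can call (the exemplar is the classic tennis-ball one from the chain-of-thought literature):

    # Minimal chain-of-thought prompting sketch. `generate(prompt)` is a
    # hypothetical wrapper around whatever large language model you can call.
    def cot_prompt(question: str) -> str:
        exemplar = (
            "Q: Roger has 5 tennis balls. He buys 2 more cans of 3 tennis "
            "balls each. How many tennis balls does he have now?\n"
            "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is "
            "6 balls. 5 + 6 = 11. The answer is 11.\n\n"
        )
        # The worked exemplar nudges the model to emit intermediate steps
        # before committing to a final answer.
        return exemplar + "Q: " + question + "\nA:"

    # answer = generate(cot_prompt("A juggler can juggle 16 balls. Half of "
    #                              "the balls are golf balls. ..."))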
It doesn't output it outright; it forms it gradually, finding and strengthening progressively finer-grained features among the dwindling noise, combining the learned associations between memorized convolutional texture primitives and the encoded text embeddings. In the limit of enough data, the associations and primitives turn out to be composable enough to suffice for out-of-distribution benchmark scenes.
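Roughly, the sampling side of that process looks like the bare-bones DDPM-style loop below. This is a sketch, not Imagen's actual code: eps_model is a hypothetical noise-prediction network conditioned on the text embedding, and betas is the usual precomputed noise schedule.

    import torch

    @torch.no_grad()
    def sample(eps_model, text_emb, shape, betas):
        alphas = 1.0 - betas
        alphas_cumprod = torch.cumprod(alphas, dim=0)
        x = torch.randn(shape)                      # start from pure noise
        for t in reversed(range(len(betas))):
            eps = eps_model(x, t, text_emb)         # predict the noise component
            # DDPM posterior mean: strip out a bit of the predicted noise.
            coef = betas[t] / torch.sqrt(1.0 - alphas_cumprod[t])
            x = (x - coef * eps) / torch.sqrt(alphas[t])
            if t > 0:                               # re-inject smaller fresh noise
                x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
        return x                                    # finer features emerge step by step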
When you have a high-quality encoder of your modality into a compressed vector representation, the rest is optimization over a sufficiently high-dimensional, plastic computational substrate (model): https://moultano.wordpress.com/2020/10/18/why-deep-learning-...
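As a concrete (if simplified) instance of that recipe, the VQGAN-CLIP-style pipelines linked elsewhere in the thread do roughly this. The text_encoder, image_encoder, and image_decoder here are hypothetical stand-ins for a shared-embedding-space model plus a generator:

    import torch

    def optimize_latent(text_encoder, image_encoder, image_decoder, prompt,
                        latent_shape=(1, 256), steps=300, lr=0.05):
        target = text_encoder(prompt).detach()        # fixed text embedding
        z = torch.randn(latent_shape, requires_grad=True)
        opt = torch.optim.Adam([z], lr=lr)
        for _ in range(steps):
            img = image_decoder(z)                    # latent -> pixels
            emb = image_encoder(img)                  # pixels -> shared space
            loss = -torch.cosine_similarity(emb, target, dim=-1).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
        return image_decoder(z).detach()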
It works because it should. The next question is: "What are the implications?".
Can we meaningfully represent every available modality in a single latent space, and freely interconvert composable gestalts like this https://files.catbox.moe/rmy40q.jpg ?
Good lord, we are screwed. And yet somehow I bet even this isn't going to kill off the "they're just statistical interpolators" meme.
[1] https://www.deepmind.com/blog/tackling-multiple-tasks-with-a...
Other funding models are possible as well; in the grand scheme of things, the price for these models is small enough.
Convolutional filters lend themselves to a rich combinatorics of compositions [1]: think of them as context-dependent texture-atoms, repulsing and attracting over variations of the local multi-dimensional context in the image. The composition is literally a convolutional transformation of local channels encoding related principal components of the context.
Astronomical amounts of computations spent via training allow the network to form a lego-set of these texture-atoms in a general distribution of contexts.
At least this is my intuition for the nature of the convnets.
1. https://microscope.openai.com/models/contrastive_16x/image_b...
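A toy sketch of that composition intuition, assuming nothing beyond stock PyTorch: each layer's filters are built from the channels of the previous layer, so later layers respond to combinations of earlier patterns (the per-layer descriptions are the usual interpretability reading, not something this code verifies).

    import torch
    import torch.nn as nn

    stack = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1),   # low-level: edges, color blobs
        nn.ReLU(),
        nn.Conv2d(16, 32, kernel_size=3, padding=1),  # combinations: corners, simple textures
        nn.ReLU(),
        nn.Conv2d(32, 64, kernel_size=3, padding=1),  # combinations of combinations
    )
    features = stack(torch.randn(1, 3, 64, 64))       # -> (1, 64, 64, 64) feature map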
TL;DR: a generative story site's creators employ human moderation after horny people inevitably use the site to make gross porn; horny people using the site to make regular porn are justifiably freaked out.
Bring your popcorn
I expect that in the practical limit of achievable scale, the regularization pressure inherent to training these models converges to https://en.wikipedia.org/wiki/Minimum_description_length, and the merely correlative relationships get optimized away, leaving mostly the true causal relationships inherent to the data-generating process.
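For reference, the two-part MDL criterion being alluded to: pick the hypothesis H that minimizes the bits needed to describe the hypothesis plus the bits needed to describe the data given the hypothesis.

    H^* = \arg\min_{H} \left[ L(H) + L(D \mid H) \right]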
As a foreigner[], your point confused me anyway, and doing a Google for cultural stuff usually gets variable results. But I did laugh at many of the comments here https://www.reddit.com/r/TooAfraidToAsk/comments/ufy2k4/why_...
[] probably, New Zealand, although foreigner is relative
https://www.wired.com/2016/04/can-draw-bikes-memory-definite...
https://nonint.com/2022/05/04/friends-dont-let-friends-train...
I guess the concern would be: if one of these recipe websites _was_ generated by an AI, and the ingredients _look_ correct to an AI but are otherwise wrong - then what do you do? Baking soda swapped with baking powder. Tablespoons instead of teaspoons. Add 2 tbsp of flour to the caramel macchiato. Whoops! Meant sugar.
https://www.google.com/search?q=chess+puzzle+mate+in+4&tbm=i...
It would be surprising if AI couldn't do the same search and produce a realistic drawing out of any one of the result puzzles.
2. hentAI automates the process: https://github.com/natethegreate/hent-AI
3. [NSFW] Should look at this person on Twitter: https://twitter.com/nate_of_hent_ai
4. [NSFW] PornHub released vintage porn videos upscaled to 4K with AI a while back. They called it the "Remastured Project": https://www.pornhub.com/art/remastured
5. [NSFW] This project shows the limits of AI-without-big-tech-or-corporate-support projects. It generates female genitalia that don't exist in the real world. The project is "This Vagina Does Not Exist": https://thisvaginadoesnotexist.com/about.html
I would love it.
[1] https://github.com/CompVis/latent-diffusion.git [2] https://imgur.com/a/Sl8YVD5
Naturally there's a Python library [1] with some algorithms that are resistant to lossy compression, cropping, brightness changes, etc. Scaling seems to be a weakness, though.
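For illustration, with the widely used imagehash package (which may or may not be the library meant here), a perceptual hash changes little under recompression or brightness tweaks, so a small Hamming distance suggests the same underlying image:

    from PIL import Image
    import imagehash  # pip install imagehash; illustrative choice, not necessarily the library referenced

    h1 = imagehash.phash(Image.open("original.jpg"))
    h2 = imagehash.phash(Image.open("recompressed.jpg"))
    print(h1 - h2)  # Hamming distance; small values (roughly < 10) suggest a match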
For example, what kind of source images are used for the snake made of corn[0]? It's baffling to me how the corn is mapped to the snake body.
[0] https://gweb-research-imagen.appspot.com/main_gallery_images...
[1] https://github.com/nerdyrodent/VQGAN-CLIP.git [2] https://github.com/CompVis/latent-diffusion.git [3] https://imgur.com/a/dCPt35K
I usually consider myself fairly intelligent, but I know that when I read an AI research paper I'm going to feel dumb real quick. All I managed to extract from the paper was that a) there isn't a clear explanation of how it's done written for lay people, and b) they are concerned about the quality and biases of the training sets.
Having thought about the problem of "building" an artificial means to visualize from thought, I have a very high level (dumb) view of this. Some human minds are capable of generating synthetic images from certain terms. If I say "visualize a GREEN apple sitting on a picnic table with a checkerboard table cloth", many people will create an image that approximately matches the query. They probably also see a red and white checkerboard cloth because that's what most people have trained their models on in the past. By leaving that part out of the query we can "see" biases "in the wild".
Of course there are people that don't do generative in-mind imagery, but almost all of us do build some type of model in real time from our sensor inputs. That visual model is being continuously updated and is what is perceived by the mind "as being seen". Or, as the Gorillaz put it:
… For me I say God, y'all can see me now
'Cos you don't see with your eye
You perceive with your mind
That's the end of it…
To generatively produce strongly accurate imagery from text, a system needs enough reference material in the document collection. It needs to have sampled a lot of images of corn and snakes. It needs to be able to do image segmentation and probably perspective estimation. It needs a lot of semantic representations (optimized queries of words) of what is being seen in a given image, across multiple "viewing models", even from humans (who also created/curated the collections). It needs to be able to "know" what corn looks like, even from the perspective of another model. It needs to know what "shape" a snake model takes and how combining the bitmask of the corn will affect the perspective and framing of the final image. All of this information ends up inside the model's network.

Miika Aittala at Nvidia Research has done several presentations on taking a model (imagined as a wireframe) and then mapping a bitmapped image onto it with a convolutional neural network. They have shown generative abilities for making brick walls that look real, for example, from images of a bunch of brick walls and running those on various wireframes.
Maybe Imagen is an example of the next step in this, by using diffusion models instead of the CNN for the generator and adding in semantic text mappings while varying the language model's weights (i.e. allowing the language model to more broadly use related semantics when processing what is seen in a generated image). I'm probably wrong about half of that.
Here's my cut on how I saw this working from a few years ago: https://storage.googleapis.com/mitta-public/generate.PNG
Regardless of how it works, it's AMAZING that we are here now. Very exciting!
The harder part here will be getting access to the compute required, but again, the folks involved in this project have access to lots of resources (they've already trained models of this size). We'll likely see some trained checkpoints as soon as they're done converging.
[0] https://creativecloud.adobe.com/discover/article/how-to-use-...
It's been done, starting from plotter based solutions years ago, through the work of folks like Thomas Lindemeier:
https://scholar.google.com/citations?user=5PpKJ7QAAAAJ&hl=en...
Up to and including actual painting robot arms that dip brushes in paint and apply strokes to canvas today:
https://www.theguardian.com/technology/2022/apr/04/mind-blow...
The painting technique isn't all that great yet for any of these artbots working in a physical medium, but that's largely down to a general lack of dexterity in manual tool use rather than an art-specific challenge. I suspect that RL environments that physically model the application of paint with a brush would help advance the SOTA. It might be cheaper to model other mediums like pencil, charcoal, or even airbrushing first, before tackling more complex and dimensional mediums like oil paint or watercolor.