Some github.io pages are iframes to the developer's home machine or something similar for a tech demo, which can't withstand many users.
But regular github.io static pages ought to be able to withstand millions of users at once.
Then, as you like, you can do an "AI zoom out" to get zoomed-out pictures, no longer constrained by your lens or distance.
I imagine this will be included relatively soon, just like how panoramas were once a niche thing that became much easier to do with some good UI/UX. Pretty much any modern phone can do them now without having to struggle with lining up photos and whatnot.
One thing that does greatly concern me about the demo/site is that they use "authentic" and "recover" as terms. The result here is not authentic, nor has anything been "recovered." It's an illusion at best. I personally don't like how they portray the new image as equivalent to what the lens would have framed in the original picture. It's not, as they themselves show near the end with the text sign. Seriously irresponsible framing (pun intended) of what's otherwise very cool tech.
So then you just feed RealFill the 20 pictures you took and your uncle is magically painted in.
This actually feels like it could be an incredibly valuable post-production tool in film and TV, once they get it working consistently across multiple frames.
Not only for more flexibility in "uncropping" after shooting (there was a tree/wall in the way), but this could basically be the holy grail solution for converting 4:3 to widescreen without cutting off content on the top and bottom.
There have been quite a few 4:3-to-widescreen conversions that were done using the original film that was actually shot in widescreen and cropped for TV.
Sometimes, the wider shot makes perfect sense. Sometimes, they keep the original cropped one but cut off the top/bottom. Sometimes it's a combination of the two. It all depends on what's being framed -- a shot of two people in a car usually benefits from cropping (nobody needs the bottom third of the frame occupied by the car's hood), while a close-up on someone's face usually benefits from extending the sides (otherwise it's an uncomfortable mega-close-up that cuts off their mouth).
But having the flexibility to extend horizontally opens up the artistic possibilities.
Google can probably even do this automatically - I would not be surprised if Google Photos starts suggesting fixes for images with cut-off buildings in the future! That would be so cool.
The problem I'm solving is converting videos from widescreen to vertical, and sometimes you need some extra height.
There's definitely a middle ground here that we perhaps don't have a good word for. E.g. what do we call a painting made by an artist who sat in front of the scene they depicted, vs. a painting made by an artist from their imagination? There's certainly some sense in which the first one was an "authentic" scene.
/jk sorry
My wife and I have been using Pixel phones since the Pixel 6 and we love the camera. Great pictures! But the best features are in Google Photos: auto-tagging, recommended collages, walking down memory lane.
Then you can magic-erase tourists from pictures and pick a better shot from a picture you took on the fly...
You add this "authentic image completion" to my kids' pics, and it's game over...
I want this on my Pixel 8 asap!
That said, props to them for calling out the limitations so clearly. I really appreciate it when people are up front with the problems like that.
If I buy an "authentic Rolex" and receive a Chinese Rolex clone that's built similarly based on observations of a real Rolex, I'm going to feel scammed and very upset. And I'm much more protective of my memories than I would be of a watch.
Give that a couple generations. “You were at location X and didn’t take a pic. We generated you some selfies, choose one that you like.”
I don’t think either of those things are true. Both can be changed, and are often changed. Much of what we ‘know’ of the past is wrong.
Con: it's from Google so forget about trying it yourself any time soon
I used to be a huge supporter of Google's products, now the name is an instant red flag.
Ultimately we ought to think about what we are referring to. If we are talking about a photograph taken by someone, the authenticity ultimately comes from the combination of the photographer and the camera used. So when you think of a genuine photo in this scenario, you expect it to be fundamentally taken by the user, with a particular camera, to create a particular photograph. You can use devices to take a photo without pressing the button, such as a timer, but the photographer and camera are both fundamental to the authenticity of the image. If the camera is no longer entirely involved in the generation of the photograph, I would say it is no longer genuine.
"Reference-driven," as described in the article, is more appropriate, but alas it is verbose. Normally such pedantry bores me, but in this case it's pretty much paramount to what is being presented.
Two anecdotes:
1. A friend of mine met his favourite author (he traveled from one continent to another for a signing event). When he shook hands with the author, a friend took a photo. A lady (still hated by us!) stepped into the middle and blocked the photo. Maybe an AI or a talented person could remove her, use an existing photo of the author, and rebuild the picture... but why? What's the purpose of that?
2. A few months ago, during the pandemic, I scanned all of my grandparents' printed pictures with my phone. After scanning around 200 of them, I checked one and zoomed in: the stupid app had applied some AI to "make it better" and it just looked worse. I don't care if it looks better to the untrained eye: my grandparents didn't look like that. I now have stupid, horrible versions of the scanned photos, where my grandparents appear with smooth skin and weird eyes.
At least that's what comes to mind with the things I know you can run offline.
Not even a discussion about if this might be harmful or what the risks are or anything, just plain old "THIS FAKE MOMENT WAS REAL AND YOU'LL BELIEVE IT"?!
I really have a hard time with this. Wow, I'm more upset than I expected. The tech is fine, yeah, but the marketing is just deeply upsetting.
A different angle: if a friend had painted the encounter instead, it wouldn't be exact, but it would be a snapshot of a memory.
I'm not hugely arguing in favour of it, but I think there are different scales here, from cameras doing "merge pictures half a second apart so people have their eyes open" to "totally change their face."
Facebook: Great. I'd be happy to. Any more detail you'd like to add?
Me: Make us look attractive. Show that we're having a great time. Also, we went to see the Chatham Lighthouse.
Facebook: OK, done!
...
Facebook: You've received 48 likes. Your mother would like to know if you had any salt water taffy.
Me: Yes, and please create a picture of my oldest daughter having trouble chewing it.
Facebook: Done.
No more so than "virtual," which used to mean "true." Or "literal," which used to be the opposite of "figurative." It's just another word being used auto-antonymically.
Definitio fugit. ("The definition flees.")
But I think the real value -- and this is definitely in Google's favor -- is providing this functionality for photos you have taken in the past.
I have probably 30K+ photos in Google Photos that capture moments from the past 15 years. There are quite a lot of them where I've taken multiple shots of the same scene in quick succession, and it would be fairly straightforward for Google to detect such groupings and apply the technique to produce synthesized pictures that are better than the originals. It already does something similar for photo collages and "best in a series of rapid shots." They surface without my having to do anything.
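A rough sketch of how that grouping could be detected, just by clustering on EXIF capture time (purely illustrative; the folder path, the 10-second gap, and relying on the plain DateTime tag are my assumptions, not anything Google has described):

    # Group photos taken in quick succession by EXIF timestamp (Pillow).
    from datetime import datetime, timedelta
    from pathlib import Path
    from PIL import Image

    def shot_time(path):
        """Read the basic EXIF DateTime tag (306), if present."""
        raw = Image.open(path).getexif().get(306)
        return datetime.strptime(raw, "%Y:%m:%d %H:%M:%S") if raw else None

    def burst_groups(folder, gap=timedelta(seconds=10)):
        """Cluster shots whose timestamps fall within `gap` of the previous one."""
        stamped = sorted(
            (t, p) for p in Path(folder).glob("*.jpg")
            if (t := shot_time(p)) is not None
        )
        groups, current = [], []
        for t, p in stamped:
            if current and t - current[-1][0] > gap:
                groups.append(current)
                current = []
            current.append((t, p))
        if current:
            groups.append(current)
        return [g for g in groups if len(g) > 1]  # multi-shot groups only

Each multi-shot group is exactly the kind of reference set a RealFill-style model would want as input.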
You might be getting a bit confused because here the training process has to happen every time you use it, whereas in most AI applications you only perform inference for actual use.
Everyone will be able to make all of the other fakes on social media jealous with ease.
I'd love to see a combo of this Google tech and AI upscaling do the same for Babylon 5. They had shot the actors in widescreen format, but the CGI spaceships were only rendered in 4:3 and the files have been lost.
Facebook: I'd be happy to. Are there any more details you'd like to include?
me: Please show how he didn't understand me at first, but then he looks at me and starts crying with love and regret.
Facebook: Done. Your relationship with your father must have been deeply fulfilling.
Regardless, I'm pretty sure "reconstructed" is the honest word to use.
They also need to be very, very careful when introducing capability to falsify photographic images convincingly.
Using the term "authentic" for this (and how do they even know what's an authentic memory?) doesn't sound like being very, very careful. It sounds like being gratuitously reckless.
He even has a picture up of him from his wedding day… standing alone in a tux.
Call it "realistic". Words matter.
(Disclaimer: I work for Google but have nothing to do with this project.)
"Literally" is often used in a sarcastic context. That sarcasm depends on the word meaning what it means.
Intentional
Contextual
Everything about this project goes against the meaning of authenticity.
This, and the new demos I saw of WhatsApp's persona-based AI, can really alter someone's perception and memories. I don't think we are considering how much this can impact our understanding of our feelings, perception, memories, and mindfulness.
If you take a picture of reality and alter it with GenAI into something else, changing the moment, what is the new reality? After a while, we might question whether it was real or not, and then that might just become the new reality.
In my opinion, GenAI is truly transformational as well as scary, as it can alter our perception. I wonder if anyone else feels this way.
For private pictures, it didn't change your reality; you can lie to yourself, but you've always been able to do that.
I disagree about the lying-to-yourself part. For people who are not mindful and aware, this will severely impact their perception.
Where will we be 10 years from now? 50?
I mean, do you not look at the photo after you take it? Even if you don't, you were there and saw the original scene. If your memory fails you, it's on you. If you didn't take an accurate picture, it's on you. Check next time.
If anything meaningful is added, it'll be very noticeable; if it's not meaningful, then what does it matter?
Cameras already do a lot of corrections that don't represent reality.
Hell, our perception of colors is different from everyone else's.
It was supposed to fix closed eyes by opening them if you took multiple photos.
https://www.youtube.com/watch?v=-a583U3Sw44
There are also leaks showing another feature where you can individually swap every person's face to get the perfect photo:
https://www.ign.com/articles/google-pixel-8-leaked-video-ai-...
I definitely agree. Pixel has been at the forefront of computational photography and editing since its inception. Take night photography, which we take for granted now: I remember when the Pixel 2 first introduced it, and it was honestly mind-blowing. This use of computational photography and editing feels like a continuation of that.
I don't know if that's more or less creepy than the AI stuff...
They do take up a lot of space, and just today I asked on photo.stackexchange about backup compression techniques that can exploit inter-image similarities: https://photo.stackexchange.com/questions/132609/backup-comp...
I’ve done this manually in Photoshop more times than I can count.
Usually the more automated solutions only hold up to light scrutiny, but that's rapidly changed in the past year. I'm sitting here after this year and I'm a little miffed about it. Oh well.
So your memory is probably better than mine. :)
I just remember some demo of a family shot where it automatically opened a little boy's eyes by using another photo. And another demo of auto-combining images, so that you could take a lot of photos of a busy tourist spot and automatically remove all the people.
A box that takes your GPS location, the weather, etc., and autogenerates a photo from your PoV.
> In exchange for a small fee and a 35-minute suggestion session, get you and your family implanted with memories of a beautiful vacation that'll last you a lifetime, for a fraction of the cost of an actual one.
Some will say "but that isn't a real photo of what was there", but our memories of what was in a photo or a scene aren't perfect anyway.
We expect images that look like photographs — at least when taken by amateurs — to be the result of a documentary process, rather than an artistic one. They might be slightly filtered or airbrushed, but they won't be put together from whole cloth.
But amateur photography is actually the outlier, in the history of "capturing memories"!
If you imagine yourself before the invention of photography, describing your vacation to an illustrator you're commissioning to create some woodblock-print artwork for a set of Christmas cards you're having made up, the conversation you've laid out here is exactly how things would go. They'd ask you to recount what you saw, do a sketch, and then you'd give feedback and iterate together with them to get a final visual down that reflects things the way you remember them, rather than the way they were, per se.
This has always been the case, you just don't remember it, and the (human) hallucinated details are usually just not important enough to care about.
...that, and other thoughts I have while baked.
Wouldn't an operation like this require some kind of fine-tuning? Or do diffusion models have a way of using images as context, the way one would provide context to an LLM?
https://www.reddit.com/r/StableDiffusion/comments/16uqqrh/ho...
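For what it's worth, as I read the paper it's the first option: RealFill does a quick per-scene fine-tune (a LoRA on an inpainting diffusion model, trained on the handful of reference shots plus the target) and then runs ordinary inpainting. A minimal sketch of the inference half using the Hugging Face diffusers API, assuming the per-scene LoRA has already been trained (the weights path, file names, and the "sks" placeholder token are my assumptions):

    import torch
    from PIL import Image
    from diffusers import StableDiffusionInpaintPipeline

    # Base inpainting model; the per-scene LoRA is layered on top of it.
    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
    ).to("cuda")

    # Hypothetical LoRA fine-tuned beforehand on a few reference photos of this scene.
    pipe.load_lora_weights("./realfill_lora_this_scene")

    target = Image.open("target.png").convert("RGB")  # photo to complete/extend
    mask = Image.open("mask.png").convert("L")        # white where content is missing

    result = pipe(
        prompt="a photo of sks",   # rare token bound to the scene during fine-tuning
        image=target,
        mask_image=mask,
        num_inference_steps=50,
    ).images[0]
    result.save("filled.png")

At least in this approach, the references aren't passed "in context" the way an LLM takes a prompt; they get baked in via that small fine-tune, which is why the training step has to be repeated for every new set of photos.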
Unfortunately, lossless video compression won't help here, as it compresses each frame individually when lossless.
IIRC it was bzip2 that bumped that up to 1 MB, and there are now compressors with larger windows - but files have also grown, so it's not a solved problem for compression utilities.
It is solved for backup, though - bup, restic, and a few others will do that across a backup set with no "window size" limit.
…. And all of that is only true for lossless, which does not include images or video.
Indeed, people viewing photographs have always been able to be manipulated by presenting as fact something that is not true -- you dress up smart, in borrowed clothes, when you're really poor; you stand with a person you don't know to indicate association; you get photographed with a dead person as if they're alive; you use a backdrop or set; et cetera.
FB AI, make a series of posts about me climbing Mount Everest, meeting the Dalai Lama, curing cancer, bringing peace to Ukraine, changing my name to Melon Tusk, announcing a run for president, and adopting a dog named Molly
Interesting to me how it illustrates philosophical questions on the nature of reality, the projection of personality, the "problem of other minds," and such.
I hesitate to say it, but a blockchain is probably part of the solution.
You've got to shoot for something just attainable enough to sound credible, while still being at the "enviable" end of the spectrum.
"FB AI, make a series of pictures of my first 3 months at Goldman Sachs in 2021. Include me shaking hands with the VP of software as I receive a productivity award for making them $1m in a week. Include a group photo of me and 12 other people (all C execs and my VP must be there). Crosspost all to LinkedIn, with notifications muted."
"Ok done"
"ChatGPT, take my existing CV and replace entries from 2021 onwards with a job as Head of Performance Monitoring at Goldman Sachs, reporting to VP of software. Include several projects with direct CEO and CFO involvement. Crosspost changes to LinkedIn."
"Ok done"
... and now I can go job-hunting.
Could it be possible that JPEG also exploits repetition at the wavelength of the width of a single picture, so to speak? E.g., with 4 pictures side by side, each with the same black dot in the center, can all 4 dots be encoded with a single sine wave (simplifying a lot here...) that has peaks at each dot?
You give it a bunch of reference images, then another image with some rectangle removed, and it will fill in the rectangle with information from the reference images.
It's like re-coloring an old black and white photo, or photoshopping out a photo bomber from the background.
The current best theory and understanding of the evolution of the universe is that it will reach maximum entropy (heat death). There is no cycling when this happens. Can you cite what theory or new discovery you have come across that somehow challenges the heat death hypothesis?
I think a use case for AI image manipulation could be more like: I need a picture where I'm poor but wearing smart borrowed clothes, standing with an unassociated associate and a dead person posed as alive, in front of a backdrop, with the only source image being a selfie of someone else that incidentally caught half of me way in the background.
The intents or use cases behind these two kinds of (lacking a better term) manipulation aren't orthogonal here. The purpose of AI image generation is, well, images generated by AI. It could technically generate images that misrepresent info, but that's more of a side effect, reached in a totally different way than staging a scene in an actual photo. It seems like staging misleading photos would be done primarily for the purpose of deceptive activities or subversive fuckery.
Which won't involve much sitting at all, other than on those weekends I'll now be getting off.