Some github.io pages are iframes to the developer's home machine or something similar for a tech demo, which can't withstand many users.
But regular github.io static pages ought to be able to withstand millions of users at once.
Then, as you like, you can do an "AI zoom out" to get zoomed-out pictures, no longer constrained by your lens or distance.
I imagine this will be included relatively soon, just like how panoramas were once a niche thing that became much easier to do with some good UI/UX. Pretty much any modern phone can do them now without having to struggle with lining up photos and whatnot.
One thing that does greatly concern me about the demo/site is that they use "authentic" and "recover" as terms. The result here is not authentic, nor has anything been "recovered." It's an illusion at best. I personally don't like how they portray the new image as equivalent to what the lens would have framed in the original picture. It's not, as they themselves show near the end with the text sign. Seriously irresponsible framing (pun intended) of what's otherwise very cool tech.
So then you just feed RealFill the 20 pictures you took and your uncle is magically painted in.
This actually feels like it could be an incredibly valuable post-production tool in film and TV, once they get it working consistently across multiple frames.
Not only for more flexibility in "uncropping" after shooting (there was a tree/wall in the way), but this could basically be the holy grail solution for converting 4:3 to widescreen without cutting off content on the top and bottom.
There have been quite a few 4:3-to-widescreen conversions that were done using the original film that was actually shot in widescreen and cropped for TV.
Sometimes, the wider shot makes perfect sense. Sometimes, they keep the original cropped one but cut off the top/bottom. Sometimes it's a combination of the two. It all depends on what's being framed -- a shot of two people in a car usually benefits from cropping (nobody needs the bottom third of the frame occupied by the car's hood), while a close-up on someone's face usually benefits from extending the sides (otherwise it's an uncomfortable mega-close-up that cuts off their mouth).
But having the flexibility to extend horizontally opens up the artistic possibilities.
Google can probably even do this automatically - I would not be surprised if Google Photos starts suggesting fixes for images with cut-off buildings in the future! That would be so cool.
The problem I'm solving is converting videos from widescreen to vertical, and sometimes you need some extra height.
There's definitely a middle ground here that we perhaps don't have a good word for. E.g. what do we call a painting made by an artist who sat in front of the scene they depicted, vs. a painting made by an artist from their imagination? There's certainly some sense in which the first one was an "authentic" scene.
/jk sorry
My wife and I have been using Pixel phones since the Pixel 6 and we love the camera. Great pictures! But the best features are in Google Photos: auto-tagging, recommended collages, walking down memory lane.
Then you can magic-erase tourists from pictures and pick a better shot from a picture you took on the fly...
You add this "authentic image completion" to my kids' pics, and it's game over...
I want this on my Pixel 8 asap!
That said, props to them for calling out the limitations so clearly. I really appreciate it when people are up front with the problems like that.
If I buy an "authentic Rolex" and receive a Chinese Rolex clone that's built similarly based on observations of a real Rolex, I'm going to feel scammed and very upset. And I'm much more protective of my memories than I would be of a watch.
Give that a couple generations. “You were at location X and didn’t take a pic. We generated you some selfies, choose one that you like.”
I don’t think either of those things are true. Both can be changed, and are often changed. Much of what we ‘know’ of the past is wrong.
Con: it's from Google so forget about trying it yourself any time soon
I used to be a huge supporter of Google's products, now the name is an instant red flag.
Ultimately we ought to think about what we are referring to. If we are talking about a photograph taken by someone, the authenticity ultimately comes from the combination of the photographer and the camera used. So when you think of a genuine photo in this scenario, you expect it to be fundamentally taken by the user, with a particular camera, to create a particular photograph. You can use devices to take a photo without pressing the button, such as a timer, but the photographer and camera are both fundamental to the authenticity of the image. If the camera is no longer entirely involved in the generation of the photograph, I would say it is no longer genuine.
"Reference-driven," as described in the article, is more appropriate, but alas it is verbose. Normally such pedantry bores me, but in this case it's pretty much paramount to what is being presented.
Two anecdotes:
1. A friend of mine met his favourite author (he traveled from one continent to another for a signing event). When he shook hands with the author, a friend took a photo. A lady (still hated by us!) stepped into the middle and blocked the photo. Maybe an AI or a talented person could remove her, use an existing photo of the author, and rebuild the picture... but why? What's the purpose of that?
2. A few months ago, during the pandemic, I scanned all of my grandparents' printed pictures with my phone. After scanning around 200 of them, I checked one and zoomed in: the stupid app had applied some AI to "make it better" and it just looked worse. I don't care if it looks better to the untrained eye: my grandparents didn't look like that. I now have stupid, horrible versions of the scanned photos, where my grandparents appear with smooth skin and weird eyes.
At least that's what comes to mind with the things I know you can run offline.
Not even a discussion about if this might be harmful or what the risks are or anything, just plain old "THIS FAKE MOMENT WAS REAL AND YOU'LL BELIEVE IT"?!
I really have a hard time with this. Wow, I'm more upset than I expected. The tech is fine, yeah, but the marketing is just deeply upsetting.
A different angle: if a friend had painted the encounter instead, it wouldn't be exact, but it would be a snapshot of a memory.
I'm not hugely arguing in favour of it, but I think there are different scales here, from cameras doing "merge pictures half a second apart so people have their eyes open" to "totally change their face."
Facebook: Great. I'd be happy to. Any more detail you'd like to add?
Me: Make us look attractive. Show that we're having a great time. Also, we went to see the Chatham Lighthouse.
Facebook: OK, done!
...
Facebook: You've received 48 likes. Your mother would like to know if you had any salt water taffy.
Me: Yes, and please create a picture of my oldest daughter having trouble chewing it.
Facebook: Done.
No more so than "virtual," which used to mean "true." Or "literal," which used to be the opposite of "figurative." It's just another word being used auto-antonymically.
Definitio fugit. ("The definition flees.")
But I think the real value -- and this is definitely in Google's favor -- is providing this functionality for photos you have taken in the past.
I have probably 30K+ photos in Google Photos that capture moments from the past 15 years. There are quite a lot of them where I've taken multiple shots of the same scene in quick succession, and it would be fairly straightforward for Google to detect such groupings and apply the technique to produce synthesized pictures that are better than the originals. It already does something similar for photo collages and "best in a series of rapid shots." They surface without my having to do anything.
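A rough sketch of how that grouping could be detected, just by clustering on EXIF capture time (purely illustrative; the folder path, the 10-second gap, and relying on the plain DateTime tag are my assumptions, not anything Google has described):

    # Group photos taken in quick succession by EXIF timestamp (Pillow).
    from datetime import datetime, timedelta
    from pathlib import Path
    from PIL import Image

    def shot_time(path):
        """Read the basic EXIF DateTime tag (306), if present."""
        raw = Image.open(path).getexif().get(306)
        return datetime.strptime(raw, "%Y:%m:%d %H:%M:%S") if raw else None

    def burst_groups(folder, gap=timedelta(seconds=10)):
        """Cluster shots whose timestamps fall within `gap` of the previous one."""
        stamped = sorted(
            (t, p) for p in Path(folder).glob("*.jpg")
            if (t := shot_time(p)) is not None
        )
        groups, current = [], []
        for t, p in stamped:
            if current and t - current[-1][0] > gap:
                groups.append(current)
                current = []
            current.append((t, p))
        if current:
            groups.append(current)
        return [g for g in groups if len(g) > 1]  # multi-shot groups only

Each multi-shot group is exactly the kind of reference set a RealFill-style model would want as input.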
You might be getting a bit confused because here the training process has to happen every time you use it, whereas in most AI applications you only perform inference for actual use.
Everyone will be able to make all of the other fakes on social media jealous with ease.
I'd love to see a combo of this Google tech and AI upscaling do the same for Babylon 5. They had shot the actors in widescreen format, but the CGI spaceships were only rendered in 4:3 and the files have been lost.
Facebook: I'd be happy to. Are there any more details you'd like to include?
me: Please show how he didn't understand me at first, but then he looks at me and starts crying with love and regret.
Facebook: Done. Your relationship with your father must have been deeply fulfilling.
Regardless, I'm pretty sure "reconstructed" is the honest word to use.
They also need to be very, very careful when introducing capability to falsify photographic images convincingly.
Using the term "authentic" for this (and how do they even know what's an authentic memory?) doesn't sound like being very, very careful. It sounds like being gratuitously reckless.
He even has a picture up of him from his wedding day… standing alone in a tux.
Call it "realistic". Words matter.
(Disclaimer: I work for Google but have nothing to do with this project.)
"Literally" is often used in a sarcastic context. That sarcasm depends on the word meaning what it means.
Intentional
Contextual
Everything about this project goes against the meaning of authenticity.
This, and the new demos I saw of WhatsApp's persona-based AI, can really alter someone's perception and memories. I don't think we are considering how much this can impact our understanding of our feelings, perception, memories, and mindfulness.
If you take a picture of reality and alter it with GenAI into something else, changing the moment, what is the new reality? After a while, we might question whether it was real or not, and then that might just become the new reality.
In my opinion, GenAI is truly transformational as well as scary, as it can alter our perception. I wonder if anyone else feels this way.
For private pictures, it didn't change your reality; you can lie to yourself, but you've always been able to do that.
I disagree about the lying-to-yourself part. For people who are not mindful and aware, this will severely impact their perception.
Where will we be 10 years from now? 50?
I mean, do you not look at the photo after you take it? Even if you don't, you were there and saw the original scene. If your memory fails you, it's on you. If you didn't take an accurate picture, it's on you. Check next time.
If anything meaningful is added, it'll be very noticeable; if it's not meaningful, then what does it matter?
Cameras already do a lot of corrections that don't represent reality.
Hell, our perception of colors is different from everyone else's.
It was supposed to fix closed eyes by opening them if you took multiple photos.
https://www.youtube.com/watch?v=-a583U3Sw44
There are also leaks showing another feature where you can individually swap every person's face to get the perfect photo:
https://www.ign.com/articles/google-pixel-8-leaked-video-ai-...
I definitely agree. Pixel has been at the forefront of computational photography and editing since its inception. Take night photography, which we take for granted now: I remember when the Pixel 2 first introduced it, and it was honestly mind-blowing. This use of computational photography and editing feels like a continuation of that.
I don't know if that's more or less creepy than the AI stuff...
They do take up a lot of space, and just today I asked on photo.stackexchange about backup compression techniques that can exploit inter-image similarities: https://photo.stackexchange.com/questions/132609/backup-comp...
I’ve done this manually in Photoshop more times than I can count.
Usually the more automated solutions only hold up to light scrutiny, but that's rapidly changed in the past year. I'm sitting here after this year and I'm a little miffed about it. Oh well.
So your memory is probably better than mine. :)
I just remember some demo of a family shot where it automatically opened a little boy's eyes by using another photo. And another demo of auto-combining images, so that you could take a lot of photos of a busy tourist spot and automatically remove all the people.
A box that takes your GPS location, the weather, etc., and autogenerates a photo from your PoV.
> In exchange for a small fee and a 35-minute suggestion session, get you and your family implanted with memories of a beautiful vacation that'll last you a lifetime, for a fraction of the cost of an actual one.
Some will say "but that isn't a real photo of what was there", but our memories of what was in a photo or a scene aren't perfect anyway.
We expect images that look like photographs — at least when taken by amateurs — to be the result of a documentary process, rather than an artistic one. They might be slightly filtered or airbrushed, but they won't be put together from whole cloth.
But amateur photography is actually the outlier, in the history of "capturing memories"!
If you imagine yourself before the invention of photography, describing your vacation to an illustrator you're commissioning to create some woodblock-print artwork for a set of Christmas cards you're having made up, the conversation you've laid out here is exactly how things would go. They'd ask you to recount what you saw, do a sketch, and then you'd give feedback and iterate together with them to get a final visual down that reflects things the way you remember them, rather than the way they were, per se.
This has always been the case, you just don't remember it, and the (human) hallucinated details are usually just not important enough to care about.
...that, and other thoughts I have while baked.
Wouldn't an operation like this require some kind of fine-tuning? Or do diffusion models have a way of using images as context, the way one would provide context to an LLM?
https://www.reddit.com/r/StableDiffusion/comments/16uqqrh/ho...
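For what it's worth, as I read the paper it's the first option: RealFill does a quick per-scene fine-tune (a LoRA on an inpainting diffusion model, trained on the handful of reference shots plus the target) and then runs ordinary inpainting. A minimal sketch of the inference half using the Hugging Face diffusers API, assuming the per-scene LoRA has already been trained (the weights path, file names, and the "sks" placeholder token are my assumptions):

    import torch
    from PIL import Image
    from diffusers import StableDiffusionInpaintPipeline

    # Base inpainting model; the per-scene LoRA is layered on top of it.
    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
    ).to("cuda")

    # Hypothetical LoRA fine-tuned beforehand on a few reference photos of this scene.
    pipe.load_lora_weights("./realfill_lora_this_scene")

    target = Image.open("target.png").convert("RGB")  # photo to complete/extend
    mask = Image.open("mask.png").convert("L")        # white where content is missing

    result = pipe(
        prompt="a photo of sks",   # rare token bound to the scene during fine-tuning
        image=target,
        mask_image=mask,
        num_inference_steps=50,
    ).images[0]
    result.save("filled.png")

At least in this approach, the references aren't passed "in context" the way an LLM takes a prompt; they get baked in via that small fine-tune, which is why the training step has to be repeated for every new set of photos.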
Unfortunately, lossless video compression won't help here, as it compresses each frame individually when lossless.
IIRC it was bzip2 that bumped that up to 1 MB, and there are now compressors with larger windows - but files have also grown, so it's not a solved problem for compression utilities.
It is solved for backup, though - bup, restic, and a few others will do that across a backup set with no "window size" limit.
…. And all of that is only true for lossless, which does not include images or video.
Indeed, people viewing photographs have always been able to be manipulated by presenting as fact something that is not true -- you dress up smart, in borrowed clothes, when you're really poor; you stand with a person you don't know to indicate association; you get photographed with a dead person as if they're alive; you use a backdrop or set; et cetera.
FB AI, make a series of posts about me climbing Mount Everest, meeting the Dalai Lama, curing cancer, bringing peace to Ukraine, changing my name to Melon Tusk, announcing a run for president, and adopting a dog named Molly
Interesting to me how it illustrates philosophical questions on the nature of reality, the projection of personality, the "problem of other minds," and such.
I hesitate to say it, but a blockchain is probably part of the solution.
You've got to shoot for something just attainable enough to sound credible, while still being at the "enviable" end of the spectrum.
"FB AI, make a series of pictures of my first 3 months at Goldman Sachs in 2021. Include me shaking hands with the VP of software as I receive a productivity award for making them $1m in a week. Include a group photo of me and 12 other people (all C execs and my VP must be there). Crosspost all to LinkedIn, with notifications muted."
"Ok done"
"ChatGPT, take my existing CV and replace entries from 2021 onwards with a job as Head of Performance Monitoring at Goldman Sachs, reporting to VP of software. Include several projects with direct CEO and CFO involvement. Crosspost changes to LinkedIn."
"Ok done"
... and now I can go job-hunting.
Could it be possible that JPEG also exploits repetition at the wavelength of the width of a single picture, so to speak? E.g., with 4 pictures side by side, each with the same black dot in the center, can all 4 dots be encoded with a single sine wave (simplifying a lot here...) that has peaks at each dot?
You give it a bunch of reference images, then another image with some rectangle removed, and it will fill in the rectangle with information from the reference images.
It's like re-coloring an old black and white photo, or photoshopping out a photo bomber from the background.
The current best theory and understanding of the evolution of the universe is that it will reach maximum entropy (heat death). There is no cycling when this happens. Can you cite what theory or new discovery you have come across that somehow challenges the heat death hypothesis?
I think a use case for AI image manipulation could be more like: I need a picture where I'm poor but wearing smart borrowed clothes, standing with an unassociated associate and a dead person posed as alive, in front of a backdrop, with the only source image being a selfie of someone else that incidentally caught half of me way in the background.
The intents or use cases behind these two kinds of (lacking a better term) manipulation aren't orthogonal here. The purpose of AI image generation is, well, images generated by AI. It could technically generate images that misrepresent info, but that's more of a side effect, reached in a totally different way than staging a scene in an actual photo. It seems like staging misleading photos would be done primarily for the purpose of deceptive activities or subversive fuckery.
Which won't involve much sitting at all, other than on those weekends I'll now be getting off.