HN Reader

SHARP, an approach to photorealistic view synthesis from a single image

531

108

1 month ago by dvrp

"Unsplash > Gen3C > The fly video" is nightmare fuel. View at your own risk: https://apple.github.io/ml-sharp/video_selections/Unsplash/g...

1 month ago by superfish

Can someone ELI5 what this does? I read the abstract and tried to find differences in the provided examples, but I don't understand (and don't see) what the "photorealistic" part is.

1 month ago by Leptonmaniac

Well, I got _something_ to work on Apple Silicon:

https://github.com/rcarmo/ml-sharp (has a little demo GIF)

I am looking at ways to approximate Gaussian splats without having to reinvent the wheel, but I'm a bit out of my depth since I haven't been paying much attention to those in general.

1 month ago by rcarmo

I note the lack of human portraits in the example cases.

My experience with all these solutions to date (including whatever Apple is currently using) is that when viewed stereoscopically, the people end up looking like 2D cutouts against the background.

I haven't seen this particular model in use stereoscopically, so I can't comment on its effectiveness, but the lack of a human face in the example set is likely a bit of a tell.

Granted, they do call it "Monocular View Synthesis", but I'm unclear as to what its accuracy or real-world use would be if you can't combine 2 views to form a convincing stereo pair.

1 month ago by supermatt

Rendering requires a CUDA GPU:

https://github.com/apple/ml-sharp#rendering-trajectories-cud...

1 month ago by moondev

Is there a link with some sample Gaussian splat files coming from this model? I couldn't find one.

Without that, it's hard to tell how cherry-picked the NVS video samples are.

EDIT: I did it myself, if anyone wants to check out the result (caveat, n=1): https://github.com/avaer/ml-sharp-example

1 month ago by avaer

> photorealistic 3D representation from a single photograph in less than a second

1 month ago by yodon

Apple's Spatial Scene in the Photos app shows similar behavior, turning a single photo into a small 3D scene that you can view by tilting the phone. Demo here: https://files.catbox.moe/93w7rw.mov

1 month ago by derleyici

Impressive, but something doesn't feel right to me. Possibly too much sharpness, possibly a mix of cliches, all amplified at once.

1 month ago by tartoran

So this is the secret sauce behind Cinematic mode. The fake bokeh insanity has reached its climax!

1 month ago by brcmthrowaway

This is incredibly cool. It's interesting how it fails in the section where you need to in-paint. SVC seems to do that better than all the rest, though not anywhere close to the photorealism of this model.

Is there a similar flow that transforms a video, photo, or NeRF of a scene into a tighter, minimal polygon approximation of it? The reason I ask is that it would make some things really cool. To make my baby monitor mount I had to break out the calipers and measure the pins and this and that, but if I could take a couple of photos and iterate in software, that would be sick.

1 month ago by arjie

I could not find any mention of it, but does this use generative AI? I can't imagine it being able to accomplish anything like this without a large graphical model in the back.

1 month ago by nashashmi

In Chapter D.7 they describe: "The complex reflection in water is interpreted by the network as a distant mountain, therefore the water surface is broken."

This is really interesting to me because the model would have to encode the reflection as both the depth of the reflecting surface (for texture, scattering, etc.) and the "real depth" of the reflected object. The examples in Figures 11 and 12 already look amazing.

Long tail problems indeed.

1 month ago by Dumbledumb

Works great. The model file is 2.8 GB, and on an M2 rendering took a few seconds. The result is a Gaussian .ply file, but the repo implementation requires a CUDA card to render video, so I used one of the WebGL live renderers from here: https://github.com/scier/MetalSplatter?tab=readme-ov-file#re...
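
For anyone who wants to peek at such a .ply file before loading it into a viewer, here is a minimal sketch that reads only the PLY header and lists the declared elements and property names. The function name `read_ply_header` is hypothetical, and which per-splat properties this model actually writes is not stated in this thread, so the sketch just reports whatever the header declares.

```python
def read_ply_header(data: bytes) -> dict:
    """Parse the ASCII header at the start of a PLY file held in memory.

    Returns the storage format plus, for each element, its name, its
    item count, and the names of its properties in declaration order.
    """
    end = data.find(b"end_header")
    if not data.startswith(b"ply") or end == -1:
        raise ValueError("not a PLY file or header is incomplete")

    header = {"format": None, "elements": []}
    # Skip the first line, which is the "ply" magic string.
    lines = data[:end].decode("ascii", errors="replace").splitlines()[1:]
    for line in lines:
        parts = line.split()
        if not parts or parts[0] == "comment":
            continue
        if parts[0] == "format":
            header["format"] = parts[1]
        elif parts[0] == "element":
            header["elements"].append(
                {"name": parts[1], "count": int(parts[2]), "properties": []}
            )
        elif parts[0] == "property" and header["elements"]:
            # The property name is always the last token on the line.
            header["elements"][-1]["properties"].append(parts[-1])
    return header
```

Reading the first few tens of kilobytes of the file in binary mode and passing them to this function should be enough for a typical header; the count of the `vertex` element is then the number of splats, assuming the file stores one splat per vertex.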

1 month ago by diimdeep

Apple dropping this is interesting. They've been quiet on the flashy AI stuff while everyone else is yelling about transformers, but 3D reconstruction from single images is actually useful hardware integration stuff.

What's weird is we're getting better at faking 3D from 2D than we are at just... capturing actual 3D data. Like we have LiDAR in phones already, but it's easier to neural-net your way around it than deal with the sensor data properly.

Five years from now we'll probably look back at this as the moment spatial computing stopped being about hardware and became mostly inference. Not sure if that's good or bad tbh.

Will include this one in my https://hackernewsai.com/ newsletter.

1 month ago by alexgotoi

That is really impressive. However, it was a bit confusing at first because in the koala example at the top, the zoomed-in area is only slightly bigger than the source area. I wonder why they didn't make it 2-3x as big in both axes like they did with the others.

1 month ago by benatkin

This is great for turning a photo into a dynamic-IPD stereo pair + allows some head movement in VR.

1 month ago by Geee

This seems like what they have been doing with album covers on Apple Music for a couple of years.

1 month ago by pluralmonad

This would be really fun to create stereoscopic videos with. Take a video input, render a second view offset by x+0.5 or some coefficient, take the output, put the two views side by side (or interlaced for shutter glasses) and voilà! 3D movies.
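
A rough sketch of just the packing step, assuming the left and right views have already been rendered from horizontally offset cameras (the rendering itself is not shown). Frames are plain lists of pixel rows, the function names are made up for illustration, and treating "interlaced" as row-interleaving is an assumption about what the target display expects.

```python
def _check_same_size(left, right):
    """Both views must have the same number of rows and the same row widths."""
    if len(left) != len(right):
        raise ValueError("views differ in height")
    for l_row, r_row in zip(left, right):
        if len(l_row) != len(r_row):
            raise ValueError("views differ in width")


def side_by_side(left, right):
    """Pack two views into one frame: left view on the left half, right view on the right."""
    _check_same_size(left, right)
    return [list(l_row) + list(r_row) for l_row, r_row in zip(left, right)]


def row_interleaved(left, right):
    """Pack two views into one frame: even rows from the left view, odd rows from the right."""
    _check_same_size(left, right)
    return [
        list(left[y]) if y % 2 == 0 else list(right[y])
        for y in range(len(left))
    ]
```

Applied to every frame pair of a video, either function gives a single stream a stereo-capable player can display. Side-by-side keeps every pixel of both views at double the width, while row-interleaving keeps the frame size but halves the vertical resolution per eye.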

1 month ago by reactordev

It would be interesting to see how much better this algorithm would be with a stereo pair as input.

Not only do many VR and AR systems acquire stereo, we have historical collections of stereo views in many libraries and museums.