Low Profile (staggered) Stereovision?

The Art of Stereoscopic Photo questions and tutorials
78163
Posts: 11
Joined: Sat Feb 29, 2020 11:33 pm

Low Profile (staggered) Stereovision?

Post by 78163 »

Ladies and Gents,

I know this question probably exists on some obscure OpenCV forum, but I haven't been able to find a reasonable answer.

Problem: Hardware constraints prevent me from mounting two cameras far enough apart to get sufficient IPD for a reliable depth map (i.e., noise reduction).

Proposed solution: I was wondering whether staggering the cameras in a 2D plane, rather than along a 1D line, could 'give me' that extra camera separation and the IPD I need for depth map reliability. Here is a visual:

Image

I'm trying to wrap my mind around it: instead of lateral offset differences, the 'scale' of the objects in the field would differ. It's sort of like an optical flow problem, but instantaneous with two cameras. Is this feasible? Does anyone have a good research thread to dive into?

Cons: Obviously the effective FOV would be narrowed, and the lead camera would obscure part of the trail camera's FOV. If the concept works, what other downsides would there be to this setup?

Sorry if this is common knowledge...my google-fu wasn't strong enough to find it!

v/r,

x78163

Realizator
Site Admin
Posts: 900
Joined: Tue Apr 16, 2019 9:23 am

Re: Low Profile (staggered) Stereovision?

Post by Realizator »

Hi 78163,
Interesting question.
My first thought is that after scaling one of your images you'll still end up with the same equivalent stereo base; you can't cheat physics here.
Some ideas on this:

1. If your space is limited (say, 50 mm of width), you need to use it as effectively as possible.
The first option is to put the cameras as close together as possible, like this:
Image

The camera PCB is 25 mm wide, so in this case you get a 25 mm stereo base.

This is not the most efficient use of the space.

So you can try to use modules like this instead:
Image

In this case you'll have about a 44 mm stereo base.
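
To get a feel for what that extra baseline buys, here is a back-of-the-envelope sketch in Python (pinhole model; the focal length and distances are just example values, not measurements):

Code:

# Pinhole-model estimate: disparity (in pixels) = focal_px * baseline / distance.
# focal_px and the distances below are illustrative values only.
focal_px = 1300.0                            # focal length in pixels (example)
for baseline_m in (0.025, 0.044):            # 25 mm vs 44 mm stereo base
    for distance_m in (1.0, 5.0, 20.0):      # object distances in meters
        disparity_px = focal_px * baseline_m / distance_m
        print(f"base {baseline_m * 1000:.0f} mm, Z = {distance_m:4.1f} m "
              f"-> disparity ~ {disparity_px:.1f} px")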

2. The tricky "animal vision" approach
There is also a "pigeon vision" or "horse vision" approach.
The idea is that the cameras' axes are not parallel.

Image

Image

In this case only a part of your FOV can give you 3D information (the overlapped zone). You actually do gain a little additional stereo base, because the camera optics (real M12 lenses, not pinholes) have an angle between their axes, but it is only a few millimeters compared with the parallel mount. Also, for this camera setup all the classic OpenCV calibration functions will do crazy things, and you will need to do a lot of work manually (like cutting out part of each image after undistortion for further stereoscopic rectification).
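
As a very rough illustration of that manual work, the sketch below undistorts each frame with its own calibration and then keeps only a hand-picked region that both views share; the intrinsics, file names and crop columns are all placeholders:

Code:

import cv2
import numpy as np

# Placeholder intrinsics/distortion - in practice these come from calibration.
K_left = np.array([[700.0, 0.0, 320.0], [0.0, 700.0, 240.0], [0.0, 0.0, 1.0]])
K_right = K_left.copy()
dist_left = np.zeros(5)
dist_right = np.zeros(5)

img_left = cv2.imread("left.png")       # hypothetical file names
img_right = cv2.imread("right.png")

# Undistort the full frames first...
und_left = cv2.undistort(img_left, K_left, dist_left)
und_right = cv2.undistort(img_right, K_right, dist_right)

# ...then cut out only the (hand-picked) overlapping columns of each view
# before doing any further rectification / matching on them.
x0, x1 = 160, 480                        # example overlap in the left view
overlap_left = und_left[:, x0:x1]
overlap_right = und_right[:, 0:(x1 - x0)]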
Eugene a.k.a. Realizator

stereomaton
Posts: 215
Joined: Tue May 21, 2019 12:33 pm
Location: France

Re: Low Profile (staggered) Stereovision?

Post by stereomaton »

It's interesting, but I do not have time to think about this in detail.
If you try it, let us know.

Like Realizator, I doubt it would let you gain base size, because you would somehow have to project one of your images into the reference frame of the other. Furthermore, you would probably have some unwanted vertical parallax, both up and down (!), which is very hard to deal with.

On the other hand, I did a small experiment with a small-base system. In the commercial world, I know the RED Hydrogen One has a very small base but creates photos that users like (although this device sometimes artificially "expands" the depth, depending on the scene, before presenting it to the user). It has around a 13 mm base, but I do not know the equivalent focal length, which also plays a role in depth scale.

I found a photo taken by the RH1 on the Internet and extracted the pair (reduced to 720p):
https://ibb.co/tcKCfZT & https://ibb.co/Yp577HG (from https://www.dropbox.com/sh/fgu0ogu2a5i2 ... ge_h4v.jpg)

As you can see in the next image, even with such a small base on a 720p image, you can retrieve quite a lot of good-quality depth information. Here it was extracted with a (slow) dense optical flow estimator, whose result was projected onto the horizontal direction (i.e. ignoring vertical displacement). The estimated disparity was between -2.09 and 4.64 pixels. You probably could not extract that amount of detail in real time easily, but the information is truly there.

Image

This base is very small for the subject in question, because the depths of interest are very large.
Depending on the subject, as long as you can get sub-pixel accuracy, you might get some results even with a small base.
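
For reference, here is a sketch of the same kind of extraction using OpenCV's Farneback dense flow (not the exact estimator I used above; the file names and parameters are only placeholders):

Code:

import cv2

# Placeholder file names for the two views of the pair.
left = cv2.imread("left_720p.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right_720p.png", cv2.IMREAD_GRAYSCALE)

# Dense flow from left to right; every pixel gets a (dx, dy) displacement.
flow = cv2.calcOpticalFlowFarneback(left, right, None,
                                    0.5,   # pyramid scale
                                    4,     # pyramid levels
                                    21,    # window size
                                    3,     # iterations
                                    7,     # poly_n
                                    1.5,   # poly_sigma
                                    0)     # flags

# Keep only the horizontal component and treat it as signed disparity.
disparity = flow[..., 0]
print("disparity range:", disparity.min(), "to", disparity.max())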
Stereophotographer and hacker
Despite my quite active participation in the forum, I am not in the StereoPi team
StereoPi (v1) Standard Edition + CM3Lite module + a few cameras

78163
Posts: 11
Joined: Sat Feb 29, 2020 11:33 pm

Re: Low Profile (staggered) Stereovision?

Post by 78163 »

Gents,

Interesting. I'm particularly intrigued by the wide-angle lens concept; I'm mentally face-palming for not thinking of it myself. That said, I don't think I'll pursue it, since the benefit doesn't seem worth the effort given the performance feedback.

Seeing the demo with the RED camera convinced me that the stereo base width is not as big a deal as I thought. I can definitely proceed with my project without having to re-code the universe :lol:

However, I'm still trying to wrap my mind around some things.
My first thought is that after scaling one of your images you'll still end up with the same equivalent stereo base; you can't cheat physics here.
That's what is really bugging my brain. Long story short: in flight school we're taught about visual illusions and how to mentally extract 3D information from monocular cues (night vision, etc.). One of the most important is Retinal Image Size, i.e. "Increasing and Decreasing Size of Objects". Here is the flight publication that teaches these cues: https://www.yumpu.com/en/document/read/ ... procedures. They are on pages 1-14 through 1-16. Basically, in flight, objects that grow in size are closer than objects that stay the same size.

So here's my cognitive problem: placing one camera in front of the other seems like you are just scaling up the image from one camera, but in my mind not all objects in the scene would scale uniformly. For instance, scaling a picture makes the mountains larger too, which would not realistically be the case with one camera slightly closer to the mountains. That's why I'm viewing it almost as an optical flow problem.
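
To put a number on what I mean: moving a camera forward by d only rescales an object at distance Z by a factor of Z / (Z - d), so near objects "grow" a lot while the mountains barely change. A tiny sketch (the stagger and distances are made up):

Code:

# Relative image-size change when one camera sits d meters ahead of the other.
# An object at distance Z appears scaled by Z / (Z - d) in the forward camera.
d = 0.05                                  # 5 cm fore-aft stagger (example)
for Z in (0.5, 2.0, 10.0, 1000.0):        # near object ... distant mountain
    scale = Z / (Z - d)
    print(f"Z = {Z:7.1f} m -> scale factor {scale:.4f}")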
Furthermore, you would probably have some unwanted vertical parallax, both up and down (!), which is very hard to deal with.
For sure! That would be a pain to sort out, though once 'dealt with' it could actually provide some interesting navigation cues.

Thanks for the amazing feedback and your patience with my off the wall question!

v/r

x78163

Realizator
Site Admin
Posts: 900
Joined: Tue Apr 16, 2019 9:23 am

Re: Low Profile (staggered) Stereovision?

Post by Realizator »

78163 wrote:
Sat Mar 28, 2020 5:57 pm
My first thought is that after scaling one of your images you'll still end up with the same equivalent stereo base; you can't cheat physics here.
That's what is really bugging my brain. Long story short: in flight school we're taught about visual illusions and how to mentally extract 3D information from monocular cues (night vision, etc.). One of the most important is Retinal Image Size, i.e. "Increasing and Decreasing Size of Objects". Here is the flight publication that teaches these cues: https://www.yumpu.com/en/document/read/ ... procedures. They are on pages 1-14 through 1-16. Basically, in flight, objects that grow in size are closer than objects that stay the same size.

So here's my cognitive problem: placing one camera in front of the other seems like you are just scaling up the image from one camera, but in my mind not all objects in the scene would scale uniformly. For instance, scaling a picture makes the mountains larger too, which would not realistically be the case with one camera slightly closer to the mountains. That's why I'm viewing it almost as an optical flow problem.
x78163
Ah, now I get your idea. Actually, if we zoom both images to the "same" size, they will still be slightly different, because they see slightly different perspectives: the same object subtends a different solid angle (usually denoted Omega) for each camera. But I assumed you'd use both images for a disparity map, where OpenCV tries to find the horizontal offset between similar fragments of the left and right images.
In your case, as I understand it, you rely either on a biological neural network (I mean your brain) or on an optical-flow-like computational approach. That is a different problem.
I'd add that the StereoBM function (for computing disparity) is the only algorithm in OpenCV that can produce a depth map in real time without hardware acceleration; optical flow is a more costly procedure. I had an idea to use the H264 encoder, since it compares 16x16 blocks between successive frames. You may know about the motion vectors calculated as part of H264 encoding by the Pi's hardware (here and here on YouTube). Unfortunately it has some limitations and will not work for either a depth map or optical flow, but the bonus is that it costs only a couple of percent of CPU load. Maybe I will come back to those experiments.
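
Just for illustration, here is a minimal StereoBM sketch in OpenCV-Python. It assumes an already-rectified pair, and the file names and parameter values are only examples:

Code:

import cv2

# Assumes a pair of already-rectified grayscale images (placeholder names).
left = cv2.imread("rect_left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("rect_right.png", cv2.IMREAD_GRAYSCALE)

# numDisparities must be a multiple of 16; blockSize must be odd.
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)

# compute() returns fixed-point disparities scaled by 16.
disparity = stereo.compute(left, right).astype("float32") / 16.0

# Normalize to 0-255 just for visualization.
vis = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX).astype("uint8")
cv2.imwrite("disparity.png", vis)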
Eugene a.k.a. Realizator

Realizator
Site Admin
Posts: 900
Joined: Tue Apr 16, 2019 9:23 am

Re: Low Profile (staggered) Stereovision?

Post by Realizator »

stereomaton wrote:
Fri Mar 27, 2020 2:37 pm
As you can see in the next image, even with such a small base on a 720p image, you can retrieve quite a lot of good-quality depth information. Here it was extracted with a (slow) dense optical flow estimator, whose result was projected onto the horizontal direction (i.e. ignoring vertical displacement). The estimated disparity was between -2.09 and 4.64 pixels. You probably could not extract that amount of detail in real time easily, but the information is truly there.
1280x720 is too heavy a load for the Pi to get a real-time depth map. But in my experiment with a "2D lidar-like mode" I did not use the whole frame, just a strip in the middle of the 640x480 image with ~30% of the height (instead of the full 320x240 frame), and I got a good FPS. It means you can use a very small strip from a 1280-wide image to do this. The bottleneck here is image undistortion: we need to undistort the whole "big" frame just to cut out a small strip later, and that is a computationally heavy operation.
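
A rough sketch of the strip idea, again assuming already-rectified and undistorted frames; the strip height and matcher parameters are only examples:

Code:

import cv2

# Already-rectified pair (placeholder names); we only match a central strip.
left = cv2.imread("rect_left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("rect_right.png", cv2.IMREAD_GRAYSCALE)

# Keep roughly the middle 30% of the rows ("2D lidar-like" slice).
h = left.shape[0]
strip = slice(int(h * 0.35), int(h * 0.65))

stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity_strip = stereo.compute(left[strip, :], right[strip, :])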
Now we're getting into the territory of extreme optimization to squeeze the maximum performance out of our hardware. It's challenging and very interesting! We haven't yet touched the GPU and OpenGL; they might be helpful for undistortion on the fly. Custom shaders and all that stuff are a challenge too...
Eugene a.k.a. Realizator

stereomaton
Posts: 215
Joined: Tue May 21, 2019 12:33 pm
Location: France

Re: Low Profile (staggered) Stereovision?

Post by stereomaton »

Sure. As I said, it was not in the context of real-time depth extraction (it took several tens of seconds, single-threaded, on a 4 GHz processor).
I only pointed out that, although the signal is small, it is present in the image.
The signal gets smaller and smaller as your subject is placed farther away.
Of course, it is harder to extract when it is that subtle.
Stereophotographer and hacker
Despite my quite active participation in the forum, I am not in the StereoPi team
StereoPi (v1) Standard Edition + CM3Lite module + a few cameras

78163
Posts: 11
Joined: Sat Feb 29, 2020 11:33 pm

Re: Low Profile (staggered) Stereovision?

Post by 78163 »


But I assumed you'd use both images for a disparity map, where OpenCV tries to find the horizontal offset between similar fragments of the left and right images.
I was at first, but then I started to realize that it would be blending two techniques at the same time, which certainly isn't supported by OpenCV.
I had an idea to use the H264 encoder, since it compares 16x16 blocks between successive frames. You may know about the motion vectors calculated as part of H264 encoding by the Pi's hardware (here and here on YouTube). Unfortunately it has some limitations and will not work for either a depth map or optical flow, but the bonus is that it costs only a couple of percent of CPU load. Maybe I will come back to those experiments.
1280x720 is too heavy a load for the Pi to get a real-time depth map. But in my experiment with a "2D lidar-like mode" I did not use the whole frame, just a strip in the middle of the 640x480 image with ~30% of the height (instead of the full 320x240 frame), and I got a good FPS. It means you can use a very small strip from a 1280-wide image to do this.
I think you are onto something here. It's too bad the 16x16 chunking doesn't work natively for the depth map or optical flow, but it gave me an idea with respect to the flight realm (this might be a different topic).

A group is working on generating a real-time flight path vector from ArduPilot (https://diydrones.com/profiles/blogs/ai ... 3#comments).

Since the flight path vector represents the future (linear*) flight path of the aircraft, we know where in the FPV image the most important place to look is. Additionally, we know the aircraft's velocity and responsiveness.

*not perfectly linear, but safe to approximate as such

By cropping the image around the flight path vector, with the crop mask coupled to aircraft velocity and responsiveness, we can basically remove all of the extraneous background noise from "non-threat" obstacles. As the aircraft's velocity increases, the cropped image gets smaller and tighter around the flight path vector; as the velocity decreases, the cropped image grows until it is 100% of the original image.
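
Something like this is what I picture for the crop mask; the velocity limits and the mapping are completely made up:

Code:

# Shrink the crop window around the flight-path-vector point as speed rises.
# v_min, v_max and min_fraction are made-up example values.
def crop_around_fpv(frame_w, frame_h, fpv_x, fpv_y, velocity,
                    v_min=2.0, v_max=20.0, min_fraction=0.25):
    # t = 0.0 at/below v_min (full frame), 1.0 at/above v_max (tightest crop)
    t = min(max((velocity - v_min) / (v_max - v_min), 0.0), 1.0)
    fraction = 1.0 - t * (1.0 - min_fraction)
    w, h = int(frame_w * fraction), int(frame_h * fraction)
    # Clamp the window so it stays fully inside the frame.
    x0 = min(max(fpv_x - w // 2, 0), frame_w - w)
    y0 = min(max(fpv_y - h // 2, 0), frame_h - h)
    return x0, y0, w, h

# Example: 1280x720 frame, FPV near image center, 15 m/s ground speed.
print(crop_around_fpv(1280, 720, 700, 300, velocity=15.0))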

Image

Biologically we do the same thing when riding a bike: at low speed we focus a meter in front of us, at high speed we focus out toward the horizon.

Computationally this might be too expensive for just one Pi, but if you 'stream' the images from one single-board computer to another, there might be a slight advantage. When slowing back down, you might run into problems, since the small cropped images won't compare directly to the large ones.

I'm sure this has been implemented elsewhere since it is pretty intuitive, but your mention of the H264 encoder made me think: why not target the most important part of the image and dump the rest, since processing power is the limiting factor?

78163
Posts: 11
Joined: Sat Feb 29, 2020 11:33 pm

Re: Low Profile (staggered) Stereovision?

Post by 78163 »

Sure. As I said, it was not in the context of real-time depth extraction (it took several tens of seconds, single-threaded, on a 4 GHz processor).
I only pointed out that, although the signal is small, it is present in the image.
That definitely is a hurdle!

stereomaton
Posts: 215
Joined: Tue May 21, 2019 12:33 pm
Location: France

Re: Low Profile (staggered) Stereovision?

Post by stereomaton »

It depends on the application (some are not real time), but you did not give the context.
By the way, the disparity is linked to the base size, but also to the focal length (longer focal length = narrower field of view = larger disparity for the same Z; in the pinhole model, disparity ≈ f·B/Z in pixels).
Stereophotographer and hacker
Despite my quite active participation in the forum, I am not in the StereoPi team
StereoPi (v1) Standard Edition + CM3Lite module + a few cameras

stereomaton
Posts: 215
Joined: Tue May 21, 2019 12:33 pm
Location: France

Re: Low Profile (staggered) Stereovision?

Post by stereomaton »

I thought about this geometric problem again, and I came to something that I knew but missed in my previous answers.

Actually, you can totally align the images produced by the "staggered disposition", but the result will probably not be what you want. After alignment, the disparity will denote depth in a direction perpendicular to the line between the cameras, thus not in front of your plane. Also, because the field of view will be off to the side of the equivalent stereo system, the model might be more sensitive to bad calibration.
Stereophotographer and hacker
Despite my quite active participation in the forum, I am not in the StereoPi team
StereoPi (v1) Standard Edition + CM3Lite module + a few cameras
