# Stereoscopic reconstruction from VHR optical images pair¶

This section describes how to convert pair of stereo images into elevation information.

The standard problem of terrain reconstruction with available **OTB
Applications** contains the following steps:

- Estimation of displacements grids for epipolar geometry transformation
- Epipolar resampling of the image pair using those grids
- Dense disparity map estimation
- Projection of the disparities on a Digital Surface Model (DSM)

Let’s go to the third dimension!

## Estimate epipolar geometry transformation¶

The aim of this application is to generate resampled grids to transform images in epipolar geometry. Epipolar geometry is the geometry of stereo vision. The operation of stereo rectification determines transformations to apply to each image such that pairs of conjugate epipolar lines become collinear, parallel to one of the image axes and aligned. In this geometry, the objects present on a given row of the left image are also located on the same line in the right image.

Applying this transformation reduces the problem of elevation (or stereo correspondences determination) to a 1-D problem. We have two images image1 and image2 over the same area (the stereo pair) and we assume that we know the localization functions (forward and inverse) associated for each of these images.

The forward function allows to go from the image referential to the geographic referential:

where h is the elevation hypothesis, \((i,j)\) are the pixel coordinates in image1 and (long,lat) are geographic coordinates. As you can imagine, the inverse function allows to go from geographic coordinates to the image geometry.

For the second image, in that case, the expression of the inverse function is:

Using jointly the forward and inverse functions from the image pair, we can construct a co-localization function \(f_{image1 \rightarrow image2}\) between the position of a pixel in the first and its position in the second one:

The expression of this function is:

The expression is not really important, what we need to understand is that if we are able to determine for a given pixel in image1 the corresponding pixel in image2, as we know the expression of the co-localization function between both images, we can determine by identification the information about the elevation (variable h in the equation)!

We now have the mathematical basis to understand how 3-D information can be extracted by examination of the relative positions of objects in the two 2-D epipolar images.

The construction of the two epipolar grids is a little bit more complicated in the case of VHR optical images.That is because most of passive remote sensing from space use a push-broom sensor, which corresponds to a line of sensors arranged perpendicularly to the flight direction of the spacecraft. This acquisition configuration implies a slightly different strategy for stereo-rectification (see here ).

We will now explain how to use the *StereoRectificationGridGenerator*
application to produce two images which are **deformation grids** to
resample the two images in epipolar geometry.

```
otbcli_StereoRectificationGridGenerator -io.inleft image1.tif
-io.inright image2.tif
-epi.elevation.avg.value 50
-epi.step 5
-io.outimage1 outimage1_grid.tif
-io.outright outimage1_grid.tif
```

The application estimates the displacement to apply to each pixel in
both input images to obtain epipolar geometry.The application accept a
‘step’ parameter to estimate displacements on a coarser grid. Here we
estimate the displacements every 10 pixels. This is because in most
cases with a pair of VHR and a small angle between the two images, this
grid is very smooth. Moreover, the implementation is not *streamable*
and uses potentially a lot of memory. Therefore it is generally a good
idea to estimate the displacement grid at a coarser resolution.

The application outputs the size of the output images in epipolar
geometry. **Note these values**, we will use them in the next step to
resample the two images in epipolar geometry.

In our case, we have:

```
Output parameters value:
epi.rectsizex: 4462
epi.rectsizey: 2951
epi.baseline: 0.2094
```

The epi.baseline parameter provides the mean value (in \(pixels.meters^{-1}\)) of the baseline to sensor altitude ratio. It can be used to convert disparities to physical elevation, since a disparity of this value will correspond to an elevation offset of one meter with respect to the mean elevation.

we can now move forward to the resampling in epipolar geometry.

## Resample images in epipolar geometry¶

The former application generates two grids of displacements. The
*GridBasedImageResampling* allows to resample the two input images in
the epipolar geometry using these grids. These grids are intermediary
results not really useful on their own in most cases. This second step
*only* consists in applying the transformation and resample both images.
This application can obviously be used in lot of other contexts.

The two commands to generate epipolar images are:

```
otbcli_GridBasedImageResampling -io.in image1.tif
-io.out image1_epipolar.tif
-grid.in outimage1_grid.tif
-out.sizex 4462
-out.sizey 2951
```

```
otbcli_GridBasedImageResampling -io.in image2.tif
-io.out image2_epipolar.tif
-grid.in outimage2_grid.tif
-out.sizex 4462
-out.sizey 2951
```

As you can see, we set *sizex* and *sizey* parameters using output
values given by the *StereoRectificationGridGenerator* application to
set the size of the output epipolar images.

Figure 1: Extract of resample image1 and image2 in epipolar geometry over Pyramids of Cheops. ©CNES 2012

We obtain two images in epipolar geometry, as shown in
Figure 1. Note that the application allows to
resample only a part of the image using the *-out.ulx* and *-out.uly*
parameters.

## Disparity estimation: Block matching along epipolar lines¶

Finally, we can begin the stereo correspondences lookup process!

Things are becoming a little bit more complex but do not worry. First,
we will describe the power of the *BlockMatching* application.

The resampling of our images in epipolar geometry allows us to constrain the search along a 1-dimensional line as opposed to both dimensions, but what is even more important is that the disparities along the lines, i.e. the offset along the lines measured by the block-matching process can be directly linked to the local elevation

An almost complete spectrum of stereo correspondence algorithms has been
published and it is still augmented at a significant rate! See for
example .
The **Orfeo Toolbox** implements different strategies for block
matching:

- Sum of Square Distances block-matching (SSD)
- Normalized Cross-Correlation (NCC)
- Lp pseudo-norm (LP)

An other important parameter (mandatory in the application!) is the range of disparities. In theory, the block matching can perform a blind exploration and search for a infinite range of disparities between the stereo pair. We need now to evaluate a range of disparities where the block matching will be performed (in the general case from the deepest point on Earth, the Challenger Deep . to the Everest summit!)

We deliberately exaggerated but you can imagine that without a smaller range the block matching algorithm can take a lot of time. That is why these parameters are mandatory for the application and as a consequence we need to estimate them manually. This is pretty simple using the two epipolar images.

In our case, we take one point on a *flat* area. The image coordinate in
\(image_{1}\) is \([1970,1525]\) and in \(image_{2}\) is
\([1970,1526]\). We then select a second point on a higher region
(in our case a point near the top of the Pyramid of Cheops!). The image
coordinate of this pixel in \(image_{1}\) is \([1661,1299]\) and
in \(image_{2}\) is \([1633,1300]\). So you see for the
horizontal exploration, we must set the minimum value lower than
\(-30\) (the convention for the sign of the disparity range is from
image1 to image2).

Note that this estimation can be simplified using an external DEM in the
*StereoRectificationGridGenerator* application. Regarding the vertical
disparity, in the first step we said that we reduced the problem of 3-D
extraction to a 1-D problem, but this is not completely true in general
cases. There might be small disparities in the vertical direction which
are due to parallax errors (i.e. epipolar lines exhibit a small shift in
the vertical direction, around 1 pixel). In fact, the exploration is
typically smaller along the vertical direction of disparities than along
the horizontal one. You can also estimate them on the epipolar pair (in
our case we use a range of \(-1\) to \(1\)).

One more time, take care of the sign of this minimum and this maximum for disparities (always from image1 to image2).

The command line for the *BlockMatching* application is :

```
otbcli_BlockMatching -io.inleft image1_epipolar.tif
-io.inright image2_epipolar.tif
-io.out disparity_map_ncc.tif
-bm.minhd -45
-bm.maxhd 5
-bm.minvd 1
-bm.maxvd 1
-mask.inleft image1_epipolar_mask.tif
-mask.inright image2_epipolar_mask.tif
-io.outmetric 1
-bm.metric ncc
-bm.subpixel dichotomy
-bm.medianfilter.radius 5
-bm.medianfilter.incoherence 2.0
```

The application creates by default a two bands image : the horizontal and vertical disparities.

The *BlockMatching* application gives access to a lot of other powerful
functionalities to improve the quality of the output disparity map.

Here are a few of these functionalities:

-io.outmetric: if the optimal metric values image is activated, it will be concatenated to the output image (which will then have three bands: horizontal disparity, vertical disparity and metric value)

-bm.subpixel: Perform sub-pixel estimation of disparities

-mask.inleft and -mask.inright: you can specify a no-data value which will discard pixels with this value (for example the epipolar geometry can generate large part of images with black pixels) This mask can be easily generated using the

*BandMath*application:otbcli_BandMath -il image1_epipolar.tif -out image1_epipolar_mask.tif -exp "if(im1b1<=0,0,255)"

otbcli_BandMath -il image2_epipolar.tif -out image2_epipolar_mask.tif -exp "if(im1b1<=0,0,255)"

-mask.variancet : The block matching algorithm has difficulties to find matches on uniform areas. We can use the variance threshold to discard those regions and speed-up computation time.

-bm.medianfilter.radius 5 and -bm.medianfilter.incoherence 2.0: Applies a median filter to the disparity map. The median filter belongs to the family of nonlinear filters. It is used to smooth an image without being biased by outliers or shot noise. The radius corresponds to the neighbourhood where the median value is computed. A detection of incoherence between the input disparity map and the median-filtered one is performed (a pixel corresponds to an incoherence if the absolute value of the difference between the pixel value in the disparity map and in the median image is higher than the incoherence threshold, whose default value is 1). Both parameters must be defined in the application to activate the filter.

Of course all these parameters can be combined to improve the disparity map.

Figure 2: Horizontal disparity and optimal metric map

## From disparity to Digital Surface Model¶

Using the previous application, we evaluated disparities between images. The next (and last!) step is now to transform the disparity map into an elevation information to produce an elevation map. It uses as input the disparity maps (horizontal and vertical) to produce a Digital Surface Model (DSM) with a regular sampling. The elevation values is computed from the triangulation of the “left-right” pairs of matched pixels. When several elevations are available on a DSM cell, the highest one is kept.

First, an important point is that it is often a good idea to rework the
disparity map given by the *BlockMatching* application to only keep
relevant disparities. For this purpose, we can use the output optimal
metric image and filter disparities with respect to this value.

For example, if we used Normalized Cross-Correlation (NCC), we can keep
only disparities where optimal metric value is superior to \(0.9\).
Disparities below this value can be consider as inaccurate and will not
be used to compute elevation information (the *-io.mask* parameter can
be used for this purpose).

This filtering can be easily done with **OTB Applications** .

We first use the *BandMath* application to filter disparities according
to their optimal metric value:

```
otbcli_BandMath -il disparity_map_ncc.tif
-out thres_hdisparity.tif uint8
-exp "if(im1b3>0.9,255,0)"
```

Then, we concatenate thresholded disparities using the
*ConcatenateImages* :

```
otbcli_ConcatenateImages -il thres_hdisparity.tif thres_vdisparity.tif
-out thres_hvdisparity.tif
```

Now, we can use the *DisparityMapToElevationMap* application to compute
the elevation map from the filtered disparity maps.

```
otbcli_DisparityMapToElevationMap -io.in disparity_map_ncc.tif
-io.left image1.tif
-io.right image2.tif
-io.lgrid outimage1_pyramid.tif
-io.rgrid outimage2_pyramid.tif
-io.mask thres_hdisparity.tif
-io.out disparity_map_ssd_to_elevation.tif
-hmin 10
-hmax 400
-elev.default 50
```

It produces the elevation map projected in WGS84 (EPSG code:\(4326\)) over the ground area covered by the stereo pair. Pixels values are expressed in meters.

Figure 3: Extract of the elevation map over Pyramids of Cheops.

This is it Figure 3 shows the output DEM from the Cheops pair.

## One application to rule them all in multi stereo framework scheme¶

An application has been added to fuse one or multiple stereo
reconstruction(s) using all in one approach : *StereoFramework* . It
computes the DSM from one or several stereo pair. First of all the user
have to choose his input data and defines stereo couples using
*-input.co* string parameter. This parameter use the following
formatting convention “ \(index_{0}\) \(index_{1}\),
\(index_{2}\) \(index_{3}\), …”, which will create a first
couple with image \(index_{0}\) and \(index_{1}\), a second with
image \(index_{1}\) and \(index_{2}\), and so on. If left blank
images are processed by pairs (which is equivalent as using “ 0 1,2 3,4
5 ” …). In addition to the usual elevation and projection parameters,
main parameters have been splitted in groups detailled below:

- Output :
output parameters : DSM resolution, NoData value, Cell Fusion method,

- : output projection map selection.
- : Spatial Sampling Distance of the output DSM in meters
- : DSM empty cells are filled with this float value (-32768 by default)
- : Choice of fusion strategy in each DSM cell (max, min, mean, acc)
- : Output DSM
- : Output DSM extent choice

- Stereorect :
Direct and inverse stereorectification grid subsampling parameters

- : Step of the direct deformation grid (in pixels)
- : Sub-sampling of the inverse epipolar grid

- BM :
Block Matching parameters.

- : Block-matching metric choice (robust SSD, SSD, NCC, Lp Norm)
- : Radius of blocks for matching filter (in pixels, \(2\) by default)
- : Minimum altitude below the selected elevation source (in meters, -20.0 by default)
- : Maximum altitude above the selected elevation source (in meters, 20.0 by default)

- Postproc :
Post-Processing parameters

- : use bijection consistency. Right to Left correlation is computed to validate Left to Right disparities. If bijection is not found pixel is rejected
- : use median disparities filtering (disabled by default)
- : use block matching metric output to discard pixels with low correlation value (disabled by default, float value)”);

- Mask :
Compute optional intermediate masks.

- : Mask for left input image (must have the same size for all couples)
- : Mask for right input image (must have the same size for all couples)
- : This parameter allows to discard pixels whose local variance is too small. The size of the neighborhood is given by the radius parameter. (disabledd by default)

## Stereo reconstruction good practices¶

The parameters and are used inside the application to derive the minimum and maximum horizontal disparity exploration, so they have a critical impact on computation time. It is advised to choose an elevation source that is not too far from the DSM you want to produce (for instance, an SRTM elevation model). Therefore, the altitude from your elevation source will be already taken into account in the epipolar geometry and the disparities will reveal the elevation offsets (such as buildings). It allows you to use a smaller exploration range along the elevation axis, causing a smaller exploration along horizontal disparities and faster computation.

and have also a deep impact in time consumption, thus they have to be carefully chosen in case of large image processing.

To reduce time consumption it would be useful to crop all sensor images
to the same extent. The easiest way to do that is to choose an image as
reference, and then apply *ExtractROI* application on the other sensor
images using the fit mode option.

## Algorithm outline¶

The following algorithms are used in the application: For each sensor pair

- Compute the epipolar deformation grids from the stereo pair (direct and inverse)
- Resample into epipolar geometry with BCO interpolator
- Create masks for each epipolar image : remove black borders and resample input masks
- Compute horizontal disparities with a block matching algorithm
- Refing Disparities to sub-pixel precision with a dichotomy algorithm
- Apply an optional Median filter
- Filter disparities based on the correlation score (optional) and exploration bounds
- Translate disparities in sensor geometry
- Convert disparity map to 3D map

Then fuse all 3D maps to produce DSM with desired geographic or cartographic projection and parametrizable extent.