- Consider a 1D pinhole camera where the distance from the image sensor to the pinhole is the *focal length* $f$.
- The pinhole is the *focal point*.
- TBD diagram
- For reasons that will become apparent, the focal length is measured in units of pixels.
- Define an *image coordinate system* in units of pixels with origin directly behind the focal point, $x$ axis pointing down, and $z$ axis pointing out into the scene.
- Define a *camera coordinate system* in physical units like meters with the same origin and orientation as the image coordinate system.
- If light is entering the camera from a 2D *object point* $(X, Z)$, what is the coordinate $x$ of the image of that point?
- TBD diagram, similar triangles
- Answer by AAA similar triangles:

  $$x = -f\frac{X}{Z}$$

- This works because both $x$ and $f$ are measured in pixels and both $X$ and $Z$ are measured in meters (or at least the same physical units).
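The 1D similar-triangles projection is simple enough to check numerically. A minimal sketch (the class name and numbers here are illustrative, not from any lab code):

```java
// 1D pinhole projection: image coordinate x (pixels) of an object point
// (X, Z) in meters, for focal length f in pixels. The negation models
// the physical image inversion through the pinhole.
public class Pinhole1D {
    static double project(double f, double X, double Z) {
        return -f * X / Z; // similar triangles: x / f = -X / Z
    }
    public static void main(String[] args) {
        // An object 1 m off-axis at 2 m depth with f = 500 px images at -250 px.
        System.out.println(project(500.0, 1.0, 2.0)); // prints -250.0
    }
}
```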
- The approach extends to a 3D scene and a 2D imager with $x$ right, $y$ down, and $z$ out, 3D object point $(X, Y, Z)$ in meters, and 2D image point $(u, v)$ in pixels:

  $$u = -f\frac{X}{Z}, \qquad v = -f\frac{Y}{Z} \qquad (1)$$

- The negations in those equations can be inconvenient mathematically.
- They do correspond to the physical inversions that occur in a true pinhole camera, and also in many lens-based cameras.
- But in practice the image can be flipped as part of its transfer off the image sensor.

- Abstracting away the inversions enables a different model for the pinhole camera that swaps the places of the pinhole and the imager.
- TBD diagram
- In this model, the pinhole is renamed the *center of projection* (CoP).

- We can also account for the possibility that the CoP is behind some arbitrary point $(c_x, c_y)$ in pixels on the imager, which allows us to move the origin of the image coordinate system to the center of the upper left pixel.
- Typically $c_x \approx w/2$ and $c_y \approx h/2$ where the image is $w$ pixels wide and $h$ pixels tall. Slight errors in camera manufacturing can make the actual as-built values of $c_x$ and $c_y$ vary (for each specific camera) from this ideal.

- The orientation of the image coordinate system remains $x$ right, $y$ down, and $z$ out into the scene.
- The camera coordinate system origin remains at the CoP, is still measured in physical units like meters, and still has the same orientation as the image coordinate system.
- Pixels are typically square, but on some cameras they are rectangular.
- The mathematical model can accommodate such cameras by using independent horizontal and vertical focal lengths $f_x$ and $f_y$.
- In the case of non-square pixels, the “length of a pixel” is typically the smaller of the two pixel side-lengths, and $f_x$, $f_y$, $c_x$, and $c_y$ are all measured in units of this length.

- Undoing the negations in (1) and incorporating $(c_x, c_y)$ gives the standard *pinhole camera projection model*:

  $$u = f_x\frac{X}{Z} + c_x, \qquad v = f_y\frac{Y}{Z} + c_y \qquad (2)$$

- It is also common to encapsulate these four *intrinsic camera parameters* in the *camera matrix* $A$:

  $$A = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \qquad (3)$$
- The pinhole camera projection from point $(X, Y, Z)$ in physical units like meters in camera frame to image point $(u, v)$ in pixels can then be stated as a *perspective projection*:

  $$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \frac{1}{Z} A \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}$$

  or

  $$s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = A \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} \qquad (4)$$

- The intrinsic parameters $f_x$, $f_y$, $c_x$, and $c_y$ are physical properties of the camera. They may be different for each individual camera, but they are not expected to change over time, except for some hopefully small effects due to vibration and thermal expansion.
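The projection model is easy to sketch directly in plain Java (the class and the numeric values are illustrative only):

```java
// Pinhole projection (2): pixel (u, v) from camera-frame point (X, Y, Z)
// in meters using intrinsics fx, fy, cx, cy in pixels.
public class PinholeProjection {
    static double[] project(double fx, double fy, double cx, double cy,
                            double X, double Y, double Z) {
        return new double[] { fx * X / Z + cx, fy * Y / Z + cy };
    }
    public static void main(String[] args) {
        // fx = fy = 500 px, principal point (319.5, 239.5);
        // a point 0.1 m right, 0.2 m down, 1 m out in camera frame.
        double[] uv = project(500, 500, 319.5, 239.5, 0.1, 0.2, 1.0);
        System.out.printf("%.1f, %.1f%n", uv[0], uv[1]); // prints 369.5, 339.5
    }
}
```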
- How can they be determined?
- For some cameras, typically expensive ones intended for machine vision, the manufacturer may supply the camera with a calibration sheet giving these values.
- More commonly, and nearly always for consumer-grade cameras, we have to run an experiment to determine them.
- For a known object point $(X, Y, Z)$ and known corresponding image point $(u, v)$, the projection model (2) can be considered to give two constraints on the four unknown parameters $f_x$, $f_y$, $c_x$, and $c_y$.
- Thus, it should be theoretically possible to solve for the unknowns given observations of two object points and their corresponding image points.
- It is preferable to use many more pairs to average out any measurement errors.
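To see why two point pairs suffice in principle, note that (2) is linear in $f_x$ and $c_x$ once $a = X/Z$ is known, so two observations with different $a$ determine both (and likewise for $f_y$, $c_y$). A minimal sketch with a hypothetical class and synthetic data:

```java
// Recover (fx, cx) from two object/image point pairs using model (2):
// u = fx * a + cx, where a = X/Z for each object point.
public class TwoPointIntrinsics {
    static double[] solve(double u1, double a1, double u2, double a2) {
        double fx = (u1 - u2) / (a1 - a2); // slope of the linear model
        double cx = u1 - fx * a1;          // intercept
        return new double[] { fx, cx };
    }
    public static void main(String[] args) {
        // Synthesize two observations with fx = 500, cx = 320, then recover them.
        double fx = 500, cx = 320, a1 = 0.5, a2 = -0.5;
        double[] sol = solve(fx * a1 + cx, a1, fx * a2 + cx, a2);
        System.out.println(sol[0] + ", " + sol[1]); // prints 500.0, 320.0
    }
}
```

In practice many more pairs would be used in a least-squares fit, exactly as the note above suggests.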

- Conceptually it is not hard to measure the image points, as we can look for them in the actual image data.
- In practice we want to find them automatically. There are various ways to do that, but a common approach is to design a *calibration object* with well-defined *corners* between light and dark regions. These can be found automatically in the image by methods similar to edge detection.
- Nowadays variations of *chessboards* are very common calibration objects, partly for this reason.

- It is conceptually harder to measure the locations of the object points in physical units like meters in camera frame.
- You could use a tape measure, but this would be subject to human error, and would be very tedious.
- Perhaps surprisingly, for a calibration object with known geometry—like a specific chessboard pattern—it is possible to simultaneously solve for both the camera intrinsic parameters *and* the relative pose of the chessboard to the camera.

- Fortunately OpenCV implements all of this in the function

  `cvCalibrateCamera2(objectPoints, imagePoints, pointCounts, imageSize, cameraMatrix, distCoeffs, rvecs, tvecs, flags)`

  which assumes a chessboard-style calibration object.
  - `objectPoints`: coordinates of points on the calibration object *relative to a frame on the object* (so this is determined by the design of the chessboard pattern)
  - `imagePoints`, `pointCounts`: observed pixel locations corresponding to all the object points, typically for multiple views of the object in different (unknown) poses
  - `imageSize`: size of the camera image in pixels
  - `cameraMatrix`: written on output to be the reconstructed camera matrix $A$ as in equation (3) above, which includes the four intrinsic parameters
  - `distCoeffs`: a vector of four or five additional camera intrinsic parameters that are also written on output; more info on them below (the number of coefficients calculated, four or five, is determined by the size of the matrix passed in to store them)
  - `rvecs`, `tvecs`: optional parameters, default NULL. If non-null, these are storage space to return the calibration object pose in camera frame for each of the provided views. More info on these below.
  - `flags`: various options to control the calibration algorithm can be specified here.

- Note: any calibration approach based only on camera images is limited to recovering $f_x$ and $f_y$ in units of pixels.
- It is not possible by such methods to measure $f$ in physical units like meters, or equivalently, to determine the physical size of a pixel.
- This can only be done if the focal length is known in physical units a priori, or by using a microscope to measure the spacing of pixels on the actual imaging chip.
- For related reasons, some knowledge about the physical shape of an object (such as the chessboard dimensions) is required in order for its pose to be unambiguously recovered by a monocular camera.
- This is true even if the camera can be moved around to observe the object from different viewpoints, unless some information about the geometric relationships of those viewpoints is externally supplied.
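The underlying scale ambiguity can be demonstrated numerically: scaling an object's size and distance by the same factor leaves its image unchanged, which is why physical scale cannot be recovered from images alone. An illustrative sketch:

```java
// Scale ambiguity: a 0.5 m object at 2 m and a 1.0 m object at 4 m
// project to the same pixel offset, so images alone cannot fix scale.
public class ScaleAmbiguity {
    static double project(double f, double X, double Z) {
        return f * X / Z; // pinhole projection, principal point omitted
    }
    public static void main(String[] args) {
        double near = project(500, 0.5, 2.0);
        double far  = project(500, 1.0, 4.0);
        System.out.println(near == far); // prints true
    }
}
```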

- In practice we use lens systems instead of actual pinholes.
- However, the projection model is nearly the same!
- Or at least we can account for the differences introduced by a real lens assembly in a pre-processing step called *undistortion*.
- Undistortion warps each actual camera image so that it appears as if it were acquired by a pinhole camera.
- Two common types of distortion are modeled: radial and tangential.
- *Radial distortion* effects are due to non-idealities in the way lenses are commonly ground.
- If uncorrected, they can make the image of a square object appear like a “pincushion”, with curved edges and rounded corners.
- OpenCV typically uses a simplified mathematical model for radial distortion where either two or three parameters ($k_1$, $k_2$, and optionally $k_3$) approximate the pincushioning effect so that it can be corrected.

- *Tangential distortion* effects are due to the fact that when the lens is assembled to the image sensor its *optical axis* is generally not perfectly perpendicular to the image plane.
- If uncorrected, they can make the image of a square object appear like a trapezoid, even when viewed straight-on.
- OpenCV typically uses a simplified mathematical model for tangential distortion where two parameters ($p_1$ and $p_2$) approximate the trapezoidal effect so that it can be corrected.
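For reference, here is a sketch of the commonly used radial-plus-tangential model in normalized image coordinates $x = X/Z$, $y = Y/Z$ (i.e. before multiplying by $f_x$, $f_y$ and adding $c_x$, $c_y$); the helper class is illustrative, OpenCV applies this model internally:

```java
// Radial (k1, k2, k3) plus tangential (p1, p2) lens distortion applied
// to normalized image coordinates (x, y).
public class Distortion {
    static double[] distort(double x, double y,
                            double k1, double k2, double k3,
                            double p1, double p2) {
        double r2 = x * x + y * y; // squared radius from the optical axis
        double radial = 1 + k1 * r2 + k2 * r2 * r2 + k3 * r2 * r2 * r2;
        double xd = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x);
        double yd = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y;
        return new double[] { xd, yd };
    }
    public static void main(String[] args) {
        // With all coefficients zero the model reduces to the identity.
        double[] d = distort(0.25, -0.5, 0, 0, 0, 0, 0);
        System.out.println(d[0] + ", " + d[1]); // prints 0.25, -0.5
    }
}
```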

- We will not cover the details here, but `cvCalibrateCamera2()` can also estimate these four or five distortion parameters given the same input data about chessboard corners as above.
- The results are returned in the `distCoeffs` parameter in the order $(k_1, k_2, p_1, p_2, k_3)$.
- The distortion parameters define a nonlinear warping of the actual camera image to a corrected *undistorted* image that would be seen by a camera with the same camera matrix $A$.
- OpenCV provides the function

  `cvInitUndistortMap(intrinsics, distortions, mapx, mapy)`

  to create a fast lookup datastructure for performing this nonlinear warping for every frame.
  - `intrinsics`, `distortions`: $3 \times 3$ and $4 \times 1$ or $5 \times 1$ OpenCV matrices, respectively (with or without $k_3$). These are the same as would be returned by `cvCalibrateCamera2()`.
  - `mapx`, `mapy`: 32-bit floating point single-channel images of the same size (width and height in pixels) as the camera images. These are storage space for the generated lookup datastructures. Each pixel $(u, v)$ in the undistorted image is generated by interpolating the color at location $(\mathtt{mapx}(u, v), \mathtt{mapy}(u, v))$ in the original camera image.

- `cvRemap(src, dest, map1, map2, flags, fillval)` can then be used to perform the remapping for each frame.
  - `src`, `dest`: the raw camera image and the undistorted image, respectively. Same size/depth/channels.
  - `map1`, `map2`: here, the precomputed `mapx` and `mapy` datastructures
  - `flags`, `fillval`: `flags` allows selection of the pixel interpolation algorithm; `fillval` is the value to use if the source pixel would have been outside the image

Here is a code snippet demonstrating the use of these APIs:

```java
IplImage uimg = null, mapx = null, mapy = null;

protected IplImage process(IplImage frame) {
    if (uimg == null) {
        uimg = cvCreateImage(cvGetSize(frame), frame.depth(), frame.channels());
        mapx = cvCreateImage(cvGetSize(frame), IPL_DEPTH_32F, 1);
        mapy = cvCreateImage(cvGetSize(frame), IPL_DEPTH_32F, 1);
        cvInitUndistortMap(intrinsics, distortions, mapx, mapy);
    }
    cvRemap(frame, uimg, mapx, mapy,
            CV_INTER_LINEAR + CV_WARP_FILL_OUTLIERS, cvScalarAll(0));
    return uimg;
}

protected void release() {
    if (uimg != null) { uimg.release(); uimg = null; }
    if (mapx != null) { mapx.release(); mapx = null; }
    if (mapy != null) { mapy.release(); mapy = null; }
    super.release();
}
```

TBD figures

- The camera matrix and distortion parameters are properties of the camera itself, independent of how it may be mounted on the robot.
- For some applications it can also be useful to know the *extrinsic* parameters of the camera, which can be defined as the homogeneous rigid transformation matrix $H$ taking camera frame to robot frame.
- With some care, the optional `rvecs`, `tvecs` outputs from `cvCalibrateCamera2()` can be used to recover $H$.
- Each of these is a list of $3 \times 1$ vectors, one per view of the calibration object.
- Each `tvec` is the translation vector $t$ giving the location of the origin of a coordinate system attached to the chessboard (typically corresponding to the lower left interior chessboard corner) in camera frame.
- Each `rvec` $r$ corresponds to a rotation matrix $R$ taking a right-handed 3D coordinate frame attached to the chessboard (typically $x$ right, $y$ down, $z$ into the chessboard) to camera frame.
- The function $r \mapsto R$ is called an *exponential map* or [Rodrigues transform](http://en.wikipedia.org/wiki/Rodrigues%27_rotation_formula). Geometrically, $r$ is a 3D vector that defines a spatial rotation: the axis of rotation is the direction of $r$ and the amount of rotation about that axis is the length $\|r\|$ in radians, measured CCW with the right hand rule.
- The OpenCV API `cvRodrigues2(src, dest, jacobian)` implements it.
  - `src`, `dest`: either $r$ and $R$, or vice-versa to compute the inverse function (also called a *log map*)
  - `jacobian`: if non-null then the Jacobian of the transformation is stored here
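A minimal sketch of the rotation an `rvec` defines, using the rotate-a-vector form of Rodrigues' formula (illustrative class, not the OpenCV implementation):

```java
// Rotate vector v by rvec r using Rodrigues' formula:
// v' = v cos(th) + (k x v) sin(th) + k (k . v)(1 - cos(th)),
// where th = |r| is the angle and k = r/th is the unit axis.
public class Rodrigues {
    static double[] rotate(double[] r, double[] v) {
        double th = Math.sqrt(r[0]*r[0] + r[1]*r[1] + r[2]*r[2]);
        if (th < 1e-12) return v.clone(); // near-zero rotation: identity
        double[] k = { r[0]/th, r[1]/th, r[2]/th };
        double[] kxv = { k[1]*v[2] - k[2]*v[1],    // cross product k x v
                         k[2]*v[0] - k[0]*v[2],
                         k[0]*v[1] - k[1]*v[0] };
        double kdv = k[0]*v[0] + k[1]*v[1] + k[2]*v[2]; // dot product k . v
        double c = Math.cos(th), s = Math.sin(th);
        double[] out = new double[3];
        for (int i = 0; i < 3; i++)
            out[i] = v[i]*c + kxv[i]*s + k[i]*kdv*(1 - c);
        return out;
    }
    public static void main(String[] args) {
        // A 90 degree rotation about z takes the x axis to the y axis.
        double[] v = rotate(new double[]{0, 0, Math.PI/2}, new double[]{1, 0, 0});
        System.out.printf("%.3f %.3f %.3f%n", v[0], v[1], v[2]); // prints 0.000 1.000 0.000
    }
}
```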

- `cvCalibrateCamera2()` computes these extrinsic parameters using a method called *homography estimation*.
- If you already know the intrinsics (camera matrix and distortion coefficients) then you can access just the algorithm for extrinsic chessboard pose by homography estimation with the API `solvePnP(objectPoints, imagePoints, intrinsics, distortions, rvec, tvec)`.

- To perform camera extrinsic calibration for our robot we have a special chessboard which includes markings so that it can be placed on the floor with a predetermined transformation $^{r}H_{o}$ from chessboard frame to robot frame.
- Let $(r, t)$ be an `rvec`, `tvec` pair for this chessboard as reported by either `cvCalibrateCamera2()` or `solvePnP()` and use these to construct

  $$^{c}H_{o} = \begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix} \qquad (5)$$

- Then

  $$H = {}^{r}H_{o} \left({}^{c}H_{o}\right)^{-1} \qquad (6)$$

- This extrinsic calibration need only be performed once for a given mounting of the camera on the robot.
- TBD figures
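Composing these transforms amounts to $4 \times 4$ matrix multiplication plus the rigid-transform inverse $(R, t)^{-1} = (R^\top, -R^\top t)$. A sketch with plain arrays (illustrative; a real implementation would use a matrix library), shown here verifying that a rigid transform times its inverse is the identity:

```java
// 4x4 homogeneous rigid transforms: multiplication and closed-form inverse.
public class Extrinsics {
    static double[][] mul(double[][] A, double[][] B) {
        double[][] C = new double[4][4];
        for (int i = 0; i < 4; i++)
            for (int j = 0; j < 4; j++)
                for (int k = 0; k < 4; k++) C[i][j] += A[i][k] * B[k][j];
        return C;
    }
    static double[][] invRigid(double[][] H) {
        double[][] inv = new double[4][4];
        inv[3][3] = 1;
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++) inv[i][j] = H[j][i]; // R^T
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++) inv[i][3] -= inv[i][j] * H[j][3]; // -R^T t
        return inv;
    }
    public static void main(String[] args) {
        // A hypothetical chessboard pose: 90 degrees about z plus a translation.
        double[][] cHo = { {0, -1, 0, 0.2}, {1, 0, 0, -0.1},
                           {0, 0, 1, 1.5},  {0, 0, 0, 1} };
        double[][] check = mul(cHo, invRigid(cHo)); // should be identity
        System.out.println(check[0][0] + " " + check[0][3]); // prints 1.0 0.0
    }
}
```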

- Why didn’t we need any of the above for the visual servoing task in lab 4?
- Because in that task, the object was at a fixed height in world (and hence robot) frame.
- That greatly simplifies the problem because the heading to the object is determined by the column of (the center of) its image, and the distance to the object is determined by the row of (the center of) its image.
- And we didn’t even need to precisely calculate the heading or distance in order to implement simple but effective bang-bang controllers.

- Given a calibrated camera, we can now formally give the equations for the location of the ball in robot frame given that it is at known height $h$ in meters.
- Let the image of the center of the ball be at pixel location $(u, v)$.
- Then the center of the ball must be along the ray starting at the origin in camera frame (the center of projection) and proceeding along the direction

  $$d_c = \begin{bmatrix} u - c_x \\ v - c_y \\ f \end{bmatrix} \qquad (8)$$

  from the camera through pixel location $(u, v)$ on the image plane at focal length $f$ (assuming square pixels) and into the scene.
- Transform the ray from camera frame to robot frame.
- The starting point $p$ of the ray in robot frame is the translation vector $t$ from $H$:

  $$p = t \qquad (9)$$

- The direction $d$ of the ray in robot frame is given by rotating $d_c$ by the $3 \times 3$ rotation matrix $R$ from $H$:

  $$d = R d_c \qquad (10)$$

- Let $(x_b, y_b, z_b)$ be the coordinates of the ball $b$ in robot frame.
- The ball is at an unknown *range* $\lambda$ in meters along the ray from the camera:

  $$b = p + \lambda \frac{d}{\|d\|} \qquad (11)$$

- Solve for $\lambda$ by intersecting this ray with the plane at height $h$ in robot frame:

  $$\lambda = \frac{(h - p_z) \|d\|}{d_z} \qquad (12)$$

- Finally plug the solution for $\lambda$ from (12) back into (11) to get the ball coordinates in robot frame ($z_b = h$ will come out by construction).
- TBD figures
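The chain (8)–(12) can be sketched end-to-end in plain Java. The camera mounting $(R, t)$ and all numbers below are hypothetical, chosen so the answer is easy to check by eye:

```java
// Locate a ball at known height h in robot frame from its image (u, v),
// given square-pixel intrinsics (f, cx, cy) and the camera-to-robot
// transform as rotation R (3x3) and translation t.
public class BallAtKnownHeight {
    static double[] locate(double u, double v, double f, double cx, double cy,
                           double[][] R, double[] t, double h) {
        double[] dc = { u - cx, v - cy, f };        // ray direction, camera frame (8)
        double[] d = new double[3];                  // rotate into robot frame (10)
        for (int i = 0; i < 3; i++)
            d[i] = R[i][0]*dc[0] + R[i][1]*dc[1] + R[i][2]*dc[2];
        double n = Math.sqrt(d[0]*d[0] + d[1]*d[1] + d[2]*d[2]);
        double lambda = (h - t[2]) * n / d[2];       // range along the ray (12)
        return new double[] { t[0] + lambda*d[0]/n,  // point on the ray (11)
                              t[1] + lambda*d[1]/n,
                              t[2] + lambda*d[2]/n };
    }
    public static void main(String[] args) {
        // Hypothetical camera looking straight down from 1 m up (robot z up):
        // camera x maps to robot x, camera y to robot -y, camera z to robot -z.
        double[][] R = { {1, 0, 0}, {0, -1, 0}, {0, 0, -1} };
        double[] t = { 0, 0, 1.0 };
        double[] b = locate(320, 240, 500, 320, 240, R, t, 0.05); // ball on axis
        System.out.printf("%.3f %.3f %.3f%n", b[0], b[1], b[2]); // prints 0.000 0.000 0.050
    }
}
```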
- If we know the ball’s physical diameter $D$ in meters then we can track it in a more general setting where it can move anywhere in 3D, not limited to a known height.
- Measure the diameter $d_{px}$ in pixels of the image of the ball.
- The basic pinhole projection model (2) enables us to solve for the *depth* $Z$ of the ball in camera frame by considering the projections of the leftmost and rightmost points on a diameter of the ball in physical units in camera frame:

  $$Z = \frac{f D}{d_{px}} \qquad (13)$$

- Intersect the ray with the plane at that distance in camera frame to get the range $\lambda$ to the ball:

  $$\lambda = \frac{Z \|d_c\|}{f} \qquad (14)$$

  and then plug the result from (14) back into (11).
- Or simply rearrange (2) to calculate the remaining ball 3D coordinates $(X, Y)$ in camera frame:

  $$X = \frac{(u - c_x) Z}{f}, \qquad Y = \frac{(v - c_y) Z}{f} \qquad (15)$$

  and then transform $(X, Y, Z)$ by $H$ to get the ball coordinates in robot frame.
- TBD figures