
Capturing depth: structured light, time of flight, and the future of 3D imaging

Beyond light field photography, this article examines several other methods of adding depth information to otherwise 2D photography, and why this ability is getting so much attention in the mobile device market.


Published on June 22, 2018

In a recent article, I looked at the demise of Lytro, maker of the first consumer “light field” camera, and what it meant for the future of this technology in mobile devices. As intriguing as some of its results might be, light field imaging isn’t the only option for capturing depth information and producing 3D images with mobile devices. One of the more interesting possibilities – one you might already be using – is the concept of “structured light,” a term that covers several related methods for adding depth information to otherwise ordinary “2D” photography.

Both light field photography and structured light have only become practical in the past decade or two, owing to the development of relatively inexpensive graphics processing hardware and sophisticated image processing algorithms.


Together, they’ve enabled the consumer-market use of computational photography methods, in which calculations take the place (and then some) of conventional optics in manipulating the light (data) that makes up the image. Using this approach, in which the data provided by digital image sensors is processed to derive additional information beyond what we see in the plain “snapshot,” permits simple camera hardware to deliver images that would have been impossible just a few years ago.

Structured light, in particular, is based on a principle that is fairly easy to understand. In addition to the camera itself, a structured light system adds a light source (a projector of some sort) to illuminate the object being imaged with stripes or similar patterns that are then “seen” by the camera. The regular geometry of this illumination is distorted by the surface of the object, and from this distortion a depth map of the object can be calculated. There’s no need for any of this to be visible to the user, either. The pattern can just as effectively be projected in invisible infrared (IR) light and still be readily picked up by the camera sensor.
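The distortion-to-depth step can be sketched with simple triangulation. The snippet below is a minimal illustration, not how any particular product works: it assumes an idealized, rectified pinhole setup in which the projector and camera sit side by side, and every name and number in it (`depth_from_pattern_shift`, the focal length, the baseline) is hypothetical.

```python
def depth_from_pattern_shift(x_camera_px, x_projector_px,
                             focal_length_px, baseline_m):
    """Estimate depth in meters for one projected pattern feature.

    Assumes (hypothetically) an ideal, rectified pinhole camera and projector
    separated horizontally by baseline_m, sharing one focal length in pixels.
    """
    # How far the feature appears shifted between where the projector
    # emitted it and where the camera saw it.
    disparity_px = x_camera_px - x_projector_px
    if disparity_px <= 0:
        raise ValueError("feature must be shifted by a positive number of pixels")
    # Similar triangles: nearer surfaces shift the pattern more.
    return focal_length_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Illustrative numbers only: 580 px focal length, 7.5 cm baseline.
    for shift_px in (14.5, 29.0, 58.0):
        z = depth_from_pattern_shift(300.0 + shift_px, 300.0, 580.0, 0.075)
        print(f"shift {shift_px:5.1f} px -> depth {z:.2f} m")
```

Repeating this for every identifiable feature in the projected pattern yields one depth value per feature, which is the depth map in its simplest form.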

Structured light imaging works by noting how a 3D object distorts a grid or other regular pattern of light projected onto it. (Image: KU Leuven)

You’ve very likely already seen this method at work; it’s the basis of one of the more popular gaming accessories introduced in recent memory, Microsoft’s Kinect line of motion sensors for its Xbox gaming consoles. (More correctly, this method was the basis of the original Kinect; with the introduction of the Kinect for Xbox One in 2013, Microsoft changed from an IR structured light system to a different depth map method, which we’ll look at in a moment.) If you look at an original Kinect, you’ll see what looks like two cameras near the center of the device, plus another optical component located well off to the left of center. That’s the IR source, and it projects a pattern of dots to be “seen” by the IR camera, a 640 x 480 monochrome sensor that’s the rightmost of the two center cameras. The other is a 1280 x 960 RGB camera, which captures full-color visible light imagery.

Microsoft Kinect motion- and depth-sensing game controller. The leftmost optical element is the IR projector. (Image: Microsoft)

The IR system, operating at 30fps, provided depth information on any object within a range of roughly four to 11 feet in front of the unit. This could be combined with the color camera’s data to effectively generate a limited 3D version of what was in the Kinect’s field of view. All of this cost only about $150 at launch.
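Combining the two cameras’ data amounts to turning each depth pixel into a colored 3D point. A rough sketch follows, assuming a pinhole camera model and depth and color images that are already aligned pixel for pixel. The function name, the intrinsics (`fx`, `fy`, `cx`, `cy`), and the tiny sample images are made up for illustration and are not the Kinect’s actual calibration.

```python
def depth_to_colored_points(depth_m, color_rgb, fx, fy, cx, cy):
    """Back-project an aligned depth/color image pair into colored 3D points.

    depth_m   -- rows of depth values in meters (0 means "no reading")
    color_rgb -- rows of (r, g, b) tuples, same shape as depth_m
    fx, fy    -- assumed focal lengths in pixels
    cx, cy    -- assumed principal point (image center) in pixels
    """
    points = []
    for v, row in enumerate(depth_m):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # skip pixels where the sensor returned no depth
            # Pinhole model: offset from the image center scales with depth.
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append(((x, y, z), color_rgb[v][u]))
    return points


if __name__ == "__main__":
    depth = [[2.0, 0.0],
             [0.0, 1.0]]
    color = [[(255, 0, 0), (0, 0, 0)],
             [(0, 0, 0), (0, 0, 255)]]
    for point, rgb in depth_to_colored_points(depth, color, 2.0, 2.0, 0.5, 0.5):
        print(point, rgb)
```

The result is a point cloud seen from a single viewpoint, which is why the 3D version is “limited”: surfaces facing away from the sensor are simply missing.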

Structured light is based on an easy-to-understand principle, one you may know from Microsoft's original Kinect sensor for Xbox or, more recently, from the iPhone X's Face ID sensor.

The Kinect for Xbox One used another method to produce data on the depth aspect of a scene. This model abandoned the IR-based structured light approach in favor of a time of flight camera. The basic hardware used in this method is very similar to the structured-light system: it just needs a light source and a camera. In this case, the light source flashes at regular intervals, and the individual pixels of the camera measure how long it takes the light to reach the subject at a given location, get reflected, and return, kind of like sonar. Since light travels at a very precisely known speed (covering about a foot every billionth of a second), measuring that time gives you the distance to the subject. Again, processor speeds only reached the point where this could be performed economically in consumer-market gear fairly recently. A 3GHz clock, for instance, ticks once every third of a nanosecond, during which light travels about four inches. Because that time covers the trip out and back, distances can be measured with an accuracy of about 2 inches, enough to get a pretty good idea of how a human body is oriented and what it’s doing.
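The arithmetic behind those figures is short enough to check directly. The sketch below assumes the simplest possible scheme, in which the sensor times a light pulse directly and one clock tick is the smallest time step it can resolve; real sensors may measure time differently, and the function names are illustrative.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0
METERS_PER_INCH = 0.0254


def distance_from_round_trip(round_trip_s):
    """Distance to the subject, given the out-and-back travel time of light."""
    # The light covers the distance twice, so halve the total path.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0


def range_resolution(clock_hz):
    """Smallest distance step resolvable if one clock tick is the time step."""
    return SPEED_OF_LIGHT_M_PER_S / (2.0 * clock_hz)


if __name__ == "__main__":
    # A subject whose reflection arrives 20 nanoseconds after the flash.
    print(f"distance: {distance_from_round_trip(20e-9):.2f} m")
    # Resolution at a 3GHz clock rate, in inches.
    step_in = range_resolution(3e9) / METERS_PER_INCH
    print(f"resolution at 3 GHz: {step_in:.2f} in")
```

Halving the clock rate doubles the smallest resolvable step, which is why this approach had to wait for fast, cheap timing hardware.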

Sony Xperia XZ2, showing a scan from their 3D Creator imaging app.

Sony also recently made some noise in the consumer 3D imaging area with the “3D Creator” app it introduced last year on its then-flagship Xperia XZ1 smartphone. This one is the closest to the “light field” approach discussed in the Lytro article last week. However, rather than capturing the image from multiple perspectives simultaneously, Sony asks the user to physically move the phone around to permit the camera to scan the object.

Besides that, the process is very similar. Sophisticated algorithms take the set of images captured from all angles and match up features to synthesize a 3D image. It’s somewhat time-consuming, and still far from perfect, but it shows yet another viable path to three-dimensional imaging.

But, so what?

Throughout its history, 3D imaging has been basically a gimmick. It shows up every so often in the entertainment industry to make a splash, and then rapidly fades from the public eye (as we covered here).

This sudden interest in 3D in the mobile market turns out to have very little to do with how TV and movies have used it in the past. Note that in all of the discussion so far, not a word has been said about capturing stereoscopic imagery (the traditional “3D” picture or movie) for direct viewing.

Instead, one of the biggest factors driving the addition of 3D imaging capabilities to mobile tech is the recent explosion of interest in virtual reality and augmented reality. A good VR experience relies on being able to produce all sorts of objects in convincing 3D — including yourself and your personal items, should you want to bring them into the virtual world you’re experiencing.

Of course, the creators of VR games, tours, and other such immersive environments can create breathtakingly realistic three-dimensional versions of Tokyo, Arkham Asylum, or the Millennium Falcon, but they have no idea how to put you, or your fellow VR travelers, there. You’re going to have to provide those images yourself.

Augmented reality, which places computer-generated images into the world around you, can also be vastly improved not only by capturing good models of everyday objects, but also by better understanding what your surroundings are really like in terms of depth.

Placing a CGI character on the real table in front of you is a lot less convincing when that character sinks a few inches into the table top, or walks through it. Adding accurate depth information to high-resolution photos or videos can also enhance device security, as more and more mobile devices turn to facial recognition and other biometric techniques to replace older forms of protection like passcodes and patterns.

Another recent development driving interest in 3D imaging is the rise of 3D printing technology at the consumer level. While professional — or even serious amateur — use of this tech requires far more accurate 3D capture of objects than what’s currently possible with smartphone-level imaging, a lot of home solid-print enthusiasts will be perfectly happy with what their structured-light or time-of-flight imaging systems can give them in their current state.


Quality keeps improving, too. Citing the VR and AR markets among the factors driving the growth of market interest in 3D computer vision, mobile device chip maker Qualcomm last fall announced its SLiM (Structured Light Module) turnkey 3D camera module. When used in conjunction with the company’s Spectra “image signal processor” parts, it delivers a claimed depth accuracy of down to 0.1mm.

Other efforts aimed at bringing high-quality depth imaging to smartphones are also underway. Caltech demonstrated a nanophotonic coherent imager (NCI) chip last year, which relies on an array of scanning laser beams to produce a depth map of objects within its field of view. So far it exists only as a tiny, low-resolution device, but Caltech researchers believe it could be scaled up to much higher resolution imagers and remain inexpensive enough for inclusion in consumer devices.

Given the level of interest and investment from major players in the industry, it’s pretty clear more than just a few people believe capturing depth in addition to the usual two dimensions will be a must-have feature for our mobile devices in the very near future. Don’t be too surprised if your next smartphone sees the world in all three dimensions — and even better than you do.

Let us know how important or useful you think this tech is for mobile devices in the comments below.