We were delighted to get the chance to talk cameras and 3D perception with Ed Niedermeyer and Kirsten Korosec on The Autonocast. This episode is jam-packed with information, so we've summarized the high-level takeaways below.
Beyond camera vs lidar
"If I was brought in to run this effort at Tesla tomorrow, the most important thing that I would be doing would not be restoring the radar or finding a lidar start-up to crown as the winner by putting them on every Tesla. I would move the cameras on the car, putting them in a different place." - Jason Devitt, CEO
There are two myths fueling a heated debate in the autonomous vehicle space:
- Cameras simply "guess" the distance to nearby objects using neural networks.
- Lidar is the only reliable sensor for measuring depth or for recovering accurate scene geometry.
Both are incorrect. Cameras with overlapping fields of view can measure depth from parallax using techniques called multi-view geometry or stereo.
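To make "measuring depth from parallax" concrete, here is a minimal sketch of the textbook relation for a rectified stereo pair, Z = f * B / d, where f is the focal length in pixels, B is the baseline (the distance between the two cameras), and d is the disparity in pixels. The function names and all numbers are illustrative assumptions, not a description of any production system.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth in metres for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity.
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_error(focal_px: float, baseline_m: float, depth_m: float,
                disparity_error_px: float) -> float:
    """First-order depth uncertainty: dZ ~= Z^2 / (f * B) * dd."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


if __name__ == "__main__":
    focal_px = 1000.0  # assumed focal length, in pixels
    # Assumed: a 0.5 m baseline and a 10 px measured disparity.
    print(depth_from_disparity(focal_px, 0.5, 10.0))

    # Assumed: a 0.25 px matching error on an object 50 m away.
    for baseline_m in (0.5, 1.0):
        print(baseline_m, depth_error(focal_px, baseline_m, 50.0, 0.25))
```

Under this simple model, doubling the baseline halves the depth error at a given range, which is one reason camera placement matters so much for a stereo or multi-view system.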
The origin of the camera myth
Companies like Tesla use monocular vision systems to power advanced driver assistance features. With this approach*, depth is estimated using neural networks that rely on prior knowledge about discrete objects (bikes, pedestrians, other cars) and their size, along with semantic cues (parallel lines appear to converge as they recede into the distance; points higher in the image are generally farther away, and so on). If the system encounters something it has never seen in training, it has no prior knowledge to draw on, and it will fail.
The loudest voices in perception have highlighted both the power and pitfalls of monocular techniques without considering the staggering potential of parallax, which becomes available as soon as more than one camera has an overlapping field of view.
*Monocular systems can also use motion parallax to recover depth information about stationary objects. Motion parallax does not work when the car is stationary, e.g. when you are in the middle of an unprotected turn and trying to gauge the distance to oncoming traffic, or just moving off from a parked position. It also does not give accurate returns on other moving objects.
The origin of the lidar myth
The original lidar versus camera showdown occurred during the DARPA Grand Challenges. Except, it wasn't much of a showdown. In the early 2000s, there was no viable solution based on computer vision.
"If I had ten years to program the software, I'd use cameras." - David Hall, founder of Velodyne, DARPA Grand Challenge 2004
Lidar was the right decision… eighteen years ago!
"Using cameras as the primary sensor for depth and semantic segmentation means more accurate results on both." - Jason Devitt
Our team of 20 scientists comes from companies like Argo, SRI, Google, Mapbox, and Apple. They have worked on the ML technology that runs on your iPhone, spent over a decade building 3D sensing applications for satellites, designed camera systems for big AV companies, and more. Our team has spent the last seven years developing a camera-only perception platform called VIDAS.
VIDAS enables vehicles to understand their surroundings in 3D and in real time using automotive-grade cameras, with applications from in-dash displays to driver assistance and autonomy. More information about our technology can be found here.
Our flagship customers are in robotics and defense. This year, we are releasing a devkit and SDK for Tier 1 automotive suppliers and OEMs. Most of these devkits have been reserved. If you're interested in reserving one, you can start by booking a live demo with us. And, of course, give our episode a listen.