Exclusive: Autonomous vehicle vision company Oculii partners with one of China’s largest automakers

The Geely partnership is a significant step for Oculii—we talked to the startup's CEO to get the lowdown on its tech

July 21, 2021


There’s a major debate in the autonomous vehicle space over which vision tech will get us to true autonomy: regular ol’ cameras or lidar (laser-based sensors that measure distance with pulses of light).

But the conversation generally skips right over radar, which emits radio waves rather than light and is the cheapest and most ubiquitous detection system available today.

Enter Oculii: Founded in 2015, the US-based startup uses AI to dress up radar, a sensing system more than 80 years old, and to help autonomous vehicles “see.” Its name, a spin on the plural of “oculus,” was inspired by the company’s goal to serve as the eyes for future autonomous systems. Its partnerships include Great Wall Motors, Baidu, and Nvidia, and it has raised more than $76 million to date.

And Oculii’s cofounder and CEO Steven Hong exclusively told us the company just struck a partnership with Geely, one of China’s largest automakers—and the startup’s biggest automaker partner yet.

Geely owns Volvo and has invested $9 billion in Daimler, the maker of Mercedes-Benz.
Last year, Geely linked up with Intel’s autonomous driving unit, Mobileye, for advanced ADAS.

Through the partnership, Geely will integrate Oculii’s technology into cars it makes in China, though Oculii wasn’t able to provide further details on the extent or length of the partnership at this time.

But we did get more information about how exactly Oculii’s tech works, straight from Hong. Read on for our discussion about the history of radar, why it makes sense to repurpose the old tech for a new job, and how augmented radar compares to lidar.

In your view, where does radar fit into the perception stack, and why is it less talked about than lidar?

It's pretty interesting: Any autonomous system—whether it's a self-driving car, robot, or self-flying drone—no matter how incredible its intelligence is, it has to be able to perceive the world around it in order to interact safely with all of us. So perception is the tip of the spear when you think about how an autonomous system thinks, sees, and acts.

There are three main modalities for perception. There are camera-based systems; there are lidars, which are based on sending out pulses of light and ranging with that; and then radar, which uses radio waves to create environmental information. Typically, the optical-based systems—camera and lidar—are the most talked about, because traditionally those two have much better spatial resolution than radar. You can see things in more detail and clarity, and you can perceive things the same way a human does.

Now, radar has been used for decades, especially in our automotive systems, for enabling critical safety features like cruise control and emergency braking. There are millions of radars already deployed in vehicles and on the road today. But those radars traditionally have been very low resolution—even though they're very cost-effective and very efficient, you can't really see very much because of the poor resolution. Everything’s blurry.

How does Oculii’s technology fit in, as far as those existing radar sensors?

The technology that we're developing enables those same radar sensors—the ones that are already mass-market-proven and deployed in the millions—to be up to a hundred times better from a resolution and a performance standpoint. That enables radar to be at the same type of spatial resolution and performance as camera and lidar, but these radars are two-to-three orders of magnitude cheaper—and already market-proven and deployed on cars. So what this software enables is autonomy and safety on a mass scale.

Today, if you look at lidars, they're probably too expensive to put on your passenger consumer vehicle. You'll see them on the Waymo self-driving cars in Silicon Valley [Ed. Note: and in the Phoenix, AZ, suburbs] that cost hundreds of thousands of dollars, but you won't see them on your Honda Civic that you're actually going to drive on a daily basis. That’s because the lidar technology today is still too expensive and unproven to be on a passenger [consumer] vehicle.

But that Honda Civic already has several radars. And by adding in our software, we can now enable it to have similar types of resolutions, similar types of performance—and in the future, even potentially superior performance—compared to an optical sensor. So it doesn't necessarily change the cost-size-power envelope that makes it mass manufacturable and scalable.

Can you talk about how radar has changed in the past 70-odd years, since World War II-era radar technology?

Every radar that's ever been built over the last decade-and-a-half has followed a very similar principle, in the sense that radars are traditionally what we call “dumb” sensors. They send the same signal out over and over again; it’s constant, repetitive, and it never changes. As a result, you can get great performance from a radar, but in order to get great performance, the radar's got to be big. It's got to have thousands, if not tens of thousands, of antennas. If your radar is big enough and the aperture is large enough, then you can deliver resolution and performance that is orders of magnitude better than anything you can deliver optically.

So the interesting thing is if you look at the military—the Air Force, the Navy, or even space-based satellites looking down onto the ground—radar actually often provides superior resolution and performance to anything optical, but in order to get that performance, it’s got to be massive. And the size also results in higher power costs, which makes it unattractive for automotive. That’s always been the compromise in radar design: You can build a great radar with very high performance if you're willing to spend the money to have more antennas, a larger size, and a device that consumes more power.

But to be honest, the reason why there are millions of radars on cars and on the road today is because those radars are very small, compact, and cheap, with very few antennas. So you have this dichotomy of trade-offs that you really have to balance.

It’s why automotive systems have that cost-size-power envelope: These sensors cost on the order of $50 or so. In order for four-to-six of them to go on your Honda Civic, the radar sensor has to be cheap enough—you can’t have it cost up to 30% of the cost of the car. That’s why the radars currently on cars are low performance and blurry. It's because they're limited in terms of physical size and number of antennas.
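Ed. Note: The size-versus-resolution trade-off Hong describes can be illustrated with the standard rule of thumb that a radar's angular resolution is roughly its wavelength divided by its aperture width. The sketch below is illustrative only. The 77 GHz carrier (a band commonly used by automotive radar) and the aperture widths are assumptions that do not come from the interview, and the function names are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def angular_resolution_deg(freq_hz, aperture_m):
    """Rule-of-thumb beamwidth in degrees: wavelength / aperture (in radians)."""
    wavelength_m = C / freq_hz
    return math.degrees(wavelength_m / aperture_m)

def cell_width_m(freq_hz, aperture_m, range_m):
    """Approximate cross-range width of one resolution cell at range_m."""
    return range_m * math.radians(angular_resolution_deg(freq_hz, aperture_m))

FREQ_HZ = 77e9  # assumed carrier frequency, not stated in the interview
for aperture_m in (0.05, 0.5, 5.0):  # hypothetical aperture widths in meters
    beam = angular_resolution_deg(FREQ_HZ, aperture_m)
    cell = cell_width_m(FREQ_HZ, aperture_m, 100.0)
    print(f"aperture {aperture_m} m: ~{beam:.2f} deg, ~{cell:.2f} m wide at 100 m")
```

Under this approximation, a tenfold wider aperture gives a tenfold finer beam, which is why a palm-sized sensor produces a blurrier picture than a large military array.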

How exactly does your software fit in, in terms of making those existing sensors more intelligent?

Our software breaks this fundamental design choice in radar, which is that you should send the same signal out over and over.

We use an AI-based adaptive waveform that sends different information at different times and learns from the environment so that you're always optimally encoding the information that you need, depending on what scenario you're in. This is a very big shift in the radar paradigm, but it’s commonplace in the communication industry, the field in which I did my PhD at Stanford. Until about six years ago, I didn't know anything about radar.

In communications, we've always done adaptive modulation and coding; that’s a big reason why, over the past two or three decades, you've seen phones go from kilobit per second dial-up speeds to gigabit speeds with 5G, even though you don't have a bigger phone with thousands of antennas. The reason this is possible is because the processing has gotten more sophisticated, and you embed different information at different times to take advantage of the channel.
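Ed. Note: Adaptive modulation and coding, in its simplest form, means picking the densest signaling scheme the current channel quality can support. The sketch below is a minimal illustration of that idea. The SNR thresholds are made-up round numbers rather than values from any wireless standard, and `SCHEMES` and `pick_scheme` are hypothetical names.

```python
# (minimum SNR in dB, scheme name, bits carried per symbol)
# Thresholds are illustrative assumptions, not taken from any standard.
SCHEMES = [
    (0.0, "BPSK", 1),
    (7.0, "QPSK", 2),
    (14.0, "16-QAM", 4),
    (21.0, "64-QAM", 6),
]

def pick_scheme(snr_db):
    """Return (name, bits_per_symbol) for the densest scheme the SNR supports.

    Falls back to the most robust scheme when the SNR is below every threshold.
    """
    _, name, bits = SCHEMES[0]
    for threshold_db, candidate, candidate_bits in SCHEMES:
        if snr_db >= threshold_db:
            name, bits = candidate, candidate_bits
    return name, bits

for snr_db in (3.0, 15.0, 30.0):
    print(snr_db, pick_scheme(snr_db))
```

The transmitter hardware stays the same in every case. Only the choice of what to send changes with the measured channel, which is the idea Hong says Oculii carried over to radar.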

So we brought that concept from communication into the radar side, and effectively unlocked the capabilities of radar—using software to expand resolution as opposed to more antennas. This approach allows us to deliver higher resolution for any radar platform, whether it's the small radars on your car for safety systems, like cruise control and collision warning, or the high-end radars that are being built for the next generation of autonomous platforms, or even the military systems with already large antenna arrays.

Our radars natively sense how fast, and in what direction, every single point that they measure is moving. In combination, these allow for a 360-degree, high-resolution, and high-performance perception stack, which is two or three orders of magnitude cheaper than a comparable system using light-based ranging, like lidar.

We get a lot of questions about how it compares to lidar. Lidars today, although very expensive, are also quite good, and they provide really good resolution, particularly from a distance of 0-to-75 meters. I would say that from a resolution standpoint, our radar is still not as good in the 0-to-50 meter range—but from the 50-to-450- or 500-meter range, our radar actually outperforms even the median and high-end lidars.

Our audience is relatively well-versed in AI—can you describe how exactly the AI model works?

This is a very different type of AI than has been traditionally used in the camera-based system, for example. Camera, and actually even audio-based AI, is completely passive—in the sense that you can collect a ton of data and then scrape it offline, and because your cameras and microphones are all passive devices, you're just listening. Your AI is completely one-way receiving, and all of the information being received doesn't depend on anything actively happening in the environment.

On the other hand, our AI is what we call “in-the-loop AI.” The way it works is we use an adaptive active transmission that is in the loop making decisions on what to send out based on what it just received. So this type of AI is very different because you have to not just make intelligent decisions on what you receive, but then you have to actively use that information to decide what you do next.
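Ed. Note: The contrast between a fixed transmitter and one that is "in the loop" can be shown with a toy example. This is not Oculii's algorithm, which the interview does not detail. It is a simple hill-climbing rule in a made-up environment where the echo is strongest when the transmitted phase matches an unknown target phase. All names and parameters are hypothetical.

```python
import math

def echo_strength(tx_phase, target_phase):
    """Toy environment: echo peaks at 1.0 when tx_phase equals target_phase."""
    return math.cos(tx_phase - target_phase)

def fixed_transmitter(target_phase, steps=50):
    """Traditional behavior: send the same waveform (phase 0) every frame."""
    return [echo_strength(0.0, target_phase) for _ in range(steps)]

def in_the_loop_transmitter(target_phase, steps=50, step_size=0.2):
    """Closed loop: choose each frame's phase from what the last frame returned."""
    phase, direction = 0.0, 1.0
    last = echo_strength(phase, target_phase)
    history = [last]
    for _ in range(steps - 1):
        phase += direction * step_size
        current = echo_strength(phase, target_phase)
        if current < last:
            # The echo got weaker, so search in the other direction next frame.
            direction = -direction
        last = current
        history.append(current)
    return history

fixed = fixed_transmitter(1.0)
adaptive = in_the_loop_transmitter(1.0)
print(f"fixed final echo: {fixed[-1]:.3f}, adaptive final echo: {adaptive[-1]:.3f}")
```

The fixed transmitter receives the same echo on every frame, while the closed loop settles near the strongest echo because each transmission depends on the previous reception.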

So this type of AI, at a high level, has to be trained and modeled differently. Because of the type of systems that we deploy into—it’s embedded into the tiny, cheap, low-cost, lightweight radars inside the car, which also have limited memory and compute—it also has to be super efficient when implemented, because it can't go on these super computers that are in the cloud.

If you look at the entire radar landscape, we are the only company in the entire world doing radar like this. Every other startup, every other big company, is trying to improve radars by adding more antennas. We are the only ones using this adaptive type of waveform to improve the resolution, both spatially and in sensitivity and in range and field of view.

So you’re saying the system makes decisions based on what it receives to send out slightly different waveforms. Can you elaborate on how this is different from a traditional system?

So if you look at a traditional system, the reason why it requires so many antennas is because each of the antennas is a slightly different measurement of the same signal at the same time. So the idea is that you send the same signal out over and over repetitively, but you measure it a thousand times. If you have a thousand antennas, and each measurement is slightly different, then that slight change in “phase” is how you determine the direction of the target and where it's coming from.
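Ed. Note: The link between antenna-to-antenna phase and target direction can be sketched for a textbook uniform linear array. This illustrates the conventional approach Hong describes, not Oculii's software. Half-wavelength antenna spacing, a single far-field target, noise-free samples, and the sign convention are all assumptions, and the function names are hypothetical.

```python
import cmath
import math

def array_response(theta_deg, n_antennas, spacing_wavelengths=0.5):
    """Complex samples across a uniform linear array for one far-field target."""
    # Phase step between neighboring antennas: 2*pi * (d / wavelength) * sin(theta)
    dphi = 2 * math.pi * spacing_wavelengths * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * dphi * k) for k in range(n_antennas)]

def estimate_angle_deg(samples, spacing_wavelengths=0.5):
    """Recover the arrival angle from the average antenna-to-antenna phase step."""
    # Each product b * conj(a) has a phase equal to the step between neighbors.
    acc = sum(b * a.conjugate() for a, b in zip(samples, samples[1:]))
    dphi = cmath.phase(acc)
    return math.degrees(math.asin(dphi / (2 * math.pi * spacing_wavelengths)))

samples = array_response(20.0, n_antennas=8)
print(f"estimated angle: {estimate_angle_deg(samples):.2f} degrees")
```

With noisy real-world signals, more antennas mean more phase steps to average and a wider aperture, which is why conventional designs add hardware to sharpen the angle estimate.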

So what we’re doing is not just sending the same signal, but embedding different phase at different times. What that means is that instead of sending the same constant signal, every waveform we send out has a slightly different phase that we offset at slightly different times based on what we’ve received. The whole concept here is, rather than having physical antennas take a thousand measurements, we use the software to generate additional information that we don't have physically.

If you're familiar with computational photography, it’s a little like that. For example, in your iPhone, each of your lenses is giving you a slightly different field of view, a slightly different resolution, aperture, lighting condition scenario, and more, but your system is taking the data from all three of these, then combining it in such a way that you're getting much more information out of the three lenses than you would out of each of them individually added together.

What we’re doing here is kind of similar in the sense that each of the antennas is giving us something slightly different. And in some of the computational photography systems, particularly in the video-based systems, they use that information to then actively adjust aperture, field of view, the different settings in the next frame of capture—so that you can focus on the target in particular, or you can blur out the target in a different area. In many ways, this is similar because we're using information to change what we then acquire in the next frame.

So in a way, it’s like those topographical maps with different layers on tracing paper that you can choose to overlay, right? So maybe one layer shows height of land and one shows depth of sea, and so on, and when you lay them all on top of each other, they may tell you something new about the topography. Plus, you can look closer at one area if you want more information on it.

That's a great way to describe it. Each one of these measurements is like a different “sheet,” in your analogy, and each sheet individually doesn't give you enough information. But when you put it all together, you get the complete picture, which is much richer than each one of those sheets can deliver independently. That collective combination will give you more information, since each sheet is giving a different take on the environment.

And you’re saying traditional radar is like just one of those sheets, correct? And since it’s sending out the same signal over and over, you don’t have the ability to “zoom in” on one area to gain more information—or, in this case, the target that’s approaching from a certain direction?

Exactly. Most of the big radar manufacturers, they're very hardware-centric, so they've always thought about this problem from a hardware perspective. If you look at the radars in your car, for example, they lock the software five years before it ships, and it's the same software—from when it was in the factory to when it’s obsolete 20 or 30 years later. The software never changes, adapts, or learns. But now, you’re seeing companies like Tesla making the car a software system. They’re doing constant updates and adjustments.

So we’ve seen this gradual shift toward a software-centric type of architecture. That’s something really exciting in the automotive industry right now.

This interview has been edited for length and clarity.
