Skip to content

Ask not what the compiler can do for you

NarsGNA edited this page Jun 21, 2023 · 4 revisions

By Sampras Lopes, Nishant Joshi & Sanchith Hegde

Everyone knows that Rust is the language of the gods. Here’s how we still managed to shoot ourselves in the foot. We were gearing up for a release and had just deployed to our internal test environment. We were not particularly seeking any adventure that evening but our application promptly crashed. We checked the logs to quickly find out that we had reached the thread stack limit.

On redeploying after enabling backtraces(RUST_BACKTRACE = 1), we learnt that Rust does not print a backtrace for stack overflows. Completely clueless, we rolled back to the latest stable version.

The code was working fine locally. So, we tried reproducing the issue in another test environment, the failures were intermittent and random, adding to our woes. We decided to get our hands dirty and run the code on rust-lldb with restricted stack size. After multiple runs, we saw some of the frames had many KBs of stack usage.

We tried bisecting through the commits and after a long agonizing wait, were able to locate the commit causing the issue.

Dormammu, I've Come to Bargain

We reached the offending function and as it turned out, the bug was literally one word long. See if you can spot it.

Consider the example below:

struct PsqlWrapper;

struct KafkaWrapper {
	inner: PsqlWrapper
}

#[async_trait::async_trait]
trait DBInterface {
	async fn do_something_first(&self);
	async fn do_something_second(&self);
}

#[async_trait::async_trait]
impl DBInterface for PsqlWrapper {
	async fn do_something_first(&self) {
		// Do some Psql related stuff
	}
	async fn do_something_second(&self) {
		// Do some Psql related stuff
	}
}

#[async_trait::async_trait]
impl DBInterface for KafkaWrapper {
	async fn do_something_first(&self) {
		// Do some kafka related stuff ...
		self.inner.do_something_first().await
	}


	async fn do_something_second(&self) {
		self.do_something_second().await
	}
}

We have 2 structs here. The PsqlWrapper is a struct that allows us to connect to the Postgres Database. In addition, we implemented a KafkaWrapper around the PsqlWrapper. Inside the KafkaWrapper, we generated a kafka event where needed, and intended to call the underlying function from the PsqlWrapper. So, we just used PsqlWrapper as a field within KafkaWrapper.

But instead of calling the function from the PsqlWrapper, we ended up calling the one from the KafkaWrapper. So, the fix here was simple.

- // Instead of
- self.do_something_second().await
+ // It should be
+ self.inner.do_something_second().await

We caused an unconditional recursion unbeknownst to the Rust compiler. What?

There was no reason to suspect the Kafka wrapper because the failing API relied on do_something_second which we knew didn’t add any logic at all. We also used the PsqlWrapper directly in the local setup. That’s why it worked fine locally.

Who’s to blame here?

Mostly us, but we wanted to understand how this evaded the ever watchful rust compiler. In cases of unconditional recursion, Rust provides us with a helpful warning indicating that there is a recursion. But, in this case neither Rust nor Clippy gave us any hint. Why is that the case? Is it because of async? turns out, it isn’t. In the case of async, Rust is quite strict. It throws an error at compile time. How did rust fail to catch this problem? It's how the async is implemented here that allowed the issue to sneak through.

async_trait is the second culprit under investigation. Currently, Rust on its own doesn’t support async traits, so someone came up with a clever way to achieve it. (GG dtolnay/async_trait). They used the dyn Future. Fun fact, Rust itself recommends using the dyn Future if we want to achieve recursion. However Rust doesn’t warn us about unconditional_recursion when we use it.

async fn do_something() {
    do_something().await;
}

/*
error[E0733]: recursion in an `async fn` requires boxing
  --> src/main.rs:10:25
   |
10 | async fn do_something() {
   |                         ^ recursive `async fn`
   |
   = note: a recursive `async fn` must be rewritten to return a boxed `dyn Future`
   = note: consider using the `async_recursion` crate:[ https://crates.io/crates/async_recursion](https://crates.io/crates/async_recursion)
*/

If you’re curious, here’s the general idea behind how async_trait (the crate) achieves async trait (the behavior). It takes a function that returns impl Future<Output = …> and converts it into Pin<Box<dyn Future<Output = …>>>. This acts like a concrete type and Rust allows it to be the return type of a function inside traits.

(Pin<Box<...>> just pins the internal value to a specific location in memory, and prevents it from moving)

A function that returns an impl Future:

fn do_something() -> impl Future<Output = ()> {
    async { do_something().await }
}

/*
error[E0720]: cannot resolve opaque type
 --> src/main.rs:7:22
  |
7 | fn do_something() -> impl Future<Output = ()> {
  |                      ^^^^^^^^^^^^^^^^^^^^^^^^ recursive opaque type
8 |     async { do_something().await }
  |     ------------------------------
  |     |       |
  |     |       async closure captures itself here
  |     returning here with type `[async block@src/main.rs:8:5: 8:35]`
*/

A function that returns a dyn Future:

fn do_something() -> Pin<Box<dyn Future<Output = ()>>> {
    Box::pin(async { do_something().await })
}

/*
    Finished dev [unoptimized + debuginfo] target(s) in 0.02s
*/

However, it’s clear that this is something that was overlooked while writing as well as reviewing our code. Got to love a language that makes you trust it so much.

It's a compiler not a Jedi, don't expect it to read minds.

So, when we talk about Rust being very secure, there comes a point where one must ponder about just how much responsibility we can leave on the compiler. Even with all the brilliant features that the compiler provides there are always some things that the compiler might not catch. This is where the precision of the developer and the keen eye of the reviewer matter the most.