← Main

Maintainable string and bytes pre-allocation in Rust

by David Sherret

A common performance optimization in software development is to pre-allocate strings/bytes before appending to them. In Rust, failing to do this may cause the implementation of std::string::String or std::vec::Vec to frequently reallocate bytes internally to deal with its growing size, which is slow. Additionally, the amount of bytes we end up with at the end may be way more than we actually need.

Why we pre-allocate

Take the following code:

fn main() {
  let mut text = String::new();
  let spaces = " ".repeat(100); // a 100 byte string
  println!("Len: {}, Capacity: {}", text.len(), text.capacity());
  for _ in 0..9 {
    text.push_str(&spaces);
    println!("Len: {}, Capacity: {}", text.len(), text.capacity());
  }
}

When we run this, we can see the string being reallocated often as the length increases and in the end we're left with an over-allocated string of 1600 bytes instead of 900:

Len: 0, Capacity: 0
Len: 100, Capacity: 100
Len: 200, Capacity: 200
Len: 300, Capacity: 400
Len: 400, Capacity: 400
Len: 500, Capacity: 800
Len: 600, Capacity: 800
Len: 700, Capacity: 800
Len: 800, Capacity: 800
Len: 900, Capacity: 1600

However, when the capacity is correctly pre-allocated:

fn main() {
  let mut text = String::with_capacity(900);
  let spaces = " ".repeat(100); // a 100 byte string
  println!("Len: {}, Capacity: {}", text.len(), text.capacity());
  for _ in 0..9 {
    text.push_str(&spaces);
    println!("Len: {}, Capacity: {}", text.len(), text.capacity());
  }
}

We only allocate text once and end up with a string equal to its capacity:

Len: 0, Capacity: 900
Len: 100, Capacity: 900
Len: 200, Capacity: 900
Len: 300, Capacity: 900
Len: 400, Capacity: 900
Len: 500, Capacity: 900
Len: 600, Capacity: 900
Len: 700, Capacity: 900
Len: 800, Capacity: 900
Len: 900, Capacity: 900

Problem

Seen this kind of code before?

let capacity = items
  .iter()
  .filter_map(|i| i.maybe_name.as_ref())
  .enumerate()
  .map(|(i, name)| if i > 0 { 2 } else { 0 } + name.len())
  .sum::<usize>();
let mut text = String::new();
text.try_reserve_exact(capacity)?;

for (i, name) in items
  .iter()
  .filter_map(|i| i.maybe_name.as_ref())
  .enumerate()
{
  if i > 0 {
    text.push_str(", ");
  }
  text.push_str(name);
}
debug_assert_eq!(text.len(), capacity);

The above code:

  1. Calculates the byte capacity of the string ahead of time.
  2. Allocates bytes with that capacity, returning an error when it doesn't have enough memory to allocate.
  3. Builds up the final string without causing additional allocations of text.
  4. Finally, it does a debug assertion to ensure the final text length matches the capacity we calculated ahead of time.

Although this code is performant, it has a couple of problems:

  1. It's complicated.
  2. No single source of truth.
    • The capacity calculation code could get out of sync with the code that builds the string... the debug assertion helps, but not if all scenarios aren't tested in debug.

Solution

A solution to this problem is to make the code to calculate the capacity the same as the code to build up the string. I've rolled this up into a crate that makes it easy: capacity_builder

// same functionality as the above code, but simpler
use capacity_builder::StringBuilder;

let text = StringBuilder::<String>::build(|builder| {
  for (i, name) in items
    .iter()
    .filter_map(|i| i.maybe_name.as_ref())
    .enumerate()
  {
    if i > 0 {
      builder.append(", ");
    }
    builder.append(name);
  }
})?;

This runs the closure twice: once to compute the capacity and a second time to build the string. The final string will have a length equal to its capacity and will never reallocate itself while it's being built.

Some features:

  • Prevents allocations in the closure by only accepting values by reference (possible thanks to Rust's amazing borrow checker)
  • Numbers can be appended (or anything that implements capacity_builder::StringAppendable)
  • Can be made to work with any string type and not just std::string::String
  • Building up bytes is possible via capacity_builder::BytesBuilder

I've been integrating this into Deno's codebase and it's enabled us to start pre-allocating strings/vectors in complex cases that previously weren't maintainable.

If you have any suggestions or run into any issues, please open an issue on the project's GitHub: https://github.com/dsherret/capacity_builder