Wrapping Rust Structs as Ruby Objects Using the `TypedData` Trait

The Translation Problem: Bridging Ruby and Systems Languages

In the mid-19th century, as the first international telegraph networks were being constructed, operators faced a fundamental problem: they spoke different languages. A message sent in French had to be carefully translated into English at the border, a manual process that was slow, error-prone, and dependent on specialized clerks. If a clerk made a mistake or lost track of a message, the information was gone forever.

This is, as it so happens, a fairly accurate description of the challenges we face when bridging high-level languages like Ruby with systems languages like C or Rust. Ruby and Rust speak very different languages when it comes to memory management. When you want to pass a complex data structure — like a struct — between them, you need a careful translation process.

Historically, wrapping a low-level C struct within a Ruby object required writing a C extension and relying on macros like Data_Wrap_Struct or, later, TypedData_Wrap_Struct. This approach provided a way to bypass Ruby’s performance bottlenecks by dropping down to a compiled language. However, it also introduced significant maintenance burdens and security risks to the codebase.

When you wrap a C struct, you are responsible for bridging the gap between C’s manual memory management and Ruby’s garbage collector. You must explicitly define how Ruby should allocate the memory, how it should free the memory when the object is no longer needed, and — critically — how it should traverse the struct to find any embedded Ruby objects during the garbage collection marking phase.

In a traditional C extension, this requires defining an rb_data_type_t struct with a series of callback functions:

static const rb_data_type_t my_struct_type = {
    "my_struct",
    {
        my_struct_mark,
        my_struct_free,
        my_struct_memsize,
    },
    0, 0, RUBY_TYPED_FREE_IMMEDIATELY
};

A single mistake in these callback functions — such as forgetting to mark an embedded VALUE or freeing memory that Ruby still references — often leads to memory leaks or segmentation faults that are notoriously difficult to debug in a production environment.

When it comes to wrapping low-level structures in Ruby objects, there are a few possibilities besides traditional C extensions. One could use FFI (Foreign Function Interface) to bind directly to a shared library. FFI, though, still requires careful manual memory management and carries runtime overhead. Another option is a modern Rust extension framework. Some developers, myself included, prefer the philosophy and compile-time safety of the magnus crate, so that is the one we will be discussing here.

Introducing the `TypedData` Trait

To expose a Rust struct to Ruby, the magnus crate provides the TypedData trait. You can think of TypedData as the modern, memory-safe equivalent of TypedData_Wrap_Struct. It defines the contract between your Rust data and the Ruby runtime.

Instead of requiring hand-written C callbacks, magnus generates the integration code through procedural macros. When you implement the TypedData trait for a Rust struct, you instruct Ruby on how to safely encapsulate that struct within a Ruby object. When the Ruby object is garbage collected, the underlying Rust struct is dropped through Rust’s normal ownership rules, which removes the most common sources of leaks and double frees found in hand-written free callbacks.
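To make the drop guarantee concrete, the following std-only sketch models the mechanism in miniature: a value is boxed, its raw pointer is handed to a foreign owner, and a free callback later rebuilds the Box so that Drop runs exactly once. This is a simplified illustration under stated assumptions, not magnus’s actual source; the names Payload, wrap, and free_callback are hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many times a Payload has been dropped (demonstration only).
static DROPS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for a Rust struct wrapped by a Ruby object.
struct Payload {
    values: Vec<f64>,
}

impl Drop for Payload {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

// Hand ownership to the "foreign" side as a raw pointer, the way a
// wrapper object would store it.
fn wrap(payload: Payload) -> *mut Payload {
    Box::into_raw(Box::new(payload))
}

// What a free callback boils down to: rebuild the Box and let it drop.
// Safety: `ptr` must come from `wrap` and must not be used afterwards.
unsafe fn free_callback(ptr: *mut Payload) {
    unsafe { drop(Box::from_raw(ptr)) }
}
```

Until free_callback runs, the Payload stays alive even though no Rust variable owns it; once it runs, the Vec inside is released along with the struct. The only unsafe step is the pointer round trip, which is the part a binding library can write once on your behalf.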

Building a Practical Example

Let’s examine how to implement this in practice. Imagine we are building an application that processes geographic coordinates. We want to perform complex distance calculations quickly in Rust, but we need to interact with these coordinates as standard objects in Ruby.

First, let’s look at the Rust struct. We use the #[magnus::wrap] macro to automatically implement the TypedData trait.

use magnus::{class, define_class, function, method, prelude::*, Error};

#[magnus::wrap(class = "GeoCoordinate", free_immediately, size)]
struct GeoCoordinate {
    latitude: f64,
    longitude: f64,
}

impl GeoCoordinate {
    fn new(latitude: f64, longitude: f64) -> Self {
        Self { latitude, longitude }
    }

    fn distance_to(&self, other: &GeoCoordinate) -> f64 {
        // A conceptual distance calculation for demonstration
        ((self.latitude - other.latitude).powi(2) + 
         (self.longitude - other.longitude).powi(2)).sqrt()
    }
}

The #[magnus::wrap] macro is the cornerstone of this integration. The class = "GeoCoordinate" argument specifies the name of the Ruby class that will wrap our struct. The free_immediately and size arguments tell Ruby’s garbage collector how to manage the object’s lifecycle: when the struct may be freed, and how much memory it occupies.

Of course, defining the struct is only the first part of the process. We also need to register the class and its methods with the Ruby runtime so that they can actually be invoked from our Ruby code.

We do this within a function annotated with #[magnus::init]. This function serves as the entry point for our native extension; it executes when the extension is first loaded by Ruby.

#[magnus::init]
fn init() -> Result<(), Error> {
    // Define the Ruby class, inheriting from Object
    let class = define_class("GeoCoordinate", class::object())?;
    
    // Bind the Rust methods to the Ruby class
    class.define_singleton_method("new", function!(GeoCoordinate::new, 2))?;
    class.define_method("distance_to", method!(GeoCoordinate::distance_to, 1))?;
    
    Ok(())
}

In this setup, we use define_class to create the GeoCoordinate class. We then bind the Rust new function to the Ruby .new singleton method, and the distance_to method to the corresponding Ruby instance method. Notice that magnus automatically handles the type conversion: the distance_to method expects a reference to another GeoCoordinate, and magnus safely unboxes the Ruby object back into a Rust reference before invoking the method.

Once compiled, this extension allows us to use the Rust struct natively within Ruby, as if it were a normal Ruby class. For example, if we had two points we wanted to find the distance between, we could do so like this:

require_relative "geo_coordinate_ext"

point_a = GeoCoordinate.new(40.7128, -74.0060)
point_b = GeoCoordinate.new(34.0522, -118.2437)

# point_b is passed as an ordinary argument; magnus converts it to &GeoCoordinate
distance = point_a.distance_to(point_b)
puts "Distance: #{distance}"
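Because distance_to treats latitude and longitude as planar coordinates, the value printed above is a figure in degrees (about 44.7 for these two points), not a distance on the ground. If real-world distances were needed, one option is the haversine formula. The sketch below is std-only and is not part of the extension above; the function name haversine_km and the 6371 km mean Earth radius are assumptions chosen for illustration.

```rust
// Mean Earth radius in kilometres (a common approximation).
const EARTH_RADIUS_KM: f64 = 6371.0;

// Great-circle distance between two (latitude, longitude) pairs,
// both given in degrees, using the haversine formula.
fn haversine_km(lat1: f64, lon1: f64, lat2: f64, lon2: f64) -> f64 {
    let (phi1, phi2) = (lat1.to_radians(), lat2.to_radians());
    let d_phi = (lat2 - lat1).to_radians();
    let d_lambda = (lon2 - lon1).to_radians();

    // Haversine of the central angle between the two points.
    let a = (d_phi / 2.0).sin().powi(2)
        + phi1.cos() * phi2.cos() * (d_lambda / 2.0).sin().powi(2);

    2.0 * EARTH_RADIUS_KM * a.sqrt().asin()
}
```

The body of distance_to could delegate to a function like this without changing anything on the Ruby side, since the method signature stays the same.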

Memory Safety and the Garbage Collector

When we pass free_immediately and size to the #[magnus::wrap] macro, we are making specific promises to the Ruby garbage collector. Before you use these flags in production, make sure you understand exactly what those promises are. Memory safety is a primary benefit of Rust; misusing these flags can circumvent those protections.

The size argument instructs magnus to report the size of the Rust struct itself (via std::mem::size_of_val) to Ruby. Ruby’s garbage collector uses this information to track memory usage and decide when to trigger a collection cycle. This figure is shallow: it does not include memory the struct owns on the heap. If you allocate significant heap memory within your Rust struct, for example by storing a large Vec<f64>, you should consider implementing a custom size function that reports the heap allocation as well.
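As a rough sketch of what such a size calculation might look like, the std-only example below adds the heap buffer of a Vec<f64> to the shallow size of the struct. The Track struct and the reported_size method are hypothetical; in a magnus extension, the same arithmetic would presumably live in the size function of a DataTypeFunctions implementation.

```rust
use std::mem::{size_of, size_of_val};

// Hypothetical struct that owns a heap-allocated buffer.
struct Track {
    samples: Vec<f64>,
}

impl Track {
    // Shallow size of the struct plus the heap buffer behind `samples`.
    // Capacity (not length) is used because that is what is allocated.
    fn reported_size(&self) -> usize {
        size_of_val(self) + self.samples.capacity() * size_of::<f64>()
    }
}
```

For a Track holding a buffer with room for 1,000 samples, the heap portion alone is at least 8,000 bytes, while the shallow size is only the few machine words of the Vec header. Reporting only the latter would leave the garbage collector unaware of most of the memory the object keeps alive.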

The free_immediately argument indicates that the Rust struct can be dropped as soon as the Ruby object is swept, rather than having its cleanup deferred until the VM reaches a safe point. This is safe for our GeoCoordinate struct because it only contains primitive f64 values and does not interact with the Ruby VM during its destruction.

Handling Embedded Ruby Objects

One may wonder: what happens if our Rust struct needs to hold a reference to a Ruby object? For instance, what if we wanted to store a metadata string or an array of tags directly on the coordinate?

If your Rust struct contains a Ruby VALUE (represented in magnus by types like Value, RString, or RArray), you must not use the free_immediately flag without careful consideration. Furthermore, you must explicitly tell Ruby’s garbage collector about these embedded objects so they are not prematurely collected during the mark phase.

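To achieve this, magnus lets you implement the DataTypeFunctions trait yourself and provide a custom mark function. Because #[magnus::wrap] already generates an empty DataTypeFunctions implementation, we derive TypedData directly and pass the mark flag so that Ruby actually calls our function. This manual marking, strictly speaking, reintroduces a small element of the manual memory management we were trying to avoid. However, Rust’s ownership rules still ensure that the struct itself is freed exactly once. Here is how we might implement it:

use magnus::{gc, DataTypeFunctions, RString, TypedData};

// Derive TypedData directly instead of using #[magnus::wrap], which would
// generate its own (conflicting) DataTypeFunctions implementation.
// The `mark` flag asks Ruby to call our mark function during GC.
#[derive(TypedData)]
#[magnus(class = "TaggedCoordinate", size, mark)]
struct TaggedCoordinate {
    latitude: f64,
    longitude: f64,
    tag: RString, // A Ruby string embedded in our Rust struct
}

// We must manually implement mark to protect the Ruby string
impl DataTypeFunctions for TaggedCoordinate {
    // Note: more recent magnus releases pass a &gc::Marker argument here
    // and use marker.mark(self.tag) in place of the free function gc::mark.
    fn mark(&self) {
        gc::mark(self.tag);
    }
}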

By calling gc::mark on the embedded RString, we ensure that Ruby knows this string is still in use as long as the TaggedCoordinate object is alive. Failing to do this leaves the struct holding a dangling reference: if the string were garbage collected and we later attempted to access it from Rust, the likely result is a segmentation fault or silently corrupted data.

Strategic Integration and Trade-offs

While wrapping Rust structs in Ruby objects provides a powerful tool for optimization, it is not a silver bullet. It introduces a Foreign Function Interface (FFI) boundary, and crossing this boundary has a slight performance overhead. Every time Ruby calls a Rust method, magnus must parse the arguments, unbox the underlying Rust structs, perform the computation, and box the result back into a Ruby object.

Therefore, we should generally design extensions to minimize boundary crossings. Instead of creating fine-grained APIs where Ruby constantly asks Rust for individual properties, such as point.latitude and point.longitude in a tight loop, we should favor coarse-grained APIs that pass an entire collection of data to Rust. Let Rust perform the computationally intensive batch of work and return only the final result to Ruby.
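A minimal sketch of the coarse-grained style, written with the standard library only: rather than Ruby calling distance_to once per pair of points, a single call receives the whole path and returns one number. The names planar_distance and path_length are hypothetical, and exposing path_length to Ruby (for example, by accepting a Ruby array of coordinate pairs) is assumed rather than shown.

```rust
// Planar distance between two (latitude, longitude) pairs, matching the
// conceptual calculation used by distance_to.
fn planar_distance(a: (f64, f64), b: (f64, f64)) -> f64 {
    ((a.0 - b.0).powi(2) + (a.1 - b.1).powi(2)).sqrt()
}

// Coarse-grained entry point: one call processes the whole path and
// returns the sum of the distances between consecutive points.
fn path_length(points: &[(f64, f64)]) -> f64 {
    points
        .windows(2)
        .map(|pair| planar_distance(pair[0], pair[1]))
        .sum()
}
```

With this shape, a path of 10,000 points costs one boundary crossing instead of 9,999, and the per-call argument parsing and unboxing is paid only once.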