Optimizing Active Record Memory Usage in Large Rails Background Jobs
When engineering teams scale Ruby on Rails applications, background job processors like Sidekiq, Resque, or Solid Queue frequently become the primary consumers of infrastructure resources. Processing large datasets in these background workers often leads to severe memory bloat, resulting in Out-of-Memory (OOM) crashes and escalating cloud hosting costs.
Key takeaways for engineering leaders and developers:
- The Problem: Active Record is designed for developer ergonomics, not strictly for memory efficiency. Loading thousands of records simultaneously instantiates large, memory-heavy Ruby objects.
- The Mechanism: Each Active Record model tracks its attributes, original database state, and association cache. When allocated in bulk, these objects overwhelm the Ruby Garbage Collector (GC), leading to memory fragmentation and permanent bloat.
- Immediate Mitigation: Replacing standard iterators with batch processing methods like find_each, and utilizing pluck to retrieve scalar values without instantiating full model objects.
- Long-Term Strategy: Upgrading to modern Ruby versions (Ruby 3.2 or 3.3) to leverage Variable Width Allocation and improved garbage collection, which structurally reduces the memory overhead of the entire application.
The Architecture of Active Record Memory
To understand why a background job consumes so much RAM, we must first look at what happens when Active Record translates a database row into a Ruby object.
When you execute a query, the PostgreSQL or MySQL adapter returns an array of raw strings and integers. Active Record takes this raw data and allocates an ActiveRecord::Base instance for every single row. These instances are inherently heavy. They maintain a hash of the current attributes, a hash of the original attributes (for tracking changes), an association cache, and numerous internal state flags.
If we ask the database for 50,000 records at once, the Ruby virtual machine must request external memory from the operating system using the C malloc function. It creates 50,000 complex objects, holding them all in the ObjectSpace simultaneously.
This leads directly to memory fragmentation. When the job finishes and the objects are eventually garbage collected, the underlying operating system often cannot easily reclaim the fragmented memory space. As a result, the worker process retains a large memory footprint indefinitely, forcing your cloud provider to terminate the container with an OOM error.
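The scale of the problem is easy to observe in plain Ruby, with no Rails required. In this rough sketch, a lightweight Struct stands in for an Active Record model (real model instances carry far more state, so the true cost is higher):

```ruby
# A lightweight stand-in for an Active Record model; real models also
# carry original-attribute hashes, association caches, and state flags.
Record = Struct.new(:id, :email, :attributes)

GC.start # start from a clean baseline
before = GC.stat(:heap_live_slots)

# Simulate loading 50,000 rows at once: every row becomes a small object
# graph (struct + string + hash), all live in memory simultaneously.
records = Array.new(50_000) do |i|
  Record.new(i, "user#{i}@example.com", { "id" => i })
end

after = GC.stat(:heap_live_slots)
puts "Heap grew by roughly #{after - before} live slots"
```

Each "row" here allocates at least three heap objects, so the live heap grows by well over 100,000 slots before the garbage collector can reclaim anything.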
<section id="the-danger-of-unbounded-queries">
<h1>The Danger of Unbounded Queries</h1>
The most common anti-pattern in Rails background jobs is the unbounded query. Developers often write code that works perfectly in development with a small dataset, but fails in production due to resource exhaustion.
Consider a daily job that emails inactive users:
```ruby
class InactiveUserMailerJob < ApplicationJob
  def perform
    users = User.where(active: false, last_login_at: ...30.days.ago)

    users.each do |user|
      UserMailer.reengagement_email(user).deliver_later
    end
  end
end
```
<p>
In this example, calling <code>.each</code> on the Active Record relation forces the framework to load the entire result set into memory at once. If you have 100,000 inactive users, you have instantiated 100,000 heavy Ruby objects.
</p>
</section>
<section id="practical-strategies-for-memory-reduction">
<h1>Practical Strategies for Memory Reduction</h1>
<p>
We can mitigate this bloat by changing how we instruct Active Record to fetch and instantiate data.
</p>
<section id="batch-processing-with-find_each">
<h2>1. Batch Processing with <code>find_each</code></h2>
<p>
The most practical, durable solution for iterating over large tables is batch processing. Active Record provides <code>find_each</code> and <code>find_in_batches</code> to handle this automatically.
</p>
```ruby
class InactiveUserMailerJob < ApplicationJob
  def perform
    users = User.where(active: false, last_login_at: ...30.days.ago)

    # Loads records in batches of 1,000 by default
    users.find_each do |user|
      UserMailer.reengagement_email(user).deliver_later
    end
  end
end
```
Under the hood, find_each orders the records by their primary key and fetches 1,000 rows at a time using a keyset condition (WHERE id > last_seen_id ... LIMIT 1000), avoiding the increasingly expensive OFFSET scans of naive pagination. The garbage collector can easily clean up the previous 1,000 objects before the next batch is loaded, keeping the job's memory footprint flat and predictable.
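The batching strategy itself can be sketched in a few lines of plain Ruby. Here an array of hashes stands in for the users table, and find_each_sketch is a hypothetical helper (not Rails API) that mimics the keyset loop:

```ruby
# Sketch of find_each's strategy: order by primary key, then repeatedly
# fetch the equivalent of WHERE id > last_id ORDER BY id LIMIT batch_size.
def find_each_sketch(table, batch_size: 1000)
  last_id = 0
  loop do
    # Stand-in for the SQL query Active Record issues for each batch
    batch = table
            .select { |row| row[:id] > last_id }
            .sort_by { |row| row[:id] }
            .first(batch_size)
    break if batch.empty?

    batch.each { |row| yield row }
    last_id = batch.last[:id] # the next batch starts after this key
  end
end

fake_table = (1..2_500).map { |i| { id: i } }
seen = []
find_each_sketch(fake_table, batch_size: 1000) { |row| seen << row[:id] }
puts seen.size # => 2500
```

Only one batch of rows is ever held at a time; the 2,500 fake rows are visited in three queries of 1,000, 1,000, and 500.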
</section>
<section id="bypassing-instantiation-with-pluck">
<h2>2. Bypassing Instantiation with <code>pluck</code></h2>
<p>
If you do not strictly need the full Active Record model, you should avoid creating it entirely. When a background job only needs to trigger an API call or enqueue another job with specific IDs, <code>pluck</code> is the correct tool.
</p>
```ruby
# Highly efficient: returns an array of integers directly from the adapter
user_ids = User.where(active: false).pluck(:id)
```
<p>
By using <code>pluck</code>, we skip the Active Record instantiation pipeline completely. The memory required for an array of integers is trivially small compared to an array of model instances.
</p>
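<p>
The size difference can be measured with the <code>objspace</code> standard library. This is a plain-Ruby approximation: a Struct stands in for a model instance (real Active Record objects are heavier still), so the real-world gap is larger than what this prints:
</p>
```ruby
require 'objspace'

# Stand-in for a full model instance; real AR objects carry more state.
FakeUser = Struct.new(:id, :email, :created_at)

users = Array.new(10_000) { |i| FakeUser.new(i, "user#{i}@example.com", Time.now) }
ids   = users.map(&:id) # what pluck(:id) would give you

model_bytes = users.sum { |u| ObjectSpace.memsize_of(u) + ObjectSpace.memsize_of(u.email) }
id_bytes    = ObjectSpace.memsize_of(ids)

puts "models: ~#{model_bytes} bytes, ids: ~#{id_bytes} bytes"
```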
</section>
<section id="discarding-state-with-select">
<h2>3. Discarding State with <code>select</code></h2>
<p>
At times, you must pass a model instance to a service object or mailer, but you know you will only use a few specific columns. By default, <code>SELECT *</code> retrieves every text column, JSONB payload, and integer in the table.
</p>
<p>You can explicitly limit the memory payload by specifying the columns you need:</p>
```ruby
users = User.select(:id, :email, :first_name).where(active: false)

users.find_each do |user|
  UserMailer.reengagement_email(user).deliver_later
end
```
By omitting heavy columns, such as a bio text field or a preferences JSON payload, the resulting Ruby objects are significantly smaller. Be aware that reading a column you did not select raises ActiveModel::MissingAttributeError, so this technique only works when you know exactly which attributes downstream code touches.
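A plain-Ruby sketch makes the savings concrete. Here two Structs stand in for the same row loaded with and without its heavy columns (the field names are illustrative, not from a real schema):

```ruby
require 'objspace'

FullRow = Struct.new(:id, :email, :first_name, :bio, :preferences)
SlimRow = Struct.new(:id, :email, :first_name)

bio  = "long biography text " * 500 # a heavy text column, ~10 KB
full = FullRow.new(1, "ada@example.com", "Ada", bio, { "theme" => "dark" })
slim = SlimRow.new(1, "ada@example.com", "Ada")

full_bytes = ObjectSpace.memsize_of(full) + ObjectSpace.memsize_of(full.bio)
slim_bytes = ObjectSpace.memsize_of(slim)

puts "full row: ~#{full_bytes} bytes, slim row: ~#{slim_bytes} bytes"
```

Multiplied across a batch of thousands of rows, skipping a single large text column keeps kilobytes per record out of the heap.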
</section>
<section id="releasing-memory-with-garbage-collection-tuning">
<h2>4. Releasing Memory with Garbage Collection Tuning</h2>
<p>
While it is generally recommended to let Ruby manage its own garbage collection, long-running jobs that process millions of records can sometimes outpace the GC's heuristics.
</p>
<p>
If you have exhausted all batching and selective loading strategies and still face memory bloat, you can manually hint the garbage collector between large batches. This is a last resort, but it is effective in highly specific, memory-constrained environments:
</p>
```ruby
User.find_in_batches(batch_size: 2000) do |batch|
  process_batch(batch)

  # Clear the batch from local scope
  batch = nil

  # Manually trigger a major GC cycle
  GC.start
end
```
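The effect of dropping references and forcing a collection can be observed in plain Ruby, without Rails. In this sketch, process_big_batch is a hypothetical stand-in for one batch iteration:

```ruby
GC.start # start from a clean baseline
base = GC.stat(:heap_live_slots)

def process_big_batch
  batch = Array.new(100_000) { |i| "record-#{i}" } # a large in-flight batch
  GC.stat(:heap_live_slots) # peak live slots while the batch exists
end

during = process_big_batch # the batch goes out of scope on return
GC.start                   # manually trigger a major GC cycle
after = GC.stat(:heap_live_slots)

puts "peak: #{during - base} extra slots; after GC.start: #{after - base}"
```

The peak reflects the 100,000 strings held at once; after the explicit major GC, the live-slot count falls back toward the baseline.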
</section>
</section>
<section id="the-impact-of-ruby-version-upgrades">
<h1>The Impact of Ruby Version Upgrades</h1>
<p>
While application-level optimization is necessary, the most comprehensive solution to memory bloat is often upgrading the underlying language and framework.
</p>
<p>
Recent versions of Ruby have introduced significant improvements to memory management. Ruby 3.1 introduced Variable Width Allocation (VWA), stabilized and extended in Ruby 3.2, which allows the VM to store small strings directly inside their heap slot, bypassing the need for external <code>malloc</code> calls entirely. This significantly reduces memory fragmentation across the entire application.
</p>
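<p>
You can get a rough feel for the embedded-versus-external distinction with <code>ObjectSpace.memsize_of</code>. Exact sizes vary by Ruby version, but a string too large to embed always reports its slot plus a separate buffer:
</p>
```ruby
require 'objspace'

short = "abc"       # small enough to be embedded inside its heap slot
long  = "a" * 1_000 # too large to embed: needs a separate malloc'd buffer

puts ObjectSpace.memsize_of(short) # the slot only
puts ObjectSpace.memsize_of(long)  # the slot plus the external buffer
```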
<p>
Furthermore, running your background jobs on older, unsupported versions of Ruby or Rails forces your infrastructure to work harder. By prioritizing a version upgrade, engineering teams often see immediate, structural reductions in memory consumption — directly lowering the required instance sizes on AWS, Heroku, or Render.
</p>
<p>
Ultimately, writing memory-efficient background jobs requires a deliberate approach to data retrieval. By avoiding unbounded queries, leveraging batch processing, and keeping your infrastructure upgraded, you can ensure your background workers remain stable and cost-effective as your application scales.
</p>
</section>