The go-to resource for upgrading Ruby, Rails, and your dependencies.

Monitoring Production Metrics Before and After a Major Ruby Version Bump


In the 17th century, the medical profession faced a profound technical challenge. Physicians understood that fever was a symptom of illness, but they had no objective way to quantify it. Assessing a patient’s temperature relied entirely on a doctor placing a hand on the patient’s forehead — a highly subjective and unreliable method. The solution arrived with the invention of the thermoscope, and eventually the medical thermometer. This instrument allowed physicians to move from subjective feeling to precise, numerical measurement.

Similarly, software development teams often rely on intuition when evaluating the performance of an application. For much of a project’s history, developers might deploy new code and wait to see if the application “feels” slower, or if users complain. When undertaking a major Ruby version bump — such as upgrading from Ruby 3.2 to 3.3, or 3.3 to 3.4 — relying on subjective feeling is a recipe for instability.

Before we invest the engineering hours required to upgrade the Ruby interpreter in production, we need an objective baseline. We need to measure our application’s vital signs so we can definitively prove whether the new version improved performance, caused a memory leak, or introduced unacceptable latency.

Before we get into that, though, let’s take a step back and consider what we should measure. In this article, we will examine how to monitor production metrics before and after a major Ruby version bump, focusing on the specific indicators that are most likely to change during a runtime upgrade:

  • Memory Utilization (RSS): The amount of RAM your Ruby processes consume.
  • Garbage Collection (GC) Activity: The frequency and duration of memory cleanup cycles.
  • p95 and p99 Response Times: The latency thresholds below which 95 percent and 99 percent of requests complete — in other words, the experience of your slowest users.
  • Error Rates: The frequency of 500-level HTTP responses and application exceptions.

Approaches to Performance Verification

There are three major approaches to verifying the performance impact of a Ruby upgrade. Depending on the particular circumstances you find yourself in, one of them may be more useful than the other two.

The first is relying entirely on synthetic, local benchmarks. This involves writing scripts that execute specific methods millions of times to measure raw throughput. This is helpful for understanding the interpreter’s theoretical capabilities, but it rarely reflects the real-world complexity of database queries, network latency, and concurrent user traffic.

The second is executing a load testing suite against a staging environment. This is often necessary, and it provides a strong safety net. Maintaining a staging environment that perfectly mirrors production traffic patterns, though, is notoriously difficult and expensive.

The third option is capturing and comparing live production metrics before and after the upgrade. Generally speaking, this is my preferred method. By taking a snapshot of production behavior on the old Ruby version and comparing it to the new version under similar load, we gain an accurate, undeniable picture of the upgrade’s impact.

Key Metrics to Monitor

When evaluating a new Ruby version, the changes to the runtime environment typically manifest in three specific areas: memory utilization, CPU execution time, and error rates.

Memory Utilization and Garbage Collection

Ruby, strictly speaking, manages memory for you — at least in the sense that you don’t need to manually allocate and free memory, as you do in C or Rust. The Ruby Garbage Collector (GC) handles object lifecycles automatically.

One may wonder: if the garbage collector works automatically, why do we need to monitor it? The answer is straightforward. Major Ruby releases frequently introduce changes to how the GC operates or how objects are structured in memory.

For example, Ruby 3.2 introduced YJIT as a production-ready feature, and Ruby 3.3 optimized it further. While YJIT improves execution speed, it also requires allocating executable memory for the compiled machine code.
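If you want to confirm whether YJIT is actually active in a given process, Ruby exposes this at runtime. Here is a minimal sketch, assuming Ruby 3.2 or later (where `RubyVM::YJIT` is defined); the exact keys returned by `runtime_stats` vary between Ruby versions, so we fetch defensively:

```ruby
# Check whether YJIT is compiled in and enabled for this process.
# RubyVM::YJIT is only defined on Ruby builds that include YJIT (3.2+),
# so defined? keeps this safe on older interpreters.
yjit_enabled = defined?(RubyVM::YJIT) ? RubyVM::YJIT.enabled? : false
puts "YJIT enabled: #{yjit_enabled}"

if yjit_enabled
  # runtime_stats returns a hash of counters; :code_region_size, when
  # present, reflects the executable memory YJIT has allocated.
  stats = RubyVM::YJIT.runtime_stats
  puts "JIT code region: #{stats.fetch(:code_region_size, 'n/a')} bytes"
end
```

Running this in both your current and candidate Ruby versions tells you whether the memory delta you observe includes YJIT's executable memory at all.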

To monitor this effectively, we must track two primary metrics:

  • Resident Set Size (RSS): The total amount of physical memory the Ruby process consumes.
  • Garbage Collection Pauses: The time the interpreter spends freezing the application to clean up unreferenced objects.

By way of a memory aid, you can think of the Resident Set Size (RSS) as the size of the “plot of land” your application has claimed from the operating system — even if it isn’t actively building on every square inch.
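Ruby does not expose RSS directly, but you can read it from the operating system. Here is a minimal sketch, assuming a Linux host (where /proc/self/status is available), with a fallback to the portable ps command:

```ruby
# Read this process's Resident Set Size (RSS) in kilobytes.
# /proc/self/status is Linux-specific; elsewhere we shell out to `ps`,
# which prints RSS in kilobytes on most Unix systems.
def current_rss_kb
  if File.readable?("/proc/self/status")
    File.read("/proc/self/status")[/^VmRSS:\s+(\d+)\s+kB/, 1].to_i
  else
    `ps -o rss= -p #{Process.pid}`.to_i
  end
end

puts "RSS: #{current_rss_kb / 1024} MB"
```

A small helper like this is enough to log per-worker RSS periodically; in practice, your process supervisor or APM agent usually reports the same number for you.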

You can expose these metrics programmatically using the built-in GC.stat hash. Let’s see how this API works interactively before we write a script:

$ irb
irb(main):001> GC.stat[:minor_gc_count]
=> 32
irb(main):002> GC.start
=> nil
irb(main):003> GC.stat[:major_gc_count]
=> 14

The exact numbers you see when you run this will likely vary, of course, depending on what your interpreter has executed since it launched.

Using this programmatic interface, we can build a Rack middleware that logs GC statistics before and after a request:

class GCMonitorMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    gc_stat_before = GC.stat
    
    status, headers, response = @app.call(env)
    
    gc_stat_after = GC.stat
    minor_gc_count = gc_stat_after[:minor_gc_count] - gc_stat_before[:minor_gc_count]
    major_gc_count = gc_stat_after[:major_gc_count] - gc_stat_before[:major_gc_count]
    
    # Of course, in a production application, you would send this to Datadog, Prometheus, etc.
    Rails.logger.info("Minor GCs: #{minor_gc_count}, Major GCs: #{major_gc_count}")
    
    [status, headers, response]
  end
end

To use this in a Rails application, you would add it to your configuration:

# config/application.rb
config.middleware.use GCMonitorMiddleware

Note that this middleware wraps the application request. Because it sits at the outer edge of the Rack stack, it captures the garbage collection activity for the entire lifecycle of the HTTP request.

If you notice a sudden spike in major GC cycles after a version bump, the new Ruby version might be allocating objects differently, or a gem you updated for compatibility might have a memory leak.

Execution Speed and Response Times

The second area of focus is execution speed. Upgrading the Ruby version often yields a “free” performance boost due to internal interpreter optimizations. Of course, we call this boost “free” because we did not have to rewrite any of our application code to achieve it.

To quantify this, we look at response times. Average response time, though, is a deeply flawed metric. If 99 requests take 10 milliseconds, and 1 request takes 5 seconds, the average might look acceptable, but that one user had a terrible experience.

Instead, we monitor the p95 and p99 response times. The p95 response time indicates that 95 percent of requests were completed faster than the given threshold. If your p95 response time drops from 300ms to 250ms after a Ruby upgrade, you have definitively proven a performance gain.
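To make the distinction concrete, here is a small sketch using the simple “nearest rank” percentile method. Real monitoring tools use streaming estimators over live traffic, but the arithmetic is the same in spirit. With five slow requests out of a hundred, note how the mean blurs the problem and p95 misses it entirely, while p99 catches it:

```ruby
# Nearest-rank percentile: sort, then take the value at ceil(p/100 * n).
def percentile(values, p)
  sorted = values.sort
  rank = (p / 100.0 * sorted.length).ceil
  sorted[[rank - 1, 0].max]
end

# 95 fast requests and 5 slow outliers out of 100.
times_ms = Array.new(95, 10) + Array.new(5, 5000)
mean = times_ms.sum / times_ms.length.to_f

puts "mean: #{mean}ms"                      # 259.5 -- inflated, but vague
puts "p95:  #{percentile(times_ms, 95)}ms"  # 10   -- the tail is invisible
puts "p99:  #{percentile(times_ms, 99)}ms"  # 5000 -- the tail is unmistakable
```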

Error Rates and Exceptions

Of course, speed and memory efficiency are irrelevant if the application is broken. A major Ruby bump often requires updating dozens of gems. These dependency updates can introduce subtle regressions.

Before the upgrade, establish a baseline of 500 Internal Server Errors and handled exceptions. A slight increase in warnings — perhaps due to newly deprecated syntax — is expected. A spike in fatal errors indicates a failure in the upgrade process.

Establishing the Baseline

Before you deploy the new Ruby version, you must establish your baseline. Let’s illustrate this with a practical scenario.

First, ensure your monitoring tools — whether Datadog, New Relic, or a custom Prometheus setup — are capturing the metrics discussed above.

Next, select a representative time window. A standard Tuesday afternoon is generally better than a quiet Sunday morning. Suppose you then record the following baseline metrics:

  • p95 Response Time: 210ms
  • Average Memory per Puma Worker: 450MB
  • Error Rate: 0.02%

Once the baseline is recorded, you are ready to deploy the new Ruby version. Before you do, ensure that the latest known-good version of your codebase is committed to source control and that your rollback procedure is tested.

Analyzing the Aftermath

After the deployment, wait for the application to stabilize. Interpreters often need time to “warm up” — particularly if they are utilizing a JIT compiler that must execute code multiple times before optimizing it.

Compare the new metrics against your established baseline; you may well find trade-offs rather than a clean win. Perhaps the memory per Puma worker increased to 500MB, but the p95 response time dropped to 180ms.

This presents an engineering trade-off. The application requires slightly more RAM — perhaps necessitating larger server instances — but delivers a noticeably faster experience to the end user. Because we measured these metrics objectively, we can make an informed decision about whether the upgrade was successful, rather than relying on subjective feeling.
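Quantifying that trade-off is simple arithmetic. As a sketch, here is the before/after comparison using the illustrative figures from the scenario above (these are not real measurements):

```ruby
# Hypothetical baseline and post-upgrade snapshots, using the
# illustrative figures from the scenario above.
baseline = { p95_ms: 210, worker_rss_mb: 450, error_rate_pct: 0.02 }
after    = { p95_ms: 180, worker_rss_mb: 500, error_rate_pct: 0.02 }

# Percent change for each metric, rounded to one decimal place.
deltas = baseline.to_h do |metric, before|
  [metric, ((after[metric] - before) / before.to_f * 100).round(1)]
end

deltas.each { |metric, pct| printf("%-15s %+6.1f%%\n", metric, pct) }
```

Here the output shows roughly an 11% memory increase against a 14% latency improvement — exactly the kind of numbers you can take to a capacity-planning discussion.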

Boundaries and Limitations

Although it’s outside the scope of this article to detail every possible monitoring tool, it’s worth noting that the principles remain the same regardless of your infrastructure.

Note: The exact metrics you see in your particular application will likely vary. A computationally heavy scientific application will see different benefits from a Ruby upgrade than a standard web API that spends most of its time waiting for database queries to return.

Monitoring production metrics will not catch every logic bug, nor will it replace a comprehensive test suite. It is, however, the only way to definitively prove the operational impact of a major language upgrade. By replacing intuition with objective measurement, we ensure that our infrastructure remains sustainable, performant, and reliable for years to come.

Sponsored by Durable Programming

Need help maintaining or upgrading your Ruby on Rails application? Durable Programming specializes in keeping Rails apps secure, performant, and up-to-date.

Hire Durable Programming