The Impact of the Global Interpreter Lock (GIL) on Upgrading to Ruby 3.3
In the 1840s, the British railway system encountered a severe logistical bottleneck known as the “break of gauge.” Different railway companies had built tracks of varying widths, meaning cargo had to be manually unloaded and reloaded whenever it crossed from one company’s line to another. It didn’t matter how fast the individual locomotives were; the systemic design created an unavoidable choke point that delayed shipments and increased costs.
Similarly, the Ruby programming language has long wrestled with its own systemic choke point: the Global Interpreter Lock (GIL) — also known in modern Ruby as the Global VM Lock (GVL). If you aim to optimize p95 response times and reduce cloud infrastructure costs, the GIL represents a fundamental architectural constraint.
When executing a Ruby and Rails upgrade, we might anticipate immediate performance gains. However, fully realizing the benefits of Ruby 3.3 requires understanding how its new features — particularly the M:N thread scheduler and YJIT enhancements — interact with the GIL. In this article, we will examine the mechanics of the GIL, how Ruby 3.3 mitigates its limitations, and the practical implications for scaling large and complex applications.
Understanding the GIL
Ruby, strictly speaking, does not execute multiple threads of Ruby code simultaneously — at least not within a single process of the standard MRI (Matz’s Ruby Interpreter) implementation.
The GIL is a mutex that prevents multiple native threads from concurrently executing Ruby C extensions or Ruby bytecode. If you have a Puma web server running five threads, those threads take turns holding the lock. While one thread executes Ruby code, the others must wait.
This does not mean Ruby lacks useful threading — and it is usually this form of concurrency that developers mean when they discuss Puma's thread pool. The GIL is released during I/O operations — such as waiting for a PostgreSQL database query to return, or making an external HTTP request. During these I/O waits, another thread can acquire the lock and execute Ruby code.
For heavily I/O-bound applications, traditional threading works well. But for CPU-bound tasks — like parsing massive JSON payloads, rendering complex Action View templates, or executing intricate business logic — the GIL becomes a severe bottleneck.
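A small benchmark makes this split concrete. This is a minimal sketch (using `sleep` as a stand-in for a database or network wait): the CPU-bound threads serialize on the GIL, while the I/O-bound threads overlap their waits.

```ruby
require "benchmark"

# CPU-bound work: the GIL serializes this, so adding threads gives no speedup.
def cpu_work
  100_000.times.reduce(:+)
end

# I/O-bound work: sleep releases the GIL, so threads overlap their waits.
def io_work
  sleep 0.1
end

cpu_time = Benchmark.realtime do
  4.times.map { Thread.new { cpu_work } }.each(&:join)
end

io_time = Benchmark.realtime do
  4.times.map { Thread.new { io_work } }.each(&:join)
end

# Four 0.1s sleeps overlap, so io_time stays close to 0.1s,
# while cpu_time is roughly the sum of all four computations.
puts format("CPU-bound: %.3fs, I/O-bound: %.3fs", cpu_time, io_time)
```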
Three Approaches to Ruby Concurrency
There are three major approaches to scaling a Ruby application around the GIL. Which one fits best depends on your workload and infrastructure.
The first is multi-processing. This is the traditional Rails approach, utilizing tools like Unicorn or running Puma in clustered mode with multiple workers. Because each worker is an entirely separate operating system process, each has its own GIL. This achieves true parallelism but consumes significantly more RAM, directly increasing AWS or Heroku infrastructure costs.
The second is multi-threading. This involves increasing the thread count within a single process. As noted, this is highly efficient for I/O-bound workloads but degrades quickly when tasks become CPU-bound due to GIL contention.
The third is leveraging modern Ruby VM optimizations and concurrency models, specifically Ractors and the new M:N threading model introduced in Ruby 3.3. Generally speaking, this represents the future of Ruby performance, though it requires a deeper understanding of the underlying engine.
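To illustrate the third approach, here is a minimal Ractor sketch. Each Ractor has its own lock, so the Fibonacci computations below (an illustrative CPU-bound stand-in) can run in parallel on multiple cores. Note that Ractor is still marked experimental and Ruby prints a warning on first use.

```ruby
# A deliberately naive, CPU-bound helper for demonstration purposes.
def fib(n) = n < 2 ? n : fib(n - 1) + fib(n - 2)

# Each Ractor runs Ruby code in parallel, unconstrained by the main GIL.
ractors = 4.times.map do |i|
  Ractor.new(20 + i) { |n| fib(n) }
end

# Collect results in the order the Ractors were created.
results = ractors.map(&:take)
puts results.inspect
```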
Ruby 3.3 and the M:N Thread Scheduler
Historically, Ruby used a 1:1 threading model: every Ruby Thread mapped directly to one native OS thread. In highly concurrent applications, managing hundreds of OS threads introduces significant context-switching overhead at the kernel level.
How can we handle massive concurrency without this kernel-level overhead? Ruby 3.3 introduces an M:N thread scheduler to address this constraint: it maps M Ruby threads onto N native OS threads. As an analogy, think of a busy restaurant: instead of assigning a dedicated waiter (OS thread) to every single customer (Ruby thread), a smaller pool of waiters efficiently serves a larger number of customers, with the context switching managed in user space rather than by the operating system kernel.
To enable M:N threading in Ruby 3.3, we use the RUBY_MN_THREADS environment variable. In its simplest form, you could start a single-process Puma server like this:
$ RUBY_MN_THREADS=1 bundle exec puma
However, a production deployment typically uses multiple workers and specific thread counts. We can accomplish that by passing in flags for threads (-t) and workers (-w):
$ RUBY_MN_THREADS=1 bundle exec puma -t 5:5 -w 3
In this command, -t 5:5 configures Puma to use a minimum and maximum of 5 threads per worker, while -w 3 instructs it to spawn 3 worker processes. The right counts depend on your infrastructure, so treat these as a starting point. When M:N threading is activated, Ruby schedules thread execution internally. This matters because, although the GIL still exists, the overhead of acquiring and releasing it across many threads is reduced. For large applications juggling numerous background jobs or WebSocket connections, this can noticeably decrease memory fragmentation and CPU context-switching overhead.
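The same worker and thread settings are more commonly kept in Puma's configuration file. A minimal sketch (the counts are illustrative, and RUBY_MN_THREADS must still be set in the environment before Ruby boots, since it is read by the VM, not by Puma):

```ruby
# config/puma.rb — illustrative values only; tune per your infrastructure.
workers 3        # equivalent to -w 3: three forked worker processes
threads 5, 5     # equivalent to -t 5:5: min and max threads per worker
```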
YJIT’s Impact on the GIL
Of course, the most celebrated feature of recent Ruby releases is YJIT (Yet Another Ruby JIT). While YJIT does not remove the GIL, it profoundly changes how our applications interact with it. How does a JIT compiler affect a concurrency lock?
By compiling Ruby bytecode into native machine code, YJIT drastically accelerates CPU-bound execution. When a thread executes faster, it holds the GIL for a shorter duration. This reduces contention across the entire process.
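Ruby 3.3 also lets you turn YJIT on at runtime rather than only at boot. A minimal sketch, suitable for something like a Rails initializer (guarded so it is a no-op on builds without YJIT support):

```ruby
# Enable YJIT at runtime if this Ruby build supports it.
# RubyVM::YJIT.enable was added in Ruby 3.3; on earlier versions,
# use the --yjit flag or the RUBY_YJIT_ENABLE=1 environment variable.
if defined?(RubyVM::YJIT) && !RubyVM::YJIT.enabled?
  RubyVM::YJIT.enable
end

puts "YJIT enabled: #{defined?(RubyVM::YJIT) ? RubyVM::YJIT.enabled? : false}"
```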
Consider a standard Rails API endpoint that serializes a complex object graph to JSON:
class ReportsController < ApplicationController
  def index
    # A CPU-heavy serialization task
    @reports = Report.includes(:metrics, :historical_data).limit(1000)
    render json: @reports.as_json(include: [:metrics, :historical_data])
  end
end
Under Ruby 3.2 without YJIT, this serialization might take 200ms, during which the GIL is locked. Other threads in the Puma worker are blocked. By upgrading to Ruby 3.3 and enabling YJIT, that same serialization might drop to 120ms. Crucially, the 80ms saved is not solely a latency improvement for that specific request; it is 80ms of unblocked execution time returned to the other threads in the process.
Thus, YJIT acts as an indirect concurrency multiplier.
Trade-offs and Caveats of Upgrading
As with any major architectural shift, adopting Ruby 3.3’s concurrency features involves distinct trade-offs. The primary trade-off is between increased performance and increased complexity. M:N threading and YJIT can significantly reduce memory fragmentation and CPU context-switching, though they also introduce new variables into your production environment. You are trading the predictable, albeit slower, 1:1 thread mapping for a more complex user-space scheduler.
However, there are a few warnings and caveats to consider before enabling M:N threads or rolling out a comprehensive upgrade.
First, before you use any of these new concurrency features in production, it is wise to ensure your test suite is robust. New concurrency models can expose race conditions in legacy code that were previously masked by the 1:1 thread scheduler or slower execution times. Faster execution means threads interleave differently, which can trigger subtle thread-safety bugs.
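The classic example of such a bug is an unsynchronized read-modify-write on shared state. `counter += 1` is three operations (read, add, write), and threads can interleave between them; faster execution under YJIT changes where those interleavings fall. A minimal sketch of the safe version, using a Mutex:

```ruby
# `counter += 1` is read-modify-write, not atomic, so concurrent updates
# can be lost. Wrapping the update in a Mutex makes it safe regardless
# of how the scheduler interleaves the threads.
counter = 0
lock = Mutex.new

threads = 8.times.map do
  Thread.new do
    10_000.times do
      lock.synchronize { counter += 1 }
    end
  end
end

threads.each(&:join)
puts counter  # always 80_000 with the lock; without it, results can vary
```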
Second, be aware of the memory overhead of YJIT. While it speeds up execution and frees the GIL faster, the compiled machine code consumes additional RAM. We must weigh this increased memory footprint against the CPU and latency improvements.
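You can observe this footprint directly. A rough sketch, assuming a Ruby 3.3 build with YJIT available and that the `:code_region_size` key is present in the runtime stats (it reports the bytes of executable memory YJIT has allocated):

```ruby
# Inspect YJIT's compiled-code footprint at runtime.
if defined?(RubyVM::YJIT)
  RubyVM::YJIT.enable unless RubyVM::YJIT.enabled?
  stats = RubyVM::YJIT.runtime_stats
  puts "YJIT code region: #{stats[:code_region_size]} bytes"
end
```

If the footprint becomes a problem, the executable memory can be capped at boot with the --yjit-exec-mem-size flag.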
Conclusion
Of course, the Global Interpreter Lock remains a core component of the Ruby architecture, but it is no longer the immovable obstacle it once was. Through the introduction of M:N threading and the continuous refinement of YJIT, Ruby 3.3 provides powerful new tools for managing concurrency.
By carefully applying these modern features, we can significantly improve p95 response times, optimize resource utilization, and ensure our applications scale efficiently well into the future.
Sponsored by Durable Programming
Need help maintaining or upgrading your Ruby on Rails application? Durable Programming specializes in keeping Rails apps secure, performant, and up-to-date.