The go-to resource for upgrading Ruby, Rails, and your dependencies.

How to Benchmark Cloud Provider Performance for Ruby on Rails Apps


When migrating a Ruby on Rails application to a new cloud provider, engineering teams often rely on generic compute benchmarks. These synthetic tests measure raw CPU cycles or memory bandwidth. Ruby on Rails, though, presents a unique operational profile. A standard Rails application is heavily reliant on object allocation, garbage collection cycles within the Ruby Virtual Machine (VM), and database connection pooling. Relying on generic benchmarks to estimate production performance will lead to inaccurate capacity planning and escalating cloud infrastructure costs.

To evaluate whether a cloud environment will meet your application’s requirements, you need a testing methodology that reflects your actual workload. This guide outlines how to structure and execute a realistic performance benchmark for Ruby on Rails across different infrastructure providers.

Establishing the Benchmark Application

We cannot benchmark infrastructure using an empty Rails controller. A blank render plain: "OK" endpoint does almost no work inside the framework: no ActiveRecord instantiation, no view rendering, and only trivial middleware overhead. It tests the web server’s ability to accept connections, not the infrastructure’s ability to run your code.

For an accurate assessment, you should deploy a dedicated benchmark application that mimics the behavior of your production system. This application needs endpoints that exercise different architectural boundaries:

  1. I/O Bound Endpoint: An endpoint that executes a moderately complex database query, retrieves 50 to 100 records, and serializes them to JSON.
  2. CPU Bound Endpoint: An endpoint that allocates a large number of Ruby objects, forcing the Ruby VM to trigger garbage collection.
  3. Memory Bound Endpoint: An endpoint that loads a large dataset into memory and performs filtering or sorting operations.
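To make the three categories concrete, here is a minimal Ruby sketch of the work each endpoint might perform. The method names, iteration counts, and data shapes are illustrative, not from any particular codebase; in a real benchmark app each method would sit behind a controller action, and the I/O bound endpoint would additionally run an ActiveRecord query.

```ruby
# Hypothetical workload methods for the CPU and memory bound endpoints.
# (The I/O bound endpoint needs a real database, so it is omitted here.)

# CPU bound: allocate many short-lived objects to pressure the GC,
# and report how many GC cycles ran during the work.
def cpu_heavy(iterations = 100_000)
  gc_before = GC.stat(:count)
  iterations.times.map { |i| { id: i, label: "record-#{i}" } }
  GC.stat(:count) - gc_before
end

# Memory bound: build a large in-memory dataset, then filter and sort it.
def memory_heavy(size = 50_000)
  dataset = Array.new(size) { |i| { id: i, score: (i * 31) % 997 } }
  dataset.select { |r| r[:score] > 500 }
         .sort_by { |r| -r[:score] }
         .first(100)
end
```

Keeping the workloads this explicit makes it easy to verify that all candidate providers are executing identical code paths.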

Deploying this benchmark application across your candidate cloud providers ensures you are measuring the exact same codebase, database schema, and Ruby version.

Configuring the Environment for Consistency

Before generating load, we must ensure the deployment configurations are identical across providers. Variations in infrastructure fine-tuning will skew your results.

First, lock the Ruby version and apply the same Ruby VM optimizations across all targets. If your production environment uses jemalloc to reduce memory fragmentation or enables YJIT for performance, the benchmark environments must do the same.
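As a sketch, the jemalloc and YJIT settings can be applied via environment variables before the app server starts; the jemalloc library path below is illustrative and varies by distribution:

```shell
# Apply identical Ruby VM settings on every candidate provider.
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2  # path varies by distro
export RUBY_YJIT_ENABLE=1   # enables YJIT on Ruby 3.2+

# Verify: the version string should include +YJIT
ruby --yjit -v
```

Checking the version string on each host before the test run is a cheap way to catch a misconfigured environment that would otherwise invalidate the comparison.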

Second, configure the application server consistently. For Puma, set the exact same number of workers and threads. A common baseline for a standard 2 vCPU instance is:

# config/puma.rb
workers 2       # one worker per vCPU
threads 5, 5    # min, max threads per worker

Finally, ensure the database instances provisioned by each cloud provider have comparable specifications. You should match RAM, CPU, and storage IOPS as closely as possible. Ensure the database is located in the same geographic region as the application servers to minimize network latency bottlenecks.

Executing the Load Test

To measure the infrastructure’s capacity, we will use a load generation tool. Tools like wrk or k6 are designed to sustain high concurrency levels. It is important to run the load generator from a separate, isolated instance — preferably in the same cloud region — so that network transit over the public internet does not introduce artificial latency.

Here is an example of using wrk to simulate 100 concurrent connections across 4 threads for 60 seconds, with --latency enabled so the output includes a percentile distribution:

$ wrk -t4 -c100 -d60s --latency http://internal-benchmark-app-ip/api/v1/io_heavy

When interpreting the output, we must look beyond the average response time. Averages hide the experience of your slowest requests. Instead, focus on the 95th percentile (p95) and 99th percentile (p99) response times. The p95 response time indicates that 95 percent of requests were completed in that time or less. If your average response time is 50ms but your p95 is 800ms, your infrastructure is likely queuing requests under load.
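For intuition about how a percentile is read off a set of raw samples, here is a minimal nearest-rank calculation in Ruby; the helper name is ours, and wrk’s --latency output reports these figures directly, so this only illustrates the math:

```ruby
# Nearest-rank percentile over a list of response times in milliseconds.
def percentile(samples, pct)
  sorted = samples.sort
  rank = ((pct / 100.0) * (sorted.length - 1)).round
  sorted[rank]
end

latencies = [42, 45, 48, 50, 51, 55, 60, 120, 400, 800]
percentile(latencies, 95)  # dominated by the slow tail, unlike the average
```

With this sample set the average sits near 167ms while the p95 lands at 800ms, which is exactly the averages-hide-the-tail effect described above.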

Analyzing Bottlenecks and Trade-offs

Once you have gathered the data for each endpoint across your candidate providers, you can compare the results. You may notice that Provider A handles I/O bound requests faster due to lower latency between the application and database tiers, while Provider B excels at CPU bound tasks because of a newer generation of underlying hypervisor hardware.

One may wonder: if Provider A and Provider B offer identical vCPU and RAM specifications, why do the results differ? The answer is straightforward. Cloud providers utilize different CPU architectures, hypervisor layers, and network routing topologies. These underlying differences surface when your application pushes the limits of the Ruby VM.

These metrics allow you to evaluate the cost-to-performance ratio. If Provider A is 20 percent more expensive but reduces your p95 response times by 40 percent on computationally heavy endpoints, you have concrete data to justify the infrastructure cost. Conversely, if the performance delta is negligible, you can safely select the more economical option without risking application latency.
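A simple cost-per-throughput calculation makes this comparison concrete. All figures below are hypothetical, and the metric you normalize on (sustained req/s, p95, or both) should match what your application actually cares about:

```ruby
# Hypothetical monthly cost and benchmark results for two providers.
providers = {
  "Provider A" => { monthly_cost: 1200.0, requests_per_sec: 950, p95_ms: 180 },
  "Provider B" => { monthly_cost: 1000.0, requests_per_sec: 900, p95_ms: 300 },
}

providers.each do |name, stats|
  # Dollars per 1,000 sustained requests per second.
  cost_per_1k_rps = stats[:monthly_cost] / (stats[:requests_per_sec] / 1000.0)
  puts format("%s: $%.2f per 1k req/s sustained, p95 %dms",
              name, cost_per_1k_rps, stats[:p95_ms])
end
```

In this illustrative dataset, Provider A costs more per unit of throughput but delivers a markedly better p95, which is the trade-off the article describes.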

Benchmarking large and complex applications requires rigorous methodology. By simulating realistic workloads and focusing on percentiles rather than averages, you can make informed architectural decisions that ensure long-term stability and performance for your Ruby on Rails application.

Sponsored by Durable Programming

Need help maintaining or upgrading your Ruby on Rails application? Durable Programming specializes in keeping Rails apps secure, performant, and up-to-date.
