Prioritizing Customer Wellbeing and System Stability During Major App Upgrades
When civil engineers need to widen a major highway, they face a significant constraint: they cannot close the road. Millions of commuters rely on the route every day. Instead of a single, massive closure, construction is phased. Temporary lanes are painted, traffic is shifted, and new concrete is poured alongside active traffic. The goal is to improve the infrastructure while minimizing disruption to daily life.
Similarly, when planning a major Ruby on Rails upgrade, engineering teams often focus intensely on the technical mechanics of the transition. We analyze deprecation warnings, audit dependencies, and update syntax across the application. The underlying motivation for these efforts, though, extends beyond maintaining a modern stack. The ultimate goal of system stability is protecting customer wellbeing.
A major app upgrade introduces significant risk to the production environment, particularly for large and complex applications. When an application becomes unstable, customers experience frustration, interrupted workflows, and potential data loss. This technical instability translates directly into business risk. To mitigate this risk during the migration to a new framework version, we must architect our upgrade processes around the principle of uninterrupted service.
This article explores a battle-tested workflow for executing complex framework modernizations while prioritizing the customer experience. We will examine how to decouple risky operations, implement progressive delivery, and establish robust monitoring systems.
Decoupling Database Migrations from Code Deployments
During a standard deployment, running database migrations alongside code changes is common practice. For routine feature development, this approach is often sufficient. When executing a major Rails upgrade, however, combining schema changes with significant framework updates creates a high-risk scenario.
If the new Rails version introduces unexpected behavior, reverting the code deployment on its own is straightforward. If the deployment also included destructive database migrations, such as dropping a column or renaming a table, rolling back the code without also reverting the database schema will result in application errors, because the old code expects columns or tables that no longer exist. This tight coupling can lead to extended downtime while the team attempts to untangle the state of the system.
To protect system stability, we must decouple these operations using the expand and contract pattern.
Note: Implementing the expand and contract pattern requires additional engineering effort and coordination compared to a single deployment. The resulting system resilience, however, makes this investment essential for zero-downtime upgrades.
The Expand and Contract Pattern
The expand and contract pattern allows us to make database changes across multiple, low-risk deployments. Instead of renaming a column in a single step, we break the process into distinct phases.
First, we add the new column to the database schema. This is the expansion phase.
class AddNewEmailFormatToUsers < ActiveRecord::Migration[7.0]
  def change
    add_column :users, :new_email_format, :string
  end
end
We deploy this change independently. The application continues to use the old column, and the new column remains unused. This deployment carries minimal risk.
Next, we update the application code to write to both the old and new columns simultaneously, while continuing to read from the old column. In Rails, we might do this by overriding the setter method or using a before_save callback:
class User < ApplicationRecord
  before_save :sync_email_format

  private

  def sync_email_format
    # Mirror the old column's value into the new column on every save
    self.new_email_format = email
  end
end
Once we are confident the data is synchronizing correctly for new and updated records, we run a background task to backfill the existing data into the new column for older records.
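As a rough illustration of the backfill step, the sketch below copies the old column into the new one in batches. The class name `BackfillNewEmailFormat` and the batch size are hypothetical, and the example assumes the `email` and `new_email_format` columns used above. It relies on Active Record's `in_batches` and `update_all`, so it skips callbacks and validations.

```ruby
# Hypothetical backfill helper; the class name and batch size are
# illustrative assumptions, not a prescribed implementation.
class BackfillNewEmailFormat
  BATCH_SIZE = 1_000

  # `scope` is expected to be the User model or any Active Record relation.
  def self.call(scope, batch_size: BATCH_SIZE)
    # Only touch rows that the dual-write callback has not populated yet.
    scope.where(new_email_format: nil).in_batches(of: batch_size) do |batch|
      # One UPDATE per batch keeps each statement, and the locks it holds, short.
      batch.update_all("new_email_format = email")
    end
  end
end
```

In a Rails console or a background job, this might be invoked as `BackfillNewEmailFormat.call(User)`. Working in small batches keeps the load on the production database predictable while customers continue to use the application.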
After verifying the backfill is complete, we update the application to read from the new column instead of the old one. Finally, in a subsequent deployment, we remove the old column. This is the contraction phase.
By spreading the schema change across multiple, isolated deployments, we ensure that the application can be safely rolled back at any point without causing data corruption or downtime for the customer.
Progressive Delivery and Dual Booting
Deploying a major Rails version bump to the entire user base simultaneously maximizes the impact of any undiscovered regressions. Progressive delivery limits this blast radius by exposing the new code to a small subset of traffic.
When upgrading Rails, we can configure the application to run under both the old and new framework versions. This technique is known as dual booting. We manage dependencies using a secondary Gemfile.next that specifies the target Rails version.
First, we create Gemfile.next as a symlink to our standard Gemfile, or we use a tool like the bootboot gem. A conditional inside the Gemfile can then load different dependencies depending on which file Bundler was asked to evaluate, which we select through the BUNDLE_GEMFILE environment variable.
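One way to express this conditional, following the pattern used by the next_rails gem, is a small helper in the Gemfile that checks the name of the file Bundler is evaluating. This is a sketch rather than a complete Gemfile, and the version constraints are assumptions chosen for illustration.

```ruby
# Gemfile (Gemfile.next is a symlink to this same file)
source "https://rubygems.org"

# True when Bundler was pointed at Gemfile.next via BUNDLE_GEMFILE.
def next?
  File.basename(__FILE__) == "Gemfile.next"
end

if next?
  gem "rails", "~> 7.1.0" # assumed target version
else
  gem "rails", "~> 7.0.0" # assumed current version
end
```

Keeping both branches in a single file means every other dependency stays identical between the two environments unless we deliberately branch on it.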
For example, we might start our server like this to use the next version of Rails:
$ BUNDLE_GEMFILE=Gemfile.next bundle exec rails server
By maintaining both a Gemfile.lock and a Gemfile.next.lock, we ensure that the dependencies of our existing production environment remain untouched while we test the upgrade.
Gradual Rollouts with Load Balancers
Because we cannot run two different versions of Rails within the same Ruby process, progressive delivery for a framework upgrade happens at the infrastructure level, rather than within the application code itself.
We provision a separate set of application servers running the Gemfile.next environment. Then, using our load balancer or ingress controller, we route a small percentage of production traffic to these upgraded instances.
We might begin by routing 1% or 2% of traffic to the upgraded instances. We then closely monitor error rates and performance metrics. If the system remains stable, we gradually increase the percentage. If errors spike, we can adjust the load balancer rules immediately, routing all traffic back to the stable legacy instances.
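The decision logic behind such a rollout can be sketched in a few lines. Everything named here is hypothetical: the step schedule, the error-rate tolerance, and the `next_traffic_percentage` helper are assumptions for illustration, and in practice the returned value would be applied through whatever load balancer or ingress controller is in use.

```ruby
# Hypothetical rollout schedule, in percent of production traffic.
ROLLOUT_STEPS = [1, 2, 5, 10, 25, 50, 100].freeze

# Assumed tolerance: the upgraded instances may show at most 1.5x the
# legacy error rate, with a small absolute floor so that a legacy error
# rate of zero does not make every single error trigger a rollback.
MAX_ERROR_RATE_RATIO = 1.5
ERROR_RATE_FLOOR = 0.001

# Returns the percentage of traffic the upgraded instances should get next.
def next_traffic_percentage(current:, legacy_error_rate:, upgraded_error_rate:)
  threshold = [legacy_error_rate * MAX_ERROR_RATE_RATIO, ERROR_RATE_FLOOR].max

  # Errors spiked: route all traffic back to the legacy instances.
  return 0 if upgraded_error_rate > threshold

  # Otherwise advance to the next step, staying at 100 once fully rolled out.
  ROLLOUT_STEPS.find { |step| step > current } || 100
end
```

For example, `next_traffic_percentage(current: 2, legacy_error_rate: 0.01, upgraded_error_rate: 0.011)` advances to 5, while an upgraded error rate of 0.05 would return 0.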
This approach ensures that if a critical failure occurs, only a minimal number of customers are affected, and service can be restored almost instantly.
Comprehensive Telemetry and Monitoring
You cannot protect what you cannot measure. Prioritizing customer wellbeing requires knowing immediately when the system degrades. Relying on customer support tickets to identify production issues guarantees a poor user experience.
Before initiating a major app upgrade, we must establish baseline metrics for application health. This includes error rates, request latency, and database query performance.
Measuring System Stability
During the rollout phase, we compare real-time metrics against our established baseline. If the 95th percentile (p95) response time increases significantly under the new Rails version, the upgrade has likely introduced a performance regression. We pause the rollout until the cause is understood, which often requires infrastructure fine-tuning or Ruby VM optimization before resuming.
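To make the comparison concrete, here is a minimal sketch of a p95 check using the nearest-rank method. The helper names and the 20% tolerance are assumptions for illustration. In practice a monitoring tool usually computes percentiles for us, and a sensible tolerance depends on the application.

```ruby
# Nearest-rank percentile over a list of response times (e.g. milliseconds).
def percentile(samples, pct)
  raise ArgumentError, "samples must not be empty" if samples.empty?

  sorted = samples.sort
  rank = (pct * sorted.length / 100.0).ceil
  sorted[[rank, 1].max - 1]
end

# Assumed tolerance: flag the rollout if p95 grows by more than 20%.
P95_TOLERANCE = 1.20

# True when the upgraded instances' p95 exceeds the baseline p95 by more
# than the allowed tolerance.
def p95_regressed?(baseline_samples, upgraded_samples, tolerance: P95_TOLERANCE)
  percentile(upgraded_samples, 95) > percentile(baseline_samples, 95) * tolerance
end
```

Comparing percentiles rather than averages matters here, because a regression that only affects the slowest requests can hide behind a healthy mean.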
We should configure automated alerts for key indicators:
- Error Rate Spikes: A sudden increase in HTTP 500 errors indicates an immediate stability crisis.
- Memory Growth: Upgrading Ruby or Rails can sometimes introduce memory leaks or bloat. Monitoring memory consumption lets us catch the growth before it causes out-of-memory crashes.
- Increased Latency: Slower response times degrade the customer experience even if the application does not technically fail.
Tools like Datadog, New Relic, or open-source alternatives provide the necessary visibility to catch regressions before they impact a significant portion of the user base.
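Whichever tool we choose, the alert rules themselves tend to reduce to comparing current values against the baseline. The sketch below shows that idea in plain Ruby. The metric names and thresholds are hypothetical, and a real setup would define equivalent rules in the monitoring tool's own configuration.

```ruby
# Hypothetical alert rules: the maximum allowed ratio of the current value
# to the baseline value for each metric.
ALERT_RULES = {
  http_5xx_rate: 2.0,  # alert if the server error rate doubles
  memory_mb: 1.3,      # alert if memory consumption grows by 30%
  p95_latency_ms: 1.2  # alert if p95 latency grows by 20%
}.freeze

# Returns the names of the metrics that exceed their allowed ratio.
def triggered_alerts(baseline, current, rules: ALERT_RULES)
  exceeded = rules.select do |metric, max_ratio|
    current.fetch(metric) > baseline.fetch(metric) * max_ratio
  end
  exceeded.keys
end
```

Using `fetch` makes a missing metric fail loudly, which is preferable to an alert that silently never fires.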
Conclusion
Whether handled by an internal team or a specialized Rails upgrade service, technical modernizations are fundamentally risk management exercises. By decoupling database migrations, utilizing progressive delivery, and enforcing rigorous monitoring, we shift the focus from simply completing the upgrade to actively preserving system stability.
The ultimate measure of a successful Ruby and Rails upgrade is the degree to which customers remain completely unaware of the transition. We achieve this transparency by embedding safety nets and fallback mechanisms throughout the modernization lifecycle.