Understanding CVE-2008-3790: Ruby REXML Denial of Service Vulnerability
Imagine a corporate mailroom designed to handle incoming requests efficiently. An employee receives a letter instructing them to print ten copies of an enclosed flyer and distribute them to specific departments. The mailroom handles this routine task without trouble.
However, suppose a malicious actor sends a slightly different set of instructions. They send a single envelope containing a memo labeled “A”. This memo says, “Print ten copies of memo B.” Memo B, in turn, says, “Print ten copies of memo C.” This pattern continues through several layers, and each layer multiplies the work by ten. By the time the mailroom reaches memo F, the single initial envelope has expanded into a demand to print a hundred thousand pages. The mailroom grinds to a halt, completely consumed by the cascading instructions, unable to process any legitimate mail.
This scenario illustrates a fundamental risk in systems that process recursive instructions without limits. In the context of parsing structured data, this exact mechanism underlies CVE-2008-3790, a denial-of-service (DoS) vulnerability discovered in the REXML module within the Ruby standard library.
The Mechanics of XML Entities
To understand how this vulnerability manifested in Ruby, we need to examine how XML handles document structure and reusability. XML allows developers to define “entities,” which act essentially as variables or macros within a document.
An entity definition maps a specific name to a block of text or data. When the XML parser encounters a reference to that entity later in the document, it replaces the reference with the defined value. This feature is remarkably useful for avoiding repetition or managing special characters.
<!DOCTYPE document [
  <!ENTITY author "Alice">
]>
<document>
  <metadata>
    <written_by>&author;</written_by>
  </metadata>
</document>
When an XML parser processes this document, it expands &author; into Alice. For a single substitution, the computational cost is negligible. The issue arises because the XML specification permits entities to contain references to other entities.
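To see the substitution from Ruby's side, here is a minimal sketch that parses the document above with REXML and reads the element's text. It assumes the rexml library is installed (recent Ruby releases ship it as a bundled gem); the variable names are illustrative only.

```ruby
require 'rexml/document'

# The same document as above, with one internal entity named "author".
xml = <<~XML
  <!DOCTYPE document [
    <!ENTITY author "Alice">
  ]>
  <document>
    <metadata>
      <written_by>&author;</written_by>
    </metadata>
  </document>
XML

doc = REXML::Document.new(xml)
written_by = doc.root.elements['metadata/written_by']

# Reading the text resolves the &author; reference against the DOCTYPE.
puts written_by.text
```

The printed text should be the entity's value, Alice, rather than the literal reference.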
The Exploit Scenario: The Billion Laughs Attack
The ability to nest entities creates a dangerous opening for resource exhaustion. An attacker can craft a document where a few layers of nested entities expand exponentially. This specific technique is widely known as the “Billion Laughs” attack, named after a classic proof-of-concept payload involving the word “lol”.
Consider a simplified version of the attack payload:
<!DOCTYPE root [
  <!ENTITY lol "lol">
  <!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!ENTITY lol2 "&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;">
  <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
  <!-- This pattern continues -->
  <!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">
]>
<root>&lol9;</root>
When an application attempts to parse this document, it must resolve the content of the <root> node. To do so, it evaluates &lol9;. That single entity expands into ten copies of &lol8;, which expand into a hundred copies of &lol7;, and so forth. By the time the parser reaches the base entity, it is attempting to allocate memory for a billion instances of the string “lol”, roughly 3 GB of text.
In Ruby versions 1.8.6, 1.8.7, and early 1.9, the REXML standard library resolved these nested entity definitions without any limit on the total number of expansions. When presented with a maliciously crafted payload, REXML would faithfully attempt to resolve every nested entity. The Ruby process would rapidly consume available CPU cycles and allocate massive amounts of memory, ultimately causing the application to hang or crash entirely.
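The growth can be checked with plain arithmetic, without building any of the expanded text. The sketch below assumes the payload shape shown above (nine levels, ten references per level); the constant names are made up for illustration.

```ruby
# Count how many copies of the base string each entity level stands for,
# without ever constructing the expanded text.
FANOUT = 10                  # references per entity, as in the payload above
LEVELS = 9                   # lol1 through lol9
BASE_BYTES = 'lol'.bytesize  # 3 bytes per copy

copies_per_level = (1..LEVELS).map { |level| FANOUT**level }

copies_per_level.each_with_index do |copies, index|
  puts format('lol%d -> %d copies of "lol" (%d bytes)',
              index + 1, copies, copies * BASE_BYTES)
end
```

The last line of output corresponds to &lol9;: one billion copies, or three billion bytes before any per-object overhead in the parser.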
The Impact on Ruby Applications
The severity of CVE-2008-3790 stemmed from how commonly XML was used for data interchange during that era. Applications relying on XML-RPC, SOAP services, or standard REST APIs that accepted XML payloads were all potential targets.
Because REXML was the built-in XML parser for Ruby, many applications relied on it by default. If an application accepted unauthenticated XML data from external users and parsed it using REXML::Document.new(), an attacker could trigger the denial-of-service condition with a payload smaller than a single kilobyte. The disproportionate impact—a tiny request causing a complete system failure—is the hallmark of an asymmetric denial-of-service attack.
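To put a number on that asymmetry, the following sketch rebuilds the nine-level payload as a plain string and compares its size with the size of the text it describes. The string is only measured and is never handed to a parser; the helper name and its parameters are hypothetical.

```ruby
# Rebuild the nine-level payload as a plain string. It is only measured here,
# never parsed.
def nested_entity_source(levels: 9, fanout: 10)
  lines = ['<!DOCTYPE root [', '<!ENTITY lol "lol">']
  previous = 'lol'
  (1..levels).each do |level|
    name = "lol#{level}"
    references = "&#{previous};" * fanout
    lines << %(<!ENTITY #{name} "#{references}">)
    previous = name
  end
  lines << ']>' << "<root>&#{previous};</root>"
  lines.join("\n")
end

request_bytes  = nested_entity_source.bytesize
expanded_bytes = 'lol'.bytesize * 10**9

puts "request size:  #{request_bytes} bytes"
puts "expanded size: #{expanded_bytes} bytes"
puts "amplification: roughly #{expanded_bytes / request_bytes}x"
```

The request stays well under a kilobyte while the expansion it asks for is measured in gigabytes, which is the imbalance the attack relies on.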
The Remediation: Limiting Entity Expansion
The solution to the entity explosion vulnerability is not to abandon XML or disable entities entirely, as they serve legitimate purposes. Instead, the parser needs a safeguard that prevents unbounded expansion.
The Ruby core team addressed CVE-2008-3790 by introducing a hard limit on the number of entity expansions REXML is permitted to perform while parsing a single document.
Starting with the patched versions (and continuing into modern Ruby releases), REXML includes a configuration option, REXML::Document.entity_expansion_limit. By default, this value is set to 10,000.
require 'rexml/document'
# The default limit introduced after the vulnerability was patched
puts REXML::Document.entity_expansion_limit
# => 10000
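The same accessor has a writer, so an application that expects few or no custom entities can tighten the limit. This is a minimal sketch: the value 100 is an arbitrary illustration, not a recommendation, and the setting is class-level, so it applies to every document parsed afterwards in the same process. Newer REXML releases document REXML::Security.entity_expansion_limit= as the preferred spelling of the same setting.

```ruby
require 'rexml/document'

previous_limit = REXML::Document.entity_expansion_limit

# Tighten the limit; 100 is an arbitrary example value.
REXML::Document.entity_expansion_limit = 100
tightened_limit = REXML::Document.entity_expansion_limit
puts tightened_limit

# Restore the earlier value so other code in the process is unaffected.
REXML::Document.entity_expansion_limit = previous_limit
```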
Each time REXML expands an entity reference, it increments a counter. If the total number of expansions for the document exceeds this limit, REXML raises an exception and stops processing immediately. Depending on the REXML version and on where the expansion is triggered, the error may surface as a plain RuntimeError rather than a REXML::ParseException, so error handling should not rely on the more specific class alone. This limit keeps the parser responsive and prevents it from being coerced into exponential memory allocation.
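In application code, the practical consequence is that untrusted XML should be parsed inside a rescue. The sketch below uses two hypothetical helpers, root_text_or_nil and scaled_down_payload, and a deliberately small nested payload (five levels of ten references) that would stay harmless even if it were fully expanded. It rescues RuntimeError because REXML::ParseException is a subclass of it, which covers either way the limit may be reported.

```ruby
require 'rexml/document'

# Hypothetical helper: returns the root element's text, or nil when REXML
# rejects the document (parse error or too many entity expansions).
def root_text_or_nil(xml)
  REXML::Document.new(xml).root&.text
rescue RuntimeError => e
  warn "XML rejected: #{e.message}"
  nil
end

# Hypothetical helper: a scaled-down nested payload. Fully expanded it would
# be 10**5 copies of "lol" (about 300 KB), so it is safe to experiment with.
def scaled_down_payload(levels: 5, fanout: 10)
  lines = ['<!DOCTYPE root [', '<!ENTITY lol0 "lol">']
  (1..levels).each do |level|
    references = "&lol#{level - 1};" * fanout
    lines << %(<!ENTITY lol#{level} "#{references}">)
  end
  lines << ']>' << "<root>&lol#{levels};</root>"
  lines.join("\n")
end

benign_result  = root_text_or_nil('<root>hello</root>')
hostile_result = root_text_or_nil(scaled_down_payload)

puts benign_result.inspect
puts hostile_result.inspect
```

With the default limit in place, the benign document should yield its text and the nested one should be rejected and yield nil, with the reason written to standard error.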
While modern Ruby development often favors JSON over XML for API communication, applications that still process XML data must remain vigilant. The introduction of the expansion limit in REXML effectively neutralized the Billion Laughs attack for Ruby’s standard library, transforming a critical architectural vulnerability into a manageable parsing error.