How Rubinius SendSites Work – Part 1
Recently, Rubinius switched from a simple method dispatch caching mechanism to a significantly more powerful one known as a SendSite. Over the next couple of posts, we’ll look into the Rubinius SendSite implementation, starting with an overview of what SendSites are in Part 1. In Part 2, we’ll examine how SendSites are used in the method dispatch process.
Before we dive in and start looking at the Rubinius SendSite class, it is worth reviewing some of the terminology that will be used, particularly the origins of the term SendSite.
Ruby and Rubinius draw heavily on the Smalltalk language and its implementation. Perhaps the central concept in Smalltalk is message passing, whereby objects interact by sending messages: we talk of objects sending messages to receivers and getting back responses. In practice, this is almost identical to saying that code calls a method and gets back a result, which is how the process is described in most languages.
However, there is one key distinction: message sending makes the concept of duck-typing clearer, and encourages a coding style known as “Tell, Don’t Ask”. In Smalltalk and Ruby, we don’t really care what the type of the receiver is; we only care whether or not it can respond to the message we send. Similarly, in the “Tell, Don’t Ask” style, we tell receiver objects what we want them to do based on our internal state; we don’t ask the receiver for details of its state in order to make decisions. The result is that it is easier to replace the receiver object with another object that understands the same message, but perhaps performs the request in a different way.
What is a SendSite?
Ultimately, it is this very capability that complicates method dispatch in Ruby and makes method caching and other optimisations desirable: if the receiver class can change at any time, exactly which implementation of the message to dispatch to cannot be determined definitively until the moment the message is dispatched. However, most of the time a given message send (i.e. send site) in a piece of code will resolve at dispatch time to the same receiving code (i.e. method).
If we could therefore somehow cache the result of this method resolution process, then the next time we reach the same send site we could perform a quick check to determine whether the receiving method is still the same as last time, and if so, use an optimised dispatch process. This could range from the simple, such as jumping directly to the method code via a cached reference, to the complex, such as in-lining and JIT-ing frequently called methods into directly executable machine code at the send site.
The Rubinius SendSite, therefore, is an object that is created for every send site (method call) in the Rubinius bytecode, and facilitates these kinds of optimisations.
With that bit of background behind us, let’s dive in and see how Rubinius defines a SendSite…
SendSite: Half Ruby class, half C struct
We saw above that a SendSite represents a location in code where a message send (aka method call) takes place. At its most basic, a SendSite needs only to record the name of the message that is to be sent; indeed, before SendSites were added, a reference to the Ruby symbol identifying the message name was all that was recorded in the Rubinius bytecode. However, by replacing the message-name symbol with a data structure, we gain the ability to store additional information at the send site, in particular information that can be used to speed up method dispatch.
Rubinius SendSites, like a number of other core classes integral to the Shotgun VM, need to be accessible from both Ruby and C code. Because SendSites are used mostly by C code in the VM, where performance is critical, SendSite instance data is stored in the fields of a C struct:
- The name of the message (i.e. method) this send site sends (calls).
- A reference back to the CompiledMethod instance in which the send site exists.
- A reference to the Selector instance corresponding to the message name (see Selectors below).
- The receiver class.
- The CompiledMethod corresponding to this message on the receiver class, as encountered on the last dispatch. When a message is dispatched, this is the target object that needs to be located; it contains the bytecode for the method on the receiver.
- The module.
- The primitive index, if the SendSite resolves to a primitive method.
- A pointer to some C data:
  - For an FFI send site, holds the address of the FFI stub function to call.
  - For a primitive send site, holds the address of the primitive function to call.
- hits, misses:
  - Counters for the number of times the SendSite has successfully and unsuccessfully cached the receiver method, respectively.
- A function pointer (functor) to the method lookup function that will be used by the SendSite to perform method dispatch.
Ruby code can access most fields of this C struct via the SendSite#at method, which is implemented as a Rubinius primitive.
The two most important data items in a SendSite are the symbol of the method name to which the SendSite relates, and the address of the lookup function used to resolve that message name to the method object to dispatch to. These two fields (plus the reference to the containing CompiledMethod) are the only ones populated when a SendSite is initialized, and they are sufficient to resolve a message send to a receiver method, albeit via a slower path.
We saw above that a SendSite contains a reference to a Selector object. A Selector represents a message (i.e. method) name: it consists of the symbol of the message, plus an array of links back to every SendSite that sends that message. This makes it possible to locate all direct uses of a particular message, although indirect uses, such as calls made via send or the various evals, are not captured.
Selectors are not used in the method dispatch process; they exist solely to provide a reverse lookup from a given method name to the SendSites that use it. Nonetheless, this is a very useful capability: it is used to find and reset the SendSites affected when a method is redefined, and it is also handy for finding the most frequently used messages. In fact, this capability lies behind the -ps and -pss flags that can be passed when launching Shotgun; on exit, these flags print a summary of the most frequently encountered Selectors and SendSites, respectively:
ads@ads-kubuntu:~/rubinius$ shotgun/rubinius -ps 10 -e '0'
Total Selectors: 1168
Top 10, by receives:
Note that this shows the most frequently sent messages, which is not the same as the most frequently executed methods; for that, we also need to know the receiver. For example, #at is the most frequently sent message, but its sends are actually distributed across three different receiver methods (
In Part 2, we’ll look at the lifecycle of a SendSite, and see how it influences the method dispatch process.