How Rubinius SendSites Work – Part 1

Recently, Rubinius switched from using a simple method dispatch caching mechanism to using a significantly more powerful mechanism known as a SendSite. Over the next couple of posts, we’ll look into the Rubinius SendSite implementation, commencing with an overview of what SendSites are in part 1. In part 2, we’ll examine how SendSites are used in the method dispatch process.


Origins

Before we dive in and start looking at the Rubinius SendSite class, it may be worthwhile reviewing some of the terminology that will be used, and particularly, the origins of the term SendSite.

Ruby and Rubinius draw heavily on the Smalltalk language and its implementation. Perhaps the central concept in Smalltalk is message passing: objects interact by sending messages to receivers and getting back responses. In practice, this is almost identical to saying that code calls a method and gets back a result, which is how the process is described in most languages.

However, there is one key distinction: message sending makes the concept of duck-typing clearer, and encourages a coding style known as “Tell, Don’t Ask”. In Smalltalk and Ruby, we don’t really care what the type of the receiver is; we only care whether or not it can respond to the message we send. Similarly, in the “Tell, Don’t Ask” coding style, we tell receiver objects what we want them to do based on our own internal state; we don’t ask the receiver for details of its state in order to make decisions. The result is that it is easier to replace the receiver object with another object that understands the same message, but perhaps performs the request in a different way.
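As a small illustration of the point about duck-typing, here is a minimal sketch in which one sender works with two unrelated receivers. The class and method names (Duck, Robot, speak, greet) are invented for this example and have nothing to do with Rubinius itself.

```ruby
# Two unrelated classes that both understand the message #speak.
class Duck
  def speak
    "Quack"
  end
end

class Robot
  def speak
    "Beep"
  end
end

# The sender never asks what type the receiver is; it simply sends #speak.
# This single send may resolve to Duck#speak or Robot#speak at run time.
def greet(receiver)
  receiver.speak
end

greet(Duck.new)   # => "Quack"
greet(Robot.new)  # => "Beep"
```

Note that the one send inside greet ends up at a different method depending on the receiver it is handed.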

What is a SendSite?

Ultimately, it is this very capability that complicates method dispatch in Ruby, and makes method caching and other optimisations desirable: if the receiver class can change at any time, exactly which implementation a message should be dispatched to cannot be determined definitively until the moment the message is dispatched. However, it is also true that most of the time, a given message send (i.e. send site) in a piece of code will resolve at dispatch time to the same receiving code (i.e. method)…

If we could somehow cache the result of this method resolution process, then the next time we reach the same send site we could perform a quick check to determine whether the receiving method is still the same as last time and, if so, use an optimised dispatch process. This could range from the simple, such as jumping directly to the method code via a cached reference, to the complex, such as in-lining and JIT-ing frequently called methods into directly executable machine code at the send site.
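To make the simplest of these options concrete, here is a rough sketch of a send site that remembers the last receiver class and the method it resolved to. This is a toy model in plain Ruby, not the Rubinius implementation: the class name CachingSendSite is invented, the "quick check" is assumed to be a comparison of receiver classes, and the sketch ignores singleton methods and method redefinition.

```ruby
# Toy model of a caching send site (hypothetical; not Rubinius code).
class CachingSendSite
  attr_reader :name, :hits, :misses

  def initialize(name)
    @name = name            # the message this site sends
    @cached_class = nil     # receiver class seen on the last dispatch
    @cached_method = nil    # method that the last lookup resolved to
    @hits = 0
    @misses = 0
  end

  def dispatch(receiver, *args)
    klass = receiver.class
    if klass.equal?(@cached_class)
      # Quick check passed: reuse the cached method.
      @hits += 1
    else
      # Slow path: perform a full lookup and remember the result.
      @cached_method = klass.instance_method(@name)
      @cached_class = klass
      @misses += 1
    end
    @cached_method.bind(receiver).call(*args)
  end
end

site = CachingSendSite.new(:size)
site.dispatch("abc")    # miss: first time through
site.dispatch("hello")  # hit: receiver is a String again
site.dispatch([1, 2])   # miss: receiver class changed to Array
```

After these three sends, the site has recorded one hit and two misses.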

The Rubinius SendSite, therefore, is an object that is created for every send site (method call) in the Rubinius bytecode, and facilitates these kinds of optimisations.

With that bit of background behind us, let’s dive in and see how Rubinius defines a SendSite…

SendSite: Half Ruby class, half C struct

We saw above that a SendSite represents a location in code where a message send (aka method call) takes place. At its most basic, a SendSite needs only record the name of the message that is to be sent; indeed, before SendSites were added, a reference to the Ruby symbol identifying the message name was all that was recorded in the Rubinius bytecode. However, by replacing the symbol of the message name with a data structure, we gain the ability to store additional information at the send site, and in particular, information that can be used to speed up method dispatch.

Rubinius SendSites, like a number of other core classes integral to the Shotgun VM, need to be accessible from both Ruby and C code. As most of the use of SendSite is in C code in the VM, and is performance critical, SendSite instance data is stored in the fields of a C struct:

name:
The name of the message (i.e. method) this send site sends (calls)
cm:
A reference back to the CompiledMethod instance in which the send site exists.
selector:
A reference to the Selector instance corresponding to the message name (see Selectors below)
data1:
The receiver class
data2:
The CompiledMethod corresponding to this message on the receiver class, as encountered on the last dispatch. When a message is dispatched, this is the target object that needs to be located; it contains the bytecode for the method on the receiver.
data3:
The module
data4:
The primitive index if the SendSite resolves to a primitive method
c_data:
A pointer to some C data:

  • For an FFI send site, holds the address of the FFI stub function to call.
  • For a primitive send site, holds the address of the primitive function to call.
hits, misses:
Counters for the number of times the SendSite was able to reuse its cached receiver method (a hit) and the number of times it was not (a miss), respectively.
lookup:
A function pointer (functor) to the method lookup function that will be used by the SendSite to perform method dispatch.

Ruby code can access most fields of this C struct via the SendSite#at method, which is implemented as a Rubinius primitive.

The two most important data items in a SendSite are the symbol of the method name to which the SendSite relates, and the address of the lookup function used to resolve that message name to the method object to dispatch to. These two fields (and the reference to the containing CompiledMethod) are the only ones populated when a SendSite is initialized, and are sufficient to resolve a message send to a receiver method (albeit via a slower path).
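The field list and the initialization rule above can be modelled loosely in plain Ruby. The sketch below is only an illustration: the real data lives in a C struct inside the VM, and the names SendSiteModel, SLOW_LOOKUP and new_send_site are invented here. The lookup lambda is an assumed stand-in for a slow, uncached lookup function.

```ruby
# Hypothetical Ruby model of the struct fields described above.
SendSiteModel = Struct.new(
  :name, :cm, :selector, :data1, :data2, :data3, :data4,
  :c_data, :hits, :misses, :lookup
)

# Assumed stand-in for the slow lookup function: search the receiver's
# class for a method with the send site's name.
SLOW_LOOKUP = lambda do |site, receiver|
  receiver.class.instance_method(site.name)
end

# Only name, cm and lookup are populated at creation; the remaining
# fields are left unset (nil in this model).
def new_send_site(name, compiled_method)
  site = SendSiteModel.new
  site.name = name
  site.cm = compiled_method
  site.lookup = SLOW_LOOKUP
  site
end

site = new_send_site(:size, :containing_method_placeholder)
meth = site.lookup.call(site, "abc")  # resolves to String#size
meth.bind("abc").call                 # => 3
```

Even with every cache field empty, the name and the lookup function are enough to find and call the right method.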

Selectors

We saw above that a SendSite contains a reference to a Selector object. A Selector is an object that represents a message (i.e. method) name. It consists of the symbol of a message, plus an array of links back to every SendSite that uses the same message. This can be extremely useful, as it provides the ability to locate all direct uses of a particular message (although indirect uses such as via send and the various evals are not caught).
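A Selector's reverse links can be pictured with a small model like the one below. It is a sketch only: SiteRef, SelectorModel, SelectorTable and add_site are names made up for the example, and the location strings are placeholders rather than real Rubinius methods.

```ruby
# Hypothetical stand-in for a send site: a message name plus a label
# saying where in the code the send occurs.
SiteRef = Struct.new(:name, :location)

# A selector holds a message name and links to every site that sends it.
class SelectorModel
  attr_reader :name, :send_sites

  def initialize(name)
    @name = name
    @send_sites = []
  end
end

class SelectorTable
  def initialize
    @selectors = {}
  end

  # One selector per message name, created on first use.
  def lookup(name)
    @selectors[name] ||= SelectorModel.new(name)
  end

  # Create a send site and link it from its selector.
  def add_site(name, location)
    site = SiteRef.new(name, location)
    lookup(name).send_sites << site
    site
  end
end

table = SelectorTable.new
table.add_site(:at, "Foo#bar")
table.add_site(:at, "Baz#qux")
table.add_site(:size, "Foo#bar")

# All direct uses of the message :at
table.lookup(:at).send_sites.map(&:location)  # => ["Foo#bar", "Baz#qux"]
```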

Selectors are not used in the method dispatch process; they exist solely to provide a reverse lookup from a given method name to the SendSites that use it. Nonetheless, this capability earns its keep: it is used to find and reset SendSites impacted by a redefinition of a method, and is also handy for finding the messages most often used. In fact, it is this capability that lies behind the -ps and -pss flags that can be used when launching Shotgun; upon exiting, these flags cause a summary to be printed of the most frequently encountered Selectors and SendSites respectively:

ads@ads-kubuntu:~/rubinius$ shotgun/rubinius -ps 10 -e '0'


Total Selectors: 1168
Top 10, by receives:

name      receives  send sites
at           15694         131
equal?       13074          47
misses       12748           2
hits         12746           2
[]           11842        1180
kind_of?      5865         183
<=            4390          53
size          4293         225
hash          3967          11

Note that this shows the most frequently sent messages, which is not the same as the most frequently executed methods; for that, we need to know the receiver as well. For example, #at is the most frequently sent message, but its sends are actually distributed across three different receiver methods (Time#at, Tuple#at, and Array#at).
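The difference between counting messages and counting methods can be shown with a small tally. The helper name tally_sends and the sample log below are invented for illustration; they are not output from Shotgun.

```ruby
# Count sends two ways: by message name alone, and by receiver class
# plus message name (i.e. by the method actually executed).
def tally_sends(sends)
  by_message = Hash.new(0)
  by_method = Hash.new(0)
  sends.each do |name, klass|
    by_message[name] += 1
    by_method["#{klass}##{name}"] += 1
  end
  [by_message, by_method]
end

# Made-up log of sends: [message name, receiver class]
log = [[:size, Array], [:size, String], [:size, Array], [:hash, String]]
by_message, by_method = tally_sends(log)

by_message[:size]        # => 3
by_method["Array#size"]  # => 2
```

The message :size was sent three times, but no single method ran more than twice.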

In Part 2, we’ll look at the lifecycle of a SendSite, and see how it influences the method dispatch process.

9 Responses to “How Rubinius SendSites Work – Part 1”

  1. Thanks, it’s really nice to see you guys blog about this stuff. One comment – it would be nice to provide links to gitweb for the source of the classes you mention

  2. This is cool stuff. Thanks for the nice writeup.

    SendSites look like an interesting variation on polymorphic inline method caches. The Selector class in particular seems useful. Runtime performance and metrics aside, that sort of information can be very useful for building an IDE with refactoring capabilities. Glad to see that coming to Ruby.

  3. agardiner Says:

    crayz: Links to the source is a great idea, I’ll do that from now on.

    josh: A sendsite lookup function that uses polymorphic inline caching is planned; that’s the great thing about send sites – the flexibility they provide allows different approaches to be taken within the same framework; I plan to cover this in part 2…

    Cheers,

    Adam

  4. Not sure why you’re using the word SendSite for something that is in all the literature called a CallSite. I realize that you want to emphasize the message passing idiom, but even most Smalltalk literature calls these structures CallSites.

    Josh: CallSites allow inline caches, but are not the same thing. Also note that this CallSite implementation is more like a monomorphic inline cache at the moment, rather than a polymorphic one.

  7. agardiner Says:

    Ola: You’re right, and CallSite was certainly considered as the class name. However, I believe one of the reasons Evan settled on the name SendSite is that it is used as an argument to opcodes that all start with send_.
