How Rubinius SendSites Work – Part 1

Recently, Rubinius switched from using a simple method dispatch caching mechanism to using a significantly more powerful mechanism known as a SendSite. Over the next couple of posts, we’ll look into the Rubinius SendSite implementation, commencing with an overview of what SendSites are in part 1. In part 2, we’ll examine how SendSites are used in the method dispatch process.

Origins

Before we dive in and start looking at the Rubinius SendSite class, it may be worthwhile reviewing some of the terminology that will be used, and particularly, the origins of the term SendSite.

Ruby and Rubinius draw heavily on the Smalltalk language and its implementation. Perhaps the central concept in Smalltalk is message passing: objects interact by sending messages to receivers and getting back responses. In practice, this is almost identical to saying that code calls a method and gets back a result, which is how the process is described in most languages.

However, there is one key distinction: message sending makes the concept of duck-typing clearer, and encourages a coding style known as “Tell, Don’t Ask”. In Smalltalk and Ruby, we don’t really care what the type of the receiver is; we only care whether or not it can respond to the message we send. Similarly, in the “Tell, Don’t Ask” coding style, we tell receiver objects what we want them to do based on our own internal state; we don’t ask the receiver for details of its state in order to make decisions. The result is that it is easier to replace the receiver object with another object that understands the same message, but perhaps performs the request in a different way.
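To make the duck-typing point concrete, here is a minimal Ruby sketch. The Duck and Robot classes and the greet method are hypothetical names invented for this illustration; they are not part of Rubinius.

```ruby
# Two unrelated classes that both understand the message #speak.
# (Duck, Robot and greet are hypothetical names for illustration only.)
class Duck
  def speak
    "Quack"
  end
end

class Robot
  def speak
    "Beep"
  end
end

# The sender never asks what type the receiver is; it simply sends
# the message and lets the receiver decide how to respond.
def greet(receiver)
  receiver.speak
end

greet(Duck.new)   # => "Quack"
greet(Robot.new)  # => "Beep"
```

Either receiver can be swapped for the other without changing greet, which is exactly why the target of a send cannot be known just by reading the code.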

What is a SendSite?

Ultimately, it is this very capability that complicates method dispatch in Ruby, and makes method caching and other optimisations desirable: if the receiver class can change at any time, exactly which implementation a message should be dispatched to cannot be determined until the moment the message is actually dispatched. However, it is also true that most of the time, a given message send (i.e. send site) in a piece of code will resolve at dispatch time to the same receiving code (i.e. method)…

If we could somehow cache the result of this method resolution process, then the next time we reach the same send site, we could perform a quick check to determine whether the receiving method is still the same as last time, and if so, use an optimised dispatch process. This could range from the simple, such as jumping directly to the method code via a cached reference, to the complex, such as in-lining and JIT-ing frequently called methods into directly executable machine code at the send site.
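The simplest form of this idea can be sketched in plain Ruby. This is only an illustration of caching at a send site, not the Rubinius implementation: the CachingSendSite class is a hypothetical name, it keys the cache on the receiver's class alone, and it makes no attempt to notice methods being redefined or singleton methods on the receiver.

```ruby
# Hypothetical sketch of a per-send-site cache (not the Rubinius code).
class CachingSendSite
  attr_reader :name, :hits, :misses

  def initialize(name)
    @name = name
    @cached_class = nil
    @cached_method = nil
    @hits = 0
    @misses = 0
  end

  def dispatch(receiver, *args)
    klass = receiver.class
    if klass.equal?(@cached_class)
      # Quick check passed: reuse the method found last time.
      @hits += 1
    else
      # Slow path: perform a full lookup and remember the result.
      @misses += 1
      @cached_class = klass
      @cached_method = klass.instance_method(@name)
    end
    @cached_method.bind(receiver).call(*args)
  end
end

site = CachingSendSite.new(:size)
site.dispatch([1, 2, 3])  # miss: first time this site sees an Array
site.dispatch([4, 5])     # hit: the receiver class is still Array
site.dispatch("hello")    # miss: the receiver class changed to String
site.hits                 # => 1
site.misses               # => 2
```

A cache like this only pays off because, as noted above, most send sites keep seeing the same kind of receiver.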

The Rubinius SendSite, therefore, is an object that is created for every send site (method call) in the Rubinius bytecode, and facilitates these kinds of optimisations.

With that bit of background behind us, let’s dive in and see how Rubinius defines a SendSite…

SendSite: Half Ruby class, half C struct

We saw above that a SendSite represents a location in code where a message send (aka method call) takes place. At its most basic, a SendSite needs only record the name of the message that is to be sent; indeed, before SendSites were added, a reference to the Ruby symbol identifying the message name was all that was recorded in the Rubinius bytecode. However, by replacing the symbol of the message name with a data structure, we gain the ability to store additional information at the send site, and in particular, information that can be used to speed up method dispatch.

Rubinius SendSites, like a number of other core classes integral to the Shotgun VM, need to be accessible from both Ruby and C code. Because most use of SendSite occurs in C code within the VM and is performance critical, SendSite instance data is stored in the fields of a C struct:

name:

The name of the message (i.e. method) this send site sends (calls)

cm:

A reference back to the CompiledMethod instance in which the send site exists.

selector:

A reference to the Selector instance corresponding to the message name (see Selectors below)

data1:

The receiver class

data2:

The CompiledMethod corresponding to this message on the receiver class, as encountered on the last dispatch. When a message is dispatched, this is the target object that needs to be located; it contains the bytecode for the method on the receiver.

data3:

The module

data4:

The primitive index if the SendSite resolves to a primitive method

c_data:

A pointer to some C data:

For an FFI send site, holds the address of the FFI stub function to call.
For a primitive send site, holds the address of the primitive function to call.

hits, misses:

Counters for the number of times the SendSite has successfully and unsuccessfully cached the receiver method respectively.

lookup:

A function pointer (functor) to the method lookup function that will be used by the SendSite to perform method dispatch.

Ruby code can access most fields of this C struct via the SendSite#at method, which is implemented as a Rubinius primitive.

The two most important data items in a SendSite are the symbol of the method name to which the SendSite relates, and the address of a lookup function to use to resolve the message name to a method object to which to dispatch. These two fields (and the reference to the containing CompiledMethod) are the only ones populated when a SendSite is initialized, and are sufficient to resolve a message send to a receiver method (albeit, via a slower path).
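As a rough model of the above, the fields can be mirrored in a Ruby Struct. This is a sketch only: the real data lives in a C struct inside the VM, and SendSiteModel, new_send_site and basic_lookup are hypothetical names. Starting the counters at zero is also an assumption made for the illustration.

```ruby
# Hypothetical Ruby model of the C struct fields listed above.
SendSiteModel = Struct.new(
  :name, :cm, :selector, :data1, :data2, :data3, :data4,
  :c_data, :hits, :misses, :lookup
)

# Only the name, the containing CompiledMethod and the lookup function
# are populated at creation; the cache fields (data1..data4) stay nil.
def new_send_site(name, cm, lookup)
  site = SendSiteModel.new
  site.name = name
  site.cm = cm
  site.lookup = lookup
  site.hits = 0    # assumption: counters start at zero
  site.misses = 0
  site
end

# A slow-path lookup that resolves the method from scratch every time.
basic_lookup = lambda do |site, receiver|
  receiver.class.instance_method(site.name)
end

# :containing_method stands in for a real CompiledMethod instance.
site = new_send_site(:upcase, :containing_method, basic_lookup)
target = site.lookup.call(site, "abc")
target.bind("abc").call  # => "ABC"
```

Even with every cache field empty, the name and the lookup function are enough to complete the send, just more slowly.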

Selectors

We saw above that a SendSite contains a reference to a Selector object. A Selector is an object that represents a message (i.e. method) name. It consists of the symbol of a message, plus an array of links back to every SendSite that uses the same message. This can be extremely useful, as it provides the ability to locate all direct uses of a particular message (although indirect uses such as via send and the various evals are not caught).
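The relationship between a Selector and its SendSites can be sketched as follows. SelectorModel, its lookup and associate methods, and SiteStub are hypothetical names chosen for this illustration and should not be read as the actual Rubinius API.

```ruby
# Hypothetical sketch: one selector object per message name, each
# holding links back to the send sites that use that name.
class SelectorModel
  attr_reader :name, :send_sites

  @table = {}

  # Return the unique selector for a message name, creating it on first use.
  def self.lookup(name)
    @table[name] ||= new(name)
  end

  def initialize(name)
    @name = name
    @send_sites = []
  end

  # Record that a send site uses this message name.
  def associate(site)
    @send_sites << site
    self
  end
end

# Stand-in for a send site; location is just a label for the example.
SiteStub = Struct.new(:name, :location)

sites = [
  SiteStub.new(:at, "foo.rb:3"),
  SiteStub.new(:at, "bar.rb:10"),
  SiteStub.new(:size, "foo.rb:4")
]
sites.each { |s| SelectorModel.lookup(s.name).associate(s) }

SelectorModel.lookup(:at).send_sites.map(&:location)
# => ["foo.rb:3", "bar.rb:10"]
```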

Selectors are not used in the method dispatch process; they exist solely to provide a reverse lookup for a given method name to the SendSites that use it. Nonetheless, this is an extremely useful capability; it is used to find and reset SendSites impacted by a redefinition of a method, and is also extremely handy for finding the messages most often used. In fact, it is this capability that lies behind the -ps and -pss flags that can be used when launching Shotgun; upon exiting, these flags cause a summary to be printed of the most frequently encountered Selectors and SendSites respectively:
ads@ads-kubuntu:~/rubinius$ shotgun/rubinius -ps 10 -e '0'
Total Selectors: 1168 Top 10, by receives:

name	receives	send sites
at	15694	131
equal?	13074	47
misses	12748	2
hits	12746	2
[]	11842	1180
kind_of?	5865	183
<=	4390	53
size	4293	225
hash	3967	11

Note that this shows the most frequently sent messages, which is not the same as the most frequently executed methods; for that, we need to know the receiver as well. For example, #at is the most frequently sent message, but those sends are actually distributed across three different receiver methods (Time#at, Tuple#at, and Array#at).
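The difference between counting messages and counting methods can be shown with a small tally. The dispatch log below is invented purely for illustration and does not come from a real Rubinius run.

```ruby
# Hypothetical log of dispatches: [message name, receiver class name].
dispatches = [
  [:at, "Tuple"], [:at, "Array"], [:at, "Tuple"], [:at, "Time"],
  [:size, "Array"], [:size, "Array"], [:size, "Array"]
]

by_message = Hash.new(0)  # what a per-selector count sees
by_method  = Hash.new(0)  # what we would need the receiver for

dispatches.each do |name, receiver|
  by_message[name] += 1
  by_method["#{receiver}##{name}"] += 1
end

by_message[:at]                      # => 4
by_method["Tuple#at"]                # => 2
by_message.max_by { |_, n| n }.first # => :at
by_method.max_by { |_, n| n }.first  # => "Array#size"
```

In this made-up log, #at is the most frequently sent message, yet the single most frequently executed method is Array#size.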

In Part 2, we’ll look at the lifecycle of a SendSite, and see how it influences the method dispatch process.

This entry was posted on March 19, 2008 at 10:39 pm and is filed under shotgun with tags selector, sendsite.

9 Responses to “How Rubinius SendSites Work – Part 1”

crayz Says:
March 20, 2008 at 1:06 am

Thanks, it’s really nice to see you guys blog about this stuff. One comment – it would be nice to provide links to gitweb for the source of the classes you mention

josh Says:
March 20, 2008 at 2:37 am

This is cool stuff. Thanks for the nice writeup.

SendSites look like an interesting variation on polymorphic inline method caches. The Selector class in particular seems useful. Runtime performance and metrics aside, that sort of information can be very useful for building an IDE with refactoring capabilities. Glad to see that coming to Ruby.

agardiner Says:
March 20, 2008 at 9:24 am

crayz: Linking to the source is a great idea, I’ll do that from now on.

josh: A sendsite lookup function that uses polymorphic inline caching is planned; that’s the great thing about send sites – the flexibility they provide allows different approaches to be taken within the same framework; I plan to cover this in part 2…

Cheers,

Adam

Ola Bini Says:
March 20, 2008 at 11:43 pm

Not sure why you’re using the word SendSite for something that is in all the literature called a CallSite. I realize that you want to emphasize the message passing idiom, but even most Smalltalk literature calls these structures CallSites.

Josh: CallSites allow inline caches, but are not the same thing. Also note that this CallSite implementation is more like a monomorphic inline cache at the moment, rather than a polymorphic one.

Sp3w » Blog Archive » Linkage 2008.03.24 Says:
March 25, 2008 at 5:35 am

[…] How Rubinius SendSites Work – Part 1 […]

How Rubinius SendSites Work - Part 2 « Building a Better Ruby Says:
April 1, 2008 at 3:28 pm

[…] Building a Better Ruby Covering the development of Rubinius, a new Ruby implementation and virtual machine « How Rubinius SendSites Work – Part 1 […]

agardiner Says:
April 1, 2008 at 3:35 pm

Ola: You’re right, and CallSite was certainly considered as the class name. However, I believe one of the reasons Evan settled on the name SendSite is that it is used as an argument to opcodes that all start with send_.

Nome do Jogo » Blog Archive » Rails Podcast Brasil - Episódio 10 Says:
August 26, 2008 at 12:18 am

[…] How Rubinius SendSites Work – Part 1 […]

-= Linkage 2008.03.24 =- Says:
January 27, 2009 at 2:37 am

[…] How Rubinius SendSites Work – Part 1 […]


Building a Better Ruby