
How Rubinius SendSites Work – Part 2

Posted in shotgun on April 1, 2008 by agardiner

In part 1 of this post, we introduced the concept of Rubinius SendSites and looked at the Ruby class / C struct used to represent them; in part 2, we will be looking at the life-cycle of SendSite objects, and in particular, how they are used to optimise the method dispatch process.


SendSite Instantiation

The lifecycle of a SendSite starts with instantiation, which happens in one of two ways:

  • when Ruby source is compiled to bytecode, and
  • when an .rbc (Rubinius compiled) file is unmarshaled.


SendSite objects are initially created during the bytecode compilation process: at every point in the compiled bytecode where a method call exists, a SendSite object is created for that message send site (see #send, #send_with_block, #send_with_register, and #send_super in lib/compiler/generator.rb). The resulting SendSite object is stored in the CompiledMethod literals tuple, and the index of this SendSite literal is inserted into the bytecode as the argument to the send_* opcode.

By way of example, take a look at the following simple hello_world.rb script:

puts "hello world"
puts "bye!"

Using the Rubinius debugger, we can examine the bytecode that is generated for this script (and which will be saved in compiled form as hello_world.rbc):

ads@ads-kubuntu:~/rubinius$ shotgun/rubinius -debug hello_world.rb
[Debugger activated]
rbx:debug> d 0 25
   Bytecode instructions [0-25] in compiled method __script__:
           # line 1:       puts "hello world"
  => 0000: push_literal    "hello world"
     0002: string_dup
     0003: push_self
     0004: set_call_flags  1
     0006: send_stack      #<SendSite:0x39 name=puts hits=0 misses=0>, 1
     0009: pop
           # line 2:       puts "bye!"
     0010: push_literal    "bye!"
     0012: string_dup
     0013: push_self
     0014: set_call_flags  1
     0016: send_stack      #<SendSite:0x41 name=puts hits=0 misses=0>, 1
     0019: pop
     0020: push_true
     0021: sret

Here we can see two SendSite objects used on the two calls to the puts method. Notice in particular that each send instruction has its own distinct SendSite object, despite the same selector (puts) being used.
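To make the "one SendSite per call site" point concrete, here is a minimal Ruby sketch of what a generator might do when it emits a send. ToySendSite and ToyGenerator are hypothetical names invented for this illustration; they are not the real classes in lib/compiler/generator.rb, and the toy ignores the other literals (such as strings) that a real CompiledMethod would hold.

```ruby
# Toy model only: ToySendSite and ToyGenerator are hypothetical names,
# not the real Rubinius compiler classes.
ToySendSite = Struct.new(:name, :hits, :misses)

class ToyGenerator
  attr_reader :literals, :stream

  def initialize
    @literals = []   # stands in for the CompiledMethod literals tuple
    @stream   = []   # stands in for the emitted bytecode
  end

  # Emit a send: create a fresh send site, store it as a literal, and use
  # its index in the literals as the operand of the send opcode.
  def send_stack(name, argc)
    @literals << ToySendSite.new(name, 0, 0)
    @stream << [:send_stack, @literals.size - 1, argc]
  end
end

g = ToyGenerator.new
g.send_stack(:puts, 1)   # puts "hello world"
g.send_stack(:puts, 1)   # puts "bye!"

first, second = g.literals
puts first.name == second.name   # true: same selector
puts first.equal?(second)        # false: two distinct send site objects
```

Because each call site owns its own object, anything cached there describes only the receivers seen at that particular location in the code.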

Unmarshaling .rbc files

When a Ruby source (.rb) file is first compiled, a corresponding .rbc file is also created; this compiled file will be used instead of the .rb file each subsequent time the source file is run or required, provided recompilation is not necessary. So the other place where SendSite objects can be instantiated is in the unmarshal_sendsite function in shotgun/lib/cpu_marshal.c.


Ultimately, whether created via compilation or unmarshaling, a SendSite object is created via a call to send_site_create in shotgun/lib/sendsite.c. (Ruby code calls SendSite.create, which is implemented as a Rubinius primitive: a Ruby method whose body is implemented in C code rather than Ruby.)

The C function send_site_create initializes the SendSite struct, looking up the Selector from the method name, and setting the SendSite lookup function to _cpu_ss_basic, which is found in shotgun/lib/cpu_instructions.c. At this point, our SendSite is ready for action.

SendSites and Method Dispatch

(Note: The following description of the method dispatch process is likely to change in future, although the general principles should remain the same).

When a method call is performed, via the execution of a send_* instruction, the SendSite lookup function determines what actions are taken to dispatch the method. The lookup function is held in the SendSite as a function pointer; the following code, lifted from cpu_send_message in shotgun/lib/cpu_instructions.c, shows the message being populated from the SendSite ahead of that call:

  ss = SENDSITE(msg->send_site);
  msg->state = state;
  msg->c = c;
  msg->name = ss->name;

The very first time a SendSite is used, the lookup function in the SendSite struct is set to _cpu_ss_basic as we saw above. This is just one of a number of different functions that can be used by a SendSite as the send site lookup function.


_cpu_ss_basic is the slow path lookup function that uses no optimisations to dispatch a method. It calls cpu_lookup_method to find the method on the receiver (navigating up the superclass/metaclass hierarchy until it finds the method or falls back to method_missing), determines whether or not the method is handled by method_missing, and then does a very important thing: it patches (modifies) the SendSite lookup function using either cpu_patch_mono or cpu_patch_missing. Next, it attempts to execute the method as a primitive, and finally it calls cpu_perform, which is the function that actually sends the message by creating a new method context and activating it.

Once a send site has been dispatched the first time via this slow path, it will have been patched to use a more optimal lookup function, based upon the type of receiver/method that was found, so that subsequent sends from the same location use an optimised dispatch process represented by one of the specialised lookup functions described next.
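The self-patching idea can be sketched in plain Ruby. This is only a model under stated assumptions: ToySite, BASIC and MONO are hypothetical stand-ins for the SendSite struct, _cpu_ss_basic and a patched monomorphic lookup function, and Ruby's own Module#instance_method plays the part of the hierarchy walk done by cpu_lookup_method.

```ruby
# Toy model: ToySite, BASIC and MONO are hypothetical names used only here.
class ToySite
  attr_accessor :name, :klass, :cached, :lookup, :hits, :misses

  def initialize(name)
    @name = name
    @hits = @misses = 0
    @lookup = BASIC            # every site starts on the slow path
  end

  # Dispatch always goes through whatever lookup is currently installed.
  def dispatch(recv, *args)
    @lookup.call(self, recv, args)
  end
end

# Fast path: valid only while the receiver's class matches the cached one.
MONO = lambda do |site, recv, args|
  if recv.class.equal?(site.klass)
    site.hits += 1
    site.cached.bind(recv).call(*args)
  else
    site.misses += 1
    BASIC.call(site, recv, args)   # fall back, which re-patches the site
  end
end

# Slow path: full lookup through the class hierarchy, then patch the site.
BASIC = lambda do |site, recv, args|
  site.klass  = recv.class
  site.cached = recv.class.instance_method(site.name)
  site.lookup = MONO
  site.cached.bind(recv).call(*args)
end

site = ToySite.new(:to_s)
site.dispatch(42)       # slow path; the site is patched to MONO
site.dispatch(7)        # fast path: same receiver class as last time
site.dispatch("str")    # different class: miss, slow path, re-patch
puts site.lookup.equal?(MONO)                    # true
puts "hits=#{site.hits} misses=#{site.misses}"   # hits=1 misses=1
```

Swapping the callable stored on the site, rather than branching on a mode flag, mirrors the function-pointer design described above: the common case pays for a single indirect call plus one class comparison.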

Specialised lookup functions

Each of the following SendSite method lookup functions represents an optimised method dispatch process:

  • A lookup function that attempts to use a CompiledMethod cached in the SendSite from the last send at the same send site.
  • A lookup function that attempts to use the primitive whose index is cached in the SendSite from the last send at the same send site. Note that this lookup function is patched into a SendSite by the send_primitive instruction.
  • A lookup function that is used when a call to a native method using FFI is encountered. Note that this lookup function is patched into a SendSite by the primitive nfunc_call, which provides the implementation of the FFI NativeFunction#call method.
  • A lookup function used when a receiver is found to contain no method matching the selector (method name). If the receiver is of the same class as the last send, it adds the method name to the list of arguments on the stack, and then dispatches to the cached method_missing implementation.
  • A lookup function that is used when a SendSite reaches a threshold of misses (currently 10,000). It is the equivalent of the slow path in _cpu_ss_basic, but without any attempt to (re-)patch the lookup function. This ensures the SendSite uses the slow path on each dispatch, which is probably appropriate if the SendSite has missed this many times. This lookup function is patched into a SendSite by _cpu_ss_mono when it hits the threshold.
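The miss threshold in the last item can be illustrated with a small Ruby model. CountingSite and MISS_LIMIT are hypothetical names, and the limit is set to 3 rather than the 10,000 quoted above purely so the demonstration stays short; the real logic lives in C inside the VM.

```ruby
# Hypothetical sketch: CountingSite and MISS_LIMIT are illustrative names.
MISS_LIMIT = 3   # the article quotes 10,000; 3 keeps the demo short

class CountingSite
  attr_reader :misses, :mode

  def initialize(name)
    @name   = name
    @mode   = :basic   # :basic -> :mono -> :disabled
    @misses = 0
  end

  def dispatch(recv, *args)
    if @mode == :mono
      # Cache hit: the receiver class is the one seen last time.
      return @cached.bind(recv).call(*args) if recv.class.equal?(@klass)
      @misses += 1
      @mode = :disabled if @misses >= MISS_LIMIT
    end
    slow_path(recv, args)
  end

  private

  def slow_path(recv, args)
    meth = recv.class.instance_method(@name)
    unless @mode == :disabled        # a disabled site is never re-patched
      @klass, @cached, @mode = recv.class, meth, :mono
    end
    meth.bind(recv).call(*args)
  end
end

site = CountingSite.new(:to_s)
[1, "a", :b, 2.0, nil].each { |recv| site.dispatch(recv) }
puts site.mode     # disabled
puts site.misses   # 3
```

Once disabled, the site stops spending effort on cache maintenance that, for a receiver mix like this one, would never pay off.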

Lookup function patching

Each time (other than the first) that a SendSite is used to dispatch a method call, a check needs to be performed to determine if the class of the receiver object matches that which is cached in the SendSite. If the receiver is the same, the optimised path represented by the current lookup function can proceed, and method dispatch is relatively swift. However, when the receiver class is different than the class cached on the SendSite, it is necessary to drop back to the slow approach represented by _cpu_ss_basic, find the appropriate method using the receiver class hierarchy, and then re-patch the lookup function based upon the current receiver object’s class.

Each of the above lookup functions (with the obvious exception of _cpu_ss_disabled) performs this same check at the start of the function, falling back to _cpu_ss_basic if the receiver class does not match. Similarly, we’ve seen above that _cpu_ss_basic handles the patching for _cpu_ss_mono and _cpu_ss_mono_missing, and described how the other special cases are handled.

Flushing the cache

Astute observers might be wondering “what happens when a method on a class is redefined?”. In this situation, any previously executed SendSites would be caching a now superseded CompiledMethod instance, and this would not be detected just by checking the receiver’s class during method dispatch.

The answer is that whenever a method is added or redefined, all SendSites using the method selector are reset to use _cpu_ss_basic. This is achieved using the Selector class, instances of which maintain a list of all SendSites using the given selector. See the function selector_clear_by_name in shotgun/lib/selector.c if you are interested in the details of how this is achieved.
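A rough Ruby model shows both the problem and the fix. ToySelector and FlushableSite are hypothetical names, and clear_by_name here only imitates the role the article attributes to selector_clear_by_name; it is not a transcription of shotgun/lib/selector.c.

```ruby
# Hypothetical model: ToySelector and FlushableSite are illustrative names.
class ToySelector
  REGISTRY = Hash.new { |h, name| h[name] = ToySelector.new(name) }

  attr_reader :name, :send_sites

  def initialize(name)
    @name = name
    @send_sites = []   # reverse links to every site using this selector
  end

  def self.lookup(name)
    REGISTRY[name]
  end

  # Reset every send site that uses this selector back to the slow path.
  def self.clear_by_name(name)
    REGISTRY[name].send_sites.each(&:reset)
  end
end

class FlushableSite
  attr_reader :mode

  def initialize(name)
    @name = name
    reset
    ToySelector.lookup(name).send_sites << self   # register with selector
  end

  def reset
    @mode, @klass, @cached = :basic, nil, nil
  end

  def dispatch(recv, *args)
    unless @mode == :mono && recv.class.equal?(@klass)
      @klass  = recv.class
      @cached = recv.class.instance_method(@name)
      @mode   = :mono
    end
    @cached.bind(recv).call(*args)
  end
end

class Greeter
  def greet; "hello"; end
end

site = FlushableSite.new(:greet)
site.dispatch(Greeter.new)             # caches the first definition

class Greeter
  def greet; "hi"; end                 # redefinition, same receiver class
end

stale = site.dispatch(Greeter.new)     # class check passes, old method runs
ToySelector.clear_by_name(:greet)      # the flush performed on redefinition
fresh = site.dispatch(Greeter.new)
puts stale   # hello
puts fresh   # hi
```

The stale result is exactly the failure a class check alone cannot detect; the reverse links held by the selector make the flush proportional to the number of affected send sites rather than to the size of the whole program.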

Future Plans

At present, there are only a small number of relatively simple optimised method dispatch functions available for use with SendSites, and all of these lookup functions are monomorphic. In future, however, the flexibility and rich type information gathered by SendSites are likely to be exploited by further reworking of the method dispatch process, and additional lookup function implementations. Some ideas under consideration include:

  • Polymorphic inline caches for use when a selector is found to resolve to different receivers. The most common receivers will be cached, and a quick scan of these receiver types will be performed before dropping back to the slow path if the receiver is not matched. This should improve dispatch performance for messages that commonly resolve to different receivers, such as to_s.
  • Making the dispatch process more modular and flexible to allow chaining, whereby steps in the method dispatch process can be chained together and performed one after another. This will be useful for preventing a proliferation of specialised dispatch functions in combination with other pointcut style functions, such as invoking the debugger or an instrumenting profiler. Instead, these steps could be optionally added/enabled for individual SendSites, providing a finer grain of control.
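As a sketch of the first idea only, a polymorphic inline cache might look something like the following in Ruby. PolySite and MAX_ENTRIES are hypothetical names, and this toy keeps the most recently seen receiver classes rather than the most common ones, which is a simplification of the plan described above.

```ruby
# Hypothetical sketch of a polymorphic inline cache; names are illustrative.
class PolySite
  MAX_ENTRIES = 4

  attr_reader :hits, :misses

  def initialize(name)
    @name    = name
    @entries = []   # [[klass, unbound_method], ...], most recent first
    @hits = @misses = 0
  end

  def dispatch(recv, *args)
    # Quick scan of the cached receiver classes.
    entry = @entries.find { |klass, _| klass.equal?(recv.class) }
    if entry
      @hits += 1
    else
      @misses += 1
      # Slow path: full lookup, then remember the result for this class.
      entry = [recv.class, recv.class.instance_method(@name)]
      @entries.unshift(entry)
      @entries.pop if @entries.size > MAX_ENTRIES   # evict the oldest entry
    end
    entry[1].bind(recv).call(*args)
  end
end

site = PolySite.new(:to_s)
[1, "two", :three, 4, "five", :six].each { |obj| site.dispatch(obj) }
puts "hits=#{site.hits} misses=#{site.misses}"   # hits=3 misses=3
```

With the same sequence of receivers, a monomorphic cache would miss on every send after the first, since no two consecutive receivers share a class.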

How Rubinius SendSites Work – Part 1

Posted in shotgun on March 19, 2008 by agardiner

Recently, Rubinius switched from using a simple method dispatch caching mechanism to using a significantly more powerful mechanism known as a SendSite. Over the next couple of posts, we’ll look into the Rubinius SendSite implementation, commencing with an overview of what SendSites are in part 1. In part 2, we’ll examine how SendSites are used in the method dispatch process.



Before we dive in and start looking at the Rubinius SendSite class, it may be worthwhile reviewing some of the terminology that will be used, and particularly, the origins of the term SendSite.

Ruby and Rubinius draw heavily on the Smalltalk language and implementation; within Smalltalk, perhaps the central concept is the idea of message passing, whereby objects interact via the sending of messages; we talk of objects sending messages to receivers and getting back responses. In practice, this is almost identical to saying that code calls a method and gets back a result, which is how the process is commonly described in most languages.

However, there is one key distinction: message sending makes clearer the concept of duck-typing, and encourages a coding style known as “Tell, Don’t Ask”. In Smalltalk and Ruby, we don’t really care what the type of the receiver is; we only care whether or not it can respond to the message we send. Similarly, in the “Tell, Don’t Ask” coding style, we tell receiver objects what we want them to do based on our internal state, we don’t ask the receiver for details of their state in order to make decisions. The result is that it is easier to replace the receiver object with another object that understands the same message, but perhaps performs the request in a different way.

What is a SendSite?

Ultimately, it is this very capability that complicates method dispatch in Ruby, and makes the use of method caching and other optimisations desirable: if the receiver class can change at any time, resolving exactly which implementation of the message to dispatch to cannot be determined definitively until the actual point-in-time when the message is dispatched. However, it is also true that most times, a given message send (i.e. send site) in a piece of code will resolve at dispatch time to the same receiving code (i.e. method)…

If we could therefore somehow cache the result of this method resolution process, the next time we reach the same send site, we can perform a quick check to determine if the receiving method is still the same as last time, and if so, use an optimised dispatch process. This could range from the simple, such as jumping directly to the method code via a cached reference, to the complex, such as in-lining and JIT-ing frequently called methods into directly executable machine code at the send site.

The Rubinius SendSite, therefore, is an object that is created for every send site (method call) in the Rubinius bytecode, and facilitates these kinds of optimisations.

With that bit of background behind us, let’s dive in and see how Rubinius defines a SendSite…

SendSite: Half Ruby class, half C struct

We saw above that a SendSite represents a location in code where a message send (aka method call) takes place. At its most basic, a SendSite needs only record the name of the message that is to be sent; indeed, before SendSites were added, a reference to the Ruby symbol identifying the message name was all that was recorded in the Rubinius bytecode. However, by replacing the symbol of the message name with a data structure, we gain the ability to store additional information at the send site, and in particular, information that can be used to speed up method dispatch.

Rubinius SendSites, like a number of other core classes integral to the Shotgun VM, need to be accessible from both Ruby and C code. As most of the use of SendSite is in C code in the VM, and is performance critical, SendSite instance data is stored in the fields of a C struct:

  • The name of the message (i.e. method) this send site sends (calls).
  • A reference back to the CompiledMethod instance in which the send site exists.
  • A reference to the Selector instance corresponding to the message name (see Selectors below).
  • The receiver class.
  • The CompiledMethod corresponding to this message on the receiver class, as encountered on the last dispatch. When a message is dispatched, this is the target object that needs to be located; it contains the bytecode for the method on the receiver.
  • The module.
  • The primitive index if the SendSite resolves to a primitive method.
  • A pointer to some C data: for an FFI send site, it holds the address of the FFI stub function to call; for a primitive send site, it holds the address of the primitive function to call.
  • hits, misses: Counters for the number of times the SendSite has successfully and unsuccessfully cached the receiver method respectively.
  • A function pointer (functor) to the method lookup function that will be used by the SendSite to perform method dispatch.

Ruby code can access most fields of this C struct via the SendSite#at method, which is implemented as a Rubinius primitive.

The two most important data items in a SendSite are the symbol of the method name to which the SendSite relates, and the address of a lookup function to use to resolve the message name to a method object to which to dispatch. These two fields (and the reference to the containing CompiledMethod) are the only ones populated when a SendSite is initialized, and are sufficient to resolve a message send to a receiver method (albeit, via a slower path).


Selectors

We saw above that a SendSite contains a reference to a Selector object. A Selector is an object that represents a message (i.e. method) name. It consists of the symbol of a message, plus an array of links back to every SendSite that uses the same message. This can be extremely useful, as it provides the ability to locate all direct uses of a particular message (although indirect uses such as via send and the various evals are not caught).

Selectors are not used in the method dispatch process; they exist solely to provide a reverse lookup for a given method name to the SendSites that use it. Nonetheless, this is an extremely useful capability; it is used to find and reset SendSites impacted by a redefinition of a method, and is also extremely handy for finding the messages most often used. In fact, it is this capability that lies behind the -ps and -pss flags that can be used when launching Shotgun; upon exiting, these flags cause a summary to be printed of the most frequently encountered Selectors and SendSites respectively:

ads@ads-kubuntu:~/rubinius$ shotgun/rubinius -ps 10 -e '0'

Total Selectors: 1168
Top 10, by receives:

name       receives  send sites
at            15694         131
equal?        13074          47
misses        12748           2
hits          12746           2
[]            11842        1180
kind_of?       5865         183
<=             4390          53
size           4293         225
hash           3967          11

Note that this shows the most frequently sent messages, which is not the same as the most frequently executed methods; for that, we need to know the receiver as well. For example, #at is the most frequently sent message, but its sends are actually distributed across three different receiver methods (Time#at, Tuple#at, and Array#at).
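The difference between counting messages and counting methods can be shown with a small tally in plain Ruby. The list of sends below is a handful of made-up calls chosen for illustration, not data from Shotgun, and the tallying code is a hypothetical sketch rather than how the -ps flag is implemented.

```ruby
# Illustrative only: a few invented sends, each a [message, receiver] pair.
calls = [[:size, [1, 2]], [:size, "abc"], [:size, [3]], [:upcase, "x"]]

by_selector = Hash.new(0)   # counts per message name
by_method   = Hash.new(0)   # counts per resolved method (owner + name)

calls.each do |name, recv|
  by_selector[name] += 1
  owner = recv.method(name).owner   # the class or module defining the method
  by_method[format("%s#%s", owner, name)] += 1
end

puts by_selector[:size]          # 3 sends of the message size
puts by_method["Array#size"]     # 2 of them ran Array#size
puts by_method["String#size"]    # 1 of them ran String#size
```

A per-selector count needs only the message name, which is why it can be gathered at the Selector level; the per-method breakdown requires resolving each send against its receiver.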

In Part 2, we’ll look at the lifecycle of a SendSite, and see how it influences the method dispatch process.