How Rubinius SendSites Work – Part 2
In part 1 of this post, we introduced the concept of Rubinius SendSites and looked at the Ruby class / C struct used to represent them; in part 2, we will be looking at the life-cycle of SendSite objects, and in particular, how they are used to optimise the method dispatch process.
The lifecycle of a SendSite starts with instantiation, which happens in one of two ways:
- when Ruby source is compiled to bytecode, and
- when an .rbc (Rubinius compiled) file is unmarshaled.
SendSite objects are initially created during the bytecode compilation process; at all points in the compiled bytecode where a method call exists, a SendSite object is created (using SendSite.new) for that message send site (see #send, #send_with_block, #send_with_register, and #send_super in lib/compiler/generator.rb). The resulting SendSite object is stored in the CompiledMethod literals tuple, and the index of this SendSite literal is inserted into the bytecode as the argument to the send_* opcode.
By way of example, take a look at the following simple hello_world.rb script:
puts "hello world" puts "bye!"
Using the Rubinius debugger, we can examine the bytecode that is generated for this script (and which will be saved in compiled form as hello_world.rbc):
ads@ads-kubuntu:~/rubinius$ shotgun/rubinius -debug hello_world.rb[Debugger activated] rbx:debug> d 0 25 Bytecode instructions [0-25] in compiled method __script__: # line 1: puts "hello world" => 0000: push_literal "hello world" 0002: string_dup 0003: push_self 0004: set_call_flags 1 0006: send_stack #<SendSite:0x39 name=puts hits=0 misses=0>, 1 0009: pop # line 2: puts "bye!" 0010: push_literal "bye!" 0012: string_dup 0013: push_self 0014: set_call_flags 1 0016: send_stack #<SendSite:0x41 name=puts hits=0 misses=0>, 1 0019: pop 0020: push_true 0021: sret
Here we can see two SendSite objects used on the two calls to the puts method. Notice in particular that each send instruction has its own distinct SendSite object, despite the same selector (puts) being used.
Unmarshaling .rbc files
When a Ruby source (.rb) file is first compiled, a corresponding .rbc file is also created; this compiled file will be used instead of the .rb file each subsequent time the source file is run or required, provided recompilation is not necessary. So the other place where SendSite objects can be instantiated is in the unmarshal_sendsite function in shotgun/lib/cpu_marshal.c.
Ultimately, whether created via compilation or unmarshaling, a SendSite object is created via a call to send_site_create in shotgun/lib/sendsite.c; (the SendSite.new Ruby method calls SendSite.create, which is implemented as a Rubinius primitive: a Ruby method whose body is implemented in C code, rather than Ruby).
The C function send_site_create initializes the SendSite struct, looking up the Selector from the method name, and setting the SendSite lookup function to _cpu_ss_basic, which is found in shotgun/lib/cpu_instructions.c. At this point, our SendSite is ready for action.
SendSites and Method Dispatch
(Note: The following description of the method dispatch process is likely to change in future, although the general principles should remain the same).
When a method call is performed, via the execution of a send_* instruction, the SendSite lookup function is used to determine what actions are taken to dispatch the method. The following code shows how the lookup function is used as a function pointer, and is lifted from cpu_send_message in shotgun/lib/cpu_instructions.c:
ss = SENDSITE(msg->send_site); msg->state = state; msg->c = c; msg->name = ss->name; ss->lookup(msg);
The very first time a SendSite is used, the lookup function in the SendSite struct is set to _cpu_ss_basic as we saw above. This is just one of a number of different functions that can be used by a SendSite as the send site lookup function.
This is the slow path lookup function that uses no optimisations to dispatch a method. It calls cpu_lookup_method to find the method on the receiver (navigating up the superclass/metaclass hierarchy until it finds the method or falls back to method_missing), determines if the method is handled by method_missing or not, and then does a very important thing: it patches (modifies) the SendSite lookup function using either cpu_patch_mono or cpu_patch_missing. Next, it attempts to execute the method as a primitive, and then finally, calls cpu_perform, which is the function that actually sends the message by creating a new method context and activating it.
Once a send site has been dispatched the first time via this slow path, it will have been patched to use a more optimal lookup function, based upon the type of receiver/method that was found, so that subsequent sends from the same location use an optimised dispatch process represented by one of the specialised lookup functions described next.
Specialised lookup functions
Each of the following SendSite method lookup functions represents an optimised method dispatch process:
- A lookup function that attempts to use a CompiledMethod cached in the SendSite from the last send at the same send site.
- A lookup function that attempts to use the primitive whose index is cached in the SendSite from the last send at the same send site. Note that this lookup function is patched into a SendSite by the send_primitive instruction.
- A lookup function that is used when a call to a native method using FFI is encountered. Note that this lookup function is patched into a SendSite by the primitive nfunc_call, which is provides the implementation of the FFI NativeFunction#call method.
- A lookup function used when a receiver is found to contain no method matching the selector (method name). If the receiver is of the same class as the last send, it adds the method name to the list of arguments on the stack, and then dispatches to the cached method_missing implementation.
- A lookup function that is used when a SendSite reaches a threshhold of misses (currently 10,000). It is the equivalent of the slow path in _cpu_ss_basic, but without any attempt to (re-)patch the lookup function. This ensures the SendSite uses the slow path on each dispatch, which is probably appropriate if the SendSite has missed this many times. This lookup function is patched into a SendSite by_cpu_ss_mono when it hits the threshhold.
Lookup function patching
Each time (other than the first) that a SendSite is used to dispatch a method call, a check needs to be performed to determine if the class of the receiver object matches that which is cached in the SendSite. If the receiver is the same, the optimised path represented by the current lookup function can proceed, and method dispatch is relatively swift. However, when the receiver class is different than the class cached on the SendSite, it is necessary to drop back to the slow approach represented by _cpu_ss_basic, find the appropriate method using the receiver class hierarchy, and then re-patch the lookup function based upon the current receiver object’s class.
Each of the above lookup functions (with the obvious exception of _cpu_ss_disabled) performs this same check at the start of the function, falling back to _cpu_ss_basic if the receiver class does not match. Similarly, we’ve seen above that _cpu_ss_basic handles the patching for _cpu_ss_mono and _cpu_ss_mono_missing, and described how the other special cases are handled.
Flushing the cache
Astute observers might be wondering “what happens when a method on a class is redefined?”. In this situation, any previously executed SendSites would be caching a now superseded CompiledMethod instance, and this would not be detected just by checking the receiver’s class during method dispatch.
The answer is that whenever a method is added or redefined, all SendSites using the method selector are reset to use _cpu_ss_basic. This is achieved using the Selector class, instances of which maintain a list of all SendSites using the given selector. See the function selector_clear_by_name in shotgun/lib/selector.c if you are interested in the details of how this is achieved.
At present, there are only a small number of relatively simple optimised method dispatch functions available for use with SendSites, and all of these lookup functions are monomorphic. In future, however, the flexibility and rich type information gathered by SendSites are likely to be exploited by further reworking of the method dispatch process, and additional lookup function implementations. Some ideas under consideration include:
- Polymorphic inline caches for use when a selector is found to resolve to different receivers. The most common receivers will be cached, and a quick scan of these receiver types will be performed before dropping back to the slow path if the receiver is not matched. This should improve dispatch performance for messages that commonly resolve to different receivers, such as to_s.
- Making the dispatch process more modular and flexible to allow chaining, whereby steps in the method dispatch process can be chained together and performed one after another. This will be useful for preventing a proliferation of specialised dispatch functions in combination with other pointcut style functions, such as invoking the debugger or an instrumenting profiler. Instead, these steps could be optionally added/enabled for individual SendSites, providing a finer grain of control.