Archive for April, 2008

Rubinius on GitHub

Posted in News on April 11, 2008 by agardiner

With all the recent press given to Git and GitHub, I thought it worth mentioning that, while the main Rubinius Git repository continues to be hosted at, there is now a post-commit hook that pushes all commits onto the Rubinius GitHub repository.

So you can now take advantage of great GitHub features such as news feeds, effortless forking, and source code browsing against the Rubinius source. Neat!


Shotgun Rewrite Underway

Posted in News, shotgun on April 11, 2008 by agardiner

Some big changes are underway on the Rubinius VM at present: Shotgun is being completely rewritten! This change was brought about by some fairly significant rework required in order to change the behavior of argument evaluation in method calls.

Currently, Rubinius evaluates method arguments from right-to-left, whereas MatzRuby and JRuby evaluate arguments left-to-right. So code like the following:

a = [1,2,3]
foo(a.shift, a.shift, a.shift)

evaluates to foo(1,2,3) in MatzRuby and JRuby, but to foo(3,2,1) in Rubinius.
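
To try this yourself, here is a minimal, self-contained version of the snippet above. The body of foo is not given in the post, so the one below is purely illustrative: it returns its arguments so that the evaluation order becomes visible.

```ruby
# Hypothetical stand-in for foo: returns its arguments in the order received.
def foo(x, y, z)
  [x, y, z]
end

# Each a.shift removes and returns the first remaining element, so the
# values the arguments receive reveal the order they were evaluated in.
def evaluation_order
  a = [1, 2, 3]
  foo(a.shift, a.shift, a.shift)
end

p evaluation_order  # [1, 2, 3] when arguments are evaluated left-to-right
```

An implementation that evaluates arguments right-to-left would print [3, 2, 1] instead.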

While it is generally considered unwise to rely on argument evaluation order (languages such as C leave the order of argument evaluation unspecified, at the discretion of the compiler writer), it turns out there is a significant base of Ruby code that does in fact depend upon this behavior, not least Rails’ ActiveSupport.

As a result, it was decided to rework Rubinius to also evaluate method arguments in left-to-right order. This requires changes to both the compiler and to the Shotgun VM, since the order in which arguments are passed on the stack now needs to be reversed.
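
Why the stack layout changes can be seen in a rough sketch. This is not Shotgun code: push_args and stack_layout are hypothetical helpers, and the assumption is simply that each argument is pushed as soon as it has been evaluated.

```ruby
# Evaluate argument thunks in the given order, pushing each value onto a
# stack together with its argument position (0 = first argument).
def push_args(thunks, order)
  stack = []
  indexes = (0...thunks.size).to_a
  indexes.reverse! if order == :right_to_left
  indexes.each { |i| stack.push([i, thunks[i].call]) }
  stack
end

# Models foo(a.shift, a.shift, a.shift) with one thunk per argument.
def stack_layout(order)
  a = [1, 2, 3]
  thunks = Array.new(3) { -> { a.shift } }
  push_args(thunks, order)
end

p stack_layout(:right_to_left).last  # [0, 3]: first argument on top of the stack
p stack_layout(:left_to_right).last  # [2, 3]: last argument on top of the stack
```

Under this assumption, code that pops arguments off the stack sees them in the opposite order after the change, which is why both the compiler and the VM are affected.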

From C to C++

While making the necessary changes to Shotgun, a tipping point was reached, and Evan decided to bite the bullet and rewrite Shotgun in C++. The reasons he gave for this decision were as follows:

  1. Tests!: Shotgun had evolved from an initial prototype and, unlike the rest of the Rubinius code base, had very little in the way of test coverage. The substantial changes required to the VM internals to accommodate the argument order reversal, combined with the lack of tests to validate those changes, were the single biggest factor leading to the decision. The new VM aims to have 100% test coverage using CxxTest.
  2. Modularity: The opportunity provided by starting fresh, as well as the better code organisation capabilities (classes, namespaces, etc) provided by C++ mean that the new VM will be more modular, which should make it easier to extend and maintain.
  3. Better match between VM and Ruby semantics: The use of an object-oriented language for the VM provides a better semantic match to Ruby. Language support for method chaining, exceptions, and so forth mean that the VM implementation will more closely mirror the semantics of Ruby. This (combined with a cleaner architecture) should make it more understandable to Rubinius contributors, as well as potentially a better target for Garnet/Duby style code generation than C.
  4. STL: Many of the built-in types required by the VM (tuple, array, hash, list, etc) can be built on classes provided by the C++ Standard Template Library.
  5. Stronger typing: C++ is a more strongly typed language than C, and this should help reduce problems such as the “Attempted field access on non-reference” errors that often occur when working on the VM.

The decision to rewrite Shotgun was a big one, and will certainly set back progress a little in the short term. However, the cleaner architecture, test coverage, and other advantages accruing from the change should pay off substantially over time.

Building Rubinius in Ruby

Posted in News on April 2, 2008 by agardiner

I had intended to write a post in the near future about why building Rubinius in Ruby was important – but I see today that Mathieu Martin has beaten me to it, with the first in a series of articles on Rubinius. In Part 1: Rubies all the way down, he makes the case for building Rubinius in Ruby, setting out the pros and dispelling the myths about dynamic languages being too slow to self-host. It’s a great read, and a much more eloquent argument than I could have made.

I’m certainly looking forward to future articles in this series!

How Rubinius SendSites Work – Part 2

Posted in shotgun on April 1, 2008 by agardiner

In part 1 of this post, we introduced the concept of Rubinius SendSites and looked at the Ruby class / C struct used to represent them; in part 2, we will be looking at the lifecycle of SendSite objects, and in particular, how they are used to optimise the method dispatch process.


SendSite Instantiation

The lifecycle of a SendSite starts with instantiation, which happens in one of two ways:

  • when Ruby source is compiled to bytecode, and
  • when an .rbc (Rubinius compiled) file is unmarshaled.


SendSite objects are initially created during the bytecode compilation process; at every point in the compiled bytecode where a method call exists, a SendSite object is created for that message send site (see #send, #send_with_block, #send_with_register, and #send_super in lib/compiler/generator.rb). The resulting SendSite object is stored in the CompiledMethod literals tuple, and the index of this SendSite literal is inserted into the bytecode as the argument to the send_* opcode.

By way of example, take a look at the following simple hello_world.rb script:

puts "hello world"
puts "bye!"

Using the Rubinius debugger, we can examine the bytecode that is generated for this script (and which will be saved in compiled form as hello_world.rbc):

ads@ads-kubuntu:~/rubinius$ shotgun/rubinius -debug hello_world.rb
[Debugger activated]
rbx:debug> d 0 25
   Bytecode instructions [0-25] in compiled method __script__:
           # line 1:       puts "hello world"
  => 0000: push_literal    "hello world"
     0002: string_dup
     0003: push_self
     0004: set_call_flags  1
     0006: send_stack      #<SendSite:0x39 name=puts hits=0 misses=0>, 1
     0009: pop
           # line 2:       puts "bye!"
     0010: push_literal    "bye!"
     0012: string_dup
     0013: push_self
     0014: set_call_flags  1
     0016: send_stack      #<SendSite:0x41 name=puts hits=0 misses=0>, 1
     0019: pop
     0020: push_true
     0021: sret

Here we can see two SendSite objects used on the two calls to the puts method. Notice in particular that each send instruction has its own distinct SendSite object, despite the same selector (puts) being used.
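
The relationship between send sites, the literals tuple, and the bytecode can be sketched with a toy generator. Everything here (ToySendSite, ToyGenerator, compile_hello_world) is a hypothetical stand-in and not the real code in lib/compiler/generator.rb; it only mirrors the behaviour described above.

```ruby
# Toy stand-in for a SendSite: a selector name plus hit/miss counters.
ToySendSite = Struct.new(:name, :hits, :misses)

class ToyGenerator
  attr_reader :literals, :stream

  def initialize
    @literals = []  # plays the role of the CompiledMethod literals tuple
    @stream = []    # plays the role of the bytecode stream
  end

  # Store an object in the literals and return its index.
  def add_literal(obj)
    @literals << obj
    @literals.size - 1
  end

  def push_literal(obj)
    @stream << [:push_literal, add_literal(obj)]
  end

  # Every call site gets a fresh send site; the opcode carries its index.
  def send_stack(name, argc)
    site = ToySendSite.new(name, 0, 0)
    @stream << [:send_stack, add_literal(site), argc]
  end
end

# "Compiles" the two puts calls from hello_world.rb.
def compile_hello_world
  g = ToyGenerator.new
  g.push_literal("hello world")
  g.send_stack(:puts, 1)
  g.push_literal("bye!")
  g.send_stack(:puts, 1)
  g
end
```

Calling compile_hello_world.literals returns four entries: the two strings and two separate ToySendSite objects, both named :puts.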

Unmarshaling .rbc files

When a Ruby source (.rb) file is first compiled, a corresponding .rbc file is also created; this compiled file will be used instead of the .rb file each subsequent time the source file is run or required, provided recompilation is not necessary. So the other place where SendSite objects can be instantiated is in the unmarshal_sendsite function in shotgun/lib/cpu_marshal.c.


Ultimately, whether created via compilation or unmarshaling, a SendSite object is created via a call to send_site_create in shotgun/lib/sendsite.c. (The Ruby code calls SendSite.create, which is implemented as a Rubinius primitive: a Ruby method whose body is implemented in C rather than Ruby.)

The C function send_site_create initializes the SendSite struct, looking up the Selector from the method name, and setting the SendSite lookup function to _cpu_ss_basic, which is found in shotgun/lib/cpu_instructions.c. At this point, our SendSite is ready for action.

SendSites and Method Dispatch

(Note: The following description of the method dispatch process is likely to change in future, although the general principles should remain the same).

When a method call is performed, via the execution of a send_* instruction, the SendSite lookup function is used to determine what actions are taken to dispatch the method. The following code shows how the lookup function is used as a function pointer, and is lifted from cpu_send_message in shotgun/lib/cpu_instructions.c:

  ss = SENDSITE(msg->send_site);
  msg->state = state;
  msg->c = c;
  msg->name = ss->name;
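
The idea of a per-site lookup function can be sketched in Ruby by storing a callable on each send site and routing every send through it. This is only an analogy for the C function pointer: LookupSite, BASIC_LOOKUP, and send_message are hypothetical names, not part of Rubinius.

```ruby
# Slow-path lookup: resolve the method by name on every call (no caching).
BASIC_LOOKUP = lambda do |site, receiver, args|
  receiver.public_send(site.name, *args)
end

class LookupSite
  attr_reader :name
  attr_accessor :lookup  # the replaceable "function pointer"

  def initialize(name)
    @name = name
    @lookup = BASIC_LOOKUP  # every site starts on the basic lookup
  end
end

# Dispatch always goes through whatever lookup the site currently holds.
def send_message(site, receiver, *args)
  site.lookup.call(site, receiver, args)
end

p send_message(LookupSite.new(:upcase), "hi")  # "HI"
```

Because the caller only ever invokes site.lookup, swapping in a different callable changes how that one site dispatches without touching the caller.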

The very first time a SendSite is used, the lookup function in the SendSite struct is set to _cpu_ss_basic as we saw above. This is just one of a number of different functions that can be used by a SendSite as the send site lookup function.


_cpu_ss_basic is the slow path lookup function, which uses no optimisations to dispatch a method. It calls cpu_lookup_method to find the method on the receiver (navigating up the superclass/metaclass hierarchy until it finds the method or falls back to method_missing), determines whether or not the method is handled by method_missing, and then does a very important thing: it patches (modifies) the SendSite lookup function using either cpu_patch_mono or cpu_patch_missing. Next, it attempts to execute the method as a primitive, and finally it calls cpu_perform, which is the function that actually sends the message by creating a new method context and activating it.

Once a send site has been dispatched the first time via this slow path, it will have been patched to use a more optimal lookup function, based upon the type of receiver/method that was found, so that subsequent sends from the same location use an optimised dispatch process represented by one of the specialised lookup functions described next.
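
A minimal sketch of this patch-on-first-use behaviour, written in Ruby rather than C. PatchingSite is a hypothetical class; the way hits and misses are counted is an assumption, and the method_missing and primitive cases are left out for brevity.

```ruby
class PatchingSite
  attr_reader :name, :hits, :misses, :mode

  def initialize(name)
    @name = name
    @hits = 0
    @misses = 0
    patch(:basic)  # a new site starts on the slow path
  end

  def dispatch(receiver, *args)
    @lookup.call(receiver, args)
  end

  private

  # Swap the site's lookup function.
  def patch(mode)
    @mode = mode
    @lookup = method(mode)
  end

  # Slow path: full lookup, then cache the result and patch to :mono.
  def basic(receiver, args)
    @misses += 1
    @cached_class = receiver.class
    @cached_method = @cached_class.instance_method(@name)
    patch(:mono)
    @cached_method.bind(receiver).call(*args)
  end

  # Fast path: reuse the cached method while the receiver class matches.
  def mono(receiver, args)
    return basic(receiver, args) unless receiver.class.equal?(@cached_class)
    @hits += 1
    @cached_method.bind(receiver).call(*args)
  end
end
```

For example, after site = PatchingSite.new(:upcase) and site.dispatch("a"), site.mode is :mono, and a second string receiver is served from the cache.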

Specialised lookup functions

Each of the following SendSite method lookup functions represents an optimised method dispatch process:

  • A lookup function that attempts to use a CompiledMethod cached in the SendSite from the last send at the same send site.
  • A lookup function that attempts to use the primitive whose index is cached in the SendSite from the last send at the same send site. Note that this lookup function is patched into a SendSite by the send_primitive instruction.
  • A lookup function that is used when a call to a native method using FFI is encountered. Note that this lookup function is patched into a SendSite by the primitive nfunc_call, which provides the implementation of the FFI NativeFunction#call method.
  • A lookup function used when a receiver is found to contain no method matching the selector (method name). If the receiver is of the same class as the last send, it adds the method name to the list of arguments on the stack, and then dispatches to the cached method_missing implementation.
  • A lookup function that is used when a SendSite reaches a threshold of misses (currently 10,000). It is the equivalent of the slow path in _cpu_ss_basic, but without any attempt to (re-)patch the lookup function. This ensures the SendSite uses the slow path on each dispatch, which is probably appropriate if the SendSite has missed this many times. This lookup function is patched into a SendSite by _cpu_ss_mono when it hits the threshold.
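
The miss threshold described for the last lookup function can be sketched as follows. ThresholdSite is a hypothetical Ruby class, and the configurable threshold keyword exists only so the behaviour can be observed without 10,000 sends.

```ruby
class ThresholdSite
  MISS_THRESHOLD = 10_000  # the value quoted in the text

  attr_reader :misses, :mode

  def initialize(name, threshold: MISS_THRESHOLD)
    @name = name
    @threshold = threshold
    @misses = 0
    @mode = :basic
  end

  def dispatch(receiver, *args)
    if @mode == :mono
      # Fast path: cached method is valid for this receiver class.
      if receiver.class.equal?(@cached_class)
        return @cached_method.bind(receiver).call(*args)
      end
      @misses += 1
      @mode = :disabled if @misses >= @threshold
    end
    # Slow path: full lookup; re-patch unless the site has been disabled.
    method = receiver.class.instance_method(@name)
    patch(receiver.class, method) unless @mode == :disabled
    method.bind(receiver).call(*args)
  end

  private

  def patch(klass, method)
    @cached_class = klass
    @cached_method = method
    @mode = :mono
  end
end
```

With ThresholdSite.new(:to_s, threshold: 3), sending to an Integer, a String, a Symbol, and a Float in turn leaves the site in :disabled mode, after which it never caches again.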

Lookup function patching

Each time (other than the first) that a SendSite is used to dispatch a method call, a check needs to be performed to determine if the class of the receiver object matches that which is cached in the SendSite. If the class is the same, the optimised path represented by the current lookup function can proceed, and method dispatch is relatively swift. However, when the receiver class is different from the class cached on the SendSite, it is necessary to drop back to the slow approach represented by _cpu_ss_basic, find the appropriate method using the receiver class hierarchy, and then re-patch the lookup function based upon the current receiver object’s class.

Each of the above lookup functions (with the obvious exception of _cpu_ss_disabled) performs this same check at the start of the function, falling back to _cpu_ss_basic if the receiver class does not match. Similarly, we’ve seen above that _cpu_ss_basic handles the patching for _cpu_ss_mono and _cpu_ss_mono_missing, and described how the other special cases are handled.

Flushing the cache

Astute observers might be wondering, “What happens when a method on a class is redefined?” In this situation, any previously executed SendSites would be caching a now superseded CompiledMethod instance, and this would not be detected just by checking the receiver’s class during method dispatch.

The answer is that whenever a method is added or redefined, all SendSites using the method selector are reset to use _cpu_ss_basic. This is achieved using the Selector class, instances of which maintain a list of all SendSites using the given selector. See the function selector_clear_by_name in shotgun/lib/selector.c if you are interested in the details of how this is achieved.

Future Plans

At present, there are only a small number of relatively simple optimised method dispatch functions available for use with SendSites, and all of these lookup functions are monomorphic. In future, however, the flexibility and rich type information gathered by SendSites are likely to be exploited by further reworking of the method dispatch process, and additional lookup function implementations. Some ideas under consideration include:

  • Polymorphic inline caches for use when a selector is found to resolve to different receivers. The most common receivers will be cached, and a quick scan of these receiver types will be performed before dropping back to the slow path if the receiver is not matched. This should improve dispatch performance for messages that commonly resolve to different receivers, such as to_s.
  • Making the dispatch process more modular and flexible to allow chaining, whereby steps in the method dispatch process can be chained together and performed one after another. This will be useful for preventing a proliferation of specialised dispatch functions in combination with other pointcut style functions, such as invoking the debugger or an instrumenting profiler. Instead, these steps could be optionally added/enabled for individual SendSites, providing a finer grain of control.
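
To make the first idea concrete, here is a speculative sketch of a polymorphic inline cache in Ruby. It is not a planned Rubinius design: PolymorphicSite is a hypothetical class, and the cache size and first-in-first-out eviction are assumptions chosen to keep the example short.

```ruby
class PolymorphicSite
  attr_reader :hits, :misses

  def initialize(name, max_entries: 4)
    @name = name
    @max_entries = max_entries
    @entries = []  # [receiver class, cached method] pairs
    @hits = 0
    @misses = 0
  end

  def dispatch(receiver, *args)
    klass = receiver.class
    # Quick scan of the cached receiver classes.
    entry = @entries.find { |cached_class, _| cached_class.equal?(klass) }
    if entry
      @hits += 1
      method = entry[1]
    else
      # Slow path: full lookup, then remember the result for this class.
      @misses += 1
      method = klass.instance_method(@name)
      @entries.shift if @entries.size >= @max_entries  # evict oldest entry
      @entries << [klass, method]
    end
    method.bind(receiver).call(*args)
  end

  def cached_classes
    @entries.map(&:first)
  end
end
```

A to_s site that alternates between Integer and String receivers would miss once per class and then hit on every later send, whereas a monomorphic cache would miss on every change of receiver class.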