Archive for the shotgun Category

Shotgun Rewrite Underway

Posted in News, shotgun on April 11, 2008 by agardiner

Some big changes are underway on the Rubinius VM at present: Shotgun is being completely rewritten! This change was brought about by some fairly significant rework required in order to change the behavior of argument evaluation in method calls.

Currently, Rubinius evaluates method arguments from right-to-left, whereas MatzRuby and JRuby evaluate arguments left-to-right. So code like the following:

a = [1,2,3]
foo(a.shift, a.shift, a.shift)

evaluates to foo(1,2,3) in MatzRuby and JRuby, but to foo(3,2,1) in Rubinius.
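A minimal, runnable version of the example above. The post does not define foo, so the helper below is a hypothetical stand-in that simply returns the arguments it received, which makes the evaluation order visible.

```ruby
# Hypothetical helper, not from the original post: returns its arguments
# as an array so we can see the order in which they were evaluated.
def foo(*args)
  args
end

a = [1, 2, 3]
result = foo(a.shift, a.shift, a.shift)

# With left-to-right evaluation each a.shift runs in source order,
# so the arguments arrive as 1, 2, 3.
p result
```

On an implementation that evaluates arguments left-to-right this prints [1, 2, 3]; an implementation that evaluated them right-to-left would pass 3, 2, 1 instead.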

While it is generally considered unwise to rely on argument evaluation order (languages such as C leave the order of argument evaluation unspecified, at the discretion of the compiler writer), it turns out there is a significant base of Ruby code that does in fact depend upon this behavior, not the least being Rails’ ActiveSupport.

As a result, it was decided to rework Rubinius to also evaluate method arguments in left-to-right order. This requires changes to both the compiler and to the Shotgun VM, since the order in which arguments are passed on the stack now needs to be reversed.

From C to C++

While making the necessary changes to Shotgun, a tipping point was reached, and Evan decided to bite the bullet and re-write Shotgun in C++. The reasons he gave for this decision were as follows:

  1. Tests!: Shotgun had evolved from an initial prototype, and unlike the rest of the Rubinius code base, had very little in the way of test coverage. The substantial changes required to the VM internals to accommodate the argument order reversal, and the lack of tests to validate the changes, was the single biggest factor leading to the decision. The new VM aims to have 100% test coverage using CxxTest.
  2. Modularity: The opportunity provided by starting fresh, as well as the better code organisation capabilities (classes, namespaces, etc) provided by C++ mean that the new VM will be more modular, which should make it easier to extend and maintain.
  3. Better match between VM and Ruby semantics: The use of an object-oriented language for the VM provides a better semantic match to Ruby. Language support for method chaining, exceptions, and so forth mean that the VM implementation will more closely mirror the semantics of Ruby. This (combined with a cleaner architecture) should make it more understandable to Rubinius contributors, as well as potentially a better target for Garnet/Duby style code generation than C.
  4. STL: Many of the built-in types required by the VM (tuple, array, hash, list, etc) can be built on classes provided by the C++ Standard Template Library.
  5. Stronger typing: C++ is a more strongly typed language than C, and this should help reduce problems such as the “Attempted field access on non-reference” errors that often occur when working on the VM.

The decision to rewrite Shotgun was a big one, and will certainly set back progress a little in the short term. However, the cleaner architecture, test coverage, and other advantages accruing from the change should pay off substantially over time.

How Rubinius SendSites Work – Part 2

Posted in shotgun on April 1, 2008 by agardiner

In part 1 of this post, we introduced the concept of Rubinius SendSites and looked at the Ruby class / C struct used to represent them; in part 2, we will be looking at the life-cycle of SendSite objects, and in particular, how they are used to optimise the method dispatch process.


SendSite Instantiation

The lifecycle of a SendSite starts with instantiation, which happens in one of two ways:

  • when Ruby source is compiled to bytecode, and
  • when an .rbc (Rubinius compiled) file is unmarshaled.


SendSite objects are initially created during the bytecode compilation process: at every point in the compiled bytecode where a method call exists, a SendSite object is created for that message send site (see #send, #send_with_block, #send_with_register, and #send_super in lib/compiler/generator.rb). The resulting SendSite object is stored in the CompiledMethod literals tuple, and the index of this SendSite literal is inserted into the bytecode as the argument to the send_* opcode.

By way of example, take a look at the following simple hello_world.rb script:

puts "hello world"
puts "bye!"

Using the Rubinius debugger, we can examine the bytecode that is generated for this script (and which will be saved in compiled form as hello_world.rbc):

ads@ads-kubuntu:~/rubinius$ shotgun/rubinius -debug hello_world.rb
[Debugger activated]
rbx:debug> d 0 25
   Bytecode instructions [0-25] in compiled method __script__:
           # line 1:       puts "hello world"
  => 0000: push_literal    "hello world"
     0002: string_dup
     0003: push_self
     0004: set_call_flags  1
     0006: send_stack      #<SendSite:0x39 name=puts hits=0 misses=0>, 1
     0009: pop
           # line 2:       puts "bye!"
     0010: push_literal    "bye!"
     0012: string_dup
     0013: push_self
     0014: set_call_flags  1
     0016: send_stack      #<SendSite:0x41 name=puts hits=0 misses=0>, 1
     0019: pop
     0020: push_true
     0021: sret

Here we can see two SendSite objects used on the two calls to the puts method. Notice in particular that each send instruction has its own distinct SendSite object, despite the same selector (puts) being used.
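The compile-time behaviour described above can be sketched with a toy generator. This is only an illustration under stated assumptions: ToySendSite and ToyGenerator are hypothetical names, and the real logic in lib/compiler/generator.rb is considerably more involved.

```ruby
# Toy model only: ToySendSite and ToyGenerator are hypothetical names,
# not the classes found in lib/compiler/generator.rb.
ToySendSite = Struct.new(:name, :hits, :misses)

class ToyGenerator
  attr_reader :literals, :stream

  def initialize
    @literals = []   # stands in for the CompiledMethod literals tuple
    @stream   = []   # stands in for the emitted instruction stream
  end

  # Store a literal and return its index in the literals list.
  def add_literal(obj)
    @literals << obj
    @literals.size - 1
  end

  def push_literal(obj)
    @stream << [:push_literal, add_literal(obj)]
  end

  # Every call site gets a brand-new send site, even for a repeated name;
  # the instruction only carries the index of that literal.
  def send_stack(name, argc)
    site = ToySendSite.new(name, 0, 0)
    @stream << [:send_stack, add_literal(site), argc]
  end
end

g = ToyGenerator.new
g.push_literal("hello world")
g.send_stack(:puts, 1)
g.push_literal("bye!")
g.send_stack(:puts, 1)

sites = g.literals.grep(ToySendSite)
puts sites.size                    # number of send sites created
puts sites[0].equal?(sites[1])     # are the two puts sites the same object?
```

The two calls to puts yield two distinct send site objects, each referenced from the instruction stream by its literal index, which mirrors what the disassembly shows.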

Unmarshaling .rbc files

When a Ruby source (.rb) file is first compiled, a corresponding .rbc file is also created; this compiled file will be used instead of the .rb file each subsequent time the source file is run or required, provided recompilation is not necessary. So the other place where SendSite objects can be instantiated is in the unmarshal_sendsite function in shotgun/lib/cpu_marshal.c.


Ultimately, whether created via compilation or unmarshaling, a SendSite object is created via a call to send_site_create in shotgun/lib/sendsite.c. (On the Ruby side, the call goes through SendSite.create, which is implemented as a Rubinius primitive: a Ruby method whose body is implemented in C code, rather than Ruby.)

The C function send_site_create initializes the SendSite struct, looking up the Selector from the method name, and setting the SendSite lookup function to _cpu_ss_basic, which is found in shotgun/lib/cpu_instructions.c. At this point, our SendSite is ready for action.

SendSites and Method Dispatch

(Note: The following description of the method dispatch process is likely to change in future, although the general principles should remain the same).

When a method call is performed, via the execution of a send_* instruction, the SendSite lookup function is used to determine what actions are taken to dispatch the method. The following code shows how the lookup function is used as a function pointer, and is lifted from cpu_send_message in shotgun/lib/cpu_instructions.c:

  ss = SENDSITE(msg->send_site);
  msg->state = state;
  msg->c = c;
  msg->name = ss->name;

The very first time a SendSite is used, the lookup function in the SendSite struct is set to _cpu_ss_basic as we saw above. This is just one of a number of different functions that can be used by a SendSite as the send site lookup function.


_cpu_ss_basic is the slow-path lookup function, which uses no optimisations to dispatch a method. It calls cpu_lookup_method to find the method on the receiver (navigating up the superclass/metaclass hierarchy until it finds the method or falls back to method_missing), determines whether or not the method is handled by method_missing, and then does a very important thing: it patches (modifies) the SendSite lookup function using either cpu_patch_mono or cpu_patch_missing. Next, it attempts to execute the method as a primitive, and then finally, calls cpu_perform, which is the function that actually sends the message by creating a new method context and activating it.

Once a send site has been dispatched the first time via this slow path, it will have been patched to use a more optimal lookup function, based upon the type of receiver/method that was found, so that subsequent sends from the same location use an optimised dispatch process represented by one of the specialised lookup functions described next.
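The self-patching idea can be sketched in a few lines of Ruby. This is a toy model under stated assumptions, not the Shotgun implementation: ToySendSite, lookup_basic and mono_lookup are hypothetical names that merely play the roles of the SendSite struct, _cpu_ss_basic and the monomorphic fast path, and method_missing is not modelled at all.

```ruby
# Toy model only: all names here are hypothetical stand-ins.
class ToySendSite
  attr_reader :hits, :misses

  def initialize(name)
    @name = name
    @hits = 0
    @misses = 0
    @lookup = method(:lookup_basic)   # every site starts on the slow path
  end

  # Dispatch always goes through whichever lookup function is installed.
  def dispatch(receiver, *args)
    @lookup.call(receiver, args)
  end

  private

  # Slow path: full method lookup, then patch in a cached fast path.
  def lookup_basic(receiver, args)
    @misses += 1
    found = receiver.class.instance_method(@name)
    @lookup = mono_lookup(receiver.class, found)
    found.bind(receiver).call(*args)
  end

  # Fast path: reuse the cached method while the receiver class matches,
  # otherwise drop back to the slow path (which re-patches the site).
  def mono_lookup(klass, found)
    lambda do |receiver, args|
      if receiver.class.equal?(klass)
        @hits += 1
        found.bind(receiver).call(*args)
      else
        lookup_basic(receiver, args)
      end
    end
  end
end

site = ToySendSite.new(:to_s)
results = [1, 2, :a].map { |obj| site.dispatch(obj) }
p results
p [site.hits, site.misses]
```

In this sketch the first send and the send to a Symbol both take the slow path, while the second Integer send is served from the cache. Whether the real VM counts the very first dispatch as a miss is not something the toy model claims to reproduce.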

Specialised lookup functions

Each of the following SendSite method lookup functions represents an optimised method dispatch process:

  • A lookup function that attempts to use a CompiledMethod cached in the SendSite from the last send at the same send site.
  • A lookup function that attempts to use the primitive whose index is cached in the SendSite from the last send at the same send site. Note that this lookup function is patched into a SendSite by the send_primitive instruction.
  • A lookup function that is used when a call to a native method using FFI is encountered. Note that this lookup function is patched into a SendSite by the primitive nfunc_call, which provides the implementation of the FFI NativeFunction#call method.
  • A lookup function used when a receiver is found to contain no method matching the selector (method name). If the receiver is of the same class as the last send, it adds the method name to the list of arguments on the stack, and then dispatches to the cached method_missing implementation.
  • A lookup function that is used when a SendSite reaches a threshold of misses (currently 10,000). It is the equivalent of the slow path in _cpu_ss_basic, but without any attempt to (re-)patch the lookup function. This ensures the SendSite uses the slow path on each dispatch, which is probably appropriate if the SendSite has missed this many times. This lookup function is patched into a SendSite by _cpu_ss_mono when it hits the threshold.

Lookup function patching

Each time (other than the first) that a SendSite is used to dispatch a method call, a check needs to be performed to determine if the class of the receiver object matches that which is cached in the SendSite. If the receiver class is the same, the optimised path represented by the current lookup function can proceed, and method dispatch is relatively swift. However, when the receiver class is different from the class cached on the SendSite, it is necessary to drop back to the slow approach represented by _cpu_ss_basic, find the appropriate method using the receiver class hierarchy, and then re-patch the lookup function based upon the current receiver object’s class.

Each of the above lookup functions (with the obvious exception of _cpu_ss_disabled) performs this same check at the start of the function, falling back to _cpu_ss_basic if the receiver class does not match. Similarly, we’ve seen above that _cpu_ss_basic handles the patching for _cpu_ss_mono and _cpu_ss_mono_missing, and described how the other special cases are handled.

Flushing the cache

Astute observers might be wondering “what happens when a method on a class is redefined?”. In this situation, any previously executed SendSites would be caching a now superseded CompiledMethod instance, and this would not be detected just by checking the receiver’s class during method dispatch.

The answer is that whenever a method is added or redefined, all SendSites using the method selector are reset to use _cpu_ss_basic. This is achieved using the Selector class, instances of which maintain a list of all SendSites using the given selector. See the function selector_clear_by_name in shotgun/lib/selector.c if you are interested in the details of how this is achieved.
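A rough sketch of this reverse lookup follows. It is a toy model under stated assumptions: ToySelector and ToySendSite are hypothetical names, the lookup function is reduced to a symbol, and the real work is done in C by selector_clear_by_name.

```ruby
# Toy model only: ToySelector and ToySendSite are hypothetical stand-ins.
class ToySendSite
  attr_reader :name
  attr_accessor :lookup

  def initialize(name, selector)
    @name = name
    @lookup = :basic            # symbol standing in for the lookup function
    selector.associate(self)    # register with the selector for this name
  end
end

class ToySelector
  attr_reader :name, :send_sites

  def initialize(name)
    @name = name
    @send_sites = []
  end

  def associate(site)
    @send_sites << site
  end

  # Called when a method with this name is added or redefined:
  # every send site using the name goes back to the slow path.
  def clear
    @send_sites.each { |site| site.lookup = :basic }
  end
end

# One selector per message name, created on demand.
selectors = Hash.new { |h, name| h[name] = ToySelector.new(name) }

s1 = ToySendSite.new(:puts, selectors[:puts])
s2 = ToySendSite.new(:puts, selectors[:puts])
s3 = ToySendSite.new(:size, selectors[:size])
[s1, s2, s3].each { |s| s.lookup = :mono }   # pretend each site was patched

selectors[:puts].clear   # e.g. a method named puts was redefined
p [s1.lookup, s2.lookup, s3.lookup]
```

Only the send sites registered under the affected selector are reset; the site for size keeps its cached lookup.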

Future Plans

At present, there are only a small number of relatively simple optimised method dispatch functions available for use with SendSites, and all of these lookup functions are monomorphic. In future, however, the flexibility and rich type information gathered by SendSites are likely to be exploited by further reworking of the method dispatch process, and additional lookup function implementations. Some ideas under consideration include:

  • Polymorphic inline caches for use when a selector is found to resolve to different receivers. The most common receivers will be cached, and a quick scan of these receiver types will be performed before dropping back to the slow path if the receiver is not matched. This should improve dispatch performance for messages that commonly resolve to different receivers, such as to_s.
  • Making the dispatch process more modular and flexible to allow chaining, whereby steps in the method dispatch process can be chained together and performed one after another. This will be useful for preventing a proliferation of specialised dispatch functions in combination with other pointcut style functions, such as invoking the debugger or an instrumenting profiler. Instead, these steps could be optionally added/enabled for individual SendSites, providing a finer grain of control.
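To make the first idea more concrete, here is a hypothetical sketch of a polymorphic inline cache. It is not Rubinius code: the class name, the cache size of four entries and the eviction policy are all assumptions chosen purely for illustration.

```ruby
# Toy model only: a hypothetical polymorphic inline cache.
class ToyPolymorphicSite
  MAX_ENTRIES = 4   # assumed cache size, chosen arbitrarily

  attr_reader :hits, :misses

  def initialize(name)
    @name = name
    @cache = []     # list of [receiver class, unbound method] pairs
    @hits = 0
    @misses = 0
  end

  def dispatch(receiver, *args)
    klass = receiver.class
    # Quick scan of the cached receiver classes.
    entry = @cache.find { |cached_class, _| cached_class.equal?(klass) }
    if entry
      @hits += 1
    else
      # Slow path: full lookup, then remember the result.
      @misses += 1
      entry = [klass, klass.instance_method(@name)]
      @cache.shift if @cache.size >= MAX_ENTRIES   # evict the oldest entry
      @cache << entry
    end
    entry[1].bind(receiver).call(*args)
  end
end

site = ToyPolymorphicSite.new(:to_s)
results = [1, :a, "s", 2, :b].map { |obj| site.dispatch(obj) }
p results
p [site.hits, site.misses]
```

Unlike a monomorphic cache, alternating between Integer and Symbol receivers here does not force a re-patch on every change of receiver class.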

How Rubinius SendSites Work – Part 1

Posted in shotgun on March 19, 2008 by agardiner

Recently, Rubinius switched from using a simple method dispatch caching mechanism to using a significantly more powerful mechanism known as a SendSite. Over the next couple of posts, we’ll look into the Rubinius SendSite implementation, commencing with an overview of what SendSites are in part 1. In part 2, we’ll examine how SendSites are used in the method dispatch process.



Before we dive in and start looking at the Rubinius SendSite class, it may be worthwhile reviewing some of the terminology that will be used, and particularly, the origins of the term SendSite.

Ruby and Rubinius draw heavily on the Smalltalk language and implementation; within Smalltalk, perhaps the central concept is the idea of message passing, whereby objects interact via the sending of messages; we talk of objects sending messages to receivers and getting back responses. In practice, this is almost identical to saying that code calls a method and gets back a result, which is how the process is commonly described in most languages.

However, there is one key distinction: message sending makes clearer the concept of duck-typing, and encourages a coding style known as “Tell, Don’t Ask”. In Smalltalk and Ruby, we don’t really care what the type of the receiver is; we only care whether or not it can respond to the message we send. Similarly, in the “Tell, Don’t Ask” coding style, we tell receiver objects what we want them to do based on our internal state, we don’t ask the receiver for details of their state in order to make decisions. The result is that it is easier to replace the receiver object with another object that understands the same message, but perhaps performs the request in a different way.
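A small illustration of the point, using hypothetical classes invented for this example: the sender never checks the type of the receiver, it only sends a message that both receivers understand.

```ruby
# Hypothetical classes, invented purely for illustration.
class ConsoleLog
  def record(message)
    "console: #{message}"
  end
end

class NullLog
  def record(_message)
    "discarded"
  end
end

# The sender tells the receiver what to do; it never inspects the
# receiver's class or internal state.
def finish_job(log)
  log.record("job done")
end

p finish_job(ConsoleLog.new)
p finish_job(NullLog.new)
```

Either object can be swapped in without changing finish_job, because all that matters is that the receiver responds to record.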

What is a SendSite?

Ultimately, it is this very capability that complicates method dispatch in Ruby, and makes the use of method caching and other optimisations desirable: if the receiver class can change at any time, resolving exactly which implementation of the message to dispatch to cannot be determined definitively until the actual point-in-time when the message is dispatched. However, it is also true that most times, a given message send (i.e. send site) in a piece of code will resolve at dispatch time to the same receiving code (i.e. method)…

If we could therefore somehow cache the result of this method resolution process, the next time we reach the same send site, we can perform a quick check to determine if the receiving method is still the same as last time, and if so, use an optimised dispatch process. This could range from the simple, such as jumping directly to the method code via a cached reference, to the complex, such as in-lining and JIT-ing frequently called methods into directly executable machine code at the send site.

The Rubinius SendSite, therefore, is an object that is created for every send site (method call) in the Rubinius bytecode, and facilitates these kinds of optimisations.

With that bit of background behind us, let’s dive in and see how Rubinius defines a SendSite…

SendSite: Half Ruby class, half C struct

We saw above that a SendSite represents a location in code where a message send (aka method call) takes place. At its most basic, a SendSite needs only record the name of the message that is to be sent; indeed, before SendSites were added, a reference to the Ruby symbol identifying the message name was all that was recorded in the Rubinius bytecode. However, by replacing the symbol of the message name with a data structure, we gain the ability to store additional information at the send site, and in particular, information that can be used to speed up method dispatch.

Rubinius SendSites, like a number of other core classes integral to the Shotgun VM, need to be accessible from both Ruby and C code. As most of the use of SendSite is in C code in the VM, and is performance critical, SendSite instance data is stored in the fields of a C struct:

  • The name of the message (i.e. method) this send site sends (calls).
  • A reference back to the CompiledMethod instance in which the send site exists.
  • A reference to the Selector instance corresponding to the message name (see Selectors below).
  • The receiver class.
  • The CompiledMethod corresponding to this message on the receiver class, as encountered on the last dispatch. When a message is dispatched, this is the target object that needs to be located; it contains the bytecode for the method on the receiver.
  • The module.
  • The primitive index, if the SendSite resolves to a primitive method.
  • A pointer to some C data: for an FFI send site, this holds the address of the FFI stub function to call; for a primitive send site, it holds the address of the primitive function to call.
  • hits, misses: counters for the number of times the SendSite has successfully and unsuccessfully cached the receiver method respectively.
  • A function pointer (functor) to the method lookup function that will be used by the SendSite to perform method dispatch.

Ruby code can access most fields of this C struct via the SendSite#at method, which is implemented as a Rubinius primitive.

The two most important data items in a SendSite are the symbol of the method name to which the SendSite relates, and the address of a lookup function to use to resolve the message name to a method object to which to dispatch. These two fields (and the reference to the containing CompiledMethod) are the only ones populated when a SendSite is initialized, and are sufficient to resolve a message send to a receiver method (albeit, via a slower path).


Selectors

We saw above that a SendSite contains a reference to a Selector object. A Selector is an object that represents a message (i.e. method) name. It consists of the symbol of a message, plus an array of links back to every SendSite that uses the same message. This can be extremely useful, as it provides the ability to locate all direct uses of a particular message (although indirect uses such as via send and the various evals are not caught).

Selectors are not used in the method dispatch process; they exist solely to provide a reverse lookup for a given method name to the SendSites that use it. Nonetheless, this is an extremely useful capability; it is used to find and reset SendSites impacted by a redefinition of a method, and is also extremely handy for finding the messages most often used. In fact, it is this capability that lies behind the -ps and -pss flags that can be used when launching Shotgun; upon exiting, these flags cause a summary to be printed of the most frequently encountered Selectors and SendSites respectively:

ads@ads-kubuntu:~/rubinius$ shotgun/rubinius -ps 10 -e '0'

Total Selectors: 1168
Top 10, by receives:

name       receives  send sites
at            15694         131
equal?        13074          47
misses        12748           2
hits          12746           2
[]            11842        1180
kind_of?       5865         183
<=             4390          53
size           4293         225
hash           3967          11

Note that this shows the most frequently sent messages, which is not the same as the most frequently executed methods; for that, we need to know the receiver as well. For example, #at is the most frequently sent message, but is actually distributed across three different receiver methods (Time#at, Tuple#at, and Array#at).

In Part 2, we’ll look at the lifecycle of a SendSite, and see how it influences the method dispatch process.

Shotgun: The Rubinius Virtual Machine

Posted in shotgun on March 18, 2008 by agardiner

As I stated in my introductory post, I intend with this blog to delve into some of the implementation details of Rubinius. However, as I’ve contemplated various topics to write about, I’ve realised I first need to introduce some of the core underlying concepts and (Ruby) classes unique to Rubinius.

The most important of these (and the topic of this post) are those that relate to the Rubinius execution environment: the Shotgun Virtual Machine, and the various Ruby classes that provide access to Shotgun internals.


Shotgun: A Virtual Machine

As mentioned elsewhere, Rubinius is heavily influenced by the implementation of Smalltalk-80, and borrows many of the same concepts and even some of its class names from there. Like Smalltalk and Java, Rubinius compiles Ruby source code into a lower-level machine-independent instruction set that is executed on a virtual machine, known as Shotgun.

The Shotgun virtual machine has many similarities to a real computer, such as (virtual) CPUs and an instruction set, but also many higher-level abstractions (such as managed memory and a garbage collector), that make it easier to target as an execution environment for a high-level dynamic language such as Ruby.

Shotgun is currently written in C, although some portions of the source code are actually generated from Ruby (e.g. the opcode and primitive implementations are defined as embedded C code inside Ruby methods). In the future (post-1.0), the plan is to have more of the C code generated from Ruby or a Ruby-like language (Garnet), much as how Squeak (a Smalltalk implementation) implemented a virtual machine in Squeak.

Shotgun Architecture

Shotgun is written in a relatively clean and easy to follow style. It contains no global variables, and consists of a layered architecture: at the root is an environment, within which machines are instantiated. Each machine represents an entire Ruby/Rubinius virtual machine, and runs in its own native (OS) thread. Machines can communicate via an inter-machine message channel, but are otherwise totally separate and isolated.

Within a machine, there exists a virtual CPU, which runs one or more (green) threads. A Shotgun CPU effectively represents a native thread on the underlying hardware, whereas a Shotgun thread represents a Ruby thread. Just like a real CPU, the Shotgun virtual CPU pre-emptively multi-tasks (Shotgun) threads. At present, a Shotgun machine always has a single CPU, so all Shotgun threads within a single machine execute on a single native thread. In the future (again, post-1.0) it is planned to implement what is known as an m:n threading model, whereby a pool of m native threads is used to execute n Ruby threads.

At the next level down from threads are what are known as tasks. Each Shotgun task maintains an operand stack (Shotgun is a stack-based VM) and a reference to the current execution context. Tasks are very similar to threads, but lack pre-emption or scheduling. In practice, they are similar to Ruby 1.9 fibres, although unlike fibres, there is currently no way to co-operatively multi-task (or yield to a co-routine) using Rubinius tasks.

A context represents something similar to a stack frame in C or Java. It represents the current execution context, and as such, it provides:

  • a link back to the caller of the current method;
  • a reference to the compiled method currently being executed;
  • instruction (IP), stack (SP), and frame (FP) pointers for the current instruction, current stack operand, and the operand stack pointer location at the commencement of the current method respectively;
  • the current scope for resolving constant and method lookups; and
  • storage for all local variables in the current scope.

Finally, each context has an associated compiled method, which contains the instruction bytecodes to be executed for the method to which the context relates. Compiled methods are the result of compiling Ruby source into Shotgun bytecode, and are the units of execution in Shotgun. A compiled method contains:

  • the bytecode instruction sequence that tells Shotgun what actions to take;
  • the number and names of any local variables used in the bytecode;
  • the static scope, used for resolving constant and method lookups; and
  • a tuple containing the literals contained in the source code that cannot be represented directly as opcode arguments (e.g. strings, symbols, method calls etc).
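The relationship between a task, a context and a compiled method can be sketched as a tiny stack machine. Everything below is a hypothetical toy: the class names are invented, the opcode set is reduced to a handful of cases, and :send_puts is a made-up opcode standing in for a real message send.

```ruby
# Toy model only: all names are hypothetical, and :send_puts is an
# invented opcode standing in for a real message send.
ToyCompiledMethod = Struct.new(:bytecode, :literals)
ToyContext = Struct.new(:sender, :compiled_method, :ip)

class ToyTask
  attr_reader :stack, :output

  def initialize
    @stack = []     # operand stack
    @output = []    # collects what the "puts" sends would print
  end

  def run(compiled_method)
    # The context tracks which method is executing and where we are in it.
    ctx = ToyContext.new(nil, compiled_method, 0)
    code = compiled_method.bytecode
    while ctx.ip < code.size
      op, arg = code[ctx.ip]
      ctx.ip += 1
      case op
      when :push_literal
        @stack.push(compiled_method.literals[arg])
      when :send_puts
        @output << @stack.pop.to_s
        @stack.push(nil)            # a send leaves its result on the stack
      when :pop
        @stack.pop
      when :push_true
        @stack.push(true)
      when :ret
        return @stack.pop
      end
    end
  end
end

script = ToyCompiledMethod.new(
  [[:push_literal, 0], [:send_puts], [:pop],
   [:push_literal, 1], [:send_puts], [:pop],
   [:push_true], [:ret]],
  ["hello world", "bye!"]
)

task = ToyTask.new
result = task.run(script)
p task.output
p result
```

The bytecode refers to its strings only by index into the literals list, and the instruction pointer lives in the context rather than in the compiled method, so the same compiled method could be executed by several contexts at once.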

Key Rubinius Classes

Without further ado, let’s look at the Ruby classes that correspond to the concepts above… but this time, we’ll work from the bottom up.


In Rubinius, Ruby code is compiled down to bytecode, which is then executed by Shotgun, the Rubinius virtual machine. The compilation process is reasonably complex (see here for a detailed overview), but the end result is that Ruby code is converted into a sequence of integers, representing the VM opcodes and any arguments they take. The class that represents this bytecode in Rubinius is InstructionSequence, which is a sub-class of ByteArray.

The InstructionSequence class does not have many useful instance methods, since it is essentially a representation of the Shotgun machine language. However, the class source file defines a number of related classes for working with InstructionSequences that are useful, including:


Defines the full set of Shotgun instructions or opcodes, and includes useful metadata about each instruction. This includes information about the number and purpose of any opcode arguments, whether the opcode changes the flow of execution, the number of stack operands consumed and produced, etc.


This class is used to encode and decode an instruction sequence between symbolic and bytecode representations. It is used by the compiler, to convert a generated instruction sequence consisting of opcode symbols and arguments into the actual bytecode executed by Shotgun and saved to disk in .rbc files. It is also used by tools such as the debugger to disassemble the bytecode of a CompiledMethod into something that can be displayed on screen, or to modify bytecode to support debugging.


A CompiledMethod represents the compiled source code for a Ruby method (or top-level script, i.e. Ruby code that is not part of a method body). As such, a CompiledMethod contains an InstructionSequence instance containing the compiled bytecode for the method source, the number and names of any local variables used in the method, details of the method scope, and a whole bunch of other attributes.

A CompiledMethod is the main executable unit in a Rubinius program. It is the output created by the Rubinius compiler that is then passed to Shotgun for execution and/or persisted to disk. CompiledMethod instances can be obtained from any method definition using the #compiled_method accessor on a Method or UnboundMethod object.

CompiledMethod objects are also nested; each Ruby source file that is compiled by Rubinius creates a single top-level CompiledMethod object named __script__, which is then run when the (compiled) file is loaded. Any CompiledMethod can contain other CompiledMethod objects as literals; so when a Ruby script is executed that contains, for example, a def statement, the bytecode for the new method will be compiled into its own CompiledMethod object, and this CompiledMethod will then be added to the literals tuple of the containing CompiledMethod. From there, it can then be referenced by opcodes such as add_method, which hook a CompiledMethod up to a symbol in a method table.

MethodContext and BlockContext

The next level up from a CompiledMethod is an execution context, in the form of either a MethodContext or a BlockContext (depending upon whether we are dealing with the execution of a method or a block). Where a CompiledMethod represents the executable instructions for a given method or top-level script, an execution context represents the actual execution of Rubinius code.

MethodContext and BlockContext instances provide a way to inspect and modify the execution environment. Not surprisingly, they are therefore a key component enabling the Rubinius debugger to do its thing. However, they also make implementation of eval bindings and continuations almost trivial, since an execution context contains all the necessary details to resolve binding references relative to some other context (e.g. a caller’s context), and to save and restore execution state.


As we saw earlier, a Shotgun task maintains an operand stack and a reference to the current execution context. Tasks are also the building blocks for Ruby threads, and provide a way to transfer an execution context from one Ruby thread to another.

The Task class provides access to the current execution context, via Task#current_context, and to the operand stack (the latter being of interest primarily to the debugger).


The Thread class provides an implementation of the Ruby Thread class semantics using a combination of Ruby code, Tasks, and Rubinius (Shotgun) primitives: the execution context for a thread is maintained via an associated Task, and methods that control thread scheduling and execution are implemented as primitives.


In this post, we’ve introduced the Shotgun virtual machine, and looked at how it models an execution environment through the concepts of machines, CPUs, tasks, etc. However, there is a good deal more to Shotgun that we’ve not even touched on, and which will have to be saved for a future post.

I hope you’ve found this post informative; feel free to ask questions, provide feedback, or indicate the areas you’d like to know more about using the comments facility below.