Shotgun: The Rubinius Virtual Machine
As I stated in my introductory post, I intend with this blog to delve into some of the implementation details of Rubinius. However, as I’ve contemplated various topics to write about, I’ve realised I first need to introduce some of the core underlying concepts and (Ruby) classes unique to Rubinius.
The most important of these (and the topic of this post) are those that relate to the Rubinius execution environment: the Shotgun Virtual Machine, and the various Ruby classes that provide access to Shotgun internals.
Shotgun: A Virtual Machine
As mentioned elsewhere, Rubinius is heavily influenced by the implementation of Smalltalk-80, and borrows many of the same concepts and even some of its class names from there. Like Smalltalk and Java, Rubinius compiles Ruby source code into a lower-level machine-independent instruction set that is executed on a virtual machine, known as Shotgun.
The Shotgun virtual machine has many similarities to a real computer, such as (virtual) CPUs and an instruction set, but also many higher-level abstractions (such as managed memory and a garbage collector), that make it easier to target as an execution environment for a high-level dynamic language such as Ruby.
Shotgun is currently written in C, although some portions of the source code are actually generated from Ruby (e.g. the opcode and primitive implementations are defined as embedded C code inside Ruby methods). In the future (post-1.0), the plan is to have more of the C code generated from Ruby or a Ruby-like language (Garnet), much as Squeak (a Smalltalk implementation) implements its virtual machine in Squeak itself.
Shotgun is written in a relatively clean and easy-to-follow style. It contains no global variables, and has a layered architecture: at the root is an environment, within which machines are instantiated. Each machine represents an entire Ruby/Rubinius virtual machine, and runs in its own native (OS) thread. Machines can communicate via an inter-machine message channel, but are otherwise totally separate and isolated.
Within a machine, there exists a virtual CPU, which runs one or more (green) threads. A Shotgun CPU effectively represents a native thread on the underlying hardware, whereas a Shotgun thread represents a Ruby thread. Just like a real CPU, the Shotgun virtual CPU pre-emptively multi-tasks (Shotgun) threads. At present, a Shotgun machine always has a single CPU, so all Shotgun threads within a single machine execute on a single native thread. In the future (again, post-1.0) it is planned to implement what is known as an m:n threading model, whereby a pool of m native threads is used to execute n Ruby threads.
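To make the idea of one CPU pre-emptively multi-tasking several green threads a little more concrete, here is a minimal sketch in plain Ruby. It is not Shotgun code: ToyThread, ToyCPU, toy_schedule and the fixed "slice" of instructions per turn are all names and simplifications I have made up for illustration, and I am assuming a simple round-robin policy.

```ruby
# Toy model only: ToyThread, ToyCPU and toy_schedule are hypothetical names,
# and round-robin time slicing is an assumed scheduling policy.
class ToyThread
  attr_reader :name, :remaining

  def initialize(name, instruction_count)
    @name = name
    @remaining = instruction_count # instructions still to execute
  end

  # Pretend to execute a single instruction.
  def step
    @remaining -= 1
  end

  def finished?
    @remaining.zero?
  end
end

class ToyCPU
  def initialize(slice)
    @slice = slice      # max instructions a thread may run per turn
    @run_queue = []
  end

  def spawn(thread)
    @run_queue << thread
  end

  # Runs every thread to completion and returns the order in which
  # threads were given the CPU.
  def run
    trace = []
    until @run_queue.empty?
      thread = @run_queue.shift
      trace << thread.name
      @slice.times do
        break if thread.finished?
        thread.step
      end
      # Pre-empt: an unfinished thread goes to the back of the queue.
      @run_queue << thread unless thread.finished?
    end
    trace
  end
end

# Convenience helper: workloads maps a thread name to an instruction count.
def toy_schedule(slice, workloads)
  cpu = ToyCPU.new(slice)
  workloads.each { |name, count| cpu.spawn(ToyThread.new(name, count)) }
  cpu.run
end
```

Calling toy_schedule(2, { a: 3, b: 1, c: 4 }) gives each thread at most two instructions per turn, so thread a is interrupted and resumed later even though it never asked to give up the CPU. The threads themselves contain no scheduling logic, which is the point of pre-emption.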
At the next level down from threads are what are known as tasks. Each Shotgun task maintains an operand stack (Shotgun is a stack-based VM) and a reference to the current execution context. Tasks are very similar to threads, but lack pre-emption or scheduling. In practice, they are similar to Ruby 1.9 fibres, although unlike fibres, there is currently no way to co-operatively multi-task (or yield to a co-routine) using Rubinius tasks.
A context represents something similar to a stack frame in C or Java. It represents the current execution context, and as such, it provides:
- a link back to the caller of the current method;
- a reference to the compiled method currently being executed;
- instruction (IP), stack (SP), and frame (FP) pointers for the current instruction, current stack operand, and the operand stack pointer location at the commencement of the current method respectively;
- the current scope for resolving constant and method lookups; and
- storage for all local variables in the current scope.
Finally, each context has an associated compiled method, which contains the instruction bytecodes to be executed for the method to which the context relates. Compiled methods are the result of compiling Ruby source into Shotgun bytecode, and are the units of execution in Shotgun. A compiled method contains:
- the bytecode instruction sequence that tells Shotgun what actions to take;
- the number and names of any local variables used in the bytecode;
- the static scope, used for resolving constant and method lookups; and
- a tuple containing the literals contained in the source code that cannot be represented directly as opcode arguments (e.g. strings, symbols, method calls etc).
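The relationship between a context and its compiled method can be sketched with a toy stack machine in plain Ruby. Everything here is hypothetical: ToyCompiledMethod, ToyContext, toy_run and the handful of opcodes are invented for illustration and are not Shotgun's real structures or instruction set. For brevity the operand stack is a local variable of the interpreter loop, whereas in Shotgun it belongs to the task.

```ruby
# Toy model only: these names and opcodes are hypothetical, not Shotgun's.
ToyCompiledMethod = Struct.new(:name, :bytecode, :local_names, :literals)

class ToyContext
  attr_reader :sender, :compiled_method, :locals
  attr_accessor :ip

  def initialize(compiled_method, sender = nil)
    @compiled_method = compiled_method # the code being executed
    @sender = sender                   # link back to the caller
    @ip = 0                            # instruction pointer
    @locals = Array.new(compiled_method.local_names.size)
  end
end

# Executes the context's compiled method and returns the value left by :ret.
def toy_run(context)
  stack = [] # operand stack
  code = context.compiled_method.bytecode
  loop do
    op, arg = code.fetch(context.ip)
    context.ip += 1
    case op
    when :push_literal then stack.push(context.compiled_method.literals[arg])
    when :push_int     then stack.push(arg)
    when :set_local    then context.locals[arg] = stack.pop
    when :push_local   then stack.push(context.locals[arg])
    when :add
      right = stack.pop
      left = stack.pop
      stack.push(left + right)
    when :ret then return stack.pop
    else raise ArgumentError, "unknown opcode #{op.inspect}"
    end
  end
end

# Roughly: greeting = "hello, "; return greeting + "world"
TOY_GREET = ToyCompiledMethod.new(
  :greet,
  [[:push_literal, 0], [:set_local, 0], [:push_local, 0],
   [:push_literal, 1], [:add], [:ret]],
  [:greeting],
  ["hello, ", "world"]
)
```

Running toy_run(ToyContext.new(TOY_GREET)) walks the bytecode one instruction at a time. Note that the strings live in the literals tuple and are referenced by index, while the local variable lives in the context, so the same compiled method could be executing in many contexts at once.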
Key Rubinius Classes
Without further ado, let’s look at the Ruby classes that correspond to the concepts above… but this time, we’ll work from the bottom up.
In Rubinius, Ruby code is compiled down to bytecode, which is then executed by Shotgun, the Rubinius virtual machine. The compilation process is reasonably complex (see here for a detailed overview), but the end result is that Ruby code is converted into a sequence of integers, representing the VM opcodes and any arguments they take. The class that represents this bytecode in Rubinius is InstructionSequence, which is a sub-class of ByteArray.
The InstructionSequence class does not have many useful instance methods, since it is essentially a representation of the Shotgun machine language. However, the class source file defines a number of related classes for working with InstructionSequences that are useful, including:
Defines the full set of Shotgun instructions or opcodes, and includes useful metadata about each instruction. This includes information about the number and purpose of any opcode arguments, whether the opcode changes the flow of execution, the number of stack operands consumed and produced, etc.
This class is used to encode and decode an instruction sequence between symbolic and bytecode representations. It is used by the compiler, to convert a generated instruction sequence consisting of opcode symbols and arguments into the actual bytecode executed by Shotgun and saved to disk in .rbc files. It is also used by tools such as the debugger to disassemble the bytecode of a CompiledMethod into something that can be displayed on screen, or to modify bytecode to support debugging.
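As a rough illustration of moving between the symbolic and integer forms of an instruction sequence, here is a small sketch in plain Ruby. The ToyEncoder module, its opcode names, opcode numbers and argument counts are all invented for this example; they are not the real Shotgun instruction set, and the real bytecode is stored in a ByteArray rather than a Ruby Array.

```ruby
# Toy model only: the opcode table below is hypothetical.
module ToyEncoder
  # name => [opcode number, number of arguments]
  OPCODES = {
    noop:         [0, 0],
    push_int:     [1, 1],
    push_literal: [2, 1],
    send_method:  [3, 2],
    ret:          [4, 0]
  }.freeze

  # Reverse lookup: opcode number => [name, number of arguments]
  BY_NUMBER = OPCODES.map { |name, (number, argc)| [number, [name, argc]] }.to_h.freeze

  # Symbolic form ([[:push_int, 42], [:ret]]) to a flat list of integers.
  def self.encode(symbolic)
    symbolic.flat_map do |name, *args|
      number, argc = OPCODES.fetch(name)
      raise ArgumentError, "#{name} takes #{argc} argument(s)" unless args.size == argc
      [number, *args]
    end
  end

  # Flat list of integers back to symbolic form (a simple disassembler).
  def self.decode(bytecode)
    symbolic = []
    index = 0
    while index < bytecode.size
      name, argc = BY_NUMBER.fetch(bytecode[index])
      symbolic << [name, *bytecode[index + 1, argc]]
      index += 1 + argc
    end
    symbolic
  end
end
```

The argument counts in the table are what let the decoder know how many of the following integers belong to each opcode, which is the same kind of metadata a debugger needs in order to disassemble or patch bytecode safely.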
A CompiledMethod represents the compiled source code for a Ruby method (or top-level script, i.e. Ruby code that is not part of a method body). As such, a CompiledMethod contains an InstructionSequence instance containing the compiled bytecode for the method source, the number and names of any local variables used in the method, details of the method scope, and a whole bunch of other attributes.
A CompiledMethod is the main executable unit in a Rubinius program. It is the output created by the Rubinius compiler that is then passed to Shotgun for execution and/or persisted to disk. CompiledMethod instances can be obtained from any method definition using the #compiled_method accessor on a Method or UnboundMethod object.
CompiledMethod objects are also nested; each Ruby source file that is compiled by Rubinius creates a single top-level CompiledMethod object named __script__, which is then run when the (compiled) file is loaded. Any CompiledMethod can contain other CompiledMethod objects as literals; so when a Ruby script is executed that contains, for example, a def statement, the bytecode for the new method will be compiled into its own CompiledMethod object, and this CompiledMethod will then be added to the literals tuple of the containing CompiledMethod. From there, it can then be referenced by opcodes such as add_method, which hook a CompiledMethod up to a symbol in a method table.
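The nesting described above can be modelled in a few lines of plain Ruby. This is a hypothetical sketch: NestedMethodSketch and load_script_sketch are names I have made up, the two opcodes are heavily simplified, and here add_method takes the method name directly as its argument, which is an assumption rather than a description of how the real opcode is encoded.

```ruby
# Toy model only: hypothetical names and simplified opcodes.
NestedMethodSketch = Struct.new(:name, :bytecode, :literals)

# "Runs" a top-level script, returning the method table it builds.
def load_script_sketch(script, method_table = {})
  stack = []
  script.bytecode.each do |op, arg|
    case op
    when :push_literal
      # Literals can be ordinary values or other compiled methods.
      stack.push(script.literals[arg])
    when :add_method
      # Hook the compiled method on top of the stack up to a symbol.
      method_table[arg] = stack.pop
    else
      raise ArgumentError, "unknown opcode #{op.inspect}"
    end
  end
  method_table
end

# Models a source file containing:  def hello; "hi"; end
HELLO_SKETCH = NestedMethodSketch.new(:hello, [[:push_literal, 0], [:ret]], ["hi"])
SCRIPT_SKETCH = NestedMethodSketch.new(
  :__script__,
  [[:push_literal, 0], [:add_method, :hello]],
  [HELLO_SKETCH] # the inner method is just another literal
)
```

Calling load_script_sketch(SCRIPT_SKETCH) returns a table mapping :hello to the inner method object. The body of hello is never executed while the script loads; it is only carried along as a literal and registered under its name.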
MethodContext and BlockContext
The next level up from a CompiledMethod is an execution context, in the form of either a MethodContext or a BlockContext (depending upon whether we are dealing with the execution of a method or a block). Where a CompiledMethod represents the executable instructions for a given method or top-level script, an execution context represents the actual execution of Rubinius code.
MethodContext and BlockContext instances provide a way to inspect and modify the execution environment. Not surprisingly, they are therefore a key component enabling the Rubinius debugger to do its thing. However, they also make implementation of eval bindings and continuations almost trivial, since an execution context contains all the necessary details to resolve binding references relative to some other context (e.g. a caller’s context), and to save and restore execution state.
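To see why first-class contexts make bindings and continuations straightforward, consider this minimal sketch in plain Ruby. ContextSketch and its methods are hypothetical and far simpler than the real MethodContext; the sketch only assumes that a context knows its sender, its locals and its instruction pointer, as described earlier.

```ruby
# Toy model only: ContextSketch is a hypothetical stand-in for a context.
class ContextSketch
  attr_reader :method_name, :sender, :locals
  attr_accessor :ip

  def initialize(method_name, sender: nil, locals: {}, ip: 0)
    @method_name = method_name
    @sender = sender   # the caller's context, or nil at the top level
    @locals = locals
    @ip = ip
  end

  # Follow the sender links to build a backtrace, innermost first.
  def backtrace
    names = []
    context = self
    while context
      names << context.method_name
      context = context.sender
    end
    names
  end

  # Resolve a local variable relative to the caller, as an eval binding would.
  def lookup_in_caller(name)
    raise ArgumentError, "no caller" unless sender
    sender.locals.fetch(name)
  end

  # Copy this frame's state so that execution could later resume from here.
  def snapshot
    ContextSketch.new(method_name, sender: sender, locals: locals.dup, ip: ip)
  end
end

# Small demonstration returning the observed values in a hash.
def context_sketch_demo
  outer = ContextSketch.new(:outer, locals: { x: 10 })
  inner = ContextSketch.new(:inner, sender: outer, locals: { y: 2 }, ip: 5)
  saved = inner.snapshot
  inner.ip = 9           # execution carries on after the snapshot...
  inner.locals[:y] = 99  # ...and changes local state
  {
    backtrace: inner.backtrace,
    caller_x: inner.lookup_in_caller(:x),
    saved_ip: saved.ip,
    saved_y: saved.locals[:y]
  }
end
```

Because all of the execution state is held in ordinary objects, a debugger, a binding or a continuation needs no special hooks into the VM: each one is a matter of reading, following or copying contexts.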
As we saw earlier, a Shotgun task maintains an operand stack and a reference to the current execution context. Tasks are also the building blocks for Ruby threads, and provide a way to transfer an execution context from one Ruby thread to another.
The Task class provides access to the current execution context, via Task#current_context, and to the operand stack (the latter being of interest primarily to the debugger).
The Thread class provides an implementation of the Ruby Thread class semantics using a combination of Ruby code, Tasks, and Rubinius (Shotgun) primitives: the execution context for a thread is maintained via an associated Task, and methods that control thread scheduling and execution are implemented as primitives.
In this post, we’ve introduced the Shotgun virtual machine, and looked at how it models an execution environment through the concepts of machines, CPUs, threads, tasks, contexts and compiled methods. However, there is a good deal more to Shotgun that we’ve not even touched on, and which will have to be saved for a future post.
I hope you’ve found this post informative; feel free to ask questions, provide feedback, or indicate the areas you’d like to know more about using the comments facility below.