SonicVM: There Is a Need for Speed, but Let’s Not Crash

SonicVM: There Is a Need for Speed, but Let’s Not Crash

Ethereum has established an industry standard for blockchain block processing, providing a comprehensive range of developer tools and an extensive ecosystem for decentralized applications.

However, the Ethereum Virtual Machine (EVM) is known to be limited in its scalability. To enhance EVM scalability, Sonic Labs has developed a new high-performance virtual machine, SonicVM. To achieve scalability, the Sonic Labs tech team applied numerous virtual machine optimizations to the EVM.

For each optimization, a fundamental engineering question arises: Does the SonicVM remain EVM-compatible? This is not only a question concerning the SonicVM, but it in fact leads us to a fundamental question in computer engineering: does an optimized system work correctly? 

For a computer engineer, it is usually easier to show the correctness of an unoptimized and simple system. The reason is that a common optimization strategy involves identifying frequently executed paths in the system and specializing them.

As a consequence, optimized systems, such as virtual machines, have many specialized execution paths and become highly complex, making it challenging to assert compliance with the EVM standard. For the SonicVM, we have achieved compliance using Conformance Testing (CT). This testing approach systematically generates tests (including corner cases) of a virtual machine, achieving a very high testing coverage even for many specialised execution paths in the system. 

Performance Engineering

SonicVM’s optimizations are rooted in empirics — we observe the virtual machine's behavior by running real-world transactions from various mainnets, including Ethereum, Fantom, and Sonic, using the Substates framework.

The Substate framework enables the execution of transactions in isolation by simulating the network components. We measure the VM’s bottlenecks with the substate framework and devise new optimizations to overcome them. We iterate over the process until the virtual machine's performance exceeds our expectations using ideas from Performance Engineering.

The process, as illustrated above, has three steps:

  1. Profile and measure.
  2. Identify and study the bottlenecks.
  3. Improve the code and continue with a new cycle until the performance of the SonicVM is sufficient. 

Optimizations

Currently, the EVM bytecode is limited to a maximum of 256 opcodes, including special PUSHx instructions that interleave opcodes and data in the EVM bytecode. The interleaving of code instructions and data keeps the EVM simple, but it does not come without drawbacks, as the instruction length is dependent on the type of the PUSHx instruction where x is the size of the constant between 1 and 32 bytes. 

We introduced a novel Long-Format encoding for smart contracts. SonicVM’s Long-Format encoding accommodates a larger number of different instructions (up to 65,536) and constant arguments for instructions. One of the advantages of this format is to encode frequent PUSHx more efficiently.

For example, to push 32 bytes (the size of a machine word in the EVM) of data onto the stack in the EVM with the PUSH32 instruction, it will require reading 33 bytes in the code array for decoding the PUSH32 instruction. The first byte would be the opcode 0x7f followed by 32 data bytes. While decoding the PUSH32 instruction, an out-of-bounds check is performed on each byte of the PUSH32 operation, which is costly. By extending the instruction with a 2-byte argument, the number of bounds checks can be significantly reduced. For push instructions followed by more than 2 bytes of data, a new DATA instruction is introduced.

The translation from EVM bytecode to Long-Format encoding incurs an extra code. This cost is amortized by storing the translated smart contract in a code cache, such that for another execution of the smart contract, the Long-Format encoded smart contract can be directly executed. The diagram below shows the just-in-time translation of the EVM bytecode to the Long-Format encoding:

The converter converts the EVM bytecode to the LFVM code. The converter is only invoked if the smart contract cannot be found in the code cache. At transaction time, when the smart contract is executed, the long-format representation of the smart contract is used to execute it.

The translation to Long-Format has another advantage: the jump instructions in the EVM must be statically checked to determine whether their jump destinations are legitimate. The analysis is a static preprocessing step that identifies all valid jump destinations within a smart contract’s bytecode. In the EVM, jump instructions, such as JUMP and JUMPI, are only allowed to target locations explicitly marked by JUMPDEST.

To enforce this constraint, the EVM scans the entire bytecode before execution and builds a table of valid jump destinations whose first instruction is a JUMPDEST instruction. This table is then used at runtime to ensure that all jumps go to safe and intended locations, preventing code from jumping into data that may cause a security vulnerability.

SonicVM performs the expensive JUMPDEST analysis and the conversion to Long-Format encoding together. The JUMPDEST opcodes are no longer checked in the Long-Format representation, as they have already been verified at the translation time from EVM bytecode to the Long-Format encoding. In addition, the Long-Format EVM segments the bytecode into code and data segments. The segmentation is related to the jump analysis, and the SonicVM has special instructions for commencing a data segment.

Another optimization is related to hash functions. Smart contracts frequently use hash functions, which can become a bottleneck when left unoptimized. Since this task is computationally intensive, the SonicVM caches expensive hashing computations that frequently occur on the same input. We further optimize the computation of hash functions by using a C implementation of Keccak256 rather than the native Go implementation, which is slower. Other techniques are more technical, such as pre-allocated maximum size stacks and function inlining tricks for dispatching instructions, which can further enhance performance. 

Overall, we achieve an average gas rate of 1,300 MGas/s on the Sonic mainnet for the first 11 million blocks, measuring block processing with the substate framework using the Carmen S5 database. The experiment has been performed on an AMD Ryzen 9 7950X3D processor with 124 GB of RAM and NVMe drives. We have run the same experiment with Geth’s EVM using a recent Go-Ethereum version 1.15.0. The Geth’s EVM achieves only 200MGas/s. Hence, our SonicVM is 6.5x faster than Geth’s EVM for block processing. 

Conformance Testing

The question is, when is a VM such as SonicVM EVM compatible? EVM compatibility is primarily defined by the reference implementation, Geth, and changes constantly with new revisions, such as Cancun and Prague (the most recent reincarnations of the EVM).

Revisions add new instructions, such as transient storage operations, and may change the gas consumption for instructions. Though it is believed there is a single EVM, in practice, there are fourteen different EVMs. Each EVM behaves slightly differently from all the other revisions and is only active for specific segments of the blockchain. Processing of old blocks is needed (with the assigned EVM variant) for processing historical blocks.

The specification of virtual machines (and programming languages in general) is not a new topic in the theory of programming languages (a classical field in computer science). It has been explored since the 1970s through various semantic frameworks, including denotational semantics, operational semantics, and numerous other attempts (axiomatic, abstract state machines, etc.) to formally describe the computations of a virtual machine or programming languages in general. 

Although the Ethereum community recognized the need for a specification for the EVM and its revisions, none of the currently available specifications are suitable for machine-based verification or validation of an EVM implementation. For example, the Yellow Paper attempts to capture EVM semantics in a semi-formal manner, but it is neither executable nor precise enough. Alternatively, there is a reference implementation in Python called EELS, which does not provide a formal description for reasoning and state exploration. 

To ensure conformance with the EVM industry standard, the most appropriate choice is to use the reference implementation, Geth, as the ground truth. If our own EVM implementations behave as if it was executed on Geth’s EVM, we can assume that we are EVM compatible. This might be the most practical interpretation of EVM compatibility. 

If we use Geth as a reference for conformity with all 14 revisions, how can we make sure that our implementation aligns with Geth’s implementation of the EVM?

To achieve this, we have developed a new conformance testing framework for EVMs. The conformance testing encompasses a new formal specification for all revisions, a test generator that generates edge cases, and a new state interface for EVMs (that we added to Geth and implemented in SonicVM) that exposes the internal state of a virtual machine for cross-checking whether the execution of a single instruction (aka a state transition) is correct.

The first step of the CT framework is to distill a formal specification from Geth’s implementation. The specification is not a classical specification written on paper. Instead, it is an executable specification expressed as a state transition system. The state of the virtual machine comprises virtual machine elements, such as code, memory, storage, transient storage, the gas and program counter, and the stack. 

The handwritten formal specification of the EVM, with its 14 revisions, is expressed as a set of constraints, allowing test cases to be generated from it via a test generator with a constraint solver. Every time a new revision is introduced, the formal specification must be updated. The actual work for the update is quite often a small delta, but the specification must capture the semantics of all supported revisions.

The constraints of the specification are fed into a test generator that produces corner and typical test cases. A test case is a micro-test that consists of a test state and a result state, along with a transition. By conducting these micro tests, we can thoroughly cover the functionality of a virtual machine in a very short period, with billions of micro tests. The testing is performed at the instruction level, rather than the smart contract level.

For running micro tests, a virtual machine requires a new test interface that permits priming the state of the virtual machine, performing a single transition, and querying the state after the transition. 

With the conformance setup, we can test virtual machines to see whether they comply with our written formal specification of the EVM — even with Ethereum’s Geth virtual machine. If the Geth’s virtual machine behavior diverges from the specification, we refer to it as a Type I error. The specification is incorrect and needs to be amended to include the behavior found by the test case.

The executable specification for EVMs must have two fundamental properties. The first property is that the specification is complete, i.e. for every state of the virtual machine, there must be a subsequent state that is defined. The second property is that the specification is deterministic, i.e. for a given input state, we have only a single output state (and not multiple different output states). 

For example, a specification for a JUMPDEST state transition is expressed as follows:

If the gas counter is greater than or equal to 1, and the current operation that the pc counter is pointing to in the code represents the JUMPDEST operation, then the state s is updated by decrementing the gas by one and incrementing the program counter to the next instruction in the code.

With a large quantity of micro tests generated from the specification, we can assume that the specification aligns with Geth’s EVM with a very high probability. The testing setup for validating the specification is shown below:

The test generator produces a test state. Geth’s EVM is primed with the test state, and a single state transition is executed using the new test interface in Geth. The Result state of Geth is cross-checked with the result state of the formal specification for the given test state. 

If both result states coincide, we can conclude that the micro test was successful, and we can produce another micro test. If the micro test fails, we know that the handwritten specification is erroneous, and we need to amend it so that the micro test passes. After conducting a sufficient number of micro tests (currently in the billions), we cease validation of the formal specification for Geth’s EVM and use it for conformance testing of newly written EVM implementations, such as SonicVM.

To use the test generator and the specification for checking the conformance of SonicVM, we extend the virtual machine with a test adapter. The test adapter permits the injection of a new state, can perform a single transition (or multiple transitions), and permits reading the result state after the test. Hence, this test mode provides extended functionality beyond running the virtual machine for executing a single smart contract. 

If the result state of the SonicVM diverges from the result state of the specification, we have found a type II error. Type II errors must be fixed in the virtual machine’s implementation so that the micro test can pass and we can state that the new virtual machine is EVM compatible.

For example, in an early version of the SonicVM, the address range of a JUMP instruction was not checked correctly. In case that the arithmetic computation for the JUMP destination overflowed, the first instruction was executed assuming it was a JUMPDEST. Though in a normal setup, it still failed because we did not have JUMPDEST as a first instruction in the EVM bytecode. Hence, we could not identify the issue. However, the testcase generator identified the issue by generating as a first instruction a JUMPDEST instruction that showed that the implementation was wrong.

The conformance test framework has been conducive to the development of SonicVM. We have identified several issues in the preliminary versions of SonicVM. With our conformance testing framework, we achieve 100% code coverage of SonicVM, running over 16 billion test cases. The conformance testing framework is highly optimized, enabling the execution of 16 billion tests in approximately 12½ hours.

Summary

Our SonicVM is a high-performance and Geth-compatible Ethereum Virtual Machine. Performance engineering gives us a scalable EVM. By extending the instruction set and combining instructions and data, we created a new instruction format called long format encoding for SonicVM, which is faster than using the EVM’s encoding.

We have numerous other optimizations that make SonicVM 6.5 times faster than Geth’s EVM for the first 11 million blocks of Sonic’s mainnet. We ensure that SonicVM is Geth-compatible with our conformance testing framework. The conformance testing framework consists of a validated specification that is tested with a new test interface. We achieve ultra-high program coverage (i.e. 100%), running over 16 billion test cases with each Conformance Testing execution.