One of the biggest challenges facing the Ethereum ecosystem is achieving low latency and high throughput with limited resources (for example, CPU, bandwidth, memory, and disk space).

How decentralized a system is depends on the ability of the weakest node in the network to verify the system's rules. Protocols that deliver high performance while still being able to run on low-resource hardware are what we call "scalable."

In this article, we will dig into how modern "Layer 2" solutions work, their security models, and the strategies they use to tackle Ethereum's scalability problem.

This article is intended for readers interested in cryptocurrency technology. If you want to learn about Ethereum's cutting-edge scalability techniques and how such systems are designed and built, read on.

Important keywords and concepts are bolded throughout, because these are terms you will often encounter as you learn about cryptocurrency technology. Some of the concepts here are complex; if you get confused while reading, please don't give up, keep going and things will become clear.

Blockchain resource requirements

In decentralized networks such as Bitcoin and Ethereum, there are three main resource requirements for running nodes [1]:

  • Bandwidth: the cost of downloading and broadcasting blockchain-related data.
  • Computation: the cost of running computations in scripts or smart contracts.
  • Storage: the cost of storing transaction data for indexing purposes, and the cost of storing the "state" needed to keep processing new blocks of transactions [2].

The key performance metrics are:

  • Throughput: the number of transactions the system can process per second
  • Latency: the time it takes to process a transaction

The defining desirable feature of emerging cryptocurrency networks such as Bitcoin and Ethereum is decentralization. So what does it actually mean for a network to be decentralized?

  • Low trust: anyone can independently verify that the total supply of bitcoin will never exceed 21 million, and that the bitcoins they hold are not counterfeit. Anyone running the node software can independently compute the latest state and verify that every block follows all of the protocol's rules.
  • Low cost: if running the node software is expensive, people will fall back on trusted third parties to verify the state for them. High cost implies a high trust requirement, which is exactly what we are trying to avoid.

Another desirable property is scalability: throughput increases (and latency decreases) super-linearly with the cost of running a node. That definition is fine as far as it goes, but it says nothing about trust. We therefore additionally define "decentralized scalability": achieving scalability without adding new trust assumptions to the system.

Ethereum's runtime environment is the EVM (Ethereum Virtual Machine). In the EVM, different operations in a transaction cost different amounts (a storage operation, for example, costs far more than an addition). The unit in which a transaction's computation is measured is called "gas". Ethereum currently caps each block at 12.5 million gas, and a block is mined roughly every 12.5 seconds. In other words, Ethereum's latency is 12.5 seconds and its throughput is 1 million gas per second. You might ask: what can you actually do with 1 million gas per second?

  • About 47 "simple transfer" transactions per second. A "simple transfer" is the most basic transaction, such as "A sends some ETH to B", and costs 21,000 gas.
  • About 16 ERC20 token transfers per second. These involve more storage operations than an ETH transfer, so each costs roughly 60,000 gas.
  • About 10 Uniswap swaps per second. A token-to-token swap costs about 102,000 gas on average.
  • …pick any transaction type you are interested in and divide 1 million gas per second by its gas cost (that is, 12,500,000 gas ÷ 12.5 seconds ÷ gas per transaction).

Please note that as the execution complexity of the transaction increases, the throughput of the system drops sharply. There is still a lot of room for improvement!
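To make the arithmetic behind these figures explicit (using only the numbers quoted above):

  12,500,000 gas per block ÷ 12.5 seconds per block = 1,000,000 gas per second
  1,000,000 ÷ 21,000 ≈ 47 simple ETH transfers per second
  1,000,000 ÷ 60,000 ≈ 16 ERC20 transfers per second
  1,000,000 ÷ 102,000 ≈ 10 Uniswap swaps per second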

Option 1: use a trusted intermediary. We could have a third party that everyone trusts process every transaction. That would give us very high throughput and sub-second latency without changing any system parameters. But we would be opting into a trust model set unilaterally by that third party, which could censor us or even take our assets. Not good.

Option 2: raise the block gas limit and the block frequency. We could reduce latency by shortening the block time and raise throughput by increasing the block gas limit. But this drives up the cost of running a node and discourages people from running one (as with EOS, Solana, and Ripple). Option 1 increases the need for trust, and Option 2 increases costs; neither is an acceptable scalability solution.

Rediscovering Optimistic Rollup

What follows assumes some familiarity with hash functions and Merkle trees. With that background in place, let's run a Socratic dialogue and see whether we can come up with a protocol that increases Ethereum's effective throughput without increasing the burden on users or node operators.

Q: We want to improve Ethereum's scalability without changing its trust or cost assumptions. What should we do?

A: You could try to reduce the cost of existing operations (recall the three resource requirements above), but that is easier said than done. Look at Ethereum's architecture: every node in the network currently stores and executes every transaction submitted by users. Transactions run in the EVM and interact with the EVM's state (contract storage, balances, and so on), and those state interactions are expensive. Common smart contract optimization techniques focus on minimizing the number of state interactions, but they only go so far.

Q: Is there a way to process transactions without touching on-chain state at all, and thereby cut resource costs?

A: Taken to the extreme: what if we moved all execution off-chain and kept only data on-chain? We can introduce a third party, the sequencer, to do exactly that. The sequencer stores and executes user-submitted transactions locally. To keep the system live, the sequencer periodically publishes the Merkle root of the transactions it has received, together with the resulting state root, to Ethereum. The idea works out nicely, because O(N) off-chain transactions only require O(1) data to be stored on Ethereum.
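To make this concrete, here is a minimal sketch of what the on-chain side of such a design could look like. The contract and function names are purely illustrative (they are not taken from any real rollup implementation); the point is that each batch only puts O(1) data on Ethereum.

```solidity
pragma solidity ^0.8.0;

// Illustrative sketch only: a sequencer periodically commits a Merkle root of
// an off-chain transaction batch plus the resulting state root.
contract RollupCommitments {
    struct Commitment {
        bytes32 txRoot;     // Merkle root of the batch of L2 transactions
        bytes32 stateRoot;  // state root after executing the batch off-chain
        uint256 timestamp;  // when the commitment was published
    }

    address public sequencer;
    Commitment[] public commitments;

    constructor(address _sequencer) {
        sequencer = _sequencer;
    }

    function publishCommitment(bytes32 txRoot, bytes32 stateRoot) external {
        require(msg.sender == sequencer, "only the sequencer may publish");
        commitments.push(Commitment(txRoot, stateRoot, block.timestamp));
    }
}
```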

Q: So by having the sequencer execute transactions off-chain and publishing only Merkle roots on-chain, we make Ethereum more scalable, right?

A: Yes.

Q: In other words, once we pick a sequencer, transfer costs drop dramatically. But how do users deposit funds into this system?

A: You join the system by depositing funds into a designated contract on Ethereum; the sequencer then credits the deposit to your account. To withdraw, you send a transaction that effectively says "I want to withdraw 3 ETH, my current balance is at least 3 ETH, and here is the proof." Even though your account does not literally exist on L1, you can provide a Merkle proof against the state root published by the sequencer to show that you hold sufficient funds in the current L2 state.
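As an illustration, the withdrawal check against the published state root could look roughly like the sketch below. It assumes a hypothetical bridge contract and a simple (account, balance) leaf encoding; real rollups use richer account encodings and a pending-withdrawal queue tied to the dispute period.

```solidity
pragma solidity ^0.8.0;

// Simplified sketch; not any production bridge's actual code.
contract RollupBridge {
    bytes32 public stateRoot; // latest state root published by the sequencer

    // Check that `leaf` is included in the tree committed to by `stateRoot`.
    function verify(bytes32 leaf, bytes32[] calldata proof) public view returns (bool) {
        bytes32 node = leaf;
        for (uint256 i = 0; i < proof.length; i++) {
            // Hash sorted pairs so position does not matter.
            node = node < proof[i]
                ? keccak256(abi.encodePacked(node, proof[i]))
                : keccak256(abi.encodePacked(proof[i], node));
        }
        return node == stateRoot;
    }

    function withdraw(uint256 balance, uint256 amount, bytes32[] calldata proof) external {
        // The user proves "my account has `balance` in the current L2 state".
        bytes32 leaf = keccak256(abi.encodePacked(msg.sender, balance));
        require(verify(leaf, proof), "invalid Merkle proof");
        require(amount <= balance, "insufficient L2 balance");
        // ... queue the withdrawal until the dispute period has passed ...
    }
}
```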

Q: So users need to provide a Merkle proof to withdraw funds. How do they obtain the data needed to construct that proof?

A: They can ask the sequencer for it!

Q: What if I can't reach the sequencer?

A: The sequencer might be acting maliciously, or it might simply have gone offline due to a technical problem; either way the system degrades (and funds could even be stolen). To guard against this, we require the sequencer to submit the complete transaction data to the chain, for storage only, never for execution. The goal is data availability: since all the data is permanently stored on Ethereum, even if a sequencer disappears, a new sequencer can retrieve all the Layer 2 data from Ethereum, reconstruct the latest L2 state, and take over where the previous sequencer left off.

Q: And if the sequencer is online but refuses to give me the data for my Merkle proof, I can just download it from Ethereum, right?

A: Yes. You can sync an Ethereum node yourself, or use any of the many node hosting providers.

Q: I still don't understand… how can data be stored on Ethereum without being executed? Isn't every transaction executed by the EVM?

A: Suppose you submit 10 transactions that each transfer ETH from A to B. Executing each one means increasing A's nonce, decreasing A's balance, and increasing B's balance, which requires multiple reads and writes to the world state. Instead, you can send the encoded data of all those transactions to a function such as publish(bytes _transactions) public {} on a smart contract. Note that the body of this function is empty! The published transaction data is never interpreted, executed, or accessed by the contract; it simply lives in the blockchain's history, and writing data there is far cheaper than writing state.
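Here is what such a contract could look like, mirroring the publish function named above (a sketch only; production rollups encode, compress, and authenticate batches far more carefully):

```solidity
pragma solidity ^0.8.0;

contract DataPublisher {
    // The body is intentionally empty: the batch is never decoded or executed.
    // It only needs to appear in the transaction's calldata so that anyone can
    // later reconstruct the L2 state from Ethereum's history.
    function publish(bytes calldata _transactions) external {}
}
```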

Q: But can we trust the sequencer? What if it publishes an invalid state transition?

A: Whenever the sequencer publishes a batch of state transitions, a "dispute period" begins. During the dispute period, anyone can submit a "fraud proof" showing that one of the state transitions is invalid. A fraud proof replays the disputed transaction on-chain and compares the resulting state root with the state root the sequencer published. If the two roots differ, the fraud proof succeeds and the state transition is cancelled, along with every state transition that came after it. Once the dispute period has elapsed, the batch can no longer be disputed; it is finalized.
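A heavily simplified sketch of the dispute flow is shown below. All names are hypothetical, and the executeTransition placeholder stands in for what real systems implement as an on-chain interpreter or an interactive one-step proof:

```solidity
pragma solidity ^0.8.0;

contract DisputeGame {
    struct Batch {
        bytes32 preStateRoot;   // state root before the batch
        bytes32 postStateRoot;  // state root claimed by the sequencer
        uint256 publishedAt;    // when the batch was published
        bool cancelled;
    }

    uint256 public constant DISPUTE_PERIOD = 7 days;
    Batch[] public batches;

    function publishBatch(bytes32 preStateRoot, bytes32 postStateRoot) external {
        // In a real system only a bonded sequencer could call this.
        batches.push(Batch(preStateRoot, postStateRoot, block.timestamp, false));
    }

    // Placeholder: re-execute the batch's transactions on-chain and return the
    // state root that actually results.
    function executeTransition(bytes32 preStateRoot, bytes calldata txData)
        internal pure returns (bytes32 actualRoot)
    {
        // ... on-chain re-execution would go here ...
    }

    function dispute(uint256 batchIndex, bytes calldata txData) external {
        Batch storage b = batches[batchIndex];
        require(block.timestamp < b.publishedAt + DISPUTE_PERIOD, "already finalized");
        // Replay the transactions and compare against the claimed root.
        require(executeTransition(b.preStateRoot, txData) != b.postStateRoot, "transition is valid");
        // Fraud proven: cancel this batch and every batch after it.
        for (uint256 i = batchIndex; i < batches.length; i++) {
            batches[i].cancelled = true;
        }
    }
}
```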

Q: Wait! You said earlier that any solution that (a) increases costs or (b) introduces new trust assumptions is not an acceptable scalability solution. Doesn't this scheme assume that someone is always watching and ready to report fraud?

A: Yes. We assume there is a set of entities called "validators" that watch for fraud. If the state published on Layer 1 does not match the actual Layer 2 state, a validator publishes a fraud proof. We also assume validators can get their fraud proofs included on Ethereum within the dispute period. We consider the existence of validators a weak assumption: if tens of thousands of users adopt the scheme, only one of them needs to act as an honest validator. That hardly sounds impractical! By contrast, changing Ethereum's trust model or increasing the cost of running an Ethereum node would be a strong assumption, a change we do not want. This is what we meant by "without changing the underlying system's assumptions" in our definition of decentralized scalability.

Q: I can believe that someone will take on the validator role, since so many people have a stake in the system. But whether they actually do depends on the cost. So how many resources does it take to run a validator or a sequencer?

A: Both the sequencer and the validators must run an Ethereum full node (not an archive node) plus an L2 full node to compute the L2 state. On top of that, validators run software that creates fraud proofs, and sequencers run software that batches and publishes user transactions.

Q: Is that all?

A: Yes! Congratulations, you have just rediscovered Optimistic Rollup, the most promising scalability solution of 2019-2021. And that is no exaggeration: this short conversation compresses years of research by the Ethereum community.

Optimistic Rollup incentive mechanism

The whole point of a Layer 2 scalability scheme is to minimize the number of transactions executed on-chain. We use fraud proofs to cancel invalid state transitions after the fact, but since a fraud proof is itself an on-chain transaction, we also want to minimize how often fraud proofs have to be published on Ethereum. In the ideal case, fraud never happens and no fraud proofs are ever issued.

We disincentivize fraud by introducing a fidelity bond: anyone who wants to become a sequencer must first post a bond on Ethereum, and loses it if fraud is proven against them. To motivate people to actively look for fraud, the slashed bond is awarded to the validator who proves the fraud.

Fidelity bond and dispute period

The fraud proof incentive mechanism has two parameters that need to be chosen:

  • Fidelity bond size: the bond posted by the sequencer is the reward available to a successful validator. The larger the bond, the stronger the incentive to validate and the weaker the incentive for the sequencer to misbehave.
  • Dispute period length: the time window during which fraud proofs can be submitted; once it has passed, the L2 transactions are finalized on L1. A longer dispute period improves security and resistance to censorship attacks. A shorter one gives users a better experience when withdrawing funds from L2 to L1, because they can use their funds on L1 without a long wait.

In my opinion, neither parameter has a single correct static value. Perhaps a 10 ETH bond and a 1-day dispute period are sufficient. The honest answer is that it depends on the incentive to run a validator (the cost of running the validator software) and on how hard it is to get a fraud proof included (how congested L1 is). Both parameters could be adjusted manually or automatically; for example, EIP-1559 introduces the BASEFEE opcode on Ethereum, which can serve as an on-chain signal of congestion and make the dispute period length programmable. It is important to implement the penalty mechanism correctly, otherwise it can be gamed in practice. For example, here is a naive implementation that does not work:

  1. Alice posts a 1 ETH bond and becomes the sequencer
  2. Alice publishes a fraudulent state update
  3. Bob notices and disputes it; if the dispute succeeds, Alice's 1 ETH bond goes to Bob and the fraudulent state update is cancelled
  4. Alice sees Bob's dispute coming and submits the same dispute herself (she challenges herself!)
  5. Alice gets her 1 ETH back and escapes any punishment for misbehaving

Alice pulls off this attack by front-running: broadcasting the same transaction as Bob but with a higher gas price, so that hers is executed first. This means Alice can always misbehave at almost no cost. The fix is simple: instead of awarding the entire bond to the disputer, burn X% of it. In the example above, if we burn 50% of the bond, Alice can recover at most 0.5 ETH by front-running her own challenge, which is enough to deter her from misbehaving in step 2. Of course, burning part of the bond also weakens the incentive to run validator software (the reward for a successful dispute shrinks), so the portion that remains after burning must still be large enough to motivate validators.
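A minimal sketch of bond slashing with a burn percentage, under the assumptions above (the 1 ETH bond and 50% burn are just example values, and all names are hypothetical):

```solidity
pragma solidity ^0.8.0;

contract BondManager {
    uint256 public constant BOND = 1 ether;
    uint256 public constant BURN_PERCENT = 50; // the X% of the bond that is destroyed

    mapping(address => uint256) public bonds;

    function becomeSequencer() external payable {
        require(msg.value == BOND, "must post exactly the fidelity bond");
        bonds[msg.sender] += msg.value;
    }

    // Called by the dispute logic once a fraud proof against `sequencer` succeeds.
    function slash(address sequencer, address payable challenger) internal {
        uint256 bond = bonds[sequencer];
        bonds[sequencer] = 0;
        uint256 burned = (bond * BURN_PERCENT) / 100;
        // Burn part of the bond: even if the sequencer front-runs the challenge
        // against itself, it still loses the burned portion.
        payable(address(0)).transfer(burned);
        // The remainder rewards the challenger, so validating stays worthwhile.
        challenger.transfer(bond - burned);
    }
}
```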

Response to Criticism of Optimistic Rollup

Now that we have seen how Optimistic Rollup is designed, let's look at the common criticisms of it and respond to them.

Long withdrawal/dispute periods hurt adoption and composability

As mentioned above, a longer dispute period improves security, so there appears to be an inherent trade-off: a long dispute period discourages adoption, because anyone withdrawing funds from an OR has to wait a long time (for example, 7 days), while a short dispute period gives a better user experience but raises the risk that fraud is not disputed in time. We don't think this is a real problem. Because withdrawals are slow, we expect market makers to offer fast withdrawal services. This works because anyone who verifies the L2 state can tell whether a withdrawal is fraudulent, and can therefore buy the withdrawal at a slight discount. For example, consider the following participants:

  • Alice: holds 5 ETH on L2
  • Bob: runs an L2 validator and holds 4.95 ETH in a "market maker" smart contract on L1

Steps:

  1. Alice tells Bob she wants a fast withdrawal and will pay him 0.05 ETH for it
  2. Alice initiates a withdrawal to Bob's "market maker" smart contract
  3. One of two things happens:

(1) Bob verifies on L2 that the withdrawal is valid and accepts Alice's fast withdrawal request: the 4.95 ETH in the market maker contract is sent to Alice's L1 address immediately, and once the dispute period ends Bob receives the full 5 ETH, pocketing the difference as profit.

(2) Bob finds during verification that the withdrawal is invalid. He disputes the corresponding state transition, the transition is cancelled, and he receives the misbehaving sequencer's bond as a reward.

Either way, if Alice is honest she gets her funds immediately, and if she tries to cheat she is punished. We expect that if there is real demand for fast withdrawals, competition will push the fee paid to market makers down until users barely notice the process at all. The most important consequence of fast withdrawals is composability with L1 contracts without waiting for the full dispute period to elapse. Note: this technique first appeared in the article "Simple Fast Withdrawals".
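For illustration, a bare-bones market maker contract for fast withdrawals could look roughly like this; the bridge interface and every name here are hypothetical, not any production bridge's API:

```solidity
pragma solidity ^0.8.0;

// Hypothetical interface: claims a finalized L2->L1 withdrawal into this contract.
interface IRollupBridge {
    function claimWithdrawal(uint256 withdrawalId) external;
}

contract MarketMaker {
    IRollupBridge public bridge;
    address public owner; // Bob, who also runs an L2 validator

    constructor(IRollupBridge _bridge) payable {
        bridge = _bridge;
        owner = msg.sender;
    }

    // Bob verifies off-chain that Alice's pending withdrawal (addressed to this
    // contract) is valid, then pays her out immediately at a small discount.
    function buyWithdrawal(address payable user, uint256 amount, uint256 fee) external {
        require(msg.sender == owner, "only the market maker");
        user.transfer(amount - fee); // e.g. 5 ETH - 0.05 ETH = 4.95 ETH
    }

    // Once the dispute period ends, the full withdrawal is claimed into this contract.
    function claim(uint256 withdrawalId) external {
        bridge.claimWithdrawal(withdrawalId);
    }

    // Receive the bridged ETH when the withdrawal finalizes.
    receive() external payable {}
}
```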

Miners can be bribed to censor withdrawal disputes, undermining the security of OR

The post "Almost Zero Cost Attack Scenario on Optimistic Rollup" argues that a sequencer can cheaply bribe Ethereum miners to censor disputes. For an Optimistic Rollup this would be fatal, because the security of the whole system rests on the dispute mechanism. We disagree. The honest side can counter-bribe miners with as much money as the malicious side, or more. Moreover, every time miners help an attacker they damage the value of Ethereum, and thereby their own interests. This scenario has in fact been studied in the academic literature, which finds that "the threat of such counterattacks induces a subgame perfect Nash equilibrium in which the attack never occurs in the first place." Thanks to Hasu for recommending the paper.

The verifier's dilemma creates perverse incentives and reduces the security of OR

Ed Felten has analyzed the verifier's dilemma in an essay and proposed a solution. The dilemma can be summarized as follows:

  1. If the system's incentives work as intended, no one commits fraud
  2. If no one commits fraud, there is no point in running validator software, because it never pays out
  3. If no one runs validator software, the sequencer is free to commit fraud
  4. If the sequencer commits fraud, the system no longer works as intended

This looks important, and it looks like a paradox! If the total reward is fixed, then the more validators there are, the lower each validator's expected return; and as the number of validators grows, the total reward may itself shrink, because less fraud means less for validators to earn. In the rest of his analysis, Felten proposes how to resolve the dilemma. I would push back: I don't think the verifier's dilemma is as serious as critics claim, because in practice validators do not rely on financial rewards. Suppose you have built a large application on a rollup, or you hold its tokens: if the system is compromised, your application stops working and your tokens lose value, so you have every reason to validate. In addition, the demand for fast withdrawals gives rise to market maker services (as described above), whose operators must validate regardless of whether fraud ever occurs. For a real-world analogy, Bitcoin offers no economic reward for storing the full transaction history or serving data to peers, yet people do it anyway. Even if running a validator without direct economic rewards is not incentive-compatible in the narrow sense, it secures the system, and that matters a great deal to the entities invested in it. We therefore believe an Optimistic Layer 2 system does not need a dedicated mechanism to resolve the verifier's dilemma.

Summary

As the title of this article suggests, we have analyzed one of the most important technologies for Ethereum in 2021: Optimistic Rollup. Its advantages: it is an extension of Ethereum that inherits Ethereum's security, composability, and developer tooling, while improving performance without raising the cost or trust requirements placed on Ethereum users. We explored the incentives that make OR work and responded to the common criticisms.

One point worth emphasizing: the upper bound on OR performance is the amount of data that can be carried on L1. So we should do two things: 1) compress the data published on L1 as much as possible (for example, via BLS signature aggregation), and 2) have a large, low-cost data layer (for example, ETH 2.0).

For further reading, we recommend Vitalik's "An Incomplete Guide to Rollups" and "Trust Models". We also recommend learning about the other family of rollups, ZK Rollups; our friends at StarkWare are building ZK Rollup solutions. Finally, there are other ways to achieve decentralized scalability, such as sharding and state channels, each with its own advantages and disadvantages. In the next article, we will dig into the mechanism and codebase of the first EVM-compatible OR, built by Optimism.

We would like to thank Hasu, Patrick McCorry, Liam Horne, Ben Jones, Kobi Gurkan, and Dave White for their valuable feedback on this article.

Author/ Translator: Tae Kon Jung