Vitalik: Why is blockchain scaling not as easy as we thought?

How far can we push the scalability of a blockchain? Is it really possible, as Elon Musk said, to “accelerate the block time ten times, increase the block size ten times and reduce the handling fee by one hundred times” without causing extreme centralization and violating the essential properties of a blockchain? If the answer is no, how far can we go? What happens if you change the consensus algorithm? More importantly, what happens if a feature such as ZK-SNARKs or sharding is introduced? A sharded blockchain can theoretically keep adding shards, so is it really possible to do so?

It turns out that, with or without sharding, there are important and quite subtle technical factors that limit blockchain scalability. In many cases there are solutions, but even with the solutions there are limits. This article explores many of these issues.

Simply raise the parameters and the problem seems to be solved. But at what cost?

The ability for ordinary users to run nodes is essential to the decentralization of the blockchain

Imagine that at around two in the morning you receive an emergency call from the partner on the other side of the world who helps you run your mining pool (or staking pool). Since about 14 minutes ago, your partner tells you, your pool and a few others have split off from the chain that still carries 79% of the network. According to your node, the majority chain’s blocks are invalid. There is a balance error: one block appears to have erroneously assigned 4.5 million extra tokens to an unknown address.

An hour later, you are in a chat room with the two other small pool operators who were caught off guard just as you were, along with some block explorers and exchanges, when someone posts a link to a tweet that begins: “Announcing a new on-chain sustainable protocol development fund”.

By morning, the debate has spread widely on Twitter and on a community forum that does not censor content. But by then a large part of the 4.5 million tokens has been converted into other assets on-chain, and billions of dollars of DeFi transactions have taken place. 79% of the consensus nodes, as well as the endpoints of all major block explorers and light wallets, are following the new chain. Maybe the new developer fund will fund some development, or maybe it will all be swallowed up by the leading pools, exchanges and their cronies. But whatever the outcome, the fund is for all practical purposes a fait accompli, and ordinary users cannot resist.

Maybe this would make a good movie. It could be funded by MolochDAO or a similar organization.

Could this happen on your blockchain? The elites of your blockchain community, including mining pools, block explorers and hosted nodes, are probably quite well coordinated. They are likely all in the same Telegram channels and WeChat groups. If they really wanted to change the protocol rules suddenly for their own benefit, they probably could. The Ethereum blockchain has fully resolved consensus failures within ten hours. On a blockchain with only one client implementation, where a code change only needs to be deployed to a few dozen nodes, a change to the client code can be coordinated much faster. The only reliable way to resist this kind of coordinated social attack is “passive defense”, and that strength comes from a group that actually is decentralized: the users.

Imagine how the story would have played out if users ran nodes that verify the chain (whether directly or through more advanced indirect techniques) and automatically rejected blocks that break the protocol rules, even if more than 90% of miners or stakers supported those blocks.

If every user ran a validating node, the attack would quickly fail: a few mining pools and exchanges would fork off and look foolish in the process. But even if only some users ran validating nodes, the attacker would not get a clean victory. Instead, the attack would cause chaos, with different users seeing different versions of the blockchain. At the very least, the ensuing market panic and the likely persistent chain split would greatly reduce the attacker’s profit. The prospect of navigating such a protracted conflict would by itself deter most attacks.

Hasu’s view on this point:

“We have to be clear about one thing. The reason we are secure against malicious protocol changes is a culture of users validating the blockchain, not PoW or PoS.”

If your community has 37 node operators and 80,000 passive listeners who only check signatures and block headers, the attacker wins. If everyone runs a node, the attacker loses. We don’t know the exact threshold at which herd immunity against coordinated attacks kicks in, but one thing is absolutely clear: more nodes is good, fewer nodes is bad, and the number we need is definitely more than a few hundred or a few thousand.

So what is the upper limit on the work we can ask a full node to do?

To enable as many users as possible to run full nodes, we will focus on ordinary consumer-grade hardware. Requiring some easily purchased specialized hardware can raise capacity somewhat, but in practice the scalability gain is smaller than one might imagine.

The ability of a full node to process a large number of transactions is mainly limited by three aspects:

Computing power: How much of the CPU can we safely devote to running a node?

Bandwidth: Based on the current network connection, how many bytes can a block contain?

Storage: How much disk space can we ask users to set aside? And how fast does it need to be to read from? (i.e., is an HDD enough, or do we need an SSD?)

Many misconceptions about how far “simple” techniques can scale a blockchain stem from overly optimistic estimates of these numbers. We can discuss the three factors in turn:

Computing power

Wrong answer: 100% of the CPU should be used for block verification

Correct answer: about 5-10% of the CPU can be used for block verification

The four main reasons why the limit is so low are as follows:

We need a safety margin to cover the possibility of DoS attacks (transactions crafted by an attacker to exploit weaknesses in the code take longer to process than regular transactions)

Nodes need to be able to synchronize with the blockchain after being offline. If I go offline for a minute, then I should be able to catch up in a few seconds

Running a node should not drain your battery quickly or slow down your other applications

Nodes also have other, non-block-production work to do, mostly verifying and responding to incoming transactions and requests on the p2p network
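
The offline-sync requirement in the list above can be turned into a rough number. The following is a minimal sketch under an assumed model that is not taken from the article: verification normally uses a fixed fraction of the CPU, a node that is catching up uses the whole CPU, and new blocks keep arriving while it does so. The function name `catch_up_seconds` is hypothetical.

```python
def catch_up_seconds(offline_seconds: float, cpu_fraction: float) -> float:
    """Seconds needed to re-sync after being offline.

    Assumed model (illustrative, not from the article): at normal pace,
    verification uses `cpu_fraction` of the CPU; while catching up the node
    uses the whole CPU, and new blocks keep arriving at the normal rate.
    """
    if not 0.0 <= cpu_fraction < 1.0:
        raise ValueError("cpu_fraction must be in [0, 1)")
    # The backlog is offline_seconds * cpu_fraction CPU-seconds of work,
    # and it shrinks at a net rate of (1 - cpu_fraction) per second.
    return offline_seconds * cpu_fraction / (1.0 - cpu_fraction)


for fraction in (0.05, 0.10, 0.50, 0.90):
    seconds = catch_up_seconds(60, fraction)
    print(f"{fraction:.0%} of CPU: {seconds:.1f} s to catch up after 60 s offline")
```

Under this model a 5-10% budget gives roughly 3 to 7 seconds of catch-up per minute offline, which matches the "few seconds" target, while a 90% budget would need 9 minutes.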

Note that until recently, most explanations of “why only 5-10%?” focused on a different issue: because PoW blocks arrive at random times, blocks that take a long time to verify increase the risk of multiple blocks being created at the same time. There are many ways to fix this problem, such as Bitcoin NG or simply using proof of stake. But these do not solve the other four problems, so they do not deliver the large scalability gains that many people expected.

Parallelism is not a panacea either. Even seemingly single-threaded blockchain clients are usually already parallelized: signatures are verified by one thread while execution is done by others, and a separate thread handles transaction pool logic in the background. And the closer the utilization of all threads gets to 100%, the more energy it takes to run a node and the lower the safety margin against DoS.


Bandwidth

Wrong answer: If 10 MB blocks are produced every 2-3 seconds, then since most users have connections faster than 10 MB/sec, of course they can all process these blocks.

Correct answer: Maybe we can process 1-5 MB blocks every 12 seconds, but it is still difficult

Nowadays, we often hear widely quoted statistics about how much bandwidth an Internet connection can provide: figures of 100 Mbps or even 1 Gbps are common. However, for the following reasons, there is a big difference between the advertised bandwidth and the bandwidth you can actually expect:

“Mbps” means “millions of bits per second”; a bit is 1/8 of a byte, so we need to divide the declared number of bits by 8 to get the number of bytes.

Internet providers, like other companies, often lie.

There are always multiple applications using the same network connection, so nodes cannot monopolize the entire bandwidth.

P2P networks inevitably introduce overhead: nodes usually end up downloading and re-uploading the same block multiple times (not to mention that transactions are broadcast through the mempool before being included in a block).

When Starkware ran an experiment in 2019 in which they published 500 kB blocks, made possible for the first time by a reduction in the gas cost of transaction data, a few nodes were actually unable to handle blocks of that size. The ability to handle large blocks has improved since then and will continue to improve. But no matter what we do, we are still very far from being able to naively take the average bandwidth in MB/sec, convince ourselves that we can accept a 1 second delay, and have blocks of that size.
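
The gap between advertised and usable bandwidth can be sketched as a chain of discounts. This is a rough illustration only: apart from the divide-by-8 conversion, every factor below (how honest the advertised figure is, the node’s share of the link, the gossip overhead, the time allowed for propagation) is a placeholder assumption rather than a measurement, and the function names are hypothetical.

```python
def mbps_to_mb_per_sec(megabits_per_sec: float) -> float:
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return megabits_per_sec / 8.0


def usable_block_mb(advertised_mbps: float,
                    real_vs_advertised: float,
                    node_share: float,
                    gossip_overhead: float,
                    propagation_seconds: float) -> float:
    """Largest block (MB) a node could receive within the propagation window.

    All discount factors are illustrative assumptions:
      real_vs_advertised  - fraction of the advertised speed actually delivered
      node_share          - fraction of the link left over for the node
      gossip_overhead     - how many times the same data crosses the link
      propagation_seconds - time allowed for one block to arrive
    """
    effective = (mbps_to_mb_per_sec(advertised_mbps)
                 * real_vs_advertised * node_share / gossip_overhead)
    return effective * propagation_seconds


print(mbps_to_mb_per_sec(100))                   # 100 Mbps is 12.5 MB/sec
print(usable_block_mb(100, 0.5, 0.25, 4.0, 1.0)) # well under 1 MB with these guesses
```

With these placeholder discounts a "100 Mbps" connection supports a block well below 1 MB per second of propagation time, far from the naive 12.5 MB.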


Storage

Wrong answer: 10 TB

Correct answer: 512 GB

As you might guess, the main argument here is the same as elsewhere: the difference between theory and practice. In theory, we can buy an 8 TB solid-state drive on Amazon (an SSD or NVMe drive really is needed; HDDs are too slow for blockchain state storage). In practice, the laptop I used to write this blog post has 512 GB. If you make people buy hardware, many of them will simply get lazy (or they can’t afford the $800 8 TB SSD) and use centralized services. And even if the chain fits on a drive today, a high level of activity can quickly fill the disk and force you to buy a new one.

A poll among a group of blockchain protocol researchers on how much disk space each of them has. I know the sample size is small, but still…

In addition, storage size determines the time a new node needs to come online and start participating in the network. Any data that existing nodes must store is data that a new node must download. This initial sync time (and bandwidth) is also a major obstacle to users being able to run nodes. At the time of writing this blog post, it took me about 15 hours to sync a new geth node. If Ethereum usage increased by 10 times, syncing a new geth node would take at least a week, and would be much more likely to get the node’s Internet connection throttled. This matters even more during an attack, when a successful response will likely require many users who were not previously running nodes to spin up new ones.

Interaction effect

In addition, there are interaction effects between these three types of costs. Since databases use a tree structure internally to store and retrieve data, the cost of fetching data from the database grows with the logarithm of the database size. In fact, because the top level (or the top few levels) can be cached in RAM, the disk access cost is proportional to the logarithm of the database size expressed as a multiple of the size of the data cached in RAM.

Different databases work in different ways, and often the in-memory part is just a single (but very large) layer (see the LSM trees used in leveldb). But the basic principle is the same.

For example, if the cache is 4 GB and we assume that each layer of the database is 4 times larger than the previous one, then Ethereum’s current ~64 GB state requires ~2 accesses. If the state size increases 4 times to ~256 GB, this rises to ~3 accesses. So a 4 times increase in the gas limit could actually translate into roughly a 6 times increase in block verification time: 4 times as many operations, each needing about 1.5 times as many disk accesses. The effect may be even larger, because hard disks often take longer to read and write when they are full than when they are nearly empty.
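
The arithmetic in this example can be checked with a short sketch. It uses the article’s figures (4 GB cache, each layer 4 times larger than the previous one) and adds one assumption of its own: that state size grows in proportion to the gas limit. The function names are hypothetical.

```python
import math


def disk_accesses(state_gb: float, cache_gb: float = 4.0,
                  growth_per_layer: float = 4.0) -> float:
    """Approximate disk accesses per lookup: log of (state / cache) in base growth."""
    if state_gb <= cache_gb:
        return 0.0  # everything fits in the RAM cache
    return math.log(state_gb / cache_gb) / math.log(growth_per_layer)


def verification_slowdown(gas_multiplier: float, state_gb: float) -> float:
    """Relative block verification time after raising the gas limit.

    Assumption (not stated as a law in the article): the state grows in
    proportion to the gas limit, so both the number of operations and the
    cost of each lookup increase.
    """
    before = disk_accesses(state_gb)
    if before == 0.0:
        raise ValueError("state must be larger than the cache")
    after = disk_accesses(state_gb * gas_multiplier)
    return gas_multiplier * after / before


print(round(disk_accesses(64), 2))              # about 2 accesses
print(round(disk_accesses(256), 2))             # about 3 accesses
print(round(verification_slowdown(4, 64), 2))   # about 6 times slower
```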

What does this mean for Ethereum?

On the Ethereum blockchain today, running a node is already a challenge for many users, although it is still possible on regular hardware (I just synced a node on my laptop while writing this article!). So we are approaching bottlenecks. The issue that most concerns core developers is storage size. As a result, the current valiant efforts to solve bottlenecks in computation and data, and even changes to the consensus algorithm, are unlikely to lead to a substantial increase in the gas limit. Even fixing Ethereum’s biggest DoS weakness only permits a gas limit increase of about 20%.

The only solution to the storage size problem is statelessness and state expiry. Statelessness allows a class of nodes to verify the chain without maintaining permanent storage. State expiry deactivates state that has not been accessed recently, and users must manually provide proofs to renew it. Both paths have been studied for a long time, and work on a proof-of-concept implementation of statelessness has already begun. Together, these two improvements can greatly alleviate these concerns and open up room for a significant gas limit increase. But even after statelessness and state expiry are implemented, the gas limit can probably only be raised safely by about 3 times before other limitations start to dominate.

Another possible medium-term solution is to use ZK-SNARKs to verify transactions. ZK-SNARKs would ensure that ordinary users do not need to personally store the state or verify blocks, although they would still need to download all the data in blocks to protect against data unavailability attacks. In addition, even if an attacker cannot force invalid blocks through, there is still a risk of coordinated censorship attacks if running a consensus node is too difficult. So ZK-SNARKs cannot increase node capacity indefinitely, but they can still improve it significantly (perhaps by 1-2 orders of magnitude). Some blockchains are exploring this approach at layer 1, while Ethereum benefits from it through layer-2 protocols (called ZK rollups) such as zksync, Loopring and Starknet.

What happens after sharding?

Sharding fundamentally gets around the above limitations because it decouples the data contained on the blockchain from the data that a single node needs to process and store. Instead of verifying blocks by personally downloading and executing them, nodes use advanced mathematical and cryptographic techniques to verify blocks indirectly.

As a result, a sharded blockchain can safely reach very high levels of throughput that a non-sharded blockchain cannot. This does require a lot of clever cryptography to build efficient substitutes for naive full verification that still reject invalid blocks, but it can be done: the theory is well established, and proofs of concept based on draft specifications are already being worked on.

Ethereum plans to adopt quadratic sharding, where total scalability is limited by the fact that a node must be able to process both a single shard and the beacon chain, and the beacon chain must perform a fixed amount of management work for each shard. If shards are too large, nodes can no longer process a single shard, and if there are too many shards, nodes can no longer process the beacon chain. The product of these two constraints forms the upper bound.

It is conceivable that we could go further with cubic or even exponential sharding. Data availability sampling would certainly become more complicated in such a design, but it could be done. But Ethereum is not going beyond quadratic, because the extra scalability gained by going from shards of transactions to shards of shards of transactions cannot actually be realized without other risks becoming unacceptably high.
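
Why the bound is "quadratic" can be illustrated with a toy model. This is only a sketch under assumed, made-up units: the budgets and the per-shard beacon cost below are hypothetical parameters, not figures from any Ethereum specification.

```python
def max_total_capacity(shard_budget: float,
                       beacon_budget: float,
                       beacon_cost_per_shard: float) -> float:
    """Upper bound on total throughput in a toy quadratic-sharding model.

    Hypothetical units: a node can spend `shard_budget` on processing one
    shard and `beacon_budget` on the beacon chain, which costs
    `beacon_cost_per_shard` for every shard it manages.
    """
    max_shard_size = shard_budget                                   # limit 1: one shard per node
    max_shard_count = int(beacon_budget // beacon_cost_per_shard)   # limit 2: beacon chain load
    return max_shard_size * max_shard_count                         # product of the two limits


base = max_total_capacity(50, 64, 1)
doubled = max_total_capacity(100, 128, 1)
print(base, doubled, doubled / base)  # doubling node capacity quadruples the bound
```

In this model, doubling what a single node can handle doubles both the shard size and the shard count, so the total bound grows with the square of node capacity.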

So what are these risks?

Minimum number of users

A non-sharded blockchain can conceivably run as long as even one user is willing to participate. This is not the case with sharded blockchains: no single node can process the entire chain, so enough nodes are needed to process the blockchain together. If each node can handle 50 TPS and the chain can handle 10,000 TPS, then the chain needs at least 200 nodes to survive. If the chain has fewer than 200 nodes at any time, nodes may no longer be able to keep in sync, or may stop detecting invalid blocks, or many other bad things may happen, depending on how the node software is set up.

In practice, because of the need for redundancy (including for data availability sampling), the safe minimum is several times higher than the naive “chain TPS divided by node TPS”. For the above example, let us call it 1,000 nodes.

If the capacity of a sharded blockchain increases by 10 times, the minimum number of users also increases by 10 times. Now you might ask: why not start with a lower capacity, increase it when there are many users and we actually need it, and reduce it again if the number of users drops?

There are a few problems with this:
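
The minimum-node arithmetic can be written out directly. This is a minimal sketch: the redundancy factor of 5 is only what the article’s own example implies (1,000 divided by 200), not a recommended value, and the function name is hypothetical.

```python
import math


def minimum_nodes(chain_tps: float, node_tps: float, redundancy: int = 1) -> int:
    """Nodes needed so that, together, they can cover the whole chain.

    `redundancy` is an illustrative safety multiplier; the article only says
    the safe minimum is "several times" the naive ratio.
    """
    naive = math.ceil(chain_tps / node_tps)  # chain TPS divided by node TPS
    return naive * redundancy


print(minimum_nodes(10_000, 50))                  # 200, the bare minimum
print(minimum_nodes(10_000, 50, redundancy=5))    # 1000, with redundancy
print(minimum_nodes(100_000, 50, redundancy=5))   # 10000, after a 10x capacity increase
```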

The blockchain itself cannot reliably detect how many unique users there are, so some governance is needed to detect and set the number of shards. Governance of capacity constraints can easily become a source of division and conflict.

What if many users unexpectedly drop offline at the same time?

Increasing the minimum number of users needed to start a fork makes it harder to defend against hostile takeovers.

A minimum user count of 1,000 is almost certainly fine. A minimum of 1 million, on the other hand, certainly is not. Even a minimum of 10,000 is arguably starting to get risky. So it seems difficult to justify a sharded blockchain with more than a few hundred shards.

Historical retrievability

An important property of a blockchain that users really value is permanence. A digital asset stored on a server will stop existing within 10 years, when the company goes bankrupt or maintaining the ecosystem is no longer worth its while. An NFT on Ethereum, on the other hand, is permanent.

Yes, people will still be able to download and view your CryptoKitties in the year 2372.

But once a blockchain’s capacity gets too high, it becomes harder to store all of this data, until at some point there is a large risk that some part of history will end up being stored by no one.

This risk is easy to quantify. Take the chain’s data capacity in MB/sec and multiply by ~30 to get the amount of data stored per year in TB. The current sharding plan has a data capacity of approximately 1.3 MB/sec, so about 40 TB/year. If that is increased by 10 times, it becomes 400 TB/year. If we want the data to be not just accessible but conveniently accessible, we also need metadata (for example, decompressed rollup transactions), which brings it to 4 PB per year, or 40 PB after a decade. The Internet Archive uses 50 PB. So this can reasonably be taken as the upper limit on the safe size of a sharded blockchain.

So it seems that on both of these dimensions, the Ethereum sharding design is actually already quite close to the reasonable maximum safe values. The constants can be increased a little, but not by much.
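
The "multiply by ~30" rule and the resulting archive sizes can be reproduced with a few lines. This is a sketch under stated assumptions: decimal units (1 TB = 1,000,000 MB), a 365-day year, and a metadata factor of 10, which is only what the article’s step from 400 TB to 4 PB implies. The function names are hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000


def tb_per_year(mb_per_sec: float) -> float:
    """Data produced per year in TB (decimal units: 1 TB = 1,000,000 MB)."""
    return mb_per_sec * SECONDS_PER_YEAR / 1_000_000


def archive_pb(years: float, mb_per_sec: float, metadata_factor: float = 10.0) -> float:
    """Archive size in PB after `years`, including an assumed metadata overhead."""
    return tb_per_year(mb_per_sec) * metadata_factor * years / 1000.0


print(round(tb_per_year(1.0), 1))     # the "~30" multiplier, about 31.5
print(round(tb_per_year(1.3)))        # about 41 TB/year for the current plan
print(round(tb_per_year(13.0)))       # about 410 TB/year at 10 times the capacity
print(round(archive_pb(10, 13.0)))    # about 41 PB after a decade, with metadata
```

The article rounds these to 40 TB, 400 TB and 40 PB; the last figure is what it compares against the Internet Archive’s 50 PB.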

Concluding remarks

There are two ways to try to scale a blockchain: fundamental technical improvements, and simply increasing the parameters. At first, increasing the parameters sounds very attractive: if you do the math on a napkin, it is easy to convince yourself that a consumer-grade laptop can handle thousands of transactions per second, with no need for ZK-SNARKs, rollups or sharding. Unfortunately, there are many subtle reasons why this approach is fundamentally flawed.

Computers running blockchain nodes cannot use 100% of their CPU to verify the chain; they need a large safety margin to withstand unexpected DoS attacks, they need spare capacity for tasks such as processing transactions in the mempool, and users do not want running a node to make their computer unusable for any other application at the same time. Bandwidth is limited too: a 10 MB/s connection does not mean that a 10 MB block can be processed every second! Perhaps 1-5 MB blocks can be processed every 12 seconds. The same is true for storage. Raising the hardware requirements for running a node and leaving node operation to specialized operators is not a solution. For a decentralized blockchain, it is crucial that ordinary users are able to run a node, and that there is a culture in which running a node is a common activity.

Fundamental technical improvements, however, are feasible. At present, Ethereum’s main bottleneck is storage size, and statelessness and state expiry can solve this problem and allow an increase of up to about 3 times, but not more, because we also want running a node to be easier than it is today. Sharded blockchains can scale much further because no single node in a sharded blockchain needs to process every transaction. But even sharded blockchains have capacity limits: as capacity increases, the minimum safe number of users increases, and the cost of archiving the chain (along with the risk of data loss if no one archives it) goes up. But we don’t have to worry too much: these limits are high enough that we can process more than one million transactions per second with the full security of a blockchain. Achieving this without compromising the decentralization that makes blockchains so valuable, however, will take more work.

Author/ Translator: Jamie Kim
Bio: Jamie Kim is a technology journalist. Raised in Hong Kong and always vocal at heart, she aims to share her expertise with readers. Kim is a Bitcoin maximalist who believes with unwavering conviction that Bitcoin is the only cryptocurrency (in fact, the only currency) worth caring about.