I’m implementing a new protocol with a custom blockchain, and I’m trying to figure out what is the correct way to deal with the “slow chain” problem.
In the Bitcoin protocol, nodes constantly gossip the newest block and try to produce a block after it. However, it sometimes happens that concurrent branches emerge, and from time to time nodes have to perform a rollback, 2 or 3 blocks into the past.
As we know, block difficulty depends on the time differences between blocks. A malicious entity can easily forge a chain of arbitrarily many blocks by simply starting from the genesis block and producing blocks with consecutive timestamps differing by more than 10 minutes. This way the difficulty would never increase, and the chain can be produced cheaply.
Of course, as a result of this approach, timestamps in the forged chain would increase much faster than in the real chain. This would cause the forged chain to be discarded by nodes, as legal blocks must have timestamps less than or equal to current timestamp (plus/minus some tolerance).
But the problem is:
Alice controls a bitcoin node, and has 500,000 blocks in storage. Bob advertises that he has 600,000 blocks, but the last common ancestor has number 1000. Bob’s chain is a forged slow chain, but Alice does not know it. Alice wants to download Bob’s blocks and check if these are correct.
QUESTION: How can Alice detect that Bob’s chain is invalid before actually storing it?
Approach 1: A downloads all blocks after the last common ancestor with B, checks if everything is correct, and if all seems to be ok, reverts storage state by undoing block changes till reaching common ancestor, and then applying newly received blocks.
In our case, Alice would start downloading 599,000 blocks and eventually find out that the timestamp is too big. She would need to store the tentative chain in memory (possibly expensive) and check all the conditions. She would waste a lot of time, and every troll can trick her to do so.
Approach 2: A reverts the storage to the a common ancestor advertised by B, then tries to apply the blocks received from B as they come.
In our case, Alice reverts her storage back to block 1000, applies operations received from Bob, eventually discards a block with illegal timestamp, reverts back to common ancestor, and aplies the blocks she had before communicating with Bob. Again, time and resources wasted.
Approach 3: A can only revert a fixed, small number of blocks in the past. This way, B can do minimal damage, and forging a slow chain is very costly.
But in the case of a long-term network partition (split brain), the network parts would diverge and never achieve consensus on which chain is the longest. This situation would require manually removing records from storage.
Approach 4: A stores all the blocks ever observed, (including abandoned branches), and dynamically keeps track of which branch is the longest. This approach, while safe, requires a lot of storage space and is tricky to implement efficiently.
I’m leaning towards the third approach, but I’m wondering how is this being solved in various bitcoin clients.
I’ve seen that the core client stores only the longest chain and skips the orphaned blocks. The details of leveldb-based storage are described here:
What are the keys used in the blockchain levelDB (ie what are the key:value pairs)?
Thank you very much for your help.