Ethereum Finality Hiccup a ‘Fire Drill’ That ‘Could Have Been Avoided’

Firm conclusions are forthcoming, but already developers point to Ethereum’s decentralization as a key feature preventing a crisis

The Ethereum network suffered a hiccup Thursday, one that developers are still parsing for clues. But as a user, you would hardly have noticed.

At worst, it would have taken slightly longer for a transaction to be included in a block, and there was a somewhat lower certainty that a given action would be considered final — for about 25 minutes.

The blockchain’s developer community saw the incident as a strong test of the network’s resilience. It also served as further impetus to increase diversity of the independent software clients that make Ethereum run.

The extent of Ethereum’s client diversity is unique among major blockchains, and it is a cause championed by community members such as the pseudonymous Superphiz, known as the “Ethereum community health consultant,” who was one of the first to notice the issue.

“Barely a blip happened, but it was so uncharacteristic for the Ethereum network that it was a big deal anyway,” he tweeted Friday.

The same issue resurfaced Friday, which Superphiz called “not surprising,” adding that these things tend to happen in waves.

What happened?

The precise cause is not yet known, but the effect was a failure to finalize blocks. It’s likely to have involved a coincidence of “two or three issues,” according to Irina Timchenko, Ethereum blockchain manager at Everstake.

“During the incident, validators experienced issues with Prysm and Teku nodes. Some of these issues were resolved after a restart, while others returned to normal without requiring a restart,” Timchenko told Blockworks.

“Subsequently, validators noticed a spike in CPU usage, which appears to be a second-order effect rather than the root cause of the problem,” she added.

According to a Prysm developer known as Potuz, “several valid, but untimely attestations were broadcast in the network. This typically happens when a node that is struggling to be synced, attests with an old view of the chain.”

A fix is already in the works, which has become more urgent since the problem has recurred.

Timchenko and others say the loss of finality could have been avoided entirely if no single consensus layer client were run by more than a third of all validators. Currently, two clients, Prysm and Lighthouse, are each estimated to be running on more than a third of Ethereum nodes, according to clientdiversity.org.

Freddy Zwanzger, Ethereum ecosystem lead at Blockdaemon, agrees.

“The fact that Ethereum does have multiple client implementations actually seemed to have prevented the blockchain from halting, and if all consensus clients would have less than 33% of the network, the finalization delay would also have been avoided,” Zwanzger told Blockworks.

Since blocks were still being produced, transactions were processed continuously, which is why end users didn’t see much, if any, effect, he added.

Liveness over safety

In designing blockchains with proof-of-stake consensus, developers have to choose which of two properties to prioritize: safety or liveness.

Colloquially, safety is thought of as “avoiding bad things.” That means every node in the network agrees consistently on the state of the chain.

Ethereum prioritizes liveness, which is thought of as “good things happen eventually.” The blockchain is designed to be very difficult to halt.

The outcome could have been worse had Prysm still been running on upwards of 66% of nodes, as it has in the past.

When a consensus layer client causes a loss of finality, which is what happened Thursday, the network averts a fork (a “bad thing” in this case) so long as it doesn’t have a supermajority client. But liveness was impacted because Prysm remains above the 33% threshold.
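
The one-third and two-thirds thresholds follow from simple arithmetic: finality needs attestations from at least two-thirds of staked ether, so a faulty client with more than a one-third share leaves too few healthy validators to finalize. The following is a minimal Python sketch, assuming that client share equals stake share and that every validator on the faulty client fails at once (both simplifications). The function name and the example shares are hypothetical and are not measurements.

```python
from fractions import Fraction

# Casper FFG finalizes only when at least 2/3 of total stake attests.
FINALITY_QUORUM = Fraction(2, 3)


def impact_of_faulty_client(client_share: Fraction) -> str:
    """Classify what happens if one client's validators all misbehave.

    Simplifying assumptions (not from the article): client share equals
    stake share, and every validator on the faulty client fails together.
    """
    healthy_share = 1 - client_share
    if client_share >= FINALITY_QUORUM:
        # The faulty client alone reaches quorum and could finalize a bad chain.
        return "safety at risk"
    if healthy_share >= FINALITY_QUORUM:
        # Healthy validators still reach quorum without the faulty client.
        return "finality preserved"
    # Neither side reaches quorum: blocks continue, finality stalls.
    return "finality delayed"


# Hypothetical shares, for illustration only.
for share in (Fraction(30, 100), Fraction(40, 100), Fraction(70, 100)):
    print(f"{float(share):.0%} -> {impact_of_faulty_client(share)}")
```

The middle case corresponds to the situation described here: no client can finalize a bad chain on its own, but the remaining validators cannot reach the quorum either.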

Ethereum’s specific consensus algorithm, called Gasper, saved the day, according to Gaetan Semp, a blockchain and DeFi consultant at Alyra.

“Gasper is truly an amazing consensus algorithm,” he tweeted, noting that even without a supermajority of nodes, the blockchain continued to be validated by LMD-Ghost, a weaker “short-term consensus,” as venture firm Paradigm has called it. LMD is short for “Latest Message Driven,” and LMD-Ghost is part of Ethereum’s “fork-choice rule.”
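
To illustrate why a fork-choice rule of this kind can keep picking a chain head without a supermajority, here is a simplified Python sketch of the LMD-Ghost idea: count only each validator's latest vote, then walk from the root into the heaviest subtree at every fork. This is not the consensus specification. The function name, the data layout, and the tie-break by block id are assumptions made for the example.

```python
from collections import defaultdict


def lmd_ghost_head(parents, latest_votes, balances, root):
    """Return the chain head under a simplified LMD-GHOST rule.

    parents:      block id -> parent block id (the root has no entry)
    latest_votes: validator id -> block id of its most recent attestation
    balances:     validator id -> stake weight
    """
    children = defaultdict(list)
    for block, parent in parents.items():
        children[parent].append(block)

    def supports(voted_block, candidate):
        # A vote supports a candidate if the voted block is the candidate
        # itself or one of its descendants.
        current = voted_block
        while current is not None:
            if current == candidate:
                return True
            current = parents.get(current)
        return False

    def weight(candidate):
        return sum(
            balances[validator]
            for validator, voted in latest_votes.items()
            if supports(voted, candidate)
        )

    head = root
    while children[head]:
        # Greedily descend into the heaviest subtree; ties are broken by
        # block id, an arbitrary choice for this sketch.
        head = max(children[head], key=lambda block: (weight(block), block))
    return head


# Hypothetical block tree: A <- B <- D and A <- C.
parents = {"B": "A", "C": "A", "D": "B"}
latest_votes = {"v1": "D", "v2": "B", "v3": "C"}
balances = {"v1": 32, "v2": 32, "v3": 32}
head = lmd_ghost_head(parents, latest_votes, balances, root="A")
print(head)
```

No two-thirds quorum appears anywhere in the rule, which is why it can keep selecting a head while finality is stalled.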

This is a trade-off, however, and one which not all users would approve of, Matt Fiebach of Blockworks Research noted.

“Without the ability for nodes to reach consensus and achieve transaction finality, a user cannot be sure that their transaction will be included in the Ethereum canonical chain,” Fiebach said.

Alternative architectures from Solana and Cosmos favor a chain halt to preserve safety. “A 30-minute loss of finality is unacceptable for end users because it puts transactions at risk of being reverted,” Richard Patel, a software developer for Firedancer, tweeted in response to the Ethereum glitch.

Firedancer, developed by Jump Crypto, will soon become the first independent client for Solana.

Vance Spencer, co-founder at Framework Ventures, described the failure to finalize as “a significant event,” but one with “a lot of nuance.”

“Finalization is especially important for entities like exchanges which need to ensure on-chain transactions aren’t reverted after they process them off-chain,” Spencer told Blockworks.

A resilience test

Marius van der Wijden, who develops the Go Ethereum (or Geth) execution layer client for Ethereum, called the incident “a great fire drill” and explained that having client diversity was key to the ability of the network to heal itself.

“The different implementations react slightly [differently] to extreme scenarios which provides a huge boost in reliability,” van der Wijden told Blockworks. “Having different implementations means we can test [them] against each other, which can help us quickly pinpointing issues and determining the root cause of an issue,” he added.

Van der Wijden was confident that proof-of-work consensus algorithms would have performed “much worse” under the circumstances.

Industry observers, such as Fabrice Cheng, co-founder and CEO at Quadrata, sounded an optimistic note about the long-term prospects of the Ethereum network coming out of this affair.

“Even when over 60% of all nodes went offline, the longest time you had to wait for your transaction to be included was [1 minute and] 24 seconds,” he said.

Friday’s recurrence was more severe, but developers note that block production continues unabated.

Updated May 12, 2023, at 5:11 pm ET: Additional detail and comments.

