OpenAI’s GPT-4 Shows Limited Success in ID’ing Smart Contract Vulnerabilities

The weaknesses of large language models like ChatGPT are “too great to use reliably for security,” OpenZeppelin’s machine learning lead says

Pavel Ignatov/Shutterstock modified by Blockworks

As artificial intelligence gains traction, executives at blockchain security firm OpenZeppelin said a recent company experiment proves the continued need for a human auditor.   

An OpenZeppelin study tested whether GPT-4 — OpenAI’s latest multimodal model designed to generate text and have human-like conversations — could identify various smart contract vulnerabilities within 28 Ethernaut challenges. 

GPT-4 has already been able to solve coding challenges on Leetcode, a platform for software engineers preparing for coding interviews, according to Mariko Wakabayashi, machine learning lead at OpenZeppelin. 

“We wanted to assess whether GPT4’s strong results in traditional code and academic exams map equally to smart contract code, and if yes, if it can be used to detect and propose fixes for vulnerabilities,” Wakabayashi told Blockworks. 

GPT-4 was able to solve 19 of the 23 Ethernaut challenges introduced before its training data cutoff date of September 2021. It then failed four of the final five tasks.
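For a sense of scale, the reported tallies can be turned into solve rates. The following is only a rough sketch based on the counts given above; the variable and function names are illustrative and do not come from OpenZeppelin's study.

```python
# Counts as reported in the article; names here are hypothetical.
pre_cutoff_total = 23    # challenges introduced before September 2021
pre_cutoff_solved = 19
post_cutoff_total = 5    # challenges introduced after the cutoff
post_cutoff_solved = post_cutoff_total - 4  # GPT-4 failed four of these five


def solve_rate(solved: int, total: int) -> float:
    """Return the share of solved challenges as a percentage."""
    return 100 * solved / total


pre_rate = solve_rate(pre_cutoff_solved, pre_cutoff_total)
post_rate = solve_rate(post_cutoff_solved, post_cutoff_total)
overall_rate = solve_rate(
    pre_cutoff_solved + post_cutoff_solved,
    pre_cutoff_total + post_cutoff_total,
)

print(f"Before cutoff: {pre_rate:.1f}%")
print(f"After cutoff:  {post_rate:.1f}%")
print(f"Overall:       {overall_rate:.1f}%")
```

On these numbers, the solve rate falls from roughly 82.6% on pre-cutoff challenges to 20.0% on post-cutoff ones, or about 71.4% across all 28.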

The AI tool “generally lacks knowledge” of events that happened after September 2021, and “does not learn from its experience,” OpenAI states on its website.

An OpenAI spokesperson did not immediately return a request for comment. 

Though the security researcher running the experiment was initially surprised to see how many challenges GPT-4 seemed to solve, Wakabayashi noted, it became clear there wasn’t “reliable reasoning” behind the model’s outputs.

“In some cases, the model was able to identify a vulnerability correctly but failed to explain the correct attack vector or propose a solution,” the executive added. “It also leaned on false information in its explanation and even made up vulnerabilities that don’t exist.”

For the problems that the AI tool did solve, a security expert had to offer additional prompts to guide it to correct solutions.

Extensive security knowledge is necessary to assess whether the answer provided by AI is “accurate or nonsensical,” Wakabayashi and Security Services Manager Felix Wegener added in written findings.

On level 24 of the Ethernaut challenges, for example, GPT-4 falsely claimed it was not possible for an attacker to become the owner of the wallet.

“While advancements in AI may cause shifts in developer jobs and inspire the rapid innovation of useful tooling to improve efficiency, it is unlikely to replace human auditors in the near future,” Wakabayashi and Wegener wrote.

OpenZeppelin’s test comes after crypto derivatives platform Bitget decided earlier this month to limit the company’s use of AI tools, such as ChatGPT.

The company told Blockworks that an internal survey found that in 80% of cases, crypto traders had a negative experience using the AI chatbot, citing false investment advice and other misinformation. 

Other crypto companies are more bullish on the technology, including Crypto.com, which launched an AI companion tool called Amy.

Abhi Bisarya, Crypto.com’s global head of product, told Blockworks in an interview that AI initiatives will be “game-changing” for the industry. 

Though large language models (LLMs) like ChatGPT have strengths, Wakabayashi told Blockworks, their weaknesses are too great to use reliably for security.

“However, it can be a great tool for creative and more open-ended tasks, so we’re encouraging everyone at OpenZeppelin to experiment and find new use cases,” Wakabayashi said.

