We asked four AI coding agents to rebuild Minesweeper—the results were explosive

As an Amazon Associate I earn from qualifying purchases.

How do four modern LLMs do at re-creating a simple Windows gaming classic?

Which mines are mine, and which are AI?


Credit: Aurich Lawson | Getty Images

The idea of using AI to help with computer programming has become a contentious issue. On the one hand, coding agents can make horrific mistakes that require a lot of inefficient human oversight to fix, leading many developers to lose trust in the concept altogether. On the other hand, some coders insist that AI coding agents can be powerful tools and that frontier models are quickly getting better at coding in ways that overcome some of the common problems of the past.

To see how effective these modern AI coding tools are becoming, we decided to test four major models with a simple task: re-creating the classic Windows game Minesweeper. Since it’s relatively easy for pattern-matching systems like LLMs to play off of existing code to re-create famous games, we added in one novelty curveball.

Our straightforward prompt:

Make a full-featured web version of Minesweeper with sound effects that

1) Replicates the standard Windows game and
2) implements a surprise, fun gameplay feature.

Include mobile touchscreen support.

Ars Senior AI Editor Benj Edwards fed this task into four AI coding agents with terminal (command line) apps: OpenAI’s Codex based on GPT-5, Anthropic’s Claude Code with Opus 4.5, Google’s Gemini CLI, and Mistral Vibe. The agents then directly manipulated HTML and scripting files on a local machine, guided by a “supervising” AI model that interpreted the prompt and assigned coding tasks to parallel LLMs that can use software tools to execute the instructions. All AI plans were paid for privately with no special or privileged access given by the companies involved, and the companies were unaware of these tests taking place.

Ars Senior Gaming Editor (and Minesweeper expert) Kyle Orland then judged each example blind, without knowing which model generated which Minesweeper clone. Those somewhat subjective and non-rigorous results are below.

For this test, we used each AI model’s unmodified code in a “single shot” result to see how well these tools perform without any human debugging. In the real world, most sufficiently complex AI-generated code would go through at least some level of review and tweaking by a human software engineer who could spot problems and address inefficiencies.

We chose this test as a sort of simple middle ground for the current state of AI coding. Cloning Minesweeper isn’t a trivial task that can be done in just a handful of lines of code, but it’s also not an incredibly complex system that requires many interlocking moving parts.

Minesweeper is also a well-known game, with many versions documented across the Internet. That should give these AI agents plenty of raw material to work from and should be easier for us to evaluate than a completely novel program idea. At the same time, our open-ended request for a new “fun” feature helps demonstrate each agent’s penchant for unguided coding “creativity,” as well as its ability to create new features on top of an established game concept.

With all that throat-clearing out of the way, here’s our evaluation of the AI-generated Minesweeper clones, complete with links that you can use to play them yourselves.

Agent 1: Mistral Vibe

Play it yourself

Just ignore that Custom button. It’s just for show.


Credit: Benj Edwards

Implementation

Right away, this version loses points for not implementing chording, the technique that advanced Minesweeper players use to quickly clear all the remaining spaces surrounding a number that already has enough flagged mines. Without this feature, this version feels more than a little clunky to play.
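For readers who have never chorded, here is a minimal Python sketch of how the mechanic can work. It is not taken from any of the tested clones: the board representation (sets of (row, column) tuples) and the function names are hypothetical. A chord on a revealed number only fires when the count of adjacent flags matches that number, and a misplaced flag means the chord can set off a mine.

```python
from typing import Iterator, Set, Tuple

Cell = Tuple[int, int]  # (row, column); a hypothetical board representation


def neighbors(cell: Cell, rows: int, cols: int) -> Iterator[Cell]:
    """Yield the up-to-eight cells surrounding `cell` that lie on the board."""
    r, c = cell
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr or dc) and 0 <= r + dr < rows and 0 <= c + dc < cols:
                yield (r + dr, c + dc)


def adjacent_mines(cell: Cell, mines: Set[Cell], rows: int, cols: int) -> int:
    """The number shown on a revealed cell."""
    return sum(1 for n in neighbors(cell, rows, cols) if n in mines)


def reveal(cell: Cell, mines: Set[Cell], revealed: Set[Cell],
           flags: Set[Cell], rows: int, cols: int) -> bool:
    """Reveal one cell, cascading through zero cells. Returns False on a mine."""
    if cell in flags or cell in revealed:
        return True  # flagged or already open: nothing happens
    if cell in mines:
        revealed.add(cell)
        return False
    stack = [cell]
    while stack:
        cur = stack.pop()
        if cur in revealed or cur in flags:
            continue
        revealed.add(cur)
        # A zero cell has no mined neighbors, so all of them are safe to open.
        if adjacent_mines(cur, mines, rows, cols) == 0:
            stack.extend(n for n in neighbors(cur, rows, cols) if n not in revealed)
    return True


def chord(cell: Cell, mines: Set[Cell], revealed: Set[Cell],
          flags: Set[Cell], rows: int, cols: int) -> bool:
    """Chord on a revealed number: if the adjacent flag count matches it,
    open every other hidden neighbor. Returns False if a mine is hit."""
    if cell not in revealed:
        return True
    around = list(neighbors(cell, rows, cols))
    number = adjacent_mines(cell, mines, rows, cols)
    flagged = sum(1 for n in around if n in flags)
    if number == 0 or flagged != number:
        return True  # flag count doesn't match the number: the chord does nothing
    safe = True
    for n in around:
        if n not in flags and n not in revealed:
            # Call reveal() first so every neighbor is opened even after a hit.
            safe = reveal(n, mines, revealed, flags, rows, cols) and safe
    return safe
```

For example, on a 3x3 board with a single correctly flagged mine in a corner, revealing the center shows a 1, and chording on it opens the remaining seven cells in one move instead of seven separate clicks.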

I’m also a bit perplexed by the inclusion of a “Custom” difficulty button that doesn’t seem to do anything. It’s like the model realized that custom board sizes were a thing in Minesweeper but couldn’t figure out how to implement this relatively basic feature.

The game works fine on mobile, but marking a square with a flag requires a tricky long-press on a tiny square that also triggers text-selection handles that are hard to clear. It’s not an ideal mobile interface.

Presentation

This was the only working version we tested that didn’t include sound effects. That’s fair, since the original Windows Minesweeper didn’t include sound, but it’s still a notable omission relative to the others, since the prompt specifically asked for it.

The all-black “smiley face” button to start a game is a little off-putting, too, compared to the bright yellow version that’s familiar to both Minesweeper players and emoji users worldwide. And while that smiley face does start a new game when clicked, there’s also a superfluous “New Game” button taking up space for some reason.

“Fun” feature

The closest thing I found to a “fun” new feature here was the game adding a rainbow background pattern to the grid when I completed a game. While that does add a bit of whimsy to a successful game, I expected a little more.

Coding experience

Benj notes that he was pleasantly surprised by how well Mistral Vibe performed as an open-weight model despite lacking the big-money backing of the other contenders. It was relatively slow, however (third fastest out of four), and the result wasn’t great. Ultimately, its performance so far suggests that with more time and more training, a very capable AI coding agent may eventually emerge.

Overall rating: 4/10

This version got a lot of the basics right but left out chording and didn’t perform well on the small presentational and “fun” touches.

Agent 2: OpenAI Codex

Play it yourself

I can’t tell you how much I appreciate those chording instructions at the bottom.


Credit: Benj Edwards

Implementation

Not only did this agent include the crucial “chording” feature, but it also included on-screen instructions for using it on both PC and mobile browsers. I was further impressed by the option to cycle through “?” marks when marking squares with flags, an esoteric feature I suspect even most human Minesweeper cloners might miss.
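The flag-then-question-mark cycle amounts to a tiny state machine. Below is a minimal Python sketch under stated assumptions: the `Mark` enum, the `cycle_mark` function, and the `question_marks_enabled` toggle are all hypothetical names for illustration, not code from the Codex clone.

```python
from enum import Enum


class Mark(Enum):
    """Hypothetical marking states for a hidden tile."""
    NONE = "none"
    FLAG = "flag"
    QUESTION = "question"


# Each right-click (or long-press) advances one step: blank -> flag -> "?" -> blank.
_NEXT_MARK = {Mark.NONE: Mark.FLAG, Mark.FLAG: Mark.QUESTION, Mark.QUESTION: Mark.NONE}


def cycle_mark(current: Mark, question_marks_enabled: bool = True) -> Mark:
    """Return the next mark for a hidden tile.

    `question_marks_enabled` is a hypothetical setting; when it is off,
    the cycle skips the "?" state and simply toggles the flag.
    """
    if not question_marks_enabled and current is Mark.FLAG:
        return Mark.NONE
    return _NEXT_MARK[current]
```

Keeping the transitions in a lookup table means a clone could drop or reorder states without touching its click or long-press handler.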

On mobile, the option to hold your finger down on a square to mark a flag is a nice touch that makes this the most enjoyable handheld version we tested.

Presentation

The old-school emoticon smiley-face button is pretty endearing, especially when you blow up and get a red-tinted “X(”. I was less impressed by the playfield “graphics,” which use a simple “*” for revealed mines and an ugly red “F” for flagged tiles.

The beeps-and-boops sound effects reminded me of my first old-school, pre-Sound-Blaster PC from the late ’80s. That’s generally a good thing, but I still appreciated the game giving me the option to turn them off.

“Fun” feature

The “Surprise: Lucky Sweep Bonus” listed in the corner of the UI explains that clicking the button gives you a free safe tile when available. This can be pretty useful in situations where you’d otherwise be forced to guess between two tiles that are equally likely to be mines.

Overall, though, I found it a bit odd that the game gives you this bonus only after you uncover a large, cascading field of safe tiles with a single click. It mostly functions as a “win more” button rather than a feature that offers a good balance of risk versus reward.

Coding experience

OpenAI Codex has a nice terminal interface with features similar to Claude Code (local commands, permission management, and interesting animations showing progress), and it’s fairly pleasant to use (OpenAI also offers Codex through a web interface, but we did not use that for this evaluation). Codex took roughly twice as long as Claude Code to code a functional game, which might contribute to the strong result here.

Overall: 9/10

The implementation of chording and charming presentation touches push this to the top of the list. We just wish the “fun” feature was a bit more fun.

Agent 3: Anthropic Claude Code

Play it yourself

The Power Mode powers on display here make Expert boards pretty trivial to complete.


Credit: Benj Edwards

Implementation

Once again, we get a version that gets all the gameplay basics right but is missing the crucial chording feature that makes truly efficient Minesweeper play possible. This is like playing Super Mario Bros. without the run button or Ocarina of Time without Z-targeting. In a word: unacceptable.

The “flag mode” toggle on the mobile version of this game is perfectly functional, but it’s a little clunky to use. It also visually cuts off part of the board at the larger game sizes.

Presentation

Presentation-wise, this is probably the most polished version we tested. From the use of cute emojis for the “face” button to nice-looking bomb and flag graphics and simple but effective sound effects, this looks more professional than the other versions we tested.

That said, there are some weird presentation issues. The “beginner” grid has odd gaps between columns. The borders of each square and the flag graphics can also become oddly grayed out at points, especially when using Power Mode (see below).

“Fun” feature

The prominent “Power Mode” button in the lower-right corner offers some pretty fun power-ups that alter the core Minesweeper formula in interesting ways. The actual powers are a bit hit-and-miss, though.

I especially liked the “Shield” power, which protects you from an errant guess, and the “Blast” power, which seems to guarantee a large cascade of revealed tiles wherever you click. The “X-Ray” power, which reveals every bomb for a few seconds, could be easily exploited by a quick player (or a crafty screenshot). And the “Freeze” power is rather dull, just stopping the clock for a few seconds and amounting to a bit of extra time.

Overall, the game hands out these new powers like candy, which makes even an Expert-level board relatively trivial with Power Mode active. Simply choosing “Power Mode” also seems to mark a few safe squares right after you start a game, making things even easier. While these powers can be “fun,” they also don’t feel especially well-balanced.

Coding experience

Of the four tested models, Claude Code with Opus 4.5 offered the most pleasant terminal interface and the fastest overall coding experience (Claude Code can also use Sonnet 4.5, which is even faster, but the results aren’t quite as full-featured in our experience). While we didn’t precisely time each model, Opus 4.5 produced a working Minesweeper in under five minutes. Codex took at least twice as long, if not longer, while Mistral took roughly three or four times as long as Claude Code. Gemini, meanwhile, took hours of tinkering to produce two non-working results.

Overall: 7/10

The lack of chording is a big omission, but the strong presentation and Power Mode options give this effort a respectable final score.

Agent 4: Google Gemini CLI

Play it yourself

…where’s the game?


Credit: Benj Edwards

Implementation, presentation, etc.

Gemini CLI did give us a few gray boxes you can click, but the playfields are missing. While interactive troubleshooting with the agent might have fixed the issue, as a “one-shot” test, the model completely failed.

Coding experience

Of the four coding agents we tested, Gemini CLI gave Benj the most trouble. After developing a plan, it was very, very slow at generating any usable code (about an hour per attempt). The model seemed to get hung up trying to manually create WAV-file sound effects and insisted on pulling in React and a few other overcomplicated external dependencies. The result simply did not work.

Benj actually bent the rules and gave Gemini a second chance, specifying that the game should use HTML5. When the model started writing code again, it also got hung up trying to make sound effects. Benj suggested using the Web Audio API (which the other AI coding agents seemed to be able to use), but the result didn’t work, as you can see at the link above.

Unlike the other models tested, Gemini CLI apparently uses a hybrid system of three different LLMs for different tasks (Gemini 2.5 Flash Lite, 2.5 Flash, and 2.5 Pro were available at the tier of the Google account Benj paid for). When you’ve finished your coding session and quit the CLI, it gives you a readout of which model did what.

In this case, it didn’t matter because the results didn’t work. It’s worth noting that Gemini 3 coding models are available on other subscription plans that were not tested here. For that reason, this portion of the test could be considered “incomplete” for Gemini CLI.

Overall: 0/10 (Incomplete)

Final verdict

OpenAI Codex wins this one on points, in no small part because it was the only model to include chording as a gameplay option. Claude Code also distinguished itself with strong presentational flourishes and quick generation time. Mistral Vibe was a significant step down, and Gemini CLI based on Gemini 2.5 was a complete failure on our one-shot test.

While experienced coders can certainly get better results via an interactive, back-and-forth code-editing conversation with an agent, these results show how capable some of these models can be, even with a very short prompt on a relatively straightforward task. Still, we feel that our overall experience with coding agents on other projects (more on that in a future article) generally reinforces the idea that they currently work best as interactive tools that augment human skill rather than replace it.

Kyle Orland has been the Senior Gaming Editor at Ars Technica since 2012, writing primarily about the business, tech, and culture behind video games. He has journalism and computer science degrees from the University of Maryland. He once wrote an entire book about Minesweeper.
