People have been making big promises with AI lately, building chatbots, image generators, and coding assistants to show its potential. You must have seen examples of AIs creating entire websites with just a few prompts. As a web developer, I am curious if this is hype or reality. Can AIs really match human-crafted experiences?

I challenged five leading AI tools to build a production-quality app, tasking them and myself with creating a movie streaming platform from scratch. The goal? See how AI-assisted development truly stacks up against hand-crafted code.

Let's dive into this experiment and see who came out on top.

TL;DR

Hand-Crafted UI

Pros: Produces a fully functional, production-quality app with nuanced control and seamless tech integration
Cons: Time-intensive, requiring extensive knowledge of multiple technologies

ChatGPT 4o

Pros: Effectively breaks down an app into components and generates modular code
Cons: Struggles with modern web technologies, which leads to non-compilable code and slows down iterations

Anthropic Claude 3.5 Sonnet

Pros: Efficiently translates wireframes into functional apps, closely matching human-developed solutions
Cons: Cannot render complex multi-component apps in its environment, slowing down iterations

Cursor

Pros: Runs ChatGPT 4o and Claude 3.5 Sonnet inside a VS Code-based editor, responding to feedback instantly and allowing faster iteration
Cons: Aside from speeding up iterations, the same cons from the underlying models apply

Vercel v0

Pros: Excels in a specific tech stack, generating immediately functional code you can iterate on
Cons: Creates a monolithic file that needs to be broken up for better code organization

Cognition's Devin

Pros: Adapts to my preferred tech stack choices and has a full development environment
Cons: Requires significant micromanagement, struggles with focus and clean code production

Summary

AI-generated code provided a valuable starting point, but fell short in three key areas:

Meeting visual specifications, because AIs don’t fully understand front-end technologies
Producing maintainable code, often feeling like hacks rather than following best practices
Providing a robust dev environment, with limited to no visual feedback and inadequate project setup. In each case, I relied on Code Shaper for a production-quality dev setup.

The first two issues could be addressed with more detailed prompts, but it is frustrating to work with slow feedback loops. There are opportunities to improve here, potentially using techniques like Retrieval Augmented Generation (RAG).

Cursor, Claude and v0 stood out among the tested AI assistants, offering streamlined workflows and higher quality output. While they provided a valuable head start, 60-70% of the total effort still went into polishing the UI and addressing nuances – tasks that require human expertise.

The final verdict

Although AI assistants can't replace human creativity and craftsmanship, they're rapidly evolving, and we should expect front-end development to transform just as fast. Developers leveraging AIs will likely outpace those who don't, as they'll be able to focus more on what humans do best – creative problem-solving and innovation.

My takeaway: Embrace AI as a powerful ally in your development toolkit, but remember that human insight and creativity remain irreplaceable in crafting exceptional user experiences.

The dream app: Movie Magic

To explore the capabilities of AI assistants, I dreamt up a production-quality movie streaming app called Movie Magic. Like your favorite streaming service, this app should display a list of available movies, allowing users to search, sort, and filter these movies. Users should be able to add movies to their watchlist to watch later. Additionally, the app should support both light and dark modes so the interface is pleasant at any time during the day or night.

Below is a hand-sketched wireframe I used to serve as a guide:

Let’s have a human build it first!

To start off, I took a crack at the requirements above and implemented the Movie Magic app by hand. Of course, I relied on all my skills, creativity, and experience. I also chose my go-to stack for building production-quality apps: TypeScript, Next.js (App Router), Tailwind CSS, shadcn/ui, and Radix UI.

And I didn’t start from scratch. Over the years, I’ve curated a set of development tools that streamline my workflow, such as Storybook for UI components, a strict ESLint configuration for consistent coding standards, and Turborepo for efficient build management. I've encapsulated this entire toolchain in Code Shaper plugins, which allow me to kickstart new projects with ease. This setup enabled me to focus immediately on building the Movie Magic app without wasting time on configuring a toolchain.

The component breakdown

I began by breaking down the app's interface into three primary components:

Application Header: Contains the logo, app name, navigation links, light/dark mode toggle, and the user menu
Toolbar: Includes the Filter & Sort button and a badge displaying the number of movies in the list
Movie List: Displays each movie's rank, title, genres, rating, year, and runtime

With a clear component breakdown, I started building. My first task was to lay out the page structure with the three main sections listed above. I then proceeded to develop each component, starting with broad strokes and refining the details over time.
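
To make the requirements concrete, here is a minimal sketch of the data behind the Movie List and of the search, filter, and sort behavior described earlier. It is not the code from the Movie Magic repository: the `Movie` fields simply mirror the columns listed above, while `MovieQuery`, `queryMovies`, the units, and the sample titles are all assumptions made for illustration.

```typescript
// Hypothetical data model: field names follow the Movie List columns.
interface Movie {
  rank: number;
  title: string;
  genres: string[];
  rating: number;  // assumed to be a 10-point scale
  year: number;
  runtime: number; // assumed to be minutes
}

type SortKey = "rank" | "title" | "rating" | "year" | "runtime";

// Hypothetical query shape for the Filter & Sort panel.
interface MovieQuery {
  search?: string;     // case-insensitive match on the title
  genres?: string[];   // keep movies that have at least one of these genres
  sortBy?: SortKey;
  descending?: boolean;
}

function queryMovies(movies: Movie[], query: MovieQuery = {}): Movie[] {
  const { search = "", genres = [], sortBy = "rank", descending = false } = query;
  const needle = search.trim().toLowerCase();

  // filter() returns a new array, so sorting it below never mutates the input.
  const filtered = movies.filter(
    (m) =>
      m.title.toLowerCase().includes(needle) &&
      (genres.length === 0 || m.genres.some((g) => genres.includes(g)))
  );

  const direction = descending ? -1 : 1;
  return filtered.sort((a, b) => {
    const x = a[sortBy];
    const y = b[sortBy];
    const cmp =
      typeof x === "string" && typeof y === "string"
        ? x.localeCompare(y)
        : (x as number) - (y as number);
    return cmp * direction;
  });
}

// Made-up sample data, not real movies.
const sample: Movie[] = [
  { rank: 1, title: "Alpha", genres: ["Drama"], rating: 9.1, year: 1994, runtime: 142 },
  { rank: 2, title: "Bravo", genres: ["Crime", "Drama"], rating: 9.0, year: 1972, runtime: 175 },
  { rank: 3, title: "Charlie", genres: ["Action"], rating: 8.8, year: 2008, runtime: 152 },
];

const dramasByYear = queryMovies(sample, { genres: ["Drama"], sortBy: "year" });
console.log(dramasByYear.map((m) => m.title)); // [ 'Bravo', 'Alpha' ]
```

Keeping the query logic in a pure function like this makes it easy to test independently of the UI, whichever tool ends up generating the components.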

Final implementation

Below is my final implementation. It took a considerable amount of work – trust me! Many details are not specified in the wireframe, like how the app would resize at different screen widths and the behavior of the filters. You can try out the live app here. Note that this version goes beyond just the movie list screen. It includes a landing page, robust authentication mechanisms, and everything you might expect from a production-quality app.

Now let’s try with ChatGPT 4o

I started by giving ChatGPT the Movie Magic wireframe and asked it to break down the app into components. This conversation lasted about two hours, as I was explaining how the app functioned, reviewing ChatGPT’s understanding, and providing clarifications and feedback. Once I was satisfied with the requirements, I asked ChatGPT to generate code using the same tech stack that I used for my manual implementation. You can review the detailed conversation in the Movie Magic repository, but here's a high-level summary:

Prompts to ChatGPT

You are a front-end developer with 20 years of experience in building web applications. Give me a detailed description of what's in this wireframe. (ChatGPT's breakdown was not very accurate, so I had to nudge it in the right direction…)
Break down the page into three sections: (a) Header, (b) Toolbar (containing the Filter & Sort button + the Total Movies badge), (c) Movie List…
The movie image should have an aspect ratio of 2/3. Heights of the various sections should be as follows: …
The tech stack is TypeScript, Next.js (use App Router), Tailwind CSS, shadcn/ui, Radix UI. Ask me any clarifying questions about this stack. (ChatGPT asked six clarifying questions, e.g. "Is there a preference for SSR or SSG for specific parts of the application?")
Example answer: "Do not use SSR or SSG. Instead, use React Server Components (RSC) and React Client Components."

This process went on until I was fully satisfied with the requirements. At that point, I asked ChatGPT to generate the code. It generated decent code, along with setup instructions. Unfortunately, the setup instructions were very haphazard, as ChatGPT had copied them from the Next.js, Tailwind CSS, and shadcn/ui sites without integrating the instructions into a meaningful whole. I discarded them completely and used Code Shaper to generate my starter app.

Generated app

The generated app was reasonably modular, with the movies page at /src/app/movies/page.tsx and the three sections in three separate components. However, the code did not compile! Because ChatGPT makes no attempt to compile the code it generates, you have to step in to resolve a lot of minor issues.

Here’s the final screen that I generated after iterating with ChatGPT:

As you can see, it lacks visual design polish. Despite specific instructions, ChatGPT did not use any shadcn/ui components to make the UI clean and consistent. Light/dark mode was not implemented and movie entries didn't include the images I provided. Placeholder images were square and I had to adjust the code to render images in 2/3 aspect ratio.
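
For reference, the 2/3 requirement is simple arithmetic: the poster's width is two thirds of its height. The sketch below shows that calculation plus one way to express it in Tailwind. The 96px height (`h-24`) is an assumed value chosen for illustration, not a number from my spec, and `posterWidth` and `posterClasses` are hypothetical names.

```typescript
// Width of a poster whose aspect ratio (width / height) is 2/3.
function posterWidth(heightPx: number, ratio: number = 2 / 3): number {
  return Math.round(heightPx * ratio);
}

// Tailwind classes for the <img>: `aspect-[2/3]` sets the ratio, `h-24` fixes
// the height (assumed value), and `object-cover` crops instead of stretching.
const posterClasses = "aspect-[2/3] h-24 object-cover";

console.log(posterWidth(96)); // 64
```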

It took me an additional 4 hours of iterations to get this screen close to my manual implementation. It's important to note, however, that this achievement primarily addressed the visual aspects of the application. The functional aspects, such as fetching data from a server and implementing working filters, were not developed. I chose to stop here, as I got the gist of working with ChatGPT to develop a front-end.

Can better prompting enhance ChatGPT’s performance?

While it's possible to refine ChatGPT's output through additional feedback and guidance, the key question is efficiency. Would this iterative process be faster than the 4 hours of tinkering I did with ChatGPT’s output to achieve the desired result? Given ChatGPT's slow iteration cycles and the cumbersome process of syncing code with my development environment, I concluded it wouldn't be worth the time investment.

If you know of more efficient techniques for prompting ChatGPT, I'd love to hear from you in the comments below.

ChatGPT – Overall impressions

The good

Component Breakdown: ChatGPT’s ability to analyze a wireframe and decompose it into distinct sections was impressive. Although the breakdown wasn’t flawless, achieving something like this would have been nearly impossible just a few years ago.
Understanding Feedback: It was noteworthy how well ChatGPT incorporated my feedback and refined its understanding of the requirements, demonstrating an impressive level of adaptability.
Modular Code: The code generated was modular, effectively mirroring the breakdown in the requirements.

The not so good

Cumbersome Workflow: ChatGPT operates in its own environment, generating code that it believes is correct without attempting to compile or run it to ensure it meets the requirements. Iterating on the code was cumbersome; I had to copy the code into my IDE, fix compilation errors, and then run it to see the results. Each time an issue arose, I had to provide feedback to ChatGPT and repeat the entire cycle. ChatGPT would then regenerate the entire codebase (which was a slow process), requiring me to manually synchronize the output with my existing code. This workflow proved to be highly inefficient.
Haphazard Setup Instructions: ChatGPT copied setup instructions from Next.js, Tailwind CSS, and shadcn/ui without integrating them into a cohesive whole, which could easily confuse a developer unfamiliar with the target stack. I opted to disregard ChatGPT's setup instructions and instead used Code Shaper to generate my starter app.
Misunderstanding shadcn/ui: ChatGPT didn’t grasp shadcn/ui and failed to use any of its components in the app's construction.
Incomplete Light/Dark Mode Implementation: ChatGPT provided a placeholder button for light/dark mode but did not implement the actual functionality.
Missing User Menu: The user avatar and its dropdown menu were entirely overlooked.
Lack of Understanding of Next.js and React Server Components: Components involving hooks and event handlers were mistakenly created as server components instead of client components. ChatGPT failed to add the 'use client' directive where necessary.
Directory Structure Issues: Despite explicit instructions, ChatGPT placed application pages under the /pages directory instead of the /app directory, indicating that it doesn’t understand the App Router.
Long Import Paths: ChatGPT generated long relative import paths. I had to instruct it to use absolute imports for simplicity.
Component Structure: ChatGPT initially generated all components as arrow functions. After receiving feedback, it converted them to function declarations.
Movie Images: ChatGPT did not include movie images in the movie list. When I explicitly asked to add them, it did so, but with the incorrect aspect ratio.
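
The server/client mix-up above follows a simple rule of thumb: in the App Router, a file that relies on state hooks or event handlers needs a 'use client' directive at the very top. Here is a rough sketch of that rule as a text check. It is a heuristic I'm using to illustrate the point, not a real linter and not something any of the tools ran; the function names and regular expressions are my own assumptions, and it will misjudge edge cases (for example, hooks that are allowed in server components).

```typescript
// Rough heuristic only: looks for hook calls and JSX event handlers in source text.
const HOOK_CALL = /\buse[A-Z]\w*\s*\(/;          // useState(, useEffect(, ...
const EVENT_HANDLER = /\bon[A-Z]\w*\s*=\s*\{/;   // onClick={, onChange={, ...
const USE_CLIENT = /^\s*(['"])use client\1\s*;?/; // directive at the top of the file

function needsUseClient(source: string): boolean {
  return HOOK_CALL.test(source) || EVENT_HANDLER.test(source);
}

function isMissingUseClient(source: string): boolean {
  return needsUseClient(source) && !USE_CLIENT.test(source);
}

// A component with no interactivity can stay a server component.
const serverComponent = "export function MovieList() { return <ul />; }";

// A component with state and a click handler, generated without the directive.
const brokenToggle = [
  "import { useState } from 'react';",
  "export function ModeToggle() {",
  "  const [dark, setDark] = useState(true);",
  "  return <button onClick={() => setDark(!dark)}>Toggle</button>;",
  "}",
].join("\n");

// The same component with the directive added as the first line.
const fixedToggle = "'use client';\n" + brokenToggle;

console.log(isMissingUseClient(serverComponent)); // false
console.log(isMissingUseClient(brokenToggle));    // true
console.log(isMissingUseClient(fixedToggle));     // false
```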

Exploring with Anthropic’s Claude 3.5 Sonnet

Just like with ChatGPT 4o, I provided Claude with the requirements and engaged in a series of prompts to refine the generated code. While Claude approached the task differently, I found its overall development flow much smoother. A key factor is Claude’s Artifacts feature, which places generated components in a separate window where you can iterate on their content directly. This streamlines the process of refining ideas and code. Here's a screenshot from my session with Claude showing an artifact containing a React component.

You can review my detailed conversation with Claude in the Movie Magic repository, but here's a high-level summary:

Prompts to Claude

Give me a detailed description of what's in this wireframe
Modify your breakdown to follow this structure: Header, Toolbar, Movie List
Change "Poster placeholder (square)" to "Poster placeholder (aspect ratio of 2/3)"
Add the following height specifications for each section ...
Add the following specs for column widths ...
Change the name of the "Number" column to "Rank"
The gap between movie list columns should be 12px. The rank column should be right justified.
The default mode for the app should be dark mode
Using the above breakdown as requirements, implement the Movie Magic app using this tech stack: TypeScript, Next.js (use App Router), Tailwind CSS, shadcn/ui

Generated app

The generated app was impressively well-structured and modular, demonstrating Claude's strong grasp of Next.js, Tailwind CSS and shadcn/ui. While the code requires minor cleanup, it was fairly close to how a human would code.

Here’s the final screen that I generated after iterating with Claude. As you can see, the visual design is remarkably close to my manual implementation, just missing the mark with responsive behaviors.

Can better prompting enhance Claude’s performance?

As we all know, LLMs perform better with improved prompting. I decided to see how Claude would do with high-fidelity visual designs as prompts vs. wireframes. I supplied two visual designs – one for mobile form factor and another for desktop form factor. Here’s the resulting implementation for the two form factors:

As you can see, Claude was able to implement the responsive behavior; however, its implementation was not up to the mark. Instead of using Tailwind's responsive variants, it used a React hook plus separate components for the mobile and desktop form factors. This is not a good practice unless the layouts are completely different.

export function MovieList({ movies }: MovieListProps) {
  const isMobile = useMediaQuery({ query: '(max-width: 639px)' });

  return (
    <div className="bg-black text-white">
      {isMobile ? (
        <MobileMovieList movies={movies} />
      ) : (
        <DesktopMovieList movies={movies} />
      )}
    </div>
  );
}
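
For comparison, here is a minimal sketch of the variant-based alternative: one set of columns whose Tailwind classes hide or show each cell at the `sm` breakpoint, so no hook and no duplicated component are needed. The column list, and which columns are hidden on mobile, are assumptions for illustration. The `visibleColumns` helper is hypothetical too; it only mirrors what the CSS would do so the behavior can be checked without a browser.

```typescript
// Tailwind's default `sm` breakpoint is min-width: 640px, the mirror image of
// the '(max-width: 639px)' query used in the hook above.
const SM_BREAKPOINT_PX = 640;

interface Column {
  key: string;
  // `hidden sm:block` hides the cell below the sm breakpoint and shows it above.
  className: string;
}

// Hypothetical column set for a single MovieList component.
const columns: Column[] = [
  { key: "rank", className: "hidden sm:block text-right" },
  { key: "poster", className: "block" },
  { key: "title", className: "block truncate" },
  { key: "genres", className: "block" },
  { key: "rating", className: "hidden sm:block" },
  { key: "year", className: "hidden sm:block" },
  { key: "runtime", className: "hidden sm:block" },
];

// Simulates the CSS outcome for a given viewport width (illustration only).
function visibleColumns(viewportWidth: number): string[] {
  return columns
    .filter((column) => {
      const classes = column.className.split(/\s+/);
      const hiddenByDefault = classes.includes("hidden");
      const shownFromSm = classes.includes("sm:block");
      return !hiddenByDefault || (shownFromSm && viewportWidth >= SM_BREAKPOINT_PX);
    })
    .map((column) => column.key);
}

console.log(visibleColumns(375)); // [ 'poster', 'title', 'genres' ]
console.log(visibleColumns(1024).length); // 7
```

In a real component, each `className` would go straight onto the corresponding cell, and the browser would do the switching through CSS media queries.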

Moreover, the implementation does not meet the visual design specs. The toolbar was 72px tall instead of 56px. The movie items varied in height (124-132px) instead of being 112px tall. When I prompted Claude to fix these heights, it simply added h-14 and h-28 to these elements without ensuring that the constraints were actually met. This indicates that Claude understood the instructions but lacks the CSS mastery to produce accurate results.
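
As a sanity check on those class names, here is a small sketch of how Tailwind height utilities translate to pixels. It assumes Tailwind's default spacing scale (one unit is 0.25rem) and a 16px root font size; `heightUtilityToPx` is a hypothetical helper, not part of Tailwind.

```typescript
// Tailwind's default spacing scale: one unit = 0.25rem.
const REM_PER_UNIT = 0.25;

// Pixel value of a height utility such as "h-14".
// Assumes the default scale and a root font size of 16px.
function heightUtilityToPx(utility: string, rootFontSizePx: number = 16): number {
  const match = /^h-(\d+(?:\.\d+)?)$/.exec(utility);
  if (!match) {
    throw new Error(`Unsupported utility: ${utility}`);
  }
  return Number(match[1]) * REM_PER_UNIT * rootFontSizePx;
}

console.log(heightUtilityToPx("h-14")); // 56  (toolbar spec)
console.log(heightUtilityToPx("h-28")); // 112 (movie item spec)
```

So h-14 and h-28 do correspond to the 56px and 112px in the spec. The values were right; what was missing was a check that the rendered elements actually ended up at those heights.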

If you know of more efficient techniques for prompting Claude, I'd love to hear from you in the comments below.

Claude – Overall impressions

The good

Comprehensive Understanding: Claude exhibited an impressive ability to interpret and break down the hand-sketched wireframe into detailed components.
Adaptability: Claude quickly adapted to changing requirements and specifications throughout the development process.
Technical Proficiency: Claude demonstrated a strong grasp of my preferred tech stack, producing well-structured code that closely resembled human-written quality.
Modular Code: The generated code was well-organized and modular, enhancing readability and maintainability.
Artifacts Feature: Claude’s ability to generate and iterate on standalone content proved to be a powerful feature, streamlining my development workflow.

The not so good

Rendering Limitations: Claude was not able to render the generated Movie Magic application due to current limitations. This capability would have allowed quicker iterations to arrive at the desired outcome faster.
Minor Inaccuracies: While Claude's implementation was very close to the desired outcome, it still required some manual adjustments to perfectly match the intended design.

Reducing friction with Cursor

Cursor is noteworthy because it removes a lot of friction compared to working with ChatGPT and Claude in their own environments. With Cursor, you get instant responses to your feedback and faster iterations because the models are tightly integrated into its IDE.

You can review my detailed conversation with Cursor in the Movie Magic repository, but here's a high-level summary:

Prompts to Cursor

Give me a detailed description of what's in this image
Modify your breakdown to follow this structure: Header, Toolbar, Movie List
Add more details from the image, e.g. heights of the main sections (Cursor added the heights as percentages)
Instead of percentages, use pixel units (Cursor was able to give approximate pixel values that I adjusted later)
Update the MoviesPage component in the current file to implement these requirements
Change the icon in the Header to the Film icon in lucide-react
Change the Filter & Sort button icon to ListFilter icon in lucide-react
The nav links in the Header are centered, move them to the left as they are in the original image

Generated app

The session above took less than 30 minutes, but the generated app was already more polished than the output of any other AI assistant I experimented with.

The main advantage of Cursor is that it allows me to iterate really fast. I was even able to make minor tweaks to the code without disrupting the code generation flow. I could have gone even further to perfect the output, but chose to stop here. I understood the value prop of Cursor very quickly.

Vercel v0 is up next

Just like with ChatGPT and Claude, I provided v0 with the requirements and engaged in a series of prompts to refine the generated code. A key advantage that v0 has over other AI assistants is that it’s tuned to a single stack: React/Next.js, Tailwind CSS and shadcn/ui. That’s great news if this happens to be your preferred stack. If not, Vercel plans to expand to other frameworks too. Another significant advantage of v0 is its capability to compile and run the generated code in its own environment – this makes iterations really fast.

Let’s get a feel for the development workflow in v0. You can review my detailed conversation with v0 in the Movie Magic repository, but here's a high-level summary:

Prompts to v0

Build a movie streaming application called Movie Magic that allows the user to browse and filter movies so they can add them to their watchlist. Use the attached wireframe as a guide.
Change the filter icon to ListFilter from Lucide icons
Allow toggle button to switch between light and dark mode (this did not work)
Convert the mode toggle button into a dropdown menu with three items: Light, Dark and System
Convert the Avatar into a Dropdown menu. First menu item is a menu label with user's name & email, followed by a divider, and lastly a "Sign Out" button.
Convert the Filter & Sort button into a Sheet that pulls out a drawer from the left side when clicked
For mobile form factor, collapse the icon, application title and navigation links into a drawer that is triggered by a hamburger menu (worked, but did it for all form factors!)
The collapsed hamburger menu should only appear in mobile mode. In tablet and desktop mode the full application header should appear (did not work well)
Theme toggle and Avatar dropdown should always be in the right corner (did not work well, fixed manually later)
In mobile mode, only show the movie image, title and genres. Hide the rank, rating, year and runtime
Improve the styling of the Movie table based on the following specifications…
Make one of the movie titles really long
The movie title should not wrap when it is too long to fit in the column
Replace table with <div>s using flexbox. Make sure to replicate all column specs.

Generated app

Since v0 code compiles and runs in its own environment, it's already a great start. I only needed to do some minor cleanup to get it running in my dev environment.

Here’s the final screen that I generated after iterating with v0:

As you can see, it lacks visual design polish. However, given that v0 deeply understands my preferred stack, most of the screen was already functional. For example, clicking the “Filter & Sort” button opened the side drawer. On the flip side, v0 failed to follow some prompts (see above), necessitating manual work.

v0 – Overall impressions

The good

Deep Understanding of a Single Tech Stack: v0 excels in targeting React, Tailwind CSS, and shadcn/ui, particularly in its initial implementation. It has a deep understanding of this tech stack and effectively utilizes shadcn/ui components to handle complex use cases. For instance, it can implement features like opening a side drawer when a button is clicked using the <Sheet> component.
Ability to Build and Run Generated Code: One of v0's most significant advantages is its capability to not only generate code, but also to compile and run it. This assures that the code is functional right out of the box. If the results are not well crafted, you can immediately provide feedback and v0 will promptly try to correct the issue. This eliminates the need to copy-paste the code into your repository just to test it, making iteration on your requirements incredibly fast.
Component Breakdown: v0 demonstrates impressive skill in analyzing a wireframe and decomposing it into the appropriate layout. It also does a commendable job of ensuring the layout is responsive. The main drawback here is that the entire layout is created in a single file (more on this later).
Targeted Prompts: v0 allows you to focus your prompts on specific elements within the generated UI, offering a high level of precision in your interactions.
Ability to Edit Code Directly Within v0: If v0 doesn't fully follow your instructions, you can take control and modify the code as you see fit. You'll see the results instantly, allowing you to quickly determine if the changes had the desired effect.
Good Setup Instructions: Once you've reached a point where more prompts are no longer yielding better results, it's time to integrate the generated code into your repository. v0 provides clear instructions for this process, including a CLI that imports the code directly into your repository. In my case, I chose to bypass v0's setup instructions, relying instead on my Code Shaper templates to configure my preferred tech stack and toolchain. I simply copied and pasted the component.jsx file generated by v0.

The not so good

Monolithic Code: v0 generates the entire UI in a single file called component.jsx. While this approach may work for simpler interfaces, it becomes unmanageable for more complex ones. Even for the relatively simple Movie Magic screen, I had to manually decompose the generated code into three main components (<Header>, <Toolbar>, and <MovieList>) and further into sub-components.
Does Not Run in My IDE: Although iterating within the v0 environment is quick, once the code is transferred to your IDE, there's no way to continue using v0. While this isn't an issue during the initial stages due to v0's strengths, it would be ideal if v0 could be integrated directly into my IDE, perhaps as a VS Code plugin.
Understanding Feedback: While v0 generally understands prompts well, there were instances where it misinterpreted them. For example, when I requested that navigation links collapse under a hamburger menu for mobile devices only, it applied this change to all form factors.
Flaky Light/Dark Mode Implementation: Given v0's strong understanding of Tailwind CSS and shadcn/ui, I expected a flawless implementation of light/dark mode. Unfortunately, this was not the case, and I had to make adjustments manually. However, this issue is relatively minor in the grand scheme of things.
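
The logic behind a Light/Dark/System menu is small, which is why the flaky result surprised me. Here is a minimal sketch of it, assuming Tailwind's class-based dark mode, where a `dark` class on the root element turns on the `dark:` variants. This is not v0's generated code, and the type and function names are hypothetical.

```typescript
type ThemePreference = "light" | "dark" | "system";
type EffectiveTheme = "light" | "dark";

// Resolves the user's menu choice to the theme that should actually render.
// In a browser, `systemPrefersDark` would come from
// window.matchMedia("(prefers-color-scheme: dark)").matches.
function resolveTheme(
  preference: ThemePreference,
  systemPrefersDark: boolean
): EffectiveTheme {
  if (preference === "system") {
    return systemPrefersDark ? "dark" : "light";
  }
  return preference;
}

// Class to put on the root element under class-based dark mode.
function rootClassFor(theme: EffectiveTheme): string {
  return theme === "dark" ? "dark" : "";
}

console.log(resolveTheme("system", true));            // dark
console.log(resolveTheme("light", true));             // light
console.log(rootClassFor(resolveTheme("dark", false))); // dark
```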

And finally, Devin, by Cognition

Just like with the other AI assistants, I provided Devin with the requirements and engaged in a series of prompts to refine the generated code.

Devin takes a very different approach to code generation, functioning more like an AI apprentice than an expert assistant. It has knowledge of several programming languages and technologies but doesn't claim expertise in all areas. Devin's interface is tuned to a learning mindset, allowing you to provide documents and references to expand its knowledge base. Once it has learned how a technology works, you can prompt it to use it in your tasks.

Another important differentiation is that Devin has a full-blown development environment that includes:

Command Shell: For running commands like npm install and npm run...
Browser: For interacting with websites to read documentation and run the application
Editor: For editing files
Planner: For breaking down tasks and executing them in order

Additionally, Devin provides a full VS Code editor for manual code modifications.

Let’s get a feel for the development workflow in Devin. You can review the detailed conversation in the Movie Magic repository, but here's a high-level summary:

Prompts to Devin

Let's start by creating a new Next.js project by running the following command: …
Okay, now run the following command in the project's root directory to set up shadcn/ui: …
Here's the wireframe for Movie Magic. How will you break it down into components?
I want a little more hierarchy. Here's what I am thinking; let me know if you understand: …

Generated app

Once the high-level structure was solidified, I instructed Devin to start generating code. The entire session lasted 2-3 hours (not counting coffee breaks 😀), during which I gave Devin feedback and guidance to produce the desired output. Once I felt that the code was reasonably good and that putting more time into coaching Devin would not dramatically improve the quality, I stopped and moved the code to my repository. Note that the generated code required a lot of cleanup, which I have detailed in the repository.

Here’s the final screen that I generated after iterating with Devin. As you can see, it still needs a lot of work to match the visual design.

Devin – Overall impressions

The good

Unique approach as an AI apprentice: Devin serves more as an apprentice than an expert. This allows developers to mold it to their specific needs, especially when working with less common technologies.
Well-integrated development environment: The development environment provides command execution, browser interaction, and a code editor all in one place, allowing you to iterate over the results.

The not so good

Requires significant micromanagement: Devin's tendency to stray off course when instructions are not crystal clear is a significant drawback. This constant need for explicit direction is not just a minor inconvenience but can be very frustrating, especially when trying to maintain a smooth workflow.
Produces code that often needs cleanup: Devin tends to follow less-than-ideal coding patterns (e.g., function expressions as components) and leaves behind unused code. These issues can be particularly annoying for those who prefer a clean, well-organized codebase.
Mixes up technical details: For example, it did not know the difference between the Next.js App Router and the Pages Router.
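
Since the App Router vs. Pages Router confusion came up with more than one assistant, here is a small sketch of how the two routers map a URL to a file. It assumes the optional `src/` directory, matching the /src/app/movies/page.tsx path mentioned earlier, and it ignores dynamic segments and route groups. `routeFile` is a hypothetical helper, not a Next.js API.

```typescript
type Router = "app" | "pages";

// File that serves a given route under each Next.js router.
// Assumes the optional `src/` directory is in use.
function routeFile(route: string, router: Router): string {
  const segments = route.split("/").filter(Boolean);

  if (router === "app") {
    // App Router: every route is a folder that contains a page.tsx file.
    return ["src", "app", ...segments, "page.tsx"].join("/");
  }

  // Pages Router: the file path itself is the route; "/" maps to index.tsx.
  const file = segments.length === 0 ? "index" : segments.join("/");
  return `src/pages/${file}.tsx`;
}

console.log(routeFile("/movies", "app"));   // src/app/movies/page.tsx
console.log(routeFile("/movies", "pages")); // src/pages/movies.tsx
```

An assistant that answers an App Router request with a file under /pages is producing a valid Next.js project, just not the one that was asked for.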

The final verdict

This experiment with AI-assisted front-end development reveals:

AI tools can accelerate initial development, but have limitations in their ability to iterate fast, meet visual design specs, and generate coding patterns that are maintainable.
Cursor, Claude, and v0 performed best among tested AI assistants, but human expertise was still crucial for 60-70% of the work.
The key to success: Balance AI assistance with human creativity and expertise. Use AI for routine tasks, while applying human insight for innovation and exceptional user experiences.
Looking ahead: As AI evolves, it will transform front-end development. Developers who effectively leverage AI will likely outpace those who don't.