Which AI Is Better for Writing Playwright End-to-End Tests

Last updated: March 16, 2026

Claude produces more maintainable Playwright tests with better selector strategies and error handling, while Cursor scaffolds test files faster. This guide compares both on test coverage, flakiness prevention, and maintainability.

Why AI-Assisted Playwright Test Writing Matters
Comparing AI Tools for Playwright Test Generation
What Matters Most in AI-Generated Playwright Tests
Pricing and Tool Availability
Selector Strategy Comparison
Real-World Test Generation Example
Flakiness Prevention Strategies
Practical Recommendations
Troubleshooting Generated Tests
Team Adoption Checklist

Why AI-Assisted Playwright Test Writing Matters

Writing Playwright tests manually takes time. You need to locate elements, handle async operations, manage test data, and structure your test suites. AI assistants can accelerate this process significantly. However, not all AI tools understand Playwright equally well. Some generate code that works immediately, while others produce tests that require substantial debugging.

The best AI for Playwright test generation should understand Playwright’s API, handle selectors intelligently, and produce maintainable code. Let’s examine how the major options perform.

Comparing AI Tools for Playwright Test Generation

ChatGPT (OpenAI)

ChatGPT remains a strong contender for writing Playwright tests. When prompted correctly, it generates functional test code with reasonable selectors.

Example prompt:

Write a Playwright test that logs in to my app,
checks the dashboard loads, and verifies a
notification appears.

ChatGPT typically produces code like this:

import { test, expect } from '@playwright/test';

test('dashboard shows notification after login', async ({ page }) => {
  await page.goto('https://myapp.com/login');
  await page.fill('[data-testid="email"]', 'user@example.com');
  await page.fill('[data-testid="password"]', 'password123');
  await page.click('[data-testid="login-button"]');

  await expect(page).toHaveURL(/.*dashboard/);
  await expect(page.locator('[data-testid="notification"]')).toBeVisible();
});

The code works, but ChatGPT often relies on generic CSS selectors rather than Playwright’s more locators. You may need to refine selectors for complex pages.

Claude (Anthropic)

Claude shows strong understanding of Playwright’s API and produces cleaner, more idiomatic code. It handles async patterns well and generates tests that follow Playwright best practices.

import { test, expect } from '@playwright/test';

test('user can filter products by category', async ({ page }) => {
  await page.goto('/products');

  // Open filter dropdown
  await page.getByLabel('Category').click();
  await page.getByRole('option', { name: 'Electronics' }).click();

  // Verify filtered results
  await expect(page.getByText('Showing 5 products')).toBeVisible();

  // Check product cards
  const products = page.locator('.product-card');
  await expect(products).toHaveCount(5);
});

Claude excels at using Playwright’s built-in locators like getByLabel, getByRole, and getByText. These are more maintainable than CSS selectors and handle accessibility concerns better.

GitHub Copilot

Copilot integrates directly into your editor and suggests test code as you type. It learns from your project’s existing tests, making its suggestions more contextually relevant.

// Copilot suggests this after typing:
test.describe('Shopping Cart', () => {
  test('adds item to cart', async ({ page }) => {
    await page.goto('/products/1');
    await page.click('text=Add to Cart');
    await expect(page.locator('.cart-count')).toHaveText('1');
  });
});

Copilot works best when you already have well-structured tests in your project. It mimics your existing patterns, which helps maintain consistency across your test suite.

Gemini (Google)

Gemini provides competitive test generation, particularly for complex scenarios involving multiple user flows. It handles data-driven testing reasonably well.

const testCases = [
  { email: 'user1@test.com', expected: 'success' },
  { email: 'invalid', expected: 'error' },
];

for (const tc of testCases) {
  test(`login with ${tc.email}`, async ({ page }) => {
    await page.goto('/login');
    await page.fill('#email', tc.email);
    await page.click('#submit');

    if (tc.expected === 'error') {
      await expect(page.locator('.error')).toBeVisible();
    }
  });
}

What Matters Most in AI-Generated Playwright Tests

When evaluating AI tools for test generation, focus on these factors:

Selector reliability: AI should prefer Playwright’s locators (getByRole, getByLabel, getByText) over fragile CSS selectors. These are more stable and accessible.

Error handling: Good test code includes proper waiting and retry logic. AI-generated tests should handle network delays and dynamic content without manual intervention.

Test structure: Look for proper use of test.describe for grouping, beforeEach for setup, and clean assertions. Well-structured tests are easier to maintain.

Page Object Model support: Advanced AI tools understand the Page Object Model pattern and can generate tests that work with your existing POM infrastructure.

Pricing and Tool Availability

Claude (via Claude.ai or API): $20/month for Pro, or pay-as-you-go with API ($3-15 per 1M tokens)
ChatGPT Plus: $20/month
GitHub Copilot: $10/month or $100/month for enterprise
Gemini Pro: Free with rate limits, or Google One AI Premium $20/month

For teams writing dozens of test files monthly, Claude’s web interface at $20/month is the most economical, while API pricing works for high-volume automation.

Selector Strategy Comparison

Selector Type	Claude	ChatGPT	Copilot	Reliability
`getByRole()`	Preferred	Sometimes	Rarely	Excellent
`getByLabel()`	Often	Occasionally	Rarely	Excellent
`getByText()`	Frequently	Yes	Occasionally	Good
`getByTestId()`	If provided	If provided	If provided	Perfect
CSS selectors	Last resort	Frequently	Often	Fragile
XPath	Never	Rarely	Occasionally	Very fragile

Claude’s consistent preference for accessible locators (getByRole, getByLabel) leads to tests that are more maintainable and catch accessibility issues.

Real-World Test Generation Example

Application: E-commerce checkout flow Requirement: Test adding items to cart, proceeding to checkout, and verifying order confirmation

Prompt: “Write a Playwright test that: navigates to product page, adds item to cart, goes to checkout, fills shipping info, submits payment, and verifies order confirmation page shows order number.”

Claude’s output (typical):

test('complete checkout flow', async ({ page }) => {
  // Navigate and add to cart
  await page.goto('/products/blue-shirt');
  await page.getByRole('button', { name: /add to cart/i }).click();

  // Verify cart updated
  await expect(page.getByLabel(/items in cart/i)).toContainText('1');

  // Proceed to checkout
  await page.getByRole('link', { name: /checkout/i }).click();
  await page.waitForLoadState('networkidle');

  // Fill shipping form
  await page.getByLabel(/email/i).fill('test@example.com');
  await page.getByLabel(/address/i).fill('123 Main St');

  // Submit and verify confirmation
  await page.getByRole('button', { name: /place order/i }).click();
  await page.waitForURL(/.*order-confirmation.*/);
  await expect(page.getByRole('heading', { name: /order confirmed/i })).toBeVisible();

  // Extract order number for assertions
  const orderNumber = await page.locator('[data-testid="order-number"]').textContent();
  expect(orderNumber).toMatch(/ORD-\d{6}/);
});

ChatGPT’s output (typical):

test('complete checkout flow', async ({ page }) => {
  await page.goto('https://myapp.com/products/blue-shirt');
  await page.click('button:has-text("Add to Cart")');
  await page.click('a[href="/checkout"]');

  // Fill form
  await page.fill('input[name="email"]', 'test@example.com');
  await page.fill('input[name="address"]', '123 Main St');
  await page.click('button[type="submit"]');

  // Verify
  await expect(page).toHaveURL('https://myapp.com/order-confirmation');
  await expect(page.locator('.order-number')).toBeVisible();
});

Claude’s version is more strong (accessible selectors, proper waits, network synchronization) while ChatGPT’s version works but requires refinement and is more brittle.

Flakiness Prevention Strategies

Ask your chosen AI to include these patterns:

// Wait for page stability, not just element visibility
await page.waitForLoadState('networkidle');

// Use getByRole for buttons (handles disabled states automatically)
await page.getByRole('button', { name: /submit/i }).click();

// Wait for URL changes to confirm navigation
await page.waitForURL(/.*checkout.*/);

// Add explicit waits for dynamic content
await expect(page.locator('.loading-spinner')).not.toBeVisible();

// Use proper locator chains
const item = page.locator('[data-testid="cart-item"]').first();
await item.getByRole('button', { name: /remove/i }).click();

These patterns significantly reduce test flakiness compared to minimal selectors.

Practical Recommendations

For most developers in 2026, Claude and ChatGPT offer the best balance of code quality and ease of use for Playwright test generation. Claude slightly edges ahead with its consistent use of Playwright’s modern locators and better understanding of async patterns.

For enterprise teams: Copilot’s IDE integration is valuable, but pair it with a stronger tool for critical test suites.

For startups/small teams: Claude at $20/month delivers the best test quality and ROI.

For high-volume test generation: Use Claude’s API at scale, or ChatGPT Plus for interactive refinement.

However, the best approach combines AI with your expertise:

Use AI to generate initial test scaffolding
Review and refine selectors for your specific app
Add custom assertions for business logic
Structure tests to match your project’s conventions
Run tests against staging and fix flakiness issues

Troubleshooting Generated Tests

Test fails with “element not found”:

Selectors may not match your actual DOM
Get the exact selector from your browser DevTools
Provide selector examples in next prompt

Test times out waiting for element:

Missing waitForLoadState() or waitForURL()
Add explicit wait conditions
Check if element is behind modal or other overlay

Test passes locally but fails in CI:

Timing issues (network slower in CI)
Add longer wait timeouts for CI environment
Use waitForLoadState('networkidle') instead of fixed waits

Selectors break when UI updates:

CSS selectors are fragile
Request getByRole and getByLabel locators instead
Use data-testid attributes for critical elements

Team Adoption Checklist

Select one primary AI tool for consistency
Create a template test file showing preferred patterns
Document your selector strategy (preferred locator types)
Set up CI to catch flaky tests
Review AI-generated tests before merging
Track test pass rates to identify flakiness
Adjust prompts based on team feedback

Frequently Asked Questions

Who is this article written for?

This article is written for developers, technical professionals, and power users who want practical guidance. Whether you are evaluating options or implementing a solution, the information here focuses on real-world applicability rather than theoretical overviews.

How current is the information in this article?

We update articles regularly to reflect the latest changes. However, tools and platforms evolve quickly. Always verify specific feature availability and pricing directly on the official website before making purchasing decisions.

Are there free alternatives available?

Free alternatives exist for most tool categories, though they typically come with limitations on features, usage volume, or support. Open-source options can fill some gaps if you are willing to handle setup and maintenance yourself. Evaluate whether the time savings from a paid tool justify the cost for your situation.

Can I trust these tools with sensitive data?

Review each tool’s privacy policy, data handling practices, and security certifications before using it with sensitive data. Look for SOC 2 compliance, encryption in transit and at rest, and clear data retention policies. Enterprise tiers often include stronger privacy guarantees.

What is the learning curve like?

Most tools discussed here can be used productively within a few hours. Mastering advanced features takes 1-2 weeks of regular use. Focus on the 20% of features that cover 80% of your needs first, then explore advanced capabilities as specific needs arise.

Built by theluckystrike — More at zovo.one

Table of Contents