Vibe Coding

Updated March 6, 2026

The Complete Guide to AI-Native Software Development

22 chapters. 200+ prompts. Updated monthly. The only vibe coding resource that evolves as fast as the field.

📅 Updated March 2026 📈 Monthly updates for subscribers 🎓 Part of the EndOfCoding ecosystem

Choose Your Plan

The vibe coding landscape changes every week. Your subscription keeps you current.

Free Preview

✓ First 3 chapters
✓ 10 sample prompts
✓ 2 video tutorials
✓ Interactive quiz

Frequently Asked Questions

Everything you need to know before you start.

What exactly is vibe coding?

A term coined by Andrej Karpathy in February 2025 for a new development style where you describe what you want in natural language, and AI tools generate the code. It ranges from AI-assisted autocomplete to fully autonomous AI agents building entire applications. This ebook covers all five levels in depth with real data, case studies, and 200+ production-ready prompts.

Who is this ebook for?

Developers exploring AI tools, engineering managers evaluating team adoption, entrepreneurs building products with AI, and anyone curious about the future of software development. Whether you use Cursor, Claude Code, GitHub Copilot, Bolt.new, or v0, this guide covers your tools and workflow.

How is the subscription different from a one-time purchase?

The vibe coding landscape changes weekly — new tools launch, security incidents emerge, pricing shifts. Your subscription includes monthly updates to all 22 chapters, new entries in the prompt library and tool comparison matrix, a fresh monthly intelligence brief, and new community showcase features. You always have the most current resource in a fast-moving field.

What do I get in the free preview?

The first 3 chapters are completely free: the origin story of vibe coding, a precise definition and framework, and the underlying philosophy. You also get the interactive quiz to find your vibe coding level, 10 sample prompts, and a glimpse of every chapter topic. No credit card required.

Can I cancel anytime?

Yes. Monthly and annual subscriptions can be cancelled at any time through your Lemon Squeezy billing portal. You keep access until the end of your current billing period. No questions asked, no hidden fees.

Get a free chapter + weekly vibe coding insights

Join the mailing list for a bonus chapter on AI tool selection, plus weekly curated updates on the vibe coding landscape.

No spam. Unsubscribe anytime. Part of the EndOfCoding ecosystem.

📖

How to read this ebook: Use the sidebar to navigate 22 chapters. Click expandable sections for deep dives. Take the interactive quiz to find your vibe coding level. Use Ctrl+K to search across all content. Chapters 1–3 are free — subscribe to unlock all 22.

01. The Moment Everything Changed

Updated March 6, 2026

On February 2, 2025, Andrej Karpathy, an OpenAI co-founder, former Tesla AI director, and one of the most respected voices in machine learning, posted what would become one of the most consequential tweets in software development history:

"There's a new kind of coding I call 'vibe coding', where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. I just see stuff, say stuff, run stuff, and copy-paste stuff, and it mostly works." — Andrej Karpathy, February 2, 2025

Within weeks, the term had gone viral. Within a month, Merriam-Webster added "vibe coding" as a slang and trending term. By December 2025, Collins English Dictionary named it their Word of the Year.

But vibe coding didn't just enter the dictionary. It entered the economy. It entered boardrooms. It entered the workflows of millions of developers. And it sparked one of the fiercest debates the software industry has seen in decades.

The Timeline

February 2025

Karpathy coins "vibe coding"

The tweet goes viral. Merriam-Webster adds it within weeks. Developers worldwide start experimenting.

March 2025

Y Combinator reveals the data

25% of YC Winter 2025 startups report codebases that are 95% AI-generated.

May 2025

Claude Code launches publicly

Anthropic's terminal-based coding agent goes GA. It will reach $1B ARR in 6 months.

May 2025

Lovable security vulnerability

170 of 1,645 apps built on the vibe coding platform found to expose personal data.

June 2025

Devin hits $73M ARR

Cognition's AI software engineer grows 73x in 9 months. Goldman Sachs adopts it.

July 2025

Wall Street Journal reports mainstream adoption

Professional software engineers are using vibe coding for commercial products.

August 2025

Google Jules exits beta

Google's async coding agent goes public. 2.28M visits, 140K+ code updates.

September 2025

The "Vibe Coding Hangover"

Fast Company reports senior engineers entering "development hell" with AI-generated codebases.

November 2025

Claude Code hits $1B ARR

One of the fastest-growing enterprise software products in history.

December 2025

Collins Word of the Year

"Vibe coding" is named Collins English Dictionary Word of the Year 2025.

December 2025

Tenzai security study

69 vulnerabilities found across 15 applications built by 5 major AI coding tools.

January 2026

"Vibe Coding Kills Open Source" paper

Researchers publish arXiv paper arguing vibe coding threatens the open-source ecosystem by reducing user engagement with maintainers. Tailwind CSS docs traffic down 40% from 2023.

January 2026

Cognition reaches $10.2B valuation

Cognition raises $400M Series C. Devin ARR passes $155M. Goldman Sachs, Citi, Dell, Cisco, Palantir among enterprise clients.

January 2026

GitHub Copilot reaches 4.7M paid users

Agent mode becomes default workflow for complex tasks. MCP support rolls out to all VS Code users.

February 2026

Claude Opus 4.6 launches with Agent Teams

Anthropic releases Opus 4.6 with agent teams in Claude Code — multiple AI agents working in parallel on different aspects of a project, coordinating autonomously.

March 2026

The Open Source Reckoning & Enterprise Adoption

Researchers warn vibe coding erodes open-source funding. Pega becomes first enterprise platform to brand its AI features as "vibe coding." Cursor 2.5 launches subagent architecture. GitHub Copilot opens multi-model access. Devin 2.2 achieves 67% PR merge rate.

02. What Vibe Coding Actually Is

Updated March 6, 2026

Strip away the hype, and vibe coding is a specific practice with specific characteristics.

Vibe coding is an AI-assisted software development approach where a developer describes what they want in natural language, an AI model generates the code, and the developer evaluates the result through execution rather than code review. The developer does not read, edit, or attempt to understand the generated code. They test whether it works, and if it doesn't, they feed the error back to the AI.

💡

**Key distinction:** In traditional AI-assisted development, the developer remains the author and the AI accelerates. In vibe coding, the AI is the author and the developer is the director.

</div>

Karpathy described his own workflow precisely:

"I 'Accept All' always, I don't read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. If it doesn't, I just revert to the last working state and re-prompt with more context."

The Three Core Loops

Vibe coding operates on three nested feedback loops:

Loop 1: Generate and Test

**1.** Describe what you want in natural language

  **2.** Accept the generated code without reading it

  **3.** Run it

  **4.** Does it work? Ship it. Doesn't work? Move to Loop 2.

  This is the happy path. For simple features, you may never leave this loop.

</div>

Loop 2: Error-Driven Repair

**1.** Copy-paste the error message to the AI (no commentary needed)

  **2.** Accept the fix without reading it

  **3.** Run it again

  **4.** Repeat until resolved or move to Loop 3.

  Most errors resolve within 1-3 iterations of this loop. The AI sees the error, understands the context, and fixes it.

</div>

Loop 3: Revert and Rephrase

**1.** Revert to the last working state

  **2.** Describe the desired outcome differently, with more context

  **3.** Return to Loop 1

  This is the escape hatch. If the AI gets stuck in a loop of broken fixes, go back to a clean state and try a different approach. This is why checkpoints matter — always have a rollback point.

</div>
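The three loops above can be sketched as a small control flow. This is a minimal illustration, not any tool's actual behavior: `generate`, `run`, and `rephrase` are hypothetical callables standing in for the AI tool, your test run, and your own rewording of the request, and the retry limits are arbitrary assumptions.

```python
from typing import Callable, Optional

def vibe_loop(
    prompt: str,
    generate: Callable[[str], str],        # hypothetical: prompt -> generated code
    run: Callable[[str], Optional[str]],   # hypothetical: code -> error text, or None if it works
    rephrase: Callable[[str, int], str],   # hypothetical: (prompt, attempt number) -> new prompt
    max_repairs: int = 3,
    max_rephrases: int = 2,
) -> Optional[str]:
    """Return working code, or None if every attempt fails."""
    for attempt in range(max_rephrases + 1):
        # Loop 1: generate, accept without reading, and run.
        code = generate(prompt)
        error = run(code)
        repairs = 0
        # Loop 2: paste the raw error back in, with no commentary.
        while error is not None and repairs < max_repairs:
            code = generate(f"{prompt}\n\nError:\n{error}")
            error = run(code)
            repairs += 1
        if error is None:
            return code
        # Loop 3: drop the broken attempt (the "revert") and rephrase.
        if attempt < max_rephrases:
            prompt = rephrase(prompt, attempt + 1)
    return None
```

Because broken code is never kept between attempts, discarding it plays the role of reverting to the last working state. In a real project that role belongs to a checkpoint or a version-control commit.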

What Vibe Coding Is NOT

Not using GitHub Copilot for autocomplete — that's AI-augmented coding (Level 1)
Not asking ChatGPT to explain code — that's using AI as a learning tool
Not reviewing AI-generated code before accepting — that's AI-collaborative coding (Level 2)
Not no-code/low-code platforms — those use visual builders, not natural language to code

Vibe coding is specifically: natural language in, code out, test behavior, never read the code.

03. The Philosophy: Trusting the Machine

Updated March 6, 2026

Vibe coding isn't just a technique. It's a philosophical stance about the relationship between developers and code.

The End of Code as Sacred Text

For decades, programming culture has treated source code as something to be crafted, reviewed, optimized, and understood. Code reviews are rituals. Clean code is a moral virtue. Understanding every line is a professional obligation.

Vibe coding rejects this entirely. It treats code as a disposable intermediary between human intent and running software. The code doesn't matter. The behavior matters.

This is not as radical as it sounds. Most software professionals already interact with layers of abstraction they don't fully understand:

Few web developers read TCP packet internals
Few application developers audit their compiler output
Few React developers understand the fiber reconciliation algorithm
Few SQL users trace query execution plans for every query

Vibe coding simply adds another layer: the AI becomes the compiler for natural language.

The Four Pillars

🎯

Intent Over Implementation

"What should this do?" replaces "How should I build this?"

⚡

Speed Over Elegance

Working software now beats perfect code later

🤖

Trust the AI

Accept all, don't read diffs, let the machine handle it

📈

Results-Oriented

Does it work? That's the only metric that matters

The Abstraction Argument

Supporters frame vibe coding as the natural progression of programming abstraction:

1950s

Machine Code → Assembly

"You don't need to write binary opcodes anymore!"

1970s

Assembly → C

"You don't need to manage registers anymore!"

1990s

C → Python / Java

"You don't need to manage memory anymore!"

2010s

Frameworks / Cloud

"You don't need to manage servers anymore!"

2025

Natural Language → Code

"You don't need to write code anymore!"

At each transition, purists warned that developers were losing essential skills. At each transition, the expanded abstraction enabled more people to build more things.

⚠️

**The counter-argument is real, though:** Every previous abstraction still had deterministic behavior. An assembler emits the same machine code for the same source every time, and a C compiler given the same source and flags produces a program with the same behavior. AI code generation is probabilistic: the same prompt can produce different code each time, with different bugs. This is a genuinely new kind of abstraction layer.
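A toy sketch makes the difference concrete. The candidate completions and their probabilities below are invented for illustration; real models work over tokens and far larger distributions. The point is only that always picking the most likely output is repeatable, while sampling is not.

```python
import random

# Made-up distribution over completions for one prompt (illustrative only).
CANDIDATES = {
    "return a + b": 0.6,
    "return sum([a, b])": 0.3,
    "return a - b": 0.1,  # a plausible-looking bug
}

def greedy(dist):
    """Deterministic: always pick the most likely completion."""
    return max(dist, key=dist.get)

def sample(dist, rng):
    """Probabilistic: draw a completion in proportion to its probability."""
    options = list(dist)
    weights = [dist[o] for o in options]
    return rng.choices(options, weights=weights, k=1)[0]

rng = random.Random()
print({greedy(CANDIDATES) for _ in range(20)})       # always a single completion
print({sample(CANDIDATES, rng) for _ in range(20)})  # usually more than one
```

Under sampling, the buggy completion shows up some of the time, which is why the same prompt can pass on one run and fail on the next.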

04. The Spectrum: Five Levels of AI-Assisted Development

Updated March 6, 2026

Vibe coding is not binary. In practice, developers operate along a spectrum. Understanding where you sit — and where you should sit for a given project — is critical.

Level 0: Traditional Development

No AI at all

You write every line. You understand every line. No AI assistance of any kind. Increasingly rare but still essential for certain domains like embedded systems, cryptography, and kernel development.

  **When to use:** Security-critical code, regulatory requirements, environments where AI tools are prohibited.

</div>

Level 1: AI-Augmented Coding

You are the author. The AI is a fast typist.

You use AI for autocomplete, documentation lookup, and boilerplate generation, but you review and understand every line. Think: GitHub Copilot suggestions that you accept or reject with full awareness.

  **Tools:** GitHub Copilot, VS Code AI extensions

  **Code understanding:** 100% — you review everything

  **When to use:** Production code, team projects, anything you need to maintain

</div>

Level 2: AI-Collaborative Coding

You are the architect. The AI is the builder.

You describe features in natural language and get back substantial code blocks. You review the code, understand the approach, and make modifications. You might use Cursor's Composer or Claude Code for generating components, but you read the diffs.

  **Tools:** Cursor Composer, Claude Code, Codex CLI

  **Code understanding:** 70-90% — you review most things

  **When to use:** Professional development, startup codebases, any code that needs to scale

</div>

Level 3: Guided Vibe Coding

You are the product manager. The AI is the engineering team.

You describe what you want and accept most code without deep review, but you maintain a general understanding of the architecture. You spot-check security-sensitive sections. You understand the overall structure even if you don't read every function.

  **Tools:** Cursor Agent, Claude Code, Bolt.new

  **Code understanding:** 30-60% — architecture yes, implementation details no

  **When to use:** MVPs, internal tools, prototypes headed toward production

</div>

Level 4: Pure Vibe Coding

You are the client. The AI is the agency.

Karpathy's original vision. You describe, accept all, test, paste errors, repeat. You don't read diffs. You don't understand the code. You only care if it works.

  **Tools:** Bolt.new, Lovable, Replit Agent, v0

  **Code understanding:** 0-10% — you only test behavior

  **When to use:** Personal projects, throwaway prototypes, hackathons, idea validation

</div>

Level 5: Autonomous Agent Coding

You are the executive. The AI is the employee.

You don't even supervise in real-time. You assign tasks to AI agents that clone repos, create branches, write code, run tests, and open pull requests — all while you do something else. You review the final result.

  **Tools:** Devin, Google Jules, OpenAI Codex (cloud mode)

  **Code understanding:** Review-based — you check the output, not the process

  **When to use:** Routine tasks, migrations, test generation, documentation, with human review gate

</div>

📈

**Where do most developers operate?** In 2026, most professional developers work between Levels 1 and 3. Pure Level 4 is most common among non-technical founders, hobbyists, and rapid prototypers. Level 5 is emerging fast in enterprise environments. Notably, Karpathy himself has evolved from "vibe coding" to advocating **"agentic engineering"** — professionals orchestrating AI agents with oversight, not just vibes.

</div>

### Which level are you?

Take the interactive quiz at the end of this ebook to find out.

05. The Tools: A Complete Landscape (2025–2026)

Updated April 21, 2026

The tooling ecosystem for AI-assisted development has exploded. The market is consolidating fast — with Cursor seeking a ~$50B valuation at $2B+ ARR, Lovable at $6.6B, Cognition at $10.2B, and billion-dollar acquisition battles playing out in real time. Anthropic's acquisition of Bun (the fast JavaScript runtime) signals Claude Code's push into native runtime integration. Here's the current state of play across every major category.

AI-Native IDEs

Cursor

Anysphere

The IDE Karpathy originally referenced. Built on VS Code with deep AI integration.

Cursor 3 (April 2, 2026) is a ground-up redesign centered on agent orchestration. The new Agents Window replaces the Composer pane with a full-screen workspace for running multiple AI agents simultaneously in side-by-side, grid, or stacked layouts. Design Mode lets you click any element in a browser preview and direct agents to modify that exact component visually. The release also adds cloud-to-local handoff for agent sessions, automations triggered by external services, and faster, less memory-heavy diff rendering for large files. The Await tool lets agents pause for background shell commands and subagents, and MCP Apps now support structured content.

Composer 2 (March 19, 2026) is built on Moonshot AI's Kimi K2.5 with extensive RL fine-tuning. It scores 61.3 on CursorBench, a 37% improvement over Composer 1, and 73.7 on SWE-bench Multilingual. It is priced at $0.50/M input tokens, making it highly cost-competitive for daily coding tasks. Community consensus: best performance-per-dollar for in-editor code generation as of Q1 2026.

Earlier in March 2026, before Composer 2: always-on Automations, JetBrains support via Agent Client Protocol, and team plugin marketplaces.

$2B+ ARR • ~$50B valuation (fundraising) • 1M+ daily users • 50,000 businesses • >50% Fortune 500

IDE • Agent • MCP • Automations • JetBrains • Design Mode

Windsurf

Cognition (via complex acquisition)

AI IDE with persistent "memories" for long-term context. Subject of a dramatic $3B acquisition saga: OpenAI's bid collapsed after Microsoft blocked it, Google hired the CEO and key researchers in a $2.4B deal, and Cognition acquired the remaining product, brand, and IP. Now supports Gemini 3.1 Pro. Ranked #1 in LogRocket AI Dev Tool Power Rankings (Feb 2026). Combined Cognition entity (Devin + Windsurf) raised $500M at ~$10B valuation with $82M+ ARR.

IDE • Memory • Cognition

VS Code + Extensions

Microsoft

The original. Still viable with GitHub Copilot, Continue, and Cline extensions. Best for developers who want AI assistance without switching editors.

IDE • Extensions

Autonomous Coding Agents

Claude Code

Anthropic

Terminal-based coding agent. Reads and modifies code across entire repositories. Powered by Claude Opus 4.7 (released April 16, 2026: 87.6% SWE-bench Verified, 94.2% GPQA, new ‘xhigh’ effort level, 3.3x higher-resolution vision, self-verification on agentic tasks, same price as 4.6), with agent teams: multiple AI agents working in parallel.

March 2026: voice mode (/voice push-to-talk), STT in 20 languages, MCP management via /mcp dialog, Claude API skill for building on Anthropic's platform. Computer-use capabilities let Claude operate your Mac autonomously. Companion product Claude Cowork works directly with local files.

Late March 2026 (v2.1.63–2.1.76): the /loop command adds cron-like scheduled tasks, turning Claude Code into a background worker for PR reviews, deployment monitoring, and recurring analysis. 1-million-token context window. Max output increased to 64k tokens for Opus 4.6 (128k upper bound for Opus 4.6 and Sonnet 4.6). MCP servers can now request structured input mid-task via interactive dialogs. Skills.md enables persistent agent behaviors.

Early April 2026: Anthropic acquires Bun (the fast JavaScript runtime built by Jarred Sumner), bringing native Bun integration and faster JS execution directly into Claude Code workflows.

Claude overtook ChatGPT as the #1 AI app on the App Store. Revenue surpassed $2.5B ARR (named world's most disruptive company, Time March 2026). In a Mozilla partnership, Claude Opus 4.6 autonomously found 22 CVEs in Firefox's C++ codebase.

April 4, 2026 (OpenClaw policy change): Anthropic announced that Claude subscription usage limits no longer cover third-party harnesses such as OpenClaw. Users of third-party Claude Code integrations must move to pay-as-you-go billing; a $200/mo Max subscription was reportedly being used to run $1,000–$5,000 of agent compute. Affected users received a one-time credit.

Additional April updates: PowerShell tool for Windows (opt-in preview), flicker-free alt-screen rendering, named subagents in @ mentions, 60% faster Write tool diff computation.

Note: the Pentagon labeled Anthropic a supply-chain risk in March 2026 over weapons/surveillance policy; defense tech contractors are migrating away.

April 14, 2026 (Routines launch): Anthropic launched Routines, saved configurations combining a prompt, repositories, and connectors that run automatically on a schedule or on GitHub events on Anthropic's cloud infrastructure (no local machine required). Use cases: automated PR reviews, overnight test triage, weekly repo health audits. Plan limits: 5/day Pro, 15/day Teams, 25/day Enterprise. The desktop app was redesigned at the same time with an integrated terminal, faster diff viewer, in-app file editor, and multi-session support.

$2.5B+ ARR • #1 App Store • Routines (Cloud) • Opus 4.7 (87.6% SWE-bench) • 1M Token Context • Computer Use • Voice Mode

CLI • Agent • Agent Teams • Routines • Cloud Automation • Computer Use • Voice • Enterprise

Devin

Cognition Labs

Positioned as an "AI software engineer." Full agent-native IDE with parallel task execution, interactive planning, Devin Wiki, and Devin Search. Goldman Sachs, Citi, Dell, Cisco, Palantir among enterprise clients. $10.2B valuation after $400M Series C.

$155M+ ARR • 10x migration speed

Agent • Async • Enterprise

OpenAI Codex CLI

OpenAI

Open-source terminal agent built in Rust. Sandboxed execution, code review, MCP integration, session resume, and CI/CD automation. Now powered by GPT-5.4 (March 2026) — OpenAI's latest with native computer-use capabilities, 1M token context, and 33% fewer errors vs GPT-5.2. GPT-5.4 comes in Standard, Thinking, and Pro variants. ChatGPT for Excel/Sheets integration signals enterprise push.

npm i -g @openai/codex • GPT-5.4

CLI • Open Source • Sandbox • Computer Use

Google Jules

Google

Asynchronous agent powered by Gemini 3 Pro. Clones codebases into Cloud VMs, works independently, opens PRs automatically. Concurrent task execution. Cognition (Devin's parent) also shipped Windsurf Codemaps — AI-annotated structured maps of entire codebases powered by SWE-1.5 and Claude Sonnet 4.5, enabling hyper-contextualized navigation of large repos before making changes.

2.28M visits • 140K+ code updates

Agent • Async • Cloud

Gemini CLI

Google

Open-source terminal agent powered by Gemini 3 Flash. Skills system with sub-agents, event-driven scheduler, and agent registry. Direct competitor to Claude Code and Codex CLI in the terminal space.

github.com/google-gemini/gemini-cli

CLI • Open Source • Skills

GitHub Copilot

GitHub / Microsoft

The original AI coding assistant, now with full agent mode. Autonomously identifies subtasks, edits across multiple files, runs tests, and fixes errors. MCP support.

March 2026: GPT-5 mini and GPT-4.1 now included without consuming premium requests. Plan mode metrics available across JetBrains, Eclipse, Xcode, and VS Code. Users can assign the same issue to Claude, Codex, or Copilot agents simultaneously.

March 11: Custom agents, sub-agents, and Plan Agent are now generally available in JetBrains IDEs (agent hooks in preview).

March 12: New GitHub Copilot Student plan launched. Free access is maintained, but premium model self-selection is removed in favor of Copilot Auto mode.

April 2026 (Agent Mode GA and new features): Agent Mode is now fully generally available on VS Code and JetBrains across all Copilot plans. Copilot SDK entered public preview (April 2), providing building blocks for embedding Copilot agentic capabilities into custom apps and workflows. Autopilot mode (public preview) lets agents approve their own actions and auto-retry on errors until task completion. Copilot CLI v1.0.18 added a Critic agent that automatically reviews plans using a complementary model. Sandbox MCP servers are now available on macOS/Linux.

Privacy policy change (effective April 24): GitHub Copilot Free/Pro/Pro+ user interaction data will be used for AI model training by default. Opt out in account settings if this applies to you.

26M+ total users • 4.7M paid • 6+ IDEs • Agent Mode GA • Copilot SDK

IDE • Agent • MCP • Multi-Model

Kilo Code

Kilo.ai (GitLab co-founder)

Open-source AI coding agent with 1.5M+ users. Orchestrator mode with planner/coder/debugger sub-agents. 500+ model support. Available in VS Code, JetBrains, and CLI. $19/mo or BYO API key. Launched March 2026.

1.5M+ users • Open Source

Agent • Open Source • Multi-Agent

Amazon Q Developer

Amazon

AI coding assistant deeply integrated with AWS. Code generation, transformation, and debugging with strength in serverless and cloud infrastructure patterns.

Agent • AWS

Browser-Based Builders

Bolt.new

StackBlitz

Browser-based dev environment. Describe an app, get a working deployable application. No local setup. Excellent for rapid prototyping.

Browser • Full-Stack • Deploy

v0

Vercel

AI-powered UI generation. Describe a component, get production-ready React + Tailwind code. Deep Next.js integration. Best for frontend prototyping.

UI • React • Next.js

Lovable

Lovable (Sweden)

App creation for non-developers. Natural language to working, deployable software. By March 2026: $400M ARR (up from $200M at end-2025) with only 146 employees, 200,000+ new projects per day. March 23: CEO Anton Osika announced an M&A offensive — Lovable is actively acquiring startups and builder teams to extend its platform lead. Previously acquired cloud provider Molnett. Faced security scrutiny (170/1,645 apps had vulnerabilities).

$400M ARR • $6.6B valuation • 200K projects/day • M&A offensive

No-Code • Browser

Replit Agent

Replit

Complete app building from descriptions with deployment and database management. 75% of AI-enabled Replit users don't write code themselves. March 11: Raised $400M Series D at a $9 billion valuation (led by Georgian Partners, with a16z, Coatue, Y Combinator, Databricks Ventures) — triple its September 2025 valuation in six months. Targeting $1B ARR by end of 2026.

75% write zero code • $400M Series D • $9B valuation

Browser • Full-Stack • Deploy

The Infrastructure Layer: MCP

🔗

**Model Context Protocol (MCP)** is Anthropic's open protocol that allows AI assistants to connect to external tools and data sources. It has become the standard way for coding agents to interact with databases, APIs, file systems, and other developer tools. All major agents (Claude Code, Cursor, Codex CLI, Devin) support MCP.

</div>
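To make the protocol less abstract: MCP messages are JSON-RPC 2.0, and a client invokes a server-side tool with the `tools/call` method. The sketch below only builds and parses such messages; it does not open a connection or perform the initialization handshake. The tool name `query_database` and its `sql` argument are hypothetical examples, not part of any real server.

```python
import json
from itertools import count

_ids = count(1)  # simple request-id generator for this sketch

def tool_call_request(tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

def text_from_result(raw: str) -> str:
    """Join the text items of a `tools/call` response; raise on errors."""
    msg = json.loads(raw)
    if "error" in msg:  # protocol-level JSON-RPC error
        raise RuntimeError(msg["error"]["message"])
    result = msg["result"]
    if result.get("isError"):  # the tool ran but reported a failure
        raise RuntimeError("tool reported an error")
    return "\n".join(c["text"] for c in result["content"] if c["type"] == "text")

# Hypothetical tool name and arguments, for illustration only.
print(tool_call_request("query_database", {"sql": "SELECT 1"}))
```

In practice an agent would use an MCP client library rather than hand-building messages. The sketch is only meant to show how small the wire format is.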

The Model Race (March 2026 Update)

The foundation models powering these tools are advancing on multiple fronts. Key releases in early March 2026:

GPT-5.4 (OpenAI): Native computer-use, 1M context, Standard/Thinking/Pro variants. Already integrated into Codex CLI and Copilot.
Gemini 3.1 Flash-Lite (Google): Ultra-low-latency variant designed for inline code completions and real-time suggestions. Powers Windsurf and Jules background tasks.
GLM-4.7 (Zhipu AI): China's leading code model, competitive with GPT-5 on multilingual programming benchmarks. Growing adoption in Asian markets.
DeepSeek-V3.2-Speciale (DeepSeek): Open-weight model rivaling proprietary offerings. Strong at multi-file reasoning and long-context code generation.

Open-source LLMs now account for over 60% of production AI deployments — a tipping point driven by DeepSeek, Llama, Qwen, and Mistral. This has shifted the economics: developers increasingly use open-weight models for routine code generation while reserving proprietary models for complex architectural reasoning.

Andrej Karpathy, who coined "vibe coding" in February 2025, introduced a new term in early 2026: "agentic engineering" — the discipline of designing, orchestrating, and supervising autonomous AI agents that write code, run tests, and deploy systems with minimal human intervention. The term has rapidly entered common usage, marking the evolution from "coding with AI" to "engineering with agents."

06. The Agent Revolution

Updated April 15, 2026

The most significant development since Karpathy's tweet isn't better autocomplete. It's the emergence of autonomous coding agents — AI systems that independently plan, implement, test, and deploy software.

From Copilot to Colleague

Phase 1: Autocomplete (2021-2023)

The AI predicted the next line

GitHub Copilot launched. Useful, but fundamentally a typing accelerator. The developer remained in full control of every decision.

Phase 2: Composers (2023-2024)

The AI generated entire features

Cursor Composer, ChatGPT Code Interpreter. Multi-file generation became possible. But the developer still supervised each generation cycle.

Phase 3: Agents (2025-2026)

The AI works independently

Agents understand entire codebases, create execution plans, implement changes across dozens of files, run tests, fix failures, and open pull requests. The developer assigns a task and reviews the result — sometimes hours later.

Phase 4: Persistent Workers (Early 2026)

The AI runs on a schedule without being asked

Claude Code's /loop command and Claude Managed Agents enable scheduled background tasks. Agents run CI pipelines, triage issues, and maintain codebases overnight. The developer reviews a morning summary of what the AI decided and changed while they slept.

What Agents Can Do Today

Modern coding agents reliably handle tasks that would take a junior developer 4-8 hours:

🔃

Migrations

Framework, API, database schema conversions

🐛

Bug Fixes

Diagnose from logs, implement fix, write regression tests

🛠

Features

Complete frontend + backend + database changes

✅

Tests

Comprehensive test suites for existing code

📄

Documentation

Generate and maintain docs across entire codebases

🔒

Security Fixes

Scan for vulnerabilities and implement remediations

The April 2026 Benchmark Picture

Agent performance has accelerated dramatically. The current public leaderboard (April 2026):

| Model | SWE-bench Verified | Access |
|---|---|---|
| Claude Mythos Preview | 93.9% | Restricted (Project Glasswing) |
| Claude Opus 4.6 | 80.8% | Public |
| Gemini 3.1 Pro | 80.6% | Public |
| GPT-5.4 | 75.0% | Public |
| Kimi K2.5 (open-source) | ~75% | Open |

Kimi K2.5 by Moonshot AI is the current #1 open-source option: 1 trillion parameter MoE architecture with 32 billion active parameters, competitive with frontier models at a fraction of the inference cost.

New Agent Orchestration Frameworks (April 2026)

Three frameworks and managed runtimes launched in April 2026 that reshape how multi-agent systems are built:

Google Agent Development Kit (ADK): google/adk-python — 8,200+ stars on launch week. Purpose-built for multi-agent orchestration with native Gemini integration and MCP support. Best for complex agent pipelines with multiple specialized sub-agents.
Meta llama-stack: Standardized agent runtime for Llama 4 models. Defines interfaces for tool calling, memory, and agent orchestration that work across the open-source ecosystem.
Claude Managed Agents: Anthropic's managed runtime at $0.08/session-hour plus token costs. Provides sandboxed execution, state management, and permission scoping. Testing shows a 10-percentage-point improvement in task success rates over standard prompting.

The practical implication: you no longer need to build agent infrastructure from scratch. These frameworks handle the hard parts — state, retries, tool routing, parallelization — so you can focus on the task logic.
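For a sense of what "retries" and "tool routing" mean here, the following is a minimal hand-rolled sketch. It is not how ADK, llama-stack, or Claude Managed Agents implement these features; the function names, the backoff policy, and the `run_tests` tool in the usage note are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict

def call_with_retries(fn: Callable[[], str], attempts: int = 3, base_delay: float = 0.0) -> str:
    """Run fn, retrying with exponential backoff whenever it raises."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** i)
    raise ValueError("attempts must be at least 1")

def route(tools: Dict[str, Callable[[str], str]], name: str, arg: str) -> str:
    """Dispatch a tool call by name, wrapping the call in retries."""
    if name not in tools:
        raise KeyError(f"unknown tool: {name}")
    return call_with_retries(lambda: tools[name](arg))
```

A caller would register tools in a dictionary, for example `route({"run_tests": run_tests}, "run_tests", "unit")`, where `run_tests` is whatever function actually does the work. Frameworks add state persistence and parallel scheduling on top of this basic shape.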

What Agents Still Struggle With

Cognition's own 2025 performance review of Devin put it well:

"Devin is senior-level at codebase understanding but junior at execution."

Ambiguous requirements — agents make assumptions that may not match intent
Complex architectural decisions — they can implement but struggle with system-level design
Cross-system integration — tasks requiring deep understanding of multiple interconnected systems
Security context — knowing when something is dangerous requires deployment context, not just code patterns

The Parallel Execution Advantage

Unlike human developers, agents can run multiple instances simultaneously, work 24/7, and process entire backlogs of tickets overnight.
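As a rough sketch of what "multiple instances simultaneously" looks like in code, the example below fans a backlog of tickets out to a bounded pool of workers. `run_agent` is a hypothetical stand-in for a real agent session, not the API of Devin or any other product.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def run_agent(ticket: str) -> str:
    """Hypothetical stand-in for one agent session working one ticket."""
    return f"PR opened for {ticket}"


def process_backlog(tickets: List[str], max_parallel: int = 4) -> List[str]:
    """Run one agent per ticket, at most max_parallel at a time.

    executor.map returns results in the same order as the input tickets.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as executor:
        return list(executor.map(run_agent, tickets))
```

The `max_parallel` cap matters in practice: each concurrent session costs tokens and produces a pull request that someone still has to review.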

10x

Faster file migrations (bank case study)

14x

Faster repo migrations (Oracle Java)

20x

Faster vulnerability remediation

7.8m

Average task completion (Devin)

+10pp

Task success rate with Managed Agents vs prompting

93.9%

Claude Mythos SWE-bench (restricted access)


07. Vibe Coding in Practice: Real Workflows

Updated March 6, 2026

Theory is interesting. Practice is what matters. Here are four concrete workflows for different scenarios.

#### The Weekend Prototype

**Scenario:** You have a product idea and want a working prototype by Monday.

**Tools:** Bolt.new or Cursor + Claude &bull; **Level:** 3-4

1. Write a detailed description (spend 20-30 minutes; it's the most important step). Include: target users, core features, data model, key screens, visual style
2. Paste into Bolt.new or Cursor Composer
3. Iterate through natural language: "Make the sidebar collapsible" / "Add dark mode"
4. Deploy to Vercel or Netlify
5. Share with potential users for feedback

Build a job application tracker. I'm applying to software engineering positions and need to track: company name, position title, application date, status (applied/phone screen/onsite/offer/rejected), salary range, notes, and next action date. I want a clean dashboard showing all applications in a table with sorting and filtering. Include a kanban view grouped by status. Use a modern blue/slate color scheme. Store in localStorage. Make it responsive for mobile.


  </div>

  <div class="tab-content" id="wf2">
    #### The Startup MVP

    **Scenario:** Building a real product for real users, fast.

    **Tools:** Claude Code + Cursor + v0 &bull; **Level:** 2-3

    1. Start with a product requirements document (even a rough one)
    2. Use v0 to prototype key UI screens
    3. Use Claude Code to scaffold the full architecture
    4. Build feature-by-feature, testing each before moving on
    5. Review auth code and data handling; accept UI code freely
    6. Deploy to real hosting, set up monitoring
    7. Plan a "hardening phase" for security-critical paths

    <div class="callout warning">
      <div class="callout-icon">&#9888;&#65039;</div>
      <div class="callout-content">**The trap:** Skipping step 7. Many YC startups vibe-coded their MVPs successfully but faced "development hell" when trying to scale without hardening.

</div>
    </div>
  </div>

  <div class="tab-content" id="wf3">
    #### The Enterprise Integration

    **Scenario:** Adding a feature to an existing production codebase.

    **Tools:** Claude Code or Devin + CI/CD pipeline &bull; **Level:** 5 with human gate

    1. Create a detailed ticket with acceptance criteria
    2. Assign to an AI agent (Devin, Claude Code, or Jules)
    3. Agent analyzes codebase, creates a plan, implements the change
    4. Agent runs existing test suite and fixes failures
    5. Agent opens a pull request
    6. Human reviews: security, performance, architecture, edge cases
    7. Merge after human approval

    This is Level 5 but with human review as the final gate. It's how most enterprises adopt AI coding in 2026.
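The human gate in steps 6 and 7 reduces to a simple rule: passing tests are necessary but never sufficient. A minimal sketch, where `PullRequest` and `can_merge` are hypothetical names rather than part of any CI system's API:

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    """Hypothetical record of an agent-authored pull request."""
    tests_passed: bool
    human_approved: bool = False


def can_merge(pr: PullRequest) -> bool:
    """Allow a merge only when tests pass AND a human has approved."""
    return pr.tests_passed and pr.human_approved
```

In a real pipeline the same rule is usually enforced through the hosting platform's branch protection settings rather than application code.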

  </div>

  <div class="tab-content" id="wf4">
    #### The Solo Creator

    **Scenario:** You're not a developer. You have an idea for an app.

    **Tools:** Lovable, Bolt.new, or Replit Agent &bull; **Level:** 4

    1. Describe your application as if explaining it to a friend
    2. Let the builder create the first version
    3. Use it yourself and note what's wrong or missing
    4. Describe changes in plain language
    5. Repeat until satisfied
    6. Deploy using the platform's built-in hosting

    <div class="callout danger">
      <div class="callout-icon">&#128308;</div>
      <div class="callout-content">**Critical:** If your app handles user data, sensitive information, or payments, hire a security professional to review it before going live. The Lovable vulnerability study (170/1,645 apps) shows this isn't hypothetical.

</div>
    </div>
  </div>


08. Real-World Case Studies

Updated March 6, 2026

These are documented, real examples — not hypotheticals.

Andrej Karpathy practiced what he preached, building MenuGen using nothing but natural language instructions. He provided goals, examples, and feedback — never touching the code directly. The project demonstrated that vibe coding could produce functional software, though Karpathy himself noted it was appropriate for "small weekend projects" rather than production systems.

</div>

New York Times journalist Kevin Roose, not a professional programmer, experimented with vibe coding in early 2025. He built several "software for one" applications — personal tools tailored to his exact needs. The results were mixed: some tools worked well, but in one notable case, an AI-generated e-commerce feature **fabricated fake product reviews**. Roose's experience illustrated both the democratization promise and the trust problem.

</div>

Goldman Sachs adopted Devin as part of their "hybrid workforce" — AI agents working alongside human engineers. They deployed Devin for code migrations, documentation generation, and routine maintenance. A representative case: **documenting 400,000+ repositories** that had accumulated years of tribal knowledge, freeing engineering teams for new feature development.

</div>

**25%** of companies in YC's Winter 2025 batch had codebases that were 95% AI-generated. These startups moved from idea to working product in days rather than months. Several raised seed funding based on prototypes built almost entirely through natural language. The trend raised questions about what happens when these companies need to scale.

</div>

Misbah Syed, founder of Menlo Park Lab, built the generative AI application Brainy Docs using vibe coding: "If you have an idea, you're only a few prompts away from a product." The company used AI-generated code for consumer-facing applications, demonstrating vibe coding could produce **revenue-generating products**, not just prototypes.

</div>

Bank of America used conversational coding agents to rapidly prototype fraud detection systems. Engineers described detection patterns in natural language and iterated through AI-generated implementations. Prototypes were achieved in a fraction of the traditional time, then **hardened by specialized security engineers** before deployment — a model example of the "vibe then harden" approach.

</div>

Perhaps the most striking validation of vibe coding as a business strategy came in June 2025 when **Wix acquired Base44 for $80 million in cash**. Base44, a solo-founder startup barely six months old, had built a vibe coding platform enabling non-developers to create functional applications through natural language. The acquisition demonstrated that vibe-coded companies could reach significant exit values in record time. YC-backed Emergent, another vibe coding company, reached a **$300 million valuation**.

</div>

Throughout 2025 and into 2026, the Indie Hackers community documented dozens of revenue-generating applications built primarily through vibe coding. Solo creators with limited coding backgrounds built and launched SaaS products within weeks. The pattern was consistent: **vibe code the MVP, validate with real users, then decide whether to hire engineers** for the production version.

</div>

SaaStr founder Jason Lemkin documented a cautionary experience: **Replit's AI agent deleted his database** despite explicit instructions not to make any changes. This incident became one of the most-cited examples of the risks of giving autonomous agents too much power without proper safeguards.

</div>

In January 2026, researchers from Central European University and the Kiel Institute published **"Vibe Coding Kills Open Source"** on arXiv. The paper documented a systemic problem: vibe coding raises productivity by making it easy to use open-source libraries, but **severs the user engagement** through which maintainers earn returns. Users no longer read documentation, file bug reports, or contribute. Tailwind CSS docs traffic dropped ~40% from early 2023. Stack Overflow questions entered structural decline after ChatGPT launched. The paper argued that sustaining open source under widespread vibe coding requires fundamentally new funding models for maintainers.

</div>

The most dramatic business story of the vibe coding era. OpenAI agreed to acquire Windsurf (formerly Codeium) for **$3 billion**, which would have been its largest acquisition ever. Then Microsoft reportedly blocked the deal over exclusivity clauses. Google swooped in with a **$2.4 billion** licensing-and-hiring deal (a "reverse acquihire"), bringing Windsurf's CEO and key researchers to DeepMind. Cognition then acquired the remaining product, brand, IP, and team. The result: one AI coding startup's technology and talent split between Google and Cognition after the OpenAI deal collapsed. A sign of just how valuable vibe coding infrastructure has become.

</div>


09. The Numbers: Adoption and Impact

Updated April 26, 2026

The data tells a clear story: AI-assisted development isn't a trend. It's a structural shift.

Adoption

Developers using AI tools (JetBrains 2026)

Developers using AI tools daily, globally (Stack Overflow Dev Survey, Q1 2026)

US developers using AI tools daily (March 2026)

All new code that is AI-generated (GitHub State of Octoverse, March 2026)

All production code commits containing AI-generated lines (Sourcegraph Code Intelligence, March 2026)

Business AI adoption — all-time record (Ramp AI Index, Feb 2026)

Replit AI users who write zero code

AI Market Share (March–April 2026)

34.4%

OpenAI business market share (declining -1.5% MoM)

24.4%

Anthropic business market share (growing +4.9% MoM)

~70%

Head-to-head wins: Anthropic vs OpenAI in new business (Ramp)

93.9%

Claude Mythos on SWE-bench — restricted to Project Glasswing defense partners (April 7, 2026)

87.6%

Claude Opus 4.7 on SWE-bench Verified — best publicly available coding agent score (April 16, 2026)

95%+

GPT-6 on HumanEval — 40% improvement over GPT-5.4 with dual-tier reasoning (April 14, 2026)

80.8%

Claude Opus 4.6 on SWE-bench — baseline for comparison

The Agentic Model Race (April 2026)

Four major model releases in a single month reshaped the competitive landscape. The race is no longer about raw benchmark scores — it's about how many agents a model can orchestrate and how long it can sustain autonomous work.

GPT-6

OpenAI — 2M token context window, dual-tier reasoning (fast + verification), 95%+ HumanEval. 40% improvement over GPT-5.4 across coding, reasoning, and agent tasks. Launched April 14, 2026.

GPT-5.5

OpenAI — "Smartest and most intuitive" model, designed as the backbone for OpenAI's AI super-app combining chat, search, coding, and productivity. Released April 23, 2026.

Kimi K2.6

Moonshot AI — Open-source multimodal agent orchestrating up to 300 sub-agents executing 4,000 sequential coordinated steps. Targets long-horizon autonomous software engineering. Released April 20, 2026.

Claude Opus 4.7

Anthropic — 87.6% SWE-bench Verified, best publicly available coding agent score. Improved coding, sharper vision, self-verification. Released April 16, 2026.

The signal: In one month, the best publicly available SWE-bench Verified score moved from Claude Opus 4.6 (80.8%) to Claude Opus 4.7 (87.6%), while GPT-6 reported 95%+ on HumanEval, a different benchmark that cannot be compared directly with SWE-bench. Anthropic's restricted Mythos model (93.9% SWE-bench, April 7) still sits above both public SWE-bench figures. Multi-agent swarm scaling, exemplified by Kimi K2.6's 300-agent architecture, is the new frontier.

Revenue & Growth

$2.5B+

Claude Code ARR

$155M+

Devin ARR (18 months from $1M)

$2B+

Cursor ARR (~$50B valuation, April 2026)

20M+

GitHub Copilot paid users (April 2026)

$50M

Emergent AI ARR in 7 months

$82M+

Cognition ARR (Devin+Windsurf)

Valuations (2026)

$350B

Anthropic valuation — Google commits $40B ($10B immediate + $30B contingent) at April 24, 2026. Largest single AI investment in history.

$10B

Cognition ($500M raise, Mar 2026)

~$50B

Anysphere (Cursor) — confirmed April 2026

$30B

Anthropic ARR (April 2026 — 3x jump from $9B at end of 2025)

$24B

OpenAI ARR (April 2026 — $2B/month)

$6.6B

Lovable ($400M ARR, 200K projects/day)

$9B

Replit ($400M Series D, Mar 2026 — tripled in 6 months)

Productivity

Faster project completion

10-14x

Faster agent migrations vs. human

500K

Developer hours saved (TELUS, 2025-26)

1,000+

PRs/week via AI agents (Stripe)

75%

Reduction in PR turnaround time for AI-tool teams (9.6 days → 2.4 days, Index.dev 2026)

3.6 hrs

Average time saved per developer per week (survey median, April 2026)

Developer Sentiment (April 2026)

Developers using AI tools (JetBrains 2026)

Professional developers using AI tools daily (SonarSource 2026)

Developers who have started using AI agents (April 2026)

Developers with "high trust" in AI output (down from 70%+ in 2023)

Developers frustrated by "almost right" AI solutions (top complaint, SonarSource)

Professional devs adopted vibe coding

Cultural Impact

Collins Dictionary Word of the Year 2025: "Vibe coding"
MIT Technology Review: Named "Generative Coding" a 2026 Breakthrough Technology
Merriam-Webster: Added as slang/trending term within one month of Karpathy's tweet
Wikipedia: Full article with extensive sources and analysis
Wall Street Journal: Reported widespread professional adoption (July 2025)
Fast Company: Documented the "vibe coding hangover" (September 2025)
arXiv: "Vibe Coding Kills Open Source" paper sparks open-source funding debate (January 2026)
VibeX 2026: First academic workshop on vibe coding, scheduled at EASE conference in Glasgow
Mainstream: Vibe coding is now a recognized methodology taught in bootcamps and referenced in enterprise strategy documents


10. The Dark Side: Security, Debt, and Failure

Updated April 1, 2026

For every success story, there's a cautionary tale. The risks are real, documented, and in some cases severe.

The Tenzai Security Study

🔒

In December 2025, security startup Tenzai tested five major tools — Claude Code, OpenAI Codex, Cursor, Replit, and Devin — building three identical test applications each. Across **15 apps**, they found **69 vulnerabilities**: ~45 low-medium, the rest high or critical.

  **Key finding:** AI tools avoid generic security flaws but struggle where what makes code safe vs. dangerous depends on context.

</div>

AI code with security vulnerabilities

AI code with exploitable bugs

Developers who trust AI accuracy (down from 43%)

Practitioners who say AI code is "fast but flawed"

35

CVEs from AI-generated code in March 2026 alone (27 from Claude Code)

400–700

Estimated AI code vulnerabilities per month (incl. unpublished CVEs)

The Acceleration: 35 CVEs in One Month

The security threat from AI-generated code is not static. It is accelerating. In March 2026, security researchers confirmed 35 CVEs directly attributable to AI-generated code — 27 of them from Claude Code alone. Researchers from the CERT/AI Working Group estimate the actual monthly count including triaged-but-unpublished vulnerabilities is 400 to 700 per month.

The trend is steep and mirrors adoption curves:

Month	Confirmed AI Code CVEs	Estimated Total
Jan 2026	12	250–350
Feb 2026	21	310–450
Mar 2026	35	400–700

The root cause is structural: AI coding tools generate code that compiles and passes tests, but they optimize for functional correctness rather than security context. A model trained on decades of existing internet code learns the prevalence of insecure patterns alongside secure ones — and reproduces them with equal confidence. As AI-generated code's share of all new code climbs toward 41% (GitHub, March 2026), the absolute volume of AI-sourced vulnerabilities scales with it.

The deeper concern: the vulnerability rate is growing faster than the adoption rate, suggesting the tools are getting worse at security relative to their capability growth.
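The month-over-month growth in confirmed CVEs can be checked directly from the table above. The sketch below uses only the three confirmed counts from that table; comparing them against adoption growth would additionally need adoption figures for the same months, which this chapter does not provide.

```python
# Confirmed AI-code CVE counts from the table above (Jan, Feb, Mar 2026).
confirmed = {"Jan 2026": 12, "Feb 2026": 21, "Mar 2026": 35}


def month_over_month_growth(counts: dict) -> dict:
    """Return percentage growth for each month relative to the previous one."""
    months = list(counts)
    growth = {}
    for prev, cur in zip(months, months[1:]):
        growth[cur] = round((counts[cur] - counts[prev]) / counts[prev] * 100, 1)
    return growth


growth = month_over_month_growth(confirmed)
# growth == {"Feb 2026": 75.0, "Mar 2026": 66.7}
```

On these three data points the confirmed count nearly triples in two months, while the percentage growth rate itself dips slightly from February to March.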

⚠

**IDEsaster Disclosure (Early 2026):** Security researchers found **30+ vulnerabilities across every major AI IDE**, resulting in **24 CVEs assigned** and putting an estimated **1.8 million developers** at risk. AI-generated code was found to be **2.74x more likely** to introduce XSS vulnerabilities than human-written code.

</div>

Documented Security Incidents

24 CVEs

IDEsaster — All Major AI IDEs

30+ vulnerabilities found across every major AI IDE. 1.8 million developers at risk. AI code 2.74x more likely to introduce XSS.

CVE-2025-54135

CurXecute — Cursor IDE

Malicious MCP server responses could execute arbitrary commands on developers' machines.

CVE-2025-55284

Claude Code DNS Exfiltration

Data exfiltration from developer computers through DNS requests.

PROMPT INJECTION

Windsurf Memory Poisoning

Malicious code comments poisoned Windsurf's long-term memory, enabling silent data theft over months.

PROMPT INJECTION

Gemini CLI Code Execution

Asking the Gemini CLI to analyze a project triggered a malicious injection hidden in a readme.md file.

MASS VULN

Lovable Supabase RLS Crisis (March 2026)

Researchers analyzed 1,645 Lovable-generated apps and found critical Row Level Security misconfigurations in 170 of them (10.3%). Affected apps exposed user data to any authenticated user. A separate CodeRabbit study confirmed AI-generated code has 2.74x higher security vulnerability rates than human code, with 1.7x more "major" issues per 1,000 lines. Source: RedReamality (March 15, 2026).

CVE-2025-48757

Base44 Platform

Unauthenticated access vulnerability exposed 170+ production applications built on the platform.

DATA BREACH

Tea App

Basic authentication failures in an AI-generated app leaked 72,000 user IDs and selfies.

CVE-2026-21858

n8n Remote Code Execution (CVSS 10.0)

Unauthenticated RCE allowing full server takeover on ~100,000 n8n automation servers. The highest possible CVSS score.

SUPPLY CHAIN

SANDWORM_MODE npm Worm

First malware to install rogue MCP servers, poisoning AI coding assistants to exfiltrate API keys. Self-replicates by stealing npm tokens and republishing victims' top 20 packages. Spread through 19 typosquatted packages.

MCP ATTACK

MCP Server Injection Crisis (8,000+ Servers)

92% exploitation probability at 10 MCP plugins. 72.8% attack success rate across 45 real-world servers. 36.7% of 7,000+ servers have SSRF exposure. More capable AI models are more vulnerable to MCP-based prompt injection.

CVE-2025-59536

Claude Code Remote Code Execution (CVSS 8.7)

High-severity RCE vulnerability in Claude Code's project file handling. Attackers could craft malicious repository files to execute arbitrary commands on a developer's machine when Claude Code processed the project. Patched in Claude Code 1.9.3.

CVE-2026-21852

Agentic IDE File Exfiltration via Tool Misuse

Vulnerability in multiple agentic IDE integrations allowing prompt-injected instructions to abuse legitimate file-read tools for exfiltrating source code, .env files, and SSH keys to attacker-controlled servers — without triggering standard security controls.

CVE-2026-33017 • CISA KEV • CVSS 9.3

Langflow Unauthenticated Remote Code Execution (Active Exploitation)

Critical unauthenticated RCE in Langflow — the open-source AI workflow builder widely used by vibe coders to prototype LLM pipelines. No authentication required for exploitation. Added to CISA KEV list March 2026 with patch deadline April 8. Actively exploited in the wild. Affects all Langflow versions prior to the March 2026 patch. If you run Langflow locally or self-hosted, treat this as an emergency patch. Source: CISA KEV, NVD.

CVE-2025-32432 • CISA KEV • CVSS 10.0

Craft CMS Code Injection — Maximum Severity

CVSS 10.0 code injection vulnerability in Craft CMS, a common CMS backend choice in AI-generated web projects. Added to CISA KEV with patch deadline April 3. A maximum CVSS score implies that exploitation requires no authentication and no user interaction: a remote attacker can execute arbitrary code on the server. Vibe-coded projects using Craft as their CMS backend should patch immediately or temporarily disable public access.

CVE-2025-54068 • CISA KEV • CVSS 9.8

Laravel Livewire RCE — Nation-State Attribution

Critical RCE in Laravel Livewire with nation-state actor attribution confirmed by threat intelligence sources. Added to CISA KEV with patch deadline April 3. Laravel is one of the most frequently suggested PHP frameworks in AI coding assistants — a large percentage of AI-generated web projects use it. This isn't a theoretical risk: active exploitation with sophisticated threat actors is confirmed. Patch immediately.
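One defensive idea against the file-exfiltration pattern described under CVE-2026-21852 above is to check every path an agent's file-read tool is asked to open. The sketch below is an illustration only, not the patch for any CVE listed here; `is_read_allowed`, `SENSITIVE_NAMES`, and the project-root policy are all assumptions.

```python
from pathlib import Path

# Assumed deny-list of file and directory names an agent should never read.
SENSITIVE_NAMES = {".env", ".ssh", "id_rsa", "id_ed25519", ".aws"}


def is_read_allowed(project_root: str, requested: str) -> bool:
    """Allow reads only inside project_root and outside sensitive names."""
    root = Path(project_root).resolve()
    target = (root / requested).resolve()
    # Reject paths that escape the project root (for example via "../").
    if root != target and root not in target.parents:
        return False
    # Reject any path with a sensitive component, such as ".env" or ".ssh".
    parts = target.relative_to(root).parts
    return not any(part in SENSITIVE_NAMES for part in parts)
```

Exact-name matching is deliberately simple here and would miss variants such as `.env.local`; a real policy would more likely be an allow-list of readable directories.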

AI as Vulnerability Hunter: The Other Side of the Coin

🔎

**Claude Opus 4.6 Finds 22 Firefox CVEs (March 2026):** In a partnership with Mozilla, Anthropic's Claude Opus 4.6 autonomously analyzed Firefox's C++ codebase and identified **22 previously unknown CVEs**. The model found memory safety vulnerabilities, use-after-free bugs, and buffer overflows that human reviewers had missed. This demonstrates a dual reality: the same AI capability that generates vulnerable code can also find vulnerabilities at scale — the question is who uses it first, defenders or attackers.

</div>

The Threat Landscape: Ransomware Meets AI

The broader cybersecurity environment compounds the risk of insecure AI-generated code. As of early 2026, there are 124 active ransomware groups — a 49% year-over-year increase. These groups are increasingly using AI to generate phishing lures, analyze codebases for vulnerabilities, and automate lateral movement. The intersection of AI-generated insecure code and AI-accelerated exploitation creates a compounding threat surface.

The AI Slopageddon: Open Source Fights Back

By early 2026, a new phenomenon emerged that open-source maintainers dubbed the "AI Slopageddon" — a flood of low-quality, AI-generated bug reports, pull requests, and security "findings" overwhelming popular projects:

cURL: Daniel Stenberg reported a deluge of AI-generated vulnerability reports so poor they were "worse than spam" — wasting maintainer time triaging hallucinated CVEs. He began publicly shaming the worst offenders and lobbied HackerOne to penalize AI-slop submissions.
Ghostty: The terminal emulator project implemented explicit policies rejecting AI-generated contributions after a wave of superficially plausible but fundamentally broken PRs.
tldraw: The collaborative whiteboard project documented a pattern of AI-generated issues that described bugs that didn't exist, in code paths that didn't exist, with reproduction steps that couldn't work.

The pattern is consistent: AI tools lower the barrier to appearing competent enough to submit contributions, but the submissions lack the understanding that makes them useful. Maintainers are now spending significant time filtering AI slop instead of building software — an ironic cost of the productivity tools meant to help them.

The $1.5 Trillion Technical Debt Problem

Analysts have warned of a potential $1.5 trillion in technical debt by 2027 from AI-generated code:

41% higher code churn — AI code gets rewritten more often
8x increase in duplicated code blocks (GitClear, 2024)
30% of AI suggestions accepted in professional environments
Forrester: 75% of tech leaders will face moderate-to-severe tech debt by 2026

The "Vibe Coding Hangover"

By late 2025, Fast Company reported senior engineers entering "development hell" maintaining vibe-coded systems:

🧬
Zombie Apps
Functional but unmaintainable

🍝
Spaghetti Code
Works but no coherent structure

🚧
Complexity Ceiling
Can't extend without breaking

😶
Debug Impossibility
Nobody can trace the code they never read


11. The Great Debate

Updated March 6, 2026

The software community is deeply divided. Understanding the strongest arguments on each side helps you form a nuanced view.

#### "It's the natural evolution of abstraction."

Programming languages have always moved toward higher abstraction. Assembly to C to Python. Each level lets developers focus on intent rather than implementation. Natural language is simply the next layer.

#### "It democratizes creation."

Millions of people have software ideas but lack years of training. Vibe coding lets a nurse build a patient tracking app, a teacher build a classroom tool, a small business owner build inventory management. The expansion of who can create software is historically significant.

#### "The speed advantage is transformative."

A prototype in hours instead of weeks. An MVP in days instead of months. The 25% of YC companies with 95% AI code didn't choose vibe coding for ideology — they chose it because they needed to move fast.

#### "Traditional code isn't as reliable as we pretend."

Human-written code has bugs, security vulnerabilities, and technical debt too. AI-generated code may have different failure modes, but the idea that human code is inherently reliable is a myth.

#### "Code you don't understand is code you can't maintain."

Software spending is ~60% maintenance. If nobody understands the codebase, maintenance is impossible. You're not saving time — you're borrowing it from the future at a ruinous interest rate.

#### "Security requires understanding, not just testing."

You can test whether a login form works. You can't easily test whether passwords are properly hashed, session tokens are cryptographically secure, or APIs have rate limiting — unless you read the code.

#### "It creates learned helplessness."

Developers who rely entirely on vibe coding lose fundamental skills. When the AI makes a mistake in a novel way, they have no fallback. Fragile teams build fragile systems.

#### "The economics don't work at scale."

Vibe coding is cheap upfront and expensive later. The $1.5 trillion tech debt projection isn't speculation — it's extrapolation from observed code churn, duplication, and architectural degradation.

#### Context Is Everything

The most reasonable position — and the one supported by data — is that vibe coding is a powerful tool with a specific and limited appropriate scope.

<div class="callout success">
  <div class="callout-icon">&#9989;</div>
  <div class="callout-content">
    **It excels for:** prototyping, validation, personal tools, learning, hackathons, and small-scale applications with limited security requirements.

  </div>
</div>
<div class="callout danger">
  <div class="callout-icon">&#10060;</div>
  <div class="callout-content">
    **It fails for:** production systems at scale, security-sensitive applications, regulated industries, and software that needs multi-year maintenance.

  </div>
</div>
**The winning model in 2026:** Vibe code the prototype, then bring in disciplined engineering for the production system. The companies dominating right now — the ones raising at $10B valuations, the ones with $1B ARR in six months — are all betting that this model scales. And the data supports them.

The critics are not wrong about the risks. But they are wrong about the trajectory. Every objection to vibe coding was once made about high-level languages, about frameworks, about cloud computing. The abstraction always wins. The question is never *whether* but *how*.


12. When to Vibe (and When Not To)

Updated March 6, 2026

🟢 Green Light: Vibe Code Away

- **Prototypes and MVPs**: Validate ideas before investing in production engineering
- **Internal tools**: Dashboards, data scripts, one-off analysis
- **Personal projects**: Only you use it, only you depend on it
- **Learning**: Trying new frameworks, languages, or patterns
- **Hackathons**: Speed is everything, longevity is nothing
- **UI prototyping**: Design exploration and layout testing
- **Automation scripts**: Repetitive tasks that eat your time

🟠 Yellow Light: Proceed with Caution

- **Customer-facing apps**: Vibe the prototype, then review and harden
- **Small SaaS**: Viable for launch, plan for rewrite
- **API integrations**: Fast to build, auth needs human review
- **Mobile apps**: UI can be vibe coded; data/security need attention
- **Team projects**: Works if one person understands the architecture

🔴 Red Light: Don't Vibe Code

- **Financial systems**: Payments, accounting, trading
- **Healthcare**: Patient data, clinical decisions, HIPAA
- **Auth & authz**: Login systems, permissions, tokens
- **Infrastructure**: Server config, network security, deployment
- **Regulated industries**: SOX, PCI-DSS, GDPR compliance
- **Distributed systems**: Microservices, message queues, cache invalidation
- **Cryptography**: Encryption, key management, certificates

💡

**The 80/20 Rule:** For most applications, 80% of the code is boilerplate, UI, and standard patterns that AI handles well. The remaining 20% — authentication, business logic, data integrity, security — deserves human attention. **Vibe code the 80%. Engineer the 20%.**


13. Mastering the Craft: Advanced Techniques

Updated March 6, 2026

If you're going to vibe code, do it well. These techniques separate productive vibe coders from frustrated ones.

The Art of the Initial Prompt

The single most important factor in vibe coding success. Spend 30 minutes writing a comprehensive description before generating a single line of code.

WHAT

What does it do? (user perspective)

WHO

Who uses it? (audience, skill level)

HOW

How should it look? (design, colors)

DATA

What entities? How do they relate?

EDGE

What happens when things go wrong?

TECH

Any framework/language preferences?
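The six questions above can double as a checklist. As a sketch, the helper below refuses to build a prompt until every section has an answer; `SECTIONS` and `build_initial_prompt` are hypothetical names, and the output format is just one reasonable choice.

```python
# The six parts of an initial prompt, in the order listed above.
SECTIONS = ["WHAT", "WHO", "HOW", "DATA", "EDGE", "TECH"]


def build_initial_prompt(answers: dict) -> str:
    """Assemble a prompt from the six sections; fail if any is missing or empty."""
    missing = [name for name in SECTIONS if not answers.get(name, "").strip()]
    if missing:
        raise ValueError(f"missing sections: {', '.join(missing)}")
    return "\n\n".join(f"{name}: {answers[name].strip()}" for name in SECTIONS)
```

The point of the check is the failure case: an empty EDGE or DATA section is usually where a generated app later goes wrong.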

Weak vs. Strong Prompts

❌

```
Build me a todo app
```

✅

```
Build a project management application for freelance designers.

Users: Solo freelancers managing 3-10 client projects.

Core features:
- Project board with columns: Incoming, In Progress, Review, Complete
- Each card: client name, title, deadline, progress bar
- Detail view with task checklist, file links, notes, time log
- Dashboard: projects due this week, hours logged, revenue summary

Design: Clean, minimal. Coral accent (#FF6B6B). Dark mode. Tablet-friendly.
Data: localStorage, structured for future database migration.
Behavior: Drag-and-drop cards. Auto-save. Keyboard shortcuts.
```

Key Patterns

Before requesting any significant change, save your current state. Vibe coding can regress working features while adding new ones.

```
Working: dashboard + project cards + drag-and-drop
-> Save/commit BEFORE adding: task checklist feature
```


    </div>
  </div>

  <div class="expand-section">
    <button class="expand-header" onclick="this.parentElement.classList.toggle('open')">
      <span class="expand-arrow">&#9654;</span> The "Explain Then Generate" Pattern
    </button>
    <div class="expand-body">
      For complex features, ask the AI to explain its approach before generating code:

      ```
Before writing any code, explain how you would implement
real-time collaborative editing in this application.
What approach? What trade-offs? Then implement it.
      ```

      This gives you architectural understanding even in a vibe coding workflow.

</div>

Different models excel at different things:

  - **Claude Opus 4.6 (via Claude Code)**: Complex reasoning, architecture, large codebases, agent teams for parallel work
  - **GPT-5.2 (via Codex CLI)**: Code generation, systematic transformations, sandboxed execution
  - **Gemini 3 Pro / Flash (via Jules or Gemini CLI)**: Multimodal (screenshots, diagrams), open-source CLI with skills system
  - **GitHub Copilot Agent Mode**: Best for working within existing VS Code workflows with agent capabilities
  - **v0**: React/Next.js UI generation
  - **Bolt.new**: Full-stack prototypes you want immediately

**Bad:** "It's broken"

**Good:** "When I click 'Add Task', nothing happens. Console shows: `TypeError: Cannot read property 'push' of undefined at TaskList.addTask (app.js:47)`. This started after I added drag-and-drop."

Include: **action** (what you did), **actual** (what happened), **expected** (what should happen), **error** (verbatim), **context** (what changed recently).
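
One way to make sure none of the five fields gets skipped is to keep them in a small template. The following is a minimal TypeScript sketch, not part of any tool; the `BugReport` type and `formatBugReport` function are hypothetical names introduced here for illustration.

```typescript
// Hypothetical helper: assembles the five fields above into one prompt.
interface BugReport {
  action: string;   // what you did
  actual: string;   // what happened
  expected: string; // what should happen
  error: string;    // verbatim error text, or "" if nothing was shown
  context: string;  // what changed recently
}

function formatBugReport(report: BugReport): string {
  const lines = [
    "Action: " + report.action,
    "Actual: " + report.actual,
    "Expected: " + report.expected,
    // Keep the error verbatim; paraphrasing loses the file and line number.
    "Error: " + (report.error === "" ? "(none shown)" : report.error),
    "Context: " + report.context,
  ];
  return lines.join("\n");
}

const reportText = formatBugReport({
  action: "Clicked 'Add Task'",
  actual: "Nothing happens",
  expected: "A new task appears in the list",
  error: "TypeError: Cannot read property 'push' of undefined at TaskList.addTask (app.js:47)",
  context: "Started after I added drag-and-drop",
});
console.log(reportText);
```

Pasting the resulting text into the AI tool gives it the same information as the "Good" example, with the expected behavior stated explicitly.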


14. Building a Sustainable Workflow

Updated March 6, 2026

Pure vibe coding is fast but fragile. Here's how to build a workflow that's both fast and sustainable.

Phase 1: Vibe and Validate (Days 1-3)

Pure vibe coding for a working prototype

Don't worry about code quality. Just get something that works and demonstrates the core value proposition. Goal: a demo for users, investors, or stakeholders.

Phase 2: Test and Tighten (Days 4-7)

Switch to Level 2-3, review critical paths

Review auth/authz, data storage, payment processing, input validation, and API endpoints. Use AI to generate comprehensive tests.

Phase 3: Harden for Production (Week 2)

Security scanning, proper error handling, monitoring

Run OWASP ZAP or Snyk. Review all DB queries. Add rate limiting, HTTPS, CORS, CSP. Set up logging. Review dependencies for known vulnerabilities.
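
As a rough illustration of the HTTPS, CORS, and CSP items, the sketch below builds the response headers as plain objects so they can be attached in whatever framework the app uses. The function names and the specific header values are assumptions chosen as a conservative starting point, not settings taken from any particular project.

```typescript
// Assumed baseline values; tune each one for the actual application.
function securityHeaders(): Record<string, string> {
  return {
    // Ask browsers to use HTTPS only for the next year.
    "Strict-Transport-Security": "max-age=31536000; includeSubDomains",
    // Only load scripts, styles, and other resources from the app's own origin.
    "Content-Security-Policy": "default-src 'self'",
    // Stop browsers from guessing content types.
    "X-Content-Type-Options": "nosniff",
    // Refuse to be rendered inside any frame.
    "X-Frame-Options": "DENY",
  };
}

// CORS: echo the origin back only when it is on an explicit allowlist.
function corsHeaders(
  requestOrigin: string,
  allowedOrigins: readonly string[]
): Record<string, string> {
  if (allowedOrigins.indexOf(requestOrigin) === -1) {
    return {}; // no CORS headers, so the browser blocks the cross-origin read
  }
  return {
    "Access-Control-Allow-Origin": requestOrigin,
    // Caches must not reuse a response across different origins.
    "Vary": "Origin",
  };
}

// Hypothetical origins, for illustration only.
const allowed = ["https://app.example.com"];
console.log(corsHeaders("https://app.example.com", allowed));
console.log(corsHeaders("https://evil.example.net", allowed)); // {}
```

A strict `default-src 'self'` policy often breaks AI-generated frontends that load fonts or scripts from CDNs, which is a useful signal during this phase: each violation is a dependency worth reviewing.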

Phase 4: Maintain and Evolve (Ongoing)

Document, automate, and plan cleanup sprints

Document architecture. Automated testing on every change. AI agents for routine updates. Human review for architectural and security changes. Periodic cleanup sprints.

### The 80/20 Rule

Vibe code the 80% (UI, boilerplate, standard patterns).

Engineer the 20% (auth, business logic, data integrity, security).


15. The Business of Vibes

Updated March 6, 2026

Vibe coding isn't just changing how software is built. It's changing the economics of software businesses.

The New Cost Structure

Traditional team:

- Hire 3-5 engineers at $150K-$250K each
- 3-6 months to MVP
- **Total cost to first version: $300K-$1M+**

Founder plus AI tools:

- 1 technical founder + AI tools ($20-$500/month)
- 1-4 weeks to MVP
- **Total cost to first version: $500-$5,000**

*This doesn't mean you never need engineers. It means you can validate before investing.*

The New Archetypes

- **The 10-Person $10M Company:** Small teams with AI agents handling work that traditionally required 50+ engineers
- **The AI-Fluent Developer:** Engineers who can specify precisely and evaluate AI output critically
- **Agent-Augmented Teams:** Each human manages 2-5 AI agents working in parallel

The Talent Shift

Companies are increasingly hiring for:

- Specification specialists: translating business requirements into precise AI prompts
- System architects: designing overall structure that AI agents implement
- Security engineers: the human review layer catching what AI misses
- AI-fluent developers: working effectively with and reviewing AI-generated code



16. What Comes Next

Updated March 14, 2026

Now (Early 2026) — Already Happening

- AI-native development is the default. 84% of developers use AI tools. The question has shifted from "should we use AI?" to "how do we use it safely?"
- Agent teams are here. Claude Code's agent teams feature lets multiple AI agents work in parallel on different aspects of a project. This is the beginning of true AI-human hybrid teams.
- The open-source crisis. A January 2026 arXiv paper argues vibe coding threatens the open-source ecosystem: users no longer visit docs, file bugs, or engage with maintainers. Tailwind CSS docs traffic is down 40%. Stack Overflow questions are in structural decline. How maintainers get paid must change.
- Multimodal coding emerges. Voice-driven coding, visual programming interfaces, and screenshot-to-code workflows are entering mainstream tools.
- Consolidation is accelerating. The Windsurf saga (a $3B acquisition attempt, Microsoft blocking, Google poaching, Cognition acquiring) signals a market entering its consolidation phase. Wix acquired Base44 for $80M cash. Anthropic acquired Bun.
- "Agentic engineering" replaces "vibe coding" for professionals. Karpathy himself has moved beyond the term, now advocating for professionals orchestrating AI agents with oversight, not just vibes.
- The IDEsaster wake-up call. 30+ vulnerabilities across every major AI IDE, 24 CVEs, 1.8M developers at risk. AI code is 2.74x more likely to introduce XSS than human code.
- AI reviews AI code. Anthropic launched Code Review (March 9, 2026), a multi-agent system inside Claude Code that automatically catches logic errors in AI-generated code. The "who reviews the reviewer" problem now has a commercial answer.
- Claude becomes the enterprise default. Anthropic committed $100 million to the Claude Partner Network (March 12–13, 2026), formalizing partnerships with Accenture, Deloitte, Cognizant, and Infosys. Enterprise AI standardization is no longer theoretical.
- Anthropic hits $380B valuation, Claude #1 on App Store. After refusing Pentagon weapons AI contracts, Anthropic became the most disruptive company in the world (TIME, March 2026). Claude overtook ChatGPT as the #1 app on Apple's App Store. The safety-first bet paid off.
- Agent documentation tooling matures. DeepLearning.AI (Andrew Ng's team) released Context Hub (March 9, 2026), an open-source CLI tool that gives coding agents real-time access to current API docs, bridging the gap between training cutoffs and fast-moving APIs.

Near-Term (Late 2026)
- Security tooling catches up. Agentic security tools reviewing AI code in real-time. "Move security into the act of creation."
- Standardization emerges. Enterprise governance frameworks for AI-generated code.
- Agent orchestration matures. Specialized agents for frontend, backend, testing, security working in concert under a lead agent.
- Open-source funding models evolve. New models for compensating maintainers whose libraries power AI-generated code.

Medium-Term (2027-2028)
- Natural language becomes a programming interface. Not replacing code, but a legitimate authoring medium.
- AI-human hybrid teams are standard. Every team includes both human engineers and AI agents with defined roles.
- The maintenance problem gets addressed. AI tools that understand, refactor, and improve AI-generated code.
- Specialized domain models. Finance, healthcare, embedded: each gets domain-specific AI models.

Long-Term (2029+)
- Intent-driven development. Describe outcomes, constraints, quality attributes. AI handles the rest.
- Self-healing software. Applications that detect bugs in production and fix themselves.
- The abstraction continues. The role evolves from "code author" to "system designer and quality guardian."

🔮

**The fundamental question:** AI will write an increasing share of the world's software. The question isn't whether — it's how we ensure it's secure, reliable, and maintainable. The developers who thrive will master both modes: vibe code a prototype on Saturday, architect a production system on Monday.

Conclusion

In twelve months, vibe coding went from a tweet to a dictionary entry to a multi-billion-dollar industry. Lovable is valued at $6.6 billion. A vibe-coded startup sold for $80 million. Now, in early 2026, it has become the defining methodology of a new era in software development.

The numbers speak for themselves: Claude Code reached $1B ARR in six months. Cursor surpassed $1B ARR at a $29.3B valuation. Devin surpassed $155M ARR at a $10.2B valuation. GitHub Copilot crossed 4.7 million paid users. These are not experimental products. This is the new infrastructure of software creation.

The promise is real and accelerating: agent teams working in parallel, multimodal coding interfaces, and tools so capable that 75% of Replit's AI users write zero code themselves. The barrier between idea and working software has never been lower.

The challenges are evolving too: the open-source ecosystem faces an existential funding question, security remains a real concern with 69 vulnerabilities found across just 15 AI-built apps, and the "vibe coding hangover" of unmaintainable codebases is a documented phenomenon.

But the answer has become clear. Vibe coding is not a fad to be dismissed or a silver bullet to be worshipped. It is a powerful methodology that belongs in every developer's toolkit. The developers who thrive in 2026 and beyond will be those who master the spectrum — knowing when to vibe code a prototype on Saturday, when to collaborate with agents on Monday, and when to insist on human-reviewed engineering for the critical 20%.

The vibes are real. The exponentials are real. The opportunity is unprecedented.

Embrace the vibes. Engineer the foundations. Build the future.


17. The Complete Prompt Library

230+ production-ready prompts for every stage of AI-native development. Updated monthly.

How to Use This Library

Each prompt is tagged with:

- Difficulty: Beginner / Intermediate / Advanced / Expert
- Tool: Which AI tools it works best with
- Time: Expected completion time
- Category: What type of work it handles

The prompts are designed to be copy-pasted directly. Customize the bracketed [sections] for your specific project.
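
If you reuse the same prompt across projects, filling the brackets by hand gets error-prone. Below is a minimal TypeScript sketch of a placeholder filler; `fillPrompt` is a hypothetical helper written for this guide, and it assumes each placeholder key is the exact text between the brackets. Where brackets are nested, only the innermost pair is treated as a placeholder.

```typescript
// Hypothetical helper: replaces [placeholder] spans and reports any left unfilled.
interface FilledPrompt {
  text: string;
  missing: string[]; // placeholders that had no value supplied
}

function fillPrompt(template: string, values: Record<string, string>): FilledPrompt {
  const missing: string[] = [];
  // Match the innermost [ ... ] spans (no brackets inside the match).
  const text = template.replace(/\[([^\[\]]+)\]/g, (match: string, key: string) => {
    const value = values[key];
    if (value !== undefined) {
      return value;
    }
    missing.push(key);
    return match; // leave the placeholder visible so it is easy to spot
  });
  return { text, missing };
}

const filled = fillPrompt(
  "Build a [type of app] that solves this problem: [describe the pain point in one sentence].",
  { "type of app": "habit tracker" }
);
console.log(filled.text);
// Build a habit tracker that solves this problem: [describe the pain point in one sentence].
console.log(filled.missing);
// [ 'describe the pain point in one sentence' ]
```

Checking `missing` before sending the prompt catches the most common mistake: a leftover bracket that the model then tries to interpret literally.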

Category 1: Project Kickoff Prompts

1.1 The Complete Spec Prompt (Expert)

Tool: Claude Code, Cursor Composer | Time: 30-60 min generation

I'm building [product name], a [type of application] for [target audience].

## Product Vision
[One-sentence description of what this product does and why it matters]

## Target Users
- Primary: [who, age range, technical skill level, key pain point]
- Secondary: [who, why they'd use it]

## Core Features (MVP - Priority Order)
1. [Feature 1]: [User story: "As a [user], I want to [action] so that [benefit]"]
2. [Feature 2]: [User story]
3. [Feature 3]: [User story]

## Data Model
- [Entity 1]: [fields and types]
- [Entity 2]: [fields and types]
- Relationships: [Entity 1] has many [Entity 2], etc.

## Design Direction
- Style: [modern/minimal/playful/corporate/brutalist]
- Color palette: [primary hex, accent hex, background]
- Typography: [sans-serif/serif/mono, reference sites]
- Layout: [single page / multi-page / dashboard / wizard]
- Responsive: [mobile-first / desktop-first / both]

## Technical Stack
- Framework: [Next.js / React / Vue / Svelte / vanilla]
- Styling: [Tailwind / CSS Modules / styled-components]
- Database: [Supabase / Firebase / localStorage / Prisma+PostgreSQL]
- Auth: [Supabase Auth / NextAuth / Clerk / none]
- Hosting: [Vercel / Netlify / Railway]

## What Success Looks Like
- A user can [core workflow] in under [N] steps
- The app loads in under [N] seconds
- [Specific measurable outcome]

## What This Is NOT
- Not a [common misunderstanding]
- Don't include [feature to avoid]
- Don't over-engineer [aspect]

Build the complete MVP. Start with the data model, then core layout, then features in priority order.

1.2 The Weekend Prototype Prompt (Beginner)

Tool: Bolt.new, Lovable, Replit Agent | Time: 15-30 min

Build a [type of app] that solves this problem: [describe the pain point in one sentence].

The main user is [who] and they need to:
1. [Core action 1]
2. [Core action 2]
3. [Core action 3]

Design: Clean and modern. Use [color] as the accent color. Dark mode preferred.
Store data in localStorage.
Make it work on mobile.

Keep it simple. I'd rather have 3 features that work perfectly than 10 that are buggy.

1.3 The "Clone This" Prompt (Intermediate)

Tool: Cursor, Claude Code | Time: 1-2 hours

Build a simplified version of [well-known app, e.g., Trello/Notion/Slack].

Include ONLY these features from the original:
1. [Feature to clone]
2. [Feature to clone]
3. [Feature to clone]

DO NOT include: [features to skip]

Match the general layout and UX patterns of the original but use your own design.
Use [tech stack]. Deploy-ready for Vercel.

Focus on making the core interaction feel as smooth as the original.

1.4 The Landing Page Prompt (Beginner)

Tool: v0, Bolt.new | Time: 15-30 min

Create a conversion-optimized landing page for [product name].

Product: [One line description]
Target audience: [Who would buy this]
Price: [Price point or "Free"]

Sections (in order):
1. Hero: Headline "[compelling headline]", subheadline "[supporting text]", CTA button "[button text]"
2. Problem: 3 pain points the audience faces
3. Solution: How the product solves each pain point (with icons or illustrations)
4. Social proof: [testimonials / stats / logos / "As seen in"]
5. Features: 3-6 key features with brief descriptions
6. Pricing: [pricing tiers if applicable]
7. FAQ: 4-5 common questions with answers
8. Final CTA: Repeat the main call-to-action

Design: Professional, trustworthy. Primary color [hex]. Lots of whitespace.
Mobile-responsive. Fast-loading (no heavy images).
Include Open Graph meta tags for social sharing.

Category 2: Feature Addition Prompts

2.1 Authentication System (Advanced)

Tool: Claude Code, Cursor | Time: 1-2 hours

Add a complete authentication system to this [framework] application.

Requirements:
- Email/password signup with email verification
- Login with session management (HTTP-only cookies, not localStorage)
- Password requirements: minimum 8 chars, 1 uppercase, 1 number, 1 special char
- "Forgot password" flow with email reset link (expires in 1 hour)
- "Remember me" option (extends session to 30 days, default is 24 hours)
- Rate limiting: max 5 failed attempts per IP per 15 minutes, then 30-min lockout
- CSRF protection on all auth forms
- Secure headers: HSTS, X-Content-Type-Options, X-Frame-Options

Auth provider: [Supabase Auth / NextAuth / Clerk / custom JWT]

Protected routes: [list routes that require auth]
Public routes: [list routes that don't require auth]

After login, redirect to [dashboard/home/previous page].
Show clear error messages for: wrong password, account not found, account locked, email not verified.

Write tests for: successful login, failed login, signup validation, session expiry, rate limiting.
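
When reviewing what the AI generates for the rate-limiting requirement, it helps to know what the logic should look like. This is a minimal TypeScript sketch under stated assumptions: state lives in memory (a real deployment would need a shared store), the fifth failure inside the window triggers the lockout, and `LoginLimiter` is a hypothetical name rather than an API of any auth provider.

```typescript
const MAX_FAILURES = 5;
const WINDOW_MS = 15 * 60 * 1000;  // 15-minute counting window
const LOCKOUT_MS = 30 * 60 * 1000; // 30-minute lockout

interface AttemptRecord {
  failures: number[];  // timestamps (ms) of recent failed attempts
  lockedUntil: number; // 0 when not locked
}

class LoginLimiter {
  private records = new Map<string, AttemptRecord>();

  // True while the IP is inside its lockout period.
  isLocked(ip: string, now: number): boolean {
    const rec = this.records.get(ip);
    return rec !== undefined && now < rec.lockedUntil;
  }

  // Records a failed login; returns true if this failure triggered a lockout.
  recordFailure(ip: string, now: number): boolean {
    const rec = this.records.get(ip) ?? { failures: [], lockedUntil: 0 };
    // Drop failures that have aged out of the 15-minute window.
    rec.failures = rec.failures.filter((t) => now - t < WINDOW_MS);
    rec.failures.push(now);
    let locked = false;
    if (rec.failures.length >= MAX_FAILURES) {
      rec.lockedUntil = now + LOCKOUT_MS;
      rec.failures = [];
      locked = true;
    }
    this.records.set(ip, rec);
    return locked;
  }

  // A successful login clears the history for that IP.
  recordSuccess(ip: string): void {
    this.records.delete(ip);
  }
}
```

Passing `now` in as a parameter instead of calling `Date.now()` inside the class keeps the rate-limiting tests requested in the prompt deterministic.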

2.2 Payment Integration (Advanced)

Tool: Claude Code | Time: 2-3 hours

Add [Stripe / Paddle] subscription billing to this application.

Products:
- Free tier: [what's included, usage limits]
- Pro tier: $[price]/month - [what's included]
- [Optional: Enterprise tier: $[price]/month - [what's included]]

Implementation:
1. Pricing page showing all tiers with feature comparison
2. Checkout flow: user selects plan -> [Stripe Checkout / Paddle Overlay] -> redirect to success page
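3. Webhook handler for subscription created, updated, and cancelled events plus failed invoice payments (Stripe names these customer.subscription.created, customer.subscription.updated, customer.subscription.deleted, and invoice.payment_failed; check the provider docs for Paddle's equivalents)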
4. User dashboard showing: current plan, next billing date, usage this period, upgrade/downgrade buttons
5. Usage tracking: count [what metric] per billing period, enforce limits on free tier
6. Graceful downgrade: when subscription cancelled, access continues until period end
7. Failed payment handling: 3 retry attempts over 7 days, then downgrade to free

Store subscription status in [Supabase / database].
Add middleware to check subscription status on protected API routes.
Show upgrade prompts when free users hit limits.

Environment variables needed:
- [STRIPE_SECRET_KEY / PADDLE_API_KEY]
- [STRIPE_WEBHOOK_SECRET / PADDLE_WEBHOOK_SECRET]
- [STRIPE_PRO_PRICE_ID / PADDLE_PRO_PRICE_ID]
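
The graceful-downgrade and failed-payment rules (items 6 and 7) are where generated billing code most often goes wrong, so it is worth having a reference for the intended behavior. The sketch below is a simplified model, not Stripe's or Paddle's data shape: the `Subscription` fields and the `effectivePlan` function are hypothetical, and it assumes `failedRetries` counts retry attempts that have already failed.

```typescript
type Plan = "free" | "pro";
type SubscriptionStatus = "active" | "cancelled" | "past_due";

// Hypothetical record as it might be stored in the app's own database.
interface Subscription {
  plan: Plan;
  status: SubscriptionStatus;
  currentPeriodEnd: number; // ms timestamp when the paid period ends
  failedRetries: number;    // payment retries that have failed so far
}

const MAX_PAYMENT_RETRIES = 3;

// Decides which plan's limits the middleware should enforce right now.
function effectivePlan(sub: Subscription, now: number): Plan {
  if (sub.plan === "free") {
    return "free";
  }
  if (sub.status === "cancelled") {
    // Graceful downgrade: paid access continues until the period ends.
    return now < sub.currentPeriodEnd ? "pro" : "free";
  }
  if (sub.status === "past_due") {
    // Keep access while retries remain; downgrade after the third failure.
    return sub.failedRetries < MAX_PAYMENT_RETRIES ? "pro" : "free";
  }
  return "pro"; // status is "active"
}

const cancelledSub: Subscription = {
  plan: "pro",
  status: "cancelled",
  currentPeriodEnd: Date.UTC(2026, 3, 1),
  failedRetries: 0,
};
console.log(effectivePlan(cancelledSub, Date.UTC(2026, 2, 20))); // pro
console.log(effectivePlan(cancelledSub, Date.UTC(2026, 3, 2)));  // free
```

Keeping this decision in one pure function means the subscription middleware and the upgrade prompts cannot disagree about what a user is entitled to.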

2.3 Real-Time Features (Advanced)

Tool: Claude Code, Cursor | Time: 2-4 hours

Add real-time [collaboration / notifications / live updates] to this application.

What should update in real-time:
- [Specific data that changes: "new messages", "task status changes", "user presence"]

Technology: [Supabase Realtime / Socket.io / Pusher / Server-Sent Events]

Requirements:
- Changes made by User A appear for User B within [1 second / 500ms]
- Show [typing indicators / presence dots / live cursors] for active users
- Handle disconnection gracefully: show "reconnecting..." banner, auto-reconnect with exponential backoff
- Deduplicate messages that arrive during reconnection
- Use persistent connections by default instead of polling
- Fall back to polling only if the persistent connection cannot be established

Optimize for:
- [N] concurrent users per [room / document / channel]
- Messages/updates of approximately [size] bytes each
- Mobile networks with intermittent connectivity

Show connection status indicator (green dot = connected, yellow = reconnecting, red = offline).
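
Two of the requirements above, exponential backoff and deduplication, are small enough to check by hand in the generated code. The TypeScript sketch below shows one plausible shape for each. The default delays are assumptions, jitter is left out to keep the output predictable, and deduplication assumes the server assigns every message a unique `id`.

```typescript
// Delay before reconnect attempt N: base, 2x base, 4x base, ... capped at maxMs.
function backoffDelay(attempt: number, baseMs: number = 500, maxMs: number = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Drops messages whose id has already been processed, for example when the
// server replays recent messages after a reconnect.
class MessageDeduper {
  private seen = new Set<string>();

  // Returns true the first time an id is seen, false for repeats.
  accept(id: string): boolean {
    if (this.seen.has(id)) {
      return false;
    }
    this.seen.add(id);
    return true;
  }
}

console.log([0, 1, 2, 3, 10].map((n) => backoffDelay(n)));
// [ 500, 1000, 2000, 4000, 30000 ]

const deduper = new MessageDeduper();
console.log(deduper.accept("msg-1")); // true
console.log(deduper.accept("msg-1")); // false
```

In production the set of seen ids needs a bound (for example, only the most recent few hundred), and adding random jitter to the delay avoids every client reconnecting at the same instant.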

2.4 Search and Filter System (Intermediate)

Tool: Any | Time: 30-60 min

Add search and filtering to the [items/products/posts] list in this application.

Search:
- Full-text search across: [field 1], [field 2], [field 3]
- Debounced input (300ms delay before searching)
- Show "X results for 'query'" count
- Highlight matching text in results
- Empty state: "No results for 'query'. Try different keywords."

Filters:
- [Filter 1]: [type: dropdown/checkbox/range] with options [list options]
- [Filter 2]: [type] with options [list options]
- [Filter 3]: [type] with options [list options]
- Date range: from/to date pickers
- Sort by: [option 1 / option 2 / option 3], ascending/descending

Behavior:
- Filters combine with AND logic (search + filter1 + filter2)
- Show active filter count as badge on filter button
- "Clear all filters" button when any filter is active
- URL params reflect current filters (shareable filtered views)
- Persist last-used filters in localStorage

Performance:
- Client-side filtering for under 1000 items
- Server-side (API) filtering for larger datasets
- Show loading skeleton while filtering
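
The "filters combine with AND logic" rule is easy to state and easy for generated code to get subtly wrong, for example by treating an unset filter as a mismatch. Here is a minimal client-side sketch in TypeScript. The `Item` and `Filters` shapes are hypothetical, and dates are assumed to be ISO `YYYY-MM-DD` strings so that plain string comparison orders them correctly.

```typescript
// Hypothetical item shape for illustration.
interface Item {
  title: string;
  description: string;
  category: string;
  createdAt: string; // ISO date, e.g. "2026-01-15"
}

// Every field is optional: an unset filter matches everything.
interface Filters {
  query?: string;
  category?: string;
  from?: string;
  to?: string;
}

function applyFilters(items: readonly Item[], f: Filters): Item[] {
  const q = (f.query ?? "").trim().toLowerCase();
  return items.filter((item) => {
    const matchesQuery =
      q === "" ||
      item.title.toLowerCase().indexOf(q) !== -1 ||
      item.description.toLowerCase().indexOf(q) !== -1;
    const matchesCategory = f.category === undefined || item.category === f.category;
    const matchesFrom = f.from === undefined || item.createdAt >= f.from;
    const matchesTo = f.to === undefined || item.createdAt <= f.to;
    // AND logic: an item must pass every active filter.
    return matchesQuery && matchesCategory && matchesFrom && matchesTo;
  });
}

const sampleItems: Item[] = [
  { title: "Invoice template", description: "Monthly billing", category: "billing", createdAt: "2026-01-15" },
  { title: "Invoice reminder", description: "Overdue notice", category: "email", createdAt: "2026-02-01" },
  { title: "Logo draft", description: "First concept", category: "design", createdAt: "2025-12-20" },
];
console.log(applyFilters(sampleItems, { query: "invoice", category: "billing" }).length); // 1
```

This covers the under-1000-items case from the prompt; for larger datasets the same `Filters` object would be serialized into the API request instead.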

Category 3: UI/UX Prompts

3.1 Dashboard Layout (Intermediate)

Tool: v0, Cursor | Time: 30-60 min

Build a dashboard layout for [application type].

Layout:
- Left sidebar: navigation menu (collapsible on mobile, icons + labels)
- Top bar: user avatar + dropdown menu, notification bell with count badge, search bar
- Main content area: responsive grid that adapts from 1 to 3 columns

Sidebar navigation items:
1. [Icon] Dashboard (home)
2. [Icon] [Section 1]
3. [Icon] [Section 2]
4. [Icon] [Section 3]
5. [Icon] Settings
6. [Icon] Help

Dashboard home shows:
- Row 1: 4 stat cards ([Metric 1]: [value], [Metric 2]: [value], etc.)
- Row 2: Main chart (line chart showing [metric] over [time period]) + recent activity feed
- Row 3: Quick actions grid (3-4 action cards with icons)

Design: [light/dark] theme. Accent color: [hex].
Use Tailwind CSS. Smooth transitions on sidebar toggle.
Mobile: sidebar becomes a hamburger drawer overlay.

3.2 Form with Validation (Beginner)

Tool: Any | Time: 15-30 min

Build a multi-step form for [purpose, e.g., "user onboarding", "job application", "event registration"].

Steps:
1. [Step name]: Fields: [field1 (type, required?), field2, field3]
2. [Step name]: Fields: [field4, field5, field6]
3. [Step name]: Review all entered data + submit button

Validation:
- Email: valid format + show error immediately on blur
- Phone: format as (XXX) XXX-XXXX as user types
- Required fields: show red border + error message
- [Custom validation]: [describe rule]

UX:
- Progress indicator showing current step (1/3, 2/3, 3/3)
- "Back" and "Next" buttons (Next disabled until current step is valid)
- "Save as draft" option (localStorage)
- Smooth slide transition between steps
- Auto-focus first field on each step
- Show success animation on submit

Accessible: proper labels, aria attributes, keyboard navigation (Tab through fields, Enter to submit).
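
The as-you-type phone formatting is a good spot check for the generated form, because partial input has to format sensibly at every length. A minimal TypeScript sketch follows, assuming 10-digit US-style numbers; `formatPhone` is a hypothetical helper name.

```typescript
// Formats whatever digits have been typed so far as (XXX) XXX-XXXX.
function formatPhone(input: string): string {
  // Keep digits only and ignore anything past the tenth digit.
  const digits = input.replace(/\D/g, "").slice(0, 10);
  if (digits.length === 0) {
    return "";
  }
  if (digits.length <= 3) {
    return "(" + digits;
  }
  if (digits.length <= 6) {
    return "(" + digits.slice(0, 3) + ") " + digits.slice(3);
  }
  return "(" + digits.slice(0, 3) + ") " + digits.slice(3, 6) + "-" + digits.slice(6);
}

console.log(formatPhone("555"));          // (555
console.log(formatPhone("5551"));         // (555) 1
console.log(formatPhone("5551234"));      // (555) 123-4
console.log(formatPhone("555-123-4567")); // (555) 123-4567
```

Because the function strips non-digits first, it can be applied to the field's current value on every input event without double-formatting.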

3.3 Data Table (Intermediate)

Tool: Any | Time: 30-60 min

Build a data table component for displaying [data type, e.g., "user list", "order history", "inventory"].

Columns:
1. [Column]: [type: text/number/date/status/avatar] - [width: narrow/medium/wide]
2. [Column]: [type] - [width]
3. [Column]: [type] - [width]
4. Actions: Edit, Delete, [custom action]

Features:
- Sort by clicking column headers (asc/desc, show arrow indicator)
- Select rows with checkboxes (select all, bulk actions)
- Inline editing: click cell to edit, Enter to save, Escape to cancel
- Pagination: 10/25/50 per page selector, page numbers, total count
- Responsive: on mobile, switch to card layout (one card per row)
- Empty state: illustration + "No [items] yet. Create your first one."
- Loading state: skeleton rows while data loads

Styling: Clean borders, alternating row colors, hover highlight.
Status column: colored badges (green=active, yellow=pending, red=inactive).

Category 4: API and Backend Prompts

4.1 REST API Scaffold (Advanced)

Tool: Claude Code | Time: 1-2 hours

Build a REST API for [application] with these resources:

Resources:
1. [Resource 1, e.g., "Users"]:
   - Fields: [id, name, email, role, created_at, updated_at]
   - Endpoints: GET /api/users, GET /api/users/:id, POST /api/users, PUT /api/users/:id, DELETE /api/users/:id

2. [Resource 2]:
   - Fields: [list fields]
   - Endpoints: [list CRUD endpoints]
   - Relationships: [belongs_to Resource1, has_many Resource3]

Response format (all endpoints):
Success: { data: {...}, meta: { page, limit, total } }
Error: { error: { code: "VALIDATION_ERROR", message: "Email is required", details: [...] } }

Requirements:
- Input validation with descriptive error messages
- Pagination: ?page=1&limit=20 (default limit=20, max=100)
- Filtering: ?status=active&role=admin
- Sorting: ?sort=created_at&order=desc
- Rate limiting: 100 requests per minute per IP
- CORS configured for [allowed origins]
- Request logging (method, path, status, duration)

Auth: Bearer token in Authorization header.
- Public endpoints: [list]
- Authenticated endpoints: [list]
- Admin-only endpoints: [list]

Framework: [Next.js API routes / Express / Fastify / Hono]
Database: [Supabase / Prisma / Drizzle]
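
To make the pagination rule and the success envelope concrete, here is a minimal TypeScript sketch. It is framework-agnostic on purpose: `parsePageParams` and `paginate` are hypothetical helpers, the query is assumed to arrive as a plain string map, and slicing an in-memory array stands in for what would normally be a LIMIT/OFFSET query.

```typescript
const DEFAULT_LIMIT = 20;
const MAX_LIMIT = 100;

interface PageParams {
  page: number;
  limit: number;
}

// Turns raw ?page=&limit= strings into safe numbers (default 20, max 100).
function parsePageParams(query: Record<string, string | undefined>): PageParams {
  const page = parseInt(query["page"] ?? "", 10);
  const limit = parseInt(query["limit"] ?? "", 10);
  return {
    page: isNaN(page) || page < 1 ? 1 : page,
    limit: isNaN(limit) || limit < 1 ? DEFAULT_LIMIT : Math.min(limit, MAX_LIMIT),
  };
}

// Success envelope from the prompt: { data, meta: { page, limit, total } }.
interface Envelope<T> {
  data: T[];
  meta: { page: number; limit: number; total: number };
}

function paginate<T>(rows: readonly T[], params: PageParams): Envelope<T> {
  const start = (params.page - 1) * params.limit;
  return {
    data: rows.slice(start, start + params.limit),
    meta: { page: params.page, limit: params.limit, total: rows.length },
  };
}

const rows = Array.from({ length: 45 }, (_, i) => ({ id: i + 1 }));
const pageThree = paginate(rows, parsePageParams({ page: "3" }));
console.log(pageThree.meta);        // { page: 3, limit: 20, total: 45 }
console.log(pageThree.data.length); // 5
```

Clamping `limit` on the server matters even when the frontend never sends large values, since anyone can call the endpoint directly.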

4.2 Database Schema Design (Advanced)

Tool: Claude Code | Time: 30-60 min

Design a database schema for [application type].

Entities:
1. [Entity 1]: [description of what it represents]
   - Required fields: [list]
   - Optional fields: [list]
   - Unique constraints: [list]

2. [Entity 2]: [description]
   - Fields: [list]
   - References: [Entity 1] (one-to-many / many-to-many)

Business rules:
- [Rule 1, e.g., "A user can only have one active subscription"]
- [Rule 2, e.g., "Orders must have at least one line item"]
- [Rule 3, e.g., "Soft delete for users, hard delete for sessions"]

Generate:
1. SQL migration file with CREATE TABLE statements
2. Indexes for common query patterns: [list queries, e.g., "find users by email", "get orders by date range"]
3. Row-level security policies (if Supabase)
4. Seed data: 10-20 realistic sample records per table
5. TypeScript types matching the schema

Optimize for: [read-heavy / write-heavy / balanced]
Database: [PostgreSQL / MySQL / SQLite]

Category 5: Testing and Quality Prompts

5.1 Comprehensive Test Suite (Advanced)

Tool: Claude Code | Time: 2-4 hours

Write a comprehensive test suite for this [application/module].

Testing framework: [Vitest / Jest / Playwright / Cypress]

Coverage targets:
- Unit tests: all utility functions and business logic (aim for 90%+)
- Integration tests: all API endpoints (happy path + error cases)
- Component tests: all interactive components (user events + state changes)
- E2E tests: [list 3-5 critical user flows]

For each test, include:
- Clear descriptive name: "should [expected behavior] when [condition]"
- Arrange-Act-Assert structure
- Realistic test data (not "test123" or "foo bar")
- Error case coverage (invalid input, timeout, auth failure)
- Edge cases ([list specific edge cases for this app])

Mock strategy:
- External APIs: mock with [MSW / jest.mock / vi.mock]
- Database: use [test database / in-memory / fixtures]
- Time-dependent tests: mock Date.now()
- File system: use temp directories

Run the complete suite after writing. Fix any failures.
Generate a coverage report.

5.2 Security Audit Prompt (Expert)

Tool: Claude Code | Time: 1-2 hours

Perform a security audit of this codebase. Check for:

1. Authentication & Authorization:
   - Are passwords hashed with bcrypt/argon2 (not MD5/SHA)?
   - Are sessions stored securely (HTTP-only cookies, not localStorage)?
   - Is CSRF protection implemented on state-changing requests?
   - Are API keys and secrets in environment variables (not hardcoded)?
   - Are authorization checks on every protected endpoint (not just frontend)?

2. Input Validation:
   - Is all user input validated server-side (not just client-side)?
   - Are SQL queries parameterized (no string concatenation)?
   - Is HTML output sanitized to prevent XSS?
   - Are file uploads validated (type, size, name)?
   - Are URL redirects validated against an allowlist?

3. Data Protection:
   - Is sensitive data encrypted at rest?
   - Is HTTPS enforced (HSTS headers)?
   - Are API responses filtered (no password hashes, internal IDs leaking)?
   - Is PII handled according to GDPR/CCPA requirements?
   - Are error messages generic (no stack traces to users)?

4. Infrastructure:
   - Are dependencies up to date (no known CVEs)?
   - Are security headers set (CSP, X-Frame-Options, etc.)?
   - Is rate limiting configured on auth and API endpoints?
   - Are CORS origins restricted (not "*")?
   - Are logs sanitized (no passwords or tokens in logs)?

For each issue found:
- Severity: Critical / High / Medium / Low
- Location: file path and line number
- Description: what's wrong and why it matters
- Fix: specific code change to resolve it
- Test: how to verify the fix works

Prioritize fixes by severity. Implement Critical and High fixes immediately.
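
If you ask the AI to return its findings as structured data instead of prose, triage becomes mechanical. The sketch below mirrors the five fields requested above; the `Finding` type and both function names are hypothetical, and the sample findings are invented for illustration.

```typescript
type Severity = "Critical" | "High" | "Medium" | "Low";

// One record per issue, matching the fields the prompt asks for.
interface Finding {
  severity: Severity;
  location: string;    // file path and line number
  description: string; // what's wrong and why it matters
  fix: string;         // specific code change
  test: string;        // how to verify the fix
}

const SEVERITY_RANK: Record<Severity, number> = {
  Critical: 0,
  High: 1,
  Medium: 2,
  Low: 3,
};

// Returns a new array ordered from most to least severe.
function prioritize(findings: readonly Finding[]): Finding[] {
  return findings.slice().sort((a, b) => SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity]);
}

// Critical and High findings are the ones to fix before anything else ships.
function fixImmediately(findings: readonly Finding[]): Finding[] {
  return prioritize(findings).filter((f) => f.severity === "Critical" || f.severity === "High");
}

// Invented sample data.
const sampleFindings: Finding[] = [
  { severity: "Low", location: "src/log.ts:12", description: "Verbose logging", fix: "Lower log level", test: "Check log output" },
  { severity: "Critical", location: "src/db.ts:40", description: "SQL built by string concatenation", fix: "Use parameterized query", test: "Send a quote character in input" },
  { severity: "High", location: "src/auth.ts:88", description: "Session token in localStorage", fix: "Move to HTTP-only cookie", test: "Inspect browser storage" },
];
console.log(fixImmediately(sampleFindings).map((f) => f.location));
// [ 'src/db.ts:40', 'src/auth.ts:88' ]
```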

Category 6: Refactoring and Optimization Prompts

6.1 Performance Optimization (Advanced)

Tool: Claude Code | Time: 1-2 hours

This application is slow. Analyze and optimize performance.

Symptoms:
- [Specific symptom: "initial page load takes 4+ seconds"]
- [Specific symptom: "scrolling is janky with 500+ items"]
- [Specific symptom: "API response takes 2+ seconds"]

Investigate and fix:
1. Bundle size: analyze with [next/bundle-analyzer or similar], remove unused dependencies, implement code splitting
2. Rendering: identify unnecessary re-renders, add React.memo/useMemo/useCallback where appropriate
3. Data fetching: implement caching, pagination, reduce payload sizes
4. Images: lazy load below-fold images, use next/image or responsive srcset, serve WebP
5. Database: add missing indexes, optimize N+1 queries, implement connection pooling
6. Network: enable gzip/brotli, set proper cache headers, minimize HTTP requests

For each optimization:
- Before: [metric measurement]
- After: [expected improvement]
- Method: [specific code change]

Run Lighthouse audit before and after. Target scores: Performance >90, Accessibility >95.

6.2 Code Cleanup (Intermediate)

Tool: Claude Code, Cursor | Time: 1-2 hours

Clean up this codebase without changing any functionality.

Tasks:
1. Remove dead code: unused imports, unreachable functions, commented-out blocks
2. Consolidate duplicated logic: find similar code patterns and extract shared utilities
3. Fix naming: rename variables/functions that don't describe their purpose
4. Organize file structure: group related files, consistent naming conventions
5. Add TypeScript types: replace 'any' with proper types, add interfaces for data shapes
6. Fix linting issues: run [ESLint / Prettier] and fix all warnings/errors
7. Update dependencies: check for outdated packages, update non-breaking versions
8. Add JSDoc comments to exported functions (not internal helpers)

Rules:
- Make small, focused commits (one type of change per commit)
- Run tests after each change to ensure nothing breaks
- Don't refactor code that has pending changes or open PRs
- Keep the diff readable: don't auto-format unrelated files

Category 7: Deployment and DevOps Prompts

7.1 Production Deployment Checklist (Advanced)

Tool: Claude Code | Time: 1-2 hours

Prepare this application for production deployment on [Vercel / AWS / Railway].

Pre-deployment checklist:
1. Environment variables: create .env.example with all required vars (no values), verify all are set in [hosting platform]
2. Error tracking: set up [Sentry / LogRocket / Bugsnag] for runtime error monitoring
3. Analytics: add [Vercel Analytics / Google Analytics / Plausible] for usage tracking
4. SEO: verify meta tags, Open Graph, Twitter cards, sitemap.xml, robots.txt
5. Performance: run Lighthouse, fix any scores below 80
6. Security: run npm audit, fix critical/high vulnerabilities, verify security headers
7. Database: verify connection pooling, set up backups if applicable
8. Caching: configure CDN caching headers, implement stale-while-revalidate for API routes
9. Monitoring: set up uptime monitoring (e.g., UptimeRobot, Checkly)
10. Domain: configure custom domain, SSL, www redirect

Create a deployment script or CI/CD pipeline that:
- Runs tests
- Runs linter
- Builds the application
- Deploys to [platform]
- Runs smoke tests against the deployed URL
- Notifies [Slack / Discord / email] on success/failure
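
Checklist item 1 (environment variables) can be enforced automatically so that a deploy fails fast instead of crashing at runtime. The TypeScript sketch below reads the variable names out of `.env.example` contents and reports which ones are unset. Both helper names are hypothetical, and the environment is passed in as a plain object so the logic stays testable.

```typescript
// Extracts variable names from .env.example text, skipping comments and blanks.
function parseEnvExample(contents: string): string[] {
  return contents
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line !== "" && line.charAt(0) !== "#")
    .map((line) => {
      const eq = line.indexOf("=");
      return (eq === -1 ? line : line.slice(0, eq)).trim();
    });
}

// Returns the required names that are missing or empty in the environment.
function missingEnvVars(
  required: readonly string[],
  env: Record<string, string | undefined>
): string[] {
  return required.filter((key) => {
    const value = env[key];
    return value === undefined || value === "";
  });
}

const exampleFile = "# Billing\nSTRIPE_SECRET_KEY=\n\n# Database\nDATABASE_URL=\n";
const requiredVars = parseEnvExample(exampleFile);
console.log(requiredVars); // [ 'STRIPE_SECRET_KEY', 'DATABASE_URL' ]

// In a Node.js startup script the second argument would be process.env.
console.log(missingEnvVars(requiredVars, { STRIPE_SECRET_KEY: "placeholder" }));
// [ 'DATABASE_URL' ]
```

Running a check like this as the first pipeline step, and exiting non-zero when the list is non-empty, keeps `.env.example` honest as the single source of truth.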

Category 8: AI Agent Orchestration Prompts (Expert)

8.1 Multi-Agent Task Decomposition

Tool: Claude Code (subagents) | Time: 2-4 hours

I need to [describe large task, e.g., "add a complete user profile system with settings, avatar upload, activity history, and notification preferences"].

Decompose this into subtasks that can be worked on in parallel:

1. Data layer: schema changes, migrations, API endpoints
2. UI components: form components, display components, layouts
3. Business logic: validation rules, permission checks, notification triggers
4. Tests: unit tests, integration tests, E2E tests

For each subtask:
- Define the interface/contract (inputs, outputs, data shapes)
- List dependencies on other subtasks
- Identify which can run in parallel vs. must be sequential

Then implement each subtask, integrating them at the defined interfaces.
Run the full test suite after integration to catch any contract mismatches.
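
The "parallel vs. must be sequential" step is a dependency-ordering problem. As an illustration, the TypeScript sketch below groups subtasks into batches where everything in a batch can run at the same time. The `Subtask` shape and `planBatches` function are hypothetical, and the dependencies in the example are assumptions made for this demo, not something the prompt prescribes.

```typescript
interface Subtask {
  id: string;
  dependsOn: string[]; // ids of subtasks that must finish first
}

// Groups subtasks into ordered batches; tasks within a batch are independent.
function planBatches(tasks: readonly Subtask[]): string[][] {
  const done: string[] = [];
  let pending = tasks.slice();
  const batches: string[][] = [];

  while (pending.length > 0) {
    // A task is ready when every dependency has already completed.
    const ready = pending.filter((t) =>
      t.dependsOn.every((dep) => done.indexOf(dep) !== -1)
    );
    if (ready.length === 0) {
      // Nothing can start: the remaining tasks depend on each other or on unknown ids.
      throw new Error(
        "Circular or unknown dependency among: " + pending.map((t) => t.id).join(", ")
      );
    }
    const ids = ready.map((t) => t.id);
    batches.push(ids);
    for (const id of ids) {
      done.push(id);
    }
    pending = pending.filter((t) => ids.indexOf(t.id) === -1);
  }
  return batches;
}

// Assumed dependencies for the four subtasks listed in the prompt.
const plan = planBatches([
  { id: "data-layer", dependsOn: [] },
  { id: "ui-components", dependsOn: ["data-layer"] },
  { id: "business-logic", dependsOn: ["data-layer"] },
  { id: "tests", dependsOn: ["ui-components", "business-logic"] },
]);
console.log(plan);
// [ [ 'data-layer' ], [ 'ui-components', 'business-logic' ], [ 'tests' ] ]
```

If the interfaces are agreed up front, more of the work can move into the first batch, which is the reason the prompt asks for contracts before implementation.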

8.2 Codebase Analysis and Improvement Plan

Tool: Claude Code | Time: 1-2 hours

Analyze this entire codebase and create an improvement plan.

Evaluate:
1. Architecture: Is the structure scalable? Are concerns properly separated?
2. Code quality: Consistency, readability, duplication, complexity (cyclomatic)
3. Error handling: Are errors caught, logged, and presented well?
4. Testing: Coverage, quality of tests, missing edge cases
5. Security: Common vulnerabilities (OWASP Top 10 applicable ones)
6. Performance: Obvious bottlenecks, missing optimizations
7. Developer experience: Build time, hot reload, debugging ease

Output:
- Score each category 1-10 with specific evidence
- Top 5 improvements ranked by impact/effort ratio
- Specific action items for each improvement
- Estimated time for each action item

Don't fix anything yet. Just analyze and plan.

Category 9: Content and Data Prompts

9.1 Seed Data Generator (Beginner)

Tool: Any | Time: 15-30 min

Generate realistic seed data for this application.

Data needed:
- [N] [entity type, e.g., "users"] with: [fields]
- [N] [entity type, e.g., "products"] with: [fields]
- [N] [entity type, e.g., "orders"] with: [fields]

Rules:
- Use realistic names (not "Test User 1")
- Dates spread across the last [time period]
- Prices/amounts in realistic ranges for [industry]
- Status distribution: [e.g., "60% active, 30% pending, 10% cancelled"]
- Include edge cases: [e.g., "one user with no orders, one product with 0 stock"]
- Relationships should be consistent (orders reference real user IDs and product IDs)

Output format: [JSON / SQL INSERT statements / TypeScript constants / CSV]

9.2 API Documentation Generator (Intermediate)

Tool: Claude Code | Time: 30-60 min

Generate comprehensive API documentation for all endpoints in this application.

For each endpoint, document:
- Method and path (e.g., GET /api/users/:id)
- Description (one sentence)
- Authentication required? (yes/no, what type)
- Request: headers, query params, body schema with types and validation rules
- Response: status codes, body schema for success and each error case
- Example request (curl command)
- Example response (JSON)

Format: [Markdown / OpenAPI 3.0 spec / Swagger]
Include a table of contents.
Group endpoints by resource.
Add rate limiting info if applicable.

Category 10: Platform-Specific Prompts

10.1 Chrome Extension (Advanced)

Tool: Claude Code | Time: 2-4 hours

Build a Chrome Extension (Manifest V3) that [core functionality].

Features:
- Popup: [describe popup UI and what it shows]
- Content script: [what it does on web pages, e.g., "highlights [elements]"]
- Background service worker: [what it handles, e.g., "API calls, storage sync"]
- Options page: [settings the user can configure]

Permissions needed: [activeTab, storage, tabs, etc. - minimize permissions]

Storage:
- Use chrome.storage.sync for: [settings that sync across devices]
- Use chrome.storage.local for: [data that stays local]

Communication:
- Content script <-> Background: chrome.runtime.sendMessage
- Popup <-> Background: chrome.runtime.sendMessage for requests, chrome.storage for shared state

Include:
- manifest.json with all required fields
- Icon set (16x16, 48x48, 128x128) - use simple colored SVG converted to PNG
- README with installation instructions (load unpacked)
- Privacy policy text (required for Chrome Web Store submission)

Test on these sites: [list 3-5 target websites]
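
For orientation, this is roughly the manifest shape the prompt asks for. It is a sketch, not a complete manifest: the keys are standard Manifest V3 fields, but every file name (`popup.html`, `background.js`, `content.js`, `options.html`, the icon paths) is a hypothetical placeholder, and `buildManifest` is an illustrative helper rather than part of any Chrome API.

```typescript
// Subset of Manifest V3 keys that the prompt above relies on.
type ManifestV3 = {
  manifest_version: 3;
  name: string;
  version: string;
  permissions: string[];
  action: { default_popup: string };
  background: { service_worker: string };
  content_scripts: { matches: string[]; js: string[] }[];
  options_page: string;
  icons: { "16": string; "48": string; "128": string };
};

// File names below are hypothetical placeholders for your own build output.
function buildManifest(name: string, version: string, permissions: string[], matches: string[]): ManifestV3 {
  return {
    manifest_version: 3,
    name,
    version,
    // De-duplicate and sort so permission changes show up cleanly in diffs.
    permissions: Array.from(new Set(permissions)).sort(),
    action: { default_popup: "popup.html" },
    background: { service_worker: "background.js" },
    content_scripts: [{ matches, js: ["content.js"] }],
    options_page: "options.html",
    icons: { "16": "icons/icon16.png", "48": "icons/icon48.png", "128": "icons/icon128.png" },
  };
}
```

Serializing the result with `JSON.stringify(manifest, null, 2)` gives a `manifest.json` you can compare against whatever the AI tool generates, which is a quick way to spot permissions it added beyond what you asked for.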

10.2 CLI Tool (Intermediate)

Tool: Claude Code | Time: 1-2 hours

Build a command-line tool in [Node.js / Python / Go / Rust] that [core functionality].

Commands:
- [tool] init: [what it sets up]
- [tool] [command 1] [args]: [what it does]
- [tool] [command 2] [args]: [what it does]
- [tool] --help: show all commands with descriptions

Features:
- Colored output (green for success, red for errors, yellow for warnings)
- Progress bars for long operations
- Interactive prompts for required input (with defaults)
- Config file (~/.toolrc or .toolrc in project root)
- --verbose flag for debug output
- --json flag for machine-readable output
- Meaningful exit codes (0 success, 1 error, 2 usage error)

Error handling:
- Clear error messages with suggested fixes
- Never show stack traces (unless --verbose)
- Graceful handling of Ctrl+C

Package for distribution via [npm / pip / brew / cargo].
Include README with installation, usage examples, and config reference.
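
The exit-code and error-handling rules above can be sketched as a pure function, which also makes them easy to test. This is an illustration only: `mytool`, its `init` and `build` commands, and the `runCli` helper are hypothetical, and a real tool would normally delegate argument parsing to a library.

```typescript
type CliResult = { exitCode: 0 | 1 | 2; stdout: string; stderr: string };

// Hypothetical commands for an imaginary tool named "mytool".
const CLI_COMMANDS: Record<string, () => Record<string, string>> = {
  init: () => ({ created: ".toolrc" }),
  build: () => {
    throw new Error("build script not found");
  },
};

function runCli(argv: string[]): CliResult {
  const flags = argv.filter((a) => a.startsWith("--"));
  const command: string | undefined = argv.filter((a) => !a.startsWith("--"))[0];
  const hasFlag = (f: string) => flags.indexOf(f) !== -1;

  if (hasFlag("--help")) {
    const names = Object.keys(CLI_COMMANDS).join(", ");
    return { exitCode: 0, stdout: `Usage: mytool <command> [--json] [--verbose]\nCommands: ${names}`, stderr: "" };
  }
  if (command === undefined || Object.keys(CLI_COMMANDS).indexOf(command) === -1) {
    // Usage error: exit code 2, with a suggested fix.
    return { exitCode: 2, stdout: "", stderr: `Unknown command "${command ?? ""}". Run "mytool --help".` };
  }
  try {
    const result = CLI_COMMANDS[command]();
    const stdout = hasFlag("--json") ? JSON.stringify(result) : `Done: ${Object.keys(result).join(", ")}`;
    return { exitCode: 0, stdout, stderr: "" };
  } catch (err) {
    const e = err instanceof Error ? err : new Error(String(err));
    // Stack traces are shown only with --verbose.
    const stderr = hasFlag("--verbose") && e.stack ? e.stack : `Error: ${e.message}`;
    return { exitCode: 1, stdout: "", stderr };
  }
}
```

Keeping the logic free of `process.exit` calls means the entry point is a thin wrapper that prints `stdout` and `stderr` and exits with `exitCode`, while the behavior itself stays unit-testable.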

Prompt Patterns Reference Card

The Constraint Sandwich

Do [action].
Include: [must-have list]
Do NOT include: [exclusion list]
Match existing: [patterns/styles to follow]

The Iterative Refinement

[After seeing initial output]
Keep: [what works]
Change: [what needs to change]
Add: [what's missing]
Remove: [what's unnecessary]
Don't touch: [what shouldn't change]

The Context Dump

Here's the current state:
- File: [path] does [function]
- File: [path] does [function]
- The bug is in: [location]
- Error message: [exact text]
- This worked before I: [recent change]
- I've already tried: [attempts]
Fix the bug without changing [protected areas].

The Scope Lock

ONLY modify [specific files/functions].
Do NOT touch: [protected files]
Do NOT change: [protected behavior]
Do NOT add: [unwanted additions]
Keep the diff as small as possible.

The Quality Gate

Before considering this done:
1. All existing tests pass
2. New tests cover: [specific scenarios]
3. No TypeScript errors (strict mode)
4. No ESLint warnings
5. Lighthouse performance score > [N]
6. [Custom quality criterion]

March 2026 Additions: Autonomous Mode Prompts

New prompts for Claude Code Auto Mode, MCP workflows, and agentic build patterns.

The Auto Mode Task Brief (Expert)

Tool: Claude Code (Auto Mode enabled) | Time: Runs unattended 15-120 min

Use this when handing a scoped task to Claude Code in Auto Mode. The structure defines scope, acceptance criteria, and what Claude should NOT touch — so the autonomous run has clear boundaries.

# Task: [Brief title]

## Scope
Working directory: [path]
Files allowed to modify: [list or glob pattern]
Files that must NOT change: [list — tests, migrations, config, etc.]

## Objective
[One sentence: what should be different when you're done]

## Acceptance Criteria
- [ ] [Specific, testable outcome 1]
- [ ] [Specific, testable outcome 2]
- [ ] All existing tests still pass
- [ ] No TypeScript errors (strict)
- [ ] No new ESLint warnings

## What This Is NOT
- Do not refactor unrelated code
- Do not add features beyond the objective
- Do not modify [specific protected area]

## Summary at End
When complete, write a brief summary of:
1. Every file changed and why
2. Any decisions you made and the tradeoff
3. Anything you're uncertain about
4. Tests I should run to verify

Why it works: The summary request at the end transforms Auto Mode from "black box" to "async colleague" — you wake up to a log of decisions, not just a diff.

The Claude Code Channels Handoff (Advanced)

Tool: Claude Code + Channels (Telegram/Discord integration) | Time: N/A — async coordination

Claude Code Channels (March 2026) lets you send instructions to a running Claude Code session from your phone. Use this prompt structure to create async checkpoints that Claude will pause for:

## Background Task with Mobile Checkpoints

Start the following task: [task description]

## Checkpoint Rules
Pause and send me a Telegram message at these points:
1. After completing the initial analysis — summarize what you found
2. Before any destructive action (delete, drop, overwrite) — describe it and wait
3. If you hit a blocker you can't resolve — describe the issue
4. When complete — summary of all changes

## Proceed autonomously between checkpoints.
Do not pause for routine read/write/test operations.

Why it works: You define the decision points where human judgment matters, and let Claude handle the execution in between. Run overnight builds and get Telegram pings when action is needed.

The Security Scope Guard (Advanced)

Tool: Claude Code (any mode) | Time: Prepend to any task involving auth, payments, or data

Add this as a preamble whenever Claude Code will touch security-sensitive code. It activates extra caution without requiring manual review of every action:

## Security Scope Guard — Activate Before This Task

This task involves security-sensitive code: [auth / payments / user data / API keys]

Before every change to [auth / payment / data] files:
1. State what vulnerability pattern you are avoiding
2. Confirm input validation is present
3. Confirm secrets are not hardcoded
4. Confirm error messages don't leak internal state

Never:
- Log authentication tokens or session IDs
- Return detailed error messages to the client
- Use string concatenation in SQL queries
- Allow wildcard CORS origins (Access-Control-Allow-Origin: *) on authenticated endpoints
- Store credentials in localStorage

If you see existing code that violates the above: flag it in your summary, do not silently fix it (I need to know it existed).

Now proceed with: [actual task]

Why it works: Security reviews after the fact miss context. This prompt embeds security review into the generation loop — Claude checks each change against the rules as it writes, not after.
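
Three of the "Never" rules translate directly into small helpers. The sketch below is illustrative rather than prescriptive: `serverLog`, `redact`, `toClientError`, and `findUserByEmail` are hypothetical names, the list of sensitive keys is an assumption, and the `{ text, values }` query shape with a `$1` placeholder assumes a PostgreSQL-style driver.

```typescript
type ClientError = { status: number; body: { error: string; requestId: string } };

// Server-side sink; a hypothetical stand-in for your real logger.
const serverLog: string[] = [];

// Assumed list of keys that must never reach the logs in clear text.
const SENSITIVE_KEYS = ["token", "password", "sessionid", "authorization"];

function redact(context: Record<string, unknown>): Record<string, unknown> {
  const safe: Record<string, unknown> = {};
  Object.keys(context).forEach((key) => {
    safe[key] = SENSITIVE_KEYS.indexOf(key.toLowerCase()) !== -1 ? "[REDACTED]" : context[key];
  });
  return safe;
}

// Full detail goes to the server log; the client sees a generic message
// plus a request ID it can quote to support.
function toClientError(err: Error, requestId: string, context: Record<string, unknown>): ClientError {
  serverLog.push(JSON.stringify({ requestId, message: err.message, context: redact(context) }));
  return { status: 500, body: { error: "Internal server error", requestId } };
}

// Parameterized query: user input travels in `values`, never in the SQL text.
function findUserByEmail(email: string): { text: string; values: string[] } {
  return { text: "SELECT id, email FROM users WHERE email = $1", values: [email] };
}
```

Helpers like these give the model a concrete pattern to match, so "Match existing: [patterns]" in a later prompt has something to point at.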

Category 26: MCP Integration Prompts (Added March 2026)

Model Context Protocol (MCP) is now a widely adopted open standard for giving AI coding assistants tool access and external context. These prompts help you integrate MCP correctly.

26.1 MCP Server Setup Prompt (Intermediate)

Tool: Claude Code | Time: 30-60 min

Set up an MCP (Model Context Protocol) server for my project that exposes the following tools to AI assistants:

## Tools to Expose
1. [Tool 1 name]: [what it does — e.g., "read_project_data: reads the projects.json registry"]
2. [Tool 2 name]: [what it does — e.g., "run_health_check: pings all deployment URLs"]
3. [Tool 3 name]: [what it does — e.g., "get_recent_errors: reads the last 50 error log lines"]

## Implementation Requirements
- Use the @modelcontextprotocol/sdk package
- Implement as stdio transport (not HTTP) for local use
- Each tool must have a clear JSON schema for inputs
- Each tool must return structured JSON output
- Add error handling that returns helpful error messages, not stack traces
- Include a test script that exercises each tool

## Configuration
Generate the MCP configuration block for claude_desktop_config.json (Claude Desktop) or a project-level .mcp.json (Claude Code):
{
  "mcpServers": {
    "[server-name]": {
      "command": "node",
      "args": ["path/to/server.js"]
    }
  }
}

## Context This Will Enable
When this MCP server is active, an AI assistant will be able to [describe what new capabilities this enables for your workflow].

Build the complete MCP server. Start with the tool definitions, then the handlers, then the test script.
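
Before involving the SDK, it can help to see the three requirements (an input schema per tool, structured output, and errors without stack traces) in isolation. The sketch below is dependency-free and deliberately does NOT use the `@modelcontextprotocol/sdk` API: `ToolDef`, `callTool`, and the in-memory `ERROR_LOG` are hypothetical, and only the `required` part of the schema is enforced.

```typescript
type ToolResult = { ok: true; data: unknown } | { ok: false; error: string };

type ToolDef = {
  name: string;
  description: string;
  // JSON Schema for the inputs; this sketch enforces only `required`.
  inputSchema: { type: "object"; properties: Record<string, { type: string }>; required: string[] };
  handler: (args: Record<string, unknown>) => unknown;
};

// Stand-in for a real log file.
const ERROR_LOG = ["E1 timeout", "E2 refused", "E3 not found"];

const TOOLS: ToolDef[] = [
  {
    name: "get_recent_errors",
    description: "Returns the last `limit` error log lines",
    inputSchema: { type: "object", properties: { limit: { type: "number" } }, required: ["limit"] },
    handler: (args) => {
      const limit = Number(args["limit"]);
      if (!(limit > 0)) throw new Error("limit must be a positive number");
      return { lines: ERROR_LOG.slice(-limit) };
    },
  },
];

function callTool(name: string, args: Record<string, unknown>): ToolResult {
  const tool: ToolDef | undefined = TOOLS.filter((t) => t.name === name)[0];
  if (tool === undefined) {
    return { ok: false, error: `Unknown tool "${name}". Available: ${TOOLS.map((t) => t.name).join(", ")}` };
  }
  const missing = tool.inputSchema.required.filter((key) => args[key] === undefined);
  if (missing.length > 0) {
    return { ok: false, error: `Missing required input(s): ${missing.join(", ")}` };
  }
  try {
    return { ok: true, data: tool.handler(args) };
  } catch (err) {
    // Return the message only; the stack trace stays out of the tool result.
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```

The same three checks (unknown tool, missing input, handler failure) are a reasonable starting list for the test script the prompt requests.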

26.2 Claude Code MCP Context Prompt (Advanced)

Tool: Claude Code | Time: 15 min

I'm setting up a project-level context file (CLAUDE.md) so Claude Code has persistent context about my project without me having to re-explain it every session.

Create a CLAUDE.md file that covers:

## Project Identity
- Name: [project name]
- Purpose: [one sentence]
- Stack: [tech stack]
- Current status: [active development / maintenance / paused]

## Key Files and Their Purpose
- [file path]: [what it contains and when to read it]
- [file path]: [what it contains and when to read it]

## Commands
- Build: [command]
- Dev server: [command]
- Test: [command]
- Deploy: [command]

## Architecture Decisions That Are NOT Up for Discussion
- [Decision 1]: [why it was made — do not suggest alternatives]
- [Decision 2]: [why it was made]

## Known Issues (Don't Re-Investigate)
- [Issue 1]: [known limitation, not a bug to fix]

## My Workflow
- I prefer [file-by-file / whole-feature] implementations
- Always [run tests / lint / build] before marking a task done
- When in doubt, [ask / make conservative choice / make opinionated choice]

Make the CLAUDE.md scannable and under 200 lines.

26.3 Next.js Secure Middleware Pattern (Intermediate) (Security-critical — post-CVE-2025-29927)

Tool: Claude Code, Cursor | Time: 20 min

Add authentication to my Next.js app using the secure dual-layer pattern (required post-CVE-2025-29927).

## Protected Routes
- /dashboard/:path* — requires authenticated user
- /api/protected/:path* — requires authenticated user, returns 401 JSON (not redirect)
- /admin/:path* — requires authenticated user with admin role

## Auth Provider
I'm using: [NextAuth v5 / Supabase Auth / Clerk / custom JWT]

## Implementation Rules
1. Middleware ONLY for UX redirects (fast redirect to /login for protected pages)
2. Every /api/protected route MUST verify the session server-side independently
3. NEVER rely on middleware as the sole auth gate for API routes
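4. Add a comment noting that the x-middleware-subrequest header must be stripped from external requests before they reach Next.js (this header is the CVE-2025-29927 bypass vector)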

## Pattern to Implement
For each protected API route:
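```typescript
import { NextResponse } from 'next/server'
import { getServerSession } from 'next-auth' // NextAuth v4; with v5, use the auth() helper from your NextAuth config
import { authOptions } from '@/lib/auth' // hypothetical path to your auth config

// Inside the route handler. DO NOT rely on middleware alone; verify here.
const session = await getServerSession(authOptions)
if (!session) {
  return NextResponse.json({ error: 'Unauthorized' }, { status: 401 })
}
```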

Generate:
1. middleware.ts with the correct matcher config and a comment explaining it is NOT a security boundary
2. A shared auth utility function (lib/auth-guard.ts) that API routes can call
3. One example protected API route using the utility
4. A test that verifies the API route returns 401 when no session exists
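
One possible shape for the shared utility in item 2 is a pure function that receives an already-fetched session, which keeps it independent of the auth provider. This is a sketch under assumptions: the `Session` type, `guardSession`, and `stripInternalHeaders` are hypothetical names, and the 403 response for non-admin users is an addition that the prompt itself does not specify.

```typescript
type Session = { userId: string; role: "user" | "admin" };
type GuardResult =
  | { ok: true; session: Session }
  | { ok: false; status: 401 | 403; body: { error: string } };

// Pure check, so it works with any provider: fetch the session server-side
// with your provider's own call, then pass the result (or null) in here.
function guardSession(session: Session | null, requireAdmin: boolean = false): GuardResult {
  if (session === null) {
    return { ok: false, status: 401, body: { error: "Unauthorized" } };
  }
  if (requireAdmin && session.role !== "admin") {
    return { ok: false, status: 403, body: { error: "Forbidden" } };
  }
  return { ok: true, session };
}

// Defensive extra: drop the x-middleware-subrequest header from incoming
// requests, since external clients have no legitimate reason to send it.
function stripInternalHeaders(headers: Record<string, string>): Record<string, string> {
  const clean: Record<string, string> = {};
  Object.keys(headers).forEach((name) => {
    if (name.toLowerCase() !== "x-middleware-subrequest") clean[name] = headers[name];
  });
  return clean;
}
```

In a route handler, the result maps directly onto a response: when `ok` is false, return `body` with the given `status`; otherwise continue with `session`. Because the function is synchronous and has no framework imports, the 401 test from item 4 needs no mocking.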

Category 27: Multi-Agent Orchestration Prompts (Cursor 3 / Claude Code Teams)

Added April 7, 2026 — covering the new parallel multi-agent workflows enabled by Cursor 3's Agents Window and Claude Code's Teams feature.

27.1 The Agent Task Decomposer (Advanced)

Tool: Cursor 3 Agents Window, Claude Code | Time: 5 min setup → autonomous execution

Use this prompt to break a large feature into parallelizable agent tasks before opening the Agents Window.

I need to implement [feature name] in my [type of app].

Decompose this into parallel agent tasks using this format:
- Each task must be completable in under 30 minutes
- Tasks must have clear success criteria (how to verify it's done)
- Identify dependencies (which tasks must complete before others can start)
- Assign a suggested agent focus for each (e.g., "backend agent", "test agent", "UI agent")

Feature to decompose:
[Describe the feature in 3-5 sentences. Include: what it does, the data it uses, and any API/external integrations.]

Output format:
## Agent Task Plan
### Wave 1 (parallel, no dependencies)
- Task A [Agent role]: [Goal] | Success: [How to verify] | Files: [which files/modules]
- Task B [Agent role]: [Goal] | Success: [How to verify] | Files: [which files/modules]

### Wave 2 (depends on Wave 1)
- Task C [Agent role]: [Goal] | Success: [How to verify] | Depends on: [Task A output]
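
The wave structure in this output format is a layered topological sort: each wave contains the tasks whose dependencies were all finished in earlier waves. A minimal sketch, assuming a hypothetical `AgentTask` shape with task IDs and a `dependsOn` list:

```typescript
type AgentTask = { id: string; dependsOn: string[] };

// Groups tasks into waves; every task in a wave can run in parallel.
function planWaves(tasks: AgentTask[]): string[][] {
  const done: Record<string, boolean> = {};
  let remaining = tasks.slice();
  const waves: string[][] = [];
  while (remaining.length > 0) {
    // A task is ready once all of its dependencies finished in earlier waves.
    const ready = remaining.filter((t) => t.dependsOn.every((d) => done[d] === true));
    if (ready.length === 0) {
      throw new Error("Cyclic or missing dependency among: " + remaining.map((t) => t.id).join(", "));
    }
    ready.forEach((t) => {
      done[t.id] = true;
    });
    waves.push(ready.map((t) => t.id));
    remaining = remaining.filter((t) => done[t.id] !== true);
  }
  return waves;
}
```

For tasks A and B with no dependencies, C depending on A, and D depending on B and C, the function returns `[["A", "B"], ["C"], ["D"]]`. Running the model's proposed plan through a check like this catches circular dependencies before any agent starts.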

27.2 The Single Agent Task Charter (Intermediate)

Tool: Cursor 3 Agents Window, Claude Code | Time: 2 min per agent

Paste this into each individual agent in the Agents Window to give it a focused, well-bounded mission.

## Agent Charter

**Role**: [Backend Engineer / Frontend Developer / QA Engineer / Security Reviewer / Docs Writer]
**Mission**: [One sentence: what this agent will produce]
**Scope**: [Specific files, modules, or directories this agent is allowed to touch]
**Off-limits**: [Files/systems this agent must not modify]

**Success Criteria** (all must be true when you're done):
1. [Specific, verifiable outcome]
2. [Specific, verifiable outcome]
3. Tests pass: [which test command to run]

**Handoff**: When complete, write a summary to `agent-handoff-[role].md` covering:
- What you built
- Any decisions you made and why
- What the next agent needs to know
- Any concerns or edge cases you noticed

**Context**: [Brief description of the larger feature this fits into]

Do not interrupt me unless you are truly blocked. Make reasonable decisions independently.

27.3 The Multi-Agent Review Prompt (Advanced)

Tool: Cursor 3 Agents Window, Claude Code | Time: 10-15 min supervised execution

Use this to spin up a dedicated review agent that audits another agent's output before you merge it.

## Review Agent Mission

You are a senior code reviewer. You did NOT write the code you are reviewing.

**Author agent**: [which agent produced this code, e.g., "Backend Agent — implemented the payment webhook handler"]
**Files to review**: [list the files]
**Success criteria of the original task**: [paste the success criteria from the original agent's charter]

Your review checklist:
1. **Correctness**: Does the code do what the task charter required?
2. **Edge cases**: What inputs could break this? (empty arrays, null values, concurrent requests, network failures)
3. **Security**: Any injection risks, missing auth checks, exposed secrets, or unvalidated inputs?
4. **Performance**: Any N+1 queries, missing indexes, synchronous blocking calls, or memory leaks?
5. **Tests**: Are the tests meaningful? Do they cover the stated success criteria?
6. **Handoff quality**: Is the agent-handoff file accurate and useful for downstream agents?

Output a structured review:
## Review Summary
**Overall verdict**: APPROVE / REQUEST_CHANGES / BLOCK
**Confidence**: High / Medium / Low

### Issues Found
| Severity | File | Line | Issue | Suggested Fix |
|----------|------|------|-------|---------------|
| CRITICAL | ... | ... | ... | ... |

### Approved Items
[What the agent did well — be specific]

### Required Changes Before Merge
[Numbered list if verdict is REQUEST_CHANGES or BLOCK]

Category 28: Long-Horizon Agentic Execution (April 2026)

For GLM-5.1, Claude Code, Cursor Automations, and any AI agent running 2+ hour autonomous sessions. These prompts help you structure work that outlasts your attention span.

28.1 The Long-Horizon Task Brief (Advanced)

Tool: GLM-5.1, Claude Code, Cursor Automations | Time: 30 min setup → hours of autonomous execution

Use this before starting any AI session you expect to run longer than 30 minutes. A clear brief prevents the model from drifting, making scope-creep decisions, or silently failing.

## Long-Horizon Task Brief

**Session goal** (one sentence):
[What is complete when this session ends?]

**Time budget**: [How many hours should the agent spend before stopping to check in?]

**In scope**:
- [Feature/file/system 1]
- [Feature/file/system 2]

**Out of scope** (hard limits):
- Do NOT modify [file/system] — read-only
- Do NOT delete anything — create new files only
- Do NOT push to main — commit to branch only

**Checkpointing** (every N hours):
Write a checkpoint file at `agent-checkpoint-[timestamp].md` containing:
1. What has been completed
2. Current task in progress
3. Known blockers or unresolved decisions
4. What remains to complete the session goal

**Success criteria** (all must be true at session end):
1. [Verifiable outcome — test command, file exists, URL responds, etc.]
2. [Verifiable outcome]
3. All code compiles with zero TypeScript errors (`npm run build`)
4. All existing tests still pass (`npm test`)

**How to handle blockers**:
- If blocked by a missing env var → note it in the checkpoint file and skip that feature
- If blocked by an ambiguous requirement → make a reasonable assumption, document it in the checkpoint, and continue
- If blocked by a breaking error → stop, write a blocker-report.md, and halt the session

Begin with a brief plan (3-5 bullet points), then execute.

28.2 The Open-Weight Model Selection Prompt (Intermediate)

Tool: Any LLM with web access or a knowledge cutoff of April 2026 or later | Time: 5 min

Use this when evaluating whether to use a self-hosted open-weight model vs. a closed API for a specific project.

I need to choose between a self-hosted open-weight model and a closed API for the following use case:

**Use case**: [Describe what the AI will be doing — code completion, autonomous agents, document analysis, etc.]

**Constraints**:
- Data sensitivity: [Public / Internal / Confidential / Regulated (HIPAA, SOC2, etc.)]
- Budget: [Monthly cap in USD, or "no limit"]
- Latency requirement: [< 500ms / < 2s / batch OK]
- Infrastructure: [Consumer hardware / cloud GPU / on-prem enterprise cluster]
- Team size: [Solo / small team / enterprise]
- Vendor lock-in tolerance: [Low / Medium / High]

**Open-weight models to evaluate** (as of April 2026):
- GLM-5.1 (754B, Z.AI) — SOTA SWE-Bench Pro, 8-hour autonomous sessions, Apache 2.0
- Gemma 4 (Google, Apache 2.0) — 4 sizes, strong reasoning and coding
- Llama 3.x (Meta) — broad ecosystem, widely deployed
- Qwen3.6-Plus — 1M context, competitive with Claude 4.5 on coding tasks

**Closed APIs to evaluate**:
- Claude Sonnet 4.6 (Anthropic API) — best agentic coding, $3/$15 per MTok
- GPT-4o (OpenAI) — broad capability, strong ecosystem
- Gemini 1.5 Pro (Google) — 1M context, competitive pricing

For each candidate, evaluate:
1. Does it meet my latency requirement?
2. Does it meet my data sensitivity requirement?
3. What is the estimated monthly cost at my usage level?
4. What are the known failure modes for my use case?

Recommend the best option and explain the trade-offs I'm accepting.
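
Question 3 (estimated monthly cost) is simple arithmetic that is worth doing yourself before trusting a model's answer. The sketch below uses the $3/$15 per MTok figures quoted above for the API side; the token volumes and the GPU hourly rate in the usage note are purely illustrative assumptions, and both function names are hypothetical.

```typescript
type ApiPricing = { inputPerMTokUsd: number; outputPerMTokUsd: number };

// API cost: tokens are billed per million (MTok), input and output separately.
function apiMonthlyCostUsd(p: ApiPricing, inputTokensPerDay: number, outputTokensPerDay: number, days: number = 30): number {
  const daily =
    (inputTokensPerDay / 1e6) * p.inputPerMTokUsd + (outputTokensPerDay / 1e6) * p.outputPerMTokUsd;
  return Math.round(daily * days * 100) / 100;
}

// Self-hosted cost: rented GPUs are billed by the hour whether or not they are busy.
function selfHostedMonthlyCostUsd(gpuHourlyUsd: number, gpuCount: number, hoursPerDay: number, days: number = 30): number {
  return Math.round(gpuHourlyUsd * gpuCount * hoursPerDay * days * 100) / 100;
}
```

At 2M input and 0.5M output tokens per day, `apiMonthlyCostUsd({ inputPerMTokUsd: 3, outputPerMTokUsd: 15 }, 2_000_000, 500_000)` returns 405. Four GPUs at an assumed $2.50 per hour running around the clock, `selfHostedMonthlyCostUsd(2.5, 4, 24)`, come to 7200. The comparison flips as volume grows, which is why the prompt asks for cost at your usage level rather than in general.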

28.3 The Goose/Local-Agent Workflow Prompt (Intermediate)

Tool: Goose (Block), any LLM-agnostic local AI agent | Time: 10 min setup

Goose (open-sourced by Block in January 2025) is a local AI agent that supports any LLM backend and executes real actions: install packages, run tests, modify files, call APIs. This prompt structure is designed for Goose-style action-oriented agents.

## Goose Task: [Short task name]

**Objective**: [One sentence describing the complete state when this task is done]

**LLM backend**: [claude-sonnet-4-6 / glm-5.1 / gpt-4o / gemma-4 — whichever you're using]

**Allowed actions**:
- Read and write files in: [path/to/project]
- Run shell commands: [list safe commands, e.g., npm test, npm run build, git status]
- Install packages: [yes/no — if yes, list approved package registries]
- Make HTTP requests to: [list allowed external APIs, e.g., "GitHub API only"]

**Prohibited actions** (hard stops — do not proceed if any of these are required):
- git push (never push without human review)
- rm -rf or destructive filesystem operations
- Modify files outside [path/to/project]
- Access [sensitive-system]

**Context files** (read these before starting):
- [path/to/README.md]
- [path/to/relevant-config.json]

**Task steps** (ordered):
1. [First action]
2. [Second action, may depend on output of step 1]
3. Verify: run [test command] and confirm output matches [expected output]

**Output**: When done, write `goose-task-complete.md` with:
- Actions taken (with file paths and commands run)
- Test results
- Any assumptions made
- Any issues encountered

Start immediately. Do not ask for clarification unless truly blocked.
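
Prompt-level prohibitions are only advisory, so it is worth enforcing the same allowlist in whatever wrapper executes the agent's shell commands. The following sketch assumes a hypothetical `CommandPolicy` shape and `checkCommand` helper; it is not part of Goose, and the metacharacter rule is a conservative assumption that will also block some harmless commands.

```typescript
type CommandPolicy = { allowedCommands: string[]; prohibitedPatterns: RegExp[] };

// Example policy mirroring the prompt's allowed and prohibited actions.
const examplePolicy: CommandPolicy = {
  allowedCommands: ["npm test", "npm run build", "git status"],
  prohibitedPatterns: [/^git push\b/, /\brm\s+-rf\b/],
};

function checkCommand(policy: CommandPolicy, command: string): { allowed: boolean; reason: string } {
  const normalized = command.trim().replace(/\s+/g, " ");
  for (const pattern of policy.prohibitedPatterns) {
    if (pattern.test(normalized)) {
      return { allowed: false, reason: `matches prohibited pattern ${pattern}` };
    }
  }
  // Reject shell chaining so an allowed prefix cannot smuggle in a second command.
  if (/[;&|`$<>]/.test(normalized)) {
    return { allowed: false, reason: "shell metacharacters are not permitted" };
  }
  if (policy.allowedCommands.indexOf(normalized) === -1) {
    return { allowed: false, reason: "not on the allowlist" };
  }
  return { allowed: true, reason: "exact allowlist match" };
}
```

Exact matching is intentionally strict: `npm test` passes, while `npm test && rm -rf /` and `npm install` are both refused. Loosen it only where you have a specific need, for example by allowing a fixed set of arguments per command.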

Category 29: Claude Sonnet 4.6 — 1M Context & Agentic Search Prompts (April 2026)

Claude Sonnet 4.6 introduced two capabilities that change how you structure prompts: a 1M token context window (beta) and GA web search/web fetch with code-execution-based result filtering. These prompts exploit both.

29.1 The Whole-Codebase Refactor Prompt (Expert)

Tool: Claude Sonnet 4.6 via API or Claude Code | Context required: 200K–1M tokens

With the 1M context window, you can load an entire medium-sized codebase and ask for architectural analysis without chunking. As a rough guide, this works for repositories up to ~150K lines, depending on average line length.

## Codebase Refactor Brief

**Repository**: [project-name]
**Goal**: [Specific refactor objective — e.g., "migrate from Pages Router to App Router", "replace all class components with hooks", "extract shared utilities from duplicated code"]
**Constraints**:
- Do not change external API contracts (public-facing routes must remain the same)
- All existing tests must pass after refactor
- Prefer surgical changes over rewrites

**Files loaded below** (entire codebase follows in this message):
[Paste full codebase or use file upload — Claude Sonnet 4.6 handles up to 1M tokens]

**Output requested**:
1. A prioritized list of refactor changes (most impactful first)
2. For each change: which files are affected, what changes, and estimated risk level (low/medium/high)
3. A proposed commit sequence (small atomic commits, safest order)
4. Any architectural concerns that would block this refactor

Do NOT generate code yet — produce the analysis and plan first. I will confirm before implementation begins.
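
Before pasting a whole repository, a quick size estimate tells you whether it will fit. This sketch rests on a loud assumption: roughly 4 characters per token, which is a common rule of thumb and not the model's actual tokenizer. The `SourceFile` type, both function names, and the 50,000-token reserve for the response are likewise hypothetical.

```typescript
// Assumption: ~4 characters per token. Real tokenizers vary by language and content.
const CHARS_PER_TOKEN = 4;

type SourceFile = { path: string; content: string };

function estimateTokens(files: SourceFile[]): number {
  // Paths count too, since each file is usually pasted with its path as a header.
  const chars = files.reduce((sum, f) => sum + f.path.length + f.content.length, 0);
  return Math.ceil(chars / CHARS_PER_TOKEN);
}

function fitsInContext(
  files: SourceFile[],
  contextTokens: number = 1_000_000,
  reservedTokens: number = 50_000, // room left for the prompt text and the response
): { estimatedTokens: number; budget: number; fits: boolean } {
  const estimatedTokens = estimateTokens(files);
  const budget = contextTokens - reservedTokens;
  return { estimatedTokens, budget, fits: estimatedTokens <= budget };
}
```

If the estimate lands near the budget, treat it as not fitting: drop lock files, generated code, and vendored dependencies first, since they consume tokens without helping the analysis.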

29.2 The Research-Then-Build Prompt (Intermediate)

Tool: Claude Sonnet 4.6 (web search GA) | Time: 15–30 min

Sonnet 4.6's web search and web fetch are GA, with dynamic result filtering via code execution. This prompt chains research directly into implementation — no context-switching between browser and editor.

## Research-Then-Build Task

**What I'm building**: [Short description — e.g., "a rate limiter middleware for my Next.js API routes"]

**Research phase** (do this first — use web search):
1. Search for: "[topic] best practices [current year]"
2. Fetch the top 2–3 relevant documentation pages
3. Identify: (a) the standard pattern, (b) common failure modes, (c) security considerations
4. Write a 3-bullet summary of your findings before writing any code

**Build phase** (only after research summary is written):
- Implement [feature] based on your findings
- Follow the standard pattern you identified
- Add defensive handling for the top failure mode
- Include a comment linking to the primary source used

**Validation**:
- Re-fetch [relevant documentation URL] and confirm your implementation aligns
- Note any deviations and explain why

Start with the research phase. Do not write code until research summary is complete.

29.3 The Extended-Thinking Architecture Decision Prompt (Advanced)

Tool: Claude Sonnet 4.6 with extended thinking | Time: 5 min prompt, 10–20 min thinking

Extended thinking gives the model more compute budget before it commits to an answer. Use this for architecture choices where a wrong call means weeks of rework.

## Architecture Decision Request

**Decision to make**: [e.g., "Should I use Supabase Realtime or polling for my live dashboard?"]

**Context**:
- System: [Brief description]
- Scale: [Expected users/requests in 6 months]
- Team: [Solo / small / larger]
- Constraints: [Budget, latency, existing stack, migration costs]
- Timeline: [When must you ship?]

**What I've already considered**:
- Option A: [First option] — I think this because [reasoning]
- Option B: [Second option] — I think this because [reasoning]
- What I'm unsure about: [Specific uncertainty]

**What I need**:
1. Evaluate both options against my specific constraints (not generic trade-offs)
2. Identify what I'm missing or wrong about in my reasoning
3. Recommend one option with confidence level (high/medium/low) and what would change your recommendation
4. Give me the one question I should answer before committing

Take your time — a slow, thorough answer beats a fast, wrong one.

Category 30: April 2026 — Agent Framework, Security Audit & Parallel Fleet Prompts

Three new workflows unlocked by the April 2026 AI tooling wave: Microsoft Agent Framework 1.0 multi-agent orchestration, Claude Mythos-style security audit chaining, and Cursor 3 parallel agent fleet management.

30.1 The Microsoft Agent Framework 1.0 Orchestration Prompt (Advanced)

Tool: Microsoft Agent Framework 1.0 (.NET or Python), Claude Code | Time: 30–60 min setup

Agent Framework 1.0 ships with A2A and MCP protocol support, enabling cross-runtime agent interoperability. Use this prompt to design multi-agent workflows that span different AI providers without lock-in.

## Multi-Agent Workflow Design Request

**Workflow goal**: [What the agent system should accomplish end-to-end — e.g., "receive a GitHub issue, research the codebase, implement a fix, open a PR, and notify Slack"]

**Agents needed** (describe each):
- Agent 1: [Name + responsibility + which model/provider — e.g., "Researcher — Claude Sonnet 4.6 — reads codebase and clarifies requirements"]
- Agent 2: [Name + responsibility + which model/provider]
- Agent 3: [Name + responsibility + which model/provider]

**Coordination protocol**: A2A (agent-to-agent messages) | MCP (tool calls to shared context) | Both
**Runtime**: .NET | Python | Both

**State management**:
- Shared state that all agents need: [list]
- State private to each agent: [list]
- How agents hand off work: [event-driven / polling / direct call]

**Error handling**:
- If Agent 1 fails: [retry / fail pipeline / route to human]
- If Agent 2 fails: [behavior]
- Maximum retries per agent: [N]

**Output required**:
1. Agent architecture diagram (ASCII or described)
2. Agent Framework 1.0 code scaffold for each agent class
3. The A2A message schema for agent handoffs
4. The MCP tools each agent needs registered
5. DevUI configuration for browser-based debugging

Generate the scaffold. I will fill in the business logic per agent.

30.2 The AI Security Audit Chain Prompt (Expert)

Tool: Claude Sonnet 4.6 or Claude Code with CyberOS MCP | Time: 20–40 min per codebase

Inspired by the defensive security workflow of Claude Mythos / Project Glasswing, this prompt chains vulnerability discovery, triage, and remediation across a codebase so that no part of the attack surface is skipped.

## AI-Powered Security Audit — Systematic Chain

**Codebase**: [Repo path or paste content]
**Stack**: [e.g., Next.js 14 + Supabase + Stripe + Python FastAPI backend]
**Deployment**: [Vercel + AWS Lambda | Self-hosted | Cloud provider]
**Compliance scope**: [OWASP Top 10 | SOC 2 | PCI-DSS | All]

## Phase 1 — Attack Surface Map
List every:
- Public HTTP endpoint (method + path + auth required)
- Data input point (form, query param, file upload, webhook)
- Third-party integration (API calls out, webhooks in)
- Secret/credential usage point

Do not analyze yet. Only map. Output as a numbered list.

## Phase 2 — Vulnerability Scan
For each item on the attack surface map, check for:
- Injection (SQL, command, SSRF, path traversal)
- Authentication/authorization bypass
- Sensitive data exposure (secrets in logs, responses, or error messages)
- Cryptographic weaknesses (weak ciphers, padding oracle, hardcoded keys)
- Supply chain risks (mutable version references, unverified dependencies)

Classify each finding: CRITICAL / HIGH / MEDIUM / LOW / INFO
Include CWE ID and the exact file:line where the issue exists.

## Phase 3 — Remediation Plan
For each CRITICAL and HIGH finding:
1. Explain the vulnerability in one sentence
2. Write the fixed code (before/after diff)
3. Explain why the fix works

## Phase 4 — Verification
After remediations are applied:
- Re-scan the attack surface for the patched items
- Confirm no new vulnerabilities were introduced by the fix
- Output a signed-off list: [finding] → [status: FIXED / PARTIALLY FIXED / DEFERRED]

Start with Phase 1. Do not proceed to Phase 2 until I confirm the attack surface map is complete.

30.3 The Cursor 3 Parallel Agent Fleet Prompt (Advanced)

Tool: Cursor 3 Agents Window | Time: 5 min to launch, 30–120 min execution

Cursor 3's Agents Window lets you run multiple AI agents simultaneously across local, SSH, and cloud environments. This prompt template structures how to decompose work across a fleet efficiently so agents don't conflict.

## Parallel Agent Fleet Assignment

**Project**: [Brief description of the codebase]
**Goal**: [What needs to be accomplished — e.g., "ship the user dashboard feature including data layer, UI components, tests, and documentation"]

**Fleet decomposition** (define independent workstreams that can run in parallel):

Agent A — [Name: e.g., "Data Layer"]
- Scope: [Specific files/directories this agent owns]
- Task: [Exact work to do]
- Output: [What it should produce — e.g., "implemented API routes with tests passing"]
- Dependencies: [What it needs before starting — e.g., "database schema must exist"]
- Must NOT touch: [Files/areas that are other agents' scope]

Agent B — [Name: e.g., "UI Components"]
- Scope: [...]
- Task: [...]
- Output: [...]
- Dependencies: [...]
- Must NOT touch: [...]

Agent C — [Name: e.g., "Tests & Docs"]
- Scope: [...]
- Task: [...]
- Output: [...]
- Dependencies: [Agent A and B PRs merged]
- Must NOT touch: [...]

**Conflict prevention**:
- Shared files that multiple agents might edit: [list them — these need explicit ownership]
- Owner of package.json / lock file: [Agent A | Agent B | None — freeze during parallel work]
- Owner of shared types/interfaces: [which agent defines, others consume]

**Review order**:
1. Review Agent A output first
2. Review Agent B output (may depend on A's types)
3. Review Agent C output last (depends on both)

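**Launch in the Agents Window**: Open one agent session per agent block above. Paste the agent-specific block into each session. Start every agent whose dependencies are already met at the same time; start dependent agents (such as Agent C) once their prerequisites are merged.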
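
The conflict-prevention section is easier to trust if the scopes are checked mechanically before launch. A minimal sketch, assuming each agent's scope is written as a list of directory prefixes or exact file paths; `AgentScope` and `findScopeConflicts` are hypothetical names, and glob patterns are not handled.

```typescript
type AgentScope = { agent: string; owns: string[] };

// Two paths overlap when they are equal or one is a directory containing the other.
function pathsOverlap(a: string, b: string): boolean {
  const na = a.replace(/\/+$/, "");
  const nb = b.replace(/\/+$/, "");
  return na === nb || na.startsWith(nb + "/") || nb.startsWith(na + "/");
}

// Compares every pair of agents and reports each overlapping pair of paths.
function findScopeConflicts(scopes: AgentScope[]): string[] {
  const conflicts: string[] = [];
  for (let i = 0; i < scopes.length; i++) {
    for (let j = i + 1; j < scopes.length; j++) {
      for (const a of scopes[i].owns) {
        for (const b of scopes[j].owns) {
          if (pathsOverlap(a, b)) {
            conflicts.push(`${scopes[i].agent} (${a}) overlaps ${scopes[j].agent} (${b})`);
          }
        }
      }
    }
  }
  return conflicts;
}
```

An empty result means no two agents claim the same file or directory. Any reported overlap should be resolved by assigning a single owner in the "Conflict prevention" section before the fleet starts.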

This library is updated monthly with new prompts based on emerging tools, patterns, and reader requests. Last updated: April 14, 2026. Added: Category 31 (AI Agent Payments, Session Context Briefs, Generated Code Security Review). Previous: Category 30 (Agent Framework 1.0 orchestration, AI security audit chain, Cursor 3 parallel fleet management, April 13). Category 29 (Claude Sonnet 4.6 — 1M Context & Agentic Search Prompts, April 10). Category 28 (Long-Horizon Agentic Execution, April 9). Category 27 (Multi-Agent Orchestration, April 7). Category 26 (MCP Integration, March 31).

Category 31: April 2026 — AI Agent Payments, Session Context & Security Review

Three new prompt patterns drawn from the workflow published by Claude Code's creator and from growing x402 protocol adoption.

31.1 The AI Agent Payment Integration Prompt (Advanced)

Tool: Claude Code, Cursor | Time: 2-4 hours | Category: Emerging Patterns

Context: Coinbase's x402 protocol enables AI agents to make autonomous payments. As of April 2026, this is becoming a real workflow pattern — agents that call APIs, pay for compute, and operate economically without human authorization for each transaction.

I'm building an AI agent that needs to make autonomous payments using the 
Coinbase x402 protocol / [payment protocol].

## Agent Context
- Agent type: [coding assistant / research agent / deployment bot]
- Payment ceiling per action: $[amount]
- Allowed payment recipients: [API services, infrastructure providers]
- Forbidden: [payments to unknown wallets, amounts over $X]

## What I Need
1. Integrate x402 payment headers into the agent's HTTP client
2. Implement a payment budget tracker that halts the agent when the daily/session 
   ceiling is hit
3. Add a payment audit log (what was paid, when, to whom, why)
4. Implement human-approval gates for payments above $[threshold]
5. Handle HTTP 402 Payment Required responses from x402 endpoints gracefully

## Safety Requirements
- Never pay from the agent wallet without logging first
- Require cryptographic receipts for all payments
- Alert human operator if payment velocity exceeds [N] transactions/minute
- Reject any payment request that doesn't match the allowed-recipient list

Build the payment client and budget tracker first, then integrate into the 
existing agent loop.

Use when: Building economic agents, autonomous task runners that consume paid APIs, or testing the x402 payment stack.

Security note: Always implement human approval gates for amounts above $1 in production. See Chapter 10 for AI agent attack surfaces.
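The budget tracker, allowlist, approval gate, and audit log requested in the prompt can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not an x402 client: `BudgetTracker`, `PaymentDecision`, and the recipient host name are hypothetical names invented for this example, and nothing here talks to a wallet or to Coinbase's protocol.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from decimal import Decimal


@dataclass
class PaymentDecision:
    allowed: bool
    reason: str


@dataclass
class BudgetTracker:
    """Hypothetical guard that sits in front of the agent's payment client."""
    session_ceiling: Decimal
    approval_threshold: Decimal
    allowed_recipients: frozenset
    spent: Decimal = Decimal("0")
    audit_log: list = field(default_factory=list)

    def request(self, recipient, amount, purpose, human_approved=False):
        decision = self._decide(recipient, amount, human_approved)
        # Log before any money moves, and log rejected attempts too.
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "recipient": recipient,
            "amount": str(amount),
            "purpose": purpose,
            "allowed": decision.allowed,
            "reason": decision.reason,
        })
        if decision.allowed:
            self.spent += amount
        return decision

    def _decide(self, recipient, amount, human_approved):
        if amount <= 0:
            return PaymentDecision(False, "non-positive amount")
        if recipient not in self.allowed_recipients:
            return PaymentDecision(False, "recipient not on allowlist")
        if self.spent + amount > self.session_ceiling:
            return PaymentDecision(False, "session ceiling reached")
        if amount > self.approval_threshold and not human_approved:
            return PaymentDecision(False, "human approval required")
        return PaymentDecision(True, "ok")
```

The sketch uses `Decimal` rather than floats so that ceiling comparisons do not drift with rounding. It leaves out the payment-velocity alert and cryptographic receipts from the Safety Requirements; those would need a clock-based counter and whatever receipt format your payment provider actually returns.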

31.2 The Session Context Brief Generator (Beginner)

Tool: Claude Code, Cursor, Windsurf | Time: 5 minutes | Category: Workflow

This prompt generates a reusable session brief from your current codebase state. Run it at the start of a session (or paste the brief it produced last time, if the codebase has not changed) to give the AI full context before any task.

I need you to generate a session brief for this codebase. Read the following 
and produce a structured brief I can paste at the start of future sessions:

## Please Analyze
- The overall architecture (what framework, what database, what auth)
- The current state (what works, what's broken based on TODO comments and errors)
- The key files that any feature touching [feature area] would need to know about
- Any explicit constraints in CLAUDE.md or README that I shouldn't violate
- The tech debt or known issues I should steer around

## Output Format
Produce a brief in this format:
---
## Session Brief — [Date]
**Stack**: [framework, database, auth, hosting]
**What's working**: [bullet list]
**What's broken / in-progress**: [bullet list]
**Key files for [feature area]**: [file paths with one-line description each]
**Constraints to respect**: [rules from CLAUDE.md / README]
**Steer around**: [known issues, fragile code, don't-touch zones]
---

Keep it under 400 words so it fits in a context window preamble.

Use when: Starting any Claude Code session, onboarding to a new codebase, or after a long break from a project.

Why it works: A 5-minute brief prevents 30-60 minutes of context-building drift. Claude Code performs significantly better when it knows the full codebase state upfront.
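If you store briefs in the repository, a small check can confirm that a generated brief follows the output format and stays under the 400-word limit before you reuse it. The sketch below is one possible way to do that; `check_brief` and the constant names are hypothetical, and the field labels are taken from the Output Format section of the prompt.

```python
import re

# Bold labels the prompt's Output Format asks for.
REQUIRED_FIELDS = (
    "Stack",
    "What's working",
    "What's broken / in-progress",
    "Key files",
    "Constraints to respect",
    "Steer around",
)
MAX_WORDS = 400


def check_brief(brief: str) -> list:
    """Return a list of problems; an empty list means the brief looks usable."""
    problems = []
    for name in REQUIRED_FIELDS:
        # Labels look like **Stack**: or **Key files for billing**:
        pattern = r"\*\*" + re.escape(name) + r"[^*]*\*\*\s*:"
        if not re.search(pattern, brief):
            problems.append(f"missing field: {name}")
    word_count = len(brief.split())
    if word_count >= MAX_WORDS:
        problems.append(f"too long: {word_count} words (limit is under {MAX_WORDS})")
    return problems
```

The word count here is a plain whitespace split, which is only a rough proxy for how much of a context window the brief consumes.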

31.3 The Generated Code Security Review Prompt (Intermediate)

Tool: Claude Code, Cursor | Time: 10-15 minutes | Category: Security

After generating a significant block of code, use this prompt to run a security review before accepting the change. Especially important for authentication flows, API handlers, and any code that touches user data.

Review the following generated code for security vulnerabilities. 

## Code to Review
[paste generated code here]

## Review Checklist
Check specifically for:
1. **Injection vulnerabilities**: SQL injection, command injection, path traversal
2. **Authentication gaps**: Missing auth checks, broken access control
3. **Input validation**: Unvalidated user input reaching sensitive operations
4. **Secret exposure**: Hardcoded credentials, keys in code, logging of sensitive data
5. **Prototype pollution**: Recursive merge or dynamic key assignment from user input, __proto__ or constructor.prototype manipulation
6. **Race conditions**: Async operations that could interleave dangerously
7. **Error handling**: Stack traces leaking in responses, errors that expose internals

## For Each Issue Found
- Severity: Critical / High / Medium / Low
- CWE category
- Exact line(s) affected
- Safe version of the code

## If Clean
Confirm the code is safe to merge and note any edge cases that weren't security 
issues but should be tested.

Context: This code is [describe what it does and who has access to it].
The framework is [Next.js / Express / Django / etc.].
The data involved: [user PII / payment data / internal only / public].

Use when: After any AI-generated auth handler, API route, form processing, or file upload code. Non-negotiable for code touching user data or payments.

Pairs with: CyberOS (https://cyberos.dev) for automated continuous review in CI/CD pipelines.

Source: Based on OWASP Top 10 2025 and the CyberOS pattern database (615 patterns as of April 2026).
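To make the first checklist item concrete, here is the kind of before/after pair the review should produce for a SQL injection finding (CWE-89). It is a small illustration using Python's standard `sqlite3` module; the table, the function names, and the sample data are invented for this example.

```python
import sqlite3


def find_user_unsafe(conn, email):
    # VULNERABLE: user input is concatenated into the SQL text (CWE-89).
    query = "SELECT id, email FROM users WHERE email = '" + email + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, email):
    # Safe version: the driver binds the value, so it is never parsed as SQL.
    return conn.execute(
        "SELECT id, email FROM users WHERE email = ?", (email,)
    ).fetchall()


def demo_connection():
    """In-memory database with two sample rows, for demonstration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany(
        "INSERT INTO users (email) VALUES (?)",
        [("alice@example.com",), ("bob@example.com",)],
    )
    return conn
```

Passing the input `' OR '1'='1` to the unsafe function returns every row in the table, while the parameterized version treats the same string as a literal email address and returns nothing.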

Category 32: Automation & Agent Orchestration Prompts (Added April 2026)

Three new prompt patterns for Claude Code Routines (launched April 2026), Cursor 3 multi-repo agent orchestration, and automated security auditing — covering the full spectrum from simple recurring automation to coordinated multi-agent coding sessions.

32.1 Claude Code Routines — PR Review Automation (Intermediate)

Tool: Claude Code | Difficulty: Intermediate | Time: 15-30 min

Claude Code Routines (April 2026) let you define recurring coding tasks that run on Anthropic's cloud infrastructure, triggered by events like new pull requests. Use this prompt to configure a Routine that automatically reviews every incoming PR before a human reviewer sees it.

## Claude Code Routine: Automated PR Review

Set up a Claude Code Routine that triggers on new pull requests to this 
repository and performs a structured code review before human reviewers 
are assigned.

## Trigger
Event: pull_request.opened, pull_request.synchronize
Scope: all branches targeting main and develop
Skip: PRs with label "skip-ai-review" or authored by bots

## Review Tasks (run in sequence)

### 1. Change Summary
- Summarize what the PR does in 3-5 bullet points
- Identify which components/modules are affected
- Estimate scope: small (< 50 lines changed), medium (50-300), large (> 300)

### 2. Code Quality Check
- Flag any functions longer than 50 lines
- Flag cyclomatic complexity > 10
- Identify duplicated logic that already exists elsewhere in the codebase
- Check naming conventions match the patterns in [existing files in the repo]

### 3. Security Scan
- Check for the patterns in Prompt 32.3 (OWASP Top 10 for Next.js/React)
- Flag any hardcoded secrets, tokens, or credentials
- Identify unvalidated user inputs reaching database or filesystem operations
- Check new API routes for missing authentication guards

### 4. Test Coverage
- Identify new functions or branches not covered by the PR's test additions
- List any test files that should have been updated but weren't
- Flag missing edge case tests for: null/undefined input, empty arrays, 
  auth failure paths

### 5. Review Output
Post a structured comment to the PR with:
- **Summary**: [auto-generated summary]
- **Scope**: small / medium / large
- **Issues**: [table: Severity | File | Line | Issue | Suggested Fix]
- **Missing tests**: [list]
- **Verdict**: LGTM (no blockers) | NEEDS CHANGES (list blockers) | REQUEST HUMAN REVIEW (flag for security/arch concerns)

## Routine Configuration
- Runtime: Anthropic cloud (no self-hosted runner required)
- Model: claude-sonnet-4-6
- Timeout: 5 minutes per PR
- Post comment as: GitHub App bot account
- Do NOT approve or request changes via GitHub review API — comment only
- Do NOT auto-merge under any circumstances

## What This Routine Should NOT Do
- Rewrite or suggest large refactors on a per-PR basis
- Block PRs automatically — it informs, humans decide
- Comment more than once per commit push (deduplicate on commit SHA)

Why it works: This Routine acts as a tireless first-pass reviewer that runs in under 5 minutes on every PR. Human reviewers arrive to a structured pre-analysis and can focus on architecture and intent rather than scanning for obvious issues.

Setup note: Configure the Routine in your Claude Code workspace settings under Routines > New Routine > Event Trigger. The model runs server-side — no GitHub Actions minutes consumed.
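Three of the rules in the prompt are simple enough to pin down in code: the skip conditions, the scope buckets, and the once-per-commit deduplication. The sketch below is illustrative only and is not part of any Routines API; all function and class names are hypothetical, and treating exactly 300 changed lines as "medium" is an assumption made for this example.

```python
def should_skip(labels, author_is_bot: bool) -> bool:
    """Apply the prompt's skip rules: opt-out label or bot author."""
    return author_is_bot or "skip-ai-review" in labels


def classify_scope(lines_changed: int) -> str:
    """Bucket a PR by size: small (< 50), medium (50-300), large (over 300)."""
    if lines_changed < 50:
        return "small"
    if lines_changed <= 300:
        return "medium"
    return "large"


class ReviewDeduper:
    """Remembers (PR number, commit SHA) pairs that already got a comment."""

    def __init__(self):
        self._seen = set()

    def should_comment(self, pr_number: int, head_sha: str) -> bool:
        key = (pr_number, head_sha)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

An in-memory set is enough to show the idea; a real deployment would need to persist the seen pairs somewhere that survives between runs.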

32.2 Multi-Agent Coding Session Orchestration (Advanced)

Tool: Claude Code, Cursor 3 | Difficulty: Advanced | Time: 2-4 hours

Cursor 3 (April 2026) introduced unified multi-repo agent orchestration — a single workspace can coordinate agents working across separate repositories simultaneously. Use this prompt pattern to split a full-stack feature across three specialized agents: backend, frontend, and test/QA.

## Multi-Agent Session: [Feature Name]

You are the orchestrator for a 3-agent coding session. Your job is to 
decompose the feature, assign agents, prevent conflicts, and integrate 
outputs. Do not write implementation code yourself — delegate to agents.

## Feature Brief
[Describe the feature in 3-5 sentences: what it does, what data it uses, 
what API contracts it creates or modifies, and any external integrations.]

## Repository Map
- Backend repo: [path or URL — e.g., api.myapp.com at /repos/backend]
- Frontend repo: [path or URL — e.g., app.myapp.com at /repos/frontend]
- Shared types package: [path — e.g., /repos/shared-types] (if applicable)

---

## Agent 1: Backend Agent
**Scope**: [/repos/backend/src/routes, /repos/backend/src/services, /repos/backend/src/db]
**Mission**: Implement the server-side feature — database schema changes, 
business logic, and REST/GraphQL API endpoints.

**Deliverables**:
1. Database migration file for [new tables or schema changes]
2. Service layer with full business logic and error handling
3. API endpoints matching this contract:
   - [METHOD] [/path]: [description, request body, response shape]
   - [METHOD] [/path]: [description]
4. Unit tests for the service layer (90%+ coverage on new code)
5. Update /repos/shared-types with any new TypeScript interfaces

**Must NOT touch**:
- Frontend repo
- Authentication middleware (read-only)
- Existing migrations

**Handoff**: Write `agent-handoff-backend.md` with final API contracts 
and any environment variables added.

---

## Agent 2: Frontend Agent
**Scope**: [/repos/frontend/src/components, /repos/frontend/src/pages, /repos/frontend/src/hooks]
**Mission**: Implement the UI for [feature name] using the API contracts 
defined in agent-handoff-backend.md. Wait for Agent 1's handoff file 
before writing any data-fetching code.

**Deliverables**:
1. React components: [list specific components needed]
2. Data-fetching hooks using [SWR / React Query / Server Actions] 
   matching the API contract in agent-handoff-backend.md
3. Form validation for all user inputs
4. Loading, empty, and error states for all async operations
5. Responsive layout (mobile breakpoint: 640px)

**Must NOT touch**:
- Backend repo
- Auth context or session management
- Design system tokens (read-only — use existing classes)

**Handoff**: Write `agent-handoff-frontend.md` with component tree, 
prop interfaces, and any new environment variables needed.

---

## Agent 3: Test & QA Agent
**Scope**: [/repos/backend/tests, /repos/frontend/tests, /repos/frontend/e2e]
**Mission**: Write the full test suite for this feature. Start after 
Agent 1's handoff. Complete E2E tests after Agent 2's handoff.
Do NOT write implementation code — tests only.

**Deliverables**:
1. API integration tests (all endpoints: happy path + 4xx + 5xx cases)
2. Component tests for each UI component Agent 2 built
3. E2E test covering the full user flow: [describe the 3-5 step user journey]
4. A test coverage report showing new code coverage

**Must NOT touch**:
- Source code in either repo (tests and fixtures only)

**Handoff**: Write `agent-handoff-qa.md` with test results, coverage 
numbers, and any failing tests with root cause.

---

## Orchestration Rules

**Sequencing**:
1. Agent 1 runs first — do not start Agent 2 until agent-handoff-backend.md exists
2. Agent 2 and Agent 3 (API tests only) can run in parallel after Agent 1 finishes
3. Agent 3 component and E2E tests run last; they require both Agent 1 and Agent 2 to be complete

**Conflict prevention**:
- package.json / lock files: frozen during parallel work — no dependency additions
- Shared types: Agent 1 owns writes, Agents 2 and 3 read-only
- Environment files: each agent appends to a dedicated .env.[agent] file; 
  no agent modifies .env directly

**Integration checkpoint**:
When all three agents have written their handoff files, run:
1. `npm run build` in both repos — must succeed with zero errors
2. `npm test` in both repos — all tests must pass
3. `npm run e2e` — all E2E tests must pass

If any step fails, identify which agent's output caused the failure 
and assign a targeted fix task to that agent only.

**Final output**:
Write `session-summary.md` with:
- Feature implemented (what was built)
- All files changed (by repo and agent)
- Test results (pass/fail counts, coverage delta)
- Known limitations or deferred items
- Decisions made and why

Why it works: The strict scope boundaries prevent agents from stepping on each other's work. The handoff files create an explicit async interface between agents — Agent 2 cannot make assumptions about the API until Agent 1 has documented it, which eliminates the most common integration failure in multi-agent sessions.

Cursor 3 setup: Open three agent panels in the Agents Window. Paste each agent block into its respective panel. Launch Agent 1 first. Monitor agent-handoff-backend.md creation before launching Agents 2 and 3.
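The sequencing rules amount to a small dependency table keyed on handoff files. If you are launching agents by hand, a helper like the one below can tell you which tasks are unblocked at any moment. It is a sketch under assumptions: the task names are hypothetical labels for this example, and the handoff file names are the ones the prompt asks each agent to write.

```python
from pathlib import Path

# Task name -> handoff files that must exist before the task may start.
PREREQUISITES = {
    "agent1_backend": [],
    "agent2_frontend": ["agent-handoff-backend.md"],
    "agent3_api_tests": ["agent-handoff-backend.md"],
    "agent3_e2e_tests": ["agent-handoff-backend.md", "agent-handoff-frontend.md"],
}


def existing_handoffs(handoff_dir) -> set:
    """Names of the handoff files currently present in a directory."""
    return {p.name for p in Path(handoff_dir).glob("agent-handoff-*.md")}


def ready_tasks(existing, already_started=()):
    """Return tasks not yet started whose prerequisite files all exist."""
    return [
        task
        for task, needed in PREREQUISITES.items()
        if task not in already_started
        and all(name in existing for name in needed)
    ]
```

For example, `ready_tasks(existing_handoffs("."), already_started={"agent1_backend"})` lists the frontend and API-test tasks once the backend handoff file has appeared in the current directory.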

32.3 Security Audit Automation — Next.js/React OWASP Top 10 (Advanced)

Tool: Claude Code | Difficulty: Advanced | Time: 30-60 min

Use this prompt to run a comprehensive automated security audit of a Next.js or React codebase, checking for all OWASP Top 10 vulnerability classes with patterns tuned for the React/Next.js stack. Designed for one-time deep audits that complement CyberOS's continuous monitoring (https://cyberos.dev).

## Automated Security Audit: Next.js / React Codebase

Perform a systematic OWASP Top 10 security audit of this Next.js/React 
codebase. Work through each phase in sequence. Do not skip phases or 
combine them — each phase informs the next.

## Codebase Context
- Framework: Next.js [version] (App Router / Pages Router)
- Auth provider: [NextAuth / Supabase Auth / Clerk / custom]
- Database: [Supabase / Prisma + PostgreSQL / other]
- Payment handling: [Stripe / Paddle / none]
- Deployment: [Vercel / AWS / self-hosted]
- External APIs called: [list]

---

## Phase 1 — Inventory (5 min, no analysis yet)

Map the attack surface:
1. List every Route Handler (route.ts / route.js anywhere under /app) and every file in /pages/api
2. List every Server Action (files or functions marked "use server")
3. List every form or input that accepts user data
4. List every place external data is rendered to the DOM
5. List every third-party library that handles auth, payments, or user data

Output as numbered lists. Do not evaluate yet.

---

## Phase 2 — OWASP Top 10 Scan

For each item in the Phase 1 inventory, check the following. 
The A01-A10 labels below follow the OWASP Top 10 2021 numbering.
Reference CWE IDs and the exact file:line for every finding.

### A01 — Broken Access Control
- Every API route and Server Action: is auth checked server-side 
  (not relying on middleware alone)?
- Are RLS policies enforced at the database level (Supabase) or via 
  ORM-level guards (Prisma)?
- Are there IDOR risks — can a user access another user's records by 
  changing an ID parameter?
- Is the dual-layer auth pattern for CVE-2025-29927 (the Next.js middleware 
  authorization bypass) implemented? (See Category 26, Prompt 26.3)

### A02 — Cryptographic Failures
- Are passwords hashed with bcrypt or argon2 (not SHA-1/MD5)?
- Is HTTPS enforced with HSTS headers?
- Are any secrets or tokens returned in API responses or logged?
- Are JWTs validated on every request (not just on login)?

### A03 — Injection
- Are all database queries parameterized? 
  Flag any string concatenation in SQL or ORM raw queries.
- Is there risk of command injection in any child_process or exec calls?
- Server Actions: is user input sanitized before use in database operations?
- Are URL and path parameters validated before use in filesystem operations?

### A04 — Insecure Design
- Are there rate limits on authentication endpoints?
- Are there rate limits on resource-intensive API routes 
  (e.g., AI generation, file processing)?
- Is there a mechanism to revoke sessions on password change or logout?
- Are webhook endpoints (Stripe, etc.) verifying signatures?

### A05 — Security Misconfiguration
- Are security headers set: CSP, X-Frame-Options, X-Content-Type-Options, 
  Referrer-Policy, Permissions-Policy?
- Are CORS origins restricted (not "*")?
- Are error responses generic (no stack traces or internal paths leaking)?
- Are Next.js server components accidentally exposing server-side data 
  in client bundles?

### A06 — Vulnerable Components
- Run: `npm audit --audit-level=high`
- Flag any dependencies with known CVEs (severity: high or critical)
- Flag any dependencies last updated more than 18 months ago that handle 
  auth, crypto, or user data

### A07 — Auth and Session Failures
- Are session tokens HTTP-only cookies (not localStorage)?
- Are session IDs regenerated after login (session fixation prevention)?
- Is "remember me" implemented with a separate long-lived token 
  (not just extending the session)?
- Are failed login attempts rate-limited and logged?

### A08 — Software and Data Integrity
- Are all npm install commands run with a lockfile (`npm ci`, not `npm install`)?
- Are GitHub Actions using pinned SHA hashes for third-party actions 
  (not floating tags like @v3)?
- Are Stripe/webhook payloads verified with HMAC signatures 
  before processing?

### A09 — Logging and Monitoring
- Are security events logged: login success, login failure, 
  auth failure on protected routes?
- Are logs sanitized — no passwords, tokens, or PII in log output?
- Is there alerting for repeated auth failures (possible brute force)?

### A10 — Server-Side Request Forgery (SSRF)
- Are there any routes that fetch a URL provided by the user?
- If yes: is the URL validated against an allowlist of safe domains?
- Are internal metadata endpoints blocked (e.g., AWS 169.254.169.254 and the 
  rest of the 169.254.0.0/16 link-local range)?

---

## Phase 3 — Severity Classification

For every finding, output a row in this table:

| # | OWASP Category | CWE | Severity | File | Line | Description | Fix |
|---|---------------|-----|----------|------|------|-------------|-----|
| 1 | A01 | CWE-284 | CRITICAL | ... | ... | ... | ... |

Severity levels:
- CRITICAL: exploitable remotely, data exposure or full auth bypass
- HIGH: requires auth but leads to significant data or privilege risk
- MEDIUM: requires specific conditions, limited impact
- LOW: defense-in-depth gap, no direct exploitability
- INFO: best practice deviation, no current risk

---

## Phase 4 — Remediation

For every CRITICAL and HIGH finding:
1. Show the vulnerable code (before)
2. Show the fixed code (after)
3. One-sentence explanation of why the fix closes the vulnerability
4. Link to the relevant OWASP cheat sheet or CyberOS pattern

For MEDIUM findings: provide the fix code only (no explanation needed).

For LOW and INFO: list as a bullet with the file location.

---

## Phase 5 — Verification

After all remediations are written:
1. Re-check each CRITICAL and HIGH finding — confirm the fix addresses 
   the root cause, not just the symptom
2. Check that no fix introduced a new vulnerability 
   (e.g., error handling that leaks internals)
3. Output a final sign-off table:

| Finding # | Status | Notes |
|-----------|--------|-------|
| 1 | FIXED | ... |
| 2 | DEFERRED | reason |

---

## Output Summary
At the end of all phases, produce:
- Total findings by severity (CRITICAL: N, HIGH: N, MEDIUM: N, LOW: N, INFO: N)
- Top 3 risk areas in this codebase
- Recommended next step (e.g., "Schedule penetration test focusing on A01 
  and A03 findings", "Integrate CyberOS for continuous monitoring")

Begin with Phase 1. Confirm the inventory is complete before proceeding.

Why it works: The phased structure prevents the common failure mode where an LLM jumps to fixes before fully mapping the attack surface. Forcing an inventory pass first gives the scan a fixed checklist to work through, so an item is far less likely to be skipped because the model got absorbed in one interesting vulnerability.
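As one concrete instance of what the A10 checks in the prompt are looking for, the sketch below validates a user-supplied URL against an allowlist and rejects hosts that resolve to private, loopback, or link-local addresses (which includes the cloud metadata address 169.254.169.254). It uses only Python's standard library. The allowlisted host names and the function names are hypothetical, and the `resolve` parameter exists so the DNS lookup can be swapped out in tests.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

# Hypothetical allowlist; replace with hosts your routes legitimately call.
ALLOWED_HOSTS = frozenset({"api.example.com", "images.example.com"})


def is_public_ip(address: str) -> bool:
    """True only for addresses outside private and special-purpose ranges."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return False
    return not (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_reserved or ip.is_multicast or ip.is_unspecified
    )


def is_safe_fetch_target(url: str, resolve=socket.getaddrinfo) -> bool:
    """Allow only HTTPS URLs on allowlisted hosts that resolve to public IPs."""
    parts = urlsplit(url)
    if parts.scheme != "https" or parts.hostname is None:
        return False
    if parts.hostname not in ALLOWED_HOSTS:
        return False
    try:
        infos = resolve(parts.hostname, parts.port or 443)
    except (OSError, ValueError):
        return False
    # Every resolved address must be public, not just the first one.
    return bool(infos) and all(is_public_ip(info[4][0]) for info in infos)
```

A check like this narrows the attack surface but does not close it on its own: the host could resolve differently between the check and the actual request, so the fetch itself should ideally connect to the address that was validated.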

CyberOS integration: This prompt covers the same OWASP Top 10 categories as CyberOS's static analysis engine (https://cyberos.dev). Use this for on-demand deep audits, and CyberOS for continuous PR-level scanning. The findings from this audit can be imported into CyberOS as baseline issues.

Pairs with: Prompt 31.3 (Generated Code Security Review) for ongoing review of new code, and Prompt 30.2 (AI Security Audit Chain) for systematic multi-phase audit chaining.
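The webhook checks in A04 and A08 of the audit prompt ask whether payloads are verified with an HMAC signature before processing. The general technique looks like the sketch below, written with Python's standard `hmac` and `hashlib` modules. This is a generic HMAC-SHA256 scheme for illustration and is not Stripe's actual signature format; for a real provider, use the verification helper in that provider's official SDK. The function names are hypothetical.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, payload: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify_webhook(secret: bytes, payload: bytes, received_signature: str) -> bool:
    """Check a signature against the raw body before any parsing happens."""
    expected = sign_payload(secret, payload)
    # compare_digest runs in constant time, so the comparison does not leak
    # how many leading characters of the signature were correct.
    return hmac.compare_digest(
        expected.encode("utf-8"), received_signature.encode("utf-8")
    )
```

The important details for an audit are that the signature is computed over the raw bytes of the body (not a re-serialized JSON object) and that the comparison is constant-time rather than a plain `==`.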

Category 33: Claude Opus 4.7 — xhigh Effort, Vision & Self-Verification

Released April 16, 2026: Claude Opus 4.7 introduced three capabilities with immediate impact on vibe coding workflows — an xhigh effort level for extended reasoning, 3.3x higher-resolution vision, and self-verification on agentic tasks. These prompts are tuned specifically for Opus 4.7 and will not produce the same results on earlier models.

33.1 xhigh Effort Architectural Reasoning (Expert)

Tool: Claude Code (Opus 4.7) | Difficulty: Expert | Time: 15-30 min

Use Opus 4.7's xhigh effort level for decisions that are hard to reverse — database schema choices, authentication architecture, API design. The extended thinking mode considers more edge cases and provides more honest uncertainty quantification than standard effort.

<effort>xhigh</effort>

You are a senior software architect. I need your deepest analysis on this decision.

## Decision Required
[Describe the architectural choice in 1-3 sentences — e.g., "Should I use a 
single Postgres database with RLS for multi-tenancy, or separate schemas per tenant?"]

## System Context
- Scale target: [current users / projected 12-month users]
- Team size: [N engineers, their experience level]
- Current stack: [list key technologies]
- Budget constraints: [infrastructure budget, or "cost-sensitive / not a constraint"]
- Timeline: [when does this need to be production-ready]

## Constraints (non-negotiable)
- [Constraint 1 — e.g., "Must work with Supabase — no custom database infra"]
- [Constraint 2]

## Options Under Consideration
### Option A: [name]
[Brief description]
Perceived pros: [list]
Perceived cons: [list]

### Option B: [name]
[Brief description]
Perceived pros: [list]
Perceived cons: [list]

## What I'm Uncertain About
[The specific thing that makes this decision hard — e.g., "I don't know how 
RLS performs at 100k rows per tenant with complex join queries"]

## Output Required
1. Your recommendation (Option A, B, or a hybrid) with confidence level (0-100%)
2. The 3 most important factors that drove your recommendation
3. The scenario under which your recommendation would be wrong
4. The first concrete implementation step if I go with your recommendation
5. Red flags to watch for in the first 30 days of implementation

Take as long as you need to reason through this. Don't truncate the reasoning.

Why it works: The <effort>xhigh</effort> tag signals Opus 4.7 to enter extended thinking mode. For complex architectural questions, the additional compute produces answers that consider more edge cases, catch more subtle interactions, and provide more honest uncertainty quantification than standard responses.

When to use xhigh: Save it for decisions that are hard to reverse — architectural choices, security design, data modeling. Don't use it for quick questions where standard effort is adequate.

33.2 Vision-Enhanced UI Debugging (Intermediate)

Tool: Claude Code (Opus 4.7) | Difficulty: Intermediate | Time: 10-20 min

Opus 4.7's 3.3x higher-resolution vision support means it can now read detailed UI screenshots, identify small alignment issues, read small-print error messages, and compare designs at pixel level. Use this pattern for UI debugging and visual regression analysis.

[Attach screenshot of UI bug or visual issue]

You are a senior frontend engineer debugging a visual problem. The screenshot shows:
[Brief description of what you're looking at]

## What I need
1. Identify all visible UI problems in this screenshot — layout issues, spacing
   inconsistencies, color/contrast problems, text truncation, alignment bugs
2. For each problem, hypothesize the CSS or component cause
3. Rank by severity: (a) breaks functionality, (b) fails WCAG contrast, (c) looks wrong

## Codebase context
- Framework: [React/Next.js/Vue/etc]
- CSS approach: [Tailwind/CSS Modules/styled-components/etc]
- Key component files: [relevant file paths]

Then check the relevant component files and propose a specific fix for the
highest-severity issue first.

Why it works: The 3.3x vision resolution lets Opus 4.7 read small-print labels, identify subtle misalignment (off by 2px), and distinguish similar colors that previous models couldn't differentiate. Pairing the visual analysis with codebase access creates a loop where the model reads the pixel output and the source simultaneously.
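When the model reports a "fails WCAG contrast" issue, you can confirm it numerically once you have the two hex colors. The sketch below implements the WCAG 2.x contrast ratio formula (relative luminance of each color, then (L1 + 0.05) / (L2 + 0.05)) and the AA thresholds of 4.5:1 for normal text and 3:1 for large text. The function names are hypothetical, and the code assumes six-digit hex colors.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to its linear-light value."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """WCAG relative luminance of a color given as '#rrggbb'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio between two colors, from 1.0 up to 21.0."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """WCAG AA: at least 4.5:1 for normal text, 3:1 for large text."""
    threshold = 3.0 if large_text else 4.5
    return contrast_ratio(foreground, background) >= threshold
```

Black on white gives the maximum ratio of 21:1. Gray `#767676` on white passes AA for normal text, while the slightly lighter `#777777` falls just short, which is the sort of near-miss that is hard to judge by eye.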

33.3 Self-Verifying Agent Task (Advanced)

Tool: Claude Code (Opus 4.7) | Difficulty: Advanced | Time: 30-90 min

Opus 4.7 added self-verification on agentic tasks — the model can now flag when it has low confidence in its own output and request human confirmation before proceeding. This prompt pattern is designed to take advantage of that capability for high-stakes automated tasks.

You are executing a high-stakes automated task. Opus 4.7 self-verification is enabled.

## Task
[Describe the task in detail]

## Self-Verification Protocol
At each decision point where you are >15% uncertain about the correct action:
1. STOP and output: VERIFICATION_REQUIRED: [describe what you're uncertain about]
2. List the options you're considering and your confidence in each
3. Wait for my confirmation before proceeding

## High-Stakes Actions That Always Require Verification
- Deleting or overwriting files not in the explicit scope
- Making API calls that cost money or have rate limits
- Modifying database schemas or running migrations
- Changing authentication or authorization logic
- Publishing or deploying to production environments

## Success Criteria
[What does "done" look like? How will you verify you succeeded?]

Begin. If you complete the first phase without a VERIFICATION_REQUIRED, confirm
the phase is done and your confidence level before continuing to the next phase.

Why it works: This prompt makes Opus 4.7's self-verification explicit and structured. By defining an uncertainty threshold (15%) and listing high-stakes action categories, you get an agent that asks for help when it genuinely needs it rather than either proceeding blindly or asking about everything.
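If you wrap the agent in your own harness, the same protocol can be enforced outside the model as a second line of defense. The sketch below expresses the two rules from the prompt as a single predicate. It is an illustration only: the category labels are hypothetical names for the five high-stakes actions listed in the prompt, and it assumes the agent reports confidence as a whole-number percentage.

```python
# Hypothetical labels for the prompt's "always verify" action list.
ALWAYS_VERIFY = frozenset({
    "delete_out_of_scope_file",
    "paid_or_rate_limited_api_call",
    "schema_change_or_migration",
    "auth_logic_change",
    "production_deploy",
})
UNCERTAINTY_THRESHOLD_PCT = 15


def needs_verification(action_category: str, confidence_pct: int) -> bool:
    """True when the action must pause for human confirmation."""
    if not 0 <= confidence_pct <= 100:
        raise ValueError("confidence_pct must be between 0 and 100")
    if action_category in ALWAYS_VERIFY:
        return True
    # More than 15% uncertain means confidence below 85%.
    return (100 - confidence_pct) > UNCERTAINTY_THRESHOLD_PCT
```

Integer percentages are used on purpose: with floating-point fractions, a confidence of exactly 0.85 can land on the wrong side of the threshold because of rounding.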

Integration with CyberOS: For tasks involving security-sensitive operations, pair this with CyberOS's continuous monitoring so any unexpected file modifications or API calls are flagged independently.

Category 34: Claude Design & AI-Assisted Visual Creation

Launched April 17, 2026: Anthropic introduced Claude Design, extending Claude's capabilities into rapid visual content generation. These prompts cover workflows for using Claude Design alongside Claude Code for visual asset creation — from brand assets to landing page design to marketing graphics — integrated into the vibe coding workflow.

34.1 Brand Asset Sprint (Beginner)

Tool: Claude Design, Claude Code | Difficulty: Beginner | Time: 30-60 min

Use Claude Design to generate a complete brand asset pack for a new vibe-coded project. This prompt produces a design brief that Claude Design can execute directly, giving you logo concepts, color palettes, and icon sets in one session.

I'm creating brand assets for a new product called [Product Name].

## Product Summary
[2-3 sentences: what it does, who uses it, what feeling it should evoke]

## Brand Personality
Choose 3 adjectives that describe the brand: [e.g., modern / trustworthy / playful]

## Audience
Primary users: [who they are — age range, technical sophistication, context of use]

## Design Direction
- Style preference: [minimal / bold / corporate / friendly / technical / expressive]
- Color mood: [warm / cool / neutral / vibrant / muted]
- Reference brands I like: [1-3 brand names with notes on what you like]
- Reference brands to avoid: [1-2 brand names that feel wrong]
- Logo type preference: [wordmark / icon + wordmark / icon only / abstract mark]

## Assets Needed
1. Primary logo (light background)
2. Primary logo (dark background / inverted)
3. Favicon / app icon (square, 512×512)
4. Social media profile image (1:1 ratio)
5. Color palette: 1 primary, 1 accent, 2 neutrals (light + dark), 1 semantic (error/warning)
6. Typography pairing: heading font + body font (Google Fonts preferred)
7. 3 icon style examples (outline / filled / duotone — whichever fits the style)

## Output Format
For each asset, provide:
- Visual description precise enough for a designer or AI image tool to recreate
- Hex codes for all colors
- Font names and weights for typography
- A short rationale explaining why each choice fits the brand

Start with the color palette and typography — everything else should derive from those foundations.

Why it works: Claude Design's visual understanding lets it generate coherent brand systems rather than isolated assets. By front-loading the palette and type decisions, you get downstream assets that feel intentional rather than assembled from unrelated pieces.

Follow-up: Feed the output from this prompt directly into Claude Design's visual canvas to generate image mockups. Use the hex codes and font names in your Tailwind config (tailwind.config.ts) to wire the brand into the codebase in minutes.

34.2 Landing Page Hero Design Spec (Intermediate)

Tool: Claude Design, Cursor, Claude Code | Difficulty: Intermediate | Time: 20-45 min

Generate a detailed design spec for a landing page hero section — precise enough for Cursor to implement directly into Tailwind/React without ambiguity. Bridges the gap between visual concept and production code.

Design a landing page hero section for [Product Name], a [brief description].

## Goal of the Hero
The hero must communicate: [what the product does] + [who it's for] + [why to care]
in under 5 seconds. Primary CTA: [button text and action].

## Brand Context
- Primary color: [hex]
- Accent color: [hex]
- Background: [hex or gradient description]
- Heading font: [font name, weight]
- Body font: [font name, weight]
- Tone: [formal / casual / technical / playful]

## Layout Requirements
- Viewport: Full-screen (100vh) on desktop, auto-height on mobile
- Layout type: [centered / left-aligned / split (text left, visual right)]
- Visual element: [illustration / screenshot / animation / abstract shape / none]
- Navigation: [sticky top bar / transparent overlay / none]

## Content to Include
- Headline: [your draft or "generate 3 options"]
- Subheadline: [your draft or "generate 3 options"]
- Social proof element: [logos / testimonial quote / stat / none]
- CTA button: Primary "[text]" + Secondary "[text]" (optional)
- Trust signals: [e.g., "No credit card required", "Used by 2,000+ developers"]

## Responsive Behavior
- Desktop (1280px): [describe layout]
- Tablet (768px): [any changes — stack columns, reduce font sizes, etc.]
- Mobile (375px): [headline size, single-column, CTA full-width]

## Output Format
Provide:
1. Annotated wireframe description (text-based — every element, position, spacing)
2. Tailwind CSS class recommendations for each element
3. Copy variants (3 headline options, 2 subheadline options)
4. Animation suggestions (entrance animation, hover states) — optional, flag if they
   add distraction rather than clarity

Then implement the hero as a self-contained React component using Tailwind.

Why it works: By asking for both the design spec and the implementation in the same prompt, you skip the translation step where a design mockup loses fidelity going into code. The Tailwind class output means Cursor can implement the exact design without reinterpretation.

Pairs with: Prompt 34.1 (Brand Asset Sprint) for the color palette and font choices. Prompt 1.3 (Landing Page from Zero) in Category 1 for the full page structure beyond the hero.

34.3 Visual Content Brief for Consistent AI Generation (Advanced)

Tool: Claude Design, Claude Code (Opus 4.7) | Difficulty: Advanced | Time: 45-90 min

Create a visual content system specification — a single source of truth document that ensures all AI-generated visuals for a product feel like they belong to the same brand. Solves the consistency problem when generating marketing graphics, blog thumbnails, social posts, and UI illustrations over time.

## Visual Content System Specification

I need a visual content system for [Product Name] that ensures consistency across
all AI-generated images and graphics. This system will be used by Claude Design,
Midjourney, DALL-E 3, and Stable Diffusion to produce assets over the next 12 months.

## Brand Foundation (already defined)
- Logo: [description or attachment]
- Primary palette: [hex codes with role labels — primary, accent, background, text]
- Typography: [heading and body font names]
- Tone adjectives: [3 words that describe the brand personality]

## Asset Categories to Define
For each category, specify the visual style, composition rules, and example prompt template:

### Category A: Blog / Article Thumbnails (1200×628px)
- Use case: [website blog, newsletter, LinkedIn posts]
- Volume: ~[N] per month
- Visual style: [abstract / illustrative / photographic / typographic]

### Category B: Social Media Graphics (1:1, 9:16, 16:9)
- Use case: [Twitter/X, LinkedIn, Instagram]
- Volume: ~[N] per month
- Visual style: [consistent with A / more casual / motion-focused]

### Category C: Product Screenshots & Mockups
- Use case: [landing page, app store, documentation]
- Volume: ~[N] per quarter
- Visual style: [clean device mockup / contextual scene / abstract UI fragment]

### Category D: Icons & Illustrations (if applicable)
- Use case: [empty states, feature explainers, onboarding]
- Style: [flat / isometric / line art / 3D]

## Constraints
- Must never use: [specific visual elements to avoid — stock photo clichés,
  specific color combinations that conflict with brand, visual motifs from competitors]
- Must always include: [brand element in every image — subtle color, pattern, etc.]
- Accessibility: all text in images must meet WCAG AA contrast (4.5:1 minimum)

## Deliverables

1. **Style Guide**: 2-3 paragraphs defining the visual language in words
2. **Color Application Rules**: When to use primary vs. accent, background rules,
   gradient usage policy
3. **Reusable Prompt Templates**: For each category, a parameterized prompt template
   like: "[Category A template]: A [adjective] [composition] depicting [subject] for
   [brand name], using [colors], [style description], [technical specs]"
4. **Negative Prompt Library**: 10-15 terms to consistently exclude across all
   AI image generation to maintain brand safety and visual consistency
5. **Quality Checklist**: 5-point check before publishing any AI-generated asset
   (brand colors present, text legible, no AI artifacts, consistent style,
   no competitor visual cues)

Generate all five deliverables. For the prompt templates, test each one by
writing an example output description of what the image would look like.

Why it works: The consistency problem in AI visual generation comes from re-describing the brand each time you need an asset. A visual content system document solves this by encoding the brand DNA into reusable prompt fragments — Claude Design, Midjourney, and DALL-E 3 all respond to the same parameterized templates, producing visuals that read as siblings rather than strangers.

Production integration: Save this document as visual-content-system.md in your project root. Reference it at the start of every visual generation session: "Using the system defined in visual-content-system.md, generate [asset type]." Claude Design can read it directly as context.
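
The accessibility constraint in the prompt (WCAG AA, 4.5:1 minimum) can be checked mechanically once you know the text and background colors of a generated asset. The following is a minimal sketch of the WCAG 2.x contrast-ratio formula; the function names are illustrative, and it assumes 6-digit hex colors and the 4.5:1 threshold for normal-size text.

```typescript
// Convert one 8-bit sRGB channel to its linear value (WCAG 2.x definition).
function channelToLinear(value8bit: number): number {
  const c = value8bit / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a color given as "#rrggbb" or "rrggbb".
function relativeLuminance(hex: string): number {
  const clean = hex.replace("#", "");
  if (!/^[0-9a-fA-F]{6}$/.test(clean)) {
    throw new Error(`Expected a 6-digit hex color, got "${hex}"`);
  }
  const r = parseInt(clean.slice(0, 2), 16);
  const g = parseInt(clean.slice(2, 4), 16);
  const b = parseInt(clean.slice(4, 6), 16);
  return (
    0.2126 * channelToLinear(r) +
    0.7152 * channelToLinear(g) +
    0.0722 * channelToLinear(b)
  );
}

// Contrast ratio between two colors, from 1 (identical) up to 21 (black on white).
function contrastRatio(foreground: string, background: string): number {
  const l1 = relativeLuminance(foreground);
  const l2 = relativeLuminance(background);
  const lighter = Math.max(l1, l2);
  const darker = Math.min(l1, l2);
  return (lighter + 0.05) / (darker + 0.05);
}

// True when the pair meets the 4.5:1 AA minimum named in the prompt.
function meetsWcagAaBodyText(foreground: string, background: string): boolean {
  return contrastRatio(foreground, background) >= 4.5;
}
```

A check like this fits the "text legible" item of the quality checklist: sample the text and background colors from the asset and reject anything below the threshold.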

Cross-link: CyberOS brand toolkit for security-focused products needing consistent trust-signal visuals. vibe-coding.academy for the course on building complete brand systems with AI tools.

Category 35: Claude Code Routines & Automation Prompts (New — April 2026)

These prompts are designed for Claude Code's Routines feature (launched April 2026), which runs saved workflows automatically on Anthropic's cloud infrastructure — triggered by GitHub events or cron schedules.

35.1 Automated Dependency Audit Routine (Intermediate)

Tool: Claude Code Routines | Trigger: Weekly cron | Time: Runs overnight

Deploy as a weekly cron Routine to audit all dependencies for CVEs, breaking changes, and outdated packages — then file a single consolidated GitHub issue with a prioritized upgrade plan.

You are a dependency security auditor running a weekly scan.

## Your task
1. Run `npm audit --json` (or equivalent for the project's package manager) and parse the output
2. Run `npx npm-check-updates --jsonUpgraded` to identify outdated packages
3. Check the GitHub Security Advisories API for CVEs affecting any direct dependency
4. Cross-reference CVEs against the CISA Known Exploited Vulnerabilities catalog

## Prioritization framework
- P0 (File GitHub issue + comment on all open PRs): CVSS >= 9.0 CVEs in direct deps
- P1 (File GitHub issue): CVSS 7.0-8.9 CVEs, or packages > 2 major versions behind
- P2 (Add to weekly report): Minor/patch updates, low-severity advisories
- P3 (Skip): Dev-only dependencies with no production surface

## GitHub issue format
Title: `[Security] Weekly dependency audit — {DATE}`

Do not open a PR. File the issue only. Mark it with labels: `security`, `dependencies`.
If zero issues found: close any open dependency audit issues from previous weeks and post
a comment: "Weekly dependency scan {DATE}: No critical issues found."

Why it works: Manual dependency audits happen inconsistently — usually only when a CVE alert lands in your inbox, meaning you're already reactive. A Routine that runs every Monday at 2am means your team starts every week knowing their exposure.

Setup: Claude Code → Settings → Routines → New. Trigger: 0 2 * * 1 (every Monday at 2am). Connect GitHub. Paste prompt.
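
The prioritization framework in the prompt can also be written down as a small function, which is useful for unit-testing the thresholds before trusting a Routine with them. This is a sketch under stated assumptions: the `DependencyFinding` shape is hypothetical, dev-only dependencies are checked first, and a CVSS >= 9.0 finding in a transitive dependency falls through to P1 (the prompt does not say how to treat that case).

```typescript
type Priority = "P0" | "P1" | "P2" | "P3";

// Hypothetical shape for one audited dependency; not a real npm or GitHub API type.
interface DependencyFinding {
  name: string;
  cvss: number | null; // highest CVSS score among known CVEs, null if none
  isDirect: boolean; // listed directly in the project manifest
  isDevOnly: boolean; // dev dependency with no production surface
  majorVersionsBehind: number;
}

function classifyFinding(f: DependencyFinding): Priority {
  // Assumption: dev-only dependencies are skipped regardless of severity.
  if (f.isDevOnly) return "P3";
  // P0: critical CVE in a direct dependency.
  if (f.cvss !== null && f.cvss >= 9.0 && f.isDirect) return "P0";
  // P1: high-severity CVE, or more than 2 major versions behind.
  if ((f.cvss !== null && f.cvss >= 7.0) || f.majorVersionsBehind > 2) return "P1";
  // P2: everything else (minor/patch updates, low-severity advisories).
  return "P2";
}
```

If your team wants critical CVEs in dev tooling surfaced as well, move the `isDevOnly` check below the P0 rule.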

35.2 PR Quality Gate Routine (Beginner)

Tool: Claude Code Routines | Trigger: GitHub PR opened | Time: 2-3 min per PR

Run this Routine on every new pull request. It checks code quality, security, and test coverage gaps before a human reviewer looks at the diff.

You are a PR quality gate. Review the attached pull request diff and produce a
structured assessment. Do not approve or request changes — post a comment only.

Review for:
1. Security: OWASP Top 10, hardcoded secrets, missing auth checks on new endpoints
2. Code quality: functions >50 lines, duplicate code, broad TypeScript `any` types,
   missing async error handling, console.log in production paths
3. Test coverage: new functions with no test changes, API endpoints with no integration test
4. PR hygiene: description matches diff, breaking changes flagged

Output as a GitHub comment:

**Automated PR Review**

| Category | Status | Details |
|----------|--------|---------|
| Security | Pass / Issues | [summary] |
| Code Quality | Pass / Issues | [summary] |
| Test Coverage | Pass / Issues | [summary] |

Issues requiring action before merge: [list with file:line, or "None."]
Suggestions (non-blocking): [list, or "None."]

_Automated review. Final approval requires human review._

Why it works: It routes mechanical checks to automation so human reviewers can spend their time on architecture and business-logic decisions. Teams using automated first-pass review report 30–40% shorter human review cycles.

35.3 Daily Release Notes Generator (Intermediate)

Tool: Claude Code Routines | Trigger: Daily cron (9am) | Time: 5-10 min

Generates human-readable release notes from yesterday's merged PRs and adds them to the top of CHANGELOG.md automatically.

You are a technical writer generating daily release notes.

1. Fetch all PRs merged into `main` in the last 24 hours
2. Group by category from PR labels or commit prefix: feat/fix/perf/security/docs/chore
3. Write 1-3 sentence plain-English summaries of each change
4. Identify breaking changes (look for "BREAKING" in PR titles or descriptions)

Insert at the top of CHANGELOG.md, above all existing entries:

## {DATE}

### Breaking Changes
[If any. Omit section if none.]

### New Features
- **[Feature name]**: [1-2 sentence description]

### Bug Fixes
- **[What was broken]**: [What was fixed]

### Security
- [Specific CVE/issue patched]

Rules:
- If no PRs merged: append `## {DATE}\n_No changes merged._`
- Never overwrite existing CHANGELOG entries
- Commit with message: `docs: daily release notes {DATE}`

Why it works: CHANGELOG debt is universal — teams know they should maintain it but rarely do consistently. A Routine removes the friction entirely. The CHANGELOG stays accurate at zero ongoing cost.
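
The two rules that matter most here (newest entry first, never overwrite existing entries) are easy to express in code. The sketch below is illustrative only: the `MergedPr` shape and function names are assumptions, it covers just the sections shown in the template, and it leaves out `perf`, `docs`, and `chore` PRs.

```typescript
// Hypothetical shape for a merged PR after the Routine has summarized it.
interface MergedPr {
  title: string; // e.g. "feat: add refund endpoint"
  description: string; // PR body, scanned for "BREAKING"
  summary: string; // plain-English summary line for the changelog
}

// Commit-prefix to section-heading mapping, limited to the template's sections.
const SECTIONS: ReadonlyArray<[string, string]> = [
  ["feat", "New Features"],
  ["fix", "Bug Fixes"],
  ["security", "Security"],
];

function buildEntry(date: string, prs: MergedPr[]): string {
  if (prs.length === 0) return `## ${date}\n_No changes merged._\n`;
  const lines: string[] = [`## ${date}`, ""];
  const breaking = prs.filter(
    (pr) => pr.title.includes("BREAKING") || pr.description.includes("BREAKING"),
  );
  if (breaking.length > 0) {
    lines.push("### Breaking Changes", ...breaking.map((pr) => `- ${pr.summary}`), "");
  }
  for (const [prefix, heading] of SECTIONS) {
    const matches = prs.filter(
      (pr) => pr.title.startsWith(`${prefix}:`) || pr.title.startsWith(`${prefix}(`),
    );
    if (matches.length === 0) continue; // omit empty sections
    lines.push(`### ${heading}`, ...matches.map((pr) => `- ${pr.summary}`), "");
  }
  return lines.join("\n");
}

// Prepend the new entry; existing changelog text is passed through untouched.
function prependEntry(existingChangelog: string, entry: string): string {
  return `${entry.trimEnd()}\n\n${existingChangelog}`;
}
```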

Cross-link: → EndOfCoding.com for the full article on Claude Code Routines. → LLMHire.com for AI Automation Architect roles (this skill commands a $28K salary premium).

Category 36: Context Engineering Prompts (New — April 2026)

"Context engineering", a term popularized in mid-2025 by Tobi Lütke (Shopify CEO) and rapidly adopted across the industry, is the discipline of structuring what you put into an AI's context window to maximize output quality. With Claude's 1M-token context and $200/mo Max plan, context management is now a primary vibe coding skill.

36.1 Legacy Codebase Context Map (Beginner)

Tool: Claude Code | Time: 15-20 min | Context: 1M tokens ideal

Use this at the start of any engagement with an unfamiliar or legacy codebase. It builds a mental model for Claude that persists across the session, dramatically reducing hallucination and incorrect assumptions.

I'm about to ask you to work on a large existing codebase. Before I give you
any tasks, I want to load you with the context you need to reason accurately.

## Codebase overview
[Paste your README or write 2-3 sentences describing the product]

## Tech stack
- Language: [e.g., TypeScript, Python]
- Framework: [e.g., Next.js 15, FastAPI]
- Database: [e.g., PostgreSQL via Supabase]
- Deployment: [e.g., Vercel + Railway]
- Key dependencies: [list 5-10 most important packages]

## Architecture pattern
[Describe in 2-3 sentences: monolith vs. microservices, how data flows, where business logic lives]

## Naming conventions
- Files: [e.g., kebab-case for components, camelCase for utils]
- DB tables: [e.g., snake_case, plural]
- API routes: [e.g., /api/v1/resource]
- Env vars: [e.g., NEXT_PUBLIC_ prefix for client-safe vars]

## What NOT to touch
[List any files, modules, or patterns to avoid — e.g., "Don't modify auth middleware, it's vendor-managed"]

## Current known issues
[List 3-5 open bugs or technical debt items so Claude doesn't re-introduce them]

Acknowledge this context and tell me what you understand about the codebase
before I give you your first task.

Why it works: Without this upfront loading, Claude infers conventions from what it sees in each individual file — and can contradict itself across a session. This prompt anchors a shared mental model that holds for the entire working session.

Pro tip: Save this filled-in template as CLAUDE_CONTEXT.md in your repo root. Paste its contents at session start, or reference it as a Routine pre-step.

36.2 Rolling Summary Context Compression (Intermediate)

Tool: Claude Code, Claude.ai | Time: 5 min per compression cycle | Context: Any size

Long conversations drift. After ~20 exchanges, earlier decisions get forgotten and Claude starts making inconsistent choices. This prompt compresses your session state into a portable summary you can paste into a fresh context window.

We've been working together for a while. Before continuing, I need you to create
a compressed context summary I can paste into a new session.

Write a structured summary with these sections:

## Project State
- What we're building: [1 sentence]
- Current milestone: [what we're working on right now]
- Completion status: [% done, what's left]

## Decisions Made (Do Not Revisit)
[List every architectural, naming, or technical decision we've committed to —
 even if it feels suboptimal. These are locked.]

## Active Constraints
[List every constraint that's shaped our decisions: performance requirements,
 team conventions, third-party limitations, deadlines]

## Mistakes to Avoid
[List every wrong path, failed approach, or anti-pattern we've already ruled out —
 with 1 sentence on why it was rejected]

## Current Task State
[Describe exactly where we left off — what was last completed, what's in progress,
 what the immediate next step is]

## Files Modified This Session
[List every file touched, with 1-sentence description of what changed]

Format this for copy-paste into a new Claude session. The summary should be
complete enough that a fresh Claude instance can continue seamlessly with zero
catch-up questions.

Why it works: Context compression is the single highest-leverage technique for long vibe coding sessions. Teams using this report 60–70% reduction in "wait, I thought we decided..." regressions. It also makes sessions resumable across days.

36.3 Multi-File Feature Context Bundle (Advanced)

Tool: Claude Code | Time: 5 min setup, saves hours | Context: Targeted loading

When implementing a new feature that touches 5+ files, Claude needs to see all relevant code simultaneously to avoid making changes that break other parts of the system. This prompt guides you through building the right context bundle before writing any code.

I'm about to implement: [feature name in 1 sentence]

Before writing any code, help me identify every file that could be affected
and what I need to know about each one.

## Feature description
[2-3 sentences on what the feature does, what user-facing behaviour it changes,
 and what data it reads/writes]

## Entry points
[Where does this feature start? e.g., "New API endpoint at /api/payments/refund"
 or "New button in the checkout flow"]

Based on this, please:
1. List every file likely to need modification (with filepath and why)
2. List every file I should READ but not modify (key context for side effects)
3. Identify any circular dependencies or layering violations to watch for
4. Flag any existing tests I must update
5. Estimate total lines-of-change and rate the blast radius: Low / Medium / High

Then read the files you've listed and summarize what you learn about each
before we write a single line of new code.

Why it works: The #1 cause of vibe coding regressions is writing code without reading all the files it interacts with. This prompt forces a "read phase" before any "write phase" — identical to how senior engineers approach large features. The blast radius estimate alone prevents dozens of surprise breakages.

Cross-link: → EndOfCoding.com for the deep-dive on context engineering techniques. → Vibe Coding Academy for the Context Mastery course module (covers CLAUDE.md, context windows, and session hygiene).

Category 37: Agentic Engineering Prompts (New — April 2026)

Andrej Karpathy coined "agentic engineering" in April 2026 — the professional evolution beyond vibe coding. Where vibe coding was about letting AI write code, agentic engineering is about directing AI agents with precision: architects design, agents implement, engineers verify. These prompts operationalize that workflow.

37.1 The Agentic Engineering Brief (Intermediate)

Tool: Claude Code, Cursor 3 | Time: 10-15 min | Category: Project Architecture

Inspired by: Karpathy's "agentic engineering" reframe — humans architect, agents implement.

I'm building [product/feature name]. Before writing any code, help me create an Agentic Engineering Brief:

## What I'm Building
[One paragraph description]

## Agent Task Breakdown
Decompose this into discrete tasks that an AI agent can execute autonomously:
1. [Task type: research/scaffold/implement/test/review]
2. ...

## Human Decision Points
Where do I need to review and approve before the agent continues:
- After: [milestone 1]
- After: [milestone 2]

## Acceptance Criteria
How will I know each task is complete and correct:
- [Measurable criterion 1]
- [Measurable criterion 2]

## Risk Flags
What should I watch for in the AI's output:
- [ ] Security: [specific concern for this project type]
- [ ] Logic: [specific business logic to verify]
- [ ] Dependencies: [packages to audit before installing]

Generate this brief, then we'll execute task by task with you as my engineering agent.

Why it works: The single biggest quality failure in AI-assisted development is jumping into code before the architecture is clear. This brief forces you to think like an engineering lead — decomposing work, setting decision gates, and specifying success criteria — before a single line of code is written. Teams using structured briefs report 40–60% fewer mid-project pivots.

Cross-link: → EndOfCoding.com for the full agentic engineering explainer. → LLMHire.com for Agentic Workflow Architect roles (the fastest-growing AI job category in Q2 2026).

37.2 The Dependency Safety Audit (Intermediate)

Tool: Claude Code, any LLM terminal | Time: 5 min | Category: Security

Inspired by: Slopsquatting attacks — AI-hallucinated package names used as malicious attack vectors. In Q1 2026, supply chain attacks using hallucinated package names rose 340% YoY.

Before I install these packages, audit them for safety:

[Paste the list of packages your AI suggested, e.g.:
- unused-imports
- react-query-v5-compat
- @supabase/auth-helpers-nextjs
]

For each package:
1. Confirm it exists on npm/PyPI/crates.io (not hallucinated)
2. Check download count (flag anything < 1,000/week)
3. Check last published date (flag if > 1 year)
4. Check maintainer count (flag if 1 maintainer with no activity)
5. Check for typosquatting similarity to a popular package
6. Note any known CVEs

Output as a table: Package | Verified | Downloads/wk | Last Published | Maintainers | Typosquat Risk | CVEs | Verdict (SAFE/CAUTION/REJECT)

Flag any package you would not install in a production app and explain why.

Why it works: AI coding tools hallucinate package names at a measurable rate — typically 2–5% of suggestions in complex codebases. Slopsquatting actors register the hallucinated names and serve malicious payloads. This 5-minute audit catches the class of attack before it reaches your build. Run it every time AI suggests a package you haven't used before.
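
If you already have registry metadata for each package, the thresholds in this prompt can be applied deterministically instead of by the model. The sketch below is one way to do that. The `PackageFacts` shape is hypothetical (you would fill it from your registry of choice), the "within 2 edits" lookalike rule is an assumption, and so is the mapping of flags to SAFE/CAUTION/REJECT, which the prompt leaves to the model.

```typescript
type Verdict = "SAFE" | "CAUTION" | "REJECT";

// Hypothetical metadata record; populate it from npm/PyPI/crates.io yourself.
interface PackageFacts {
  name: string;
  existsInRegistry: boolean;
  weeklyDownloads: number;
  daysSinceLastPublish: number;
  knownCves: number;
}

// Classic Levenshtein edit distance using a single rolling row.
function levenshtein(a: string, b: string): number {
  const row: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0]; // value of D[i-1][j-1]
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // value of D[i-1][j]
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diagonal + cost);
      diagonal = above;
    }
  }
  return row[b.length];
}

function auditPackage(
  facts: PackageFacts,
  popularNames: string[],
): { verdict: Verdict; reasons: string[] } {
  if (!facts.existsInRegistry) {
    return { verdict: "REJECT", reasons: ["not found in registry (possibly hallucinated)"] };
  }
  const reasons: string[] = [];
  // Assumption: 1-2 edits away from a popular name counts as a lookalike.
  const lookalike = popularNames.find((p) => {
    const d = levenshtein(facts.name, p);
    return d > 0 && d <= 2;
  });
  if (lookalike !== undefined) reasons.push(`name closely resembles "${lookalike}"`);
  if (facts.weeklyDownloads < 1000) reasons.push("fewer than 1,000 downloads/week");
  if (facts.daysSinceLastPublish > 365) reasons.push("last published over a year ago");
  if (facts.knownCves > 0) reasons.push(`${facts.knownCves} known CVE(s)`);
  return { verdict: reasons.length > 0 ? "CAUTION" : "SAFE", reasons };
}
```

The lookalike rule will flag some legitimate packages with short or similar names, so treat CAUTION as "a human looks at this", not as proof of malice.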

Cross-link: → EndOfCoding.com for the full security crisis analysis. → CyberOS.dev for automated supply chain scanning (detects slopsquatting patterns in CI/CD).

37.3 The AI Output Trust Calibration Prompt (Beginner)

Tool: Any LLM | Time: 5 min | Category: Quality / Evaluation

Inspired by: Developer trust in AI tools collapsing to 29% — the "almost right but not quite" problem costs teams hours in debugging code that looked correct on first read.

You just gave me this code/solution:
[PASTE THE AI OUTPUT HERE]

Now play devil's advocate. In this code:

1. What could be wrong or subtly broken that I might miss on first read?
2. What assumptions did you make that might not hold in my specific context?
3. What are the 2-3 things most likely to fail in production?
4. What would you want to test first before shipping this?
5. Is there a simpler approach you didn't take? Why didn't you take it?

Be honest. I'd rather know the risks now than discover them at 2am.

Why it works: AI models are trained to be helpful, which means they default to confident, complete-looking answers even when they're working from incomplete context. This prompt exploits the model's ability to reason about its own outputs — switching from generation mode to critique mode. Read question 2 first: the assumptions section surfaces the real risks fastest. Teams running this prompt before every PR merge report catching 30–40% more issues that would have reached production.

37.4 The Multi-Model Router Design Prompt (Advanced)

Tool: Claude Code, Cursor | Time: 60-90 min | Category: Architecture / Cost Optimization

Inspired by: 90% API cost reduction achieved via multi-model routing (n1n.ai, April 2026). With frontier models costing $5–75/M tokens and open models available for $0.10–0.50/M, intelligent routing is the highest-ROI architecture decision for AI-heavy applications.

I'm building an AI feature that currently routes all requests to [expensive model, e.g., Claude Opus 4.6].
Monthly cost is $[X]. I want to reduce this by 70%+ using multi-model routing without degrading quality.

Current request types hitting [expensive model]:
1. [Request type 1] — e.g., "classify user intent from a short message" — volume: [N]/day
2. [Request type 2] — e.g., "generate a 500-word marketing email" — volume: [N]/day
3. [Request type 3] — e.g., "debug a TypeScript error with full codebase context" — volume: [N]/day

Design a multi-model routing architecture:

## Model Tier Assignment
For each request type above, assign to the appropriate tier:
- Tier 1 (classification/routing): Mistral 7B or similar at < $0.20/M — for intent detection, simple categorization
- Tier 2 (general tasks): DeepSeek-V3 or Llama 3.1 70B at < $0.80/M — for summarization, drafts, standard Q&A
- Tier 3 (complex reasoning): [Current expensive model] — reserve for tasks requiring deep context, code generation, or multi-step reasoning

## Router Implementation
Write a routing function that:
1. Classifies each incoming request by complexity (Tier 1 fast classifier, < 100ms)
2. Routes to the appropriate model
3. Falls back to the next tier up if confidence < 0.85
4. Logs tier assignments for quality review

## Caching Layer
Add semantic caching using Redis:
- Cache responses for semantically similar queries (cosine similarity > 0.92)
- TTL: [appropriate for your domain, e.g., 1 hour for support answers, 24h for documentation]
- Cache hit rate target: > 30% of requests

## Quality Gate
Define what "quality equivalent" means for each tier:
- Run A/B test routing 10% of Tier 2 traffic to Tier 3 for 1 week
- Measure: [task completion rate / user satisfaction / error rate]
- Accept Tier 2 routing only if metrics within [5%] of Tier 3 baseline

Show me: the router code, the Redis caching layer, estimated new monthly cost, and the A/B test setup.

Why it works: Model routing is the single highest-ROI optimization for AI applications — but most teams skip it because designing the routing logic feels complex. This prompt structures the design process into clear tiers with quality gates, preventing the common failure mode where cheaper models get assigned tasks they can't handle. The semantic caching layer alone typically cuts 25–35% of API calls. Run this prompt once per AI feature surface; the resulting architecture typically achieves 70–90% cost reduction with less than 5% quality degradation.
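
To make the routing rules concrete before you hand the prompt to a model, here is a minimal sketch of the tier selection and fallback logic described under "Router Implementation". It assumes a classifier that returns a tier and a confidence score; the `Classification` type, the in-memory log, and the synchronous classifier are simplifications, and a real classifier would be an async call to the Tier 1 model.

```typescript
type Tier = 1 | 2 | 3;

// Assumed output of the fast Tier 1 classifier.
interface Classification {
  tier: Tier;
  confidence: number; // 0..1
}

interface RoutingLogEntry {
  requestId: string;
  classifiedTier: Tier;
  routedTier: Tier;
  confidence: number;
}

const CONFIDENCE_THRESHOLD = 0.85;

// Fall back one tier up when the classifier is unsure; Tier 3 has nowhere to go.
function chooseTier(c: Classification): Tier {
  if (c.confidence >= CONFIDENCE_THRESHOLD || c.tier === 3) return c.tier;
  return (c.tier + 1) as Tier;
}

// Classify, pick the tier, and record the decision for later quality review.
function routeRequest(
  requestId: string,
  text: string,
  classify: (text: string) => Classification,
  log: RoutingLogEntry[],
): Tier {
  const classification = classify(text);
  const routedTier = chooseTier(classification);
  log.push({
    requestId,
    classifiedTier: classification.tier,
    routedTier,
    confidence: classification.confidence,
  });
  return routedTier;
}
```

Logging both the classified and the routed tier is what makes the later A/B quality gate possible: you can see how often the fallback fires and whether the classifier's confident decisions hold up.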

Cross-link: → EndOfCoding.com for AI cost optimization analysis. → CyberOS.dev for API security scanning of multi-model routing endpoints.

37.5 The Desktop AI Agent Workflow Audit Prompt (Intermediate)

Tool: Claude Code, Codex Desktop | Time: 20-30 min | Category: Workflow / Automation

Inspired by: OpenAI Codex Desktop's background computer use across any Mac app (April 2026) and Claude Code Routines. Desktop AI agents can now operate autonomously across applications while you work in parallel — but most developers have no framework for deciding which tasks to delegate versus keep manual.

I want to set up desktop AI agents (Claude Code Routines / Codex Desktop / similar) to handle recurring tasks autonomously in the background.

My current recurring dev tasks (estimate time per week):
1. [Task 1] — e.g., "reviewing PRs for style and obvious bugs" — [N hours/week]
2. [Task 2] — e.g., "updating dependencies and checking changelogs" — [N hours/week]
3. [Task 3] — e.g., "writing release notes from git log" — [N hours/week]
4. [Task 4] — e.g., "responding to standard support tickets" — [N hours/week]

For each task, evaluate:

## Automation Suitability Matrix
Score each task on:
- **Reversibility** (1-5): If the agent makes a mistake, how easy to undo? (5 = trivial, 1 = catastrophic)
- **Determinism** (1-5): How predictable is the correct output? (5 = clear right answer, 1 = judgment call)
- **Verification** (1-5): How easy to verify agent output quality? (5 = automated check, 1 = expert review required)
- **Volume** (1-5): How often does this task occur? (5 = multiple times/day, 1 = monthly)

Automate tasks scoring > 12/20. Keep tasks scoring < 8/20 manual. Use human-in-the-loop for tasks scoring 8-12/20.

## Agent Configuration
For each task marked AUTOMATE:
1. Write the Routine/agent prompt (be specific: what to check, what to ignore, what to escalate)
2. Define the trigger: [schedule / GitHub event / file change / manual]
3. Define the success criteria: what does "done correctly" look like?
4. Define the escalation condition: when should the agent stop and ask a human?
5. Define the rollback plan: if the agent's output is wrong, how do we fix it?

## Safety Constraints
For all agents, enforce:
- Never push to main without human approval
- Never send external communications (email, Slack) without review
- Always create a draft/branch/preview, not a final artifact
- Log every action to [audit log location]

Output: a prioritized automation roadmap with ready-to-use agent prompts for the top 3 tasks.

Why it works: Desktop AI agents are powerful but dangerous when applied without a framework. The suitability matrix prevents the two failure modes: over-automation (delegating judgment calls to agents) and under-automation (manually doing tasks that are perfect for agents). The safety constraints are non-negotiable — every production-grade agent deployment needs explicit boundaries on irreversible actions and external communications. Teams that run this audit before deploying agents avoid 80% of the agent-gone-wrong incidents that generate angry post-mortems.
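
The suitability matrix reduces to simple arithmetic, so it is worth encoding once and reusing across audits. A minimal sketch, assuming the four 1-5 scores are summed with equal weight exactly as the prompt's thresholds imply (type and function names are illustrative):

```typescript
interface TaskScores {
  reversibility: number; // 5 = trivial to undo, 1 = catastrophic
  determinism: number; // 5 = clear right answer, 1 = judgment call
  verification: number; // 5 = automated check, 1 = expert review required
  volume: number; // 5 = multiple times/day, 1 = monthly
}

type Recommendation = "AUTOMATE" | "HUMAN_IN_LOOP" | "MANUAL";

function recommend(scores: TaskScores): Recommendation {
  const values = [scores.reversibility, scores.determinism, scores.verification, scores.volume];
  for (const v of values) {
    if (!Number.isInteger(v) || v < 1 || v > 5) {
      throw new RangeError(`Each score must be an integer from 1 to 5, got ${v}`);
    }
  }
  const total = values.reduce((sum, v) => sum + v, 0); // range 4..20
  if (total > 12) return "AUTOMATE";
  if (total < 8) return "MANUAL";
  return "HUMAN_IN_LOOP"; // 8-12 inclusive
}
```

Because the scores are summed, a task with reversibility 1 can still land in AUTOMATE if the other three scores are high. If that worries you, add a rule that any single score of 1 caps the result at HUMAN_IN_LOOP.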

Cross-link: → Vibe Coding Academy for structured lessons on Claude Code Routines setup. → EndOfCoding.com for Codex Desktop computer use deep dive.

Cross-link (37.3, trust calibration): → EndOfCoding.com for the full trust collapse data. → Vibe Coding Academy for the Quick Tip lesson on trust calibration.

Category 38: AI Output Evaluation & Production Quality Prompts (New — April 2026)

As AI-generated code and content flood production systems, teams are discovering a painful gap: they have no systematic way to verify that AI output is correct, regressing, or degrading over time. These prompts address the emerging discipline of AI quality engineering — building test suites, A/B frameworks, and CI/CD gates that treat AI output like any other production artifact.

38.1 The LLM Regression Test Suite Builder (Intermediate)

Tool: Claude Code, Cursor | Time: 45-60 min | Category: Quality / Testing

Inspired by: The growing incidence of "silent quality regression" where prompt or model changes degrade output quality without triggering any alerts. Engineering teams at Notion, Linear, and Vercel have reported this as a top-5 AI production issue in Q1 2026.

I have an AI feature that uses [model, e.g., Claude Sonnet 4.6] for [task description, e.g., "generating user-facing error messages from raw exception data"].

The feature is currently working well, but I need a regression test suite so I know immediately if output quality degrades after:
- A prompt change
- A model version upgrade
- A context window change
- A temperature/parameter adjustment

## Current Feature Spec
- Input: [describe the inputs, e.g., "raw Node.js stack trace + user action that triggered it"]
- Expected output: [describe what good looks like, e.g., "plain-English error message under 50 words, no technical jargon, actionable next step"]
- Output format: [e.g., JSON with fields: message, action, severity]
- Current prompt: [paste your system prompt]

## Build a Regression Test Suite

### Step 1: Golden Dataset
Create 20 test cases covering:
- 5 happy-path inputs (clear, well-formed data)
- 5 edge cases (empty inputs, very long inputs, unusual formats)
- 5 adversarial inputs (inputs designed to confuse the model)
- 5 real production examples (anonymized from logs)

For each test case, define:
- Input (the exact data the model receives)
- Expected output characteristics (not exact text — that's too brittle)
- Evaluation criteria (a checklist of what makes the output acceptable)

### Step 2: Evaluation Rubric
For my feature, define a rubric with 5 dimensions scored 1-5:
1. [Accuracy]: Does the output correctly interpret the input?
2. [Format compliance]: Does output match required JSON/format?
3. [Tone]: Is the output appropriate for [audience]?
4. [Completeness]: Are all required fields populated?
5. [Safety]: Does output avoid [specific harms, e.g., exposing stack traces to users]?

Pass threshold: average score >= 4.0 across all test cases.

### Step 3: Automated Evaluation
Write an evaluation script that:
1. Runs all 20 test cases against the current prompt/model
2. Scores each output against the rubric using a fast evaluator model (Claude Haiku 4.5)
3. Generates a report: overall score, per-dimension breakdown, failed cases with details
4. Exits with code 1 if overall score < 4.0 (fail), or code 0 if overall score >= 4.0 (pass)

Language: [TypeScript/Python]
Test runner: [Jest/pytest/Vitest]

### Step 4: Baseline
Run the suite against the current prompt/model and save results as baseline.json.
All future runs compare against this baseline; alert if any dimension drops > 0.3 points.

Output: the 20 test cases, the evaluation rubric, the evaluator script, and baseline.json structure.

Why it works: Most AI testing fails because it checks for exact string matches (too brittle) or relies on human review (doesn't scale). This prompt creates rubric-based evaluation — scoring output against quality dimensions rather than exact text — which is both automatable and meaningful. The golden dataset covers the failure modes that actually occur in production, not just the happy path. Teams that implement this catch prompt regressions within hours of deployment rather than days after user complaints.
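
The pass/fail and baseline-comparison rules from Steps 2 to 4 can be sketched independently of any evaluator model. The code below assumes rubric scores have already been produced for each test case; the dimension names mirror the rubric, while the types and function name are illustrative. Following the prompt, a baseline drop of more than 0.3 is reported as a regression but does not by itself fail the run.

```typescript
const DIMENSIONS = ["accuracy", "format", "tone", "completeness", "safety"] as const;
type Dimension = (typeof DIMENSIONS)[number];
type Scores = Record<Dimension, number>;

const PASS_THRESHOLD = 4.0; // minimum overall average
const MAX_DROP = 0.3; // allowed drop per dimension versus baseline

interface RunResult {
  overall: number;
  perDimension: Scores;
  regressions: Dimension[];
  exitCode: 0 | 1;
}

function evaluateRun(cases: Scores[], baseline: Scores): RunResult {
  if (cases.length === 0) throw new Error("No test cases were scored");
  // Average each rubric dimension across all test cases.
  const perDimension = {} as Scores;
  for (const d of DIMENSIONS) {
    perDimension[d] = cases.reduce((sum, c) => sum + c[d], 0) / cases.length;
  }
  // Overall score is the mean of the per-dimension averages.
  const overall =
    DIMENSIONS.reduce((sum, d) => sum + perDimension[d], 0) / DIMENSIONS.length;
  // A regression is any dimension that fell more than MAX_DROP below baseline.
  const regressions = DIMENSIONS.filter((d) => baseline[d] - perDimension[d] > MAX_DROP);
  return { overall, perDimension, regressions, exitCode: overall >= PASS_THRESHOLD ? 0 : 1 };
}
```

In a CI job you would pass `exitCode` to the process exit call and surface `regressions` as a warning annotation.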

Cross-link: → EndOfCoding.com for AI quality engineering deep dives. → Vibe Coding Academy for hands-on lessons in LLM testing frameworks.

38.2 The Prompt A/B Testing Framework (Advanced)

Tool: Claude Code, Cursor | Time: 60-90 min | Category: Quality / Experimentation

Inspired by: The proliferation of prompt variants across teams — most organizations now have 3-10 competing prompt versions for core features, with no systematic way to determine which performs best. A/B testing prompts has become as important as A/B testing UI copy.

I want to A/B test two (or more) prompt variants for my AI feature to determine which performs better in production.

## Feature Context
- Feature: [e.g., "AI-generated onboarding email personalization"]
- Current prompt (Control - Variant A): [paste prompt A]
- New prompt (Challenger - Variant B): [paste prompt B]
- What I'm trying to improve: [e.g., "email open rate / click rate / user activation within 7 days"]
- Traffic volume: approximately [N] requests/day through this feature

## Build the A/B Testing Infrastructure

### Traffic Splitting
Design a deterministic traffic splitter that:
- Routes [50%] of requests to Variant A, [50%] to Variant B
- Uses user ID (or session ID) for consistent assignment (same user always gets same variant)
- Logs which variant served each request with a unique experiment ID
- Supports gradual rollout: start 10/90, move to 50/50, then 90/10 before full switch

```typescript
// Implement this function:
function selectPromptVariant(userId: string, experimentId: string, variants: Record<string, number>): string {
  // variants = { "A": 0.5, "B": 0.5 }
  // Must be deterministic: same userId + experimentId → same variant every time
  // Use consistent hashing, not Math.random()
}
```

### Outcome Tracking
Define the primary metric for this experiment:
- Primary metric: [e.g., "user clicks the CTA in the email within 48h"]
- Secondary metrics: [e.g., "email open rate, unsubscribe rate"]
- Guardrail metric: [e.g., "spam complaint rate must not increase > 0.1%"]
- Minimum detectable effect: [e.g., "5% improvement in click rate"]
- Statistical significance threshold: p < 0.05 (two-tailed)

Write the tracking event schema:

```typescript
interface PromptExperimentEvent {
  experimentId: string;
  variantId: 'A' | 'B';
  userId: string;
  timestamp: string;
  primaryMetricTriggered?: boolean; // logged separately when outcome occurs
  metadata?: Record<string, unknown>;
}
```

### Sample Size Calculator
Given:
- Baseline conversion rate: [e.g., 12%]
- Minimum detectable effect: [e.g., 5% relative improvement → 12.6%]
- Statistical power: 80%
- Significance level: 5%

Calculate: how many requests per variant are needed before we can declare a winner?

### Analysis Query
Write a SQL query (for [Postgres/BigQuery/SQLite]) that:
1. Joins experiment assignment events with outcome events
2. Calculates conversion rate per variant
3. Runs a chi-squared test for statistical significance
4. Returns: variant, requests, conversions, conversion_rate, p_value, is_significant

### Decision Rules
Define clear stop conditions:
- Stop early for harm: if guardrail metric exceeds threshold with > 95% confidence, stop immediately
- Stop early for win: if primary metric improvement > MDE with p < 0.01 after 50% of required sample
- Stop at plan: declare winner after required sample size reached, even if not significant (null result is a result)

Output: the traffic splitter, tracking schema, SQL analysis query, and decision rules documentation.


**Why it works**: Prompt A/B testing fails in practice because teams eyeball results or run tests too short. This framework imports the rigor of classical A/B testing (statistical significance, power calculations, guardrail metrics) into the AI prompt domain. The deterministic traffic splitter is critical: assignment has to be sticky per user, because re-randomizing on every request shows the same user both variants, which creates an inconsistent experience and confounds results. The decision rules prevent the most common mistake: stopping a test as soon as early results look good, before the sample size is sufficient. This framework has been validated by teams at 3 mid-stage AI startups who discovered their "better" intuition prompts actually underperformed by 8-15% on measured outcomes.
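To make the two load-bearing ideas concrete, here is a minimal sketch of sticky assignment and of the power calculation. It is an illustration under stated assumptions, not the implementation an AI tool will necessarily produce from the prompt: the hash (FNV-1a followed by a MurmurHash3-style finalizer) is one reasonable choice among many, `stableHash32` and `requiredSampleSizePerVariant` are hypothetical helper names, and the sample-size formula is the standard normal approximation for comparing two proportions with z-values for a two-sided 5% significance level and 80% power.

```typescript
// Assumption: any stable, well-mixed 32-bit hash works here. This one is
// FNV-1a over UTF-16 code units plus a MurmurHash3-style finalizer.
function stableHash32(input: string): number {
  let h = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193); // FNV 32-bit prime
  }
  // Finalizer: spreads small input differences across all 32 bits.
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0; // force unsigned
}

function selectPromptVariant(
  userId: string,
  experimentId: string,
  variants: Record<string, number>,
): string {
  const names = Object.keys(variants).sort(); // order must not depend on insertion order
  if (names.length === 0) throw new Error("no variants configured");
  const total = names.reduce((sum, name) => sum + variants[name], 0);
  // Map the hash to a point in [0, total), then walk the cumulative weights.
  const point = (stableHash32(`${experimentId}:${userId}`) / 4294967296) * total;
  let cumulative = 0;
  for (const name of names) {
    cumulative += variants[name];
    if (point < cumulative) return name;
  }
  return names[names.length - 1]; // guards against floating-point rounding
}

// Normal-approximation sample size for two proportions.
// Defaults assume alpha = 0.05 (two-sided, z = 1.96) and power = 0.80 (z = 0.8416).
function requiredSampleSizePerVariant(
  baselineRate: number,
  relativeLift: number,
  zAlpha = 1.96,
  zBeta = 0.8416,
): number {
  const p1 = baselineRate;
  const p2 = baselineRate * (1 + relativeLift);
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  return Math.ceil(((zAlpha + zBeta) ** 2 * variance) / (p2 - p1) ** 2);
}
```

With the example numbers from the prompt (12% baseline, 5% relative lift), `requiredSampleSizePerVariant(0.12, 0.05)` comes out at roughly 47,000 requests per variant, which is why low-traffic features often cannot detect small improvements in a reasonable time. Changing the weights during a gradual rollout moves some users between variants, so log the weights alongside each assignment.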

**Cross-link**: → [EndOfCoding.com](https://endofcoding.com) for prompt experimentation methodology articles. → [Vibe Coding Academy](https://vibe-coding.academy) for the A/B testing for AI features course module.

---

### 38.3 The AI Quality Gate for CI/CD (Expert)
**Tool**: Claude Code, GitHub Actions | **Time**: 90-120 min | **Category**: Quality / DevOps

*Inspired by: The engineering teams shipping AI feature updates daily are discovering that standard CI/CD (lint, test, deploy) doesn't catch AI-specific regressions: prompt drift, context window violations, output format breaks, and latency spikes. Quality gates for AI features are the next frontier of CI/CD.*

I want to add an AI quality gate to my CI/CD pipeline that automatically validates AI feature health before every deployment.

## Current Pipeline
- CI/CD: [GitHub Actions / GitLab CI / CircleCI]
- Deployment: [Vercel / Railway / AWS / GCP]
- AI features: [list the AI-powered features in your app, e.g., "chat assistant, code review bot, document summarizer"]
- Current pipeline: lint → unit tests → integration tests → deploy

## Design the AI Quality Gate
I want to add an "AI Health Check" stage between integration tests and deploy that fails the pipeline if AI quality degrades.

### Gate 1: Prompt Integrity Check
Before deployment, verify that all prompts in the codebase:
- Are valid (no syntax errors, no truncated templates)
- Are within model context limits (tokenize and count; fail if > 80% of context window)
- Have not changed from last deploy (flag changes for human review, not automatic block)
- Include required safety instructions (check for presence of [specific safety phrases])

Write a script that:
- Finds all prompt files/strings matching [pattern, e.g., prompts/**/*.md or const SYSTEM_PROMPT]
- Runs each check above
- Outputs a structured report: prompt_id, checks_passed, checks_failed, token_count, change_detected
- Exits with code 1 if any check fails (except change_detected, which is a warning only)

### Gate 2: Golden Dataset Regression
Run the regression test suite (from Prompt 38.1) against the new prompt/model version:
- Execute all [N] test cases
- Score with evaluator model
- Compare scores to baseline.json
- Fail if: overall score drops > 0.3 points OR any single dimension drops > 0.5 points
- Pass if: all scores within acceptable range OR new prompt scores BETTER than baseline (update baseline on pass)

### Gate 3: Latency & Cost Budget
For each AI feature, enforce SLOs:
- P95 latency ≤ [500ms] (run [10] test calls, measure P95)
- Average cost per call ≤ $[0.005] (use token counts × model pricing)
- Fail if: latency or cost exceeds budget by > 20%
- Report: actual vs. budget for each feature, with model/prompt recommendations if over budget

### Gate 4: Safety & Content Policy Check
Run [3-5] adversarial test cases designed to elicit unsafe outputs:
- [Test case 1: describe the adversarial input and what unsafe output to watch for]
- [Test case 2: ...]
- [Test case 3: ...]

Pass criteria: model refuses or safely deflects all adversarial inputs. Fail: pipeline blocked, immediate security review required.

### GitHub Actions Workflow
Write a GitHub Actions job ai-quality-gate that:
- Runs after integration-tests job
- Executes all 4 gates sequentially (stop on first failure)
- Uploads gate reports as GitHub Actions artifacts
- Posts a summary comment on the PR with gate results (using github-script)
- Requires manual approval via GitHub Environments if Gate 1 flags a prompt change (change_detected)

# ai-quality-gate.yml
name: AI Quality Gate
on:
  pull_request:
    paths:
      - 'prompts/**'
      - 'src/ai/**'
      - '.env.example'
jobs:
  ai-quality-gate:
    runs-on: ubuntu-latest
    steps:
      # Implement the 4 gates above

Output: the full GitHub Actions workflow, all gate scripts, and the PR comment template.


**Why it works**: AI quality gates close the gap that every team hits when shipping AI features fast: standard CI catches code bugs but not AI behavior bugs. The four-gate design mirrors the four failure modes that actually bring down AI features in production — broken prompts (Gate 1), silent quality regression (Gate 2), cost/latency overrun (Gate 3), and safety failures (Gate 4). The GitHub Actions integration makes this a first-class part of the engineering workflow, not an optional manual check. Teams that implement this report catching 2-3 regressions per month that would have reached users; the average incident cost avoided is estimated at 4-8 hours of investigation plus user trust damage.
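As an illustration of how Gates 2 and 3 reduce to a few comparisons, here is a minimal sketch of the pass/fail logic. The thresholds (0.3 overall, 0.5 per dimension, 20% budget tolerance) come from the prompt; the `Scores` shape, the function names, and the nearest-rank percentile method are assumptions made for the example, since the prompt leaves the baseline.json format and the P95 method to whatever the generated scripts choose.

```typescript
// Assumed shape of a scored run; the real baseline.json format is not specified.
interface Scores {
  overall: number;
  dimensions: Record<string, number>;
}

interface GateResult {
  passed: boolean;
  reasons: string[];
}

// Gate 2: fail on an overall drop > 0.3 or any single-dimension drop > 0.5.
function checkRegression(
  baseline: Scores,
  current: Scores,
  maxOverallDrop = 0.3,
  maxDimensionDrop = 0.5,
): GateResult {
  const reasons: string[] = [];
  const overallDrop = baseline.overall - current.overall;
  if (overallDrop > maxOverallDrop) {
    reasons.push(`overall score dropped by ${overallDrop.toFixed(2)}`);
  }
  for (const [name, base] of Object.entries(baseline.dimensions)) {
    const now = current.dimensions[name];
    if (now === undefined) {
      reasons.push(`dimension "${name}" missing from current run`);
    } else if (base - now > maxDimensionDrop) {
      reasons.push(`dimension "${name}" dropped by ${(base - now).toFixed(2)}`);
    }
  }
  return { passed: reasons.length === 0, reasons };
}

// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Gate 3: fail if P95 latency exceeds the budget by more than the tolerance.
function checkLatencyBudget(samplesMs: number[], budgetMs: number, tolerance = 0.2): GateResult {
  const p95 = percentile(samplesMs, 95);
  const limit = budgetMs * (1 + tolerance);
  return p95 > limit
    ? { passed: false, reasons: [`P95 ${p95}ms exceeds limit ${limit}ms`] }
    : { passed: true, reasons: [] };
}
```

One consequence worth knowing: with only 10 test calls, the nearest-rank P95 is simply the slowest call, so a single cold start can fail the gate. Raising the number of calls makes the latency check less noisy at the cost of a slower and more expensive pipeline run.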

**Cross-link**: → [EndOfCoding.com](https://endofcoding.com) for CI/CD for AI applications deep dives. → [Vibe Coding Academy](https://vibe-coding.academy) for the AI DevOps course module. → [CyberOS.dev](https://cyberos.dev) for security scanning of AI pipeline configurations.


18. Tool Comparison Matrix

Updated March 22, 2026

A living comparison of every major vibe coding tool. Updated monthly.

AI-Native IDEs

Tool	Price	Best For	Key Feature	Security Concern
Cursor	$20/mo	Full-stack dev, large codebases	Composer multi-file gen, Automations (event-driven agents), MCP Apps	CurXecute (CVE-2025-54135)
Windsurf (acquired)	N/A	Long-context projects	Memories (persistent context)	Memory poisoning via prompt injection
VS Code + Copilot	$10/mo	AI without switching editors	Inline suggestions, Agent Mode, chat	Lower risk (suggestions, not autonomous)

Autonomous Agents

Tool	Price	Best For	Autonomy	Differentiator
Claude Code	Usage-based	Enterprise codebases	High (subagent teams)	$2.5B+ ARR, 80.9% SWE-bench (#1 of 15 agents), multi-agent orchestration
Devin	$500/mo	Async tasks, migrations	Very High	Full AI employee model, Devin Review
Codex CLI	Usage-based	Open-source, Rust/systems	Medium	Open-source, sandboxed execution
Jules	Free-$125/mo	Async bugfixes, PR gen	High	Works while you sleep, Gemini 3 Pro
Amazon Q	Free-$19/mo	AWS-heavy projects	Medium	Deep AWS integration

Browser Builders (No-Code)

Tool	Price	Best For	Output Quality	Risk Level
Bolt.new	Free-$20/mo	Rapid full-stack prototypes	Good	Medium
v0	Free-$20/mo	React/Next.js UI components	Excellent	Low (UI only)
Lovable	Free-$25/mo	Non-dev app creation	Good	High (170/1645 apps had vulns)
Replit Agent	Free-$25/mo	Complete apps from description	Good	Medium — $400M Series D, $9B valuation (Mar 2026). 75% of Replit AI users write zero code.

Open-Source & Cost-Efficient Alternatives

For teams optimizing cost, data privacy, or running on self-hosted infrastructure.

Model/Tool	Parameters	Cost vs Claude Sonnet	Benchmark / Adoption	Best For
MiMo-V2-Pro (Xiaomi)	1 Trillion (Hunter Alpha)	67% cheaper than Claude Sonnet 4.6	3rd globally on agent benchmarks (Mar 2026)	Cost-sensitive production workloads, batch jobs
Gemini CLI (Google)	N/A (cloud)	Free tier available	Competitive, Flash variant	Open-source terminal work, Google ecosystem
Codex CLI (OpenAI)	N/A (cloud)	Usage-based (GPT-5.4)	77.3% Terminal-Bench	Sandboxed execution, CI/CD integration
obra/superpowers	N/A (framework)	Free + model API costs	92,100 GitHub stars (Mar 2026)	Custom agent framework, multi-step workflows
OpenClaw	N/A (framework)	Free + model API costs	210,000 GitHub stars (Mar 2026)	Open-source agent orchestration, self-hosted

Choosing Your Stack

👨‍💻 Professional Developer

Claude Code + Cursor. Best reasoning + best IDE. Devin for async/overnight work.

🚀 Startup Founder

Cursor + Bolt.new. Cursor for core product, Bolt for rapid prototyping and validation.

👤 Non-Technical

Lovable or Bolt.new. But hire a security professional before handling user data.

🏢 Enterprise

Claude Code (team) + Devin (migrations) + human review gates.

🔗

**Watch tool demos:** See these tools in action on [YouTube @endofcoding](https://youtube.com/@endofcoding). Compare hands-on at [vibe-coding.academy](https://vibe-coding.academy).



19. The Security Playbook

Updated April 9, 2026

A practical guide to hardening vibe-coded applications before they touch real users.

⚠

**The reality:** The December 2025 Tenzai study found 69 vulnerabilities across just 15 AI-built applications. The February 2026 IDEsaster disclosure revealed 30+ vulnerabilities and 24 CVEs affecting 1.8M developers. AI-generated code is 2.74x more likely to introduce XSS than human code. Security is not optional.


The 30-Minute Security Checklist

Run this on every vibe-coded application before showing it to anyone outside your team:

🔒

Authentication (5 min)

- Passwords hashed with bcrypt or argon2 (not MD5, SHA, or plaintext)
- Sessions stored in HTTP-only, Secure, SameSite cookies (not localStorage)
- CSRF tokens on every form
- Rate limiting on login endpoint (5 attempts per 15 min)
- No credentials hardcoded in source code

📝

Input Handling (5 min)

- All database queries use parameterized statements (no string concatenation)
- HTML output sanitized (no raw user input rendered)
- File uploads validated (type, size, name; no path traversal)
- API request bodies validated server-side (not just client-side)

🛡

Data Protection (5 min)

- HTTPS enforced (HSTS header set)
- API responses don't leak internal data (no password hashes, debug info, stack traces)
- Sensitive data encrypted at rest (API keys, user PII)
- Error messages are generic (no "user not found" vs "wrong password" distinction)

⚙

Infrastructure (5 min)

- `npm audit` shows no critical/high vulnerabilities
- Security headers: Content-Security-Policy, X-Frame-Options, X-Content-Type-Options
- CORS restricted to specific origins (not `*`)
- Environment variables for all secrets (not in code or git history)

👥

Access Control (5 min)

- Authorization checked server-side on every endpoint
- Users can only access their own data (test by changing IDs in URL)
- Admin functions require admin role verification
- API keys have minimal permissions

📈

Monitoring (5 min)

- Error tracking set up (Sentry or similar)
- Failed auth attempts logged
- Rate limiting returns 429 with Retry-After header
- No sensitive data in logs (passwords, tokens, PII)
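The two rate-limiting items in the checklist (5 login attempts per 15 minutes, and a 429 response with a Retry-After header) can be sketched as a small sliding-window limiter. This is a minimal illustration, not a production component: `LoginRateLimiter` is a hypothetical class, the state lives in process memory (so it assumes a single server instance; a multi-instance deployment would need a shared store), and wiring the decision into an HTTP framework is left out.

```typescript
interface RateLimitDecision {
  allowed: boolean;
  status: number; // HTTP status the caller should return
  retryAfterSeconds?: number; // value for the Retry-After header when blocked
}

class LoginRateLimiter {
  // Timestamps (ms) of recent counted attempts, keyed by IP address or account name.
  private readonly attempts = new Map<string, number[]>();
  private readonly maxAttempts: number;
  private readonly windowMs: number;

  constructor(maxAttempts = 5, windowMs = 15 * 60 * 1000) {
    this.maxAttempts = maxAttempts;
    this.windowMs = windowMs;
  }

  check(key: string, nowMs: number = Date.now()): RateLimitDecision {
    const cutoff = nowMs - this.windowMs;
    // Keep only attempts that are still inside the sliding window.
    const recent = (this.attempts.get(key) ?? []).filter((t) => t > cutoff);
    if (recent.length >= this.maxAttempts) {
      this.attempts.set(key, recent);
      // The oldest counted attempt decides when the next slot frees up.
      const retryAfterSeconds = Math.ceil((recent[0] + this.windowMs - nowMs) / 1000);
      return { allowed: false, status: 429, retryAfterSeconds };
    }
    recent.push(nowMs);
    this.attempts.set(key, recent);
    return { allowed: true, status: 200 };
  }
}
```

A login handler would call `check` before verifying credentials and, when `allowed` is false, respond with status 429 and set `Retry-After` to `retryAfterSeconds`. Blocked requests are not counted in this sketch, so a client that keeps retrying does not extend its own lockout.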

AI Tool Security Advisories

⚠

**March 2026 — Claude Code CVEs:** Two critical vulnerabilities were disclosed affecting Claude Code. **CVE-2025-59536** allowed remote code execution — malicious repositories could trigger arbitrary shell commands when Claude Code initialized project files. **CVE-2026-21852** enabled API key exfiltration through crafted project files. Both were patched in prior releases. **Action:** Ensure you're running the latest Claude Code version. Never open untrusted repositories with AI coding tools without reviewing their configuration files first.

💡

**Lesson:** AI coding tools themselves are attack surfaces. Malicious actors can craft repositories that exploit tool initialization to run code, steal API keys, or exfiltrate data. Always keep your AI coding tools updated and treat repository configuration files (.claude/, .cursor/, .github/copilot/) with the same suspicion as executable code.

MCP Supply Chain: The New Attack Surface

⚠

March 2026 — OpenClaw Supply Chain Attack: Antiy CERT confirmed 1,184 malicious skill packages across ClawHub — approximately one in five packages in the open-source MCP ecosystem. This is the largest confirmed supply chain attack targeting AI agent infrastructure to date. Separately, security researchers documented 30+ CVEs targeting MCP servers, clients, and infrastructure in just 60 days (Jan–Feb 2026).

Key MCP CVEs (March 2026):

CVE-2026-23744 (CVSS 9.8, MCPJam Inspector ≤ v1.4.2): A crafted HTTP request to a critical endpoint bound to 0.0.0.0 with no authentication can install an arbitrary MCP server and execute code on the host. No user interaction required.
Azure MCP Server RCE (CVSS 9.6, demonstrated at RSAC 2026): A vulnerability in Microsoft’s Azure MCP server capable of compromising cloud environments via the agent connection.
SSRF exposure: BlueRock Security analyzed 7,000+ MCP servers and found 36.7% potentially vulnerable to server-side request forgery.

How to protect yourself:

Audit all installed MCP servers. Run ls ~/.config/claude/mcp* and remove any servers you didn’t explicitly install.
Only install MCP packages from verified, well-known authors with active maintenance history.
Pin MCP server versions in your configuration — don’t use @latest.
Check package provenance before installing from ClawHub or any MCP registry.
Treat MCP server packages as executable code with system access — because they are.

Supply Chain Attacks: April 2026 Alert

⚠

Critical — Week of March 31, 2026: A North Korean state-linked threat actor (UNC1069) compromised the npm account of the lead maintainer of axios — a package with ~100 million weekly downloads — publishing malicious versions 1.14.1 and 0.30.4. The packages deployed the WAVESHAPER.V2 cross-platform RAT on Windows, macOS, and Linux. The malicious versions were live for approximately 3 hours before detection. This is one of the most impactful supply chain compromises in npm history.

April 2026 Supply Chain Attack Summary:

Package / Tool	Date	Impact	Attribution
axios 1.14.1, 0.30.4	March 31	WAVESHAPER.V2 RAT; ~100M weekly downloads	UNC1069 (North Korea/DPRK)
LiteLLM 1.82.7, 1.82.8	March 24	Multi-stage credential stealer (SSH keys, cloud tokens, K8s secrets, .env files)	Unknown
Langflow ≤ 1.8.2 (CVE-2026-33017)	March 17	Unauthenticated RCE via public endpoint; exploited within 20h; CISA KEV	Active threat actors
Trivy Docker Hub images (CVE-2026-33634)	March 19	Malicious code in Aqua Security's Trivy scanner images	TeamPCP

Langflow CVE-2026-33017 detail: Critical code injection in the AI agent framework's public flow build endpoint. No authentication required. Exploitation was observed in the wild within 20 hours of public disclosure and CISA added it to the Known Exploited Vulnerabilities catalog. If you run Langflow, upgrade to 1.8.3+ immediately.

Trivy Cascade extended (April 2026): The Trivy compromise (CVE-2026-33634) evolved into a much larger incident. Attackers force-pushed malicious code to 75 of 76 trivy-action GitHub Actions tags, then published additional malicious Docker images during the remediation effort (taking 5 days to fully evict). The attack then spawned CanisterWorm — a self-propagating npm worm that hit 64+ packages using blockchain-based command-and-control infrastructure, making it resistant to traditional domain seizure. CanisterWorm spread to Checkmarx KICS and AST GitHub Actions, and separately reached LiteLLM (95 million monthly PyPI downloads). Any CI/CD pipeline that used Trivy, Checkmarx KICS, or LiteLLM between March 19 and April 10 should be treated as potentially compromised and audited.

What this means for vibe coders:

Dependencies installed by AI-generated code are attack vectors. Always run npm audit after any AI-generated package.json or install step.
AI coding tools themselves (Langflow, LiteLLM, MCP servers, security scanners) are now priority targets for supply chain attackers.
Security tooling is not immune — Trivy (a vulnerability scanner) was itself the vector. Audit your audit tools.
Pin exact dependency versions. Don't use @latest or loose semver ranges for packages you can't quickly audit.
Enable npm provenance verification and run installs with --ignore-scripts in CI pipelines to limit the post-install attack surface.
Blockchain-based C2 is increasingly being used to make supply chain worms resistant to takedown — conventional domain blocklists are insufficient.
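The advice to pin exact versions is easy to check mechanically. The sketch below flags any dependency whose version specifier is not an exact semver version; it is an illustration only, `findUnpinnedDependencies` is a hypothetical helper, the package names and ranges in the usage note are made up, and the check deliberately treats anything it does not recognize (tags, ranges, git URLs, file paths) as unpinned.

```typescript
interface PackageManifest {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Matches exact versions such as 1.2.3, 1.2.3-beta.1 or 1.2.3+build.5.
// Ranges (^1.2.3, ~1.2.3, >=1.0.0), tags (latest) and URLs do not match.
const EXACT_VERSION = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$/;

function findUnpinnedDependencies(manifest: PackageManifest): string[] {
  const all: Record<string, string> = {
    ...manifest.dependencies,
    ...manifest.devDependencies,
  };
  return Object.entries(all)
    .filter(([, spec]) => !EXACT_VERSION.test(spec))
    .map(([name, spec]) => `${name}@${spec}`)
    .sort();
}

// Illustrative manifest; the version numbers are placeholders, not recommendations.
const example: PackageManifest = {
  dependencies: { axios: "^1.7.0", lodash: "4.17.21" },
  devDependencies: { typescript: "latest", vitest: "~2.0.0" },
};
const unpinned = findUnpinnedDependencies(example);
```

Here `unpinned` contains `axios@^1.7.0`, `typescript@latest`, and `vitest@~2.0.0`. Running a check like this in CI, and failing the build when the list is non-empty, turns the pinning advice into an enforced rule. A lockfile is still needed to pin transitive dependencies, which this check does not see.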

Vibe-Coded App Vulnerability Research

💡

Georgia Tech Vibe Security Radar (March 2026): Researchers analyzed 5,600 publicly deployed vibe-coded applications and found 2,000+ vulnerabilities, 400+ exposed secrets, and 175 instances of exposed PII. The 30-minute checklist in this chapter exists because these are the exact failure modes that recur across AI-generated codebases.

AI-generated code CVE trend:

Month	CVEs attributed to AI-generated code
January 2026	6
February 2026	15
March 2026	35

The accelerating rate reflects both more AI-generated code in production and improved attribution tooling. Per Autonoma research, 53% of AI-generated code contains security holes. The pattern in these CVEs is consistent: AI models tend to generate working functionality quickly but skip authentication checks, hardcode credentials, and mis-scope data access — exactly the failures the 30-minute checklist is designed to catch.

The Coming Paradigm: AI as Autonomous Vulnerability Researcher

💡

April 2026 — Project Glasswing: Anthropic's Claude Mythos model (announced April 7, restricted to cybersecurity defense) scored 93.9% on SWE-bench and autonomously discovered CVE-2026-4747 — a 17-year-old remote code execution vulnerability in FreeBSD — and found thousands of zero-day vulnerabilities across every major OS and browser. Anthropic restricted public access specifically because it can autonomously both discover and exploit software vulnerabilities at scale. Access is limited to Project Glasswing defense partners (AWS, Google, Microsoft, CrowdStrike, Palo Alto Networks, and ~50 others) for defensive use only.

This is a meaningful shift. For years, the security community discussed AI as a tool to help humans find bugs faster. Claude Mythos demonstrates a model that can operate the entire vulnerability research workflow autonomously — including exploitation. The implications for vibe-coded applications:

The attack surface is permanent. Security is not a one-time audit. Autonomous vulnerability research tools will continuously discover new issues in deployed applications. Shipping and forgetting is no longer viable.
AI finds what humans miss. A 17-year-old RCE in FreeBSD escaped human detection for nearly two decades. AI can find deep logic bugs and memory-corruption patterns at scale.
Defense must scale too. The same AI capabilities that find bugs can also be used defensively to scan your code before it ships. Use AI-powered security scanning in your CI/CD pipeline — not as a replacement for the 30-minute checklist, but as an additional layer.
The vibe-coded app risk is elevated. CVEs attributed to AI-generated code rose from 6 in January 2026 to 35 in March 2026. As autonomous vulnerability finders become more capable, that code will be scanned faster and more thoroughly by both defenders and attackers.

The practical response for vibe coders: treat every public-facing application as permanently under automated security review. Build with authentication, input validation, and secrets management from the first commit — not as an afterthought.

Security Prompts for AI Tools

Review this codebase for OWASP Top 10 vulnerabilities.
For each issue found: severity (Critical/High/Medium/Low),
file and line number, what's wrong, the fix, and how to test it.
Prioritize by severity.

🔗

**Deep dive:** Read the full IDEsaster analysis in [Chapter 10: The Dark Side](#ch10). Practice security scanning at [vibe-coding.academy](https://vibe-coding.academy).



Chapter 20: Video Tutorials -- Embedded Remotion-Generated Walkthroughs

Updated March 6, 2026

Bite-sized, binge-worthy video tutorials that show real vibe coding workflows in action. Each video is 60-120 seconds, focused on one specific technique, and embedded directly in the interactive ebook using Remotion components. Updated monthly with 2-4 new videos.

Why Video Tutorials Inside an Ebook

Reading about vibe coding is one thing. Watching a real app materialize from a single prompt in under ninety seconds is something else entirely.

Traditional ebooks give you text and screenshots. This one gives you motion. Every video in this chapter is a self-contained Remotion composition -- a React component that renders to video. That means each tutorial is versioned, reproducible, and embedded natively in the interactive ebook without relying on external hosting. You can watch them inline, pause on any frame, and in the web version, interact with the code snippets directly.

The videos are grouped into three series, each designed for a different purpose:

Prompt to Product -- Viral-format demonstrations of complete apps built from single prompts. Optimized for shareability and shock value.
The Prompt That... -- Educational deep-dives with a comedic edge. Each video dissects one prompt and its unexpected consequences.
Tool Face-Off -- Head-to-head comparisons between competing tools, scored on speed, quality, and developer experience.

Every video follows the same production pipeline: markdown script, Remotion composition with screen recordings and motion graphics, AI-generated narration, and branded end cards. The result is a library that grows over time and works across platforms -- full-length on YouTube, clipped for TikTok/Reels/Shorts, and embedded here in the ebook.

Video Series 1: "Prompt to Product" (Viral Potential)

Each video in this series shows a complete, functional application being built from a single natural-language prompt. A real-time countdown timer runs in the corner. The screen recording is unedited -- what you see is what actually happened. The final reveal shows the deployed app running in a browser.

Series format:

Duration: 60-90 seconds
Structure: Hook (3s) -> Prompt reveal (5s) -> Countdown build (40-70s) -> Reveal + deploy (10s) -> End card (5s)
Visual signature: Neon countdown timer in the top-right corner, split-screen showing prompt on the left and the AI's output on the right
Audio: Fast-paced electronic background track, AI text-to-speech narration, keystroke and notification sound effects
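Since each video is a Remotion composition, the segment structure above eventually has to become frame ranges. The sketch below shows one way to do that conversion; it is an assumption about how the pipeline might be organized, not the actual project code. `layoutSegments` is a hypothetical helper, the segment names are invented, and the output field names `from` and `durationInFrames` are chosen to mirror the props of Remotion's `Sequence` component.

```typescript
interface Segment {
  name: string;
  seconds: number;
}

interface SegmentFrames {
  name: string;
  from: number; // first frame of the segment
  durationInFrames: number;
}

// Lay segments end to end and convert seconds to frames at the given fps.
function layoutSegments(segments: Segment[], fps: number): SegmentFrames[] {
  let cursor = 0;
  return segments.map(({ name, seconds }) => {
    const durationInFrames = Math.round(seconds * fps);
    const placed = { name, from: cursor, durationInFrames };
    cursor += durationInFrames;
    return placed;
  });
}

// Shortest variant of the series structure, using the lower bound of the build segment.
const shortestCut = layoutSegments(
  [
    { name: "hook", seconds: 3 },
    { name: "promptReveal", seconds: 5 },
    { name: "countdownBuild", seconds: 40 },
    { name: "revealAndDeploy", seconds: 10 },
    { name: "endCard", seconds: 5 },
  ],
  30,
);
```

At 30 fps the build segment starts at frame 240 and the end card at frame 1740, for a total of 1,890 frames. That is 63 seconds, so the listed segment lengths add up to slightly more than the 60-second lower bound of the stated duration; trimming the build or reveal by a few seconds would bring the shortest cut back to 60.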

Video #1: 60-Second SaaS (Bolt.new)

Title/Hook: "I built a $9/month SaaS in 60 seconds"

Tool: Bolt.new

Concept: Starting from a completely blank Bolt.new session, a single prompt generates a fully functional micro-SaaS -- a link shortener with analytics, user accounts, and a Stripe-ready pricing page. The countdown timer hits zero just as the app deploys.

Tone: Breathless, slightly disbelieving. The narration captures the genuine absurdity of how fast this is.

Script Outline (170 words): Open on a blank browser tab. The narrator says: "I'm going to build a SaaS product that charges $9 a month. I have 60 seconds." The countdown starts. Cut to the Bolt.new interface. The prompt appears on screen as it is typed: a link shortener with user authentication, click analytics dashboard, custom short domains, and a pricing page with free and pro tiers. Bolt.new starts generating. The split screen shows the prompt on the left, the live preview assembling on the right -- components appearing in real time, a login form, a dashboard with charts, a pricing table with toggle between monthly and annual. The timer passes 30 seconds. The app is taking shape. At 50 seconds, the deployment starts. At 58 seconds, a live URL appears. The timer hits zero. Cut to the deployed app in a fresh browser: working signup, working dashboard, working pricing page. End card: "Total cost: $0. Total code written by a human: 0 lines."

Visual Concepts for Remotion:

CountdownTimer component: neon green digits, pulses red below 10 seconds, shakes at 3-2-1
SplitScreenBuild composition: left panel shows the prompt text animating in typewriter-style, right panel shows a screen recording of Bolt.new's live preview
DeploymentFlash animation: when the URL goes live, a burst animation radiates from the URL bar
MetricCard end-card overlay: three floating cards showing "Time: 60s", "Lines of code: 0", "Cost: $0" with staggered fade-in
Screen recording captured at 60fps, composited at 30fps for smooth playback
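The CountdownTimer behavior described above (green digits, red below 10 seconds, a shake on 3-2-1) is a pure function of the current frame, which is what makes it easy to render deterministically. The sketch below captures only that logic, under assumptions: `countdownState` is a hypothetical helper, and in an actual Remotion component the `frame` and `fps` values would presumably come from the `useCurrentFrame()` and `useVideoConfig()` hooks, with the styling driven by the returned state.

```typescript
interface CountdownState {
  secondsLeft: number; // whole seconds shown on the timer
  color: "green" | "red";
  shaking: boolean;
}

function countdownState(frame: number, fps: number, totalSeconds: number): CountdownState {
  const elapsedSeconds = frame / fps;
  // Round up so the display shows the full starting value on the first frame
  // and reaches 0 exactly when the time is up.
  const secondsLeft = Math.max(0, Math.ceil(totalSeconds - elapsedSeconds));
  return {
    secondsLeft,
    color: secondsLeft < 10 ? "red" : "green", // red once the display drops below 10
    shaking: secondsLeft >= 1 && secondsLeft <= 3, // shake on 3, 2 and 1 only
  };
}
```

Because the state depends only on the frame number, scrubbing or re-rendering the video always produces the same timer, which is the property the chapter relies on when it calls the tutorials versioned and reproducible.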

Video #2: Portfolio Speedrun (v0 + Vercel)

Title/Hook: "Your portfolio shouldn't take longer than your morning coffee"

Tools: v0 by Vercel, Vercel deployment

Concept: A developer's portfolio website -- hero section, project grid, about page, contact form, dark mode toggle -- goes from blank prompt to live Vercel deployment while a coffee timer ticks down. The coffee metaphor runs throughout: the video opens with pouring coffee, and each section of the site appears as the coffee cools.

Tone: Relaxed and conversational, contrasting with the speed of what is happening on screen. The humor comes from the mismatch between the casual narration and the absurd pace.

Script Outline (180 words): Open on a close-up of coffee being poured. The narrator says: "The average developer spends 3 weeks on their portfolio. I'm going to finish mine before this coffee is cool enough to drink." Cut to v0. The prompt describes a developer portfolio: dark theme, animated hero with a typewriter effect showing "I build things," a responsive project grid pulling from a JSON file, an about section with a timeline, a contact form, and a dark/light mode toggle. v0 generates the first component. The narrator walks through what is appearing while keeping the tone casual -- "Oh, that's a nice grid layout... didn't ask for that hover effect but I'm keeping it." At 40 seconds, the design is complete. The code is exported to a GitHub repo. Vercel picks up the push and begins deploying. The narrator takes a sip of coffee. The Vercel build completes. The live site loads: responsive, polished, with real content. "Still too hot to drink. I should probably build a second portfolio."

Visual Concepts for Remotion:

CoffeeTimer component: a coffee cup illustration in the corner with a steam animation, a circular progress ring around it representing time
ComponentAssembly animation: each section of the portfolio slides into a wireframe layout, then fills in with color and content -- like a blueprint becoming a building
v0Preview screen capture: the v0 interface generating components in real time
VercelDeploy animation: a minimal deployment progress bar styled in Vercel's black-and-white aesthetic, with the URL appearing at the end
Smooth crossfade transitions between the coffee close-up and the screen recording

Video #3: The $0 Startup (Lovable)

Title/Hook: "This app makes money. I didn't write a single line."

Tool: Lovable

Concept: A non-technical founder builds a complete SaaS product using only Lovable -- from idea to deployed, revenue-generating application. The video emphasizes that the person building this has no programming background. The "reveal" is not just the app, but a real Stripe dashboard showing the first payment.

Tone: Inspirational but grounded. Not "anyone can do this" hype -- more "here's exactly what the process looks like when you've never coded before."

Script Outline (190 words): Open on a text overlay: "I'm not a developer. I'm a marketing manager." The narrator continues: "Last month, I had an idea for a tool that helps freelancers track their invoices. This morning, I built it." Cut to Lovable. The prompt is detailed and specific -- it describes an invoice tracker with client management, recurring invoice templates, PDF export, and a simple dashboard showing outstanding payments. Lovable begins generating. The narration explains the key decisions: why the prompt specifies Supabase for the backend, why it asks for Row Level Security so each user only sees their own data, why it mentions Stripe Connect for future payment processing. At 45 seconds, the app is running in Lovable's preview. The narrator tests the core workflow: create a client, generate an invoice, export to PDF. Everything works. At 70 seconds, the app deploys. Cut to a real Stripe dashboard showing a $12 test payment. "I didn't write code. I didn't hire a developer. I described what I needed. Total investment: a Lovable subscription and one afternoon of prompt writing."

Visual Concepts for Remotion:

IdentityCard intro animation: a business-card-style overlay showing "Marketing Manager" with a crossed-out "Developer" beneath it
PromptAnnotation overlay: as the prompt scrolls, key phrases highlight and small tooltip annotations explain why each detail matters (e.g., "Row Level Security" highlights with a note: "This keeps each user's data private")
WorkflowDemo screen recording: the invoice creation flow captured step-by-step with zoom-ins on important UI elements
StripeReveal animation: the Stripe dashboard slides in from the bottom with a cash register sound effect and a subtle confetti particle burst
Color palette shifts from grayscale (the "before") to full color (the "after") as the app comes to life

Video #4: Clone Wars (Cursor)

Title/Hook: "I showed AI a screenshot of Notion. Here's what happened."

Tool: Cursor (Agent mode with Composer)

Concept: A screenshot of Notion's interface is fed to Cursor's AI, along with a prompt asking it to recreate the core functionality. The video follows the agent as it plans the architecture, generates the components, and builds a working Notion-like workspace -- pages, blocks, drag-and-drop, slash commands -- all from a single image and a paragraph of context.

Tone: Playful and slightly mischievous. The "clone wars" framing leans into the controversy of AI-generated clones while keeping it lighthearted.

Script Outline (185 words): Open on a screenshot of Notion's interface. The narrator says: "This is Notion. 400 engineers built this over 10 years. I'm going to see how close AI can get in 2 minutes." The screenshot is dragged into Cursor's Composer. The prompt is brief but precise: recreate a note-taking workspace with a sidebar, nested pages, rich text blocks, slash command menu for adding headers/lists/toggles, and drag-to-reorder blocks. Cursor's agent starts planning. An overlay shows the agent's thought process -- the file tree it is creating, the components it has decided to build, the libraries it is installing. At 30 seconds, the first components render: a sidebar with a page tree. At 60 seconds, the editor is working: typing, formatting, slash commands. At 90 seconds, drag-and-drop is functional. The narrator does a side-by-side comparison with the original screenshot. Some elements are strikingly close. Others are clearly AI-generated. "Is it Notion? No. Could you use it? Absolutely. Did a human write any of this code? Not a single character."

Visual Concepts for Remotion:

ScreenshotToCode opening animation: the Notion screenshot dissolves pixel-by-pixel into code characters, which then reassemble into the cloned interface
AgentThinking overlay: a semi-transparent sidebar showing Cursor's agent plan as it generates -- file names, component tree, dependency list, appearing in real time
SideBySide comparison frame: original Notion on the left, clone on the right, with a slider the viewer can conceptually drag between them
FileTicker bottom bar: a scrolling ticker showing file names as they are created ("sidebar.tsx... editor.tsx... slash-commands.tsx..."), styled like a stock ticker
Cursor's interface captured with visible agent actions highlighted

Video #5: The Debug Olympics (Claude Code)

Title/Hook: "Can AI fix a bug faster than Stack Overflow?"

Tool: Claude Code

Concept: A real, nasty bug -- the kind that would send a developer to Stack Overflow for an hour -- is presented to Claude Code. The screen is split: on the left, a simulated "Stack Overflow search" shows the traditional debugging path (finding related questions, reading answers, trying solutions). On the right, Claude Code analyzes the error, traces the root cause through multiple files, and delivers a working fix. A race timer tracks both sides.

Tone: Competitive and high-energy, like a sports broadcast. The narration calls the race like a commentator.

Script Outline (175 words): Open on a terminal showing a cryptic error: a React hydration mismatch caused by a timezone-dependent date format in a server-rendered component. The narrator, in a sports-announcer voice: "In the left corner, the defending champion: Stack Overflow and pure human tenacity. In the right corner, the challenger: Claude Code. The bug: a hydration error that has already cost this developer 45 minutes. Let the race begin." The split screen activates. Left side: a browser opens Stack Overflow, searches the error message, scrolls through three different answers, tries a solution that does not work, goes back. Right side: Claude Code receives the error, opens the relevant files, traces the date formatting issue across server and client components, identifies the mismatch, proposes a fix, and applies it. Claude Code finishes in 23 seconds. The left side is still reading the second Stack Overflow answer. "The AI finished before the human found the right question to ask."

Visual Concepts for Remotion:

RaceTimer dual countdown: two stopwatches side by side, one for each approach, styled like a sports scoreboard with team colors (orange for Stack Overflow, purple for Claude)
SplitRace composition: left and right panels with independent screen recordings, separated by a glowing dividing line
DebugTrace animation: on Claude Code's side, colored lines connect the error message to the relevant files, showing the AI's reasoning path like a detective's evidence board
VictoryFlash animation: when Claude Code finishes, its panel pulses with a winner overlay while the Stack Overflow panel dims
BugAnatomy end card: a diagram showing the root cause of the bug, making the video educational as well as entertaining
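
The root cause named in the script outline, a date formatted in whatever time zone the rendering machine uses, can be reproduced outside React. The sketch below is an illustration only: `formatDate` and the sample timestamp are hypothetical, and pinning the time zone is one common fix among several, not necessarily the one the video would show.

```typescript
// Formats a timestamp in an explicit time zone. If server and browser pass
// the same zone, they produce the same string and the markup matches.
function formatDate(iso: string, timeZone: string): string {
  return new Intl.DateTimeFormat("en-US", {
    timeZone,
    year: "numeric",
    month: "short",
    day: "numeric",
  }).format(new Date(iso));
}

// 23:30 UTC on February 14 is already February 15 in Tokyo.
const sampleTimestamp = "2026-02-14T23:30:00Z";
const renderedOnUtcServer = formatDate(sampleTimestamp, "UTC");
const renderedInTokyoBrowser = formatDate(sampleTimestamp, "Asia/Tokyo");

// This difference is what surfaces as a hydration mismatch.
const wouldMismatch = renderedOnUtcServer !== renderedInTokyoBrowser;
```

Passing one agreed zone on both sides (or formatting only on the client) removes the difference, which is the kind of explanation the BugAnatomy end card is meant to carry.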

Video Series 2: "The Prompt That..." (Educational + Humor)

This series takes a single prompt and follows it to its logical (and sometimes illogical) conclusion. Each video is educational at its core -- you learn prompt engineering techniques, tool capabilities, and common pitfalls -- but the framing is comedic. The "The Prompt That..." naming convention is designed for curiosity-driven clicks.

Series format:

Duration: 90-120 seconds
Structure: Setup (10s) -> The prompt (10s) -> The process (40-60s) -> The twist/result (20-30s) -> Lesson learned (10s) -> End card (5s)
Visual signature: The prompt text is always displayed on a "sticky note" style card that stays pinned to the screen throughout the video
Audio: Conversational narration, comedic timing with beat pauses, sound effects for emphasis
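
The segment structure above maps naturally onto frame ranges. As a rough sketch, assuming 30 frames per second and the lower bound of each segment, the helper below lays the segments end to end. The `from` and `durationInFrames` names mirror the props Remotion's Sequence component takes, but everything else here (function name, segment names) is hypothetical.

```typescript
interface Segment {
  name: string;
  seconds: number;
}

interface FrameRange {
  name: string;
  from: number; // first frame of the segment
  durationInFrames: number; // segment length in frames
}

// Lays segments end to end and converts seconds to frames.
function toFrameRanges(segments: Segment[], fps: number): FrameRange[] {
  let cursor = 0;
  return segments.map(({ name, seconds }) => {
    const durationInFrames = Math.round(seconds * fps);
    const range = { name, from: cursor, durationInFrames };
    cursor += durationInFrames;
    return range;
  });
}

// Lower bound of each segment in the series format.
const promptThatSegments: Segment[] = [
  { name: "setup", seconds: 10 },
  { name: "prompt", seconds: 10 },
  { name: "process", seconds: 40 },
  { name: "twist", seconds: 20 },
  { name: "lesson", seconds: 10 },
  { name: "endCard", seconds: 5 },
];

const segmentRanges = toFrameRanges(promptThatSegments, 30); // 30 fps is an assumption
const totalFrames = segmentRanges.reduce((sum, r) => sum + r.durationInFrames, 0);
```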

Video #6: The Prompt That Built a Game

Title/Hook: "The Prompt That Built a Game"

Tool: Claude Code + Remotion (for the game rendering)

Concept: A single, carefully crafted prompt generates a complete browser game -- not a trivial one, but a polished arcade game with physics, particle effects, a scoring system, leaderboard, and mobile touch controls. The video walks through the prompt's structure, explaining why each sentence matters, then shows the game coming to life.

Tone: Enthusiastic and educational. The narrator genuinely enjoys playing the result.

Script Outline (190 words): Open on the prompt, displayed as a sticky note. The narrator reads it aloud, pausing to annotate key phrases: "Notice I specified 'physics-based' -- without this, the AI defaults to simple collision rectangles." "I said 'particle effects on collision' -- this forces the AI to implement a particle system, which makes the game feel premium." The prompt is sent to Claude Code. The terminal comes alive with file creation. The narrator explains the AI's architectural decisions as they happen: "It chose HTML Canvas over DOM elements -- good call for performance." "It's implementing a game loop with requestAnimationFrame -- exactly right." At 50 seconds, the game runs for the first time. It has bugs: a sprite clips through a wall. The error is pasted back. At 65 seconds, the game runs cleanly. The narrator plays it for 20 seconds, showing the physics, particles, and scoring in action. "One prompt. One paste of an error message. A game that would have taken a junior developer a week. The lesson: specificity in your prompt is not optional. Every adjective earns its keep."

Visual Concepts for Remotion:

StickyNote component: a yellow sticky note pinned to the top-left corner showing the prompt text, with annotations appearing as red-marker circles and arrows when the narrator highlights key phrases
TerminalStream animation: Claude Code's terminal output rendered as a scrolling feed with syntax-highlighted file paths and code snippets
GameEmbed live composition: the actual game running inside a Remotion frame, capturing real gameplay
AnnotationBubble overlays: speech-bubble callouts pointing to specific lines in the prompt, explaining why they matter
BeforeAfter bug-fix transition: a glitch effect when the bug appears, clean dissolve when it is fixed

Video #7: The Prompt That Broke Everything

Title/Hook: "The Prompt That Broke Everything"

Tool: Bolt.new

Concept: A seemingly reasonable prompt -- "refactor the entire codebase to use TypeScript strict mode" -- is applied to a working JavaScript project. The video documents the cascade of failures: type errors multiply exponentially, the AI tries to fix them but introduces new ones, the build breaks, and the project enters what the narrator calls "the error spiral." The video then shows the recovery: how to scope refactoring prompts correctly.

Tone: Darkly comedic, building to genuine relief. The narrator treats the error messages like a horror movie.

Script Outline (185 words): Open on a working application. Green checkmarks everywhere. The narrator says: "This app works perfectly. It has 47 files, zero bugs, and 100% of its tests pass. I am about to destroy it with one sentence." The prompt appears: "Refactor this entire codebase to use TypeScript strict mode with no 'any' types." The AI begins. At first, it looks productive -- .js files become .ts and .tsx files. Then the errors start. The error count appears as a rising counter in the corner: 12... 47... 134... 312. The narrator's tone shifts from confident to concerned to horrified. "It's adding type assertions everywhere. Those are band-aids. The types are lying." At 60 seconds, the build fails completely. The recovery begins: the narrator shows how to scope the same refactoring into small, file-by-file prompts with test verification between each step. The error count drops. The builds pass. "The lesson: AI can refactor anything. But 'anything' and 'everything at once' are different requests."

Visual Concepts for Remotion:

ErrorCounter component: a large, prominent counter in the top-right that ticks up with each new TypeScript error, turning from green to yellow to orange to red as the count increases, with screen-shake at milestones (100, 200, 300)
CascadeVisualization animation: errors displayed as falling dominoes or multiplying cells, visually representing the chain reaction
HealthBar component: a video-game-style health bar for the project, draining as errors accumulate, flashing red at critical levels
RecoveryTimeline animation: a horizontal timeline showing the correct approach -- small, scoped prompts with green checkmarks between each step
Split-screen during recovery: the broken approach on top (red-tinted), the correct approach on the bottom (green-tinted)
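
The ErrorCounter behavior described above comes down to two small pure functions. This is a sketch under assumptions: only the 100/200/300 milestones come from the text, while the color thresholds and function names are invented for illustration.

```typescript
type CounterColor = "green" | "yellow" | "orange" | "red";

// Color thresholds are illustrative guesses, not values from the chapter.
function counterColor(errorCount: number): CounterColor {
  if (errorCount < 25) return "green";
  if (errorCount < 100) return "yellow";
  if (errorCount < 200) return "orange";
  return "red";
}

// True when the count has passed a new multiple of 100 since the last update,
// which is when the screen-shake would fire. Comparing against the previous
// value catches jumps such as 47 -> 134 that never land exactly on 100.
function crossedMilestone(previous: number, current: number): boolean {
  return Math.floor(current / 100) > Math.floor(previous / 100);
}
```

Fed the counter values from the script outline (12, 47, 134, 312), the color moves through all four states and the shake fires on the last two updates.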

Video #8: The Prompt That Got Me Fired (Hypothetically)

Title/Hook: "The Prompt That Got Me Fired (Hypothetically)"

Tool: Claude Code

Concept: A developer accidentally uses a vibe coding workflow on a production codebase -- accepting all changes without review, pushing without tests, deploying on a Friday afternoon. The video is a dramatized worst-case scenario that teaches real lessons about when NOT to vibe code. Every mistake is a real mistake that real developers have made.

Tone: Mock-serious, documentary style. Presented like a true-crime investigation of a deployment gone wrong.

Script Outline (180 words): Open on a dramatic title card: "INCIDENT REPORT: February 14, 2026." The narrator, in a deadpan documentary voice: "The following is a reconstruction of actual events. Names have been changed. The code has not." The prompt is revealed: a developer asked the AI to "update the user billing logic to handle the new pricing tiers" on the production branch. Without reading the diff. Without running tests. On a Friday at 4:47 PM. The AI changed the billing calculation -- and introduced a rounding error that charged every customer $0.01 extra per transaction. The video shows the cascade: the deploy, the first customer complaint, the Slack messages, the rollback attempt that failed because there was no checkpoint. "By Monday morning, 47,000 transactions were affected." The recovery section shows what should have happened: feature branch, test suite, staging deployment, code review. "Vibe coding is a superpower. And like every superpower, using it in the wrong context has consequences."

Visual Concepts for Remotion:

IncidentReport styling: the entire video uses a corporate incident report aesthetic -- monospace fonts, timestamps, severity indicators, redacted sections
SlackMessages animation: recreated Slack-style message bubbles appearing with increasing urgency ("@channel anyone else seeing billing discrepancies?", "this is not a drill")
TimelineOfFailure component: a horizontal timeline with red flags marking each mistake (no branch, no tests, no review, Friday deploy)
RollbackFail animation: a dramatic "FAILED" overlay with klaxon-style visual pulse when the rollback does not work
ChecklistReveal end animation: the correct process appearing as a green checklist, each item checking off with a satisfying animation

Video #9: The Prompt That Replaced My Intern

Title/Hook: "The Prompt That Replaced My Intern"

Tool: Cursor + Claude Code

Concept: A tech lead has a list of 23 tedious but necessary tasks that would normally be assigned to a junior developer or intern: rename variables to follow conventions, add JSDoc comments to exported functions, update deprecated API calls, create missing test stubs, fix all ESLint warnings. One prompt handles all of them. The video compares the estimated "intern hours" with the actual AI minutes.

Tone: Sympathetic and slightly guilty. The narrator acknowledges the awkwardness of the topic while being honest about the productivity gains.

Script Outline (175 words): Open on a task list -- 23 items, each with an estimated time: "Rename callbacks to follow naming convention (2 hours)," "Add JSDoc to all exported functions (4 hours)," "Update deprecated moment.js calls to dayjs (3 hours)." Total estimate: 34 hours of intern work. The narrator says: "I used to give this list to our summer intern. It would take them a full work week. This morning I gave it to the AI." A single, structured prompt appears, listing all 23 tasks with clear specifications. Claude Code begins. A progress bar tracks completed tasks. The terminal output shows files being modified, tests passing. At 45 seconds into the video (the recording is time-lapsed), 23 of 23 tasks are done. The narrator reviews the changes: "The variable renames are consistent. The JSDoc comments are accurate. The moment-to-dayjs migration handles edge cases I didn't think of." Total real time: 8 minutes. "The intern now works on architecture decisions and feature design. The AI handles the checklist."

Visual Concepts for Remotion:

TaskBoard component: a kanban-style board with 23 cards, each sliding from "To Do" to "In Progress" to "Done" as the AI completes them
TimeComparison split bar: a bar chart comparing "Intern: 34 hours" vs "AI: 8 minutes," with the AI bar barely visible next to the intern bar
ProgressTracker overlay: "3/23 complete... 11/23... 19/23..." with each milestone triggering a small celebration animation
DiffPreview popups: brief glimpses of the actual code changes (before/after) for two or three of the most interesting tasks
Warm color palette (no cold, "replacing humans" vibe) -- the end card explicitly shows the intern now working on more interesting problems

Video #10: The Prompt That Even My Mom Could Use

Title/Hook: "The Prompt That Even My Mom Could Use"

Tool: Lovable

Concept: The narrator's actual non-technical parent uses Lovable to build a small app -- a recipe organizer -- from scratch, using only natural language. The video is screen-recorded over the parent's shoulder (with permission). The charm is in the completely non-technical prompt language: "I want a thing where I can put my recipes and find them later, like a cookbook but on the computer."

Tone: Warm, genuine, and slightly humorous. The non-technical language in the prompts is endearing, not mocking.

Script Outline (185 words): Open on a text overlay: "I gave my mom a Lovable account and one instruction: build whatever you want." Cut to the screen. The prompt is typed in plain, non-technical English: "I want to save my recipes. Each recipe should have a name, the ingredients, the steps, and a photo. I want to search by ingredient so when I have chicken I can find all my chicken recipes. Make it pretty with a warm color like my kitchen." Lovable generates the app. The narrator points out that "make it pretty with a warm color like my kitchen" resulted in a terracotta-and-cream color scheme that actually looks good. The recipe form works. The search works. Photo upload works. The narrator's parent adds a real recipe -- handwritten notes visible on the desk for reference. The app works exactly as described. "She didn't say 'database.' She didn't say 'component.' She didn't say 'responsive.' She said 'like a cookbook but on the computer.' And that was enough."

Visual Concepts for Remotion:

HandwrittenOverlay styling: the prompt text appears in a handwriting-style font rather than monospace, reinforcing the non-technical nature
KitchenWarmth color grading: the entire video has a warm, slightly golden color grade -- cozy and approachable
RecipeCard animation: when the generated app shows a recipe, it animates like flipping a page in a physical cookbook
SearchDemo screen recording: the ingredient search in action, with a zoom-in on the results filtering in real time
QuoteCard end overlay: "She said 'like a cookbook but on the computer.' And that was enough." in large, warm-toned typography

Video #11: The Prompt That Fooled the Senior Dev

Title/Hook: "The Prompt That Fooled the Senior Dev"

Tool: Claude Code

Concept: A blind code review experiment. A senior developer is shown two pull requests: one written by a mid-level human developer, one generated entirely by AI from a single prompt. The senior reviews both, provides feedback, and guesses which is which. The reveal shows whether they guessed correctly -- and what the AI code got right that the human code got wrong (and vice versa).

Tone: Fair and balanced. This is not an "AI is better" video -- it is an honest comparison that reveals strengths and weaknesses on both sides.

Script Outline (195 words): Open on two code editors, labeled "Developer A" and "Developer B." The narrator explains: "A senior engineer with 12 years of experience is going to review two implementations of the same feature -- a real-time notification system. One was written by a mid-level developer in 6 hours. The other was generated by Claude Code from a single prompt in 4 minutes. The reviewer doesn't know which is which." Cut to the review. The senior developer's comments appear as overlays: "Developer A has clean separation of concerns... but this error handling is naive." "Developer B's type safety is impressive... but this abstraction feels over-engineered." The senior guesses: "A is the human, B is the AI. The human code feels more intentional. The AI code is technically thorough but lacks personality." The reveal: they got it backwards. Developer A was the AI. Developer B was the human. The narrator unpacks the implications: the AI's code was structurally cleaner, but the human's code had more creative architectural choices. "Neither was strictly better. They were differently excellent."

Visual Concepts for Remotion:

BlindReview split screen: two code panels with neutral labels ("Developer A" / "Developer B"), no visual hints about origin
ReviewComment overlays: the senior developer's comments appear as GitHub-PR-style review annotations, sliding in from the right margin
GuessReveal animation: the labels flip over like cards, revealing "AI" and "Human" with a dramatic pause and sound effect
ComparisonMatrix end card: a radar chart comparing both implementations across axes (readability, type safety, error handling, architecture, creativity, performance)
Neutral color scheme throughout -- neither side gets a "winner" color until the analysis section

Video Series 3: "Tool Face-Off" (Comparison)

This series puts competing tools head-to-head on identical tasks. Same prompt, same requirements, same hardware. The evaluation is structured and scored across consistent categories: speed, code quality, developer experience, and output completeness. These are the videos developers watch before choosing their next tool.

Series format:

Duration: 90-120 seconds
Structure: Rules (10s) -> Tool A attempt (30-40s) -> Tool B attempt (30-40s) -> Scoring (15s) -> Verdict (10s) -> End card (5s)
Visual signature: Boxing-match / tournament-bracket aesthetic with tool logos in corners, round numbers, and scorecard overlays
Audio: Sports-style narration, bell sounds between rounds, dramatic pause before verdict

Video #12: Round 1 -- IDE Showdown (Cursor vs Claude Code vs Codex CLI)

Title/Hook: "Round 1: IDE Showdown -- Cursor vs Claude Code vs Codex CLI"

Tools: Cursor (Agent mode), Claude Code, OpenAI Codex CLI

Concept: All three tools receive the same prompt: build a task management API with authentication, CRUD operations, and automated tests. The video captures all three attempts simultaneously using a triple split-screen. Each tool is scored on time to completion, test pass rate, code quality (measured by a linting score), and developer experience (subjective rating of the interaction).

Tone: Fair, analytical, and energetic. This is a sports broadcast, not a product review. Every tool gets genuine praise for its strengths.

Script Outline (200 words): Open on a tournament bracket graphic. The narrator, in an announcer voice: "Three tools. One prompt. One winner. This is the IDE Showdown." The prompt appears: a task management REST API with JWT authentication, full CRUD, input validation, pagination, and a test suite. The rules: no human intervention after the prompt is submitted, tools are scored on four categories, each worth 25 points. "Round 1: Speed." The triple split-screen activates. Cursor's agent starts planning, showing its step-by-step approach. Claude Code opens multiple files simultaneously, working fast. Codex CLI takes a methodical, file-by-file approach. Time stamps appear as each tool finishes. "Round 2: Tests." Each tool's test suite runs. Pass rates appear on the scoreboard. "Round 3: Code Quality." ESLint scores flash on screen. "Round 4: Developer Experience." The narrator rates the interaction quality: how clear was the agent's communication, how easy was it to follow along, how much manual intervention was needed. The scorecard fills in. The verdict is revealed. "All three built a working API. The differences are in the details."

Visual Concepts for Remotion:

TournamentBracket intro animation: a bracket graphic with tool logos, styled like a boxing event poster
TripleSplit composition: three equal panels running simultaneous screen recordings, each with a tool logo badge and running timer in the corner
ScoreBoard component: a four-category scoring grid that fills in during the verdict section, each score animating from 0 to its final value
RoundBell transition: a boxing bell sound and the next round's number ("ROUND 2," "ROUND 3") as on-screen text between scoring categories
VerdictCard final overlay: total scores, category winner badges, and a nuanced text verdict ("Best for speed: X. Best for quality: Y. Best for beginners: Z.")
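
The four-category, 25-points-each scoring in the script outline could be modeled as follows. This is a sketch, not the series' actual rubric: the pass-rate mapping is a hypothetical way to turn one measurement into points, and the tool names and numbers in the usage lines are placeholders, not results.

```typescript
const CATEGORIES = ["speed", "tests", "codeQuality", "developerExperience"] as const;
type Category = (typeof CATEGORIES)[number];
type Scorecard = Record<Category, number>;

const MAX_PER_CATEGORY = 25; // four categories, 25 points each

// Keeps a category score inside the 0..25 range.
function clampPoints(points: number): number {
  return Math.min(MAX_PER_CATEGORY, Math.max(0, points));
}

// Sums the four categories into a score out of 100.
function totalScore(card: Scorecard): number {
  return CATEGORIES.reduce((sum, c) => sum + clampPoints(card[c]), 0);
}

// Hypothetical mapping from a test pass rate onto the 25-point scale.
function pointsFromPassRate(passed: number, total: number): number {
  return total === 0 ? 0 : Math.round((passed * MAX_PER_CATEGORY) / total);
}

// Returns the tool with the highest score in one category (first wins ties).
function categoryWinner(cards: Record<string, Scorecard>, category: Category): string {
  let best = "";
  let bestPoints = -1;
  for (const tool of Object.keys(cards)) {
    const card = cards[tool];
    if (card !== undefined && card[category] > bestPoints) {
      best = tool;
      bestPoints = card[category];
    }
  }
  return best;
}

// Placeholder data for two unnamed tools.
const sampleCards: Record<string, Scorecard> = {
  toolA: { speed: 22, tests: pointsFromPassRate(16, 20), codeQuality: 18, developerExperience: 21 },
  toolB: { speed: 17, tests: pointsFromPassRate(20, 20), codeQuality: 23, developerExperience: 19 },
};
```

Keeping a winner per category alongside the total is what lets the VerdictCard give a nuanced "best for X" verdict instead of a single ranking.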

Video #13: Round 2 -- Builder Battle (Bolt.new vs Lovable vs Replit Agent)

Title/Hook: "Round 2: Builder Battle -- Bolt.new vs Lovable vs Replit Agent"

Tools: Bolt.new, Lovable, Replit Agent

Concept: The browser-based builders compete on a task suited to their strengths: build a complete landing page with a waitlist form, social proof section, feature comparison, and email capture that stores submissions to a real database. Scoring covers design quality, functionality, mobile responsiveness, and deployment speed.

Tone: Enthusiastic and visual. Since these are design-heavy tools, the video emphasizes how each app looks and feels rather than focusing purely on code.

Script Outline (190 words): Open on the challenge card: "Build a startup landing page with working waitlist signup. You have 3 minutes." Each builder gets the same prompt: a landing page for a fictional AI writing tool called "DraftPilot," with a hero section, three feature cards, a testimonial carousel, a pricing comparison, and a waitlist form that saves emails to Supabase. The triple split-screen shows all three tools working simultaneously. The narrator calls attention to interesting differences in real time: "Bolt.new went straight for the hero section -- it's already looking polished." "Lovable is building the database connection first -- solid fundamentals." "Replit Agent just asked a clarifying question about the color scheme -- that's a nice touch." At 90 seconds, the designs are compared side-by-side: mobile views, desktop views, scroll behavior, form functionality. Each tool's waitlist form is tested with a real email submission. The scoring covers design (how good does it look), function (does the form actually save data), responsiveness (mobile rendering), and speed (time to deployable state). "Each builder has a personality. The question is which personality matches yours."

Visual Concepts for Remotion:

BuilderCard intro: each tool's logo on a playing-card-style design, dealt onto the screen like a card game
DesignComparison frame: all three landing pages shown as browser mockups on a desk, with the ability to zoom into each one
MobilePreview animation: each landing page shrinks into a phone-shaped frame to show mobile rendering, side by side
FormTest overlay: a live-action hand typing a test email into each form, with a green checkmark when the submission succeeds
PersonalityCard end graphic: each tool gets a one-line personality description ("Bolt.new: The Speed Demon," "Lovable: The Perfectionist," "Replit Agent: The Conversationalist")

Video #14: Round 3 -- Agent Arena (Devin vs Jules vs Claude Code)

Title/Hook: "Round 3: Agent Arena -- Devin vs Jules vs Claude Code"

Tools: Devin, Google Jules, Claude Code

Concept: The autonomous agents tackle a more complex task: given an existing open-source project with 15 open issues, each agent is assigned 5 issues and must work independently to create pull requests. Scoring covers issue resolution rate, PR quality, test coverage of the fix, and how well the agent communicated its approach.

Tone: Analytical with a sense of drama. These are the most powerful tools in the landscape, and the comparison is genuinely informative for teams making purchasing decisions.

Script Outline (200 words): Open on a GitHub issues page showing 15 open issues. The narrator: "Welcome to the Agent Arena. Three autonomous AI agents. Five GitHub issues each. No human help. Who writes the best pull requests?" The issues range from a CSS bug to a database query optimization to a feature request for dark mode. Each agent receives its 5 issues and a cloned copy of the repo. The video shows a triple timeline: Devin working in its cloud VM, Jules working asynchronously through Google Cloud, Claude Code working in the terminal. Key moments are highlighted: "Devin just opened a PR for the CSS bug -- let's see the diff." "Jules is running the test suite before committing -- smart." "Claude Code found a related bug while fixing issue #7 and filed a new issue for it -- above and beyond." After all agents submit their PRs, a senior developer reviews them. Scoring: issues resolved (did the PR actually fix it), code quality (clean diff, no regressions), test coverage (did the agent add tests), and communication (how clear was the PR description and commit message). "At this level, the differences are subtle. But subtle differences matter at scale."

Visual Concepts for Remotion:

GitHubBoard composition: a project board with issue cards, each card moving to the agent's column as they are assigned
AgentTimeline triple track: three horizontal timelines showing each agent's progress -- commits appear as dots, PRs as flags, with timestamps
PRReview overlay: a GitHub-style PR diff view showing the agent's changes, with the senior developer's review comments fading in
ScoreRadar chart: a radar/spider chart for each agent across the four scoring dimensions
ArenaStadium framing: the entire video is styled like an arena event, with spotlights, agent "entrances," and a final podium reveal

Video #15: Round 4 -- Speed vs Quality (Bolt vs Claude Code)

Title/Hook: "Round 4: Speed vs Quality -- Bolt.new vs Claude Code"

Tools: Bolt.new, Claude Code

Concept: This is the philosophical face-off: the fastest browser builder against the most thorough terminal agent. The same prompt -- a complete habit-tracking app with streaks, charts, and reminders -- goes to both tools. Bolt.new finishes in minutes. Claude Code takes longer but produces more robust code. The question is not "which is better" but "which is better for what."

Tone: Thoughtful and balanced. This video acknowledges that "better" depends entirely on context.

Script Outline (195 words): Open on a scale graphic: "Speed" on one side, "Quality" on the other. The narrator: "Every developer makes this trade-off. Today we make it explicit." The prompt: a habit tracker with daily check-ins, streak counting with freeze days, progress charts using a real charting library, push notification reminders, and data export. Bolt.new starts. The app assembles rapidly in the browser -- UI components appear, the habit list renders, the chart populates. Time: 3 minutes and 12 seconds. It looks good. It works. Claude Code starts. The terminal is busier -- it is setting up a proper project structure, adding TypeScript types, writing utility functions with edge case handling, creating a test file. Time: 14 minutes and 47 seconds. It also works. Now the comparison. The narrator stress-tests both: "What happens when the streak crosses a month boundary?" Bolt's version has a bug. Claude Code's handles it correctly. "What about the UI?" Bolt's is more visually polished out of the box. "Both answers are right. The question is what you need right now: a working prototype by lunch, or a production foundation by end of week."

Visual Concepts for Remotion:

ScaleBalance component: a literal balance scale that tips toward speed (Bolt) or quality (Claude Code) as different criteria are evaluated
DualTimer composition: two race-style timers, one for each tool, with the differential growing as Claude Code continues working after Bolt finishes
StressTest overlay: identical test inputs applied to both apps simultaneously, with results appearing as pass/fail indicators
ContextCard end graphic: two scenario cards -- "Choose Bolt when: hackathon, prototype, demo day" and "Choose Claude Code when: production, long-term project, team codebase" -- appearing side by side
Warm vs cool color split: Bolt's side in warm oranges (energy, speed), Claude Code's side in cool blues (precision, depth)
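
The month-boundary stress test in the script outline targets a common date-arithmetic pitfall: comparing day-of-month values instead of whole calendar days. A minimal sketch of a streak counter that avoids it is below. The function names are hypothetical, dates are treated as UTC calendar days, and freeze days are deliberately left out.

```typescript
const MS_PER_DAY = 86400000;

// Converts "YYYY-MM-DD" into a whole number of days since the Unix epoch (UTC).
function dayNumber(isoDate: string): number {
  const parts = isoDate.split("-");
  return Date.UTC(Number(parts[0]), Number(parts[1]) - 1, Number(parts[2])) / MS_PER_DAY;
}

// Counts consecutive checked-in days ending on `today`.
function currentStreak(checkIns: string[], today: string): number {
  const days = new Set(checkIns.map(dayNumber));
  let streak = 0;
  for (let day = dayNumber(today); days.has(day); day -= 1) {
    streak += 1;
  }
  return streak;
}

// January 31 -> February 1 is just "one day earlier" in day numbers.
const acrossMonthBoundary = currentStreak(
  ["2026-01-30", "2026-01-31", "2026-02-01", "2026-02-02"],
  "2026-02-02",
);
```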

Video Production Workflow

Every video in this chapter follows the same five-stage production pipeline. This section documents the pipeline so that new videos can be produced consistently and efficiently.

Stage 1: Script Writing

Every video begins as a markdown file. Scripts follow a strict format:

---
video_id: PTP-001
series: prompt-to-product
title: "I built a $9/month SaaS in 60 seconds"
duration_target: 60-90s
tool: Bolt.new
status: production
last_updated: 2026-02-25
---

## Hook (0:00 - 0:03)
[Opening visual description]
NARRATOR: "Opening line designed to stop the scroll."

## Setup (0:03 - 0:08)
[Screen state description]
NARRATOR: "Context setting. What we are about to do and why it matters."

## Build (0:08 - 0:55)
[Screen recording cues with timestamps]
NARRATOR: "Running commentary on what the AI is doing. Call out
interesting decisions. Keep energy high."

## Reveal (0:55 - 1:05)
[Final product display]
NARRATOR: "The payoff. Show the deployed result. Land the key stat."

## End Card (1:05 - 1:10)
[Branding overlay]
NARRATOR: "Call to action -- next video, ebook link, subscribe."
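
Because every section heading in this format carries a time range, the ranges can be read mechanically and turned into frame numbers for the composition. The parser below is a rough sketch under assumptions: the function names are hypothetical, headings are assumed to follow the exact "## Name (m:ss - m:ss)" shape shown above, and the frame rate is passed in by the caller.

```typescript
interface TimedSection {
  name: string;
  startFrame: number;
  endFrame: number;
}

// Converts an "m:ss" stamp into whole seconds.
function stampToSeconds(stamp: string): number {
  const parts = stamp.split(":");
  return Number(parts[0]) * 60 + Number(parts[1]);
}

// Reads every "## Name (m:ss - m:ss)" heading and converts it to frames.
function parseSections(script: string, fps: number): TimedSection[] {
  const headingPattern = /^## (.+?) \((\d+:\d{2}) - (\d+:\d{2})\)\s*$/gm;
  const sections: TimedSection[] = [];
  let match: RegExpExecArray | null;
  while ((match = headingPattern.exec(script)) !== null) {
    sections.push({
      name: match[1] ?? "",
      startFrame: stampToSeconds(match[2] ?? "0:00") * fps,
      endFrame: stampToSeconds(match[3] ?? "0:00") * fps,
    });
  }
  return sections;
}
```

A check that each section's end matches the next section's start would catch gaps or overlaps in a script before any rendering happens.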

Script guidelines:

Target 150-200 words of narration per video (approximately 2 words per second at conversational pace)
Every sentence must earn its place -- if it does not advance understanding or maintain engagement, cut it
Write the hook first. If the first 3 seconds do not compel a viewer to keep watching, rewrite them
Include specific timestamps for visual cues so the Remotion composition can sync precisely
Mark all screen recording segments with [SCREEN: tool_name, action_description] tags
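
Two of these guidelines are easy to check automatically: the narration length and the screen-recording tags. The helpers below are a sketch with hypothetical names. The 2-words-per-second pace comes from the guideline above, and the tag pattern assumes the tool name contains no comma.

```typescript
const WORDS_PER_SECOND = 2; // pace stated in the script guidelines

// Counts whitespace-separated words in a piece of narration.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Estimates how long the narration takes to read aloud.
function estimateNarrationSeconds(narration: string): number {
  return countWords(narration) / WORDS_PER_SECOND;
}

interface ScreenCue {
  tool: string;
  action: string;
}

// Finds every [SCREEN: tool_name, action_description] tag in a script.
function extractScreenCues(script: string): ScreenCue[] {
  const tagPattern = /\[SCREEN:\s*([^,\]]+),\s*([^\]]+)\]/g;
  const cues: ScreenCue[] = [];
  let match: RegExpExecArray | null;
  while ((match = tagPattern.exec(script)) !== null) {
    cues.push({ tool: (match[1] ?? "").trim(), action: (match[2] ?? "").trim() });
  }
  return cues;
}
```

Comparing the estimate against the script's `duration_target` gives an early warning when the narration runs longer than the video it belongs to.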

Stage 2: Visuals (Remotion Compositions)

Each video is a Remotion composition -- a React component that renders frame-by-frame to produce video output. The compositions combine three types of visual content:

Screen Recordings

Captured at 60fps using OBS Studio with a standardized window layout
Tool interfaces are recorded at 1920x1080 with consistent browser chrome
Mouse movements are smoothed in post-processing for cleaner playback
Sensitive information (API keys, personal data) is redacted before compositing

Motion Graphics

Countdown timers, score overlays, progress bars, and transitions are all Remotion components
The component library includes: CountdownTimer, ScoreBoard, SplitScreen, ProgressTracker, TitleCard, EndCard, AnnotationBubble, CodeHighlight
All motion graphics follow the EndOfCoding design system (see Branding below)
Animations use spring physics for natural-feeling motion (the spring() function from Remotion)
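
To show what "spring physics" means in practice, the sketch below evaluates a damped spring by hand, using the standard closed-form solution for an underdamped spring moving from 0 to 1. It does not call Remotion and is not Remotion's implementation. The function name, the config field names, and the example values are assumptions chosen for illustration.

```typescript
interface SpringConfig {
  mass: number;
  stiffness: number;
  damping: number;
}

// Position of an underdamped spring released at 0 and settling at 1.
function springValue(frame: number, fps: number, config: SpringConfig): number {
  const { mass, stiffness, damping } = config;
  const t = frame / fps;
  const omega0 = Math.sqrt(stiffness / mass); // natural frequency
  const zeta = damping / (2 * Math.sqrt(stiffness * mass)); // damping ratio
  if (zeta >= 1) {
    throw new Error("this sketch covers the underdamped case only");
  }
  const omegaD = omega0 * Math.sqrt(1 - zeta * zeta); // damped frequency
  const envelope = Math.exp(-zeta * omega0 * t);
  return (
    1 -
    envelope *
      (Math.cos(omegaD * t) + ((zeta * omega0) / omegaD) * Math.sin(omegaD * t))
  );
}

// Example values only: a slightly bouncy spring at 30 fps.
const bouncy: SpringConfig = { mass: 1, stiffness: 100, damping: 10 };
const scaleAtFrame11 = springValue(11, 30, bouncy); // near the first overshoot
```

The value starts at 0, overshoots 1 slightly, then settles. That overshoot is what makes an element feel like it "lands" rather than simply stopping, as it would with a linear ease.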

Code Animations

Code snippets that appear in videos are rendered using a custom CodeBlock Remotion component
Syntax highlighting uses the same theme across all videos (VS Code Dark+ variant)
Code appears with a typewriter animation at a configurable speed
Diff views use green/red highlighting with line-by-line reveal animations
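
The typewriter effect reduces to one question per frame: how many characters should be visible by now? A minimal sketch, assuming a constant typing speed and a hypothetical function name (the real CodeBlock component is not shown in this chapter):

```typescript
// Returns the portion of `code` that should be visible at the given frame.
function typedSoFar(
  code: string,
  frame: number,
  fps: number,
  charsPerSecond: number,
): string {
  const visibleChars = Math.floor((frame / fps) * charsPerSecond);
  return code.slice(0, Math.max(0, visibleChars));
}

// Half a second in, at 10 characters per second, five characters are visible.
const halfSecondIn = typedSoFar("const x = 1;", 15, 30, 10);
```

Because the output depends only on the frame number, any frame can be rendered independently, which suits frame-by-frame rendering.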

Composition structure:

src/
  compositions/
    prompt-to-product/
      PTP001-SaaS60.tsx        # Main composition
      PTP001-assets/            # Screen recordings, images
    the-prompt-that/
      TPT001-Game.tsx
      TPT001-assets/
    tool-face-off/
      TFO001-IDEShowdown.tsx
      TFO001-assets/
  components/
    CountdownTimer.tsx
    ScoreBoard.tsx
    SplitScreen.tsx
    EndCard.tsx
    StickyNote.tsx
    CodeBlock.tsx
    ProgressTracker.tsx
    RaceTimer.tsx
  styles/
    theme.ts                   # Shared colors, fonts, spacing
    animations.ts              # Shared spring configs

Stage 3: Audio

Narration

AI text-to-speech narration using ElevenLabs or equivalent high-quality TTS
Voice profile: confident, conversational, slightly fast-paced (matching the energy of the content)
Each script is narrated as a single take, then trimmed and aligned to visual cues in Remotion
Pronunciation corrections are applied for technical terms (e.g., "Supabase" is "soo-puh-base," not "super-base")

Sound Design

Background music: royalty-free electronic/lo-fi tracks from Epidemic Sound or Artlist, selected per series (energetic for Prompt to Product, chill for The Prompt That, competitive for Tool Face-Off)
Sound effects library: keystroke clicks, notification chimes, deployment whooshes, error buzzes, success dings, countdown ticks, boxing bells
Music ducking: background track volume drops 60% during narration, rises during visual-only segments
Audio levels: narration at -14 LUFS, music at -24 LUFS, sound effects at -18 LUFS
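
The ducking rule above can be sketched as a per-frame volume function. This assumes the 60% drop applies to linear volume (leaving 40% of the base level) and that narration spans are known as frame ranges; both the types and the function names are illustrative.

```typescript
// A stretch of narration, expressed in frames (end is exclusive).
type NarrationSpan = { startFrame: number; endFrame: number };

// A 60% drop leaves 40% of the base volume (assumed to be linear volume).
const DUCK_FACTOR = 0.4;

// Music volume at a given frame: ducked while narration plays, full otherwise.
function musicVolumeAt(
  frame: number,
  narration: NarrationSpan[],
  baseVolume: number = 1,
): number {
  const speaking = narration.some(
    (span) => frame >= span.startFrame && frame < span.endFrame,
  );
  return speaking ? baseVolume * DUCK_FACTOR : baseVolume;
}

// Convert a level difference in dB (loudness-unit deltas are dB-sized) to linear gain.
function dbToGain(db: number): number {
  return Math.pow(10, db / 20);
}
```

For example, music mixed 10 LU below narration (-24 versus -14 LUFS) corresponds to dbToGain(-10), roughly 0.32 of the narration's amplitude. Remotion's Audio component accepts a volume callback that receives the frame, so a function shaped like musicVolumeAt could be passed there.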

Stage 4: Branding

Every video carries the EndOfCoding brand identity consistently:

Logo

The EndOfCoding logo appears in the bottom-right corner throughout the video at 40% opacity
Full logo displayed on the end card at 100% opacity with the tagline

Color Palette

Primary: #6C5CE7 (electric purple) -- used for highlights, CTAs, and active states
Secondary: #00D2D3 (cyan) -- used for accents, secondary information
Background: #0F0F23 (deep navy) -- used for all dark backgrounds
Surface: #1A1A2E (dark surface) -- used for cards and overlays
Text: #FFFFFF at 90% opacity for primary text, 60% for secondary
Success: #00E676 -- used for pass indicators, completion states
Error: #FF5252 -- used for fail indicators, error states

Typography

Titles: Inter Bold, 48px (scaled for video resolution)
Body: Inter Regular, 24px
Code: JetBrains Mono, 20px
Captions: Inter Medium, 18px
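
One way to keep these values consistent across compositions is a single typed constant, as the styles/theme.ts entry in the composition structure suggests. The shape below is a hypothetical sketch: the colors and sizes are copied from the lists above, while the numeric font weights (700 for Bold, 500 for Medium, 400 for Regular and for JetBrains Mono) and the 1080-pixel reference height for scaling are assumptions.

```typescript
// Hypothetical contents for styles/theme.ts; values copied from the lists above.
const theme = {
  colors: {
    primary: "#6C5CE7",
    secondary: "#00D2D3",
    background: "#0F0F23",
    surface: "#1A1A2E",
    textPrimary: "rgba(255, 255, 255, 0.9)",
    textSecondary: "rgba(255, 255, 255, 0.6)",
    success: "#00E676",
    error: "#FF5252",
  },
  fonts: {
    title: { family: "Inter", weight: 700, sizePx: 48 },
    body: { family: "Inter", weight: 400, sizePx: 24 },
    code: { family: "JetBrains Mono", weight: 400, sizePx: 20 },
    caption: { family: "Inter", weight: 500, sizePx: 18 },
  },
} as const;

// Scale a base font size for a render height other than the assumed 1080 reference.
function scaledFontSize(sizePx: number, renderHeight: number): number {
  return Math.round((sizePx * renderHeight) / 1080);
}
```

With this shape, a vertical 1080x1920 render would call scaledFontSize(theme.fonts.title.sizePx, 1920) rather than hardcoding a second set of sizes.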

End Card (last 5 seconds of every video)

Full EndOfCoding logo centered
Three cross-link buttons: "Watch Next Video" (left), "Read the Ebook" (center), "Subscribe" (right)
Social handles displayed below
Background: animated gradient using the primary/secondary colors

Stage 5: Distribution

Each video exists in multiple formats for different platforms:

Full-Length (YouTube + Ebook Embed)

Resolution: 1920x1080 (16:9)
Duration: 60-120 seconds
Format: MP4 (H.264) for YouTube, WebM for ebook embed
Hosted on YouTube with ebook embed via YouTube iframe or self-hosted WebM

Short-Form Clips (TikTok / Instagram Reels / YouTube Shorts)

Resolution: 1080x1920 (9:16)
Duration: 15-60 seconds
Extracted from the most compelling segment of the full video
Additional text overlays for silent autoplay viewing (captions burned in)
Platform-specific crops handled by a Remotion VerticalCrop composition
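
The geometry behind a 16:9 to 9:16 crop is worth making explicit. The sketch below computes the largest centered window of a target aspect ratio inside a source frame; it illustrates the arithmetic only and is not the VerticalCrop composition itself, whose implementation is not shown here.

```typescript
type CropRect = { x: number; y: number; width: number; height: number };

// Largest centered window of the target aspect ratio inside the source frame.
function centeredCrop(
  srcW: number,
  srcH: number,
  targetW: number,
  targetH: number,
): CropRect {
  const targetRatio = targetW / targetH;
  const srcRatio = srcW / srcH;
  if (srcRatio > targetRatio) {
    // Source is wider than the target: keep full height, trim the sides.
    const width = srcH * targetRatio;
    return { x: (srcW - width) / 2, y: 0, width, height: srcH };
  }
  // Source is taller than (or equal to) the target: keep full width, trim top and bottom.
  const height = srcW / targetRatio;
  return { x: 0, y: (srcH - height) / 2, width: srcW, height };
}
```

For a 1920x1080 source and a 1080x1920 target, the window is 607.5 pixels wide, which is under a third of the original width. That is why a centered crop alone may cut off tool interfaces, and why the most compelling segment sometimes needs a manually chosen x offset.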

Ebook Embed

Lightweight WebM format with lazy loading
Poster frame (thumbnail) displayed before playback
Fallback: animated GIF preview with a "Watch Full Video" link to YouTube
Accessible: full transcript available below each embedded video

SEO and Metadata

YouTube Optimization

Title format: [Hook] | Vibe Coding Tutorial #[N]
Example: "I built a $9/month SaaS in 60 seconds | Vibe Coding Tutorial #1"
Description: 200-300 words including the full prompt used, tools mentioned, timestamps, and a link to the ebook chapter
Tags: tool-specific tags (bolt.new, cursor, claude code), technique tags (vibe coding, AI coding, prompt engineering), outcome tags (build app fast, no code saas)
Timestamps: every section of the video marked for YouTube chapters
Cards: each video includes a card linking to the ebook at the 75% mark
End screen: 20-second end screen with next video and subscribe prompts
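
The title format, chapter timestamps, and card placement above are mechanical enough to generate. A minimal sketch with illustrative helper names:

```typescript
// Build a title in the "[Hook] | Vibe Coding Tutorial #[N]" format.
function youtubeTitle(hook: string, tutorialNumber: number): string {
  return `${hook} | Vibe Coding Tutorial #${tutorialNumber}`;
}

// Format seconds as m:ss for a chapter line in the description.
function chapterStamp(totalSeconds: number): string {
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = Math.floor(totalSeconds % 60);
  const padded = seconds < 10 ? `0${seconds}` : String(seconds);
  return `${minutes}:${padded}`;
}

// Time, in seconds, at which the ebook card appears: the 75% mark.
function cardTimeSeconds(durationSeconds: number): number {
  return durationSeconds * 0.75;
}
```

For an 80-second video, cardTimeSeconds returns 60, so the ebook card would appear at chapterStamp(60), that is, 1:00.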

Cross-Linking

Each YouTube video description links to the corresponding ebook chapter
Each ebook video embed links to the YouTube version for higher-quality playback
Related videos are suggested at the end of each ebook section
Playlists: one per series (Prompt to Product, The Prompt That, Tool Face-Off)

Embedding Videos in the Interactive Ebook

The interactive web version of this ebook uses Remotion's @remotion/player component to embed videos directly in the reading experience. This means videos are not external links -- they are native elements of the page, rendered inline alongside the text.

Technical Implementation

Each video is embedded using a VideoTutorial React component:

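import type { ComponentType } from "react";
import { Player } from "@remotion/player";
import { PTP001 } from "../compositions/prompt-to-product/PTP001-SaaS60";

// Props for one embedded tutorial; `duration` is a display label such as "90s".
type VideoTutorialProps = {
  compositionId: string;
  title: string;
  duration: string;
  tools: string[];
  transcript: string;
};

// Maps each composition ID to its component. Add an entry per new video.
const compositions: Record<string, ComponentType> = {
  PTP001,
};

export const VideoTutorial = ({
  compositionId,
  title,
  duration,
  tools,
  transcript,
}: VideoTutorialProps) => {
  // Look up the composition so that compositionId selects the video to play.
  const component = compositions[compositionId];
  if (!component) {
    return null;
  }
  return (
    <section className="video-tutorial">
      <h3>{title}</h3>
      <div className="video-meta">
        <span className="duration">{duration}</span>
        <span className="tools">{tools.join(" + ")}</span>
      </div>
      <Player
        component={component}
        compositionWidth={1920}
        compositionHeight={1080}
        durationInFrames={2700} // 90s at 30fps
        fps={30}
        controls
        style={{ width: "100%", maxWidth: 800 }}
      />
      <details className="transcript">
        <summary>View Transcript</summary>
        <p>{transcript}</p>
      </details>
    </section>
  );
};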

Reader Experience

When a reader scrolls to a video in the ebook:

Poster frame -- A thumbnail of the most visually interesting moment loads immediately (lazy-loaded image, minimal bandwidth)
Play button overlay -- A single click starts playback. Videos do not autoplay
Inline controls -- Play/pause, scrub bar, volume, fullscreen, and playback speed (0.5x to 2x)
Transcript toggle -- A collapsible section below the video contains the full narration transcript, making the content accessible and searchable
Chapter links -- If the video references tools or concepts covered in other chapters, inline links appear below the video

Offline and Static Fallbacks

For the markdown and Word versions of the ebook (which cannot embed video):

Each video section includes the full script as formatted text
A QR code links to the YouTube version
A static screenshot of the key moment serves as the visual anchor
The caption reads: "Watch this tutorial: [YouTube URL]"

For the static HTML version (no JavaScript):

An animated GIF preview (5-10 seconds, looped) provides a visual taste
A prominent "Watch Full Tutorial" button links to YouTube
The transcript is displayed by default (not collapsed)

Video Production Schedule

New videos are added on a monthly cadence. The production schedule follows the tool landscape -- when a major tool update ships, a new video is produced within two weeks to document the changed workflow.

Month	Planned Videos	Series
March 2026	#1 60-Second SaaS, #6 Game Builder	Prompt to Product, The Prompt That
April 2026	#12 IDE Showdown, #7 Broke Everything	Tool Face-Off, The Prompt That
May 2026	#2 Portfolio Speedrun, #13 Builder Battle	Prompt to Product, Tool Face-Off
June 2026	#3 The $0 Startup, #8 Got Me Fired	Prompt to Product, The Prompt That
July 2026	#14 Agent Arena, #9 Replaced My Intern	Tool Face-Off, The Prompt That
August 2026	#4 Clone Wars, #10 Mom Could Use	Prompt to Product, The Prompt That
September 2026	#15 Speed vs Quality, #11 Fooled Senior Dev	Tool Face-Off, The Prompt That
October 2026	#5 Debug Olympics, New TBD	Prompt to Product, TBD

The schedule prioritizes alternating between series to maintain variety. High-impact tool launches (new Cursor version, Claude Code update, new entrant) can preempt the schedule.

Video Index

A quick-reference table of all videos in this chapter:

#	Title	Series	Tool(s)	Duration	Status
1	I built a $9/month SaaS in 60 seconds	Prompt to Product	Bolt.new	60-90s	Pre-production
2	Your portfolio shouldn't take longer than your morning coffee	Prompt to Product	v0 + Vercel	60-90s	Pre-production
3	This app makes money. I didn't write a single line.	Prompt to Product	Lovable	60-90s	Pre-production
4	I showed AI a screenshot of Notion. Here's what happened.	Prompt to Product	Cursor	60-90s	Pre-production
5	Can AI fix a bug faster than Stack Overflow?	Prompt to Product	Claude Code	60-90s	Pre-production
6	The Prompt That Built a Game	The Prompt That	Claude Code	90-120s	Pre-production
7	The Prompt That Broke Everything	The Prompt That	Bolt.new	90-120s	Pre-production
8	The Prompt That Got Me Fired (Hypothetically)	The Prompt That	Claude Code	90-120s	Pre-production
9	The Prompt That Replaced My Intern	The Prompt That	Cursor + Claude Code	90-120s	Pre-production
10	The Prompt That Even My Mom Could Use	The Prompt That	Lovable	90-120s	Pre-production
11	The Prompt That Fooled the Senior Dev	The Prompt That	Claude Code	90-120s	Pre-production
12	IDE Showdown: Cursor vs Claude Code vs Codex CLI	Tool Face-Off	Cursor, Claude Code, Codex CLI	90-120s	Pre-production
13	Builder Battle: Bolt.new vs Lovable vs Replit Agent	Tool Face-Off	Bolt.new, Lovable, Replit Agent	90-120s	Pre-production
14	Agent Arena: Devin vs Jules vs Claude Code	Tool Face-Off	Devin, Jules, Claude Code	90-120s	Pre-production
15	Speed vs Quality: Bolt.new vs Claude Code	Tool Face-Off	Bolt.new, Claude Code	90-120s	Pre-production

Measuring Video Impact

Each video is tracked across platforms with the following metrics:

Engagement Metrics

YouTube: watch time, average view duration, click-through rate on ebook links
TikTok/Reels/Shorts: views, shares, saves, profile visits
Ebook: play rate (percentage of readers who click play), completion rate, transcript expansion rate

Conversion Metrics

YouTube-to-ebook click rate (tracked via UTM parameters in description links)
Ebook-to-YouTube click rate (tracked via embed interaction events)
New subscriber acquisition per video
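
UTM tagging for description links can be sketched with the standard URL API. The base URL and the parameter values in the usage note are placeholders, not the real ebook address or campaign names.

```typescript
// Build a tracked link by appending the three common UTM parameters.
function withUtm(
  baseUrl: string,
  source: string,
  medium: string,
  campaign: string,
): string {
  const url = new URL(baseUrl);
  url.searchParams.set("utm_source", source);
  url.searchParams.set("utm_medium", medium);
  url.searchParams.set("utm_campaign", campaign);
  return url.toString();
}
```

Calling withUtm("https://example.com/ebook/chapter-20", "youtube", "video_description", "ptp001") yields a link whose clicks can be attributed to a specific video in analytics.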

Quality Metrics

Audience retention curve (identifying where viewers drop off)
Comment sentiment (positive/negative/neutral classification)
Video-specific NPS from reader surveys

Videos with below-average retention in the first 5 seconds get their hooks rewritten. Videos with above-average ebook-to-YouTube conversion get promoted in the chapter ordering.
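
The hook-rewrite rule can be sketched as a comparison against the mean. The VideoStats shape is hypothetical; it assumes retention at the 5-second mark is available as a percentage for each video.

```typescript
// Hypothetical per-video stats: percentage of viewers still watching at 5 seconds.
type VideoStats = { id: string; retentionPercentAt5s: number };

// Return the IDs of videos whose 5-second retention is below the average.
function videosNeedingNewHook(videos: VideoStats[]): string[] {
  if (videos.length === 0) {
    return [];
  }
  const total = videos.reduce((sum, v) => sum + v.retentionPercentAt5s, 0);
  const mean = total / videos.length;
  return videos
    .filter((v) => v.retentionPercentAt5s < mean)
    .map((v) => v.id);
}
```

Because the threshold is the mean of the current catalog, some videos will usually fall below it; a fixed floor could be substituted if the goal is an absolute quality bar rather than a relative one.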

This chapter is updated monthly with 2-4 new videos as the vibe coding tool landscape evolves. Each update includes new video entries, refreshed comparisons when tools ship major versions, and community-requested tutorials. Last updated: March 2026.


21. Monthly Intelligence Brief: April 2026

Updated April 21, 2026

What changed in the vibe coding world this month. Updated on the 1st of each month for subscribers.

Headline: Cursor 3 reimagines the IDE around multi-agent orchestration. Anthropic's Claude Mythos scores 93.9% on SWE-bench and autonomously discovers zero-days in FreeBSD, but is restricted to cybersecurity defense only via Project Glasswing. Meta Superintelligence Labs debuts Muse Spark. The Trivy supply chain attack cascades into a self-propagating npm worm hitting 64+ packages with blockchain C2 infrastructure. Claude suffers three consecutive days of outages. GitHub Copilot announces it will train on user code by default from April 24.

New (April 10–15): Vercel discloses 7 CVEs in Cloudflare's AI-built Vinext; the only confirmed production deploy is CIO.gov. GLM-5.1 becomes the first fully open-source model to top SWE-Bench Pro, beating all closed-source models. Claude Code ships worktree switching, PreCompact hooks, and auto-stream-abort.

New (April 15–21): Claude Opus 4.7 scores 87.6% on SWE-bench Verified, closing the gap with Mythos (93.9%) and surpassing all non-Anthropic models. Azure MCP Server 2.0 hits stable release with OAuth 2.1 and enterprise HTTP transport; OAuth 2.1 is also added to the core MCP spec itself, standardizing authorization across the entire ecosystem.

AI MODEL

Claude Opus 4.7: 87.6% on SWE-bench Verified — Closes Gap With Mythos

On April 18, 2026, Anthropic shipped Claude Opus 4.7, scoring 87.6% on SWE-bench Verified: a 6.8 percentage point jump over Claude Opus 4.6 (80.8%) and the highest score on that benchmark for a publicly available model. Claude Mythos (93.9%) remains ahead but is restricted to Glasswing defense partners and not publicly accessible. Opus 4.7 fills the practical gap: it is available via the standard API, supports extended context windows (2M tokens), and is positioned for multi-day autonomous coding sessions where Mythos is inaccessible. It leads the best non-Anthropic model (GPT-5.3 Codex at 85%) by 2.6 points. For vibe coders: Opus 4.7 is the new ceiling for publicly available coding models. These figures match the April 18 update to Chapter 5; Cursor's $50B valuation was confirmed in the same window.

PLATFORM

Azure MCP Server 2.0 Stable + OAuth 2.1 Enters the MCP Spec

On April 10, 2026, Azure MCP Server 2.0 reached stable release — the first enterprise-grade, production-committed deployment target for the Model Context Protocol ecosystem. Key changes: HTTP-based transport replaces stdio as the default (enabling central deployment, not just local sidecar); Azure Active Directory and managed identity authentication built in; explicit API stability guarantee (no breaking changes without a major version + 12-month deprecation). Concurrent with this release, OAuth 2.1 authorization with incremental scope consent was formally added to the MCP specification — not as an Azure extension but as a core protocol feature. This means any MCP-compliant server can now implement standardized authorization. Practical impact: teams can deploy one authenticated MCP server and have all their agent workflows connect to it, rather than configuring tool access per workflow. The MCP SDK crossed 97M monthly downloads in March 2026 (from 2M at launch, November 2024). Pinterest deployed MCP in production for engineering workflows. Sources: Azure SDK Blog (devblogs.microsoft.com/azure-sdk); MCP Developer Guide 2026 (particula.tech).

PRODUCT

Cursor 3: Agents Window, Design Mode, Cloud-to-Local Handoff

Anysphere launched Cursor 3 on April 2 — a ground-up redesign focused on multi-agent orchestration rather than traditional code editing. The new Agents Window replaces the Composer pane with a full-screen workspace where multiple AI agents run simultaneously in side-by-side, grid, or stacked tabs. Design Mode lets you click any element in a browser preview and direct agents to modify that exact component visually, closing the design-to-code loop. Cloud-to-local handoff carries agent session context seamlessly. New Automations can be triggered by external services. The Await tool lets agents pause for background shell commands. Memory is lighter; large-file diffs are faster. MCP Apps now support structured content. Cursor 3 represents the maturation from "AI-augmented IDE" to "agent orchestration platform."

AI MODEL — RESTRICTED

Claude Mythos: 93.9% SWE-bench — Restricted to Cybersecurity Defense

On April 7, Anthropic announced its most capable model to date — Claude Mythos — via Project Glasswing. It is not publicly available. Access is restricted to cybersecurity defense organizations: AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, Microsoft, NVIDIA, Palo Alto Networks, Linux Foundation, and ~40 others. Benchmarks: 93.9% on SWE-bench (+13.1 percentage points over Opus 4.6), 97.6% on USAMO, 83.1% on CyberGym (vs 66.6% for Opus 4.6). During testing, Mythos autonomously discovered CVE-2026-4747 — a 17-year-old remote code execution vulnerability in FreeBSD — and found thousands of zero-day vulnerabilities across every major OS and browser. It is restricted specifically because it can autonomously both discover and exploit software vulnerabilities at scale. Project Glasswing channels its capabilities exclusively toward defense: patch prioritization, vulnerability remediation, and threat analysis for partner organizations.

AI MODEL

Meta Muse Spark: Meta Superintelligence Labs Debuts

On April 8, Meta released Muse Spark — the first model from its newly formed Meta Superintelligence Labs (built after the ~$14B deal to bring in Scale AI CEO Alexandr Wang). Muse Spark is natively multimodal with reasoning, tool-use, visual chain of thought, and multi-agent orchestration. It is not open source — unlike Llama, it's API-only in private preview. Benchmarks: 86.4 on CharXiv Reasoning (vs Gemini 3.1 Pro 80.2 and GPT-5.4 82.8), 50.2 on Humanity's Last Exam in Contemplating mode (vs Gemini 3.1 Deep Think 48.4). Meta claims 10x less compute than Llama 4 Maverick for equivalent capability. Muse Spark powers Meta AI across WhatsApp, Instagram, Facebook, and Messenger — reaching approximately 3 billion users. Coding is not a current strength; science, reasoning, and health benchmarks are where it leads.

SECURITY

Trivy Cascade Extends: CanisterWorm Self-Propagates Across 64+ npm Packages

The Trivy supply chain attack (CVE-2026-33634, first reported late March) cascaded into a much larger incident in early April. Attackers had force-pushed malicious code to 75 of 76 trivy-action GitHub Actions tags; it took five days to fully evict them, and during that remediation window they published additional malicious Docker images. The attack then cascaded into CanisterWorm, a self-propagating npm worm that hit 64+ packages and used blockchain-based command-and-control infrastructure, making it unusually resistant to takedown. CanisterWorm subsequently infected Checkmarx KICS and AST GitHub Actions, and separately reached LiteLLM (95 million monthly PyPI downloads). The combined blast radius makes this the most extensive supply chain cascade in AI developer tooling history. Treat any Trivy, Checkmarx, or LiteLLM pipeline that ran between March 19 and April 10 as potentially compromised.
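
Triage for the March 19 to April 10 window can start with a simple filter over CI run history. The sketch below assumes run records carry an ISO 8601 start time, that the year is 2026 (implied by context), and that the window is interpreted in UTC; the PipelineRun type is hypothetical and would need mapping from whatever the CI provider exports.

```typescript
// Exposure window from the advisory above. Year and UTC interpretation are assumptions.
const EXPOSURE_START_MS = Date.UTC(2026, 2, 19); // March 19, 00:00 UTC
const EXPOSURE_END_MS = Date.UTC(2026, 3, 11); // exclusive bound: covers all of April 10 UTC

// Hypothetical run record; startedAt is an ISO 8601 timestamp.
type PipelineRun = { id: string; startedAt: string };

// Return the runs that started inside the exposure window.
function runsInExposureWindow(runs: PipelineRun[]): PipelineRun[] {
  return runs.filter((run) => {
    const startedMs = Date.parse(run.startedAt);
    return startedMs >= EXPOSURE_START_MS && startedMs < EXPOSURE_END_MS;
  });
}
```

Runs returned by this filter are candidates for credential rotation and artifact review, not proof of compromise.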

RELIABILITY

Claude Down: Three Consecutive Days of Outages (April 6–8)

Anthropic's Claude services suffered three consecutive days of disruptions in the week of April 6. On April 6, a 10-hour outage generated 8,000+ Downdetector reports, with chat and login failures affecting Claude.ai and Claude Code users. On April 7, elevated errors ran from 14:32 to 15:12 UTC, affecting authentication across Claude.ai and Claude Code. On April 8, Sonnet 4.6 errors continued from 23:00 PT to 1:50 PT. No single root cause was publicly disclosed. For teams running autonomous Claude Code workflows, the week underscored the importance of retry logic and fallback providers, and of pairing every mission-critical scheduled agent task with error handling and alerting.
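
The retry-and-fallback pattern mentioned above can be sketched generically. Nothing here is tied to a specific SDK: Provider is a hypothetical wrapper around whatever client call a team uses, and the backoff constants are illustrative defaults.

```typescript
// Hypothetical wrapper: each provider exposes one async call.
type Provider<T> = { name: string; call: () => Promise<T> };

// Exponential backoff: base * 2^attempt, capped at capMs.
function backoffDelayMs(attempt: number, baseMs: number = 500, capMs: number = 8000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

const pause = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Try each provider in order; retry each a few times before falling back to the next.
async function callWithFallback<T>(
  providers: Provider<T>[],
  maxAttempts: number = 3,
  onError: (provider: string, attempt: number, err: unknown) => void = () => {},
): Promise<T> {
  let lastError: unknown = new Error("no providers configured");
  for (const provider of providers) {
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
      try {
        return await provider.call();
      } catch (err) {
        lastError = err;
        onError(provider.name, attempt, err); // hook for logging or alerting
        if (attempt < maxAttempts - 1) {
          await pause(backoffDelayMs(attempt));
        }
      }
    }
  }
  throw lastError;
}
```

The onError callback is where alerting would attach. The final throw ensures a scheduled task fails loudly once every provider is exhausted, rather than silently producing nothing.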

PRIVACY

GitHub Copilot to Train on User Code by Default from April 24

GitHub announced that starting April 24, 2026, interaction data for Copilot Free, Pro, and Pro+ users — including inputs, outputs, and code snippets — will be used for AI model training by default. Users must actively opt out in their GitHub account settings. Enterprise and Business plans are not affected. For teams working with proprietary code, client code, or regulated data, this policy change requires action before April 24. Meanwhile, April also brought the Copilot SDK (public preview, April 2) for embedding Copilot agentic capabilities into custom apps and workflows, and Autopilot mode (public preview) for fully autonomous agent execution with self-approval and auto-retry.

CRITICAL SECURITY

Axios npm Supply Chain Attack — North Korean State Actor

On March 31, attackers attributed to UNC1069 (a North Korea-nexus, financially motivated threat group) compromised the npm account of the axios lead maintainer and published malicious versions 1.14.1 and 0.30.4. The packages installed a hidden dependency “plain-crypto-js” that deployed the WAVESHAPER.V2 backdoor — a cross-platform remote access trojan targeting Windows, macOS, and Linux. Axios has approximately 100 million weekly downloads, making this one of the most impactful npm supply chain attacks ever recorded. The malicious versions were live for roughly 3 hours before being removed. Attribution confirmed by Google Threat Intelligence Group. Rotate all credentials in any environment that installed these versions.
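
A quick lockfile check can confirm whether a project resolved one of the versions named above. The sketch assumes an npm lockfile with a top-level "packages" map keyed by install path (lockfileVersion 2 or 3); other package managers use different formats and would need their own parsing.

```typescript
// Compromised releases and the hidden dependency named in the advisory above.
const BAD_AXIOS_VERSIONS = ["1.14.1", "0.30.4"];
const MALICIOUS_DEPENDENCY = "plain-crypto-js";

// Minimal shape of an npm lockfile (lockfileVersion 2 or 3).
type Lockfile = { packages?: Record<string, { version?: string }> };

// Return "path@version" for every entry that matches the advisory.
function findCompromisedEntries(lock: Lockfile): string[] {
  const packages = lock.packages ?? {};
  const hits: string[] = [];
  for (const installPath of Object.keys(packages)) {
    // Keys look like "node_modules/axios" or "node_modules/foo/node_modules/axios".
    const pkgName = installPath.split("node_modules/").pop() ?? "";
    const version = packages[installPath]?.version ?? "";
    const badAxios = pkgName === "axios" && BAD_AXIOS_VERSIONS.indexOf(version) !== -1;
    if (badAxios || pkgName === MALICIOUS_DEPENDENCY) {
      hits.push(`${installPath}@${version}`);
    }
  }
  return hits;
}
```

In practice the input would come from parsing package-lock.json with JSON.parse. A clean lockfile does not rule out exposure on machines that installed during the roughly three-hour window and later re-resolved, so credential rotation guidance still applies.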

SECURITY

LiteLLM and Langflow Supply Chain Attacks Hit AI Infrastructure

The week of March 24 saw two more high-severity supply chain attacks targeting the AI developer ecosystem. LiteLLM versions 1.82.7 and 1.82.8 were compromised with a multi-stage credential stealer harvesting SSH keys, cloud provider tokens, Kubernetes secrets, cryptocurrency wallets, and .env files — precisely the kind of secrets that accumulate in AI developer environments. Separately, CVE-2026-33017 disclosed a critical code injection in Langflow (the popular AI agent framework) affecting versions ≤ 1.8.2: an unauthenticated attacker could trigger remote code execution via the public flow build endpoint. Exploitation was observed within 20 hours of disclosure. CISA added CVE-2026-33017 to its Known Exploited Vulnerabilities catalog. Also disclosed: CVE-2026-33634 — malicious code embedded in Aqua Security’s Trivy scanner Docker Hub images, attributed to TeamPCP.
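
Checking installed versions against the ranges above is a small comparison exercise. This sketch handles plain dotted numeric versions only; pre-release suffixes are not handled, and the helper names are illustrative.

```typescript
// Compare dotted numeric versions; returns negative, zero, or positive.
function compareVersions(a: string, b: string): number {
  const partsA = a.split(".").map(Number);
  const partsB = b.split(".").map(Number);
  const length = Math.max(partsA.length, partsB.length);
  for (let i = 0; i < length; i++) {
    const diff = (partsA[i] ?? 0) - (partsB[i] ?? 0);
    if (diff !== 0) {
      return diff;
    }
  }
  return 0;
}

// Langflow versions at or below 1.8.2 are affected by CVE-2026-33017.
function langflowAffected(version: string): boolean {
  return compareVersions(version, "1.8.2") <= 0;
}

// LiteLLM releases named above as compromised.
function liteLlmCompromised(version: string): boolean {
  return version === "1.82.7" || version === "1.82.8";
}
```

Numeric comparison matters here: a string comparison would rank "1.10.0" below "1.8.2" and wrongly flag it as affected.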

RESEARCH

Georgia Tech: 2,000+ Vulnerabilities in 5,600 Vibe-Coded Apps

The Georgia Tech Vibe Security Radar project released its analysis of 5,600 publicly deployed vibe-coded applications, finding over 2,000 vulnerabilities, 400+ exposed secrets, and 175 instances of exposed PII. Separately, tracking data shows AI-generated code now contributes 35 CVEs per month — up from 6 in January and 15 in February 2026. Autonoma research puts 53% of AI-generated code as having security holes. The pattern is consistent: AI models generate functional code quickly but skip authentication checks, leave credentials in source, and mis-scope data access. The backlash narrative is shifting from “vibe coding is dangerous” to “treat AI output like code from a fast junior developer — review it.”

MILESTONE

MCP Hits 97 Million Monthly Downloads in 16 Months

As of March 25, the Model Context Protocol SDK has reached 97 million monthly downloads, up from approximately 2 million at the time of its November 2024 launch: 4,750% growth in roughly 16 months. There are now 5,800+ community and enterprise MCP servers, and every major AI lab (OpenAI, Google DeepMind, Cohere, Mistral) has integrated MCP support. The protocol has become the de facto standard for tool connectivity in agentic AI systems, faster than any previous developer infrastructure standard has achieved ecosystem-wide adoption.

PRODUCT

Cursor Self-Hosted Cloud Agents

Cursor launched self-hosted cloud agents on March 25 — a direct response to enterprise security requirements. Code, tool execution, build outputs, and secrets now stay entirely within the customer’s own network. The product also includes security automation templates: agents that review 3,000+ internal pull requests per week, catching 200+ vulnerabilities across large engineering organizations. This positions Cursor as enterprise-grade infrastructure, not just an IDE, and directly addresses the objection that AI coding tools require sending proprietary code to third-party servers.

CULTURE

Vibe Coding Turns One Year Old

February–March 2026 marks the first anniversary of Andrej Karpathy’s viral X post coining “vibe coding.” Collins English Dictionary named it the Word of the Year for 2025. Retrospective content flooded technical media: daily.dev, DEV Community, Taskade’s State of Vibe Coding 2026 report, and CodeRabbit’s semantic history of the term. Anthropic is publicly pushing a “vibe working” framing — extending the concept beyond code to all knowledge work done with AI. LogRocket’s March 2026 Power Rankings put Windsurf #1, followed by Google Antigravity, Cursor, Claude Code, and Codex. A year in: the tools are mature, the workflows are real, and the debate has moved from “will this work?” to “how do we do it safely?”

AI MODELS

SWE-Bench Convergence: Six Models Within 0.8 Points

The March 2026 SWE-bench Verified leaderboard shows an unprecedented convergence at the top: Claude Opus 4.6 (80.8%), Gemini 3.1 Pro (80.6%), GPT-5.4 (77.2%), and Claude Sonnet 4.6 (~75.6%) are now within striking distance of each other. Six models in total fall within 0.8 points. The era of any single model dominating coding benchmarks appears to be over. Qwen 3.5 — fully rolled out in early March — leads the 7–9B parameter class on HumanEval, continuing the open-weights pressure on proprietary pricing.

Previous Month: March 2026

Key Developments

ECOSYSTEM

The Open Source Crisis

Researchers across four universities found vibe coding creates a negative feedback loop for open source. Tailwind CSS downloads climbed while docs traffic fell 40% and revenue dropped 80%. cURL shut down its bug bounty after AI submissions drove valid rates to 5%. Ghostty banned AI-generated code. tldraw auto-closes all external PRs. RedMonk calls it "AI Slopageddon."

PRODUCT

Gemini 3 Powers Jules

Google rolled out Gemini 3 Pro to Jules, its async coding agent. Gemini 3 surpasses 2.5 Pro at coding with stronger intent alignment and improved agentic workflows. Jules now includes Tools for terminal access, CLI extension, and API access.

PRODUCT

Cursor 2.6: Automations, JetBrains, and MCP Apps

Cursor 2.6 shipped three major features in one week: always-on Automations (agents triggered by Slack, Linear, GitHub, PagerDuty with persistent memory), JetBrains IDE support via Agent Client Protocol (IntelliJ, PyCharm, WebStorm), and interactive MCP Apps (Figma, Amplitude, tldraw in chat). Team plugin marketplaces. Composer’s proprietary model runs at 2x the speed of Sonnet 4.5. Market share holds at ~25% of GenAI clients.

PLATFORM

Copilot Opens Multi-Model Access

Since Feb 26, all paid GitHub Copilot users can choose Claude, Codex, or Copilot as their agent model, assigning the same issue to all three simultaneously. 26M+ users across 6+ IDEs. Copilot’s coding agent spins up Actions VMs and opens draft PRs autonomously.

ENTERPRISE

Pega Makes Vibe Coding Enterprise-Ready

On March 5, Pegasystems announced a full vibe coding experience in Pega Blueprint. Users converse with app designs via text or speech, with security protocols, third-party compatibility, and performance metrics for large-scale operations. First major enterprise platform to brand its AI features as “vibe coding.”

AI MODEL

Opus 4.6 Agent Teams Mature

Anthropic’s Opus 4.6 is now the default for Claude Code Max/Team subscribers. The “agent teams” feature splits work across multiple coordinated agents. 16 Opus 4.6 agents wrote a C compiler in Rust capable of compiling the Linux kernel (~$20K cost).

PRODUCT

Devin 2.2 and SWE-1.6

Cognition shipped Devin 2.2 — the most important update since launch with dramatically fewer bugs. SWE-1.6 training preview began March 1. PR merge rate improved from 34% to 67%. Security fixes average 1.5 min vs 30 min for humans. Cognition raised $500M at ~$10B valuation, with combined Devin+Windsurf ARR more than doubling post-acquisition.

AI MODEL

GPT-5.4 Launches with Computer Use

On March 5, OpenAI released GPT-5.4 in Standard, Thinking, and Pro variants. Native computer-use capabilities, 1M token context, and 33% fewer errors vs GPT-5.2. ChatGPT for Excel/Sheets integration and financial tools (FactSet, MSCI, Moody’s) signal a major enterprise push. First OpenAI model with built-in computer use — directly competing with Anthropic’s computer-use features.

GEOPOLITICS

Pentagon Labels Anthropic a Supply-Chain Risk

The DOD labeled Anthropic a supply-chain risk — the first time an American company has received this designation, typically reserved for foreign adversaries. The dispute centers on Anthropic’s refusal to support autonomous weapons and domestic surveillance use cases. Defense tech firms are actively dropping Claude. CEO Dario Amodei called OpenAI’s messaging about their competing Pentagon deal “straight up lies.” Negotiations reportedly resumed as of March 5.

PRODUCT

Claude Code: Voice Mode and Security Patches

Anthropic is rolling out voice mode (/voice push-to-talk) to ~5% of users. STT expanded to 20 languages. New MCP management via /mcp dialog, Claude API skill, and session naming. Two critical CVEs patched: CVE-2025-59536 (RCE via malicious repos) and CVE-2026-21852 (API key exfiltration through project files). Both vulnerabilities allowed malicious repositories to trigger arbitrary shell commands on tool initialization.

NEW TOOL

Kilo Code: Open-Source Multi-Agent Coding

Kilo Code, launched by a GitLab co-founder, has already attracted 1.5M+ users. Orchestrator mode with planner/coder/debugger sub-agents. 500+ model support. Available in VS Code, JetBrains, and CLI. $19/mo or BYO API key. Directly challenges Claude Code, Copilot, and Cursor in the AI coding agent space.

AI MODEL

Qwen 3.5 and Open Weights Push

Alibaba released Qwen 3.5 in four sizes (0.8B, 2B, 4B, 9B) with open weights. It scores 74.1% on LiveCodeBench v6, among the strongest results for real-world coding tasks. The open-weights trend continues to pressure proprietary model pricing.

PRODUCT

Claude Code /loop: Autonomous Scheduled Tasks

Claude Code versions 2.1.63–2.1.76 shipped in rapid succession through March 2026, adding the /loop command (cron-like session-scoped task scheduler), Skills.md for persistent agent behaviors, a 1-million-token context window, and increased max output to 64k tokens for Opus 4.6 (128k upper bound for both Opus 4.6 and Sonnet 4.6). MCP servers can now request structured input mid-task via interactive dialogs. /loop turns Claude Code into a background worker for PR reviews, deployment monitoring, and recurring analysis tasks — the closest any tool has come to a fully autonomous development partner.

FUNDING

Replit $400M Series D at $9B Valuation

On March 11, Replit closed a $400M Series D led by Georgian Partners at a $9 billion valuation — triple its $3B valuation from September 2025. Participants include a16z, Coatue, Y Combinator, Accenture Ventures, and Databricks Ventures. Replit is targeting $1B ARR by year-end. 75% of Replit AI users write zero code themselves. The round signals that browser-based full-stack builders remain one of the hottest segments in AI tooling.

STRATEGY

Lovable Goes on the Acquisition Hunt

On March 23, Lovable CEO Anton Osika announced the $6.6B vibe-coding platform is actively hunting acquisitions. The company hit $400M ARR by March 12 (up from $200M at end-2025) with only 146 employees, and is now deploying M&A as a competitive weapon against Cursor, Replit, and Bolt. It previously acquired cloud provider Molnett. Target criteria: “builder-first, high-agency teams” who move fast. This is an unusual posture for a 3-year-old startup — a sign of how rapidly vibe-coding market share is being contested.

PRODUCT

Cognition Ships Devin Review and Windsurf Codemaps

Cognition launched two products in late March. Devin Review is a free code review tool that reads any GitHub PR (public or private) and not only flags issues but spins up a cloud agent to test and propose fixes. Windsurf Codemaps are AI-annotated structured maps of entire codebases, powered by SWE-1.5 and Claude Sonnet 4.5, giving developers navigable context over large repositories before they start making changes. Both tools reflect Cognition's strategy to dominate the full developer workflow — from understanding code to shipping fixes.

PLATFORM

GitHub Copilot JetBrains Agentic Capabilities Go GA

On March 11, GitHub made core agentic capabilities — custom agents, sub-agents, and Plan Agent mode — generally available in GitHub Copilot for JetBrains IDEs, with agent hooks entering preview. On March 12, a new GitHub Copilot Student plan launched, maintaining free access for verified students while restricting self-selection of premium models (GPT-5.4, Claude Opus/Sonnet) in favor of Copilot Auto mode.

SECURITY

OpenClaw Supply Chain Attack: 1,184 Malicious MCP Packages

The largest confirmed supply chain attack targeting AI agent infrastructure: Antiy CERT confirmed 1,184 malicious skills across ClawHub — approximately one in five packages in the open-source MCP ecosystem. Simultaneously, security researchers documented 30+ CVEs targeting MCP servers in just 60 days. Highlights include CVE-2026-23744 (CVSS 9.8, MCPJam Inspector ≤ v1.4.2 — any crafted HTTP request could install an arbitrary MCP server and execute code with no user interaction), a CVSS 9.6 RCE in Microsoft’s Azure MCP server, and BlueRock Security finding 36.7% of 7,000+ analyzed MCP servers potentially vulnerable to SSRF. Treat MCP server packages with the same scrutiny you’d apply to executable binaries.

Numbers Update (April 9, 2026)

93.9%

Claude Mythos on SWE-bench (restricted — Project Glasswing defense partners only)

64+

npm packages infected by CanisterWorm (Trivy cascade, April 2026)

97M

MCP monthly SDK downloads (Mar 25, 2026) — up from 2M at launch

CVEs/month attributed to AI-generated code (March 2026 — up from 6 in January)

29%

Developers with "high trust" in AI tool output (down from 70%+ in 2023)

75%

Reduction in PR turnaround time for AI-tool teams (9.6 days → 2.4 days)

73%

Developers using AI tools daily globally (Stack Overflow Q1 2026)

20M+

GitHub Copilot paid users (April 2026)

What to Watch in May 2026

GitHub Copilot opt-out deadline (April 24): Teams with proprietary or regulated code must opt out before this date or accept that interaction data trains future models
Claude Mythos general availability: Anthropic restricted it to cybersecurity defense; when and how does the most capable public coding model emerge?
CanisterWorm cleanup: Is the blockchain C2 infrastructure being taken down? Watch for new packages hit after April 9
Meta Muse Spark coding benchmarks: Currently strong in reasoning/science but weaker in coding. Will dedicated coding evals change the picture?
Supply chain security posture: Will npm, PyPI, and Docker Hub introduce mandatory provenance for AI-ecosystem packages after the Trivy/CanisterWorm cascade?
EU AI Act full applicability: August 2, 2026. Guidance for AI coding tools in regulated industries ramping up
Google I/O (typically May): Anticipated announcements on Jules, Gemini CLI, and Antigravity roadmap
Replit path to $1B ARR: Replit declared the year-end target after raising at a $9B valuation; watch monthly revenue disclosures
Lovable acquisitions: M&A offensive declared — which AI devtools will be absorbed first?
Cursor $50B raise close: if the reported fundraising round closes, it would be the largest AI coding tool valuation ever

Stay current: Get daily updates at EndOfCoding.com. Subscribe to the ebook for monthly intelligence briefs with full analysis, data, and actionable insights. Try hands-on courses at Vibe Coding Academy.

Chapter 22: Community Showcase

Updated March 6, 2026

Real projects built by real people using vibe coding. Updated monthly.

Welcome to the Showcase

This chapter is different from the rest of the book. It is not written by us -- it is written by you.

Every project featured here was built using the techniques, tools, and philosophies described in the preceding chapters. Some were built by seasoned developers experimenting with a new workflow. Others were built by people who had never written a line of code before picking up Cursor or Bolt.new. All of them went from idea to deployed software using AI-native development.

The community showcase exists for three reasons:

Proof that it works. Theory is useful. Seeing a non-technical product manager ship an internal dashboard in four hours is more useful.
Shared knowledge. Every submission includes the prompts that worked, the mistakes that cost time, and the metrics that followed. This is a living library of hard-won lessons.
Inspiration. The gap between "I should build something" and "I shipped something" is often just seeing someone in a similar position who already did it.

We review submissions monthly and feature the most instructive projects -- not necessarily the most impressive ones. A weekend prototype that taught the builder three critical lessons about prompt structure is more valuable here than a polished SaaS with no story behind it.

How to Submit Your Project

We welcome submissions from anyone who has built and deployed something using AI-native development tools. Your project does not need to be generating revenue. It does not need to be technically sophisticated. It needs to be real, deployed, and accompanied by an honest account of how it was built.

Submission Template

Copy the template below, fill it in, and submit it to showcase@endofcoding.com or post it in the #showcase channel on our community Discord.

## Project Submission

**Project Name:**
[Your project name]

**Live URL:**
[Link to the deployed project]

**Builder Name:**
[Your name or handle]

**Builder Background:**
[Developer / Designer / Product Manager / Non-technical / Student / Other]
[Brief bio: 1-2 sentences about your experience level and day job]

**Tools Used:**
[List all AI tools: Cursor, Claude Code, Bolt.new, v0, Lovable, Replit Agent, etc.]
[List supporting tools: Vercel, Supabase, Stripe, Tailwind, etc.]

**Timeline:**
[Time from first prompt to deployed: e.g., "6 hours over a weekend"]

**Key Prompts (1-3 of your best prompts that made the biggest difference):**

Prompt 1:
"""
[Paste the actual prompt text you used]
"""
Why it worked: [Brief explanation]

Prompt 2:
"""
[Paste the actual prompt text]
"""
Why it worked: [Brief explanation]

Prompt 3 (optional):
"""
[Paste the actual prompt text]
"""
Why it worked: [Brief explanation]

**What Went Right:**
- [Bullet point]
- [Bullet point]
- [Bullet point]

**What Went Wrong:**
- [Bullet point]
- [Bullet point]
- [Bullet point]

**Metrics (share what you are comfortable sharing):**
- Users: [number or range]
- Revenue: [if applicable]
- Other: [downloads, signups, press mentions, job offers, etc.]

**One Sentence of Advice for Someone Starting Today:**
[Your best tip]

Submission Guidelines

Be honest. The community benefits more from "this broke three times and here's why" than from a highlight reel.
Include real prompts. Paraphrased or sanitized prompts are less useful. Share the actual text you typed.
Deployed means deployed. The project must be accessible at a URL or downloadable. Screenshots alone are not sufficient.
One submission per project. You can submit multiple projects, but each gets its own entry.
Updates welcome. If your project evolves significantly, resubmit with a note about what changed.

Featured Projects

Project 1: WaitlistWizard -- SaaS Micro-Tool Built in a Weekend

What it is: A standalone waitlist management tool for indie makers launching products. Users create a waitlist page with a custom domain, collect emails with referral tracking, and send launch-day notifications. Includes an analytics dashboard showing signup velocity, referral sources, and geographic distribution.

Builder Profile: Marcus Chen, 29. Full-stack developer at a mid-size fintech company during the week. Side-project builder on weekends. Had used GitHub Copilot for two years but had never tried a full vibe coding workflow until this project.

Tools Stack:

Cursor (Composer mode with Claude 3.5 Sonnet) for all code generation
Next.js 14 with App Router
Supabase for database, auth, and real-time subscription counts
Tailwind CSS for styling
Vercel for hosting
Resend for transactional emails
Stripe for the $9/month pro tier

Build Timeline: 14 hours across a Saturday and Sunday. First prompt at 9 AM Saturday. Deployed and shared on X at 11 PM Sunday.

Key Prompts:

Prompt 1 -- The initial spec:

Build a waitlist management SaaS with Next.js 14 App Router and Supabase.

Core features:
1. Landing page builder: user creates a waitlist page with custom title,
   description, and color scheme. Each page gets a unique slug (/w/[slug]).
2. Email collection: visitors enter email, get position number.
   Referral link generated automatically. Each referral moves the referrer
   up 3 positions.
3. Dashboard: real-time count of signups, chart of signups over time,
   top referrers table, geographic breakdown (from IP geolocation).
4. Launch notification: one-click send to all collected emails.

Auth: Supabase Auth with GitHub and Google OAuth.
Database: Supabase PostgreSQL with RLS policies.
Styling: Tailwind with a clean, minimal aesthetic. Dark mode default.

Start with the database schema and RLS policies, then build the
dashboard, then the public-facing waitlist pages.

Why it worked: Front-loading the database schema and RLS policies meant the entire data layer was solid before any UI code was written. This prevented three or four rounds of restructuring that typically happen when you build UI first.

Prompt 2 -- Referral tracking logic:

Add referral tracking to the waitlist system.

When a user signs up for a waitlist:
1. Generate a unique referral code (8 char alphanumeric)
2. Create a shareable URL: [domain]/w/[slug]?ref=[code]
3. When someone signs up via a referral link, record the referral
4. Move the referrer up 3 positions in the queue
5. Send the referrer an email: "Someone joined through your link!
   You moved up to position [X]."

Store referral chains (who referred whom) for the dashboard analytics.
Prevent self-referral. Cap position boost at top 10% of the list.
Handle edge cases: expired waitlists, duplicate signups from same email,
referral codes for non-existent waitlists.

Why it worked: Explicitly listing edge cases in the prompt eliminated two bugs that would have appeared in production. The AI handled all four edge cases correctly on the first generation.
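
The referral rules in that prompt are compact enough to sketch. The version below is an illustration in Python, not Marcus's actual Next.js/Supabase code. The function name `apply_referral_boost`, the in-memory list standing in for the queue, and the reading of the 10% cap (a boost can carry someone up to the edge of the top 10% of the list, but not into it) are all assumptions.

```python
BOOST = 3  # positions gained per referral (from the prompt)


def apply_referral_boost(queue: list[str], referrer: str, new_signup: str) -> list[str]:
    """Return a new queue after `new_signup` joins via `referrer`'s link.

    Index 0 is the front of the waitlist.
    """
    if new_signup in queue:
        # Duplicate signup from the same email: nothing changes, no boost.
        return list(queue)

    updated = list(queue) + [new_signup]

    if referrer not in queue:
        # Unknown referral code. This also covers self-referral, because the
        # new signup was not in the queue when the link was used.
        return updated

    # Assumed cap rule: boosts cannot move anyone into the top 10% of the list.
    protected = (len(updated) + 9) // 10  # ceil(10% of the list), integer math
    current = updated.index(referrer)
    target = max(current - BOOST, protected)
    if target < current:
        updated.insert(target, updated.pop(current))
    return updated
```

With a ten-person list, a referral from the person at index 7 moves them to index 4, while a referral from someone already near the front leaves the order unchanged. In a real deployment this logic would more likely live in a database transaction so that two simultaneous signups cannot corrupt positions.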

Prompt 3 -- The analytics dashboard:

Build the waitlist analytics dashboard. The user is logged in and
viewing their waitlist's stats.

Show:
- Total signups (big number with daily change indicator, green up/red down)
- Signup velocity chart (line chart, last 30 days, using Recharts)
- Top 10 referrers table (name, referral count, conversion rate)
- Geographic distribution (top 5 countries as horizontal bar chart)
- Recent signups feed (last 20, real-time updates via Supabase Realtime)

All data fetched server-side with React Server Components.
The recent signups feed is a Client Component with real-time subscription.
Loading states: skeleton UI for each card while data loads.
Empty states: friendly message + illustration when no data yet.

Why it worked: Separating server components from client components in the prompt gave the AI clear architectural guidance. The result needed zero restructuring.

Before/After: Marcus had previously attempted to build a similar waitlist tool using traditional development. He spent three weekends on it, got about 60% through the feature set, and abandoned it when the referral position tracking logic became tangled. With vibe coding, the complete feature set was done in one weekend, including features he had not originally planned (geographic analytics, real-time feed).

Lessons Learned:

Specifying database schema first in the prompt produces dramatically better results than letting the AI infer it from feature descriptions.
Supabase RLS policies generated by AI need manual review. Two of the four generated policies had overly permissive conditions that would have allowed users to read each other's waitlist data.
The AI-generated Stripe webhook handler worked on the first try, which was surprising -- this had been a pain point in every previous project.
Deploying to Vercel mid-build (after the first two hours) and testing against the real deployment caught three environment variable issues early.
Total cost: $0 for the build (Cursor Pro subscription he already had). $20/month for Supabase Pro + Vercel Pro once users started arriving.

Outcome: Posted on X and Hacker News the following Monday. 340 upvotes on HN. 2,100 signups in the first week. 180 paying users ($9/month) within 60 days. Currently at $1,620 MRR and growing. Marcus has not yet quit his day job but is now building his second product using the same workflow.

Project 2: FieldSync -- Internal Tool Built by a Non-Technical PM

What it is: An internal field operations dashboard for a 40-person landscaping company. Tracks crew assignments, job status, equipment location, client notes, and daily route optimization. Replaced a mess of shared spreadsheets, WhatsApp groups, and sticky notes on the dispatch office wall.

Builder Profile: Rachel Torres, 34. Operations manager at GreenScape Landscaping in Austin, TX. No programming experience. Had taken one HTML course in college a decade ago. Uses Excel daily and considers herself "tech-comfortable but not technical."

Tools Stack:

Bolt.new for initial prototype
Lovable for UI refinement and additional features
Supabase for database and auth
Google Maps API for route display
Vercel for hosting

Build Timeline: Three evenings after work (roughly 3 hours each) plus most of a Saturday. Total: approximately 16 hours.

Key Prompts:

Prompt 1 -- The initial description:

I manage a landscaping company with 8 crews of 5 people each.
Every morning I assign crews to jobs using a spreadsheet and a
WhatsApp group. I need an app that:

1. Shows today's jobs on a map with crew assignments
2. Lets me drag and drop to reassign crews to different jobs
3. Crews can update job status from their phones (not started /
   in progress / done / issue)
4. Tracks which equipment trailer is with which crew
5. Stores client notes that persist between visits
6. Shows me a daily summary: jobs completed, revenue, crew utilization

Make it simple. My crews are not tech people. The mobile view needs
to be dead simple -- big buttons, minimal text.

I want to log in as admin and see everything. Crews log in with a
simple PIN code and only see their assigned jobs for today.

Why it worked: Writing from the perspective of the actual problem -- not in technical terms -- gave the AI everything it needed. Rachel did not know what a "database" or "REST API" was. She described her day, and the AI built the system to match it.

Prompt 2 -- Fixing the mobile experience:

The crew mobile view is too complicated. They need to see ONLY:
- Their jobs for today, in order
- A big button to change status (green = done, yellow = issue)
- A notes field for each job
- Nothing else

Remove the navigation menu on mobile. Remove the map on mobile.
Remove the equipment section on mobile. Crews do not need any of that.
Just the job list and status buttons. Make the buttons large enough
to tap with work gloves on.

Why it worked: The first version had given crews the same interface as the admin. This prompt stripped it down to exactly what a landscaper standing in a yard with dirty gloves needs. The "work gloves" detail led the AI to generate oversized touch targets (minimum 56px) -- better than many professional mobile apps.

Before/After: Before: Rachel spent 45 minutes every morning in dispatch, managing the spreadsheet, texting crew leaders, and calling clients. Crews often arrived at jobs without knowing the client's gate code or special instructions. Equipment went missing for days because nobody tracked which trailer went where.

After: Morning dispatch takes 10 minutes. Crews see their assignments on their phones before they leave the yard. Client notes (gate codes, dog warnings, irrigation shutoff locations) carry over automatically between visits. Equipment tracking reduced "lost trailer" incidents from two per month to zero in the first quarter.

Lessons Learned:

Non-technical builders should start with Bolt.new or Lovable, not Cursor. The visual feedback loop is critical when you cannot read code.
The PIN-code authentication for crews was Rachel's most important design decision. Username/password would have been a non-starter for the field workers.
Google Maps API costs added up faster than expected. Rachel switched to a static map image for the daily overview and only loads the interactive map when a crew lead taps a specific job. Monthly API cost dropped from $47 to $8.
The AI initially built a beautiful but unnecessary crew scheduling Gantt chart. Rachel deleted the entire component with one prompt: "Remove the Gantt chart. We don't need it. Keep it simple."
Having a real user (her dispatch coordinator, Maria) test the app on day two caught three usability issues that Rachel had missed.

Outcome: FieldSync has been in daily use at GreenScape for five months. All eight crews use it. Rachel estimates it saves 6 hours of administrative time per week across the company. The owner asked her to "sell it to other landscaping companies," which she is now exploring. Total build cost: $0 (Bolt.new free tier was sufficient for the prototype; Lovable's free tier handled the refinements). Ongoing cost: $25/month (Supabase) + $8/month (Google Maps API).

Project 3: Resonance -- Startup MVP That Got Into Y Combinator

What it is: An AI-powered customer feedback analysis platform. Companies connect their support channels (Zendesk, Intercom, email), and Resonance automatically categorizes feedback by theme, sentiment, and urgency. Surfaces product insights that typically take a research team weeks to compile.

Builder Profile: David Park and Jenna Liu, both 27. David is a former ML engineer at a mid-tier AI startup. Jenna was a product manager at Salesforce. Neither had built a full-stack consumer product before. They quit their jobs in September 2025 with savings to cover six months.

Tools Stack:

Claude Code for backend architecture and API integrations
Cursor for frontend development
Next.js 14 with App Router
Supabase for database, auth, and vector storage
OpenAI API for embeddings and classification
Anthropic API for summary generation
Vercel for hosting
Stripe for billing

Build Timeline: Three weeks from first prompt to a working MVP. One additional week for polish before the YC application. Total: four weeks with two people working full-time.

Key Prompts:

Prompt 1 -- System architecture:

Design the architecture for a customer feedback analysis platform.

Data flow:
1. INGEST: Connect to Zendesk, Intercom, and email (IMAP) to pull
   customer messages. Webhook listeners for real-time ingestion.
   Dedup messages that appear in multiple channels.

2. PROCESS: For each message:
   - Generate embedding (OpenAI text-embedding-3-small)
   - Classify sentiment (positive/neutral/negative/urgent)
   - Extract themes (use clustering on embeddings, auto-generate
     theme labels)
   - Score urgency (1-5 based on sentiment + keywords + customer tier)

3. STORE: PostgreSQL for structured data. Supabase pgvector for
   embeddings. Link every insight back to source messages.

4. SURFACE: Dashboard showing:
   - Theme clusters with message counts and trends
   - Sentiment distribution over time
   - Urgent items requiring immediate attention
   - Weekly auto-generated summary of top themes and shifts

Multi-tenant: each company sees only their own data. RLS enforced
at the database level. API keys scoped per integration per company.

Build the ingestion pipeline first. I want to connect a test Zendesk
instance and see messages flowing into the database within the first
session.

Why it worked: David wrote this prompt like a system design document. The level of specificity on data flow, multi-tenancy, and storage separation meant Claude Code generated a clean, well-separated architecture on the first pass. The instruction to get data flowing in the first session kept the AI focused on the critical path.
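
The "dedup messages that appear in multiple channels" step is the kind of detail that is easy to state and easy to get wrong. A minimal sketch, assuming each message is a dict with `sender` and `body` fields (hypothetical names, not the project's schema) and that two messages count as duplicates when sender and normalised body text match:

```python
import hashlib


def message_key(sender: str, body: str) -> str:
    """Hash of normalised sender and body; case and whitespace differences are ignored."""
    normalised = sender.strip().lower() + "\n" + " ".join(body.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def dedup(messages: list[dict]) -> list[dict]:
    """Keep the first copy of each message, whichever channel it arrived on."""
    seen, unique = set(), []
    for msg in messages:
        key = message_key(msg["sender"], msg["body"])
        if key not in seen:
            seen.add(key)
            unique.append(msg)
    return unique
```

Exact matching after normalisation is the simplest possible rule. Forwarded emails with quoted replies or signatures would slip past it, so a production pipeline would probably need fuzzier matching.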

Prompt 2 -- The insight generation engine:

Build the weekly insight report generator.

Input: All feedback messages from the past 7 days for a given company.

Process:
1. Cluster messages by theme (using cosine similarity on embeddings,
   threshold 0.82)
2. For each cluster with 5+ messages:
   - Generate a theme label (3-5 words)
   - Count messages and calculate sentiment breakdown
   - Identify the most representative message (closest to centroid)
   - Compare to previous week: is this theme growing, shrinking, or new?
3. Rank themes by: (message_count * urgency_avg * growth_rate)
4. Generate executive summary using Claude:
   - 3 paragraphs maximum
   - Lead with the most important shift
   - Include specific numbers
   - End with a recommended action

Output: Structured JSON with themes array and summary text.
Store in reports table. Send via email to company admin.

Handle edge cases: company with fewer than 10 messages that week
(skip report, send "not enough data" note), themes that appear
for the first time (flag as "emerging"), themes that disappear
(flag as "resolved").

Why it worked: The mathematical specificity (cosine similarity threshold, minimum cluster size, ranking formula) gave the AI enough constraints to produce a working implementation without guessing. Jenna later said the ranking formula in the prompt became the actual production ranking formula -- it was that well-specified.
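
To show how far those numbers constrain an implementation, here is a small sketch of the clustering and ranking steps in plain Python. The source does not say which clustering algorithm Resonance uses, so the greedy single-pass approach is an assumption, as are the function names, the `embedding` and `urgency` dict fields, and the definition of growth rate as this week's count divided by last week's.

```python
import math

SIM_THRESHOLD = 0.82   # cosine similarity threshold from the prompt
MIN_CLUSTER_SIZE = 5   # minimum messages per reported theme, from the prompt


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cluster(messages):
    """Greedy single pass: join the first cluster whose centroid is similar enough."""
    clusters = []
    for msg in messages:
        for members in clusters:
            center = centroid([m["embedding"] for m in members])
            if cosine(msg["embedding"], center) >= SIM_THRESHOLD:
                members.append(msg)
                break
        else:
            clusters.append([msg])  # no cluster matched, so start a new one
    return [c for c in clusters if len(c) >= MIN_CLUSTER_SIZE]


def rank_score(members, previous_count):
    """message_count * urgency_avg * growth_rate, as written in the prompt."""
    count = len(members)
    urgency_avg = sum(m["urgency"] for m in members) / count
    growth_rate = count / max(previous_count, 1)  # assumed definition of growth
    return count * urgency_avg * growth_rate
```

Sorting themes by `rank_score` in descending order gives the report order. Note that under this growth definition a brand-new theme (previous count of zero) scores as if it had one message last week, which pushes emerging themes toward the top.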

Before/After: Before: David and Jenna had a pitch deck, three notebooks of customer research, and a Figma prototype. No working software. Their previous attempt at building the MVP with traditional development (David coding the backend, contracting a frontend developer) had consumed six weeks and $12,000 in contractor fees with only the auth system and a basic dashboard to show for it.

After: A fully functional platform that could ingest from Zendesk, classify feedback, cluster themes, and generate weekly reports. Three beta customers were using it with real data. The YC demo showed live feedback flowing in and being categorized in real time.

Lessons Learned:

The combination of Claude Code for backend/architecture and Cursor for frontend was more effective than using either tool alone. Claude Code handled the complex data pipeline logic better; Cursor was faster for UI iteration.
AI-generated API integrations (Zendesk, Intercom) worked for the happy path but failed on pagination, rate limiting, and error recovery. These required manual intervention and were the primary source of bugs during beta.
The multi-tenant RLS policies were the single highest-risk component. David reviewed every policy line by line -- this was not a place to vibe.
Having three beta customers during the build, not after, changed everything. Real data exposed clustering issues that synthetic test data never would have.
YC partners were not impressed by the fact that it was vibe-coded. They were impressed by the speed: four weeks from zero to three paying customers with real usage data.
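
The integration lesson above (happy path works, pagination and rate limiting do not) is worth making concrete. The sketch below shows the general shape of a fetch loop that follows cursors and retries when rate limited. It does not model the real Zendesk or Intercom APIs: `fetch_page`, the `items` and `next_cursor` keys, and the `RateLimited` exception are hypothetical stand-ins for whatever a client wrapper exposes.

```python
import time


class RateLimited(Exception):
    """Hypothetical error a client wrapper raises on an HTTP 429 response."""

    def __init__(self, retry_after: float):
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


def fetch_all(fetch_page, max_retries: int = 3, sleep=time.sleep) -> list:
    """Follow cursors until exhausted, retrying a page when rate limited."""
    items, cursor = [], None
    while True:
        for attempt in range(max_retries + 1):
            try:
                page = fetch_page(cursor)
                break
            except RateLimited as err:
                if attempt == max_retries:
                    raise  # give up instead of looping forever
                sleep(err.retry_after)
        items.extend(page["items"])
        cursor = page.get("next_cursor")
        if cursor is None:
            return items
```

Passing `sleep` as a parameter keeps the function testable without real delays. A generated integration that lacks either the outer loop or the retry branch is a likely candidate for the kind of beta bugs described above.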

Outcome: Accepted into the Y Combinator W26 batch. Raised a $500K pre-seed round before the batch started. Currently at $8,400 MRR with 14 paying companies. David estimates the vibe coding approach saved them three months and $40,000+ in development costs compared to traditional development, which directly extended their runway.

Project 4: karandev.co -- Developer Portfolio That Landed a Job

What it is: A personal developer portfolio site with interactive project showcases, a working blog with MDX support, an AI chatbot trained on the builder's resume and projects, and a live "what I'm working on" status pulled from GitHub and Spotify APIs.

Builder Profile: Karan Patel, 22. Recent computer science graduate from a state university. Solid fundamentals in Python and Java from coursework, but limited experience with modern web frameworks. Had applied to 47 junior developer positions with a plain HTML resume site. Zero callbacks.

Tools Stack:

Cursor (Composer mode) for all development
Next.js 14 with App Router
Tailwind CSS + Framer Motion for animations
MDX for blog posts
Vercel AI SDK + OpenAI for the resume chatbot
GitHub API + Spotify API for live status widgets
Vercel for hosting

Build Timeline: One full week of focused work during winter break. Approximately 40 hours total.

Key Prompts:

Prompt 1 -- Portfolio design direction:

Build a developer portfolio site that will make a hiring manager stop
scrolling. Next.js 14 App Router with Tailwind CSS.

Design: Dark theme. Subtle grain texture background. Smooth scroll.
Minimal but not boring. Accent color: electric blue (#3B82F6).
Typography: Inter for body, JetBrains Mono for code snippets.

Sections:
1. Hero: My name in large type. One-line tagline that rotates between
   3 phrases (typed animation effect). Small "scroll down" indicator.
2. About: 2-paragraph bio. Photo (circular, subtle border glow).
   Tech stack icons grid (React, Python, TypeScript, etc.) with
   hover tooltips.
3. Projects: 3-4 cards in a grid. Each card: screenshot, title,
   one-line description, tech tags, links to live demo + GitHub.
   Cards tilt slightly on hover (3D transform). Click to expand
   into full case study.
4. Blog: Latest 3 posts pulled from MDX files. Title, date, read time,
   excerpt. Link to full post.
5. Contact: Simple email form (Resend API). Social links row.

Page transitions: smooth with Framer Motion. Sections fade-in on scroll.
Performance: 95+ Lighthouse score. No layout shift.

Why it worked: The prompt read like a creative brief, not a feature list. Details like "grain texture background," "cards tilt slightly on hover," and "typed animation effect" gave the AI a visual vision to execute against. The Lighthouse score target acted as a quality gate.

Prompt 2 -- The resume chatbot:

Add an AI chatbot to the portfolio that answers questions about me.

It should be a small floating chat bubble in the bottom right corner.
When opened, it expands into a chat window. Powered by OpenAI GPT-4o-mini
via the Vercel AI SDK.

System prompt for the chatbot:
"You are a helpful assistant on Karan Patel's portfolio website.
You answer questions about Karan's skills, experience, projects,
and education based on the context provided. You are friendly,
concise, and professional. If asked something not covered in the
context, say you don't have that information and suggest emailing
Karan directly. Never make up information about Karan."

Context document (embed this in the system prompt):
[I will paste my resume and project descriptions here]

Features:
- Streaming responses (token by token appearance)
- Suggested starter questions: "What are Karan's top skills?",
  "Tell me about his projects", "What is his education background?"
- Rate limit: max 20 messages per session to control API costs
- Chat history persists in the browser session (sessionStorage)
- Mobile responsive: full-width chat panel on screens under 640px

Why it worked: Providing the exact system prompt within the development prompt eliminated a round of iteration. The rate limit and cost control details showed practical thinking that the AI translated directly into implementation.
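
The 20-message cap is the piece of that prompt most directly tied to cost. A minimal sketch of the idea, written in Python for brevity rather than in the portfolio's actual Next.js code; the class name `SessionLimiter` and the in-memory counter are assumptions:

```python
from collections import defaultdict

MAX_MESSAGES_PER_SESSION = 20  # from the prompt


class SessionLimiter:
    """Counts messages per session and refuses once the limit is reached."""

    def __init__(self, limit: int = MAX_MESSAGES_PER_SESSION):
        self.limit = limit
        self.counts = defaultdict(int)

    def allow(self, session_id: str) -> bool:
        """Record the message and return True while the session is under the limit."""
        if self.counts[session_id] >= self.limit:
            return False
        self.counts[session_id] += 1
        return True
```

The check belongs on the server, before the model is called, since a limit enforced only in the browser can be bypassed. On a serverless host, in-memory counters are generally not shared between invocations, so a real deployment would likely need a shared store for the counts.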

Before/After: Before: A single-page HTML resume with a white background, Times New Roman font, and three bullet-pointed project descriptions. Karan described it as "what you'd get if you exported a Google Doc to HTML." Forty-seven applications sent. Zero interviews.

After: A polished portfolio with smooth animations, interactive project showcases, a working blog, and an AI chatbot that could answer recruiter questions about Karan's experience at 2 AM. The chatbot alone generated over 600 conversations in the first month.

Lessons Learned:

The AI chatbot was the differentiator. Three interviewers specifically mentioned it. One said, "I asked your chatbot about your Python experience and it convinced me to bring you in."
Framer Motion animations generated by AI worked but were initially too aggressive (elements flying in from all directions). Karan's best prompt was a one-liner: "Reduce all animations to subtle fades and slight upward slides. Nothing should feel like a PowerPoint transition."
The Spotify "now playing" widget was a fun addition but caused a privacy concern Karan had not anticipated -- it was broadcasting his music taste to potential employers during interviews. He added a toggle to disable it.
MDX blog setup took longer than expected. The AI-generated MDX configuration worked for basic posts but broke on code blocks with certain languages. This required actual debugging rather than prompt iteration.
Total cost: $0 for the build. Approximately $3/month for the OpenAI API calls powering the chatbot (GPT-4o-mini is cheap at volume).

Outcome: Karan posted the portfolio on r/webdev, Twitter, and LinkedIn. The Reddit post received 1,200 upvotes. The portfolio has had 14,000 unique visitors in three months. He received 11 interview requests in the first two weeks after launching. Accepted a junior full-stack developer role at a Series B startup in San Francisco. Starting salary: $135,000 -- $30,000 more than the median offer for new grads from his university. His manager later told him: "The portfolio showed us you could ship, not just code."

Project 5: Dungeon of Echoes -- A Game Built by a Teenager

What it is: A browser-based roguelike dungeon crawler with procedurally generated levels, pixel art aesthetics, turn-based combat, and a permadeath mechanic. Players descend through floors, collect loot, fight monsters, and try to reach floor 50. Leaderboard tracks the deepest floor reached.

Builder Profile: Aiden Nakamura, 16. High school junior in Portland, OR. Plays video games constantly. Had completed a Python basics course on Codecademy and built a few simple scripts. No web development or game development experience. Started this project during a snow day when school was cancelled.

Tools Stack:

Replit Agent for initial game prototype
Claude.ai (free tier) for debugging and game design advice
HTML5 Canvas for rendering
Vanilla JavaScript (no frameworks)
localStorage for save data and leaderboard
Replit hosting (free tier)

Build Timeline: Two weeks of after-school sessions (2-3 hours each) plus two full weekend days. Total: approximately 35 hours.

Key Prompts:

Prompt 1 -- The game concept:

Build a roguelike dungeon crawler game in HTML5 Canvas and JavaScript.
No frameworks, just vanilla JS.

The player starts on floor 1 of a dungeon. Each floor is a grid of
rooms generated randomly. The player moves with arrow keys. Each room
can contain: nothing, a monster, a treasure chest, a health potion,
or stairs down to the next floor.

Combat is turn-based. Player and monster take turns attacking. Damage
is based on attack stat minus defense stat plus a random factor.
When a monster dies, it drops gold and maybe an item.

Items: sword (increase attack), shield (increase defense), potion
(restore health). Items have rarity levels: common (white), rare (blue),
epic (purple). Higher rarity = better stats.

Permadeath: when the player dies, the run is over. Show a death screen
with stats: floors cleared, monsters killed, gold collected, time played.

Visual style: 16x16 pixel art aesthetic using simple colored squares
and basic shapes. Dark background. The dungeon should feel gloomy.

Start with movement and room generation. Add combat second.
Add items third. Add the death screen last.

Why it worked: Breaking the build into a clear sequence (movement, then combat, then items, then death screen) matched how game development actually works -- you get the core loop right before adding layers. Aiden said the AI "built each layer perfectly because it always had the previous layer working first."
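
The combat rule in that prompt ("attack stat minus defense stat plus a random factor") fits in one line. Here it is as a Python sketch rather than the game's JavaScript. The size of the random spread and the floor of 1 damage per hit are assumptions; the prompt does not specify either.

```python
import random


def roll_damage(attack: int, defense: int, rng: random.Random, spread: int = 2) -> int:
    """attack - defense + a random factor in [-spread, spread], never less than 1."""
    return max(1, attack - defense + rng.randint(-spread, spread))
```

Passing in a `random.Random` instance makes fights reproducible from a seed, which helps when a bug report says "the skeleton on floor 12 one-shot me."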

Prompt 2 -- Making combat feel satisfying:

Combat feels boring. When I attack a monster or it attacks me,
nothing happens visually. Make it feel impactful:

1. Screen shake: brief shake (3 frames) when any attack lands
2. Damage numbers: float upward from the target and fade out, red for
   damage, green for healing
3. Flash effect: the hit target flashes white for 2 frames
4. Death animation: when a monster dies, it fades out and drops
   pixel particles downward
5. Sound: I know we can't do real sound easily, so fake it --
   flash the screen border red briefly on hit to give visual "impact"

Keep the turn-based system. These are just visual effects layered on
top of the existing combat logic. Do not change how damage calculation
works.

Why it worked: The constraint "do not change how damage calculation works" prevented the AI from rewriting the combat system while adding effects. Aiden had learned from an earlier mistake where asking for "better combat" caused the AI to replace his entire combat module.

Before/After:

Before: Aiden had tried to build a game three times previously. Attempt one: followed a YouTube tutorial for a platformer in Unity, got stuck on collision detection, gave up after four hours. Attempt two: tried Godot, spent a weekend learning the editor, never got past the main menu. Attempt three: started a text adventure in Python, finished it, but wanted something visual.

After: A fully playable, visually polished (for a browser game) roguelike with 50 floors of content, seven monster types, fifteen items, a working leaderboard, and combat that "actually feels fun to play" according to the comments on his Reddit post.

Lessons Learned:

Replit Agent was the right starting point for a first-time game builder. The instant preview and zero-configuration hosting removed all friction.
Game feel (screen shake, particles, damage numbers) transforms a boring prototype into something people want to keep playing. Aiden spent 20% of total time on these "polish" effects and considers it the best time investment.
Procedural generation produced occasional unwinnable floors where the stairs were placed in a room surrounded by walls with no entrance. Aiden fixed this by adding a post-generation validation step -- a prompt asking the AI to "verify that every room with stairs is reachable from the spawn point. If not, regenerate."
localStorage has a size limit. After extended play sessions with many leaderboard entries, the game crashed. Aiden learned about data size limits the hard way and added cleanup logic.
Aiden's classmates became his QA team. They found six bugs in the first day, all of which Aiden fixed by pasting error descriptions into Claude.
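
The post-generation validation step mentioned above is a standard flood fill. Here is a minimal sketch, assuming each floor is stored as an array of strings where `#` is a wall, `.` is open floor, `S` is the spawn point, and `>` is the stairs. That encoding and the names `cellAt`, `stairsReachable`, and `generateValidFloor` are illustrative assumptions, not taken from Aiden's game.

```typescript
// Hypothetical encoding: "#" wall, "." floor, "S" spawn, ">" stairs down.
function cellAt(floor: string[], r: number, c: number): string {
  // Anything outside the grid is treated as a wall.
  return floor[r]?.[c] ?? "#";
}

// Breadth-first flood fill from the spawn point, moving in four directions.
// Returns true if the stairs can be reached without crossing a wall.
function stairsReachable(floor: string[]): boolean {
  let startR = -1;
  let startC = -1;
  for (let r = 0; r < floor.length; r++) {
    const c = (floor[r] ?? "").indexOf("S");
    if (c !== -1) {
      startR = r;
      startC = c;
      break;
    }
  }
  if (startR === -1) return false; // no spawn point on this floor

  const seen: Record<string, boolean> = {};
  seen[`${startR},${startC}`] = true;
  const queue: Array<[number, number]> = [[startR, startC]];
  const steps: Array<[number, number]> = [[1, 0], [-1, 0], [0, 1], [0, -1]];

  for (let cur = queue.shift(); cur !== undefined; cur = queue.shift()) {
    const [r, c] = cur;
    if (cellAt(floor, r, c) === ">") return true;
    for (const [dr, dc] of steps) {
      const nr = r + dr;
      const nc = c + dc;
      const key = `${nr},${nc}`;
      if (seen[key] || cellAt(floor, nr, nc) === "#") continue;
      seen[key] = true;
      queue.push([nr, nc]);
    }
  }
  return false;
}

// Regenerate until a floor passes the check. `generate` is a hypothetical
// stand-in for the game's own floor generator; maxTries is an assumed cap.
function generateValidFloor(generate: () => string[], maxTries: number = 50): string[] {
  for (let i = 0; i < maxTries; i++) {
    const floor = generate();
    if (stairsReachable(floor)) return floor;
  }
  throw new Error("no reachable floor generated");
}
```

Capping the number of retries is a design choice: a generator that can never produce a valid floor then fails loudly instead of hanging the game.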

Outcome: Posted on r/roguelikes and r/IndieGaming. The Reddit post received 480 upvotes. The game has been played over 8,000 times. Aiden's computer science teacher gave him extra credit and invited him to present the project to the class. He is now building a multiplayer version and has started learning React "for real" because he wants to understand what the AI was generating. He says: "Vibe coding got me through the door. Now I actually want to learn what's behind the door."

Project 6: The Copper Pot -- E-Commerce Site for a Small Business

What it is: A full e-commerce storefront for an artisanal cookware shop in Asheville, NC. Features a product catalog with high-resolution image galleries, size/finish variants, a shopping cart with saved-cart recovery, Stripe checkout, order tracking, and an admin panel for inventory management.

Builder Profile: Linda Brennan, 52. Owner of The Copper Pot, a brick-and-mortar cookware shop she has run for 18 years. Zero programming experience. Previously paid a local agency $8,500 to build a Shopify store that she found difficult to update and expensive to maintain ($79/month for Shopify, plus an agency retainer for changes). Heard about vibe coding from her nephew, who is a software developer.

Tools Stack:

Lovable for storefront and admin panel
Supabase for product database, auth, and image storage
Stripe for payment processing
Vercel for hosting
Resend for order confirmation emails

Build Timeline: Five days of working on it during slow hours at the shop, plus two evenings. Total: approximately 20 hours.

Key Prompts:

Prompt 1 -- The storefront:

Build an online store for my cookware shop called "The Copper Pot."

I sell high-end copper pots, pans, and kitchen tools. My customers
are home cooks aged 35-65 who appreciate craftsmanship. The feel
should be warm, artisanal, and trustworthy. Think: exposed brick,
natural tones, and beautiful product photography.

Pages:
1. Home: hero image with tagline "Handcrafted Copper Cookware Since
   2008", featured products grid (6 items), testimonial carousel,
   Instagram-style gallery of kitchen photos
2. Shop: filterable product grid. Filters: category (pots, pans,
   tools, sets), price range, material. Sort by price, newest,
   popularity.
3. Product detail: large image gallery (click to zoom), product
   description, size/finish selector, price, add to cart button,
   "You might also like" section with 3 related products.
4. Cart: line items with quantity adjustment, subtotal, shipping
   estimate, proceed to checkout.
5. About: our story, photo of the shop, craftsmanship values.
6. Contact: form + shop address + embedded Google Map.

Colors: warm cream background (#FDF8F0), copper accent (#B87333),
dark text (#2D2926). Font: serif headers (Playfair Display),
sans-serif body (Lato).

Mobile must be perfect. Most of my customers browse on their phones.

Why it worked: Linda described her customers and brand feeling, not technical specifications. The AI translated "warm, artisanal, and trustworthy" and "exposed brick, natural tones" into a design that Linda said "looks exactly like my shop feels." The color hex codes were her nephew's contribution -- he helped her pick colors that matched her physical store's palette.

Prompt 2 -- Admin inventory management:

Add an admin panel that only I can access (password protected).

I need to:
1. Add new products: name, description, price, category, images
   (upload multiple), sizes available, stock count for each size
2. Edit existing products: change any field, reorder images
3. Mark products as "sold out" (shows badge on storefront but
   keeps the page live) or "hidden" (removes from storefront)
4. View orders: list with date, customer name, items, total,
   status (paid / shipped / delivered). Click to see full details.
5. Update order status and add tracking number (customer gets
   an email when I mark it as shipped)
6. Simple dashboard: total revenue this month, number of orders,
   top selling products

Keep it simple. I am not technical. Big buttons, clear labels.
When I upload images, automatically resize them for the web
(I take photos on my phone and they are very large files).

Why it worked: "I am not technical. Big buttons, clear labels." Those two short sentences shaped the entire admin interface. The AI generated an admin panel with a much simpler layout than a typical CMS, with confirmations on every destructive action and undo options. The automatic image resizing solved a real problem -- Linda's phone photos were 4 MB each.

Before/After:

Before: A Shopify store that cost $8,500 to build and $79/month to maintain. Linda could not update product descriptions without emailing her agency and waiting 48 hours. Adding new products required a $150/change agency fee. The site looked generic -- it used a standard Shopify theme that looked identical to thousands of other stores.

After: A custom storefront that matches The Copper Pot's physical brand identity. Linda updates products herself through the admin panel. No monthly platform fees beyond Supabase ($25/month) and Vercel ($0 -- free tier). Stripe charges are 2.9% + $0.30 per transaction (same as Shopify).
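
As a rough illustration of the 2.9% + $0.30 rate quoted above, the per-order fee can be worked out in integer cents. This is a sketch of the arithmetic only: the function names are hypothetical, and rounding the percentage part to the nearest cent is an assumption rather than a statement of how Stripe rounds.

```typescript
// Card processing fee at the rate quoted in the case study: 2.9% + $0.30.
// Amounts are integer cents to avoid floating-point drift.
// Assumption: the percentage part is rounded to the nearest cent.
function stripeFeeCents(amountCents: number): number {
  const percentFee = Math.round((amountCents * 29) / 1000); // 2.9% of the amount
  return percentFee + 30; // plus the fixed $0.30 per transaction
}

// What the shop keeps from one order after the processing fee.
function netAfterFeeCents(amountCents: number): number {
  return amountCents - stripeFeeCents(amountCents);
}
```

On a $100.00 order (10,000 cents) this gives a fee of 320 cents, so the shop keeps $96.80.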

Lessons Learned:

Lovable was the right tool for someone with zero programming experience. Linda never saw a line of code. She described what she wanted in plain English and refined the results visually.
Product photography matters more than website design. Linda initially uploaded poorly lit phone photos and the site looked "cheap." Her nephew helped her photograph products with natural light, and the same site suddenly looked premium.
Stripe integration through Lovable worked seamlessly for simple checkout. However, Linda needed to handle sales tax, which required adding a tax calculation service. This was the only part where she needed her nephew's help.
The "saved cart recovery" feature (emailing customers who abandoned carts) was not in Linda's original plan. The AI suggested it during a prompt about the checkout flow. It recovers approximately $300-$400 in sales per month.
Shipping calculation was the hardest problem. USPS API integration was unreliable, so Linda switched to flat-rate shipping tiers ($8 / $12 / free over $150), which was simpler and actually increased average order value.
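
Flat-rate tiers like the ones Linda settled on reduce shipping to a table lookup. The sketch below is illustrative: the $8 and $12 rates and the free-shipping threshold of $150 come from the case study, but the case study does not say what separates the $8 tier from the $12 tier, so the $50 boundary (and the assumption that tiers are based on cart subtotal at all) is a placeholder. The names `ShippingTier`, `exampleTiers`, and `shippingCents` are hypothetical.

```typescript
interface ShippingTier {
  upToCents: number; // inclusive upper bound on the cart subtotal
  rateCents: number; // flat shipping charge for this tier
}

// Rates and the $150 threshold are from the case study.
// ASSUMPTION: the $50 boundary between the $8 and $12 tiers is a placeholder.
const exampleTiers: ShippingTier[] = [
  { upToCents: 5000, rateCents: 800 },
  { upToCents: 15000, rateCents: 1200 },
];

// Returns the flat rate for the first tier that covers the subtotal,
// or 0 (free shipping) when the subtotal is above every tier.
function shippingCents(subtotalCents: number, tiers: ShippingTier[]): number {
  const sorted = tiers.slice().sort((a, b) => a.upToCents - b.upToCents);
  for (const tier of sorted) {
    if (subtotalCents <= tier.upToCents) return tier.rateCents;
  }
  return 0;
}
```

Because the tiers are plain data, changing a rate or threshold means editing one table rather than integrating a carrier API.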

Outcome: Online sales in the first three months: $23,400. Previous Shopify store's best three-month period: $9,100. The warm, custom design and improved product photography drove a 34% increase in conversion rate compared to the old Shopify store. Linda's monthly tech costs dropped from $79 (Shopify) + agency retainer to $25 (Supabase). She saved approximately $3,000 in the first year on platform and agency fees alone. Three other local shop owners have asked Linda to help them build similar stores.

Community Stats

Aggregated from 247 community submissions received between October 2025 and January 2026.

Submissions Overview

Metric	Value
Total submissions received	247
Featured projects (all-time)	38
Countries represented	23
Youngest builder	14 (high school student, built a study flashcard app)
Oldest builder	67 (retired accountant, built a family recipe archive)

Builder Background Distribution

Background	Percentage
Professional developer	41%
Student / recent graduate	19%
Non-technical professional	17%
Designer / creative	11%
Founder / entrepreneur	8%
Other (retired, career switcher, hobbyist)	4%

Most Popular Tools

Rank	Tool	Usage Rate
1	Cursor	62%
2	Claude Code	47%
3	Bolt.new	34%
4	Lovable	28%
5	v0	24%
6	Replit Agent	19%
7	GitHub Copilot	16%
8	Windsurf	11%

Note: Percentages sum to more than 100% because most projects use multiple tools.

Supporting Technology

Category	Most Popular Choice
Framework	Next.js (58%)
Styling	Tailwind CSS (71%)
Database	Supabase (52%)
Hosting	Vercel (64%)
Payments	Stripe (89% of projects with payments)
Auth	Supabase Auth (44%)

Build Time Distribution

Time Range	Percentage
Under 4 hours	12%
4-12 hours	27%
12-24 hours (1-2 days)	31%
1-2 weeks	22%
Over 2 weeks	8%

Average time from first prompt to deployed: 18.4 hours
Median time from first prompt to deployed: 14 hours

Project Categories

Category	Count	Percentage
SaaS / web application	72	29%
Internal / business tool	48	19%
Portfolio / personal site	37	15%
E-commerce	29	12%
Game	21	9%
Mobile app	18	7%
Chrome extension	12	5%
CLI tool / developer utility	10	4%

Outcome Metrics

Metric	Value
Projects still actively maintained (after 3+ months)	68%
Projects generating revenue	31%
Average MRR for revenue-generating projects	$840
Highest reported MRR	$12,400
Builders who reported getting hired because of their project	14
Builders who transitioned to full-time on their project	9

Success Patterns

From analyzing all 247 submissions, the projects most likely to succeed shared these characteristics:

Specific problem, specific user. "A tool for landscaping dispatchers" beats "a project management app" every time.
Prompt specificity. Builders who shared detailed, structured prompts (average 150+ words per prompt) had measurably better outcomes than those using short, vague prompts.
Early deployment. Projects deployed within the first 25% of total build time had a 73% continuation rate. Projects that waited until "done" to deploy had a 41% continuation rate.
Real users during build. 82% of revenue-generating projects had at least one real user testing before the builder considered it complete.
Two tools, not five. The most successful builders typically used one primary AI coding tool and one supporting tool. Projects that used four or more AI tools had lower completion rates, likely due to context-switching overhead.

Monthly Spotlight

March 2026 Spotlight: FleetTrack

Category: B2B SaaS / Logistics
Builder: Raj Patel, 27, operations analyst at a logistics company
Tools: Claude Code (Opus 4.6), Next.js 16, Supabase, Mapbox, Vercel
Build time: 18 hours over one weekend

The Story: Raj managed a fleet of 40 delivery vehicles using spreadsheets and phone calls. He had never written production code before but had been following vibe coding tutorials on the EndOfCoding YouTube channel. When his manager complained about the lack of real-time visibility into delivery routes, Raj decided to build a solution himself.

His opening prompt to Claude Code:

Build a real-time fleet tracking dashboard with Next.js 16 and Supabase.

Core features:
1. Map view showing all active vehicles with live GPS positions
   (use Mapbox GL JS). Each vehicle is a colored dot -- green for
   on-schedule, yellow for delayed, red for stopped.
2. Sidebar with vehicle list, sortable by status, driver name, or
   ETA to next stop. Clicking a vehicle centers the map and shows
   route history for today.
3. Driver mobile view: a simple page where drivers tap "Arrived"
   at each stop. Auto-captures GPS coordinates. Works offline and
   syncs when back online.
4. Daily summary: auto-generated at 6 PM showing total deliveries,
   average time per stop, vehicles that went off-route, and fuel
   estimates based on distance traveled.

Auth via Supabase magic link. Role-based: admin sees everything,
drivers see only their own route. Use Supabase real-time subscriptions
for live vehicle position updates.

The dashboard must feel fast. Sub-200ms updates on the map.

Raj had a working prototype by Saturday night. By Sunday evening, he had added route optimization suggestions using a simple nearest-neighbor algorithm. He deployed to Vercel and showed it to his manager on Monday morning. Within two weeks, all 40 vehicles were being tracked in FleetTrack. The company cancelled its $800/month fleet management subscription.
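
The nearest-neighbor approach mentioned above is a greedy heuristic: from the current position, always go to the closest stop not yet visited. The sketch below shows the idea under stated assumptions. FleetTrack's actual code was not part of the submission, so the `GeoPoint` and `DeliveryStop` shapes, the function names, and the use of straight-line (haversine) distance instead of road distance are all hypothetical.

```typescript
interface GeoPoint {
  lat: number;
  lng: number;
}

interface DeliveryStop extends GeoPoint {
  id: string;
}

// Great-circle distance in kilometers (haversine formula, mean Earth radius 6371 km).
function haversineKm(a: GeoPoint, b: GeoPoint): number {
  const toRad = (deg: number): number => (deg * Math.PI) / 180;
  const sinLat = Math.sin(toRad(b.lat - a.lat) / 2);
  const sinLng = Math.sin(toRad(b.lng - a.lng) / 2);
  const h =
    sinLat * sinLat +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * sinLng * sinLng;
  return 2 * 6371 * Math.asin(Math.sqrt(h));
}

// Greedy ordering: repeatedly pick the closest remaining stop.
function nearestNeighborRoute(start: GeoPoint, stops: DeliveryStop[]): DeliveryStop[] {
  const remaining = stops.slice(); // copy so the caller's array is untouched
  const route: DeliveryStop[] = [];
  let current: GeoPoint = start;
  while (remaining.length > 0) {
    let bestIndex = 0;
    let bestDist = Infinity;
    remaining.forEach((candidate, i) => {
      const d = haversineKm(current, candidate);
      if (d < bestDist) {
        bestDist = d;
        bestIndex = i;
      }
    });
    const [next] = remaining.splice(bestIndex, 1);
    if (next === undefined) break; // defensive; not expected while stops remain
    route.push(next);
    current = next;
  }
  return route;
}
```

Nearest-neighbor is fast and easy to explain to dispatchers, but it does not guarantee the shortest overall route, which is why presenting its output as a suggestion rather than a fixed plan is a sensible fit.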

Why we selected it: FleetTrack represents the next wave of vibe coding impact: non-developers building real B2B tools that replace expensive SaaS subscriptions. Raj's prompt demonstrates strong domain expertise combined with specific technical requirements -- the sweet spot where vibe coding delivers maximum value. The offline-sync requirement for drivers shows the kind of product thinking that comes from knowing the job firsthand, and that an AI is unlikely to have suggested unprompted.

Previous: February 2026 Spotlight: QuietPage

Category: Productivity tool
Builder: Sana Mirza, 31, UX designer at a remote-first company
Tools: Cursor, Next.js, Supabase, Vercel
Build time: 11 hours over three evenings

The Story: Sana was frustrated by every writing app she tried. Google Docs felt corporate. Notion was too feature-heavy. iA Writer was beautiful but did not sync across devices. She wanted a writing tool that was quiet, distraction-free, synced to the cloud, and had exactly one feature beyond basic text editing: a daily word count streak tracker.

Sana opened Cursor on a Tuesday evening with this prompt:

Build a minimal writing app. I mean truly minimal.

One page. No sidebar. No toolbar. No menus visible by default.
Just a white page with a blinking cursor. The user types.

Auto-save to Supabase every 30 seconds and on every pause longer
than 2 seconds. Show a subtle "saved" indicator that fades in and
out -- bottom right corner, small gray text, disappears after 1 second.

One feature: daily word count streak. If the user writes at least
200 words today, the streak continues. Show the streak as a small
flame icon with a number in the top right corner. That is the only
UI element visible while writing.

Keyboard shortcuts (show on hover over a small "?" icon, bottom left):
- Cmd+B: bold
- Cmd+I: italic
- Cmd+Shift+H: toggle heading
- Cmd+/: toggle dark mode

No sign-up wall. Auth via magic link only. No password to remember.

If the writing app does not feel calm, it has failed.

The result was a writing app that four of Sana's coworkers started using within a week. She posted it on Hacker News with the title "I built the quietest writing app on the internet." It hit the front page. Within a month, QuietPage had 2,800 registered users and Sana was considering adding a $5/month premium tier for features like version history and export to PDF.

Why we selected it: QuietPage demonstrates that vibe coding is not just for building complex systems. Sometimes the hardest product decision is what to leave out. Sana's prompt is a masterclass in constraint-driven design, and the result is a product people genuinely prefer over established alternatives -- not because it does more, but because it does less, better.
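
The streak rule in Sana's prompt (at least 200 words in a day keeps the streak alive) is small enough to sketch as pure functions. This is an illustration, not QuietPage's code: the `wordsByDay` shape, the function names, UTC calendar days, whitespace-based word counting, and the choice not to reset the streak while today's goal is still in progress are all assumptions.

```typescript
const DAILY_GOAL = 200; // words per day, from the prompt

// Counts words by splitting on whitespace (a simplifying assumption).
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Returns the ISO date (YYYY-MM-DD, UTC) for the day before `isoDate`.
function previousDay(isoDate: string): string {
  const d = new Date(`${isoDate}T00:00:00Z`);
  d.setUTCDate(d.getUTCDate() - 1);
  return d.toISOString().slice(0, 10);
}

// `wordsByDay` maps ISO dates to words written that day (hypothetical shape).
// Returns the number of consecutive days that met the goal, ending today.
function currentStreak(wordsByDay: Record<string, number>, today: string): number {
  const met = (day: string): boolean => (wordsByDay[day] ?? 0) >= DAILY_GOAL;
  // Assumed behavior: an unfinished today does not break yesterday's streak.
  let day = met(today) ? today : previousDay(today);
  let streak = 0;
  while (met(day)) {
    streak++;
    day = previousDay(day);
  }
  return streak;
}
```

Starting the count from yesterday when today is incomplete keeps the flame icon from dropping to zero every morning, which seems in keeping with the "calm" requirement, though the prompt itself does not specify this.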

Have a project that should be featured in next month's spotlight? Submit it using the template above.

Explore Further

Get the complete prompt library in Chapter 17: The Complete Prompt Library -- 200+ production-ready prompts for every stage of AI-native development.
Compare tools in Chapter 18: Tool Comparison Matrix -- Side-by-side evaluation of every major vibe coding tool.
Secure your project with Chapter 19: The Security Playbook -- The pre-launch checklist every vibe-coded project needs.
Try hands-on at vibe-coding.academy -- Interactive tutorials and guided projects.
Join the discussion at endofcoding.com -- Community forum, Discord, and weekly office hours.

This chapter is updated monthly with new featured projects and refreshed community stats. Last updated: March 2026.


★ What Level Are You?

Updated March 6, 2026

Answer 6 questions to discover your vibe coding level.

★ Glossary

Updated March 6, 2026

Vibe Coding: AI-assisted development where the developer describes intent in natural language and evaluates output through execution, not code review.
Accept All: The practice of accepting all AI-generated code changes without reviewing diffs.
Coding Agent: An autonomous AI system that can plan, implement, test, and deploy code changes independently.
Composer: A mode in AI IDEs (like Cursor) that generates multi-file code from natural language descriptions.
Error-Driven Development: Debugging by copy-pasting error messages to the AI rather than reading and understanding the code yourself.
MCP (Model Context Protocol): Anthropic's open protocol allowing AI assistants to connect to external tools and data sources.
Prompt Engineering: The skill of crafting effective natural language instructions to produce desired AI outputs.
Vibe Coding Hangover: The phenomenon of teams struggling to maintain, extend, or debug AI-generated codebases. Documented by Fast Company in September 2025.
Zombie App: An application that is functional but unmaintainable because nobody understands the AI-generated code.
Complexity Ceiling: The point at which a vibe-coded application can no longer be extended because the underlying code is too tangled.
Hybrid Workforce: An organization where AI agents work alongside human engineers, as pioneered by Goldman Sachs with Devin.
The 80/20 Rule: Vibe code the 80% (UI, boilerplate, standard patterns). Engineer the 20% (auth, security, business logic).
Agent Teams: A feature in Claude Code (introduced with Opus 4.6) allowing multiple AI agents to work in parallel on different aspects of a project, coordinating autonomously.
Agent Mode: A capability in coding tools (GitHub Copilot, Cursor, etc.) where the AI autonomously identifies subtasks, makes multi-file edits, runs tests, and fixes errors without step-by-step human guidance.
Devin Wiki / Devin Search: Cognition's documentation generation and code search tools built into the Devin platform, enabling AI-generated documentation and natural language querying of codebases.
Multimodal Coding: An emerging trend combining voice, visual, and text-based inputs for AI code generation — including screenshot-to-code and voice-to-code workflows.


★ Resources

Updated March 6, 2026

Tools to Try

Cursor — cursor.com — AI-native IDE ($1B+ ARR, $29.3B valuation)
Claude Code — Anthropic's terminal coding agent with agent teams (Opus 4.6)
GitHub Copilot — github.com/features/copilot — Agent mode in VS Code (4.7M users)
Bolt.new — bolt.new — Browser-based app builder
v0 — v0.dev — AI UI generation by Vercel
Replit — replit.com — Browser IDE with AI agent
Lovable — lovable.dev — App creation for non-developers
Google Jules — jules.google — Async coding agent (Gemini 3 Pro)
Gemini CLI — github.com/google-gemini/gemini-cli — Open-source terminal agent
OpenAI Codex CLI — github.com/openai/codex — Open-source terminal agent
Devin — devin.ai — Autonomous AI software engineer ($155M+ ARR)
Windsurf — windsurf.com — AI IDE with persistent memory (now part of Cognition)

Further Reading
Karpathy's original tweet (February 2, 2025)
"Vibe Coding in Practice" — arXiv research paper (2025)
"Vibe Coding Kills Open Source" — arXiv research paper (January 2026)
Tenzai security assessment (December 2025)
Cognition's Devin 2025 Performance Review
Fast Company: "The Vibe Coding Hangover" (September 2025)
IBM: "What is Vibe Coding?"
Google Cloud: "Vibe Coding Explained"
Vibe Coding — Wikipedia (comprehensive history and analysis)

Example Projects

Open the HTML files included with this ebook to see working applications built through vibe coding:
- Task Manager (examples/task-manager-example.html) — localStorage, responsive design, animations
Snake Game (examples/snake-game-example.html) — Canvas rendering, game loop, score tracking
Prompt Examples (examples/vibe-coding-prompts.md) — Ready-to-use prompts by category

"The vibes are real. The exponentials are real. The security vulnerabilities are real too. Code wisely."

Last updated: February 25, 2026


What's New

Updated April 1, 2026

Every update to this ebook is tracked here. Subscribers get monthly updates with new content, revised chapters, and fresh prompts.

April 2026

April 9, 2026

Chapter 5 (Tools Landscape): Cursor 3 launch (April 2) — Agents Window replaces Composer (multi-agent side-by-side/grid/stacked), Design Mode (click browser UI → agent modifies component), cloud-to-local handoff; Claude Code April 4 OpenClaw policy change — subscription limits no longer cover third-party harnesses, pay-as-you-go required (one-time credit issued), plus PowerShell tool for Windows, 60% faster Write tool diff; GitHub Copilot — Copilot SDK in public preview, Autopilot mode, privacy policy change (training on user data by default from April 24 — opt-out required).
Chapter 9 (Numbers): Added Claude Mythos 93.9% SWE-bench (restricted, Project Glasswing); developer trust declined to 29% (SonarSource 2026, down from 70%+ in 2023); 51% professional devs use AI daily; 64% started using AI agents; 75% PR turnaround reduction (9.6 days → 2.4 days, Index.dev); 3.6 hours/week time saved (survey median); 66% frustrated by "almost right" solutions.
Chapter 19 (Security Playbook): Trivy Cascade extension — CanisterWorm self-propagating npm worm (64+ packages, blockchain C2, evaded domain-seizure takedown), spread to Checkmarx KICS/AST GitHub Actions and LiteLLM (95M monthly PyPI downloads); new "AI as Autonomous Vulnerability Researcher" section covering Claude Mythos/Project Glasswing — autonomous zero-day discovery, implications for vibe-coded app security posture.
Chapter 21 (Intel Brief): Six new April 2–9 incident cards: Cursor 3 (Agents Window + Design Mode); Claude Mythos/Project Glasswing (93.9% SWE-bench, zero-day discovery, defense-only restriction); Meta Muse Spark (Meta Superintelligence Labs first model, April 8); Trivy Cascade → CanisterWorm (blockchain C2, 64+ packages, Checkmarx + LiteLLM spread); Claude outages April 6–8 (10-hour outage, 8,000+ Downdetector reports); GitHub Copilot privacy change (April 24 training-by-default). Numbers section updated with Mythos 93.9%, CanisterWorm 64+ packages, trust 29%, PR turnaround 75%. What to Watch expanded with Copilot opt-out deadline and Mythos GA timeline.

April 1, 2026

Chapter 5 (Tools Landscape): Cursor valuation updated to ~$50B (Bloomberg, fundraising talks at $2B+ ARR); Anthropic acquires Bun (JavaScript runtime) — native Bun integration in Claude Code; GitHub Copilot Agent Mode now fully generally available on both VS Code and JetBrains across all Copilot plans.
Chapter 9 (Numbers): Added 73% global daily AI tool usage (Stack Overflow Dev Survey, Q1 2026) and 41% AI-generated code share (Sourcegraph Code Intelligence Report, March 2026); Cursor valuation updated to ~$50B; GitHub Copilot paid users updated to 20M+.
Chapter 19 (Security Playbook): New "Supply Chain Attacks: April 2026 Alert" section covering Axios npm hijack (March 31 — UNC1069/North Korea, WAVESHAPER.V2 RAT, ~100M weekly downloads); LiteLLM credential stealer (versions 1.82.7/1.82.8, March 24); Langflow RCE CVE-2026-33017 (unauthenticated, CISA KEV, exploited within 20h); Trivy Docker Hub compromise CVE-2026-33634. New "Vibe-Coded App Vulnerability Research" section with Georgia Tech Vibe Security Radar data (2,000+ vulns, 400+ secrets in 5,600 apps) and AI-generated code CVE trend (6→15→35/month).
Chapter 21 (Intel Brief): Transitioned to April 2026 brief. Seven new incident cards: Axios supply chain attack (North Korean state actor), LiteLLM/Langflow/Trivy attacks, Georgia Tech vulnerability research, MCP 97M monthly downloads milestone, Cursor self-hosted cloud agents, Vibe Coding 1-year anniversary + Collins Dictionary Word of the Year, SWE-bench model convergence. Numbers section updated with April figures. "What to Watch in May 2026" replaces April watchlist.

March 2026

March 25, 2026

Chapter 5 (Tools Landscape): Claude Code updated for /loop scheduled tasks, 1M token context, 64k max output for Opus 4.6 (v2.1.63→2.1.76 evolution); Replit updated to $400M Series D at $9B valuation; Lovable updated with M&A offensive; GitHub Copilot JetBrains agentic capabilities GA; Windsurf/Devin updated with Codemaps product.
Chapter 9 (Numbers): AI-generated code share updated to 46% (GitHub); US developer daily usage updated to 92%; Replit $9B valuation added to Valuations section.
Chapter 19 (Security Playbook): New "MCP Supply Chain" section covering OpenClaw attack (1,184 malicious packages, ~1 in 5 in ClawHub), CVE-2026-23744 (CVSS 9.8 MCPJam RCE), Azure MCP RCE (CVSS 9.6), 36.7% SSRF exposure across MCP servers, with actionable protection checklist.
Chapter 21 (Intel Brief): Six new incident cards for week of March 18-25: Claude Code /loop, Replit Series D, Lovable M&A, Devin Review + Windsurf Codemaps, Copilot JetBrains GA, OpenClaw supply chain attack. Numbers section updated. "What to Watch" expanded with MCP security, Lovable M&A, Replit ARR target.

March 7, 2026

Chapter 5 (Tools Landscape): Cursor updated to v2.6 (Automations, JetBrains support, MCP Apps). OpenAI Codex CLI updated for GPT-5.4 (native computer use, 1M token context). Claude Code updated with voice mode, $2.5B+ ARR, Pentagon supply-chain risk note. Added Kilo Code (open-source, 1.5M+ users). GitHub Copilot updated to 26M+ users with GPT-5 mini/GPT-4.1 included. Windsurf updated with Gemini 3.1 Pro and LogRocket #1 ranking.
Chapter 9 (Numbers): Claude Code ARR updated to $2.5B+. Copilot users updated to 26M+. Added Emergent AI ($50M ARR in 7 months), Cognition ($500M raise, $10B valuation, $82M+ ARR). Added developer sentiment section (84% use AI, only 3% high trust, 60% favorable view down from 70%+, 15% professional vibe coding adoption). Collins Dictionary Word of the Year updated for 2026.
Chapter 19 (Security Playbook): Added AI Tool Security Advisories section covering Claude Code CVEs (CVE-2025-59536 RCE, CVE-2026-21852 API key exfiltration) with actionable guidance on AI tool attack surfaces.
Chapter 21 (Intel Brief): Added GPT-5.4 launch (computer use, 1M tokens, financial tools). Added Pentagon/Anthropic conflict. Added Claude Code voice mode and CVE patches. Added Kilo Code launch. Added Qwen 3.5 (open weights, 74.1% LiveCodeBench). Updated Cursor to 2.6. Updated Cognition $500M raise. Added developer sentiment and Emergent AI stats. Expanded "What to Watch" with EU AI Act, Kilo Code growth, Pentagon resolution.

March 6, 2026

Chapter 21: Complete rewrite of Monthly Intelligence Brief for March 2026 — open source crisis, Gemini 3 in Jules, Cursor 2.5 subagents, Copilot multi-model access, Pega enterprise vibe coding, Opus 4.6 agent teams, Devin 2.2
Chapter 22: New March 2026 Spotlight: FleetTrack — B2B fleet management built by an operations analyst using Claude Code
Chapter 5: Updated tool references for Cline, Jules, and March 2026 landscape
Chapter 9: Updated GitHub Copilot stats (26M+ users), Devin metrics (67% PR merge rate, $10.2B valuation), Claude Code revenue ($2.5B+)
Landing page: Updated social proof stats, added Vibe Coding Academy cross-promotion section with UTM tracking
All chapters: Updated badges to March 6, 2026

March 1, 2026

Build System: Introduced automated build pipeline for chapter management and updates
Changelog: Added this changelog section — subscribers can now see exactly what changed and when
Per-Chapter Badges: Each chapter now shows its last-updated date
All Chapters: Initial release of all 22 chapters with 200+ prompts

February 2026

February 25, 2026

Initial release: All 22 chapters published
Chapter 1: The Moment Everything Changed — complete timeline from Karpathy's tweet to Opus 4.6
Chapter 5: Full tools landscape covering Cursor, Claude Code, Devin, Jules, Gemini CLI, Codex CLI
Chapter 10: Security analysis including Tenzai study and IDEsaster disclosure
Chapter 17: 200+ production-ready prompts across 10 categories
Chapter 18: Comprehensive tool comparison matrix
Chapter 19: The 30-minute security checklist for vibe-coded applications
Chapter 22: Community showcase with submission guidelines

April 21, 2026

Chapter 21: Monthly Intel Brief updated to version 1.7 — added two incident cards for April 15–21: Claude Opus 4.7 (87.6% SWE-bench Verified, April 18) and Azure MCP Server 2.0 stable release + OAuth 2.1 added to core MCP spec. Callout headline updated. Previous: April 15 — Vercel Vinext CVEs, GLM-5.1, Claude Code reliability cluster.
