How MCP works
The Model Context Protocol (MCP) is an open protocol that standardizes how AI models discover and interact with external applications. Your server declares what it can do, and ChatGPT decides when and how to use those capabilities during a conversation. MCP defines two core primitives:
- Tools — Actions ChatGPT can invoke. Each tool has a name, a description (so ChatGPT knows when to use it), an input schema (what parameters to pass), and a handler (your business logic).
- Resources — Content ChatGPT can present to the user. In a ChatGPT app, resources are HTML widgets embedded as iframes in the conversation: in this example, a product catalog, a shopping cart, and a checkout form.
User sends a message
“What plants would do best in the sunroom? There are several windows on three sides of the room.”
ChatGPT selects a tool
ChatGPT reads your tool descriptions and decides to call `list_products(sunlight: "Full Sun")`.
Your MCP server returns a three-part response
- A text summary (“Showing 12 plants”) for ChatGPT’s conversational context
- Structured data (the product array + filter options) for the widget to render
- Metadata pointing to the product catalog widget
ChatGPT embeds the widget
ChatGPT fetches the product catalog resource (a self-contained HTML file) and embeds it as an iframe in the conversation.
The widget renders
The widget receives the structured data and renders an interactive product grid with the “Full Sun” filter pre-selected.
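As a sketch of the first two steps, a tool declaration might look like the following in TypeScript. Plain objects are used for illustration; the sample catalog, the `listProductsTool` name, and the handler body are all assumptions, and a real server would register the tool through an MCP SDK rather than define it standalone.

```typescript
type Product = { name: string; sunlight: string };

// Illustrative stand-in for a real product database.
const CATALOG: Product[] = [
  { name: "Fiddle-leaf fig", sunlight: "Full Sun" },
  { name: "Snake plant", sunlight: "Low Light" },
];

const listProductsTool = {
  name: "list_products",
  // The description is what ChatGPT reads when deciding whether to call this tool.
  description: "List plants in the catalog, optionally filtered by sunlight needs.",
  // The input schema tells ChatGPT what parameters it may pass.
  inputSchema: {
    type: "object",
    properties: {
      sunlight: { type: "string", enum: ["Full Sun", "Partial Shade", "Low Light"] },
    },
  },
  // The handler is your business logic: here, a simple filter over the catalog.
  handler: ({ sunlight }: { sunlight?: string }): Product[] =>
    sunlight ? CATALOG.filter((p) => p.sunlight === sunlight) : CATALOG,
};
```

Calling `listProductsTool.handler({ sunlight: "Full Sun" })` returns only the matching products, which is the data the server would then package into the three-part response described above.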
Anatomy of a ChatGPT MCP app
- Why tools and resources are separate. Tools handle server-side logic such as querying your product database, validating prices, and calling the Gr4vy payment API. Resources, on the other hand, contain the client-side UI—the HTML, CSS, and JavaScript that ChatGPT renders in the conversation. When a tool runs, its return value includes both a text summary for ChatGPT’s conversational context and structured data that ChatGPT forwards to a widget for display.
- Why widgets must be self-contained HTML. ChatGPT renders each widget inside an iframe using the `text/html+skybridge` MIME type. Because the iframe is sandboxed, widgets cannot make external network requests to fetch stylesheets, scripts, or images from your server. This means you must inline all assets—styles, scripts, and any images—into a single HTML file that the resource returns.
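As a sketch, a self-contained widget resource might look like the following. The URI and markup are illustrative assumptions; the point is that every style and script is inlined, with no external `src` or `href` references for the sandboxed iframe to fail on.

```typescript
// Illustrative resource object: a single HTML string with all assets inlined.
const productCatalogResource = {
  uri: "ui://widgets/product-catalog.html", // assumed URI scheme for illustration
  mimeType: "text/html+skybridge",
  text: `<!DOCTYPE html>
<html>
  <head>
    <!-- All CSS inlined: no <link rel="stylesheet"> to an external server. -->
    <style>.grid { display: grid; gap: 8px; }</style>
  </head>
  <body>
    <div id="root" class="grid"></div>
    <!-- All behavior inlined too; it reads data from window.openai.toolOutput. -->
    <script>/* widget rendering logic goes here */</script>
  </body>
</html>`,
};
```

The server returns this single string when ChatGPT fetches the resource; there is nothing else for the iframe to load.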
The three-part tool return value
Every tool returns three things:
- `content` — A text summary for ChatGPT’s conversation context. Keep this minimal (for example, “Showing 14 products”) since the widget handles display.
- `structuredContent` — A JSON object containing the data for the widget. ChatGPT forwards this to the widget as `window.openai.toolOutput`.
- `_meta` — Metadata telling ChatGPT which widget to render. The key field is `openai/outputTemplate`, which contains the URI of the resource to embed.
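Put together, the return value for the earlier `list_products` call might look like this sketch (the product data and resource URI are assumptions for illustration):

```typescript
const toolResult = {
  // 1. Minimal text summary for ChatGPT's conversational context.
  content: [{ type: "text", text: "Showing 12 plants" }],
  // 2. Structured data that ChatGPT forwards to the widget as
  //    window.openai.toolOutput.
  structuredContent: {
    products: [{ id: "p1", name: "Fiddle-leaf fig", sunlight: "Full Sun" }],
    filters: { sunlight: "Full Sun" },
  },
  // 3. Metadata telling ChatGPT which widget resource to embed.
  _meta: {
    "openai/outputTemplate": "ui://widgets/product-catalog.html",
  },
};
```

Note the division of labor: `content` is the only part ChatGPT reads as prose, so it stays short, while everything the widget needs to render lives in `structuredContent`.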
The widget lifecycle
Every widget follows the same lifecycle using the `window.openai` API inside the iframe:
- Receive initial data — Wait for `window.openai.toolOutput` (the `structuredContent` from the tool).
- Restore persisted state — Read `window.openai.widgetState` to restore state (for example, cart contents) from a previous render.
- Render the UI — Display your app-specific design with the received data.
- Persist state on change — Call `window.openai.setWidgetState(...)` so state survives re-renders.
- Listen for updates — Handle `ui/notifications/tool-result` messages via `postMessage` for fresh data.
- Trigger the next step — Call `window.openai.sendFollowUpMessage(...)` to advance the flow (for example, “Proceed to Checkout”).
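The lifecycle above can be sketched as ordinary TypeScript written against a minimal slice of the `window.openai` API. The `WidgetApi` interface and the stub at the bottom are assumptions for illustration (a real widget uses `window.openai` directly inside the iframe, and would also attach a `postMessage` listener for `ui/notifications/tool-result` updates):

```typescript
type CartState = { items: string[] };

// Assumed minimal shape of the window.openai surface used here.
interface WidgetApi {
  toolOutput: { products: { name: string }[] };
  widgetState: CartState | null;
  setWidgetState(state: CartState): void;
  sendFollowUpMessage(args: { prompt: string }): void;
}

function runWidget(openai: WidgetApi): string[] {
  // 1. Receive initial data: the tool's structuredContent.
  const products = openai.toolOutput.products;
  // 2. Restore persisted state (e.g. cart contents) from a previous render.
  const cart: CartState = openai.widgetState ?? { items: [] };
  // 3. Render the UI (here, just product names standing in for real DOM work).
  const rendered = products.map((p) => p.name);
  // 4. Persist state so it survives re-renders.
  openai.setWidgetState(cart);
  return rendered;
}

function onCheckoutClick(openai: WidgetApi): void {
  // 6. Trigger the next step in the conversation flow.
  openai.sendFollowUpMessage({ prompt: "Proceed to checkout" });
}

// Stub used only to exercise the sketch outside an iframe.
const stub: WidgetApi = {
  toolOutput: { products: [{ name: "Snake plant" }] },
  widgetState: null,
  setWidgetState: () => {},
  sendFollowUpMessage: () => {},
};
```

With the stub, `runWidget(stub)` yields the single rendered name, and `onCheckoutClick(stub)` shows where a follow-up message would hand control back to ChatGPT.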