How MCP works
The Model Context Protocol (MCP) is an open protocol that standardizes how AI models discover and interact with external applications. Your server declares what it can do, and ChatGPT decides when and how to use those capabilities during a conversation. MCP defines two core primitives:
- Tools — Actions ChatGPT can invoke. Each tool has a name, a description (so ChatGPT knows when to use it), an input schema (what parameters to pass), and a handler (your business logic).
- Resources — Content ChatGPT can present to the user. In a ChatGPT app, resources are HTML widgets embedded as iframes in the conversation: in this example, a product catalog, a shopping cart, and a checkout form.
User sends a message
“What plants would do best in the sunroom? There are several windows on three sides of the room.”
ChatGPT selects a tool
ChatGPT reads your tool descriptions and decides to call `list_products(sunlight: "Full Sun")`.
Your MCP server returns a three-part response
- A text summary (“Showing 12 plants”) for ChatGPT’s conversational context
- Structured data (the product array + filter options) for the widget to render
- Metadata pointing to the product catalog widget
ChatGPT embeds the widget
ChatGPT fetches the product catalog resource (a self-contained HTML file) and embeds it as an iframe in the conversation.
The widget renders
The widget receives the structured data and renders an interactive product grid with the “Full Sun” filter pre-selected.
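As a sketch of the first two steps, a tool declaration might look like the following in TypeScript. Plain objects are used for illustration; the sample catalog, the `listProductsTool` name, and the handler body are all assumptions, and a real server would register the tool through an MCP SDK rather than define it standalone.

```typescript
type Product = { name: string; sunlight: string };

// Illustrative stand-in for a real product database.
const CATALOG: Product[] = [
  { name: "Fiddle-leaf fig", sunlight: "Full Sun" },
  { name: "Snake plant", sunlight: "Low Light" },
];

const listProductsTool = {
  name: "list_products",
  // The description is what ChatGPT reads when deciding whether to call this tool.
  description: "List plants in the catalog, optionally filtered by sunlight needs.",
  // The input schema tells ChatGPT what parameters it may pass.
  inputSchema: {
    type: "object",
    properties: {
      sunlight: { type: "string", enum: ["Full Sun", "Partial Shade", "Low Light"] },
    },
  },
  // The handler is your business logic: here, a simple filter over the catalog.
  handler: ({ sunlight }: { sunlight?: string }): Product[] =>
    sunlight ? CATALOG.filter((p) => p.sunlight === sunlight) : CATALOG,
};
```

Calling `listProductsTool.handler({ sunlight: "Full Sun" })` returns only the matching products, which is the data the server would then package into the three-part response described above.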
Anatomy of a ChatGPT MCP app
- Why tools and resources are separate. Tools handle server-side logic such as querying your product database, validating prices, and calling the Gr4vy payment API. Resources, on the other hand, contain the client-side UI—the HTML, CSS, and JavaScript that ChatGPT renders in the conversation. When a tool runs, its return value includes both a text summary for ChatGPT’s conversational context and structured data that ChatGPT forwards to a widget for display.
- Why widgets must be self-contained HTML. ChatGPT renders each widget inside an iframe using the `text/html+skybridge` MIME type. Because the iframe is sandboxed, widgets cannot make external network requests to fetch stylesheets, scripts, or images from your server. This means you must inline all assets—styles, scripts, and any images—into a single HTML file that the resource returns.
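As a sketch, a self-contained widget resource might look like the following. The URI and markup are illustrative assumptions; the point is that every style and script is inlined, with no external `src` or `href` references for the sandboxed iframe to fail on.

```typescript
// Illustrative resource object: a single HTML string with all assets inlined.
const productCatalogResource = {
  uri: "ui://widgets/product-catalog.html", // assumed URI scheme for illustration
  mimeType: "text/html+skybridge",
  text: `<!DOCTYPE html>
<html>
  <head>
    <!-- All CSS inlined: no <link rel="stylesheet"> to an external server. -->
    <style>.grid { display: grid; gap: 8px; }</style>
  </head>
  <body>
    <div id="root" class="grid"></div>
    <!-- All behavior inlined too; it reads data from window.openai.toolOutput. -->
    <script>/* widget rendering logic goes here */</script>
  </body>
</html>`,
};
```

The server returns this single string when ChatGPT fetches the resource; there is nothing else for the iframe to load.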
The three-part tool return value
Every tool returns three things:
- `content` — A text summary for ChatGPT’s conversation context. Keep this minimal (for example, “Showing 14 products”) since the widget handles display.
- `structuredContent` — A JSON object containing the data for the widget. ChatGPT forwards this to the widget as `window.openai.toolOutput`.
- `_meta` — Metadata telling ChatGPT which widget to render. The key field is `openai/outputTemplate`, which contains the URI of the resource to embed.
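Put together, the return value for the earlier `list_products` call might look like this sketch (the product data and resource URI are assumptions for illustration):

```typescript
const toolResult = {
  // 1. Minimal text summary for ChatGPT's conversational context.
  content: [{ type: "text", text: "Showing 12 plants" }],
  // 2. Structured data that ChatGPT forwards to the widget as
  //    window.openai.toolOutput.
  structuredContent: {
    products: [{ id: "p1", name: "Fiddle-leaf fig", sunlight: "Full Sun" }],
    filters: { sunlight: "Full Sun" },
  },
  // 3. Metadata telling ChatGPT which widget resource to embed.
  _meta: {
    "openai/outputTemplate": "ui://widgets/product-catalog.html",
  },
};
```

Note the division of labor: `content` is the only part ChatGPT reads as prose, so it stays short, while everything the widget needs to render lives in `structuredContent`.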
The widget lifecycle
Every widget follows the same lifecycle using the `window.openai` API inside the iframe:
- Receive initial data — Wait for `window.openai.toolOutput` (the `structuredContent` from the tool).
- Restore persisted state — Read `window.openai.widgetState` to restore state (for example, cart contents) from a previous render.
- Render the UI — Display your app-specific design with the received data.
- Persist state on change — Call `window.openai.setWidgetState(...)` so state survives re-renders.
- Listen for updates — Handle `ui/notifications/tool-result` messages via `postMessage` for fresh data.
- Trigger the next step — Call `window.openai.sendFollowUpMessage(...)` to advance the flow (for example, “Proceed to Checkout”).
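The lifecycle above can be sketched as ordinary TypeScript written against a minimal slice of the `window.openai` API. The `WidgetApi` interface and the stub at the bottom are assumptions for illustration (a real widget uses `window.openai` directly inside the iframe, and would also attach a `postMessage` listener for `ui/notifications/tool-result` updates):

```typescript
type CartState = { items: string[] };

// Assumed minimal shape of the window.openai surface used here.
interface WidgetApi {
  toolOutput: { products: { name: string }[] };
  widgetState: CartState | null;
  setWidgetState(state: CartState): void;
  sendFollowUpMessage(args: { prompt: string }): void;
}

function runWidget(openai: WidgetApi): string[] {
  // 1. Receive initial data: the tool's structuredContent.
  const products = openai.toolOutput.products;
  // 2. Restore persisted state (e.g. cart contents) from a previous render.
  const cart: CartState = openai.widgetState ?? { items: [] };
  // 3. Render the UI (here, just product names standing in for real DOM work).
  const rendered = products.map((p) => p.name);
  // 4. Persist state so it survives re-renders.
  openai.setWidgetState(cart);
  return rendered;
}

function onCheckoutClick(openai: WidgetApi): void {
  // 6. Trigger the next step in the conversation flow.
  openai.sendFollowUpMessage({ prompt: "Proceed to checkout" });
}

// Stub used only to exercise the sketch outside an iframe.
const stub: WidgetApi = {
  toolOutput: { products: [{ name: "Snake plant" }] },
  widgetState: null,
  setWidgetState: () => {},
  sendFollowUpMessage: () => {},
};
```

With the stub, `runWidget(stub)` yields the single rendered name, and `onCheckoutClick(stub)` shows where a follow-up message would hand control back to ChatGPT.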