
# Architecture

> How MCP, tools, resources, and widgets fit together in a ChatGPT app.

The patterns on this page apply to every implementation language and guide the decisions you'll make when you [set up the project](/guides/payments/chatgpt-app/setup) and [build the server](/guides/payments/chatgpt-app/mcp-server).

## How MCP works

The [Model Context Protocol](https://modelcontextprotocol.io) (MCP) is an open protocol that standardises how AI models discover and interact with external applications. Your server declares what it can do, and ChatGPT decides when and how to use those capabilities during a conversation.

MCP defines two core primitives:

* **Tools** — Actions ChatGPT can invoke. Each tool has a name, a description (so ChatGPT knows *when* to use it), an input schema (*what parameters* to pass), and a handler (your business logic).
* **Resources** — Content ChatGPT can present to the user. In a ChatGPT app, resources are HTML widgets embedded as iframes in the conversation, such as a product catalog, a shopping cart, and a checkout form.

When a user installs your app, ChatGPT connects to your MCP server, discovers the available tools and resources, and invokes them based on conversation context.
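
The four parts of a tool can be sketched as a plain object. This is an illustration only: the `ToolDefinition` interface, the `CATALOG` data, and the filter logic are hypothetical, and a real MCP SDK provides its own registration API and types.

```typescript
// Hypothetical shape for illustration; a real MCP SDK defines its own types.
interface ToolDefinition<Args, Result> {
  name: string;
  description: string; // tells ChatGPT *when* to use the tool
  inputSchema: Record<string, unknown>; // JSON Schema for the parameters
  handler: (args: Args) => Result; // your business logic
}

interface Product {
  id: string;
  name: string;
  sunlight: string;
}

// Stand-in for a product database (made-up data).
const CATALOG: Product[] = [
  { id: "p1", name: "Aloe Vera", sunlight: "Full Sun" },
  { id: "p2", name: "Boston Fern", sunlight: "Shade" },
];

const listProducts: ToolDefinition<{ sunlight?: string }, Product[]> = {
  name: "list_products",
  description: "List plants, optionally filtered by sunlight requirement.",
  inputSchema: {
    type: "object",
    properties: { sunlight: { type: "string" } },
  },
  // Return the whole catalog when no filter is given.
  handler: ({ sunlight }) =>
    sunlight ? CATALOG.filter((p) => p.sunlight === sunlight) : CATALOG,
};
```
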

Here is what a single interaction looks like from end to end:

<Steps>
  <Step title="User sends a message">
    "What plants would do best in the sunroom? There are several windows on three sides of the room."
  </Step>

  <Step title="ChatGPT selects a tool">
    ChatGPT reads your tool descriptions and decides to call `list_products(sunlight: "Full Sun")`.
  </Step>

  <Step title="Your MCP server returns a three-part response">
    * A text summary ("Showing 12 plants") for ChatGPT's conversational context
    * Structured data (the product array + filter options) for the widget to render
    * Metadata pointing to the product catalog widget
  </Step>

  <Step title="ChatGPT embeds the widget">
    ChatGPT fetches the product catalog resource (a self-contained HTML file) and embeds it as an iframe in the conversation.
  </Step>

  <Step title="The widget renders">
    The widget receives the structured data and renders an interactive product grid with the "Full Sun" filter pre-selected.
  </Step>

  <Step title="The user interacts">
    The user browses products and adds items to their cart—all within the widget, with no further round-trips to ChatGPT until the user triggers a new action.
  </Step>
</Steps>

## Anatomy of a ChatGPT MCP app

<Frame>
  <img src="https://mintcdn.com/gr4vy/-cAfR0gul8Wa25yX/assets/images/embed/chatgpt-app.svg?fit=max&auto=format&n=-cAfR0gul8Wa25yX&q=85&s=5fadace3f3036eecd7b7f419820c0de2" style={{width: "450px", height: "auto", display: "block", margin: "0 auto"}} alt="Diagram showing the architecture of the system" width="841" height="642" data-path="assets/images/embed/chatgpt-app.svg" />
</Frame>

* **Why tools and resources are separate.** Tools handle server-side logic such as querying your product database, validating prices, and calling the Gr4vy payment API. Resources, on the other hand, contain the client-side UI—the HTML, CSS, and JavaScript that ChatGPT renders in the conversation. When a tool runs, its return value includes both a text summary for ChatGPT's conversational context and structured data that ChatGPT forwards to a widget for display.
* **Why widgets must be self-contained HTML.** ChatGPT renders each widget inside an iframe using the `text/html+skybridge` MIME type. Because the iframe is sandboxed, widgets cannot make external network requests to fetch stylesheets, scripts, or images from your server. This means you must inline all assets—styles, scripts, and any images—into a single HTML file that the resource returns.
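
One way to meet the self-contained requirement is to assemble the widget from strings at build or startup time. The sketch below assumes a hypothetical `buildWidgetHtml` helper and a made-up resource URI; only the `text/html+skybridge` MIME type comes from the description above.

```typescript
interface WidgetParts {
  css: string;
  js: string;
  body: string;
}

// Inline styles and scripts so the iframe never has to fetch them.
// Note: the inlined script must not itself contain a closing script tag.
function buildWidgetHtml(parts: WidgetParts): string {
  return [
    "<!doctype html>",
    "<html>",
    "<head><style>" + parts.css + "</style></head>",
    "<body>",
    parts.body,
    "<script>" + parts.js + "</script>",
    "</body>",
    "</html>",
  ].join("\n");
}

const widgetResource = {
  uri: "ui://widget/product-catalog.html", // hypothetical URI
  mimeType: "text/html+skybridge",
  text: buildWidgetHtml({
    css: ".grid { display: grid; gap: 8px; }",
    js: "document.getElementById('root').textContent = 'Loading';",
    body: '<div id="root" class="grid"></div>',
  }),
};
```
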

## The three-part tool return value

Every tool returns three things:

1. **`content`** — A text summary for ChatGPT's conversation context. Keep this minimal (for example, "Showing 14 products") since the widget handles display.
2. **`structuredContent`** — A JSON object containing the data for the widget. ChatGPT forwards this to the widget as `window.openai.toolOutput`.
3. **`_meta`** — Metadata telling ChatGPT *which* widget to render. The key field is `openai/outputTemplate`, which contains the URI of the resource to embed.
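
As a rough sketch, a handler might assemble the three parts as follows. The `ToolResult` interface, the `buildListProductsResult` helper, and the widget URI are hypothetical; the `content` array of text items reflects the MCP tool result format as understood here, so check it against the SDK you use.

```typescript
interface ToolResult {
  content: { type: "text"; text: string }[];
  structuredContent: Record<string, unknown>;
  _meta: Record<string, unknown>;
}

interface ProductSummary {
  id: string;
  name: string;
}

function buildListProductsResult(
  products: ProductSummary[],
  widgetUri: string,
): ToolResult {
  return {
    // 1. Minimal text for ChatGPT's conversation context.
    content: [{ type: "text", text: `Showing ${products.length} products` }],
    // 2. Data the widget reads from window.openai.toolOutput.
    structuredContent: { products },
    // 3. Which widget resource to embed.
    _meta: { "openai/outputTemplate": widgetUri },
  };
}

const result = buildListProductsResult(
  [{ id: "p1", name: "Aloe Vera" }],
  "ui://widget/product-catalog.html", // hypothetical URI
);
```
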

## The widget lifecycle

Every widget follows the same lifecycle using the `window.openai` API inside the iframe:

1. **Receive initial data** — Wait for `window.openai.toolOutput` (the `structuredContent` from the tool).
2. **Restore persisted state** — Read `window.openai.widgetState` to restore state (for example, cart contents) from a previous render.
3. **Render the UI** — Display your app-specific design with the received data.
4. **Persist state on change** — Call `window.openai.setWidgetState(...)` so state survives re-renders.
5. **Listen for updates** — Listen for `ui/notifications/tool-result` messages delivered via `postMessage`, and re-render with the fresh data.
6. **Trigger the next step** — Call `window.openai.sendFollowUpMessage(...)` to advance the flow (for example, "Proceed to Checkout").
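
The lifecycle above can be sketched against a small interface standing in for `window.openai`. The `OpenAiBridge` typing, the data shapes, and the argument shapes passed to `setWidgetState` and `sendFollowUpMessage` are assumptions made for illustration. Wiring the `postMessage` listener for step 5 is left out; the sketch only exposes a method such a listener could call.

```typescript
// Hypothetical typing of the parts of window.openai used here.
interface Product { id: string; name: string }
interface CatalogOutput { products: Product[] }
interface CartState { cart: string[] }

interface OpenAiBridge {
  toolOutput: CatalogOutput | null;
  widgetState: CartState | null;
  setWidgetState(state: CartState): void;
  sendFollowUpMessage(message: { prompt: string }): void;
}

type Render = (view: { products: Product[]; cart: string[] }) => void;

function createCatalogWidget(bridge: OpenAiBridge, render: Render) {
  // 1. Receive initial data from the tool's structuredContent.
  let products = bridge.toolOutput?.products ?? [];
  // 2. Restore state persisted by a previous render.
  let cart = [...(bridge.widgetState?.cart ?? [])];
  // 3. Render the UI.
  render({ products, cart });

  return {
    addToCart(productId: string): void {
      cart = [...cart, productId];
      // 4. Persist state on change so it survives re-renders.
      bridge.setWidgetState({ cart });
      render({ products, cart });
    },
    // 5. Call this from a "message" listener when fresh tool output arrives.
    handleToolResult(output: CatalogOutput): void {
      products = output.products;
      render({ products, cart });
    },
    // 6. Ask ChatGPT to advance the flow.
    proceedToCheckout(): void {
      bridge.sendFollowUpMessage({ prompt: "Proceed to Checkout" });
    },
  };
}
```

Inside the iframe, the widget would pass `window.openai` as the bridge. Keeping the bridge as a parameter also lets you exercise the same logic with a mock outside ChatGPT.
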

## Server-side price validation

**Never trust prices, amounts, or cart contents that come in through a tool call.** The arguments your tools receive are assembled by ChatGPT from widget interactions and conversation context—the model could alter a value, or a compromised widget could send manipulated input.

The checkout tool must always look up canonical prices from your product catalog and pin them into the signed Gr4vy embed token. Gr4vy then rejects any checkout that does not match the pinned values.
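
A minimal sketch of the lookup step, assuming a hypothetical in-memory price table and a `priceCart` helper. Any price supplied by the caller is deliberately ignored. Creating the signed Gr4vy embed token from the resulting amount is not shown here.

```typescript
interface CartLine {
  productId: string;
  quantity: number;
  price?: number; // may arrive in the tool arguments; never used
}

interface PricedItem {
  productId: string;
  quantity: number;
  unitAmount: number; // in cents
}

// Hypothetical canonical prices, in cents. In practice, query your catalog.
const PRICES_IN_CENTS: Record<string, number> = {
  "plant-aloe": 1200,
  "plant-fern": 1850,
};

function priceCart(lines: CartLine[]): { amount: number; items: PricedItem[] } {
  const items = lines.map(({ productId, quantity }) => {
    const unitAmount = PRICES_IN_CENTS[productId];
    if (unitAmount === undefined) {
      throw new Error(`Unknown product: ${productId}`);
    }
    if (!Number.isInteger(quantity) || quantity <= 0) {
      throw new Error(`Invalid quantity for ${productId}`);
    }
    return { productId, quantity, unitAmount };
  });
  // The total is derived only from server-side prices.
  const amount = items.reduce((sum, i) => sum + i.unitAmount * i.quantity, 0);
  return { amount, items };
}
```

For example, a tool call that claims `price: 1` for two units of `plant-aloe` still yields an amount of 2400 cents, because the claimed price never enters the calculation.
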

## Next steps

Continue to [Set up the project](/guides/payments/chatgpt-app/setup).
