Building a Multi-Agent Research System with Strands Agents TypeScript SDK and Bedrock AgentCore


Stewart Moreland

Scope and Prerequisites

This guide is a deep, implementation-focused tutorial for building a production-oriented multi-agent research system where:

  • Agents are implemented in TypeScript using the Strands Agents TypeScript SDK (with multi-agent orchestration patterns such as "Agents-as-Tools") [1]
  • The system deploys into Amazon Bedrock AgentCore Runtime and integrates "adjacent" AgentCore services: Gateway, Memory, Identity, Policy, Browser, Code Interpreter, Observability, and Evaluations [2]
  • The repo is a Turborepo monorepo using Yarn workspaces, and includes:
    • TypeScript agent runtime service (multi-agent orchestration + streaming)
    • AWS CDK infrastructure (TypeScript) provisioning Runtime, Memory, Gateway, built-in tools, and auth
    • A React frontend that shows a chat UI and a visual "thinking" timeline driven by streaming events [3]

Baseline Prerequisites (Practical Minimums)

  • Node.js 20+ (required by AgentCore TypeScript quickstart and the AgentCore TypeScript SDK README) [5]
  • AWS access configured locally (AWS CLI credentials) and Bedrock model access enabled where needed [6]
  • Yarn workspaces and a Turborepo pipeline configuration (with declared "outputs" for caching) [7]
  • AWS CDK v2; AgentCore resources can be provisioned with L1 CloudFormation constructs (aws-cdk-lib/aws-bedrockagentcore) or the alpha L2 module (@aws-cdk/aws-bedrock-agentcore-alpha) [8]

Architecture and Service Mapping

Reference Architecture

The system is easiest to reason about if you treat AgentCore as two planes:

  • Control plane (create/update resources): CDK/CloudFormation and/or bedrock-agentcore-control API calls create Runtime, Memory, Gateway, built-in tools, and Identity configuration [9]
  • Data plane (run-time interactions): the agent runtime receives /invocations (SSE) calls, uses Memory for recall + logging events, calls Gateway MCP tools, and may start Browser/Code Interpreter sessions [10]

Below is a practical end-to-end flow that keeps the UI simple while still supporting Identity + policy enforcement in a real system:

[Diagram: end-to-end request flow (not rendered in this text version)]

"Thinking" Visualization Strategy

Strands streaming gives you a highly usable "thinking" surface without exposing sensitive internal reasoning:

  • Lifecycle events and tool events let you show when the model is planning, calling tools, and receiving results [3]
  • Model streaming events include a distinct channel for reasoning deltas (when supported by the underlying model/provider), and the SDK explicitly models "reasoning blocks" and "redactedContent" [11]

A robust UI pattern is:

  • Always render tool/lifecycle timeline (safe, deterministic)
  • Optionally render model "reasoning" stream behind a toggle, and be prepared for redaction (store it only transiently and don't persist it to memory)
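
The two rules above can be sketched as a pair of filters. This is a minimal illustration, assuming the UiEvent union this guide defines later in packages/shared/src/events.ts; the helper names visibleEvents and persistableEvents are hypothetical and not part of the repo.

```typescript
// Mirrors the UiEvent union from packages/shared/src/events.ts.
type UiEvent =
  | { type: "meta"; sessionId: string }
  | { type: "run.start" }
  | { type: "message.delta"; text: string }
  | { type: "thinking.delta"; text: string }
  | { type: "tool.start"; toolName: string; input: unknown }
  | { type: "tool.end"; toolName: string; output: unknown }
  | { type: "message.done" }
  | { type: "error"; message: string };

// Hypothetical helper: what the UI renders.
// Tool and lifecycle events always pass; reasoning only when the toggle is on.
function visibleEvents(events: UiEvent[], showReasoning: boolean): UiEvent[] {
  return events.filter((e) => showReasoning || e.type !== "thinking.delta");
}

// Hypothetical helper: what may be written to durable storage.
// Reasoning deltas are dropped regardless of the toggle.
function persistableEvents(events: UiEvent[]): UiEvent[] {
  return events.filter((e) => e.type !== "thinking.delta");
}
```

Keeping the persistence filter separate from the display filter means a user who turns the toggle on still never causes reasoning text to be stored.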

Comparison Table of AgentCore Services for a TypeScript Multi-Agent System

| AgentCore Capability | What It Does | Where It Fits in a TypeScript Strands System | TypeScript Integration Notes |
|---|---|---|---|
| Runtime | Runs your agent as a managed, session-isolated runtime and exposes /invocations (SSE) | Hosts the orchestrator + specialists and streams events to UI | Use BedrockAgentCoreApp for a compliant server, or implement the HTTP contract yourself |
| Gateway | Converts APIs/Lambda/OpenAPI/Smithy targets into MCP tools behind a single endpoint | Gives agents a unified "tool catalog" and tool execution surface | CDK supports MCP protocol config + auth; agents use a wrapper or Strands McpClient (Streamable HTTP) to call tools [12] |
| Memory | Short-term events + long-term extracted records for cross-session personalization | Add recall context, store conversation events, and support long-term learning | Use CreateEvent for event logging; RetrieveMemoryRecords for search; ListSessions / ListEvents for session list and history [21] |
| Identity | Workload identities + token vault for OAuth2/API keys; supports inbound and outbound auth flows | Securely fetch third-party tokens (3LO/2LO), bind tokens to user sessions, and protect inbound endpoints | Runtime supports either SigV4 or JWT inbound auth (not both) |
| Policy | Policy engine that authorizes/denies tool access (often attached to a Gateway) | Enforces least-privilege tool usage at runtime | Enforcement modes include LOG_ONLY for dry run and ENFORCE to block |
| Browser tool | Managed browser sessions for web interaction, scraping, and workflow automation | Used by specialist agents (web research / form filling) | Browser sessions can be controlled via WebSocket streaming APIs |
| Code Interpreter tool | Secure sandbox code execution for analysis, parsing, computation | Used by analyst agents (data transforms, charting, extraction) | Runs in isolated environments; CDK supports public/sandbox/VPC modes |
| Observability | OTel traces/logs + CloudWatch GenAI dashboards for sessions/traces/spans | Debugging and monitoring, plus the substrate for Evaluations | Enable CloudWatch Transaction Search; instrument your code for deep tracing |
| Evaluations | LLM-as-judge scoring from traces/spans (online + on-demand) | Continuous QA and regression monitoring over real traffic | Evaluations integrate with Strands/LangGraph through OTel/OpenInference |

Turborepo and Yarn Workspaces Monorepo Blueprint

Repository Layout

Use an "apps + packages" structure that cleanly separates deployment units from shared libraries:

repo/
  apps/
    agent/    # AgentCore Runtime container (TypeScript, Strands)
    web/      # React UI (chat + thinking timeline)
  packages/
    infra/    # AWS CDK app (TypeScript)
    shared/   # Shared types: streaming events, API contracts
    ui/       # Shared React components (timeline, chat, accordion)
  turbo.json
  package.json
  tsconfig.base.json
  .yarnrc.yml

This aligns with Turborepo's task graph model (workspaces are nodes; tasks run per package), and lets you cache build outputs per package [7].

Root package.json and Turbo Caching Basics

Define:

  • workspaces for Yarn
  • packageManager to stabilize Turborepo's lockfile expectations
  • root scripts that delegate to Turbo (not heavy logic in root scripts) [13]

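Turborepo caches the files matched by each task's outputs in turbo.json. A task that declares no outputs has only its logs cached, so no build artifacts are restored on a cache hit.

Example turbo.json (minimal but effective pattern). Note that Turborepo 2.x renames the top-level "pipeline" key to "tasks":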

json
{
  "$schema": "https://turbo.build/schema.json",
  "pipeline": {
    "build": { "dependsOn": ["^build"], "outputs": ["dist/**", "build/**"] },
    "lint": { "outputs": [] },
    "test": { "dependsOn": ["build"], "outputs": [] },
    "dev": { "cache": false }
  }
}

Yarn Workspaces Linker Choice

Yarn supports multiple install/linking strategies; modern Yarn defaults to Plug'n'Play (PnP), but node-modules linker is also a stable choice when tooling compatibility matters [14].

For a developer tutorial repo, many teams choose:

  • nodeLinker: node-modules to reduce friction with CDK tooling, bundlers, and native deps, or
  • PnP if your organization is already standardized on it

Agent Implementation in TypeScript

Multi-Agent Orchestration Pattern: Agents-as-Tools

Strands documents "Agents as Tools" as a delegation pattern:

  • An orchestrator agent decides which specialist agent to invoke (each specialist is wrapped as a callable tool) [1]

Strands TypeScript also provides a tool() helper that validates input against Zod and generates JSON schema for the model/tooling layer [15].

In TypeScript, the pattern usually looks like:

  1. Build specialist agents (researcher, analyst, writer) with tailored tools
  2. Wrap each specialist in a Strands tool(...) so the orchestrator can call them
  3. Stream orchestrator events to the UI
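
The three steps can be sketched without any SDK. The block below is a library-free illustration of the delegation shape only: none of the names (Specialist, asTool, orchestrate) are Strands APIs, the specialists are stubs, and the plan of tool calls is fixed, whereas in the real pattern the orchestrator's model chooses which tool to call.

```typescript
// Library-free sketch of Agents-as-Tools. All names here are hypothetical.
type Specialist = (task: string) => Promise<string>;

interface SpecialistTool {
  name: string;
  description: string;
  run: Specialist;
}

type OrchestratorEvent =
  | { type: "tool.start"; toolName: string }
  | { type: "tool.end"; toolName: string; output: string };

// Step 1: specialists. Stubs stand in for real model-backed agents.
const researcher: Specialist = async (task) => `notes for: ${task}`;
const writer: Specialist = async (task) => `draft based on: ${task}`;

// Step 2: wrap each specialist as a named tool the orchestrator can call.
function asTool(name: string, description: string, run: Specialist): SpecialistTool {
  return { name, description, run };
}

const specialistTools = new Map<string, SpecialistTool>(
  [
    asTool("research_specialist", "Collects sourced notes", researcher),
    asTool("writing_specialist", "Turns notes into prose", writer),
  ].map((t): [string, SpecialistTool] => [t.name, t]),
);

// Step 3: run the tool calls and stream an event before and after each one.
// Each specialist's output becomes the next specialist's input.
async function* orchestrate(prompt: string, plan: string[]): AsyncGenerator<OrchestratorEvent> {
  let current = prompt;
  for (const toolName of plan) {
    const tool = specialistTools.get(toolName);
    if (!tool) throw new Error(`Unknown tool: ${toolName}`);
    yield { type: "tool.start", toolName };
    current = await tool.run(current);
    yield { type: "tool.end", toolName, output: current };
  }
}
```

A caller consumes it with `for await (const event of orchestrate(prompt, plan))`, which is the same consumption shape the runtime server uses when forwarding events to the UI.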

Runtime Server: Two Approaches

You can either use the BedrockAgentCoreApp wrapper from the AgentCore TypeScript SDK [4] or implement the HTTP protocol contract yourself. This repo implements the contract manually with a plain Node.js HTTP server so you have full control over routing, auth, and session endpoints.

Required endpoints per [10]:

  • POST /invocations — JSON input (prompt, optional sessionId, userId); response is either JSON (non-streaming) or SSE (Accept: text/event-stream)
  • GET /ping — Health check; return { status: "Healthy", time_of_last_update } (Unix timestamp)
  • GET /ws — Optional; for bidirectional WebSocket streaming

This repo also exposes GET /sessions and GET /sessions/:id/events for the web app to list chat sessions and load conversation history; both require Authorization: Bearer <JWT> and use the JWT sub as actorId for AgentCore Memory.

apps/agent/src/server.ts (excerpt)
import type { UiEvent } from "@repo/shared/events";
import { context, propagation } from "@opentelemetry/api";
import { createServer, IncomingMessage, ServerResponse } from "node:http";
import { orchestrator } from "./orchestrator";

// parseBody and requestHandler are defined elsewhere in this file (not shown in the excerpt).

const PORT = parseInt(process.env.PORT || "8080", 10);
const HOST = process.env.HOST || "0.0.0.0";

// Decodes the JWT payload WITHOUT verifying the signature (minimal implementation).
function getActorIdFromAuth(req: IncomingMessage): string | null {
  const auth = req.headers.authorization;
  if (!auth?.startsWith("Bearer ")) return null;
  const token = auth.slice(7).trim();
  try {
    // JWT segments are base64url; map to standard base64, then decode to a UTF-8 string.
    const payload = JSON.parse(
      Buffer.from(token.split(".")[1].replace(/-/g, "+").replace(/_/g, "/"), "base64").toString("utf8")
    ) as { sub?: string };
    return typeof payload.sub === "string" ? payload.sub : null;
  } catch {
    return null;
  }
}

// Writes one SSE frame: a single data line followed by a blank line.
function sendEvent(res: ServerResponse, event: UiEvent): void {
  res.write(`data: ${JSON.stringify(event)}\n\n`);
}

async function handleInvocations(req: IncomingMessage, res: ServerResponse): Promise<void> {
  const body = await parseBody(req);
  const { prompt, sessionId, userId } = body;
  const currentSessionId = sessionId || crypto.randomUUID();

  if (req.headers.accept?.includes("text/event-stream")) {
    // Streaming response: forward orchestrator events as SSE.
    res.writeHead(200, { "Content-Type": "text/event-stream", ... });
    sendEvent(res, { type: "meta", sessionId: currentSessionId });
    // Attach session.id as OTEL baggage so downstream spans carry the session.
    const sessionBaggage = propagation.createBaggage({ "session.id": { value: currentSessionId } });
    await context.with(propagation.setBaggage(context.active(), sessionBaggage), async () => {
      for await (const event of orchestrator.stream(prompt, { sessionId: currentSessionId, userId })) {
        sendEvent(res, event);
      }
    });
    sendEvent(res, { type: "message.done" });
    res.end(); // close the stream so the client's read loop terminates
  } else {
    // Non-streaming response: a single JSON document.
    const result = await context.with(..., () =>
      orchestrator.invoke(prompt, { sessionId: currentSessionId, userId }));
    res.end(JSON.stringify({ result }));
  }
}

function handlePing(res: ServerResponse): void {
  res.writeHead(200, { "Content-Type": "application/json" });
  res.end(JSON.stringify({
    status: "Healthy",
    time_of_last_update: Math.floor(Date.now() / 1000), // Unix timestamp in seconds
  }));
}

const server = createServer(requestHandler);
server.listen(PORT, HOST, () => { ... });

The orchestrator already yields UiEvent (e.g. message.delta, thinking.delta, tool.start, tool.end, message.done, error), so the server simply forwards them as SSE. Setting OTEL baggage session.id before calling the orchestrator ensures spans carry the session for AgentCore Observability and Evaluations [6].

Why this mapping works:

  • Strands defines a unified AgentStreamEvent union; the orchestrator maps it to UiEvent in mapStrandsEventToUiEvent [16]
  • AgentCore /invocations supports both JSON and SSE response formats [10]
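
A translation layer of this kind can be sketched as a single exhaustive switch. The input type below is a deliberately simplified, hypothetical stand-in: the real AgentStreamEvent union in the Strands SDK uses different names and fields, and this is not the repo's mapStrandsEventToUiEvent. Only the output side follows the UiEvent shapes used in this guide.

```typescript
// Hypothetical, simplified stand-ins for SDK stream events (NOT the SDK's types).
type SdkLikeEvent =
  | { kind: "textDelta"; text: string }
  | { kind: "reasoningDelta"; text: string }
  | { kind: "beforeToolCall"; toolName: string; input: unknown }
  | { kind: "afterToolCall"; toolName: string; output: unknown }
  | { kind: "unhandled" };

// Subset of the UiEvent union that this mapping can produce.
type MappedUiEvent =
  | { type: "message.delta"; text: string }
  | { type: "thinking.delta"; text: string }
  | { type: "tool.start"; toolName: string; input: unknown }
  | { type: "tool.end"; toolName: string; output: unknown };

// Returns null for events the UI does not need, so callers can skip them.
function toUiEvent(event: SdkLikeEvent): MappedUiEvent | null {
  switch (event.kind) {
    case "textDelta":
      return { type: "message.delta", text: event.text };
    case "reasoningDelta":
      return { type: "thinking.delta", text: event.text };
    case "beforeToolCall":
      return { type: "tool.start", toolName: event.toolName, input: event.input };
    case "afterToolCall":
      return { type: "tool.end", toolName: event.toolName, output: event.output };
    case "unhandled":
      return null;
  }
}
```

Because the switch is exhaustive over a discriminated union, adding a new input variant without handling it becomes a compile-time error instead of a silently dropped event.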

Specialist Agents and Tools

Specialist Agents

A common research system split:

  • Researcher agent: Browser + Gateway tools (collect sources, browse pages, call internal APIs)
  • Analyst agent: Code Interpreter (extract tables, compute, transform)
  • Writer agent: synthesis + formatting, minimal tools

A Strands Agent orchestrates a model, tools, and MCP clients [17].

Wrapping Agents as Tools

This repo defines specialists in apps/agent/src/specialists/: index.ts re-exports researchTool, analysisTool, and writingTool from research.ts, analysis.ts, and writing.ts. The orchestrator in apps/agent/src/orchestrator.ts uses a single Strands Agent with these tools, logs and recalls via the memory adapter, and maps agent.stream() output to UiEvent (e.g. message.delta, thinking.delta, tool.start, tool.end, message.done, error) in mapStrandsEventToUiEvent.

apps/agent/src/specialists/index.ts
export { analysisTool } from "./analysis";
export { researchTool } from "./research";
export { writingTool } from "./writing";

apps/agent/src/specialists/research.ts (tool definition)
import { Agent, BedrockModel, tool } from "@strands-agents/sdk";
import { z } from "zod";

// getResearchAgent and getFallbackResearch are defined elsewhere in this file (not shown).

export const researchTool = tool({
  name: "research_specialist",
  description: "Web research specialist: browse web, extract information, return sourced notes.",
  inputSchema: z.object({
    task: z.string(),
    urls: z.array(z.string().url()).optional(),
    context: z.string().optional(),
  }),
  callback: async ({ task, urls, context }) => {
    // Combine the tool input into a single prompt for the specialist agent.
    const prompt = [
      task,
      urls?.length ? `URLs: ${urls.join(", ")}` : "",
      context ? `Context: ${context}` : "",
    ].filter(Boolean).join("\n");
    const agent = await getResearchAgent(); // optional AgentCore Browser; fallback if unavailable
    return agent ? String(await agent.invoke(prompt)) : getFallbackResearch(task, urls, context);
  },
});

The research specialist optionally uses AgentCore Browser when configured; otherwise it returns simulated research. The analysis and writing specialists follow the same pattern. This is the TypeScript analogue of the "Agents-as-Tools" delegation described in Strands docs.

Integrating AgentCore Gateway via MCP

AgentCore Gateway exposes an MCP endpoint; tool invocation is standardized as JSON-RPC tools/call to /mcp [12].

This repo uses a wrapper class apps/agent/src/gatewayClient.ts (GatewayMcpClient) that supports listTools, callTool, and searchTools. When AGENTCORE_GATEWAY_URL is not set (e.g. local dev), it returns simulated tools so the agent can run without a deployed Gateway. In production, wire the Gateway URL and use an MCP client with Streamable HTTP transport (e.g. @modelcontextprotocol/sdk StreamableHTTPClientTransport with Authorization: Bearer <token>) to call tools [18].

apps/agent/src/gatewayClient.ts (pattern)
class GatewayMcpClient {
  private readonly gatewayUrl: string;

  constructor(config?: { gatewayUrl?: string; bearerToken?: string }) {
    // Explicit config wins over the environment; an empty URL means "simulate".
    this.gatewayUrl = config?.gatewayUrl || process.env.AGENTCORE_GATEWAY_URL || "";
    if (!this.gatewayUrl) {
      console.log("[GatewayClient] No Gateway URL configured, tool calls will be simulated");
    }
  }

  async listTools(): Promise<McpToolDefinition[]> { ... }
  async callTool(toolName: string, args: unknown): Promise<ToolCallResult> { ... }
}

export const gatewayClient = new GatewayMcpClient();
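
For orientation, the message an MCP client sends for tools/call is a plain JSON-RPC 2.0 request. The sketch below builds that envelope and the request headers; the helper names are hypothetical, the tool name "search" is only illustrative, and in practice an MCP client library constructs these messages for you.

```typescript
// JSON-RPC 2.0 request envelope as used by MCP.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: Record<string, unknown>;
}

// Hypothetical helper: build a tools/call request for a named tool.
function buildToolsCallRequest(
  id: number,
  toolName: string,
  args: Record<string, unknown>,
): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// Hypothetical helper: headers for a Streamable HTTP POST.
// The client advertises both JSON and SSE so the server may answer with either.
function buildMcpHeaders(bearerToken?: string): Record<string, string> {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    Accept: "application/json, text/event-stream",
  };
  if (bearerToken) headers["Authorization"] = `Bearer ${bearerToken}`;
  return headers;
}
```

The serialized request is what gets POSTed to the Gateway's /mcp endpoint; the response carries the same id so calls can be correlated.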

Integrating AgentCore Memory

The Memory Model You Should Implement

AgentCore Memory has:

  • short-term events (immutable) logged under actorId and sessionId [19]
  • long-term extracted records, retrieved with RetrieveMemoryRecords from a namespace with search criteria [20]

Long-term extraction is driven by configured strategies and is processed asynchronously; you do not insert "long-term memories" directly.

"Dual-Layer" Memory Adapter for Strands TypeScript

Because Strands session management is not yet supported in TypeScript, you should explicitly bridge:

  • UI session/user identity → actorId
  • UI conversation thread → sessionId
  • Each user/assistant exchange → CreateEvent
  • Each new request → RetrieveMemoryRecords and inject "recall context" into the system prompt or pre-message context

This repo's apps/agent/src/memoryAdapter.ts reads config from env (AGENTCORE_MEMORY_ID, AGENTCORE_MEMORY_NAMESPACE). When no memory ID is set, it uses a local in-memory fallback so you can run and test without deploying AgentCore Memory [19].

apps/agent/src/memoryAdapter.ts (pattern)
import {
  BedrockAgentCoreClient,
  CreateEventCommand,
  ListEventsCommand,
  ListSessionsCommand,
  RetrieveMemoryRecordsCommand,
} from "@aws-sdk/client-bedrock-agentcore";

class AgentCoreMemoryAdapter {
  private readonly memoryId: string;
  private readonly useLocalFallback: boolean;

  constructor(config?: { memoryId?: string; namespace?: string }) {
    this.memoryId = config?.memoryId || process.env.AGENTCORE_MEMORY_ID || "";
    // Without a memory ID, fall back to the local in-memory store.
    this.useLocalFallback = !this.memoryId;
  }

  // Short-term: append one immutable event per user/assistant message.
  async logConversationEvent(actorId: string, sessionId: string, role: "user"|"assistant", content: string) { ... }
  // Long-term: search extracted records for context relevant to the query.
  async recall(actorId: string, query: string, topK = 5): Promise<string[]> { ... }
  async listSessions(actorId: string): Promise<{ sessionId: string; actorId: string; createdAt: Date }[]> { ... }
  async listEvents(actorId: string, sessionId: string, options?: { maxResults?: number; includePayloads?: boolean }) { ... }
}

export const memoryAdapter = new AgentCoreMemoryAdapter();

The server uses listSessions and listEvents to power GET /sessions and GET /sessions/:id/events for the web app's session list and conversation history.
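
The "inject recall context" step can be as small as one pure function. This is a sketch under assumptions: buildSystemPrompt is a hypothetical helper (not from the repo), and it takes the string[] that the adapter's recall method returns; the section wording it adds to the prompt is arbitrary.

```typescript
// Hypothetical helper: append recalled memory records to the base system prompt.
// Blank records are dropped and the list is capped so the prompt stays bounded.
function buildSystemPrompt(basePrompt: string, recalled: string[], maxItems = 5): string {
  const items = recalled
    .map((record) => record.trim())
    .filter((record) => record.length > 0)
    .slice(0, maxItems);
  if (items.length === 0) return basePrompt; // nothing recalled: prompt is unchanged
  const bullets = items.map((record) => `- ${record}`).join("\n");
  return `${basePrompt}\n\nRelevant context recalled from earlier sessions:\n${bullets}`;
}
```

A request handler would call recall for the incoming prompt, pass the result through this function, and only then invoke the orchestrator, so a failed or empty recall degrades to the plain base prompt.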

Integrating Browser and Code Interpreter Tools

AgentCore built-in tools are explicitly designed to execute in isolated environments; for example, Browser sessions are session-based and can be interacted with programmatically via WebSocket streaming APIs [22].

On the TypeScript side, the AgentCore SDK README shows Strands integrations for Code Interpreter tools:

typescript
import { CodeInterpreterTools } from 'bedrock-agentcore/experimental/code-interpreter/strands'
const codeInterpreter = new CodeInterpreterTools({ region: 'us-east-1' })

For the Browser tool, plan on a "researcher" specialist that:

  • starts a browser session
  • navigates/extracts content
  • emits tool progress events back to the UI (timeline)

Identity and Authentication Patterns That Work with a Web Frontend

AgentCore Runtime supports two inbound auth patterns:

  • IAM SigV4 (default), and
  • JWT Bearer token (configured via an OIDC discovery URL and allowed clients/audiences/scopes) [23]

Cognito in This Repo

This repo uses Amazon Cognito for web app authentication. The stack provisions a User Pool and User Pool Client (no client secret, for SPA) in packages/infra/lib/stack.ts, with callback URLs (e.g. http://localhost:5173/, /auth/callback), token validity, and optional MFA. The web app sends the Cognito ID token to the runtime in the Authorization: Bearer <token> header. In the minimal implementation the server only decodes the JWT and does not verify its signature, so the sub claim is trustworthy only if something upstream has already validated the token; for production, verify it in the server as well (e.g. with aws-jwt-verify when COGNITO_USER_POOL_ID is set). The server reads sub as the actorId for AgentCore Memory and for GET /sessions and GET /sessions/:id/events, which ties the session list and conversation history to the authenticated user.
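
Beyond the signature, a Cognito ID token carries claims that should match your pool and client. The sketch below checks those claims on an already-decoded payload. It does not verify the signature, so on its own it is not sufficient; a verification library such as aws-jwt-verify performs the signature check and these claim checks together. The function and interface names are hypothetical.

```typescript
// Claims relevant to a Cognito ID token (all optional because the input is untrusted).
interface IdTokenClaims {
  sub?: string;
  iss?: string;
  aud?: string;
  token_use?: string;
  exp?: number; // expiry, seconds since the Unix epoch
}

interface ExpectedToken {
  region: string;
  userPoolId: string;
  clientId: string;
}

// Hypothetical helper: returns the actorId (sub) only when every claim checks out.
// NOTE: claim checks only. Signature verification must happen before this is trusted.
function actorIdFromClaims(
  claims: IdTokenClaims,
  expected: ExpectedToken,
  nowSeconds: number,
): string | null {
  const issuer = `https://cognito-idp.${expected.region}.amazonaws.com/${expected.userPoolId}`;
  if (claims.iss !== issuer) return null; // issued by a different user pool
  if (claims.aud !== expected.clientId) return null; // issued for a different app client
  if (claims.token_use !== "id") return null; // e.g. an access token was sent instead
  if (typeof claims.exp !== "number" || claims.exp <= nowSeconds) return null; // expired
  return typeof claims.sub === "string" && claims.sub.length > 0 ? claims.sub : null;
}
```

Passing the current time as a parameter keeps the function pure, which makes the expiry branch straightforward to test.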

For web apps where you also want outbound OAuth (third-party APIs):

  • Use AgentCore Identity to obtain outbound tokens and store them in the token vault, bound to user/workload identity
  • Implement an HTTPS callback endpoint in your web app for OAuth session binding (CompleteResourceTokenAuth flow) to prevent authorization URL forwarding attacks [24]

If you keep IAM SigV4 inbound auth but need user-binding, AgentCore supports an X-Amzn-Bedrock-AgentCore-Runtime-User-Id header pattern, with an additional IAM permission (bedrock-agentcore:InvokeAgentRuntimeForUser) and explicit security best practices to prevent "user id spoofing."

AWS CDK Infrastructure and Deployment

This section describes how to deploy the agent and supporting services. This repo uses L1 CloudFormation constructs from aws-cdk-lib/aws-bedrockagentcore (CfnRuntime, CfnMemory, CfnGateway) rather than the alpha L2 module @aws-cdk/aws-bedrock-agentcore-alpha. Conceptual L2-style examples exist in the CDK docs; the excerpts below reflect the actual packages/infra/lib/stack.ts layout [9].

CDK Resource Model: The Primitives You'll Use

CloudFormation provides first-class AgentCore resource types including Runtime, Memory, Gateway, and built-in tool resources. The CDK alpha library (when used) wraps these in L2 constructs [8]; this repo uses the L1 Cfn* types for maximum control.

What This Repo's Stack Provisions

The single ResearchAgentStack in packages/infra/lib/stack.ts creates, in dependency order:

  • ECR repository for the agent container; CodeBuild project (ARM64) to build the Docker image; Lambda custom resource to trigger the build on deploy
  • Agent IAM role with permissions for: X-Ray (PutTraceSegments, PutTelemetryRecords), CloudWatch Logs and metrics (namespace bedrock-agentcore), Bedrock InvokeModel / InvokeModelWithResponseStream, AgentCore Memory (CreateEvent, RetrieveMemoryRecords, ListSessions, ListEvents), AgentCore Gateway (InvokeGateway), and workload identity tokens (GetWorkloadAccessToken, etc.)
  • Cognito User Pool and User Pool Client (no secret, for SPA); stack outputs: UserPoolId, UserPoolClientId, CognitoRegion for the web app
  • Lambda toolsFunction (inline handler for health_check, search); CloudWatch log group for the agent
  • CfnRuntime: container from ECR (:latest), network PUBLIC, protocol HTTP, role ARN; environment variables include BEDROCK_MODEL_ID, and for AgentCore Observability: AGENT_OBSERVABILITY_ENABLED, OTEL_SERVICE_NAME, OTEL_TRACES_EXPORTER, OTEL_EXPORTER_OTLP_TRACES_ENDPOINT (X-Ray), OTEL_RESOURCE_ATTRIBUTES
  • CfnMemory: name research_memory, event expiry 90 days
  • CfnGateway: MCP protocol, instructions, supported versions, role ARN, authorizerType: "NONE"; Gateway targets can be attached via Control Plane or future CDK
  • Evaluation role and evaluation log group: IAM role for bedrock-agentcore.amazonaws.com (evaluator / online-evaluation-config) with CloudWatch Logs read (trace queries) and write to /aws/bedrock-agentcore/evaluations/*, index policy for aws/spans, and Bedrock InvokeModel; log group for evaluation results; stack outputs: EvaluationRoleArn, EvaluationLogGroupName for use with CreateOnlineEvaluationConfig [30]

packages/infra/lib/stack.ts - CfnRuntime (excerpt)
import * as bedrockagentcore from "aws-cdk-lib/aws-bedrockagentcore";

const agentRuntime = new bedrockagentcore.CfnRuntime(this, "AgentRuntime", {
  agentRuntimeName: "research_agent",
  agentRuntimeArtifact: {
    containerConfiguration: {
      containerUri: `${this.ecrRepository.repositoryUri}:latest`,
    },
  },
  networkConfiguration: { networkMode: "PUBLIC" },
  protocolConfiguration: "HTTP",
  roleArn: this.agentRole.roleArn,
  environmentVariables: {
    AWS_DEFAULT_REGION: this.region,
    BEDROCK_MODEL_ID: "us.anthropic.claude-sonnet-4-20250514-v1:0",
    // AgentCore Observability: export OTLP traces to X-Ray
    AGENT_OBSERVABILITY_ENABLED: "true",
    OTEL_SERVICE_NAME: "research_agent",
    OTEL_TRACES_EXPORTER: "otlp",
    OTEL_EXPORTER_OTLP_TRACES_ENDPOINT: `https://xray.${this.region}.amazonaws.com/v1/traces`,
    OTEL_EXPORTER_OTLP_TRACES_PROTOCOL: "http/protobuf",
    OTEL_RESOURCE_ATTRIBUTES: "service.name=research_agent",
  },
});

packages/infra/lib/stack.ts - CfnMemory and CfnGateway
const agentMemory = new bedrockagentcore.CfnMemory(this, "AgentMemory", {
  name: "research_memory",
  description: "Memory for research agent conversations",
  eventExpiryDuration: 90, // days
});

const agentGateway = new bedrockagentcore.CfnGateway(this, "AgentGateway", {
  name: "research-gateway",
  protocolConfiguration: {
    mcp: { instructions: "Tools for the research agent", supportedVersions: ["2025-11-25"] },
  },
  authorizerType: "NONE",
  protocolType: "MCP",
  roleArn: this.agentRole.roleArn,
});

Deployment Procedure (Repeatable in CI)

  1. yarn install
  2. yarn turbo run build (build agent + web + infra)
  3. cd packages/infra
  4. cdk bootstrap (once per account/region)
  5. cdk deploy

Start the agent with instrumentation loaded first so OTEL is active (e.g. in Docker: CMD ["node", "--import", "./dist/instrument.js", "dist/server.js"], or yarn start using a script that runs node --import ./dist/instrument.js dist/server.js).

Official Sample Repos Worth Mining

For TypeScript agents and primitives:

  • aws/bedrock-agentcore-sdk-typescript (Runtime wrapper + Strands integration examples) [4]
  • awslabs/bedrock-agentcore-samples-typescript (TypeScript AgentCore samples; shows BedrockAgentCoreApp usage and tool integrations) [26]
  • awslabs/amazon-bedrock-agentcore-samples (broad tutorials: runtime + gateway + MCP hosting samples) [27]

For end-to-end reference architectures (multi-agent + AgentCore):

  • aws-samples/sample-strands-agent-with-agentcore — Multi-agent chatbot with Strands and AgentCore (execution, memory, browser automation, collaboration) [28]

Frontend Thinking UI and Operations

Streaming Contract Between Runtime and React

AgentCore /invocations supports SSE streaming [10].

Strands streaming events give you rich semantic events ("beforeToolCall", "afterToolCall", reasoning deltas, etc.) [3].

The best practice is to translate Strands events into a stable UI event schema (in packages/shared) so the frontend never depends on internal SDK event shapes.

packages/shared/src/events.ts
export type UiEvent =
  | { type: "meta"; sessionId: string }
  | { type: "message.delta"; text: string }
  | { type: "thinking.delta"; text: string }
  | { type: "tool.start"; toolName: string; input: unknown }
  | { type: "tool.end"; toolName: string; output: unknown }
  | { type: "message.done" }
  | { type: "error"; message: string }
  | { type: "run.start" }; // emitted at start of each run so UI can group one assistant message per run
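
To show how run.start and message.delta combine on the client, here is a minimal sketch that folds an event list into one string per assistant message. The function name is hypothetical and the input type is a narrowed subset of UiEvent, so a real caller would filter to these three event types first.

```typescript
// Subset of UiEvent that affects message text.
type TextEvent =
  | { type: "run.start" }
  | { type: "message.delta"; text: string }
  | { type: "message.done" };

// Hypothetical helper: one string per assistant message, in order.
function assembleAssistantMessages(events: TextEvent[]): string[] {
  const messages: string[] = [];
  let open = false; // true while a message is accepting deltas
  for (const event of events) {
    if (event.type === "run.start") {
      messages.push(""); // each run starts a fresh assistant message
      open = true;
    } else if (event.type === "message.delta") {
      if (!open) {
        messages.push(""); // tolerate a delta that arrives without run.start
        open = true;
      }
      messages[messages.length - 1] += event.text;
    } else {
      open = false; // message.done closes the current message
    }
  }
  return messages;
}
```

Deriving messages from the event log, rather than mutating message state as events arrive, means replayed history and live streams render through the same code path.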

React: Streaming SSE over fetch (POST)

Because you need a POST body (the prompt), use fetch() + ReadableStream parsing rather than EventSource (GET-only). This repo's apps/web/src/hooks/useAgentStream.ts accepts optional getAuthToken, userId, and endpoint; it sends sessionId and userId in the body and sets Authorization: Bearer <token> when available. It handles meta (to capture sessionId), run.start (to group one assistant message per run), and error events.

apps/web/src/hooks/useAgentStream.ts (pattern)
import { useCallback, useState } from "react";
import type { UiEvent } from "@repo/shared/events";

export function useAgentStream(options?: {
  endpoint?: string;
  getAuthToken?: () => Promise<string | null>;
  userId?: string;
}) {
  // The default endpoint and the isStreaming / error / reset state are elided from this excerpt.
  const { endpoint, getAuthToken, userId } = options ?? {};
  const [events, setEvents] = useState<UiEvent[]>([]);
  const [sessionId, setSessionId] = useState<string | null>(null);

  const run = useCallback(async (prompt: string) => {
    setEvents((prev) => [...prev, { type: "run.start" }]);
    // Typed as a string map so the Authorization header can be added conditionally.
    const headers: Record<string, string> = {
      "Content-Type": "application/json",
      Accept: "text/event-stream",
    };
    if (getAuthToken) {
      const token = await getAuthToken();
      if (token) headers["Authorization"] = `Bearer ${token}`;
    }
    const res = await fetch(endpoint, {
      method: "POST",
      headers,
      body: JSON.stringify({ prompt, sessionId: sessionId ?? undefined, userId }),
    });
    if (!res.ok || !res.body) throw new Error(`Agent request failed with status ${res.status}`);
    const reader = res.body.getReader();
    // ... SSE parse loop: data lines -> JSON.parse -> if (event.type === "meta") setSessionId(event.sessionId); setEvents(prev => [...prev, event])
  }, [endpoint, sessionId, getAuthToken, userId]);

  return { events, isStreaming, error, sessionId, setSessionId, run, reset };
}

The web app uses Cognito (AuthContext, ProtectedRoute); pass getAuthToken (e.g. an ID token getter) and userId (e.g. user?.sub) into useAgentStream so the runtime receives a JWT and derives actorId from the token's sub for Memory and session listing.

useSessions and useSessionEvents call GET /sessions and GET /sessions/:id/events with Authorization: Bearer <token> to list chat sessions and load conversation history; both rely on the same JWT for actorId on the server.

The server-side protocol contract explicitly describes SSE as a supported response format for /invocations [10].
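
The SSE parse loop elided in the hook has one subtlety: a network chunk can end in the middle of a frame. A minimal sketch of the buffering logic follows. It assumes frames are separated by "\n\n" and contain only data: lines with JSON payloads, which matches what this guide's server writes; it is not a complete SSE parser (no event:, id:, or CRLF handling), and the function name is hypothetical.

```typescript
interface SseParseResult<T> {
  events: T[]; // fully received frames, parsed
  rest: string; // trailing partial frame to carry into the next call
}

// Hypothetical helper: split a text buffer into complete SSE frames.
function parseSseBuffer<T>(buffer: string): SseParseResult<T> {
  const frames = buffer.split("\n\n");
  // The last element is either "" (buffer ended on a boundary) or an incomplete frame.
  const rest = frames.pop() ?? "";
  const events: T[] = [];
  for (const frame of frames) {
    const data = frame
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, "")) // drop "data:" and one optional space
      .join("\n");
    if (data.length > 0) events.push(JSON.parse(data) as T);
  }
  return { events, rest };
}
```

In the read loop, append each decoded chunk to a running buffer (for example with TextDecoder and its stream option), call parseSseBuffer, dispatch the returned events, and keep rest as the new buffer.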

UI Composition: Chat + Thinking Timeline

This repo's web app uses components from packages/ui (e.g. accordion, thinking timeline, chat transcript) to show tool and reasoning events:

  • ChatTranscript: renders assembled message.delta into assistant bubbles
  • ThinkingPanel / accordion: renders thinking.delta in a collapsible "scratchpad" (e.g. last four lines of streaming thoughts)
  • Timeline: renders tool.start / tool.end in the chat window, expandable to show request parameters and results

Because Strands exposes structured tool and lifecycle events, the timeline is stable and safe without persisting raw "thought text" to memory.
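
Two small derivations drive these components, sketched below with hypothetical helper names. One assumption is worth flagging: the UiEvent shapes in this guide carry no call identifier, so the sketch pairs each tool.end with the oldest still-running entry of the same tool name, which can mis-pair overlapping calls to one tool.

```typescript
type ToolEvent =
  | { type: "tool.start"; toolName: string; input: unknown }
  | { type: "tool.end"; toolName: string; output: unknown };

interface TimelineEntry {
  toolName: string;
  input: unknown;
  output?: unknown;
  status: "running" | "done";
}

// Hypothetical helper: fold tool events into timeline rows.
function buildTimeline(events: ToolEvent[]): TimelineEntry[] {
  const entries: TimelineEntry[] = [];
  for (const event of events) {
    if (event.type === "tool.start") {
      entries.push({ toolName: event.toolName, input: event.input, status: "running" });
    } else {
      // Pair by name: close the oldest running entry for this tool, if any.
      const open = entries.find((e) => e.toolName === event.toolName && e.status === "running");
      if (open) {
        open.output = event.output;
        open.status = "done";
      }
    }
  }
  return entries;
}

// Hypothetical helper: last N non-empty lines of the streamed reasoning text.
function scratchpadTail(thinkingDeltas: string[], maxLines = 4): string[] {
  return thinkingDeltas
    .join("")
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .slice(-maxLines);
}
```

If overlapping calls to the same tool are possible in your system, adding a call ID to tool.start and tool.end in the shared schema would make the pairing exact.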

Observability and Evaluations: Operationalizing Quality

Instrumentation in This Repo

This repo instruments the agent with OpenTelemetry in apps/agent/src/instrument.ts: a NodeSDK with an OTLP trace exporter to AWS X-Ray (same endpoint as the runtime env OTEL_EXPORTER_OTLP_TRACES_ENDPOINT). The tracer must be registered before any other application code. In production the container runs with node --import ./dist/instrument.js dist/server.js (or equivalent); in dev, the start script can use tsx watch --import ./src/instrument.ts src/server.ts so instrumentation loads first [6].

Session correlation: In the invocation handler, set OTEL baggage session.id to the current session ID before calling the orchestrator (as in apps/agent/src/server.ts). Downstream spans then carry the session id, which supports CloudWatch Transaction Search and AgentCore Evaluations.

Observability

AgentCore Observability is tightly coupled to CloudWatch GenAI observability views; first-time setup requires enabling CloudWatch Transaction Search [6].

AgentCore defines sessions/traces/spans, emits default metrics for AgentCore runtime resources, and supports deeper tracing with explicit instrumentation [29].

Plan for:

  • minimal "service-level" visibility out of the box
  • deeper app-level tracing and custom spans via OpenTelemetry instrumentation (as in this repo's instrument.ts)

Evaluations

AgentCore Evaluations can score your agent based on traces and spans; it integrates with Strands via OTel/OpenInference-style instrumentation [30].

This repo's stack creates an Evaluation execution role and an evaluation log group (/aws/bedrock-agentcore/evaluations/research-agent). Online evaluation configs are created via the Control Plane (CLI, Console, or SDK), not CDK. Online evaluations continuously monitor live traffic and write results to a CloudWatch log group [31].

Enabling Online Evaluators

One-time: Enable CloudWatch Transaction Search

AgentCore Observability and Evaluations require Transaction Search to be enabled once per AWS account so spans are ingested and queryable [6].

  • Console: CloudWatch → Setup → Settings → Account → X-Ray traces tab → Transaction Search → View settings → Edit → Enable Transaction Search (e.g. 1% sampling).
  • CLI: Use aws logs put-resource-policy to allow X-Ray to write to the aws/spans (and optionally application-signals) log groups, then aws xray update-trace-segment-destination --destination CloudWatchLogs. See this repo's README.md ("Observability and Evaluations") for the exact policy document and steps.

Allow up to ~10 minutes for spans to appear in Transaction Search.

Post-deploy: Create an online evaluation config

After deploying the stack, use the exported Evaluation Role ARN (ResearchAgentEvaluationRoleArn) and Agent Runtime ID (ResearchAgentRuntimeId) from the CDK outputs.

Option 1 – AgentCore starter toolkit CLI

Install the toolkit (pip install bedrock-agentcore-starter-toolkit), then list built-in evaluators and create an online config [32]:

bash
# List available evaluators (e.g. Builtin.GoalSuccessRate, Builtin.Helpfulness, Builtin.Correctness)
agentcore eval evaluator list

# Create online evaluation (use stack outputs for role ARN and agent ID)
agentcore eval online create \
  --name research_agent_eval \
  --agent-id <AgentRuntimeId> \
  --evaluator Builtin.GoalSuccessRate \
  --evaluator Builtin.Helpfulness \
  --evaluation-execution-role-arn <ResearchAgentEvaluationRoleArn> \
  --sampling-rate 1.0

Evaluator levels: SESSION (e.g. goal completion), TRACE (e.g. helpfulness, correctness), TOOL_CALL (tool selection/parameters). Start with a low sampling rate (1–5%) in production. The config transitions to ACTIVE shortly after creation; use agentcore eval online list and agentcore eval online get --config-id <id> to inspect. Results appear under CloudWatch → GenAI Observability → Bedrock AgentCore → your agent → Evaluations tab.

Option 2 – AWS Console

AgentCore → Evaluation → Create evaluation configuration → choose the agent endpoint, select evaluators, set the execution role to the stack output role ARN (ResearchAgentEvaluationRoleArn).

Option 3 – AWS SDK

Use the bedrock-agentcore-control API CreateOnlineEvaluationConfig with dataSourceConfig (agent endpoint or log groups + serviceNames: research_agent), evaluators (e.g. Builtin.GoalSuccessRate, Builtin.Helpfulness), and evaluationExecutionRoleArn set to the stack output [31].

Results are written to a log group in the form /aws/bedrock-agentcore/evaluations/results/<online-evaluation-config-id>. You can update sampling rate or evaluators with the CLI (agentcore eval online update) or the Control Plane API.

Policy Enforcement and Policy Observability

When you attach policy enforcement to Gateways, AgentCore publishes policy invocation metrics to CloudWatch by default; span data becomes available when gateway traces are enabled [33].

A production rollout pattern is:

  1. Start with policy enforcement in LOG_ONLY to identify what would be denied
  2. Iterate policies / attributes until false positives are eliminated
  3. Switch to ENFORCE
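
The difference between the two modes, and the tally you would review during step 2, can be modeled in a few lines. This is a conceptual illustration only: the types and functions are hypothetical, and in a real deployment the decisions come from AgentCore Policy and its CloudWatch metrics rather than from application code.

```typescript
type EnforcementMode = "LOG_ONLY" | "ENFORCE";

interface PolicyEvaluation {
  toolName: string;
  decision: "ALLOW" | "DENY";
}

// Whether the tool call goes ahead under the given mode.
// LOG_ONLY records the decision but never blocks; ENFORCE blocks on DENY.
function callProceeds(mode: EnforcementMode, evaluation: PolicyEvaluation): boolean {
  return mode === "LOG_ONLY" || evaluation.decision === "ALLOW";
}

// Count would-be denials per tool from LOG_ONLY observations,
// which is the list to work through before switching to ENFORCE.
function wouldBeDenials(evaluations: PolicyEvaluation[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const evaluation of evaluations) {
    if (evaluation.decision === "DENY") {
      counts.set(evaluation.toolName, (counts.get(evaluation.toolName) ?? 0) + 1);
    }
  }
  return counts;
}
```

An empty tally over a representative traffic window is a reasonable signal that switching to ENFORCE will not break legitimate tool calls.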

Deployment and Auth "Gotchas" That Affect the Frontend

  • If your runtime endpoint is JWT/OAuth configured, the HTTP protocol contract defines 401 responses with WWW-Authenticate headers for missing auth, and notes differences from SigV4-configured behavior
  • If you integrate OAuth flows (user-delegated outbound access), you must implement session binding to prevent authorization URL forwarding attacks, and register your callback URL against the workload identity
  • Strands TypeScript session persistence is not currently supported, so treat AgentCore Memory as your durable "conversation and preference" substrate, with clear separation between:
    • short-term in-session state (in the runtime session)
    • short-term event log (CreateEvent)
    • long-term extracted records (RetrieveMemoryRecords)