
Building a Multi-Agent Research System with Strands Agents TypeScript SDK and Bedrock AgentCore
Stewart Moreland
Scope and Prerequisites
This guide is a deep, implementation-focused tutorial for building a production-oriented multi-agent research system where:
- Agents are implemented in TypeScript using the Strands Agents TypeScript SDK (with multi-agent orchestration patterns such as "Agents-as-Tools") [1]
- The system deploys into Amazon Bedrock AgentCore Runtime and integrates "adjacent" AgentCore services: Gateway, Memory, Identity, Policy, Browser, Code Interpreter, Observability, and Evaluations [2]
- The repo is a Turborepo monorepo using Yarn workspaces, and includes:
  - TypeScript agent runtime service (multi-agent orchestration + streaming)
  - AWS CDK infrastructure (TypeScript) provisioning Runtime, Memory, Gateway, built-in tools, and auth
  - A React frontend that shows a chat UI and a visual "thinking" timeline driven by streaming events [3]
Important Maturity Note
- The Strands TypeScript SDK is explicitly marked experimental, and not all Python features exist yet (breaking changes expected).
- The "bedrock-agentcore" TypeScript SDK provides strong primitives for Runtime + tools, but (as of its README) Memory/Gateway/Observability are explicitly listed as "coming soon" in that SDK—so you should expect to integrate Memory/Gateway/Evaluations/Policy through AWS SDK v3 clients / HTTPS calls and/or CDK/CloudFormation resources rather than only the convenience SDK [4].
Example Repository
A sample repo is available to follow along with while reading this guide. You can also deploy it to your own AWS account and start experimenting right away.
Baseline Prerequisites (Practical Minimums)
- Node.js 20+ (required by AgentCore TypeScript quickstart and the AgentCore TypeScript SDK README) [5]
- AWS access configured locally (AWS CLI credentials) and Bedrock model access enabled where needed [6]
- Yarn workspaces and a Turborepo pipeline configuration (with declared "outputs" for caching) [7]
- AWS CDK v2; AgentCore resources can be provisioned with L1 CloudFormation constructs (aws-cdk-lib/aws-bedrockagentcore) or the alpha L2 module (@aws-cdk/aws-bedrock-agentcore-alpha) [8]
Architecture and Service Mapping
Reference Architecture
The system is easiest to reason about if you treat AgentCore as two planes:
- Control plane (create/update resources): CDK/CloudFormation and/or bedrock-agentcore-control API calls create Runtime, Memory, Gateway, built-in tools, and Identity configuration [9]
- Data plane (run-time interactions): the agent runtime receives /invocations (SSE) calls, uses Memory for recall + logging events, calls Gateway MCP tools, and may start Browser/Code Interpreter sessions [10]
Below is a practical end-to-end flow that keeps the UI simple while still supporting Identity + policy enforcement in a real system:
Key Runtime Protocol Facts
- AgentCore HTTP uses host 0.0.0.0, port 8080, and requires an ARM64 container
- /invocations supports JSON input and JSON or SSE output
- /ws is optional for bidirectional streaming
- The TypeScript BedrockAgentCoreApp wrapper exists specifically to produce a Runtime-compliant server with request parsing + streaming [10]
"Thinking" Visualization Strategy
Strands streaming gives you a highly usable "thinking" surface without exposing sensitive internal reasoning:
- Lifecycle events and tool events let you show when the model is planning, calling tools, and receiving results [3]
- Model streaming events include a distinct channel for reasoning deltas (when supported by the underlying model/provider), and the SDK explicitly models "reasoning blocks" and "redactedContent" [11]
A robust UI pattern is:
- Always render tool/lifecycle timeline (safe, deterministic)
- Optionally render model "reasoning" stream behind a toggle, and be prepared for redaction (store it only transiently and don't persist it to memory)
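The "render but do not persist" rule can be enforced with a small filter applied before anything is logged or written to memory. The sketch below is a minimal illustration under assumed names: the TimelineEvent type and the toPersistable helper are hypothetical and not part of any SDK.

```typescript
// Illustrative event shape; the type and field names are assumptions for this sketch.
type TimelineEvent =
  | { type: "tool.start"; toolName: string }
  | { type: "tool.end"; toolName: string }
  | { type: "message.delta"; text: string }
  | { type: "thinking.delta"; text: string };

// Reasoning text may be rendered live behind a toggle, but is never written to storage.
function isPersistable(ev: TimelineEvent): boolean {
  return ev.type !== "thinking.delta";
}

// Returns the subset of events that is safe to log or store.
function toPersistable(events: TimelineEvent[]): TimelineEvent[] {
  return events.filter(isPersistable);
}

const streamed: TimelineEvent[] = [
  { type: "tool.start", toolName: "research_specialist" },
  { type: "thinking.delta", text: "Comparing two sources..." },
  { type: "tool.end", toolName: "research_specialist" },
  { type: "message.delta", text: "Here is what I found." },
];

// Only the tool and message events survive the filter.
console.log(toPersistable(streamed).map((ev) => ev.type));
```

Keeping the filter in one place means the UI can still show every event while storage code only ever sees the safe subset.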
Comparison Table of AgentCore Services for a TypeScript Multi-Agent System
| AgentCore Capability | What It Does | Where It Fits in a TypeScript Strands System | TypeScript Integration Notes |
|---|---|---|---|
| Runtime | Runs your agent as a managed, session-isolated runtime and exposes /invocations (SSE) | Hosts the orchestrator + specialists and streams events to UI | Use BedrockAgentCoreApp for a compliant server, or implement the HTTP contract yourself |
| Gateway | Converts APIs/Lambda/OpenAPI/Smithy targets into MCP tools behind a single endpoint | Gives agents a unified "tool catalog" and tool execution surface | CDK supports MCP protocol config + auth; agents use a wrapper or Strands McpClient (Streamable HTTP) to call tools [12] |
| Memory | Short-term events + long-term extracted records for cross-session personalization | Add recall context, store conversation events, and support long-term learning | Use CreateEvent for event logging; RetrieveMemoryRecords for search; ListSessions / ListEvents for session list and history [21] |
| Identity | Workload identities + token vault for OAuth2/API keys; supports inbound and outbound auth flows | Securely fetch third-party tokens (3LO/2LO), bind tokens to user sessions, and protect inbound endpoints | Runtime supports either SigV4 or JWT inbound auth (not both) |
| Policy | Policy engine that authorizes/denies tool access (often attached to a Gateway) | Enforces least-privilege tool usage at runtime | Enforcement modes include LOG_ONLY for dry run and ENFORCE to block |
| Browser tool | Managed browser sessions for web interaction, scraping, and workflow automation | Used by specialist agents (web research / form filling) | Browser sessions can be controlled via WebSocket streaming APIs |
| Code Interpreter tool | Secure sandbox code execution for analysis, parsing, computation | Used by analyst agents (data transforms, charting, extraction) | Runs in isolated environments; CDK supports public/sandbox/VPC modes |
| Observability | OTel traces/logs + CloudWatch GenAI dashboards for sessions/traces/spans | Debugging and monitoring, plus the substrate for Evaluations | Enable CloudWatch Transaction Search; instrument your code for deep tracing |
| Evaluations | LLM-as-judge scoring from traces/spans (online + on-demand) | Continuous QA and regression monitoring over real traffic | Evaluations integrate with Strands/LangGraph through OTel/OpenInference |
Turborepo and Yarn Workspaces Monorepo Blueprint
Repository Layout
Use an "apps + packages" structure that cleanly separates deployment units from shared libraries:
```
repo/
  apps/
    agent/     # AgentCore Runtime container (TypeScript, Strands)
    web/       # React UI (chat + thinking timeline)
  packages/
    infra/     # AWS CDK app (TypeScript)
    shared/    # Shared types: streaming events, API contracts
    ui/        # Shared React components (timeline, chat, accordion)
  turbo.json
  package.json
  tsconfig.base.json
  .yarnrc.yml
```
This aligns with Turborepo's task graph model (workspaces are nodes; tasks run per package), and lets you cache build outputs per package [7].
Root package.json and Turbo Caching Basics
Define:
- workspaces for Yarn
- packageManager to stabilize Turborepo's lockfile expectations
- root scripts that delegate to Turbo (not heavy logic in root scripts) [13]
Turborepo caches outputs declared in turbo.json. Build tasks that do not declare outputs aren't cached.
Example turbo.json (minimal but effective pattern; this is the Turborepo 1.x form, and in Turborepo 2.x the top-level pipeline key is named tasks):

```json
{
  "$schema": "https://turbo.build/schema.json",
  "pipeline": {
    "build": { "dependsOn": ["^build"], "outputs": ["dist/**", "build/**"] },
    "lint": { "outputs": [] },
    "test": { "dependsOn": ["build"], "outputs": [] },
    "dev": { "cache": false }
  }
}
```
Yarn Workspaces Linker Choice
Yarn supports multiple install/linking strategies; modern Yarn defaults to Plug'n'Play (PnP), but node-modules linker is also a stable choice when tooling compatibility matters [14].
For a developer tutorial repo, many teams choose:
- nodeLinker: node-modules to reduce friction with CDK tooling, bundlers, and native deps, or
- PnP if your organization is already standardized on it
Agent Implementation in TypeScript
Multi-Agent Orchestration Pattern: Agents-as-Tools
Strands documents "Agents as Tools" as a delegation pattern:
- An orchestrator agent decides which specialist agent to invoke (each specialist is wrapped as a callable tool) [1]
Strands TypeScript also provides a tool() helper that validates input against Zod and generates JSON schema for the model/tooling layer [15].
In TypeScript, the pattern usually looks like:
- Build specialist agents (researcher, analyst, writer) with tailored tools
- Wrap each specialist in a Strands tool(...) so the orchestrator can call them
- Stream orchestrator events to the UI
Runtime Server: Two Approaches
You can either use the BedrockAgentCoreApp wrapper from the AgentCore TypeScript SDK [4] or implement the HTTP protocol contract yourself. This repo implements the contract manually with a plain Node.js HTTP server so you have full control over routing, auth, and session endpoints.
Required endpoints per [10]:
- POST /invocations: JSON input (prompt, optional sessionId, userId); response is either JSON (non-streaming) or SSE (Accept: text/event-stream)
- GET /ping: Health check; return { status: "Healthy", time_of_last_update } (Unix timestamp)
- GET /ws: Optional; for bidirectional WebSocket streaming
This repo also exposes GET /sessions and GET /sessions/:id/events for the web app to list chat sessions and load conversation history; both require Authorization: Bearer <JWT> and use the JWT sub as actorId for AgentCore Memory.
```typescript
import type { UiEvent } from "@repo/shared/events";
import { context, propagation } from "@opentelemetry/api";
import { createServer, IncomingMessage, ServerResponse } from "node:http";
import { orchestrator } from "./orchestrator";

const PORT = parseInt(process.env.PORT || "8080", 10);
const HOST = process.env.HOST || "0.0.0.0";

// Decodes (does NOT verify) the JWT payload and returns its `sub` claim.
function getActorIdFromAuth(req: IncomingMessage): string | null {
  const auth = req.headers.authorization;
  if (!auth?.startsWith("Bearer ")) return null;
  const token = auth.slice(7).trim();
  try {
    // Convert base64url to base64, then decode the payload segment to a UTF-8 string.
    const base64 = token.split(".")[1].replace(/-/g, "+").replace(/_/g, "/");
    const payload = JSON.parse(Buffer.from(base64, "base64").toString("utf8")) as { sub?: string };
    return typeof payload.sub === "string" ? payload.sub : null;
  } catch {
    return null;
  }
}

// Writes one SSE frame: a single data line followed by a blank line.
function sendEvent(res: ServerResponse, event: UiEvent): void {
  res.write(`data: ${JSON.stringify(event)}\n\n`);
}

// parseBody and requestHandler are defined elsewhere in server.ts (not shown in this excerpt).
async function handleInvocations(req: IncomingMessage, res: ServerResponse): Promise<void> {
  const body = await parseBody(req);
  const { prompt, sessionId, userId } = body;
  const currentSessionId = sessionId || crypto.randomUUID();

  if (req.headers.accept?.includes("text/event-stream")) {
    // Remaining SSE headers are elided in the source excerpt.
    res.writeHead(200, { "Content-Type": "text/event-stream", ... });
    sendEvent(res, { type: "meta", sessionId: currentSessionId });

    // Attach session.id as OTEL baggage so downstream spans carry the session.
    const sessionBaggage = propagation.createBaggage({ "session.id": { value: currentSessionId } });
    await context.with(propagation.setBaggage(context.active(), sessionBaggage), async () => {
      for await (const event of orchestrator.stream(prompt, { sessionId: currentSessionId, userId })) {
        sendEvent(res, event);
      }
    });
    sendEvent(res, { type: "message.done" });
    res.end(); // close the SSE stream once the run is complete
  } else {
    // The context argument is elided in the source excerpt (same baggage setup as above).
    const result = await context.with(..., () =>
      orchestrator.invoke(prompt, { sessionId: currentSessionId, userId }));
    res.end(JSON.stringify({ result }));
  }
}

function handlePing(res: ServerResponse): void {
  res.writeHead(200, { "Content-Type": "application/json" });
  res.end(JSON.stringify({
    status: "Healthy",
    time_of_last_update: Math.floor(Date.now() / 1000), // Unix timestamp in seconds
  }));
}

const server = createServer(requestHandler);
server.listen(PORT, HOST, () => { ... });
```
The orchestrator already yields UiEvent (e.g. message.delta, thinking.delta, tool.start, tool.end, message.done, error), so the server simply forwards them as SSE. Setting OTEL baggage session.id before calling the orchestrator ensures spans carry the session for AgentCore Observability and Evaluations [6].
Why this mapping works:
- Strands defines a unified AgentStreamEvent union; the orchestrator maps it to UiEvent in mapStrandsEventToUiEvent [16]
- AgentCore /invocations supports both JSON and SSE response formats [10]
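The server excerpt above passes a requestHandler to createServer without showing it. The following is a minimal sketch of the routing such a handler would need, assuming only the endpoints listed in this section. The matchRoute function and the Route type are hypothetical names for illustration and are not taken from the repo.

```typescript
// Routes this sketch recognizes; names are illustrative, not the repo's.
type Route =
  | { name: "invocations" }
  | { name: "ping" }
  | { name: "ws" }
  | { name: "sessions" }
  | { name: "sessionEvents"; sessionId: string };

// Maps an HTTP method + raw URL to a route, or null when nothing matches (respond 404).
function matchRoute(method: string, rawUrl: string): Route | null {
  // Only the path matters for routing; drop any query string.
  const [path = ""] = rawUrl.split("?");

  if (method === "POST" && path === "/invocations") return { name: "invocations" };
  if (method !== "GET") return null;

  if (path === "/ping") return { name: "ping" };
  if (path === "/ws") return { name: "ws" };
  if (path === "/sessions") return { name: "sessions" };

  // GET /sessions/:id/events
  const match = /^\/sessions\/([^/]+)\/events$/.exec(path);
  if (match) {
    return { name: "sessionEvents", sessionId: decodeURIComponent(match[1] ?? "") };
  }
  return null;
}

console.log(matchRoute("POST", "/invocations"));
console.log(matchRoute("GET", "/sessions/abc-123/events?maxResults=20"));
console.log(matchRoute("DELETE", "/sessions"));
```

Keeping route matching as a pure function makes it easy to unit test separately from the HTTP server, and the request handler reduces to a switch over route.name.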
Specialist Agents and Tools
Specialist Agents
A common research system split:
- Researcher agent: Browser + Gateway tools (collect sources, browse pages, call internal APIs)
- Analyst agent: Code Interpreter (extract tables, compute, transform)
- Writer agent: synthesis + formatting, minimal tools
Strands Agent orchestrates a model, tools, and MCP clients [17].
Wrapping Agents as Tools
This repo defines specialists in apps/agent/src/specialists/: index.ts re-exports researchTool, analysisTool, and writingTool from research.ts, analysis.ts, and writing.ts. The orchestrator in apps/agent/src/orchestrator.ts uses a single Strands Agent with these tools, logs and recalls via the memory adapter, and maps agent.stream() output to UiEvent (e.g. message.delta, thinking.delta, tool.start, tool.end, message.done, error) in mapStrandsEventToUiEvent.
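```typescript
// apps/agent/src/specialists/index.ts
export { analysisTool } from "./analysis";
export { researchTool } from "./research";
export { writingTool } from "./writing";
```

```typescript
// apps/agent/src/specialists/research.ts
import { Agent, BedrockModel, tool } from "@strands-agents/sdk";
import { z } from "zod";

export const researchTool = tool({
  name: "research_specialist",
  description: "Web research specialist: browse web, extract information, return sourced notes.",
  inputSchema: z.object({
    task: z.string(),
    urls: z.array(z.string().url()).optional(),
    context: z.string().optional(),
  }),
  callback: async ({ task, urls, context }) => {
    // getResearchAgent and getFallbackResearch are helpers defined elsewhere in this file (not shown).
    const agent = await getResearchAgent(); // optional AgentCore Browser; fallback if unavailable
    if (!agent) return getFallbackResearch(task, urls, context);

    // The source excerpt does not show how the prompt is built; this assembly is an assumption.
    const prompt = [
      task,
      urls?.length ? `URLs: ${urls.join(", ")}` : "",
      context ? `Context: ${context}` : "",
    ].filter(Boolean).join("\n");
    return String(await agent.invoke(prompt));
  },
});
```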
The research specialist optionally uses AgentCore Browser when configured; otherwise it returns simulated research. The analysis and writing specialists follow the same pattern. This is the TypeScript analogue of the "Agents-as-Tools" delegation described in Strands docs.
Integrating AgentCore Gateway via MCP
AgentCore Gateway exposes an MCP endpoint; tool invocation is standardized as JSON-RPC tools/call to /mcp [12].
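To make the wire format concrete, the sketch below builds the JSON-RPC 2.0 request bodies for tools/list and tools/call as defined by the MCP specification, plus the headers a Streamable HTTP client sends. The helper names (buildToolsCall, buildToolsList, gatewayHeaders) are hypothetical, and the tool name in the usage example is only an illustration.

```typescript
// JSON-RPC 2.0 request envelope used by MCP.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

let nextRequestId = 1;

// Body for listing the tools the Gateway exposes.
function buildToolsList(): JsonRpcRequest {
  return { jsonrpc: "2.0", id: nextRequestId++, method: "tools/list" };
}

// Body for invoking one tool by name with its arguments.
function buildToolsCall(toolName: string, args: Record<string, unknown>): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// Headers for an MCP Streamable HTTP POST; the Accept header lists both
// response types because the server may answer with JSON or an SSE stream.
function gatewayHeaders(bearerToken: string): Record<string, string> {
  return {
    "Content-Type": "application/json",
    Accept: "application/json, text/event-stream",
    Authorization: `Bearer ${bearerToken}`,
  };
}

const callBody = buildToolsCall("internal-tools__health_check", {});
console.log(JSON.stringify(callBody, null, 2));
```

In practice an MCP client library handles this envelope for you; seeing the raw shape mainly helps when debugging Gateway requests with curl or reading traces.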
This repo uses a wrapper class apps/agent/src/gatewayClient.ts (GatewayMcpClient) that supports listTools, callTool, and searchTools. When AGENTCORE_GATEWAY_URL is not set (e.g. local dev), it returns simulated tools so the agent can run without a deployed Gateway. In production, wire the Gateway URL and use an MCP client with Streamable HTTP transport (e.g. @modelcontextprotocol/sdk StreamableHTTPClientTransport with Authorization: Bearer <token>) to call tools [18].
```typescript
class GatewayMcpClient {
  private gatewayUrl: string;

  constructor(config?: { gatewayUrl?: string; bearerToken?: string }) {
    this.gatewayUrl = config?.gatewayUrl || process.env.AGENTCORE_GATEWAY_URL || "";
    if (!this.gatewayUrl) {
      console.log("[GatewayClient] No Gateway URL configured, tool calls will be simulated");
    }
  }

  // Method bodies are elided in this excerpt.
  async listTools(): Promise<McpToolDefinition[]> { ... }
  async callTool(toolName: string, args: unknown): Promise<ToolCallResult> { ... }
}

export const gatewayClient = new GatewayMcpClient();
```
Gateway Naming Gotcha
When tools are exposed through targets, Gateway prefixes tool names with the target name (e.g. internal-tools__health_check). Your code and/or downstream target handler must account for this.
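One way to account for the prefix is to split and join qualified names in a single helper, so the rest of the code works with plain tool names. This is a sketch under assumptions: the delimiter below is copied from the example name in this section and should be checked against what your Gateway actually returns from tools/list. The function names are hypothetical.

```typescript
// Delimiter as it appears in the example above; verify against your Gateway's tools/list output.
const TARGET_DELIMITER = "__";

interface QualifiedToolName {
  target: string | null; // null when the name carries no target prefix
  tool: string;
}

// Splits "target<delimiter>tool" at the first delimiter occurrence.
function splitGatewayToolName(qualified: string, delimiter: string = TARGET_DELIMITER): QualifiedToolName {
  const idx = qualified.indexOf(delimiter);
  if (idx <= 0) return { target: null, tool: qualified };
  return {
    target: qualified.slice(0, idx),
    tool: qualified.slice(idx + delimiter.length),
  };
}

// Rebuilds the qualified name the Gateway expects in tools/call.
function qualifyGatewayToolName(target: string, tool: string, delimiter: string = TARGET_DELIMITER): string {
  return `${target}${delimiter}${tool}`;
}

console.log(splitGatewayToolName("internal-tools__health_check"));
console.log(qualifyGatewayToolName("internal-tools", "search"));
```

Splitting at the first occurrence keeps tool names that themselves contain underscores (such as health_check) intact.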
Integrating AgentCore Memory
The Memory Model You Should Implement
AgentCore Memory has:
- short-term events (immutable) logged under actorId and sessionId [19]
- long-term extracted records, retrieved with RetrieveMemoryRecords from a namespace with search criteria [20]
Long-term extraction is driven by configured strategies and is processed asynchronously; you do not insert "long-term memories" directly.
"Dual-Layer" Memory Adapter for Strands TypeScript
Because Strands session management is not yet supported in TypeScript, you should explicitly bridge:
- UI session/user identity → actorId
- UI conversation thread → sessionId
- Each user/assistant exchange → CreateEvent
- Each new request → RetrieveMemoryRecords and inject "recall context" into the system prompt or pre-message context
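The last step, injecting recall context, can be as simple as appending the retrieved records to the base system prompt. The sketch below assumes the records have already been fetched as plain strings. The buildSystemPrompt helper, its wording, and the item cap are illustrative choices, not the repo's implementation.

```typescript
// Appends recalled long-term records to a base system prompt.
// Returns the base prompt unchanged when there is nothing useful to add.
function buildSystemPrompt(basePrompt: string, recalled: string[], maxItems = 5): string {
  const items = recalled
    .map((record) => record.trim())
    .filter((record) => record.length > 0)
    .slice(0, maxItems);

  if (items.length === 0) return basePrompt;

  const block = items.map((record, i) => `${i + 1}. ${record}`).join("\n");
  return [
    basePrompt,
    "",
    "Relevant context from earlier sessions (may be incomplete or out of date):",
    block,
  ].join("\n");
}

const prompt = buildSystemPrompt("You are a research orchestrator.", [
  "User prefers sources from peer-reviewed journals.",
  "  ",
  "Previous topic: battery recycling economics.",
]);
console.log(prompt);
```

Capping the number of items and labelling the block as possibly stale keeps recall from crowding out the current request or being treated as ground truth.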
This repo's apps/agent/src/memoryAdapter.ts reads config from env (AGENTCORE_MEMORY_ID, AGENTCORE_MEMORY_NAMESPACE). When no memory ID is set, it uses a local in-memory fallback so you can run and test without deploying AgentCore Memory [19].
```typescript
import {
  BedrockAgentCoreClient,
  CreateEventCommand,
  ListEventsCommand,
  ListSessionsCommand,
  RetrieveMemoryRecordsCommand,
} from "@aws-sdk/client-bedrock-agentcore";

class AgentCoreMemoryAdapter {
  private memoryId: string;
  private useLocalFallback: boolean;

  constructor(config?: { memoryId?: string; namespace?: string }) {
    this.memoryId = config?.memoryId || process.env.AGENTCORE_MEMORY_ID || "";
    // Without a memory ID the adapter falls back to an in-memory store.
    this.useLocalFallback = !this.memoryId;
  }

  // Method bodies are elided in this excerpt.
  async logConversationEvent(actorId: string, sessionId: string, role: "user" | "assistant", content: string) { ... }
  async recall(actorId: string, query: string, topK = 5): Promise<string[]> { ... }
  async listSessions(actorId: string): Promise<{ sessionId: string; actorId: string; createdAt: Date }[]> { ... }
  async listEvents(actorId: string, sessionId: string, options?: { maxResults?: number; includePayloads?: boolean }) { ... }
}

export const memoryAdapter = new AgentCoreMemoryAdapter();
```
The server uses listSessions and listEvents to power GET /sessions and GET /sessions/:id/events for the web app's session list and conversation history.
Memory API Notes
- CreateEvent stores events (short-term) under actor/session; payload supports "Conversational" and "Blob", but only conversational flows into long-term extraction
- RetrieveMemoryRecords is the retrieval/search API with namespace and searchCriteria including searchQuery and topK
- ListSessions and ListEvents support session listing and loading history [21]
Local fallback
When AGENTCORE_MEMORY_ID is not configured, the adapter uses an in-memory store and a local session index so you can develop and test without deploying AgentCore Memory.
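A local fallback of this kind can be quite small. The sketch below is an assumption about what such a store might look like, not the repo's code: LocalMemoryStore and its method shapes are hypothetical, and recall uses naive keyword overlap, which only approximates the search that RetrieveMemoryRecords performs over extracted records.

```typescript
interface StoredEvent {
  actorId: string;
  sessionId: string;
  role: "user" | "assistant";
  content: string;
  createdAt: Date;
}

// In-memory stand-in for AgentCore Memory, intended for local development only.
class LocalMemoryStore {
  private events: StoredEvent[] = [];

  logConversationEvent(actorId: string, sessionId: string, role: "user" | "assistant", content: string): void {
    this.events.push({ actorId, sessionId, role, content, createdAt: new Date() });
  }

  // Naive recall: rank this actor's events by how many query terms they contain.
  recall(actorId: string, query: string, topK = 5): string[] {
    const terms = query.toLowerCase().split(/\W+/).filter((t) => t.length > 2);
    return this.events
      .filter((e) => e.actorId === actorId)
      .map((e) => ({
        content: e.content,
        score: terms.filter((t) => e.content.toLowerCase().includes(t)).length,
      }))
      .filter((scored) => scored.score > 0)
      .sort((a, b) => b.score - a.score)
      .slice(0, topK)
      .map((scored) => scored.content);
  }

  // One entry per session, stamped with the time of its first event.
  listSessions(actorId: string): { sessionId: string; actorId: string; createdAt: Date }[] {
    const firstSeen = new Map<string, Date>();
    for (const e of this.events) {
      if (e.actorId === actorId && !firstSeen.has(e.sessionId)) {
        firstSeen.set(e.sessionId, e.createdAt);
      }
    }
    return Array.from(firstSeen.entries()).map(([sessionId, createdAt]) => ({ sessionId, actorId, createdAt }));
  }

  listEvents(actorId: string, sessionId: string): StoredEvent[] {
    return this.events.filter((e) => e.actorId === actorId && e.sessionId === sessionId);
  }
}

const store = new LocalMemoryStore();
store.logConversationEvent("user-1", "s1", "user", "Summarize battery recycling economics");
store.logConversationEvent("user-1", "s2", "user", "Compare solar panel suppliers");
console.log(store.recall("user-1", "battery recycling"));
console.log(store.listSessions("user-1").map((s) => s.sessionId));
```

Because everything is scoped by actorId, the fallback preserves the same isolation boundary as the real service, which keeps local tests representative of per-user behavior.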
Integrating Browser and Code Interpreter Tools
AgentCore built-in tools are explicitly designed to execute in isolated environments; for example, Browser sessions are session-based and can be interacted with programmatically via WebSocket streaming APIs [22].
On the TypeScript side, the AgentCore SDK README shows Strands integrations for Code Interpreter tools:
```typescript
import { CodeInterpreterTools } from 'bedrock-agentcore/experimental/code-interpreter/strands'

const codeInterpreter = new CodeInterpreterTools({ region: 'us-east-1' })
```
For the Browser tool, plan on a "researcher" specialist that:
- starts a browser session
- navigates/extracts content
- emits tool progress events back to the UI (timeline)
Identity and Authentication Patterns That Work with a Web Frontend
AgentCore Runtime supports two inbound auth patterns:
- IAM SigV4 (default), and
- JWT Bearer token (configured via an OIDC discovery URL and allowed clients/audiences/scopes) [23]
Authentication Constraint
Runtime cannot use both SigV4 and JWT authentication simultaneously in a single runtime version.
Cognito in This Repo
This repo uses Amazon Cognito for web app authentication. The stack provisions a User Pool and a User Pool Client (no client secret, for an SPA) in packages/infra/lib/stack.ts, with callback URLs (e.g. http://localhost:5173/, /auth/callback), token validity, and optional MFA. The web app sends the Cognito ID token to the runtime in the Authorization: Bearer <token> header. In the minimal implementation the server decodes the JWT without verifying its signature; for production, verify the token (e.g. with aws-jwt-verify when COGNITO_USER_POOL_ID is set). The server reads sub as the actorId for AgentCore Memory and for GET /sessions and GET /sessions/:id/events, which ties the session list and conversation history to the authenticated user.
For web apps where you also want outbound OAuth (third-party APIs):
- Use AgentCore Identity to obtain outbound tokens and store them in the token vault, bound to user/workload identity
- Implement an HTTPS callback endpoint in your web app for OAuth session binding (CompleteResourceTokenAuth flow) to prevent authorization URL forwarding attacks [24]
If you keep IAM SigV4 inbound auth but need user-binding, AgentCore supports an X-Amzn-Bedrock-AgentCore-Runtime-User-Id header pattern, with an additional IAM permission (bedrock-agentcore:InvokeAgentRuntimeForUser) and explicit security best practices to prevent "user id spoofing."
AWS CDK Infrastructure and Deployment
This section describes how to deploy the agent and supporting services. This repo uses L1 CloudFormation constructs from aws-cdk-lib/aws-bedrockagentcore (CfnRuntime, CfnMemory, CfnGateway) rather than the alpha L2 module @aws-cdk/aws-bedrock-agentcore-alpha. Conceptual L2-style examples exist in the CDK docs; the description below reflects the actual packages/infra/lib/stack.ts layout [9].
CDK Resource Model: The Primitives You'll Use
CloudFormation provides first-class AgentCore resource types including Runtime, Memory, Gateway, and built-in tool resources. The CDK alpha library (when used) wraps these in L2 constructs [8]; this repo uses the L1 Cfn* types for maximum control.
What This Repo's Stack Provisions
The single ResearchAgentStack in packages/infra/lib/stack.ts creates, in dependency order:
- ECR repository for the agent container; CodeBuild project (ARM64) to build the Docker image; Lambda custom resource to trigger the build on deploy
- Agent IAM role with permissions for: X-Ray (PutTraceSegments, PutTelemetryRecords), CloudWatch Logs and metrics (namespace bedrock-agentcore), Bedrock InvokeModel/InvokeModelWithResponseStream, AgentCore Memory (CreateEvent, RetrieveMemoryRecords, ListSessions, ListEvents), AgentCore Gateway (InvokeGateway), and workload identity tokens (GetWorkloadAccessToken, etc.)
- Cognito User Pool and User Pool Client (no secret, for SPA); stack outputs: UserPoolId, UserPoolClientId, CognitoRegion for the web app
- Lambda toolsFunction (inline handler for health_check, search); CloudWatch log group for the agent
- CfnRuntime: container from ECR (:latest), network PUBLIC, protocol HTTP, role ARN; environment variables include BEDROCK_MODEL_ID, and for AgentCore Observability: AGENT_OBSERVABILITY_ENABLED, OTEL_SERVICE_NAME, OTEL_TRACES_EXPORTER, OTEL_EXPORTER_OTLP_TRACES_ENDPOINT (X-Ray), OTEL_RESOURCE_ATTRIBUTES
- CfnMemory: name research_memory, event expiry 90 days
- CfnGateway: MCP protocol, instructions, supported versions, role ARN, authorizerType: "NONE"; Gateway targets can be attached via Control Plane or future CDK
- Evaluation role and evaluation log group: IAM role for bedrock-agentcore.amazonaws.com (evaluator / online-evaluation-config) with CloudWatch Logs read (trace queries) and write to /aws/bedrock-agentcore/evaluations/*, index policy for aws/spans, and Bedrock InvokeModel; log group for evaluation results; stack outputs: EvaluationRoleArn, EvaluationLogGroupName for use with CreateOnlineEvaluationConfig [30]
```typescript
const agentRuntime = new bedrockagentcore.CfnRuntime(this, "AgentRuntime", {
  agentRuntimeName: "research_agent",
  agentRuntimeArtifact: {
    containerConfiguration: {
      containerUri: `${this.ecrRepository.repositoryUri}:latest`,
    },
  },
  networkConfiguration: { networkMode: "PUBLIC" },
  protocolConfiguration: "HTTP",
  roleArn: this.agentRole.roleArn,
  environmentVariables: {
    AWS_DEFAULT_REGION: this.region,
    BEDROCK_MODEL_ID: "us.anthropic.claude-sonnet-4-20250514-v1:0",
    AGENT_OBSERVABILITY_ENABLED: "true",
    OTEL_SERVICE_NAME: "research_agent",
    OTEL_TRACES_EXPORTER: "otlp",
    OTEL_EXPORTER_OTLP_TRACES_ENDPOINT: `https://xray.${this.region}.amazonaws.com/v1/traces`,
    OTEL_EXPORTER_OTLP_TRACES_PROTOCOL: "http/protobuf",
    OTEL_RESOURCE_ATTRIBUTES: "service.name=research_agent",
  },
});
```

```typescript
const agentMemory = new bedrockagentcore.CfnMemory(this, "AgentMemory", {
  name: "research_memory",
  description: "Memory for research agent conversations",
  eventExpiryDuration: 90,
});

const agentGateway = new bedrockagentcore.CfnGateway(this, "AgentGateway", {
  name: "research-gateway",
  protocolConfiguration: {
    mcp: { instructions: "Tools for the research agent", supportedVersions: ["2025-11-25"] },
  },
  authorizerType: "NONE",
  protocolType: "MCP",
  roleArn: this.agentRole.roleArn,
});
```
Deployment Procedure (Repeatable in CI)
- yarn install
- yarn turbo run build (build agent + web + infra)
- cd packages/infra
- cdk bootstrap (once per account/region)
- cdk deploy
Start the agent with instrumentation loaded first so OTEL is active (e.g. in Docker: CMD ["node", "--import", "./dist/instrument.js", "dist/server.js"], or yarn start using a script that runs node --import ./dist/instrument.js dist/server.js).
VPC Deployment Note
When you deploy VPC-connected runtime or tools, AgentCore may create service-linked roles for network interfaces; ensure the deploying principal has the required IAM permissions [25].
Official Sample Repos Worth Mining
For TypeScript agents and primitives:
- aws/bedrock-agentcore-sdk-typescript (Runtime wrapper + Strands integration examples) [4]
- awslabs/bedrock-agentcore-samples-typescript (TypeScript AgentCore samples; shows BedrockAgentCoreApp usage and tool integrations) [26]
- awslabs/amazon-bedrock-agentcore-samples (broad tutorials: runtime + gateway + MCP hosting samples) [27]
For end-to-end reference architectures (multi-agent + AgentCore):
- aws-samples/sample-strands-agent-with-agentcore — Multi-agent chatbot with Strands and AgentCore (execution, memory, browser automation, collaboration) [28]
Frontend Thinking UI and Operations
Streaming Contract Between Runtime and React
AgentCore /invocations supports SSE streaming [10].
Strands streaming events give you rich semantic events ("beforeToolCall", "afterToolCall", reasoning deltas, etc.) [3].
The best practice is to translate Strands events into a stable UI event schema (in packages/shared) so the frontend never depends on internal SDK event shapes.
```typescript
export type UiEvent =
  | { type: "meta"; sessionId: string }
  | { type: "message.delta"; text: string }
  | { type: "thinking.delta"; text: string }
  | { type: "tool.start"; toolName: string; input: unknown }
  | { type: "tool.end"; toolName: string; output: unknown }
  | { type: "message.done" }
  | { type: "error"; message: string }
  | { type: "run.start" }; // emitted at start of each run so UI can group one assistant message per run
```
React: Streaming SSE over fetch (POST)
Because you need a POST body (the prompt), use fetch() + ReadableStream parsing rather than EventSource (GET-only). This repo's apps/web/src/hooks/useAgentStream.ts accepts optional getAuthToken, userId, and endpoint; it sends sessionId and userId in the body and sets Authorization: Bearer <token> when available. It handles meta (to capture sessionId), run.start (to group one assistant message per run), and error events.
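```typescript
import { useCallback, useState } from "react";
import type { UiEvent } from "@repo/shared/events";

export function useAgentStream(options?: {
  endpoint?: string;
  getAuthToken?: () => Promise<string | null>;
  userId?: string;
}) {
  // DEFAULT_ENDPOINT is a placeholder for the hook's default, which this excerpt does not show.
  const { endpoint = DEFAULT_ENDPOINT, getAuthToken, userId } = options ?? {};
  const [events, setEvents] = useState<UiEvent[]>([]);
  const [sessionId, setSessionId] = useState<string | null>(null);
  // isStreaming, error, and reset are part of the hook but omitted from this excerpt.

  const run = useCallback(async (prompt: string) => {
    // Mark the start of a run so the UI groups one assistant message per run.
    setEvents((prev) => [...prev, { type: "run.start" }]);

    const headers: Record<string, string> = {
      "Content-Type": "application/json",
      Accept: "text/event-stream",
    };
    if (getAuthToken) {
      const token = await getAuthToken();
      if (token) headers["Authorization"] = `Bearer ${token}`;
    }

    const res = await fetch(endpoint, {
      method: "POST",
      headers,
      body: JSON.stringify({ prompt, sessionId: sessionId ?? undefined, userId }),
    });
    if (!res.body) throw new Error("Response has no body to stream");
    const reader = res.body.getReader();
    // ... SSE parse loop: data lines -> JSON.parse ->
    //     if (event.type === "meta") setSessionId(event.sessionId);
    //     setEvents(prev => [...prev, event])
  }, [endpoint, sessionId, getAuthToken, userId]);

  return { events, isStreaming, error, sessionId, setSessionId, run, reset };
}
```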
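The hook above elides its SSE parse loop. Network chunks can end in the middle of a frame, so the parser has to carry an unfinished tail between reads. The sketch below shows one way to do that with a pure function; parseSseBuffer is a hypothetical name, and it handles only the data: lines this guide's server emits, not the full SSE field set (event:, id:, retry:).

```typescript
interface SseParseResult<T> {
  events: T[]; // fully received, successfully parsed payloads
  rest: string; // trailing partial frame to prepend to the next chunk
}

// Parses every complete SSE frame in the buffer and returns the unfinished tail.
function parseSseBuffer<T = unknown>(buffer: string): SseParseResult<T> {
  const events: T[] = [];
  // Normalize CRLF so both line-ending styles split the same way.
  const frames = buffer.replace(/\r\n/g, "\n").split("\n\n");
  // The last element is an incomplete frame, or "" if the buffer ended on a frame boundary.
  const rest = frames.pop() ?? "";

  for (const frame of frames) {
    const data = frame
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, ""))
      .join("\n");
    if (data.length === 0) continue; // comment or keep-alive frame
    try {
      events.push(JSON.parse(data) as T);
    } catch {
      // Skip malformed payloads rather than aborting the whole stream.
    }
  }
  return { events, rest };
}

// Simulate two network chunks where the second event is split across them.
const chunks = [
  'data: {"type":"meta","sessionId":"s1"}\n\ndata: {"type":"mess',
  'age.delta","text":"Hi"}\n\n',
];
let pending = "";
const received: unknown[] = [];
for (const chunk of chunks) {
  const parsed = parseSseBuffer(pending + chunk);
  received.push(...parsed.events);
  pending = parsed.rest;
}
console.log(received);
```

In the browser, each Uint8Array from reader.read() would first be decoded with a TextDecoder using { stream: true }, so multi-byte characters split across chunks are also handled correctly.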
The web app uses Cognito (AuthContext, ProtectedRoute); pass getAuthToken (e.g. an ID token getter) and userId (e.g. user?.sub) into useAgentStream so the runtime receives a JWT and derives actorId from the token's sub for Memory and session listing.
useSessions and useSessionEvents call GET /sessions and GET /sessions/:id/events with Authorization: Bearer <token> to list chat sessions and load conversation history; both rely on the same JWT for actorId on the server.
The server-side protocol contract explicitly describes SSE as a supported response format for /invocations [10].
UI Composition: Chat + Thinking Timeline
This repo's web app uses components from packages/ui (e.g. accordion, thinking timeline, chat transcript) to show tool and reasoning events:
- ChatTranscript: renders assembled message.delta into assistant bubbles
- ThinkingPanel / accordion: renders thinking.delta in a collapsible "scratchpad" (e.g. last four lines of streaming thoughts)
- Timeline: renders tool.start / tool.end in the chat window, expandable to show request parameters and results
Because Strands exposes structured tool and lifecycle events, the timeline is stable and safe without persisting raw "thought text" to memory.
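All three components can read from one derived structure instead of each re-scanning the raw event list. The sketch below folds the event stream into per-run "turns". It restates the event union locally so it is self-contained; StreamEvent, AssistantTurn, and assembleTurns are hypothetical names, and the way errors are folded into the text is an illustrative choice.

```typescript
type StreamEvent =
  | { type: "meta"; sessionId: string }
  | { type: "run.start" }
  | { type: "message.delta"; text: string }
  | { type: "thinking.delta"; text: string }
  | { type: "tool.start"; toolName: string; input: unknown }
  | { type: "tool.end"; toolName: string; output: unknown }
  | { type: "message.done" }
  | { type: "error"; message: string };

interface ToolCallView { toolName: string; input: unknown; output?: unknown; done: boolean }

interface AssistantTurn {
  text: string; // assembled message.delta chunks (chat bubble)
  thinking: string; // assembled thinking.delta chunks (scratchpad, transient)
  tools: ToolCallView[]; // timeline entries
  done: boolean;
}

// Folds a flat event list into one AssistantTurn per run.start.
function assembleTurns(events: StreamEvent[]): AssistantTurn[] {
  const turns: AssistantTurn[] = [];
  const startTurn = (): AssistantTurn => {
    const turn: AssistantTurn = { text: "", thinking: "", tools: [], done: false };
    turns.push(turn);
    return turn;
  };

  let current: AssistantTurn | undefined;
  for (const ev of events) {
    if (ev.type === "meta") continue; // session bookkeeping, not part of a turn
    if (ev.type === "run.start") { current = startTurn(); continue; }

    // Events that arrive before any run.start still get a turn to attach to.
    const turn: AssistantTurn = current ?? startTurn();
    current = turn;

    switch (ev.type) {
      case "message.delta": turn.text += ev.text; break;
      case "thinking.delta": turn.thinking += ev.text; break;
      case "tool.start":
        turn.tools.push({ toolName: ev.toolName, input: ev.input, done: false });
        break;
      case "tool.end": {
        // Close the oldest still-open call with the same tool name.
        const open = turn.tools.find((t) => t.toolName === ev.toolName && !t.done);
        if (open) { open.output = ev.output; open.done = true; }
        break;
      }
      case "message.done": turn.done = true; break;
      case "error": turn.text += `\n[error] ${ev.message}`; turn.done = true; break;
    }
  }
  return turns;
}

const demoTurns = assembleTurns([
  { type: "run.start" },
  { type: "tool.start", toolName: "research_specialist", input: { task: "find sources" } },
  { type: "tool.end", toolName: "research_specialist", output: "3 sources" },
  { type: "message.delta", text: "Here are " },
  { type: "message.delta", text: "three sources." },
  { type: "message.done" },
]);
console.log(JSON.stringify(demoTurns, null, 2));
```

Because the fold is a pure function of the event list, it can be wrapped in useMemo in React and unit tested without rendering anything.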
Observability and Evaluations: Operationalizing Quality
Instrumentation in This Repo
This repo instruments the agent with OpenTelemetry in apps/agent/src/instrument.ts: a NodeSDK with an OTLP trace exporter to AWS X-Ray (same endpoint as the runtime env OTEL_EXPORTER_OTLP_TRACES_ENDPOINT). The tracer must be registered before any other application code. In production the container runs with node --import ./dist/instrument.js dist/server.js (or equivalent); in dev, the start script can use tsx watch --import ./src/instrument.ts src/server.ts so instrumentation loads first [6].
Session correlation: In the invocation handler, set OTEL baggage session.id to the current session ID before calling the orchestrator (as in apps/agent/src/server.ts). Downstream spans then carry the session id, which supports CloudWatch Transaction Search and AgentCore Evaluations.
Observability
AgentCore Observability is tightly coupled to CloudWatch GenAI observability views; first-time setup requires enabling CloudWatch Transaction Search [6].
AgentCore defines sessions/traces/spans, emits default metrics for AgentCore runtime resources, and supports deeper tracing with explicit instrumentation [29].
Plan for:
- minimal "service-level" visibility out of the box
- deeper app-level tracing and custom spans via OpenTelemetry instrumentation (as in this repo's instrument.ts)
Evaluations
AgentCore Evaluations can score your agent based on traces and spans; it integrates with Strands via OTel/OpenInference-style instrumentation [30].
This repo's stack creates an Evaluation execution role and an evaluation log group (/aws/bedrock-agentcore/evaluations/research-agent). Online evaluation configs are created via the Control Plane (CLI, Console, or SDK), not CDK. Online evaluations continuously monitor live traffic and write results to a CloudWatch log group [31].
Enabling Online Evaluators
One-time: Enable CloudWatch Transaction Search
AgentCore Observability and Evaluations require Transaction Search to be enabled once per AWS account so spans are ingested and queryable [6].
- Console: CloudWatch → Setup → Settings → Account → X-Ray traces tab → Transaction Search → View settings → Edit → Enable Transaction Search (e.g. 1% sampling).
- CLI: Use aws logs put-resource-policy to allow X-Ray to write to the aws/spans (and optionally application-signals) log groups, then aws xray update-trace-segment-destination --destination CloudWatchLogs. See this repo's README.md ("Observability and Evaluations") for the exact policy document and steps.
Allow up to ~10 minutes for spans to appear in Transaction Search.
Post-deploy: Create an online evaluation config
After deploying the stack, use the exported Evaluation Role ARN (ResearchAgentEvaluationRoleArn) and Agent Runtime ID (ResearchAgentRuntimeId) from the CDK outputs.
Option 1 – AgentCore starter toolkit CLI
Install the toolkit (pip install bedrock-agentcore-starter-toolkit), then list built-in evaluators and create an online config [32]:
```shell
# List available evaluators (e.g. Builtin.GoalSuccessRate, Builtin.Helpfulness, Builtin.Correctness)
agentcore eval evaluator list

# Create online evaluation (use stack outputs for role ARN and agent ID)
agentcore eval online create \
  --name research_agent_eval \
  --agent-id <AgentRuntimeId> \
  --evaluator Builtin.GoalSuccessRate \
  --evaluator Builtin.Helpfulness \
  --evaluation-execution-role-arn <ResearchAgentEvaluationRoleArn> \
  --sampling-rate 1.0
```
Evaluator levels: SESSION (e.g. goal completion), TRACE (e.g. helpfulness, correctness), TOOL_CALL (tool selection/parameters). Start with a low sampling rate (1–5%) in production. The config transitions to ACTIVE shortly after creation; use agentcore eval online list and agentcore eval online get --config-id <id> to inspect. Results appear under CloudWatch → GenAI Observability → Bedrock AgentCore → your agent → Evaluations tab.
Option 2 – AWS Console
AgentCore → Evaluation → Create evaluation configuration → choose the agent endpoint, select evaluators, set the execution role to the stack output role ARN (ResearchAgentEvaluationRoleArn).
Option 3 – AWS SDK
Use the bedrock-agentcore-control API CreateOnlineEvaluationConfig with dataSourceConfig (agent endpoint or log groups + serviceNames: research_agent), evaluators (e.g. Builtin.GoalSuccessRate, Builtin.Helpfulness), and evaluationExecutionRoleArn set to the stack output [31].
Results are written to a log group in the form /aws/bedrock-agentcore/evaluations/results/<online-evaluation-config-id>. You can update sampling rate or evaluators with the CLI (agentcore eval online update) or the Control Plane API.
Policy Enforcement and Policy Observability
When you attach policy enforcement to Gateways, AgentCore publishes policy invocation metrics to CloudWatch by default; span data becomes available when gateway traces are enabled [33].
A production rollout pattern is:
- Start with policy enforcement in LOG_ONLY to identify what would be denied
- Iterate policies / attributes until false positives are eliminated
- Switch to ENFORCE
Deployment and Auth "Gotchas" That Affect the Frontend
- If your runtime endpoint is JWT/OAuth configured, the HTTP protocol contract defines 401 responses with WWW-Authenticate headers for missing auth, and notes differences from SigV4-configured behavior
- If you integrate OAuth flows (user-delegated outbound access), you must implement session binding to prevent authorization URL forwarding attacks, and register your callback URL against the workload identity
- Strands TypeScript session persistence is not currently supported, so treat AgentCore Memory as your durable "conversation and preference" substrate, with clear separation between:
  - short-term in-session state (in the runtime session)
  - short-term event log (CreateEvent)
  - long-term extracted records (RetrieveMemoryRecords)