Let’s be real: we’ve all had that moment where our AWS Lambda function hit the dreaded 15-minute timeout, and we died a little bit inside.
Up until recently, if you wanted to build a long-running process — say, a 30-day trial onboarding flow or a multi-step document processing pipeline — you had two choices:
But the game changed at re:Invent 2025. Enter AWS Lambda Durable Functions. We can finally write stateful, long-running workflows directly in TypeScript or another supported language. No YAML, no external state management, just pure code.
Imagine you’re reading a massive 1,000-page book. If you have to put it down to sleep, you don’t start from page one the next morning — you use a bookmark.
Durable Functions act as that bookmark for your code. When your function hits a “wait”, AWS “checkpoints” the progress and shuts down the compute. When the event is ready, it “replays” the function, skips the parts it already finished, and picks up exactly where it left off. The idea is simple, but powerful.
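To make the bookmark idea concrete, here is a toy model of checkpoint-and-replay in plain TypeScript. It is not the AWS SDK and makes no claim about how AWS implements this internally; ToyContext, toyWorkflow, and replayDemo are hypothetical names invented for this sketch, and the in-memory Map stands in for whatever persistent store holds the real history.

```typescript
// Toy model only: none of these names come from the AWS SDK.
type ToyHistory = Map<string, unknown>;

class ToyContext {
  // Names of the steps whose body actually ran during this invocation.
  readonly executed: string[] = [];
  private readonly history: ToyHistory;

  constructor(history: ToyHistory) {
    this.history = history;
  }

  async step<T>(name: string, fn: () => Promise<T>): Promise<T> {
    if (this.history.has(name)) {
      // Replay: hand back the checkpointed result without running the body.
      return this.history.get(name) as T;
    }
    const result = await fn();
    this.history.set(name, result); // checkpoint the result
    this.executed.push(name);
    return result;
  }
}

async function toyWorkflow(ctx: ToyContext, crashInPayment: boolean): Promise<string> {
  const inventory = await ctx.step("reserve-inventory", async () => "reserved");
  const payment = await ctx.step("process-payment", async () => {
    if (crashInPayment) throw new Error("simulated crash");
    return "paid";
  });
  return `${inventory}/${payment}`;
}

async function replayDemo(): Promise<{ firstRun: string[]; replay: string[]; result: string }> {
  const history: ToyHistory = new Map(); // survives across invocations

  // First invocation: inventory is checkpointed, then payment crashes.
  const first = new ToyContext(history);
  await toyWorkflow(first, true).catch(() => undefined);

  // Second invocation: the function runs from the top but skips the finished step.
  const second = new ToyContext(history);
  const result = await toyWorkflow(second, false);

  return { firstRun: first.executed, replay: second.executed, result };
}
```

In this sketch the second invocation runs the whole function again from the top, but only the payment step does real work, which is the same "skip what is already done" behaviour the bookmark analogy describes.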
I know what you’re thinking: “Is Step Functions dead?” Not quite. If you are building complex micro-service choreography between 10 different AWS services, Step Functions’ visual designer is still king. But if you are building application logic that just needs to be stateful, Durable Functions is your new best friend.
Let’s look at a real-world scenario: An e-commerce workflow that needs to validate an order, wait for payment, and then ship the goods.
Here is how you would write this in TypeScript using the @aws/durable-execution-sdk-js package.
import { DurableContext, withDurableExecution } from "@aws/durable-execution-sdk-js";
// Imagine these services interact with your DB and APIs
import { inventoryService, paymentService, shippingService } from "./services";
export const handler = withDurableExecution(
  async (event: any, context: DurableContext) => {
    const { orderId, amount, items } = event;

    // 1. Reserve inventory
    // If this fails, the SDK handles the retry. If it succeeds, the result is checkpointed.
    const inventory = await context.step("reserve-inventory", async () => {
      return await inventoryService.reserve(items);
    });

    // 2. Process payment
    const payment = await context.step("process-payment", async () => {
      return await paymentService.charge(amount);
    });

    // 3. Create shipment
    // This step only runs once the previous two are successfully checkpointed.
    const shipment = await context.step("create-shipment", async () => {
      return await shippingService.createShipment(orderId, inventory);
    });

    return {
      orderId,
      status: "completed",
      paymentId: payment.id,
      shipmentTracking: shipment.trackingNumber,
    };
  }
);
This code represents a significant shift in how we build on AWS. By using @aws/durable-execution-sdk-js, you are moving away from “fire-and-forget” functions toward stateful workflows.
Here is a breakdown of what is actually happening under the hood while this code runs.
withDurableExecution
This isn’t just a decorator; it’s a manager. It tracks every interaction you have with the context object and maintains a “History Log” of your execution.
context.step
When your code hits await context.step("reserve-inventory", ...):
Once the step returns the inventory object, the SDK saves that result into a managed state store (hidden from you).
If your process-payment step takes too long or the Lambda environment restarts, AWS triggers the function again.
When the replay reaches context.step("reserve-inventory"), the SDK looks at the history log and says: “I already have the result for this!”
It hands the saved result back to the inventory variable and moves to the next line.
Once you move beyond basic sequential steps, @aws/durable-execution-sdk-js reveals its true power. It allows you to build complex, non-linear systems that feel like a single script but behave like a distributed state machine.
Here is a breakdown of the advanced primitives you can use to orchestrate high-scale applications.
context.wait: The Time-Traveler
This is more than just a sleep() command. When you call wait, your Lambda function stops executing entirely. AWS saves the state and schedules a fresh invocation for the future.
… Date.
context.waitForCallback: The Human Bridge
This is the “Human-in-the-Loop” pattern. It pauses the code until an external system tells it to continue.
The operation hands you a taskToken. You send this token to an external service (like a Slack bot, an email, or a React UI).
That service then calls the SendDurableExecutionCallback API with that specific token.
context.parallel: The Turbo-Charger
Standard Promise.all() is dangerous in serverless because one long-running promise could time out the whole Lambda. context.parallel is different. It tells AWS to manage multiple durable operations simultaneously.
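One way to picture why durable parallel branches are safer than a bare Promise.all() is to give every branch its own checkpoint. The sketch below is a toy under that assumption, not the SDK's context.parallel: branchHistory, durableBranch, runBranches, and parallelDemo are hypothetical names, and the toy still uses Promise.all() internally purely to run the branches side by side. The point is what happens after a failure: the finished branch is not repeated.

```typescript
// Toy sketch only: none of these names come from the AWS SDK.
const branchHistory = new Map<string, string>(); // stands in for persisted checkpoints
const branchRuns: string[] = []; // records every time a branch body really executes

async function durableBranch(name: string, fn: () => Promise<string>): Promise<string> {
  const saved = branchHistory.get(name);
  if (saved !== undefined) return saved; // this branch already finished earlier
  branchRuns.push(name);
  const result = await fn();
  branchHistory.set(name, result); // checkpoint this branch on its own
  return result;
}

async function runBranches(slowBranchFails: boolean): Promise<string[]> {
  return Promise.all([
    durableBranch("resize-image", async () => "resized"),
    durableBranch("scan-virus", async () => {
      if (slowBranchFails) throw new Error("simulated timeout");
      return "clean";
    }),
  ]);
}

async function parallelDemo(): Promise<{ results: string[]; runs: string[] }> {
  // First invocation: one branch finishes, the other fails.
  await runBranches(true).catch(() => undefined);
  // Replay: only the branch without a checkpoint runs again.
  const results = await runBranches(false);
  return { results, runs: branchRuns };
}
```

With a plain Promise.all() and no checkpoints, the retry would redo both branches; here the image resize runs once and only the failed scan is repeated.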
context.invoke: The Modular Architect
This allows you to call other Lambda functions (durable or standard) as a child process of your current workflow.
You can package reusable workflows (like sendEmailWorkflow) and call them from any parent.
Durable functions are powerful, but you need to follow a few best practices to avoid chaos.
When your function “wakes up” from a 7-day wait, it runs from the very beginning to find its place. If your code produces a different result during this replay, the SDK will throw a Non-Deterministic Error.
The Mistake: Using new Date() or Math.random() in the main body.
// ❌ BAD: This will be different every time the function replays
const requestId = Math.random();
The Fix: Wrap these calls in context.step(). This ensures the value is generated once, saved in the checkpoint, and then “replayed” as the exact same value.
// ✅ GOOD: The result is saved to the history after the first run
const requestId = await context.step("get-id", async () => Math.random());
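If you want to see the difference without deploying anything, the following toy harness simulates a replay by running the same workflow body twice against one shared log. ReplayLog, idWorkflow, and replayTwice are hypothetical names made up for this sketch; the real SDK's replay machinery is more involved, and this toy does not raise a non-determinism error, it only shows the values drifting.

```typescript
// Toy replay harness: none of these names come from the AWS SDK.
class ReplayLog {
  private readonly saved = new Map<string, unknown>();

  async step<T>(name: string, fn: () => Promise<T>): Promise<T> {
    if (this.saved.has(name)) return this.saved.get(name) as T; // replayed value
    const value = await fn();
    this.saved.set(name, value); // stored once, on the first run
    return value;
  }
}

async function idWorkflow(log: ReplayLog): Promise<{ unsafeId: number; safeId: number }> {
  // Generated in the main body: a new value on every run of the function.
  const unsafeId = Math.random();
  // Generated inside a step: produced once, then read back from the log.
  const safeId = await log.step("get-id", async () => Math.random());
  return { unsafeId, safeId };
}

async function replayTwice(): Promise<{ stableAcrossReplay: boolean; first: number; second: number }> {
  const log = new ReplayLog(); // stands in for the persisted history
  const first = await idWorkflow(log);
  const second = await idWorkflow(log); // simulated replay after a wake-up
  return {
    stableAcrossReplay: first.safeId === second.safeId,
    first: first.unsafeId,
    second: second.unsafeId,
  };
}
```

The safeId is identical on both passes because the second pass never calls Math.random() for it, while unsafeId is regenerated each time the function body runs.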
It’s a common trap to think the 15-minute Lambda limit is gone. It isn’t.
Each context.step must finish within the 15-minute Lambda timeout.
AWS stores your execution history (the results of every context.step) in a managed state store.
If you return a payload that is too large from inventoryService.reserve(), the checkpointing will fail.
Keep large data outside the history and return only a reference such as an S3_URL from your steps.
Since workflows can live for a year, it’s easy to accidentally leave thousands of executions “Waiting” and racking up state storage costs.
Set a timeout in your waitForCallback and wait operations.
Set RetentionPeriodInDays in the DurableConfig so that finished histories are automatically scrubbed.
With the release of AWS Lambda Durable Functions, the boundary between “simple functions” and “complex orchestrators” has finally vanished. We no longer have to choose between the simplicity of a single code file and the resilience of a multi-step state machine.
Even in 2026, Step Functions remains the king of low-code orchestration and massive service integrations (like connecting S3 to Bedrock without any code). However, Durable Functions is now the clear winner for:
Testing with LocalDurableTestRunner is significantly faster than deploying state machines to the cloud.
Because these functions can run for months, AWS introduced a dedicated Durable Executions tab in the Lambda Console. You can now see a visual timeline of every execution, including exactly when a function “went to sleep” and which specific context.step is currently running.
The true power of this SDK is that it lets you focus on business logic. You don’t have to write “plumbing” code for retries, database checkpoints, or state management. You simply write your function from top to bottom, and AWS ensures that it stays alive until the job is done.
The 15-minute wall has been torn down. It’s time to start building applications that can truly go the distance.