Webhook Architecture: Building Reliable Event-Driven Integrations
Every SaaS product eventually needs to talk to the outside world in real time. A user completes a payment, and your system needs to notify a third-party fulfillment service. A subscription renews, and your client’s internal tools need to update. The standard answer is webhooks — HTTP callbacks that push event data to a URL when something happens.
The concept is simple. The implementation is not. Webhooks touch networking, security, concurrency, and failure handling all at once. We have built webhook systems for several products, both as senders (notifying external systems) and receivers (consuming events from Stripe, GitHub, and other services). This post covers the architecture we use for both sides.
Sending Webhooks: The Naive Approach and Why It Breaks
The tempting first implementation is to fire an HTTP request inline whenever an event occurs:
// Don't do this
async function handlePaymentCompleted(payment: Payment) {
await savePayment(payment);
// This blocks the request on a third-party server, and a failure is never retried
await fetch(webhook.url, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ event: "payment.completed", data: payment }),
});
}
This breaks in several ways. The external server might be slow, temporarily down, or return errors. Your request handler now depends on a third-party server’s availability. If the fetch times out, your user’s payment flow hangs. If you catch and ignore the error, the webhook is silently lost.
The correct approach is to decouple event creation from delivery.
The Queue-Based Architecture
Every reliable webhook system follows the same pattern: persist the event first, deliver it asynchronously, and retry on failure.
Here is the database schema we use:
CREATE TABLE webhook_events (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
event_type text NOT NULL,
payload jsonb NOT NULL,
created_at timestamptz NOT NULL DEFAULT now()
);
CREATE TABLE webhook_subscriptions (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
endpoint_url text NOT NULL,
secret text NOT NULL,
events text[] NOT NULL, -- array of event types to subscribe to
active boolean NOT NULL DEFAULT true,
created_at timestamptz NOT NULL DEFAULT now()
);
CREATE TABLE webhook_deliveries (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
event_id uuid NOT NULL REFERENCES webhook_events(id),
subscription_id uuid NOT NULL REFERENCES webhook_subscriptions(id),
status text NOT NULL DEFAULT 'pending', -- pending, success, failed, exhausted
attempts integer NOT NULL DEFAULT 0,
next_retry_at timestamptz,
last_response_status integer,
last_response_body text,
created_at timestamptz NOT NULL DEFAULT now()
);
CREATE INDEX idx_deliveries_pending ON webhook_deliveries (next_retry_at)
WHERE status IN ('pending', 'failed');
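When an event occurs, you insert into webhook_events and create a webhook_deliveries row for each matching subscription. A background worker picks up pending deliveries and attempts them. In production, wrap these inserts in the same database transaction as the change that triggered the event, so an event is never committed without its deliveries; the example below omits that for brevity.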
async function createWebhookEvent(eventType: string, payload: Record<string, unknown>) {
const event = await db.webhookEvents.insert({
event_type: eventType,
payload,
});
const subscriptions = await db.webhookSubscriptions.findMany({
where: {
active: true,
events: { contains: [eventType] },
},
});
const deliveries = subscriptions.map((sub) => ({
event_id: event.id,
subscription_id: sub.id,
status: "pending",
next_retry_at: new Date(),
}));
await db.webhookDeliveries.insertMany(deliveries);
}

Signing Webhooks for Security
Every webhook request must be signed so the receiver can verify it actually came from your system. The standard approach is HMAC-SHA256 with a shared secret:
import { createHmac } from "node:crypto";
function signPayload(payload: string, secret: string): string {
return createHmac("sha256", secret).update(payload).digest("hex");
}
async function deliverWebhook(delivery: WebhookDelivery, subscription: WebhookSubscription, event: WebhookEvent) {
const body = JSON.stringify({
id: event.id,
event: event.event_type,
created_at: event.created_at,
data: event.payload,
});
const timestamp = Math.floor(Date.now() / 1000).toString();
const signature = signPayload(`${timestamp}.${body}`, subscription.secret);
const response = await fetch(subscription.endpoint_url, {
method: "POST",
headers: {
"Content-Type": "application/json",
"X-Webhook-Id": event.id,
"X-Webhook-Timestamp": timestamp,
"X-Webhook-Signature": `sha256=${signature}`,
},
body,
signal: AbortSignal.timeout(10_000), // 10 second timeout
});
return response;
}
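Including a timestamp in the signed payload limits replay attacks. The receiver should reject any request whose timestamp is more than a few minutes old, so a captured request can only be replayed inside that window. Deduplicating on the event ID covers replays that arrive within it.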
Retry Logic with Exponential Backoff
External servers go down. Networks have hiccups. Your retry strategy determines whether a temporary outage causes permanent data loss. We use exponential backoff with jitter:
const MAX_ATTEMPTS = 8;
function getNextRetryDelay(attempt: number): number {
// Exponential backoff, called with attempt = 1..7: 30s, 1.5m, 4.5m, 13.5m, 40.5m, ~2h, 6h
const baseDelay = Math.min(10 * Math.pow(3, attempt), 21600); // seconds, capped at 6 hours
const jitter = baseDelay * 0.2 * Math.random(); // adds 0-20% jitter (never subtracts)
return (baseDelay + jitter) * 1000;
}
async function processDelivery(delivery: WebhookDelivery) {
const subscription = await db.webhookSubscriptions.findById(delivery.subscription_id);
const event = await db.webhookEvents.findById(delivery.event_id);
try {
const response = await deliverWebhook(delivery, subscription, event);
if (response.ok) {
await db.webhookDeliveries.update(delivery.id, {
status: "success",
attempts: delivery.attempts + 1,
last_response_status: response.status,
});
return;
}
// Non-2xx response — schedule retry
const responseBody = await response.text().catch(() => "");
await scheduleRetry(delivery, response.status, responseBody);
} catch (err) {
// Network error or timeout
await scheduleRetry(delivery, null, err instanceof Error ? err.message : "Unknown error");
}
}
async function scheduleRetry(delivery: WebhookDelivery, status: number | null, responseBody: string) {
const nextAttempt = delivery.attempts + 1;
if (nextAttempt >= MAX_ATTEMPTS) {
await db.webhookDeliveries.update(delivery.id, {
status: "exhausted",
attempts: nextAttempt,
last_response_status: status,
last_response_body: responseBody,
});
// Optionally notify the subscription owner
return;
}
const delay = getNextRetryDelay(nextAttempt);
await db.webhookDeliveries.update(delivery.id, {
status: "failed",
attempts: nextAttempt,
next_retry_at: new Date(Date.now() + delay),
last_response_status: status,
last_response_body: responseBody,
});
}
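The retry schedule above makes seven retries after the initial attempt and gives the receiving server a little over 9 hours to recover (closer to 11 if every retry draws the maximum jitter) before the delivery is marked as exhausted. This is generous enough to survive most outages without holding onto events forever.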
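To sanity-check that figure, the base delays (ignoring jitter) can be tabulated in a few lines. This is a minimal sketch that mirrors the formula in getNextRetryDelay; baseDelaySeconds and retrySchedule are illustrative names, not part of the code above.

```typescript
const MAX_ATTEMPTS = 8;
const CAP_SECONDS = 21_600; // 6 hours, the same cap used in getNextRetryDelay

// Base delay (no jitter) scheduled after failed attempt number `attempt`.
function baseDelaySeconds(attempt: number): number {
  return Math.min(10 * Math.pow(3, attempt), CAP_SECONDS);
}

// Delays scheduled after failed attempts 1..MAX_ATTEMPTS-1.
// The final failure is marked exhausted instead of being rescheduled.
function retrySchedule(): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt < MAX_ATTEMPTS; attempt++) {
    delays.push(baseDelaySeconds(attempt));
  }
  return delays;
}

const schedule = retrySchedule();
const totalSeconds = schedule.reduce((sum, delay) => sum + delay, 0);

console.log(schedule); // [30, 90, 270, 810, 2430, 7290, 21600]
console.log((totalSeconds / 3600).toFixed(2)); // "9.03"
```

Because the jitter only ever adds time, the real window is somewhere between this total and 20% more.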
The Delivery Worker
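The background worker polls for pending deliveries and processes them. In production, we run this as a separate process or use a proper job queue (pg-boss works well with PostgreSQL). If you run more than one worker against this table, each worker must claim rows atomically (for example with SELECT ... FOR UPDATE SKIP LOCKED) so two workers never attempt the same delivery: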
import pMap from "p-map";
import { setTimeout as sleep } from "node:timers/promises";
async function runWebhookWorker() {
while (true) {
const deliveries = await db.webhookDeliveries.findMany({
where: {
status: { in: ["pending", "failed"] },
next_retry_at: { lte: new Date() },
},
orderBy: { next_retry_at: "asc" },
limit: 50,
});
if (deliveries.length === 0) {
await sleep(5000); // Poll every 5 seconds when idle
continue;
}
// Process in parallel with concurrency limit
await pMap(deliveries, processDelivery, { concurrency: 10 });
}
}
For higher throughput, replace the polling loop with a PostgreSQL LISTEN/NOTIFY trigger or a dedicated job queue.
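The worker above leans on pMap for bounded parallelism. If you would rather not add a dependency, the same idea can be sketched by hand: start a fixed number of runners that each pull the next item until none are left. mapWithConcurrency is a hypothetical helper name, and this version makes no attempt to match every p-map option.

```typescript
// Runs `worker` over `items` with at most `concurrency` calls in flight.
// Results keep the order of the input. Rejects as soon as any call rejects.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  worker: (item: T) => Promise<R>,
  concurrency: number
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  // Each runner claims the next index synchronously, so no index is taken twice.
  async function run(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }

  const runnerCount = Math.max(1, Math.min(concurrency, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => run()));
  return results;
}
```

In the worker loop this would replace the pMap call with mapWithConcurrency(deliveries, processDelivery, 10).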

Receiving Webhooks: Verification and Idempotency
On the receiving side, two things matter above all else: verify the signature and handle duplicates.
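Here is how we verify incoming webhooks. The scheme is modeled on Stripe’s (an HMAC-SHA256 over the timestamp and the raw body), since we integrate with Stripe in several projects including MindHyv and LancerSpace. Stripe itself packs the timestamp and signature into a single Stripe-Signature header, while this example reads them from separate headers: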
import { createHmac, timingSafeEqual } from "node:crypto";
function verifyWebhookSignature(
body: string,
signature: string,
timestamp: string,
secret: string
): boolean {
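// Reject missing, malformed, or old timestamps (> 5 minutes)
const sentAt = Number.parseInt(timestamp, 10);
if (!Number.isFinite(sentAt)) return false; // NaN would otherwise slip past the age check
const age = Math.abs(Date.now() / 1000 - sentAt);
if (age > 300) return false;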
const expected = createHmac("sha256", secret)
.update(`${timestamp}.${body}`)
.digest("hex");
const expectedBuffer = Buffer.from(expected, "hex");
const receivedBuffer = Buffer.from(signature.replace("sha256=", ""), "hex");
if (expectedBuffer.length !== receivedBuffer.length) return false;
return timingSafeEqual(expectedBuffer, receivedBuffer);
}
Use timingSafeEqual instead of === to prevent timing attacks. It throws if the two buffers differ in length, which is why the length check comes first. This is not paranoia; it is standard practice.
For idempotency, store processed event IDs and skip duplicates:
async function handleIncomingWebhook(req: Request) {
const body = await req.text();
const signature = req.headers.get("x-webhook-signature") ?? "";
const timestamp = req.headers.get("x-webhook-timestamp") ?? "";
const eventId = req.headers.get("x-webhook-id") ?? "";
if (!verifyWebhookSignature(body, signature, timestamp, WEBHOOK_SECRET)) {
return new Response("Invalid signature", { status: 401 });
}
// Idempotency check
const existing = await db.processedWebhookEvents.findById(eventId);
if (existing) {
return new Response("Already processed", { status: 200 });
}
// Process the event
const event = JSON.parse(body);
await processEvent(event);
// Record that we processed this event
await db.processedWebhookEvents.insert({ id: eventId, processed_at: new Date() });
return new Response("OK", { status: 200 });
}
Return 200 before doing heavy processing. The handler above processes inline for brevity, which is only safe while processEvent is fast. If your handler takes too long, the sender will time out and retry, creating duplicates. For complex event handling, persist the event, acknowledge receipt immediately, and process asynchronously.
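Acknowledging first and processing later might look like the sketch below. It also claims the event ID in a single step rather than check-then-insert, since two concurrent deliveries of the same event could otherwise both pass the check. The in-memory Set and array are stand-ins for a table with a unique key and a durable job queue, and every name here (claimEvent, acknowledge, drain) is hypothetical.

```typescript
type IncomingEvent = { id: string; event: string; data: unknown };

// Stand-ins for a table keyed on event ID and a durable job queue.
const claimedIds = new Set<string>();
const workQueue: IncomingEvent[] = [];

// Returns true only for the first caller to claim a given ID.
// With a database this would be one insert that no-ops on a key conflict.
function claimEvent(id: string): boolean {
  if (claimedIds.has(id)) return false;
  claimedIds.add(id);
  return true;
}

// Called after signature verification. Does no heavy work, so it responds fast.
function acknowledge(rawBody: string): { status: number; body: string } {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawBody);
  } catch {
    return { status: 400, body: "Malformed JSON" };
  }
  const event = parsed as IncomingEvent | null;
  if (event === null || typeof event !== "object" || typeof event.id !== "string") {
    return { status: 400, body: "Missing event id" };
  }
  if (!claimEvent(event.id)) {
    return { status: 200, body: "Already received" };
  }
  workQueue.push(event);
  return { status: 200, body: "Accepted" };
}

// Runs outside the request path (a worker loop, a cron job, and so on).
function drain(handler: (event: IncomingEvent) => void): number {
  let processed = 0;
  let event: IncomingEvent | undefined;
  while ((event = workQueue.shift()) !== undefined) {
    handler(event);
    processed++;
  }
  return processed;
}
```

One caveat with this design: the claim and the enqueue need to commit together. If the ID is recorded but the job is lost, a later retry from the sender will be skipped as a duplicate.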
Event Replay and Recovery
Things go wrong. Your receiver had a bug that rejected valid events. A deployment went sideways and your endpoint was down for 20 minutes. You need a way to replay events.
If you are the sender, expose a replay endpoint:
// POST /api/webhooks/replay
// Body: { subscription_id: string, from: string, to: string }
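async function replayEvents(subscriptionId: string, from: Date, to: Date) {
const subscription = await db.webhookSubscriptions.findById(subscriptionId);
const events = await db.webhookEvents.findMany({
where: {
// Only replay event types this subscription actually subscribes to
event_type: { in: subscription.events },
created_at: { gte: from, lte: to },
},
orderBy: { created_at: "asc" },
});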
for (const event of events) {
await db.webhookDeliveries.insert({
event_id: event.id,
subscription_id: subscriptionId,
status: "pending",
next_retry_at: new Date(),
});
}
}
If you are the receiver, your idempotency check handles replays naturally. Events you already processed get skipped. Events you missed get processed.

Monitoring and Alerting
A webhook system without monitoring is a webhook system that silently fails. Track these metrics:
Delivery success rate — what percentage of deliveries succeed on the first attempt? A drop signals either your system or a subscriber’s endpoint is having issues.
Time to delivery — how long between event creation and successful delivery? This should be seconds for first attempts and follow your retry schedule for retries.
Exhausted deliveries — events that failed all retry attempts. These represent data loss from the subscriber’s perspective and should trigger alerts.
We log every delivery attempt with the response status and body. When debugging integration issues with clients, this log is invaluable. “Your server returned a 500 with this error message at 3:47 PM” is a much better support interaction than “it’s not working.”
-- Dashboard query: delivery health over the last 24 hours
SELECT
date_trunc('hour', created_at) AS hour,
status,
count(*) AS delivery_count,
avg(attempts) AS avg_attempts
FROM webhook_deliveries
WHERE created_at > now() - interval '24 hours'
GROUP BY hour, status
ORDER BY hour DESC;
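The first-attempt success rate and the exhausted count described above can also be computed in application code, for example to feed an alert. A minimal sketch, assuming rows shaped like the webhook_deliveries table; DeliveryRow and summarizeDeliveries are illustrative names.

```typescript
type DeliveryStatus = "pending" | "success" | "failed" | "exhausted";
type DeliveryRow = { status: DeliveryStatus; attempts: number };

type DeliverySummary = {
  total: number;
  // Share of deliveries with at least one attempt that succeeded on attempt 1.
  firstAttemptSuccessRate: number;
  exhausted: number;
};

function summarizeDeliveries(rows: readonly DeliveryRow[]): DeliverySummary {
  // Rows that have never been attempted say nothing about success yet.
  const attempted = rows.filter((row) => row.attempts > 0);
  const firstTry = attempted.filter(
    (row) => row.status === "success" && row.attempts === 1
  );
  return {
    total: rows.length,
    firstAttemptSuccessRate:
      attempted.length === 0 ? 0 : firstTry.length / attempted.length,
    exhausted: rows.filter((row) => row.status === "exhausted").length,
  };
}
```

This relies on the worker setting attempts to 1 when the first try succeeds, as processDelivery does.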
Patterns We Have Learned the Hard Way
Do not fan out inline. If an event has 50 subscribers, do not fire 50 HTTP requests synchronously. Insert 50 delivery records and let the worker handle them.
Set aggressive timeouts. We use 10 seconds. If a receiver cannot acknowledge a webhook in 10 seconds, something is wrong on their end. You should not let their slowness back up your queue.
Include enough context in the payload. Send the full object state, not just an ID. Receivers should not have to make an API call back to your system to get the data they need. This reduces coupling and makes the integration more reliable.
Version your webhook payloads. Include a version field in the payload envelope. When you change the data shape, bump the version. This gives receivers time to update their handlers.
Treat webhooks as at-least-once delivery. Never guarantee exactly-once. The receiver must be idempotent. This is a fundamental constraint of distributed systems, not a limitation of your implementation.
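The versioning advice is easier to act on with a concrete shape. One option is a discriminated union on the envelope's version field, so the receiver's compiler forces every version to be handled. The payload shapes below are invented for illustration: the version numbers, amount_cents, the nested amount object, and the USD default are all assumptions, not fields from any real API.

```typescript
// Hypothetical version 1: a flat amount in cents, currency implied.
type PaymentCompletedV1 = {
  version: 1;
  event: "payment.completed";
  data: { id: string; amount_cents: number };
};

// Hypothetical version 2: amount (still in cents) carries its currency.
type PaymentCompletedV2 = {
  version: 2;
  event: "payment.completed";
  data: { id: string; amount: { value: number; currency: string } };
};

type PaymentCompleted = PaymentCompletedV1 | PaymentCompletedV2;

type InternalPayment = { id: string; cents: number; currency: string };

// Normalize every supported version into one internal shape,
// so the rest of the receiver never has to look at the version.
function toInternalPayment(envelope: PaymentCompleted): InternalPayment {
  switch (envelope.version) {
    case 1:
      // Assumption for this sketch: v1 payments were always in USD.
      return { id: envelope.data.id, cents: envelope.data.amount_cents, currency: "USD" };
    case 2:
      return {
        id: envelope.data.id,
        cents: envelope.data.amount.value,
        currency: envelope.data.amount.currency,
      };
  }
}
```

Adding a third version to the union makes the switch non-exhaustive, which TypeScript reports as a missing return path in strict mode.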
We covered some of these distributed systems patterns in our post on TypeScript patterns we use in production, particularly around type-safe event handling.
Wrapping Up
Webhook architecture is one of those areas where the gap between “works in development” and “works in production” is massive. The queue-based delivery pattern, HMAC signatures, exponential backoff retries, and idempotent receivers are not optional — they are the minimum viable architecture for a system that other teams will depend on.
If you are building an integration layer for your product and want to get webhooks right from the start, reach out at [email protected].