Database Migrations Without Downtime: A Step-by-Step Guide

At some point, every growing application needs to change its database schema. Add a column, rename a field, change a data type, restructure a relationship. In development, you drop the database and start fresh. In production, your users are making requests while you alter the table they are reading from.

The default approach — write a migration, deploy it, hope it is fast — works until it does not. We learned this the hard way on an early version of LancerSpace when an ALTER TABLE on the invoices table locked it for 12 seconds during a weekday afternoon. Twelve seconds does not sound like much. It was enough for dozens of API requests to time out, the connection pool to saturate, and the health check to fail and trigger a restart.

That was the last time we deployed a migration without a plan. Here is how we do it now.

Server maintenance and system update process in progress

Why ALTER TABLE Can Lock Your Database

PostgreSQL uses locks to maintain data consistency during schema changes. The important locks for our purposes are:

ACCESS EXCLUSIVE — blocks all reads and writes. Required by ALTER TABLE ... ADD COLUMN ... DEFAULT (before PostgreSQL 11), ALTER TABLE ... ALTER COLUMN TYPE, and DROP COLUMN.
SHARE UPDATE EXCLUSIVE — blocks other schema changes but allows reads and writes. Used by CREATE INDEX CONCURRENTLY.
ROW EXCLUSIVE — the normal lock for INSERT/UPDATE/DELETE. Does not block reads.

The killer is ACCESS EXCLUSIVE. When PostgreSQL takes this lock, every query against that table — reads included — queues up and waits. On a busy table, even a millisecond of lock time means dozens of queued queries. If the migration takes seconds (rewriting the table, validating a constraint), those queued queries can cascade into connection pool exhaustion and application-level timeouts.

Since PostgreSQL 11, adding a column with a constant default value is fast — it writes the default to the catalog rather than rewriting every row. But many other operations still require a full table rewrite or a long-held lock.

-- SAFE in PostgreSQL 11+: adds column without rewriting rows
ALTER TABLE invoices ADD COLUMN currency text DEFAULT 'USD';

-- DANGEROUS: rewrites the entire table, holds ACCESS EXCLUSIVE lock
ALTER TABLE invoices ALTER COLUMN amount TYPE numeric(12,2);

-- DANGEROUS: validates every row, long lock on large tables
ALTER TABLE invoices ADD CONSTRAINT positive_amount CHECK (amount > 0);

The Expand-Contract Pattern

The expand-contract pattern (also called parallel change) is the foundation of zero-downtime migrations. Instead of changing a column in place, you:

Expand: Add the new structure alongside the old one.
Migrate: Copy data from old to new, update application code to write to both.
Contract: Remove the old structure once everything uses the new one.

Each step is a separate deployment. Each step is individually safe. If something goes wrong at any step, you can stop without data loss or downtime.

Let us walk through a real example: renaming a column from name to full_name on a clients table.

Step 1: Expand — Add the New Column

-- Migration 001: Add new column
ALTER TABLE clients ADD COLUMN full_name text;

This is safe. Adding a nullable column with no default takes a near-instant catalog-only lock in PostgreSQL. The old name column still exists and works. The application code does not need to change yet.

Step 2: Backfill and Dual-Write

Deploy application code that writes to both columns:

// Updated insert/update code — writes to both columns
async function updateClient(id: string, fullName: string) {
  const { error } = await supabase
    .from('clients')
    .update({
      name: fullName,       // Old column — still active
      full_name: fullName,  // New column — being populated
    })
    .eq('id', id);
}

Then backfill existing rows:

-- Migration 002: Backfill existing data
-- Do this in batches to avoid long-running transactions
UPDATE clients
SET full_name = name
WHERE full_name IS NULL
  AND id IN (
    SELECT id FROM clients
    WHERE full_name IS NULL
    LIMIT 1000
  );

-- Repeat until all rows are backfilled
-- In production, run this as a script that batches automatically

For large tables, always batch your backfills. A single UPDATE clients SET full_name = name on a million-row table holds a row-level lock on every row for the duration of the transaction. Batching in chunks of 1,000-5,000 rows keeps each transaction short and lets normal traffic interleave.

// Backfill script — batched for production safety
async function backfillFullName() {
  let updated = 0;
  const BATCH_SIZE = 1000;

  while (true) {
    const { data, error } = await supabase.rpc('backfill_full_name_batch', {
      batch_size: BATCH_SIZE,
    });

    if (error) throw error;
    if (data === 0) break; // No more rows to update

    updated += data;
    console.log(`Backfilled ${updated} rows`);

    // Small delay between batches to reduce load
    await new Promise(r => setTimeout(r, 100));
  }

  console.log(`Backfill complete: ${updated} total rows`);
}

-- The database function for batched backfill
create or replace function backfill_full_name_batch(batch_size int)
returns int
language plpgsql
as $$
declare
  rows_updated int;
begin
  with batch as (
    select id
    from clients
    where full_name is null
    limit batch_size
    for update skip locked  -- Skip rows being modified by other transactions
  )
  update clients c
  set full_name = c.name
  from batch
  where c.id = batch.id;

  get diagnostics rows_updated = row_count;
  return rows_updated;
end;
$$;

The FOR UPDATE SKIP LOCKED is important. It prevents the backfill from blocking — or being blocked by — normal application queries that are updating the same rows.

Zero downtime deployment with seamless system transitions

Step 3: Switch Reads

Once backfill is complete and you have verified that all rows have full_name populated, update the application code to read from the new column:

// Now reading from full_name, still writing to both
async function getClient(id: string) {
  const { data } = await supabase
    .from('clients')
    .select('id, full_name, email')  // Reading from new column
    .eq('id', id)
    .single();

  return data;
}

Step 4: Contract — Remove the Old Column

Once you are confident that no code reads from the old name column — check your application code, any scheduled jobs, any reporting queries, any RLS policies — drop it:

-- Migration 003: Drop old column (after verification period)
ALTER TABLE clients DROP COLUMN name;

Dropping a column takes an ACCESS EXCLUSIVE lock, but it is near-instant because it only modifies the catalog (the data is not physically removed until a VACUUM). The brief lock is safe.

Safe Index Creation

Creating an index on a large table is another operation that can cause downtime if done carelessly. A standard CREATE INDEX takes an ACCESS EXCLUSIVE lock for the entire duration of the index build — which can be minutes on a table with millions of rows.

The solution is CREATE INDEX CONCURRENTLY:

-- DANGEROUS: locks the table for the entire build duration
CREATE INDEX idx_invoices_client_id ON invoices(client_id);

-- SAFE: builds the index without blocking reads or writes
CREATE INDEX CONCURRENTLY idx_invoices_client_id ON invoices(client_id);

CONCURRENTLY builds the index in the background. It takes longer and uses more resources, but it does not block normal operations. The tradeoff is worth it for any table serving production traffic.

One caveat: CREATE INDEX CONCURRENTLY cannot run inside a transaction. If you are using a migration tool that wraps each migration in a transaction (most do), you need to configure it to run this migration outside a transaction.

In Supabase migrations:

-- supabase/migrations/20260205_add_invoice_index.sql
-- This migration must be run outside a transaction
-- Supabase migrations run outside transactions by default

CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_invoices_client_id
ON invoices(client_id);

Safe Constraint Addition

Adding a constraint like NOT NULL or a CHECK constraint validates every existing row, which holds a lock for the duration of the validation. On a table with millions of rows, this can take minutes.

The safe approach: add the constraint as NOT VALID, then validate it separately.

-- Step 1: Add constraint without validating existing rows (instant)
ALTER TABLE invoices
ADD CONSTRAINT positive_amount CHECK (amount > 0) NOT VALID;

-- New rows are validated against the constraint immediately
-- Existing rows are NOT validated yet

-- Step 2: Validate existing rows (does not block writes)
ALTER TABLE invoices VALIDATE CONSTRAINT positive_amount;

The NOT VALID addition is near-instant — it only modifies the catalog. New inserts and updates are validated immediately. The VALIDATE CONSTRAINT step scans existing rows but only takes a SHARE UPDATE EXCLUSIVE lock, which allows reads and writes to continue.

The same pattern works for NOT NULL:

Technology system upgrade with modern hardware and software components

-- Instead of: ALTER TABLE clients ALTER COLUMN email SET NOT NULL;
-- Which scans the entire table under ACCESS EXCLUSIVE lock

-- Step 1: Add a check constraint (equivalent to NOT NULL)
ALTER TABLE clients
ADD CONSTRAINT clients_email_not_null CHECK (email IS NOT NULL) NOT VALID;

-- Step 2: Validate in the background
ALTER TABLE clients VALIDATE CONSTRAINT clients_email_not_null;

-- Step 3: Now safe to add the actual NOT NULL (instant, since constraint exists)
ALTER TABLE clients ALTER COLUMN email SET NOT NULL;

-- Step 4: Drop the redundant check constraint
ALTER TABLE clients DROP CONSTRAINT clients_email_not_null;

Changing Column Types

Changing a column type is one of the most dangerous operations because it rewrites the entire table. A table with 10 million rows can take minutes to rewrite, all under ACCESS EXCLUSIVE lock.

The expand-contract pattern handles this cleanly:

-- Changing `price` from integer (cents) to numeric(10,2) (dollars)

-- Step 1: Add new column with desired type
ALTER TABLE products ADD COLUMN price_decimal numeric(10,2);

-- Step 2: Backfill (batched)
UPDATE products SET price_decimal = price / 100.0
WHERE price_decimal IS NULL AND id IN (
  SELECT id FROM products WHERE price_decimal IS NULL LIMIT 1000
);

-- Step 3: (in application code) dual-write, switch reads
-- Step 4: Drop old column
ALTER TABLE products DROP COLUMN price;

-- Step 5: Rename new column (instant catalog operation)
ALTER TABLE products RENAME COLUMN price_decimal TO price;

Our Migration Checklist

Every migration we write goes through this checklist before it reaches production:

Does this migration take a lock? If yes, what kind, and for how long?
Can this be run with CONCURRENTLY or NOT VALID? If yes, use them.
Does this require a table rewrite? If yes, use the expand-contract pattern.
Is there a backfill? If yes, batch it. Never update millions of rows in a single transaction.
Can this migration be reversed? Every expand step should be reversible by dropping the new structure. Every contract step should be deferred until the expand is fully verified.
Has this been tested against a production-sized dataset? Migrations that are instant on a 100-row dev database can lock for minutes on a million-row production table.

We automate this as much as possible. For our Supabase projects, including MindHyv and Trackelio, migrations go through a staging environment with a copy of production data (scrubbed of PII) before they touch production. This catches size-dependent performance issues before they matter.

Related topics we have covered: multi-tenant SaaS architecture discusses how RLS policies interact with schema changes, and shipping fast without breaking things covers the broader deployment practices that make safe migrations part of a reliable release process.

If you are dealing with a tricky migration on a production database and want a second opinion, reach out at [email protected]. We have been through enough migration incidents to know what goes wrong and how to prevent it.