Data Engineering

Your lakehouse migration keeps slipping.

The platform works; the old warehouse is still on, still paid for, still trusted. Why migrations stall at eighty percent — and the staged, workload-by-workload path that actually finishes one.

Zak Data Solutions · June 18, 2026

Almost every lakehouse migration reaches a point where it is perpetually “eighty percent done.” The new platform works. A handful of pipelines run on it. And yet the old warehouse is still on, still paid for, still the thing people actually open for the month-end number. The migration has not failed — it has stalled. And a stalled migration is more expensive than no migration at all, because you are now running two systems and fully trusting neither.

Migrations stall for reasons that are predictable and, mostly, avoidable. The same pattern repeats across teams:

  1. Scope is the whole estate. The plan is “move everything to the lakehouse,” so there is no moment where any part is finished. Progress is measured against a denominator that never shrinks.
  2. No first workload was chosen on purpose. The team starts with whatever is in front of it, usually either something trivial that proves nothing or something business-critical that is too risky to cut over. Momentum never compounds.
  3. “Done” was never defined per workload. Without explicit parity criteria (same numbers within tolerance, acceptable freshness, understood cost), “migrated” is a feeling, not a checkpoint, so the old system stays on just in case.
  4. Dual-run has no exit. Running old and new in parallel is the right way to build confidence, but with no reconciliation gate and no decommission date, parallel quietly becomes permanent and the cost doubles indefinitely.
  5. Cutover fear with no rollback. Nobody wants to flip the switch when the only plan is forward, so the switch never flips. The fear is rational; the missing piece is a tested way back.
  6. The old system is never turned off. Even when a workload fully moves, the legacy path keeps running because shutting it down feels riskier than leaving it on, so the savings that justified the migration never actually arrive.

None of these are technology problems. The lakehouse is fine. The problem is that the migration was framed as one large cutover instead of a sequence of small, finishable ones.

A staged de-risking path

Migrations finish workload by workload, each one carried through the same short lifecycle before the next begins. The discipline is in finishing each stage — not in starting many.

  1. Choose one workload with high value and low blast radius. Not the most critical report, and not a toy. A workload that matters enough to prove the platform and is contained enough that a problem is recoverable. The first finished migration is what makes the second one credible.
  2. Define “done” before you move it. Write the parity criteria down: the numbers match within an agreed tolerance, freshness meets the decision the data feeds, and cost is understood. “Done” should be a checklist someone can sign, not a vibe.
  3. Dual-run with a reconciliation gate, not forever. Run old and new side by side, compare outputs automatically, and set the exit condition in advance: for example, a fixed number of consecutive periods at parity, then cut over. Dual-run is a bridge with a far end, not a place to live.
  4. Cut over with a tested rollback. Before the switch, prove you can switch back. A rehearsed rollback turns an irreversible-feeling decision into a reversible one, which is the only reason anyone is willing to make it.
  5. Decommission the old path on a date. The migration is not done when the new path works; it is done when the old path is off. Schedule the shutdown as part of the workload, not as a someday-cleanup, because the decommission is where the savings live.
  6. Then take the next workload. With one full cycle behind you, including a real cutover and a real shutdown, the second workload is faster, and the team is no longer migrating on faith.
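
To make stages 2 and 3 concrete, here is a minimal Python sketch of a reconciliation gate. It is an illustration under stated assumptions, not a prescribed implementation: the names `ParityCriteria`, `PeriodResult`, `at_parity`, `current_streak`, and `ready_to_cut_over` are hypothetical, the thresholds and sample figures are placeholders, and a single numeric metric per period stands in for whatever outputs a real workload compares. Cost is left out on purpose, since "understood" is a human sign-off rather than a computed check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParityCriteria:
    """Hypothetical written definition of 'done' for one workload."""
    relative_tolerance: float   # max allowed relative difference, old vs new
    max_staleness_hours: float  # freshness the downstream decision needs
    required_streak: int        # consecutive periods at parity before cutover


@dataclass(frozen=True)
class PeriodResult:
    """One dual-run period: the same metric produced by both systems."""
    period: str
    legacy_value: float
    lakehouse_value: float
    lakehouse_staleness_hours: float


def at_parity(result: PeriodResult, criteria: ParityCriteria) -> bool:
    """True when the new output matches the old within tolerance and is fresh enough."""
    denominator = max(abs(result.legacy_value), 1e-12)  # guard against a zero baseline
    relative_diff = abs(result.lakehouse_value - result.legacy_value) / denominator
    return (
        relative_diff <= criteria.relative_tolerance
        and result.lakehouse_staleness_hours <= criteria.max_staleness_hours
    )


def current_streak(results: list[PeriodResult], criteria: ParityCriteria) -> int:
    """Count consecutive at-parity periods, walking back from the most recent."""
    streak = 0
    for result in reversed(results):
        if not at_parity(result, criteria):
            break
        streak += 1
    return streak


def ready_to_cut_over(results: list[PeriodResult], criteria: ParityCriteria) -> bool:
    """The exit condition: enough consecutive periods at parity."""
    return current_streak(results, criteria) >= criteria.required_streak


# Illustrative placeholder values only; none of these figures come from a real workload.
criteria = ParityCriteria(relative_tolerance=0.001, max_staleness_hours=24.0, required_streak=3)
dual_run = [
    PeriodResult("P1", 100.0, 100.05, 2.0),  # within tolerance
    PeriodResult("P2", 200.0, 203.0, 2.0),   # 1.5% off: breaks the streak
    PeriodResult("P3", 150.0, 150.0, 4.0),   # within tolerance
    PeriodResult("P4", 98.0, 98.05, 6.0),    # within tolerance
]
print(current_streak(dual_run, criteria), ready_to_cut_over(dual_run, criteria))
```

With these placeholder values the gate stays closed: only the last two periods are at parity, one short of the required three. The design choice worth noting is that a single miss resets the streak, which is what gives dual-run a defined far end instead of an open-ended comparison.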

This is slower to start and far faster to finish. A migration run as ten small finishable cutovers reaches a real end. A migration run as one large cutover tends to reach eighty percent and stop.

Where this usually goes wrong

The two stages teams skip are the unglamorous ones: defining done, and decommissioning. Skipping “define done” is why dual-run never ends. Skipping decommission is why the savings never show up. If a migration is stalled today, the fastest diagnostic is to take any workload claimed as migrated and ask two questions: what were the parity criteria, and on what date does the old path turn off? If either lacks a clear answer, that is the crack.
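
The two-question diagnostic can be run over a simple inventory. The sketch below is a hedged illustration in Python: the `Workload` record, its fields, the `diagnose` function, and the sample workload names are all hypothetical, and it assumes the team keeps at least a list of what has been claimed as migrated.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Workload:
    """Hypothetical inventory record for one workload."""
    name: str
    claimed_migrated: bool
    parity_criteria: Optional[str]     # pointer to the written criteria, if any exist
    decommission_date: Optional[date]  # when the old path turns off, if scheduled


def diagnose(workloads: list[Workload]) -> dict[str, list[str]]:
    """For each workload claimed as migrated, list which of the two answers is missing."""
    findings: dict[str, list[str]] = {}
    for workload in workloads:
        if not workload.claimed_migrated:
            continue  # the diagnostic only applies to workloads claimed as done
        gaps = []
        if not workload.parity_criteria:
            gaps.append("no parity criteria")
        if workload.decommission_date is None:
            gaps.append("no decommission date")
        if gaps:
            findings[workload.name] = gaps
    return findings


# Illustrative inventory; the names, document pointer, and date are invented for the example.
inventory = [
    Workload("finance_month_end", True, "parity checklist v2", date(2026, 9, 30)),
    Workload("marketing_attribution", True, None, None),
    Workload("inventory_snapshot", True, "parity checklist v1", None),
    Workload("hr_reporting", False, None, None),
]
for name, gaps in diagnose(inventory).items():
    print(f"{name}: {', '.join(gaps)}")
```

Any workload that appears in the output is claimed as migrated but is missing at least one of the two answers, so it is a candidate for where the old system is still being kept on just in case.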

Stuck at eighty percent?

An architecture review maps your migration as a sequence of finishable workloads — picks the right first one, defines parity for each, and puts decommission dates on the calendar so the savings actually arrive.