Retail leaders: Fix 12 data migration mistakes before it’s too late (Part 4)

Part 4 wraps up the series by focusing on process and architecture, showing how a flawed technical foundation can undermine even the best strategy, people, and testing, ultimately eroding trust in your migration outcomes.

Read time: 8 min

Process & Architecture (The Engine Room)

Welcome to the final instalment of our series on retail data migrations - I hope you enjoyed the keen learnings of DJ Khaled, Admiral Ackbar, and Forrest Gump in our previous post.

Wait, *what*?

Didn’t catch last week’s post? Make sure you have a read through here: LINK

Over the last three weeks, we’ve covered Strategy, People, and Testing. Today, we conclude with the technical foundation: Process and Architecture. Even with great people and smart testing, a fundamentally flawed technical approach will destroy trust in the final output.

MISTAKE #10:

Relying on a manual transformation process

Oof.

The Problem:

It’s time to think about how you’re going to physically transform and move your data into your new system. Your colleague Jeff is a bit of an Excel wizard, and says it’ll be a piece of cake to do that - couple of formulas and job done. So you let him go crazy with it.

A little while later, Jeff demos Employee_Migration_Copy_FINAL_Test (1).xlsx to you.

“Hold on mate, just need to copy the source data in… where was it again?”

5 minutes later.

“Ah yeah, found it. Right, let me just paste it into this tab…”

You see a bunch of #VALUE! errors in the transform tab.

“Oh, ignore that, it does that sometimes. I just have to recalibrate the flux capacitor and reverse the polarity of the neutron flow and it’s all good then.”

You stare blankly. And nod.

“Yeah so after pasting in the data and then checking the formulas, I give the data a quick eyeball for any weird stuff. Actually on this one there was a problem with the Person Number generation, so what I’m doing is then pasting the transformed data into a new tab and then redoing the number generation manually so I know it’s correct. Excel doesn’t play ball sometimes, but at least it’s fixed!”

Oh, the joy.

“Now for the file generation, I make sure I’m targeting this final tab and not the transform tab, then do Save As and select CSV as the filetype… oh wait, no not that one. Which one was it again… Oh yeah, CSV UTF-8. Not CSV (comma delimited), CSV (Macintosh), CSV (MS-DOS), ….”

Mind close to exploding.

“Then I’ve been using 7Zip to archive because I get the best compression rates out of it. It’s well good! We should get the IT guys to install it on all our machines!”

“Oops, hold on, forgot to update the mappings in the mapping tab. Let me just go through this again…”

That was a fun demo, right? That was just one file, covering one object. Now imagine doing that for the 50-100 objects you need to load into your new system. Good luck with ensuring a consistent process and visibility of change.

Food for thought:

If it’s not repeatable, it cannot be trusted - Jeff’s spreadsheet example above is a perfect representation of this. There’s no repeatability here; it’s Jeff doing Jeff things in a spreadsheet. When you re-run transformations, whether iteratively or for a DM cycle, you want guarantees that every step that needs to happen will happen, regardless of when it runs or who’s running it. Consider ways to apply process automation to reduce or remove human error.
Not only will it reduce human error, but it reduces the reliance on a single point of failure: Jeff. If he suddenly disappears, your migration is in tatters. Because let’s face it, Jeff won’t have written up a nice SOP for this - he’s doing what he can to keep his head above water.
Be intentional about your process, get predictable results - closely linking in with the previous point. Map out the process that needs to happen. Think about how you connect the dots, and how this process will be run in reality. You may map out this great end-to-end process for creating your stack of data objects, but what happens if a developer just wants to transform employee bank details?
This is where idempotency comes in. That’s a fun word for you. Essentially: no matter how many times you perform the same operation, it will still output the same result as if you ran it just once.
The critical point here: if Angie (your dev) re-runs the employee bank details transformation, what happens? Without considering idempotency, the likely result is duplicate or malformed data.
This can be fixed with good orchestration and process.
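
To make idempotency a bit more concrete, here is a minimal Python sketch of an employee bank details step that can be re-run safely. The field names (`person_number`, `sort_code`, `account_number`), the 8-digit padding, and the in-memory `target_store` are assumptions made up for illustration; they stand in for whatever your real source extract and target system look like.

```python
import csv
import io

FIELDS = ["person_number", "sort_code", "account_number"]  # hypothetical target layout


def transform_bank_details(source_rows):
    """Pure transform: the same input always gives the same output."""
    cleaned = {}
    for row in source_rows:
        key = row["person_number"].strip()
        # Keying by person_number means a repeated source row overwrites
        # itself instead of producing a duplicate.
        cleaned[key] = {
            "person_number": key,
            "sort_code": row["sort_code"].replace("-", "").strip(),
            # Assumed rule: account numbers are left-padded to 8 digits.
            "account_number": row["account_number"].strip().zfill(8),
        }
    # Sort so the output order never depends on the input order.
    return [cleaned[k] for k in sorted(cleaned)]


def load(target_store, rows):
    """Upsert by key: insert new people, overwrite existing ones."""
    for row in rows:
        target_store[row["person_number"]] = row


def export_csv(target_store):
    """Write the load file the same way every time (column and row order)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(target_store[k] for k in sorted(target_store))
    return buffer.getvalue()


source = [
    {"person_number": "1002", "sort_code": "20-00-00", "account_number": "1234567"},
    {"person_number": "1001", "sort_code": "40-11-22", "account_number": "87654321"},
]

target_store = {}
load(target_store, transform_bank_details(source))
first_run = export_csv(target_store)

# Angie re-runs the same step: nothing changes and nothing duplicates.
load(target_store, transform_bank_details(source))
second_run = export_csv(target_store)

print(first_run == second_run, len(target_store))
```

The design choice doing the work here is the upsert by key: an append-only load would have doubled the rows on the second run. A real orchestration tool would add logging, error handling, and dependency ordering on top, but the principle is the same.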

MISTAKE #11:

Writing mapping documents for machines instead of people

Shifting gears a little bit, let’s talk about the DM documentation process.

The Problem:

If, by some miracle, Jeff had prepared mapping documentation, it might look something like this:

| Source Column | Target Column | Transformation Rules |
| --- | --- | --- |
| EMP_TYPE | WORKER_CLASSIFICATION | =IF(A2="FT", "Regular_FTE", IF(A2="PT", "Part_Time", IF(OR(A2="Temp",A2="Intern"), "Contingent", IF(AND(A2="Contractor", VLOOKUP(B2, 'Finance_Approved'!A:B, 2, FALSE)="Yes"), "Exempt_Contractor", "CEO_Special_Friend")))) |
| BASE_SALARY | ANNUAL_COMP_USD | =IF(ISNUMBER(C2), C2, VALUE(SUBSTITUTE(SUBSTITUTE(UPPER(C2),"K","000"),"$",""))) * IF(D2="Hourly", 2080, 1) / IF(VLOOKUP(E2, 'Country_Codes'!A:C, 3, FALSE)="UK", 0.78, 1) |
| HIRE_DATE | ORIGINAL_START_DATE | =DATE(IF(VALUE(RIGHT(F2,2))>26, 1900+RIGHT(F2,2), 2000+RIGHT(F2,2)), LEFT(F2,2), MID(F2,4,2)) |
| REPORTS_TO | MANAGER_POSITION_ID | =IFERROR(VLOOKUP(G2, 'Active_Roster'!A:C, 3, FALSE), VLOOKUP("VACANT_ROLE", 'Fallback_Structure'!A:B, 2, FALSE)) |
| GENDER_CODE | GENDER_IDENTITY | =IF(OR(H2="M", H2="Male"), "Male", IF(OR(H2="F", H2="Female"), "Female", IF(H2="", "Declined_to_State", "Error_Contact_HR")))+IF(LEFT(I2,3)="Dir", " _Executive_Suite", "") |
| OFFICE_LOC | WORK_LOCATION_ID | =IF(ISNUMBER(SEARCH("Remote", J2)), "REM-001", IF(J2="HQ", "NY-01", VLOOKUP(TRIM(J2), 'IP_To_Building_Map'!A:B, 2, FALSE))) |

This is great if you’re asking ChatGPT to turn your document into a script, or you’ve built something to read in the documents and run the transform rules, but terrible if you’re, you know, a human being? Even with ChatGPT, are you confident you could reconcile what it has output against the spec you’ve pushed in?

Yes, some people can read the formulas fine, but let’s admit it: it’s not immediately obvious what is happening, the formulas are not formatted in a readable manner, and the lack of separation between mapping and formatting makes the whole thing intimidating to parse.

How are your Business SMEs supposed to read this, let alone understand fully the journey the data point is going through?

Food for thought:

Business SMEs must be able to validate the logic - retail relies on highly interconnected systems (e.g. E-Com, WMS, Store POS, etc.); if your Warehouse Manager can't read the mapping rules, they can't sign off on them.
Mappings should read like policy statements - they need to explain the 'why', not just the 'how'. Advocate for business-understandable logic first, SQL second, helping bridge the gap between developers and stakeholders.
A style guide for these documents can help authors stay consistent across documents.
Review these documents regularly - a mapping document isn’t a one-and-done artefact that you produce at the beginning of the migration project, just to file away and tick a box. It has massive value in bridging the understanding between the development team and everyone else. There should be planned review checkpoints across your project (for example, on exit from each DM cycle) to ensure everyone is abreast of the latest design, and to field wording amendments that enhance clarity.
You must still maintain strict change control to ensure you’re building trust each cycle. These reviews shouldn’t be abused by turning them into re-design sessions out of each DM cycle.
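
One way to keep the 'why' and the 'how' side by side is to store each rule as a plain-English policy statement next to the code that implements it. The Python sketch below is illustrative only: `MappingRule`, `describe` and `apply_rules` are hypothetical names, and the two rules are a reading of the simpler parts of Jeff's formulas for EMP_TYPE and HIRE_DATE. The contractor branch is left out, unmapped values raise an error rather than falling through to a default, and dates come back as ISO strings. All three are assumptions of this sketch, not something taken from Jeff's spreadsheet.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable

EMP_TYPE_LOOKUP = {
    "FT": "Regular_FTE",
    "PT": "Part_Time",
    "Temp": "Contingent",
    "Intern": "Contingent",
}


def classify_worker(emp_type: str) -> str:
    """Look the value up; fail loudly on anything the policy doesn't cover."""
    if emp_type not in EMP_TYPE_LOOKUP:
        raise ValueError(f"Unmapped EMP_TYPE: {emp_type!r}")
    return EMP_TYPE_LOOKUP[emp_type]


def parse_hire_date(raw: str) -> str:
    """Read MM/DD/YY; two-digit years above 26 are treated as 19xx."""
    month, day, yy = int(raw[0:2]), int(raw[3:5]), int(raw[-2:])
    year = 1900 + yy if yy > 26 else 2000 + yy
    return date(year, month, day).isoformat()


@dataclass(frozen=True)
class MappingRule:
    source: str
    target: str
    policy: str  # the statement a business SME reads and signs off
    transform: Callable[[str], str]  # the implementation a developer maintains


RULES = [
    MappingRule(
        source="EMP_TYPE",
        target="WORKER_CLASSIFICATION",
        policy="Full-time staff become Regular_FTE, part-time staff become "
               "Part_Time, and temps and interns become Contingent.",
        transform=classify_worker,
    ),
    MappingRule(
        source="HIRE_DATE",
        target="ORIGINAL_START_DATE",
        policy="Dates arrive as MM/DD/YY. Years 27 to 99 are read as 1927 to "
               "1999; years 00 to 26 are read as 2000 to 2026.",
        transform=parse_hire_date,
    ),
]


def describe(rules):
    """Render the SME-facing view: one policy statement per mapping."""
    return "\n".join(f"{r.source} -> {r.target}: {r.policy}" for r in rules)


def apply_rules(record, rules):
    """Render the developer-facing view: run every rule against a record."""
    return {r.target: r.transform(record[r.source]) for r in rules}


print(describe(RULES))
print(apply_rules({"EMP_TYPE": "Intern", "HIRE_DATE": "03/15/98"}, RULES))
```

Because the statement and the function live in the same rule, a wording change agreed at a review checkpoint and the matching code change can go through change control together.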

MISTAKE #12:

Optimising ETL speed over the feedback loop

This might seem a little counter-intuitive at first, so let me explain.

The Problem:

When it comes to the world of development and engineering, speed is usually king. Obsessing over seconds, or even milliseconds when analysing a process for its performance, can be commonplace.

In some situations, this is warranted. If you’re building a live dashboard of trade prices for the stock market, or a digital ordering system from front-of-house to a kitchen, or maybe even something simple, like a till calculating the total cost of goods, you want these to be snappy and near instant to provide the best customer experience and, in the first situation, to avoid misinformation.

But when it comes to DM, speed is rarely the bottleneck you should be focused on. If a transformation takes a couple of minutes to run, the value of having Angie (your dev, if you tuned out earlier) sat there for hours, or days, trying to shave a few seconds off that run time is very little. You will run these transformations for formal purposes maybe a handful of times. Once you’ve cut over into your new system, the transformation code essentially gets thrown in the bin (well, not exactly: you’d likely retain it for a while whilst you stabilise your new system, but you get the point), likely never to be used again.

It’s not like a storefront on your website, where results returning in less than 2 seconds vs. 10 seconds has a massive impact on things like user retention. Your cutover will be meticulously planned, incorporating run times into it, and shaving seconds is unlikely to make any difference, unless you’re operating on razor thin (also read: unrealistic) timelines. But that’s another kettle of fish.

Food for thought:

Define ‘acceptable’ and enforce it - set a hard benchmark for your run times based on your cutover window. If the transformation runs in the time allotted to that object on the plan, it’s done. There’s no need to waste time if it’s already within the threshold for acceptance.
Support with technical accelerators - Angie can support you by picking up tasks like technical / volumetric reconciliation, so your business SMEs don’t have to. This’ll free the SMEs up to look at the finer detail. Angie can also support with summaries and reports that kickstart the understanding, so time isn’t wasted reviewing an issue that’s already known to the dev team.
Embrace iterative checking - I’m sure there are some SMEs out there ready to have me shot for this one, but more regular checking of data can help speed up the process during DM cycles. Your SMEs are building familiarity with the new format of the data, they’re getting to test their validation process more often, and dev teams get to understand the sort of questions that may crop up. The more frequent line of communication between developer and SME is also highly underrated for improving ways of working.
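
As a rough illustration of the first two points, the Python sketch below checks a run against a time budget and produces a simple volumetric reconciliation summary. The function names (`timed_run`, `reconcile_counts`), the `person_number` key, and the 120-second budget are all hypothetical; your own budget would come from the time allotted to that object on the cutover plan.

```python
import time
from collections import Counter


def timed_run(transform, rows, budget_seconds):
    """Run a transform and report whether it fits the agreed time budget."""
    start = time.perf_counter()
    result = transform(rows)
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed <= budget_seconds


def reconcile_counts(source_rows, target_rows, key):
    """Volumetric check: did every source record arrive exactly once?"""
    source_keys = Counter(row[key] for row in source_rows)
    target_keys = Counter(row[key] for row in target_rows)
    return {
        "source_count": sum(source_keys.values()),
        "target_count": sum(target_keys.values()),
        "missing_in_target": sorted(set(source_keys) - set(target_keys)),
        "unexpected_in_target": sorted(set(target_keys) - set(source_keys)),
        "duplicated_in_target": sorted(k for k, n in target_keys.items() if n > 1),
    }


def drop_leavers(rows):
    """Stand-in transform for the example: keep active employees only."""
    return [row for row in rows if row["status"] == "Active"]


source = [
    {"person_number": "1001", "status": "Active"},
    {"person_number": "1002", "status": "Active"},
    {"person_number": "1003", "status": "Left"},
]

target, elapsed, within_budget = timed_run(drop_leavers, source, budget_seconds=120)
summary = reconcile_counts(source, target, key="person_number")

# If within_budget is True, the performance conversation is over.
print(within_budget)
# The summary flags 1003 as missing, which the dev team can annotate as
# "expected: leaver excluded" before the SMEs ever see it.
print(summary)
```

A summary like this is the sort of report that lets SMEs skip the known differences and spend their time on the finer detail.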

In Closing…

I think it’s fair to say, there’s a clear takeaway from this post:

Don’t be like Jeff.

(Sorry to any Jeffs out there, nothing personal)

That about wraps up the series! Keep tuning in for more data related content from all of us here at binary10.

To finish on a more inspired point - DM doesn't have to be the bottleneck that threatens your peak trading season. If retail leaders treat migration as a business-led initiative, use risk-based validation, and insist on automated, repeatable processes, they can fundamentally change the trajectory of their ERP programmes for the better.

At binary10, we provide the methodology, the automation, and the real-world expertise to ensure your data works for you, not against you. [Link to contact]

Article by:

Chris Davis
