Microsoft Dynamics CRM Data Migration Made Simple with Azure Data Factory

By Rashid Khan

Data migration projects can often seem straightforward at first glance. However, as many professionals know, migrating data from legacy or custom-built systems to modern enterprise platforms like Microsoft Dynamics CRM involves a host of challenges. These include dealing with differing data structures, transforming records to meet new schema requirements, and overcoming key mapping issues.

Recently, I undertook a data migration project that involved moving customer data from a custom-built application into Microsoft Dynamics CRM. This migration wasn’t limited to just one data source—we also had to integrate customer data stored in various Excel sheets into CRM. To accomplish this, we relied heavily on Azure Data Factory (ADF) pipelines and SQL stored procedures for data transformation and deduplication. Along the way, we faced and overcame a number of challenges, particularly around key mismatches and ensuring data consistency.

In this post, I will walk through the key steps we followed during the migration process, the architectural decisions we made, and how we tackled the challenges presented by mismatched foreign keys and duplicate customer records.

Understanding the Migration Architecture

The foundation of any successful migration lies in having a clear and well-defined architecture. Our project involved moving customer data from two key sources:

The Custom Application’s Database: This was a legacy system with its unique schema and data format.
External Excel Sheets: The client had customer data spread across multiple Excel files, each with its layout and structure.

To ensure a smooth migration into CRM, we adopted a multi-tiered architecture that involved:

Azure Blob Storage is used to handle and store Excel files.
A SQL Staging Database where we could clean, transform, and deduplicate the data before pushing it to CRM.
Azure Data Factory (ADF) Pipelines are used to automate the data movement and transformation processes.

Step 1: Extracting Data Using ADF Pipelines

Data extraction was the first step in the ETL (Extract, Transform, Load) process. This involved pulling data from both the custom application and the Excel files into the SQL staging database

1.1 Extracting Data from the Custom Application :

The custom application’s database held a wealth of customer data, but it had a schema that didn’t align with Microsoft Dynamics CRM’s structure. We began by creating linked services in Azure Data Factory to establish secure connections to the custom application’s database. From there, we designed ADF pipelines to extract this data and load it into our SQL staging database.

Copy Activity in ADF was used to move the data from the source database to the staging environment.
Incremental Loading: We built the pipeline in a way that allowed incremental loads to ensure that any new or updated records in the custom application would be captured without needing to reload all data each time.

1.2 Extracting Data from Excel Files in Azure Blob Storage

In addition to the custom application, we had to deal with customer data scattered across various Excel files. These files were stored in Azure Blob Storage to ensure secure, high-availability access. We utilized ADF’s Copy Data Activity to read the Excel files and move them into the SQL staging database. The steps involved:

Defining Blob Storage as a linked service within ADF.
Mapping the structure of the Excel files using datasets in ADF to ensure that the data was imported into the correct SQL staging tables.

Each Excel file had its schema, so mapping columns correctly to the SQL staging tables was a critical part of this process.

Step 2: Data Transformation in the SQL Staging Database

With the data from both sources in the SQL staging database, the next step was to transform it into a format that Microsoft Dynamics CRM could accept. CRM has a well-defined schema, which meant we had to carefully reshape the data to ensure compatibility.

2.1 Schema Transformation

We created a dedicated schema called trn (for “transformation”) within the SQL staging database. This schema served as a workspace where data from both the custom application and Excel files could be transformed according to CRM’s structure.

2.2 SQL Stored Procedures for Transformation

Data transformation was handled by a series of SQL stored procedures. These stored procedures were designed to:

Map fields from the custom application and Excel files to the corresponding CRM fields. This included mapping customer names, addresses, contact details, and more.
Perform data cleaning and normalization, such as reformatting dates, standardizing string values, and ensuring data consistency across fields.
Merge records where necessary, particularly when the same customer had multiple entries in different data sources.

By using stored procedures, we automated much of the transformation process, ensuring that every record was properly formatted for insertion into CRM.

Step 3: Key Mapping Challenges

One of the most significant challenges we faced during this migration was handling the foreign key relationships between the custom application and CRM. The custom application uses integer primary keys (PKs) to identify records, while CRM uses GUIDs (Globally Unique Identifiers).

3.1 The Issue: Mismatched Primary Keys

The primary key in the custom application was a simple integer. However, CRM requires GUIDs for its primary key structure, meaning that the relationships between entities (e.g., between a customer and their orders) would be lost if we didn’t properly map the integer keys to GUIDs

3.2 The Solution: Creating a Reference Table

To solve this, we introduced a reference table in the staging database. This table acted as a bridge between the custom application’s integer keys and CRM’s GUIDs. The reference table stored a mapping between the old primary keys (integers) from the custom application and the new GUIDs generated by CRM.

The reference table was populated during the data transformation process. Specifically:

We created new GUIDs for each record in the staging database during the transformation.
These GUIDs were stored alongside the original integer keys in the reference table.

3.3 Using XML Code in ADF Pipelines for Key Matching

In the final migration step, we used XML code within ADF pipelines to perform dynamic lookups against the reference table. This allowed us to retrieve the correct GUIDs from CRM and insert them into the transformed records. The XML code was crucial for matching the reference key to the primary key and lifting the GUID from the original CRM table.

This approach ensured that the foreign key relationships between entities were preserved during migration, despite the mismatch in key formats.

Step 4: Deduplication of Customer Records

Another critical component of the migration process was deduplication. Given the multiple data sources (the custom application and Excel files), there were bound to be duplicate customer records. Duplicate records could lead to confusion and errors in CRM, so we had to ensure that only unique customer records were retained.

This approach ensured that the foreign key relationships between entities were preserved during migration, despite the mismatch in key formats.

4.1 Deduplication Logic

We built deduplication logic directly into the SQL-stored procedures used for data transformation. This logic involved:

Matching records based on key identifiers such as full name, email address, mobile number, and date of birth.
Identifying potential duplicates by comparing these key fields across records.

4.2 Consolidating Duplicate Records

Once duplicates were identified, we applied survivorship rules to consolidate them. In cases where the duplicate records contained conflicting information (e.g., one record had a phone number while the other did not), we used rules to retain the most complete or recent data. This ensured that the resulting customer profile in CRM was accurate and up to date.

4.3 Reference Table for Old and New Key Mapping

As part of the deduplication process, we also updated our reference table to reflect old-to-new key mappings. This allowed us to ensure that the correct GUIDs were assigned to the deduplicated records, maintaining the integrity of the data relationships in CRM.

Step 5: Lifting and Shifting the Data to CRM

With the data transformed, cleaned, and deduplicated, we were ready to load it into Microsoft Dynamics CRM. This final step involved using ADF pipelines to lift the transformed data from the SQL staging database and shift it into CRM’s production environment.

5.1 ADF Pipelines for Data Migration

We designed a series of ADF pipelines to map the transformed data to CRM’s entities. The pipelines were responsible for:

Connecting to the SQL staging database and retrieving the transformed data.
Mapping the transformed records to the appropriate CRM tables and fields.
Ensuring that the foreign key relationships, as mapped in the reference table, were maintained during the insertion process.

5.2 Ensuring Minimal Downtime

One of the key concerns during this stage was minimizing downtime and ensuring that the data migration did not disrupt CRM operations. To achieve this, we scheduled the data migration to occur during non-peak hours, to avoid overwhelming the CRM system with a large volume of data at once.

Conclusion:

Migrating customer data from a custom-built application and external Excel files into Microsoft Dynamics CRM was a challenging but rewarding project. By leveraging Azure Data Factory, SQL stored procedures, and Azure Blob Storage, we were able to successfully extract, transform, and load the data into CRM while overcoming significant challenges around foreign key mismatches and duplicate records.

The use of a reference table for key mapping and deduplication logic in the transformation process ensured that the migrated data was accurate, complete, and fully aligned with CRM’s structure.

Will AI Ever Remove Humans-in-the-Loop

A developer’s guide to the Microsoft Agent Framework

AI Security Guardrails for Enterprise Leaders

As part of the deduplication process, we also updated our reference table to reflect old-to-new key mappings. This allowed us to ensure that the correct GUIDs were assigned to the deduplicated records, maintaining the integrity of the data relationships in CRM.

Let's Shape the
Future Together!

At ITKnocks, we are more than an IT consulting company; we’re your strategic partner in business evolution. With a global footprint and a passion for technology, we craft innovative solutions, ensuring your success. Join us on a journey of excellence, where collaboration meets cutting-edge IT expertise.

Microsoft Dynamics CRM Data Migration Made Simple with Azure Data Factory